Dan Huang (黄聃)

Associate Professor

Email: huangd79@mail.sysu.edu.cn

Office: Room 421A, Supercomputing Center

Homepage: https://scholar.google.com/citations?user=Bo6PwnQAAAAJ&hl=en

Biography:

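Dan Huang is an associate professor in the "Hundred Talents Program" of Sun Yat-sen University. Huang received a Ph.D. in Computer Engineering from the University of Central Florida, USA, in August 2018, and conducted research at Oak Ridge National Laboratory, USA, from November 2015 to August 2016. At the National Supercomputing Center in Guangzhou, Huang works on the research and implementation of technologies, systems, and applications for the converged, innovative development of supercomputing, big data, and artificial intelligence. The resulting work has been published in journals and conferences including IEEE TC, IEEE TPDS, SC, PPoPP, ICML, IPDPS, ICDCS, ICS, and ICPP.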

Prospective master's and Ph.D. students interested in applying to Sun Yat-sen University are welcome to contact me.

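Master's training path: explore one application domain, develop one piece of system software, and become familiar with one hardware device (1+1+1)

Ph.D. training path: building on the master's training path, independently identify problems, diagnose and analyze them, propose technical solutions, develop and validate systems, write and revise papers, design and present slides, and organize project materials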

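Research Areas:

Systems developed by the group:

1. ParM: a heterogeneous parallel programming model for domestic processors

-> Related systems: Kokkos, RAJA

2. RTAI framework series: efficient streaming HPC-AI co-development and runtime frameworks for supercomputing platforms

-> Related systems: Ray, Google Pathways, Parsl, Radical, DeepDriveMD, Colmena

3. SAIH: a parallel, scalable benchmark suite for evaluating the AI computing capability of HPC systems

4. Large-scale parallel training and inference systems under resource constraints

5. HeteroHC: a GPU-based parallel workflow for gene sequence alignment

-> Related systems: GATK HaplotypeCaller, Samtools, NVIDIA Clara Parabricks

6. FastRPC: a high-performance RDMA-based remote procedure call framework

-> Related systems: bRPC, gRPC, Margo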

What is system research:

The design, implementation, and analysis of complex software systems.

Examples include operating systems and large, complex software-driven systems in data centers, clouds, and HPC clusters.

Three key factors of system research:

Abstractions, Principles, Techniques.

Books for system knowledge:

-> The C Programming Language

-> Operating Systems: Three Easy Pieces

-> Advanced Programming in the UNIX Environment

-> Operating Systems: Principles and Practice

-> Operating System Concepts

-> Operating Systems: Internals and Design Principles (8th Edition)

-> Linux Kernel Development (3rd Edition) or Understanding the Linux Kernel or The Design of the UNIX Operating System

-> UNIX: The Textbook

-> The Linux Programming Interface: A Linux and UNIX System Programming Handbook

-> Computer Architecture: A Quantitative Approach (Turing Award)

-> Deep learning: Neural Networks and Deep Learning, Dive into Deep Learning

Tools for system research:

Compiler and build systems: GCC, "GCC and Make", "Beginner's Guide to Linkers"

Project builds: Make, CMake, Bazel; Linux Documentation Project Guides

Bash: Bash Guide for Beginners

Parallel computing frameworks: MPI, OpenMP, CUDA

Performance profiling: perf, nvprof, DUMPI, Valgrind, TAU, etc.

Performance benchmarks: IOzone, IOR, HP Linpack, SKaMPI, LMBench

Distributed and parallel file systems: Lustre, HDFS, and Ceph

Distributed AI systems: PyTorch, TensorFlow, and Ray

Distributed big data systems: Spark, Flink, etc.

HPC job manager: Slurm

Tools for documentation: Gnuplot, LaTeX, TeX Live, Vim, awk/grep/sed, Git

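Education:

(1) 2014-08 to 2018-08, University of Central Florida, Ph.D.

(2) 2012-08 to 2014-08, Georgia State University, Master's

(3) 2007-09 to 2010-06, Southeast University, Master's

(4) 2003-09 to 2007-06, Jilin University, Bachelor's

Awards and Honors:

Oak Ridge National Laboratory (USA) ASTRO award program: research on software-defined storage I/O for container environments

Selected Publications:

Driven by the development of the systems above, the group has published the following papers: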

2024:
Jiang, Jiazhi, Dan Huang, Hu Chen, Yutong Lu, and Xiangke Liao. "HTDcr: a job execution framework for high-throughput computing on supercomputers." Science China Information Sciences (SCIS) 67, no. 1 (2024): 112104. (CCF A domestic journal)

Du, Jiangsu, Jinhui Wei, Jiazhi Jiang, Shenggan Cheng, Dan Huang, Zhiguang Chen, and Yutong Lu. "Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference." In Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP) (pp. 42–54). 2024. (CCF A conference)

Huang, Han, Tengyang Zheng, Tianxing Yang, Yang Ye, Siran Liu, Zhe Tang, Shengyou Lu, Guangnan Feng, Zhiguang Chen, and Dan Huang. "Critique of “Productivity, Portability, Performance Data-Centric Python” by SCC Team From Sun Yat-sen University." IEEE Transactions on Parallel and Distributed Systems (TPDS) (2024). (CCF A journal)

Tian, Rui, Jiazhi Jiang, Jiangsu Du, Dan Huang, and Yutong Lu. "Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform." IEEE Transactions on Parallel and Distributed Systems (TPDS) (2024). (CCF A journal)

Jiang, Jiazhi, Hongbin Zhang, Deyin Liu, Jiangsu Du, Xiaojiao Yao, Jinhui Wei, Pin Chen, Dan Huang, and Yutong Lu. "Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters." In European Conference on Parallel Processing (Euro-Par) (pp. 313–328). 2024. (CCF B conference)

Lin, Peijia, Pin Chen, Rui Jiao, Qing Mo, Jianhuan Cen, Wenbing Huang, Yang Liu, Dan Huang, and Yutong Lu. "Equivariant Diffusion for Crystal Structure Prediction." In Forty-first International Conference on Machine Learning (ICML). 2024. (CCF A conference)

Hu, Nan, Yutong Lu, Zhuo Tang, Zhiyong Liu, Dan Huang, and Zhiguang Chen. "Topo: Towards a Fine-grained Topological Data Processing Framework on Tianhe-3 Supercomputer." Journal of Parallel and Distributed Computing (JPDC) (2024): 104926. (CCF B journal)

Wen, Yingpeng, Zhilin Qiu, Dongyu Zhang, Dan Huang, Nong Xiao, and Liang Lin. "Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method." International Journal of Parallel Programming 52, no. 3 (2024): 125–146.

Wei, Yuanxin, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang, Jiangsu Du, and Yutong Lu. "Communication-Efficient Model Parallelism for Distributed In-Situ Transformer Inference." In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 1–6). 2024. (CCF B conference)

Wen, Yingpeng, Weijiang Yu, Fudan Zheng, Dan Huang, and Nong Xiao. "AdaNAS: Adaptively Post-processing with Self-supervised Neural Architecture Search for Ensemble Rainfall Forecasts." IEEE Transactions on Geoscience and Remote Sensing (2024).

Du, Jiang-Su, Dong-Sheng Li, Ying-Peng Wen, Jia-Zhi Jiang, Dan Huang, Xiang-Ke Liao, and Yu-Tong Lu. "SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems." Journal of Computer Science and Technology (JCST) 39, no. 2 (2024): 384–400. (CCF B domestic journal)

2023:
Wen, Yingpeng, Weijiang Yu, Dongsheng Li, Jiangsu Du, Dan Huang, and Nong Xiao. "CosNAS: Enhancing estimation on cosmological parameters via neural architecture search." New Astronomy 99 (2023): 101955.

Jiang, Jiazhi, Jiangsu Du, Dan Huang, Zhiguang Chen, Yutong Lu, and Xiangke Liao. "Full-stack optimizing transformer inference on ARM many-core CPU." IEEE Transactions on Parallel and Distributed Systems (TPDS) 34, no. 7 (2023): 2221–2235. (CCF A journal)

Jiang, Jiazhi, Zijian Huang, Dan Huang, Jiangsu Du, Lin Chen, Ziguan Chen, and Yutong Lu. "Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure." ACM Transactions on Architecture and Code Optimization (TACO) 20, no. 3 (2023): 1–21. (CCF A journal)

Zheng, Jiang, Jiazhi Jiang, Jiangsu Du, Dan Huang, and Yutong Lu. "Optimizing massively parallel sparse matrix computing on ARM many-core processor." Parallel Computing 117 (2023): 103035. (CCF B journal)

Du, Jiangsu, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, and Yutong Lu. "Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs." ACM Transactions on Architecture and Code Optimization (TACO) 20, no. 4 (2023): 1–22. (CCF A journal)

Jiang, Jiazhi, Rui Tian, Jiangsu Du, Dan Huang, and Yutong Lu. "MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform." In 2023 IEEE 41st International Conference on Computer Design (ICCD) (pp. 366–374). 2023. (CCF B conference)

Zhu, Wen-long, Jia-zhi Jiang, Dan Huang, and Nong Xiao. "ParM: A heterogeneous programming model for domestic processors." Computer Engineering & Science (计算机工程与科学) 45, no. 09 (2023): 1521. (CCF B Chinese-language journal)

2022:
Jiang, Jiazhi, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, and Yutong Lu. "Characterizing and optimizing transformer inference on ARM many-core processor." In Proceedings of the 51st International Conference on Parallel Processing (ICPP) (pp. 1–11). 2022. (CCF B conference)

Huang, Dan, Zhenlu Qin, Qing Liu, Norbert Podhorszki, and Scott Klasky. "Identifying challenges and opportunities of in-memory computing on large HPC systems." Journal of Parallel and Distributed Computing (JPDC) 164 (2022): 106–122. (CCF B journal)

Du, Jiangsu, Jiazhi Jiang, Yang You, Dan Huang, and Yutong Lu. "Handling heavy-tailed input of transformer inference on GPUs." In Proceedings of the 36th ACM International Conference on Supercomputing (ICS) (pp. 1–11). 2022. (CCF B conference)

Chen, Lin, Raphael C.-W. Phan, Zhili Chen, and Dan Huang. "Persistent items tracking in large data streams based on adaptive sampling." In IEEE INFOCOM 2022-IEEE Conference on Computer Communications (pp. 1948–1957). 2022. (CCF A conference)

Jiang, Jiazhi, Dan Huang, Jiangsu Du, Yutong Lu, and Xiangke Liao. "Optimizing small channel 3D convolution on GPU with tensor core." Parallel Computing 113 (2022): 102954. (CCF B journal)

Du, Jiangsu, Yunfei Du, Dan Huang, Yutong Lu, and Xiangke Liao. "Enhancing Distributed In-Situ CNN Inference in the Internet of Things." IEEE Internet of Things Journal 9, no. 17 (2022): 15511–15524.

2021:
Li, Dongsheng, Dan Huang, Zhiguang Chen, and Yutong Lu. "Optimizing massively parallel Winograd convolution on ARM processor." In Proceedings of the 50th International Conference on Parallel Processing (ICPP) (pp. 1–12). 2021. (CCF B conference)

2020:
Huang, Dan, and Yutong Lu. "Improving the efficiency of HPC data movement on container-based virtual cluster." CCF Transactions on High Performance Computing 2, no. 1 (2020): 67–80.

Huang, Dan, Jun Wang, Qing Liu, Nong Xiao, Huafeng Wu, and Jiangling Yin. "Enhancing proportional IO sharing on containerized big data file systems." IEEE Transactions on Computers (TC) 70, no. 12 (2020): 2083–2097. (CCF A journal)

Huang, Dan, Zhenlu Qin, Qing Liu, Norbert Podhorszki, and Scott Klasky. "A comprehensive study of in-memory computing on large HPC systems." In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) (pp. 987–997). 2020. (CCF B conference)

2019 and earlier:
Luo, Huizhang, Dan Huang, Qing Liu, Zhenbo Qiao, Hong Jiang, Jing Bi, Haitao Yuan, Mengchu Zhou, Jinzhen Wang, and Zhenlu Qin. "Identifying latent reduced models to precondition lossy compression." In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 293–302). 2019. (CCF B conference)

Huang, Dan, Qing Liu, Scott Klasky, Jun Wang, Jong Youl Choi, Jeremy Logan, and Norbert Podhorszki. "Harnessing data movement in virtual clusters for in-situ execution." IEEE Transactions on Parallel and Distributed Systems (TPDS) 30, no. 3 (2018): 615–629. (CCF A journal)

Huang, Dan, Qing Liu, Jong Choi, Norbert Podhorszki, Scott Klasky, Jeremy Logan, George Ostrouchov, Xubin He, and Matthew Wolf. "Can I/O variability be reduced on QoS-less HPC storage systems?" IEEE Transactions on Computers (TC) 68, no. 5 (2018): 631–645. (CCF A journal)

Huang, Dan, Jun Wang, and Dezhi Han. "Performance Evaluation and Analysis for MPI-Based Data Movement in Virtual Switch Network." In 2018 IEEE International Conference on Networking, Architecture and Storage (NAS) (pp. 1–4). 2018.

Wang, Jun, Xuhong Zhang, Junyao Zhang, Jiangling Yin, Dezhi Han, Ruijun Wang, and Dan Huang. "Deister: A light-weight autonomous block management in data-intensive file systems using deterministic declustering distribution." Journal of Parallel and Distributed Computing (JPDC) 108 (2017): 3–13. (CCF B journal)

Wang, Jun, Dan Huang, Huafeng Wu, Jiangling Yin, Xuhong Zhang, Xunchao Chen, and Ruijun Wang. "SideIO: A Side I/O system framework for hybrid scientific workflow." Journal of Parallel and Distributed Computing (JPDC) 108 (2017): 45–58. (CCF B journal)

Huang, Dan, Dezhi Han, Jun Wang, Jiangling Yin, Xunchao Chen, Xuhong Zhang, Jian Zhou, and Mao Ye. "Achieving load balance for parallel data access on distributed file systems." IEEE Transactions on Computers (TC) 67, no. 3 (2017): 388–402. (CCF A journal)

Huang, Dan, Jun Wang, Qing Liu, Xuhong Zhang, Xunchao Chen, and Jian Zhou. "DFS-container: Achieving containerized block I/O for distributed file systems." In Proceedings of the 2017 Symposium on Cloud Computing (SoCC poster) (p. 660). 2017.

Chen, Xunchao, Navid Khoshavi, Jian Zhou, Dan Huang, Ronald F. DeMara, Jun Wang, Wujie Wen, and Yiran Chen. "AOS: Adaptive overwrite scheme for energy-efficient MLC STT-RAM cache." In Proceedings of the 53rd Annual Design Automation Conference (DAC) (pp. 1–6). 2016. (CCF A conference)

Chen, Xunchao, Navid Khoshavi, Ronald F. DeMara, Jun Wang, Dan Huang, Wujie Wen, and Yiran Chen. "Energy-aware adaptive restore schemes for MLC STT-RAM cache." IEEE Transactions on Computers (TC) 66, no. 5 (2016): 786–798. (CCF A journal)

Huang, Dan, Jun Wang, Qing Liu, Jiangling Yin, Xuhong Zhang, and Xunchao Chen. "Experiences in using OS-level virtualization for block I/O." In Proceedings of the 10th Parallel Data Storage Workshop (pp. 13–18). 2015.

Yin, Jiangling, Jun Wang, Jian Zhou, Tyler Lukasiewicz, Dan Huang, and Junyao Zhang. "Opass: Analysis and optimization of parallel data access on distributed file systems." In 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 623–632). 2015. (CCF B conference)

Tan, Song, Wenzhan Song, Dan Huang, Qifen Dong, and Lang Tong. "Distributed software emulator for cyber-physical analysis in smart grid." IEEE Transactions on Emerging Topics in Computing 5, no. 4 (2014): 506–517.