I’m currently an HPC engineer at PaddlePaddle, Baidu, Beijing. I obtained my master’s degree from the School of Computer Science, Nankai University, in 2023. My research interests include computer architecture and software/hardware co-optimization for machine learning.
🔥 News
- 2024.04: 🎉🎉 Our work Hybrid-Memcached was accepted by IEEE Transactions on Computers 🎉🎉
- 2023.07: 🎉🎉 I will join Baidu PaddlePaddle as an HPC engineer!
- 2022.10: 🎉🎉 Won the first-class (Top 10%) scholarship of Nankai University 🎉🎉
- 2022.04: 🎉🎉 Joined Shuhai Lab, Huawei Cloud, as a Research Intern for three months 🎉🎉
- 2022.02: 🎉🎉 Our work ABM-SpConv-SIMD was accepted by IEEE Transactions on Network Science and Engineering 🎉🎉
📖 Education
- 2020.09 - 2023.06: M.S. in Computer Technology, School of Computer Science and Cyber Science, Nankai University. Advisor: Prof. Xiaoli Gong.
- 2016.09 - 2020.06: B.E. in Electronic Information Science and Engineering, School of Computer Science and Technology, China University of Mining and Technology.
👨‍💻 Work Experience
- HPC Engineer / Machine Learning Framework Engineer, Baidu PaddlePaddle, Beijing.
- Semi-automatic parallel distributed training of Ernie LLMs, including pre-training, SFT, LoRA, and PTQ
- AutoML. From hand-written parallel distributed training (such as megatron-lm.tensor_parallel.column_parallel/row_parallel), to semi-auto parallel training (such as parallelize_module in PyTorch, shard_op/shard_tensor in Paddle), to fully automatic parallel training
- Prim mechanism. Decompose ops into primitive ops to support back-end compilers and higher-order differentiation
- Shuhai Lab at Huawei Cloud, Research Intern.
- Cloud workload characterization from a micro-architectural perspective
- Benchmark suite design for latency-critical cloud applications (such as Memcached, Redis, etc.) with a wide variety of latency requirements
📝 Publications
《Hybrid-Memcached: A Novel Approach for Memcached Persistence Optimization with Hybrid Memory》
IEEE Transactions on Computers, 2024
✍🏻 Zhang Jiang, Xianduo Li (joint first author), Xiaoli Gong, et al.
- DRAM-based data aggregation to avoid fine-grained writes
- Data-object alignment mechanism to avoid write amplification
- Writes based on non-temporal store instructions to improve bandwidth utilization
ABM-SpConv-SIMD
IEEE Transactions on Network Science and Engineering, 2022
✍🏻 Xianduo Li, Xiaoli Gong, Dong Wang, Jin Zhang, et al.
- Propose a framework that combines offline pruning and quantization with online SIMD optimization to fit DNNs onto cost-effective edge devices.
- Design and implement an accumulation-before-multiplication sparse convolution algorithm.
- Conduct a series of experiments to evaluate the performance of the framework on various edge devices.
💻 Research Experience
Systems
- Hybrid-Memcached, a novel approach for Memcached persistence optimization with hybrid memory. (Labels: Hybrid Memory, DRAM-based Data Aggregation)
- Linux THP, source code analysis of the Transparent Huge Page (THP) mechanism and Linux memory management.
- Ucore OS, a micro-OS for teaching.
Software/Hardware Co-optimization for Machine Learning
- ABM-SpConv-SIMD, an on-device optimization framework for low-cost and common ARM CPUs to reduce CNN inference latency by exploiting NEON instructions for parallelism. (Labels: ARM CPU, SIMD, DNN Inference Latency)
- CILPO, a framework for ARM mobile SoCs to improve CNN inference throughput by exploiting heterogeneous CPU-GPU scheduling and pipelining. (Labels: ARM CPU and Mobile GPU, Heterogeneous Scheduling, DNN Inference Throughput)
- NLP-DSP, an efficient training framework for running NLP models (such as BERT and GPT-2) on a custom multi-core, GPU-like DSP processor. (Labels: DSP Accelerators, Custom DSP Scalar and Vector Instructions, GPU-like Kernel Design, NLP Training)
Areas of interest
- Deep Learning Compiler: TVM, OpenAI Triton, …
- Hashing-based DNN Training
- QoS-aware Scheduling in Cloud; Interference-aware Scheduling in Cloud (Cloud Computing)
- Zero-Knowledge Proofs (Security)
- P4 Language; In-network Computing; Distributed Training; Profiler; Network Telemetry (Network)
🎖 Honors and Awards
- 2024.03 Member of Ernie-Team, Baidu Proud of 2024
- 2023.10 Best Newcomer Award of Baidu PaddlePaddle
- 2023.06 Outstanding Master Thesis Award, Nankai University
- 2022-2023 The first-class GongNeng Scholarship of Nankai University
- 2020-2021 New Student Scholarship of Nankai University
- 2020.06 Outstanding Student of China University of Mining and Technology
- 2017-2018, 2018-2019 First-class scholarship (twice) of China University of Mining and Technology
💬 Services
- Teaching assistant, Operating System (OS) course in the fall semester of 2020 at Nankai University.