I’m currently an HPC engineer at PaddlePaddle, Baidu, Beijing. I obtained my master’s degree from the School of Computer Science, Nankai University, in 2023. My research interests include computer architecture and software/hardware co-optimization for machine learning.

🔥 News

  • 2024.04:  🎉🎉 Our work Hybrid-Memcached was accepted by IEEE Transactions on Computers 🎉🎉
  • 2023.07:  🎉🎉 I will join Baidu PaddlePaddle as an HPC engineer! 🎉🎉
  • 2022.10:  🎉🎉 Won the first-class (Top 10%) scholarship of Nankai University 🎉🎉
  • 2022.04:  🎉🎉 Joined Shuhai Lab at Huawei Cloud as a Research Intern for three months 🎉🎉
  • 2022.02:  🎉🎉 Our work ABM-SpConv-SIMD was accepted by IEEE Transactions on Network Science and Engineering 🎉🎉

📖 Education

  • 2020.09 - 2023.06: M.S. in Computer Technology, School of Computer Science and Cyber Science, Nankai University. Advisor: Prof. Xiaoli Gong.
  • 2016.09 - 2020.06: B.E. in Electronic Information Science and Engineering, School of Computer Science and Technology, China University of Mining and Technology.

👨‍💻 Work Experience

  • HPC Engineer / Machine Learning Framework Engineer, Baidu PaddlePaddle, Beijing.
  • Semi-automatic parallel distributed training of Ernie LLMs, including pre-training, SFT, LoRA, and PTQ
  • AutoML. From hand-written parallel distributed training (such as megatron-lm.tensor_parallel.column_parallel/row_parallel), to semi-automatic parallel training (such as parallelize_module in PyTorch, shard_op/shard_tensor in Paddle), to fully automatic parallel training
  • Prim mechanism. Decomposes composite ops into primitive ops to support the back-end compiler and higher-order differentiation
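
The hand-written tensor parallelism mentioned above can be illustrated with a minimal sketch. It assumes two stacked linear layers with no bias or activation: the first weight is split by columns and the second by rows, so each simulated worker computes a partial output locally and a single sum (standing in for an all-reduce) recovers the full result. All function names here are hypothetical and are not the Megatron-LM, PyTorch, or Paddle APIs.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, parts):
    """Split a weight matrix into `parts` equal column shards."""
    step = len(w[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def split_rows(w, parts):
    """Split a weight matrix into `parts` equal row shards."""
    step = len(w) // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def two_layer_tensor_parallel(x, w1, w2, parts):
    """Compute x @ w1 @ w2 with w1 column-sharded and w2 row-sharded."""
    partials = []
    for w1_shard, w2_shard in zip(split_columns(w1, parts), split_rows(w2, parts)):
        hidden = matmul(x, w1_shard)               # local slice of the hidden activation
        partials.append(matmul(hidden, w2_shard))  # partial output on this worker
    out = partials[0]
    for partial in partials[1:]:
        out = add(out, partial)                    # stands in for an all-reduce
    return out


x = [[1, 2], [3, 4]]
w1 = [[1, 0, 2, 1], [0, 1, 1, 3]]
w2 = [[1, 2], [0, 1], [3, 0], [1, 1]]
sharded = two_layer_tensor_parallel(x, w1, w2, parts=2)
reference = matmul(matmul(x, w1), w2)
```

In this sketch `sharded` and `reference` agree, which is the property that semi-automatic and fully automatic approaches aim to preserve while choosing the sharding on the user's behalf.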

  • Shuhai Lab at Huawei Cloud, Research Intern.
  • Cloud workload characterization from micro-architectural perspective
  • Benchmark suite design for latency-critical cloud applications (such as Memcached and Redis) with a wide variety of latency requirements

📝 Publications

《Hybrid-Memcached: A Novel Approach for Memcached Persistence Optimization with Hybrid Memory》

IEEE Transactions on Computers, 2024

✍🏻 Zhang Jiang, Xianduo Li (joint first author), Xiaoli Gong, et al.

  • DRAM-based data aggregation to avoid fine-grained writes
  • Data-object alignment mechanism to avoid write amplification
  • Writes based on non-temporal store instructions to improve bandwidth utilization

《ABM-SpConv-SIMD: Accelerating Convolutional Neural Network Inference for Industrial IoT Applications on Edge Devices》

IEEE Transactions on Network Science and Engineering, 2022

✍🏻 Xianduo Li, Xiaoli Gong, Dong Wang, Jin Zhang, et al.

  • Propose a framework that employs offline pruning and quantization and online SIMD optimization to fit DNNs onto cost-effective edge devices.
  • Design and implement an accumulation-before-multiplication sparse convolution algorithm.
  • Conduct a series of experiments to evaluate the performance of our framework on various edge devices.
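
The accumulation-before-multiplication idea can be sketched for a single output element as follows. This is a simplified illustration of the general technique, assuming pruned (zero) and quantized (few distinct values) integer weights; it is not the paper's implementation and omits the SIMD optimization. The function names are hypothetical.

```python
from collections import defaultdict


def dot_multiply_accumulate(weights, inputs):
    """Baseline: one multiplication per weight/input pair."""
    return sum(w * x for w, x in zip(weights, inputs))


def dot_accumulate_before_multiply(weights, inputs):
    """Sum the inputs that share a weight value, then multiply once per value."""
    buckets = defaultdict(int)
    for w, x in zip(weights, inputs):
        if w != 0:           # pruned weights contribute nothing and are skipped
            buckets[w] += x  # accumulation step
    # One multiplication per distinct non-zero weight value.
    return sum(w * acc for w, acc in buckets.items())


weights = [2, 0, -1, 2, 0, -1, 2, 3]  # quantized and pruned kernel values
inputs = [5, 7, 1, 4, 9, 2, 6, 8]     # matching input activations
baseline = dot_multiply_accumulate(weights, inputs)
abm = dot_accumulate_before_multiply(weights, inputs)
```

Both functions return the same value, but the second performs three multiplications (one per distinct non-zero weight) instead of eight, which is where the savings come from when the set of quantized weight values is small.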

💻 Research Experience

Systems

  • Hybrid-Memcached, a novel approach for Memcached persistence optimization with hybrid memory. (Labels: Hybrid Memory, DRAM-based Data Aggregation)
  • Linux THP, source code analysis of the Transparent Huge Page (THP) mechanism and Linux memory management.
  • Ucore OS, a micro OS for teaching.

Software/Hardware Co-optimization for Machine Learning

  • ABM-SpConv-SIMD, an on-device optimization framework for low-cost and common ARM CPUs to reduce CNN inference latency by exploiting NEON instructions for parallelism. (Labels: ARM CPU, SIMD, DNN Inference Latency)
  • CILPO, a framework for ARM Mobile SoCs to improve CNN inference throughput by exploiting heterogeneous CPU-GPU scheduling and pipelining. (Labels: ARM CPU and Mobile GPU, Heterogeneous Scheduling, DNN Inference Throughput)
  • NLP-DSP, an efficient training framework for implementing NLP models (such as BERT and GPT-2) on a custom GPU-like multi-core DSP processor. (Labels: DSP Accelerators, Custom DSP Scalar and Vector Instructions, GPU-like Kernel Design, NLP Training)

Areas of interest

  • Deep Learning Compiler: TVM, OpenAI Triton, …
  • Hashing-based DNN Training
  • QoS-aware Scheduling in Cloud; Interference-aware Scheduling in Cloud; (Cloud Computing)
  • Zero-Knowledge Proofs; (Security)
  • P4 Language; In-network Computing; Distributed Training; Profiler; Network Telemetry; (Network)

🎖 Honors and Awards

  • 2024.03 Member of Ernie-Team, Baidu Proud of 2024
  • 2023.10 Best Newcomer Award of Baidu PaddlePaddle
  • 2023.06 Outstanding Master Thesis Award, Nankai University
  • 2022-2023 The first-class GongNeng Scholarship of Nankai University
  • 2020-2021 New Student Scholarship of Nankai University
  • 2020.06 Outstanding Student of China University of Mining and Technology
  • 2017-2018, 2018-2019 First-class scholarship of China University of Mining and Technology (awarded twice)

💬 Services

  • Teaching assistant, Operating System (OS) course in the fall semester of 2020 at Nankai University.