李士刚 (Shigang Li)-ETH Zurich

ETH Zurich, Department of Computer Science, Postdoctoral Researcher

Research Interests: Parallel and Distributed Computing; Parallel and Distributed Deep Learning

2018.08-now, Postdoctoral researcher, ETH Zurich, Department of Computer Science, SPCL Lab

2014.06-2018.08, Assistant Professor, State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences

2013.11-2014.05, Intern Engineer, Institute of Deep Learning, Baidu

2011.09-2013.09, Joint Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign. Supervisors: Prof. Marc Snir and Prof. Torsten Hoefler

2009.09-2014.06, Ph.D. degree, majored in Computer Architecture, University of Science and Technology Beijing. Supervisor: Prof. Changjun Hu

2005.09-2009.07, Bachelor's degree, majored in Computer Science and Technology, University of Science and Technology Beijing


My research interests revolve around performance optimization for parallel and distributed computing systems, including parallel algorithms, parallel programming models, and intelligent methods for performance optimization.

Email: shigang.li@inf.ethz.ch  shigangli.cs@gmail.com

[News] ICS 2018 is coming! There are 6 workshops/tutorials this year, covering recent hot research topics such as deep learning and big data. Looking forward to seeing you there.

[News] ACM TURC 2018 is coming! Welcome to all the SIGs and the Big Data and Artificial Intelligence Forum.


Click to see the upcoming conferences in the area of parallel and distributed computing.

Click to check the conferences and journals recommended by CCF.

Click to check the rankings of HPC and computer architecture conferences and journals.

Nov. 2017- now

1. Scalable parallel graph algorithms.

2. Scalable deep learning platform.


Jan. 2015 - now

1. Cache-oblivious algorithms for MPI collectives.

2. Optimizing MPI collectives on Intel Xeon Phi processors.


Aug. 2016 - now

1. Improving the scalability and efficiency of a large-scale atmosphere model and the Kinetic Monte Carlo (KMC) algorithm.


Sept. 2015 - now

1. Optimizing Convolutional Neural Networks on Kepler GPUs and ARM v7/v8.

2. Optimizing Sparse Matrix-Vector multiplication (SpMV) on Intel Xeon Phi and NVIDIA/AMD GPUs using OpenCL.

3. Building a platform for large-scale model training of Convolutional Neural Networks on high-performance multicore clusters.

  1. Kun Li, Shigang Li*, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20. (Corresponding Author)

  2. Baodong Wu, Shigang Li*, Hang Cao, Yunquan Zhang, He Zhang, Junmin Xiao, Minghua Zhang. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model based on 3D Decomposition. The 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS'18), pp. 355-364. IEEE, 2018. (Corresponding Author)

  3. Daning Cheng, Shigang Li*, and Yunquan Zhang. Asynchronous COMID: The Theoretic Basis for Transmitted Data Sparsification Tricks on Parameter Server. In Workshop on Big Scientific Data Benchmarks, Architecture, and Systems, pp. 55-70. Springer, Singapore, 2018. (Corresponding Author)

  4. Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, and Hao Zhang. Efficient parallel optimizations of a high-performance SIFT on GPUs. Journal of Parallel and Distributed Computing 124 (2019): 78-91.

  5. Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, Guangming Tan. Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model. The 47th International Conference on Parallel Processing (ICPP'18), p. 12. ACM, 2018.

  6. Shigang Li, Baodong Wu, Yunquan Zhang, et al. Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer. The 47th International Conference on Parallel Processing (ICPP'18), p. 47. ACM, 2018.

  7. Shigang Li, Yunquan Zhang, Torsten Hoefler. Cache-oblivious MPI all-to-all communications based on Morton order. IEEE Transactions on Parallel and Distributed Systems (TPDS'18), 2018, 29(3): 542-555. (SCI, Impact factor: 4.181)

  8. Shigang Li, Yunquan Zhang, Torsten Hoefler. Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'17), poster, ACM, 2017, 445-446.  

  9. Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, Athanasios V. Vasilakos. Parallel processing systems for big data: a survey. Proceedings of the IEEE, 2016, 104(11): 2114-2136. (SCI, Impact factor: 5.629)

  10. Yunquan Zhang, Shigang Li*, Shengen Yan, Huiyang Zhou. A cross-platform SpMV framework on many-core architectures. ACM Transactions on Architecture and Code Optimization (TACO), 2016, 13(4): 1-25. (Corresponding Author, SCI, Impact factor: 1.636)      

  11. Baodong Wu, Shigang Li*, Yunquan Zhang, Ningming Nie. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. Computer Physics Communications, 2017, 211: 113-123. (Corresponding Author, SCI, Impact factor: 3.635)    

  12. Changjun Hu, Xianmeng Wang, Jianjiang Li, Xinfu He, Shigang Li, Yangde Feng, Shaofeng Yang, He Bai. Kernel optimization for short-range molecular dynamics. Computer Physics Communications, 2017, 211: 31-40. (SCI, Impact factor: 3.635)

  13. Shigang Li, Yunquan Zhang, Chunyang Xiang, Lei Shi. Fast convolution operations on many-core architectures. Proceedings of the 17th International Conference on High Performance Computing and Communications (HPCC'15), IEEE, 2015, 316-323.

  14. Baodong Wu, Shigang Li, Yunquan Zhang. Optimizing parallel Kinetic Monte Carlo simulation by communication aggregation and scheduling. National Conference on Big Data Technology and Applications, Springer Singapore, 2015, 282-297.

  15. Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang and Pavan Balaji. Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, IEEE, 2015, 1099-1106.

  16. Shigang Li, Torsten Hoefler, Marc Snir. NUMA-Aware Shared-Memory Collective Communication for MPI. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing (HPDC'13), ACM, 2013, 85-96. (Acceptance rate: 15%, 20/131; best paper nomination, 3/20)    

  17. Shigang Li, Changjun Hu, Junchao Zhang, Yunquan Zhang. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences, 2015, 58(9): 1-14. (SCI, Impact factor: 1.626)    

  18. Shigang Li, Torsten Hoefler, Changjun Hu, Marc Snir. Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing, 2014, 17(4): 1139-1155. (SCI, Impact factor: 2.040)

  19. Shigang Li, Jingyuan Hu, Xin Cheng, Chongchong Zhao. Asynchronous work stealing on distributed memory systems. Proceedings of the 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP’13), IEEE, 2013, 198-202.    

  20. Shigang Li, Changjun Hu, Jue Wang, Jianjiang Li. Support for multi-level parallelism on heterogeneous multi-core and performance optimization. Chinese Journal of Software, 2013, 24(12): 2782-2796.

  21. Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, Yunfeng Peng. Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core. Proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’11), Springer Berlin Heidelberg, 2011, 54-63.

  22. Qian Cao, Changjun Hu, Haohu He, Xiang Huang, Shigang Li. Support for OpenMP tasks on cell architecture. Proceedings of the 10th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’10), Springer Berlin Heidelberg, 2010, 308-317.

  23. Yunfeng Peng, Changjun Hu, Chongchong Zhao, Shigang Li, Shucai Yao. Management of Non-functional Attributes of Parallel Components. Procedia Computer Science (ICCS'11), Elsevier, 2011, 4: 461-470.

    Google citation is here!

1. Report at HPCC'15 (17th IEEE International Conference on High Performance Computing and Communications), New York, USA. Fast Convolution Operations on Many-Core Architectures HPCC'15.pdf

2. Report at PADAL'16 (Third Workshop on Programming Abstractions for Data Locality), RIKEN AICS, Kobe, Japan. Cache-Oblivious MPI All-to-All Collectives-Shiang Li.pdf

3. Neighbor Lists vs Linked Cell, 2017. Data structures for finding neighbors in MD (Neighbor Lists and Linked Cell).pdf

4. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model based on 3D Decomposition. (Chinese title: "A Parallel, Scalable Atmospheric Model Based on 3D Decomposition"), 2018-08-13. AGCM3D-ShigangLi.pdf

1. Teaching Assistant. Numerical Methods for CSE, Autumn 2018, ETH Zurich. C++ programming to solve numerical problems using the Eigen library.

2. Teaching Assistant. Parallel Programming, Spring 2019, ETH Zurich. Java multi-threaded programming.

3. Teaching Assistant. Numerical Methods for CSE, Autumn 2019, ETH Zurich. C++ programming to solve numerical problems using the Eigen library.

1. cnnARMv7: a fast convolution kernel for ARM v7.

2. cnnARMv8: a fast convolution kernel for ARM v8.

1. National Natural Science Foundation of China under Grant No.61502450, "MPI Model Extension and Performance Optimization for Many-Core Clusters". (Program Director)

2. Innovation Research Project Supported by State Key Laboratory of Computer Architecture under Grant No.CARCH3504, "Optimizing Message Passing Interface for Irregular Parallel Algorithms". (Program Director)

3. Company cooperation program, "Development of deep learning algorithms on CPU and GPU architectures". (Technical Principal)

4. Company cooperation program, "Building a large-scale deep learning training platform on CPU-GPU clusters". (Technical Principal)

1. PC Member, ICPADS 2018.

2. Workshop Co-chair, ICS 2018.

3. PC Member, IPDPS 2017, 2018.

4. PC Member, HPC Asia 2018, 2019, 2020.

5. PC Member, HP3C 2018, 2019, 2020.

6. PC Member, ICPP 2017.

7. PC Member, HPC China 2016, 2017, 2018, 2019.

8. PC Member, SBAC-PAD 2016.

9. Associate editor of Cluster Computing (CLUS) - Springer.

10. Reviewer of IEEE Transactions on Parallel and Distributed Systems (TPDS).

11. Reviewer of IEEE Transactions on Services Computing (TSC).

12. Reviewer of Journal of Parallel and Distributed Computing (JPDC) - Elsevier.

13. Reviewer of Journal of Supercomputing - Springer.

14. Reviewer of Concurrency and Computation: Practice and Experience.

15. Reviewer of IEEE Transactions on Big Data.

16. Reviewer of IEEE Transactions on Circuits and Systems II: Express Briefs.

Lei Liu@ICT,CAS;  Ying Wang@ICT,CAS

Updated on:2019-12-05 15:06      Total Visits:2757
