Kun Li, Shigang Li*, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20. (Corresponding Author)
Baodong Wu, Shigang Li*, Hang Cao, Yunquan Zhang, He Zhang, Junmin Xiao, Minghua Zhang. AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model based on 3D Decomposition. The 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS'18), pp. 355-364. IEEE, 2018. (Corresponding Author)
Daning Cheng, Shigang Li*, and Yunquan Zhang. Asynchronous COMID: The Theoretic Basis for Transmitted Data Sparsification Tricks on Parameter Server. In Workshop on Big Scientific Data Benchmarks, Architecture, and Systems, pp. 55-70. Springer, Singapore, 2018. (Corresponding Author)
Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, and Hao Zhang. Efficient parallel optimizations of a high-performance SIFT on GPUs. Journal of Parallel and Distributed Computing 124 (2019): 78-91.
Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, Guangming Tan. Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model. The 47th International Conference on Parallel Processing (ICPP'18), p. 12. ACM, 2018.
Shigang Li, Baodong Wu, Yunquan Zhang, et al. Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer. The 47th International Conference on Parallel Processing (ICPP'18), p. 47. ACM, 2018.
Shigang Li, Yunquan Zhang, Torsten Hoefler. Cache-oblivious MPI all-to-all communications based on Morton order. IEEE Transactions on Parallel and Distributed Systems (TPDS'18), 2018, 29(3): 542-555. (SCI, Impact factor: 4.181)
Shigang Li, Yunquan Zhang, Torsten Hoefler. Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'17), poster, ACM, 2017, 445-446.
Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, Athanasios V. Vasilakos. Parallel processing systems for big data: a survey. Proceedings of the IEEE, 2016, 104(11): 2114-2136. (SCI, Impact factor: 5.629)
Yunquan Zhang, Shigang Li*, Shengen Yan, Huiyang Zhou. A cross-platform SpMV framework on many-core architectures. ACM Transactions on Architecture and Code Optimization (TACO), 2016, 13(4): 1-25. (Corresponding Author, SCI, Impact factor: 1.636)
Baodong Wu, Shigang Li*, Yunquan Zhang, Ningming Nie. Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation. Computer Physics Communications, 2017, 211: 113-123. (Corresponding Author, SCI, Impact factor: 3.635)
Changjun Hu, Xianmeng Wang, Jianjiang Li, Xinfu He, Shigang Li, Yangde Feng, Shaofeng Yang, He Bai. Kernel optimization for short-range molecular dynamics. Computer Physics Communications, 2017, 211: 31-40. (SCI, Impact factor: 3.635)
Shigang Li, Yunquan Zhang, Chunyang Xiang, Lei Shi. Fast convolution operations on many-core architectures. Proceedings of the 17th International Conference on High Performance Computing and Communications (HPCC'15), IEEE, 2015, 316-323.
Baodong Wu, Shigang Li, Yunquan Zhang. Optimizing parallel Kinetic Monte Carlo simulation by communication aggregation and scheduling. National Conference on Big Data Technology and Applications, Springer Singapore, 2015, 282-297.
Xiaomin Zhu, Junchao Zhang, Kazutomo Yoshii, Shigang Li, Yunquan Zhang and Pavan Balaji. Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, IEEE, 2015, 1099-1106.
Shigang Li, Torsten Hoefler, Marc Snir. NUMA-Aware Shared-Memory Collective Communication for MPI. Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'13), ACM, 2013, 85-96. (Acceptance rate: 15%, 20/131; best paper nomination, 3/20)
Shigang Li, Changjun Hu, Junchao Zhang, Yunquan Zhang. Automatic tuning of sparse matrix-vector multiplication on multicore clusters. Science China Information Sciences, 2015, 58(9): 1-14. (SCI, Impact factor: 1.626)
Shigang Li, Torsten Hoefler, Changjun Hu, Marc Snir. Improved MPI collectives for MPI processes in shared address spaces. Cluster Computing, 2014, 17(4): 1139-1155. (SCI, Impact factor: 2.040)
Shigang Li, Jingyuan Hu, Xin Cheng, Chongchong Zhao. Asynchronous work stealing on distributed memory systems. Proceedings of the 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP'13), IEEE, 2013, 198-202.
Shigang Li, Changjun Hu, Jue Wang, Jianjiang Li. Support for multi-level parallelism on heterogeneous multi-core and performance optimization. Chinese Journal of Software, 2013, 24(12): 2782-2796.
Shigang Li, Shucai Yao, Haohu He, Lili Sun, Yi Chen, Yunfeng Peng. Extending synchronization constructs in OpenMP to exploit pipeline parallelism on heterogeneous multi-core. Proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP'11), Springer Berlin Heidelberg, 2011, 54-63.
Qian Cao, Changjun Hu, Haohu He, Xiang Huang, Shigang Li. Support for OpenMP tasks on cell architecture. Proceedings of the 10th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP'10), Springer Berlin Heidelberg, 2010, 308-317.
Yunfeng Peng, Changjun Hu, Chongchong Zhao, Shigang Li, Shucai Yao. Management of Non-functional Attributes of Parallel Components. Procedia Computer Science (ICCS'11), Elsevier, 2011, 4: 461-470.