Education
|
Cornell University
M.S./Ph.D. in Electrical and Computer Engineering
Advisor: Prof. Zhiru Zhang
Aug 2021 — Present
|
|
Sun Yat-sen University
B.Eng. in Telecommunication Engineering
Advisor: Prof. Xiang Chen
Outstanding Bachelor Thesis Award
Aug 2016 — Jun 2020
|
|
Work Experience
|
NVIDIA Research
Research Intern
Design Automation Research
Mentor/Manager: Anthony Agnesina, Mark Ren
May 2024 — Aug 2024
|
|
Advanced Micro Devices
Compiler Intern
Advanced Compilers for Distribution and Computation (ACDC)
Chief Technology Organization (CTO)
Manager: Stephen Neuendorffer
May 2023 — Aug 2023
|
|
Intel Labs
Exempt Tech Employee
Specification and Validation End-to-End (SAVE) Group, SCL/ADR/IL
Manager: Jin Yang, Sunny Zhang
Feb 2021 — Aug 2021
|
|
Tsinghua University
Research Assistant
Nanoscale Integrated Circuits and System Lab (NICS-EFC)
Advisor: Prof. Yu Wang
Nov 2019 — Aug 2021
|
|
The University of Waterloo
MITACS Research Intern
WatCAG
Advisor: Prof. Nachiket Kapre
Jul 2019 — Oct 2019
|
|
Research & Publications
My research interests include electronic design automation (EDA) for FPGAs, domain-specific languages (DSLs), and efficient machine learning.
|
|
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen*, Niansong Zhang*, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
* Equal Contribution
PLDI 2024 | paper | code
Special-purpose hardware accelerators are vital for performance, but current tools fall short on complex designs. Allo is a composable programming model that decouples hardware customizations from the algorithm specification; it outperforms existing tools, showing significant improvements on common benchmarks and on deep learning models such as GPT-2.
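To make the decoupled style concrete, here is a minimal sketch in an Allo-like Python DSL; the primitive names (customize, reorder, pipeline, build) follow my understanding of the open-source Allo front end and should be treated as illustrative rather than exact.

    import allo
    from allo.ir.types import int32

    def gemm(A: int32[32, 32], B: int32[32, 32]) -> int32[32, 32]:
        # Algorithm only: no hardware customization is mixed into this function.
        C: int32[32, 32] = 0
        for i, j, k in allo.grid(32, 32, 32):
            C[i, j] += A[i, k] * B[k, j]
        return C

    # Hardware customizations live on a separate schedule object, so they can be
    # composed and reused without touching the algorithm above.
    s = allo.customize(gemm)
    s.reorder("k", "j")   # change the loop order
    s.pipeline("j")       # pipeline the (new) innermost loop
    mod = s.build()       # build an executable module

The point of the separation is that a different set of customizations (say, tiling and buffering for a larger kernel) can be swapped in without editing the algorithm itself.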
|
|
Formal Verification of Source-to-Source Transformations for HLS
Louis-Noël Pouchet, Emily Tucker, Niansong Zhang, Hongzheng Chen, Debjit Pal, Gabriel Rodríguez, Zhiru Zhang
FPGA 2024 | paper | code
Best Paper Award
We target the problem of efficiently checking the semantic equivalence of two C/C++ programs as a means of ensuring the correctness of the description provided to the HLS toolchain, by proving that an optimized code version fully preserves the semantics of the unoptimized one.
|
|
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator
Courtney Golden, Dan Ilan, Caroline Huang, Niansong Zhang, Zhiru Zhang, Christopher Batten
IEEE Computer Architecture Letters | paper
We implement a virtual vector instruction set on a commercial Compute-in-SRAM device,
and perform detailed instruction microbenchmarking to identify performance benefits and overheads.
|
|
Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective
Shulin Zeng, Guohao Dai, Niansong Zhang, Xinhao Yang, Haoyu Zhang, Zhenhua Zhu, Huazhong Yang, Yu Wang
IEEE Transactions on Computers | paper
Featured Paper in the May 2023 Issue
This paper proposes a Design Space Exploration framework to jointly optimize heterogeneous multi-core architecture,
layer scheduling, and compiler mapping for serving DNN workloads on cloud FPGAs.
|
|
Accelerator Design with Decoupled Hardware Customizations: Benefits and Challenges
Debjit Pal, Yi-Hsiang Lai, Shaojie Xiang, Niansong Zhang, Hongzheng Chen, Jeremy Casas, Pasquale Cocchini, Zhenkun Yang, Jin Yang, Louis-Noël Pouchet, Zhiru Zhang
Invited Paper, DAC 2022 | paper
We show the advantages of the decoupled programming model and further discuss some of our recent efforts toward a robust and viable verification solution.
|
|
CodedVTR: Codebook-Based Sparse Voxel Transformer with Geometric Guidance
Tianchen Zhao, Niansong Zhang, Xuefei Ning, He Wang, Li Yi, Yu Wang
CVPR 2022 | paper | website | slides | poster | video
We propose CodedVTR (Codebook-based Voxel TRansformer), a flexible 3D transformer on sparse voxels that addresses the transformer's generalization issue.
CodedVTR decomposes the attention space into linear combinations of learnable prototypes to regularize attention learning.
We also propose geometry-aware self-attention to guide training with geometric patterns and voxel density.
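As a rough, self-contained illustration of what "linear combinations of learnable prototypes" means (a toy NumPy sketch, not CodedVTR's actual implementation; all shapes and names below are invented for the example):

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    N, d, M = 16, 32, 4                  # voxels, feature dim, codebook size (toy values)
    q = np.random.randn(N, d)            # per-voxel query features
    v = np.random.randn(N, d)            # per-voxel value features
    codebook = np.random.randn(M, N, N)  # M learnable prototype attention patterns
    proj = np.random.randn(d, M)         # scores each query against each prototype

    # Instead of learning an unconstrained attention map, each voxel's attention row
    # is a convex combination of prototype rows, chosen by a softmax over prototypes.
    alpha = softmax(q @ proj, axis=-1)                        # (N, M) mixture weights
    attn = np.einsum('nm,mnk->nk', alpha, softmax(codebook))  # (N, N) regularized attention
    out = attn @ v                                            # attended features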
|
|
HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs
Shaojie Xiang, Yi-Hsiang Lai, Yuan Zhou, Hongzheng Chen, Niansong Zhang, Debjit Pal, Zhiru Zhang
FPGA 2022 | paper | code
We propose an FPGA accelerator programming model that decouples the algorithm specification from optimizations related to orchestrating the placement of data across a customized memory hierarchy.
|
|
RapidLayout: Fast Hard Block Placement of FPGA-optimized Systolic Arrays using Evolutionary Algorithms
Niansong Zhang, Xiang Chen, Nachiket Kapre
ACM TRETS Best Paper Award
Invited Paper, ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 15, Issue 4, Article No. 38, pp. 1–23
We extend the previous work on RapidLayout with cross-SLR routing, placement transfer learning, and placement bootstrapping from a much smaller device to improve runtime and design quality.
|
|
aw_nas: A Modularized and Extensible NAS Framework
Xuefei Ning, Changcheng Tang, Wenshuo Li, Songyi Yang, Tianchen Zhao, Niansong Zhang, Tianyi Lu, Shuang Liang, Huazhong Yang, Yu Wang
arXiv preprint | paper | code
We build an open-source Python framework implementing various NAS algorithms in a modularized and extensible manner.
|
|
RapidLayout: Fast Hard Block Placement of FPGA-optimized Systolic Arrays using Evolutionary Algorithms
Niansong Zhang, Xiang Chen, Nachiket Kapre
FPL 2020 | paper | code
Michal Servit Best Paper Award Nominee
We build a fast, high-performance evolutionary placer for FPGA-optimized hard block designs that targets high clock frequencies such as 650+ MHz.
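For readers outside the FPGA CAD area, the following toy Python loop shows the general mutate/evaluate/select structure of an evolutionary placer; the cost function, parameters, and placement representation are made up for illustration and are not RapidLayout's actual algorithm or cost model.

    import random

    NUM_BLOCKS = 64      # toy number of hard blocks to place
    POP_SIZE = 32
    GENERATIONS = 200

    def random_placement():
        # A placement is a permutation: block i is assigned to site placement[i].
        sites = list(range(NUM_BLOCKS))
        random.shuffle(sites)
        return sites

    def cost(placement):
        # Stand-in objective (a real placer would score wirelength / timing instead).
        return sum(abs(block - site) for block, site in enumerate(placement))

    def mutate(placement):
        # Swap the sites of two randomly chosen blocks.
        child = placement[:]
        i, j = random.sample(range(NUM_BLOCKS), 2)
        child[i], child[j] = child[j], child[i]
        return child

    population = [random_placement() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=cost)                # evaluate and rank
        survivors = population[: POP_SIZE // 2]  # keep the better half
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]

    best = min(population, key=cost)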
|
Workshops & Talks
|
An MLIR-based Intermediate Representation for Accelerator Design with Decoupled Customizations
Hongzheng Chen*, Niansong Zhang*, Shaojie Xiang, Zhiru Zhang
MLIR Open Design Meeting (08/11/2022) | video | slides | website
CRISP Liaison Meeting (09/28/2022) | news | slides | website
We decouple hardware customizations from the algorithm specifications at the IR level to (1) provide a general platform for high-level DSLs, (2) boost performance and productivity, and (3) make customization verification scalable.
|
|
Enabling Fast Deployment and Efficient Scheduling for Multi-Node and Multi-Tenant DNN Accelerators in the Cloud
Shulin Zeng, Guohao Dai, Niansong Zhang, Yu Wang
MICRO 2021 ASCMD Workshop | paper | video
We propose a multi-node and multi-core accelerator architecture and a decoupled compiler for cloud-backed INFerence-as-a-Service (INFaaS).
|
Awards and Honors
Best Paper Award | FPGA 2024
Best Paper Award | ACM TRETS, 2023
DAC Young Fellow | 2021 & 2023
Best Paper Nomination (Michal Servit Award) | FPL 2020
Outstanding Bachelor Thesis Award | Sun Yat-sen University
Mitacs Globalink Research Internship Award | Mitacs, Canada
First-class Merit Scholarship x2 | Sun Yat-sen University
Lin and Liu Foundation Scholarship | SEIT, Sun Yat-sen University
|
Patents
Niansong Zhang, Songyi Yang, Shun Fu, Xiang Chen, "Industry Profile Geometric Dimension Automatic Measuring Method Based on Computer Vision Imaging." Chinese Patent CN201811539019.8A, filed December 17, 2018, and issued April 19, 2019.
Niansong Zhang (at Novauto Technology), "A Pruning Method and Device for Multi-Task Neural Network Models." Chinese Patent 202010805327.1, filed August 12, 2020.
|
|