Hung-Wei Tseng

Assistant Professor, University of California, Riverside

I am currently an assistant professor in the Department of Electrical and Computer Engineering and a cooperating faculty member in the Department of Computer Science and Engineering at the University of California, Riverside. I lead the Extreme Storage & Computer Architecture Laboratory.

I am interested in diverse research topics that allow applications and programmers to use modern heterogeneous hardware components more efficiently. Together with my students, our most recent work has demonstrated the potential of using emerging AI/ML accelerators (e.g., Google's Edge TPUs) to improve the performance of non-AI/ML workloads through our latest GPTPU framework [GitHub]. We also showed how intelligent storage devices can help improve performance, power, and energy for heterogeneous computers. Our series of work on intelligent storage systems has been recognized by two best paper nominations from the IEEE/ACM International Symposium on Microarchitecture in 2021 and 2019, IEEE Micro "Top Picks from the 2019 Computer Architecture Conferences" (IEEE MICRO Top Picks 2020), and a Facebook Research Award in 2018. In addition, we applied our knowledge of optimizing storage systems to wireless network system stacks and developed the OpenUVR project [GitHub], which enables a high-quality, untethered VR experience on commodity hardware components and won the outstanding paper award at RTAS 2021.

Prior to joining UCR, I served as an assistant professor in the Department of Computer Science and the Department of Electrical and Computer Engineering at NC State University. I was a postdoc in the Non-volatile Systems Laboratory and a lecturer in the Department of Computer Science and Engineering at the University of California, San Diego, working with Professor Steven Swanson. My thesis work with Professor Dean Tullsen, data-triggered threads, was also selected for IEEE MICRO Top Picks in 2012.

Research Projects

Accelerating non-AI/ML Applications using AI/ML Accelerators

We built a full-stack system and revised the algorithms of general-purpose applications to demonstrate the potential of using emerging AI/ML accelerators, which are essentially matrix processors, to improve performance. The resulting systems demonstrate significant speedups and energy savings with Edge TPUs that cost only about USD 25. (More)

Intelligent Data Storage

We built intelligent data storage systems, including Summarizer and Morpheus, to demonstrate the potential of exposing the processing power inside the controllers of modern non-volatile storage devices. The resulting systems demonstrate significant speedups in GPGPU applications and database systems, and show potential for machine learning applications. (More)

Efficient Heterogeneous Computers for Big-Data Applications

With the slowdown of Moore's Law and the end of Dennard scaling, big-data applications (e.g., machine learning, data analytics, and scientific computing) must rely on heterogeneous computing units, including GPUs, FPGAs, and ASICs, as well as heterogeneous data storage, including DRAM, NVRAM, and flash memory, to complete their tasks. We built systems that make more efficient use of heterogeneous system components to accelerate applications. The resulting system can accelerate (More)

Optimizing the I/O system software stack for emerging applications

With hardware accelerators reducing computation latency, the system software stack, which was traditionally overlooked in application design, becomes more critical. In ESCAL, we focus on these overlooked bottlenecks to achieve significant performance improvements without using new hardware. The most recent example is the OpenUVR system, where we eliminate unnecessary memory copies and allow the VR system loop to complete within 14 ms using just a modern desktop PC, existing WiFi network links, a Raspberry Pi 4b+, and an HDMI-compatible head-mounted display. (More)

Accelerating non-AI/ML applications using AI/ML accelerators

The explosive demand for AI/ML workloads drives the emergence of AI/ML accelerators, including the commercialized NVIDIA Tensor Cores and Google TPUs. These AI/ML accelerators are essentially matrix processors and are theoretically helpful to any application with matrix operations. This project provides the missing system, architecture, and programming language support needed to democratize AI/ML accelerators. As matrix-based formulations have conventionally been considered inefficient, this project also revises the core algorithms of compute kernels to better utilize the operators of AI/ML accelerators.

  • Kuan-Chieh Hsu and Hung-Wei Tseng. Accelerating Applications using Edge Tensor Processing Units. In The International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2021. [arXiv] [GitHub]
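As a rough illustration of what "revising the core algorithm" toward matrix operators can mean, the sketch below rewrites a pairwise squared-distance computation so that most of its work becomes a single matrix product. It is a minimal, plain-Python example and not the GPTPU framework's API or method: the function names (`matmul`, `pairwise_sq_dists_loop`, `pairwise_sq_dists_matmul`) and the sample points are hypothetical, and the `matmul` helper only stands in for the operator that a matrix processor would execute.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (plain Python stand-in)."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def pairwise_sq_dists_loop(xs, ys):
    """Conventional formulation: subtract and square for every pair of points."""
    return [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]


def pairwise_sq_dists_matmul(xs, ys):
    """Matrix formulation: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * (x . y).

    Every x . y dot product comes from one matrix product X * Y^T, which is
    the kind of operation a matrix processor is designed to run.
    """
    dots = matmul(xs, transpose(ys))
    x_norms = [sum(v * v for v in x) for x in xs]
    y_norms = [sum(v * v for v in y) for y in ys]
    return [[x_norms[i] + y_norms[j] - 2 * dots[i][j] for j in range(len(ys))]
            for i in range(len(xs))]


# Hypothetical sample data: both formulations must agree.
points_a = [[0, 0], [1, 2], [3, 4]]
points_b = [[1, 1], [2, 0]]
assert pairwise_sq_dists_matmul(points_a, points_b) == pairwise_sq_dists_loop(points_a, points_b)
```

The reformulated version performs extra arithmetic for the norms, but it concentrates the pairwise work in one matrix multiplication, which is the part that could be handed to an accelerator.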

Intelligent Data Storage

As parallel computer architectures significantly shrink the execution time of compute kernels, the performance bottlenecks of applications shift to the rest of the execution, including data movement, object deserialization/serialization, and other software overheads in managing data storage. To address this new bottleneck, the best approach is to avoid moving data and endow storage devices with new roles.

Morpheus is one of the very first research projects to implement this concept in real systems. We utilize existing, commercially available hardware components to build the Morpheus-SSD. The Morpheus model not only speeds up a set of heterogeneous computing applications by 1.32x, but also allows these applications to better utilize emerging data transfer methods that can send data directly to the GPU via peer-to-peer to further achieve 1.39x speedup. Summarizer further provides mechanisms to dynamically adjust the workload between the host and intelligent SSDs, making more efficient use of all computing units in a system and boosting the performance of big data analytics. This line of research also earned Hung-Wei's team a Facebook Research Award in 2018.

Building Efficient Heterogeneous Computers

With the end of Dennard scaling and the slowdown of Moore's Law, computers have become heterogeneous. However, moving data among heterogeneous computing units and storage devices is an emerging bottleneck in these systems.

My research proposes the "Hippogriff" system, which revisits the programming model for moving data in heterogeneous computer systems. Instead of using the conventional CPU-centric, programmer-specified methods, the Hippogriff system simplifies the application interface and provides a middle layer to efficiently handle the data movement. We also implemented peer-to-peer data transfer between the GPU and the SSD in the Hippogriff system.

The preliminary results demonstrate a 46% performance gain from applying Hippogriff to a set of Rodinia GPU applications. For a highly optimized GPU MapReduce framework, Hippogriff still demonstrates up to a 27% performance gain.

High-Quality, Low-Latency Realtime Systems

With hardware accelerators reducing computation latency, the system software stack, which was traditionally overlooked in application design, becomes more critical. In ESCAL, we focus on these overlooked bottlenecks to achieve significant performance improvements without using new hardware. The most recent example is the OpenUVR system, where we eliminate unnecessary memory copies and allow the VR system loop to complete within 14 ms using just a modern desktop PC, existing WiFi network links, a Raspberry Pi 4b+, and an HDMI-compatible head-mounted display.

  • Alec Rohloff, Zackary Allen, Kung-Min Lin, Joshua Okrend, Chengyi Nie, Yu-Chia Liu, and Hung-Wei Tseng. OpenUVR: an Open-Source System Framework for Untethered Virtual Reality Applications. In 27th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2021, 2021. [arXiv] [GitHub]

Publications

(Full listing)

General-purpose computing on AI/ML accelerators

  • Kuan-Chieh Hsu and Hung-Wei Tseng. Accelerating Applications using Edge Tensor Processing Units. In The International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2021. [arXiv] [GitHub]

Smart Storage Systems

Application/Storage/Memory Systems Interaction

Data-triggered threads

Non-volatile Storage Systems

Computer Science Education

Wireless Networks

Software

Advising & Join

Graduate Students

I am currently advising the following top-notch graduate students:

Undergraduate Students

I also work with the following talented undergraduate students:

Alumni

I have also advised these students:
  • Ziliang Zhang
  • Zecao Lu (C.S., M.S., NC State University, 2019. Now at Didi Labs)
  • Xindi Li (C.S., M.S., NC State University, 2018. Now at Bloomberg)
  • Chao Huang (C.S., M.S., NC State University, 2018. Now at Amazon)
  • Zackary Allen (C.S., B.S., NC State University, 2018. Now at Red Hat)
  • Alec Rohloff (C.S., B.S., NC State University, 2018.)
  • Te I (C.S., M.S., NC State University, 2018. Now at Google)
  • Vaibhava Lakshmi (ECE, M.S., NC State University, 2018. Now at Dell EMC)
  • Murtuza Taher Lokhandwala (ECE, M.S., NC State University, 2018. Now at Apple)
  • Mahesh Bonagiri (ECE, M.S., NC State University, 2018. Now at Nvidia)
  • Hao Zhang (Continuing as PhD student at NC State University)
  • Joshua Okrend (Now working at a Government Contractor)
  • Timotius Oentung (Continuing at NC State University)
  • Kung-Min "Leo" Lin (Now pursuing C.S., B.S., University of California, Berkeley)
  • Chengyi "Eric" Nie (Now pursuing E.E., Ph.D., Stony Brook University)

For prospects

Developing awesome ideas and training researchers are my duties as a professor. I am always looking for new graduate students. If you are interested in working with me, please apply to either the Department of Electrical and Computer Engineering (preferred) or the Department of Computer Science and Engineering at the University of California, Riverside, and mention me as a potential advisor in the application system.

Teaching

YouTube Channel

Upcoming

Prior Courses

News

  • Our paper, NDS: N-Dimensional Storage, is accepted by MICRO 2021 and nominated as a best paper award candidate!
  • Our paper, Accelerating Applications using Edge Tensor Processing Units, is accepted by SC 21! You may preview the paper on arXiv! Code and a GitHub repo are coming soon!
  • Our paper, OpenUVR: an Open-Source System Framework for Untethered Virtual Reality Applications, received the outstanding paper award at RTAS 2021! You may preview the paper on arXiv and build your own VR system now using our OpenUVR GitHub.
  • Our paper, TPUPoint: Automatically Characterizing Hardware Accelerated Data Center Machine Learning Program Behavior, is accepted by ISPASS 2021!
  • Our paper, OpenUVR: an Open-Source System Framework for Untethered Virtual Reality Applications, is accepted by RTAS 2021! You may preview the paper on arXiv and build your own VR system now using our OpenUVR GitHub.
  • Our paper, Dancing in the Dark: Profiling in the Age of Tiered Memory, is accepted by IPDPS 2021! More details and the open source software are coming!
  • My NSF proposal, CNS Core: Small: Re-engineering Applications for Tensor Processing Units, is awarded! Thanks NSF!
  • Our paper, Dynamic Multi-Resolution Data Storage, is chosen as one of the 12-paper IEEE Micro 2020 Top Picks from Computer Architecture Conferences.
  • Our paper, Dynamic Multi-Resolution Data Storage, is accepted by MICRO 2019.
  • Our paper, Pensieve: a Machine Learning Assisted SSD Layer for Extending the Lifetime, is accepted by ICCD 2018.
  • My NSF proposal, CSR: Small: IOQL: an I/O Interface for Near-Data Processing, is awarded! Thanks NSF!
  • Hung-Wei received a Facebook Research Award, 2018.
  • Yu-Ching presented his work during the poster session of the Non-Volatile Memory Workshop, 2018.
  • I am invited to join the organizing committee as the registration chair of ISCA, 2018 (The 45th International Symposium on Computer Architecture)
  • I am invited to join the organizing committee as the web chair of ASPLOS, 2018 (The 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems)
  • Our paper with Murali's team at USC, "Summarizer: Trading Bandwidth with Computing Near Storage", is accepted by MICRO 2017!
  • My NSF proposal, CRII: CSR: Rethinking the FTL in SSDs -- a file translation layer instead of a flash translation layer, is awarded!
  • My paper with Dr. Steven Swanson's team at UCSD, KAML: A Flexible, High-Performance Key-Value SSD, is accepted by HPCA 2017!
  • I am invited to join the organizing committee as the web chair of HPCA, 2017 (The 23rd IEEE Symposium on High Performance Computer Architecture)
  • I am invited to join the organizing committee as the web chair of MICRO, 2016 (The 49th Annual IEEE/ACM International Symposium on Microarchitecture)