Hung-Wei Tseng

Associate Professor, University of California, Riverside

I am an associate professor in the Department of Electrical and Computer Engineering and a cooperating faculty member in the Department of Computer Science and Engineering at the University of California, Riverside, where I lead the Extreme Scale & Computer Architecture Laboratory (ESCAL).

I am interested in designing architectures, programming language frameworks, and system infrastructures that allow applications and programmers to use modern heterogeneous hardware components more efficiently. My recent focus is on using AI/ML accelerators to improve the performance of non-AI/ML workloads (e.g., GPTPU, TCUDB).

Our research was selected for IEEE MICRO Top Picks from Computer Architecture Conferences in 2024 and 2020. We also received best paper nominations at the IEEE/ACM International Symposium on Microarchitecture in 2021 and 2019, as well as a Facebook Research Award in 2018. In addition, we applied our knowledge of optimizing storage systems to wireless network system stacks: we developed OpenUVR, which enables a high-quality, untethered VR experience on commodity hardware components and won the outstanding paper award at RTAS 2021.

If you're interested in joining my research group, you should apply to UCR's ECE or CSE programs. Please also fill out this form to express your interest in my group after you have applied. I won't interview or respond to students until we have received their applications.

News

Research Projects

Accelerating non-AI/ML applications using AI/ML accelerators

The explosive demand for AI/ML workloads has driven the emergence of AI/ML accelerators, including commercial products such as NVIDIA Tensor Cores and Google TPUs. These AI/ML accelerators are essentially matrix processors and can, in theory, help any application that uses matrix operations. This project supplies the missing system, architecture, and programming language support needed to democratize AI/ML accelerators. Because matrix operations have conventionally been inefficient, this project also revises the core algorithms of compute kernels to make better use of the operators that AI/ML accelerators provide.
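As a rough illustration of what "revising the core algorithm" around matrix operators can mean, the sketch below recasts a graph problem (triangle counting) as two matrix multiplications. It is a generic textbook reformulation, not the method used in GPTPU or TCUDB; the function names and the sample graph are hypothetical, and the plain-Python `matmul` merely stands in for an accelerator's matrix operator.

```python
def matmul(a, b):
    """Plain dense matrix product; a stand-in for an accelerator's matrix operator."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def count_triangles(adj):
    """Count triangles in an undirected simple graph given its 0/1 adjacency matrix.

    (A^3)[i][i] is the number of closed walks of length 3 starting at vertex i.
    Each triangle is counted 6 times (3 start vertices x 2 directions).
    """
    a3 = matmul(matmul(adj, adj), adj)
    return sum(a3[i][i] for i in range(len(adj))) // 6


# Hypothetical input: the complete graph on 4 vertices, for illustration only.
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(count_triangles(k4))  # prints 4
```

The pointer-chasing version of this problem has irregular control flow, while the matrix version consists only of dense multiply-accumulate work, which is the kind of operation a matrix processor is built for.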

Democratizing hardware accelerators

Beyond AI/ML accelerators, modern systems also integrate other types of accelerators for more application domains. Ray-tracing accelerators are one example: they are becoming more common in modern systems to meet the demands of gaming and virtual/mixed reality. These accelerators compensate for the weakness of AI/ML accelerators in handling algorithms with divergent control flow or irregular memory access patterns. Democratizing these accelerators will improve the performance of traditionally hard-to-parallelize problems that currently have to rely on slowly improving CPU architectures.

Innovative hardware accelerator architectures

ESCAL also designs hardware accelerators for important application domains, targeting the problems that existing accelerators cannot tackle.

Selected Publications

(Full listing)

Open Source Projects

Advising & Join

If you're interested in joining my research group, you should apply to UCR's ECE or CSE programs. Please also fill out the following form to express your interest in my group: https://forms.gle/nhakt2qRwEsYrxGf7. I will not review or consider applicants who have not filled out this form.

Graduate Students

I am currently advising the following top-notch graduate students:

Undergraduate Students

I also work with the following talented undergraduate students:

Alumni

I have also advised the following students, who have since graduated:

For prospective students

Developing awesome ideas and training researchers are my duties as a professor. I am always looking for new graduate students. If you are interested in working with me, please apply to either the Department of Electrical and Computer Engineering (preferred) or the Department of Computer Science and Engineering at the University of California, Riverside, and mention me as a potential advisor in the application system.

Teaching

YouTube Channel

Upcoming

Present

Prior Courses

Services

I actively serve both the research community and my department.

Research community

Department service

Prior Research Projects

Intelligent Data Storage

As parallel computer architectures significantly shrink the execution time of compute kernels, application performance bottlenecks shift to the rest of the execution, including data movement, object serialization/deserialization, and other software overheads of managing data storage. To address this new bottleneck, the best approach is to avoid moving data and to give storage devices new roles.

Morpheus is one of the very first research projects to implement this concept in real systems. We use existing, commercially available hardware components to build the Morpheus-SSD. The Morpheus model not only speeds up a set of heterogeneous computing applications by 1.32x, but also allows these applications to better use emerging data transfer methods that send data directly to the GPU via peer-to-peer transfers to further achieve a 1.39x speedup. Summarizer further provides mechanisms to dynamically adjust the workload between the host and intelligent SSDs, making more efficient use of all computing units in a system and boosting the performance of big data analytics. This line of research also earned Hung-Wei's team a Facebook Research Award in 2018.
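To make the data-movement argument concrete, here is a toy model of deserializing near storage. It is only a sketch under stated assumptions, not the Morpheus implementation: the function names, the text file layout, and the data set are all hypothetical. It compares the bytes that would cross the interconnect when the host receives the raw text file with the bytes it would receive if the storage device parsed the text and shipped packed binary integers instead.

```python
import struct


def serialize_text(values):
    """Encode integers as newline-separated ASCII text, as a stored file might."""
    return "\n".join(str(v) for v in values).encode("ascii")


def deserialize_to_binary(text_bytes):
    """Parse the text and pack it as little-endian 32-bit integers.

    In the near-storage model this step would run on the storage device,
    so only the packed result would cross the interconnect.
    """
    values = [int(tok) for tok in text_bytes.split()]
    return struct.pack(f"<{len(values)}i", *values)


# Hypothetical data set: 1000 seven-digit integers, for illustration only.
values = [1_000_000 + i for i in range(1000)]
stored = serialize_text(values)          # what the host would read today
packed = deserialize_to_binary(stored)   # what an intelligent SSD could send

print(len(stored), len(packed))  # prints 7999 4000
```

In this toy case the transfer roughly halves, and the host CPU no longer spends cycles on parsing. The actual savings depend entirely on the data format and the workload.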

Building Efficient Heterogeneous Computers

With the end of Dennard scaling and Moore's Law, computers are becoming heterogeneous. However, moving data among heterogeneous computing units and storage devices is an emerging bottleneck in these systems.

My research proposes the "Hippogriff" system, which revisits the programming model for moving data in heterogeneous computer systems. Instead of using the conventional CPU-centric, programmer-specified methods, the Hippogriff system simplifies the application interface and provides a middle layer that handles data movement efficiently. We also implemented peer-to-peer data transfer between the GPU and the SSD in the Hippogriff system.

Preliminary results demonstrate a 46% performance gain from applying Hippogriff to a set of Rodinia GPU applications. For a highly optimized GPU MapReduce framework, Hippogriff still demonstrates up to a 27% performance gain.

High-Quality, Low-Latency Realtime Systems

With hardware accelerators reducing computation latency, the system software stack, traditionally underrated in application design, becomes more critical. In ESCAL, we focus on those underrated bottlenecks to achieve significant performance improvements without using new hardware. The most recent example is the OpenUVR system, where we eliminate unnecessary memory copies and allow the VR system loop to complete within 14 ms using just a modern desktop PC, existing WiFi network links, a Raspberry Pi 4b+, and an HDMI-compatible head-mounted display.
