We built Intelligent Data Storage Systems, including Summarizer and Morpheus, to demonstrate the potential of exposing the processing power inside the controllers of modern non-volatile storage devices. The resulting systems deliver significant speedups for GPGPU applications and database systems, and show potential for machine learning applications.
With the slowdown of Moore's Law and the end of Dennard scaling, big-data applications (e.g., machine learning, data analytics, and scientific computing) must rely on heterogeneous computing units, including GPUs, FPGAs, and ASICs, as well as heterogeneous data storage, including DRAM, NVRAM, and flash memory, to complete their tasks. We built systems that make more efficient use of heterogeneous system components to accelerate applications. The resulting system can accelerate ...
With the advance of machine learning techniques, many heuristic-based mechanisms can potentially be replaced by machine-learning models. We demonstrated that a machine-learning-assisted SSD can extend its lifetime by 17% without modifications to, or additional hints from, the software system, and at zero additional cost to existing storage devices.
With hardware accelerators reducing computation latency, the system software stack, traditionally underrated in application design, becomes more critical. In ESCAL, we focus on those underrated bottlenecks to achieve significant performance improvements without using new hardware. The most recent example is the OpenUVR system, where we eliminate unnecessary memory copies and allow the VR system loop to complete within 14 ms using just a modern desktop PC, existing WiFi network links, a Raspberry Pi 4b+, and an HDMI-compatible head-mounted display.
As parallel computer architectures significantly shrink the execution time of compute kernels, the performance bottlenecks of applications shift to the rest of the execution, including data movement, object serialization/deserialization, and other software overheads in managing data storage. To address this new bottleneck, the best approach is to avoid moving data and to endow storage devices with new roles.
Morpheus is one of the very first research projects to implement this concept in real systems. We use existing, commercially available hardware components to build the Morpheus-SSD. The Morpheus model not only speeds up a set of heterogeneous computing applications by 1.32x, but also lets these applications better utilize emerging data transfer methods that send data directly to the GPU via peer-to-peer transfers to further achieve a 1.39x speedup. Summarizer further provides mechanisms to dynamically adjust the workload between the host and intelligent SSDs, making more efficient use of all computing units in a system and boosting the performance of big data analytics. This line of research also helped Hung-Wei's team receive a Facebook research award in 2018.
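To make the idea of giving the storage device a new role more concrete, the following is a minimal sketch, not the Morpheus implementation. It assumes a file of ASCII-encoded integers and a hypothetical in-device step (`deserialize_in_storage`) that parses the text into packed 32-bit integers before anything crosses the interconnect. All function names here are illustrative assumptions.

```python
import struct


def serialize_as_text(values):
    """File format as stored: one decimal integer per line (ASCII)."""
    return "".join(f"{v}\n" for v in values).encode("ascii")


def deserialize_in_storage(raw_text):
    """Hypothetical in-device step: parse text into packed little-endian int32s."""
    values = [int(token) for token in raw_text.decode("ascii").split()]
    return struct.pack(f"<{len(values)}i", *values)


def bytes_over_interconnect(values, offload):
    """Count the bytes the host would receive with and without offloading."""
    raw = serialize_as_text(values)
    return len(deserialize_in_storage(raw)) if offload else len(raw)


values = list(range(1_000_000, 1_001_000))
conventional = bytes_over_interconnect(values, offload=False)
offloaded = bytes_over_interconnect(values, offload=True)
print("text sent to host:", conventional, "bytes")
print("binary sent to host:", offloaded, "bytes")
```

In this toy setting the host receives ready-to-use binary objects and less data crosses the interconnect. The actual savings depend on the data format and the device, which this sketch does not model.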
With the end of Dennard scaling and the slowdown of Moore's Law, computers are becoming heterogeneous. However, moving data among heterogeneous computing units and storage devices is becoming a bottleneck in these systems.
My research proposes the "Hippogriff" system, which revisits the programming model for moving data in heterogeneous computer systems. Instead of using the conventional CPU-centric, programmer-specified methods, the Hippogriff system simplifies the application interface and provides a middle layer to handle data movement efficiently. We also implemented peer-to-peer data transfer between the GPU and the SSD in the Hippogriff system.
The preliminary results demonstrate a 46% performance gain when applying Hippogriff to a set of Rodinia GPU applications. Even for a highly optimized GPU MapReduce framework, Hippogriff still demonstrates up to a 27% performance gain.
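The role of a middle layer that decides how data moves can be sketched as follows. This is an illustrative sketch only and not the Hippogriff interface: the names `Transfer`, `plan_transfer`, and `P2P_PAIRS`, and the device labels, are all hypothetical. The application names a source and a destination, and the layer picks a direct peer-to-peer route when one is available, falling back to staging through host memory otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    source: str
    destination: str
    hops: tuple  # ordered devices the data passes through

    @property
    def copies(self):
        # Each hop-to-hop step is one data copy.
        return len(self.hops) - 1


def plan_transfer(source, destination, p2p_pairs):
    """Choose a direct route if the device pair supports peer-to-peer,
    otherwise stage the data through host DRAM."""
    if (source, destination) in p2p_pairs:
        hops = (source, destination)
    else:
        hops = (source, "host_dram", destination)
    return Transfer(source, destination, hops)


# Assumed capability table: only SSD-to-GPU peer-to-peer is available.
P2P_PAIRS = {("ssd", "gpu")}

direct = plan_transfer("ssd", "gpu", P2P_PAIRS)
staged = plan_transfer("ssd", "fpga", P2P_PAIRS)
print(direct.hops, direct.copies)
print(staged.hops, staged.copies)
```

The point of the design is that the application only states where the data is and where it should go. The routing decision stays in the middle layer, so new transfer paths can be added without changing application code.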
The advancement of machine learning techniques enables more accurate predictions and data classification, leading to improved decision making. This is especially helpful for system design issues that traditionally rely on heuristics. In this project, we use machine learning models to replace traditional heuristic-based mechanisms and better assist the management of storage systems. The initial results show a 19% extension in SSD lifetime without adding any hardware cost.