The paper is the first to propose a new virtualization mechanism that can natively support scale-out acceleration across **heterogeneous** cloud FPGAs and demonstrate its effectiveness on a custom FPGA cluster with heterogeneous FPGA resources. Cloud …
3D-stacking memory technology such as High-Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) provides orders of magnitude more bandwidth and significantly increased channel-level parallelism (CLP) due to the new parallel memory architecture. …
Field-Programmable Gate Arrays (FPGAs) have been integrated into the cloud infrastructure to enhance its computing performance by supporting on-demand acceleration. However, system support for FPGAs in the context of the cloud environment is still in …
A nonvolatile fully programmable processing-in-memory (PIM) processor named Liquid Silicon (L-Si) is demonstrated, which combines the superior programmability of general-purpose computing devices (e.g. FPGA) and the high power efficiency of …
Emerging 3D memory technologies, such as the Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM), provide increased bandwidth and massive memory-level parallelism. Efficiently integrating emerging memories into existing system pose new …
We present a new method for Product Quantization (PQ) based approximated nearest neighbor search (ANN) in high dimensional spaces. Specifically, we first propose a quantization scheme for the codebook of coarse quantizer, product quantizer, and …
Despite the state-of-the-art accuracy of Deep Neural Networks (DNN) in various classification problems, their deployment onto resource constrained edge computing devices remains challenging due to their large size and complexity. Several recent …