Offload Erasure Coding to RDMA NIC

EC

Background

High-speed networks such as RDMA are developing very quickly, and 800 Gbps NICs are about to arrive. Although a good deal of work focuses on making EC faster [1], onload (CPU-based) erasure coding libraries such as ISA-L are not keeping pace with the network. As a result, the system bottleneck shifts to computation. Moreover, the matrix computation in EC is well suited to offloading to processors attached over PCIe, which also saves CPU resources.
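The matrix computation mentioned above is a matrix-vector product over a finite field. A minimal sketch, assuming GF(2^8) with reduction polynomial 0x11d and a small illustrative coefficient matrix. The names `gf_mul` and `encode` and the matrix values are assumptions made for illustration; they are not taken from ISA-L or any NIC API.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11d           # reduce back into 8 bits
        b >>= 1
    return result


def encode(matrix, data_chunks):
    """Return one parity chunk per matrix row (parity = matrix x data)."""
    chunk_len = len(data_chunks[0])
    parities = []
    for row in matrix:
        parity = bytearray(chunk_len)
        for coeff, chunk in zip(row, data_chunks):
            for i, byte in enumerate(chunk):
                parity[i] ^= gf_mul(coeff, byte)
        parities.append(bytes(parity))
    return parities


data = [b"abcd", b"efgh", b"ijkl"]   # k = 3 data chunks (made-up contents)
matrix = [[1, 1, 1], [1, 2, 4]]      # m = 2 illustrative coefficient rows
parities = encode(matrix, data)
```

Every output byte depends only on the input bytes at the same offset, so the work is regular and data-parallel. That regularity is what makes it a plausible candidate for an accelerator or NIC rather than the host CPU.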

image.png

Coders compared: Jerasure without SSE, ISA-L, Mellanox-EC on a ConnectX-5 NIC, and Gibraltar on a K80 GPU.

HPDC ’19 UMR-EC: A unified and multi-rail erasure coding library for high-performance distributed storage systems

SC ’19 TriEC: tripartite graph based erasure coding NIC offload

BiEC: each node has one NIC, and the node uses that NIC both to transfer data and to encode it.

Problems:

  • only one NIC is involved in coding
  • the NIC’s network bandwidth is not fully utilized
  • the NIC only supports encode_and_send; receive_and_decode is ignored

So they split each coding task into sub-tasks and distribute them to different nodes with NICs. As a result, the coding graph becomes a tripartite graph.
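Splitting a coding task this way works because encoding is linear: a parity chunk can be assembled from partial parities computed over disjoint subsets of the data chunks. The sketch below illustrates only that property. It is not TriEC's actual scheme or a NIC API, and the helper names, coefficients, and two-way split are assumptions.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def partial_parity(coeffs, chunks):
    """Encode a subset of data chunks with the matching coefficients."""
    parity = bytearray(len(chunks[0]))
    for coeff, chunk in zip(coeffs, chunks):
        for i, byte in enumerate(chunk):
            parity[i] ^= gf_mul(coeff, byte)
    return bytes(parity)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two partial parities (addition in GF(2^8) is XOR)."""
    return bytes(x ^ y for x, y in zip(a, b))


data = [b"abcd", b"efgh", b"ijkl", b"mnop"]   # k = 4 made-up data chunks
row = [1, 2, 4, 8]                            # one illustrative coefficient row

# Reference: a single node encodes all four chunks.
full = partial_parity(row, data)

# Two hypothetical intermediate nodes each encode half of the chunks,
# and a third node combines their partial results.
part_a = partial_parity(row[:2], data[:2])
part_b = partial_parity(row[2:], data[2:])
combined = xor_bytes(part_a, part_b)

print(combined == full)
```

The intermediate nodes form the extra layer between data sources and parity destinations, which is where the third partition of the graph comes from.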

image.png

Many of the optimizations here look like local coding tricks extended to a distributed system. Can we do more?

SC ’20 INEC: fast and coherent in-network erasure coding

References

  1. Shi, Haiyang, Xiaoyi Lu, and Dhabaleswar K. Panda. “EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures.” International Symposium on Benchmarking, Measuring and Optimization. Springer, Cham, 2018.
  2. Shi, Haiyang, and Xiaoyi Lu. “TriEC: tripartite graph based erasure coding NIC offload.” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019.
  3. Shi, Haiyang, and Xiaoyi Lu. “INEC: fast and coherent in-network erasure coding.” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.
  4. Shi, Haiyang, et al. “High-performance multi-rail erasure coding library over modern data center architectures: early experiences.” Proceedings of the ACM Symposium on Cloud Computing. 2018.
  5. Shi, Haiyang, et al. “UMR-EC: A unified and multi-rail erasure coding library for high-performance distributed storage systems.” Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing. 2019.
