Offload Erasure Coding to RDMA NIC
High-speed networks such as RDMA are developing very quickly, and 800 Gbps NICs are about to appear. Although a lot of work focuses on making EC faster, onload (CPU-based) erasure coding libraries such as ISA-L are not keeping pace with the network. As a result, the system bottleneck shifts to computation. Besides, the matrix computation in EC is well suited to offloading to processors attached over PCIe, which saves CPU resources at the same time.
Representative coders: Jerasure without SSE and ISA-L (onload, on the CPU), Mellanox-EC with a ConnectX-5 NIC (NIC offload), Gibraltar with a K80 GPU (GPU offload).
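To make the "matrix computation in EC" concrete, here is a minimal sketch of encoding as a matrix-vector product over GF(2^8). It is not how ISA-L or any NIC implements it. The reduction polynomial 0x11D, the coding matrix, and the names `gf_mul` and `encode` are assumptions chosen for illustration.

```python
# GF(2^8) multiplication by shift-and-add. The reduction polynomial 0x11D
# is a common choice and is an assumption here.
def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D           # reduce back into 8 bits
        b >>= 1
    return result


def encode(matrix, data_chunks):
    """Compute m parity chunks from k data chunks.

    matrix has m rows and k columns; each parity byte is the dot product of
    one matrix row with the bytes at the same offset in every data chunk.
    """
    chunk_len = len(data_chunks[0])
    parities = []
    for row in matrix:
        parity = bytearray(chunk_len)
        for coeff, chunk in zip(row, data_chunks):
            for i, byte in enumerate(chunk):
                parity[i] ^= gf_mul(coeff, byte)
        parities.append(bytes(parity))
    return parities


# Hypothetical (k=3, m=2) coding matrix: row 0 is plain XOR parity,
# row 1 weights the chunks with 1, 2 and 4.
CODING_MATRIX = [[1, 1, 1], [1, 2, 4]]
data = [b"abcd", b"efgh", b"ijkl"]
p0, p1 = encode(CODING_MATRIX, data)
```

The inner loops are independent per byte offset and use only table-friendly multiplies and XORs, which is why this kind of work maps well to an accelerator.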
HPDC ‘19 UMR-EC: A unified and multi-rail erasure coding library for high-performance distributed storage systems
SC ‘19 TriEC: tripartite graph based erasure coding NIC offload
BiEC: each node has one NIC, and the node uses that NIC both to code data and to transfer it. Its limitations:
- Only one NIC is involved in coding.
- The NIC's network bandwidth is not fully utilized.
- The NIC supports only encode_and_send; receive_and_decode is ignored.
So TriEC splits a coding task into sub-tasks and spreads them across different nodes that have NICs. As a result, the coding graph becomes a tripartite graph.
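Splitting a coding task into sub-tasks is possible because encoding is linear: partial results computed on different nodes can be merged with XOR. The sketch below shows only that property and is not the algorithm from the paper. The coefficients, the two "nodes", and the names `partial_parity` and `xor_chunks` are hypothetical.

```python
def gf_mul(a, b):
    """GF(2^8) multiply; reduction polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def partial_parity(coeffs, chunks):
    """Sub-task: the contribution of a subset of data chunks to one parity chunk."""
    out = bytearray(len(chunks[0]))
    for coeff, chunk in zip(coeffs, chunks):
        for i, byte in enumerate(chunk):
            out[i] ^= gf_mul(coeff, byte)
    return bytes(out)


def xor_chunks(a, b):
    """Merge two partial parities (addition in GF(2^8) is XOR)."""
    return bytes(x ^ y for x, y in zip(a, b))


coeffs = [1, 2, 4, 8]                       # one row of a hypothetical coding matrix
chunks = [b"abcd", b"efgh", b"ijkl", b"mnop"]

# The whole task computed in one place.
full = partial_parity(coeffs, chunks)

# The same task split into two sub-tasks (hypothetical nodes A and B),
# then merged by a third party.
part_a = partial_parity(coeffs[:2], chunks[:2])
part_b = partial_parity(coeffs[2:], chunks[2:])
merged = xor_chunks(part_a, part_b)

assert merged == full
```

Each sub-task touches only part of the data, so several NICs can work and send in parallel instead of one NIC doing everything.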
Many of the optimizations here look like extensions of local coding tricks to a distributed system. Can we do more?
SC ‘20 INEC: fast and coherent in-network erasure coding
- Shi, Haiyang, Xiaoyi Lu, and Dhabaleswar K. Panda. “EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures.” International Symposium on Benchmarking, Measuring and Optimization. Springer, Cham, 2018.
- Shi, Haiyang, and Xiaoyi Lu. “Triec: tripartite graph based erasure coding NIC offload.” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019.
- Shi, Haiyang, and Xiaoyi Lu. “INEC: fast and coherent in-network erasure coding.” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2020.
- Shi, Haiyang, et al. “High-performance multi-rail erasure coding library over modern data center architectures: early experiences.” Proceedings of the ACM Symposium on Cloud Computing. 2018.
- Shi, Haiyang, et al. “UMR-EC: A unified and multi-rail erasure coding library for high-performance distributed storage systems.” Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing. 2019.