(updating) Some in-memory erasure-coded systems

2021/06/23 Storage

background

EC is mainly used for cold storage, since it provides availability at a low redundancy cost. Hot storage, in contrast, needs to keep its QoS, so the more expensive replication is more common there.

EC also introduces encoding and transfer costs when writing or reading data, which consumes extra CPU and disk I/O. But thanks to ISA-L and similar libraries, encoding in cache is quite fast, so encoding is usually not the bottleneck in a carefully designed EC system.

Some popular EC works:

  • reading
    • LRC, Clay
  • writing
    • logging: append updates of data and parity to avoid write-after-read when updating, at the cost of extra reads later.
    • Parity Logging: log only the parity deltas, which improves data reads and small updates. But the logged parity still makes recovery slow (random reads).
    • Parity Logging with Reserved space (PLR, FAST ‘14): pre-allocate space next to the primary parity chunks to avoid random reads during recovery.
    • PARIX (ATC ‘17): uses data deltas instead of parity deltas to encode the new parity.
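All of these delta-based schemes exploit the linearity of the code: a parity can be patched with a logged delta instead of being re-encoded from all data chunks. A toy sketch I'm adding for illustration, using a single XOR parity (real systems use Reed–Solomon over GF(2^8), but the delta property is the same):

```python
# Toy sketch of delta-based parity updates with one XOR parity.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

k = 3
data = [bytes([i] * 4) for i in range(k)]   # three small data chunks
parity = data[0]
for d in data[1:]:
    parity = xor(parity, d)                 # parity = d0 ^ d1 ^ d2

# An in-place update of data[1] would need a read-modify-write of the
# parity; delta schemes (e.g. PARIX) log the data delta instead:
old, new = data[1], bytes([9] * 4)
delta = xor(old, new)                       # logged delta
parity = xor(parity, delta)                 # patch the parity with the delta
data[1] = new

# The patched parity equals re-encoding from scratch:
full = data[0]
for d in data[1:]:
    full = xor(full, d)
assert parity == full
```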

Why do we need EC in hot storage?

just don’t get it…

some claim that:

  1. distributed memory systems suffer from high tail latency
    • EC can offload requests to lightly loaded nodes, while keeping redundancy lower
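A toy model of that claim (my illustration, with made-up load numbers): with $k$-of-$(k+r)$ reads, a client can skip the most-loaded nodes holding an object's splits, which plain replication can only do if every node stores a full copy.

```python
# Toy illustration of EC-based load balancing (assumed model, not any
# system's actual code): read any k of the k+r splits, skipping hot nodes.

k, r = 4, 2
loads = [10, 80, 5, 60, 7, 12]   # hypothetical queue length per node

# read from the k least-loaded of the k+r nodes holding this object
chosen = sorted(range(k + r), key=lambda n: loads[n])[:k]
print(sorted(chosen))  # [0, 2, 4, 5] -- hot nodes 1 and 3 are skipped
```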

caching

(OSDI ‘16) EC-Cache: Load-balanced, Low-latency Cluster Caching with Online Erasure Coding

EC-Cache is from K. V. Rashmi et al.

story: in-memory systems suffer from skewed workloads -> so they use selective replication -> but memory space is limited -> EC.

split an object into $k$ parts, and encode them into $r$ parities, so any $k$ of the $k + r$ splits can reconstruct the object.

(figure from 朱乐乐 on Zhihu)
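A minimal sketch of the split-and-encode idea (my toy with $r = 1$ XOR parity so that "any $k$ of $k+1$" decoding stays trivial; EC-Cache itself uses Reed–Solomon codes):

```python
# Split an object into k parts, add one XOR parity, and reconstruct the
# object from any k of the k+1 splits (toy stand-in for an RS code).
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

k = 3
obj = b"abcdefghi"
size = len(obj) // k
parts = [obj[i * size:(i + 1) * size] for i in range(k)]
parity = reduce(xor, parts)

# lose parts[1]; recover it from the remaining k splits
recovered = xor(xor(parts[0], parts[2]), parity)
assert recovered == parts[1]
assert parts[0] + recovered + parts[2] == obj
```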

data management

(TODS ‘05) LH* RS—a highly-available scalable distributed data structure

(SYSTOR ‘17) Erasure coding for small objects in in-memory KV storage

MemEC, from Patrick P. C. Lee's group, applies EC to in-memory KV storage.

hybrid encodings

To save storage cost while keeping availability, they use EC for large data (i.e., big values).


all-encoding

But if the values are too small, MemEC encodes whole objects instead: one chunk (4 KB) holds multiple [metadata, key, value] tuples.

Once a new chunk reaches its size limit, it will be sealed, and only sealed data chunks will be encoded. Data in a chunk can still be updated, as long as the chunk does not grow past the limit. Thanks to the linearity of the code, the cost of updating the parity chunks is small.

Chunk and object index maps are based on cuckoo hashing.
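Cuckoo hashing in a nutshell (a generic minimal sketch, not MemEC's actual index; the toy hash functions are my own): each key has two candidate slots, and an insert that finds both occupied evicts a resident and relocates it, bounded by a kick limit.

```python
# Minimal cuckoo hash: two tables, two hash functions, evict-and-relocate.
def h1(key: str, size: int) -> int:
    return sum(key.encode()) % size                       # toy hash function

def h2(key: str, size: int) -> int:
    return (sum(key.encode()) * 31 + len(key)) % size     # toy hash function

class CuckooMap:
    def __init__(self, size: int = 16):
        self.size = size
        self.t1 = [None] * size
        self.t2 = [None] * size

    def get(self, key):
        for table, h in ((self.t1, h1), (self.t2, h2)):
            slot = table[h(key, self.size)]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None

    def put(self, key, val, max_kicks: int = 32) -> bool:
        item = (key, val)
        for _ in range(max_kicks):
            for table, h in ((self.t1, h1), (self.t2, h2)):
                i = h(item[0], self.size)
                if table[i] is None or table[i][0] == item[0]:
                    table[i] = item
                    return True
                table[i], item = item, table[i]  # evict; keep relocating
        return False    # kick limit hit: a real table would rehash or grow

m = CuckooMap()
m.put("key", 1)
m.put("another", 2)
assert m.get("key") == 1 and m.get("another") == 2 and m.get("miss") is None
```

Lookups touch at most two slots, which is why cuckoo hashing suits an index map with strict latency needs.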

seems like a lazy approach…


(FAST ‘16, TOS ‘17) Efficient and available in-memory KV-store with hybrid erasure coding and replication

Cocytus from IPADS, SJTU.

Just like MemEC, there are excessive updates on metadata

  • -> so metadata is kept in replication, and data in EC

Race Condition in Online Recovery (the figure for this is from their slides).



EC-GROUP is the basic component:

  • M -> metadata
  • D -> data
  • P -> parity


a lil bit of engineering work? But this group has done lots of high-impact work on high-performance in-memory systems. Mojim, Pilaf, FaRM, and HERD are also cited in this paper.

consistent hashing

(SoCC ‘19) Coupling decentralized key-value stores with erasure coding

(EuroSys ‘18) Fast and strongly consistent per-item resilience in key-value stores

This work, Ring, is from ETH Zurich.

Stretched RS (SRS) code: it just changes the placement strategy of the chunks; the reliability is not changed.


wide-stripe

(SC ‘21) LogECMem: Coupling Erasure-Coded In-memory Key-Value Stores with Parity Logging

LogECMem uses a hybrid update strategy, combining parity logging with in-place updates, in a wide-stripe EC system.
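My toy model of the parity-logging side (inspired by the paper's description, not LogECMem's actual code): data chunks are updated in place, while parity deltas go to an append-only log that is merged into the parity chunk lazily, e.g. at recovery time.

```python
# Toy parity-logging update path: in-place data writes, logged parity deltas.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data = [bytes(4), bytes([1, 1, 1, 1])]
parity = xor(data[0], data[1])
plog = []                               # append-only parity-delta log

def update(i: int, new: bytes) -> None:
    plog.append(xor(data[i], new))      # log the delta (sequential write)
    data[i] = new                       # update the data chunk in place

update(0, b"\x05\x05\x05\x05")
update(1, b"\x02\x02\x02\x02")

# merge the log before the parity is needed (e.g., for recovery)
for delta in plog:
    parity = xor(parity, delta)
assert parity == xor(data[0], data[1])
```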

ref

  1. Rashmi, K. V., et al. “EC-Cache: Load-balanced, low-latency cluster caching with online erasure coding.” 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016.
  2. Litwin, Witold, Rim Moussa, and Thomas Schwarz. “LH* RS—a highly-available scalable distributed data structure.” ACM Transactions on Database Systems (TODS) 30.3 (2005): 769-811.
  3. Yiu, Matt MT, Helen HW Chan, and Patrick PC Lee. “Erasure coding for small objects in in-memory kv storage.” Proceedings of the 10th ACM International Systems and Storage Conference. 2017.
  4. Chen, Haibo, et al. “Efficient and available in-memory KV-store with hybrid erasure coding and replication.” ACM Transactions on Storage (TOS) 13.3 (2017): 1-30.
  5. Cheng, Liangfeng, Yuchong Hu, and Patrick PC Lee. “Coupling decentralized key-value stores with erasure coding.” Proceedings of the ACM Symposium on Cloud Computing. 2019.
  6. Taranov, Konstantin, Gustavo Alonso, and Torsten Hoefler. “Fast and strongly-consistent per-item resilience in key-value stores.” Proceedings of the Thirteenth EuroSys Conference. 2018.
