(SC '21) LogECMem: Coupling Erasure-Coded In-memory Key-Value Stores with Parity Logging

2021/10/11 Storage

LogECMem uses a hybrid method of in-place update and parity logging (PL) for parity updates.

motivation

old update policies:

  1. direct reconstruction: read all non-updated data chunks and compute the new parities together with the new data. (huge data transfer cost)
  2. in-place update: read the old parity and apply a parity delta computed from the data delta. (too many reads of old parities)
  3. full-stripe update: out-of-place update, with GC of stale data chunks. (no parity reads, but high space cost)
  4. PL: log the parity deltas instead of applying them immediately. (but PL is designed for disk-based systems)
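The in-place update path (policy 2) can be sketched with byte-wise XOR parity; all names here are illustrative, not from the paper:

```python
# In-place parity update: the new parity follows from the data delta
# alone, so no other data chunks in the stripe need to be read.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def in_place_update(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    delta = xor_bytes(old_data, new_data)   # data delta
    return xor_bytes(old_parity, delta)     # apply the parity delta in place
```

For a stripe with data chunks d0, d1 and parity p = d0 ^ d1, updating d0 yields a parity equal to new_d0 ^ d1 — without touching d1. The cost is the read of the old parity, which is what policy 2's drawback refers to.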

So they built LogECMem by combining in-place update with PL.

methods

design

They use Markov models for reliability analysis, and conclude that single-failure repair is the most important case. Based on this observation, they keep only one parity chunk per stripe, the XOR parity, in DRAM nodes.

Like the buffer logging in RAMCloud [2], they use buffer logging for the other parity chunks to accelerate writes:
image.png (the buffer here is DRAM)
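A minimal sketch of the buffer-logging idea, assuming deltas are staged in a DRAM buffer and flushed to an append-only on-disk log in batches (the class and threshold are illustrative assumptions):

```python
# Buffer logging: parity deltas accumulate in DRAM, then one sequential
# append replaces many random parity writes on disk.

class ParityDeltaBuffer:
    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold
        self.buffer = []        # (stripe_id, delta) pairs held in DRAM
        self.disk_log = []      # stand-in for the append-only disk log

    def append(self, stripe_id: int, delta: bytes):
        self.buffer.append((stripe_id, delta))
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.disk_log.extend(self.buffer)   # sequential batch append
        self.buffer.clear()
```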

note: PM is a good log device, and not limited by capacity. This approach may beat [3], which keeps data in PM and parity in DRAM.
btw, persistence in RAMCloud's buffer logging [2] is ensured by battery-backed DRAM. So PM is appropriate and may provide fast recovery (since the gap between DRAM and disk is big)? But it's odd to discuss logging persistence for a storage system that keeps all its data in DRAM…
related: flatstore

With the XOR parity in DRAM, the system can perform degraded reads entirely within DRAM nodes.
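A single-failure degraded read with XOR parity can be sketched as follows: the lost chunk is the XOR of the parity with all surviving data chunks (names are illustrative):

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def degraded_read(surviving_chunks: list, xor_parity: bytes) -> bytes:
    """Reconstruct the single lost data chunk from DRAM-resident
    surviving chunks plus the DRAM-resident XOR parity."""
    return reduce(xor_bytes, surviving_chunks, xor_parity)
```

Since both the surviving chunks and the XOR parity live in DRAM, the repair needs no disk access — which is why keeping exactly this one parity in memory covers the dominant single-failure case.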

Op

update: image.png

check paper for details

merge-based buffer logging: a log merging trick
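The merging trick can be sketched as follows — successive deltas targeting the same parity chunk are XOR-combined inside the DRAM buffer, so at most one merged delta per chunk is eventually flushed (structure is an assumption, not the paper's code):

```python
# Merge-based buffer logging: XOR-fold deltas per parity chunk in DRAM.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class MergingBuffer:
    def __init__(self):
        self.merged = {}  # parity_chunk_id -> accumulated delta

    def append(self, chunk_id: int, delta: bytes):
        if chunk_id in self.merged:
            # XOR is associative, so folding deltas early is equivalent
            # to applying them one by one later.
            self.merged[chunk_id] = xor_bytes(self.merged[chunk_id], delta)
        else:
            self.merged[chunk_id] = delta
```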

multiple chunk failures repair

PLR (FAST ’14) trades write performance for repair performance. Simple merge logging on top of PLR can only merge incoming parity deltas.
So they use a lazy merging strategy (parity logging with merging, PLM) that first writes parity deltas to extra contiguous disk space, and later reads them back for merging.

image.png
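The lazy strategy can be sketched as follows, assuming the contiguous disk region is modeled as a list and the background merge XOR-folds the logged deltas into the parity (illustrative names, not the paper's code):

```python
# Lazy merging (PLM sketch): deltas first land in a contiguous log
# region; a later pass reads them back, folds them into the parity,
# and truncates the log.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def lazy_merge(parity: bytes, log_region: list) -> bytes:
    """Fold all logged deltas into the parity chunk, emptying the log."""
    for delta in log_region:
        parity = xor_bytes(parity, delta)
    log_region.clear()
    return parity
```

The point of the contiguous region is that both the foreground appends and the background read-back are sequential I/O, keeping the write path fast while still bounding repair-time log length.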

note: kind of a 2-level scheme? Does the first, unordered level hurt performance?
Also, is there any performance bottleneck in the logging itself or in the GC-like replacement of the second-level log? The experiment section seems to report only overall performance.

experiments

……

ref

  1. Cheng, et al. LogECMem: Coupling Erasure-Coded In-memory Key-Value Stores with Parity Logging, SC ‘21.
  2. Ousterhout, John, et al. “The RAMCloud storage system.” ACM Transactions on Computer Systems (TOCS) 33.3 (2015): 1-55.
  3. 吴炀, et al. “An efficient hybrid memory layout mechanism and coding technique.” 计算机科学 (Computer Science) 44.6 (2017).
