(OSDI '20) Persistent State Machines for Recoverable In-memory Storage Systems with NVRam

2020/11/20 Storage

paper notes about Persimmon[1] from UCB & MSR uses state machine scheme to utilize PM to add persistence to in-mem storage systems

background

Challenge:

  • Move DRAM system directly can’t bring crash consistency (all or nothing)

    here the crash consistency is like transaction.

  • normal approach is logging
    • complex codes and high overhead

Persimmon

overview

Key insight: in-mem storage system is a state machine

  • encapsulate state
  • atomic operations

So state machines can be PM abstraction.

Persimmon keeps 2 copies of state machine in DRAM and PM.

op logging for crash consistency

  • fast and 1 seq write to PM
  • replay on the PM state machine (aka “shadow execution”)

image.png

state machine ops should follow:

  • no external dependencies. e.g keep file descriptor
  • execute deterministically
  • no external side effect. e.g. network op

log replay => no concurrency!

PSM (Persistent State Machines) API:

  • psm_init(): initialize
  • psm_invoke_rw(op): read write op with persistence
  • psm_invoke_ro(op): read without persistence

like a middle ware for coding…

pros:

  • low coding effort
  • low latency

cons:

  • cost 2x CPU and space
  • shadow execution would be bottleneck when write-heavy workload

implements

dynamic binary instrumentation (undo logging) to ensure PM write’s crash consistency.
Optimization:

  • undo-log in 32B blocks
  • dedup: log each block only once

just a undo logging?

recovery:

  • use undo log to reover PM state machines
  • shadow execute remaining operations
  • copy PM state machine to DRAM

Usage

code modification needed for software upon PSM:

image.png

While 3000 lines on Redis built-in PM supports

eval

TAPIR: similar performance with redis…

image.png

note: it works well for read-heavy workloads, but fastly degenerates after write percentage over 40% because of the logging queue is blocked. While 40~50% write workloads are not rare on real enviroments[2]

image.png naive logging persistence strategy can handle >10% write workload!

image.png recovery use more space to achieve high speed. but PM is cheaper!

note: ofc, Persimmon is like a local shadow server on PM… And PM is big enough to handle the second storage system
also, persimmon’s logging suffers write amplification too.
entries granularity is small and may finally decrease PM write performance, even with some de-dup tricks mentioned in paper.
Besides, DNs on some closely coupling storage system have to handle network io to ensure consistency or something else. This kind of “complex system” can’t be modeled by state machines.
Set up multi state machines on a single machine may do help to realize a certain degree of concurrency

refer

[1] Zhang, Wen, Scott Shenker, and Irene Zhang. “Persistent State Machines for Recoverable In-memory Storage Systems with NVRam.” 14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20). 2020. [2] Yang, Juncheng, Yao Yue, and K. V. Rashmi. “A large scale analysis of hundreds of in-memory cache clusters at Twitter.” Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI’20), Banff, AL, Canada. 2020.

Search

    Table of Contents