In this article, we list several papers on local NVM/PM fault tolerance (details may be filled in later).
QoS (LB) on persistent memory systems, to avoid interference between workloads that share the PM.
Problem: because of how RDMA NICs are implemented, the RNIC has no remote persistent flush primitive. One-sided writes from clients therefore land first in the RNIC's volatile cache, and the RNIC sends the ACK back before the data has been written to PM. As a result, a power loss can easily break remote data persistence.
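The following toy model in Python makes the failure window concrete. `ToyRNIC` and its methods are hypothetical and do not correspond to any verbs API. The model also assumes the commonly discussed read-after-write workaround, in which a remote read issued after a write forces the earlier buffered writes out of the NIC. On real hardware, whether those writes actually reach PM depends on the platform (for example, DDIO settings), so here the workaround is simply modeled as draining the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRNIC:
    """Hypothetical toy model: one-sided writes are buffered in a volatile
    NIC cache and ACKed immediately; only drained data survives power loss."""
    pm: dict = field(default_factory=dict)     # persistent memory contents
    cache: dict = field(default_factory=dict)  # volatile RNIC buffer

    def rdma_write(self, addr, value):
        self.cache[addr] = value
        return "ACK"  # the ACK is sent before the data reaches PM

    def rdma_read(self, addr):
        # Assumption of this model: a read after a write drains buffered
        # writes to PM first (the "read-after-write flush" idea).
        self.drain()
        return self.pm.get(addr)

    def drain(self):
        self.pm.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()  # volatile buffer contents are lost


nic = ToyRNIC()
nic.rdma_write(0x10, b"v1")
nic.power_loss()
print(nic.pm.get(0x10))  # None: ACKed but never persisted

nic.rdma_write(0x10, b"v2")
nic.rdma_read(0x10)      # emulated flush before relying on the ACK
nic.power_loss()
print(nic.pm.get(0x10))  # b'v2'
```

The point is that the ACK alone says nothing about durability. A client that needs persistence has to add an extra step (here, the emulated read) and pay an extra round trip.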
LogECMem uses a hybrid of in-place updates and parity logging (PL) for parity updates.
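Below is a minimal sketch of the general mechanics of combining in-place updates with parity logging. It assumes a single XOR parity chunk, and `HybridStripe` is a hypothetical name. It is not LogECMem's exact scheme: which parities LogECMem updates in place and which it logs is not modeled. Here, data chunks are overwritten in place, each parity delta (old XOR new) is appended to a log, and the log is merged into the parity lazily, for example before recovery.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class HybridStripe:
    """Hypothetical sketch: in-place data updates plus a parity-delta log
    for one XOR parity chunk."""

    def __init__(self, chunks):
        self.data = list(chunks)
        self.parity = bytes(len(chunks[0]))
        for c in self.data:
            self.parity = xor_bytes(self.parity, c)
        self.parity_log = []  # pending parity deltas, not yet applied

    def update(self, idx, new):
        delta = xor_bytes(self.data[idx], new)  # old XOR new
        self.data[idx] = new                    # in-place data update
        self.parity_log.append(delta)           # parity logging

    def merge_log(self):
        # Lazily fold the logged deltas into the parity chunk.
        for delta in self.parity_log:
            self.parity = xor_bytes(self.parity, delta)
        self.parity_log.clear()

    def recover(self, lost):
        # Parity must be up to date before it is used for decoding.
        self.merge_log()
        out = self.parity
        for i, c in enumerate(self.data):
            if i != lost:
                out = xor_bytes(out, c)
        return out


s = HybridStripe([b"\x01\x02", b"\x03\x04", b"\x05\x06"])
s.update(1, b"\xff\x00")
print(len(s.parity_log))  # 1
print(s.recover(1))       # b'\xff\x00'
```

The design trade-off is that updates avoid a read-modify-write of the parity chunk on the critical path, at the cost of merging the log before the parity can be trusted.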
Learned index + PM. APEX: A High-Performance Learned Index on Persistent Memory [1].
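The core learned-index idea can be sketched in a few lines. A model (here a least-squares line) predicts a key's position in a sorted array, the maximum prediction error is recorded, and the lookup then searches only inside that error window. `LinearLearnedIndex` is a hypothetical name. This shows only the generic technique, not APEX's PM-specific design (node layout, concurrency, crash recovery).

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical sketch: linear model key -> position plus an
    error-bounded "last-mile" binary search. Assumes a non-empty key list."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the indexed keys.
        self.err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        n = len(self.keys)
        pred = self._predict(key)
        # Clamp the window so that 0 <= lo <= hi <= n.
        lo = min(max(0, pred - self.err), n)
        hi = max(lo, min(n, pred + self.err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


idx = LinearLearnedIndex([3, 7, 12, 20, 31, 45, 60, 78, 99])
print(idx.lookup(45))  # 5
print(idx.lookup(46))  # None
```

On PM, the appeal of this approach is that the search window is small and predictable, which limits the number of PM reads per lookup.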
Some industry work on using a DRAM+PM architecture as a cache (from Facebook and Twitter).
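The basic DRAM+PM cache structure can be illustrated with a generic two-tier LRU: a small DRAM tier sits in front of a larger PM tier, items evicted from DRAM are demoted to PM, and PM hits are promoted back to DRAM. This is only an illustration of the layering. It is not the design used by Facebook or Twitter, and `TwoTierCache` is a hypothetical name.

```python
from collections import OrderedDict


class TwoTierCache:
    """Hypothetical sketch: DRAM LRU tier in front of a PM LRU tier."""

    def __init__(self, dram_cap, pm_cap):
        self.dram = OrderedDict()  # hot tier (small, fast)
        self.pm = OrderedDict()    # warm tier (larger, slower)
        self.dram_cap, self.pm_cap = dram_cap, pm_cap

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.pm:
            value = self.pm.pop(key)
            self._put_dram(key, value)  # promote a PM hit to DRAM
            return value
        return None

    def put(self, key, value):
        self.pm.pop(key, None)  # keep at most one copy across tiers
        self._put_dram(key, value)

    def _put_dram(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        if len(self.dram) > self.dram_cap:
            old_k, old_v = self.dram.popitem(last=False)
            self.pm[old_k] = old_v  # demote the DRAM LRU victim to PM
            self.pm.move_to_end(old_k)
            if len(self.pm) > self.pm_cap:
                self.pm.popitem(last=False)  # evict from the cache entirely


c = TwoTierCache(dram_cap=2, pm_cap=2)
for k in "abc":
    c.put(k, k.upper())
print(list(c.dram), list(c.pm))  # ['b', 'c'] ['a']
print(c.get("a"))                # A
print(list(c.dram), list(c.pm))  # ['c', 'a'] ['b']
```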
RDMA+KVS. Unlike local hashing, insertion, deletion, and update are expensive in RDMA environments, because every remote access costs at least one network round trip. A carefully designed index based on one-sided RDMA operations is therefore crucial.
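To show why round trips dominate, the sketch below simulates a bucketized two-choice hash table that clients access only through emulated one-sided operations. One bucket READ or one slot CAS counts as one round trip. `RemoteMemory`, `Client`, and the layout are hypothetical, and the code deliberately simplifies things. In particular, a real RDMA CAS works on 8 bytes, so real designs typically CAS a small slot that points to an out-of-place KV block rather than the whole pair.

```python
import hashlib

SLOTS_PER_BUCKET = 4


class RemoteMemory:
    """Stands in for server memory reachable only via one-sided ops."""

    def __init__(self, n_buckets):
        self.buckets = [[None] * SLOTS_PER_BUCKET for _ in range(n_buckets)]
        self.round_trips = 0

    def read_bucket(self, b):  # models one RDMA READ of a whole bucket
        self.round_trips += 1
        return list(self.buckets[b])

    def cas_slot(self, b, s, expected, new):  # models one RDMA CAS
        self.round_trips += 1
        if self.buckets[b][s] == expected:
            self.buckets[b][s] = new
            return True
        return False


class Client:
    def __init__(self, mem, n_buckets):
        self.mem, self.n = mem, n_buckets

    def _candidates(self, key: bytes):
        # Two independent hash choices per key.
        return [int.from_bytes(hashlib.sha256(bytes([seed]) + key).digest()[:8],
                               "little") % self.n for seed in (0, 1)]

    def get(self, key):
        for b in self._candidates(key):
            for slot in self.mem.read_bucket(b):
                if slot is not None and slot[0] == key:
                    return slot[1]
        return None

    def insert(self, key, value):
        for b in self._candidates(key):
            for s, slot in enumerate(self.mem.read_bucket(b)):
                # The CAS fails if another client claimed the slot first.
                if slot is None and self.mem.cas_slot(b, s, None, (key, value)):
                    return True
        return False


mem = RemoteMemory(8)
cli = Client(mem, 8)
cli.insert(b"k1", b"v1")    # 1 READ + 1 CAS
mem.round_trips = 0
print(cli.get(b"k1"), mem.round_trips)  # b'v1' 1
```

Reading a whole bucket per round trip, rather than chasing pointers slot by slot, is the kind of layout decision that keeps the operation count low. A real client could also issue both candidate READs in parallel.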
An experiment-driven work from SJTU-IPADS shows some methods to achieve better performance in NVM+RDMA systems.
Some eADR-related works.
StripeMerge shows a method to merge narrow stripes into wide stripes.
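The simplest case of stripe merging can be shown with a single XOR parity. When two narrow (k, 1) stripes are merged into one (2k, 1) wide stripe, the new parity is just the XOR of the two old parities, so no data chunk has to be re-read. This sketch covers only that XOR case. StripeMerge itself targets general erasure codes and chooses which stripes to merge, and none of that is modeled here. `merge_xor_stripes` is a hypothetical helper.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def xor_parity(chunks):
    return reduce(xor_bytes, chunks)


def merge_xor_stripes(p1: bytes, p2: bytes) -> bytes:
    # Wide-stripe parity computed from the two narrow parities only.
    return xor_bytes(p1, p2)


# Two narrow (k=2, m=1) stripes.
s1 = [b"\x01\x10", b"\x02\x20"]
s2 = [b"\x04\x40", b"\x08\x80"]
p1, p2 = xor_parity(s1), xor_parity(s2)

# One (k=4, m=1) wide stripe, with no data chunk re-read.
wide_parity = merge_xor_stripes(p1, p2)
assert wide_parity == xor_parity(s1 + s2)
print(wide_parity.hex())  # 0ff0
```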