Just a TLB for OSDI '20. Not every paper from the conference is covered; some sessions appear only as titles.
The title links (Part N) are from the SJTU IPADS notes.

Part 1 link:

Session-1: Correctness
Session-2: Storage
- Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache
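  Brings a learned index to a conventional distributed RDMA-based KVS. Even a small cache on the client works well. open source
- CrossFS: A Cross-layered Direct-Access File System
  Builds a file system that spans multiple layers (user, kernel, firmware). Uses locking at per-inode interval-tree granularity plus FD queues to turn file-system concurrency optimization into an I/O-queue ordering problem. open source, more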
- From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees
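  Since sstables are mostly immutable, a learned index is built on top of them. Uses sstable lifetime and access patterns to reduce the training cost of the model (greedy piecewise linear regression). Prioritizes learning files at lower levels and files that have already lived for a while, and avoids learning short-lived files that will soon be compacted away. more, open source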
- LinnOS: Predictability on Unpredictable Flash Storage with a Light Neural Network
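  Uses a NN to predict the real-time performance of SSDs, which is affected by the FTL. A plain regression model works poorly, so a threshold is picked from the CDF of SSD latency to turn it into a classification problem. For features, the related I/O addresses are the most accurate but very slow. The features finally chosen: I/O pending queue size (current and past); latency of the most recent R I/Os. Accuracy 87-97%, extra overhead 4-6 μs. Also, because the penalties for FP and FN differ, biased training is used.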
- A large scale analysis of hundreds of in-memory cache clusters at Twitter
  A large-scale characterization of Twitter's production KVS workloads. >35% of clusters have >30% write requests. Short TTLs. Zipfian. Small KV sizes. Large key sizes. open source
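
For the Bourbon entry above, here is a minimal sketch of what a greedy piecewise linear regression over sorted keys can look like. It is a simplified "shrinking cone" variant written for illustration, not necessarily the exact algorithm from the paper; the names `greedy_plr`, `predict`, and `delta` (the maximum allowed position error) are assumptions of this sketch. Keys are assumed to be non-empty and strictly increasing, and a key's position is simply its index.

```python
from bisect import bisect_right


def _pick_slope(lo, hi):
    # A segment holding a single point has an unconstrained cone; use slope 0.
    if lo == float("-inf"):
        return 0.0
    return (lo + hi) / 2.0


def greedy_plr(keys, delta):
    """Greedily cut strictly increasing `keys` into linear segments.

    Each segment is (start_key, slope, start_pos). Every key's predicted
    index is within `delta` of its true index.
    """
    segments = []
    x0, y0 = keys[0], 0                      # origin of the current segment
    lo, hi = float("-inf"), float("inf")     # feasible slope range (the cone)
    for y in range(1, len(keys)):
        dx = keys[y] - x0
        # Slopes that keep this point within +/- delta of the line.
        new_lo = max(lo, (y - delta - y0) / dx)
        new_hi = min(hi, (y + delta - y0) / dx)
        if new_lo > new_hi:
            # Cone became empty: close the segment and start a new one here.
            segments.append((x0, _pick_slope(lo, hi), y0))
            x0, y0 = keys[y], y
            lo, hi = float("-inf"), float("inf")
        else:
            lo, hi = new_lo, new_hi
    segments.append((x0, _pick_slope(lo, hi), y0))
    return segments


def predict(segments, key):
    """Predict the position of `key` using the segment that covers it."""
    starts = [s[0] for s in segments]
    i = max(bisect_right(starts, key) - 1, 0)
    x0, slope, y0 = segments[i]
    return y0 + slope * (key - x0)


keys = [1, 2, 3, 4, 10, 20, 30, 31, 32, 33, 100, 200, 300]
segments = greedy_plr(keys, delta=1)
```

A lookup would then call `predict(segments, key)` and search only the positions within `delta` of the result. Training is a single pass over the keys, which fits the note above about keeping training cost low for files that may not live long.
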
Session-3: OS & Networking
- Semeru: A Memory-Disaggregated Managed Runtime
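  Builds a Java-based distributed GC on DSM. The DN can finish GC faster. 1. By traversal, the DN obtains the closure of live objects. 2. The CN and MN pause and do GC together: reclaim memory, migrate live data, and reclaim the cache. open source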
- Caladan: Mitigating Interference at Microsecond Timescales
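  Addresses the tail latency caused when latency-sensitive tasks on a single machine are interfered with by other tasks (e.g. Memcached & GC). Caladan uses a central scheduler that receives information sent by applications through the runtime, hardware information, etc., to detect interference (3 cases: memory bandwidth, hyperthreading, and LLC interference).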
- AIFM: High-Performance, Application-Integrated Far Memory
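  In far-memory settings, not hand-modifying code with RDMA semantics costs performance. AIFM gives up transparency (code must explicitly distinguish near from far memory and maintain some flags) but does not require heavy code changes (such as accessing memory through RDMA verbs), and avoids the problems earlier approaches had with paging, cache pollution, blocking, etc. open source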
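
Step 1 of the Semeru entry above (traversing to get the closure of live objects) is ordinary graph reachability. A minimal sketch, assuming the heap is modeled as a hypothetical dict from object id to the ids it references; `live_closure`, `roots`, and `refs` are names made up for this example and say nothing about how Semeru represents its heap:

```python
from collections import deque


def live_closure(roots, refs):
    """Return the set of object ids reachable from `roots`.

    `refs` maps an object id to the list of object ids it references.
    """
    live = set(roots)
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for child in refs.get(obj, ()):
            if child not in live:       # visit each object once
                live.add(child)
                queue.append(child)
    return live


# Toy heap: "d" points into the live set but is itself unreachable,
# and "e" only references itself.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": ["e"]}
live = live_closure(["a"], heap)
garbage = set(heap) - live
```

Everything outside the closure (`garbage` here) is what step 2 would reclaim.
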
Session-4: Consistency

Part 2 link

Session-5: Machine Learning
Session-6: Consensus
Session-7: Bugs
Session-8: Scheduling

Part 3 link

Session-9: Machine Learning 2
Session-10: Hardware
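- Assise: Performance and Availability via Client-local NVM in a Distributed File System
  Keeps data and part of the metadata local whenever possible, reducing the dominant network overhead and speeding up recovery. open source, more
- Persistent State Machines for Recoverable In-memory Storage Systems with NVRam
  Abstracts the semantics of a storage system as a state machine and, through state synchronization, maintains a replica of the storage system on local PM, making it recoverable while keeping development overhead as low as possible. more
- AGAMOTTO: How Persistent is Your Persistent Memory Application?
  A bug-finding work: detects two bug patterns in code, Missing Flush/Fence and Extra Flush/Fence, which cause correctness and performance problems respectively.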
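
To make the two bug patterns from the AGAMOTTO entry concrete, here is a toy checker over a hypothetical operation trace. This is not how the tool itself works; it only illustrates the patterns under a simplified model assumed for this sketch: a store becomes durable once its address is flushed and a fence follows, a flush of an address with no pending store is "extra", and a fence with nothing flushed before it is "extra". The trace format and the name `check_trace` are made up for the example.

```python
def check_trace(trace):
    """Classify persistence bugs in a trace of store/flush/fence operations.

    Returns (missing, extra):
      missing: sorted (addr, "flush" | "fence") pairs never made durable
      extra:   (index, op, addr) entries for redundant flushes and fences
    """
    dirty = set()      # stored, not yet flushed
    flushed = set()    # flushed, not yet fenced
    extra = []
    for i, op in enumerate(trace):
        kind = op[0]
        if kind == "store":
            dirty.add(op[1])
            flushed.discard(op[1])   # a new store invalidates an earlier flush
        elif kind == "flush":
            if op[1] in dirty:
                dirty.remove(op[1])
                flushed.add(op[1])
            else:
                extra.append((i, "flush", op[1]))   # nothing to write back
        elif kind == "fence":
            if not flushed:
                extra.append((i, "fence", None))    # nothing to order
            flushed.clear()                          # flushed data is now durable
    missing = sorted([(a, "flush") for a in dirty] +
                     [(a, "fence") for a in flushed])
    return missing, extra


trace = [
    ("store", "a"), ("flush", "a"), ("fence",),   # correct
    ("store", "b"),                               # never flushed
    ("flush", "a"), ("fence",),                   # both redundant
    ("store", "c"), ("flush", "c"),               # flushed but never fenced
]
missing, extra = check_trace(trace)
```

In this model the `missing` entries are the correctness bugs (data may be lost on a crash) and the `extra` entries are the performance bugs.
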
Session-11: Security
Session-12: Clusters
- Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions
  Uses static rules and ML, together with A/B tests, to analyze large-scale distributed VM logs and predict system stability.
