ACM SIGOPS Hall of Fame Series
The Structure of the “THE”-Multiprogramming System
The Working Set Model for Program Behavior
Hints for Computer System Design
Safe Kernel Extensions Without Run-Time Checking
Time, Clocks, and the Ordering of Events in a Distributed System
Implementing Remote Procedure Calls
End-To-End Arguments in System Design
Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial
Grapevine: An Exercise in Distributed Computing
Scale and performance in a distributed file system
Disco: Running Commodity Operating Systems on Scalable Multiprocessors
The Multics Virtual Memory: Concepts and Design
Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency
Experience with processes and monitors in Mesa
VAXclusters: A Closely-Coupled Distributed System
Using Encryption for Authentication in Large Networks of Computers
Crash Recovery in a Distributed Data Storage System
The Recovery Manager of the System R Database Manager
Why Do Computers Stop and What Can Be Done About It?
Programming Semantics for Multiprogrammed Computations
A Case for Redundant Arrays of Inexpensive Disks (RAID)
Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems
Memory Coherence in Shared Virtual Memory Systems
The Design and Implementation of a Log-Structured File System
Tenex, A Paged Time Sharing System for the PDP-10
Distributed Snapshots: Determining Global States of a Distributed System
Exploiting Virtual Synchrony in Distributed Systems
A virtual machine time-sharing system
On the criteria to be used in decomposing systems into modules
On optimistic methods for concurrency control
Disconnected operation in the Coda File System
Transactional memory: architectural support for lock-free data structures
Efficient software-based fault isolation
ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay
MapReduce: Simplified Data Processing on Large Clusters
Bigtable: A Distributed Storage System for Structured Data
The Chubby lock service for loosely-coupled distributed systems
Dynamo: Amazon’s Highly Available Key-value Store
Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Memory resource management in VMware ESX server
Resource Scheduling
Large-scale cluster management at Google with Borg
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Distributed Systems
Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems
Gossip (Gossip and Epidemic Protocols, Gossip Survey)
The implementation of reliable distributed multiprocess systems
The Byzantine Generals Problem
Impossibility of Distributed Consensus with One Faulty Process
Paxos vs. Viewstamped Replication vs. Zab
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
Big Data Series
Fundamentals
An Efficient Design and Implementation of LSM-Tree based Key-Value Store on Open-Channel SSD
File Storage
The Hadoop Distributed File System
Ceph: A Scalable, High-Performance Distributed File System
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks
Column-Stores vs. Row-Stores: How Different Are They Really?
RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
Data Storage
Dynamo: Amazon’s Highly Available Key-value Store
Cassandra - A Decentralized Structured Storage System
Serving Large-scale Batch Computed Data with Project Voldemort
Bigtable: A Distributed Storage System for Structured Data
Spanner: Google’s Globally-Distributed Database
Hypertable Architecture Overview
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
Data Processing
MapReduce: Simplified Data Processing on Large Clusters
Dremel: Interactive Analysis of Web-Scale Datasets
Pregel: A System for Large-Scale Graph Processing
Large-scale Incremental Processing Using Distributed Transactions and Notifications