Google File System
Reading notes
Features
- A scalable distributed file system.
- Component failures are the norm rather than the exception.
- Files are huge by traditional standards; multi-GB files are common.
- Most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent.
- Co-designing the applications and the file system API benefits the overall system by increasing flexibility.
Architecture
- Files are divided into fixed-size chunks.
- The master maintains all file system metadata.
- Neither the client nor the chunkserver caches file data.
- Clients do cache metadata.
- Clients never read or write file data through the master. A client asks the master which chunkservers to contact, then talks to those chunkservers directly.
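The read path implied by the points above can be sketched as follows. This is a minimal illustration, not the GFS implementation: the class and method names (`Master.lookup`, `Chunkserver.read`, `Client.read`) and the `lookups` counter are hypothetical, and the sketch assumes a read never crosses a chunk boundary. The 64 MB chunk size is the value given in the GFS paper.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used in the GFS paper


@dataclass
class Master:
    """Metadata only: (filename, chunk index) -> (chunk handle, replica addresses)."""
    chunk_table: dict = field(default_factory=dict)
    lookups: int = 0  # hypothetical counter, only here to show the cache working

    def lookup(self, filename, chunk_index):
        self.lookups += 1
        return self.chunk_table[(filename, chunk_index)]


@dataclass
class Chunkserver:
    """Stores chunk contents keyed by chunk handle."""
    chunks: dict = field(default_factory=dict)

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]


@dataclass
class Client:
    master: Master
    chunkservers: dict  # address -> Chunkserver
    metadata_cache: dict = field(default_factory=dict)  # metadata is cached, file data is not

    def read(self, filename, offset, length):
        # Fixed-size chunks turn a byte offset into (chunk index, offset in chunk).
        chunk_index, chunk_offset = divmod(offset, CHUNK_SIZE)
        key = (filename, chunk_index)
        # Contact the master only when the chunk location is not cached yet.
        if key not in self.metadata_cache:
            self.metadata_cache[key] = self.master.lookup(filename, chunk_index)
        handle, replicas = self.metadata_cache[key]
        # File data comes straight from a chunkserver, never through the master.
        return self.chunkservers[replicas[0]].read(handle, chunk_offset, length)
```

Two reads from the same chunk cost a single master lookup, because the second read finds the chunk handle and replica list in the client's metadata cache.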
Questions
How does it tolerate the failure of a chunkserver?
- The master notices missing heartbeats. Requests are served from the other replicas.
- The master decrements the replica count for every chunk on the dead chunkserver.
- The master re-replicates chunks that are missing replicas.
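The three steps above can be sketched as a small piece of master-side bookkeeping. This is a rough sketch under stated assumptions, not how GFS is actually coded: the class `MasterFailureHandler`, its methods, and the 60-second `HEARTBEAT_TIMEOUT` are hypothetical. The replication goal of 3 is the default mentioned in the GFS paper. Replica placement here simply picks the first live server that lacks the chunk, which is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 60.0  # seconds; hypothetical value chosen for illustration
REPLICATION_GOAL = 3      # default replication level in the GFS paper


@dataclass
class MasterFailureHandler:
    last_heartbeat: dict = field(default_factory=dict)  # chunkserver -> last timestamp seen
    replicas: dict = field(default_factory=dict)        # chunk handle -> set of chunkservers

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def detect_failures(self, now):
        """Drop servers whose heartbeat is overdue and forget their replicas."""
        dead = [s for s, t in self.last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]
        for server in dead:
            del self.last_heartbeat[server]
            # Removing the server from each set lowers that chunk's replica count.
            for holders in self.replicas.values():
                holders.discard(server)
        return dead

    def re_replicate(self):
        """Plan copies (handle, source, target) until each chunk meets the goal."""
        copies = []
        for handle, holders in self.replicas.items():
            # Live servers that do not hold this chunk yet (simplified placement).
            candidates = sorted(set(self.last_heartbeat) - holders)
            while len(holders) < REPLICATION_GOAL and holders and candidates:
                source = min(holders)       # any surviving replica can be the source
                target = candidates.pop(0)
                holders.add(target)
                copies.append((handle, source, target))
        return copies
```

While re-replication is pending, reads keep working because the surviving entries in `replicas` still point at live chunkservers. A chunk with no surviving replica is left alone here, since there is nothing to copy from.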