Log-Structured File System
Overview
- disk storage management: a log-structured file system
- The log is the only structure on disk, it contains indexing information so that files can be read back from the log efficiently.
- Based on this assumption: files are cached in main memory and that increasing memory sizes will make the caches more and more effective at satisfying read requests. As a result, disk traffic will become dominated by writes.
- This approach (log) increases write performance dramatically by eliminating almost all seeks.
- The sequential nature of the log also permits much faster crash recovery
Problems of existing FS
- spread information around the disk in a way that causes too many samll acesses.
- write synchronously (the app must wait)
Problems of FFS
- Layout
- possibly related files are physically separated
- inodes are separate from files
- directory entries are separate from inodes
- low disk b/w utilization
- sync
- Unix FFS write file data blocks asynchronously, but metadata synchronously.
LFS
At high-level, how does it works
- treat disk as a single log
- buffer small writes into very large writes to max the usage of disk b/w
2 key challenge
- how to find data in the log
- reading data, invalidating deleted or overwritten data
- how to manage free space on disk
- will run out of disk
- need to recover deleted blocks in old log
- how to find data in the log
Location
How to solve that inodes are hard to be found?
- level of indirection (add overhead)
- use index tables (inode maps)
- inode maps map file # into the inode location
- they are also written into log
- pointers in all inode maps blocks are kept in checkpoint region
- checkpoint region has fixed location, so it's easy to find
level of indirection adds overhead
- solution: keep inode map blocks in the memory (they are small)
Free space management
- Threading or Copying? -> combine
Segment Cleaning
- fragment log into segments
- thread segment within disk
- copying within segment
To identify which blocks of each segment are live, a segment summary block is added as part of each segment. The segment summary block contains the file number and block number for the block.
- No free-block list or bitmap is needed in Sprite LFS.
Clean plicy
- cost-benefit
LFS maintains a data structure called the segment usage table. For each segment, the table records the number of live bytes in the segment and the most recent modified time of any block in the segment.
Which one to clean?
- clean with age of segment
Crash Recovery
- checkpoint (every 30s)
- roll-forward
Differences with flash
- no need for sequential writes