Haystack
Notes
Overview
Network File System (NFS) is a distributed file system protocol, allowing a user on a client computer to access files over a computer network much like local storage is accessed.
Features of FB users
- upload a large volume of photos each week
- visit often
Long Tail Issue
Goals
- high throughput and low latency
- fault-tolerant
- cost-effective
- simplicity
Features of the NFS-based design
- enormous amount of metadata (namespace directories and file inodes)
- the amount of metadata far exceeds the caching abilities of the NFS storage tier, resulting in multiple I/O operations per photo upload or read request
- high degree of reliance on CDNs = expensive
Haystack
- reduce I/O
Steps
- the web server receives the request
- it uses the Haystack Directory to construct the photo URL
- http://⟨CDN⟩/⟨Cache⟩/⟨Machine id⟩/⟨Logical volume, Photo⟩
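The URL shape above can be sketched in a few lines. This is only an illustration: the `PhotoLocation` type, the function names, and the host names in the usage note are hypothetical and not taken from Haystack itself; the only thing borrowed from the notes is the order of the components.

```python
from typing import NamedTuple


class PhotoLocation(NamedTuple):
    """Hypothetical container for the components of a photo URL."""
    cdn: str
    cache: str
    machine_id: str
    logical_volume: int
    photo_id: int


def build_photo_url(loc: PhotoLocation) -> str:
    # Follows the http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo> shape.
    return (f"http://{loc.cdn}/{loc.cache}/{loc.machine_id}/"
            f"{loc.logical_volume},{loc.photo_id}")


def parse_photo_url(url: str) -> PhotoLocation:
    # Inverse of build_photo_url; rejects anything that does not match the shape.
    prefix = "http://"
    if not url.startswith(prefix):
        raise ValueError("unexpected scheme")
    cdn, cache, machine_id, tail = url[len(prefix):].split("/")
    volume, photo = tail.split(",")
    return PhotoLocation(cdn, cache, machine_id, int(volume), int(photo))
```

For example, `build_photo_url(PhotoLocation("cdn.example.com", "cache7", "store42", 1001, 555))` yields a URL that `parse_photo_url` turns back into the same tuple, so each hop can read off which component it should contact next.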
Haystack Directory Main functions
- it provides a mapping from logical volumes to physical volumes. Web servers use this mapping when uploading photos and also when constructing the image URLs for a page request.
- the Directory load balances writes across logical volumes and reads across physical volumes.
- the Directory determines whether a photo request should be handled by the CDN or by the Cache. This functionality lets us adjust our dependence on CDNs.
- the Directory identifies those logical volumes that are read-only either because of operational reasons or because those volumes have reached their storage capacity. We mark volumes as read-only at the granularity of machines for operational ease.
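A minimal sketch of the first, second, and fourth functions, under stated assumptions: the class and method names are hypothetical, load balancing is reduced to a uniform random choice, a write is assumed to go to every physical replica of the chosen logical volume, and marking a machine read-only is assumed to make every logical volume with a replica on it read-only. The real Directory is a replicated service, not an in-process object.

```python
import random


class Directory:
    """Toy stand-in for the Haystack Directory (names are hypothetical)."""

    def __init__(self, rng=None):
        self.physical = {}       # logical volume id -> list of Store machine ids
        self.read_only = set()   # logical volume ids that no longer accept writes
        self.rng = rng or random.Random()

    def add_volume(self, logical_id, machines):
        self.physical[logical_id] = list(machines)

    def mark_machine_read_only(self, machine):
        # Assumption: read-only is applied per machine, so every logical
        # volume with a replica on that machine stops taking writes.
        for logical_id, machines in self.physical.items():
            if machine in machines:
                self.read_only.add(logical_id)

    def pick_for_write(self):
        # Balance writes across write-enabled logical volumes.
        writable = [lv for lv in self.physical if lv not in self.read_only]
        if not writable:
            raise RuntimeError("no write-enabled logical volume")
        logical_id = self.rng.choice(writable)
        return logical_id, list(self.physical[logical_id])

    def pick_for_read(self, logical_id):
        # Balance reads across the physical replicas of one logical volume.
        return self.rng.choice(self.physical[logical_id])
```

A web server would call `pick_for_write()` on upload and `pick_for_read(logical_id)` when building the URL for a page request.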
Haystack Cache
- organized as a distributed hash table, using a photo's id as the key to locate cached data
- receives HTTP requests for photos from CDNs and also directly from users’ browsers.
- adds a photo to the Cache only if two conditions are met:
  - the request comes directly from a user and not from the CDN, and
  - the photo is fetched from a write-enabled Store machine (which indicates that the photo was uploaded recently)
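The admission rule can be written as a single predicate. The sketch below is an assumption-laden illustration: `PhotoCache`, `should_cache`, and the `fetch_from_store` callback are hypothetical names, and a plain dict stands in for the distributed hash table.

```python
def should_cache(from_cdn: bool, store_write_enabled: bool) -> bool:
    # Both conditions must hold: a direct user request AND a write-enabled Store.
    return (not from_cdn) and store_write_enabled


class PhotoCache:
    """Toy cache keyed by photo id; a dict stands in for the distributed hash table."""

    def __init__(self):
        self._data = {}

    def get(self, photo_id, from_cdn, fetch_from_store):
        if photo_id in self._data:
            return self._data[photo_id]
        # fetch_from_store is a hypothetical callback returning
        # (photo bytes, whether the Store machine is write-enabled).
        data, write_enabled = fetch_from_store(photo_id)
        if should_cache(from_cdn, write_enabled):
            self._data[photo_id] = data
        return data

    def __contains__(self, photo_id):
        return photo_id in self._data
```

A request arriving through the CDN is served but not retained, on the reasoning that the CDN will hold its own copy; a photo from a read-only Store machine is likewise served but not retained.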
Needle
- A Store machine represents a physical volume as a large file consisting of a superblock followed by a sequence of needles.
- Each needle represents a photo stored in Haystack.
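The "superblock followed by a sequence of needles" layout can be sketched as an append-only file. The needle format here is deliberately simplified and is an assumption: each needle is just a photo key, a data size, and the data, and the 8-byte `SUPERBLOCK` value is made up. An `io.BytesIO` buffer stands in for the on-disk volume file.

```python
import io
import struct

SUPERBLOCK = b"HAYSTACK"        # hypothetical 8-byte superblock
HEADER = struct.Struct("<QI")   # simplified needle header: key (8 bytes), size (4 bytes)


def new_volume():
    vol = io.BytesIO()
    vol.write(SUPERBLOCK)
    return vol


def append_needle(vol, key, data):
    """Append one needle at the end of the volume and return its offset."""
    vol.seek(0, io.SEEK_END)
    offset = vol.tell()
    vol.write(HEADER.pack(key, len(data)))
    vol.write(data)
    return offset


def read_needle(vol, offset):
    """Read the needle that starts at the given offset."""
    vol.seek(offset)
    key, size = HEADER.unpack(vol.read(HEADER.size))
    return key, vol.read(size)


def scan(vol):
    """Walk the needles after the superblock, yielding (key, offset, size)."""
    vol.seek(0, io.SEEK_END)
    end = vol.tell()
    offset = len(SUPERBLOCK)
    while offset < end:
        vol.seek(offset)
        key, size = HEADER.unpack(vol.read(HEADER.size))
        yield key, offset, size
        offset += HEADER.size + size
```

Because needles are only ever appended, a sequential `scan` is enough to rebuild a key-to-offset table for the whole volume.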
Questions
Why put all metadata in memory?
Lookups are much faster: finding where a photo lives needs no disk I/O.
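A small sketch of the idea, with hypothetical names: the index from photo id to (offset, size) lives in a dict, so locating a photo costs no I/O and fetching it costs one read. An `io.BytesIO` buffer stands in for the volume file on disk, and `disk_reads` is a counter added purely for illustration.

```python
import io


class IndexedVolume:
    """Toy volume with an in-memory index (all names are hypothetical)."""

    def __init__(self):
        self._file = io.BytesIO()   # stand-in for the on-disk volume file
        self._index = {}            # photo id -> (offset, size), held in memory
        self.disk_reads = 0         # counts simulated disk reads

    def put(self, photo_id, data):
        self._file.seek(0, io.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self._index[photo_id]   # memory lookup, no I/O
        self._file.seek(offset)
        self.disk_reads += 1                   # the only "disk" access per read
        return self._file.read(size)
```

Contrast this with the NFS-based design noted earlier, where metadata that does not fit in cache has to be read from disk before the photo itself can be read.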