Haystack

Notes

Overview

Network File System (NFS) is a distributed file system protocol, allowing a user on a client computer to access files over a computer network much like local storage is accessed.

Characteristics of Facebook users

  • upload a large number of photos each week
  • visit the site often

Long Tail Issue

Goals

  • high throughput and low latency
  • fault-tolerant
  • cost-effective
  • simplicity

Features of the NFS-based approach

  • enormous amount of metadata (namespace directories and file inodes)
  • the amount of metadata far exceeds the caching abilities of the NFS storage tier, resulting in multiple I/O operations per photo upload or read request
  • high degree of reliance on CDNs = expensive

Haystack

  • reduce I/O

Step

Haystack Directory Main functions

  1. It provides a mapping from logical volumes to physical volumes. Web servers use this mapping when uploading photos and also when constructing the image URLs for a page request.
  2. The Directory load balances writes across logical volumes and reads across physical volumes.
  3. The Directory determines whether a photo request should be handled by the CDN or by the Cache. This functionality lets us adjust our dependence on CDNs.
  4. The Directory identifies those logical volumes that are read-only either because of operational reasons or because those volumes have reached their storage capacity. We mark volumes as read-only at the granularity of machines for operational ease.
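
A minimal sketch of functions 1, 2 and 4 above, to make the mapping and the load balancing concrete. All names here (`LogicalVolume`, `Directory`, the "machine:path" labels) are hypothetical and not taken from Haystack itself. Random choice stands in for whatever balancing policy the real Directory uses, and read-only is tracked per volume for simplicity, although the notes say it is marked per machine.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LogicalVolume:
    """A logical volume and the physical volumes (replicas) that back it."""
    volume_id: int
    physical_volumes: list[str]   # hypothetical "machine:path" labels
    read_only: bool = False


@dataclass
class Directory:
    """Toy Directory: logical-to-physical mapping plus simple balancing."""
    volumes: dict[int, LogicalVolume] = field(default_factory=dict)

    def add_volume(self, volume: LogicalVolume) -> None:
        self.volumes[volume.volume_id] = volume

    def mark_read_only(self, volume_id: int) -> None:
        # Simplification: the notes describe marking whole machines read-only.
        self.volumes[volume_id].read_only = True

    def pick_volume_for_write(self, rng: random.Random) -> LogicalVolume:
        """Balance writes across logical volumes that still accept writes."""
        writable = [v for v in self.volumes.values() if not v.read_only]
        if not writable:
            raise RuntimeError("no write-enabled logical volume available")
        return rng.choice(writable)

    def pick_replica_for_read(self, volume_id: int, rng: random.Random) -> str:
        """Balance reads across the physical volumes of one logical volume."""
        return rng.choice(self.volumes[volume_id].physical_volumes)
```

A web server would call `pick_volume_for_write` when uploading a photo and `pick_replica_for_read` when building the image URL. Read-only volumes stay readable but are never chosen for writes.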

Haystack Cache

  • organized as a distributed hash table; uses a photo's id as the key to locate cached data
  • receives HTTP requests for photos from CDNs and also directly from users’ browsers.
  • Add a photo to Cache if two conditions are met:
    • The request comes directly from a user and not the CDN and
    • The photo is fetched from a write-enabled Store machine (which shows that this photo was uploaded recently).
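
The two admission conditions above can be sketched as a small predicate. This is only an illustration: `should_cache`, `PhotoCache` and the parameter names are hypothetical, and a plain dict stands in for the distributed hash table.

```python
from typing import Optional


def should_cache(from_cdn: bool, store_is_write_enabled: bool) -> bool:
    """Admit a photo only for direct user requests served by a write-enabled Store."""
    return (not from_cdn) and store_is_write_enabled


class PhotoCache:
    """Toy single-node cache keyed by photo id."""

    def __init__(self) -> None:
        self._data: dict[int, bytes] = {}

    def get(self, photo_id: int) -> Optional[bytes]:
        return self._data.get(photo_id)

    def on_miss(self, photo_id: int, photo: bytes,
                from_cdn: bool, store_is_write_enabled: bool) -> bytes:
        """Called after a miss, once the photo has been fetched from the Store."""
        if should_cache(from_cdn, store_is_write_enabled):
            self._data[photo_id] = photo
        return photo
```

A request that arrives through the CDN is not admitted because the CDN will hold its own copy, and a photo on a read-only Store machine is not admitted because it is no longer recent.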

needle

  • A Store machine represents a physical volume as a large file consisting of a superblock followed by a sequence of needles.
  • Each needle represents a photo stored in Haystack.
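
A rough sketch of the "superblock followed by a sequence of needles" layout. The on-disk format here is a deliberate simplification and an assumption: the superblock is a fixed 8-byte marker and each needle is just a key, a size and the photo bytes. An in-memory `io.BytesIO` stands in for the large file on disk.

```python
import io
import struct

SUPERBLOCK = b"HAYSTACK"               # hypothetical 8-byte superblock
NEEDLE_HEADER = struct.Struct("<QI")   # assumed header: key (uint64), size (uint32)


def create_volume() -> io.BytesIO:
    """Create an empty volume that contains only the superblock."""
    volume = io.BytesIO()
    volume.write(SUPERBLOCK)
    return volume


def append_needle(volume: io.BytesIO, key: int, data: bytes) -> int:
    """Append one needle at the end of the volume and return its offset."""
    offset = volume.seek(0, io.SEEK_END)
    volume.write(NEEDLE_HEADER.pack(key, len(data)))
    volume.write(data)
    return offset


def scan_needles(volume: io.BytesIO):
    """Yield (offset, key, data) for every needle after the superblock."""
    end = volume.seek(0, io.SEEK_END)
    offset = len(SUPERBLOCK)
    while offset < end:
        volume.seek(offset)
        key, size = NEEDLE_HEADER.unpack(volume.read(NEEDLE_HEADER.size))
        data = volume.read(size)
        yield offset, key, data
        offset += NEEDLE_HEADER.size + size
```

Because needles are only ever appended, many photos share one file and the filesystem tracks a single file's metadata instead of one inode per photo.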

Questions

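Why put all metadata in memory?

Lookups are much faster: with the metadata in memory, a Store machine can find a photo's location without any disk operation, so the disk is only touched to read the photo data itself.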
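
To illustrate the answer, here is a toy Store volume that keeps a photo id to (offset, size) index in memory. `StoreVolume` and its methods are hypothetical names, and `io.BytesIO` stands in for the volume file on disk.

```python
import io
from typing import Optional


class StoreVolume:
    """Toy volume: data lives in a file, metadata lives in a dict in memory."""

    def __init__(self) -> None:
        self._file = io.BytesIO()                      # stand-in for the on-disk file
        self._index: dict[int, tuple[int, int]] = {}   # photo id -> (offset, size)

    def write(self, photo_id: int, data: bytes) -> None:
        """Append the photo bytes and record where they landed."""
        offset = self._file.seek(0, io.SEEK_END)
        self._file.write(data)
        self._index[photo_id] = (offset, len(data))

    def read(self, photo_id: int) -> Optional[bytes]:
        """Look up the location in memory, then do one seek and one read."""
        entry = self._index.get(photo_id)   # memory only, no file access
        if entry is None:
            return None
        offset, size = entry
        self._file.seek(offset)
        return self._file.read(size)
```

The lookup in `read` never touches the file, so serving a photo costs one read of the data itself rather than extra I/O operations to locate it first.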
