MapReduce

Class notes

  • Developer's To-Do-list:
    • input/output files
    • M Map tasks
    • R reduce tasks
    • W machines
  • No reduce can begin until map is complete
  • Tasks scheduled based on location of data
  • if map worker fails any time, task must be restart.

Fault Tolerance

  • workers are periodically pinged by master
  • master writes periodic checkpoints
  • on errors, workers send 'last gasp' UDP packet to master
    • Detect records that cause deterministic crashes and skips them.
  • input file blocks stored on multiple machines
  • When computation almost done, reschedule in-progress tasks
    • Avoids "stragglers"

Inside MapRedure

Questions

How does MapReduce handle the failure of a worker?

  • Detect failure via heartbeats
  • Re-execute completed and in-progress map tasks
  • Re-execute in-progress reduce tasks
  • Task completion committed through master

results matching ""

    No results matching ""