MapReduce
Class notes
- Developer's To-Do-list:
- input/output files
- M Map tasks
- R reduce tasks
- W machines
- No reduce can begin until map is complete
- Tasks scheduled based on location of data
- if map worker fails any time, task must be restart.
Fault Tolerance
- workers are periodically pinged by master
- master writes periodic checkpoints
- on errors, workers send 'last gasp' UDP packet to master
- Detect records that cause deterministic crashes and skips them.
- input file blocks stored on multiple machines
- When computation almost done, reschedule in-progress tasks
Inside MapRedure
Questions
How does MapReduce handle the failure of a worker?
- Detect failure via heartbeats
- Re-execute completed and in-progress map tasks
- Re-execute in-progress reduce tasks
- Task completion committed through master