Powered by GitBook

MapReduce

Class notes

Developer's To-Do-list:
- input/output files
- M Map tasks
- R reduce tasks
- W machines

No reduce can begin until map is complete
Tasks scheduled based on location of data
if map worker fails any time, task must be restart.

Fault Tolerance

workers are periodically pinged by master
master writes periodic checkpoints
on errors, workers send 'last gasp' UDP packet to master
- Detect records that cause deterministic crashes and skips them.
input file blocks stored on multiple machines
When computation almost done, reschedule in-progress tasks
- Avoids "stragglers"

Inside MapRedure

Questions

How does MapReduce handle the failure of a worker?

Detect failure via heartbeats
Re-execute completed and in-progress map tasks
Re-execute in-progress reduce tasks
Task completion committed through master

results matching ""

No results matching ""