Hello all,

Occasionally when running jobs, Hadoop fails to clean up the "_temporary" directories it leaves behind. This only seems to happen when a task is killed (e.g., the losing attempt in speculative execution), and the data that task had output so far is not cleaned up. Is this a known issue in Hadoop? Is the data from that task guaranteed to be a duplicate of what was output by another attempt of the same task? Is it safe to just delete this directory without worrying about losing data?
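For reference, this is a minimal sketch of how I could clear it out by hand once a job has fully completed (the output path argument and the completion assumption are mine, not something the framework requires); it's effectively the same as running hadoop fs -rmr <output>/_temporary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TempDirCleanup {
        public static void main(String[] args) throws Exception {
            // Hypothetical: args[0] is the job's output directory.
            Path outputDir = new Path(args[0]);
            Path tempDir = new Path(outputDir, "_temporary");

            Configuration conf = new Configuration();
            FileSystem fs = outputDir.getFileSystem(conf);

            // Only touch _temporary after the job has finished, so no
            // running task attempt is still writing into it.
            if (fs.exists(tempDir)) {
                // Recursive delete of the leftover attempt output.
                boolean deleted = fs.delete(tempDir, true);
                System.out.println("_temporary removed: " + deleted);
            }
        }
    }

I'd just like to confirm that doing this can't throw away data that hasn't already been committed to the final output.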

Thanks,
Nathan Marz
Rapleaf
