https://issues.apache.org/jira/browse/MAPREDUCE-7282
"MR v2 commit algorithm is dangerous, should be deprecated and not the
default"
someone do a PR to change the default & if it doesn't break too much I'll
merge it
On Mon, 29 Jun 2020 at 13:20, Steve Loughran wrote:
> v2 does a file-by-file
v2 does a file-by-file copy to the dest dir in task commit; v1 promotes
task attempts to the job attempt dir by directory rename, then job commit
lists those and moves the contents.
If the worker fails during a v2 task commit, the next task attempt has to
replace every file, so it had better use the same filenames.
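
A toy model of the two algorithms described above, using a local temp
directory in place of a real filesystem. This is only a sketch, not the
FileOutputCommitter code: every function name, the fail_after knob and the
part-file names are made up for illustration, the file-by-file promotion is
modelled as a local move, and the v1 directory rename is assumed to be atomic.

```python
import os
import shutil
import tempfile


def write_task_output(attempt_dir, filenames):
    """Create the files a task attempt would leave in its work directory."""
    os.makedirs(attempt_dir, exist_ok=True)
    for name in filenames:
        with open(os.path.join(attempt_dir, name), "w") as f:
            f.write(name)


def commit_task_v1(attempt_dir, job_attempt_dir, task_id):
    """v1 task commit: one directory rename into the job attempt dir."""
    os.makedirs(job_attempt_dir, exist_ok=True)
    os.rename(attempt_dir, os.path.join(job_attempt_dir, task_id))


def commit_job_v1(job_attempt_dir, dest_dir):
    """v1 job commit: list the committed task dirs and move their contents."""
    os.makedirs(dest_dir, exist_ok=True)
    for task_id in sorted(os.listdir(job_attempt_dir)):
        task_dir = os.path.join(job_attempt_dir, task_id)
        for name in sorted(os.listdir(task_dir)):
            os.replace(os.path.join(task_dir, name), os.path.join(dest_dir, name))


def commit_task_v2(attempt_dir, dest_dir, fail_after=None):
    """v2 task commit: promote files one by one straight into the dest dir.

    fail_after (hypothetical) simulates the worker dying after that many
    files have already been promoted.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for i, name in enumerate(sorted(os.listdir(attempt_dir))):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("worker failed during task commit")
        os.replace(os.path.join(attempt_dir, name), os.path.join(dest_dir, name))


def demo():
    """Return (v1_files, v2_files) after a failed attempt and a retry."""
    root = tempfile.mkdtemp()
    try:
        # v2: attempt 0 dies after promoting one of its two files.
        dest_v2 = os.path.join(root, "dest_v2")
        a0 = os.path.join(root, "v2_attempt_0")
        write_task_output(a0, ["part-0000-a0", "part-0001-a0"])
        try:
            commit_task_v2(a0, dest_v2, fail_after=1)
        except RuntimeError:
            pass
        # The retry uses different filenames, so it cannot replace the leftover.
        a1 = os.path.join(root, "v2_attempt_1")
        write_task_output(a1, ["part-0000-a1", "part-0001-a1"])
        commit_task_v2(a1, dest_v2)
        v2_files = sorted(os.listdir(dest_v2))

        # v1: attempt 0 fails before its rename, so it is never promoted.
        job_attempt = os.path.join(root, "job_attempt")
        dest_v1 = os.path.join(root, "dest_v1")
        write_task_output(os.path.join(root, "v1_attempt_0"),
                          ["part-0000-a0", "part-0001-a0"])
        b1 = os.path.join(root, "v1_attempt_1")
        write_task_output(b1, ["part-0000-a1", "part-0001-a1"])
        commit_task_v1(b1, job_attempt, "task_0")
        commit_job_v1(job_attempt, dest_v1)
        v1_files = sorted(os.listdir(dest_v1))
        return v1_files, v2_files
    finally:
        shutil.rmtree(root)
```

In this model demo() leaves a stray part-0000-a0 from the failed attempt in
the v2 destination next to the retry's two files, while the v1 destination
holds only the retry's output.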
T
I was trying to keep my email short, but the rationale behind setting
that to 1 by default is that it's safer. With algorithm version 2 you run
the risk of having bad data in cases where tasks fail, or even duplicate
data if a task fails and succeeds on a reattempt (I don't know if
th
I think this is a Hadoop property that is just passed through? If the
default is different in Hadoop 3 we could mention that in the docs. I
don't know if we want to always set it to 1 as a Spark default, even
in Hadoop 3, right?
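
For reference, a minimal sketch of pinning the version explicitly rather than
relying on whichever default is in effect. It assumes the property under
discussion is Hadoop's mapreduce.fileoutputcommitter.algorithm.version and
relies on Spark forwarding spark.hadoop.* keys to the Hadoop configuration;
the helper name committer_conf_args is made up for illustration.

```python
# Hadoop's property name, assumed to be the one discussed in this thread.
COMMITTER_VERSION_KEY = "mapreduce.fileoutputcommitter.algorithm.version"


def committer_conf_args(version=1):
    """Build spark-submit arguments that pin the commit algorithm version.

    Spark copies any "spark.hadoop."-prefixed setting into the Hadoop
    configuration, so the Hadoop key is simply prefixed here.
    """
    if version not in (1, 2):
        raise ValueError("commit algorithm version must be 1 or 2")
    return ["--conf", f"spark.hadoop.{COMMITTER_VERSION_KEY}={version}"]
```

The returned list can be spliced into a spark-submit command line, for
example ["spark-submit"] + committer_conf_args(1) + ["job.py"], where job.py
is a placeholder.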
On Thu, Jun 25, 2020 at 2:43 PM Waleed Fateem wrote:
>
> Hello!
>
> I noti