Hey Matt,

There's some prior work that compares consolidation performance on a
medium-scale workload:
http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf

There we saw roughly a 2x performance degradation in the reduce phase on
ext3. I am not aware of any other concrete numbers; maybe others have more
experience to add.
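
To show why the file count grows so quickly, here is a rough back-of-the-envelope sketch in Python. It assumes the Spark 1.x hash-based shuffle, where each map task writes one file per reduce partition. For the consolidated case, it assumes map tasks that run one after another on the same slot share a file group, so the number of groups is approximated by the number of concurrently running tasks (taken here as total cores). The function name and the job sizes are hypothetical and only illustrate the scaling.

```python
def shuffle_files(map_tasks: int, reduce_partitions: int,
                  total_cores: int, consolidate: bool) -> int:
    """Approximate number of shuffle files written for one shuffle stage.

    Assumption: Spark 1.x hash shuffle. Without consolidation, every map task
    writes one file per reduce partition (M * R files). With consolidation,
    tasks running sequentially on the same core append to a shared file
    group, so the count is bounded by concurrent tasks (~ cores) * R.
    """
    if consolidate:
        concurrent_groups = min(map_tasks, total_cores)
        return concurrent_groups * reduce_partitions
    return map_tasks * reduce_partitions


if __name__ == "__main__":
    # Hypothetical job: 2,000 map tasks, 1,000 reduce partitions, 100 cores.
    print(shuffle_files(2000, 1000, 100, consolidate=False))
    print(shuffle_files(2000, 1000, 100, consolidate=True))
```

With these made-up sizes the unconsolidated shuffle writes 20x more files, which is the kind of growth that runs into open-file limits.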

-Andrew

2014-11-03 17:26 GMT-08:00 Matt Cheah <mch...@palantir.com>:

> Hi everyone,
>
> I'm running into more and more cases where too many files are opened when
> spark.shuffle.consolidateFiles is turned off.
>
> I was wondering whether this is a common scenario among the rest of the
> community, and if so, whether it is worth turning the setting on by
> default. From the documentation, it seems like performance could be hurt
> on ext3 file systems. However, how much performance degradation is
> typically seen? A 2x slowdown in the average job? 3x? Also, what causes
> the performance degradation on ext3 file systems specifically?
>
> Thanks,
>
> -Matt Cheah
