Hey Matt,

There's some prior work that compares consolidation performance on a medium-scale workload:
http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf

There we noticed about a 2x performance degradation in the reduce phase on ext3. I am not aware of any other concrete numbers. Maybe others have more experience to add.

-Andrew

2014-11-03 17:26 GMT-08:00 Matt Cheah <mch...@palantir.com>:

> Hi everyone,
>
> I'm running into more and more cases where too many files are opened when
> spark.shuffle.consolidateFiles is turned off.
>
> I was wondering if this is a common scenario among the rest of the
> community, and if so, if it is worth considering the setting to be turned
> on by default. From the documentation, it seems like the performance could
> be hurt on ext3 file systems. However, what are the concrete numbers of
> performance degradation that is seen typically? A 2x slowdown in the
> average job? 3x? Also, what causes the performance degradation on ext3 file
> systems specifically?
>
> Thanks,
>
> -Matt Cheah
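
For a rough sense of why the open-file count grows when spark.shuffle.consolidateFiles is off, here is a minimal back-of-the-envelope sketch. It assumes the hash-based shuffle writes one file per (map task, reduce partition) pair without consolidation, and roughly one file per (concurrently running core, reduce partition) pair with it. The function names and the example numbers are hypothetical, chosen only for illustration, and are not taken from the report linked above.

```python
# Rough estimate of shuffle file counts under the hash-based shuffle.
# Assumption: without consolidation every map task writes one file per
# reduce partition; with consolidation, files are shared by the map tasks
# that run on the same core, so the count scales with cores, not tasks.


def shuffle_files_unconsolidated(map_tasks: int, reduce_partitions: int) -> int:
    """Files written when each map task has its own file per reducer."""
    return map_tasks * reduce_partitions


def shuffle_files_consolidated(concurrent_cores: int, reduce_partitions: int) -> int:
    """Files written when map tasks on the same core reuse one file per reducer."""
    return concurrent_cores * reduce_partitions


if __name__ == "__main__":
    # Hypothetical job: 1000 map tasks, 200 reduce partitions, 16 cores.
    maps, reduces, cores = 1000, 200, 16
    print("unconsolidated:", shuffle_files_unconsolidated(maps, reduces))
    print("consolidated:  ", shuffle_files_consolidated(cores, reduces))
```

Under these assumptions the file count depends on the number of cores rather than the number of map tasks, which is why enabling consolidation tends to help with "too many open files" errors on jobs with many map tasks.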