Hi Nathan,

There's some explanation in the Spark configuration section:
```
If set to "true", consolidates intermediate files created during a
shuffle. Creating fewer files can improve filesystem performance for
shuffles with large numbers of reduce tasks. It is recommended to set
this to "true" when using ext4 or xfs filesystems. On ext3, this option
might degrade performance on machines with many (>8) cores due to
filesystem limitations.
```

2014-05-23 16:00 GMT+02:00 Nathan Kronenfeld <nkronenf...@oculusinfo.com>:

> In trying to sort some largish datasets, we came across the
> spark.shuffle.consolidateFiles property, and I found in the source code
> that it is set, by default, to false, with a note to default it to true
> when the feature is stable.
>
> Does anyone know what is unstable about this? If we set it true, what
> problems should we anticipate?
>
> Thanks,
> -Nathan Kronenfeld
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone: +1-416-203-3003 x 238
> Email: nkronenf...@oculusinfo.com
>

--
*JU Han*

Data Engineer @ Botify.com

+33 0619608888