By default, with P partitions (for both the pre-shuffle stage and post-shuffle), there are P^2 files created, since each pre-shuffle task writes one file per post-shuffle partition. With spark.shuffle.consolidateFiles turned on, we would instead create only P files. Disk space consumption is largely unaffected by the number of partitions, however, unless each partition is particularly small.
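The file counts described above can be sketched as a small helper (hypothetical, for illustration only; not a Spark API):

```python
def shuffle_file_count(p, consolidate=False):
    """Number of shuffle files for P pre-shuffle and P post-shuffle partitions.

    Without consolidation, each of the P pre-shuffle tasks writes one file
    per post-shuffle partition: P * P = P^2 files. With
    spark.shuffle.consolidateFiles enabled, files are shared across tasks,
    so only P files are created (per the discussion above).
    """
    return p if consolidate else p * p

# With 1000 partitions, consolidation cuts a million files down to a thousand.
print(shuffle_file_count(1000))                    # 1000000
print(shuffle_file_count(1000, consolidate=True))  # 1000
```

This is also why consolidation matters more for inode exhaustion and "too many open files" errors than for raw disk usage: the same bytes are spread over far fewer files.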
You might look at the actual executors' logs, as it's possible that this error was caused by an earlier exception, such as "too many open files".

On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski <og...@plainvanillagames.com> wrote:
> On 3/23/14, 5:49 PM, Matei Zaharia wrote:
>> You can set spark.local.dir to put this data somewhere other than /tmp if
>> /tmp is full. Actually it's recommended to have multiple local disks and
>> set it to a comma-separated list of directories, one per disk.
>
> Matei, does the number of tasks/partitions in a transformation influence
> something in terms of disk space consumption? Or inode consumption?
>
> Thanks,
> Ognen
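Matei's advice might look like the following spark-defaults.conf entries (the mount points are hypothetical; the idea is one directory per physical disk, comma-separated):

```
# spark-defaults.conf
# Spread shuffle/spill data across local disks instead of filling /tmp.
spark.local.dir                  /mnt/disk1/spark,/mnt/disk2/spark
# Reduce the number of shuffle files from P^2 toward P (see above).
spark.shuffle.consolidateFiles   true
```

The same properties can be set programmatically on a SparkConf before the SparkContext is created.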