Set the ulimit on open files quite high as root on the worker nodes; that should resolve it.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>
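In case it helps, here is a rough sketch of both changes, assuming a standalone driver that builds its own SparkConf; the 65536 limit, the app name, and the exact property key are illustrative, so please check them against the docs for your Spark version:

    // On every worker node, raise the open-file limit before the worker JVM
    // starts, e.g. as root add to /etc/security/limits.conf:
    //   *  soft  nofile  65536
    //   *  hard  nofile  65536
    // or run `ulimit -n 65536` in the shell that launches the worker.

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative driver-side config. If I remember right, the docs spell
    // the key spark.shuffle.consolidateFiles (camel case) rather than the
    // dotted form in your mail, which may be why it made no difference.
    val conf = new SparkConf()
      .setAppName("ShuffleHeavyJob")                  // hypothetical app name
      .set("spark.shuffle.consolidateFiles", "true")  // consolidate map-side shuffle outputs
    val sc = new SparkContext(conf)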
On Mon, May 26, 2014 at 7:48 PM, Matt Kielo <mki...@oculusinfo.com> wrote:

> Hello,
>
> I currently have a task that always fails with
> "java.io.FileNotFoundException: [...]/shuffle_0_257_2155 (Too many open
> files)" when I run sorting operations such as distinct, sortByKey, or
> reduceByKey on a large number of partitions.
>
> I'm working with 365 GB of data which is being split into 5959 partitions.
> The cluster I'm using has over 1000 GB of memory, with 20 GB of memory per
> node.
>
> I have tried adding .set("spark.shuffle.consolidate.files", "true") when
> making my Spark context, but it doesn't seem to make a difference.
>
> Has anyone else had similar problems?
>
> Best regards,
>
> Matt