Hi everyone,

My team (cc'ed on this e-mail) and I are running a Spark reduceByKey
operation on a cluster of 10 slaves where I don't have the privileges to
raise "ulimit -n"; it currently returns 1024 on each machine.

When I attempt to run this job with the data originating from a text file
stored in an HDFS cluster running on the same nodes as the Spark cluster,
the job crashes with the message "Too many open files".
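
For reference, the job looks roughly like the sketch below; the HDFS paths,
the app name, and the key extraction are simplified placeholders rather than
our actual logic:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    // Simplified sketch of the job; paths and key parsing are placeholders.
    val conf = new SparkConf().setAppName("ReduceByKeyJob")
    val sc = new SparkContext(conf)

    val lines = sc.textFile("hdfs:///path/to/input")   // placeholder path
    val counts = lines
      .map(line => (line.split("\t")(0), 1L))          // placeholder key
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs:///path/to/output")    // placeholder path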

My question is, why are so many files being created, and is there a way to
configure the Spark context to avoid spawning that many files? I am already
setting spark.shuffle.consolidateFiles to true.
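
For completeness, this is roughly how the flag is being set before the
context is created (a sketch; the app name and the partition count in the
comment are just example values):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the context setup; "ReduceByKeyJob" is a placeholder name.
    val conf = new SparkConf()
      .setAppName("ReduceByKeyJob")
      .set("spark.shuffle.consolidateFiles", "true")  // already enabled
    val sc = new SparkContext(conf)

    // The only other knob I'm aware of is passing an explicit, smaller
    // number of reduce partitions, e.g. rdd.reduceByKey(_ + _, 64), since,
    // as I understand it, each map task writes a shuffle file per reduce
    // partition. I'm not sure whether that is the right fix here, though.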

I want to reiterate: I can't change the maximum number of open file
descriptors on the machines. This cluster is not owned by me, and the system
administrator has been quite slow to respond.

Thanks,

-Matt Cheah
