Hi,

Recently, we ran into this notorious exception while doing a large
shuffle on Mesos at Netflix. We made sure `ulimit -n` was set to a very
large number, but still hit the issue.

It turns out that Mesos overrides `ulimit -n` to a small number, which
causes the problem. It's very non-trivial to debug: logging in to the
slave shows the right ulimit - it's only in the Mesos context that it
gets overridden.

Here is the code you can run in the Spark shell to get the actual
number of open files allowed for Spark tasks.

import sys.process._

// Run `ulimit -n` inside each task to see the limit the executors actually get.
val p = 1 to 100
val rdd = sc.parallelize(p, 100)
val openFiles = rdd.map(x => Seq("sh", "-c", "ulimit -n").!!.toDouble.toLong).collect

Hope this helps someone in the same situation.

Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
