Hi,

Recently, we ran into this notorious exception while doing a large shuffle on Mesos at Netflix. We had made sure that `ulimit -n` was set to a very large number, but we still hit the issue.
It turns out that Mesos overrides `ulimit -n` to a small number, which causes the problem. It's very non-trivial to debug (logging in to the slave shows the right ulimit; it's only in the Mesos context that it gets overridden).

Here is the code you can run in the Spark shell to get the actual number of open files allowed for Spark:

  import sys.process._

  val p = 1 to 100
  val rdd = sc.parallelize(p, 100)
  // Run "ulimit -n" inside each task, so the value reflects the limit of the
  // Mesos-launched executor process rather than that of a login shell.
  val openFiles = rdd.map(x => Seq("sh", "-c", "ulimit -n").!!.toDouble.toLong).collect()

Hope this can help someone in the same situation.

Sincerely,

DB Tsai
----------------------------------------------------------
Blog: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
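
If you also want to see which specific hosts are affected, a small variation of the snippet above can report the limit per executor host. This is only a sketch, not part of the original message; it assumes a standard Spark shell where `sc` is available and `sh` exists on every node:

  import sys.process._
  import java.net.InetAddress

  // Run "ulimit -n" in every task and pair the result with the executor's
  // hostname, so nodes where the limit has been lowered stand out.
  val limitsByHost = sc.parallelize(1 to 100, 100)
    .map { _ =>
      val host  = InetAddress.getLocalHost.getHostName
      val limit = Seq("sh", "-c", "ulimit -n").!!.trim  // kept as a String in case it returns "unlimited"
      (host, limit)
    }
    .distinct()
    .collect()

  limitsByHost.foreach { case (host, limit) => println(s"$host -> $limit") }

Using many more partitions than executors (here 100) increases the chance that every node runs at least one task and therefore shows up in the output.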