OK, so it's a '70000 threads overwhelming off-heap memory in the JVM' kind of thing. Or running afoul of ulimits in the OS.
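For a rough sense of scale (back-of-envelope only, assuming roughly a 1 MiB stack reservation per JVM thread, which is a common 64-bit default; actual -Xss settings vary):

# Back-of-envelope estimate (assumption: ~1 MiB stack reservation per JVM
# thread; the real -Xss value may differ per deployment).
threads = 70_000
stack_bytes = 1 * 1024 * 1024          # assumed per-thread stack reservation
print(f"~{threads * stack_bytes / 2**30:.0f} GiB of off-heap stack reservations")
# On Linux each thread also counts against `ulimit -u` (max user processes).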
On Fri, Apr 9, 2021 at 11:19 AM Attila Zsolt Piros <piros.attila.zs...@gmail.com> wrote:

> Hi Sean!
>
> So the "coalesce" without shuffle will create a CoalescedRDD which, during
> its computation, delegates to the parent RDD partitions.
> As the CoalescedRDD contains only 1 partition, we are talking about 1 task
> and 1 task context.
>
> The next stop is PythonRunner.
>
> Here the Python workers at least are reused (when
> "spark.python.worker.reuse" is true, which is the default), but the
> MonitorThreads are not reused and, what is worse, all the MonitorThreads
> are created for the same worker and the same TaskContext.
> This means the CoalescedRDD's 1 task must complete before the first
> monitor thread is stopped; relevant code:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L570
>
> So this will lead to creating 70000 extra threads when 1 would be enough.
>
> The Jira is: https://issues.apache.org/jira/browse/SPARK-35009
> The PR will come next week, maybe (I am a bit uncertain as I have many
> other things to do right now).
>
> Best Regards,
> Attila
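For illustration, a minimal PySpark sketch of the kind of job being discussed (the app name, data, and partition count are made up; this is not the reporter's actual job): a parent RDD with many partitions is coalesced without shuffle into a single partition, so the one resulting task iterates over all parent partitions through the reused Python worker, and on affected versions each parent partition spawns its own MonitorThread in the JVM.

# Minimal sketch of the scenario (illustrative names/sizes only).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("coalesce-monitor-thread-repro").getOrCreate()
sc = spark.sparkContext

# A parent RDD with 70000 partitions, each computed via a Python worker.
rdd = sc.parallelize(range(70000), numSlices=70000).map(lambda x: x * 2)

# coalesce without shuffle -> CoalescedRDD with 1 partition / 1 task;
# the Python worker is reused, but a new MonitorThread is created per
# parent partition and only stopped when the single task completes.
print(rdd.coalesce(1).count())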