OK, so it's a '70000 threads overwhelming off-heap memory in the JVM' kind of
thing. Or running afoul of ulimits in the OS.

On Fri, Apr 9, 2021 at 11:19 AM Attila Zsolt Piros <
piros.attila.zs...@gmail.com> wrote:

> Hi Sean!
>
> So the "coalesce" without shuffle will create a CoalescedRDD which during
> its computation delegates to the parent RDD partitions.
> As the CoalescedRDD contains only 1 partition so we talk about 1 task and
> 1 task context.
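>
> A minimal PySpark sketch of that scenario, purely for illustration (the
> 70000 figure comes from this thread; the RDD contents and the action are my
> own assumptions):
>
>     from pyspark import SparkContext
>
>     sc = SparkContext()
>     # parent RDD with 70000 partitions
>     rdd = sc.parallelize(range(70000), numSlices=70000)
>     # coalesce without shuffle: a CoalescedRDD with a single partition, so a
>     # single task (and a single task context) computes all 70000 parents
>     rdd.coalesce(1).count()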
>
> The next stop is PythonRunner.
>
> Here the Python workers at least are reused (when
> "spark.python.worker.reuse" is true, which is the default), but the
> MonitorThreads are not reused and, what is worse, all the MonitorThreads are
> created for the same worker and the same TaskContext.
> This means the CoalescedRDD's single task must complete before even the
> first monitor thread is stopped; relevant code:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L570
>
> So this will lead to creating 70000 extra threads when 1 would be enough.
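>
> For completeness, a hedged sketch of how that worker-reuse setting is
> controlled on the user side (the property name and its default are from
> this thread; the rest is illustrative):
>
>     from pyspark import SparkConf, SparkContext
>
>     # "spark.python.worker.reuse" defaults to true, so Python workers are
>     # shared across tasks; the MonitorThreads discussed above are not.
>     conf = SparkConf().set("spark.python.worker.reuse", "true")
>     sc = SparkContext(conf=conf)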
>
> The jira is: https://issues.apache.org/jira/browse/SPARK-35009
> The PR will come next week, maybe (I am a bit uncertain, as I have many
> other things to do right now).
>
> Best Regards,
> Attila
>