A Spark job creates 200 partitions, and the executors try to deserialize
the task at the same time. That creates a chain of blocking, because
all executors are deserializing the same task and loadClass takes a lock per
class name. I often observe many threads forming that chain from
the th
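The following is a minimal, JDK-only sketch of the "lock per class name" behaviour mentioned above. It does not involve Spark at all, and whether the class loader used in a particular Spark setup is registered as parallel capable is an assumption this sketch does not establish. The class names ClassLoadingLockDemo, ParallelLoader and SerialLoader, and the class-name strings, are hypothetical. The JDK's default loadClass synchronizes on getClassLoadingLock(name). For a parallel-capable loader that method returns one lock object per class name, and for any other loader it returns the loader itself.

```java
public class ClassLoadingLockDemo {

    // Registered as parallel capable: the JDK then hands out one lock object
    // per class name instead of locking the whole loader.
    static class ParallelLoader extends ClassLoader {
        static {
            // Must run in the subclass's static initializer, before any
            // instance is constructed.
            registerAsParallelCapable();
        }

        ParallelLoader(ClassLoader parent) {
            super(parent);
        }

        // Expose the protected lock lookup that the default loadClass
        // synchronizes on.
        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    // Not registered: getClassLoadingLock returns the loader itself.
    static class SerialLoader extends ClassLoader {
        SerialLoader(ClassLoader parent) {
            super(parent);
        }

        Object lockFor(String className) {
            return getClassLoadingLock(className);
        }
    }

    public static void main(String[] args) {
        ClassLoader parent = ClassLoadingLockDemo.class.getClassLoader();
        ParallelLoader p = new ParallelLoader(parent);
        SerialLoader s = new SerialLoader(parent);

        // Same name -> same lock, so threads loading that class on this
        // loader wait for each other.
        System.out.println(p.lockFor("com.example.Task") == p.lockFor("com.example.Task")); // true
        // Different names -> different locks, so those loads need not wait.
        System.out.println(p.lockFor("com.example.Task") == p.lockFor("com.example.Other")); // false
        // Non-parallel-capable loader: one lock (the loader) for every name.
        System.out.println(s.lockFor("com.example.Task") == s); // true
    }
}
```

With this model, 200 tasks deserializing the same closure at once all ask for the same class names, so they contend on the same few per-name locks even though the loader as a whole is parallel capable.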
I don't know if Java serialization is slow in that case; what that shows is
blocking on a class load, which may or may not be directly due to
deserialization.
Indeed, I don't think (some) things are serialized in local mode within one
JVM, so I'm not sure that's actually what's going on.
On Thu, Sep 2, 2021 at
Hi Kohki,
Serialization of tasks happens in local mode too, and as far as I am
aware there is no way to disable it (although that would definitely be
useful in my opinion).
You can think of local mode as a testing mode, in which you would want to
catch any serialization errors before they appear i
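A minimal sketch, using plain Java serialization rather than any Spark API, of the kind of error a serializing local mode can surface early. It assumes a closure shipped with a task behaves like a serializable Java lambda. The names ClosureSerializationDemo, SerFunction and Connection are hypothetical. A lambda that captures only an int survives a write/read round trip, whereas one that captures a non-serializable object fails with NotSerializableException during writeObject.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class ClosureSerializationDemo {

    // Serializable functional interface: javac compiles lambdas targeting it
    // as serializable lambdas (written out as SerializedLambda).
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Deliberately NOT Serializable; a stand-in for some driver-side object.
    static class Connection {}

    // Java-serialize and then deserialize an object within the same JVM.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    // A lambda capturing only an int survives the round trip.
    static int applyAfterRoundTrip(int offset, int input) {
        SerFunction<Integer, Integer> f = x -> x + offset;
        try {
            return roundTrip(f).apply(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A lambda capturing a non-serializable object fails while being written.
    static boolean captureOfConnectionFails() {
        Connection conn = new Connection();
        SerFunction<Integer, Integer> f = x -> x + conn.hashCode();
        try {
            roundTrip(f);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyAfterRoundTrip(10, 5)); // 15
        System.out.println(captureOfConnectionFails()); // true
    }
}
```

If local mode skipped this step, the second case would only fail once the job ran on a real cluster, which is the argument for keeping serialization on in a testing mode.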
I'm seeing many threads deserializing a task. I understand that since a
lambda is involved, we can't use Kryo for those purposes. However, I'm
running in local mode, so this serialization isn't really necessary, is it?
Is there any trick I can apply to get rid of this thread contention? I'm
seei