Good idea to do multi-threading in spark job?

Ruijing Li, Sun, 03 May 2020 09:32:10 -0700

Hi all,

We have a Spark job (Spark 2.4.4, Hadoop 2.7, Scala 2.11.12) in which we use
semaphores / parallel collections. We definitely see a huge speedup from
doing this, but we were wondering whether it could cause any unintended
side effects. In particular, I’m worried about deadlocks, and about whether
it could interfere with the fixes for issues such as this one:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-26961


We do run with multiple cores.
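
To make the question concrete, here is a minimal sketch of one plausible shape such a pattern could take. It is not the actual job code. It assumes driver-side threads whose concurrency is capped by a java.util.concurrent.Semaphore. The names BoundedParallelism, runAll and runTask are hypothetical, and runTask is only a stand-in for work that would trigger a Spark action.

```scala
import java.util.concurrent.{Executors, Semaphore}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedParallelism {
  // Hypothetical stand-in for work that would submit a Spark action
  // (for example a count or a write) from a driver-side thread.
  def runTask(id: Int): Int = id * 2

  // Runs `work` for every id, letting at most `maxConcurrent` run at once.
  def runAll(ids: Seq[Int], maxConcurrent: Int)(work: Int => Int): Seq[Int] = {
    // One thread per id, so the semaphore (not the pool size) is the limit.
    val pool = Executors.newFixedThreadPool(ids.size.max(1))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val permits = new Semaphore(maxConcurrent)
    try {
      val futures = ids.map { id =>
        Future {
          permits.acquire()
          try work(id)
          finally permits.release() // always release, even if work throws
        }
      }
      // Future.sequence keeps the results in the order of `ids`.
      Await.result(Future.sequence(futures), 1.minute)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(1 to 8, maxConcurrent = 3)(runTask)
    println(results.mkString(","))
  }
}
```

In this sketch the permit is released in a finally block, so a failing task cannot leave the remaining threads waiting on the semaphore forever, which is the kind of deadlock the question is about.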

Thanks!
-- 
Cheers,
Ruijing Li
