Tasks within a stage are never dependent on each other; stages are dependent 
on each other. The Spark task scheduler plans the tasks so that they can run 
independently.

Out of the 80K tasks, how many are complete when you have 7 remaining? Is it 
80K - 7? It could be that you have data skew and 7 of the partitions are much 
larger than the other 79,993 partitions. Spark completes those 79,993 tasks 
while the 7 large ones are still running. I would check the size of the 
partitions. If the 7 are much larger, I would try salting to rebalance the 
partitions.
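
For example, a minimal sketch of checking partition sizes and then salting a 
skewed key. The column name "key", the input path, and the bucket count are 
illustrative placeholders, not taken from your job:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("salting-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder input; substitute the real source feeding the slow stage.
    val df = spark.read.parquet("/path/to/input")

    // 1) Confirm the skew: count rows per partition and look at the largest.
    df.rdd
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .toDF("partition", "rows")
      .orderBy(desc("rows"))
      .show(20)

    // 2) Spread the hot keys: add a random salt and aggregate on the salted
    //    key first, then combine the partial results per original key, so no
    //    single task has to process one huge key on its own.
    val saltBuckets = 32   // tune to how badly the largest partitions dominate
    val salted = df
      .withColumn("salt", (rand() * saltBuckets).cast("int"))
      .withColumn("salted_key",
        concat_ws("_", col("key"), col("salt").cast("string")))

    val partials = salted.groupBy("salted_key", "key")
      .agg(count(lit(1)).as("partial_cnt"))
    val result = partials.groupBy("key")
      .agg(sum("partial_cnt").as("cnt"))

If the slow stage is a join rather than an aggregation, the same idea applies: 
salt the skewed side and replicate the other side over the salt range before 
joining.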

On 9/10/21, 10:22 AM, "Joris Billen" <joris.bil...@bigindustries.be> wrote:

    Dear community,

    I have a job that runs quite well for most stages: resources are consumed 
    quite optimally (not much memory / vcores left idle). My cluster is 
    managed and works well.
    I end up with 27 executors with 2 cores each, so I can run 54 tasks in 
    parallel. For many stages I see a high number of tasks running in 
    parallel. For the longest stage (80k tasks), however, I see that for the 
    first 20k tasks many tasks still run in parallel (so it advances quickly), 
    but after a while only 7 tasks run concurrently, all on the same 3 
    executors residing on the same node. That means my other nodes (8 of them, 
    with over 45 healthy executors) are idle for over 3 hours.
    I notice in the logs that all tasks run at "NODE_LOCAL" locality level.

    I wonder what is causing this and whether I can do something to make the 
    idle executors also do work. Two options:
    1) It is just the way it is: at some point in this stage the remaining 
    tasks have dependencies, so the task scheduler cannot submit more tasks at 
    once?
    I thought it was specifically the stages that have dependencies on each 
    other; that's why they often (not always) have to wait for the previous 
    one before the next can start. But I can't find anywhere whether tasks 
    within a stage can also depend on each other. I thought the number of 
    tasks was the number of partitions, and that these by definition could be 
    executed in parallel. I guess they probably are dependent, because when I 
    look at the DAG of this stage, it is very complex.
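
    For reference, a minimal way to check that assumption (the input path and 
    variable names are placeholders for the DataFrame feeding this stage):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().appName("partition-check").getOrCreate()
        val df = spark.read.parquet("/path/to/input")   // placeholder input

        // Tasks in a stage map 1:1 to the partitions of the underlying RDD,
        // so this should match the task count shown in the UI for that stage.
        println(df.rdd.getNumPartitions)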
    2) I am playing with spark.locality.wait to see if this can help. Maybe 
    Spark just prefers to run everything as close to the data as possible, 
    hence NODE_LOCAL for all tasks. Maybe if I decrease spark.locality.wait 
    from the default 3s to something lower (1s), it will start moving more 
    data around and hand more tasks to the idle executors.
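
    For illustration, a minimal sketch of how this could be set when building 
    the session (the app name is a placeholder and 1s is just a starting 
    point; spark.locality.wait.node can be tuned as well):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder()
          .appName("my-job")                         // placeholder name
          .config("spark.locality.wait", "1s")       // default is 3s
          .config("spark.locality.wait.node", "1s")
          .getOrCreate()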



    Does anyone have any ideas?
    Thanks!



---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
