Hello all, I am running a 3~4 node cluster under YARN. I have a small dataset (~500k records) but a huge number of internal tasks, for example looping over different segments of the data and running many computations on each segment.
It looks like strategies such as disabling serialization and increasing the number of executors at the expense of cores per executor help a lot. I also need to consider context switching and data locality. Any general ideas?
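For reference, this is roughly how I am configuring the job, a minimal sketch only; the app name and the concrete numbers (instances, cores, memory, locality wait) are placeholders, not my exact settings:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the "more executors, fewer cores each" setup I described.
val conf = new SparkConf()
  .setAppName("segmented-computations")          // placeholder name
  .set("spark.executor.instances", "8")          // more executors...
  .set("spark.executor.cores", "2")              // ...with fewer cores each
  .set("spark.executor.memory", "2g")            // placeholder value
  // How long Spark waits for a data-local slot before falling back
  // to a less local one; relevant to the locality question above.
  .set("spark.locality.wait", "3s")

val sc = new SparkContext(conf)
```

Thanks, Saif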