Hi,
I have a 2-node YARN cluster and I am using Spark 1.1.0 to submit my jobs.
According to the Spark documentation, the default number of cores is the
maximum available on the node. Does that mean each node creates as many
threads as it has cores to process the tasks assigned to it?
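As a side note, a quick way to see how many logical cores the JVM on a node reports is the stock JDK call below. This is just a plain-Java check, not a Spark API, and on YARN it shows what the executor JVM can see on the machine, not necessarily what Spark was actually granted:

```java
public class CoreCount {
    public static void main(String[] args) {
        // Number of logical cores visible to this JVM. On a YARN node
        // this is the machine's count, independent of any cores Spark
        // requested for the executor.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical cores: " + cores);
    }
}
```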
For example:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

List<Integer> data = new ArrayList<Integer>();
for (int i = 0; i < 1000; i++) {
    data.add(i);
}
JavaRDD<Integer> distData = sc.parallelize(data);
distData = distData.map(
    new Function<Integer, Integer>() {
        @Override
        public Integer call(Integer arg0) throws Exception {
            // square each element
            return arg0 * arg0;
        }
    }
);
distData.count();
The above program divides my RDD into 2 partitions of 500 elements each and
submits them to the executors.
1) So each executor will use all of the node's CPU cores to process its
500-element partition, am I right?
2) If so, does that mean each executor uses multithreading? Is the execution
on a node parallel or sequential?
3) How can I check how many cores an executor is using to process my jobs?
4) Is there any way to control how the data is divided across the nodes?
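On controlling the division (question 4): sc.parallelize takes an optional second argument, the number of partitions, e.g. sc.parallelize(data, 8). Below is a plain-Java sketch of the even-split arithmetic I believe parallelize applies to a collection (an assumption on my part, written without Spark so it runs standalone; the exact slice boundaries Spark uses may differ):

```java
import java.util.ArrayList;
import java.util.List;

public class SliceSketch {
    // Split `data` into `numSlices` contiguous slices of near-equal size,
    // mimicking (as an assumption) how sc.parallelize(data, numSlices)
    // divides a local collection into partitions.
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        List<List<T>> slices = new ArrayList<List<T>>();
        int len = data.size();
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * len / numSlices);
            int end = (int) ((long) (i + 1) * len / numSlices);
            slices.add(data.subList(start, end));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 1000; i++) data.add(i);
        // With 2 slices (as in my 2-executor run): 500 elements each.
        System.out.println(slice(data, 2).get(0).size()); // 500
        // Asking for more partitions, e.g. sc.parallelize(data, 8):
        System.out.println(slice(data, 8).size()); // 8
    }
}
```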
Please give some clarity on the above.
Thanks & Regards,
Naveen