Thanks Marcelo!
The reason I was asking that question is that I was expecting my Spark job
to be a "map-only" job. In other words, it should finish after
mapPartitions runs for all partitions, because the job is only
mapPartitions() plus count(), where mapPartitions yields only one integer
per partition.
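
For reference, here is a minimal sketch of roughly what the job looks like
(the input data, RDD names, and partition count below are just placeholders,
not my actual code):

    import org.apache.spark.{SparkConf, SparkContext}

    object MapOnlyExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("map-only-example")
        val sc = new SparkContext(conf)

        // Hypothetical input split into 10 partitions.
        val data = sc.parallelize(1 to 1000000, numSlices = 10)

        // mapPartitions emits exactly one integer per partition,
        // so the output of this step is tiny.
        val perPartition = data.mapPartitions { iter =>
          Iterator(iter.size)
        }

        // count() is the only action; I expected the whole job to run
        // as a single "map only" stage.
        val n = perPartition.count()
        println(s"number of partition results: $n")

        sc.stop()
      }
    }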
I am running a Spark job with only two operations: mapPartitions() and then
collect(). The output data size of mapPartitions() is very small, one
integer per partition. I saw there is a stage 2 for this job that runs this
Java program. I am not a Java programmer. Could anyone please let me know
what this Java program is doing?