Hi,

I have set the number of partitions to 6000 and requested 100 nodes with 32 cores each; each executor is configured with 32 cores (--executor-cores 32).
The submit command is:

spark-submit --master $SPARKURL --executor-cores 32 --driver-memory 20G --executor-memory 80G single-file-test.py

I'm reading a 2.2 TB file, and the code has just two simple steps (pseudocode): rdd = sc.read, then rdd.count.

When I checked the log file and the history server, the count stage shows a very wide range of task launch times: the earliest task launched at 16/03/19 22:30:56 and the latest at 16/03/19 22:40:17, which is about 10 minutes apart.

Has anyone experienced this before? Could you please explain the reason, the Spark internals behind this behavior, and how to resolve it?

Thanks much.

Best,
Jialin
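P.S. For reference, here is a minimal sketch of what the job does, in case it helps with reproducing the issue. The input path, the use of sc.textFile, and the way the 6000 partitions are requested are my assumptions about the setup, not the exact contents of single-file-test.py:

    from pyspark import SparkConf, SparkContext

    # Assumed setup; the real single-file-test.py may differ.
    conf = SparkConf().setAppName("single-file-test")
    sc = SparkContext(conf=conf)

    # Hypothetical input path; the actual 2.2 TB file location is not shown here.
    path = "hdfs:///path/to/2.2TB-input"

    # Read the file, asking for at least 6000 partitions, then count the records.
    rdd = sc.textFile(path, minPartitions=6000)
    print(rdd.count())

    sc.stop()

This would be submitted with the spark-submit command quoted above.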