Hi all,

I am running a Spark Streaming application with Kinesis on EMR 4.7.1. The application runs on YARN in client mode. There are 17 worker nodes (c3.8xlarge) with 100 executors and 100 receivers. This setup works fine.
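For reference, the receivers are created roughly like this (a simplified sketch; the app/stream names, region, batch interval, and checkpoint interval below are placeholders, not our real values):

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val sparkConf = new SparkConf().setAppName("Main")
val ssc = new StreamingContext(sparkConf, Seconds(60))  // batch interval (placeholder)

// One Kinesis receiver per shard; each receiver occupies one executor core
val numReceivers = 100
val streams = (0 until numReceivers).map { _ =>
  KinesisUtils.createStream(
    ssc,
    "my-app",                                  // KCL application name (placeholder)
    "my-stream",                               // Kinesis stream name (placeholder)
    "https://kinesis.us-east-1.amazonaws.com", // endpoint (placeholder)
    "us-east-1",                               // region (placeholder)
    InitialPositionInStream.LATEST,            // where to start reading the stream
    Seconds(10),                               // KCL checkpoint interval (placeholder)
    StorageLevel.MEMORY_AND_DISK_2)            // replicated storage for received blocks
}
// Union all receiver streams into a single DStream for downstream processing
val unioned = ssc.union(streams)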
But when I increase the number of worker nodes to 50 and the number of executors to 250, with 250 receivers, the batch processing time increases from ~50s to 2.3min, and the maximum scheduler delay for tasks increases from ~0.2s to 20s (75th percentile around 2-3s). I also tried increasing only the number of executors while keeping the number of receivers at 100, but I still see performance degrade from ~50s to 1.1min, and the maximum scheduler delay for tasks increases from ~0.2s to 4s (75th percentile around 1s).

The spark-submit command is as follows; the only parameter I changed between the two setups is num-executors:

spark-submit --deploy-mode client --verbose --master yarn \
  --jars /usr/lib/spark/extras/lib/spark-streaming-kinesis-asl.jar \
  --driver-memory 20g --driver-cores 20 \
  --num-executors 250 --executor-cores 5 --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=1600 \
  --conf spark.driver.maxResultSize=0 \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.rdd.compress=true \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.speculation=true \
  --conf spark.task.maxFailures=15 \
  --conf spark.ui.retainedJobs=100 \
  --conf spark.ui.retainedStages=100 \
  --conf spark.executor.logs.rolling.maxRetainedFiles=1 \
  --conf spark.executor.logs.rolling.strategy=time \
  --conf spark.executor.logs.rolling.time.interval=hourly \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=/home/hadoop/fairscheduler.xml \
  --conf spark.metrics.conf=/home/hadoop/spark-metrics.properties \
  --class Main /home/hadoop/Main-1.0.jar

This issue seems relevant: https://issues.apache.org/jira/browse/SPARK-14327

Any suggestions for how to troubleshoot this?

Thanks,
Renxia
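P.S. In case it matters, the fairscheduler.xml referenced above just defines pools in the standard allocation-file format, along these lines (simplified; the pool name, weight, and minShare are placeholders, not our real values):

<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>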