Additional information: the batch duration in my app is 1 minute. From the Spark UI, for each batch there is a big difference between Output Op Duration and Job Duration, e.g. Output Op Duration is 1 min while Job Duration is 19 s.
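For context, here is a simplified sketch of how the streaming context and receivers are wired up, assuming one Kinesis DStream per receiver unioned before processing; the application name, stream name, endpoint, region, and output operation below are placeholders, not the actual code:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

object Main {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KinesisStreaming")
    // 1-minute batch interval, as mentioned above.
    val ssc = new StreamingContext(conf, Minutes(1))

    // Placeholder: matches the receiver count in the problematic run.
    val numReceivers = 250

    // One receiver-backed DStream per receiver slot.
    val streams = (0 until numReceivers).map { _ =>
      KinesisUtils.createStream(
        ssc,
        "my-kinesis-app",                           // KCL application name (placeholder)
        "my-stream",                                // Kinesis stream name (placeholder)
        "https://kinesis.us-east-1.amazonaws.com",  // endpoint (placeholder)
        "us-east-1",                                // region (placeholder)
        InitialPositionInStream.LATEST,
        Minutes(1),                                 // checkpoint interval
        StorageLevel.MEMORY_AND_DISK_2)
    }

    // Union all receiver DStreams into one for downstream processing.
    val unified = ssc.union(streams)
    unified.foreachRDD { rdd =>
      // Placeholder output operation; the real job does the actual processing here.
      rdd.count()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}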
2016-07-14 10:49 GMT-07:00 Renxia Wang <renxia.w...@gmail.com>:
> Hi all,
>
> I am running a Spark Streaming application with Kinesis on EMR 4.7.1. The
> application runs on YARN and uses client mode. There are 17 worker nodes
> (c3.8xlarge) with 100 executors and 100 receivers. This setting works fine.
>
> But when I increase the number of worker nodes to 50 and the number of
> executors to 250, with 250 receivers, the processing time of batches
> increases from ~50s to 2.3min, and the scheduler delay for tasks increases
> from ~0.2s max to 20s max (while the 75th percentile is about 2-3s).
>
> I also tried increasing only the number of executors while keeping the
> number of receivers the same, but I still see performance degradation from
> ~50s to 1.1min, and the task scheduler delay increases from ~0.2s max to
> 4s max (while the 75th percentile is about 1s).
>
> The spark-submit command is as follows. The only parameter I changed here
> is num-executors.
>
> spark-submit
> --deploy-mode client
> --verbose
> --master yarn
> --jars /usr/lib/spark/extras/lib/spark-streaming-kinesis-asl.jar
> --driver-memory 20g --driver-cores 20
> --num-executors 250
> --executor-cores 5
> --executor-memory 8g
> --conf spark.yarn.executor.memoryOverhead=1600
> --conf spark.driver.maxResultSize=0
> --conf spark.dynamicAllocation.enabled=false
> --conf spark.rdd.compress=true
> --conf spark.streaming.stopGracefullyOnShutdown=true
> --conf spark.streaming.backpressure.enabled=true
> --conf spark.speculation=true
> --conf spark.task.maxFailures=15
> --conf spark.ui.retainedJobs=100
> --conf spark.ui.retainedStages=100
> --conf spark.executor.logs.rolling.maxRetainedFiles=1
> --conf spark.executor.logs.rolling.strategy=time
> --conf spark.executor.logs.rolling.time.interval=hourly
> --conf spark.scheduler.mode=FAIR
> --conf spark.scheduler.allocation.file=/home/hadoop/fairscheduler.xml
> --conf spark.metrics.conf=/home/hadoop/spark-metrics.properties
> --class Main /home/hadoop/Main-1.0.jar
>
> I found this issue that seems relevant:
> https://issues.apache.org/jira/browse/SPARK-14327
>
> Any suggestions for me to troubleshoot this issue?
>
> Thanks,
>
> Renxia