What does your Spark job do? Have you tried the standard configuration first
and then changed settings gradually?
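
For example, a plain spark-submit in yarn-client mode with explicit executor sizing is a reasonable baseline to start from. This is only a sketch; the class name, jar and numbers below are placeholders to adapt to your job, not recommendations:

  spark-submit \
    --master yarn-client \
    --num-executors 8 \
    --executor-cores 4 \
    --executor-memory 6g \
    --driver-memory 5g \
    --conf spark.sql.autoBroadcastJoinThreshold=-1 \
    --conf spark.driver.maxResultSize=2g \
    --class com.example.YourJob your-job.jar

Keep in mind that on YARN an executor only gets what its container is allowed to have, so it is worth comparing --executor-memory plus the YARN memory overhead against yarn.scheduler.maximum-allocation-mb, and --executor-cores against yarn.scheduler.maximum-allocation-vcores.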

Have you checked the log files / UI to see which tasks take long?
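
Since your yarn-site.xml has yarn.log-aggregation-enable set to true, the container logs can be pulled after the run; the application id here is just a placeholder:

  yarn logs -applicationId application_1496700000000_0001 > app.log

While the job is running, the driver's Spark UI (port 4040 by default in client mode) shows per-stage and per-task durations, shuffle read/write sizes and GC time, which usually points to the slow part faster than the raw logs.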

17 million records does not sound like much, but it depends on what you do with them.

I do not think a special scheduling configuration makes sense for such a small
"cluster".

> On 6. Jun 2017, at 18:02, satishjohn <satish.johnbo...@gmail.com> wrote:
> 
> The Spark job takes about 4x longer on YARN than in Spark standalone mode.
> However, in standalone mode the jobs often fail with executor-lost errors.
> 
> Hardware configuration:
> 
> 3 nodes (1 master and 2 workers), each with 32 GB RAM, 8 cores (16), and a 1 TB HDD
> 
> Spark configuration:
> 
> spark.executor.memory=7g
> spark.cores.max=96
> spark.driver.memory=5g
> spark.driver.maxResultSize=2g
> spark.sql.autoBroadcastJoinThreshold=-1 (without this setting the job fails or takes ~50x longer)
> Executor instances: 4 per machine
> 
> With the above Spark configuration, the job for a business flow of 17
> million records completes in 8 minutes in standalone mode.
> 
> Problem area:
> 
> When run in yarn-client mode with the configuration below, the same flow
> takes 33 to 42 minutes. Here is the yarn-site.xml configuration:
> 
> <configuration>
>  <property><name>yarn.label.enabled</name><value>true</value></property>
> 
> <property><name>yarn.log-aggregation.enable-local-cleanup</name><value>false</value></property>
> 
> <property><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>64</value></property>
> 
> <property><name>yarn.resourcemanager.resource-tracker.address</name><value>satish-NS1:8031</value></property>
> 
> <property><name>yarn.resourcemanager.scheduler.address</name><value>satish-NS1:8030</value></property>
> 
> <property><name>yarn.dispatcher.exit-on-error</name><value>true</value></property>
> 
> <property><name>yarn.nodemanager.container-manager.thread-count</name><value>64</value></property>
> 
> <property><name>yarn.nodemanager.local-dirs</name><value>/home/satish/yarn</value></property>
> 
> <property><name>yarn.nodemanager.localizer.fetch.thread-count</name><value>20</value></property>
> 
> <property><name>yarn.resourcemanager.address</name><value>satish-NS1:8032</value></property>
> 
> <property><name>yarn.scheduler.increment-allocation-mb</name><value>512</value></property>
> 
> <property><name>yarn.log.server.url</name><value>http://satish-NS1:19888/jobhistory/logs</value></property>
> 
> <property><name>yarn.nodemanager.resource.memory-mb</name><value>28000</value></property>
> 
> <property><name>yarn.nodemanager.labels</name><value>MASTER</value></property>
> 
> <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>48</value></property>
> 
> <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
> 
> <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
> 
> <property><name>yarn.nodemanager.localizer.client.thread-count</name><value>20</value></property>
> 
> <property><name>yarn.app.mapreduce.am.labels</name><value>CORE</value></property>
> 
> <property><name>yarn.log-aggregation.retain-seconds</name><value>172800</value></property>
> 
> <property><name>yarn.nodemanager.address</name><value>${yarn.nodemanager.hostname}:8041</value></property>
> 
> <property><name>yarn.resourcemanager.hostname</name><value>satish-NS1</value></property>
> 
> <property><name>yarn.scheduler.maximum-allocation-mb</name><value>8192</value></property>
> 
> <property><name>yarn.nodemanager.remote-app-log-dir</name><value>/home/satish/satish/hadoop-yarn/apps</value></property>
> 
> <property><name>yarn.resourcemanager.resource-tracker.client.thread-count</name><value>64</value></property>
> 
> <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>1</value></property>
> 
> <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle,</value></property>
> 
> <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
> 
> <property><name>yarn.resourcemanager.client.thread-count</name><value>64</value></property>
> 
> <property><name>yarn.nodemanager.container-metrics.enable</name><value>true</value></property>
> 
> <property><name>yarn.nodemanager.log-dirs</name><value>/home/satish/hadoop-yarn/containers</value></property>
> <property><name>yarn.nodemanager.aux-services</name><value>spark_shuffle,mapreduce_shuffle</value></property>
> 
> <property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
> 
> <property><name>yarn.nodemanager.aux-services.spark_shuffle.class</name><value>org.apache.spark.network.yarn.YarnShuffleService</value></property>
> 
> <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
> 
> <property><name>yarn.scheduler.increment-allocation-vcores</name><value>1</value></property>
> 
> <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
> 
> <property><name>yarn.scheduler.fair.preemption</name><value>true</value></property>
> 
> </configuration>
> 
> Also, with the capacity scheduler I am using the DominantResourceCalculator.
> I have tried the fair and default schedulers as well.
> 
> To make the test simple, I ran a sort on the same cluster in yarn-client
> mode and in Spark standalone mode. I can share the data for your
> comparative analysis as well.
> 
> 136 seconds - Yarn-client mode
> 40 seconds  - Spark Standalone mode
> 
> To conclude, I am looking for the reason behind the yarn-client mode
> performance issue and for the best possible configuration to get good
> performance out of YARN.
> 
> When I use spark.sql.autoBroadcastJoinThreshold=-1, the long-running jobs
> complete in time and also fail less often; I have a history of issues when
> running Spark jobs without this option enabled.
> 
> Let me know how to get comparable performance out of yarn-client mode or
> Spark standalone.
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Performance-issue-when-running-Spark-1-6-1-in-yarn-client-mode-with-Hadoop-2-6-0-tp28747.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
