Thanks for the suggestions.

I removed the "persist" call from the program (a sketch of that change is below). With that change, I started it with:

spark-submit --class com.xxx.analytics.spark.AnalyticsJob --master yarn
/tmp/analytics.jar --input_directory hdfs://ip:8020/flume/events/2015/02/


This takes all the defaults and only runs 2 executors. It runs with no
failures but takes 17 hours.
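For reference, the persist change was essentially the following. This is only a rough sketch, not our actual code; "sc" is the SparkContext and "inputDirectory" stands in for whatever we pass as --input_directory:

    import org.apache.spark.storage.StorageLevel

    // Before: pinned the whole input RDD in memory, which doesn't fit on 8 GB boxes
    // val events = sc.textFile(inputDirectory).persist(StorageLevel.MEMORY_ONLY)

    // After: no persist at all
    val events = sc.textFile(inputDirectory)

    // Possible alternative if the RDD really is reused: spill to disk instead of recomputing
    // val events = sc.textFile(inputDirectory).persist(StorageLevel.MEMORY_AND_DISK_SER)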


After this I tried to run it with:

spark-submit --class com.extole.analytics.spark.AnalyticsJob
--num-executors 5 --executor-cores 2 --master yarn /tmp/analytics.jar
--input_directory
hdfs://ip-10-142-198-50.ec2.internal:8020/flume/events/2015/02/

This results in lots of executor failures and restarts, and I can't seem to
get any kind of parallelism or throughput. The next thing I'll try is setting
the YARN memory overhead.
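If I'm reading the container-kill message quoted further down correctly, the 2400m executor heap plus the default overhead (which I believe bottoms out around 384 MB) works out to roughly the 2.7 GB container limit YARN enforces, and actual usage hit 2.8 GB, hence the kills. So the next run will be something along these lines (the numbers are just a first guess for 8 GB / 2-CPU boxes, not a recommendation):

spark-submit --class com.extole.analytics.spark.AnalyticsJob --master yarn \
  --num-executors 5 --executor-cores 2 --executor-memory 2g \
  --conf spark.yarn.executor.memoryOverhead=768 \
  /tmp/analytics.jar \
  --input_directory hdfs://ip-10-142-198-50.ec2.internal:8020/flume/events/2015/02/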


What other configs should I list to help figure out the sweet spot here?
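For what it's worth, these are the ones I was planning to pull together (plus anything else you think matters):

  - spark.executor.memory / --executor-memory
  - spark.executor.cores / --executor-cores
  - spark.executor.instances / --num-executors
  - spark.yarn.executor.memoryOverhead (and spark.yarn.driver.memoryOverhead)
  - spark.default.parallelism
  - yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb
    on the NodeManagers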




On Sat, Feb 21, 2015 at 12:29 AM, Davies Liu <dav...@databricks.com> wrote:

> How many executors do you have per machine? It would be helpful if you
> could list all the configs.
>
> Could you also try to run it without persist? Caching can do more harm
> than good if you don't have enough memory.
>
> On Fri, Feb 20, 2015 at 5:18 PM, Lee Bierman <leebier...@gmail.com> wrote:
> > Thanks for the suggestions.
> > I'm experimenting with different values for the Spark memoryOverhead setting
> > and explicitly giving the executors more memory, but I still haven't found
> > the happy medium that gets it to finish in a reasonable time frame.
> >
> > Is my cluster massively undersized at 5 boxes with 8 GB RAM and 2 CPUs each?
> > I'm trying to figure out memory and executor settings so it runs in many
> > containers in parallel.
> >
> > I'm still struggling, since Pig and Hive jobs over the same whole data set
> > don't take as long. I'm also wondering whether the logic in our code is
> > doing something silly that causes multiple reads of all the data.
> >
> >
> > On Fri, Feb 20, 2015 at 9:45 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >>
> >> If that's the error you're hitting, the fix is to boost
> >> spark.yarn.executor.memoryOverhead, which will put some extra room in
> >> between the executor heap sizes and the amount of memory requested
> >> for them from YARN.
> >>
> >> -Sandy
> >>
> >> On Fri, Feb 20, 2015 at 9:40 AM, lbierman <leebier...@gmail.com> wrote:
> >>>
> >>> A bit more context on this issue, from the container logs on the
> >>> executor.
> >>>
> >>> Given my cluster specs above, what would be appropriate parameters to
> >>> pass in for:
> >>> --num-executors --num-cores --executor-memory
> >>>
> >>> I had tried it with --executor-memory 2500MB
> >>>
> >>> 2015-02-20 06:50:09,056 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=23320,containerID=container_1423083596644_0238_01_004160] is running beyond physical memory limits. Current usage: 2.8 GB of 2.7 GB physical memory used; 4.4 GB of 5.8 GB virtual memory used. Killing container.
> >>> Dump of the process-tree for container_1423083596644_0238_01_004160 :
> >>>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >>>         |- 23320 23318 23320 23320 (bash) 0 0 108650496 305 /bin/bash -c /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms2400m -Xmx2400m -Djava.io.tmpdir=/dfs/yarn/nm/usercache/root/appcache/application_1423083596644_0238/container_1423083596644_0238_01_004160/tmp -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/CoarseGrainedScheduler 8 ip-10-99-162-56.ec2.internal 1 application_1423083596644_0238 1> /var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160/stdout 2> /var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160/stderr
> >>>         |- 23323 23320 23320 23320 (java) 922271 12263 4612222976 724218 /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms2400m -Xmx2400m -Djava.io.tmpdir=/dfs/yarn/nm/usercache/root/appcache/application_1423083596644_0238/container_1423083596644_0238_01_004160/tmp -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse
> >>>
> >>>
> >>>
> >>>
> >>
> >
>
