Thanks for the suggestions. I removed the persist() call from the program.
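For reference, the change amounts to roughly the sketch below. This is only an
illustration of what dropping persist means in our job; the variable names,
inputDirectory, and parseEvent are placeholders, not the real code:

  import org.apache.spark.storage.StorageLevel

  // What we had before: the parsed RDD was pinned in executor memory.
  val cachedEvents = sc.textFile(inputDirectory)
    .map(parseEvent)
    .persist(StorageLevel.MEMORY_ONLY)

  // What we run now: no persist(), so each action recomputes the RDD from
  // HDFS. That costs extra reads but needs far less executor memory.
  val events = sc.textFile(inputDirectory)
    .map(parseEvent)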
With that change I started the job with:

  spark-submit --class com.xxx.analytics.spark.AnalyticsJob --master yarn \
    /tmp/analytics.jar \
    --input_directory hdfs://ip:8020/flume/events/2015/02/

This takes all the defaults and runs only 2 executors. It runs with no
failures but takes 17 hours. After this I tried to run it with:

  spark-submit --class com.extole.analytics.spark.AnalyticsJob \
    --num-executors 5 --executor-cores 2 --master yarn \
    /tmp/analytics.jar \
    --input_directory hdfs://ip-10-142-198-50.ec2.internal:8020/flume/events/2015/02/

This results in lots of executor failures and restarts, and I can't seem to
get any kind of parallelism or throughput. The next thing I'll try is setting
the YARN memory overhead. What other configs should I list to help figure out
the sweet spot here?

On Sat, Feb 21, 2015 at 12:29 AM, Davies Liu <dav...@databricks.com> wrote:
> How many executors do you have per machine? It would be helpful if you
> could list all the configs.
>
> Could you also try to run it without persist? Caching can hurt more than
> help if you don't have enough memory.
>
> On Fri, Feb 20, 2015 at 5:18 PM, Lee Bierman <leebier...@gmail.com> wrote:
> > Thanks for the suggestions.
> > I'm experimenting with different values for spark memoryOverhead and
> > explicitly giving the executors more memory, but I still have not found
> > the happy medium that gets it to finish in a reasonable time frame.
> >
> > Is my cluster massively undersized at 5 boxes, 8 GB / 2 CPU each?
> > I'm trying to figure out memory and executor settings so it runs on
> > many containers in parallel.
> >
> > I'm still struggling, as Pig jobs and Hive jobs on the same whole data
> > set don't take as long. I'm also wondering if the logic in our code is
> > just doing something silly, causing multiple reads of all the data.
> >
> >
> > On Fri, Feb 20, 2015 at 9:45 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >>
> >> If that's the error you're hitting, the fix is to boost
> >> spark.yarn.executor.memoryOverhead, which will put some extra room in
> >> between the executor heap sizes and the amount of memory requested for
> >> them from YARN.
> >>
> >> -Sandy
> >>
> >> On Fri, Feb 20, 2015 at 9:40 AM, lbierman <leebier...@gmail.com> wrote:
> >>>
> >>> A bit more context on this issue, from the container logs on the
> >>> executor.
> >>>
> >>> Given my cluster specs above, what would be appropriate parameters to
> >>> pass in for:
> >>> --num-executors --num-cores --executor-memory
> >>>
> >>> I had tried it with --executor-memory 2500MB
> >>>
> >>> 2015-02-20 06:50:09,056 WARN
> >>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> >>> Container [pid=23320,containerID=container_1423083596644_0238_01_004160]
> >>> is running beyond physical memory limits. Current usage: 2.8 GB of
> >>> 2.7 GB physical memory used; 4.4 GB of 5.8 GB virtual memory used.
> >>> Killing container.
> >>> Dump of the process-tree for container_1423083596644_0238_01_004160:
> >>> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> >>>    SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >>> |- 23320 23318 23320 23320 (bash) 0 0 108650496 305 /bin/bash -c
> >>>    /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p'
> >>>    -Xms2400m -Xmx2400m
> >>>    -Djava.io.tmpdir=/dfs/yarn/nm/usercache/root/appcache/application_1423083596644_0238/container_1423083596644_0238_01_004160/tmp
> >>>    -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160
> >>>    org.apache.spark.executor.CoarseGrainedExecutorBackend
> >>>    akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/CoarseGrainedScheduler
> >>>    8 ip-10-99-162-56.ec2.internal 1 application_1423083596644_0238
> >>>    1> /var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160/stdout
> >>>    2> /var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160/stderr
> >>> |- 23323 23320 23320 23320 (java) 922271 12263 4612222976 724218
> >>>    /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError=kill %p
> >>>    -Xms2400m -Xmx2400m
> >>>    -Djava.io.tmpdir=/dfs/yarn/nm/usercache/root/appcache/application_1423083596644_0238/container_1423083596644_0238_01_004160/tmp
> >>>    -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1423083596644_0238/container_1423083596644_0238_01_004160
> >>>    org.apache.spark.executor.CoarseGrainedExecutorBackend
> >>>    akka.tcp://sparkDriver@ip-10-168-86-13.ec2.internal:42535/user/Coarse
> >>>
> >>> --
> >>> View this message in context:
> >>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p21739.html
> >>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >>> For additional commands, e-mail: user-h...@spark.apache.org
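To spell out the arithmetic behind that kill message (my reading of the
defaults, so take it with a grain of salt): the executor heap is 2400 MB
(-Xms2400m/-Xmx2400m) and spark.yarn.executor.memoryOverhead defaults to only
384 MB here, so YARN caps the container at roughly 2.7 GB, while the process
actually used 2.8 GB. Hence the kill, and hence setting the overhead
explicitly as the next try. The command would look something like the
following; the executor count, cores, memory, and overhead values are rough
placeholders for the 5-box, 8 GB / 2 CPU cluster, not tuned numbers:

  spark-submit --class com.xxx.analytics.spark.AnalyticsJob \
    --master yarn \
    --num-executors 8 \
    --executor-cores 1 \
    --executor-memory 2g \
    --conf spark.yarn.executor.memoryOverhead=768 \
    /tmp/analytics.jar \
    --input_directory hdfs://ip:8020/flume/events/2015/02/

The idea is that each executor asks YARN for about 2 GB of heap plus 768 MB of
overhead, so two executors per 8 GB box should still leave headroom for the OS
and the NodeManager.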