Re: long GC pause during file.cache()

2014-06-16 Thread Wei Tan
Wei Tan, PhD Research Staff Member IBM T. J. Watson Research Center http://researcher.ibm.com/person/us-wtan From: Wei Tan/Watson/IBM@IBMUS To: user@spark.apache.org, Date: 06/16/2014 10:47 AM Subject: Re: long GC pause during file.cache() Thank you all for the advice, including (1) using

Re: long GC pause during file.cache()

2014-06-16 Thread Wei Tan
IBM T. J. Watson Research Center http://researcher.ibm.com/person/us-wtan From: Aaron Davidson To: user@spark.apache.org, Date: 06/15/2014 09:06 PM Subject: Re: long GC pause during file.cache() Note also that Java does not work well with very large JVMs due to this exact issue

Re: long GC pause during file.cache()

2014-06-15 Thread Aaron Davidson
Note also that Java does not work well with very large JVMs due to this exact issue. There are two commonly used workarounds: 1) Spawn multiple (smaller) executors on the same machine. This can be done by creating multiple Workers (via SPARK_WORKER_INSTANCES in standalone mode[1]). 2) Use Tachyon
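
A rough sketch of the first workaround as it might appear in conf/spark-env.sh. The 180 GB total is taken from the original post in this thread; the worker count of 6 and the helper variable TOTAL_WORKER_MEM_GB are illustrative assumptions, not values anyone in the thread recommended.

```shell
# Sketch of conf/spark-env.sh for several smaller workers on one machine.
# TOTAL_WORKER_MEM_GB is a hypothetical helper variable; 180 comes from the
# original post, and the worker count of 6 is only an example.
TOTAL_WORKER_MEM_GB=180
SPARK_WORKER_INSTANCES=6

# Give each worker an equal share so that every executor JVM heap is smaller.
SPARK_WORKER_MEMORY="$(( TOTAL_WORKER_MEM_GB / SPARK_WORKER_INSTANCES ))g"

export SPARK_WORKER_INSTANCES SPARK_WORKER_MEMORY
echo "workers=${SPARK_WORKER_INSTANCES} memory_per_worker=${SPARK_WORKER_MEMORY}"
```

When running more than one worker per machine, it is probably also worth setting SPARK_WORKER_CORES explicitly, since otherwise each worker will try to use all of the machine's cores.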

Re: long GC pause during file.cache()

2014-06-15 Thread Nan Zhu
Yes, I think it is listed in the comments in spark-env.sh.template (didn’t check...) Best, -- Nan Zhu On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote: > Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0? > > On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu (mailto:zhunanm

Re: long GC pause during file.cache()

2014-06-15 Thread Surendranauth Hiraman
Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0? On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu wrote: > SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you > don’t mind the WARNING in the logs > > you can set spark.executor.extraJavaOptions in your SparkConf obj > > Best, > > -- > Nan Zhu > >

Re: long GC pause during file.cache()

2014-06-15 Thread Nan Zhu
SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don’t mind the WARNING in the logs. You can set spark.executor.extraJavaOptions in your SparkConf object instead. Best, -- Nan Zhu On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote: > Hi, Wei > > You may try to set JVM opts in spark-
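
One possible way to follow this advice without touching application code is to put the executor JVM options in conf/spark-defaults.conf, where the property is spelled spark.executor.extraJavaOptions in the Spark configuration documentation. The sketch below assumes a conf directory of ./conf when SPARK_CONF_DIR is unset, and the particular flags are only an example (CMS is the collector suggested in Hao Wang's message; the Print flags just log GC activity).

```shell
# Sketch: move executor JVM flags out of the deprecated SPARK_JAVA_OPTS
# and into spark-defaults.conf. The ./conf fallback and the flag choice
# are illustrative assumptions.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-./conf}"
mkdir -p "$SPARK_CONF_DIR"

# EXECUTOR_GC_OPTS is a hypothetical helper variable for this example.
EXECUTOR_GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# spark-defaults.conf holds one whitespace-separated "key value" pair per line.
printf '%s %s\n' "spark.executor.extraJavaOptions" "$EXECUTOR_GC_OPTS" \
  >> "$SPARK_CONF_DIR/spark-defaults.conf"
```

Heap size should not go into this property; the executor heap is controlled by spark.executor.memory instead.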

Re: long GC pause during file.cache()

2014-06-15 Thread Hao Wang
Hi, Wei
You may try setting JVM options in *spark-env.sh* as follows to prevent or mitigate GC pauses:
export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m"
There are more options you could add; please just Google them :)
Regards, Wang Hao(王灏) CloudTeam | S

long GC pause during file.cache()

2014-06-14 Thread Wei Tan
Hi, I have a single-node (192 GB RAM) standalone Spark setup, with memory configured like this in spark-env.sh:
SPARK_WORKER_MEMORY=180g
SPARK_MEM=180g
In spark-shell I have a program like this:
val file = sc.textFile("/localpath") // file size is 40G
file.cache()
val output = file.map(line =>