Re: GC overhead limit exceeded

2016-05-16 Thread Takeshi Yamamuro
To understand the issue, you need to describe your case in more detail: which version of Spark are you using, and what does your job do? Also, what happens if you use the Scala interfaces directly instead of the Python ones? On Mon, May 16, 2016 at 11:56 PM, Aleksandr Modestov < aleksandrmodes...@gmail.com> wrote: > Hi, > >

Re: GC overhead limit exceeded

2016-05-16 Thread Aleksandr Modestov
Hi, "Why did you though you have enough memory for your task? You checked task statistics in your WebUI?". I mean that I have jnly about 5Gb data but spark.driver memory in 60Gb. I check task statistics in web UI. But really spark says that *"05-16 17:50:06.254 127.0.0.1:54321

Re: GC overhead limit exceeded

2016-05-16 Thread Takeshi Yamamuro
Hi, Why did you think you had enough memory for your task? Did you check the task statistics in the WebUI? Anyway, if you get stuck with the GC issue, you'd be better off increasing the number of partitions. // maropu On Mon, May 16, 2016 at 10:00 PM, AlexModestov wrote: > I get the error in the apa
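A minimal sketch of that advice (increasing the number of partitions), assuming an RDD-based job; the input path, application name, and partition counts are made-up placeholders and would need tuning for the actual data and cluster:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gc-repartition-sketch"))

    // Reading with more partitions keeps each task's working set small,
    // which tends to reduce GC pressure on the executors.
    val rawData = sc.textFile("hdfs:///path/to/input", 200)

    // An RDD that was created too coarsely can also be split further.
    val finer = rawData.repartition(400)

    println(finer.count())
    sc.stop()
  }
}
```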

Re: GC overhead limit exceeded

2014-04-16 Thread Nicholas Chammas
But wait, does Spark know to unpersist() RDDs that are not referenced anywhere? That would’ve taken care of the RDDs that I kept creating and then orphaning as part of my job testing/profiling. Is that what SPARK-1103 is about, btw? (Sorry to keep

Re: GC overhead limit exceeded

2014-04-16 Thread Nicholas Chammas
Never mind. I'll take it from both Andrew and Syed's comments that the answer is yes. Dunno why I thought otherwise. On Wed, Apr 16, 2014 at 5:43 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I’m running into a similar issue as the OP. I’m running the same job over > and over (with

Re: GC overhead limit exceeded

2014-04-16 Thread Nicholas Chammas
I’m running into a similar issue as the OP. I’m running the same job over and over (with minor tweaks) in the same cluster to profile it. It just recently started throwing java.lang.OutOfMemoryError: Java heap space. > Are you caching a lot of RDD's? If so, maybe you should unpersist() the > ones

Re: GC overhead limit exceeded

2014-03-28 Thread Syed A. Hashmi
Default is MEMORY_ONLY ... if you explicitly persist an RDD, you have to explicitly unpersist it if you want to free memory during the job. On Thu, Mar 27, 2014 at 11:17 PM, Sai Prasanna wrote: > Oh sorry, that was a mistake, the default level is MEMORY_ONLY !! > My doubt was, between two differe
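A minimal sketch of that persist/unpersist lifecycle, assuming an existing SparkContext named sc; the path, function name, and variable names are illustrative only:

```scala
import org.apache.spark.SparkContext

def intermediateWork(sc: SparkContext): Unit = {
  val lines = sc.textFile("hdfs:///path/to/input")

  // persist() with no arguments uses the default level, MEMORY_ONLY.
  val cleaned = lines.filter(_.nonEmpty).persist()

  val total    = cleaned.count()            // first action materializes the cache
  val distinct = cleaned.distinct().count() // reuses the cached blocks

  // Explicitly release the cached blocks once they are no longer needed;
  // otherwise they occupy executor memory for the rest of the job.
  cleaned.unpersist()

  println(s"total=$total distinct=$distinct")
}
```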

Re: GC overhead limit exceeded

2014-03-27 Thread Sai Prasanna
Oh sorry, that was a mistake, the default level is MEMORY_ONLY !! My doubt was, between two different experiments, do the RDDs cached in memory need to be unpersisted? Or does it not matter?

Re: GC overhead limit exceeded

2014-03-27 Thread Sai Prasanna
I didn't mention anything, so by default it should be MEMORY_AND_DISK, right? My doubt was, between two different experiments, do the RDDs cached in memory need to be unpersisted? Or does it not matter? On Fri, Mar 28, 2014 at 1:43 AM, Syed A. Hashmi wrote: > Which storage scheme are you using?

Re: GC overhead limit exceeded

2014-03-27 Thread Syed A. Hashmi
Which storage scheme are you using? I am guessing it is MEMORY_ONLY. For large datasets, MEMORY_AND_DISK or MEMORY_AND_DISK_SER work better. You can call unpersist on an RDD to remove it from the cache, though. On Thu, Mar 27, 2014 at 11:57 AM, Sai Prasanna wrote: > No, I am running on 0.8.1. > Yes, I
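A short sketch of switching to a serialized, disk-backed storage level, again assuming an existing SparkContext named sc; the input path and the map step are made up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

def cacheSerialized(sc: SparkContext): Unit = {
  val big = sc.textFile("hdfs:///path/to/large-input")

  // MEMORY_AND_DISK_SER stores partitions as serialized bytes (smaller heap
  // footprint) and spills whatever does not fit in memory to local disk.
  val cached = big.map(_.toUpperCase).persist(StorageLevel.MEMORY_AND_DISK_SER)

  println(cached.count())

  cached.unpersist() // drop it from the cache once it is no longer needed
}
```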

Re: GC overhead limit exceeded

2014-03-27 Thread Sai Prasanna
No, I am running on 0.8.1. Yes, I am caching a lot. I am benchmarking some simple code in Spark where 512 MB, 1 GB and 2 GB text files are taken, some basic intermediate operations are done, and the intermediate results that will be used in subsequent operations are cached. I thought that we need not m

Re: GC overhead limit exceeded

2014-03-27 Thread Andrew Or
Are you caching a lot of RDD's? If so, maybe you should unpersist() the ones that you're not using. Also, if you're on 0.9, make sure spark.shuffle.spill is enabled (which it is by default). This allows your application to spill in-memory content to disk if necessary. How much memory are you givin
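For reference, a hedged sketch of how those settings could be expressed through SparkConf in application code; the memory value is purely illustrative, and spark.shuffle.spill is already enabled by default as noted above:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleSpillSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("gc-tuning-sketch")
      .set("spark.shuffle.spill", "true")  // allow in-memory shuffle data to spill to disk
      .set("spark.executor.memory", "4g")  // heap given to each executor (example value)

    val sc = new SparkContext(conf)
    // ... job body ...
    sc.stop()
  }
}
```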

Re: GC overhead limit exceeded

2014-03-27 Thread Ognen Duzlevski
Look at the tuning guide on Spark's webpage for strategies to cope with this. I have run into quite a few memory issues like these; some are resolved by changing the StorageLevel strategy and employing things like Kryo, and some are solved by specifying the number of tasks to break down a given ope
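A rough sketch of those two knobs (Kryo serialization and more tasks per operation); the keying step, input path, and partition count are hypothetical, and registering your own classes with Kryo would shrink the serialized data further:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KryoAndPartitionsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-and-partitions-sketch")
      // Kryo usually produces far more compact serialized data than Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    val sc = new SparkContext(conf)

    // Asking for more partitions breaks the operation into more, smaller tasks,
    // so each task holds less data in memory at once.
    val grouped = sc.textFile("hdfs:///path/to/input")
      .map(_.split(",")(0))
      .groupBy((key: String) => key, 400)

    println(grouped.count())
    sc.stop()
  }
}
```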

Re: GC overhead limit exceeded

2014-03-27 Thread Sean Owen
This is another way of Java saying "you ran out of heap space". As less and less room is available, the GC kicks in more often, freeing less each time. Before the very last byte of memory is gone, Java may declare defeat. That's why it's taking so long, and you simply need a larger heap in whatever

Re: GC overhead limit exceeded in Spark-interactive shell

2014-03-24 Thread Sai Prasanna
Thanks Aaron !! On Mon, Mar 24, 2014 at 10:58 PM, Aaron Davidson wrote: > 1. Note sure on this, I don't believe we change the defaults from Java. > > 2. SPARK_JAVA_OPTS can be used to set the various Java properties (other > than memory heap size itself) > > 3. If you want to have 8 GB executor

Re: GC overhead limit exceeded in Spark-interactive shell

2014-03-24 Thread Aaron Davidson
1. Not sure on this, I don't believe we change the defaults from Java. 2. SPARK_JAVA_OPTS can be used to set the various Java properties (other than memory heap size itself) 3. If you want to have 8 GB executors then, yes, only two can run on each 16 GB node. (In fact, you should also keep a sig
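On the memory point, a sketch of the pre-SparkConf style used around Spark 0.8/0.9, where per-application settings such as spark.executor.memory are supplied as Java system properties before the SparkContext is created; the master URL, app name, and 8g value are placeholders (two such executors would fill a 16 GB worker, as noted above):

```scala
import org.apache.spark.SparkContext

object ExecutorMemorySketch {
  def main(args: Array[String]): Unit = {
    // Must be set before the SparkContext is constructed so it is picked up.
    System.setProperty("spark.executor.memory", "8g")

    val sc = new SparkContext("spark://master:7077", "executor-memory-sketch")
    println(sc.parallelize(1 to 1000).count())
    sc.stop()
  }
}
```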

Re: GC overhead limit exceeded in Spark-interactive shell

2014-03-24 Thread Sai Prasanna
Thanks Aaron and Sean... Setting SPARK_MEM finally worked. But I have a small doubt. 1) What is the default value that is allocated for the JVM and for the garbage collector's heap space? 2) Usually we set 1/3 of total memory for the heap, so what should the practice be for Spark processes? Where & how shoul

Re: GC overhead limit exceeded in Spark-interactive shell

2014-03-24 Thread Sean Owen
PS you have a typo in "DEAMON" - it's DAEMON. Thanks Latin. On Mar 24, 2014 7:25 AM, "Sai Prasanna" wrote: > Hi All !! I am getting the following error in interactive spark-shell > [0.8.1] > > > *org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed more > than 0 times; aborting job jav

Re: GC overhead limit exceeded in Spark-interactive shell

2014-03-24 Thread Aaron Davidson
To be clear on what your configuration will do: - SPARK_DAEMON_MEMORY=8g will make your standalone master and worker schedulers have a lot of memory. These do not impact the actual amount of useful memory given to executors or your driver, however, so you probably don't need to set this. - SPARK_W