Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-09 Thread Tóth Zoltán
Thanks Zoltan. So far I have a full repro that triggers the error both in Docker and on a bigger real-world cluster. Also, the whole thing only happens in `cluster` mode. I filed a ticket for it. Any thoughts? https://issues.apache.org/jira/browse/SPARK-10487 On Mon, Sep 7, 2015 at 7:59 PM, Zsolt Tóth
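
For context, a minimal PySpark sketch of the kind of job discussed in this thread: a Spark 1.5 ML logistic regression whose predictions are written to Parquet. This is not the repro attached to SPARK-10487; the data, path, and parameters are illustrative assumptions.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.classification import LogisticRegression
from pyspark.mllib.linalg import Vectors

sc = SparkContext(appName="logreg-parquet-sketch")
sqlContext = SQLContext(sc)

# Tiny labeled DataFrame with the label/features schema the ML API expects.
training = sqlContext.createDataFrame([
    (1.0, Vectors.dense(0.0, 1.1, 0.1)),
    (0.0, Vectors.dense(2.0, 1.0, -1.0)),
    (0.0, Vectors.dense(2.0, 1.3, 1.0)),
    (1.0, Vectors.dense(0.0, 1.2, -0.5))], ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(training)

# The Parquet write is the step the thread reports failing in cluster mode.
model.transform(training).write.parquet("hdfs:///tmp/logreg_predictions")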

Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-07 Thread Zsolt Tóth
Hi, I ran your example on Spark-1.4.1 and 1.5.0-rc3. It succeeds on 1.4.1 but throws the OOM on 1.5.0. Do any of you know which PR introduced this issue? Zsolt 2015-09-07 16:33 GMT+02:00 Zoltán Zvara : > Hey, I'd try to debug, profile ResolvedDataSource. As far as I know, your > write will b

Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-07 Thread Zoltán Zvara
Hey, I'd try to debug, profile ResolvedDataSource. As far as I know, your write will be performed by the JVM. On Mon, Sep 7, 2015 at 4:11 PM Tóth Zoltán wrote: > Unfortunately I'm getting the same error: > The other interesting things are that: > - the parquet files got actually written to HDFS

Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-07 Thread Tóth Zoltán
Unfortunately I'm getting the same error. The other interesting things are that: - the Parquet files actually got written to HDFS (also with .write.parquet()) - the application gets stuck in the RUNNING state for good even after the error is thrown 15/09/07 10:01:10 INFO spark.ContextCleaner: C

Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-07 Thread boci
Hi, Can you try using the save method instead of write? e.g.: out_df.save("path","parquet") b0c1 -- Skype: boci13, Hangout: boci.b...@gmail.com On Mon, Sep 7, 2015 at 3:
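
For reference, a minimal sketch of the two entry points being compared, using a small stand-in DataFrame (the real out_df comes from the logreg job); in Spark 1.4+ the older save() is deprecated in favour of the DataFrameWriter API, and the paths below are placeholders.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="save-vs-write-sketch")
sqlContext = SQLContext(sc)

# Stand-in for the out_df from the thread.
out_df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Older, deprecated entry point suggested above:
out_df.save("hdfs:///tmp/out_old", "parquet")

# DataFrameWriter API used in the original report:
out_df.write.parquet("hdfs:///tmp/out_new")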

Re: OutOfMemory error with Spark ML 1.5 logreg example

2015-09-07 Thread Zoltán Tóth
Aaand, the error! :) Exception in thread "org.apache.hadoop.hdfs.PeerCache@4e000abf" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "org.apache.hadoop.hdfs.PeerCache@4e000abf" Exception in thread "Thread-7" Exception: java.lang.OutOfMemoryError thrown from

Re: OutOfMemory error in Spark Core

2015-01-15 Thread Akhil Das
Did you try increasing the parallelism? Thanks Best Regards On Fri, Jan 16, 2015 at 10:41 AM, Anand Mohan wrote: > We have our Analytics App built on Spark 1.1 Core, Parquet, Avro and Spray. > We are using Kryo serializer for the Avro objects read from Parquet and we > are using our custom Kryo
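
For reference, a sketch of the knobs mentioned here for a Spark 1.x PySpark job: raising default parallelism and using Kryo with a custom registrator. The registrator class name and the parallelism value are assumptions, not values from the original setup.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("parallelism-kryo-sketch")
        .set("spark.default.parallelism", "200")  # more, smaller partitions
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator", "com.example.MyKryoRegistrator"))  # hypothetical class

sc = SparkContext(conf=conf)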

RE: OutOfMemory Error

2014-08-20 Thread Shao, Saisai
/configuration.html Thanks Jerry From: MEETHU MATHEW [mailto:meethu2...@yahoo.co.in] Sent: Wednesday, August 20, 2014 4:48 PM To: Akhil Das; Ghousia Cc: user@spark.apache.org Subject: Re: OutOfMemory Error Hi , How to increase the heap size? What is the difference between spark executor memory and heap

Re: OutOfMemory Error

2014-08-20 Thread MEETHU MATHEW
Hi, How do I increase the heap size? What is the difference between Spark executor memory and heap size? Thanks & Regards, Meethu M On Monday, 18 August 2014 12:35 PM, Akhil Das wrote: I believe spark.shuffle.memoryFraction is the one you are looking for. spark.shuffle.memoryFraction
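
For what it's worth, spark.executor.memory is what sets each executor JVM's heap (effectively its -Xmx), so increasing the executor heap means raising that setting; the driver heap is controlled separately. A minimal sketch with placeholder values:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("executor-memory-sketch")
        .set("spark.executor.memory", "4g"))  # executor JVM heap, placeholder value

sc = SparkContext(conf=conf)

# The driver heap has to be set before the driver JVM starts, e.g.:
#   spark-submit --driver-memory 4g --executor-memory 4g your_app.py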

Re: OutOfMemory Error

2014-08-19 Thread Ghousia
Hi, Any further info on this? Do you think it would be useful if we had an in-memory buffer implemented that stores the contents of the new RDD? Once the buffer reaches a configured threshold, its contents are spilled to the local disk. This would save us from the OutOfMemory error. Appreci

Re: OutOfMemory Error

2014-08-18 Thread Ghousia
But this would be applicable only to operations that have a shuffle phase. This might not be applicable to a simple Map operation where a record is mapped to a new huge value, resulting in OutOfMemory Error. On Mon, Aug 18, 2014 at 12:34 PM, Akhil Das wrote: > I believe spark.shuffle.memoryFr

Re: OutOfMemory Error

2014-08-18 Thread Akhil Das
I believe spark.shuffle.memoryFraction is the one you are looking for. spark.shuffle.memoryFraction : Fraction of Java heap to use for aggregation and cogroups during shuffles, if spark.shuffle.spill is true. At any given time, the collective size of all in-memory maps used for shuffles is bounded
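
A sketch of setting the option Akhil points to (Spark 1.x); the 0.4 value is an arbitrary example, not a recommendation from the thread.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("shuffle-memory-sketch")
        .set("spark.shuffle.memoryFraction", "0.4")  # default is 0.2 in Spark 1.x
        .set("spark.shuffle.spill", "true"))         # spill to disk past the limit

sc = SparkContext(conf=conf)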

Re: OutOfMemory Error

2014-08-17 Thread Ghousia
Thanks for the answer Akhil. We are currently working around this issue by increasing the number of partitions, and we are persisting RDDs to DISK_ONLY. But the issue is with heavy computations within an RDD. It would be better if we had the option of spilling the intermediate transformation resul
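
A sketch of the two workarounds described here: more partitions plus DISK_ONLY persistence. The RDD contents and partition count are illustrative assumptions.

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="repartition-diskonly-sketch")

rdd = sc.parallelize(range(1000000))

# More, smaller partitions so each task holds less in memory at once.
repartitioned = rdd.repartition(200)

# Keep the intermediate result on disk only, not in executor memory.
repartitioned.persist(StorageLevel.DISK_ONLY)
repartitioned.count()  # materialize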

Re: OutOfMemory Error

2014-08-17 Thread Akhil Das
Hi Ghousia, You can try the following: 1. Increase the heap size 2. Increase the number of partitions 3. You could try persi