Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-23 Thread Shivani Rao
Hello Eugene, Thanks for your patience and answers. The issue was that one of the third-party libraries was not built with "sbt assembly" but just packaged with "sbt package". So it did not contain all the source dependencies. Thanks for all your help Shivani On Fri, Jun 20, 2014 at 1:46 PM, Eug
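
One way to catch this kind of packaging problem early is to look inside the jar and check whether the dependency's classes are actually there. The following is only a minimal sketch: the helper names, the `com/example/job` entries, and the in-memory demo jars are all hypothetical, and `argonaut` is used as the package to look for only because it is the library mentioned in this thread.

```python
import io
import zipfile


def jar_contains_package(jar, package):
    """Return True if the jar (a path or file-like object) holds at least
    one .class file under the given dotted package name."""
    prefix = package.replace(".", "/") + "/"
    with zipfile.ZipFile(jar) as zf:
        return any(
            name.startswith(prefix) and name.endswith(".class")
            for name in zf.namelist()
        )


def build_demo_jar(entries):
    """Build an in-memory jar (a plain zip archive) of empty files.
    Hypothetical helper, used only to keep this example self-contained."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for entry in entries:
            zf.writestr(entry, b"")
    buf.seek(0)
    return buf


# A jar that holds only the project's own classes (no bundled dependencies).
thin = build_demo_jar(["com/example/job/Main.class"])
# A jar that also bundles the classes of a dependency.
fat = build_demo_jar(["com/example/job/Main.class", "argonaut/Json.class"])

print(jar_contains_package(thin, "argonaut"))
print(jar_contains_package(fat, "argonaut"))
```

In practice `jar` would be the path to the real jar file; a missing package there would suggest the jar was packaged without its dependencies.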

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Eugen Cepoi
In short, ADD_JARS will add the jar to your driver classpath and also send it to the workers (similar to what you are doing when you call sc.addJar). ex: MASTER=master/url ADD_JARS=/path/to/myJob.jar ./bin/spark-shell You also have SPARK_CLASSPATH var but it does not distribute the code, it is on

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Shivani Rao
Hello Eugene, You are right about this. I did encounter the "PermGen space" error in the spark-shell. Can you tell me a little more about "ADD_JARS"? In order to ensure my spark-shell has all required jars, I added the jars to the "$CLASSPATH" in the compute_classpath.sh script. Is there another way of

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Eugen Cepoi
In my case it was due to a case class I was defining in the spark-shell that was not available on the workers. So packaging it in a jar and adding it with ADD_JARS solved the problem. Note that I don't exactly remember if it was an out of heap space exception or PermGen space. Make sure your jarsP

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Shivani Rao
Hello Abhi, I did try that and it did not work. And Eugene, yes, I am assembling the argonaut libraries in the fat jar. So how did you overcome this problem? Shivani On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi wrote: > > On 20 June 2014 01:46, "Shivani Rao" wrote: > > > > > Hello Andrew, >

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Eugen Cepoi
On 20 June 2014 01:46, "Shivani Rao" wrote: > > Hello Andrew, > > I wish I could share the code, but for proprietary reasons I can't. But I can give some idea of what I am trying to do. The job reads a file and processes each line of that file. I am not doing anythin

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-19 Thread abhiguruvayya
1. Once you have generated the final RDD, before submitting it to the reducer, try to repartition the RDD into a known number of partitions using either coalesce(partitions) or repartition(). 2. Rule of thumb for the number of data partitions: 3 * num_executors * cores_per_executor. -- View this message in conte
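
The rule of thumb above is plain arithmetic, so it can be sketched as a small helper. This is only an illustration: the function name `suggested_partitions` and the cluster size in the example (4 executors with 8 cores each) are assumptions, not values from this thread.

```python
def suggested_partitions(num_executors, cores_per_executor, factor=3):
    """Apply the rule of thumb: factor * num_executors * cores_per_executor.
    The default factor of 3 is the multiplier quoted in the message above."""
    if num_executors < 1 or cores_per_executor < 1 or factor < 1:
        raise ValueError("all arguments must be positive integers")
    return factor * num_executors * cores_per_executor


# Hypothetical cluster: 4 executors with 8 cores each.
print(suggested_partitions(4, 8))  # 3 * 4 * 8 = 96
```

The resulting number would be the partition count passed to coalesce or repartition on the final RDD.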

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-19 Thread Shivani Rao
Hello Andrew, I wish I could share the code, but for proprietary reasons I can't. But I can give some idea of what I am trying to do. The job reads a file and processes each line of that file. I am not doing anything intense in the "processLogs" function import argonau

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-18 Thread Andrew Ash
Wait, so the file only has four lines and the job is running out of heap space? Can you share the code you're running that does the processing? I'd guess that you're doing some intense processing on every line, but just writing parsed case classes back to disk sounds very lightweight. I On Wed, Ju