Wait, so the file only has four lines and the job is running out of heap space? Can you share the code you're running that does the processing? I'd guess you're doing some intense processing on every line, since just writing parsed case classes back to disk sounds very lightweight.
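For context, here's a minimal sketch of the kind of parse-and-write job I'd expect to stay well under the heap on four lines (LogEntry, parseLine, and the paths are placeholders, not your actual code):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical case class standing in for whatever you parse each log line into.
case class LogEntry(timestamp: String, level: String, message: String)

object ParseLogs {
  // Placeholder parser: assumes a "<timestamp> <level> <message>" layout.
  def parseLine(line: String): LogEntry = {
    val parts = line.split(" ", 3)
    LogEntry(parts(0), parts(1), parts(2))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParseLogs"))

    sc.textFile("hdfs:///path/to/input")        // placeholder input path
      .map(parseLine)
      .map(_.toString)                          // or however you serialize the case class
      .saveAsTextFile("hdfs:///path/to/output") // placeholder output path

    sc.stop()
  }
}

If what you're running looks roughly like that, four log lines shouldn't come anywhere near the heap limit, so seeing your actual code would help narrow it down.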
On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao <raoshiv...@gmail.com> wrote:

> I am trying to process a file that contains 4 log lines (not very long)
> and then write my parsed-out case classes to a destination folder, and I
> get the following error:
>
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
>     at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
>     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
>     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
>     at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>     at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
>     at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
>
> Sadly, several folks have faced this error while trying to execute Spark
> jobs, and there are various solutions, none of which work for me:
>
> a) I tried changing the number of partitions in my RDD with coalesce(8)
> (http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736),
> and the error persisted.
>
> b) I tried changing SPARK_WORKER_MEM=2g and SPARK_EXECUTOR_MEMORY=10g,
> and neither worked.
>
> c) I strongly suspect there is a classpath error
> (http://apache-spark-user-list.1001560.n3.nabble.com/how-to-set-spark-executor-memory-and-heap-size-td4719.html),
> mainly because the call stack is repetitive. Maybe the OOM error is a
> disguise?
>
> d) I checked that I am not out of disk space and that I do not have too
> many open files (ulimit -u << sudo ls /proc/<spark_master_process_id>/fd | wc -l).
>
> I am also noticing multiple reflection calls happening to find the right
> "class", I guess, so it could be a "ClassNotFound" error disguising itself
> as a memory error.
>
> Here are other threads that describe the same situation but have not been
> resolved in any way so far:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/no-response-in-spark-web-UI-td4633.html
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html
>
> Any help is greatly appreciated. I am especially calling out to the
> creators of Spark and the Databricks folks. This seems like a "known bug"
> waiting to happen.
>
> Thanks,
> Shivani
>
> --
> Software Engineer
> Analytics Engineering Team @ Box
> Mountain View, CA
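Also, regarding (b) above: in case it helps, executor memory can also be set directly on the SparkConf (or via spark-submit's --executor-memory flag) rather than through environment variables. A minimal sketch, with 2g as a purely illustrative value:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: sets the per-executor heap on the SparkConf itself.
val conf = new SparkConf()
  .setAppName("ParseLogs")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)

It's worth double-checking that whichever setting you use is actually reaching the executors (the Environment tab in the web UI shows the effective configuration).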