Hi all,
I am desperately looking for some help.
My cluster has 6 nodes, each with a dual-core CPU and 8 GB of RAM. The Spark
version running on the cluster is spark-0.9.0-incubating-bin-cdh4.
I am getting an OutOfMemoryError when running a Spark Streaming job (the
non-streaming version works fine) that queries a Cassandra table (a simple
query returning 3-4 rows) by connecting to the Spark standalone cluster
master.
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.WritableUtils.readCompressedByteArray(WritableUtils.java:38)
    at org.apache.hadoop.io.WritableUtils.readCompressedString(WritableUtils.java:87)
    at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:185)
    at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
Apr 15, 2014 6:53:39 PM org.apache.spark.Logging$class logInfo
The Spark job's dependencies are:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>0.9.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all</artifactId>
    <version>2.0.6</version>
</dependency>
<dependency>
    <groupId>com.tuplejump</groupId>
    <artifactId>calliope_2.10</artifactId>
    <version>0.9.0-U1-C2-EA</version>
</dependency>
The memory settings are configured as follows:
spark.executor.memory = 4g
SPARK_MEM = 2g
SPARK_WORKER_MEMORY = 4g
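For reference, this is roughly how I understand those settings are applied in a Spark 0.9 standalone deployment (the file path and values below mirror the ones above; exact locations may differ on your install):

```shell
# conf/spark-env.sh on each worker node (Spark standalone, 0.9.x).
# SPARK_WORKER_MEMORY caps the total memory a worker may give to executors.
export SPARK_WORKER_MEMORY=4g
# SPARK_MEM is the older, deprecated knob that spark.executor.memory replaces;
# note it is set to 2g here while spark.executor.memory is 4g, so the two disagree.
export SPARK_MEM=2g
```

spark.executor.memory itself is a Spark property rather than an environment variable, set in the driver, e.g. `new SparkConf().set("spark.executor.memory", "4g")`.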
Can you please let me know where I am going wrong?
Thanks,
Sony
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-job-having-Cassandra-query-OutOfMemoryError-tp4280.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.