I put the same dataset into scala (using spark-shell) and it acts weird. I
cannot do a count on it, the executors seem to hang. The WebUI shows 0/96
in the status bar, shows details about the worker nodes but there is no
progress.
sc.parallelize does finish (takes too long for the data size) in scala.


On Fri, Jul 11, 2014 at 2:00 PM, Mohit Jaggi <mohitja...@gmail.com> wrote:

> spark_data_array here has about 35k rows with 4k columns. I have 4 nodes
> in the cluster and gave 48g to executors. also tried kyro serialization.
>
> traceback (most recent call last):
>
>   File "/mohit/./m.py", line 58, in <module>
>
>     spark_data = sc.parallelize(spark_data_array)
>
>   File "/mohit/spark/python/pyspark/context.py", line 265, in parallelize
>
>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>
>   File "/mohit/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
> line 537, in __call__
>
>   File "/mohit/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line
> 300, in get_return_value
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>
> : java.lang.OutOfMemoryError: Java heap space
>
> at
> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>
> at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>
> at py4j.Gateway.invoke(Gateway.java:259)
>
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>
> at java.lang.Thread.run(Thread.java:745)
>

Reply via email to