Thanks Zoltan. So far I've got a full repro that works both in Docker and on a bigger real-world cluster. Also, the whole thing only happens in `cluster` mode. I filed a ticket for it. Any thoughts?

https://issues.apache.org/jira/browse/SPARK-10487
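While SPARK-10487 is investigated, a hedged workaround sketch: the OOM only shows up in cluster mode, where the driver runs inside the YARN AM container and the default containers are small (the YarnAllocator lines in the original report below show ~1.4 GB requests), so a larger driver heap may be worth a try. --driver-memory is a standard spark-submit option; the 2g figure is an illustrative guess, not a confirmed fix:

    # In yarn-cluster mode the driver lives in the YARN AM container,
    # so --driver-memory sizes that container (plus memory overhead).
    spark-submit --master yarn --deploy-mode cluster \
      --driver-memory 2g \
      spark-ml.py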
On Mon, Sep 7, 2015 at 7:59 PM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:

> Hi,
>
> I ran your example on Spark-1.4.1 and 1.5.0-rc3. It succeeds on 1.4.1 but
> throws the OOM on 1.5.0. Do any of you know which PR introduced this issue?
>
> Zsolt
>
> 2015-09-07 16:33 GMT+02:00 Zoltán Zvara <zoltan.zv...@gmail.com>:
>
>> Hey, I'd try to debug and profile ResolvedDataSource. As far as I know,
>> your write will be performed by the JVM.
>>
>> On Mon, Sep 7, 2015 at 4:11 PM Tóth Zoltán <t...@looper.hu> wrote:
>>
>>> Unfortunately I'm getting the same error.
>>>
>>> The other interesting things are:
>>> - the parquet files actually got written to HDFS (also with
>>>   .write.parquet() )
>>> - the application gets stuck in the RUNNING state for good, even after
>>>   the error is thrown
>>>
>>> 15/09/07 10:01:10 INFO spark.ContextCleaner: Cleaned accumulator 19
>>> 15/09/07 10:01:10 INFO spark.ContextCleaner: Cleaned accumulator 5
>>> 15/09/07 10:01:12 INFO spark.ContextCleaner: Cleaned accumulator 20
>>> Exception in thread "Thread-7"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "Thread-7"
>>> Exception in thread "org.apache.hadoop.hdfs.PeerCache@4070d501"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "org.apache.hadoop.hdfs.PeerCache@4070d501"
>>> Exception in thread "LeaseRenewer:r...@docker.rapidminer.com:8020"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "LeaseRenewer:r...@docker.rapidminer.com:8020"
>>> Exception in thread "Reporter"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "Reporter"
>>> Exception in thread "qtp2134582502-46"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "qtp2134582502-46"
>>>
>>> On Mon, Sep 7, 2015 at 3:48 PM, boci <boci.b...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can you try using the save method instead of write?
>>>>
>>>> ex: out_df.save("path","parquet")
>>>>
>>>> b0c1
>>>> --------------------------------------------------
>>>> Skype: boci13, Hangout: boci.b...@gmail.com
>>>>
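For reference, a minimal sketch of the two call styles side by side. df.save() is the older generic entry point (deprecated around 1.4 in favor of df.write, if I remember right); the output path below is illustrative:

    # Older generic save; the second argument names the data source format.
    out_df.save("/tmp/logparquet", "parquet")

    # DataFrameWriter API; this is the call that hits the OOM reported below.
    out_df.write.parquet("/tmp/logparquet")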
>>>> On Mon, Sep 7, 2015 at 3:35 PM, Zoltán Tóth <zoltanct...@gmail.com> wrote:
>>>>
>>>>> Aaand, the error! :)
>>>>>
>>>>> Exception in thread "org.apache.hadoop.hdfs.PeerCache@4e000abf"
>>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>>> UncaughtExceptionHandler in thread "org.apache.hadoop.hdfs.PeerCache@4e000abf"
>>>>> Exception in thread "Thread-7"
>>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>>> UncaughtExceptionHandler in thread "Thread-7"
>>>>> Exception in thread "LeaseRenewer:r...@docker.rapidminer.com:8020"
>>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>>> UncaughtExceptionHandler in thread "LeaseRenewer:r...@docker.rapidminer.com:8020"
>>>>> Exception in thread "Reporter"
>>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>>> UncaughtExceptionHandler in thread "Reporter"
>>>>> Exception in thread "qtp2115718813-47"
>>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>>> UncaughtExceptionHandler in thread "qtp2115718813-47"
>>>>>
>>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>>> UncaughtExceptionHandler in thread "sparkDriver-scheduler-1"
>>>>>
>>>>> Log Type: stdout
>>>>> Log Upload Time: Mon Sep 07 09:03:01 -0400 2015
>>>>> Log Length: 986
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "spark-ml.py", line 33, in <module>
>>>>>     out_df.write.parquet("/tmp/logparquet")
>>>>>   File "/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1441224592929_0022/container_1441224592929_0022_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 422, in parquet
>>>>>   File "/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1441224592929_0022/container_1441224592929_0022_01_000001/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>>>>>   File "/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1441224592929_0022/container_1441224592929_0022_01_000001/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
>>>>>   File "/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1441224592929_0022/container_1441224592929_0022_01_000001/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>>>> py4j.protocol.Py4JJavaError
>>>>>
>>>>> On Mon, Sep 7, 2015 at 3:27 PM, Zoltán Tóth <zoltanct...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When I execute the Spark ML Logistic Regression example in pyspark, I
>>>>>> run into an OutOfMemory exception. I'm wondering if any of you have
>>>>>> experienced the same, or have a hint about how to fix it.
>>>>>>
>>>>>> The interesting bit is that I only get the exception when I try to
>>>>>> write the result DataFrame into a file. If I only "print" any of the
>>>>>> results, it all works fine.
>>>>>>
>>>>>> My Setup:
>>>>>> Spark 1.5.0-SNAPSHOT built for Hadoop 2.6.0 (I'm working with the
>>>>>> latest nightly build)
>>>>>> Build flags: -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
>>>>>> -DzincPort=3034
>>>>>>
>>>>>> I'm using the default resource setup:
>>>>>> 15/09/07 08:49:04 INFO yarn.YarnAllocator: Will request 2 executor
>>>>>> containers, each with 1 cores and 1408 MB memory including 384 MB overhead
>>>>>> 15/09/07 08:49:04 INFO yarn.YarnAllocator: Container request (host:
>>>>>> Any, capability: <memory:1408, vCores:1>)
>>>>>> 15/09/07 08:49:04 INFO yarn.YarnAllocator: Container request (host:
>>>>>> Any, capability: <memory:1408, vCores:1>)
>>>>>>
>>>>>> The script I'm executing:
>>>>>>
>>>>>> from pyspark import SparkContext, SparkConf
>>>>>> from pyspark.sql import SQLContext
>>>>>>
>>>>>> conf = SparkConf().setAppName("pysparktest")
>>>>>> sc = SparkContext(conf=conf)
>>>>>> sqlContext = SQLContext(sc)
>>>>>>
>>>>>> from pyspark.mllib.regression import LabeledPoint
>>>>>> from pyspark.mllib.linalg import Vector, Vectors
>>>>>>
>>>>>> # Toy training set of labeled dense vectors.
>>>>>> training = sc.parallelize((
>>>>>>     LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
>>>>>>     LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
>>>>>>     LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
>>>>>>     LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))))
>>>>>>
>>>>>> training_df = training.toDF()
>>>>>>
>>>>>> from pyspark.ml.classification import LogisticRegression
>>>>>>
>>>>>> reg = LogisticRegression()
>>>>>> reg.setMaxIter(10).setRegParam(0.01)
>>>>>> model = reg.fit(training_df)
>>>>>>
>>>>>> test = sc.parallelize((
>>>>>>     LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)),
>>>>>>     LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)),
>>>>>>     LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5))))
>>>>>>
>>>>>> # Scoring itself is fine; it is the write below that triggers the OOM.
>>>>>> out_df = model.transform(test.toDF())
>>>>>> out_df.write.parquet("/tmp/logparquet")
>>>>>>
>>>>>> And the command:
>>>>>> spark-submit --master yarn --deploy-mode cluster spark-ml.py
>>>>>>
>>>>>> Thanks,
>>>>>> z
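For concreteness, the contrast the report describes, as a minimal sketch. The exact "print" calls aren't shown in the thread; show() and count() are representative stand-ins, with out_df as defined in the script above:

    out_df.show()                              # inspecting results works fine
    print(out_df.count())                      # forcing the computation also works
    out_df.write.parquet("/tmp/logparquet")    # only this write OOMs, and only in cluster mode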