Yin, that did the trick. I'm curious, though, what the effect of the environment variable was: the shell's behavior changed from hanging to quitting once the env var value reached 1g.

/Sim

Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
@simeons <http://twitter.com/simeons> | blog.simeonov.com <http://blog.simeonov.com/> | 617.299.6746
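For what it's worth, one way to see what a given setting actually did to the driver is to print the flags its JVM was started with. This is only a minimal sketch (standard JMX runtime bean, nothing Spark-specific) that can be pasted into the spark-shell; it is not part of the commands used in this thread:

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    // Which Java the shell is running on (PermGen only exists on Java 7 and earlier)
    println(System.getProperty("java.version"))

    // The flags the driver JVM was actually started with; -XX:MaxPermSize should
    // appear here if the setting made it onto the driver's command line.
    ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.foreach(println)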
From: Yin Huai <yh...@databricks.com>
Date: Monday, July 6, 2015 at 11:41 AM
To: Denny Lee <denny.g....@gmail.com>
Cc: Simeon Simeonov <s...@swoop.com>, Andy Huang <andy.hu...@servian.com.au>, user <user@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Hi Sim,

I think the right way to set the PermGen size is through the driver's extra JVM options, i.e.

    --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m"

Can you try it? Without this conf, your driver's PermGen size is still 128m.

Thanks,

Yin

On Mon, Jul 6, 2015 at 4:07 AM, Denny Lee <denny.g....@gmail.com> wrote:

I went ahead and tested your file; the results can be seen in this gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94. Basically, when running with {Java 7, MaxPermSize = 256} or {Java 8, default}, the query ran without any issues. I was able to recreate the issue with {Java 7, default}. I included the commands I used to start the spark-shell, but basically I just used all defaults (no alteration to driver or executor memory); the only addition was driver-class-path to connect to the MySQL Hive metastore. This is on an OSX MacBook Pro.

One thing I did notice is that your version of Java 7 is version 51, while my version of Java 7 is version 79. Could you see if updating to Java 7 version 79 perhaps allows you to use the MaxPermSize call?

On Mon, Jul 6, 2015 at 1:36 PM Simeon Simeonov <s...@swoop.com> wrote:

The file is at https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-00000.gz?dl=1

The command was included in the gist:

    SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g

/Sim

Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
@simeons <http://twitter.com/simeons> | blog.simeonov.com <http://blog.simeonov.com/> | 617.299.6746

From: Yin Huai <yh...@databricks.com>
Date: Monday, July 6, 2015 at 12:59 AM
To: Simeon Simeonov <s...@swoop.com>
Cc: Denny Lee <denny.g....@gmail.com>, Andy Huang <andy.hu...@servian.com.au>, user <user@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

I have never seen an issue like this. Setting the PermGen size to 256m should solve the problem. Can you send me your test file and the command used to launch the spark shell or your application?

Thanks,

Yin
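A quick way to check, from inside the shell, what PermGen limit the driver actually ended up with is to inspect the effective Spark conf and the JVM's memory pools. This is a minimal sketch for a Java 7 driver (the permanent generation, and hence the pool, is gone on Java 8; the pool name varies by collector, hence the match on "perm"); it is an illustration, not a command taken from the thread:

    import java.lang.management.ManagementFactory
    import scala.collection.JavaConverters._

    // What the driver's SparkConf actually contains for the extra JVM options
    println(sc.getConf.get("spark.driver.extraJavaOptions", "<not set>"))

    // Max size of the permanent-generation pool. getMax returns -1 when the limit is undefined.
    ManagementFactory.getMemoryPoolMXBeans.asScala
      .filter(_.getName.toLowerCase.contains("perm"))
      .foreach(p => println(s"${p.getName}: max = ${p.getUsage.getMax / (1024 * 1024)} MB"))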
On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <s...@swoop.com> wrote:

Yin,

With 512Mb PermGen, the process still hung and had to be kill -9ed. At 1Gb, the spark shell and associated processes stopped hanging and started exiting with:

    scala> println(dfCount.first.getLong(0))
    15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called with curMem=0, maxMem=2223023063
    15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 229.5 KB, free 2.1 GB)
    15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called with curMem=235040, maxMem=2223023063
    15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
    15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
    15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first at <console>:30
    java.lang.OutOfMemoryError: PermGen space
    Stopping spark context.
    Exception in thread "main"
    Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
    15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)

That did not change up to 4Gb of PermGen space and 8Gb each for driver and executor. I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 is using memory in a substantially different manner. I'd be happy to share the test file so you can reproduce this in your own environment.

/Sim

Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
@simeons <http://twitter.com/simeons> | blog.simeonov.com <http://blog.simeonov.com/> | 617.299.6746

From: Yin Huai <yh...@databricks.com>
Date: Sunday, July 5, 2015 at 11:04 PM
To: Denny Lee <denny.g....@gmail.com>
Cc: Andy Huang <andy.hu...@servian.com.au>, Simeon Simeonov <s...@swoop.com>, user <user@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Sim,

Can you increase the PermGen size? Please let me know what your setting is when the problem disappears.

Thanks,

Yin

On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <denny.g....@gmail.com> wrote:

I had run into the same problem; everything was working swimmingly with Spark 1.3.1. When I switched to Spark 1.4, either upgrading to Java 8 (from Java 7) or knocking up the PermGen size solved my issue. HTH!

On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <andy.hu...@servian.com.au> wrote:

We have hit the same issue in the spark shell when registering a temp table. We observed it happening with those who had JDK 6. The problem went away after installing JDK 8. This was only for the tutorial materials, which were about loading a parquet file.

Regards,
Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim <s...@swoop.com> wrote:

@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730 | m: 0433221979
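The code behind dfCount isn't shown in this excerpt, but a reproduction along the lines the thread describes -- loading the gzipped CSV with the spark-csv package and running an aggregate through the DataFrame API -- would look roughly like the sketch below. The local path, the header option, and the exact aggregate are assumptions for illustration, not taken from the thread:

    // Launched with: spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 ...
    // `sqlContext` is provided by the shell in Spark 1.4.

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")                          // assumption: the file has a header row
      .load("apache-spark-failure-data-part-00000.gz")   // assumed local path to the downloaded test file

    // An aggregate along the lines of the thread's dfCount
    val dfCount = df.groupBy().count()
    println(dfCount.first.getLong(0))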