Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Yin Huai
From: Yin Huai; Date: Monday, July 6, 2015 at 11:41 AM; To: Denny Lee; Cc: Simeon Simeonov, Andy Huang, user; Subject: Re: 1.4.0 regression: out-of-memory errors on small data. Hi Sim, I

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Simeon Simeonov
Subject: Re: 1.4.0 regression: out-of-memory errors on small data. Hi Sim, I think the right way to set the PermGen size is through driver extra JVM options, i.e. --conf "spark.driver.extraJavaOptions=-XX:MaxPe

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Yin Huai
From: Yin Huai; Date: Monday, July 6, 2015 at 12:59 AM; To: Simeon Simeonov; Cc: Denny Lee, Andy Huang, user; Subject: Re: 1.4.0 regression:

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Denny Lee
I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 is using memory in a substantially different manner. I'd be happy to share the test file so you can reproduce this in your own environment. /Sim

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Simeon Simeonov
12:59 AM; To: Simeon Simeonov; Cc: Denny Lee, Andy Huang, user; Subject: Re: 1.4.0 regression: out-of-memory errors on small data. I have never seen an issue like thi

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Yin Huai
From: Yin Huai; Date: Sunday, July 5, 2015 at 11:04 PM; To: Denny Lee; Cc: Andy Huang, Simeon Simeonov, user; Subject: Re: 1.4.0 regression: out-of-m

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Simeon Simeonov
Date: Sunday, July 5, 2015 at 11:04 PM; To: Denny Lee; Cc: Andy Huang, Simeon Simeonov, user; Subject: Re: 1.4.0 regression:

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Yin Huai
Sim, can you increase the PermGen size? Please let me know what your setting is when the problem disappears. Thanks, Yin. On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee wrote: I had run into the same problem where everything was working swimmingly with Spark 1.3.1. When I switched to Spark 1.4

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Denny Lee
I had run into the same problem where everything was working swimmingly with Spark 1.3.1. When I switched to Spark 1.4, either upgrading to Java 8 (from Java 7) or bumping up the PermGen size solved my issue. HTH! On Mon, Jul 6, 2015 at 8:31 AM Andy Huang wrote: We have hit the same

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Andy Huang
We hit the same issue in spark-shell when registering a temp table. We observed it happening for those who had JDK 6; the problem went away after installing JDK 8. This was only for the tutorial materials, which were about loading a Parquet file. Regards, Andy. On Sat, Jul 4, 2015 at 2:54 AM, s

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-03 Thread sim
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-03 Thread bipin
I have a hunch I want to share: I feel that data is not being deallocated from memory (at least not the way it was in 1.3). Once it goes into memory, it just stays there. Spark SQL works fine: the same query run in a new shell won't throw that error, but when run in a shell that has been used for other queries

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-03 Thread bipin
I will second this. I very rarely got out-of-memory errors in 1.3; now I get them all the time. I feel that I could work in the 1.3 spark-shell for long periods without Spark throwing that error, whereas in 1.4 the shell frequently needs to be restarted or gets killed.

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Simeon Simeonov
From: Yin Huai; Date: Thursday, July 2, 2015 at 4:34 PM; To: Simeon Simeonov; Cc: user; Subject: Re: 1.4.0 regression: out-of-memory errors on small data. Hi Sim, it seems you already set the PermGe

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Yin Huai
Hi Sim, it seems you already set the PermGen size to 256m, right? I notice that in your shell you created a HiveContext (which further increases PermGen memory consumption). But spark-shell has already created a HiveContext for you (sqlContext). You can use asInstanceOf to access HiveContext
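A minimal sketch of the reuse Yin describes, assuming a Spark 1.4.x spark-shell built with Hive support, where the shell predefines sqlContext as a HiveContext. The name hiveContext is hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes this runs inside spark-shell, which predefines `sqlContext`.
// In a Hive-enabled build that instance is already a HiveContext, so cast it
// instead of calling `new HiveContext(sc)`, which would create a second
// context and load more classes into PermGen.
val hiveContext = sqlContext.asInstanceOf[HiveContext]
```

Queries can then go through hiveContext.sql(...) without a second context being created. On a build without Hive support, the import or the cast fails outright rather than quietly building another context.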

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread Yin Huai
Hi Sim, Spark 1.4.0's PermGen memory consumption is higher than Spark 1.3's (explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you add --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" to the command you used to launch spark-shell? This will increase the PermGen size f
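The option belongs on the launch command because in client mode the driver JVM is already running by the time any shell code executes. Once the shell is up, a small sketch like the one below can confirm that the flag actually reached the driver JVM, which also helps answer the question of which setting made the problem disappear. It uses only the standard JDK management API, nothing Spark-specific; the value names are hypothetical. On JDK 8 the permanent generation no longer exists (class metadata moved to Metaspace) and the JVM ignores -XX:MaxPermSize with a warning, which fits the reports here that moving to JDK 8 made the errors go away.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// JVM arguments the current (driver) JVM was started with,
// excluding the arguments passed to the main method.
val jvmArgs: Seq[String] =
  ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toSeq

// Keep only the PermGen cap, e.g. "-XX:MaxPermSize=256m" if the --conf flag took effect.
val permGenArgs: Seq[String] = jvmArgs.filter(_.startsWith("-XX:MaxPermSize"))

if (permGenArgs.isEmpty) println("No -XX:MaxPermSize option on this JVM")
else permGenArgs.foreach(println)
```

Matching on the prefix rather than an exact string reports whatever size was actually passed.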

1.4.0 regression: out-of-memory errors on small data

2015-07-02 Thread sim
A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and fails with a series of out-of-memory errors in 1.4.0. This gist includes the code and the full output from the 1.3.1 and 1.4.0 runs, including the command line