Setting spark.sql.shuffle.partitions = 2000 solved my issue. I am able to
join two 1-billion-row tables in 3 minutes.
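In Spark 1.3 this can be set on the SQLContext before running the join; a
minimal sketch, assuming a context named sqlContext and illustrative table
names and join key:

    // Raise the shuffle partition count so each post-shuffle partition
    // (and the hash table built for it) stays small enough to avoid
    // long GC pauses.
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

    val joined = sqlContext.sql(
      "SELECT a.*, b.* FROM table_a a JOIN table_b b ON a.id = b.id")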
By skewed, did you mean it's not distributed uniformly across partitions?
All of my columns are strings and almost all the same size, e.g.
id1,field11,field12
id2,field21,field22
Or at least tell us how many partitions you are using.
Yong
> Date: Tue, 22 Sep 2015 02:06:15 -0700
> From: belevts...@gmail.com
> To: user@spark.apache.org
> Subject: Re: Long GC pauses with Spark SQL 1.3.0 and billion row tables
>
> Could it be that your data is skewed? Do you have variable-length column
> types?
Could it be that your data is skewed? Do you have variable-length column
types?
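One quick way to check for that kind of skew is to count rows per join key.
A sketch, assuming a DataFrame named df and a join column named id:

    import org.apache.spark.sql.functions.desc

    // A handful of keys with outsized counts means the matching shuffle
    // partitions (and their GC pressure) will be outsized too.
    df.groupBy("id").count().orderBy(desc("count")).show(20)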
Did you find a solution to this? I am running into the same issue.
If your data is evenly distributed (i.e. no skewed data points in your join
keys), it can also help to increase spark.sql.shuffle.partitions (the default
is 200).
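The same knob can also be turned from SQL itself, or set cluster-wide in
conf/spark-defaults.conf; a sketch with an illustrative value:

    // Equivalent to sqlContext.setConf("spark.sql.shuffle.partitions", "2000")
    sqlContext.sql("SET spark.sql.shuffle.partitions=2000")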
On Mon, May 4, 2015 at 8:03 AM, Richard Marscher
wrote:
> In regards to the large GC pauses, assuming you allocated all 100GB of
> memory per worker [...]
In regards to the large GC pauses, assuming you allocated all 100GB of
memory per worker, you may consider running with less memory on your Worker
nodes, or splitting up the available memory on the Worker nodes amongst
several worker instances. The JVM's garbage collection starts to become
very slow on very large heaps.
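On a standalone cluster the split can be expressed in conf/spark-env.sh on
each Worker machine. A sketch with illustrative numbers (4 x 25 GB instead
of 1 x 100 GB; adjust to your hardware):

    # conf/spark-env.sh -- run four smaller worker JVMs per machine so
    # each heap stays small enough for reasonable GC pause times.
    export SPARK_WORKER_INSTANCES=4
    export SPARK_WORKER_MEMORY=25g
    export SPARK_WORKER_CORES=4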
Could you be more specific about how this is done?
The DataFrame class doesn't have that method.
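A sketch of what this might look like: the DataFrame itself has no
partitionBy in 1.3, but the underlying RDD of key/row pairs does. The names
here (df, sqlContext, the key in column 0, and the count 2000) are assumed:

    import org.apache.spark.HashPartitioner

    // Key each row by the join column (assumed to be column 0),
    // hash-partition the pairs, then rebuild a DataFrame with the
    // original schema.
    val keyed = df.rdd.map(row => (row.getString(0), row))
    val repartitioned = keyed.partitionBy(new HashPartitioner(2000)).values
    val df2 = sqlContext.createDataFrame(repartitioned, df.schema)

When only the partition count matters, df.repartition(2000) does a plain
shuffle without a custom key.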
On Sun, May 3, 2015 at 11:07 PM, ayan guha wrote:
> You can use a custom partitioner to redistribute the data using partitionBy.
> On 4 May 2015 15:37, "Nick Travers" wrote:
>
>> I'm currently trying to join two large tables (on the order of 1B rows
>> each) using Spark SQL (1.3.0) [...]
You can use a custom partitioner to redistribute the data using partitionBy.
On 4 May 2015 15:37, "Nick Travers" wrote:
> I'm currently trying to join two large tables (on the order of 1B rows
> each) using Spark SQL (1.3.0) and am running into long GC pauses which
> bring the job to a halt.
>
> I'm reading in both tables using a HiveContext [...]
I'm currently trying to join two large tables (on the order of 1B rows each)
using Spark SQL (1.3.0) and am running into long GC pauses which bring the
job to a halt.
I'm reading in both tables using a HiveContext, with the underlying files
stored as Parquet files. I'm using something along the lines of Hiv[...]
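A minimal sketch of this kind of HiveContext setup, with assumed paths,
table names, and join key (not the original poster's code):

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // Load the Parquet-backed tables and register them for SQL.
    val a = hc.parquetFile("/path/to/table_a")
    val b = hc.parquetFile("/path/to/table_b")
    a.registerTempTable("a")
    b.registerTempTable("b")
    val joined = hc.sql("SELECT * FROM a JOIN b ON a.id = b.id")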