RE: Force inner join to shuffle the smallest table

2015-06-25 Thread Ulanov, Alexander
[68] at explain at <console>:25

Could Spark SQL developers suggest why it happens?

Best regards, Alexander

From: Stephen Carman [mailto:scar...@coldlight.com]
Sent: Wednesday, June 24, 2015 12:33 PM
To: Ulanov, Alexander
Cc: CC GP; dev@spark.apache.org
Subject: Re: Force inner join to shuffle the smallest

Re: Force inner join to shuffle the smallest table

2015-06-24 Thread Stephen Carman
...@hp.com>> wrote: It also fails, as I mentioned in the original question.

From: CC GP [mailto:chandrika.gopalakris...@gmail.com]
Sent: Wednesday, June 24, 2015 12:08 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Force inner join to shuffle the sm

RE: Force inner join to shuffle the smallest table

2015-06-24 Thread Ulanov, Alexander
It also fails, as I mentioned in the original question.

From: CC GP [mailto:chandrika.gopalakris...@gmail.com]
Sent: Wednesday, June 24, 2015 12:08 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Force inner join to shuffle the smallest table

Try below and see if it makes a

Re: Force inner join to shuffle the smallest table

2015-06-24 Thread CC GP
Try below and see if it makes a difference:

val result = sqlContext.sql("select big.f1, big.f2 from small inner join big on big.s=small.s and big.d=small.d")

On Wed, Jun 24, 2015 at 11:35 AM, Ulanov, Alexander wrote:
> Hi,
> I try to inner join two tables on two fields (string and doub
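The suggestion above swaps the join order so the small table appears on the left side of the inner join. A minimal sketch of comparing the physical plans of the two orderings, assuming the tables are registered as in the original post and a Spark 1.4 `sqlContext` is in scope:

```scala
// Original ordering: big table on the left.
sqlContext.sql(
  "select big.f1, big.f2 from big inner join small on big.s = small.s and big.d = small.d"
).explain()

// Suggested ordering: small table on the left.
sqlContext.sql(
  "select big.f1, big.f2 from small inner join big on big.s = small.s and big.d = small.d"
).explain()
```

Comparing the two `explain()` outputs shows whether reordering actually changes which side Spark shuffles; as the follow-up messages note, in this case it did not help.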

Force inner join to shuffle the smallest table

2015-06-24 Thread Ulanov, Alexander
Hi,

I try to inner join two tables on two fields (string and double). One table is 2B rows, the second is 500K. They are stored in HDFS in Parquet. Spark v 1.4.

val big = sqlContext.parquetFile("hdfs://big")
big.registerTempTable("big")
val small = sqlContext.parquetFile("hdfs://small")
small.registerTempTable("small")
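For reference, a cleaned-up sketch of the setup described in the original message (the `parquetFile` spelling and variable names are corrected; the HDFS paths and column names `s`, `d`, `f1`, `f2` are taken from the thread, and the join query is an assumption based on the later replies):

```scala
import org.apache.spark.sql.SQLContext

// Assumes a SparkContext `sc` is already available (e.g. in spark-shell).
val sqlContext = new SQLContext(sc)

// Register both Parquet tables (paths are placeholders from the original post).
val big = sqlContext.parquetFile("hdfs://big")
big.registerTempTable("big")
val small = sqlContext.parquetFile("hdfs://small")
small.registerTempTable("small")

// Inner join on the string field `s` and the double field `d`.
val result = sqlContext.sql(
  "select big.f1, big.f2 from big inner join small on big.s = small.s and big.d = small.d")

result.explain() // Inspect which join strategy and shuffle Spark chose.
```

As a design note: in Spark 1.4, a table whose estimated size falls below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default) can be broadcast to every executor instead of shuffled, which is usually the intended remedy when one join side is tiny; whether the 500K-row table qualifies depends on its serialized size and available statistics.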