Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
cally pulling out selected columns from the query, but there is no roll up happening or anything that would possibly make it suspicious that there is any difference

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Michael Segel
matched based on three keys that are present in both tables (ad, account, and date), on top of this they are filtered by date being above 2016-01-03. Since all the joins are
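
The snippet above mentions that the joined rows are additionally filtered by date. The full query is not visible in this listing, so what follows is only an assumption about one way such a filter can interact with outer joins: if the WHERE clause tests the date column of both tables, the null-padded rows produced by an outer join fail the test and the result shrinks to the inner-join rows. A minimal plain-Scala sketch (no Spark; `Rec`, `FilterAfterJoin`, and the sample rows are hypothetical names and data invented for illustration):

```scala
// Plain-Scala model of SQL join semantics; no Spark required.
// Rec, FilterAfterJoin and the sample rows are hypothetical, for illustration only.
object FilterAfterJoin {
  final case class Rec(ad: String, account: String, date: String)

  type Key = (String, String, String)
  private def key(r: Rec): Key = (r.ad, r.account, r.date)

  // FULL OUTER JOIN on (ad, account, date): unmatched rows are padded with None.
  def fullOuter(s: Seq[Rec], d: Seq[Rec]): Seq[(Option[Rec], Option[Rec])] = {
    val dByKey = d.groupBy(key)
    val sKeys = s.map(key).toSet
    val matchedOrLeftOnly = s.flatMap { l =>
      dByKey.get(key(l)) match {
        case Some(rs) => rs.map(r => (Option(l), Option(r)))
        case None     => Seq((Option(l), Option.empty[Rec]))
      }
    }
    val rightOnly = d
      .filterNot(r => sKeys.contains(key(r)))
      .map(r => (Option.empty[Rec], Option(r)))
    matchedOrLeftOnly ++ rightOnly
  }

  // SQL WHERE semantics: a comparison against NULL is not true, so the row is dropped.
  def dateAtLeast(side: Option[Rec], min: String): Boolean =
    side.exists(_.date >= min)

  val swig: Seq[Rec] = Seq(
    Rec("a1", "acc1", "2016-01-04"),
    Rec("a2", "acc1", "2016-01-05")
  )
  val dps: Seq[Rec] = Seq(
    Rec("a1", "acc1", "2016-01-04"),
    Rec("a3", "acc2", "2016-01-06")
  )

  // One matched pair, one left-only row, one right-only row.
  val joined: Seq[(Option[Rec], Option[Rec])] = fullOuter(swig, dps)

  // Filtering on the date of BOTH sides keeps only the fully matched rows.
  val filteredBoth: Seq[(Option[Rec], Option[Rec])] = joined.filter {
    case (l, r) => dateAtLeast(l, "2016-01-03") && dateAtLeast(r, "2016-01-03")
  }
}
```

In this model the unfiltered full outer join has more rows than the filtered one, and the filtered result contains exactly the matched pairs. If the real query filters both sides this way, identical counts across join types would be expected behavior rather than a bug; moving each table's date filter into a subquery or the ON clause is the usual way to keep the unmatched rows.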

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
difference besides the type of joins. The tables are matched based on three keys that are present in both tables (ad, account, and date), on top of this they are

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Davies Liu
mo_lt where date >='2016-01-03'").count res14: Long = 34158 scala> sqlContext.sql("select * from dps_pin_promo_lt where date >='2016-01-03

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Cesar Flores
res15: Long = 42693 The above two queries filter out the data based on date used by the joins of 2016-01-03 and you can see the row count between the two tables are different, which is why I am suspecting something is wrong w

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Gourav Sengupta
joins in spark sql, because in this situation the right and outer joins may produce the same results, but it should not be equal to the left join and definitely not the inner join; unless I am missing something.

Re: Weird results with Spark SQL Outer joins

2016-05-03 Thread Kevin Peng
l value is res16: Long = 42694 Thanks, KP On Mon, May 2, 2016 at 12:50 PM, Yong Zhang wrote: We are still not sure what is the problem, if you cannot show u

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Davies Liu
ultSet row count as dps right outer join with swig on 3 columns, with same additional filters. Without knowing your data, I cannot see the reason that has to be a bug in the spark. Am I misunderstanding your bug? Yong

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Gourav Sengupta
nowing your data, I cannot see the reason that has to be a bug in the spark. Am I misunderstanding your bug? Yong -- From: kpe...@gmail.com Date: Mon, 2 May 2016 12:11:18 -0700

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
ter join with swig on 3 columns, with same additional filters. Without knowing your data, I cannot see the reason that has to be a bug in the spark. Am I misunderstanding your bug? Yong -- From: kpe...@gmail.com

RE: Weird results with Spark SQL Outer joins

2016-05-02 Thread Yong Zhang
: Mon, 2 May 2016 12:11:18 -0700 Subject: Re: Weird results with Spark SQL Outer joins To: gourav.sengu...@gmail.com CC: user@spark.apache.org Gourav, I wish that was the case, but I have done a select count on each of the two tables individually and they return different numbers of rows

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
Gourav, I wish that was the case, but I have done a select count on each of the two tables individually and they return different numbers of rows: dps.registerTempTable("dps_pin_promo_lt") swig.registerTempTable("swig_pin_promo_lt") dps.count() RESULT: 42632 swig.count() RESULT: 42034 On

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Gourav Sengupta
This shows that both the tables have matching records and no mismatches. Therefore obviously you have the same results irrespective of whether you use right or left join. I think that there is no problem here, unless I am missing something. Regards, Gourav On Mon, May 2, 2016 at 7:48 PM, kpeng1
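
The reasoning in the message above (identical results for left and right joins when every record has a match) can be checked with a small model. The sketch below assumes the join keys are unique within each table, which the thread does not confirm; `JoinCounts` and the sample key sets are hypothetical names invented for illustration, and no Spark is involved.

```scala
// Row counts for each join type, modelled on sets of join keys.
// Assumption (not confirmed by the thread): keys are unique within each table.
object JoinCounts {
  def counts[K](left: Set[K], right: Set[K]): Map[String, Int] = {
    val both = (left intersect right).size
    Map(
      "inner" -> both,                             // only keys present on both sides
      "left"  -> left.size,                        // every left key, matched or not
      "right" -> right.size,                       // every right key, matched or not
      "full"  -> (left.size + right.size - both)   // union of both key sets
    )
  }

  // Identical key sets: every join type returns the same number of rows.
  val sameKeys: Map[String, Int] = counts(Set(1, 2, 3), Set(1, 2, 3))

  // Partially overlapping key sets: the four counts differ.
  val mixedKeys: Map[String, Int] = counts(Set(1, 2, 3), Set(2, 3, 4, 5))
}
```

Under that uniqueness assumption, all four counts coincide only when the two key sets are identical, which in turn forces the two tables to have the same number of rows. Tables with different row counts therefore cannot give equal left and right join counts unless something else, such as duplicate keys or a filter applied after the join, is also in play.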

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread kpeng1
Also, the results of the inner query produced the same results: sqlContext.sql("SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc, s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend, d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s INNER JOIN dps_pin_promo_lt d ON (s.date

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Gourav Sengupta
Hi Kevin, Thanks. Please post the result of the same query with INNER JOIN and then it will give us a bit of insight. Regards, Gourav On Mon, May 2, 2016 at 7:10 PM, Kevin Peng wrote: > Gourav, > > Apologies. I edited my post with this information: > Spark version: 1.6 > Result from spark s

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Kevin Peng
Gourav, Apologies. I edited my post with this information: Spark version: 1.6 Result from spark shell OS: Linux version 2.6.32-431.20.3.el6.x86_64 ( mockbu...@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Thu Jun 19 21:14:45 UTC 2014 Thanks, KP On Mon,

Re: Weird results with Spark SQL Outer joins

2016-05-02 Thread Gourav Sengupta
Hi, As always, can you please write down details regarding your SPARK cluster - the version, OS, IDE used, etc? Regards, Gourav Sengupta On Mon, May 2, 2016 at 5:58 PM, kpeng1 wrote: > Hi All, > > I am running into a weird result with Spark SQL Outer joins. The results > for all of them seem