Hi Kevin, Thanks.
Please post the result of the same query with INNER JOIN; that will give us a bit of insight.

Regards,
Gourav

On Mon, May 2, 2016 at 7:10 PM, Kevin Peng <[email protected]> wrote:
> Gourav,
>
> Apologies. I edited my post with this information:
> Spark version: 1.6
> Results are from the Spark shell
> OS: Linux version 2.6.32-431.20.3.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Thu Jun 19 21:14:45 UTC 2014
>
> Thanks,
>
> KP
>
> On Mon, May 2, 2016 at 11:05 AM, Gourav Sengupta <[email protected]> wrote:
>
>> Hi,
>>
>> As always, can you please write down details regarding your Spark cluster: the version, OS, IDE used, etc.?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Mon, May 2, 2016 at 5:58 PM, kpeng1 <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am running into weird results with Spark SQL outer joins. All of them return the same count, which does not make sense given the data. Here are the queries I am running, with their results:
>>>
>>> sqlContext.sql("SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc, s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend, d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s FULL OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()
>>> RESULT: 23747
>>>
>>> sqlContext.sql("SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc, s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend, d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s LEFT OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()
>>> RESULT: 23747
>>>
>>> sqlContext.sql("SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc, s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend, d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s RIGHT OUTER JOIN dps_pin_promo_lt d ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad) WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'").count()
>>> RESULT: 23747
>>>
>>> I was wondering if anyone has encountered this issue before.
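One plausible explanation, offered as a hedged sketch rather than a confirmed diagnosis, is the WHERE clause in all three queries. It tests columns from both tables (`s.date >= '2016-01-03' AND d.date >= '2016-01-03'`). For a row that exists on only one side, the other side's columns are NULL. The comparison then evaluates to NULL instead of TRUE, so WHERE drops the row. FULL, LEFT and RIGHT OUTER JOIN would therefore all collapse to the INNER JOIN result, and comparing against the INNER JOIN count would show this. The plain-Scala toy below models that behaviour without Spark. The object name, the `swig`/`dps` rows and every helper function are hypothetical, and only the join key (date, account, ad) is modelled.

```scala
// Plain-Scala toy (no Spark needed) showing why a WHERE clause that tests
// columns from BOTH tables turns FULL/LEFT/RIGHT OUTER JOIN into an INNER JOIN.
// All names and rows here are hypothetical stand-ins for the real tables.
object OuterJoinFilterDemo {
  // Only the join key (date, account, ad) is modelled; ISO dates compare correctly as strings.
  type Key = (String, String, String)
  type Row = (Option[Key], Option[Key]) // (s side, d side); None plays the role of NULL

  val swig: Seq[Key] = Seq(("2016-01-03", "a1", "ad1"), ("2016-01-04", "a2", "ad2"))
  val dps: Seq[Key]  = Seq(("2016-01-03", "a1", "ad1"), ("2016-01-05", "a3", "ad3"))
  val cutoff = "2016-01-03"

  // FULL OUTER JOIN on the whole key: unmatched rows get None on the missing side.
  def fullOuter(l: Seq[Key], r: Seq[Key]): Seq[Row] = {
    val matched   = for (x <- l; y <- r if x == y) yield (Option(x), Option(y))
    val leftOnly  = l.filterNot(r.contains).map(x => (Option(x), Option.empty[Key]))
    val rightOnly = r.filterNot(l.contains).map(y => (Option.empty[Key], Option(y)))
    matched ++ leftOnly ++ rightOnly
  }

  def leftOuter(l: Seq[Key], r: Seq[Key]): Seq[Row]  = fullOuter(l, r).filter(_._1.isDefined)
  def rightOuter(l: Seq[Key], r: Seq[Key]): Seq[Row] = fullOuter(l, r).filter(_._2.isDefined)
  def inner(l: Seq[Key], r: Seq[Key]): Seq[Row] =
    fullOuter(l, r).filter { case (a, b) => a.isDefined && b.isDefined }

  // SQL three-valued logic: NULL >= '...' is UNKNOWN, and WHERE keeps only TRUE,
  // so a missing (None) side always fails the test.
  def onOrAfter(date: Option[String]): Boolean = date.exists(_.compareTo(cutoff) >= 0)

  // Models: WHERE s.date >= cutoff AND d.date >= cutoff
  def whereBoth(rows: Seq[Row]): Seq[Row] =
    rows.filter { case (sRow, dRow) => onOrAfter(sRow.map(_._1)) && onOrAfter(dRow.map(_._1)) }

  def main(args: Array[String]): Unit = {
    val counts = Seq(
      "full"  -> whereBoth(fullOuter(swig, dps)).size,
      "left"  -> whereBoth(leftOuter(swig, dps)).size,
      "right" -> whereBoth(rightOuter(swig, dps)).size,
      "inner" -> whereBoth(inner(swig, dps)).size,
      // Filtering each table first, then joining, keeps the unmatched rows.
      "filter-then-full" -> fullOuter(swig.filter(_._1.compareTo(cutoff) >= 0),
                                       dps.filter(_._1.compareTo(cutoff) >= 0)).size
    )
    counts.foreach { case (name, n) => println(s"$name: $n") }
  }
}
```

With these toy rows, the full, left, right and inner variants each print 1, while filter-then-full prints 3. In Spark SQL, the analogous change would be to apply each date filter inside a subquery on its own table before the outer join. The real row counts would of course depend on the actual data.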
