well. i had some free time to search it. from here: http://www.postgresql.org/docs/9.3/static/sql-select.html#SQL-UNION you'll see the default is indeed UNION DISTINCT. so changing it to UNION ALL you'll get different results - are they the ones you're expecting?
On Wed, Mar 12, 2014 at 9:36 AM, Stephen Sprague <sprag...@gmail.com> wrote: > interesting. don't know the answer but could you change the UNION in > the Postgres to UNION ALL? I'd be curious if the default is UNION DISTINCT > on that platform. That would at least partially explain postgres behaviour > leaving hive the odd man out. > > > > On Wed, Mar 12, 2014 at 6:47 AM, Martin Kudlej <mkud...@redhat.com> wrote: > >> Hi all, >> >> I've tried BigTop test for join_filters: >> CREATE TABLE myinput1(key int, value int); >> LOAD DATA LOCAL INPATH 'seed_data_files/in3.txt' INTO TABLE myinput1; >> >> where seed_data_files/in3.txt: >> 12 35 >> NULL 40 >> 48 NULL >> 100 100 >> >> I've tried: >> SELECT * FROM myinput1 a FULL OUTER JOIN myinput1 b on a.key > 40 AND >> a.value > 50 AND a.key = a.value AND b.key > 40 AND b.value > 50 AND b.key >> = b.value ORDER BY a.key, a.value, b.key, b.value; >> >> and expected result in test is: >> NULL NULL NULL 40 >> NULL NULL NULL 40 >> NULL NULL NULL 40 >> NULL NULL NULL 40 >> NULL NULL 12 35 >> NULL NULL 12 35 >> NULL NULL 12 35 >> NULL NULL 12 35 >> NULL NULL 48 NULL >> NULL NULL 48 NULL >> NULL NULL 48 NULL >> NULL NULL 48 NULL >> NULL 40 NULL NULL >> NULL 40 NULL NULL >> NULL 40 NULL NULL >> NULL 40 NULL NULL >> 12 35 NULL NULL >> 12 35 NULL NULL >> 12 35 NULL NULL >> 12 35 NULL NULL >> 48 NULL NULL NULL >> 48 NULL NULL NULL >> 48 NULL NULL NULL >> 48 NULL NULL NULL >> 100 100 NULL NULL >> 100 100 NULL NULL >> 100 100 NULL NULL >> 100 100 100 100 >> >> >> but real hive result is: >> NULL NULL NULL 40 >> NULL NULL 12 35 >> NULL NULL 48 NULL >> NULL 40 NULL NULL >> 12 35 NULL NULL >> 48 NULL NULL NULL >> 100 100 100 100 >> >> btw. result from postgresql is: >> (SELECT * >> FROM myinput1 a >> LEFT JOIN >> myinput1 b on >> a.key > 40 AND >> a.value > 50 AND >> a.key = a.value AND >> b.key > 40 AND >> b.value > 50 AND >> b.key = b.value ORDER BY a.key, a.value, b.key, b.value) >> UNION (SELECT * >> FROM myinput1 a >> RIGHT JOIN >> myinput1 b on >> a.key > 40 AND >> a.value > 50 AND >> a.key = a.value AND >> b.key > 40 AND >> b.value > 50 AND >> b.key = b.value >> ORDER BY a.key, a.value, b.key, b.value); >> | | 12 | 35 >> 12 | 35 | | >> | | 48 | >> 48 | | | >> | 40 | | >> | | | 40 >> 100 | 100 | 100 | 100 >> >> so it's the same like in hive. >> >> What is the right result for this full outer join in HiveQL, please? >> >> -- >> Best Regards, >> Martin Kudlej. >> MRG/Grid & RHS-Hadoop Senior Quality Assurance Engineer >> Red Hat Czech s.r.o. >> >> Phone: +420 532 294 155 >> E-mail:mkudlej at redhat.com >> IRC: mkudlej at #brno, #messaging, #grid, #rhs, #distcomp >> > >