Hi Praveen Have you tried applying DISTINCT without the brackets around T1.ID
select distinct T1.ID, T1.url, T1.timestamp from T1 LEFT OUTER JOIN T2 on T1.ID= T2.ID AND T2.ID=NULL Regards Bejoy KS Sent from handheld, please excuse typos. -----Original Message----- From: Praveen Kumar K J V S <praveenkjvs.develo...@gmail.com> Date: Fri, 9 Nov 2012 21:30:28 To: <user@hive.apache.org> Reply-To: user@hive.apache.org Subject: Re: Hive Join with distinct rows Thanks Mark, I do understand that how Hive works with Distinct keyword. What I was looking for is a solution for my requirement in Hive, I am not an expert in SQL, hence looking for suggestions On Fri, Nov 9, 2012 at 9:54 AM, Mark Grover <grover.markgro...@gmail.com>wrote: > Hi Praveen, > Let's take an example: > (from > https://cwiki.apache.org/Hive/languagemanual-select.html#LanguageManualSelect-ALLandDISTINCTClauses > ) > > -- Print out contents of the table > hive> SELECT col1, col2 FROM t1; > 1 3 > 1 3 > 1 4 > 2 5 > > -- Selects distinct col1, col2 tuple > hive> SELECT DISTINCT col1, col2 FROM t1; > 1 3 > 1 4 > 2 5 > > > Similar to the second query above, your Query 2 selects each of the > distinct values for <id, url, timestamp> tuple possibly giving you multiple > records for a given id on the left side of the join. Consequently you don't > get the result you expect. > > Mark > > On Thu, Nov 8, 2012 at 6:15 PM, Praveen Kumar K J V S < > praveenkjvs.develo...@gmail.com> wrote: > >> Hi, >> >> I have 2 tables, T1 and T2 of structure columns= {ID, url, timestamp} >> >> In T1 ID's are repeatable. But T2 ID's are kind of primary key hence only >> unique values are present. >> >> I want to join both tables T1 & T2 such that select all the unique ID's >> in T1 which are not in T2 >> >> I have written 2 queries: >> >> Query1: select distinct T1.ID from T1 LEFT OUTER JOIN T2 on T1.ID=T2.IDAND >> T2.ID=NULL >> >> this gives me expected results, but with the below I am getting all the >> rows in T1 >> >> Query2: select distinct(T1.ID), T1.url, T1.timestamp from T1 LEFT OUTER >> JOIN T2 on T1.ID=T2.ID AND T2.ID=NULL >> >> Can some one point me how to achieve: select all the unique ID's in T1 >> which are not in T2 >> >> Thanks, >> Praveen >> >> >