Thanks for sharing your experience Marcin Sunita
On Tue, Jul 30, 2013 at 11:54 AM, Marcin Mejran <marcin.mej...@hooklogic.com > wrote: > I’ve used a rank udf for this previously, distribute and sort by the > column then select all rows where rank=1. That should work with a join but > I never tried it. It’d be an issue if the join outputs a lot of records > that then are all dropped since that’d slow down the query.**** > > ** ** > > I’ve actually forked Hive internally and added a distinct join based on > the, now deprecated I guess, unique join code. It’s ugly in terms of syntax > and I haven’t had a chance to open source it but it allows a good amount of > control over what is joined to what (ie: select the row in table A whose > column x is closets to column y in table B, for example request time). I > really wish Hive had better support for such “non-SQL” types of queries > which are common in a world of unstructured and un-clean data.**** > > ** ** > > -Marcin**** > > ** ** > > *From:* Sunita Arvind [mailto:sunitarv...@gmail.com] > *Sent:* Tuesday, July 30, 2013 11:00 AM > *To:* user@hive.apache.org > *Subject:* Hive Join with distinct rows**** > > ** ** > > Hi Praveen / All, > > I also have a requirement similar to the one explained (by Praveen) below: > distinct rows on a single column with corresponding data from other > columns. > > > http://mail-archives.apache.org/mod_mbox/hive-user/201211.mbox/%3ccahmb8ta+r0h5a+armutookhkp8fxctown68qoz6lkfmwbrk...@mail.gmail.com%3E > **** > > This email thread dates back to Nov 2012 and is a very common use case.I > just wanted to check if there is a solution already or we still need to > write a UDF.**** > > regards > Sunita**** >