Re: Hive Join with distinct rows

Sunita Arvind Tue, 30 Jul 2013 17:09:37 -0700

Thanks for sharing your experience Marcin

Sunita



On Tue, Jul 30, 2013 at 11:54 AM, Marcin Mejran <marcin.mej...@hooklogic.com
> wrote:

>  I’ve used a rank udf for this previously, distribute and sort by the
> column then select all rows where rank=1. That should work with a join but
> I never tried it. It’d be an issue if the join outputs a lot of records
> that then are all dropped since that’d slow down the query.****
>
> ** **
>
> I’ve actually forked Hive internally and added a distinct join based on
> the, now deprecated I guess, unique join code. It’s ugly in terms of syntax
> and I haven’t had a chance to open source it but it allows a good amount of
> control over what is joined to what (ie: select the row in table A whose
> column x is closets to column y in table B, for example request time). I
> really wish Hive had better support for such “non-SQL” types of queries
> which are common in a world of unstructured and un-clean data.****
>
> ** **
>
> -Marcin****
>
> ** **
>
> *From:* Sunita Arvind [mailto:sunitarv...@gmail.com]
> *Sent:* Tuesday, July 30, 2013 11:00 AM
> *To:* user@hive.apache.org
> *Subject:* Hive Join with distinct rows****
>
> ** **
>
> Hi Praveen / All,
>
> I also have a requirement similar to the one explained (by Praveen) below:
> distinct rows on a single column with corresponding data from other
> columns.
>
>
> http://mail-archives.apache.org/mod_mbox/hive-user/201211.mbox/%3ccahmb8ta+r0h5a+armutookhkp8fxctown68qoz6lkfmwbrk...@mail.gmail.com%3E
> ****
>
> This email thread dates back to Nov 2012 and is a very common use case.I
> just wanted to check if there is a solution already or we still need to
> write a UDF.****
>
> regards
> Sunita****
>

Re: Hive Join with distinct rows

Reply via email to