It went from about 60 mins to 3 mins. Hive was traversing the whole table
multiple times, which is obviously inefficient!
> Date: Tue, 7 Jul 2015 15:55:19 -0700
> Subject: Re: Limiting outer join
> From: gop...@apache.org
> To: user@hive.apache.org
>
>
> Never mind, I got it working with a UDF. I just pass the file location to
> my evaluate function. Thanks! :)
Nice. Would be very interested in looking at performance of such a UDF, if
you have numbers before/after.
I suspect it will be an order of magnitude or more faster than the BETWEEN/JOIN
clauses.
Ch
Never mind, I got it working with a UDF. I just pass the file location to my
evaluate function. Thanks! :)
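
For readers following along, calling such a UDF from HiveQL might look roughly like the sketch below. The thread does not show the actual UDF, so every name here is hypothetical: the jar path, the class `com.example.hive.IpToCountryUDF`, the function name `ip_to_country`, and the data file path. Only the idea of passing the file location as an argument comes from the message above.

```sql
-- Hypothetical names throughout: the jar, the class, the function name and
-- the lookup file are placeholders, not taken from the thread.
ADD JAR /tmp/ip-country-udf.jar;
CREATE TEMPORARY FUNCTION ip_to_country AS 'com.example.hive.IpToCountryUDF';

-- The second argument is the file location handed to the UDF's evaluate function.
SELECT IP,
       ip_to_country(IP, '/tmp/ipv4geo.csv') AS Country
FROM logontable;
```

How the lookup file is made available to the tasks is not stated in the thread, so that part is left out of the sketch.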
From: tben...@hotmail.com
To: user@hive.apache.org
Subject: RE: Limiting outer join
Date: Tue, 7 Jul 2015 09:59:22 -0700
Thanks for your replies.
I see how extracting the first country
> Subject: Re: Limiting outer join
> From: gop...@apache.org
> To: user@hive.apache.org
>
>
> In the following query, is it possible to limit the number of entries
> returned by an outer join to a single value? I want to obtain a single
> country from ipv4geotable for each entry in logontable.
Yes, the PTF DENSE_RANK()/ROW_NUMBER() basically gives you that - you can
read the first row out
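
A minimal sketch of the ROW_NUMBER() idea, under stated assumptions: the original query is cut off in this thread, so the join key `IPKey` is a hypothetical column used only to give the example a valid equi-join. The table names and the `IP` and `Country` columns are the ones mentioned in the question.

```sql
-- Sketch only: IPKey is an assumed join column, not from the thread.
SELECT IP, Country
FROM (
  SELECT logon.IP,
         ipv4.Country,
         -- Number the matching countries within each IP; ordering by Country
         -- just makes the choice of "first" row deterministic.
         ROW_NUMBER() OVER (PARTITION BY logon.IP ORDER BY ipv4.Country) AS rn
  FROM logontable logon
  LEFT OUTER JOIN ipv4geotable ipv4
    ON logon.IPKey = ipv4.IPKey
) ranked
WHERE rn = 1;  -- keep a single row per IP
```

Because the outer join is kept, an IP with no match still comes back once, with a NULL Country. Note that partitioning by IP collapses repeated IPs in logontable into one row; partitioning by a unique row key instead would keep one row per log entry.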
Hi,
In the following query, is it possible to limit the number of entries returned
by an outer join to a single value? I want to obtain a single country from
ipv4geotable for each entry in logontable.
CREATE TABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM
(SELECT * FROM logontable WHE