>It's really a very simple query that I'm trying to run:
>select
...
>bloom_contains(a_id, b_id_bloom)
That's nearly impossible to optimize directly - there is no way to limit
the number of table_b rows which may match table a.
More than one bloom filter can successfully match a single row f
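Because `bloom_contains` is an opaque predicate rather than an equality test, there is no join key to hash or sort on, so every (a, b) pair has to be evaluated. The toy Python sketch below illustrates this under stated assumptions: `BloomFilter`, `make_bloom` and `nested_loop_join` are hypothetical, and `bloom_contains` here is only a stand-in for the poster's custom UDF, not its actual implementation. It also shows why one `a_id` can match more than one filter.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k bit positions per item in an m-bit integer bitmap."""

    def __init__(self, m=64, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True only means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def make_bloom(ids):
    bloom = BloomFilter()
    for i in ids:
        bloom.add(i)
    return bloom


def bloom_contains(a_id, bloom):
    # Hypothetical stand-in for the custom UDF discussed in the thread.
    return bloom.might_contain(a_id)


def nested_loop_join(table_a, table_b):
    """No equality key exists, so every (a_id, b row) pair must be tested."""
    tests, matches = 0, []
    for a_id in table_a:
        for b_id, b_id_bloom in table_b:
            tests += 1
            if bloom_contains(a_id, b_id_bloom):
                matches.append((a_id, b_id))
    return tests, matches


table_a = [1, 2, 3]
table_b = [("x", make_bloom([1, 2])), ("y", make_bloom([2, 3]))]
tests, matches = nested_loop_join(table_a, table_b)
# tests == 6 (3 x 2 pairs); a_id 2 matches both filters, and a false
# positive could add further pairs.
```

The number of predicate evaluations is always |table_a| x |table_b|, whatever the filters contain.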
Hi Gopal,
Thanks for the detailed response.
It’s really a very simple query that I’m trying to run:
select
a.a_id,
b.b_id,
count(*) as c
from
table_a a,
table_b b
where
bloom_contains(a_id, b_id_bloom)
group by
a.a_id,
b.b_id;
Where “bloom_contains” is a custom U
>I'm having trouble doing a cross join between two tables that are too big
>for a map-side join.
The actual query would help btw. Usually what is planned as a cross-join
can be optimized out into a binning query with a custom UDF.
In particular with 2-D geo queries with binning, which people ten
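The binning idea can be sketched for a 2-D distance join: instead of comparing every pair, each point gets a grid-cell id, which turns the predicate into an equi-join on the cell followed by an exact filter. This is a minimal Python illustration, assuming a fixed cell width `CELL` no smaller than the match radius; `cell_of`, `neighbour_cells` and `binned_join` are hypothetical names, not part of Hive. In Hive, the neighbour-cell list would typically come from a UDF and be expanded (for example with `LATERAL VIEW explode`) before an ordinary equi-join.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical bin width; must be >= the match radius


def cell_of(x, y):
    return (math.floor(x / CELL), math.floor(y / CELL))


def neighbour_cells(x, y):
    # A point's own cell plus the 8 surrounding cells.
    cx, cy = cell_of(x, y)
    return [(cx + dx, cy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def binned_join(points_a, points_b, radius):
    """Return (a_id, b_id) pairs within `radius`, joining on bin ids."""
    if radius > CELL:
        raise ValueError("radius must not exceed CELL")
    # Bucket table B by a single bin id: this is now an equi-join key.
    bins = defaultdict(list)
    for b_id, bx, by in points_b:
        bins[cell_of(bx, by)].append((b_id, bx, by))
    # Each A point is compared only with B points in nearby bins,
    # then the exact distance test filters out the rest.
    out = []
    for a_id, ax, ay in points_a:
        for cell in neighbour_cells(ax, ay):
            for b_id, bx, by in bins.get(cell, []):
                if math.hypot(ax - bx, ay - by) <= radius:
                    out.append((a_id, b_id))
    return out


points_a = [(i, (i * 0.37) % 10, (i * 0.61) % 10) for i in range(40)]
points_b = [(j, (j * 0.53) % 10, (j * 0.29) % 10) for j in range(40)]
pairs = binned_join(points_a, points_b, radius=0.8)
```

Because the radius never exceeds the cell width, any matching pair lies in adjacent cells, so the binned join returns exactly what a full cross join plus distance filter would.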
Hi Gopal,
Thanks for the speedy response! A follow-up question though: 10Mb input sounds
like that would work for a map join. I’m having trouble doing a cross join
between two tables that are too big for a map-side join. Trying to break down
one table into small enough partitions and then union
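The "split one side into map-join-sized pieces and union the results" approach can be sketched as follows. This is a hypothetical Python model, not Hive code: `map_join_chunk` stands in for one map-side join with the chunk held in memory, and `chunked_join` for running one such join per chunk and combining them with `UNION ALL`.

```python
def map_join_chunk(big, small_chunk, predicate):
    # One map-side join: small_chunk is held in memory, big is streamed past it.
    return [(a, b) for a in big for b in small_chunk if predicate(a, b)]


def chunked_join(big, small, chunk_size, predicate):
    """Split `small` into in-memory-sized chunks and UNION ALL the results."""
    results = []
    for start in range(0, len(small), chunk_size):
        chunk = small[start:start + chunk_size]
        results.extend(map_join_chunk(big, chunk, predicate))
    return results


big = list(range(5))
small = list(range(7))
pairs = chunked_join(big, small, chunk_size=3, predicate=lambda a, b: True)
# 7 rows split into chunks of 3, 3 and 1; every chunk still sees all of `big`.
```

Note that this does not reduce the total number of comparisons; it only spreads them across map tasks instead of funnelling them through a single reducer.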
>Over the last few weeks I've been trying to use cross joins/Cartesian
>products and was wondering why, exactly, this all gets sent to one
>reducer. All I've heard or read is that Hive can't/doesn't parallelize
>the job.
The hashcode of the shuffle key is 0, since you need to process every row
a
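To see why a constant shuffle key serialises the job, here is a small Python model of reducer assignment. It assumes the arithmetic of Hadoop's default `HashPartitioner`, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`; the helper names are hypothetical.

```python
from collections import Counter


def partition(key_hash, num_reducers):
    # Arithmetic of Hadoop's default HashPartitioner:
    # (hashCode & Integer.MAX_VALUE) % numReduceTasks
    return (key_hash & 0x7FFFFFFF) % num_reducers


def reducer_load(key_hashes, num_reducers):
    """Count how many rows each reducer receives."""
    return Counter(partition(h, num_reducers) for h in key_hashes)


# Cross join: every row is shuffled under a key whose hash is 0.
cross = reducer_load([0] * 1000, num_reducers=8)   # Counter({0: 1000})
# Equi-join on a well-spread key: hashes differ from row to row.
equi = reducer_load(range(1000), num_reducers=8)   # 125 rows per reducer
```

However many reducers are configured, a hash of 0 always maps to reducer 0, so adding reducers does not help a plain cross join.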
Periya,
I am using Hive-0.8.1 and was able to get joins to work with:
1. No on clause
2. "on (true)" as the on clause.
I am not entirely sure why you are getting exceptions with the above
queries. Perhaps there were some bugs in 0.7.1 that got resolved in 0.8.1?
As far as workarounds go,
Very cool. What is the nonstrict option for?
Thanks,
Ranjith
From: Ashish Thusoo
To:
Sent: Mon May 02 16:00:39 2011
Subject: Re: Cross join in Hive.
you could probably just say (1 = 1) in the on clause for the join.
set hive.mapred.mode=nonstrict;
select
first table to be equal to the same column in the other table.
Thanks,
Ranjith
From: Raghunath, Ranjith <ranjith.raghuna...@usaa.com>
To: 'user@hive.apache.org'
Sent: Mon May 02 00:21:57 2011
Subject: Re:
Subject: Re: Cross join in Hive.
I haven't tested this out but plan to in 6 hours. Add an extra column and set
it to 1 in both tables. Perform an inner join between the two tables.
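The dummy-column trick can be checked with a plain hash join: if both tables carry a column that is always 1, an equi-join on that column pairs every row with every other row. A minimal Python sketch, with hypothetical table and column names:

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Ordinary equi-join: build a hash table on `right`, probe it with `left`."""
    buckets = defaultdict(list)
    for row in right:
        buckets[row[key]].append(row)
    return [(left_row, right_row)
            for left_row in left
            for right_row in buckets.get(left_row[key], [])]


# Hypothetical tables; "dummy" is the extra column set to 1 in both.
table_a = [{"a_id": i, "dummy": 1} for i in range(3)]
table_b = [{"b_id": j, "dummy": 1} for j in range(4)]

joined = hash_join(table_a, table_b, "dummy")
# Every row shares dummy == 1, so all rows fall into a single bucket and the
# result is the full 3 x 4 = 12-row Cartesian product.
```

Since every row carries the same key value, a shuffle-based join on it would still send all rows to a single reducer.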
Thanks,
Ranjith
From: Abhinov Agarwal
To: user@hive.apache.org
Sent: Sun May 01 22:50:34 2011
Subject: Cross join