COMMERCIAL:Re: Partitioned table and Bucket Map Join
<http://markmail.org/message/5vwyo5wfleozqzjh>
From: Matthew Dixon (matt...@jagex.com)
Date: Jan 30, 2015 1:17:35 am
List: *org.apache.hadoop.hive-user*
Not sure if this is going to so
for each table in a subquery (before joining) might encourage Hive to join
them efficiently in the subsequent stage...
From: murali parimi [mailto:muralikrishna.par...@icloud.com]
Sent: 29 January 2015 18:56
To: user@hive.apache.org
Cc: user
Subject: COMMERCIAL:Re: Partitioned table and Bucket Map Join
agreed!
On Jan 29, 2015, at 11:42 PM, matshyeq wrote:
no confusion here.
My use case is exactly the same.
1. What I was saying is my/your join condition looks like (or should look
like, in your terms):
FROM A JOIN B
ON A.X = B.X
AND A.Y = B.Y
which should trigger merge bucket map join in my opinion:
Data locality information is full - you may look a
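One way to check whether the optimizer actually picks a bucket-based join for this condition is to inspect the query plan. A minimal sketch, using the tables `A` and `B` and columns `X` and `Y` named above; the `COUNT(*)` projection is only a placeholder:

```sql
-- Ask Hive for the plan of the join discussed above.
EXPLAIN
SELECT COUNT(*)
FROM A JOIN B
  ON A.X = B.X
 AND A.Y = B.Y;
```

If a sort merge bucket join was chosen, the plan should show a Sorted Merge Bucket Map Join Operator in a map stage rather than a plain Join Operator in a reduce stage.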
Hello, I apologize for the confusion. Let me restate the problem.
I have two tables, A and B, which are partitioned on column X and bucketed (same
number of buckets) on column Y. Table A is the larger table (~135GB)
and Table B is the smaller one (33GB). Both the ta
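The layout described here (partitioned on X, bucketed on Y, same bucket count on both sides) could be declared roughly as follows. This is only a sketch: the column types, the `payload` columns, the bucket count of 32 and the `SORTED BY` clause are assumptions for illustration and are not stated in the message.

```sql
-- Hypothetical DDL; only the partition column (x) and the bucket column (y)
-- come from the description above.
CREATE TABLE a (
  y       BIGINT,
  payload STRING
)
PARTITIONED BY (x STRING)
CLUSTERED BY (y) SORTED BY (y ASC) INTO 32 BUCKETS;

CREATE TABLE b (
  y       BIGINT,
  payload STRING
)
PARTITIONED BY (x STRING)
CLUSTERED BY (y) SORTED BY (y ASC) INTO 32 BUCKETS;
```

Sorting within buckets matters only for the sort merge variant; a plain bucket map join needs the bucket counts of the two tables to be equal or multiples of each other.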
My hunch is that while partitioning is in fact very similar to bucketing
(actually superior, as you have some control over which file the data goes to),
the Hive optimizer only applies bucket joins if your tables are bucketed, so that
your join condition
t1.bucketed_column = t2.bucketed_column
triggers the bucket
I faced the same situation: two tables with 3 billion records on each side,
partitioned and sorted on the same key. Set the following parameters in the Hive
query, assuming the join will happen in the map phase.
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set h
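The list of settings is cut off in the archive. Independently of whatever the poster used, the properties below are the ones commonly associated with bucket map joins and sort merge bucket joins in Hive releases of that period; treat this as a sketch, since which of them apply depends on the Hive version.

```sql
-- Not the poster's list (which is truncated above); a sketch of properties
-- commonly used to enable bucket map join and sort merge bucket join.
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
-- Used when populating bucketed, sorted tables in older Hive versions.
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
```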
I have two tables partitioned on the same criteria.
Could I still take advantage of Bucket Map Join or, better, Sort Merge
Bucket Map Join?
How?
~Maciek