Re: Partitioned table and Bucket Map Join

2015-01-30 Thread matshyeq
COMMERCIAL:Re: Partitioned table and Bucket Map Join <http://markmail.org/message/5vwyo5wfleozqzjh> From: Matthew Dixon (matt...@jagex.com) Date: Jan 30, 2015 1:17:35 am List: org.apache.hadoop.hive-user Not sure if this is going to so

RE: COMMERCIAL:Re: Partitioned table and Bucket Map Join

2015-01-30 Thread Matthew Dixon
for each table in a subquery (before joining) might encourage hive to join them efficiently in the subsequent stage... From: murali parimi [mailto:muralikrishna.par...@icloud.com] Sent: 29 January 2015 18:56 To: user@hive.apache.org Cc: user Subject: COMMERCIAL:Re: Partitioned table and Bucket Map

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread murali parimi
agreed! On Jan 29, 2015, at 11:42 PM, matshyeq wrote: no confusion here. My use case is exactly the same. 1. What I was saying is my/your join condition looks like (or should look like, in your terms): FROM A JOIN B ON A.X = B.X AND A.Y = B.Y which should trigger merge bucket map join in my

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread matshyeq
no confusion here. My use case is exactly the same. 1. What I was saying is my/your join condition looks like (or should look like, in your terms): FROM A JOIN B ON A.X = B.X AND A.Y = B.Y which should trigger merge bucket map join in my opinion: Data locality information is full - you may look a
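A minimal sketch of the join shape described in this message, using the table names A and B and the columns X (partition column) and Y (bucket column) from the thread. The selected columns and the aliases are illustrative assumptions. Prefixing the query with EXPLAIN is one way to check which join operator the optimizer actually picked; whether a sort merge bucket map join is chosen depends on the Hive version, the session settings, and how the tables were created and loaded.

```sql
-- Hypothetical check: inspect the plan for the two-column join from the thread.
-- X is the partition column and Y is the bucket column on both tables.
EXPLAIN
SELECT a.X, a.Y
FROM A a
JOIN B b
  ON a.X = b.X
 AND a.Y = b.Y;
```

If the plan shows an ordinary reduce-side join instead of a bucket map join operator, the optimization was not applied for this query.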

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread murali parimi
Hello, I apologize for the confusion. Let me restate the problem. I have two tables, A and B, which are partitioned on column X and bucketed (same number of buckets) on column Y. Table A is huge (~135GB) and Table B is smaller (33GB). Both the ta
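The layout described above could be declared roughly as follows. This is only a sketch: the column types, the payload column, and the bucket count of 32 are assumptions, since the message does not give them. The SORTED BY clause is also an assumption; the message only mentions bucketing, but a sort merge bucket join additionally needs each bucket sorted on the join key.

```sql
-- Hypothetical DDL: types, the payload column, and the bucket count are assumed.
CREATE TABLE A (
  Y       BIGINT,
  payload STRING
)
PARTITIONED BY (X STRING)
CLUSTERED BY (Y) SORTED BY (Y ASC) INTO 32 BUCKETS;

-- B uses the same partition column, bucket column, and bucket count as A.
CREATE TABLE B (
  Y       BIGINT,
  payload STRING
)
PARTITIONED BY (X STRING)
CLUSTERED BY (Y) SORTED BY (Y ASC) INTO 32 BUCKETS;
```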

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread matshyeq
My hunch is that, while partitioning is in fact very similar to bucketing (actually superior, as you have some control over which file the data goes to), the Hive optimizer only applies bucket joins if your tables are bucketed, so that your join condition t1.bucketed_column = t2.bucketed_column triggers the bucket

Re: Partitioned table and Bucket Map Join

2015-01-29 Thread murali parimi
I faced the same situation: two tables with 3 billion records on each side, partitioned and sorted on the same key. Set the following parameters in the hive query assuming the join will happen in the map phase. set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set h
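The list of parameters is cut off in this snippet. For orientation only, the session settings that the Hive documentation commonly lists for a sort merge bucket map join look roughly like the sketch below. Only the first line appears in the message; the other two are an assumption about what such a setup typically includes and are not a reconstruction of the author's list.

```sql
-- From the message above.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Assumed, not taken from the message: settings commonly paired with the
-- input format above to enable bucket map join and its sorted-merge variant.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
```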

Partitioned table and Bucket Map Join

2015-01-29 Thread matshyeq
I do have two tables partitioned on the same criteria. Could I still take advantage of Bucket Map Join or better, Sort Merge Bucket Map Join? How? ~Maciek