Hi Guys,

I wonder if you could help me.

I have a huge Hive table, A, partitioned by a single field. It has thousands of 
partitions.
I also have a small table, B, containing a few dozen partition IDs. I'd like to 
read data only from those partitions of A.

However, when I run
SELECT * FROM A JOIN B ON (A.partition_id = B.partition_id);
Hive reads all the data from A, then from B, and performs the join in the reduce stage.

I tried the /*+ MAPJOIN */ hint. The query ran faster because it skipped the 
reduce stage, but it still read the whole of table A.
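
For reference, the hinted query was along these lines. This is a sketch only: it assumes the usual form of the hint, with the small table B named inside it, and the exact statement may have differed.

```sql
-- Map-side join: B (the small table) is loaded into memory on each mapper,
-- so no reduce stage is needed. Assumption: B is small enough to fit in memory.
-- Note that this still scans every partition of A.
SELECT /*+ MAPJOIN(B) */ A.*
FROM A
JOIN B ON (A.partition_id = B.partition_id);
```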

Is there a more efficient way to run this query without reading all of A, 
so that only the partitions listed in B are scanned?


Thanks
Dima
