Hi guys, I wonder if you could help me.
I have a huge Hive table, A, partitioned by a field, with thousands of partitions. I also have a small table, B, containing a few dozen partition IDs, and I'd like to get the data only from those partitions of A.

However, when I run SELECT * FROM A JOIN B ON (A.partition_id = B.partition_id), it reads all the data from A, then from B, and performs the join in the reduce stage. I tried the /*+ MAPJOIN */ hint: the query ran faster because it skipped the reduce phase, but it still read the whole of table A.

Is there a more efficient way to run this query without reading all of A?

Thanks,
Dima
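To make the difference concrete, here is a toy Python model (hypothetical, not Hive's API or planner): table A is a dict from partition id to rows, and each function counts how many partitions it reads. `full_scan_join` mirrors what the plans above appear to do (read every partition of A, then match against B), while `pruned_join` uses B's ids up front so only the matching partitions are touched, which is the behaviour being asked for.

```python
from typing import Dict, List, Set, Tuple

# Toy model (hypothetical, not Hive's API): table A maps each partition id
# to the rows stored in that partition.
Table = Dict[int, List[str]]


def full_scan_join(a: Table, b_ids: List[int]) -> Tuple[List[str], int]:
    """Read every partition of A, then keep rows whose id appears in B."""
    wanted: Set[int] = set(b_ids)
    rows: List[str] = []
    partitions_read = 0
    for pid, part_rows in a.items():
        partitions_read += 1  # every partition is read, matching or not
        if pid in wanted:
            rows.extend(part_rows)
    return rows, partitions_read


def pruned_join(a: Table, b_ids: List[int]) -> Tuple[List[str], int]:
    """Use B's ids up front so only matching partitions of A are read."""
    rows: List[str] = []
    partitions_read = 0
    for pid in sorted(set(b_ids)):  # deduplicate ids coming from B
        if pid in a:  # ids with no partition in A cost nothing
            partitions_read += 1
            rows.extend(a[pid])
    return rows, partitions_read


if __name__ == "__main__":
    # 1000 partitions with 2 rows each; B lists 3 of them.
    a = {pid: [f"row-{pid}-{i}" for i in range(2)] for pid in range(1000)}
    print(full_scan_join(a, [3, 7, 42])[1])  # partitions read: 1000
    print(pruned_join(a, [3, 7, 42])[1])     # partitions read: 3
```

Both functions return the same rows; only the amount of data read differs.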