Can you please provide more details: the number of rows in each table and per partition, the table structure, the Hive version, the table format, and whether the table is sorted or partitioned on dt?
Why don’t you use a join, potentially with a mapjoin hint?

> On 19.12.2018 at 09:02, Prabhakar Reddy <prabha.cl...@gmail.com> wrote:
>
> Hello,
>
> I have a table large_table with more than 50K partitions, and when I run the
> query below it runs forever. The other table, small_table2, has only five
> partitions, and whenever I run the query it seems to scan all partitions
> rather than only the five partitions that exist in the smaller table.
>
> select * from large_table a where a.dt in (select dt from small_table2)
> limit 5;
>
> Could you please confirm whether this is the expected behavior, or whether
> there is any way we can tune this query to fetch results faster?
>
> Regards
> Prabhakar Reddy
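
To make the join suggestion concrete, here is a minimal sketch of how the quoted IN subquery could be rewritten as a join with a mapjoin hint. It assumes dt is the partition column of large_table and that the distinct dt values of small_table2 fit comfortably in memory; the alias b and the DISTINCT subquery are my additions, not part of the original query. Whether this actually prunes partitions depends on the Hive version and execution engine, so please check the plan with EXPLAIN first.

```sql
-- Sketch only: rewrites "a.dt IN (SELECT dt FROM small_table2)" as a join.
-- DISTINCT keeps the IN semantics, so duplicate dt values in small_table2
-- do not multiply the rows returned from large_table.
SELECT /*+ MAPJOIN(b) */ a.*
FROM large_table a
JOIN (SELECT DISTINCT dt FROM small_table2) b
  ON a.dt = b.dt
LIMIT 5;
```

Note that in recent Hive versions the hint may be ignored by default (hive.ignore.mapjoin.hint), in which case the automatic map-join conversion (hive.auto.convert.join) decides whether the small side is broadcast.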