> Yes, both of these are valid ways of filtering data before a join in Hive.
This comes with several implementation-specific caveats. If you're on
Hive 1.1 or earlier, it might not work the way Vineet described.
In older versions Calcite rewrites weren't triggered, which prevented so
Hi Varun,
Yes, both of these are valid ways of filtering data before a join in Hive.
As long as the join is not an outer join and the ON condition is not on the
non-null-generating side of the join, the Hive planner will try to push the
predicate down to the table scan.
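One way to check this on your own Hive version is to compare the plans for both forms. The sketch below assumes the two forms in question are a filter in the WHERE clause and the same filter in the ON clause. The tables a(id, dt) and b(id, val) and the date literal are hypothetical and do not come from this thread.

```sql
-- Hypothetical tables a(id, dt) and b(id, val); names are illustrative only.

-- Form 1: filter written in the WHERE clause.
EXPLAIN
SELECT a.id, b.val
FROM a JOIN b ON a.id = b.id
WHERE a.dt = '2018-01-01';

-- Form 2: the same filter written in the ON clause.
EXPLAIN
SELECT a.id, b.val
FROM a JOIN b ON a.id = b.id AND a.dt = '2018-01-01';
```

If the predicate is pushed down, the filter on a.dt should show up next to the table scan of a, ahead of the join operator, in both plans. The exact plan text varies by Hive version and execution engine.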
In fact, Hive goes one step further and also generates IS
When performing a join in Hive and then filtering the output with a WHERE
clause, the Hive compiler will try to filter the data before the tables are
joined. This is known as predicate pushdown
(http://allabouthadoop.net/what-is-predicate-pushdown-in-hive/).
For example:
SELECT * FROM a JOIN b ON a.s