[ https://issues.apache.org/jira/browse/HIVE-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-27142: ---------------------------------- Labels: pull-request-available (was: ) > Map Join not working as expected when joining non-native tables with native > tables > ----------------------------------------------------------------------------------- > > Key: HIVE-27142 > URL: https://issues.apache.org/jira/browse/HIVE-27142 > Project: Hive > Issue Type: Bug > Affects Versions: All Versions > Reporter: Syed Shameerur Rahman > Assignee: Syed Shameerur Rahman > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *1. Issue :* > When *_hive.auto.convert.join=true_* and if the underlying query is trying to > join a large non-native hive table with a small native hive table, The map > join is happening in the wrong side i.e on the map task which process the > small native hive table and it can lead to OOM when the non-native table is > really large and only few map tasks are spawned to scan the small native hive > tables. > > *2. Why is this happening ?* > This happens due to improper stats collection/computation of non native hive > tables. Since the non-native hive tables are actually stored in a different > location which Hive does not know of and only a temporary path which is > visible to Hive while creating a non native table does not store the actual > data, The stats collection logic tend to under estimate the data/rows and > hence causes the map join to happen in the wrong side. > > *3. Potential Solutions* > 3.1 Turn off *_hive.auto.convert.join=false._* This can have a negative > impact of the query if the same query is trying to do multiple joins i.e > one join with non-native tables and other join where both the tables are > native. > 3.2 Compute stats for non-native table by firing the ANALYZE TABLE <> > command before joining native and non-native commands. The user may or may > not choose to do it. > 3.3 Don't collect/estimate stats for non-native hive tables by default > (Preferred solution) -- This message was sent by Atlassian Jira (v8.20.10#820010)