[ https://issues.apache.org/jira/browse/HIVE-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702338#comment-14702338 ]
Ted Xu commented on HIVE-11576:
-------------------------------

Thanks [~gopalv] for looking into this. I'm running on 1TB-scale TPC-H. Note that I replaced all int columns in the schema with bigint.

> Data loss in MapJoin
> --------------------
>
>                 Key: HIVE-11576
>                 URL: https://issues.apache.org/jira/browse/HIVE-11576
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Ted Xu
>            Assignee: Matt McCline
>
> In query (TPC-H query4)
> {code:title=query4.sql|borderStyle=solid}
> create table q4_result as
> select
>   o_orderpriority,
>   count(*) as order_count
> from
>   orders o
> join
>   (
>     select
>       distinct l_orderkey
>     from
>       (
>         select
>           *
>         from
>           lineitem
>         where
>           l_commitdate < l_receiptdate
>       ) tab1
>   ) tab2
> on tab2.l_orderkey = o.o_orderkey
> where
>   o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01'
> group by
>   o_orderpriority
> order by
>   o_orderpriority;
> {code}
> The query causes data loss if MapJoin is enabled. Both sides of the join produce
> the expected output, but some rows are not joined together. After disabling
> auto convert join, the problem is gone.
> Context:
> l_orderkey & o_orderkey are bigint.
> Vectorized execution is enabled.
> Execution engine is tez.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
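
The context and workaround described in the report can be sketched as Hive session settings. This is a minimal sketch and is not taken from the issue itself: it assumes the standard property names hive.execution.engine, hive.vectorized.execution.enabled, and hive.auto.convert.join, and the report does not show the exact commands that were used.

{code:sql}
-- Context from the report, expressed with assumed property names.
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;

-- Workaround from the report: stop Hive from automatically
-- converting the join into a MapJoin.
set hive.auto.convert.join=false;
{code}

With the last setting applied before running query4.sql, the join should be planned as a regular shuffle join, which is the configuration the reporter says makes the problem go away.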