[ https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789191#comment-17789191 ]
Denys Kuzmenko commented on HIVE-22636:
---------------------------------------

Checked on the latest master. The issue does not reproduce with the qtest from the description: returned count = 309

> Data loss on skewjoin for ACID tables.
> --------------------------------------
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Aditya Shah
> Priority: Blocker
> Labels: check, hive-4.0.0-must
>
> I am trying to do a skewjoin and write the result into a FullAcid table.
> The results are incorrect. The issue is similar to the one seen for MM tables in
> HIVE-16051, where the fix was to skip the skewjoin for MM tables.
> Steps to reproduce:
> Used a qtest similar to HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result for the count was 309, but the query returned 173.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)