[ https://issues.apache.org/jira/browse/HIVE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-20868:
---------------------------
    Description:
In MapRecordProcessor::getFinalOp(), due to an unknown external cause, the TezDummyStoreOperator may intermittently have a MergeJoin op as a child. As a result, fetchDone for the DummyOp remains true, the value set by the previous task. Ideally, fetchDone should be reset for each task. This eventually causes the join op to skip rows from that dummy op, producing wrong results.

Good init order
{code}
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: InitProcessor : setting fetchDone to false
{code}

Bad init order
{code}
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Child of Dummy Op MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[13]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = RS[14]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns RS[14]
{code}

  was:
In MapRecordProcessor::getFinalOp(), due to an unknown external cause, the TezDummyStoreOperator may intermittently have a MergeJoin op as a child. As a result, fetchDone for the DummyOp remains true, the value set by the previous task. Ideally, fetchDone should be reset for each task. This eventually causes the join op to skip rows from that dummy op, producing wrong results.

> SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20868
>                 URL: https://issues.apache.org/jira/browse/HIVE-20868
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-20868.1.patch
>
>
> In MapRecordProcessor::getFinalOp(), due to an unknown external cause, the
> TezDummyStoreOperator may intermittently have a MergeJoin op as a child. As a
> result, fetchDone for the DummyOp remains true, the value set by the previous
> task. Ideally, fetchDone should be reset for each task. This eventually causes
> the join op to skip rows from that dummy op, producing wrong results.
> Good init order
> {code}
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns DUMMY_STORE[45]
> 2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: InitProcessor : setting fetchDone to false
> {code}
> Bad init order
> {code}
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Child of Dummy Op MERGEJOIN[44]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = MERGEJOIN[44]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[13]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = RS[14]
> 2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns RS[14]
> {code}
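
The two init orders in the description can be reproduced with a small stand-alone model. The sketch below is an illustration only: Op, DummyStoreOp, chain, finalOp, resetIfFinalOpIsDummy and resetFirstDummyOnPath are hypothetical names, not Hive classes or methods, and the "reset only when the final op is the dummy store" behavior is inferred from the log lines above rather than taken from the Hive source. The second reset method is one possible mitigation under those assumptions and is not necessarily what HIVE-20868.1.patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Hive operators; none of these are Hive's real classes.
final class FetchDoneSketch {

    /** Minimal operator node: a display name plus child operators. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    /** Stand-in for the dummy store operator, which carries the fetchDone flag. */
    static final class DummyStoreOp extends Op {
        boolean fetchDone;
        DummyStoreOp(String name) { super(name); }
    }

    /** Links each op as the only child of the previous one and returns the first. */
    static Op chain(Op... ops) {
        for (int i = 0; i + 1 < ops.length; i++) {
            ops[i].children.add(ops[i + 1]);
        }
        return ops[0];
    }

    /** Follows the first child until a leaf is reached, like the getFinalOp trace in the logs. */
    static Op finalOp(Op root) {
        Op cur = root;
        while (!cur.children.isEmpty()) {
            cur = cur.children.get(0);
        }
        return cur;
    }

    /** Fragile reset (assumed behavior): only fires when the walk ends at the dummy store. */
    static boolean resetIfFinalOpIsDummy(Op root) {
        Op last = finalOp(root);
        if (last instanceof DummyStoreOp) {
            ((DummyStoreOp) last).fetchDone = false;
            return true;
        }
        return false;
    }

    /** Defensive reset (one possible mitigation): clears the first dummy store met on the way down. */
    static boolean resetFirstDummyOnPath(Op root) {
        Op cur = root;
        while (cur != null) {
            if (cur instanceof DummyStoreOp) {
                ((DummyStoreOp) cur).fetchDone = false;
                return true;
            }
            cur = cur.children.isEmpty() ? null : cur.children.get(0);
        }
        return false;
    }

    public static void main(String[] args) {
        // Bad init order: the dummy store has MERGEJOIN[44] as a child, so the walk ends at RS[14].
        DummyStoreOp dummy = new DummyStoreOp("DUMMY_STORE[45]");
        dummy.fetchDone = true; // stale value left behind by the previous task
        Op root = chain(new Op("TS[3]"), new Op("FIL[24]"), new Op("SEL[5]"), dummy,
                new Op("MERGEJOIN[44]"), new Op("SEL[13]"), new Op("RS[14]"));

        System.out.println("final op = " + finalOp(root).name);
        System.out.println("fragile reset applied = " + resetIfFinalOpIsDummy(root)
                + ", fetchDone = " + dummy.fetchDone);
        System.out.println("defensive reset applied = " + resetFirstDummyOnPath(root)
                + ", fetchDone = " + dummy.fetchDone);
    }
}
```

In this model the fragile reset leaves the stale fetchDone value in place whenever the dummy store has a child, which matches the missing "setting fetchDone to false" line in the bad init order. The defensive variant does not depend on where the walk ends.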