[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154115#comment-14154115 ]
Zhichun Wu commented on HIVE-8151:
----------------------------------

@ [~prasanth_j] , I find that the explain output of the insert SQL in the test case differs slightly depending on whether this optimization is enabled. After digging into the code, it seems that before the NonBlockingOpDeDupProc optimization is applied, there are three select operators in a row before the FileSink operator. NonBlockingOpDeDupProc tries to deduplicate these select operators, and the cast of _col1 to int before writing to the file is lost during deduplication. More precisely, backtracking cSELExprNodeDesc fails because the columnExprMap is missing:

{code}
ExprNodeDesc newPSELExprNodeDesc = ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL);
{code}

Here I try to include the columnExprMap in SemanticAnalyzer#genConversionSelectOperator, and with that change the test case passes. Please correct me if I'm wrong.

> Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch,
> HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch,
> HIVE-8151.8.patch
>
>
> HIVE-6455 added dynamic partition sort optimization. It added startGroup()
> method to FileSink operator to look for changes in reduce key for creating
> partition directories. This method however is not reliable as the key called
> with startGroup() is different from the key called with processOp().
> startGroup() is called with newly changed key whereas processOp() is called
> with previously aggregated key. This will result in processOp() writing the
> last row of previous group as the first row of next group.
> This happens only when used with group by operator.
> The fix is to not rely on startGroup() and do the partition directory
> creation in processOp() itself.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
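
The startGroup()/processOp() ordering problem described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions: `PartitionSinkSketch`, `Row`, `Sink`, `replay` and the `p=1`/`p=2` partition values are hypothetical names made up for illustration and are not Hive's operator classes. It assumes the simplified event order from the description, in which the aggregated row for a group reaches processOp() only after startGroup() has already been called for the next group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Hive's real operator classes. */
public class PartitionSinkSketch {

    /** An aggregated row that carries its own partition value (hypothetical shape). */
    static final class Row {
        final String partition;
        final long count;

        Row(String partition, long count) {
            this.partition = partition;
            this.count = count;
        }
    }

    /** Hypothetical sink that records which rows land in which partition directory. */
    static final class Sink {
        final boolean trustStartGroup;
        final Map<String, List<Row>> written = new LinkedHashMap<>();
        private String dirFromStartGroup;

        Sink(boolean trustStartGroup) {
            this.trustStartGroup = trustStartGroup;
        }

        /** Called with the newly changed key, before the previous aggregate arrives. */
        void startGroup(String newKey) {
            dirFromStartGroup = newKey;
        }

        /** Picks the directory either from startGroup() or from the row itself. */
        void processOp(Row row) {
            String dir = trustStartGroup ? dirFromStartGroup : row.partition;
            written.computeIfAbsent(dir, k -> new ArrayList<>()).add(row);
        }
    }

    /** Replays the assumed event order for two groups and returns what was written where. */
    static Map<String, List<Row>> replay(boolean trustStartGroup) {
        Sink sink = new Sink(trustStartGroup);
        sink.startGroup("p=1");
        sink.startGroup("p=2");            // key changes before the p=1 aggregate is forwarded
        sink.processOp(new Row("p=1", 3)); // aggregate of the previous group
        sink.processOp(new Row("p=2", 5)); // aggregate of the last group, flushed at close
        return sink.written;
    }

    public static void main(String[] args) {
        for (boolean trust : new boolean[] {true, false}) {
            System.out.println("trustStartGroup=" + trust);
            replay(trust).forEach((dir, rows) -> {
                for (Row r : rows) {
                    System.out.println("  dir " + dir + " <- row for " + r.partition);
                }
            });
        }
    }
}
```

With `trustStartGroup` set to true, the model puts the `p=1` aggregate into the `p=2` directory, which mirrors the misplaced row in the report. With it set to false, the directory is chosen inside processOp() from the row being written, which is the direction of the fix described above.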