[ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei Zhang updated HIVE-25170:
-----------------------------
    Description: 
{code:java}
EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:10
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Limit [LIM_9] (rows=1 width=368)
          Number of rows:10
          Select Operator [SEL_8] (rows=1 width=368)
            Output:["_col0","_col1","_col2"]
            Group By Operator [GBY_7] (rows=1 width=368)
              Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
            <-Reducer 2 [SIMPLE_EDGE]
              SHUFFLE [RS_6]
                PartitionCols:'constant', 'constant'
                Group By Operator [GBY_5] (rows=1 width=368)
                  Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
                  Select Operator [SEL_3] (rows=500 width=178)
                    Output:["_col2"]
                  <-Map 1 [SIMPLE_EDGE]
                    SHUFFLE [RS_2]
                      PartitionCols:'constant', _col1
                      Select Operator [SEL_1] (rows=500 width=178)
                        Output:["_col1","_col2"]
                        TableScan [TS_0] (rows=500 width=10)
                          src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]
{code}
The PartitionCols of RS_6 in Reducer 2 are wrong. Instead of 'constant', 'constant', they should be 'constant', _col1.

This happens because, after HIVE-13808, SemanticAnalyzer uses sortCols to generate the key part of the colExprMap structure, while the key columns themselves are generated from newSortCols. The two lists line up only when the constant column is the trailing key column; in any other position, column names and expressions are mismatched.

The constant propagation optimizer reads this colExprMap, finds an extra constant expression in the mismatched map, and produces the wrong plan shown above.

colExprMap is also used by several other optimizers, which makes this a serious problem.
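To make the mismatch concrete, here is a minimal toy model in plain Java. It is not Hive code: the class, its methods, and the idea that newSortCols is simply sortCols with constant expressions filtered out are all assumptions made for illustration, and the key names only imitate Hive's KEY.reducesinkkeyN style. It shows how pairing key names counted from one list with expressions read by index from the other binds a key to the constant whenever the constant is not the last entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Hive APIs.
class ColExprMapSketch {

    // Assumption: a quoted literal stands in for a constant expression.
    static boolean isConstant(String expr) {
        return expr.startsWith("'");
    }

    // Assumed model of newSortCols: sortCols with constants dropped.
    static List<String> dropConstants(List<String> sortCols) {
        List<String> out = new ArrayList<>();
        for (String expr : sortCols) {
            if (!isConstant(expr)) {
                out.add(expr);
            }
        }
        return out;
    }

    // Pair the i-th generated key name with the i-th entry of exprSource.
    static Map<String, String> buildKeyMap(int keyCount, List<String> exprSource) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < keyCount; i++) {
            map.put("KEY.reducesinkkey" + i, exprSource.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> sortCols = Arrays.asList("'constant'", "key", "value");
        List<String> newSortCols = dropConstants(sortCols);
        int keyCount = newSortCols.size();

        // Key names counted from newSortCols, expressions read from sortCols.
        System.out.println("mismatched: " + buildKeyMap(keyCount, sortCols));
        // Key names and expressions both taken from newSortCols.
        System.out.println("consistent: " + buildKeyMap(keyCount, newSortCols));
    }
}
```

In the mismatched map the first key is bound to the literal 'constant' rather than to key, which is the kind of spurious constant an optimizer reading the map could then propagate. If the constant is moved to the end of sortCols, both maps come out identical, matching the observation that the problem appears only when the constant is not the trailing key column.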
> Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25170
>                 URL: https://issues.apache.org/jira/browse/HIVE-25170
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 3.1.2
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major