[ https://issues.apache.org/jira/browse/HIVE-26110?focusedWorklogId=752721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752721 ]
ASF GitHub Bot logged work on HIVE-26110:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Apr/22 09:00
            Start Date: 05/Apr/22 09:00
    Worklog Time Spent: 10m
      Work Description: szlta commented on code in PR #3174:
URL: https://github.com/apache/hive/pull/3174#discussion_r842544398


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java:
##########
@@ -648,7 +648,12 @@ public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions, List
     ArrayList<ExprNodeDesc> partCols = Lists.newArrayList();

     for (Function<List<ExprNodeDesc>, ExprNodeDesc> customSortExpr : customSortExprs) {
-      keyCols.add(customSortExpr.apply(allCols));
+      ExprNodeDesc colExpr = customSortExpr.apply(allCols);
+      // Custom sort expressions are marked as KEYs, which is required for sorting the rows that are going for
+      // a particular reducer instance. They also need to be marked as 'partition' columns for MapReduce shuffle
+      // phase, in order to gather the same keys to the same reducer instances.
+      keyCols.add(colExpr);
+      partCols.add(colExpr);

Review Comment:
   Thx!


Issue Time Tracking
-------------------

    Worklog Id:     (was: 752721)
    Time Spent: 50m  (was: 40m)

> bulk insert into partitioned table creates lots of files in iceberg
> -------------------------------------------------------------------
>
>                 Key: HIVE-26110
>                 URL: https://issues.apache.org/jira/browse/HIVE-26110
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> For e.g, create web_returns table in tpcds in iceberg format and try to copy
> over data from regular table. More like "insert into web_returns_iceberg as
> select * from web_returns".
> This inserts the data correctly, however there are lot of files present in
> each partition. IMO, dynamic sort optimisation isn't working fine and this
> causes records not to be grouped in the final phase.
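
The code comment in the diff above separates two roles a column can play in the shuffle: key columns sort the rows inside one reducer, and partition columns decide which reducer receives a row. The sketch below is a toy model of that difference, not Hive code. The class ShuffleSketch and the method countFiles are hypothetical names, and the round-robin fallback is only an assumption standing in for whatever distribution is used when no partition columns are set. It counts one output file per distinct (reducer, sort value) pair, which is roughly how the symptom in the issue description could arise.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not part of Hive.
public class ShuffleSketch {

    /**
     * Counts the output files, assuming each reducer writes one file
     * per distinct sort value it receives.
     *
     * @param sortValues          value of the custom sort expression for each row
     * @param numReducers         number of reducer instances
     * @param partitionOnSortExpr true if the sort expression is also a partition column
     */
    public static int countFiles(List<String> sortValues, int numReducers,
                                 boolean partitionOnSortExpr) {
        Set<String> files = new HashSet<>();
        int next = 0;
        for (String value : sortValues) {
            int reducer = partitionOnSortExpr
                    // Same value always hashes to the same reducer.
                    ? Math.floorMod(value.hashCode(), numReducers)
                    // Assumption: without partition columns, rows are spread round-robin.
                    : next++ % numReducers;
            files.add(reducer + "/" + value);
        }
        return files.size();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            rows.add("part=" + (i % 3)); // three distinct table partitions
        }
        System.out.println(countFiles(rows, 4, false)); // prints 12
        System.out.println(countFiles(rows, 4, true));  // prints 3
    }
}
```

In this model, sorting alone leaves every table partition spread over several reducers, so the file count grows with the number of reducers. Adding the same expression to the partition columns brings it down to one file per distinct value.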
--
This message was sent by Atlassian Jira
(v8.20.1#820001)