[ https://issues.apache.org/jira/browse/HIVE-26319?focusedWorklogId=782016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782016 ]
ASF GitHub Bot logged work on HIVE-26319:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jun/22 11:11
            Start Date: 16/Jun/22 11:11
    Worklog Time Spent: 10m
      Work Description: kasakrisz commented on code in PR #3362:
URL: https://github.com/apache/hive/pull/3362#discussion_r898969956


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -127,14 +130,23 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
             .run(output -> {
               Table table = HiveIcebergStorageHandler.table(context.getJobConf(), output);
               if (table != null) {
-                HiveIcebergWriter writer = writers.get(output);
+                Collection<DataFile> dataFiles = Lists.newArrayList();
+                Collection<DeleteFile> deleteFiles = Lists.newArrayList();
                 String fileForCommitLocation = generateFileForCommitLocation(table.location(), jobConf,
-                    attemptID.getJobID(), attemptID.getTaskID().getId());
-                if (writer != null) {
-                  createFileForCommit(writer.files(), fileForCommitLocation, table.io());
-                } else {
+                    attemptID.getJobID(), attemptID.getTaskID().getId());
+                if (writers.get(output) != null) {
+                  for (HiveIcebergWriter writer : writers.get(output)) {
+                    if (writer != null) {
+                      dataFiles.addAll(writer.files().dataFiles());

Review Comment:
   I found this usage:
   https://github.com/apache/hive/blob/67c2d4910ff17c694653eb8bd9c9ed2405cec38b/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/writer/HiveIcebergWriterBase.java#L59

   `HiveIcebergWriter` does not have `dataFiles()` or `deleteFiles()` methods, and it can be a `HiveIcebergRecordWriter`, a `HiveIcebergDeleteWriter`, etc., which treat data and delete files differently.

   If we want to avoid creating the `FilesForCommit` object, we could replace `HiveIcebergWriter.files()` in one of two ways:
   * create a method like `HiveIcebergWriter.collectFiles(List<> dataFiles, List<> deleteFiles)`, or
   * create `dataFiles()` and `deleteFiles()` methods.
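To make the two options concrete, here is a rough sketch. Everything in it is a simplified stand-in and an assumption for illustration: file names are plain `String`s rather than `DataFile`/`DeleteFile`, and `CollectingWriter`, `AccessorWriter`, `RecordWriterSketch` and `DeleteWriterSketch` are hypothetical names, not classes in the Hive code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified illustration only: file names are plain Strings, and every type
// below is a hypothetical stand-in, not an actual Hive or Iceberg class.
public class WriterFilesSketch {

  // Option 1: the caller owns the lists and each writer appends to them.
  public interface CollectingWriter {
    void collectFiles(List<String> dataFiles, List<String> deleteFiles);
  }

  // Option 2: each writer exposes its files through two accessors.
  public interface AccessorWriter {
    List<String> dataFiles();
    List<String> deleteFiles();
  }

  // Stand-in for a writer that only produces data files.
  public static class RecordWriterSketch implements CollectingWriter, AccessorWriter {
    private final List<String> written = new ArrayList<>();

    public void write(String fileName) {
      written.add(fileName);
    }

    @Override
    public void collectFiles(List<String> dataFiles, List<String> deleteFiles) {
      dataFiles.addAll(written); // this writer has no delete files to add
    }

    @Override
    public List<String> dataFiles() {
      // Read-only wrapper: still allocates one small object per call.
      return Collections.unmodifiableList(written);
    }

    @Override
    public List<String> deleteFiles() {
      return Collections.emptyList();
    }
  }

  // Stand-in for a writer that only produces delete files.
  public static class DeleteWriterSketch implements CollectingWriter, AccessorWriter {
    private final List<String> written = new ArrayList<>();

    public void write(String fileName) {
      written.add(fileName);
    }

    @Override
    public void collectFiles(List<String> dataFiles, List<String> deleteFiles) {
      deleteFiles.addAll(written); // this writer has no data files to add
    }

    @Override
    public List<String> dataFiles() {
      return Collections.emptyList();
    }

    @Override
    public List<String> deleteFiles() {
      return Collections.unmodifiableList(written);
    }
  }

  public static void main(String[] args) {
    RecordWriterSketch records = new RecordWriterSketch();
    records.write("data-1");
    DeleteWriterSketch deletes = new DeleteWriterSketch();
    deletes.write("delete-1");

    List<CollectingWriter> writers = new ArrayList<>();
    writers.add(records);
    writers.add(deletes);

    // Option 1 in use: all writers fill the same pair of lists.
    List<String> dataFiles = new ArrayList<>();
    List<String> deleteFiles = new ArrayList<>();
    for (CollectingWriter writer : writers) {
      writer.collectFiles(dataFiles, deleteFiles);
    }
    System.out.println(dataFiles + " " + deleteFiles);
  }
}
```

In this sketch option 1 allocates nothing per writer, while option 2 keeps the caller from mutating the writer's internal list at the cost of a wrapper object.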
   I prefer wrapping the returned lists in `unmodifiableList`, which also creates a new object. Which do you prefer?

   On the other hand, I don't think creating the `FilesForCommit` objects is critical. They are created only when the result of a statement is committed or aborted, not on a per-record basis.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 782016)
    Time Spent: 1h 10m  (was: 1h)

> Iceberg integration: Perform update split early
> -----------------------------------------------
>
>                 Key: HIVE-26319
>                 URL: https://issues.apache.org/jira/browse/HIVE-26319
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Extend update split early to iceberg tables like in HIVE-21160 for native
> acid tables


--
This message was sent by Atlassian Jira
(v8.20.7#820007)