Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22332 )
Change subject: IMPALA-13656: MERGE redundantly accumulates memory in HDFS WRITER
......................................................................

IMPALA-13656: MERGE redundantly accumulates memory in HDFS WRITER

When IcebergMergeImpl created the table sink, it did not set
'inputIsClustered' to true. HdfsTableSink therefore expected input in
random order and kept an output writer open for every partition. This
caused high memory consumption and, when the number of partitions is
high, potentially a Memory Limit Exceeded error.

Since the rows are already sorted before they reach the sink, we can
set 'inputIsClustered' to true. HdfsTableSink can then write files one
by one: whenever it receives a row that belongs to a new partition, it
knows it can close the current output writer and open a new one.

Testing:
 - e2e regression test

Change-Id: I7bad0310e96eb482af9d09ba0d41e44c07bf8e4d
Reviewed-on: http://gerrit.cloudera.org:8080/22332
Reviewed-by: Peter Rozsa <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/analysis/IcebergMergeImpl.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-merge-partition.test
2 files changed, 24 insertions(+), 1 deletion(-)

Approvals:
  Peter Rozsa: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/22332

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7bad0310e96eb482af9d09ba0d41e44c07bf8e4d
Gerrit-Change-Number: 22332
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
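The effect of 'inputIsClustered' described in the commit message can be illustrated with a toy model. This is only a sketch and not Impala's actual sink code: the class ClusteredSinkSketch, the Stats holder and the simulate() method are hypothetical names invented for illustration, and a "writer" is modelled as nothing more than a partition key held in a set. It assumes that a sink expecting random input keeps every writer open until the end, while a sink expecting clustered input closes the current writer as soon as the partition key changes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a partitioned table sink (hypothetical; not Impala code). */
final class ClusteredSinkSketch {

    /** Counters collected while consuming the input rows. */
    static final class Stats {
        final int peakOpenWriters; // most writers open at the same time
        final int writersOpened;   // total writers (output files) opened

        Stats(int peakOpenWriters, int writersOpened) {
            this.peakOpenWriters = peakOpenWriters;
            this.writersOpened = writersOpened;
        }
    }

    /** Feeds one partition key per row into the modelled sink. */
    static Stats simulate(List<String> partitionKeys, boolean inputIsClustered) {
        Set<String> open = new HashSet<>();
        String current = null;
        int peak = 0;
        int opened = 0;
        for (String key : partitionKeys) {
            if (inputIsClustered) {
                if (!key.equals(current)) {
                    // New partition: close the current writer, open the next one.
                    open.clear();
                    open.add(key);
                    current = key;
                    opened++;
                }
            } else if (open.add(key)) {
                // Unknown order: the writer must stay open until the input ends.
                opened++;
            }
            peak = Math.max(peak, open.size());
        }
        return new Stats(peak, opened);
    }

    public static void main(String[] args) {
        // Rows already sorted by partition key, as they are before the MERGE sink.
        List<String> sorted = Arrays.asList("p1", "p1", "p2", "p2", "p3");
        Stats random = simulate(sorted, false);
        Stats clustered = simulate(sorted, true);
        System.out.println("unclustered peak open writers: " + random.peakOpenWriters);
        System.out.println("clustered peak open writers:   " + clustered.peakOpenWriters);
    }
}
```

For the five sorted rows above, the unclustered model peaks at three open writers and the clustered model at one, while both open three writers in total. The model also shows why the flag is only safe when the input really is sorted: on unsorted keys the clustered path would reopen a writer each time a partition reappears and produce extra files.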
