Ádám Szita created HIVE-25772:
---------------------------------

Summary: Use ClusteredWriter when writing to Iceberg tables
Key: HIVE-25772
URL: https://issues.apache.org/jira/browse/HIVE-25772
Project: Hive
Issue Type: Improvement
Reporter: Ádám Szita
Assignee: Ádám Szita
Currently, Hive relies on PartitionedFanoutWriter to write records to Iceberg tables. This is a major disadvantage when writing to a large number of partitions, because the fanout writer keeps a file handle open for every partition it writes to. On some file systems, such as S3, each open handle keeps its own byte buffers, so holding many handles open can waste a large amount of memory and may lead to out-of-memory (OOM) errors.

ClusteredWriter writes only one file at a time and does not keep any other file handles open, but it expects the input data to be sorted by the partition keys.
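To illustrate the trade-off, here is a minimal sketch that models each strategy with in-memory lists standing in for open data files. The classes `WriterStrategies`, `FanoutSketch` and `ClusteredSketch` are hypothetical and are not the Iceberg `PartitionedFanoutWriter` or `ClusteredWriter` APIs; rejecting a partition that reappears after its file was closed (with an `IllegalStateException`) is an assumption about how a clustered writer reacts to unsorted input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the two write strategies; in-memory lists stand in for
// open data files. These are NOT Iceberg's PartitionedFanoutWriter/ClusteredWriter.
public class WriterStrategies {

  // Fanout: keeps one open "file" per partition seen so far, until close().
  static final class FanoutSketch {
    private final Map<String, List<String>> openFiles = new HashMap<>();
    int maxOpenFiles = 0;

    void write(String partition, String record) {
      openFiles.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
      maxOpenFiles = Math.max(maxOpenFiles, openFiles.size());
    }

    // Closes every open file and returns how many were open.
    int close() {
      int n = openFiles.size();
      openFiles.clear();
      return n;
    }
  }

  // Clustered: at most one open "file"; records must arrive grouped by partition.
  static final class ClusteredSketch {
    private final Set<String> completed = new HashSet<>();
    private String current = null;
    private List<String> currentFile = null;
    int filesWritten = 0;
    int maxOpenFiles = 0;

    void write(String partition, String record) {
      if (!partition.equals(current)) {
        // Assumption: a partition whose file was already closed cannot be reopened.
        if (completed.contains(partition)) {
          throw new IllegalStateException("Input not clustered by partition: " + partition);
        }
        closeCurrent(); // close the previous partition's file before opening a new one
        current = partition;
        currentFile = new ArrayList<>();
        maxOpenFiles = Math.max(maxOpenFiles, 1);
      }
      currentFile.add(record);
    }

    private void closeCurrent() {
      if (current != null) {
        completed.add(current);
        filesWritten++;
        current = null;
        currentFile = null;
      }
    }

    void close() {
      closeCurrent();
    }
  }

  public static void main(String[] args) {
    // Records as {partitionKey, value}; partition "a" reappears after "b".
    List<String[]> records = new ArrayList<>(List.of(
        new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"a", "3"}));

    FanoutSketch fanout = new FanoutSketch();
    for (String[] r : records) fanout.write(r[0], r[1]);
    System.out.println("fanout max open files: " + fanout.maxOpenFiles); // prints 2
    fanout.close();

    try {
      ClusteredSketch clustered = new ClusteredSketch();
      for (String[] r : records) clustered.write(r[0], r[1]);
      clustered.close();
    } catch (IllegalStateException e) {
      // prints "clustered, unsorted: Input not clustered by partition: a"
      System.out.println("clustered, unsorted: " + e.getMessage());
    }

    // Sorting by partition key first satisfies the clustering requirement.
    records.sort(Comparator.comparing((String[] r) -> r[0]));
    ClusteredSketch clustered = new ClusteredSketch();
    for (String[] r : records) clustered.write(r[0], r[1]);
    clustered.close();
    // prints "clustered, sorted: files=2, max open=1"
    System.out.println("clustered, sorted: files=" + clustered.filesWritten
        + ", max open=" + clustered.maxOpenFiles);
  }
}
```

With these three records the fanout sketch holds two open "files" at once, the clustered sketch rejects the unsorted input, and after sorting by partition key it writes two files while never holding more than one open. In this model, memory for open handles grows with the number of distinct partitions for fanout, but stays constant for the clustered approach at the cost of ordering the input first.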