[ https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marton Bod reassigned HIVE-25843:
---------------------------------

> Add flag to disable Iceberg FileIO config serialization
> -------------------------------------------------------
>
>                 Key: HIVE-25843
>                 URL: https://issues.apache.org/jira/browse/HIVE-25843
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>
> Hive serializes the Iceberg table object into each individual split. Since
> the FileIO is part of the Iceberg table and carries its own Hadoop
> configuration, this configuration is the dominant factor in the size of the
> serialized split. In our tests we found that, because of this serialized
> config, Iceberg splits are 15-20x larger than normal Hive splits (which led
> to OOM in some of our perf tests).
> This PR proposes to introduce a config which can turn off this config
> serialization and let the deserializer side fill in the config values
> instead (which works for Hive executors, since they already have all the
> config values in hand). This can reduce the Iceberg split size by ~20x based
> on local tests.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
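The effect described in the issue can be illustrated with a minimal, self-contained sketch. It does not use the Hive or Iceberg APIs: the class `SplitSizeSketch`, the nested `Split` type, the `serializeConf` switch, and the generated config keys are all hypothetical stand-ins, and plain Java serialization is assumed in place of whatever mechanism Hive actually uses. The sketch only shows the general idea: a split that carries a full config map is much larger than one that omits it and relies on the executor's own config at deserialization time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SplitSizeSketch {

    /** Hypothetical stand-in for a split that embeds a table and its FileIO config. */
    public static class Split implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String dataFilePath;
        // Null when config serialization is switched off.
        private final HashMap<String, String> shippedConf;

        public Split(String dataFilePath, Map<String, String> conf, boolean serializeConf) {
            this.dataFilePath = dataFilePath;
            this.shippedConf = serializeConf ? new HashMap<>(conf) : null;
        }

        /** Deserializer side: use the shipped config if present, else the executor's own. */
        public Map<String, String> resolveConf(Map<String, String> executorConf) {
            return shippedConf != null ? shippedConf : executorConf;
        }
    }

    /** Returns the number of bytes produced by standard Java serialization. */
    public static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    /** Builds a made-up config map; real Hadoop configs hold many hundreds of entries. */
    public static Map<String, String> fakeConf(int entries) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            conf.put("hypothetical.conf.key." + i, "value-" + i);
        }
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = fakeConf(1000);
        String path = "/warehouse/tbl/data/file-0001.parquet"; // illustrative path only

        int withConf = serializedSize(new Split(path, conf, true));
        int withoutConf = serializedSize(new Split(path, conf, false));

        System.out.println("split with config:    " + withConf + " bytes");
        System.out.println("split without config: " + withoutConf + " bytes");
    }
}
```

The ratio printed by this sketch depends entirely on the made-up config size and is not comparable to the 15-20x figure reported in the issue. The point is only that per-split size scales with the embedded config, while the `resolveConf` fallback lets the receiving side supply the same values locally.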