[ https://issues.apache.org/jira/browse/IMPALA-14583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042202#comment-18042202 ]

Zoltán Borók-Nagy commented on IMPALA-14583:
--------------------------------------------

Linked IMPALA-11402, which is about how we deal with this problem for legacy 
tables.

This CR is especially interesting: [https://gerrit.cloudera.org/#/c/22559/]

For Iceberg tables it would work a bit differently, since we handle partitions 
differently. But at a minimum we should reuse the same flag, 
"catalog_partial_fetch_max_files", to control the number of files per fetch.

> Limit the number of file descriptors per RPC to avoid JVM OOM in Catalog
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-14583
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14583
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>            Reporter: Noémi Pap-Takács
>            Assignee: Mihaly Szjatinya
>            Priority: Critical
>              Labels: Catalog, OOM, impala-iceberg
>
> We often get OOM errors when Impala tries to load a very large Iceberg table. 
> This happens because the Catalog loads all the file descriptors and sends 
> them to the Coordinator in one RPC, serializing all the file descriptors into 
> one big byte array. However, the JVM has a limit on array length, so trying 
> to send the entire table in one call can exceed this limit if there are too 
> many files in the table.
> We could limit the number of files per RPC, so that the 2GB JVM array limit 
> is not exceeded.
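
For context on the limit mentioned in the description: a Java byte[] can hold
at most roughly Integer.MAX_VALUE bytes (about 2 GiB), so a back-of-the-envelope
estimate, assuming an illustrative ~1 KB per serialized file descriptor (an
assumed figure, not a measured one), puts the ceiling around two million files
per response:

    // Back-of-the-envelope only; avgFdBytes is an assumption, not a measured value.
    public class ArrayLimitEstimate {
      public static void main(String[] args) {
        long arrayLimitBytes = Integer.MAX_VALUE;  // upper bound on a Java byte[] length (~2 GiB)
        long avgFdBytes = 1024;                    // assumed serialized size per file descriptor
        // Roughly 2 million: the order of magnitude at which one RPC would break down.
        System.out.println(arrayLimitBytes / avgFdBytes + " files before a single array overflows");
      }
    }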


