[
https://issues.apache.org/jira/browse/IMPALA-13918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940002#comment-17940002
]
Zoltán Borók-Nagy commented on IMPALA-13918:
--------------------------------------------
Maybe we could add paging so that file metadata is loaded in parts, i.e.
coordinators would fetch 1 million FileDescriptors per call. In most cases it
would still be a single RPC.
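
A rough sketch of the paging idea, in Python for brevity rather than in
Impala's actual catalog code. The names fetch_page and load_all, the
offset/limit protocol, and the use of plain list items as stand-ins for
FileDescriptors are all assumptions made for illustration; none of them are
existing Impala APIs. Only the page size of 1 million comes from the
suggestion above.

```python
from typing import List, Optional, Sequence, Tuple

# Page size from the suggestion above; everything else here is hypothetical.
PAGE_SIZE = 1_000_000


def fetch_page(all_fds: Sequence, offset: int,
               limit: int) -> Tuple[Sequence, Optional[int]]:
    """Hypothetical catalog-side handler for one paged RPC.

    Returns at most `limit` descriptors starting at `offset`, plus the
    offset of the next page, or None when nothing is left to fetch.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    page = all_fds[offset:offset + limit]
    next_offset = offset + len(page)
    return page, (next_offset if next_offset < len(all_fds) else None)


def load_all(all_fds: Sequence, limit: int = PAGE_SIZE) -> Tuple[List, int]:
    """Hypothetical coordinator-side loop.

    Keeps requesting pages until the handler reports no next offset.
    Returns the collected descriptors and the number of calls made.
    """
    result: List = []
    rpcs = 0
    offset: Optional[int] = 0
    while offset is not None:
        page, offset = fetch_page(all_fds, offset, limit)
        result.extend(page)
        rpcs += 1
    return result, rpcs
```

With this scheme a table with at most `limit` files is loaded in one call,
which matches the "single RPC in most cases" expectation, while a table with
6M files and a limit of 1 million would take 6 calls, each serializing only
one page instead of the whole file list.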
> Iceberg Table with 6M files might hit OutOfMemoryError: Requested array size
> exceeds VM limit
> ---------------------------------------------------------------------------------------------
>
> Key: IMPALA-13918
> URL: https://issues.apache.org/jira/browse/IMPALA-13918
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Priority: Critical
>
> TIcebergTable puts all files in TIcebergContentFileStore, which might hit an
> OutOfMemoryError ("Requested array size exceeds VM limit") when there are
> more than 6M files, because serializing them requires a byte array larger
> than 2GB.
> For non-Iceberg tables, we can divide the files into different partitions and
> truncate them at the partition level (IMPALA-11402). We need a similar approach
> for Iceberg tables.
> CC [~boroknagyz] , [~daniel.becker] , [~gaborkaszab] , [~noemi]