[ 
https://issues.apache.org/jira/browse/IMPALA-13918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940002#comment-17940002
 ] 

Zoltán Borók-Nagy commented on IMPALA-13918:
--------------------------------------------

Maybe we could add paging so that file metadata is loaded in chunks, i.e. 
coordinators would fetch 1 million FileDescriptors per call. In most cases it 
would still be a single RPC.
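
A minimal sketch of the paging idea, written in Python for brevity and not
taken from Impala's code. `FakeCatalog`, `get_file_descriptors`, `Page`, and
the `has_more` flag are hypothetical names chosen for illustration. Only the
page size of 1 million comes from the comment. The stand-in catalog counts
calls so the "single RPC in most cases" property can be checked.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 1_000_000  # page size suggested in the comment


@dataclass
class Page:
    descriptors: List[int]  # stand-in for serialized FileDescriptors
    has_more: bool          # hypothetical flag: more pages remain


class FakeCatalog:
    """Hypothetical stand-in for the catalog side; counts RPCs."""

    def __init__(self, num_files: int) -> None:
        self.num_files = num_files
        self.rpc_count = 0

    def get_file_descriptors(self, offset: int, limit: int) -> Page:
        self.rpc_count += 1
        end = min(offset + limit, self.num_files)
        return Page(list(range(offset, end)), has_more=end < self.num_files)


def load_all(catalog: FakeCatalog, page_size: int = PAGE_SIZE) -> List[int]:
    """Fetch all descriptors page by page until the catalog reports no more."""
    result: List[int] = []
    offset = 0
    while True:
        page = catalog.get_file_descriptors(offset, page_size)
        result.extend(page.descriptors)
        offset += len(page.descriptors)
        if not page.has_more:
            return result
```

With the `has_more` flag, a table that fits in one page costs exactly one
call, and no extra empty request is needed to detect the end.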

> Iceberg Table with 6M files might hit OutOfMemoryError: Requested array size 
> exceeds VM limit
> ---------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13918
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13918
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Critical
>
> TIcebergTable puts all files in TIcebergContentFileStore, which might hit an 
> OutOfMemoryError ("Requested array size exceeds VM limit") when there are 
> more than 6M files, since serializing them requires a byte array larger than 
> 2GB.
> For non-Iceberg tables, we can divide the files into different partitions and 
> truncate them at the partition level (IMPALA-11402). We need a similar approach 
> for Iceberg tables.
> CC [~boroknagyz] , [~daniel.becker] , [~gaborkaszab] , [~noemi] 
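
As a rough back-of-the-envelope check of the numbers in the description
(an inference, not a measurement): Java arrays are indexed by `int`, so a
byte array cannot exceed about 2^31 - 1 bytes, and the exact VM limit may be
slightly lower. If roughly 6M files reach that limit, the average serialized
size per file is on the order of a few hundred bytes. The 1 million page size
below is an assumed value used only to show the scale of one chunk.

```python
JAVA_MAX_ARRAY_LEN = 2**31 - 1  # upper bound on a Java array length
NUM_FILES = 6_000_000           # file count from the issue description
PAGE_SIZE = 1_000_000           # assumed page size, for illustration only

# Implied average serialized size per file, if 6M files fill the array.
avg_bytes_per_file = JAVA_MAX_ARRAY_LEN / NUM_FILES

# Approximate size of one page of file metadata at that average.
page_bytes = avg_bytes_per_file * PAGE_SIZE

print(f"~{avg_bytes_per_file:.0f} bytes per file")
print(f"~{page_bytes / 2**20:.0f} MiB per page of {PAGE_SIZE} files")
```

Under these assumptions a single page stays well below the array limit.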



