[
https://issues.apache.org/jira/browse/IMPALA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-13789:
------------------------------------
Description:
When loading file metadata of a table, we create several Java Maps that use
org.apache.hadoop.fs.Path as the key type, e.g. in ParallelFileMetadataLoader:
{code:java}
private final Map<Path, FileMetadataLoader> loaders_;
private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;{code}
Keeping these Path objects in memory is expensive because there are as many of
them as there are partitions.
The following histogram shows that 4.3M such Path objects take 3GB of
memory:
!histogram_path_objects.png|width=737,height=463!
We can use the partition location String as the key type and create Path
objects only when loading that partition.
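A minimal sketch of the idea, not the actual patch. It keys the bookkeeping map by the location String and builds the path object only inside the load call. The class LocationKeyedParts, the PartitionBuilder stand-in for HdfsPartition.Builder, and the method names are hypothetical, and java.net.URI stands in for org.apache.hadoop.fs.Path so that the example runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-partition bookkeeping keyed by location String.
class LocationKeyedParts {
  // Hypothetical stand-in for HdfsPartition.Builder; it only carries a name.
  static final class PartitionBuilder {
    final String name;
    PartitionBuilder(String name) { this.name = name; }
  }

  // Partitions grouped by location String. Several partitions may share one
  // location, hence the List value.
  private final Map<String, List<PartitionBuilder>> partsByLocation_ =
      new HashMap<>();

  void addPartition(String location, PartitionBuilder part) {
    partsByLocation_.computeIfAbsent(location, k -> new ArrayList<>()).add(part);
  }

  int numLocations() { return partsByLocation_.size(); }

  List<PartitionBuilder> partsAt(String location) {
    return partsByLocation_.getOrDefault(location, new ArrayList<>());
  }

  // The heavier path object is created only when a location is loaded and is
  // not retained afterwards. URI is an assumption standing in for the Hadoop
  // Path; a real loader would list the files under this path here.
  String loadLocation(String location) {
    URI path = URI.create(location);  // short-lived, eligible for GC on return
    return path.getPath();
  }
}
```

With this layout the long-lived maps hold only Strings. One caveat: equivalent spellings of the same location (for example, with and without a trailing slash) are distinct String keys, so a real change would presumably need to settle on one canonical form.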
> Avoid holding lots of org.apache.hadoop.fs.Path objects in memory
> -----------------------------------------------------------------
>
> Key: IMPALA-13789
> URL: https://issues.apache.org/jira/browse/IMPALA-13789
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: histogram_path_objects.png