[ 
https://issues.apache.org/jira/browse/IMPALA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13789:
------------------------------------
    Description: 
When loading the file metadata of a table, we create several Java Maps that use 
org.apache.hadoop.fs.Path as the key type, e.g. in 
[ParallelFileMetadataLoader|https://github.com/apache/impala/blob/cfeb57c128c7f514f3433a0399966f46a49a1a4a/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L74-L75]:
{code:java}
  private final Map<Path, FileMetadataLoader> loaders_;
  private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;{code}
Keeping these Path objects in memory is expensive because there are as many of 
them as there are partitions.

The following histogram shows that 4.3M such Path objects take 3GB of 
memory:
!histogram_path_objects.png|width=737,height=463!

Here is an example Path object that takes 704 bytes. The actual partition 
location string takes just 160 bytes. The remaining space is wasted on fields of 
java.net.URI:

!path_example.png|width=818,height=287!
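
As a rough cross-check of the numbers above, the following back-of-the-envelope sketch multiplies the quoted per-object sizes by the quoted object count. It assumes every Path is about as large as the single 704-byte example and every location string is about 160 bytes, which is only an approximation; the class and method names are hypothetical.

```java
public class PathFootprintEstimate {
    // Figures quoted in the description; treating the one example as typical
    // is an assumption, not a measurement.
    static final long PATH_COUNT = 4_300_000L;     // Path objects in the histogram
    static final long BYTES_PER_PATH = 704L;       // size of the example Path
    static final long BYTES_PER_LOCATION = 160L;   // size of its location string

    // Total bytes for `count` objects of `bytesEach` bytes.
    static long totalBytes(long count, long bytesEach) {
        return count * bytesEach;
    }

    public static void main(String[] args) {
        long asPaths = totalBytes(PATH_COUNT, BYTES_PER_PATH);
        long asStrings = totalBytes(PATH_COUNT, BYTES_PER_LOCATION);
        System.out.println("Path keys (bytes):   " + asPaths);
        System.out.println("String keys (bytes): " + asStrings);
        System.out.println("Difference (bytes):  " + (asPaths - asStrings));
    }
}
```

Under these assumptions the Path total is close to the 3GB seen in the histogram, and most of it would not be needed if only the location strings were kept.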

We can use the partition location String as the key type and create Path 
objects only when loading that partition.
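
A minimal sketch of that idea, not the actual Impala change: the map is keyed by the location String, and the parsed object exists only for the duration of one load. To keep the example free of Hadoop dependencies, java.net.URI stands in for org.apache.hadoop.fs.Path, and partitions are represented by plain names; the class, fields, and methods are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocationKeyedPartitions {
    // Keyed by the raw location string instead of a parsed path object.
    // Values are partition names here; the real map holds HdfsPartition.Builder.
    private final Map<String, List<String>> partsByLocation = new LinkedHashMap<>();

    public void addPartition(String location, String partitionName) {
        partsByLocation.computeIfAbsent(location, k -> new ArrayList<>())
            .add(partitionName);
    }

    public int distinctLocations() {
        return partsByLocation.size();
    }

    public List<String> partitionsAt(String location) {
        return partsByLocation.getOrDefault(location, Collections.emptyList());
    }

    // Parses each location only while loading it; the parsed object is a local
    // variable, so it becomes garbage once the iteration moves on.
    public int loadAll() {
        int loaded = 0;
        for (Map.Entry<String, List<String>> e : partsByLocation.entrySet()) {
            URI parsed = URI.create(e.getKey()); // stand-in for new Path(location)
            loadOne(parsed, e.getValue());
            loaded++;
        }
        return loaded;
    }

    private void loadOne(URI location, List<String> partitions) {
        // Placeholder: a real loader would list the files under `location`.
    }
}
```

One point worth checking with this approach is key equality: Path normalizes its input, so two location strings that differ only in form (for example, a trailing slash) would need to be canonicalized before being used as String keys.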

> Avoid holding lots of org.apache.hadoop.fs.Path objects in memory
> -----------------------------------------------------------------
>
>                 Key: IMPALA-13789
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13789
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: histogram_path_objects.png, path_example.png
>


