[
https://issues.apache.org/jira/browse/IMPALA-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062468#comment-18062468
]
Csaba Ringhofer commented on IMPALA-14792:
------------------------------------------
Flamegraphs were created with
asprof -e cpu -d 15 -i 10000 -o flamegraph `pgrep catalogd`
while running the following in impala-shell against an already loaded Iceberg table with ~1M files:
alter table bigice set tblproperties("a"="b");
This operation reloads the table but reuses all existing file descriptors,
because only a table property was changed.
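
To illustrate why reusing descriptors can still be expensive, here is a minimal
sketch (not the Impala code) that contrasts two ways of looking up old file
descriptors. It assumes the cost seen in the stack trace in the issue
description comes from parsing a URI for every file before the lookup. All
names here (OldFdLookupSketch, FileDesc, indexByUri, indexByString, the example
paths) are hypothetical, and java.net.URI stands in for the Hadoop Path
constructor so that the example has no dependencies.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OldFdLookupSketch {
    /** Hypothetical stand-in for a cached file descriptor; not an Impala class. */
    static final class FileDesc {
        final String absolutePath;
        final long length;

        FileDesc(String absolutePath, long length) {
            this.absolutePath = absolutePath;
            this.length = length;
        }
    }

    /** Index keyed by parsed URI: every later lookup has to parse its path first. */
    static Map<URI, FileDesc> indexByUri(List<FileDesc> oldFds) {
        Map<URI, FileDesc> index = new HashMap<>();
        for (FileDesc fd : oldFds) index.put(URI.create(fd.absolutePath), fd);
        return index;
    }

    /** Index keyed by the raw path string: lookups only hash the string. */
    static Map<String, FileDesc> indexByString(List<FileDesc> oldFds) {
        Map<String, FileDesc> index = new HashMap<>();
        for (FileDesc fd : oldFds) index.put(fd.absolutePath, fd);
        return index;
    }

    /** Counts reusable descriptors, parsing one URI per current file. */
    static int countReusedByUri(List<String> currentPaths, Map<URI, FileDesc> index) {
        int reused = 0;
        for (String path : currentPaths) {
            if (index.containsKey(URI.create(path))) reused++;
        }
        return reused;
    }

    /** Counts reusable descriptors without any parsing. */
    static int countReusedByString(List<String> currentPaths, Map<String, FileDesc> index) {
        int reused = 0;
        for (String path : currentPaths) {
            if (index.containsKey(path)) reused++;
        }
        return reused;
    }

    public static void main(String[] args) {
        // Made-up paths; a real table would have ~1M of these.
        String base = "hdfs://nn:8020/warehouse/bigice/data/";
        List<FileDesc> oldFds = new ArrayList<>();
        oldFds.add(new FileDesc(base + "a.parquet", 128L));
        oldFds.add(new FileDesc(base + "b.parquet", 256L));
        List<String> current = List.of(base + "a.parquet", base + "b.parquet", base + "c.parquet");

        System.out.println("by URI:    " + countReusedByUri(current, indexByUri(oldFds)));
        System.out.println("by string: " + countReusedByString(current, indexByString(oldFds)));
    }
}
```

The string-keyed variant only gives the same answer when both sides store paths
in the same normalized form. Whether that holds for the paths Impala keeps for
Iceberg content files is an assumption that would need checking.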
> Incremental updates of Iceberg tables is slow even with 0 new files
> -------------------------------------------------------------------
>
> Key: IMPALA-14792
> URL: https://issues.apache.org/jira/browse/IMPALA-14792
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Csaba Ringhofer
> Priority: Major
> Labels: iceberg
> Attachments: incremental_after.html, incremental_before.html
>
>
> Noticed with a very big Iceberg table (25K partitions, ~1M files) that
> incremental refresh is not much faster than full load.
> Full reload: 13s
> {code}
> I20260227 11:35:53.533893 2966519 IcebergFileMetadataLoader.java:311]
> 194b9270d0ad58d9:d277d65700000000] Collected 956989 Iceberg content files
> into 25000 partitions. Duration: 2s747ms
> I20260227 11:35:53.533991 2966519 ParallelFileMetadataLoader.java:230]
> 194b9270d0ad58d9:d277d65700000000] Parallel Iceberg file metadata listing
> using a thread pool of size 5
> I20260227 11:36:03.145077 2966519 IcebergTable.java:548]
> 194b9270d0ad58d9:d277d65700000000] Loaded file and block metadata for
> default.bigice. Time taken: 13s166ms
> {code}
> Reload after ALTER TABLE SET TBLPROPERTIES: 9s
> {code}
> I20260227 11:25:19.225029 2964808 HdfsTable.java:1311]
> 50478fc5b1d7a62f:4609a73e00000000] Incrementally loaded table metadata for:
> default.bigice
> I20260227 11:25:22.142279 2964808 IcebergFileMetadataLoader.java:311]
> 50478fc5b1d7a62f:4609a73e00000000] Collected 0 Iceberg content files into 0
> partitions. Duration: 21.044us
> I20260227 11:25:26.278229 2964808 IcebergTable.java:548]
> 50478fc5b1d7a62f:4609a73e00000000] Loaded file and block metadata for
> default.bigice. Time taken: 8s835ms
> {code}
> Based on a few randomly sampled jstacks, most of the time is spent handling paths:
> {code}
> java.lang.Thread.State: RUNNABLE
> at java.net.URI$Parser.scan([email protected]/URI.java:3082)
> at java.net.URI$Parser.parseAuthority([email protected]/URI.java:3261)
> at java.net.URI$Parser.parseHierarchical([email protected]/URI.java:3221)
> at java.net.URI$Parser.parse([email protected]/URI.java:3177)
> at java.net.URI.<init>([email protected]/URI.java:781)
> at org.apache.hadoop.fs.Path.initialize(Path.java:259)
> at org.apache.hadoop.fs.Path.<init>(Path.java:220)
> at org.apache.impala.catalog.IcebergFileMetadataLoader.getOldFd(IcebergFileMetadataLoader.java:359)
> at org.apache.impala.catalog.IcebergFileMetadataLoader.loadContentFilesWithOldFds(IcebergFileMetadataLoader.java:188)
> at org.apache.impala.catalog.IcebergFileMetadataLoader.loadInternal(IcebergFileMetadataLoader.java:130)
> at org.apache.impala.catalog.IcebergFileMetadataLoader.load(IcebergFileMetadataLoader.java:98)
> at org.apache.impala.catalog.IcebergTable.loadFileMetadata(IcebergTable.java:534)
> at org.apache.impala.catalog.IcebergTable.load(IcebergTable.java:467)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)