Csaba Ringhofer created IMPALA-14792:
----------------------------------------

             Summary: Incremental updates of Iceberg tables is slow even with 0 
new files
                 Key: IMPALA-14792
                 URL: https://issues.apache.org/jira/browse/IMPALA-14792
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
            Reporter: Csaba Ringhofer


Noticed with a very big Iceberg table (25K partitions, ~1M files) that 
incremental refresh is not much faster than full load.

Full reload: 13s
{code}
0227 11:35:53.533893 2966519 IcebergFileMetadataLoader.java:311] 
194b9270d0ad58d9:d277d65700000000] Collected 956989 Iceberg content files into 
25000 partitions. Duration: 2s747ms
I20260227 11:35:53.533991 2966519 ParallelFileMetadataLoader.java:230] 
194b9270d0ad58d9:d277d65700000000] Parallel Iceberg file metadata listing using 
a thread pool of size 5
I20260227 11:36:03.145077 2966519 IcebergTable.java:548] 
194b9270d0ad58d9:d277d65700000000] Loaded file and block metadata for 
default.bigice. Time taken: 13s166ms
{code}

Reload after ALTER TABLE SET TBLPROPERTIES: 9s
{code}
I20260227 11:25:19.225029 2964808 HdfsTable.java:1311] 
50478fc5b1d7a62f:4609a73e00000000] Incrementally loaded table metadata for: 
default.bigice
I20260227 11:25:22.142279 2964808 IcebergFileMetadataLoader.java:311] 
50478fc5b1d7a62f:4609a73e00000000] Collected 0 Iceberg content files into 0 
partitions. Duration: 21.044us
I20260227 11:25:26.278229 2964808 IcebergTable.java:548] 
50478fc5b1d7a62f:4609a73e00000000] Loaded file and block metadata for 
default.bigice. Time taken: 8s835ms
{code}

Based on a few random jstack samples, most of the time is spent constructing 
paths:


{code}
   java.lang.Thread.State: RUNNABLE
        at java.net.URI$Parser.scan([email protected]/URI.java:3082)
        at java.net.URI$Parser.parseAuthority([email protected]/URI.java:3261)
        at java.net.URI$Parser.parseHierarchical([email protected]/URI.java:3221)
        at java.net.URI$Parser.parse([email protected]/URI.java:3177)
        at java.net.URI.<init>([email protected]/URI.java:781)
        at org.apache.hadoop.fs.Path.initialize(Path.java:259)
        at org.apache.hadoop.fs.Path.<init>(Path.java:220)
        at org.apache.impala.catalog.IcebergFileMetadataLoader.getOldFd(IcebergFileMetadataLoader.java:359)
        at org.apache.impala.catalog.IcebergFileMetadataLoader.loadContentFilesWithOldFds(IcebergFileMetadataLoader.java:188)
        at org.apache.impala.catalog.IcebergFileMetadataLoader.loadInternal(IcebergFileMetadataLoader.java:130)
        at org.apache.impala.catalog.IcebergFileMetadataLoader.load(IcebergFileMetadataLoader.java:98)
        at org.apache.impala.catalog.IcebergTable.loadFileMetadata(IcebergTable.java:534)
        at org.apache.impala.catalog.IcebergTable.load(IcebergTable.java:467)
{code}
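
The stack trace shows a Path (and therefore a java.net.URI) being built for 
each content file inside getOldFd. The following is only a rough sketch of why 
that can add up over ~1M files, and of one possible direction: keying the 
lookup of old file descriptors by the raw location string so that unchanged 
files skip URI parsing. It uses JDK classes only. OldFdLookupSketch, FileDesc 
and both count methods are hypothetical names, not Impala code, and the sketch 
assumes that old and new locations are already in the same normalized string 
form.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OldFdLookupSketch {
    // Hypothetical stand-in for a previously loaded file descriptor.
    static final class FileDesc {
        final String location;
        final long length;

        FileDesc(String location, long length) {
            this.location = location;
            this.length = length;
        }
    }

    // Parses every location into a URI before the lookup, similar in spirit
    // to the per-file Path construction visible in the stack trace.
    static int countReusedViaUri(Map<URI, FileDesc> oldFds, List<String> locations) {
        int reused = 0;
        for (String location : locations) {
            if (oldFds.containsKey(URI.create(location))) reused++;
        }
        return reused;
    }

    // Looks up the raw string directly; no URI parsing on the hot path.
    // Assumption: both sides use the same normalized location string.
    static int countReusedViaString(Map<String, FileDesc> oldFds, List<String> locations) {
        int reused = 0;
        for (String location : locations) {
            if (oldFds.containsKey(location)) reused++;
        }
        return reused;
    }

    public static void main(String[] args) {
        int numFiles = 200_000; // arbitrary size for a quick local comparison
        List<String> locations = new ArrayList<>(numFiles);
        Map<URI, FileDesc> byUri = new HashMap<>();
        Map<String, FileDesc> byString = new HashMap<>();
        for (int i = 0; i < numFiles; i++) {
            // Made-up location layout, for illustration only.
            String location = "hdfs://nn:8020/warehouse/bigice/data/part="
                    + (i % 25_000) + "/file-" + i + ".parquet";
            FileDesc fd = new FileDesc(location, 1024L);
            locations.add(location);
            byUri.put(URI.create(location), fd);
            byString.put(location, fd);
        }

        long t0 = System.nanoTime();
        int viaUri = countReusedViaUri(byUri, locations);
        long t1 = System.nanoTime();
        int viaString = countReusedViaString(byString, locations);
        long t2 = System.nanoTime();

        System.out.printf("URI keys:    reused=%d in %d ms%n", viaUri, (t1 - t0) / 1_000_000);
        System.out.printf("String keys: reused=%d in %d ms%n", viaString, (t2 - t1) / 1_000_000);
    }
}
```

The timings from main are a crude, unwarmed measurement and depend on the JVM 
and hardware, so they only hint at the relative cost. Whether string keys are 
safe in the real loader depends on how locations are normalized there, which 
this sketch does not cover.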


