Quanlong Huang created IMPALA-14169:
---------------------------------------

             Summary: No need to collect writeIdList of txn tables in local catalog mode catalog updates
                 Key: IMPALA-14169
                 URL: https://issues.apache.org/jira/browse/IMPALA-14169
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang


Saw a case where the catalog update thread was slow in collecting a long writeIdList of a transactional table. Jstack:
{noformat}
"Thread-14 [Getting catalog delta from version 3628]" #77 prio=5 os_prio=0 cpu=2123339.81ms elapsed=4329.08s tid=0x00000000100d7b80 nid=0x2c5d53 runnable  [0x00007f5158efd000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Arrays.copyOf([email protected]/Arrays.java:3481)
        at java.util.ArrayList.toArray([email protected]/ArrayList.java:369)
        at com.google.common.primitives.Longs.toArray(Longs.java:674)
        at org.apache.impala.hive.common.MutableValidReaderWriteIdList.getInvalidWriteIds(MutableValidReaderWriteIdList.java:196)
        at org.apache.impala.catalog.Hive3MetastoreShimBase.convertToTValidWriteIdList(Hive3MetastoreShimBase.java:537)
        at org.apache.impala.catalog.HdfsTable.getTHdfsTable(HdfsTable.java:2449)
        at org.apache.impala.catalog.HdfsTable.toThriftWithMinimalPartitions(HdfsTable.java:2147)
        at org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDeltaHelper(CatalogServiceCatalog.java:1691)
        at org.apache.impala.catalog.CatalogServiceCatalog.lockTableAndAddToCatalogDelta(CatalogServiceCatalog.java:1575)
        at org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDelta(CatalogServiceCatalog.java:1517)
        at org.apache.impala.catalog.CatalogServiceCatalog.addDatabaseToCatalogDelta(CatalogServiceCatalog.java:1377)
        at org.apache.impala.catalog.CatalogServiceCatalog.getCatalogDelta(CatalogServiceCatalog.java:1028)
        at org.apache.impala.service.JniCatalog.lambda$getCatalogDelta$2(JniCatalog.java:287)
        at org.apache.impala.service.JniCatalog$$Lambda$212/0x00007f5166601e88.call(Unknown Source)
        at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
        at org.apache.impala.service.JniCatalogOp$$Lambda$214/0x00007f5166602708.call(Unknown Source)
        at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
        at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
        at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
        at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231)
        at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
        at org.apache.impala.service.JniCatalog.getCatalogDelta(JniCatalog.java:286)
{noformat}
This slows down metadata updates for the whole cluster.
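To illustrate why this path is expensive, here is a minimal, hypothetical sketch of the pattern visible in the stack: a write-id list that keeps its invalid write ids in a {{List<Long>}} and materializes a fresh {{long[]}} on every call, as the {{getInvalidWriteIds()}} -> {{Longs.toArray()}} -> {{Arrays.copyOf()}} frames do. The class and method names are illustrative assumptions, not Impala's code, and the sketch simplifies Guava's conversion (which first copies into an {{Object[]}} and then unboxes) into a single copy loop. The point is only that every catalog delta including the table pays a cost linear in the number of invalid write ids.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class WriteIdCopyCost {
  // Hypothetical stand-in for a mutable write-id list; not Impala's class.
  static final class InvalidWriteIds {
    private final List<Long> invalid = new ArrayList<>();

    void add(long writeId) { invalid.add(writeId); }

    // Mirrors the pattern in the stack trace: every call unboxes and copies
    // the whole list into a new array, O(n) in time and memory.
    long[] getInvalidWriteIds() {
      long[] out = new long[invalid.size()];
      for (int i = 0; i < out.length; i++) out[i] = invalid.get(i);
      return out;
    }
  }

  // Hypothetical full serialization: converts the list once per update.
  static long elementsCopied(InvalidWriteIds ids, int updates) {
    long copied = 0;
    for (int u = 0; u < updates; u++) {
      copied += ids.getInvalidWriteIds().length;
    }
    return copied;
  }

  public static void main(String[] args) {
    InvalidWriteIds ids = new InvalidWriteIds();
    for (long w = 1; w <= 1_000_000; w++) ids.add(w);
    // 10 updates that each include the table copy 10 million longs.
    System.out.println(elementsCopied(ids, 10)); // prints 10000000
  }
}
{code}

The cost is repeated for every transactional table that lands in a delta, even though, as described next, the result is thrown away in local catalog mode.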

The writeIdList is not actually used when sending updates in local catalog mode. Only the type, catalog_version, last_modified_time_ms, db_name and tbl_name fields are used:
{code:java}
    private TCatalogObject getMinimalObjectForV2(TCatalogObject obj) {
      Preconditions.checkState(topicMode_ == TopicMode.MINIMAL ||
          topicMode_ == TopicMode.MIXED);
      TCatalogObject min = new TCatalogObject(obj.type, obj.catalog_version);
      min.setLast_modified_time_ms(obj.last_modified_time_ms);
      switch (obj.type) {
      ...
      case TABLE:
      case VIEW:
        min.setTable(new TTable(obj.table.db_name, obj.table.tbl_name));
{code}
So in local catalog mode, getCatalogDelta() should skip collecting these unused table fields, including the writeIdList.
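A minimal sketch of the proposed behavior, under stated assumptions: {{TableSnapshot}}, {{TableUpdate}}, {{toDeltaEntry()}} and the {{TopicMode}} enum below are hypothetical simplifications, not Impala's {{HdfsTable}}/{{TCatalogObject}} classes. It assumes that only the MINIMAL topic mode (local catalog mode) can skip the write ids, since in MIXED mode a full object may still be published alongside the minimal one. In MINIMAL mode the delta entry carries only the fields that {{getMinimalObjectForV2()}} keeps, and the write-id list is never converted.

{code:java}
import java.util.List;

public class MinimalDeltaSketch {
  // Hypothetical topic modes, named after the ones in the snippet above.
  enum TopicMode { FULL, MIXED, MINIMAL }

  // Hypothetical in-memory table state; invalidWriteIds can be very long.
  record TableSnapshot(String dbName, String tblName, long catalogVersion,
                       long lastModifiedTimeMs, List<Long> invalidWriteIds) {}

  // Hypothetical update payload; writeIds is null when it was not collected.
  record TableUpdate(String dbName, String tblName, long catalogVersion,
                     long lastModifiedTimeMs, long[] writeIds) {}

  // Builds the delta entry for one table. In MINIMAL mode only the fields
  // used by local-catalog coordinators are filled in, so the O(n) write-id
  // conversion is skipped entirely.
  static TableUpdate toDeltaEntry(TableSnapshot t, TopicMode mode) {
    long[] writeIds = null;
    if (mode != TopicMode.MINIMAL) {
      writeIds = t.invalidWriteIds().stream()
          .mapToLong(Long::longValue)
          .toArray();
    }
    return new TableUpdate(t.dbName(), t.tblName(), t.catalogVersion(),
        t.lastModifiedTimeMs(), writeIds);
  }

  public static void main(String[] args) {
    TableSnapshot t = new TableSnapshot("db", "acid_tbl", 100L,
        1_700_000_000_000L, List.of(11L, 12L, 15L));
    System.out.println(toDeltaEntry(t, TopicMode.MINIMAL).writeIds() == null); // true
    System.out.println(toDeltaEntry(t, TopicMode.FULL).writeIds().length); // 3
  }
}
{code}

Deciding at delta-building time, rather than building the full object and trimming it afterwards, is what avoids the cost: in the stack above the expensive conversion already happens inside toThriftWithMinimalPartitions(), before any trimming could take place.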



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
