Quanlong Huang created IMPALA-14169:
---------------------------------------
Summary: No need to collect writeIdList of txn tables for catalog
updates in local catalog mode
Key: IMPALA-14169
URL: https://issues.apache.org/jira/browse/IMPALA-14169
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Saw a case where the catalog update thread is slow when collecting a long
writeIdList of a transactional table. Jstack:
{noformat}
"Thread-14 [Getting catalog delta from version 3628]" #77 prio=5 os_prio=0 cpu=2123339.81ms elapsed=4329.08s tid=0x00000000100d7b80 nid=0x2c5d53 runnable [0x00007f5158efd000]
   java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf([email protected]/Arrays.java:3481)
    at java.util.ArrayList.toArray([email protected]/ArrayList.java:369)
    at com.google.common.primitives.Longs.toArray(Longs.java:674)
    at org.apache.impala.hive.common.MutableValidReaderWriteIdList.getInvalidWriteIds(MutableValidReaderWriteIdList.java:196)
    at org.apache.impala.catalog.Hive3MetastoreShimBase.convertToTValidWriteIdList(Hive3MetastoreShimBase.java:537)
    at org.apache.impala.catalog.HdfsTable.getTHdfsTable(HdfsTable.java:2449)
    at org.apache.impala.catalog.HdfsTable.toThriftWithMinimalPartitions(HdfsTable.java:2147)
    at org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDeltaHelper(CatalogServiceCatalog.java:1691)
    at org.apache.impala.catalog.CatalogServiceCatalog.lockTableAndAddToCatalogDelta(CatalogServiceCatalog.java:1575)
    at org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDelta(CatalogServiceCatalog.java:1517)
    at org.apache.impala.catalog.CatalogServiceCatalog.addDatabaseToCatalogDelta(CatalogServiceCatalog.java:1377)
    at org.apache.impala.catalog.CatalogServiceCatalog.getCatalogDelta(CatalogServiceCatalog.java:1028)
    at org.apache.impala.service.JniCatalog.lambda$getCatalogDelta$2(JniCatalog.java:287)
    at org.apache.impala.service.JniCatalog$$Lambda$212/0x00007f5166601e88.call(Unknown Source)
    at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
    at org.apache.impala.service.JniCatalogOp$$Lambda$214/0x00007f5166602708.call(Unknown Source)
    at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
    at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:231)
    at org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:245)
    at org.apache.impala.service.JniCatalog.getCatalogDelta(JniCatalog.java:286)
{noformat}
This slows down delivery of new metadata updates to the whole cluster.
The writeIdList is actually not used in sending updates in local catalog mode.
Only the type, catalog_version, last_modified_time_ms, db_name and tbl_name are
used:
{code:java}
private TCatalogObject getMinimalObjectForV2(TCatalogObject obj) {
  Preconditions.checkState(topicMode_ == TopicMode.MINIMAL ||
      topicMode_ == TopicMode.MIXED);
  TCatalogObject min = new TCatalogObject(obj.type, obj.catalog_version);
  min.setLast_modified_time_ms(obj.last_modified_time_ms);
  switch (obj.type) {
    ...
    case TABLE:
    case VIEW:
      min.setTable(new TTable(obj.table.db_name, obj.table.tbl_name));
{code}
So in local catalog mode, getCatalogDelta() should skip collecting such unused
fields of a table.
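
A rough sketch of the idea, not the actual Impala code: every class and method name below (MinimalDeltaSketch, Table, DeltaEntry, toDeltaEntry, copyInvalidWriteIds) is hypothetical and only stands in for the real catalog classes. It assumes the expensive copy can be skipped only when the topic mode is MINIMAL, since in MIXED mode legacy coordinators presumably still need the full table object.

```java
import java.util.Arrays;
import java.util.List;

public class MinimalDeltaSketch {
  enum TopicMode { FULL, MINIMAL, MIXED }

  // Hypothetical stand-in for a transactional table holding a long write ID list.
  static final class Table {
    final String dbName;
    final String tblName;
    final long catalogVersion;
    final List<Long> invalidWriteIds;
    int writeIdCopies = 0;  // counts how often the expensive copy ran

    Table(String dbName, String tblName, long catalogVersion,
        List<Long> invalidWriteIds) {
      this.dbName = dbName;
      this.tblName = tblName;
      this.catalogVersion = catalogVersion;
      this.invalidWriteIds = invalidWriteIds;
    }

    // Models the O(n) copy seen at the top of the jstack output.
    long[] copyInvalidWriteIds() {
      writeIdCopies++;
      long[] out = new long[invalidWriteIds.size()];
      for (int i = 0; i < out.length; i++) out[i] = invalidWriteIds.get(i);
      return out;
    }
  }

  // Hypothetical stand-in for the thrift object put into the catalog delta.
  static final class DeltaEntry {
    final String dbName;
    final String tblName;
    final long catalogVersion;
    final long[] invalidWriteIds;  // null when collection was skipped

    DeltaEntry(String dbName, String tblName, long catalogVersion,
        long[] invalidWriteIds) {
      this.dbName = dbName;
      this.tblName = tblName;
      this.catalogVersion = catalogVersion;
      this.invalidWriteIds = invalidWriteIds;
    }
  }

  // Skips the write ID copy when only the minimal object will be sent.
  static DeltaEntry toDeltaEntry(Table t, TopicMode mode) {
    long[] ids = (mode == TopicMode.MINIMAL) ? null : t.copyInvalidWriteIds();
    return new DeltaEntry(t.dbName, t.tblName, t.catalogVersion, ids);
  }

  public static void main(String[] args) {
    Table t = new Table("db", "tbl", 3628L, Arrays.asList(1L, 2L, 3L));
    DeltaEntry e = toDeltaEntry(t, TopicMode.MINIMAL);
    System.out.println("write IDs collected: " + (e.invalidWriteIds != null));
  }
}
```

The fields that getMinimalObjectForV2() reads (db_name, tbl_name, catalog_version) are still populated in the sketch, so the minimal object could be built from the result without ever touching the write ID list.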