Hello Zoltan Borok-Nagy, Noemi Pap-Takacs, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24049
to look at the new patch set (#4).
Change subject: POC: LocalIcebergTable loads files in coordinator
......................................................................
POC: LocalIcebergTable loads files in coordinator
If load_iceberg_files_in_coordinator=true, files of Iceberg
tables are loaded on the coordinator instead of being fetched
from catalogd. This is inefficient at the moment, as catalogd
still loads the files but does not really use them. The
long-term goal is to load only minimal table info on the
catalogd side.
Instead of caching TPartialTableInfo, this solution caches an
IcebergContentFileStore + hostIndex pair. This would be
suitable for a REST catalog too if the key contained the
snapshot ID instead of the catalog version.
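A minimal sketch of such a cache entry, to illustrate the keying idea only. All class and field names here (FileStoreCacheKey, FileStoreCacheValue, FileStoreCache) are hypothetical and not taken from the patch; the file store and host index are represented by plain Object placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical cache key; class and field names are illustrative only.
final class FileStoreCacheKey {
  final String tableName;
  // Catalog version today; a REST catalog could store the snapshot ID here.
  final long version;

  FileStoreCacheKey(String tableName, long version) {
    this.tableName = tableName;
    this.version = version;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FileStoreCacheKey)) return false;
    FileStoreCacheKey other = (FileStoreCacheKey) o;
    return version == other.version && tableName.equals(other.tableName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(tableName, version);
  }
}

// Hypothetical cached value: placeholders for the file store and host index.
final class FileStoreCacheValue {
  final Object fileStore;
  final Object hostIndex;

  FileStoreCacheValue(Object fileStore, Object hostIndex) {
    this.fileStore = fileStore;
    this.hostIndex = hostIndex;
  }
}

// Minimal, non-thread-safe cache keyed by (table name, version).
final class FileStoreCache {
  private final Map<FileStoreCacheKey, FileStoreCacheValue> entries = new HashMap<>();

  void put(String tableName, long version, Object fileStore, Object hostIndex) {
    entries.put(new FileStoreCacheKey(tableName, version),
        new FileStoreCacheValue(fileStore, hostIndex));
  }

  // Returns null when no entry exists for this exact (table, version) pair.
  FileStoreCacheValue get(String tableName, long version) {
    return entries.get(new FileStoreCacheKey(tableName, version));
  }
}
```

With this shape, switching from catalog version to snapshot ID would only change what is passed as the version argument.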
Incremental table loading is implemented by storing, for each
table, a weak reference to the last used IcebergContentFileStore
in the cache. When a new IcebergContentFileStore is requested
for the table, this weak reference is looked up and, if found,
the old partition and file lists are reused, similarly to the
existing implementation in catalogd.
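The weak-reference lookup described above could be sketched roughly as follows. This is not the code from the patch: SketchFileStore and IncrementalStoreLoader are hypothetical names, the file descriptor is a plain String placeholder, and the sketch is not thread-safe.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the file store: maps file path -> descriptor.
final class SketchFileStore {
  final Map<String, String> filesByPath = new HashMap<>();
}

final class IncrementalStoreLoader {
  // Last used store per table, held weakly so it can be garbage collected.
  private final Map<String, WeakReference<SketchFileStore>> lastUsed = new HashMap<>();
  // Counts descriptors that had to be built from scratch.
  int freshLoads = 0;

  SketchFileStore load(String tableName, Iterable<String> currentPaths) {
    WeakReference<SketchFileStore> ref = lastUsed.get(tableName);
    // get() returns null if the old store has already been collected.
    SketchFileStore old = (ref == null) ? null : ref.get();
    SketchFileStore fresh = new SketchFileStore();
    for (String path : currentPaths) {
      String desc = (old == null) ? null : old.filesByPath.get(path);
      if (desc == null) {
        // Placeholder for the expensive part: building a new descriptor.
        desc = "desc:" + path;
        freshLoads++;
      }
      fresh.filesByPath.put(path, desc);
    }
    lastUsed.put(tableName, new WeakReference<>(fresh));
    return fresh;
  }
}
```

Because the reference is weak, the old store is reused only while something else (for example a cache entry or a running query) still holds it; otherwise the load falls back to a full reload.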
The examples below are for an Iceberg table with 1M files and
25K partitions on my dev machine.
Pros:
- File descs are not transferred in getPartialCatalogObject RPC
- The size of cached objects seems to decrease:
  Before:
    543 org.apache.impala.thrift.TPartialTableInfo
    0.5MB org.apache.iceberg.BaseTable
  After:
    431 org.apache.impala.catalog.IcebergContentFileStore
    3MB org.apache.impala.thrift.TPartialTableInfo
    0.5MB org.apache.iceberg.BaseTable
- Planning looks faster because construction of
  IcebergContentFileStore is skipped:
    DESCRIBE t: 1s -> 10ms
    EXPLAIN SELECT * FROM t: 3s -> 1.5s
    EXPLAIN SELECT * FROM t WHERE part_col=100: 1s -> 0.3s
- There is probably less need to worry about inconsistent metadata
  exceptions, as the old file list remains loadable even after
  catalogd has updated to a newer version.
Cons:
- Initial table loading time is roughly doubled, as both catalogd
  and the coordinator need to load the files.
- Multiple coordinators will all load the files, increasing
  NameNode pressure.
With the exception of the last one, all cons seem solvable.
Change-Id: I6732af76a2e040fa57e39260302951466037b934
---
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/IcebergMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProviderDecorator.java
M fe/src/main/java/org/apache/impala/catalog/local/MultiMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/local/MetaProviderDecoratorTest.java
14 files changed, 245 insertions(+), 39 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/24049/4
--
To view, visit http://gerrit.cloudera.org:8080/24049
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6732af76a2e040fa57e39260302951466037b934
Gerrit-Change-Number: 24049
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>