[ https://issues.apache.org/jira/browse/IMPALA-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012945#comment-18012945 ]
ASF subversion and git services commented on IMPALA-13453:
----------------------------------------------------------
Commit a7efa7665fc113ec13a53a77efcc2d9138d44330 in impala's branch
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a7efa7665 ]
IMPALA-13453: Avoid reloading partition if it is unchanged
In table level REFRESH, we check whether the partition is actually
changed and skip updating unchanged partitions in catalog. However, in
partition REFRESH, we always drop and add the partition. This
unnecessarily drops the partition metadata and column statistics and
then adds them back again. This patch adds a check to verify whether the
partition really changed before reloading it, avoiding the unnecessary
drop-add sequence.
Change-Id: I72d5d20fa2532d49313d5e88f2d66f98b9537b2e
Reviewed-on: http://gerrit.cloudera.org:8080/22962
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Quanlong Huang <[email protected]>
> REFRESH <table> PARTITION <partition> always update the partition
> -----------------------------------------------------------------
>
> Key: IMPALA-13453
> URL: https://issues.apache.org/jira/browse/IMPALA-13453
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
> In table level REFRESH, we check whether the partition is actually changed
> and skip updating unchanged partitions in catalog:
> [https://github.com/apache/impala/blob/42fda24364786cc1a457890bd212bb3922479e95/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1098-L1101]
> {code:java}
> public void updatePartition(HdfsPartition.Builder partBuilder)
>     throws CatalogException {
>   HdfsPartition oldPartition = partBuilder.getOldInstance();
>   ...
>   boolean partitionNotChanged = partBuilder.equalsToOriginal(oldPartition);
>   LOG.trace("Partition {} {}", oldPartition.getName(),
>       partitionNotChanged ? "unchanged" : "changed");
>   if (partitionNotChanged) return;
>   HdfsPartition newPartition = partBuilder.build();
>   // Partition is reloaded and hence cache directives are not dropped.
>   dropPartition(oldPartition, false);
>   addPartition(newPartition);
> }{code}
> However, in partition REFRESH, we always drop and add the partition:
> [https://github.com/apache/impala/blob/42fda24364786cc1a457890bd212bb3922479e95/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L3093-L3096]
> {code:java}
> for (Map.Entry<HdfsPartition.Builder, HdfsPartition> entry :
>     partBuilderToPartitions.entrySet()) {
>   if (entry.getValue() != null) {
>     dropPartition(entry.getValue(), false);
>   }
>   addPartition(entry.getKey().build());
> }{code}
> We should add the same check to avoid updating unchanged partitions.
> CC [~csringhofer], [~hemanth619]
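
For illustration only, the requested check could look roughly like the sketch below. It is a self-contained toy model and not the committed patch: PartitionRefreshSketch, its Partition and Builder classes, and the refresh() method are hypothetical stand-ins for HdfsTable, HdfsPartition and HdfsPartition.Builder, and the equality test compares only a location and a file count, which is an assumption made to keep the example small.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionRefreshSketch {
  /** Hypothetical stand-in for a catalog partition (not Impala's HdfsPartition). */
  static final class Partition {
    final String name;
    final String location;
    final long numFiles;

    Partition(String name, String location, long numFiles) {
      this.name = name;
      this.location = location;
      this.numFiles = numFiles;
    }
  }

  /** Hypothetical builder holding freshly loaded metadata for one partition. */
  static final class Builder {
    final String name;
    final String location;
    final long numFiles;

    Builder(String name, String location, long numFiles) {
      this.name = name;
      this.location = location;
      this.numFiles = numFiles;
    }

    /** True if the reloaded metadata matches the existing partition. */
    boolean equalsToOriginal(Partition old) {
      return old != null
          && name.equals(old.name)
          && location.equals(old.location)
          && numFiles == old.numFiles;
    }

    Partition build() {
      return new Partition(name, location, numFiles);
    }
  }

  final Map<String, Partition> partitions = new LinkedHashMap<>();
  int drops = 0;
  int adds = 0;

  void addPartition(Partition p) {
    partitions.put(p.name, p);
    adds++;
  }

  void dropPartition(Partition p) {
    partitions.remove(p.name);
    drops++;
  }

  /**
   * Partition-level refresh: a null old partition means the partition is new.
   * Unchanged partitions are skipped instead of being dropped and re-added.
   * Returns the number of partitions that were actually (re)loaded.
   */
  int refresh(Map<Builder, Partition> builderToOld) {
    int reloaded = 0;
    for (Map.Entry<Builder, Partition> entry : builderToOld.entrySet()) {
      Builder builder = entry.getKey();
      Partition old = entry.getValue();
      // The added check: keep the existing instance if nothing changed.
      if (old != null && builder.equalsToOriginal(old)) continue;
      if (old != null) dropPartition(old);
      addPartition(builder.build());
      reloaded++;
    }
    return reloaded;
  }

  public static void main(String[] args) {
    PartitionRefreshSketch catalog = new PartitionRefreshSketch();
    Partition p1 = new Partition("p=1", "/t/p=1", 2);
    catalog.addPartition(p1);

    Map<Builder, Partition> work = new LinkedHashMap<>();
    work.put(new Builder("p=1", "/t/p=1", 2), p1);   // same metadata: skipped
    work.put(new Builder("p=2", "/t/p=2", 1), null); // new partition: added
    System.out.println("reloaded=" + catalog.refresh(work));
  }
}
```

In this model the skipped partition keeps its existing object, so anything attached to it (the column statistics mentioned in the commit message, for example) is never torn down and rebuilt.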