[ 
https://issues.apache.org/jira/browse/IMPALA-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954481#comment-17954481
 ] 

ASF subversion and git services commented on IMPALA-14089:
----------------------------------------------------------

Commit b37f4509fa03359be77bd7966e40cb2ffd1ec3e4 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b37f4509f ]

IMPALA-14089: Support REFRESH on multiple partitions

Currently we only support REFRESH on either the whole table or a single
partition:
  REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]

If users want to refresh multiple partitions, they have to submit
multiple statements, one per partition. This has some drawbacks:
 - It requires acquiring the table write lock inside catalogd multiple
   times, which increases lock contention with other read/write
   operations on the same table, e.g. getPartialCatalogObject requests
   from coordinators.
 - The catalog version of the table is increased multiple times.
   Coordinators in local catalog mode are therefore more likely to see
   different versions between their getPartialCatalogObject requests,
   and so have to retry planning to resolve
   InconsistentMetadataFetchException.
 - Partitions are reloaded sequentially. They should be reloaded in
   parallel, as we do when refreshing the whole table.

This patch extends the syntax to refresh multiple partitions in one
statement:
  REFRESH [db_name.]table_name
  [PARTITION (key_col1=val1 [, key_col2=val2...])
   [PARTITION (key_col1=val3 [, key_col2=val4...])...]]
Example:
  REFRESH foo PARTITION(p=0) PARTITION(p=1) PARTITION(p=2);
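The practical effect of the new syntax is that a client which used to issue one REFRESH per partition can batch them into one statement. The following is a minimal Java sketch, assuming a hypothetical client-side helper (`RefreshStatementBuilder` is not part of Impala) and partition values that are already rendered as SQL literals (string values would need their own quotes):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side helper, not an Impala API: combines several
// partition specs into one REFRESH statement instead of one per partition.
public class RefreshStatementBuilder {
  // Renders one "PARTITION (k1=v1, k2=v2)" clause. Values must already be
  // SQL literals; pass an ordered map (e.g. LinkedHashMap) for multi-key specs.
  static String partitionClause(Map<String, String> spec) {
    return spec.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(", ", "PARTITION (", ")"));
  }

  // Appends one PARTITION clause per spec to a single REFRESH statement.
  static String buildRefresh(String table, List<Map<String, String>> specs) {
    StringBuilder sb = new StringBuilder("REFRESH ").append(table);
    for (Map<String, String> spec : specs) {
      sb.append(' ').append(partitionClause(spec));
    }
    return sb.append(';').toString();
  }

  public static void main(String[] args) {
    List<Map<String, String>> specs =
        List.of(Map.of("p", "0"), Map.of("p", "1"), Map.of("p", "2"));
    // Prints: REFRESH foo PARTITION (p=0) PARTITION (p=1) PARTITION (p=2);
    System.out.println(buildRefresh("foo", specs));
  }
}
```

Building one statement rather than looping over separate REFRESH calls is what lets catalogd take the table lock and bump the table version only once.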

TResetMetadataRequest is extended to carry a list of partition specs.
If the list has only one item, we still use the existing logic for
reloading a single partition. If the list has more than one item, the
partitions are reloaded in parallel. This is implemented in
CatalogServiceCatalog#reloadTable(). Previously it always invoked
HdfsTable#load() with partitionsToUpdate=null; now the parameter is
set when TResetMetadataRequest contains the partition list.
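The dispatch described above (one partition takes the existing single-partition path, several are reloaded in parallel) can be illustrated with a small sketch. This is not Impala's code: `PartitionReloadSketch`, `reloadPartitions`, the `reloadOne` callback and the thread count are hypothetical stand-ins, and the whole-table path (no partition list) is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch only; these names are illustrative and are not
// Impala's catalogd classes or methods.
public class PartitionReloadSketch {
  // Reloads the given partitions and returns one result per partition, in
  // input order. A single partition is reloaded on the calling thread;
  // several partitions are submitted to a fixed-size thread pool.
  static List<String> reloadPartitions(List<String> partitions,
                                       Function<String, String> reloadOne,
                                       int numThreads)
      throws InterruptedException, ExecutionException {
    if (partitions.isEmpty()) {
      // A whole-table refresh would be used instead; not modeled here.
      throw new IllegalArgumentException("no partitions given");
    }
    List<String> results = new ArrayList<>();
    if (partitions.size() == 1) {
      results.add(reloadOne.apply(partitions.get(0)));
      return results;
    }
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String p : partitions) {
        futures.add(pool.submit(() -> reloadOne.apply(p)));
      }
      // Wait for every reload; Future.get() rethrows any failure.
      for (Future<String> f : futures) {
        results.add(f.get());
      }
    } finally {
      pool.shutdown();
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    List<String> loaded = reloadPartitions(
        List.of("p=0", "p=1", "p=2"), p -> "reloaded " + p, 2);
    System.out.println(loaded);
  }
}
```

Collecting the futures in submission order keeps the results aligned with the requested partitions even though the reloads themselves may finish in any order.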

If enable_reload_events is turned on, an HMS notification event of
RELOAD type is fired for each partition. Once HIVE-28967 is resolved,
we can fire a single event for multiple partitions.

Updated docs in impala_refresh.xml.

Tests:
 - Added FE and e2e tests

Change-Id: Ie5b0deeaf23129ed6e1ba2817f54291d7f63d04e
Reviewed-on: http://gerrit.cloudera.org:8080/22938
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Support REFRESH on multiple partitions
> --------------------------------------
>
>                 Key: IMPALA-14089
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14089
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Catalog, Frontend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> This is the first step of IMPALA-4105, helping users combine multiple
> partition-level REFRESH statements into one:
> {code:sql}
> REFRESH <table>
>   PARTITION (...)
>   PARTITION (...)
>   PARTITION (...)
>   ...
> {code}
> This reduces table lock contention on the catalogd side and avoids
> increasing the table version multiple times. It also improves overall
> performance by loading partitions in parallel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
