[ 
https://issues.apache.org/jira/browse/IMPALA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952100#comment-17952100
 ] 

Quanlong Huang commented on IMPALA-4106:
----------------------------------------

I recently saw dropping 200 partitions take nearly 2 minutes. Profile:  
[^slow_drop_part_profile.txt] 
I think we should revive this.

If the concern is inconsistent exprs between Impala and Hive, I think we can 
use the partition names directly. The partition list can be represented as 
either a list of partition name strings or a list of DropPartitionsExpr, i.e. 
RequestPartsSpec is a union:
{noformat}
struct DropPartitionsExpr {
  1: required binary expr;
  2: optional i32 partArchiveLevel;
}

union RequestPartsSpec {
  1: list<string> names;
  2: list<DropPartitionsExpr> exprs;
}

// Request type for drop_partitions_req
// TODO: we might want to add "bestEffort" flag; where a subset can fail 
struct DropPartitionsRequest {
  1: required string dbName,
  2: required string tblName,
  3: required RequestPartsSpec parts,
  4: optional bool deleteData,
  5: optional bool ifExists=true, // currently verified on client
  6: optional bool ignoreProtection,
  7: optional EnvironmentContext environmentContext,
  8: optional bool needResult=true,
  9: optional string catName,
  10: optional bool skipColumnSchemaForPartition
}{noformat}
https://github.com/apache/hive/blob/fa17cd60e3cd5368573cacdbfb6a053cc80ce6ad/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L959
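
To illustrate the name-based option, here is a minimal sketch of how a list of partition name strings could be assembled for the "names" branch of RequestPartsSpec. It is not Impala or Hive code: the class and method names (PartitionNameSketch, toPartitionName, toPartitionNames) are hypothetical, and it assumes the usual "key=value/key=value" partition name layout without any escaping of special characters. Real code would presumably rely on Hive's own name-building and escaping utilities and pass the resulting list to the metastore client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Impala or Hive.
class PartitionNameSketch {

  // Renders one partition as "k1=v1/k2=v2", in the map's iteration order.
  // Assumption: values contain no characters that need escaping.
  static String toPartitionName(Map<String, String> keyValues) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : keyValues.entrySet()) {
      if (sb.length() > 0) sb.append('/');
      sb.append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  // Builds the list<string> that would populate RequestPartsSpec.names.
  static List<String> toPartitionNames(List<Map<String, String>> partitions) {
    List<String> names = new ArrayList<>(partitions.size());
    for (Map<String, String> p : partitions) {
      names.add(toPartitionName(p));
    }
    return names;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the partition keys in declaration order.
    Map<String, String> p1 = new LinkedHashMap<>();
    p1.put("year", "2024");
    p1.put("month", "1");
    Map<String, String> p2 = new LinkedHashMap<>();
    p2.put("year", "2024");
    p2.put("month", "2");

    List<Map<String, String>> parts = new ArrayList<>();
    parts.add(p1);
    parts.add(p2);

    // One request carrying all names replaces one metastore call per partition.
    System.out.println(toPartitionNames(parts));
  }
}
```

Since the names are plain strings, no expression has to be serialized by Impala and evaluated by Hive, which is the point of preferring this branch of the union.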

> Use Hive Metastore bulk API for dropping multiple partitions.
> -------------------------------------------------------------
>
>                 Key: IMPALA-4106
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4106
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>            Priority: Major
>              Labels: performance, ramp-up
>         Attachments: slow_drop_part_profile.txt
>
>
> IMPALA-1654 added the ability to drop several partitions at once by selecting 
> the set of partitions with predicates. In order to make the feature complete 
> and usable at scale we should use the Hive Metastore bulk API in the 
> CatalogServer to implement the dropping of partitions.
> We should not include IMPALA-1654 in a release unless this improvement is 
> also addressed.


