[
https://issues.apache.org/jira/browse/IMPALA-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952100#comment-17952100
]
Quanlong Huang commented on IMPALA-4106:
----------------------------------------
I recently saw a drop of 200 partitions take nearly 2 minutes. Profile:
[^slow_drop_part_profile.txt]
I think we should revive this.
If the concern is inconsistency between Impala and Hive exprs, I think we can
use the partition names directly. The partition list can be represented as
either a list of partition name strings or a list of DropPartitionsExpr, i.e.
RequestPartsSpec is a union:
{noformat}
struct DropPartitionsExpr {
  1: required binary expr;
  2: optional i32 partArchiveLevel;
}

union RequestPartsSpec {
  1: list<string> names;
  2: list<DropPartitionsExpr> exprs;
}

// Request type for drop_partitions_req
// TODO: we might want to add "bestEffort" flag; where a subset can fail
struct DropPartitionsRequest {
  1: required string dbName,
  2: required string tblName,
  3: required RequestPartsSpec parts,
  4: optional bool deleteData,
  5: optional bool ifExists=true, // currently verified on client
  6: optional bool ignoreProtection,
  7: optional EnvironmentContext environmentContext,
  8: optional bool needResult=true,
  9: optional string catName,
  10: optional bool skipColumnSchemaForPartition
}
{noformat}
https://github.com/apache/hive/blob/fa17cd60e3cd5368573cacdbfb6a053cc80ce6ad/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L959
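To make the names-based variant concrete, here is a rough sketch of how a request could be populated with partition name strings instead of serialized exprs. It is only an illustration under stated assumptions: the dataclasses are plain-Python stand-ins that mirror a subset of the Thrift fields above (they are not the generated Hive classes), the helpers escape(), make_part_name() and build_drop_request() are hypothetical names, and the escaping is a simplified approximation of Hive's partition-path escaping, which covers more characters.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Simplified stand-in for Hive's partition-path escaping (assumption:
# the real rules escape a larger character set than this).
_ESCAPED_CHARS = set("/=%:#?\\")


def escape(text: str) -> str:
    """Percent-encode characters that would break a key=value/key=value name."""
    return "".join(f"%{ord(c):02X}" if c in _ESCAPED_CHARS else c for c in text)


def make_part_name(keys: Sequence[str], values: Sequence[str]) -> str:
    """Build a Hive-style partition name such as 'year=2024/month=1'."""
    if len(keys) != len(values):
        raise ValueError("partition keys and values must have the same length")
    return "/".join(f"{escape(k)}={escape(v)}" for k, v in zip(keys, values))


@dataclass
class RequestPartsSpec:
    """Stand-in for the Thrift union: exactly one member may be set."""
    names: Optional[List[str]] = None
    exprs: Optional[List[bytes]] = None

    def __post_init__(self) -> None:
        if (self.names is None) == (self.exprs is None):
            raise ValueError("exactly one of names or exprs must be set")


@dataclass
class DropPartitionsRequest:
    """Stand-in mirroring a subset of the Thrift struct's fields."""
    dbName: str
    tblName: str
    parts: RequestPartsSpec
    deleteData: Optional[bool] = None
    ifExists: bool = True
    needResult: bool = True


def build_drop_request(db: str, tbl: str, keys: Sequence[str],
                       partitions: Sequence[Sequence[str]],
                       delete_data: bool = True) -> DropPartitionsRequest:
    """Collect all partitions to drop into one names-based request."""
    names = [make_part_name(keys, values) for values in partitions]
    return DropPartitionsRequest(db, tbl, RequestPartsSpec(names=names),
                                 deleteData=delete_data)


if __name__ == "__main__":
    req = build_drop_request("db1", "tbl1", ["year", "month"],
                             [("2024", "1"), ("2024", "2")])
    print(req.parts.names)
```

The point of the sketch is that all partitions travel in a single request keyed by name, so no expr has to be interpreted identically by Impala and Hive.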
> Use Hive Metastore bulk API for dropping multiple partitions.
> -------------------------------------------------------------
>
> Key: IMPALA-4106
> URL: https://issues.apache.org/jira/browse/IMPALA-4106
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.8.0
> Reporter: Alexander Behm
> Priority: Major
> Labels: performance, ramp-up
> Attachments: slow_drop_part_profile.txt
>
>
> IMPALA-1654 added the ability to drop several partitions at once by selecting
> the set of partitions with predicates. In order to make the feature complete
> and usable at scale we should use the Hive Metastore bulk API in the
> CatalogServer to implement the dropping of partitions.
> We should not include IMPALA-1654 in a release unless this improvement is
> also addressed.