Mohit Sabharwal created HIVE-13884:
--------------------------------------
Summary: Disallow queries fetching more than a configured number
of partitions in PartitionPruner
Key: HIVE-13884
URL: https://issues.apache.org/jira/browse/HIVE-13884
Project: Hive
Issue Type: Improvement
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
Currently the PartitionPruner requests either all partitions or the partitions
matching a filter expression. In either scenario, if the number of partitions
accessed is large, there can be significant memory pressure on the HMS server.
We already have a config {{hive.limit.query.max.table.partition}} that enforces
a limit on the number of partitions that may be scanned per operator. But this
check happens after the PartitionPruner has already fetched all partitions.
We should add an option at the PartitionPruner level to disallow queries that
attempt to access more partitions than a configurable limit.
Note that {{hive.mapred.mode=strict}} disallows queries without a partition
filter in PartitionPruner, but this check accepts any query with a pruning
condition, even if the number of partitions fetched is large. In multi-tenant
environments, admins could use more control over the number of partitions
allowed, based on HMS memory capacity.
One option is to have PartitionPruner first fetch the partition names (instead
of partition specs) and throw an exception if the number of partitions exceeds
the configured value. Otherwise, fetch the partition specs.
It looks like the existing {{listPartitionNames}} call could be used if it were
extended to take partition filter expressions like the {{getPartitionsByExpr}}
call does.
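The two-phase fetch proposed above could look roughly like the sketch below. It
is only an illustration: {{MetastoreStub}}, its two methods,
{{PartitionLimitExceededException}} and the "negative limit disables the check"
convention are all hypothetical names and assumptions made for this example,
not existing Hive or HMS APIs.

```java
import java.util.List;

public class PartitionLimitSketch {

    /** Hypothetical stand-in for the metastore client; NOT the real Hive API. */
    interface MetastoreStub {
        // Cheap call: returns only the names of partitions matching the filter.
        List<String> listPartitionNamesByFilter(String table, String filter);

        // Expensive call: returns the full partition specs (modelled as strings here).
        List<String> getPartitionSpecsByFilter(String table, String filter);
    }

    /** Hypothetical exception raised when a query would touch too many partitions. */
    static class PartitionLimitExceededException extends RuntimeException {
        PartitionLimitExceededException(String message) {
            super(message);
        }
    }

    /**
     * Fetches partition names first and rejects the query before any specs are
     * loaded. A negative maxPartitions disables the check (an assumption of
     * this sketch).
     */
    static List<String> prune(MetastoreStub hms, String table, String filter,
                              int maxPartitions) {
        List<String> names = hms.listPartitionNamesByFilter(table, filter);
        if (maxPartitions >= 0 && names.size() > maxPartitions) {
            throw new PartitionLimitExceededException(
                "Query on " + table + " matches " + names.size()
                    + " partitions, limit is " + maxPartitions);
        }
        // Only now pay the cost of materializing the full partition specs.
        return hms.getPartitionSpecsByFilter(table, filter);
    }
}
```

The point of the ordering is that the limit check runs against the name list
only, so an over-limit query fails before the full partition objects are
loaded on the HMS side.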