Mohit Sabharwal created HIVE-13884:
--------------------------------------

             Summary: Disallow queries fetching more than a configured number 
of partitions in PartitionPruner
                 Key: HIVE-13884
                 URL: https://issues.apache.org/jira/browse/HIVE-13884
             Project: Hive
          Issue Type: Improvement
            Reporter: Mohit Sabharwal
            Assignee: Mohit Sabharwal


Currently the PartitionPruner requests either all partitions or the partitions 
matching a filter expression. In either scenario, if the number of partitions 
accessed is large, there can be significant memory pressure on the HMS server.

We already have a config {{hive.limit.query.max.table.partition}} that enforces 
a limit on the number of partitions that may be scanned per operator. But this 
check happens after the PartitionPruner has already fetched all partitions.

We should add an option at the PartitionPruner level to disallow queries that 
attempt to access more partitions than a configurable limit.

Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
filter in PartitionPruner, but this check accepts any query with a pruning 
condition, even if the number of partitions fetched is large. In multi-tenant 
environments, admins could use more control over the number of partitions 
allowed, based on HMS memory capacity.

One option is to have PartitionPruner first fetch the partition names (instead 
of partition specs) and throw an exception if the number of partitions exceeds 
the configured value. Otherwise, fetch the partition specs.
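
A minimal sketch of that names-first check, in Java. Everything here is 
hypothetical and for illustration only: {{PartitionSource}}, 
{{InMemorySource}}, {{PartitionLimitExceededException}} and {{fetchWithLimit}} 
are not Hive classes or methods, the filter is simplified to a name prefix, 
partition specs are modelled as plain strings, and treating a negative limit 
as "no limit" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionLimitSketch {

    /** Hypothetical stand-in for the metastore client; NOT Hive's real API. */
    interface PartitionSource {
        // Cheap call: returns only the names of partitions matching the filter.
        List<String> listPartitionNames(String namePrefix);
        // Expensive call: returns full partition specs (modelled as strings here).
        List<String> fetchPartitionSpecs(List<String> names);
    }

    /** Hypothetical exception raised when a query touches too many partitions. */
    static class PartitionLimitExceededException extends RuntimeException {
        PartitionLimitExceededException(int requested, int limit) {
            super("Query would fetch " + requested
                    + " partitions, exceeding the limit of " + limit);
        }
    }

    /** Toy in-memory source so the sketch can run without a metastore. */
    static class InMemorySource implements PartitionSource {
        private final List<String> allNames;

        InMemorySource(List<String> allNames) {
            this.allNames = allNames;
        }

        @Override
        public List<String> listPartitionNames(String namePrefix) {
            List<String> matches = new ArrayList<>();
            for (String name : allNames) {
                if (name.startsWith(namePrefix)) {
                    matches.add(name);
                }
            }
            return matches;
        }

        @Override
        public List<String> fetchPartitionSpecs(List<String> names) {
            List<String> specs = new ArrayList<>();
            for (String name : names) {
                specs.add("spec(" + name + ")");
            }
            return specs;
        }
    }

    /**
     * Fetch names first, reject the request if there are too many,
     * and only then fetch the full specs.
     * Assumption: a negative maxPartitions disables the check.
     */
    static List<String> fetchWithLimit(PartitionSource source,
                                       String namePrefix,
                                       int maxPartitions) {
        List<String> names = source.listPartitionNames(namePrefix);
        if (maxPartitions >= 0 && names.size() > maxPartitions) {
            throw new PartitionLimitExceededException(names.size(), maxPartitions);
        }
        return source.fetchPartitionSpecs(names);
    }

    public static void main(String[] args) {
        PartitionSource source = new InMemorySource(
                Arrays.asList("ds=2016-05-01", "ds=2016-05-02", "ds=2016-06-01"));

        // Two partitions match and the limit is two, so specs are fetched.
        System.out.println(fetchWithLimit(source, "ds=2016-05", 2));

        // Three partitions match but the limit is two, so the request is rejected
        // before any specs are fetched.
        try {
            fetchWithLimit(source, "ds=", 2);
        } catch (PartitionLimitExceededException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point of the ordering is that the expensive spec fetch never runs for a 
rejected query; only the list of names is materialized before the check.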

It looks like the existing {{listPartitionNames}} call could be used if extended 
to take partition filter expressions, as the {{getPartitionsByExpr}} call does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
