[ https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992287#comment-15992287 ]

Rui Li commented on HIVE-16552:
-------------------------------

Hi [~xuefuz], could you give some examples of when we would want to put a limit on 
the number of tasks? In my opinion, a user's share of YARN resources is determined 
by the number of slots rather than the number of tasks (although more tasks mean 
the slots are held for a longer time), and the number of slots can already be 
controlled with settings like {{spark.executor.memory}} and 
{{spark.executor.instances}}. Besides, what value should we recommend to users for 
this new config, given that the number of reducers is set automatically by Hive?
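
For example, I think something like the following (just a sketch, with illustrative values that depend on the cluster) already bounds how many slots a session can occupy:

{code}
-- Illustrative session-level settings for Hive on Spark; values are examples only
set spark.executor.memory=4g;
set spark.executor.cores=2;
set spark.executor.instances=10;  -- 10 executors * 2 cores = at most 20 tasks running concurrently
{code}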

> Limit the number of tasks a Spark job may contain
> -------------------------------------------------
>
>                 Key: HIVE-16552
>                 URL: https://issues.apache.org/jira/browse/HIVE-16552
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-16552.1.patch, HIVE-16552.patch
>
>
> It's commonly desirable to block bad, oversized queries that take a lot of YARN 
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
> stop a query that invokes a Spark job containing too many tasks. The 
> proposal here is to introduce hive.spark.job.max.tasks with a default value 
> of -1 (no limit), which an admin can set to block queries that trigger too 
> many Spark tasks.
> Please note that this control knob applies to a single Spark job, though it's 
> possible that one query triggers multiple Spark jobs (such as in the case of 
> map join). Nevertheless, the proposed approach is still helpful.
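
For reference, if the knob lands as described in the summary above, usage would presumably look something like this (illustrative only, not taken from the patch):

{code}
-- Illustrative: proposed admin-side limit from HIVE-16552; default -1 means no limit
set hive.spark.job.max.tasks=5000;  -- fail queries whose Spark job would exceed 5000 tasks
{code}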



