[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain

Xuefu Zhang (JIRA) Mon, 01 May 2017 22:00:31 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992334#comment-15992334
 ]


Xuefu Zhang commented on HIVE-16552:
------------------------------------

[~lirui], you have a point, but resource usage isn't necessarily limited to 
the number of concurrent tasks. For instance, a query scanning lots of 
partitions can create a spike in NN calls at compile time. On the other hand, 
a large number of tasks usually means more total resource consumption, which 
also matters when a resource queue is shared within a team.

I can certainly understand that this is rather a poor man's choice when it's 
desirable to block a large or bad query. The same debate could be had about 
the similar configurations for MR. I'm open to better ideas if there are any. 
Plus, for those who don't care about or need this, the default value would 
just work as if the configuration didn't exist.

> Limit the number of tasks a Spark job may contain
> -------------------------------------------------
>
>                 Key: HIVE-16552
>                 URL: https://issues.apache.org/jira/browse/HIVE-16552
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-16552.1.patch, HIVE-16552.patch
>
>
> It's commonly desirable to block bad and big queries that take a lot of YARN 
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
> stop a query that invokes a Spark job that contains too many tasks. The 
> proposal here is to introduce hive.spark.job.max.tasks with a default value 
> of -1 (no limit), which an admin can set to block queries that trigger too 
> many Spark tasks.
> Please note that this control knob applies to a single Spark job, though it's 
> possible that one query can trigger multiple Spark jobs (such as in the case 
> of map-join). Nevertheless, the proposed approach is still helpful.
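
For illustration only, the per-job semantics described above could be sketched 
as follows. This is not the code in the attached patches; the function names 
are hypothetical, and treating "too many" as strictly greater than the limit 
is an assumption.

```python
# Hypothetical sketch of the proposed check; not Hive's actual implementation.
NO_LIMIT = -1  # default proposed for hive.spark.job.max.tasks: no limit


def exceeds_task_limit(total_tasks, max_tasks=NO_LIMIT):
    """Return True if a single Spark job has too many tasks.

    Assumption: the job is rejected only when its task count is strictly
    greater than the configured limit.
    """
    if max_tasks == NO_LIMIT:
        return False  # behaves as if the configuration didn't exist
    return total_tasks > max_tasks


def check_query(job_task_counts, max_tasks=NO_LIMIT):
    """Apply the limit to each Spark job of a query independently."""
    return [exceeds_task_limit(n, max_tasks) for n in job_task_counts]
```

Under these assumptions, a query that triggers two Spark jobs of 3000 and 4000 
tasks passes a limit of 5000, even though it runs 7000 tasks in total, because 
the limit is applied per job and not per query.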



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
