[ https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871453#comment-15871453 ]
Kiran Kumar Kolli commented on HIVE-15947:
------------------------------------------

BusyException.java: The default constructor should call the new constructor with a message.

AppConfig.java: Init() now changes the order of conf loading. This might have an impact on current scenarios. Is this change a must?

Code repetition: The same pattern is used between Submit, List and Status; let's re-use the code.

Semaphore release in finally block: A finally block is not guaranteed to run when the thread is killed or interrupted (http://docs.oracle.com/javase/tutorial/essential/exceptions/finally.html), and this might lead to starvation. Why not do a catch-all and then release (careful with the memory-pressure scenario)? A sketch of this pattern follows the quoted description below.

Troubleshooting: I believe Log4j supports tracing the calling thread; otherwise explicit tracing will help in troubleshooting.

Will review the unit tests later.

> Enhance Templeton service job operations reliability
> ----------------------------------------------------
>
>          Key: HIVE-15947
>          URL: https://issues.apache.org/jira/browse/HIVE-15947
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Subramanyam Pattipaka
>     Assignee: Subramanyam Pattipaka
>  Attachments: HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation requests; it simply accepts and tries to run all operations. If a large number of concurrent job submit requests come in, the time to submit job operations can increase significantly. Templeton uses HDFS to store the staging file for a job. If HDFS cannot respond to the large number of requests and throttles them, job submission can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications. Client applications may be looking for a predictable, low response time for successful requests, or for a throttle response telling the client to wait for some time before re-requesting the job operation.
> In this JIRA, I am trying to address the following job operations:
> 1) Submit new job
> 2) Get job status
> 3) List jobs
> These three operations have different complexity due to variance in their use of cluster resources like YARN and HDFS.
> The idea is to introduce a new config, templeton.job.submit.exec.max-procs, which controls the maximum number of concurrent active job submissions within Templeton, and to use this config to provide better response times. If a new job submission request sees that there are already templeton.job.submit.exec.max-procs jobs being submitted concurrently, the request will fail with HTTP error 503 and the reason
> "Too many concurrent job submission requests received. Please wait for some time before retrying."
> The client is expected to catch this response and retry after waiting for some time. The default value for templeton.job.submit.exec.max-procs is "0", which means that by default job submission requests are always accepted; the behavior needs to be enabled based on requirements.
> We can have similar behavior for the Status and List operations with the configs templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs respectively.
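For illustration, here is a minimal sketch of the semaphore-based throttling described above. It is not the actual HIVE-15947.patch code: the class and member names (ConcurrentOpLimiter, TooBusyException, execute) are hypothetical stand-ins, and only BusyException and the *.exec.max-procs configs are named in the issue itself.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/**
 * Sketch only: limit concurrent job operations with a semaphore sized from a
 * *.exec.max-procs style config, rejecting excess requests immediately so the
 * servlet layer can answer with HTTP 503.
 */
public class ConcurrentOpLimiter {

  /** Stand-in for the patch's BusyException; the real class may differ. */
  public static class TooBusyException extends Exception {
    public TooBusyException() {
      // Review point above: the default constructor delegates to the message constructor.
      this("Too many concurrent requests received. Please wait for some time before retrying.");
    }
    public TooBusyException(String message) {
      super(message);
    }
  }

  private final boolean unlimited;
  private final Semaphore permits;

  /** maxProcs <= 0 disables throttling, matching the documented default of "0". */
  public ConcurrentOpLimiter(int maxProcs) {
    this.unlimited = maxProcs <= 0;
    this.permits = unlimited ? null : new Semaphore(maxProcs);
  }

  public <T> T execute(Callable<T> op) throws Exception {
    if (unlimited) {
      return op.call();
    }
    // Non-blocking acquire: reject immediately instead of queueing, so the
    // caller gets the "busy" answer right away.
    if (!permits.tryAcquire()) {
      throw new TooBusyException();
    }
    try {
      return op.call();
    } finally {
      // Release in finally; per the review comment above, finally is skipped if the
      // thread is killed, so a production version may also want a catch-all release
      // path or a timed acquire to avoid permit starvation.
      permits.release();
    }
  }
}
{code}

A caller wrapping job submission would catch the busy exception and map it to an HTTP 503 response, which is the contract described in the issue; sizing one limiter each from templeton.job.submit.exec.max-procs, templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs would give the per-operation limits.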