[ 
https://issues.apache.org/jira/browse/FLINK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612224#comment-16612224
 ] 

ASF GitHub Bot commented on FLINK-10319:
----------------------------------------

Clarkkkkk commented on issue #6680: [FLINK-10319] [runtime] Too many 
requestPartitionState would crash JM
URL: https://github.com/apache/flink/pull/6680#issuecomment-420670340
 
 
   @TisonKun Currently, the task asks the JM to check the producer state. If 
that call fails with a TimeoutException, the task retries and assumes the 
producer is still running.
   I am not sure when triggerPartitionProducerStateCheck gets called. Is it 
possible that the producer is still running at that point? If so, we might 
restart the Execution unnecessarily and go through the whole task cancellation 
logic (which might restart the whole job in streaming mode). 
   By using a single-threaded thread pool, we avoid putting too much pressure 
on the JM and also avoid unnecessary task cancellation.
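
   A minimal sketch of the single-threaded idea, assuming the state check can 
be wrapped in a Supplier. The class SerializedStateChecker and its methods are 
hypothetical names for illustration, not Flink APIs, and this is not the code 
from the pull request. It only shows how a single-threaded executor keeps at 
most one check in flight at a time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Flink): runs state checks one at a time. */
class SerializedStateChecker implements AutoCloseable {

    // A single worker thread means queued checks run strictly one after another.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Bookkeeping used only to observe how many checks overlap.
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    /** Queues a check; rpcCall stands in for the real request to the JM. */
    <T> CompletableFuture<T> submit(Supplier<T> rpcCall) {
        return CompletableFuture.supplyAsync(() -> {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            try {
                return rpcCall.get();
            } finally {
                inFlight.decrementAndGet();
            }
        }, executor);
    }

    /** Highest number of checks that were ever running at the same time. */
    int maxConcurrentCalls() {
        return maxInFlight.get();
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```

   The trade-off is latency: checks queue up behind each other, so a slow 
response from the JM delays every later check from the same task manager.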

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Too many requestPartitionState would crash JM
> ---------------------------------------------
>
>                 Key: FLINK-10319
>                 URL: https://issues.apache.org/jira/browse/FLINK-10319
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.7.0
>            Reporter: 陈梓立
>            Assignee: 陈梓立
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> Do not call requestPartitionState on the JM when a partition request fails; 
> doing so may generate too many RPC requests and block the JM.
> We gain little from checking what state the producer is in, and on the other 
> hand the flood of RPC requests can crash the JM. A task can always 
> retriggerPartitionRequest from its InputGate: the request fails if the 
> producer is gone and succeeds if the producer is alive. Either way, there is 
> no need to ask the JM for help.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
