zhougit86 commented on PR #21080:
URL: https://github.com/apache/flink/pull/21080#issuecomment-1281021051

   
   > Could you elaborate more on the motivation behind this? I'm not sure how
   > useful the idle information provided by this PR is. On the one hand, if there
   > is some data waiting to be sent, and it is not being sent, that's clearly
   > visible via a number of metrics (backpressured status, number of bytes sent,
   > queue lengths etc). So this is a bit redundant. On the other hand, there can
   > be many different reasons behind this timeout being triggered, like for example:
   > 
   > * idling operator not producing any data
   > * operator aggregating for a longer period of time (window)
   > * filtering out all of the records
   > * operator busy doing some very heavy work for a long period
   > * sorted shuffle service
   > * some unhealthy JVM/TM state (long GC pauses, memory swapping, long blocking IO)
   > 
   > All of the above would produce a false warning that would be misleading.
   
   Hi, I have modified the commit slightly. The client side now sends a
heartbeat frame if it detects that no packet has been sent within a certain
period.
   
   This way the server can detect network-related issues even when the
business data flow has stopped for a while.
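
To make the idea concrete, here is a minimal sketch of the idle-heartbeat logic described above. It is not the code in this PR: the class names `IdleHeartbeatTracker` and `ConnectionLivenessMonitor`, the method names, and the thresholds are all hypothetical and chosen only for illustration. Timestamps are passed in explicitly so the logic can be followed without a real network stack.

```java
/**
 * Client side (hypothetical helper): remembers when the last frame was sent
 * and reports when the connection has been idle long enough for a heartbeat.
 */
final class IdleHeartbeatTracker {
    private final long idleThresholdMillis;
    private long lastSendMillis;

    IdleHeartbeatTracker(long idleThresholdMillis, long nowMillis) {
        this.idleThresholdMillis = idleThresholdMillis;
        this.lastSendMillis = nowMillis;
    }

    /** Call for every outgoing frame, data or heartbeat. */
    void onFrameSent(long nowMillis) {
        lastSendMillis = nowMillis;
    }

    /** True if nothing has been sent for at least the idle threshold. */
    boolean shouldSendHeartbeat(long nowMillis) {
        return nowMillis - lastSendMillis >= idleThresholdMillis;
    }
}

/**
 * Server side (hypothetical helper): because an idle client still sends
 * heartbeats, a long silence points at the connection rather than at an
 * operator that simply has no data to emit.
 */
final class ConnectionLivenessMonitor {
    private final long timeoutMillis;
    private long lastReceiveMillis;

    ConnectionLivenessMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastReceiveMillis = nowMillis;
    }

    /** Call for every incoming frame, data or heartbeat. */
    void onFrameReceived(long nowMillis) {
        lastReceiveMillis = nowMillis;
    }

    /** True if no frame at all has arrived within the timeout. */
    boolean isConnectionSuspect(long nowMillis) {
        return nowMillis - lastReceiveMillis > timeoutMillis;
    }
}
```

In this sketch the server timeout would be set to a multiple of the client's idle threshold, so that a single delayed heartbeat does not immediately flag the connection.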
   
   The purpose of this commit is to detect the stalled-consumption issue that
has haunted us for a long time. With this commit, the consumption can stop
immediately instead. It would be very kind of you if you could take a few
minutes to review it and give suggestions, so that we can use it in our own
environment quickly.
   
   Thanks a lot for your time, it is much appreciated!

