Michael Ho created KUDU-2192:
--------------------------------

             Summary: KRPC should have a timer to close stuck connections
                 Key: KUDU-2192
                 URL: https://issues.apache.org/jira/browse/KUDU-2192
             Project: Kudu
          Issue Type: Improvement
          Components: rpc
            Reporter: Michael Ho


If the remote host goes down or its network gets unplugged, all pending RPCs to 
that host will be stuck if no timeout is specified. RPCs that have finished 
sending their payloads, or that haven't started sending them, can be cancelled 
quickly. An RPC in mid-transmission (i.e. one at the front of the outbound 
queue with part of its payload already sent) cannot be cancelled until its 
payload has been completely sent. It would therefore be beneficial to have a 
timeout that closes a connection which has made no progress for an extended 
period of time, so that the RPC fails and gets unstuck. The timeout may need to 
be conservatively large to avoid closing connections too aggressively during 
transient network issues. One option is to augment the existing maintenance 
thread logic, which already checks for idle connections, to also check for this 
kind of timeout. Please feel free to propose other alternatives (e.g. TCP 
keepalive timeout) in this JIRA.
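
To make the proposal concrete, here is a rough sketch of the check a periodic 
maintenance pass could run per connection. All names here (ConnectionProgress, 
OnBytesQueued, OnBytesSent, IsStuck) are hypothetical and are not taken from 
the KRPC code; the sketch only assumes that the writer can report how many 
bytes it queued and how many it managed to send.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical illustration only; these are not actual Kudu/KRPC types.
using Clock = std::chrono::steady_clock;

struct ConnectionProgress {
  std::size_t bytes_pending = 0;    // bytes queued but not yet written out
  Clock::time_point last_progress;  // last time the outbound side advanced

  // Called when a payload is appended to the outbound queue.
  void OnBytesQueued(std::size_t n, Clock::time_point now) {
    // Start the stall clock when the queue goes from empty to non-empty,
    // so an idle connection is not immediately considered stuck.
    if (bytes_pending == 0) last_progress = now;
    bytes_pending += n;
  }

  // Called after a socket write that transferred n bytes.
  void OnBytesSent(std::size_t n, Clock::time_point now) {
    bytes_pending = (n >= bytes_pending) ? 0 : bytes_pending - n;
    if (n > 0) last_progress = now;
  }
};

// A connection counts as stuck only if it has outbound data pending and
// none of it has moved for at least stall_timeout.
bool IsStuck(const ConnectionProgress& c, Clock::time_point now,
             Clock::duration stall_timeout) {
  return c.bytes_pending > 0 && (now - c.last_progress) >= stall_timeout;
}
```

The maintenance pass would call IsStuck() for each connection and close those 
for which it returns true, failing their outstanding RPCs. Connections with 
nothing pending are left to the existing idle-connection check.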



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
