Todd Lipcon created KUDU-2288:
---------------------------------

             Summary: Client should fail fast upon access to an unavailable tablet
                 Key: KUDU-2288
                 URL: https://issues.apache.org/jira/browse/KUDU-2288
             Project: Kudu
          Issue Type: Improvement
          Components: supportability
            Reporter: Todd Lipcon


Currently, if a tablet has become unavailable for some reason (e.g., it has lost
a majority of its replicas), the client will still faithfully retry up to its
maximum timeout for a read or write operation. After that timeout, it will
sometimes report a "timed out" error rather than something more indicative of
the root cause.

The retry-on-unavailability behavior is desirable in the case of transient
unavailability (e.g., a node has just failed and a re-election is occurring).
But if the tablet has been unavailable for quite some time (e.g., longer than
the client timeout, or longer than N heartbeat intervals for some N), then we
can assume that it's unlikely to recover within the timeout, and it would be
preferable to fail fast with an appropriate exception.
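
As a rough illustration of the proposed check, here is a minimal sketch in
Python. It is not Kudu client code: TabletHealth, TabletUnavailableError,
should_fail_fast, and all parameter names are hypothetical, and it assumes the
client can somehow learn when the tablet first became unavailable (for example,
from the master), which is not something this issue specifies.

```python
from dataclasses import dataclass
from typing import Optional


class TabletUnavailableError(Exception):
    """Hypothetical error that names the root cause instead of 'timed out'."""


@dataclass
class TabletHealth:
    # Hypothetical: monotonic time (seconds) at which the tablet was first
    # observed to be unavailable; None means it is currently available.
    unavailable_since: Optional[float] = None


def should_fail_fast(health: TabletHealth, now: float, client_timeout_s: float,
                     heartbeat_interval_s: float, n_heartbeats: int) -> bool:
    """Return True if the outage is long enough that retrying is pointless."""
    if health.unavailable_since is None:
        return False  # tablet is available, nothing to decide
    outage_s = now - health.unavailable_since
    # Either condition from the description is enough to give up early.
    return (outage_s > client_timeout_s
            or outage_s > n_heartbeats * heartbeat_interval_s)


def check_before_retry(health: TabletHealth, now: float, client_timeout_s: float,
                       heartbeat_interval_s: float, n_heartbeats: int) -> None:
    """Raise a descriptive error instead of starting another retry."""
    if should_fail_fast(health, now, client_timeout_s,
                        heartbeat_interval_s, n_heartbeats):
        outage_s = now - health.unavailable_since
        raise TabletUnavailableError(
            f"tablet has been unavailable for {outage_s:.1f}s")
```

A brief outage (shorter than both thresholds) still goes through the normal
retry path; only a long-standing outage raises the descriptive error.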



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
