Todd Lipcon created KUDU-2288:
---------------------------------
Summary: Client should fail fast upon access to an unavailable tablet
Key: KUDU-2288
URL: https://issues.apache.org/jira/browse/KUDU-2288
Project: Kudu
Issue Type: Improvement
Components: supportability
Reporter: Todd Lipcon
Currently, if a tablet has become unavailable for some reason (e.g., it has lost
a majority of its replicas), the client will still faithfully retry up to its
maximum timeout for a read or write operation. After that timeout, it will
sometimes report a "timed out" error rather than something more indicative of
the root cause.
The retry-on-unavailability behavior is desirable in the case of transient
unavailability (e.g., a node has just failed and a re-election is occurring). But
if the tablet has been unavailable for quite some time (e.g., longer than the
client timeout, or longer than N heartbeat intervals for some N), then we can
assume that it's unlikely to recover within the timeout, and it would be
preferable to fail fast with an appropriate exception.
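
As a rough illustration of the check proposed above, here is a minimal Python
sketch. It is not Kudu client code: TabletHealth, TabletUnavailableError,
check_tablet, and the default heartbeat values are all hypothetical names and
numbers chosen for the example. It also assumes the client can learn how long
a tablet has been unavailable, and it takes the smaller of the two candidate
thresholds mentioned above, which is only one possible reading.

```python
from dataclasses import dataclass
from typing import Optional


class TabletUnavailableError(Exception):
    """Hypothetical error that names the root cause instead of 'timed out'."""


@dataclass
class TabletHealth:
    # Hypothetical record; assumes the client can learn when the tablet
    # became unavailable (None means it is currently available).
    tablet_id: str
    unavailable_since: Optional[float]


def check_tablet(health, now, client_timeout_s,
                 heartbeat_interval_s=1.0, n_heartbeats=10):
    """Raise if the tablet has been down too long to be worth retrying.

    Returns normally when the caller should go ahead (tablet is healthy,
    or the outage is recent enough to be treated as transient).
    """
    if health.unavailable_since is None:
        return  # healthy: nothing to decide

    down_for_s = now - health.unavailable_since
    # Assumption: use the smaller of "client timeout" and
    # "N heartbeat intervals" as the fail-fast threshold.
    threshold_s = min(client_timeout_s, n_heartbeats * heartbeat_interval_s)

    if down_for_s > threshold_s:
        raise TabletUnavailableError(
            "tablet %s has been unavailable for %.1fs (threshold %.1fs)"
            % (health.tablet_id, down_for_s, threshold_s))
    # Otherwise the outage is treated as transient and the caller retries.
```

With these defaults, a tablet that went down 2 seconds ago is still retried,
while one that has been down for 60 seconds fails immediately with an error
that identifies the tablet rather than a generic timeout.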