Dan Burkert created KUDU-1608:
---------------------------------
Summary: Catalog Manager DeleteTablet retry logic is broken
Key: KUDU-1608
URL: https://issues.apache.org/jira/browse/KUDU-1608
Project: Kudu
Issue Type: Bug
Components: master
Reporter: Dan Burkert
There are several issues with the Catalog Manager's retry logic for
DeleteTablet requests:
1. Retries loop indefinitely: there is no bound on how many times a request is
retried.
2. The RPC response is checked against a list of known fatal errors, and every
other error is retried, instead of checking it against a list of retriable
errors. That fatal-error list is also missing many fatal errors, such as
WRONG_SERVER_UUID and UNKNOWN_ERROR. I think we should invert the check and
retry only on errors we know we can recover from.
3. The catalog manager eagerly sends DeleteTablet requests to tablet servers as
soon as tablet replicas are ejected from the Raft group. Arguably this should
be done lazily, only when a dead replica reports in again: most of the time a
replica is ejected because of a failure and will never be seen again, so most
eager requests go to servers that will never respond.
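A minimal Python sketch of the fix proposed in items 1 and 2, assuming a simplified model in which each DeleteTablet attempt yields a single error code. The enum, the function `send_delete_tablet_with_retries`, its `max_attempts` parameter, and the codes NETWORK_ERROR and SERVER_TOO_BUSY are hypothetical. Only WRONG_SERVER_UUID and UNKNOWN_ERROR come from this issue, and which codes are actually recoverable would have to be decided against the tablet server's real error codes. The sketch shows the shape of the change: retries are bounded, and only errors on an explicit retriable list are retried, so any unlisted error stops the loop.

```python
import enum
from typing import Callable, Tuple


class DeleteTabletError(enum.Enum):
    # Hypothetical codes for illustration. WRONG_SERVER_UUID and UNKNOWN_ERROR
    # are named in the issue; NETWORK_ERROR and SERVER_TOO_BUSY, and the
    # choice of which codes are retriable, are assumptions.
    OK = 0
    NETWORK_ERROR = 1
    SERVER_TOO_BUSY = 2
    WRONG_SERVER_UUID = 3
    UNKNOWN_ERROR = 4


# Allowlist of errors we know how to recover from. Any code not listed,
# including codes added later, is treated as fatal and is not retried.
RETRIABLE_ERRORS = frozenset({
    DeleteTabletError.NETWORK_ERROR,
    DeleteTabletError.SERVER_TOO_BUSY,
})


def send_delete_tablet_with_retries(
        send: Callable[[], DeleteTabletError],
        max_attempts: int) -> Tuple[DeleteTabletError, int]:
    """Calls `send` until it succeeds, fails with a non-retriable error, or
    `max_attempts` calls have been made. Returns (last error, attempts)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts = 0
    while True:
        attempts += 1
        err = send()
        # Stop on success or on any error outside the retriable allowlist.
        if err is DeleteTabletError.OK or err not in RETRIABLE_ERRORS:
            return err, attempts
        # Stop once the attempt budget is exhausted (fixes the infinite loop).
        if attempts >= max_attempts:
            return err, attempts
        # A real implementation would back off before retrying; omitted here.


# Usage: a transient error is retried, then a fatal error ends the loop.
responses = iter([DeleteTabletError.NETWORK_ERROR,
                  DeleteTabletError.WRONG_SERVER_UUID])
err, attempts = send_delete_tablet_with_retries(lambda: next(responses), 5)
# err is DeleteTabletError.WRONG_SERVER_UUID and attempts == 2
```

Using an allowlist means a newly introduced or overlooked error code fails fast instead of being retried forever.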
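A minimal sketch of the lazy deletion suggested in item 3, under a deliberately simplified, hypothetical model: the master tracks only the set of tablet server UUIDs in each tablet's current Raft config. `CatalogSketch`, `evict_replica`, and `process_tablet_report` are illustrative names, not Kudu's API. Ejecting a replica only updates the config. A DeleteTablet is requested only if the ejected replica later appears in its tablet server's report. Tablets the master does not know about are left alone here, since deleting those is a separate policy question.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogSketch:
    """Hypothetical, simplified master state: for each tablet ID, the set of
    tablet server UUIDs in that tablet's current Raft config."""
    configs: Dict[str, Set[str]] = field(default_factory=dict)

    def evict_replica(self, tablet_id: str, ts_uuid: str) -> None:
        # The eager approach described in the issue would send DeleteTablet
        # here. The lazy approach only updates the config; no RPC is sent.
        self.configs[tablet_id].discard(ts_uuid)

    def process_tablet_report(self, ts_uuid: str,
                              reported_tablets: List[str]) -> List[str]:
        """Returns the tablet IDs this tablet server should be told to delete:
        known tablets it still hosts but whose config no longer includes it."""
        to_delete = []
        for tablet_id in reported_tablets:
            members = self.configs.get(tablet_id)
            # Unknown tablets (members is None) are skipped in this sketch.
            if members is not None and ts_uuid not in members:
                to_delete.append(tablet_id)
        return to_delete


# Usage: evicting ts-c sends nothing; the delete happens only if ts-c reports.
catalog = CatalogSketch({"t1": {"ts-a", "ts-b", "ts-c"}})
catalog.evict_replica("t1", "ts-c")
stale = catalog.process_tablet_report("ts-c", ["t1"])  # ["t1"]
current = catalog.process_tablet_report("ts-a", ["t1"])  # []
```

Because a replica ejected after a server failure usually never reports in again, this approach never sends a request for it, so there is nothing to retry.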