[ https://issues.apache.org/jira/browse/KUDU-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3587:
--------------------------------
    Description: 
As of Kudu 1.17.0, the implementation of RetriableRpc for WriteRpc in the C++ client uses a linear back-off strategy, where the hold-off time interval (in milliseconds) is computed as
{noformat}
num_attempts + (rand() % 5)
{noformat}

Even if Kudu servers use separate incoming queues for different RPC interfaces (e.g. TabletServerService, ConsensusService, etc.), in the presence of many active clients, many tablet replicas per tablet server, and ongoing Raft election storms due to frozen and/or slow RPC worker threads, many more unrelated write requests might be dropped from the overflowing TabletServerService RPC queues because the queues are flooded with too many retried write requests to tablets whose leader replicas aren't yet established. It doesn't make sense to self-inflict such a DoS condition because of a non-optimal RPC retry strategy on the client side.

One option might be to use a linear back-off strategy while going round-robin through the recently refreshed list of tablet replicas, but switch to an exponential strategy upon completing a full circle and issuing the next GetTableLocations request to the Kudu master.

  was:
As of Kudu 1.17.0, the implementation of RetriableRpc for WriteRpc in the C++ client uses a linear back-off strategy, where the hold-off time interval (in milliseconds) is computed as
{noformat}
num_attempts + (rand() % 5)
{noformat}

Since Kudu servers use a single queue for all their RPC interfaces (e.g. TabletServerService, ConsensusService, etc.), in the presence of many active clients and busy server nodes, this might start a Raft election storm or exacerbate an existing one by keeping the RPC queue full or almost full, so more ConsensusService requests are dropped from overflowing RPC queues.
Of course, separating RPC queues for different interfaces is one part of the remedy (e.g., see [KUDU-2955|https://issues.apache.org/jira/browse/KUDU-2955]), but even with separate RPC queues it doesn't make sense to self-inflict a DoS condition because of a non-optimal RPC retry strategy when there are many active clients and tablet leadership transitions are in progress for many "hot" tables.

One option might be to use a linear back-off strategy while going round-robin through the recently refreshed list of tablet replicas, but switch to an exponential strategy upon completing a full circle and issuing the next GetTableLocations request to the Kudu master.


> Implement smarter back-off strategy for RetriableRpc upon receiving
> REPLICA_NOT_LEADER response
> ----------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3587
>                 URL: https://issues.apache.org/jira/browse/KUDU-3587
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client
>            Reporter: Alexey Serbin
>            Priority: Major
>
> As of Kudu 1.17.0, the implementation of RetriableRpc for WriteRpc in the C++ client uses a linear back-off strategy, where the hold-off time interval (in milliseconds) is computed as
> {noformat}
> num_attempts + (rand() % 5)
> {noformat}
> Even if Kudu servers use separate incoming queues for different RPC interfaces (e.g. TabletServerService, ConsensusService, etc.), in the presence of many active clients, many tablet replicas per tablet server, and ongoing Raft election storms due to frozen and/or slow RPC worker threads, many more unrelated write requests might be dropped from the overflowing TabletServerService RPC queues because the queues are flooded with too many retried write requests to tablets whose leader replicas aren't yet established. It doesn't make sense to self-inflict such a DoS condition because of a non-optimal RPC retry strategy on the client side.
> One option might be to use a linear back-off strategy while going round-robin through the recently refreshed list of tablet replicas, but switch to an exponential strategy upon completing a full circle and issuing the next GetTableLocations request to the Kudu master.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
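The hybrid strategy proposed in the description can be sketched as follows. This is a minimal illustration only, not Kudu client code: the function name {{ComputeBackoffMs}}, the exponent cap, and the way the "full circle" is detected (by comparing the attempt count to the replica count) are all assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical sketch: hold-off time (in milliseconds) for the Nth retry of
// a WriteRpc. While the client is still cycling round-robin through the
// cached list of tablet replicas, back off linearly (the current behavior,
// num_attempts + jitter). Once a full circle over all replicas is complete
// and the client must re-fetch tablet locations from the master, add a
// capped exponential term so retries stop flooding the TabletServerService
// RPC queues while leadership is unsettled.
int ComputeBackoffMs(int num_attempts, int num_replicas) {
  const int jitter = rand() % 5;  // same jitter term as the current formula
  if (num_attempts <= num_replicas) {
    // Still going round-robin through known replicas: linear back-off.
    return num_attempts + jitter;
  }
  // Completed a full circle: add 2^extra ms, capping the exponent so the
  // worst-case delay stays bounded (the cap of 10, i.e. ~1 second, is an
  // illustrative choice).
  const int extra = num_attempts - num_replicas;
  return num_attempts + (1 << std::min(extra, 10)) + jitter;
}
```

With 3 replicas, attempts 1-3 stay in the current linear range, attempt 5 waits roughly 9 ms, and by attempt 20 the exponential term dominates at its ~1 s cap, so a client stuck behind a leadership transition backs off instead of hammering the queues.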