[ https://issues.apache.org/jira/browse/CASSANDRA-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904258#comment-17904258 ]
Alex Petrov commented on CASSANDRA-20059:
-----------------------------------------

+1 on the Accord side: retrying indefinitely in the verb handler was an oversight, well spotted.

> TCM's Retry.Deadline#retryIndefinitely is dangerous if used with
> RemoteProcessor as the deadline does not impact message retries
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-20059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20059
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: ci_summary-trunk-3fa63cf81ce03bfa45c2b312c1c2846a1d84eee5.html,
>                      result_details-trunk-3fa63cf81ce03bfa45c2b312c1c2846a1d84eee5.tar.gz
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> public static Deadline retryIndefinitely(long timeoutNanos, Meter retryMeter)
> {
>     return new Deadline(Clock.Global.nanoTime() + timeoutNanos,
>                         new Retry.Jitter(Integer.MAX_VALUE, DEFAULT_BACKOFF_MS, new Random(), retryMeter))
>     {
>         @Override
>         public boolean reachedMax()
>         {
>             return false;
>         }
>
>         @Override
>         public long remainingNanos()
>         {
>             return timeoutNanos;
>         }
>
>         public String toString()
>         {
>             return String.format("RetryIndefinitely{tries=%d}", currentTries());
>         }
>     };
> }
> {code}
>
> Sample usage pattern (the example is in Accord, but the same pattern exists in
> RemoteProcessor.commit):
>
> {code}
> Promise<LogState> request = new AsyncPromise<>();
> List<InetAddressAndPort> candidates = new ArrayList<>(log.metadata().fullCMSMembers());
> sendWithCallbackAsync(request,
>                       Verb.TCM_RECONSTRUCT_EPOCH_REQ,
>                       new ReconstructLogState(lowEpoch, highEpoch, includeSnapshot),
>                       new CandidateIterator(candidates),
>                       retryPolicy);
> return request.get(retryPolicy.remainingNanos(), TimeUnit.NANOSECONDS);
> {code}
>
> The issue here is that the networking retry has no clue that we gave up
> waiting on the request, so we will keep retrying until success! The reason
> is that “reachedMax” is used to decide whether it's safe to run again, and
> it always reports that it is, even though the deadline has passed!
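
To make the failure mode in the description concrete, here is a minimal, self-contained sketch. It does not use Cassandra's classes: `RetryPolicy`, `indefinite`, `untilDeadline` and `attempts` are hypothetical names invented for illustration, and the loop only simulates a sender that consults `reachedMax()` before each retry against a peer that never answers. The deadline-aware variant is one possible shape for such a policy under these assumptions, not necessarily the fix applied in the ticket.

```java
import java.util.function.LongSupplier;

class DeadlineRetrySketch {
    /** Hypothetical stand-in for a retry policy; not Cassandra's Retry API. */
    interface RetryPolicy {
        boolean reachedMax();
        long remainingNanos();
    }

    /** Mirrors the reported behaviour: never exhausted, remaining time never shrinks. */
    static RetryPolicy indefinite(long timeoutNanos) {
        return new RetryPolicy() {
            public boolean reachedMax() { return false; }
            public long remainingNanos() { return timeoutNanos; }
        };
    }

    /** Deadline-aware variant: reports exhaustion once the clock passes the deadline. */
    static RetryPolicy untilDeadline(long timeoutNanos, LongSupplier clock) {
        final long deadline = clock.getAsLong() + timeoutNanos;
        return new RetryPolicy() {
            // Subtraction-based comparison stays correct if nanosecond clocks wrap.
            public boolean reachedMax() { return clock.getAsLong() - deadline >= 0; }
            public long remainingNanos() { return Math.max(0L, deadline - clock.getAsLong()); }
        };
    }

    /**
     * Simulates retries against a peer that never responds. Each attempt advances
     * the fake clock by backoffNanos. The cap exists only so the simulation
     * terminates for the indefinite policy.
     */
    static int attempts(RetryPolicy policy, long[] now, long backoffNanos, int cap) {
        int tries = 0;
        while (!policy.reachedMax() && tries < cap) {
            tries++;
            now[0] += backoffNanos;
        }
        return tries;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        // The caller would stop waiting after 100ns, but the sender keeps going to the cap.
        System.out.println(attempts(indefinite(100), now, 10, 1_000));
        now[0] = 0L;
        // With a deadline-aware policy the sender stops when the caller's wait expires.
        System.out.println(attempts(untilDeadline(100, () -> now[0]), now, 10, 1_000));
    }
}
```

Run as written, this prints 1000 and then 10. The first policy is only stopped by the artificial cap, which is the "keep retrying until success" behaviour described above. The second stops after ten 10ns backoffs because its `reachedMax()` tracks the same deadline the caller uses when waiting on the request.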