[jira] [Commented] (CASSANDRA-20059) TCM's Retry.Deadline#retryIndefinitely is dangerous if used with RemoteProcessor as the deadline does not impact message retries

Alex Petrov (Jira) Mon, 09 Dec 2024 11:08:10 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904258#comment-17904258
 ]


Alex Petrov commented on CASSANDRA-20059:
-----------------------------------------

+1 on the Accord side: retrying indefinitely in the verb handler was an 
oversight. Well spotted. 

> TCM's Retry.Deadline#retryIndefinitely is dangerous if used with 
> RemoteProcessor as the deadline does not impact message retries
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20059
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 
> ci_summary-trunk-3fa63cf81ce03bfa45c2b312c1c2846a1d84eee5.html, 
> result_details-trunk-3fa63cf81ce03bfa45c2b312c1c2846a1d84eee5.tar.gz
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> public static Deadline retryIndefinitely(long timeoutNanos, Meter retryMeter)
> {
>     return new Deadline(Clock.Global.nanoTime() + timeoutNanos,
>                         new Retry.Jitter(Integer.MAX_VALUE, DEFAULT_BACKOFF_MS, new Random(), retryMeter))
>     {
>         @Override
>         public boolean reachedMax()
>         {
>             return false;
>         }
>         @Override
>         public long remainingNanos()
>         {
>             return timeoutNanos;
>         }
>         public String toString()
>         {
>             return String.format("RetryIndefinitely{tries=%d}", currentTries());
>         }
>     };
> }
> {code}
> Sample usage pattern (this example is from Accord, but the same pattern 
> exists in RemoteProcessor.commit):
> {code}
> Promise<LogState> request = new AsyncPromise<>();
> List<InetAddressAndPort> candidates = new ArrayList<>(log.metadata().fullCMSMembers());
> sendWithCallbackAsync(request,
>                       Verb.TCM_RECONSTRUCT_EPOCH_REQ,
>                       new ReconstructLogState(lowEpoch, highEpoch, includeSnapshot),
>                       new CandidateIterator(candidates),
>                       retryPolicy);
> return request.get(retryPolicy.remainingNanos(), TimeUnit.NANOSECONDS);
> {code}
> The issue here is that the networking retry has no idea that we gave up 
> waiting on the request, so it will keep retrying until it succeeds! The 
> reason is that “reachedMax” is used to decide whether it's safe to run 
> again, and it always returns false, so a retry looks safe even though the 
> deadline has passed!
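
One way to read the problem described above: the retry policy could tie both reachedMax() and remainingNanos() to a single absolute deadline, so that the component driving the retries stops at the same moment the caller stops waiting. The following is only a minimal sketch under that assumption. DeadlineAwareRetry, onRetry() and the injected clock are hypothetical names for illustration; they are not Cassandra's Retry.Deadline API and not necessarily the fix applied for this ticket.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a retry policy; NOT Cassandra's Retry.Deadline.
final class DeadlineAwareRetry
{
    private final long deadlineNanos;   // absolute point in time, fixed at construction
    private final LongSupplier clock;   // injected nano clock (assumption, eases testing)
    private int tries;

    DeadlineAwareRetry(long timeoutNanos, LongSupplier clock)
    {
        this.clock = clock;
        this.deadlineNanos = clock.getAsLong() + timeoutNanos;
    }

    // Retries stop once the deadline has passed, instead of never stopping.
    boolean reachedMax()
    {
        return remainingNanos() <= 0;
    }

    // Time left until the deadline, never negative; shrinks as the clock advances.
    long remainingNanos()
    {
        return Math.max(0L, deadlineNanos - clock.getAsLong());
    }

    void onRetry()
    {
        tries++;
    }

    int currentTries()
    {
        return tries;
    }

    public static void main(String[] args)
    {
        long[] now = { 0L }; // fake clock so the example is deterministic
        DeadlineAwareRetry retry =
            new DeadlineAwareRetry(TimeUnit.MILLISECONDS.toNanos(100), () -> now[0]);

        // Simulate a sender whose every attempt fails after 30ms.
        while (!retry.reachedMax())
        {
            retry.onRetry();
            now[0] += TimeUnit.MILLISECONDS.toNanos(30);
        }
        System.out.println("gave up after " + retry.currentTries() + " tries");
    }
}
```

With a policy shaped like this, a caller that waits for remainingNanos() and a sender that checks reachedMax() before each attempt would both give up at the same deadline, rather than the sender continuing after the caller has timed out.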



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
