ffang opened a new pull request, #3143: URL: https://github.com/apache/cxf/pull/3143
## Problem `RetryStrategy` contains a shared mutable `counter` instance field that makes it stateful. When configured as a singleton bean — as implied by the documentation and consistent with its stateless siblings `SequentialStrategy` and `RandomStrategy` — this counter is shared across all invocations. The result is that retries from independent call chains interfere with each other: - Sequential calls on the same thread carry leftover count from a previous invocation into the next one - Concurrent calls on different threads race against the same integer In both cases the effective `maxNumberOfRetries` becomes non-deterministic — the actual retry limit is anywhere between 1 and the configured maximum, with no error or warning. For example, with `maxNumberOfRetries = 5`: if one call chain fails twice and then succeeds, a subsequent call chain starts with `counter = 2` and only gets 3 retries instead of 5. ## Root Cause Analysis The bug has two dimensions: **1. Shared instance field** `RetryStrategy.counter` is incremented by every invocation that shares the same strategy instance. **2. Two distinct code paths both consume the counter** - `getAlternateEndpoints(Exchange)` → `stillTheSameAddress(Exchange)` (endpoint-based failover) - `getFailoverTarget` → `selectAlternateAddress` / `selectAlternateEndpoint` → `getNextAlternate` → `stillTheSameAddress()` no-arg (address-based failover) ## Solution The retry counter is moved to a `ThreadLocal<Map<Integer, Integer>>` keyed by `Exchange` object identity (`System.identityHashCode`). Since `exchange.clear()` clears the Exchange's property map but never replaces the Exchange object itself, the counter survives between failover attempts while remaining scoped to a single invocation. Both code paths are covered: - **`stillTheSameAddress(Exchange)`** — new Exchange-aware variant that reads and writes the counter directly via the exchange identity key. - **`stillTheSameAddress()` no-arg** — kept for subclass compatibility. A second `ThreadLocal<Exchange>` (`CURRENT_EXCHANGE`) is set by `FailoverTargetSelector.getFailoverTarget` in a `try/finally` block before calling `selectAlternateAddress` / `selectAlternateEndpoint`, so the no-arg path can look up the correct per-exchange counter. This is safe under async dispatch and thread-pool reuse because it is keyed by Exchange identity, not by thread. ### Changes to `RetryStrategy` - Remove `private int counter` instance field - Add `EXCHANGE_COUNTERS` — `ThreadLocal<Map<Integer, Integer>>` keyed by exchange identity hash - Add `CURRENT_EXCHANGE` — `ThreadLocal<Exchange>` set by `FailoverTargetSelector` before entering the alternate-selection chain - Add `stillTheSameAddress(Exchange)` — Exchange-aware variant used by `getAlternateEndpoints` - Add `withExchange()` — helper that sets/restores `CURRENT_EXCHANGE` in a `try/finally` - Add `setCurrentExchange` / `clearCurrentExchange` — called by `FailoverTargetSelector.getFailoverTarget` - Add `resetCounter(Exchange)` / `removeCounter(Exchange)` — lifecycle hooks called by `FailoverTargetSelector` ### Changes to `FailoverTargetSelector` | Method | Change | |--------|--------| | `prepare()` | Calls `RetryStrategy.resetCounter(exchange)` to start each new top-level invocation from zero | | `getFailoverTarget()` | Wraps both `selectAlternateAddress` and `selectAlternateEndpoint` in a `try/finally` that sets and clears `CURRENT_EXCHANGE` | | `complete()` | Calls `RetryStrategy.removeCounter(exchange)` to prevent unbounded growth of the per-thread map | ## Testing Added `RetryStrategyTest` with 7 unit tests: | Test | What it verifies | |------|-----------------| | `testExchangePathRetriesExactlyMaxTimes` | Counter exhausts after exactly `maxNumberOfRetries` attempts | | `testExchangeCounterDoesNotLeakAcrossInvocations` | Fresh exchange always starts from zero regardless of prior state | | `testResetCounterClearsExchangeState` | `resetCounter` restores a clean slate (simulates `prepare()`) | | `testMaxRetriesZeroAlwaysReturnsSameAddress` | `maxNumberOfRetries=0` means unlimited retries | | `testNoArgPathUsesExchangeCounter` | No-arg `stillTheSameAddress()` shares the same per-exchange counter | | `testNoArgCounterDoesNotLeakAcrossExchanges` | No-arg path also isolates state between independent exchanges | | `testConcurrentInvocationsDoNotInterfere` | Two threads sharing a singleton strategy each get their full quota | All existing system tests pass: - `FailoverTest#testSequentialStrategyWithRetries` - `CircuitBreakerFailoverTest#testSequentialStrategyWithRetries` - `FailoverWebClientTest#testRetryFailover` - `FailoverWebClientTest#testCircuitBreakerRetryFailover` - `FailoverWebClientTest#testRetryFailoverAlternateAddresses` - `JAXRSSoapBookTest#testCheckBookClientErrorResponseWithFailover` ## Related - Jira: https://issues.apache.org/jira/browse/CXF-9213 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
