[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875658#comment-17875658 ]
Dmitry Konstantinov commented on CASSANDRA-19651:
-------------------------------------------------

Thank you for your support!

Regarding the random exception in distributed.test.ring.BootstrapTest#bootstrapUnspecifiedResumeTest-_jdk17
{code:java}
Cannot invoke "org.apache.cassandra.gms.EndpointState.getApplicationState(org.apache.cassandra.gms.ApplicationState)" because "state" is null-java.lang.NullPointerException: Cannot invoke "org.apache.cassandra.gms.EndpointState.getApplicationState(org.apache.cassandra.gms.ApplicationState)" because "state" is null
    at org.apache.cassandra.distributed.action.GossipHelper$PullSchemaFrom.lambda$accept$6adea493$1(GossipHelper.java:245)
    at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$async$10(IsolatedExecutor.java:156)
    at org.apache.cassandra.concurrent.FutureTask$2.call(FutureTask.java:124)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:840)
{code}
I can create a separate ticket to make the test less flaky by adding a retry loop, like this:
{code:java}
EndpointState state = waitForValue(() -> Gossiper.instance.getEndpointStateForEndpoint(endpoint),
                                   Objects::nonNull,
                                   10_000);
...
private static <V> V waitForValue(Callable<V> valueProvider, Predicate<V> condition, int timeoutMs)
{
    V value;
    int sleptTimeMs = 0;
    int sleepPeriodMs = 50;
    try
    {
        do
        {
            value = valueProvider.call();
            if (condition.test(value) || sleptTimeMs >= timeoutMs)
            {
                return value;
            }
            sleptTimeMs += sleepPeriodMs;
            Thread.sleep(sleepPeriodMs);
        } while (true);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}
{code}
[~aweisberg] what do you think - does it make sense?

> idealCLWriteLatency metric reports the worst response time instead of the
> time when ideal CL is satisfied
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Observability
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 4.0.14, 5.0.1, 5.1, 4.1.7
>
> Attachments:
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html,
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html,
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html,
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html,
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz,
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz,
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz,
> select-junit-tests-rerun-4.1.zip
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> }
> {code}
> Actual result: responsesAndExpirations is a total number of replicas across
> all DCs which does not depend on the ideal CL, so the metric value for
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated
> when we get enough responses from replicas according to the ideal CL.
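
The difference between the actual and expected results in the issue description can be illustrated with a toy model. This is only a sketch and is neither Cassandra code nor the patch for this ticket: the class IdealClLatencySketch, its fields, and the explicit nowNanos clock argument are all hypothetical, and it assumes every response is a success. It keeps one countdown for the responses the ideal CL needs and a separate countdown for all replicas, so the two latencies can be compared side by side.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: every name below is hypothetical and nothing here is Cassandra's API.
final class IdealClLatencySketch
{
    private final AtomicInteger remainingForIdealCl; // successful responses still needed for the ideal CL
    private final AtomicInteger remainingReplicas;   // responses still outstanding across all replicas
    private final long startNanos;

    private volatile long idealClLatencyNanos = -1;      // -1 until the ideal CL is satisfied
    private volatile long lastResponseLatencyNanos = -1; // -1 until every replica has answered

    IdealClLatencySketch(int blockForIdealCl, int totalReplicas, long startNanos)
    {
        this.remainingForIdealCl = new AtomicInteger(blockForIdealCl);
        this.remainingReplicas = new AtomicInteger(totalReplicas);
        this.startNanos = startNanos;
    }

    // Record one successful replica response observed at clock reading nowNanos.
    void onResponse(long nowNanos)
    {
        // Expected behaviour: take the latency once enough replicas have answered for the ideal CL.
        if (remainingForIdealCl.decrementAndGet() == 0)
            idealClLatencyNanos = nowNanos - startNanos;

        // Behaviour described under "Actual result": the value is only taken at the last response.
        if (remainingReplicas.decrementAndGet() == 0)
            lastResponseLatencyNanos = nowNanos - startNanos;
    }

    long idealClLatencyNanos()
    {
        return idealClLatencyNanos;
    }

    long lastResponseLatencyNanos()
    {
        return lastResponseLatencyNanos;
    }

    public static void main(String[] args)
    {
        // Three replicas, ideal CL satisfied by two of them; the third replica is slow.
        IdealClLatencySketch write = new IdealClLatencySketch(2, 3, 0L);
        write.onResponse(10L);
        write.onResponse(20L);
        write.onResponse(500L);
        System.out.println(write.idealClLatencyNanos());      // prints 20
        System.out.println(write.lastResponseLatencyNanos()); // prints 500
    }
}
```

In this model one slow replica pushes the "last response" value to 500 while the ideal CL was already satisfied at 20, which is the gap the ticket describes.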