[ https://issues.apache.org/jira/browse/CASSANDRA-18564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761279#comment-17761279 ]
Andres de la Peña commented on CASSANDRA-18564:
-----------------------------------------------
This patch makes the test survive 500 runs:
||PR||CI||
|[5.0|https://github.com/apache/cassandra/pull/2657]|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3194/workflows/d7abb1fe-0954-4b11-8f15-0e9696d43f8d]|
|[trunk|https://github.com/apache/cassandra/pull/2658]|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3195/workflows/1c6be32a-ac4d-44f8-91a4-238997418c28]|
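As a rough check on what 500 clean runs mean: the quoted description reports 143 failures in 500 iterations before the patch. Assuming runs are independent and the failure rate had stayed at that observed 28.6% (an assumption, not something measured here), the chance of 500 consecutive passes by luck alone is (1 - 0.286)^500. A minimal Java sketch of that arithmetic, using a hypothetical helper named `allPass`:

```java
// Rough sanity check, assuming independent runs with a fixed per-run failure rate.
final class SurvivalOdds {
    // Probability that `runs` independent runs all pass when each one fails
    // with probability `failureRate`.
    static double allPass(double failureRate, int runs) {
        return Math.pow(1.0 - failureRate, runs);
    }
}
```

`SurvivalOdds.allPass(143.0 / 500, 500)` comes out below 1e-73, so under that assumption the clean streak is very unlikely to be luck.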
I have changed the test to skip the cases where the query is expected to time
out, as mentioned above. This allows us to increase the timeouts without
making the test prohibitively slow, which could otherwise cause a JUnit or
CircleCI timeout.
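The skipping can be illustrated with a small, hypothetical model. It assumes a 3-node cluster with RF 3 and, purely for illustration, that an operation is expected to fail (and would therefore sit waiting for the timeout) exactly when fewer replicas are alive than its consistency level requires. Neither the class nor the rule comes from the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: which availability checks remain once the cases that
// are expected to time out are skipped. Assumes 3 nodes and RF = 3.
final class SkipTimeoutCases {
    static final int REPLICATION_FACTOR = 3;

    // Replica responses each consistency level needs when RF = 3.
    enum Level {
        ONE(1), QUORUM(2), ALL(3);

        final int required;

        Level(int required) {
            this.required = required;
        }
    }

    // Assumption: an operation succeeds iff enough replicas are alive.
    static boolean expectSuccess(Level level, int nodesDown) {
        return REPLICATION_FACTOR - nodesDown >= level.required;
    }

    // Lists the checks kept for one write-read combination, e.g. "read@2"
    // means "read with 2 nodes down". Operations expected to fail are dropped
    // instead of waiting for them to time out.
    static List<String> checksToRun(Level write, Level read) {
        List<String> checks = new ArrayList<>();
        for (int down = 0; down < REPLICATION_FACTOR; down++) {
            if (expectSuccess(write, down)) {
                checks.add("write@" + down);
            }
            if (expectSuccess(read, down)) {
                checks.add("read@" + down);
            }
        }
        return checks;
    }
}
```

For ALL-ONE this keeps the ALL write only with no nodes down, while the ONE read is still checked with 0, 1 and 2 nodes down, which is the kind of case in the quoted failure. Since none of the kept checks is expected to wait for a timeout, raising the timeout only lengthens runs that are already failing.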
I have also grouped the testing of the different consistency level combinations
so they can all be tested in the same cluster run. This means that we only need
to run a third of the clusters. That doesn't make the test less prone to fail,
but it reduces the total CI load.
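Grouping can be pictured as moving the loop over combinations inside a single cluster run. In the hypothetical sketch below, `FakeCluster` stands in for an in-JVM cluster, and the three combinations are assumed for illustration (only ALL-ONE appears in this ticket):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: check every write-read combination against one cluster
// instead of starting a separate cluster per combination.
final class GroupedCombinations {
    // Assumed write-read combinations.
    static final String[][] COMBINATIONS = {
        {"ALL", "ONE"}, {"ONE", "ALL"}, {"QUORUM", "QUORUM"}
    };

    // Stand-in for an expensive cluster; records the checks run on it.
    static final class FakeCluster {
        final List<String> checks = new ArrayList<>();

        void check(String write, String read) {
            checks.add(write + "-" + read);
        }
    }

    // Before: each combination starts (and pays for) its own cluster.
    static List<FakeCluster> runSeparately() {
        List<FakeCluster> clusters = new ArrayList<>();
        for (String[] c : COMBINATIONS) {
            FakeCluster cluster = new FakeCluster();
            cluster.check(c[0], c[1]);
            clusters.add(cluster);
        }
        return clusters;
    }

    // After: a single cluster run checks every combination in turn.
    static List<FakeCluster> runGrouped() {
        FakeCluster cluster = new FakeCluster();
        for (String[] c : COMBINATIONS) {
            cluster.check(c[0], c[1]);
        }
        return Collections.singletonList(cluster);
    }
}
```

Every combination is still checked, so a slow check can still fail the run; only cluster start-up is shared, which is where the CI saving comes from.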
I have also split the test class by whether the coordinator node is upgraded or
not, so we don't have to run two full sets of upgrade paths in the same JVM.
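A minimal sketch of that split, assuming each test class runs in its own JVM: a shared abstract base holds the logic, and two thin subclasses pick the coordinator mode. All class names and upgrade paths here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared base: holds the test logic, parameterised by coordinator mode.
abstract class AvailabilityTestBase {
    // Assumed upgrade paths; the real set depends on the branch under test.
    static final String[] UPGRADE_PATHS = {"4.0 -> 5.0", "4.1 -> 5.0"};

    // Each concrete test class decides whether the coordinator is upgraded.
    protected abstract boolean upgradedCoordinator();

    // Describes the runs this class performs: every upgrade path, one mode only.
    List<String> run() {
        String mode = upgradedCoordinator() ? "upgraded coordinator" : "non-upgraded coordinator";
        List<String> runs = new ArrayList<>();
        for (String path : UPGRADE_PATHS) {
            runs.add(path + " (" + mode + ")");
        }
        return runs;
    }
}

// Hypothetical test class covering only the upgraded-coordinator runs.
final class AvailabilityUpgradedCoordinatorTest extends AvailabilityTestBase {
    @Override
    protected boolean upgradedCoordinator() {
        return true;
    }
}

// Hypothetical test class covering only the non-upgraded-coordinator runs.
final class AvailabilityNotUpgradedCoordinatorTest extends AvailabilityTestBase {
    @Override
    protected boolean upgradedCoordinator() {
        return false;
    }
}
```

Each JVM then goes through the upgrade paths once instead of twice.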
However, I'm not sure I understand why the test was more prone to fail in 5.0
and trunk than in 4.0 and 4.1. 5.0 and trunk don't support direct upgrades from
3.0 or 3.x, so I think the number of tested upgrade paths is the same. I suspect
something in 5.0 and trunk makes the database, the test or the CI environment
slower. Could it be because we are running with J11 instead of J8?
> Test Failure:
> MixedModeAvailabilityV30AllOneTest.testAvailabilityCoordinatorUpgraded
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18564
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Andres de la Peña
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> The JVM upgrade dtest
> {{MixedModeAvailabilityV3XAllOneTest.testAvailabilityCoordinatorUpgraded}}
> seems to be flaky at least in {{trunk}}:
> {code}
> junit.framework.AssertionFailedError: Error in test '4.0.11 -> [5.0]' while upgrading to '5.0'; successful upgrades []
>     at org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:348)
>     at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:154)
>     at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailabilityCoordinatorUpgraded(MixedModeAvailabilityTestBase.java:74)
> Caused by: java.lang.AssertionError: Unexpected error while reading in case write-read consistency ALL-ONE with upgraded coordinator and 2 nodes down: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.lambda$testAvailability$6(MixedModeAvailabilityTestBase.java:145)
>     at org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:339)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.reads.ReadCallback.awaitResults(ReadCallback.java:162)
>     at org.apache.cassandra.service.reads.AbstractReadExecutor.awaitResponses(AbstractReadExecutor.java:387)
>     at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:2124)
>     at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1995)
>     at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1873)
>     at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1286)
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:364)
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:293)
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:105)
>     at org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
>     at org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
>     at org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
>     at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>     at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.lang.Thread.run(Thread.java:750)
> {code}
> This has failed 143 times in 500 iterations of this CircleCI run:
> * https://app.circleci.com/pipelines/github/adelapena/cassandra/2927/workflows/fcd1cd60-826b-484a-8e81-d3ba640f7de9/jobs/47659/tests
> The failure has also recently appeared on Jenkins:
> * https://ci-cassandra.apache.org/job/Cassandra-trunk/1585/testReport/org.apache.cassandra.distributed.upgrade/MixedModeAvailabilityV3XAllOneTest/testAvailabilityCoordinatorUpgraded__jdk11/
> Given that the failure has only just appeared on Jenkins and that it fails
> relatively easily on CircleCI, the test was likely broken by a very recent
> change.