[jira] [Commented] (CASSANDRA-18564) Test Failure: MixedModeAvailabilityV30AllOneTest.testAvailabilityCoordinatorUpgraded

Jira Fri, 01 Sep 2023 04:16:07 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-18564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761279#comment-17761279 ]


Andres de la Peña commented on CASSANDRA-18564:
-----------------------------------------------

This patch makes the test survive 500 runs:
||PR||CI||
|[5.0|https://github.com/apache/cassandra/pull/2657]|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3194/workflows/d7abb1fe-0954-4b11-8f15-0e9696d43f8d]|
|[trunk|https://github.com/apache/cassandra/pull/2658]|[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/3195/workflows/1c6be32a-ac4d-44f8-91a4-238997418c28]|

I have changed the test to skip the cases where the query is expected to time 
out, as mentioned above. This allows us to increase the timeouts without 
making the test prohibitively slow, which might cause a JUnit or CircleCI 
timeout.
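
As a rough illustration of the idea (not the code in the patch), the sketch 
below only issues the queries that are expected to succeed. It assumes three 
replicas, and every name in it is hypothetical rather than taken from the 
Cassandra test classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names come from the Cassandra test code.
class SkipTimeoutCasesSketch {
    // Hypothetical stand-in for a consistency level and the number of replicas
    // it needs to hear from, assuming a replication factor of 3.
    enum Level {
        ONE(1), QUORUM(2), ALL(3);

        final int requiredReplicas;

        Level(int requiredReplicas) {
            this.requiredReplicas = requiredReplicas;
        }
    }

    static final int REPLICAS = 3; // assumed replication factor

    // A query can only succeed if enough replicas are still up.
    static boolean shouldSucceed(Level level, int nodesDown) {
        return REPLICAS - nodesDown >= level.requiredReplicas;
    }

    // Collects the queries worth running; the ones expected to time out are skipped.
    static List<String> queriesToRun(Level write, Level read) {
        List<String> queries = new ArrayList<>();
        for (int nodesDown = 0; nodesDown < REPLICAS; nodesDown++) {
            if (shouldSucceed(write, nodesDown))
                queries.add("write at " + write + " with " + nodesDown + " nodes down");
            if (shouldSucceed(read, nodesDown))
                queries.add("read at " + read + " with " + nodesDown + " nodes down");
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String query : queriesToRun(Level.ALL, Level.ONE))
            System.out.println(query);
    }
}
```

Since no query in the list is meant to wait for a timeout, raising the 
timeout only costs time when something actually goes wrong.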

I have also grouped the testing of the different consistency level 
combinations, so they can all be tested in the same cluster run. This means 
that we only need to run a third of the clusters. That doesn't make the test 
less prone to failure, but it reduces the total CI load.
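
To make the "a third of the clusters" arithmetic concrete, here is a small 
sketch that counts cluster starts with and without grouping. It is not the 
patch itself: the class, the methods and the upgrade path labels are 
hypothetical, and it assumes three write-read combinations (only ALL-ONE is 
named in this issue).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a counter stands in for starting a real in-JVM cluster.
class GroupedClusterRunsSketch {
    // Assumed write-read consistency level combinations.
    static final List<String> COMBINATIONS = Arrays.asList("ALL-ONE", "QUORUM-QUORUM", "ONE-ALL");

    static int clusterStarts = 0;

    // Placeholder for the expensive step of starting and upgrading a cluster.
    static void startCluster(String upgradePath) {
        clusterStarts++;
    }

    // Placeholder for the availability checks of one combination.
    static void checkAvailability(String upgradePath, String combination) {
    }

    // Before: every combination gets its own cluster run per upgrade path.
    static int runUngrouped(List<String> upgradePaths) {
        clusterStarts = 0;
        for (String path : upgradePaths) {
            for (String combination : COMBINATIONS) {
                startCluster(path);
                checkAvailability(path, combination);
            }
        }
        return clusterStarts;
    }

    // After: one cluster run per upgrade path covers all the combinations.
    static int runGrouped(List<String> upgradePaths) {
        clusterStarts = 0;
        for (String path : upgradePaths) {
            startCluster(path);
            for (String combination : COMBINATIONS)
                checkAvailability(path, combination);
        }
        return clusterStarts;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("path-A", "path-B"); // hypothetical upgrade paths
        System.out.println("ungrouped cluster starts: " + runUngrouped(paths));
        System.out.println("grouped cluster starts: " + runGrouped(paths));
    }
}
```

The same checks run in both cases; only the number of cluster starts changes.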

I have also split the test class by whether the coordinator node is upgraded or 
not, so we don't have to run two full sets of upgrade paths in the same JVM.

However, I'm not sure I understand why the test was more prone to fail in 5.0 
and trunk than in 4.0 and 4.1. Neither 5.0 nor trunk supports direct upgrades 
from 3.0 or 3.x, so I think the number of tested upgrade paths is the same. 
There must be something in 5.0 and trunk that makes the database, the test or 
the CI environment slower. Could it be that we are running with J11 instead of 
J8?

> Test Failure: 
> MixedModeAvailabilityV30AllOneTest.testAvailabilityCoordinatorUpgraded
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18564
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18564
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/java
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>
> The JVM upgrade dtest 
> {{MixedModeAvailabilityV3XAllOneTest.testAvailabilityCoordinatorUpgraded}} 
> seems to be flaky at least in {{trunk}}:
> {code}
> junit.framework.AssertionFailedError: Error in test '4.0.11 -> [5.0]' while 
> upgrading to '5.0'; successful upgrades []
>       at 
> org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:348)
>       at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailability(MixedModeAvailabilityTestBase.java:154)
>       at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.testAvailabilityCoordinatorUpgraded(MixedModeAvailabilityTestBase.java:74)
> Caused by: java.lang.AssertionError: Unexpected error while reading in case 
> write-read consistency ALL-ONE with upgraded coordinator and 2 nodes down: 
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
> received only 0 responses.
>       at 
> org.apache.cassandra.distributed.upgrade.MixedModeAvailabilityTestBase.lambda$testAvailability$6(MixedModeAvailabilityTestBase.java:145)
>       at 
> org.apache.cassandra.distributed.upgrade.UpgradeTestBase$TestCase.run(UpgradeTestBase.java:339)
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation 
> timed out - received only 0 responses.
>       at 
> org.apache.cassandra.service.reads.ReadCallback.awaitResults(ReadCallback.java:162)
>       at 
> org.apache.cassandra.service.reads.AbstractReadExecutor.awaitResponses(AbstractReadExecutor.java:387)
>       at 
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:2124)
>       at 
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1995)
>       at 
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1873)
>       at 
> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1286)
>       at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:364)
>       at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:293)
>       at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:105)
>       at 
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
>       at 
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
>       at 
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
>       at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>       at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:750)
> {code}
> This has failed 143 times in 500 iterations of this CircleCI run:
> * 
> https://app.circleci.com/pipelines/github/adelapena/cassandra/2927/workflows/fcd1cd60-826b-484a-8e81-d3ba640f7de9/jobs/47659/tests
> The failure has also recently appeared on Jenkins:
> * 
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1585/testReport/org.apache.cassandra.distributed.upgrade/MixedModeAvailabilityV3XAllOneTest/testAvailabilityCoordinatorUpgraded__jdk11/
> Given that the failure has just appeared on Jenkins and it fails relatively 
> easily on CircleCI, it's likely that it has been broken by a very recent 
> change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
