Hi all,

The unit tests are looking pretty reliable right now. There is a long tail of infrequently failing tests, but it's not bad, and almost all builds succeed in the current build environment. In CircleCI it seems like unit tests might be a little less reliable, but still usable.

The dtests, on the other hand, aren't producing clean builds yet. There is also a pretty diverse set of failing tests.

I did a bit of triaging of the flaky dtests. I started by cataloging everything, but what I found is that the long tail of flaky dtests is very long indeed, so I narrowed focus to just the most frequently failing tests for now.

See https://goo.gl/b96CdO

I created a spreadsheet with some of the failing tests. It has links to JIRA, the last time each test was seen failing, and how many failures I found in Apache Jenkins across the 3 dtest builds. There are a lot of failures not listed. There would be 50+ entries if I cataloged each one.

There are two hard failing tests, but both are already moving along:

CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta reviewing, last updated April 2017)
  dtest failure in topology_test.TestTopology.size_estimates_multidc_test

CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T reviewing, last updated March 2017)
  test failure in auth_test.TestAuth.system_auth_ks_is_alterable_test

I think the tests we should tackle first are on this sheet, in priority order: https://goo.gl/S3khv1

Sheet columns: Suite, Test, JIRA, Last failure, Counted failures, Status, Assigned, Reviewer, Comments

Suite: bootstrap_test
  Test: TestBootstrap.simultaneous_bootstrap_test
  JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13506
  Last failure: 5/5/2017
  Counted failures: 45
  Status: Open

Suite: repair_test
  Test: incremental_repair_test.TestIncRepair.compaction_test
  JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13194
  Last failure: 5/4/2017
  Counted failures: 44
  Status: Open

Suite: sstableutil_test
  Test: SSTableUtilTest.compaction_test
  JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13182
  Last failure: 5/4/2017
  Counted failures: 35
  Status: Open

Suite: paging_test
  Test: TestPagingWithDeletions.test_ttl_deletions
  JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13507
  Last failure: 4/25/2017
  Counted failures: 31
  Status: Open

Suite: repair_test
  Test: incremental_repair_test.TestIncRepair.multiple_repair_test
  JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13515
  Last failure: 5/4/2017
  Counted failures: 18
  Status: Open

Suite: cqlsh_tests
  Test: cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_*
  JIRA: https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22
  Last failure: 5/8/2017
  Counted failures: 23

Suite: paxos_tests
  Test: TestPaxos.contention_test_many_threads
  JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13517
  Last failure: 5/8/2017
  Counted failures: 15
  Status: Open

Suite: repair_test
  Test: TestRepair
  JIRA: https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20text%20~%20%22dtest%20failure%20repair_test%22
  Last failure: 5/4/2017
  Comments: No one test fails a lot but the number of failing tests is substantial

Suite: cqlsh_tests
  Test: cqlsh_tests.CqlshSmokeTest.[test_insert | test_truncate | test_use_keyspace | test_create_keyspace]
  Last failure: 4/22/2017
  Counted failures: 6

If you have spare cycles you can make a huge difference in test stability by picking off one of these.

Regards,
Ariel