Hi,

I am running the large and "simple" dtests (executed via cassandra-builds/build-scripts/cassandra-dtest-pytest.sh), and I find myself quite frustrated because I cannot tell whether the errors come from flaky tests or from legitimate issues.
Checking failures one by one is "simple" when the tests are stable and there are only a couple of them. But with hundreds of tests, a whole run that takes ~7 hours, and unstable results, it is like finding a needle in a haystack. Sometimes 15 tests fail, sometimes just 10. Sometimes there are timeouts, sometimes not. For the basic dtests I consistently get three errors out of about 900, I think, which is quite good. I supplied a patch (1), not merged yet, so only two of them now fail consistently.

Can you point me to your builds and the results you are getting there? Maybe something is wrong with my setup, or are these dtests "expected" to be flaky from time to time? What stability do you get with the official builds when it comes to dtests? How often are they run? As part of every pull request / change? Do you commit only on "0 dtests failed"?

Are there any recommendations on what setup and machine these tests should run on? I am running them on a c5.9xlarge (36 cores with 64 GB of memory) on a fairly recent Ubuntu with the latest Java 8. I am trying to supply all the needed parameters and libraries so that Cassandra starts smoothly without any warnings / errors (there are those checks that verify whether your environment is fine). I am testing current trunk.

Thanks for any input, tips, or tricks on how to make them more stable.

(1) https://github.com/apache/cassandra-dtest/pull/47

Stefan Miklosovic