Any ideas how to make dtests more stable and reproducible?

Stefan Miklosovic Mon, 18 Mar 2019 17:47:44 -0700

Hi,

I am running the large and "simple" dtests (executed via
cassandra-builds/build-scripts/cassandra-dtest-pytest.sh) and I find myself
quite frustrated, as I cannot tell whether the errors come from flaky tests
or from legitimate issues.


It is "simple" to check failures one by one when the tests are stable and
there are only a couple of them, but when there are hundreds of tests, the
whole run takes ~7 hours, and the results are not stable, it is like finding
a needle in a haystack. Sometimes 15 tests fail, sometimes just 10.
Sometimes there are timeouts, sometimes not.
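
One way to separate the two cases is to compare which tests fail across
several runs. A minimal sketch in Python, assuming the failed test names of
each run have already been collected into sets (the function and the test
names below are hypothetical, for illustration only):

```python
from collections import Counter


def classify_failures(runs):
    """Split failed test names into consistent and intermittent ones.

    `runs` is a list of sets, one per test run, each holding the names
    of the tests that failed in that run.
    """
    # Count in how many runs each test failed.
    counts = Counter(name for failed in runs for name in failed)
    # A test that failed in every run is a candidate for a real issue.
    consistent = {name for name, n in counts.items() if n == len(runs)}
    # Everything else failed only in some runs, so it is likely flaky.
    intermittent = set(counts) - consistent
    return consistent, intermittent


# Hypothetical test names, not taken from a real dtest run.
runs = [
    {"test_a", "test_b", "test_c"},
    {"test_a", "test_b"},
    {"test_a", "test_b", "test_d"},
]
consistent, intermittent = classify_failures(runs)
print("consistent:", sorted(consistent))
print("intermittent:", sorted(intermittent))
```

Tests that fail in every run are worth investigating first; the rest are
more likely to be flaky, although a few runs cannot prove that.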

For the basic dtests I consistently get three errors out of 900, I think,
which is quite good. I submitted a patch here (1), so only two of them now
fail consistently (the patch is not merged yet).

Can you point me to your builds and what results you are getting there?
Maybe something is wrong with my setup or these dtests are "expected" to be
flaky from time to time?

What stability are you getting with the official builds when it comes to
dtests? How often are they run? As part of every pull request / change? Do
you commit only on "0 dtests failed"?

Are there any recommendations as to what setup and machine these tests
should run on? I am running them on a c5.9xlarge (36 cores with 64 GB of
memory) on a fairly recent Ubuntu with the latest Java 8. I am trying to
supply all the needed parameters and libs so that Cassandra starts smoothly
without any warnings / errors (there are startup checks which verify that
your environment is fine).

I am testing the current trunk.

Thanks for any input, tips, or tricks on how to make them more stable.

(1) https://github.com/apache/cassandra-dtest/pull/47

Stefan Miklosovic
