Re: Apache Cassandra fuzz testing

[email protected] Fri, 18 Feb 2022 04:25:30 -0800

> There are many tests that are currently purely manual, and some are just hard 
> to maintain….. And whenever we add support for, say, UDTs, overnight you'll 
> just get UDTs for all existing tests

Yes, something worth really highlighting here is that many of our tests are 
flaky because we have so many tests, many of them of low quality, where 
determinism/reliability has been too costly to deliver. With fewer tests able 
to cover more functionality, investment in reliability and determinism pays off 
more easily. Also, by moving to frameworks that have already done some of this 
heavy lifting, it is easier to achieve anyway.

I agree that some areas of the codebase might be quite ripe for this kind of 
work, particularly the more complex CQL features and those being invested in 
today or in the near future. MVs seem an obvious example, as part of the work to 
move them out of experimental status. I’m uncertain whether SAI is suitable for 
use with Harry, but it could be explored.

From: Alex Petrov <[email protected]>
Date: Friday, 18 February 2022 at 11:39
To: [email protected] <[email protected]>
Subject: Re: Apache Cassandra fuzz testing
I did not intend to imply that we should migrate all tests. To be more specific 
than I was: we can pick only those where Harry makes more sense than manual 
tests, where it can cover more ground. GROUP BY comes to mind as a perfect 
example: its current test suite is rather limited. Fuzzing it can yield a lot of 
useful results with very little risk of flakiness. It can completely replace the 
existing test suite and test many more cases.

Another example is SelectTest and many tests like it, which are just a manual 
way to go through a bunch of cases while leaving out many other potential edge 
cases. TTL tests would be the next example. Range tombstones are yet another. 
Read repair tests would also be good to expand. Many Python dtests that use 
stress to load data are another potential candidate.

There are many tests that are currently purely manual, and some are just hard 
to maintain. Many of those can be good candidates for switching to 
property-based testing. But, as Benedict mentioned, we don't have much bandwidth 
to migrate the tests anyway.

It could be that you are skeptical because you haven't had much experience with 
Harry just yet. While many features are still missing, it is still more 
powerful than many existing manually written tests. And whenever we add support 
for, say, UDTs, overnight you'll just get UDTs for all existing tests, followed 
by collections and other things. Moreover, we will be able to see whether all 
our tests pass under failure conditions, and run them with different sets of 
parameters.

Maybe I can reframe it: we add fuzz tests for the mentioned areas of code and, 
if at some point in the future we decide that the manually written tests are 
redundant, we can consider deprecating them.

On Fri, Feb 18, 2022, at 9:41 AM, Benjamin Lerer wrote:
Thanks a lot for raising this topic, Alex.

I have not had the chance to use Harry yet, and I guess the same is true for 
most of us.
Starting to use it in our new tests makes total sense to me.
I am more concerned about starting to migrate/update existing tests. It took us 
time to build reliable, non-flaky tests that guarantee the correctness of the 
codebase. As far as I can see from Harry's documentation, some features are 
still missing. People lack experience with this tool, and it will take a bit of 
time for them to build that knowledge. Along the way, we might also discover 
some issues with Harry that need to be addressed.

So I am +1 for starting to use it in our new tests and build our knowledge of 
Harry. Regarding a migration of existing tests to it, I would wait a bit before 
choosing to go down that path.

On Wed, 16 Feb 2022 at 16:30, [email protected] 
<[email protected]> wrote:

+1

The Simulator is hopefully going to be another powerful tool for this kind of 
work, and we should be encouraging the use of both for large or complex pieces 
of work.

From: Alex Petrov <[email protected]>
Date: Wednesday, 16 February 2022 at 11:56
To: [email protected] <[email protected]>
Subject: Re: Apache Cassandra fuzz testing

(apologies for sending an incomplete email)

Hi everyone,

As you may know, we’ve been actively working on fuzz testing Apache Cassandra 
for the past several years and have made considerable progress on that front.

We’ve cut a 0.0.1 release of Harry [1], a fuzz testing tool for Apache 
Cassandra, and merged CASSANDRA-16262 [2].

I’d recommend that we, as a community, take the next logical step: require 
fuzz/property-based tests for all major patches, and start migrating/updating 
existing tests to be property-based rather than relying on hardcoded values.

Harry can be used to generate data and then check that a sequence of events 
corresponds to Cassandra's resolution rules. We will continue expanding Harry's 
coverage and writing new models and checkers, too.
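To make the idea of "generate data, then check it against resolution rules" concrete, here is a minimal sketch of that style of check. It does not use Harry's actual API and does not talk to Cassandra: a toy in-memory last-write-wins register stands in for the system under test, and all names (`LwwPropertyCheck`, `Write`, `merge`, `model`, `check`) are hypothetical. It generates random write histories from a single seed, replays each history in several shuffled delivery orders, and checks that every order converges to what a simple model predicts. The tie-break (larger value wins on equal timestamps) is a simplifying assumption, and tombstones and TTLs are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Harry's API: a seeded property-based check that a
// toy last-write-wins register converges regardless of write delivery order.
public class LwwPropertyCheck {

    // One write to a single cell: a client-supplied timestamp and a value.
    record Write(long timestamp, int value) {}

    // Toy "system under test": merges one incoming write into the current state.
    // Higher timestamp wins; on a timestamp tie the larger value wins.
    // (Simplifying assumption: no tombstones, no TTLs.)
    static Write merge(Write current, Write incoming) {
        if (current == null) return incoming;
        if (incoming.timestamp() != current.timestamp()) {
            return incoming.timestamp() > current.timestamp() ? incoming : current;
        }
        return incoming.value() > current.value() ? incoming : current;
    }

    // Applies writes one at a time, in the given delivery order.
    static Write applyInOrder(List<Write> writes) {
        Write state = null;
        for (Write w : writes) state = merge(state, w);
        return state;
    }

    // Model: the expected winner, computed directly from the whole history.
    static Write model(List<Write> writes) {
        Write best = null;
        for (Write w : writes) {
            if (best == null
                    || w.timestamp() > best.timestamp()
                    || (w.timestamp() == best.timestamp() && w.value() > best.value())) {
                best = w;
            }
        }
        return best;
    }

    // Generates random histories from the seed, replays each one in several
    // shuffled orders, and returns false on the first disagreement with the model.
    static boolean check(long seed, int iterations) {
        Random rng = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            int n = 1 + rng.nextInt(20);
            List<Write> history = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                // A small timestamp range makes timestamp ties actually occur.
                history.add(new Write(rng.nextInt(5), rng.nextInt(100)));
            }
            Write expected = model(history);
            for (int k = 0; k < 5; k++) {
                List<Write> delivered = new ArrayList<>(history);
                Collections.shuffle(delivered, rng);
                if (!applyInOrder(delivered).equals(expected)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check(42L, 1000) ? "ok" : "property violated");
    }
}
```

Because generation is driven by a single seed, a failing run can be replayed exactly by rerunning with the same seed, which is part of what keeps this style of test deterministic rather than flaky.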

If you would like to learn more about Harry, you can refer to a recent blog 
post [3]. I will also be happy to answer any questions you may have about Harry, 
assist you in writing your tests, and help extend Harry in case there’s a 
feature you need to accomplish it.

Thank you,

—Alex

[1] [GitHub - apache/cassandra-harry: Apache Cassandra - Harry](https://github.com/apache/cassandra-harry)

[2] [CASSANDRA-16262 4.0 Quality: Coordination & Replication Fuzz Testing - ASF JIRA](https://issues.apache.org/jira/browse/CASSANDRA-16262)

[3] [Harry, an Open Source Fuzz Testing and Verification Tool for Apache Cassandra](https://cassandra.apache.org/_/blog/Harry-an-Open-Source-Fuzz-Testing-and-Verification-Tool-for-Apache-Cassandra.html)