On 11/12/2013 12:20 PM, Monty Taylor wrote:


On 11/12/2013 02:33 PM, David Kranz wrote:
On 11/12/2013 01:36 PM, Clint Byrum wrote:
Excerpts from Sean Dague's message of 2013-11-12 10:01:06 -0800:
During the freeze phase of Havana we got a ton of new contributors
coming on board to Tempest, which was super cool. However, it meant we
had a new influx of negative tests (i.e. tests that push invalid
parameters looking for error codes), which made us realize that human
creation and review of negative tests really doesn't scale. David Kranz
is working on a generative model for this now.

Are there some notes or other source material we can follow to understand
this line of thinking? I don't agree or disagree with it, as I don't
really understand, so it would be helpful to have the problems enumerated
and the solution hypothesis stated. Thanks!

I am working on this with Marc Koderer but we only just started and are
not quite ready. But since you asked now...

The problem is that in the current implementation of negative tests,
each "case" is represented as code in a method and targets a particular
set of api arguments and an expected result. In most (but not all) of
these tests there is boilerplate code surrounding the real content,
which is the actual arguments being passed and the value expected. That
boilerplate code has to be written correctly and reviewed. The general
form of the solution still has to be worked out, but it would basically
involve expressing these tests declaratively, perhaps in a yaml file.
In order to do this we will need some kind of json schema for each api.
The main implementation work is defining the yaml attributes that make
it easy to express the test cases, and somehow coming up with the json
schema for each api.
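
To make that a bit more concrete, here is a rough sketch of what a
declarative case file might look like, with the yaml driven by one
generic python runner. Nothing here is settled: the yaml attribute
names and the client.request() call are placeholders, not an existing
Tempest interface.

    # Hypothetical case format -- attribute names are illustrative only.
    import yaml

    CASES = yaml.safe_load("""
    api: create_flavor
    method: POST
    url: flavors
    cases:
      - description: ram below minimum
        arguments: {name: f1, ram: 0, vcpus: 1, disk: 1}
        expected_status: 400
      - description: vcpus has the wrong type
        arguments: {name: f2, ram: 512, vcpus: two, disk: 1}
        expected_status: 400
    """)

    def run_negative_cases(client, spec):
        # One generic driver replaces the per-test boilerplate; only
        # the data above has to be written and reviewed.
        for case in spec['cases']:
            status, _body = client.request(spec['method'], spec['url'],
                                           body=case['arguments'])
            assert status == case['expected_status'], (
                "%s: expected %d, got %d"
                % (case['description'], case['expected_status'], status))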

In addition, we would like to support "fuzz testing", where arguments
are, at least partially, randomly generated and the return values are
only examined for 4xx vs. something else. This would be possible if we
had json schemas. The main work is to write a generator, plus methods
for creating bad values, including boundary conditions for types with
ranges. I had thought a bit about this last year and poked around for
an existing framework. I didn't find anything that seemed to make the
job much easier, but if anyone knows of such a thing (python,
hopefully) please let me know.
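
As a strawman, a bad-value generator driven by a json-schema-style
description of a single parameter might look something like this. The
schema keywords (type, minimum, maximum, maxLength) are standard json
schema; everything else is just illustration, not an existing
framework:

    import random
    import string

    def bad_values(schema):
        # Yield invalid candidates for one parameter, including the
        # boundary values just outside any declared range.
        t = schema.get('type')
        if t == 'integer':
            if 'minimum' in schema:
                yield schema['minimum'] - 1   # just below the range
            if 'maximum' in schema:
                yield schema['maximum'] + 1   # just above the range
            yield 'not-an-int'                # wrong type
            yield None
        elif t == 'string':
            if 'maxLength' in schema:
                yield 'x' * (schema['maxLength'] + 1)
            yield random.randint(0, 10 ** 9)  # wrong type
            yield ''.join(random.choice(string.printable)
                          for _ in range(64))

    # For {'type': 'integer', 'minimum': 1, 'maximum': 65535} this
    # yields 0, 65536, 'not-an-int', None -- a request built around
    # any of them should come back as a 4xx.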

The negative tests for each api would be some combination of
declaratively specified cases and auto-generated ones.

With regard to the json schema, there have been various attempts at
this in the past, including some ideas of how wsme/pecan could help,
and it might be helpful to have more project coordination. I can see a
few options:

1. Tempest keeps its own json schema data
2. Each project keeps its own json schema in a way that supports
automated extraction
3. There are several use cases for json schema like this, so it gets
stored in some openstacky place outside of tempest

So that is the starting point. Comments and suggestions welcome! Marc
and I just started working on an etherpad
https://etherpad.openstack.org/p/bp_negative_tests but anyone is
welcome to contribute there.

We actually did this back in the good old Drizzle days, and by "we" I
mean Patrick Crews, whom I've copied here. He can speak to the research
better than I can, but AIUI, generative, schema-driven testing of
things like this is certainly the right direction. It's about 10 years
behind the actual state of the art in the research, but it's in all
ways superior to hand-crafting combinations of input parameters and
expected output behaviors.

Thanks, Monty.
As Monty has stated, similar issues have been encountered in database
testing. Databases are also complex, richly featured systems that
present interesting testing challenges. The best research I have seen
regarding stochastic / randomized / high-volume / machine-generated
test cases has come from Microsoft's SQL Server team, and it is this
research that informed the creation of the random query generator tool
for MySQL systems.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.3435&rep=rep1&type=pdf
<-- MS paper on their db testing tools

We've been doing similar work with the test suite for libra: we
organize things by api actions and define validation code (if name >
max_len, we expect return value NNN; if user=bad, we expect MMM; etc.).
We have singular test cases (create_lb, update_lb, update_lb_nodes) and
we feed in various parameters (names, number of nodes, etc.) to produce
several iterations of each test with different inputs.

This allows us to have one chunk of code that appropriately describes the api action's behavior while letting us quickly make new tests for that action by simply creating a new yaml file or adding to an existing one.
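
To illustrate the pattern (a simplified sketch, not our actual libra
test code; the function names and the max-length limit are made up):

    # One function encodes the expected behavior of the action...
    MAX_NAME_LEN = 128  # assumed limit, purely for illustration

    def expected_status_create_lb(params):
        if len(params.get('name', '')) > MAX_NAME_LEN:
            return 400   # name too long
        if not params.get('nodes'):
            return 400   # a load balancer needs at least one node
        return 202       # accepted

    # ...and every parameter set loaded from yaml runs through the
    # same check, so new tests are just new data.
    def check(client, params):
        status = client.create_lb(**params)  # hypothetical client call
        assert status == expected_status_create_lb(params)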

Some background:
Basically, testing complex systems presents two main problems: depth /
interestingness of tests, and maintenance. People are not that good at
writing and validating super complicated, insane tests by hand /
eyeball. That is, people often only go as deep as time, energy, and
brainpower permit (someone probably won't create a tempest test w/ 200
steps, and someone won't create a 20 table, 200 line SQL query for
databases).

Throwing more people at testing generally only results in a ton of
shallow test cases that you must also now maintain (keep up to date,
investigate failures on, etc.). If a company like MS found that this
approach could not feasibly scale even with its resources, that should
provide food for thought for OpenStack's testing strategy.

random query generator:
As a solution to this, one of my former colleagues created a testing
tool called the random query generator (randgen). Instead of defining
individual queries and their expected results (human validation of such
things is also a time-sink / hell-hole), we define stochastic grammars
that express what components a query *may* have, and we let the code
and the RNG do their thing to generate tests.
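
The core idea fits in a few lines of python. This toy is nothing like
the real randgen internals; it just shows the flavor of grammar-driven
generation:

    import random

    # Each rule lists the productions a symbol *may* expand to; the
    # RNG picks one each time. The grammar itself is a toy example.
    GRAMMAR = {
        'query':  [['SELECT', 'column', 'FROM', 'table', 'where']],
        'where':  [[], ['WHERE', 'column', 'op', 'value']],
        'column': [['a'], ['b'], ['c']],
        'table':  [['t1'], ['t2']],
        'op':     [['='], ['<'], ['>']],
        'value':  [['1'], ['NULL'], ["'x'"]],
    }

    def expand(symbol):
        if symbol not in GRAMMAR:
            return [symbol]                      # terminal token
        choice = random.choice(GRAMMAR[symbol])  # pick a production
        return [tok for s in choice for tok in expand(s)]

    print(' '.join(expand('query')))
    # e.g. "SELECT b FROM t1 WHERE c > 1" -- run enough of these and
    # you exercise combinations no human would sit down and write.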

I will only say that this tool helped kill MySQL 6.0 and is heavily
relied on by Percona, MariaDB, etc., once their QA guys realized it was
the only way to not be crushed under their own weight. It was like
being handed a Zippo after relying on sticks and rocks to make fire...

Hope this information is useful, and please feel free to ping me if you
have further questions / want to discuss this.

--
Thanks,
Patrick

