I've used test.generative[1], data.generators[2], and most recently,
simple-check[3] to perform specification-based testing of various libraries
for some time, and now consider such tools indispensable.

However, one thing that's never sat well with me about them is the
implicit objective of producing repeatable streams of randomized test data.
For example, the docs for the default java.util.Random instance in
data.generators[4] (which is initialized using a constant seed) read:

    "Random instance for use in generators. By consistently using this
    instance you can get a repeatable basis for tests."

There's certainly a tradition of aiming for reliably repeatable test data
by default; e.g. Haskell's QuickCheck (with which I only have
extremely shallow/recent experience) also makes it easy to use the same
seed for pseudorandom data generation.
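
As a minimal sketch of what "repeatable" means here, using only
java.util.Random and clojure.core: `sample-ints` is a hypothetical helper
for illustration, not part of data.generators or simple-check.

```clojure
;; Hypothetical helper: draws n ints in [0, 100) from a java.util.Random
;; constructed with the given seed.
(defn sample-ints [seed n]
  (let [rnd (java.util.Random. seed)]
    (vec (repeatedly n #(.nextInt rnd 100)))))

;; Constant seed: every call replays the same stream on a given JDK.
(def run-1 (sample-ints 42 5))
(def run-2 (sample-ints 42 5))

;; Fresh seed per call (here taken from the clock): the stream differs
;; from run to run.
(def fresh-run (sample-ints (System/nanoTime) 5))
```

Here `run-1` and `run-2` are equal, while `fresh-run` is expected to
change on every evaluation.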

My question is, why? Perhaps someone can help explain why this is
considered a desirable objective. A number of reasons for doing exactly
the opposite (i.e. always using a new seed for pseudorandom data
generation) occur to me:

* There are a number of circumstances where using the same seed will
  nevertheless produce a different set of data for test run 2 than was
  generated for test run 1:
 - Implementation differences / bugs between JDKs / operating systems; I've
   seen issues crop up in applications and libraries that depend upon a
   particular pattern of randomized data (cite needed, I know, can't find the
   references right now)
 - Simply adding, removing, or changing what your tests or code under test
   does will cause different random data to be drawn, in a different order
 - Changes outside of the codebase can do the same, e.g. Leiningen changing
   the order in which it tests namespaces, or testing a single specification
   alone that was previously in the middle of a full run
* If the objective of repeatable test data is to effectively provide a
  regression test (to use a unit testing term), it seems that a far more
  efficient route would be to simply add previously-failing test data to a
  retained file or other datastore, and use it as a prefix or suffix to
  broader, fully randomized testing.  This is particularly easy to do if you
  are using a shrinker (as provided in simple-check), which can help to
  significantly minimize the raw size of failing test cases, making them much
  easier to handle in general (i.e. you end up with a vector of three small
  strings that fails, rather than having to cart around the vector of 400
  massive strings that originally triggered the fault).
* Finally, while you have the option of e.g. rebinding or bashing out
  `clojure.data.generators/*rnd*` to use a `java.util.Random` with a fresh
  seed on each test run, the "spirit" of specification/generative testing
  would seem to call for casting as wide a net as possible to find failing
  cases, rather than retreading the same ground on every run.
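
The fragility described in the first bullet can be sketched as follows.
This is a toy model under stated assumptions, not how any of the libraries
above schedule tests: `run-suite` is a hypothetical function in which each
"test" simply draws three ints from one shared, fixed-seed Random.

```clojure
(defn run-suite
  "Runs the named 'tests' in order against one shared Random seeded with 42
  and returns a map of test name -> the data that test drew."
  [test-names]
  (let [rnd (java.util.Random. 42)]
    (into {}
          (for [t test-names]
            [t (vec (repeatedly 3 #(.nextInt rnd 1000)))]))))

;; Same seed, same order: identical data for every test.
(def full-run   (run-suite [:a :b]))
(def full-rerun (run-suite [:a :b]))

;; Same seed, but :b is now run alone: it consumes the draws that :a
;; consumed in the full run, not the ones it saw before.
(def solo-run (run-suite [:b]))
```

`(:b solo-run)` equals `(:a full-run)`, so unless the stream happens to
repeat itself, `:b` is no longer tested against the data it saw in the
full run even though the seed never changed.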

(tl;dr: repeatable randomized streams of data are fragile side effects of your
codebase and tools, manually-curated sets of regression test data may serve
the same purposes more efficiently, and, "Why not test more different datasets
rather than the same ones over and over?")
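
A minimal sketch of the "curated regression data plus fresh random data"
approach, using only clojure.core and java.util.Random. Everything named
here (`regression-cases`, `random-case`, `test-cases`, `prop-holds?`) is
hypothetical, the corpus is held in memory where a real one would likely
live in a file, and returning the seed is an extra convenience of this
sketch so that a failing run can still be replayed.

```clojure
(def regression-cases
  "Hypothetical hand-curated inputs that failed in the past, ideally
  already shrunk. In practice these might be read from a retained file."
  [[] [""] ["a" "" "b"]])

(defn random-case
  "Generates a vector of 0-5 lowercase strings, each 0-3 chars long."
  [^java.util.Random rnd]
  (vec (repeatedly (.nextInt rnd 6)
                   (fn []
                     (apply str (repeatedly (.nextInt rnd 4)
                                            #(char (+ 97 (.nextInt rnd 26)))))))))

(defn test-cases
  "Returns the regression cases followed by n random cases, along with the
  seed used. With one argument, a fresh seed is taken from the clock."
  ([n] (test-cases n (System/nanoTime)))
  ([n seed]
   (let [rnd (java.util.Random. seed)]
     {:seed  seed
      :cases (into regression-cases (repeatedly n #(random-case rnd)))})))

(defn prop-holds?
  "Toy specification: concatenating strings preserves total length."
  [strs]
  (= (count (apply str strs)) (reduce + 0 (map count strs))))

;; Regression prefix first, then 100 cases from a fresh seed.
(let [{:keys [seed cases]} (test-cases 100)]
  (doseq [c cases]
    (assert (prop-holds? c)
            (str "failed for " (pr-str c) " with seed " seed))))
```

Passing an explicit seed, e.g. `(test-cases 100 some-seed)`, replays a
particular run on the same JDK, while the default arity explores new data
every time.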

Perhaps this is better off as a blog post, but I'd love to hear some
perspectives on this here.

Thoughts?

Cheers,

- Chas

[1] https://github.com/clojure/test.generative
[2] https://github.com/clojure/data.generators
[3] https://github.com/reiddraper/simple-check
[4] https://github.com/clojure/data.generators/blob/master/src/main/clojure/clojure/data/generators.clj#L18

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en