I've used test.generative[1], data.generators[2], and most recently, simple-check[3] to perform specification-based testing of various libraries for some time, and now consider such tools indispensable.

However, one thing that's never sat well with me about them is the implicit objective of producing repeatable streams of randomized test data. For example, the docs for the default java.util.Random instance in data.generators[4] (which is initialized using a constant seed) read:

  "Random instance for use in generators. By consistently using this instance you can get a repeatable basis for tests."

There's certainly a tradition of defaulting to trying to obtain reliably repeatable test data; e.g. Haskell's QuickCheck (with which I only have extremely shallow/recent experience) also makes it easy to use the same seed for pseudorandom data generation.

My question is, why? Perhaps someone can help explain why this is considered a quality objective.

A number of reasons for doing exactly the opposite (i.e. always using a new seed for pseudorandom data generation) occur to me:

* There are a number of circumstances where using the same seed will nevertheless produce a different set of data for test run 2 than was generated for test run 1:
  - Implementation differences / bugs between JDKs / operating systems; I've seen issues crop up in applications and libraries that depend upon a particular pattern of randomized data (cite needed, I know, can't find the references right now)
  - Simply adding, removing, or changing what your tests or code under test does will end up obtaining different random data in a different order
  - Changes outside of the codebase can do the same, e.g. Leiningen changing the order in which it tests namespaces, or testing a single specification alone that was previously in the middle of a full run

* If the objective of repeatable test data is to effectively provide a regression test (to use a unit testing term), it seems that a far more efficient route would be to simply add previously-failing test data to a retained file or other datastore, and use it as a prefix or suffix to broader, fully randomized testing.
This is particularly easy to do if you are using a shrinker (as provided in simple-check), which can help to significantly minimize the raw size of failing test cases, making them much easier to handle in general (i.e. you find a vector of three small strings that fails, rather than having to cart around the vector of 400 massive strings that originally caused the fault).

* Finally, while you have the option of e.g. rebinding or bashing out `clojure.data.generators/*rnd*` to use a `java.util.Random` with a fresh seed on each test run, the "spirit" of specification/generative testing would seem to call for casting as wide a net as possible to find failing cases, rather than retreading the same ground over and over again.

(tl;dr: repeatable randomized streams of data are fragile side effects of your codebase and tools, manually-curated sets of regression test data may serve the same purposes more efficiently, and, "Why not test more different datasets rather than the same ones over and over?")

Perhaps this is better off as a blog post, but I'd love to hear some perspectives on this here. Thoughts?

Cheers,

- Chas

[1] https://github.com/clojure/test.generative
[2] https://github.com/clojure/data.generators
[3] https://github.com/reiddraper/simple-check
[4] https://github.com/clojure/data.generators/blob/master/src/main/clojure/clojure/data/generators.clj#L18

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en