I ended up writing a new metric-test function for this problem, which I will likely release as a library soon. For now, the function is just embedded in the project where I first needed it.
Usage looks like the following. Note that the :baseline argument is optional: if it is omitted, the test fails and recommends a starting value for the baseline.

(ns figurer.car-example-test
  (:require [clojure.test :refer :all]
            [figurer.car-example :refer :all]
            [figurer.core :as figurer]
            [metric-test.core :refer [metric-test]]))

(def metrics
  {:expected-value figurer/expected-value
   :plan-value (fn [solution]
                 (let [plan (figurer/sample-plan solution)
                       plan-value (/ (apply + (map (:value solution)
                                                   (:states plan)))
                                     (count (:states plan)))]
                   plan-value))})

(deftest gentle-turn-metric-test
  (metric-test "gentle turn metric 0.1"
               #(figurer/figure gentle-turn-problem {:max-seconds 0.1})
               :metrics metrics
               :baseline {:expected-value {:mean 62.508, :stdev 0.542}
                          :plan-value {:mean 70.569, :stdev 1.46}}))

And output for a failing test looks like this:

FAIL in (gentle-turn-metric-test) (core.clj:112)
gentle turn metric 0.1
Some metrics changed significantly compared to the baseline.

| Metric          | Old            | New            | Change                | Unusual |
|-----------------+----------------+----------------+-----------------------+---------|
| :expected-value | 62.508 ± 0.542 | 72.566 ± 0.499 | 10.058 (18.558 stdev) | *       |
| :plan-value     | 70.569 ± 1.46  | 70.541 ± 1.378 | -0.028 (-0.019 stdev) |         |

New baseline if these changes are accepted:
{:expected-value {:mean 72.566, :stdev 0.499},
 :plan-value {:mean 70.541, :stdev 1.378}}

expected: false
  actual: false

Here is the commit in which I created the metric-test function and used it for just one of my tests:

https://github.com/ericlavigne/figurer/commit/1153b5d4db898d042de6e3aa0ab9d77e65c6e3cc
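The "Unusual" column is driven by how far the new mean drifts from the baseline mean, measured in baseline standard deviations. Below is a minimal sketch of that check, not the exact code from the commit above; the function name and the cutoff of 3 standard deviations are illustrative.

;; Flag a metric when its new mean is more than `threshold` baseline
;; standard deviations away from the baseline mean.
(defn unusual-change?
  [{:keys [mean stdev]} new-mean threshold]
  (> (Math/abs (/ (- new-mean mean) stdev))
     threshold))

(unusual-change? {:mean 62.508 :stdev 0.542} 72.566 3.0) ;=> true  (18.558 stdev)
(unusual-change? {:mean 70.569 :stdev 1.46} 70.541 3.0)  ;=> false (-0.019 stdev)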
On Saturday, October 6, 2018 at 5:41:27 PM UTC-4, Eric Lavigne wrote:

> *Summary*
>
> I am writing tests involving multiple metrics with tradeoffs. When I make a software change, the tests should check for changes across any of these metrics and show me that I improved along one metric, but at the expense of another metric. If I decide that these changes are overall acceptable, I should be able to quickly modify the test based on this new baseline. So far I am following this strategy using just clojure.test and a bunch of custom code.
>
> Is there a testing library that would help with this? If not, does anyone else have use for such a tool?
>
> *Current code*
>
> https://github.com/ericlavigne/figurer/blob/master/src/figurer/test_util.clj
>
> *Details of my problem*
>
> I am writing performance tests for a Monte Carlo tree search library called figurer. In the beginning, the tests were focused on avoiding regression. I recorded that within 0.1 seconds figurer could find a solution with a value between 71.7 and 73.6 (based on trying this 10 times) and wrote a test that would fail if the value was higher or lower than this range. This was helpful for determining whether the algorithm was getting better or worse, but did not help explain why.
>
> I made a change to the library to focus more on refinement of paths that showed early promise, rather than spreading attention evenly across all candidates. I expected that this change would substantially improve the value found, but instead it slightly reduced that value. There were many possible explanations. Maybe the more sophisticated algorithm did a better job of choosing new paths to try, but at too much cost in time spent per path evaluation. Maybe the new algorithm focused too much on refining a path that showed early promise, ignoring a better path that got an unlucky roll of the dice early on. I needed to compare a variety of performance metrics between the old and new versions of the code.
>
> Commit for the unexpected results described above:
>
> Switch from random to UCT-based exploration (worse performance)
> https://github.com/ericlavigne/figurer/commit/97c76b88ac3de0874444b0cfa55005ab909aba21
>
> I would like to track all of the following metrics to help me understand the effect of each code change.
>
> 1) Estimated value of the chosen plan
> 2) Closeness of the plan's first move to the best plan
> 3) Number of plans that were considered (raw speed)
> 4) Closeness of the closest candidate's first move to the best plan
> 5) Number of first moves that were considered
> 6) Evaluation depth of the chosen plan
> 7) Maximum evaluation depth across all considered plans
>
> For each metric, I need to record a baseline distribution by running the code multiple times. The test will need to check whether new measurements are consistent with that recorded distribution. If any metric is measured outside the expected range, then a report should show me all metrics and how they changed. The same report should also include a new baseline data structure that I can copy back into my tests if I decide to accept this result as my new baseline.
>
> The closest I've found so far is clojure-expectations, which has support for comparing multiple values (via a map) as well as ranges (via approximately). I would likely build on top of those capabilities and add support for the baselining process.
>
> https://clojure-expectations.github.io/
>
> *Is there another library that better matches this need? Anyone have a better approach for the problem?*
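To make the baselining step described in the quoted message concrete, the idea is to run the system several times and summarize each metric as a mean and standard deviation. This is a minimal sketch rather than the exact code in test_util.clj; the function names are illustrative.

(defn mean [xs]
  (/ (reduce + xs) (count xs)))

(defn stdev [xs]
  ;; Sample standard deviation.
  (let [m (mean xs)]
    (Math/sqrt (/ (reduce + (map #(Math/pow (- % m) 2) xs))
                  (dec (count xs))))))

;; Run `create-solution` `n` times and summarize each metric in `metrics`
;; as {:mean ... :stdev ...}, matching the shape of the :baseline argument.
(defn summarize-metrics
  [create-solution metrics n]
  (let [solutions (doall (repeatedly n create-solution))]
    (into {}
          (for [[metric-name metric-fn] metrics]
            [metric-name
             (let [values (map metric-fn solutions)]
               {:mean (mean values) :stdev (stdev values)})]))))

For example, (summarize-metrics #(figurer/figure gentle-turn-problem {:max-seconds 0.1}) metrics 10) would produce a map with the same shape as the :baseline shown in the test above.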