metric-based testing (evaluating changes to Monte Carlo tree search library)

Eric Lavigne Sat, 06 Oct 2018 14:41:50 -0700

*Summary*

I am writing tests involving multiple metrics with tradeoffs. When I make a 
software change, the tests should check for changes across all of 
these metrics and show me, for example, that I improved one metric 
at the expense of another. If I decide that these changes are 
overall acceptable, I should be able to quickly update the test to use 
this new baseline. So far I am following this strategy using just 
clojure.test and a bunch of custom code.


Is there a testing library that would help with this? If not, does anyone 
else have use for such a tool?

*Current code*

https://github.com/ericlavigne/figurer/blob/master/src/figurer/test_util.clj

*Details of my problem*

I am writing performance tests for a Monte Carlo tree search library called 
figurer. In the beginning, the tests were focused on avoiding regression. I 
recorded that within 0.1 second figurer could find a solution with value 
between 71.7 and 73.6 (based on trying this 10 times) and wrote a test that 
would fail if the value fell outside this range. This was 
helpful for determining whether the algorithm was getting better or worse, 
but did not help explain why the algorithm was getting better or worse.
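
For readers who want something concrete, a range-based regression test of this 
kind might look roughly like the sketch below. It uses only clojure.test. The 
function find-plan-value is a hypothetical stand-in for a 0.1 second figurer 
search (the real call is not shown in this post), and within-range? is a helper 
invented for the example.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Hypothetical stand-in for a 0.1 second figurer search. It returns a random
;; value inside the recorded range so that the sketch runs on its own.
(defn find-plan-value []
  (+ 71.7 (* (rand) 1.9)))

;; Helper invented for this example: true when low <= x <= high.
(defn within-range? [low high x]
  (<= low x high))

;; Fails if the value found falls outside the range recorded from 10 runs.
(deftest plan-value-regression
  (let [value (find-plan-value)]
    (is (within-range? 71.7 73.6 value)
        (str "plan value " value " is outside the baseline range [71.7, 73.6]"))))
```

A test like this reports only pass or fail for a single number, which is why it 
says nothing about the cause of a change.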

I made a change to the library to focus more on refinement of paths that 
showed early promise, rather than spreading attention evenly across all 
candidates. I expected that this change would substantially improve the 
value found, but instead it slightly reduced that value. There were many 
possible explanations. Maybe the more sophisticated algorithm did a better 
job of choosing new paths to try, but at too much cost in time spent per 
path evaluation. Maybe the new algorithm focused too much on refining a 
path that showed early promise, ignoring a better path that got an 
unlucky roll of the dice early on. I needed to compare a variety of 
performance metrics between the old and new versions of the code.

Commit for the unexpected results described above:

    Switch from random to UCT-based exploration (worse performance)
    
https://github.com/ericlavigne/figurer/commit/97c76b88ac3de0874444b0cfa55005ab909aba21

I would like to track all of the following metrics to help me understand 
the effect of each code change.

1) Estimated value of the chosen plan
2) Closeness of the plan's first move to the best plan
3) Number of plans that were considered (raw speed)
4) Closeness of closest candidate first move to the best plan
5) Number of first moves that were considered
6) Evaluation depth of the chosen plan
7) Maximum evaluation depth across all considered plans

For each metric, I need to record a baseline distribution by running the 
code multiple times. The test will need to check whether new measurements 
are consistent with that recorded distribution. If any metric is measured 
outside the expected range, then a report should show me all metrics and 
how they changed. The same report should also include a new baseline data 
structure that I can copy back into my tests if I decide to accept this 
result as my new baseline.
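
The workflow described above could be sketched roughly as follows. This is 
only an illustration under stated assumptions, not the code in test_util.clj: 
the metric keys, the baseline numbers other than the 71.7 to 73.6 range, and 
the functions summarize, compare-to-baseline and report are all hypothetical. 
It also assumes that "consistent with the baseline" simply means the new 
minimum and maximum stay inside the recorded range, which is a crude check 
rather than a statistical test.

```clojure
(require '[clojure.pprint :refer [pprint]])

;; Hypothetical baseline: metric -> [low high] recorded from earlier runs.
;; Only the :plan-value range comes from the post; the rest is made up.
(def baseline
  {:plan-value       [71.7 73.6]
   :plans-considered [900 1100]
   :max-depth        [8 12]})

(defn summarize
  "Turns repeated measurements (a seq of metric maps) into metric -> [min max]."
  [runs]
  (into {}
        (for [k (keys (first runs))]
          (let [vs (map k runs)]
            [k [(apply min vs) (apply max vs)]]))))

(defn compare-to-baseline
  "Returns metric -> {:baseline :measured :status} for every baseline metric.
  Assumes measured contains every metric that appears in baseline."
  [baseline measured]
  (into {}
        (for [[k [lo hi]] baseline]
          (let [[mlo mhi] (get measured k)]
            [k {:baseline [lo hi]
                :measured [mlo mhi]
                :status   (cond
                            (and (< mlo lo) (> mhi hi)) :wider
                            (< mlo lo)                  :lower
                            (> mhi hi)                  :higher
                            :else                       :within)}]))))

(defn report
  "Returns true when every metric is within its baseline range. Otherwise
  prints all metrics, plus a new baseline that can be pasted into the test."
  [baseline runs]
  (let [measured (summarize runs)
        result   (compare-to-baseline baseline measured)
        ok?      (every? #(= :within (:status %)) (vals result))]
    (when-not ok?
      (println "Metrics changed:")
      (pprint result)
      (println "New baseline (copy into the test to accept):")
      (pprint measured))
    ok?))

;; Made-up measurements in which only :plans-considered moved out of range.
(def runs
  [{:plan-value 72.0 :plans-considered 1250 :max-depth 9}
   {:plan-value 73.1 :plans-considered 1300 :max-depth 10}])

(report baseline runs)
```

Inside a clojure.test test, the result of report could be wrapped in is, so 
that the full comparison is printed only when some metric moves.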

The closest I've found so far is clojure-expectations, which has support 
for comparing multiple values (via a map) as well as ranges (via 
approximately). I would likely build on top of those capabilities and add 
support for the baselining process.

https://clojure-expectations.github.io/

*Is there another library that better matches this need? Anyone have a 
better approach for the problem?*
