On 10/25/10 07:06 PM, Robert Bradshaw wrote:
On Mon, Oct 25, 2010 at 8:19 AM, David Kirkby<david.kir...@onetel.net> wrote:
On 21 October 2010 01:33, David Roe<r...@math.harvard.edu> wrote:
There are a number of tickets in trac about performance regressions in
Sage. I'm sure there are far more performance regressions which we don't
know about because nobody noticed.
I agree, and I've seen some comments from William that writing code
one way or another can change things by a factor of 100.
As someone writing library code, it's generally not obvious that one is
about to introduce a performance regression (otherwise you'd probably not do
it).
Agreed.
Consequently, I've been thinking recently about how to detect performance
regressions automatically. There are really two parts to the problem:
gathering timing data on the Sage library, and analyzing that data to
determine whether regressions have occurred (and how serious they are).
Data gathering:
One could modify local/bin/sage-doctest to allow the option of changing each
doctest by wrapping it in a "timeit()" call. This would then generate a
timing datum for each doctest line.
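For concreteness, here is a minimal sketch of that wrapping step, assuming a helper of my own naming (wrap_in_timeit) rather than anything in the current sage-doctest; continuation lines and expected-output handling are glossed over:

import re

def wrap_in_timeit(doctest_source):
    """Rewrite each 'sage:' input line so the statement is handed to timeit().

    Illustrative only: a real patch to sage-doctest would also have to
    suppress comparison against the expected output, since timeit()
    prints a timing rather than the statement's original result.
    """
    wrapped = []
    for line in doctest_source.splitlines():
        m = re.match(r'^(\s*)sage: (.*)$', line)
        if m:
            indent, statement = m.groups()
            # Sage's timeit() takes the statement as a string.
            wrapped.append('%ssage: timeit(%r)' % (indent, statement))
        else:
            wrapped.append(line)
    return '\n'.join(wrapped)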
* these timings vary from run to run (presumably due to differing levels of
load on the machine). I don't know how to account for this, but usually
it's a fairly small effect (on the order of 10% error).
They would differ by a lot more than 10%. One of my machines is a Sage
buildbot client. If that is building Sage, and I'm not, Sage will take
about an hour to build and test. If I'm building Sage at the same or a
similar time, it will increase that by a factor of at least two.
What is needed is to measure CPU time used. That should be relatively
stable and not depend too much on system load, though even there I
would not be surprised by changes of +/- 10%.
Yes, for sure, though it's probably worth having both. Of course when
we move from a pexpect interface to doing something natively, that
would make the CPU time go up, because the real work is no longer
hidden in a separate process.
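To make that distinction concrete, here is a minimal sketch of recording both numbers, assuming a POSIX system and a recent Python (the function name is mine); resource.getrusage(RUSAGE_CHILDREN) is what would pick up work done in a child process, which a plain per-process CPU clock misses:

import resource
import time

def time_both(stmt, globs):
    """Run `stmt` once; return (wall-clock seconds, CPU seconds incl. children).

    Caveat: RUSAGE_CHILDREN only counts child processes that have already
    terminated and been waited for, so a long-lived pexpect interface
    still would not show up here until it exits.
    """
    wall0 = time.perf_counter()
    self0 = resource.getrusage(resource.RUSAGE_SELF)
    kids0 = resource.getrusage(resource.RUSAGE_CHILDREN)
    exec(stmt, globs)
    self1 = resource.getrusage(resource.RUSAGE_SELF)
    kids1 = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = ((self1.ru_utime + self1.ru_stime + kids1.ru_utime + kids1.ru_stime)
           - (self0.ru_utime + self0.ru_stime + kids0.ru_utime + kids0.ru_stime))
    return time.perf_counter() - wall0, cpu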
* if you're testing against a previous version of Sage, the doctest
structure will have changed because people wrote more doctests. And doctest
lines depend on each other: you define variables that are used in later
lines. So inserting a line could make timings of later lines incomparable
to the exact same line without the inserted line. We might be able to parse
the lines and check that various objects are actually the same (across
different versions of Sage, so this would require either a version-invariant
hash, or saving in one version, loading in the other, and comparing. And you
would have to do that for each variable that occurs in the line), but that
seems to be getting too complicated...
Getting a checksum of each doctest would be easy. I suggest we use:
$ cksum sometest.py | awk '{print $1}'
because that will be totally portable across all platforms. 'cksum' is a
32-bit checksum that's part of the POSIX standard, and its algorithm is
defined there. So there's no worry about whether one has an md5 program, and,
if so, what it's called.
To be very useful, I think we need to be more granular than having
per-file tests. Just think about the number of files that get touched,
even a little bit, each release...
I was not aware of that.
Full doctest blocks should be
independent (though of course, when looking at a doctest, a line-by-line
time breakdown could be helpful). It shouldn't be too hard to add
hooks into the unit test framework itself. With 1.5K test files and
several dozen doctests per file, changing from version to version, I
could easily see the birthday paradox being a problem with cksum (even
if it weren't weak), but from Python we have md5 and sha1.
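To sketch what per-block keying might look like, assuming the block's source text is already in hand (the helper name is mine, and this is not Sage's actual doctest parser):

import hashlib

def block_key(block_source):
    """Content-based key for a doctest block: SHA-1 of its input lines only.

    Hashing just the 'sage:' / '....:' input lines means that edited
    expected output, or new blocks added elsewhere in the file, do not
    disturb the key.
    """
    inputs = [line.strip() for line in block_source.splitlines()
              if line.lstrip().startswith(('sage:', '....:'))]
    return hashlib.sha1('\n'.join(inputs).encode('utf-8')).hexdigest()

Timings from two Sage versions could then be matched on these keys, so a block that merely moved within a file still compares against itself.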
'cksum' is a 32-bit checksum. Actually, if one used all three fields of the output:
1) Checksum
2) Length
3) Filename
I feel that should be sufficiently reliable. The probability of a test having
the exact same path, checksum and length, while having changed, would be
exceedingly close to zero.
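If cksum were the tool, all three of those fields are easy to capture from Python in one call (only the standard checksum/length/filename output of POSIX cksum is assumed):

import subprocess

def cksum_key(path):
    """Return (checksum, length, filename) exactly as POSIX cksum prints them."""
    out = subprocess.check_output(['cksum', path]).decode().split()
    return int(out[0]), int(out[1]), ' '.join(out[2:])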
Also, I was talking to Craig Citro about this and he had the
interesting idea of creating some kind of a "test object" which would
be saved and then could be loaded into future versions of Sage and re-run
there. The idea of saving the tests that are run, and then running the
exact same tests (rather than worrying about correlating files and
tests), would make catching regressions much easier.
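Purely as a sketch of the shape such a "test object" might take (all names here are mine, not Craig's or Robert's design): a record of the exact input statements plus the timing observed, serialized so a later Sage can replay precisely the same inputs.

import json
import time

def save_test_object(name, statements, globs, outfile, version='unknown'):
    """Run `statements`, time them, and save everything needed to replay them."""
    start = time.process_time()
    for stmt in statements:
        exec(stmt, globs)
    record = {'name': name,
              'version': version,
              'statements': statements,          # the exact inputs
              'cpu_seconds': time.process_time() - start}
    with open(outfile, 'w') as f:
        json.dump(record, f)

def rerun_test_object(infile, globs):
    """Replay a saved test object; return (old CPU seconds, new CPU seconds)."""
    with open(infile) as f:
        record = json.load(f)
    start = time.process_time()
    for stmt in record['statements']:
        exec(stmt, globs)
    return record['cpu_seconds'], time.process_time() - start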
- Robert
Yes, that makes a lot of sense, though unless there are a lot of tests, it would be
easy to miss problems.
dave