On Tue, Oct 26, 2010 at 12:57 AM, Dr. David Kirkby <david.kir...@onetel.net> wrote:
> On 10/25/10 07:06 PM, Robert Bradshaw wrote:
>>
>> On Mon, Oct 25, 2010 at 8:19 AM, David Kirkby <david.kir...@onetel.net> wrote:
>>>
>>> On 21 October 2010 01:33, David Roe <r...@math.harvard.edu> wrote:
>>>>
>>>> There are a number of tickets in trac about performance regressions in
>>>> Sage. I'm sure there are far more performance regressions which we
>>>> don't know about because nobody noticed.
>>>
>>> I agree, and I've seen some comments from William that writing code
>>> one way or another can change things by a factor of 100.
>>>
>>>> As someone writing library code, it's generally not obvious that one is
>>>> about to introduce a performance regression (otherwise you'd probably
>>>> not do it).
>>>
>>> Agreed.
>>>
>>>> Consequently, I've been thinking recently about how to detect
>>>> performance regressions automatically. There are really two parts to
>>>> the problem: gathering timing data on the Sage library, and analyzing
>>>> that data to determine if regressions have occurred (and how serious
>>>> they are).
>>>>
>>>> Data gathering:
>>>>
>>>> One could modify local/bin/sage-doctest to allow the option of changing
>>>> each doctest by wrapping it in a "timeit()" call. This would then
>>>> generate a timing datum for each doctest line.
>>>> * these timings vary from run to run (presumably due to differing
>>>> levels of load on the machine). I don't know how to account for this,
>>>> but usually it's a fairly small effect (on the order of 10% error).
>>>
>>> They would differ by a lot more than 10%. One of my machines is a Sage
>>> buildbot client. If that is building Sage and I'm not, Sage will take
>>> about an hour to build and test. If I'm building Sage at the same or a
>>> similar time, that will increase by a factor of at least two.
>>>
>>> What is needed is to measure CPU time used. That should be relatively
>>> stable and not depend too much on system load, though even there I
>>> would not be surprised by changes of +/- 10%.
>>
>> Yes, for sure, though it's probably worth having both. Of course, when
>> we move from a pexpect interface to doing something natively, that
>> would make the CPU time go up because the real work is no longer
>> hidden in a parallel process.
>>
>>>> * if you're testing against a previous version of Sage, the doctest
>>>> structure will have changed because people wrote more doctests. And
>>>> doctest lines depend on each other: you define variables that are used
>>>> in later lines. So inserting a line could make timings of later lines
>>>> incomparable to the exact same line without the inserted line. We
>>>> might be able to parse the lines and check that various objects are
>>>> actually the same (across different versions of Sage, so this would
>>>> require either a version-invariant hash or saving in one version,
>>>> loading in the other and comparing. And you would have to do that for
>>>> each variable that occurs in the line), but that seems to be getting
>>>> too complicated...
>>>
>>> Getting a checksum of each doctest would be easy. I suggest we use:
>>>
>>> $ cksum sometest.py | awk '{print $1}'
>>>
>>> because that will be totally portable across all platforms. 'cksum' is
>>> a 32-bit checksum that's part of the POSIX standard, and the algorithm
>>> is defined. So there's no worry about whether one has an md5 program,
>>> and if so, what it's called.
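[To make the wall-clock versus CPU-time distinction above concrete, here is a
minimal sketch in plain Python. It is not the actual sage-doctest code, and it
is written against today's timeit/time modules rather than the Python 2 Sage
shipped at the time; the statement and repeat counts are arbitrary stand-ins
for a single doctest line.]

    import time
    import timeit

    stmt = "sum(i * i for i in range(10**4))"   # stands in for one doctest line
    number = 100                                # executions per sample, arbitrary

    # Wall-clock time: sensitive to whatever else the machine is doing.
    wall = min(timeit.repeat(stmt, repeat=5, number=number,
                             timer=time.perf_counter)) / number

    # CPU time of this process: far less sensitive to system load.
    cpu = min(timeit.repeat(stmt, repeat=5, number=number,
                            timer=time.process_time)) / number

    print("wall-clock: %.3e s/loop   cpu: %.3e s/loop" % (wall, cpu))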
>>
>> To be very useful, I think we need to be more granular than having
>> per-file tests. Just think about the number of files that get touched,
>> even a little bit, each release...
>
> I was not aware of that.
>
>> Full doctest blocks should be independent (though of course when looking
>> at a doctest, a line-by-line time breakdown could be helpful). It
>> shouldn't be too hard to add hooks into the unit test framework itself.
>> With 1.5K test files and several dozen doctests per file, changing from
>> version to version, I could easily see the birthday paradox being a
>> problem with cksum (even if it weren't weak), but from Python we have
>> md5 and sha1.
>
> 'cksum' is a 32-bit checksum. Actually, if we used all three sections of
> the output:
>
> 1) Checksum
> 2) Length
> 3) Filename
>
> I feel that should be sufficiently reliable. The probability of a test
> having the exact same path, checksum and length, while having changed,
> would be exceedingly close to zero.
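[A rough sketch of what per-doctest-block hashing from inside Python, rather
than cksum over whole files, might look like. This is not an existing Sage
tool; the _demo function is just a stand-in for a real Sage library function.]

    import doctest
    import hashlib

    def _demo(n):
        """Square ``n``.

        EXAMPLES::

            >>> _demo(4)
            16
        """
        return n * n

    def doctest_block_hashes(obj):
        """Return {test_name: sha1 hex digest} for every docstring in ``obj``
        (a module, class or function) that contains doctest examples."""
        hashes = {}
        for test in doctest.DocTestFinder().find(obj):
            if not test.examples:
                continue
            # Hash the exact source of every example in the block, so any
            # edit to the block gives it a new identity across versions.
            blob = "".join(example.source for example in test.examples)
            hashes[test.name] = hashlib.sha1(blob.encode("utf-8")).hexdigest()
        return hashes

    if __name__ == "__main__":
        for name, digest in sorted(doctest_block_hashes(_demo).items()):
            print(digest, name)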
True, though I bet I could (maliciously) come up with two docstrings that
have the same checksum, given how weak the algorithm is. I don't see any
advantage to using a weak checksum given that we have Python and aren't
doing this on a per-file basis (or, I hope, extracting doctests with an
external shell script).

>> Also, I was talking to Craig Citro about this, and he had the
>> interesting idea of creating some kind of a "test object" which would
>> be saved and then could be loaded into future versions of Sage and
>> re-run. The idea of saving the tests that are run, and then running the
>> exact same tests (rather than worrying about correlation of files and
>> tests), will make catching regressions much easier.
>>
>> - Robert
>
> Yes, that makes a lot of sense, though unless there are a lot of tests,
> it would be easy to miss problems.

Fortunately, there are a lot of tests :).

- Robert
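[A very rough illustration of the "test object" idea as described above, not
Craig's actual design: record the exact statements that were timed, save the
object, and re-run the identical statements later, possibly under a newer
version of the library. The class name TimedTest, the pickle file name, and
the toy statements are all made up for this sketch.]

    import pickle
    import time

    class TimedTest(object):
        """One saved test: the exact source lines to run, plus the CPU time
        measured when the object was first created."""

        def __init__(self, name, statements):
            self.name = name
            self.statements = list(statements)
            self.baseline_cpu = self.run()

        def run(self):
            """Re-execute the saved statements in a fresh shared namespace
            (so earlier lines still define names used by later ones) and
            return the total CPU time, via today's stdlib CPU clock."""
            env = {}
            start = time.process_time()
            for stmt in self.statements:
                exec(stmt, env)
            return time.process_time() - start

    if __name__ == "__main__":
        test = TimedTest("demo", ["R = range(10**5)", "s = sum(R)"])
        with open("demo_test.pickle", "wb") as f:
            pickle.dump(test, f)

        # ... later, possibly under a newer version of the library ...
        with open("demo_test.pickle", "rb") as f:
            saved = pickle.load(f)
        print("baseline %.4fs  now %.4fs" % (saved.baseline_cpu, saved.run()))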