On Tue, Oct 26, 2010 at 12:57 AM, Dr. David Kirkby <david.kir...@onetel.net> wrote:
> On 10/25/10 07:06 PM, Robert Bradshaw wrote:
>>
>> On Mon, Oct 25, 2010 at 8:19 AM, David Kirkby <david.kir...@onetel.net> wrote:
>>>
>>> On 21 October 2010 01:33, David Roe <r...@math.harvard.edu> wrote:
>>>>
>>>> There are a number of tickets in trac about performance regressions in
>>>> Sage. I'm sure there are far more performance regressions which we
>>>> don't know about because nobody noticed.
>>>
>>> I agree, and I've seen some comments from William that writing code
>>> one way or another can change things by a factor of 100.
>>>
>>>> As someone writing library code, it's generally not obvious that one is
>>>> about to introduce a performance regression (otherwise you'd probably
>>>> not do it).
>>>
>>> Agreed.
>>>
>>>> Consequently, I've been thinking recently about how to detect
>>>> performance regressions automatically. There are really two parts to
>>>> the problem: gathering timing data on the Sage library, and analyzing
>>>> that data to determine if regressions have occurred (and how serious
>>>> they are).
>>>>
>>>> Data gathering:
>>>>
>>>> One could modify local/bin/sage-doctest to allow the option of changing
>>>> each doctest by wrapping it in a "timeit()" call. This would then
>>>> generate a timing datum for each doctest line.
>>>> * these timings vary from run to run (presumably due to differing
>>>> levels of load on the machine). I don't know how to account for this,
>>>> but usually it's a fairly small effect (on the order of 10% error).
>>>
>>> They would differ by a lot more than 10%. One of my machines is a Sage
>>> buildbot client. If that is building Sage and I'm not, Sage will take
>>> about an hour to build and test. If I'm building Sage at the same or a
>>> similar time, that will increase by a factor of at least two.
>>>
>>> What is needed is to measure CPU time used. That should be relatively
>>> stable and not depend too much on system load, though even there I
>>> would not be surprised by changes of +/- 10%.
>>
>> Yes, for sure, though it's probably worth having both. Of course, when
>> we move from a pexpect interface to doing something natively, that
>> would make the CPU time go up because the real work is no longer
>> hidden in a parallel process.
>>
>>>> * if you're testing against a previous version of Sage, the doctest
>>>> structure will have changed because people wrote more doctests. And
>>>> doctest lines depend on each other: you define variables that are used
>>>> in later lines. So inserting a line could make timings of later lines
>>>> incomparable to the exact same line without the inserted line. We
>>>> might be able to parse the lines and check that various objects are
>>>> actually the same (across different versions of Sage, so this would
>>>> require either a version-invariant hash or saving in one version,
>>>> loading in the other and comparing. And you would have to do that for
>>>> each variable that occurs in the line), but that seems to be getting
>>>> too complicated...
>>>
>>> Getting a checksum of each doctest would be easy. I suggest we use:
>>>
>>> $ cksum sometest.py | awk '{print $1}'
>>>
>>> because that will be totally portable across all platforms. 'cksum' is
>>> a 32-bit checksum that's part of the POSIX standard, and the algorithm
>>> is defined. So there's no worry about whether one has an md5 program,
>>> and if so, what it's called.
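[To make the wall-clock versus CPU-time distinction above concrete, here is a
minimal sketch in plain Python. It is not the actual sage-doctest code, and it
is written against today's timeit/time modules rather than the Python 2 Sage
shipped at the time; the statement and repeat counts are arbitrary stand-ins
for a single doctest line.]

    import time
    import timeit

    stmt = "sum(i * i for i in range(10**4))"   # stands in for one doctest line
    number = 100                                # executions per sample, arbitrary

    # Wall-clock time: sensitive to whatever else the machine is doing.
    wall = min(timeit.repeat(stmt, repeat=5, number=number,
                             timer=time.perf_counter)) / number

    # CPU time of this process: far less sensitive to system load.
    cpu = min(timeit.repeat(stmt, repeat=5, number=number,
                            timer=time.process_time)) / number

    print("wall-clock: %.3e s/loop   cpu: %.3e s/loop" % (wall, cpu))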
>>
>> To be very useful, I think we need to be more granular than having
>> per-file tests. Just think about the number of files that get touched,
>> even a little bit, each release...
>
> I was not aware of that.
>
>> Full doctest blocks should be independent (though of course when looking
>> at a doctest, a line-by-line time breakdown could be helpful). It
>> shouldn't be too hard to add hooks into the unit test framework itself.
>> With 1.5K test files and several dozen doctests per file, changing from
>> version to version, I could easily see the birthday paradox being a
>> problem with cksum (even if it weren't weak), but from Python we have
>> md5 and sha1.
>
> 'cksum' is a 32-bit checksum. Actually, if we used all three sections of
> the output:
>
> 1) Checksum
> 2) Length
> 3) Filename
>
> I feel that should be sufficiently reliable. The probability of a test
> having the exact same path, checksum and length, while having changed,
> would be exceedingly close to zero.
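[A rough sketch of what per-doctest-block hashing from inside Python, rather
than cksum over whole files, might look like. This is not an existing Sage
tool; the _demo function is just a stand-in for a real Sage library function.]

    import doctest
    import hashlib

    def _demo(n):
        """Square ``n``.

        EXAMPLES::

            >>> _demo(4)
            16
        """
        return n * n

    def doctest_block_hashes(obj):
        """Return {test_name: sha1 hex digest} for every docstring in ``obj``
        (a module, class or function) that contains doctest examples."""
        hashes = {}
        for test in doctest.DocTestFinder().find(obj):
            if not test.examples:
                continue
            # Hash the exact source of every example in the block, so any
            # edit to the block gives it a new identity across versions.
            blob = "".join(example.source for example in test.examples)
            hashes[test.name] = hashlib.sha1(blob.encode("utf-8")).hexdigest()
        return hashes

    if __name__ == "__main__":
        for name, digest in sorted(doctest_block_hashes(_demo).items()):
            print(digest, name)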
True, though I bet I could (maliciously) come up with two docstrings that
have the same checksum, given how weak the algorithm is. I don't see any
advantage to using a weak checksum given that we have Python and aren't
doing this on a per-file basis (or, I hope, extracting doctests with an
external shell script).

>> Also, I was talking to Craig Citro about this, and he had the
>> interesting idea of creating some kind of a "test object" which would
>> be saved and then could be loaded into future versions of Sage and
>> re-run. The idea of saving the tests that are run, and then running the
>> exact same tests (rather than worrying about correlation of files and
>> tests), will make catching regressions much easier.
>>
>> - Robert
>
> Yes, that makes a lot of sense, though unless there are a lot of tests,
> it would be easy to miss problems.

Fortunately, there are a lot of tests :).

- Robert
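[A very rough illustration of the "test object" idea as described above, not
Craig's actual design: record the exact statements that were timed, save the
object, and re-run the identical statements later, possibly under a newer
version of the library. The class name TimedTest, the pickle file name, and
the toy statements are all made up for this sketch.]

    import pickle
    import time

    class TimedTest(object):
        """One saved test: the exact source lines to run, plus the CPU time
        measured when the object was first created."""

        def __init__(self, name, statements):
            self.name = name
            self.statements = list(statements)
            self.baseline_cpu = self.run()

        def run(self):
            """Re-execute the saved statements in a fresh shared namespace
            (so earlier lines still define names used by later ones) and
            return the total CPU time, via today's stdlib CPU clock."""
            env = {}
            start = time.process_time()
            for stmt in self.statements:
                exec(stmt, env)
            return time.process_time() - start

    if __name__ == "__main__":
        test = TimedTest("demo", ["R = range(10**5)", "s = sum(R)"])
        with open("demo_test.pickle", "wb") as f:
            pickle.dump(test, f)

        # ... later, possibly under a newer version of the library ...
        with open("demo_test.pickle", "rb") as f:
            saved = pickle.load(f)
        print("baseline %.4fs  now %.4fs" % (saved.baseline_cpu, saved.run()))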