This would be a good addition to the Sage development process indeed. Before going through all the work of implementing this, it might be wise to first look at what's already out there; it might prevent you from doing double work, or give inspiration on how to do it. There seems to be a package which already does some performance testing in the form of pyUnitPerf: http://sourceforge.net/projects/pyunitperf/ . There is also a coding recipe that tries to make timing results comparable across machines by comparing test times to a performance benchmark of the machine: http://code.activestate.com/recipes/440700-performance-testing-with-a-pystone-measurement-dec/
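To give an idea of what that recipe does (this is my own rough sketch, not the recipe's actual code): it calibrates the machine once with the pystone benchmark that ships with CPython, and then reports timings in kilopystones rather than raw seconds, so numbers from fast and slow machines are roughly comparable.

    # Rough sketch of the pystone-normalisation idea (not the recipe's code).
    # Python 2 style, matching what Sage uses at the moment.
    import time
    from test import pystone          # ships with CPython

    BENCHTIME, STONES = pystone.pystones()   # calibrate this machine once

    def kilo_pystones(func, *args, **kwargs):
        """Time func(*args, **kwargs); return (seconds, kilopystones used)."""
        start = time.clock()
        func(*args, **kwargs)
        seconds = time.clock() - start
        return seconds, (seconds * STONES) / 1000.0

The recipe itself wraps this in a decorator around unit tests, but the normalisation step is the part that matters for cross-machine comparisons.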
Since we're probably not the only ones who want this sort of testing capability, it might also be interesting to collaborate with programmers of other (open source) Python projects and make a standalone project of it. That way we wouldn't have to maintain it all on our own, and it is entirely in the open source spirit. - Maarten Derickx

On Oct 21, 6:55 am, Robert Bradshaw <rober...@math.washington.edu> wrote:
> On Wed, Oct 20, 2010 at 5:33 PM, David Roe <r...@math.harvard.edu> wrote:
> > There are a number of tickets in trac about performance regressions in Sage. I'm sure there are far more performance regressions which we don't know about because nobody noticed.
> >
> > As someone writing library code, it's generally not obvious that one is about to introduce a performance regression (otherwise you'd probably not do it). There have been instances where I've wondered whether a change I was making might have unintended consequences for performance, but testing all the ways in which this could manifest is laborious.
> >
> > Consequently, I've been thinking recently about how to detect performance regressions automatically. There are really two parts to the problem: gathering timing data on the Sage library, and analyzing that data to determine if regressions have occurred (and how serious they are).
>
> +1. This has been lacking for a long time.
>
> > Data gathering:
> > One could modify local/bin/sage-doctest to allow the option of changing each doctest by wrapping it in a "timeit()" call. This would then generate a timing datum for each doctest line.
> > * these timings vary from run to run (presumably due to differing levels of load on the machine). I don't know how to account for this, but usually it's a fairly small effect (on the order of 10% error).
> > * if you're testing against a previous version of sage, the doctest structure will have changed because people wrote more doctests. And doctest lines depend on each other: you define variables that are used in later lines. So inserting a line could make timings of later lines incomparable to the exact same line without the inserted line. We might be able to parse the lines and check that various objects are actually the same (across different versions of sage, so this would require either a version-invariant hash or saving in one version, loading in the other and comparing. And you would have to do that for each variable that occurs in the line), but that seems to be getting too complicated...
> >
> > Many users will only have one version of sage installed. And even with multiple versions, you need somewhere to PUT the data in order to compare to later.
> >
> > One way of handling these problems is to create a relational database to put timings in. This could also be interesting for benchmarketing purposes: we could have timings on various machines, and we could highlight performance improvements, in addition to watching for performance regressions.
> >
> > So, here's a first draft for a database schema to put timing data into. I've included a description of each table, along with a description of columns I thought were non-obvious. I'm definitely interested in suggestions for improving this schema.
> >
> > Table: Hosts
> > # computer information, including identifying data to determine when running on the same host
> > col: id
> > col: identifying_data # some way to uniquely identify the computer on which a test is being run. Presumably the output of some unix function, but I don't know what.
> >
> > Table: Sage_version
> > # a table giving each existing version of Sage an id
> > # the ids for official sage releases should be consistent across databases;
> > # users can also create their own temporary versions which use a different block of id numbers.
> > # when a new version is created, the Files, Doctest_blocks and Doctest_lines tables are updated
> > col: id
> > col: version_name # string defining the version
> > col: prev_version_id # versions should be totally ordered. Since we want to reserve some id space for official sage versions and other space for users, this can't be done by just the numerical id.
> >
> > Table: Files
> > # a table for all the files in sage that have doctests
> > col: id
> > col: filename # e.g. "sage/rings/integer.pyx"
> > # we might also want some other data, like when the file was added (or deleted), whether it's a python vs cython file...
> >
> > Table: Doctest_blocks
> > # a table for the triple-quote-encased strings that hold doctests
> > # theoretically, these could be smaller groupings within such a larger block. But while it's generally true that doctest blocks can be executed independently of each other, it's hard to identify subsets of the blocks that are independent of each other.
> > col: id
> > col: name # usually the name of the function, class or file to which this doctest belongs
> > col: version_id # sage version in which this block lies
> > col: file_id # sage file in which this block lies
> > col: prev_id # doctest block id for the analogous block in the previous version of sage. -1 indicates nonexistent
> >
> > Table: Doctest_lines
> > # a table for lines beginning with "sage:" that are executed by the doctest framework.
> > # lines that occur many times in the Sage library (e.g. "R.<x> = QQ[]") share a single entry.
> > col: id
> > col: code # e.g. "R.<x> = QQ[]"
> > # could theoretically support storing equivalent lines of code, such as "R.<x> = PolynomialRing(QQ)"
> >
> > Table: Block_line_incidences
> > # a table to record the many-to-many relation between doctest_blocks and doctest_lines
> > col: id
> > col: block_id
> > col: line_id
> > col: line_number # which line in the block this is, e.g. 1 (or 0?) if this is the first doctest line in the block
> >
> > Table: Regression_runs
> > # a table to record times (possibly on different machines) where data was entered into this database from a regression-test run
> > col: id
> > col: host_id # the computer on which this was run
> > col: version_id # the sage version
> > col: start_time # the time this run was started
> > col: duration # how long this run lasted (would end_time be more useful?)
> > # doesn't necessarily have to be a run through the whole library; could include only a subset of the files.
> > # could include other info, like loading, though I don't know how to measure this...
> >
> > Table: Timing_data
> > # A table to record the timing data we're actually interested in
> > col: id
> > col: bli_id # the id for a block_line_incidence
> > col: run_id # the regression run that this measurement was a part of
> > col: cpu_time # the amount of time this took to run, as measured through timeit
> > # presumably we require that the two version_ids you can get by tracing up through the bli and the run are the same version.
> >
> > There are also questions of how this would be distributed. Presumably the data part wouldn't come standard with sage. Maybe an optional spkg? Of course, you're mainly interested in comparing DIFFERENT versions of sage on the SAME computer, which doesn't really fit with how we normally distribute code and data.
> >
> > Analysis:
> > Once we have the database set up and have gathered some data, you can do all kinds of things with it. I'm most interested in how to find speed regressions, and I can certainly imagine writing code to do so. You have data from a previous version of sage (or your own, before you applied some patch(es)) and you run a regression test with "sage -rt" or something; this generates the same kind of timing data and then you look for slowdowns, either absolute or relative to the average ratio of your current run to the previous run for a given line of code. It can then print out a sorted list of lines with slowdowns, or ones above a certain threshold, or...
> >
> > I'd like to hear more ideas on how to improve all of this.
>
> timeit() might be a bit extreme--I think it performs a minimum of 5 loops. I don't look forward to 15-cpu-hour test runs; ideally we could just collect (noisier) timing information by default. Just have it spit out a sqlite database object (perhaps in a default location if none is specified).
>
> I would say you don't really need a complicated schema. Perhaps a single table would suffice, with columns
>
> OS info/id
> Sage version
> hg changeset (useful to tell if it's vanilla)
> file
> doctest identifier (fully qualified function name?)
> doctest content
> cpu time
> wall time
> run id (unique per run)
>
> There'd be a lot of redundancy in some of the columns, but that's OK--it'll just compress nicely. Perhaps it could even be a comma/tab separated text file, to make things even simpler.
>
> A separate tool could be used to compare runs, flagging any egregious (or favorable) differences. Matching could be done in several ways--on the text content, on the method name, etc.
>
> - Robert
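To make the quoted ideas a bit more concrete, here are some rough sketches of my own (not tested against the actual sage-doctest code; all helper names are made up). First, the data-gathering step: along the lines of Robert's suggestion to collect noisier timings by default instead of full timeit() runs, timing a single doctest line could be as simple as one timed exec in a shared namespace. A real version would of course have to preparse Sage syntax like "R.<x> = QQ[]" first.

    # Sketch: time one (already preparsed) doctest line in a shared namespace.
    # One repetition only -- noisier than timeit, but cheap over a whole run.
    import time

    def time_doctest_line(code, namespace):
        """Execute one doctest line in `namespace`; return (cpu, wall) seconds."""
        cpu0, wall0 = time.clock(), time.time()
        exec code in namespace        # Python 2 syntax, as Sage uses now
        return time.clock() - cpu0, time.time() - wall0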
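For storage, the single flat table Robert describes maps directly onto a sqlite database via the sqlite3 module in the standard library. Again just a sketch, with the columns taken from his list and everything else (file name, helper names) invented for illustration:

    # Sketch of the single-table layout suggested above, using sqlite3.
    import sqlite3

    conn = sqlite3.connect("doctest_timings.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS timings (
            host_id     TEXT,   -- OS info / id
            version     TEXT,   -- Sage version
            changeset   TEXT,   -- hg changeset (to tell if it's vanilla)
            file        TEXT,   -- e.g. sage/rings/integer.pyx
            identifier  TEXT,   -- fully qualified function name
            content     TEXT,   -- the doctest line itself
            cpu_time    REAL,
            wall_time   REAL,
            run_id      TEXT    -- unique per run
        )""")

    def record(row):
        """Insert one timing; `row` is a 9-tuple in the column order above."""
        conn.execute("INSERT INTO timings VALUES (?,?,?,?,?,?,?,?,?)", row)

    # at the end of a run:  conn.commit(); conn.close()

The redundancy Robert mentions is all in the TEXT columns, so dumping the same table to a tab-separated file instead would be a trivial change.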
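And the separate comparison tool could start as little more than a self-join on the doctest content and identifier, flagging lines whose cpu time grew by more than some threshold (25% below, picked arbitrarily):

    # Sketch of a comparison pass: match rows from two runs and print lines
    # that got significantly slower, worst offenders first.
    def compare_runs(conn, old_run, new_run, threshold=1.25):
        query = """
            SELECT new.file, new.identifier, new.content,
                   old.cpu_time, new.cpu_time
            FROM timings AS old JOIN timings AS new
                 ON old.content = new.content AND old.identifier = new.identifier
            WHERE old.run_id = ? AND new.run_id = ?
                  AND old.cpu_time > 0
                  AND new.cpu_time > ? * old.cpu_time
            ORDER BY new.cpu_time / old.cpu_time DESC"""
        for row in conn.execute(query, (old_run, new_run, threshold)):
            print "%s :: %s\n    %s\n    %.3fs -> %.3fs" % row

Matching on the method name instead of the line content, as Robert suggests, would just change the join condition.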