There are a number of tickets in trac about performance regressions in
Sage.  I'm sure there are far more performance regressions which we don't
know about because nobody noticed.

When you're writing library code, it's generally not obvious that you're
about to introduce a performance regression (otherwise you probably
wouldn't do it).  There have been instances where I've wondered whether a
change I was making might have unintended consequences for performance, but
testing all the ways in which this could manifest is laborious.

Consequently, I've been thinking recently about how to detect performance
regressions automatically.  There are really two parts to the problem:
gathering timing data on the Sage library, and analyzing that data to
determine whether regressions have occurred (and how serious they are).


Data gathering:

One could modify local/bin/sage-doctest to offer an option that wraps each
doctest line in a "timeit()" call.  This would then generate a timing datum
for each doctest line.
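
As a rough illustration, the rewriting step might look something like the
sketch below.  This is only a sketch, not the actual sage-doctest code: the
prompt-matching regex and the quoting are simplified, and a real version
would also have to deal with statements that have side effects (timeit runs
them repeatedly) and with the expected output below each line no longer
matching.

    import re

    SAGE_PROMPT = re.compile(r'^(\s*)sage: (.*)$')

    def wrap_lines_in_timeit(doctest_source):
        # Rewrite each "sage:" line so the statement is handed to
        # timeit() instead of being executed directly.
        out = []
        for line in doctest_source.splitlines():
            m = SAGE_PROMPT.match(line)
            if m:
                indent, code = m.groups()
                escaped = code.replace('"', r'\"')  # survive quoting
                out.append('%ssage: timeit("%s")' % (indent, escaped))
            else:
                out.append(line)
        return '\n'.join(out)

Some problems with gathering data this way: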
* these timings vary from run to run (presumably due to differing levels of
load on the machine).  I don't know how to account for this, but usually
it's a fairly small effect (on the order of 10% error).
* if you're testing against a previous version of Sage, the doctest
structure will have changed because people wrote more doctests.  And
doctest lines depend on each other: you define variables that are used in
later lines, so inserting a line could make the timings of later lines
incomparable to those of the exact same line without the insertion.  We
might be able to parse the lines and check that the various objects are
actually the same across different versions of Sage, but that would require
either a version-invariant hash, or saving an object in one version,
loading it in the other, and comparing; and you would have to do that for
each variable that occurs in the line.  That seems to be getting too
complicated...

Many users will only have one version of Sage installed.  And even with
multiple versions, you need somewhere to PUT the data in order to compare
against it later.

One way of handling these problems is to create a relational database to put
timings in.  This could also be interesting for benchmarketing purposes: we
could have timings on various machines, and highlight performance
improvements in addition to watching for performance regressions.

So, here's a first draft of a database schema to put timing data into.
I've included a description of each table, along with a description of the
columns I thought were non-obvious.  I'm definitely interested in
suggestions for improving this schema.

Table: Hosts
# computer information; including identifying data to determine when running
on same host
col: id
col: identifying_data # some way to uniquely identify the computer on which
a test is being run. Presumably the output of some unix function, but I
don't know what.
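
For what it's worth, here's one possibility (just a sketch; I haven't
thought about how stable these fields really are in practice): Python's
standard library can already produce something usable.

    import platform, uuid

    def host_identifying_data():
        # hostname + architecture + MAC address; crude, but it should
        # be reasonably stable for a given machine
        return '|'.join([platform.node(),
                         platform.machine(),
                         '%012x' % uuid.getnode()])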

Table: Sage_version
# a table giving each existing version of Sage an id
# the ids for official sage releases should be consistent across databases;
# users can also create their own temporary versions which use a different
block of id numbers.
# when a new version is created, the Files, Doctest_blocks and Doctest_lines
tables are updated
col: id
col: version_name # string defining the version
col: prev_version_id # versions should be totally ordered.  Since we want to
reserve some id space for official sage versions and other space for users,
this can't be done by just the numerical id.
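
To make the intended total order concrete, here's a sketch in pure Python,
with the table represented as a dict mapping id -> prev_version_id (None
for the very first version):

    def order_versions(prev):
        # sort version ids oldest-to-newest by their depth in the
        # prev_version_id chain
        def depth(vid):
            d = 0
            while prev[vid] is not None:
                vid = prev[vid]
                d += 1
            return d
        return sorted(prev, key=depth)

For example, order_versions({1: None, 2: 1, 1001: 2}) gives [1, 2, 1001],
where 1001 would live in the user block of ids.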

Table: Files
# a table for all the files in sage that have doctests
col: id
col: filename # e.g. "sage/rings/integer.pyx"
# we might also want some other data, like when the file was added (or
deleted), whether it's a Python or Cython file...

Table: Doctest_blocks
# a table for the triple-quote-encased strings that hold doctests
# theoretically, the entries could be smaller groupings within such a
block.  But while it's generally true that doctest blocks can be executed
independently of each other, it's hard to identify subsets within a block
that are independent of each other.
col: id
col: name # usually the name of the function, class or file to which this
doctest belongs
col: version_id # sage version in which this block lies
col: file_id # sage file in which this block lies
col: prev_id # doctest block id for analogous block in previous version of
sage.  -1 indicates nonexistent

Table: Doctest_lines
# a table for lines beginning with "sage:" that are executed by the doctest
framework.
# lines that occur many times in the Sage library (e.g. "R.<x> = QQ[]")
have only a single entry here; the Block_line_incidences table below
records where each line occurs.
col: id
col: code # e.g. "R.<x> = QQ[]"
# could theoretically support storing equivalent lines of code, such as
"R.<x> = PolynomialRing(QQ)"

Table: Block_line_incidences
# a table to record the many-to-many relation between doctest_blocks and
doctest_lines
col: id
col: block_id
col: line_id
col: line_number # which line in the block this is, e.g. 1 (or 0?) if this
is the first doctest line in the block
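
Here's a sketch of how one block's lines might be recorded so that repeated
lines share a single Doctest_lines entry.  Table and column names follow
the SQLite sketch I give after the schema; I've arbitrarily gone with
0-based line numbers.

    def record_block_lines(conn, block_id, codes):
        # codes: the "sage:" statements of one block, in order
        for line_number, code in enumerate(codes):
            row = conn.execute(
                "SELECT id FROM doctest_lines WHERE code = ?",
                (code,)).fetchone()
            if row is None:
                cur = conn.execute(
                    "INSERT INTO doctest_lines (code) VALUES (?)",
                    (code,))
                line_id = cur.lastrowid
            else:
                line_id = row[0]
            conn.execute(
                "INSERT INTO block_line_incidences"
                " (block_id, line_id, line_number) VALUES (?, ?, ?)",
                (block_id, line_id, line_number))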

Table: Regression_runs
# a table to record times (possibly on different machines) where data was
entered into this database from a regression-test-run
col: id
col: host_id # the computer on which this was run
col: version_id # the sage version
col: start_time # the time this run was started
col: duration # how long this run lasted (would end_time be more useful?)
# doesn't necessarily have to be a run through the whole library; could only
include a subset of the files.
# could include other info, like system load, though I don't know how to
measure this...

Table: Timing_data
# A table to record the timing data we're actually interested in
col: id
col: bli_id # the id for a block_line_incidence
col: run_id # the regression run that this measurement was a part of
col: cpu_time # the amount of time this took to run, as measured through
timeit
# presumably we require that the two version_ids reached by tracing up
through the bli and through the run are the same.
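
To make this concrete, here is the whole schema written out as SQLite DDL
(SQLite chosen purely for concreteness; the column types and constraints
are my guesses at reasonable defaults, not a settled design):

    import sqlite3

    SCHEMA = """
    CREATE TABLE hosts (
        id               INTEGER PRIMARY KEY,
        identifying_data TEXT UNIQUE
    );
    CREATE TABLE sage_version (
        id              INTEGER PRIMARY KEY,
        version_name    TEXT,
        prev_version_id INTEGER REFERENCES sage_version(id)
    );
    CREATE TABLE files (
        id       INTEGER PRIMARY KEY,
        filename TEXT
    );
    CREATE TABLE doctest_blocks (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        version_id INTEGER REFERENCES sage_version(id),
        file_id    INTEGER REFERENCES files(id),
        prev_id    INTEGER  -- -1 when no analogous block exists
    );
    CREATE TABLE doctest_lines (
        id   INTEGER PRIMARY KEY,
        code TEXT UNIQUE
    );
    CREATE TABLE block_line_incidences (
        id          INTEGER PRIMARY KEY,
        block_id    INTEGER REFERENCES doctest_blocks(id),
        line_id     INTEGER REFERENCES doctest_lines(id),
        line_number INTEGER
    );
    CREATE TABLE regression_runs (
        id         INTEGER PRIMARY KEY,
        host_id    INTEGER REFERENCES hosts(id),
        version_id INTEGER REFERENCES sage_version(id),
        start_time REAL,
        duration   REAL
    );
    CREATE TABLE timing_data (
        id       INTEGER PRIMARY KEY,
        bli_id   INTEGER REFERENCES block_line_incidences(id),
        run_id   INTEGER REFERENCES regression_runs(id),
        cpu_time REAL
    );
    """

    def create_database(path):
        conn = sqlite3.connect(path)
        conn.executescript(SCHEMA)
        return conn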


There are also questions of how this would be distributed.  Presumably the
data part wouldn't come standard with sage.  Maybe an optional spkg?  Of
course, you're mainly interested in comparing DIFFERENT versions of sage on
the SAME computer, which doesn't really fit with how we normally distribute
code and data.


Analysis:
Once we have the database set up and have gathered some data, we can do all
kinds of things with it.  I'm most interested in finding speed regressions,
and I can certainly imagine writing code to do so.  You have data from a
previous version of Sage (or from your own tree, before you applied some
patch(es)); you run a regression test with "sage -rt" or something, which
generates the same kind of timing data; and then you look for slowdowns,
either absolute or relative to the average ratio of your current run to the
previous run for a given line of code.  It could then print a sorted list
of lines with slowdowns, or ones above a certain threshold, or...
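
For instance, a first cut at that report might look like the sketch below.
It naively pairs every occurrence of the same code text across the two
runs, and it normalizes by the median new/old ratio so that a uniformly
slower machine doesn't flag every single line; names follow the SQLite
sketch above.

    def find_slowdowns(conn, old_run, new_run, threshold=1.10):
        rows = conn.execute("""
            SELECT l.code, t_old.cpu_time, t_new.cpu_time
            FROM timing_data t_old
            JOIN block_line_incidences b_old ON t_old.bli_id = b_old.id
            JOIN doctest_lines l             ON b_old.line_id = l.id
            JOIN block_line_incidences b_new ON b_new.line_id = l.id
            JOIN timing_data t_new           ON t_new.bli_id = b_new.id
            WHERE t_old.run_id = ? AND t_new.run_id = ?
                  AND t_old.cpu_time > 0
        """, (old_run, new_run)).fetchall()
        if not rows:
            return []
        # typical new/old ratio across all lines; judging slowdowns
        # relative to this absorbs machine-wide speed differences
        ratios = sorted(new / old for _, old, new in rows)
        baseline = ratios[len(ratios) // 2]
        slow = [(new / (old * baseline), code)
                for code, old, new in rows
                if new / (old * baseline) > threshold]
        return sorted(slow, reverse=True)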

I'd like to hear more ideas on how to improve all of this.
David
