[sage-devel] Re: Using valgrind to find segfaults

leif Tue, 02 Nov 2010 23:19:22 -0700

On 3 Nov., 06:29, Bill Hart <goodwillh...@googlemail.com> wrote:
> [...]
> Firstly, thank you to all the people who took the time to work on
> putting the new MPIR and Pari into Sage.
>
> (By the way, I don't understand why MPIR has been updated to 2.1.2 and
> not 2.1.3, which fixes a serious bug in the mpf functions. Nor do I
> understand why MPIR has been updated and the thread for this hasn't
> been closed. Also FLINT hasn't been updated, even though I explicitly
> stated it isn't safe to build the old FLINT against the new MPIR.)


Em, we haven't yet upgraded MPIR in Sage (see #8664); it's still
1.2.2.

(I recently sent you and the MPIR team an e-mail regarding the [rather
trivial] exec stack problem of MPIR 2.1.x with Fedora 14. I couldn't
find it on mpir-devel, and MPIR's trac was down.)

Hopefully we'll get the new MPIR into an early alpha of 4.6.1, but
there's still work to do to make /upgrading Sage/ work with that,
since currently not all dependent parts of the /Sage library/ are
properly rebuilt automatically. I think we made a step forward with
the 4.6 release, since now at least dependent /spkgs/ get rebuilt.

W.r.t. 2.1.3, somebody else said we're currently not using any of the
mpf functions in Sage.


> Anyhow, whilst reading the long Pari trac ticket, and associated
> tickets, a few things stood out to me (a C programmer) that just might
> not be obvious to everyone. Apologies if this is already known to
> everyone here.
>
> At some point the new Pari + new MPIR caused a segfault in one of the
> doctests. Now, segfaults are in some ways the easiest types of bugs to
> track down. Here's how:
>
> You simply compile the relevant C libraries with gcc -g (this adds
> debugging information, i.e. symbols and line numbers, to the compiled
> libraries). Next, you run the program valgrind. You don't need to do
> anything special to run this program. It just works.
>
> If you normally type "blah" at the command line to run your program,
> just type "valgrind blah" instead. It will take much longer to run
> (usually 25-100 times longer), but it will tell you precisely which
> lines of the C code caused the segfault and whether it was reading from
> or writing to an invalid memory address at the time! Its output is a
> bit like a stack trace in Python.
>
> Note you can actually do all this with a Sage doctest, because after
> all, Sage is just a program you run from the command line.
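
For anyone who hasn't done this before, here is a minimal sketch of the
workflow, assuming gcc and valgrind are installed. The function
`dup_limbs` and the file name `limbs.c` are made up for illustration
(not taken from MPIR or Pari); the point is the build flags and the
kind of bug valgrind would point at:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from MPIR or Pari): return a freshly
 * malloc'd copy of the n limbs in src, or NULL if malloc fails
 * (or if n == 0 and malloc(0) happens to return NULL).
 *
 * Build it with debug info and without optimisation, e.g.
 *     gcc -g -O0 -c limbs.c
 * link it into a small driver program, then run "valgrind ./driver".
 * If the loop bound below were written as `i <= n`, the last iteration
 * would store one element past the end of the malloc'd block; valgrind's
 * default tool, memcheck, reports that as an invalid write, with a stack
 * trace giving file names and line numbers (that is what -g buys you). */
unsigned long *dup_limbs(const unsigned long *src, size_t n)
{
    assert(src != NULL || n == 0);      /* catch bad callers early */

    unsigned long *dst = malloc(n * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)      /* correct bound: i < n */
        dst[i] = src[i];
    return dst;
}
```

Adding -O0 next to -g is just my habit, not a requirement; it keeps the
reported line numbers closer to the source you're reading.
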
>
> Once you find out which lines of C code the segfault occurs at, you
> can put a trace in to see if the data being fed to the relevant
> function is valid or not. This tells you if the library is at fault or
> your higher level Python/Cython code is somehow responsible for
> feeding invalid data (e.g. some C object wasn't initialised).
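
Re "putting a trace in": for a C library, one cheap form of this is a
sanity check on every operand right at the library boundary. A sketch
follows; `toy_int`, `toy_int_check` and `toy_divide` are hypothetical
names, not MPIR's real internals, although the "top limb must be
nonzero" rule mirrors how GMP/MPIR normalise their integers:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy multiprecision integer: `size` limbs, least
 * significant first.  As in GMP/MPIR, a nonzero value is "normalised"
 * when its most significant limb is nonzero; size 0 means zero. */
typedef struct {
    size_t size;            /* limbs in use */
    size_t alloc;           /* limbs allocated */
    unsigned long *limbs;
} toy_int;

/* Trace check: return 1 if x looks sane, otherwise print one line to
 * stderr saying where and what is wrong, and return 0. */
static int toy_int_check(const toy_int *x, const char *where)
{
    const char *problem = NULL;

    if (x == NULL)
        problem = "NULL operand";
    else if (x->size > 0 && x->limbs == NULL)
        problem = "no limbs allocated (never initialised?)";
    else if (x->size > x->alloc)
        problem = "size exceeds alloc";
    else if (x->size > 0 && x->limbs[x->size - 1] == 0)
        problem = "not normalised (top limb is zero)";

    if (problem != NULL) {
        fprintf(stderr, "TRACE %s: %s\n", where, problem);
        return 0;
    }
    return 1;
}

/* Hypothetical library entry point.  The checks run before any real
 * work, so bad input from the caller is reported at the library
 * boundary instead of surfacing later as a segfault. */
int toy_divide(toy_int *q, const toy_int *a, const toy_int *d)
{
    if (!toy_int_check(a, "toy_divide: a") ||
        !toy_int_check(d, "toy_divide: d"))
        return -1;
    if (d->size == 0) {
        fprintf(stderr, "TRACE toy_divide: division by zero\n");
        return -1;
    }
    (void)q;                /* the actual division is omitted here */
    return 0;
}
```

If an object really was never initialised (so its fields hold garbage
rather than zeros), a check like this can't be relied on; there,
running under valgrind with --track-origins=yes helps more, since
memcheck then also reports where the uninitialised value was created.
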
>
> Once upon a time, Michael Abshoff used to valgrind the entire Sage
> test suite and fix all the myriad bugs that showed up!
>
> So valgrind is the bug hunter's friend.
>
> A second observation, made by Leif I think, is spot on. This all quite
> possibly points to a problem with insufficient doctesting in Sage.
>
> Now the MPIR test code is pretty extensive and really ought to have
> picked up this bug. We put a lot of time into the test code for that
> MPIR release, so this is unfortunate.
>
> However, the entire Pari test suite and the entire Sage test suite
> (with an older version of Pari) passed without picking up this pretty
> serious bug in the MPIR division code!
>
> I think this underscores something I have been saying for a long time.
> Sage doesn't test the C libraries it uses well enough. As a result,
> developers are spending inordinate amounts of time tracking down bugs
> turned up by Sage doctests when spkgs are updated. In some cases the
> test code in the C library itself is woefully inadequate. But even
> when this is not the case, it makes sense for Sage to do some serious
> testing before assuming the library is bug free. This is particularly
> easy to do in Python, and much harder to do at the level of the C
> library itself, by the way.
>
> I have been saying this for a very long time, to many people. *ALL*
> mathematical libraries are broken and contain bugs. If you don't test
> the code you are using, it *is* broken. The right ratio of test code
> to code is really pretty close to 50/50. And if you think I don't do
> this myself when I write code (even Sage code), well you'd be wrong.
>
> One solution would be for everyone to test more widely. If you write
> code that depends on feature Y of module X and module X doesn't
> properly test feature Y, assume it is broken and write doctests for
> that code as well as the code you are writing yourself.
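
At the C level the same idea looks roughly like this: a randomised
check of one feature we depend on against the identity it must
satisfy. I'm using ldiv() from the C standard library only because its
contract is well specified (C99 truncating division); the generator
and `test_ldiv` are my own illustration, not code from MPIR or FLINT:

```c
#include <stdint.h>
#include <stdlib.h>

/* Marsaglia's xorshift64 generator with a fixed seed, so any failure
 * can be reproduced exactly. */
static uint64_t rng_state = 88172645463325252ULL;

static uint64_t xorshift64(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Pseudo-random value in [-1000000, 1000000]; the small range keeps
 * quot * den well inside the range of long. */
static long small_random(void)
{
    return (long)(xorshift64() % 2000001u) - 1000000L;
}

/* Randomised test of ldiv(): for den != 0, C99 requires
 *   num == quot * den + rem  and  |rem| < |den|,
 * and, because division truncates toward zero, rem is either zero or
 * has the sign of num.  Returns the number of failing trials. */
int test_ldiv(int iters)
{
    int failures = 0;
    for (int i = 0; i < iters; i++) {
        long num = small_random();
        long den = small_random();
        if (den == 0)
            continue;                   /* division by zero is undefined */
        ldiv_t r = ldiv(num, den);
        int ok = r.quot * den + r.rem == num
              && labs(r.rem) < labs(den)
              && (r.rem == 0 || (r.rem < 0) == (num < 0));
        if (!ok)
            failures++;
    }
    return failures;
}
```

The fixed seed is deliberate: if a check ever fails, rerunning gives
the same inputs, which you can then feed to gdb or valgrind.
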
>
> To give an example, Andy Novocin and I have been working on new
> polynomial factoring code in FLINT for a couple of years now. Around 6
> months ago we had a long test of some 100,000 or so polynomials
> factoring correctly. We also had a long test of some 20-odd very
> difficult polynomials factoring correctly. Thus there was no reason at
> all to suppose there were *ANY* bugs in the polynomial factoring code
> or any of the functions it made use of. By Sage standards I think this
> is an insane level of testing.
>
> But I insisted that every function we have written have its own test
> code. This has meant 6 months more work (there was something like
> 40,000 lines of new code to test). But I cannot tell you how many
> serious new bugs (and performance problems too) we turned up.
> There must be dozens of serious bugs we've fixed, many of which would
> have led to incorrect factorisations of whole classes of polynomials.
>
> The lesson for me was: just because my very extensive 5 or 6 doctests
> passed for the very complex new functionality I added does not mean
> there aren't incredibly serious bugs in the underlying modules I used,
> nor does it mean that my new code is worth more than printing out and
> using as toilet paper.
>
> Detecting bugs in Sage won't make Sage a viable alternative to the
> MA*'s (that's a whole nuther thread). After all, testing standards in
> those other packages are quite possibly much worse. But testing more
> thoroughly will mean less time wasted trying to track down bugs in an
> ad hoc manner, and eventually, much more time available for addressing
> those issues that are relevant to becoming a viable alternative.

A long way to go... ;-)

I don't think people would like a complete feature (and perhaps
component upgrade) freeze for e.g. 6 months.

But there's work in progress to at least better support more automatic
testing on a wide(r) variety of platforms and systems. If we get new,
weird doctest or build errors with every (pre-)release, there's little
time left to solve long-standing problems.


-Leif

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
