Valgrind-on-TBPL

Nicholas Nethercote Tue, 03 Dec 2013 17:03:31 -0800

Hi,

I'm working on getting Valgrind (Linux64-only) runs visible on TBPL.
https://bugzilla.mozilla.org/show_bug.cgi?id=valgrind-green is the tracking
bug.  The aim of this email is to (a) get answers to some questions I have,
and (b) serve as a heads-up for people I will have to co-ordinate with.


I've done a bunch of work to get the Valgrind builds running nicely -- fixing
test harness bugs, diagnosing false positive leak reports, getting the test
machines configured better, and removing unnecessary suppressions.

An example of a recent successful run is here:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31312433&tree=Mozilla-Central&full=1

Someone will probably ask why we want to do this.  Valgrind can find the
following kinds of problems.

1. Accessing memory you shouldn't, e.g. overrunning and underrunning heap
   blocks, overrunning the top of the stack, and accessing memory after it has
   been freed.

2. Using undefined values, i.e. values that have not been initialised, or that
   have been derived from other undefined values.

3. Incorrect freeing of heap memory, such as double-freeing heap blocks,
   freeing non-heap blocks, or mismatched use of malloc/new/new[] versus
   free/delete/delete[].

4. Overlapping src and dst pointers in memcpy and related functions.

5. Memory leaks.

1, 2 and 3 are often sec-critical bugs.  1 overlaps with ASan's checking, but I
don't think 2 and 3 do.  4 is rare but can be bad when it happens.  5 is of
obvious interest for MemShrink.

As jorendorff says, "if we don't do Valgrind runs, somebody else will".  (In
case it's not clear, the "somebody else" in that sentence refers to people
looking for security vulnerabilities.)

----

In order to understand what needs to be done, I looked at the "Requirements for
being shown in the default TBPL view", from
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy.

1) Has an active owner

That's me.

2) Breakage is expected to be followed by tree closure or backout

Yep.

3) Runs on mozilla-central and all trees that merge into it

Currently Valgrind runs only occur on mozilla-central.
https://bugzilla.mozilla.org/show_bug.cgi?id=801955 is open for running it on
m-i, fx-team and Try.  Should there be others, e.g. aurora/beta/release?

What's involved with this -- what code needs to change?  Who do I need to talk
to?

4) Scheduled on every push

Currently they only run once per day.
https://bugzilla.mozilla.org/show_bug.cgi?id=946002 is open for changing this.

This will incur additional machine load, but it should be a drop in the ocean:
it only takes about 45 minutes, and it only runs on Linux, which is our most
scalable platform.  (We might consider expanding what gets run under Valgrind
later on, but this is a big enough challenge for now and the existing coverage
is still enough to be useful.)

Again, what code needs to change for this to happen?  Who do I need to talk to?

5) Easily run on try server

The bug mentioned in 3) above covers getting the builds runnable on tryserver.
https://bugzilla.mozilla.org/show_bug.cgi?id=946005 is open for updating
trychooser appropriately.  sfink told me a bit about what's involved here;
apparently the buildbot configuration code needs updating in some fashion.

Anything else needed for this one?

6) Outputs failures in a TBPL-starrable format

Currently the Valgrind builds show up as green, even if there are failures
(https://bugzilla.mozilla.org/show_bug.cgi?id=823787).  That needs fixing.

Again, what code needs to change for this to happen?  Who do I need to talk to?
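
As a rough illustration only (not the actual harness code), a log
post-processing step could look something like the sketch below.  It assumes
the job's log contains Valgrind's usual end-of-run "ERROR SUMMARY" line and
that a "TEST-UNEXPECTED-FAIL | name | message" line is what makes a failure
starrable.  The function name summarize, the job name "valgrind" and the
message wording are all made up for the example.

```python
import re

# Valgrind prints a line like this at the end of each process it traced:
#   ==1234== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
SUMMARY_RE = re.compile(
    r"^==\d+== ERROR SUMMARY: (\d+) errors? from (\d+) contexts?"
)


def summarize(log_lines, job_name="valgrind"):
    """Return (status_line, exit_code) for the lines of a Valgrind log.

    job_name and the message text are hypothetical; only the
    ERROR SUMMARY pattern comes from Valgrind itself.
    """
    total_errors = 0
    seen_summary = False
    for line in log_lines:
        match = SUMMARY_RE.match(line)
        if match:
            seen_summary = True
            # Sum over all summary lines in case several processes were traced.
            total_errors += int(match.group(1))

    if not seen_summary:
        # No summary at all usually means the run died early; treat as failure.
        return (
            f"TEST-UNEXPECTED-FAIL | {job_name} | no ERROR SUMMARY line found",
            1,
        )
    if total_errors > 0:
        return (
            f"TEST-UNEXPECTED-FAIL | {job_name} | "
            f"{total_errors} Valgrind error(s) reported",
            1,
        )
    return (f"TEST-PASS | {job_name} | no Valgrind errors", 0)
```

The point of returning a non-zero exit code alongside the status line is that
the job would then stop showing up as green when Valgrind complains.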

7) Low intermittent failure rate
8) Must avoid patterns known to cause non deterministic failures

gkw monitored the Valgrind runs for several months, and they exhibited no
intermittent failures.

Furthermore, Valgrind bugs tend to be deterministic on a particular machine, so
devs can just borrow a tbpl machine and try to reproduce failures there if they
are unable to reproduce locally.

9) Supports the disabling of individual tests

These will only run on Linux64, and there's only a single test per se (it's
actually a build + pgo-profile run in a single job).  So I think nothing
additional needs to be done here.

10) Has sufficient documentation

https://bugzilla.mozilla.org/show_bug.cgi?id=946011.  I can do that; it'll be
easy once everything else is in place.

11) Easy for a dev to run locally

We have https://bugzilla.mozilla.org/show_bug.cgi?id=631842 for adding a make
target and/or mach command.  I'm not sure if this point is critical?  If it is,
I should be able to do it without too much difficulty, I hope.

----

I'd be happy to hear suggestions about the best order to attack these changes,
and which parts will be the hardest.  Speaking for myself, the thing I'd like
first is to run on every push, so that patches responsible for breakage are
more obvious.  (Someone actually landed something that caused Valgrind to
complain in the past 24 hours, so now I have to hunt that down.)

Thanks!

Nick