Hi,

I'm working on getting Valgrind (Linux64-only) runs visible on TBPL. https://bugzilla.mozilla.org/show_bug.cgi?id=valgrind-green is the tracking bug. The aim of this email is to (a) get answers to some questions I have, and (b) serve as a heads-up for the people I will have to co-ordinate with.

I've done a bunch of work to get the Valgrind builds running nicely -- fixing test harness bugs, diagnosing false-positive leak reports, getting the test machines configured better, and removing unnecessary suppressions. An example of a recent successful run is here:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31312433&tree=Mozilla-Central&full=1

Someone will probably ask why we want to do this. Valgrind can find the following kinds of problems.

1. Accessing memory you shouldn't, e.g. overrunning and underrunning heap blocks, overrunning the top of the stack, and accessing memory after it has been freed.
2. Using undefined values, i.e. values that have not been initialised, or that have been derived from other undefined values.
3. Incorrect freeing of heap memory, such as double-freeing heap blocks, freeing non-heap blocks, or mismatched use of malloc/new/new[] versus free/delete/delete[].
4. Overlapping src and dst pointers in memcpy and related functions.
5. Memory leaks.

1, 2 and 3 are often sec-critical bugs. 1 overlaps with ASan's checking, but I don't think 2 and 3 do. 4 is rare but can be bad when it happens. 5 is of obvious interest for MemShrink.

As jorendorff says, "if we don't do Valgrind runs, somebody else will". (In case it's not clear, the "somebody else" in that sentence refers to people looking for security vulnerabilities.)

----

In order to understand what needs to be done, I looked at the "Requirements for being shown in the default TBPL view" from https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy.

1) Has an active owner

That's me.

2) Breakage is expected to be followed by tree closure or backout

Yep.

3) Runs on mozilla-central and all trees that merge into it

Currently Valgrind runs only occur on mozilla-central. https://bugzilla.mozilla.org/show_bug.cgi?id=801955 is open for running them on mozilla-inbound (m-i), fx-team and Try. Should there be others, e.g. aurora/beta/release? What's involved with this -- what code needs to change?
Who do I need to talk to?

4) Scheduled on every push

Currently they only run once per day. https://bugzilla.mozilla.org/show_bug.cgi?id=946002 is open for changing this. This will incur additional machine load, but it should be a drop in the ocean: a run only takes about 45 minutes, and it only runs on Linux, which is our most scalable platform. (We might consider expanding what gets run under Valgrind later on, but this is a big enough challenge for now, and the existing coverage is already enough to be useful.) Again, what code needs to change for this to happen? Who do I need to talk to?

5) Easily run on try server

The bug mentioned in 3) above covers getting the builds runnable on the try server. https://bugzilla.mozilla.org/show_bug.cgi?id=946005 is open for updating trychooser appropriately. sfink told me a bit about what's involved here; apparently the buildbot configuration code needs updating in some fashion. Is anything else needed for this one?

6) Outputs failures in a TBPL-starrable format

Currently the Valgrind builds show up as green even if there are failures (https://bugzilla.mozilla.org/show_bug.cgi?id=823787). That needs fixing. Again, what code needs to change for this to happen? Who do I need to talk to?

7) Low intermittent failure rate
8) Must avoid patterns known to cause non-deterministic failures

gkw monitored the Valgrind runs for several months, and they exhibited no intermittent failures. Furthermore, Valgrind bugs tend to be deterministic on a particular machine, so devs can just borrow a TBPL machine and try to reproduce failures there if they are unable to reproduce them locally.

9) Supports the disabling of individual tests

These will only run on Linux64, and there's only a single test per se (it's actually a build + pgo-profile run in a single job). So I think nothing additional needs to be done here.

10) Has sufficient documentation

https://bugzilla.mozilla.org/show_bug.cgi?id=946011 is open for this. I can do that; it'll be easy once everything else is in place.
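
For 6), here is a minimal sketch of the kind of log post-processing that could turn Valgrind output into something TBPL can star. It assumes the harness has access to Valgrind's log text and that TBPL stars lines of the form `TEST-UNEXPECTED-FAIL | <name> | <message>`. The function `tbpl_failure_lines` and the job name `valgrind` are hypothetical, not part of the existing harness. (Valgrind's own `--error-exitcode=<n>` option is another way to make a run with errors fail visibly.)

```python
import re
from typing import Iterable, List

# Valgrind prefixes its own output with "==<pid>==". The ERROR SUMMARY line
# reports how many errors were found, e.g.
#   ==1234== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 5 from 5)
ERROR_SUMMARY_RE = re.compile(
    r"^==\d+== ERROR SUMMARY: (\d+) errors? from (\d+) contexts?"
)


def tbpl_failure_lines(log_lines: Iterable[str], job_name: str = "valgrind") -> List[str]:
    """Return one TBPL-style failure line per ERROR SUMMARY with a non-zero count.

    job_name is a hypothetical label; a real harness would pick its own.
    """
    failures = []
    for line in log_lines:
        m = ERROR_SUMMARY_RE.match(line)
        if m and int(m.group(1)) > 0:
            failures.append(
                "TEST-UNEXPECTED-FAIL | %s | Valgrind reported %s errors from %s contexts"
                % (job_name, m.group(1), m.group(2))
            )
    return failures


if __name__ == "__main__":
    sample = [
        "==1234== Memcheck, a memory error detector",
        "==1234== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 5 from 5)",
    ]
    for failure in tbpl_failure_lines(sample):
        print(failure)
```

The harness would then print these lines and mark the job as failed whenever the list is non-empty, rather than reporting green unconditionally.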

11) Easy for a dev to run locally

We have https://bugzilla.mozilla.org/show_bug.cgi?id=631842 for adding a make target and/or mach command. I'm not sure whether this point is critical. If it is, I should be able to do it without too much difficulty, I hope.

----

I'd be happy to hear suggestions about the best order in which to attack these changes, and which parts will be the hardest. Speaking for myself, the thing I'd like first is running on every push, so that the patches responsible for breakage are more obvious. (Someone landed something in the past 24 hours that caused Valgrind to complain, so now I have to hunt that down.)

Thanks!

Nick
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform