Re: [lldb-dev] [cfe-dev] [llvm-dev] RFC: End-to-end testing

Philip Reames via lldb-dev Sat, 19 Oct 2019 13:54:43 -0700


On 10/9/19 6:25 PM, David Greene wrote:

Philip Reames via cfe-dev <cfe-...@lists.llvm.org> writes:

A challenge we already have - as in, I've broken these tests and had to
fix them - is that an end-to-end test which checks either IR or assembly
ends up being extraordinarily fragile.  Completely unrelated profitable
transforms create small differences which cause spurious test failures.
This is a very real issue today with the few end-to-end clang tests we
have, and I am extremely hesitant to expand those tests without giving
this workflow problem serious thought.  If we don't, this could bring
development on middle end transforms to a complete stop.  (Not kidding.)

Do you have a pointer to these tests?  We literally have tens of
thousands of end-to-end tests downstream and while some are fragile, the
vast majority are not.  A test that, for example, checks the entire
generated asm for a match is indeed very fragile.  A test that checks
whether a specific instruction/mnemonic was emitted is generally not, at
least in my experience.  End-to-end tests require some care in
construction.  I don't think update_llc_test_checks.py-type operation is
desirable.
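To make the distinction concrete, here is a minimal Python sketch contrasting a whole-output comparison with a check for a single mnemonic. It is only a stand-in for the idea; LLVM's lit tests express such checks with FileCheck, not Python. The function names and the assembly snippets are hypothetical and hand-written for illustration, not real compiler output.

```python
import re


def full_match_check(actual_asm: str, expected_asm: str) -> bool:
    """Fragile: any unrelated change anywhere in the output fails the test."""
    return actual_asm.strip() == expected_asm.strip()


def mnemonic_check(actual_asm: str, mnemonic: str) -> bool:
    """More robust: only asks whether a specific instruction was emitted."""
    # Anchor the mnemonic at the start of a line (after indentation) and
    # require a word boundary after it, so "vadd" does not match "vaddps"
    # and a mention inside a "#" comment line is ignored.
    pattern = re.compile(rf"^\s*{re.escape(mnemonic)}\b", re.MULTILINE)
    return pattern.search(actual_asm) is not None


# Hand-written, illustrative AT&T-syntax snippets (not real compiler output).
EXPECTED = """\
vmovups (%rdi), %ymm0
vaddps (%rsi), %ymm0, %ymm0
retq"""

# The same vectorized add after a hypothetical unrelated change picked
# different registers and added a vzeroupper.
ACTUAL = """\
vmovups (%rsi), %ymm1
vaddps (%rdi), %ymm1, %ymm1
vzeroupper
retq"""

if __name__ == "__main__":
    print(full_match_check(ACTUAL, EXPECTED))  # False: spurious failure
    print(mnemonic_check(ACTUAL, "vaddps"))    # True: the property we care about
```

The full-match check fails on a change nobody cares about, while the mnemonic check still verifies that the vector add was emitted.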

The couple I remember offhand were mostly vectorization tests, but it's been a while, so I might be misremembering.


Still, you raise a valid point and I think present some good options
below.

A couple of approaches we could consider:

  1. Simply restrict end-to-end tests to crash/assert cases.  (i.e. no
     property of the generated code is checked, other than that it is
     generated)  This isn't as restrictive as it sounds when combined
     with coverage-guided fuzzer corpora.

I would be pretty hesitant to do this but I'd like to hear more about
how you see this working with coverage/fuzzing.

We've found end-to-end fuzzing from Java (which, for single-threaded programs, guarantees determinism and the absence of UB), comparing two implementations, to be extremely effective at catching regressions. A big chunk of the regressions are assertion failures. Our ability to detect miscompiles by comparing the output of two implementations (well, two or more, for tie-breaking purposes) has worked extremely well. However, once a problem is identified, we're stuck manually reducing and reacting, which is a major time sink. The key thing here, in the context of this discussion, is that there are no IR checks of any form: we just check the end-to-end correctness of the system and then reduce from there.
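The differential approach described above can be modeled in a few lines. This is a minimal Python sketch, not the actual harness: it assumes each "implementation" is a deterministic callable returning its observable output as a string, treats a raised exception as a stand-in for an assertion failure, and uses a majority vote for tie-breaking. All names (`differential_check`, `IMPLS`, the toy implementations) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical model: each "implementation" (e.g. an interpreter and one or
# more compiled tiers) is a callable that runs the same deterministic test
# program on the same input and returns its observable output as a string.
Impl = Callable[[int], str]


def differential_check(impls: Dict[str, Impl], test_input: int) -> Dict[str, List[str]]:
    """Run every implementation on one input and classify the outcomes.

    Only final outputs are compared; no IR or assembly is inspected.
    Returns two sorted lists of implementation names:
      "crashed": raised an exception (a stand-in for an assertion failure)
      "suspect": disagreed with a strict majority, or every survivor if
                 there is no strict majority to break the tie
    """
    outputs: Dict[str, str] = {}
    crashed: List[str] = []
    for name, impl in impls.items():
        try:
            outputs[name] = impl(test_input)
        except Exception:
            crashed.append(name)

    suspect: List[str] = []
    if outputs:
        answer, votes = Counter(outputs.values()).most_common(1)[0]
        if votes * 2 > len(outputs):
            # A strict majority exists: blame only the dissenters.
            suspect = [n for n, out in outputs.items() if out != answer]
        else:
            # No strict majority (e.g. a 1-1 split between two
            # implementations): a miscompile is detected but not located.
            suspect = list(outputs)
    return {"crashed": sorted(crashed), "suspect": sorted(suspect)}


# Toy implementations, for illustration only.
def reference(x: int) -> str:
    return str(x * x)


def optimized(x: int) -> str:
    return str(x ** 2)


def miscompiled(x: int) -> str:
    return str(x * x + 1) if x > 2 else str(x * x)


def asserting(x: int) -> str:
    if x >= 5:
        raise RuntimeError("simulated assertion failure")
    return str(x * x)


IMPLS = {"ref": reference, "opt": optimized, "bad": miscompiled, "assert": asserting}

if __name__ == "__main__":
    print(differential_check(IMPLS, 3))  # {'crashed': [], 'suspect': ['bad']}
    print(differential_check(IMPLS, 7))  # {'crashed': ['assert'], 'suspect': ['bad']}
```

With only two implementations a disagreement cannot be attributed to either side, which is why a third one helps with tie-breaking. The sketch stops at detection; reducing the failing input is the manual step the post describes.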

  2. Auto-update all diffs, but report them to a human user for
     inspection.  This ends up meaning that tests never "fail" per se,
     but that individuals who have expressed interest in particular tests
     get an automated notification and a chance to respond on list with a
     reduced example.

That's certainly workable.
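Option 2 can be sketched as follows. This is a minimal Python model, assuming a hypothetical store of expected outputs and a hypothetical `subscribers` map from test name to interested people; it uses only the standard `difflib` module and does not model how notifications would actually be delivered.

```python
import difflib
from typing import Dict, List, Tuple


def auto_update(
    test_name: str,
    old_expected: str,
    new_actual: str,
    subscribers: Dict[str, List[str]],
) -> Tuple[str, List[Tuple[str, str]]]:
    """Option 2 as a sketch: the test never fails.

    Returns the expected output to store from now on, plus a list of
    (subscriber, unified diff) notifications for whoever watches this test.
    """
    diff = "\n".join(
        difflib.unified_diff(
            old_expected.splitlines(),
            new_actual.splitlines(),
            fromfile=f"{test_name} (old)",
            tofile=f"{test_name} (new)",
            lineterm="",
        )
    )
    if not diff:
        # Output unchanged: nothing to update, nobody to notify.
        return old_expected, []
    # Adopt the new output unconditionally and queue one notification per
    # interested person; deciding whether it is a regression is their job.
    notes = [(who, diff) for who in subscribers.get(test_name, [])]
    return new_actual, notes


if __name__ == "__main__":
    stored, notes = auto_update(
        "vectorize-add",  # hypothetical test name
        "vaddps (%rsi), %ymm0, %ymm0\nretq",
        "vaddps (%rdi), %ymm1, %ymm1\nretq",
        {"vectorize-add": ["interested-dev@example.com"]},
    )
    for who, diff in notes:
        print(f"To: {who}\n{diff}")
```

Note that a test with no subscribers is still updated, silently; that is the same ownership question raised further down the thread.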

  3. As a variant on the former, don't auto-update tests, but only inform
     the *contributor* of an end-to-end test of a failure. Responsibility
     for determining failure vs false positive lies solely with them, and
     normal channels are used to report a failure after it has been
     confirmed/analyzed/explained.

I think I like this best of the three but it raises the question of what
happens when the contributor is no longer contributing.  Who's
responsible for the test?  Maybe it just sits there until someone else
claims it.

I'd argue it should be deleted if no one is willing to actively step up. It is not in the community's interest to assume unending responsibility for any third-party test suite, given the high burden involved here.

I really think this is a problem we need to have thought through and
found a workable solution before end-to-end testing as proposed becomes
a practically workable option.

Noted.  I'm very happy to have this discussion and work the problem.

                      -David

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
