I am +1 for a way to specify additional modules that should be run. Always running the tests of every dependent module would be prohibitively expensive (especially for hadoop-common patches), but developers should have a good understanding of which patches are high-risk for downstream consumers and can label them accordingly.
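For illustration, something like the following plain-Maven invocation could sit behind such a label; the module path and the idea of keying the wider run off a risk label are just my assumptions, not an actual proposal for test-patch:

    # Normal pre-commit scope: only the changed module's own tests.
    mvn test -pl hadoop-common-project/hadoop-common

    # Patch labelled high-risk for downstream consumers: also build and test
    # every module that depends on it (-amd = --also-make-dependents).
    mvn test -pl hadoop-common-project/hadoop-common -amd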
> Maybe we should have a week to try and collaborate on that, with a focus on
> 1+ specific build (branch 3 ?) for now, and get that stable and happy?

While we need to start somewhere, and trunk or branch-3 seem like reasonable places to start, I would actually argue that stable tests for the older release lines are at least as valuable, if not more so. Maintenance releases are likely to be less rigorously tested than a major release (given the assumption that the line is already pretty stable and should only take lower-risk patches), and backports are generally less rigorously reviewed than the trunk patch, yet these are the releases which should carry the highest stability guarantees. That implies to me that they are the branches which need the most stable unit testing.

On 9/15/17, 5:17 AM, "Steve Loughran" <ste...@hortonworks.com> wrote:

    1. I think maybe we should have a special ability or process for patches which
       change dependencies. We know that they have the potential for damage way
       beyond their .patch size, and I fear them.

    2. Like Allen says, we can't afford to have a full test run on every patch,
       because then the infra is overloaded and either you don't get a turnaround
       on a test within the day of submission, or the queue builds up so big that
       it's only by Sunday evening that the backlog is cleared.

    3. And we don't test the object stores enough: even if you can do it just with
       a set of credentials, we can't grant them to Jenkins (security), and it
       still takes lots of time (though with HADOOP-14553 we will cut the Windows
       time down).

    4. And, like Allen also says, tests are a bit unreliable on the test infra.
       Example: TestKDiag; one of mine. No idea why it fails; it does work locally.
       Generally, though, I think a lot of them are race conditions where the
       Jenkins machines execute things in a different order, or simply take longer
       than we expect.

    How about we identify those tests which fail intermittently on Jenkins alone
    and somehow downgrade them/get them explicitly excluded? I know it's cheating,
    and we should try to fix them first (after all, the way they fail may change,
    which would be a regression).

    LambdaTestUtils.eventually() is designed to support spinning until a test
    passes, and with the most recent fix (HADOOP-14851) it may actually do this.
    It can help with race conditions (& inconsistent object stores) by wrapping up
    the entire retry-until-something-works process. But it only works if the race
    condition is between the production code and the assertion; if it is in the
    production code across threads, that's a serious problem.

    Anyway: tests fail, we should care. If you want to learn how to care, try and
    do what Allen has been busy with: try and keep Jenkins happy. Maybe we should
    have a week to try and collaborate on that, with a focus on 1+ specific build
    (branch 3 ?) for now, and get that stable and happy? If we have to do it with
    a Jenkins profile and skipping the unreliable tests, so be it.

    > On 14 Sep 2017, at 22:44, Arun Suresh <arun.sur...@gmail.com> wrote:
    >
    > I actually like this idea:
    >
    >> One approach: do a dependency:list of each module and for those that show a
    >> change with the patch we run tests there.
    >
    > Can 'jdeps' be used to prune the list of sub modules on which we do
    > pre-commit? Essentially, we figure out which classes actually use the
    > modified classes from the patch and then run the pre-commit on those
    > packages?
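On the jdeps question above: jdeps only reports forward dependencies, so a pre-commit step would have to scan each candidate module and keep the ones that reference the touched classes. A rough, purely illustrative sketch (the class name and module list are invented for the example; in practice they would come from the diff):

    # Classes touched by the patch; in practice derived from the .patch file.
    CHANGED="org.apache.hadoop.fs.FileSystem"

    # jdeps lists what a module's compiled classes depend on, so grep that output
    # for the changed classes and keep the modules that mention them.
    for module in hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-distcp; do
      if jdeps -verbose:class "$module/target/classes" 2>/dev/null | grep -q "$CHANGED"; then
        echo "$module uses $CHANGED -> run its tests in pre-commit"
      fi
    done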
    >
    > Cheers
    > -Arun
    >
    > On Thu, Sep 14, 2017 at 2:23 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
    >
    >> On Thu, Sep 14, 2017 at 1:59 PM, Sean Busbey <bus...@apache.org> wrote:
    >>
    >>> On 2017-09-14 15:36, Chris Douglas <cdoug...@apache.org> wrote:
    >>>> This has gotten bad enough that people are dismissing legitimate test
    >>>> failures among the noise.
    >>>>
    >>>> On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
    >>>> <a...@effectivemachines.com> wrote:
    >>>>> Someone should probably invest some time into integrating the
    >>>>> HBase flaky test code a) into Yetus and then b) into Hadoop.
    >>>>
    >>>> What does the HBase flaky test code do? Another extension to
    >>>> test-patch could run all new/modified tests multiple times, and report
    >>>> to JIRA if any run fails.
    >>>
    >>> The current HBase stuff segregates untrusted tests by looking through
    >>> nightly test runs to find things that fail intermittently. We then don't
    >>> include those tests in either nightly or precommit tests. We have a
    >>> different job that just runs the untrusted tests and if they start passing
    >>> removes them from the list.
    >>>
    >>> There's also a project getting used by SOLR called "BeastIT" that goes
    >>> through running parallel copies of a given test a large number of times
    >>> to reveal flaky tests.
    >>>
    >>> Getting either/both of those into Yetus and used here would be a huge
    >>> improvement.
    >>
    >> I discussed this on yetus-dev a while back and Allen thought it'd be
    >> non-trivial:
    >>
    >> https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E
    >>
    >> I unfortunately don't have the test-patch.sh expertise to dig into this.
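One more note on the BeastIT point above: even before anything lands in Yetus, beating on a single suspect test is easy to approximate locally. A crude serial version (test and module names are only examples):

    # Rerun one suspect test repeatedly and stop at the first failure. Serial
    # rather than parallel like BeastIT, but often enough to confirm a test is
    # flaky rather than plain broken.
    for i in $(seq 1 25); do
      echo "=== run $i ==="
      mvn -q test -Dtest=TestKDiag -pl hadoop-common-project/hadoop-common \
        || { echo "failed on run $i"; break; }
    done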