I am +1 for a way to specify additional modules that should be run. Always running the tests of every dependent module would be prohibitively expensive (especially for hadoop-common patches), but developers should have a good understanding of which patches are high-risk for downstream consumers and can label them accordingly.
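For illustration, something like the following plain-Maven invocation could sit behind such a label; the module path and the idea of keying the wider run off a risk label are just my assumptions, not an actual proposal for test-patch:

    # Normal pre-commit scope: only the changed module's own tests.
    mvn test -pl hadoop-common-project/hadoop-common

    # Patch labelled high-risk for downstream consumers: also build and test
    # every module that depends on it (-amd = --also-make-dependents).
    mvn test -pl hadoop-common-project/hadoop-common -amd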
> Maybe we should have a week to try and collaborate on that, with a focus on
> 1+ specific build (branch 3 ?) for now, and get that stable and happy?

While we need to start somewhere, and trunk or branch-3 seem like reasonable places to start, I would actually argue that stable tests for the older release lines are at least as valuable, if not more so. Maintenance releases are likely to be less rigorously tested than a major release (given the assumption that the line is already pretty stable and should only take lower-risk patches), and backports are generally less rigorously reviewed than the trunk patch, yet these are the releases which should carry the highest stability guarantees. That implies to me that they are the branches which need the most stable unit testing.

On 9/15/17, 5:17 AM, "Steve Loughran" <ste...@hortonworks.com> wrote:

    1. I think maybe we should have a special ability or process for patches which
       change dependencies. We know that they have the potential for damage way
       beyond their .patch size, and I fear them.

    2. Like Allen says, we can't afford to have a full test run on every patch,
       because then the infra is overloaded and either you don't get a turnaround
       on a test within the day of submission, or the queue builds up so big that
       it's only by Sunday evening that the backlog is cleared.

    3. And we don't test the object stores enough: even if you can do it just with
       a set of credentials, we can't grant them to Jenkins (security), and it
       still takes lots of time (though with HADOOP-14553 we will cut the Windows
       time down).

    4. And, like Allen also says, tests are a bit unreliable on the test infra.
       Example: TestKDiag; one of mine. No idea why it fails; it does work locally.
       Generally, though, I think a lot of them are race conditions where the
       Jenkins machines execute things in a different order, or simply take longer
       than we expect.

    How about we identify those tests which fail intermittently on Jenkins alone
    and somehow downgrade them/get them explicitly excluded? I know it's cheating,
    and we should try to fix them first (after all, the way they fail may change,
    which would be a regression).

    LambdaTestUtils.eventually() is designed to support spinning until a test
    passes, and with the most recent fix (HADOOP-14851) it may actually do this.
    It can help with race conditions (& inconsistent object stores) by wrapping up
    the entire retry-until-something-works process. But it only works if the race
    condition is between the production code and the assertion; if it is in the
    production code across threads, that's a serious problem.

    Anyway: tests fail, we should care. If you want to learn how to care, try and
    do what Allen has been busy with: try and keep Jenkins happy. Maybe we should
    have a week to try and collaborate on that, with a focus on 1+ specific build
    (branch 3 ?) for now, and get that stable and happy? If we have to do it with
    a Jenkins profile and skipping the unreliable tests, so be it.

    > On 14 Sep 2017, at 22:44, Arun Suresh <arun.sur...@gmail.com> wrote:
    >
    > I actually like this idea:
    >
    >> One approach: do a dependency:list of each module and for those that show a
    >> change with the patch we run tests there.
    >
    > Can 'jdeps' be used to prune the list of sub modules on which we do
    > pre-commit? Essentially, we figure out which classes actually use the
    > modified classes from the patch and then run the pre-commit on those
    > packages?
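On the jdeps question above: jdeps only reports forward dependencies, so a pre-commit step would have to scan each candidate module and keep the ones that reference the touched classes. A rough, purely illustrative sketch (the class name and module list are invented for the example; in practice they would come from the diff):

    # Classes touched by the patch; in practice derived from the .patch file.
    CHANGED="org.apache.hadoop.fs.FileSystem"

    # jdeps lists what a module's compiled classes depend on, so grep that output
    # for the changed classes and keep the modules that mention them.
    for module in hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-distcp; do
      if jdeps -verbose:class "$module/target/classes" 2>/dev/null | grep -q "$CHANGED"; then
        echo "$module uses $CHANGED -> run its tests in pre-commit"
      fi
    done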
    >
    > Cheers
    > -Arun
    >
    > On Thu, Sep 14, 2017 at 2:23 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
    >
    >> On Thu, Sep 14, 2017 at 1:59 PM, Sean Busbey <bus...@apache.org> wrote:
    >>
    >>> On 2017-09-14 15:36, Chris Douglas <cdoug...@apache.org> wrote:
    >>>> This has gotten bad enough that people are dismissing legitimate test
    >>>> failures among the noise.
    >>>>
    >>>> On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
    >>>> <a...@effectivemachines.com> wrote:
    >>>>> Someone should probably invest some time into integrating the
    >>>>> HBase flaky test code a) into Yetus and then b) into Hadoop.
    >>>>
    >>>> What does the HBase flaky test code do? Another extension to
    >>>> test-patch could run all new/modified tests multiple times, and report
    >>>> to JIRA if any run fails.
    >>>
    >>> The current HBase stuff segregates untrusted tests by looking through
    >>> nightly test runs to find things that fail intermittently. We then don't
    >>> include those tests in either nightly or precommit tests. We have a
    >>> different job that just runs the untrusted tests and if they start passing
    >>> removes them from the list.
    >>>
    >>> There's also a project getting used by SOLR called "BeastIT" that goes
    >>> through running parallel copies of a given test a large number of times
    >>> to reveal flaky tests.
    >>>
    >>> Getting either/both of those into Yetus and used here would be a huge
    >>> improvement.
    >>
    >> I discussed this on yetus-dev a while back and Allen thought it'd be
    >> non-trivial:
    >>
    >> https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E
    >>
    >> I unfortunately don't have the test-patch.sh expertise to dig into this.
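One more note on the BeastIT point above: even before anything lands in Yetus, beating on a single suspect test is easy to approximate locally. A crude serial version (test and module names are only examples):

    # Rerun one suspect test repeatedly and stop at the first failure. Serial
    # rather than parallel like BeastIT, but often enough to confirm a test is
    # flaky rather than plain broken.
    for i in $(seq 1 25); do
      echo "=== run $i ==="
      mvn -q test -Dtest=TestKDiag -pl hadoop-common-project/hadoop-common \
        || { echo "failed on run $i"; break; }
    done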