+1, although we should be cautious when enabling this policy. We have a decent backlog of bugs that we need to work through.
--Mikhail

Have feedback <http://go/migryz-feedback>?

On Thu, Jan 10, 2019 at 11:44 AM Scott Wegner <[email protected]> wrote:

> +1, this sounds good to me.
>
> I believe the next step would be to open a PR to add this to the release
> guide:
> https://github.com/apache/beam/blob/master/website/src/contribute/release-guide.md
>
> On Wed, Jan 9, 2019 at 12:04 PM Sam Rohde <[email protected]> wrote:
>
>> Cool, thanks for all of the replies. Does this summary sound reasonable?
>>
>> *Problem:* there are a number of failing tests (including flaky ones)
>> that don't get looked at and aren't necessarily green when a new Beam
>> release is cut.
>>
>> *Proposed Solution:*
>>
>>    - Add all tests to the release validation.
>>    - For every failing test (including flaky ones), create a JIRA
>>    attached to the Beam release and add it to the "test-failures"
>>    component.*
>>    - If a test is continuously failing:
>>       - fix it
>>       - add the fix to the release
>>       - close out the JIRA
>>    - If a test is flaky:
>>       - try to fix it
>>       - if fixed:
>>          - add the fix to the release
>>          - close out the JIRA
>>       - else:
>>          - manually test it
>>          - move "Fix Version" to the next release
>>    - The release validation can continue when all JIRAs are closed out.
>>
>> *Why this is an improvement:*
>>
>>    - Ensures that every test is a valid signal (as opposed to disabling
>>    failing tests)
>>    - Creates an incentive to automate tests (no one is on the hook to
>>    manually test)
>>    - Creates a forcing function to fix flaky tests (once fixed, a test
>>    no longer needs to be manually tested)
>>    - Ensures that every failing test gets looked at
>>
>> *Why this may not be an improvement:*
>>
>>    - More effort for release validation
>>    - May slow down release velocity
>>
>> * For brevity, it might be better to create a JIRA per component
>> containing a summary of its failing tests.
>>
>> -Sam
>>
>> On Tue, Jan 8, 2019 at 10:25 AM Ahmet Altay <[email protected]> wrote:
>>
>>> On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]> wrote:
>>>
>>>> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>>>>
>>>>> For reference, there are currently 34 unresolved JIRA issues under
>>>>> the test-failures component [1].
>>>>>
>>>>> [1]
>>>>> https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>
>>>> And there are 19 labeled with flake or sickbay:
>>>> https://issues.apache.org/jira/issues/?filter=12343195
>>>>
>>>>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> This is a good idea. Some suggestions:
>>>>>> - It would be nicer if we could figure out a process to act on
>>>>>> flaky tests more frequently than releases.
>>>>>>
>>>> Any ideas? We could just pick a cadence and try to establish the
>>>> practice of having a deflake thread every couple of weeks. How about
>>>> we add it to release verification as a first step and then continue
>>>> to discuss?
>>>>
>>> Sounds great. I do not know JIRA well enough, but I am hoping that a
>>> solution can come in the form of tooling. If we could configure JIRA
>>> with SLOs per issue type, we could have customized reports on which
>>> issues are not getting enough attention, and then load-balance those
>>> issues among us.
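A rough sketch of the kind of report Ahmet describes, under stated assumptions: it uses the third-party "jira" Python client (pip install jira) with anonymous read access to the public ASF JIRA, reuses the JQL from Scott's filter above, and groups open test-failure issues by assignee to surface anyone carrying an outsized share. This is an illustration only, not an official Beam tool.

from collections import defaultdict

from jira import JIRA

# Anonymous, read-only connection to the public ASF JIRA instance.
client = JIRA(server="https://issues.apache.org/jira")

# Same query as the filter linked above: unresolved issues in the
# test-failures component of the BEAM project.
issues = client.search_issues(
    "project = BEAM AND resolution = Unresolved AND component = test-failures",
    maxResults=200,
)

# Group by assignee so it is easy to spot people with far too many issues.
by_assignee = defaultdict(list)
for issue in issues:
    assignee = issue.fields.assignee
    name = assignee.displayName if assignee else "(unassigned)"
    by_assignee[name].append(issue.key)

# Print the heaviest-loaded assignees first.
for name, keys in sorted(by_assignee.items(), key=lambda kv: -len(kv[1])):
    print(f"{name}: {len(keys)} open test-failure issue(s)")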
>>>>>> - Another improvement in the process would be having actual owners
>>>>>> of issues rather than auto-assigned component owners. A few folks
>>>>>> have 100+ assigned issues. Unassigning those issues and finding
>>>>>> owners who would have time to work on the identified flaky tests
>>>>>> would be helpful.
>>>>>>
>>>> Yikes. Two issues here:
>>>>
>>>> - Sounds like JIRA component owners aren't really working for us as a
>>>> first point of contact for triage.
>>>> - A person shouldn't really have more than 5 JIRAs assigned, or maybe
>>>> 20 if you are being really loose (I am guilty of having 30 at this
>>>> moment...).
>>>>
>>>> Maybe this is one or two separate threads?
>>>>
>>> I can fork this to another thread. I think both issues are related,
>>> because component owners are more likely to be in this situation. I
>>> agree with the assessment of the two issues.
>>>
>>>> Kenn
>>>>
>>>>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I love this idea. It can easily feel like bugs filed for Jenkins
>>>>>>> flakes/failures just get lost if there is no process for looking
>>>>>>> them over regularly.
>>>>>>>
>>>>>>> I would suggest that test failures/flakes all get filed with Fix
>>>>>>> Version = whatever release is next. Then at release time we can
>>>>>>> triage the list, making sure none might be a symptom of something
>>>>>>> that should block the release. One modification to your proposal:
>>>>>>> after manual verification that it is safe to release, I would move
>>>>>>> Fix Version to the next release instead of closing, unless the
>>>>>>> issue really is fixed or otherwise not reproducible.
>>>>>>>
>>>>>>> For automation, I wonder if there's something automatic already
>>>>>>> available somewhere that would:
>>>>>>>
>>>>>>> - mark the Jenkins build as "Keep This Build Forever"
>>>>>>> - be *very* careful to try to find an existing bug, else it will
>>>>>>> be spam
>>>>>>> - file bugs to the "test-failures" component
>>>>>>> - set Fix Version to the "next" release; right now we have 2.7.1
>>>>>>> (LTS), 2.11.0 (next mainline), and 3.0.0 (dreamy incompatible
>>>>>>> ideas), so it needs the smarts to choose 2.11.0
>>>>>>>
>>>>>>> If not, I think doing this stuff manually is not that bad,
>>>>>>> assuming we can stay fairly green.
>>>>>>>
>>>>>>> Kenn
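On the automation question, a minimal sketch of the filing step Kenn lists above, with his duplicate-search caution built in. It again assumes the third-party "jira" Python client; the credentials, test name, build URL, and hard-coded next release are all placeholders, and the "Keep This Build Forever" step would go through the Jenkins API and is left out here.

from jira import JIRA

NEXT_RELEASE = "2.11.0"  # placeholder; real tooling would need the smarts
                         # to pick the next mainline version automatically


def file_test_failure(client: JIRA, test_name: str, build_url: str):
    # Be *very* careful to find an existing bug first, else this is spam:
    # treat any unresolved BEAM issue whose summary mentions the test as a
    # match, and comment on it instead of filing a duplicate.
    existing = client.search_issues(
        f'project = BEAM AND resolution = Unresolved AND summary ~ "{test_name}"'
    )
    if existing:
        client.add_comment(existing[0], f"Failed again: {build_url}")
        return existing[0]

    # Otherwise file a new bug in the test-failures component with Fix
    # Version set to the next release, per the proposal in this thread.
    return client.create_issue(
        project="BEAM",
        summary=f"Test failure: {test_name}",
        description=f"Observed failing in Jenkins: {build_url}",
        issuetype={"name": "Bug"},
        components=[{"name": "test-failures"}],
        fixVersions=[{"name": NEXT_RELEASE}],
    )


# Placeholder credentials and inputs, for illustration only.
client = JIRA(server="https://issues.apache.org/jira",
              basic_auth=("bot-user", "bot-password"))
file_test_failure(
    client,
    "org.apache.beam.sdk.transforms.SomeFlakyTest",  # hypothetical test
    "https://builds.apache.org/job/beam_PreCommit_Java/1234/",  # hypothetical build
)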
>>>>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> There are a number of tests in our system that are either flaky
>>>>>>>> or permanently red. I am suggesting that we add most, if not all,
>>>>>>>> of the tests (style, unit, integration, etc.) to the release
>>>>>>>> validation step. In this way, we add a regular cadence for
>>>>>>>> ensuring greenness and no flaky tests in Beam.
>>>>>>>>
>>>>>>>> There are a number of ways of implementing this, but what I think
>>>>>>>> might work best is to set up a process that either manually or
>>>>>>>> automatically creates a JIRA for each failing test and assigns it
>>>>>>>> to a component tagged with the release number. The release can
>>>>>>>> then continue when all JIRAs are closed, by either fixing the
>>>>>>>> failure or manually testing to ensure there are no adverse side
>>>>>>>> effects (in case there are environmental issues in the testing
>>>>>>>> infrastructure or otherwise).
>>>>>>>>
>>>>>>>> Thanks for reading. What do you think?
>>>>>>>> - Is there another, easier way to ensure that no test failures go
>>>>>>>> unfixed?
>>>>>>>> - Can the process be automated?
>>>>>>>> - What am I missing?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sam
>>>>>
>>>>> --
>>>>> Got feedback? tinyurl.com/swegner-feedback
>
> --
> Got feedback? tinyurl.com/swegner-feedback
