What do you think about crowd-sourcing?

1. Fix Version = 2.10.0
2. If assigned, ping the ticket and maybe the assignee; unassign if unresponsive
3. If unassigned, assign it to yourself while thinking about it
4. If you can route it a bit closer to someone who might know, great
5. If it doesn't look like a blocker (after routing the best you can), Fix Version = 2.11.0
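To make that concrete, here is a rough sketch of what one pass over steps 1-5 could look like if scripted against JIRA. This is only an illustration: the python "jira" client, the server credentials, the username, and the JQL are assumptions, and the ping/unassign and routing decisions stay human judgment calls.

    # Hypothetical sketch of one triage pass over the steps above, using the
    # python "jira" client. Server, credentials, username, and versions are
    # placeholders.
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira",
                basic_auth=("user", "password"))
    me = "my-jira-username"  # assumption: the triager's JIRA username

    issues = jira.search_issues(
        'project = BEAM AND resolution = Unresolved '
        'AND component = test-failures AND fixVersion = "2.10.0"'
    )

    for issue in issues:
        if issue.fields.assignee is None:
            # Step 3: unassigned -> take it while thinking about it.
            jira.assign_issue(issue, me)
        else:
            # Step 2: assigned -> ping the ticket; whether to unassign an
            # unresponsive assignee stays a human call.
            jira.add_comment(issue, "Triaging for 2.10.0: still a blocker?")
        # Steps 4-5 stay manual: route to a likely owner if you can, and if it
        # doesn't look like a blocker, bump Fix Version to the next release:
        # issue.update(fields={"fixVersions": [{"name": "2.11.0"}]})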
I think this has enough mutexes that there should be no duplicated work if it is followed. And every step is a standard use of Fix Version and Assignee, so there's not really any special policy needed.

Kenn

On Thu, Jan 10, 2019 at 4:25 PM Mikhail Gryzykhin <[email protected]> wrote:

> +1
>
> Although we should be cautious when enabling this policy. We have a decent
> backlog of bugs that we need to plumb through.
>
> --Mikhail
>
> On Thu, Jan 10, 2019 at 11:44 AM Scott Wegner <[email protected]> wrote:
>
>> +1, this sounds good to me.
>>
>> I believe the next step would be to open a PR to add this to the release
>> guide:
>> https://github.com/apache/beam/blob/master/website/src/contribute/release-guide.md
>>
>> On Wed, Jan 9, 2019 at 12:04 PM Sam Rohde <[email protected]> wrote:
>>
>>> Cool, thanks for all of the replies. Does this summary sound reasonable?
>>>
>>> *Problem:* there are a number of failing tests (including flaky ones) that
>>> don't get looked at and aren't necessarily green upon cutting a new Beam
>>> release.
>>>
>>> *Proposed Solution:*
>>>
>>>    - Add all tests to the release validation
>>>    - For every failing test (including flaky ones), create a JIRA attached
>>>      to the Beam release and add it to the "test-failures" component*
>>>    - If a test is continuously failing:
>>>       - fix it
>>>       - add the fix to the release
>>>       - close out the JIRA
>>>    - If a test is flaky:
>>>       - try to fix it
>>>       - if fixed:
>>>          - add the fix to the release
>>>          - close out the JIRA
>>>       - else:
>>>          - manually test it
>>>          - move "Fix Version" to the next release
>>>    - The release validation can continue when all JIRAs are closed out.
>>>
>>> *Why this is an improvement:*
>>>
>>>    - Ensures that every test is a valid signal (as opposed to disabling
>>>      failing tests)
>>>    - Creates an incentive to automate tests (no longer on the hook to
>>>      manually test)
>>>    - Creates a forcing function to fix flaky tests (once fixed, a test no
>>>      longer needs to be manually tested)
>>>    - Ensures that every failing test gets looked at
>>>
>>> *Why this may not be an improvement:*
>>>
>>>    - More effort for release validation
>>>    - May slow down release velocity
>>>
>>> * for brevity, it might be better to create one JIRA per component
>>>   containing a summary of its failing tests
>>>
>>> -Sam
>>>
>>> On Tue, Jan 8, 2019 at 10:25 AM Ahmet Altay <[email protected]> wrote:
>>>
>>>> On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]> wrote:
>>>>
>>>>> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>>>>>
>>>>>> For reference, there are currently 34 unresolved JIRA issues under
>>>>>> the test-failures component [1].
>>>>>>
>>>>>> [1]
>>>>>> https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>
>>>>> And there are 19 labeled with flake or sickbay:
>>>>> https://issues.apache.org/jira/issues/?filter=12343195
>>>>>
>>>>>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>>>>>
>>>>>>> This is a good idea. Some suggestions:
>>>>>>> - It would be nicer if we could figure out a process to act on flaky
>>>>>>> tests more frequently than releases.
>>>>>
>>>>> Any ideas? We could just have some cadence and try to establish the
>>>>> practice of having a deflake thread every couple of weeks? How about we
>>>>> add it to release verification as a first step and then continue to
>>>>> discuss?
>>>>
>>>> Sounds great. I do not know enough JIRA, but I am hoping that a
>>>> solution can come in the form of tooling. If we could configure JIRA with
>>>> SLOs per issue type, we could have customized reports on which issues are
>>>> not getting enough attention and then do a load balance among us.
>>>>
>>>>>>> - Another improvement in the process would be having actual owners of
>>>>>>> issues rather than auto-assigned component owners. A few folks have 100+
>>>>>>> assigned issues. Unassigning those issues, and finding owners who would
>>>>>>> have time to work on identified flaky tests, would be helpful.
>>>>>
>>>>> Yikes. Two issues here:
>>>>>
>>>>> - sounds like Jira component owners aren't really working for us as a
>>>>> first point of contact for triage
>>>>> - a person shouldn't really have more than 5 Jiras assigned, or if you
>>>>> get really loose maybe 20 (I am guilty of having 30 at this moment...)
>>>>>
>>>>> Maybe this is one or two separate threads?
>>>>
>>>> I can fork this to another thread. I think both issues are related
>>>> because component owners are more likely to be in this situation. I agree
>>>> with the assessment of two issues.
>>>>
>>>>> Kenn
>>>>>
>>>>>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]> wrote:
>>>>>>>
>>>>>>>> I love this idea. It can easily feel like bugs filed for Jenkins
>>>>>>>> flakes/failures just get lost if there is no process for looking them
>>>>>>>> over regularly.
>>>>>>>>
>>>>>>>> I would suggest that test failures / flakes all get filed with Fix
>>>>>>>> Version = whatever release is next. Then at release time we can triage
>>>>>>>> the list, making sure none might be a symptom of something that should
>>>>>>>> block the release. One modification to your proposal: after manual
>>>>>>>> verification that it is safe to release, I would move Fix Version to
>>>>>>>> the next release instead of closing, unless the issue really is fixed
>>>>>>>> or otherwise not reproducible.
>>>>>>>>
>>>>>>>> For automation, I wonder if there's something automatic already
>>>>>>>> available somewhere that would:
>>>>>>>>
>>>>>>>> - mark the Jenkins build to "Keep This Build Forever"
>>>>>>>> - be *very* careful to try to find an existing bug, else it will
>>>>>>>> be spam
>>>>>>>> - file bugs to the "test-failures" component
>>>>>>>> - set Fix Version to the "next" - right now we have 2.7.1 (LTS),
>>>>>>>> 2.11.0 (next mainline), 3.0.0 (dreamy incompatible ideas), so it needs
>>>>>>>> the smarts to choose 2.11.0
>>>>>>>>
>>>>>>>> If not, I think doing this stuff manually is not that bad, assuming
>>>>>>>> we can stay fairly green.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> There are a number of tests in our system that are either flaky or
>>>>>>>>> permanently red. I am suggesting we add most, if not all, of the tests
>>>>>>>>> (style, unit, integration, etc.) to the release validation step. In
>>>>>>>>> this way, we will add a regular cadence to ensuring greenness and no
>>>>>>>>> flaky tests in Beam.
>>>>>>>>>
>>>>>>>>> There are a number of ways of implementing this, but what I think
>>>>>>>>> might work best is to set up a process that either manually or
>>>>>>>>> automatically creates a JIRA for the failing test and assigns it to a
>>>>>>>>> component tagged with the release number. The release can then
>>>>>>>>> continue when all JIRAs are closed, by either fixing the failure or
>>>>>>>>> manually testing to ensure no adverse side effects (this is in case
>>>>>>>>> there are environmental issues in the testing infrastructure or
>>>>>>>>> otherwise).
>>>>>>>>>
>>>>>>>>> Thanks for reading, what do you think?
>>>>>>>>> - Is there another, easier way to ensure that no test failures go
>>>>>>>>> unfixed?
>>>>>>>>> - Can the process be automated?
>>>>>>>>> - What am I missing?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Sam
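As a rough illustration of the automation Kenn describes above (find failing Jenkins jobs, look hard for an existing bug before filing, file into the test-failures component, set Fix Version to the next mainline release), here is a hypothetical sketch. The job list, credentials, duplicate-matching heuristic, and the choice of the python "jira" client plus the Jenkins JSON API are all assumptions, not an existing tool.

    # Hypothetical sketch only: job names, credentials, and the duplicate
    # check are placeholders, not an existing Beam tool.
    import requests
    from jira import JIRA

    JENKINS = "https://builds.apache.org"
    JOBS = ["beam_PreCommit_Java_Cron"]   # assumption: jobs to watch
    NEXT_MAINLINE = "2.11.0"              # needs "the smarts" to pick this

    jira = JIRA(server="https://issues.apache.org/jira",
                basic_auth=("user", "password"))

    for job in JOBS:
        build = requests.get(
            f"{JENKINS}/job/{job}/lastCompletedBuild/api/json").json()
        if build.get("result") == "SUCCESS":
            continue

        # Be *very* careful to find an existing bug first, else this is spam.
        existing = jira.search_issues(
            'project = BEAM AND resolution = Unresolved '
            f'AND component = test-failures AND summary ~ "{job}"'
        )
        if existing:
            jira.add_comment(existing[0], f"Still failing: {build['url']}")
            continue

        jira.create_issue(
            project="BEAM",
            issuetype={"name": "Bug"},
            summary=f"{job} is failing",
            description=f"Automatically filed for failing build: {build['url']}",
            components=[{"name": "test-failures"}],
            fixVersions=[{"name": NEXT_MAINLINE}],
        )
        # Marking the Jenkins build "Keep This Build Forever" would go here
        # (Jenkins exposes this as an action on the build).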
