Since you brought up the entirety of the process, I would suggest moving the release branch cut up, like so:

 - Decide to release
 - Create a new version in JIRA
 - Find a recent green commit (according to postcommit)
 - Create a release branch from that commit
 - Bump the version on master (green PR w/ parent at the green commit)
 - Triage release-blocking JIRAs
 - ...

Notes:

 - Choosing the postcommit signal to cut means we already have the signal and we aren't tempted to wait on master
 - Cutting before triage starts the stabilization process ASAP and gives a clear signal on the burndown

Kenn
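(To make the "find a recent green commit" step concrete: a minimal sketch against the Jenkins JSON API. This is an illustration, not existing tooling — the job name is a placeholder, and a real version would intersect the signal across all postcommit jobs rather than trusting a single one.)

    # green_commit.py - commit of the last green run of a postcommit job.
    import requests

    # Placeholder job; the real check would consult every postcommit job.
    JOB = "https://builds.apache.org/job/beam_PostCommit_Java"

    def last_green_commit(job_url=JOB):
        # lastSuccessfulBuild is a standard Jenkins API alias.
        build = requests.get(f"{job_url}/lastSuccessfulBuild/api/json").json()
        # The Git plugin records the built revision as a build "action".
        for action in build.get("actions", []):
            revision = action.get("lastBuiltRevision")
            if revision:
                return revision["SHA1"]
        return None

    if __name__ == "__main__":
        print(last_green_commit())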
On Tue, Jan 15, 2019 at 1:25 PM Sam Rohde <[email protected]> wrote:

> +Boyuan Zhang <[email protected]> who is modifying the rc validation script
>
> I'm thinking of a small change to the proposed process, brought to my attention by Boyuan.
>
> Instead of running the additional validation tests during RC validation, run the tests (and the proposed process) after the release branch has been cut. A couple of reasons why:
>
> - The additional validation tests (PostCommit and PreCommit) don't run against the RC and are instead run against the branch. This is confusing, considering the other tests in the RC validation step are per RC.
> - The additional validation tests are expensive.
>
> The final release process would look like:
>
> - Decide to release
> - Create a new version in JIRA
> - Triage release-blocking issues in JIRA
> - Review release notes in JIRA
> - Create a release branch
> - Verify that a release builds
> - >>> Verify that a release passes its tests <<< (this is where the new process would be added)
> - Build/test/fix RCs
> - >>> Fix any issues <<< (all JIRAs created during the new process will have to be closed by here)
> - Finalize the release
> - Promote the release
>
> On Thu, Jan 10, 2019 at 4:32 PM Kenneth Knowles <[email protected]> wrote:
>
>> What do you think about crowd-sourcing?
>>
>> 1. Fix Version = 2.10.0
>> 2. If assigned, ping the ticket and maybe the assignee; unassign if unresponsive
>> 3. If unassigned, assign it to yourself while thinking about it
>> 4. If you can route it a bit closer to someone who might know, great
>> 5. If it doesn't look like a blocker (after routing the best you can), Fix Version = 2.11.0
>>
>> I think this has enough mutexes that there should be no duplicated work if it is followed. And every step is a standard use of Fix Version and Assignee, so there's not really any special policy needed.
>>
>> Kenn
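(For illustration, a minimal sketch of that crowd-sourced pass against JIRA, assuming the Python jira client (pip install jira). It only surfaces the work list; steps 2-5 stay human judgment calls, so the unassign/route actions are deliberately left manual.)

    # triage_queue.py - list open issues on the release and who is on the hook.
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira")  # anonymous is fine for reads

    # Step 1: everything still open against the release being cut.
    blockers = jira.search_issues(
        "project = BEAM AND fixVersion = 2.10.0 AND resolution = Unresolved")

    for issue in blockers:
        assignee = issue.fields.assignee
        if assignee is None:
            # Step 3: unassigned - take it while you think about it.
            print(f"{issue.key}: unassigned, grab it")
        else:
            # Step 2: assigned - ping the ticket and maybe the assignee.
            print(f"{issue.key}: ping {assignee.displayName}")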
>> On Thu, Jan 10, 2019 at 4:25 PM Mikhail Gryzykhin <[email protected]> wrote:
>>
>>> +1
>>>
>>> Although we should be cautious when enabling this policy: we have a decent backlog of bugs that we need to plumb through.
>>>
>>> --Mikhail
>>>
>>> On Thu, Jan 10, 2019 at 11:44 AM Scott Wegner <[email protected]> wrote:
>>>
>>>> +1, this sounds good to me.
>>>>
>>>> I believe the next step would be to open a PR to add this to the release guide: https://github.com/apache/beam/blob/master/website/src/contribute/release-guide.md
>>>>
>>>> On Wed, Jan 9, 2019 at 12:04 PM Sam Rohde <[email protected]> wrote:
>>>>
>>>>> Cool, thanks for all of the replies. Does this summary sound reasonable?
>>>>>
>>>>> *Problem:* there are a number of failing tests (including flaky ones) that don't get looked at and aren't necessarily green upon cutting a new Beam release.
>>>>>
>>>>> *Proposed Solution:*
>>>>>
>>>>> - Add all tests to the release validation
>>>>> - For every failing test (including flaky ones), create a JIRA attached to the Beam release and add it to the "test-failures" component*
>>>>> - If a test is continuously failing:
>>>>>   - fix it
>>>>>   - add the fix to the release
>>>>>   - close out the JIRA
>>>>> - If a test is flaky:
>>>>>   - try to fix it
>>>>>   - if fixed:
>>>>>     - add the fix to the release
>>>>>     - close out the JIRA
>>>>>   - else:
>>>>>     - manually test it
>>>>>     - move "Fix Version" to the next release
>>>>> - The release validation can continue when all JIRAs are closed out.
>>>>>
>>>>> *Why this is an improvement:*
>>>>>
>>>>> - Ensures that every test is a valid signal (as opposed to disabling failing tests)
>>>>> - Creates an incentive to automate tests (no longer on the hook to manually test)
>>>>> - Creates a forcing function to fix flaky tests (once fixed, a test no longer needs manual testing)
>>>>> - Ensures that every failing test gets looked at
>>>>>
>>>>> *Why this may not be an improvement:*
>>>>>
>>>>> - More effort for release validation
>>>>> - May slow down release velocity
>>>>>
>>>>> * for brevity, it might be better to create one JIRA per component containing a summary of its failing tests
>>>>>
>>>>> -Sam
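(The "release validation can continue when all JIRAs are closed out" gate above is mechanically checkable. A minimal sketch, assuming the Python jira client and the component/version names used in this thread:)

    # release_gate.py - the release proceeds only when the burndown hits zero.
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira")

    def release_can_proceed(version="2.10.0"):
        # Any unresolved test-failure JIRA on this release blocks validation.
        open_failures = jira.search_issues(
            f"project = BEAM AND component = test-failures "
            f"AND fixVersion = {version} AND resolution = Unresolved")
        for issue in open_failures:
            print(f"still blocking: {issue.key} {issue.fields.summary}")
        return len(open_failures) == 0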
>>>>> On Tue, Jan 8, 2019 at 10:25 AM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]> wrote:
>>>>>>
>>>>>>> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>>>>>>>
>>>>>>>> For reference, there are currently 34 unresolved JIRA issues under the test-failures component [1].
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>>>
>>>>>>> And there are 19 labeled with flake or sickbay: https://issues.apache.org/jira/issues/?filter=12343195
>>>>>>>
>>>>>>>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> This is a good idea. Some suggestions:
>>>>>>>>> - It would be nicer if we could figure out a process to act on flaky tests more frequently than releases.
>>>>>>>
>>>>>>> Any ideas? We could just pick some cadence and try to establish the practice of having a deflake thread every couple of weeks. How about we add it to release verification as a first step and then continue to discuss?
>>>>>>
>>>>>> Sounds great. I do not know JIRA well enough, but I am hoping that a solution can come in the form of tooling. If we could configure JIRA with SLOs per issue type, we could have customized reports on which issues are not getting enough attention, and then load-balance among us.
>>>>>>
>>>>>>>>> - Another improvement to the process would be having actual owners of issues rather than auto-assigned component owners. A few folks have 100+ assigned issues. Unassigning those issues and finding owners who have time to work on identified flaky tests would be helpful.
>>>>>>>
>>>>>>> Yikes. Two issues here:
>>>>>>>
>>>>>>> - sounds like Jira component owners aren't really working for us as a first point of contact for triage
>>>>>>> - a person shouldn't really have more than 5 Jiras assigned, or if you get really loose, maybe 20 (I am guilty of having 30 at this moment...)
>>>>>>>
>>>>>>> Maybe this is one or two separate threads?
>>>>>>
>>>>>> I can fork this to another thread. I think the two issues are related, because component owners are more likely to be in this situation. I agree with the assessment that there are two issues.
>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I love this idea. It can easily feel like bugs filed for Jenkins flakes/failures just get lost if there is no process for looking them over regularly.
>>>>>>>>>>
>>>>>>>>>> I would suggest that test failures/flakes all get filed with Fix Version = whatever release is next. Then at release time we can triage the list, making sure none might be a symptom of something that should block the release. One modification to your proposal: after manual verification that it is safe to release, I would move Fix Version to the next release instead of closing, unless the issue really is fixed or otherwise not reproducible.
>>>>>>>>>>
>>>>>>>>>> For automation, I wonder if there's something automatic already available somewhere that would:
>>>>>>>>>>
>>>>>>>>>> - mark the Jenkins build "Keep This Build Forever"
>>>>>>>>>> - be *very* careful to try to find an existing bug, else it will be spam
>>>>>>>>>> - file bugs to the "test-failures" component
>>>>>>>>>> - set Fix Version to the "next" release - right now we have 2.7.1 (LTS), 2.11.0 (next mainline), and 3.0.0 (dreamy incompatible ideas), so it needs the smarts to choose 2.11.0
>>>>>>>>>>
>>>>>>>>>> If not, I think doing this stuff manually is not that bad, assuming we can stay fairly green.
>>>>>>>>>>
>>>>>>>>>> Kenn
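(If nothing ready-made exists, a minimal sketch of such a filer, assuming the Python jira client plus the Jenkins HTTP API. The dedupe JQL, job URL, and auth handling are placeholders; in practice both the JIRA write and the "Keep This Build Forever" toggle need credentials.)

    # file_flake.py - dedupe, preserve the evidence, then file the bug.
    import requests
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira")  # needs auth for writes

    def file_test_failure(test_name, job_url, build_number, next_version="2.11.0"):
        summary = f"Failure of {test_name}"
        # Be *very* careful to find an existing bug first, else it is spam.
        existing = jira.search_issues(
            f'project = BEAM AND component = test-failures '
            f'AND resolution = Unresolved AND summary ~ "{summary}"')
        if existing:
            return existing[0]
        # Keep the failing build around ("Keep This Build Forever").
        requests.post(f"{job_url}/{build_number}/toggleLogKeep")
        # File against test-failures with Fix Version = next mainline release.
        return jira.create_issue(
            project="BEAM",
            summary=summary,
            description=f"Seen at {job_url}/{build_number}",
            issuetype={"name": "Bug"},
            components=[{"name": "test-failures"}],
            fixVersions=[{"name": next_version}])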
>>>>>>>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> There are a number of tests in our system that are either flaky or permanently red. I am suggesting that we add most, if not all, of the tests (style, unit, integration, etc.) to the release validation step. In this way, we add a regular cadence for ensuring greenness and an absence of flaky tests in Beam.
>>>>>>>>>>>
>>>>>>>>>>> There are a number of ways of implementing this, but what I think might work best is to set up a process that, either manually or automatically, creates a JIRA for each failing test and assigns it to a component tagged with the release number. The release can then continue when all JIRAs are closed, either by fixing the failure or by manually testing to ensure no adverse side effects (in case there are environmental issues in the testing infrastructure or otherwise).
>>>>>>>>>>>
>>>>>>>>>>> Thanks for reading, what do you think?
>>>>>>>>>>>
>>>>>>>>>>> - Is there another, easier way to ensure that no test failures go unfixed?
>>>>>>>>>>> - Can the process be automated?
>>>>>>>>>>> - What am I missing?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sam
