+Boyuan Zhang <[email protected]>, who is modifying the RC validation script

I'm thinking of a small change to the proposed process, brought to my
attention by Boyuan.

Instead of running the additional validation tests during RC validation,
run the tests and the proposed process after the release branch has been
cut. A couple of reasons why:

   - The additional validation tests (PostCommit and PreCommit) don't run
   against the RC and are instead run against the branch. This is confusing
   considering the other tests in the RC validation step are per RC.
   - The additional validation tests are expensive.

The final release process would look like:

   - Decide to release
   - Create a new version in JIRA
   - Triage release-blocking issues in JIRA
   - Review release notes in JIRA
   - Create a release branch
   - Verify that a release builds
   - >>> Verify that a release passes its tests <<< (this is where the new
   process would be added)
   - Build/test/fix RCs
   - >>> Fix any issues <<< (all JIRAs created during the new process must
   be closed by this point; see the sketch after this list)
   - Finalize the release
   - Promote the release
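
For the ">>> Fix any issues <<<" gate, here is a minimal sketch of what an
automated check could look like, assuming the third-party "jira" Python
client and an illustrative Fix Version of 2.10.0 (this is the shape of the
check, not an agreed tool):

    from jira import JIRA

    # Anonymous read access to the ASF JIRA is enough for a search.
    jira = JIRA("https://issues.apache.org/jira")
    open_blockers = jira.search_issues(
        'project = BEAM AND component = test-failures '
        'AND fixVersion = 2.10.0 AND resolution = Unresolved')
    for issue in open_blockers:
        print(issue.key, issue.fields.summary)
    if open_blockers:
        raise SystemExit("Release is blocked until these JIRAs are closed.")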

On Thu, Jan 10, 2019 at 4:32 PM Kenneth Knowles <[email protected]> wrote:

> What do you think about crowd-sourcing?
>
> 1. Fix Version = 2.10.0
> 2. If assigned, ping ticket and maybe assignee, unassign if unresponsive
> 3. If unassigned, assign it to yourself while thinking about it
> 4. If you can route it a bit closer to someone who might know, great
> 5. If it doesn't look like a blocker (after routing best you can), Fix
> Version = 2.11.0
>
> I think this has enough mutexes that there should be no duplicated work if
> it is followed. And every step is a standard use of Fix Version and
> Assignee, so there's not really any special policy needed.
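>
> For illustration, a rough sketch of steps 3 and 5 with the third-party
> "jira" Python client; the username, password, and not_a_blocker() helper
> are hypothetical stand-ins, and the call in step 5 is still a human
> judgment:
>
>     from jira import JIRA
>
>     jira = JIRA("https://issues.apache.org/jira",
>                 basic_auth=("my-jira-username", "password"))
>     candidates = jira.search_issues(
>         'project = BEAM AND fixVersion = 2.10.0 AND resolution = Unresolved')
>
>     def not_a_blocker(issue):
>         # hypothetical: stands in for the human call in step 5
>         return "flake" in (issue.fields.labels or [])
>
>     for issue in candidates:
>         if issue.fields.assignee is None:
>             jira.assign_issue(issue, "my-jira-username")  # step 3: take it
>         elif not_a_blocker(issue):
>             # step 5: not a blocker, move it to the next release
>             issue.update(fields={"fixVersions": [{"name": "2.11.0"}]})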
>
> Kenn
>
> On Thu, Jan 10, 2019 at 4:25 PM Mikhail Gryzykhin <[email protected]>
> wrote:
>
>> +1
>>
>> Although we should be cautious when enabling this policy. We have a decent
>> backlog of bugs that we need to plumb through.
>>
>> --Mikhail
>>
>>
>> On Thu, Jan 10, 2019 at 11:44 AM Scott Wegner <[email protected]> wrote:
>>
>>> +1, this sounds good to me.
>>>
>>> I believe the next step would be to open a PR to add this to the release
>>> guide:
>>> https://github.com/apache/beam/blob/master/website/src/contribute/release-guide.md
>>>
>>> On Wed, Jan 9, 2019 at 12:04 PM Sam Rohde <[email protected]> wrote:
>>>
>>>> Cool, thanks for all of the replies. Does this summary sound reasonable?
>>>>
>>>> *Problem:* there are a number of failing (including flaky) tests that
>>>> don't get looked at, so the test suites aren't necessarily green when a
>>>> new Beam release is cut.
>>>>
>>>> *Proposed Solution:*
>>>>
>>>>    - Add all tests to the release validation
>>>>    - For every failing test (including flaky ones), create a JIRA attached
>>>>    to the Beam release and add it to the "test-failures" component*
>>>>    - If a test is continuously failing:
>>>>       - fix it
>>>>       - add the fix to the release
>>>>       - close out the JIRA
>>>>    - If a test is flaky:
>>>>       - try to fix it
>>>>       - if fixed:
>>>>          - add the fix to the release
>>>>          - close out the JIRA
>>>>       - else:
>>>>          - manually test it
>>>>          - move its "Fix Version" to the next release
>>>>    - The release validation can continue when all JIRAs are closed out.
>>>>
>>>> *Why this is an improvement:*
>>>>
>>>>    - Ensures that every test is a valid signal (as opposed to
>>>>    disabling failing tests)
>>>>    - Creates an incentive to automate tests (no longer on the hook to
>>>>    manually test)
>>>>    - Creates a forcing-function to fix flaky tests (once fixed, no
>>>>    longer needs to be manually tested)
>>>>    - Ensures that every failing test gets looked at
>>>>
>>>> *Why this may not be an improvement:*
>>>>
>>>>    - More effort for release validation
>>>>    - May slow down release velocity
>>>>
>>>> * for brevity, it might be better to create one JIRA per component
>>>> containing a summary of its failing tests; a rough sketch:
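>>>>
>>>> (Assuming the third-party "jira" Python client; failing_tests is a
>>>> placeholder for whatever the release validation scripts would collect.)
>>>>
>>>>     from collections import defaultdict
>>>>     from jira import JIRA
>>>>
>>>>     jira = JIRA("https://issues.apache.org/jira",
>>>>                 basic_auth=("user", "password"))
>>>>
>>>>     # assumed input: (Beam component, test name) pairs from test results
>>>>     failing_tests = [("sdk-py-core", "WordCountIT.test_wordcount")]
>>>>
>>>>     by_component = defaultdict(list)
>>>>     for component, test in failing_tests:
>>>>         by_component[component].append(test)
>>>>
>>>>     for component, tests in by_component.items():
>>>>         jira.create_issue(
>>>>             project="BEAM",
>>>>             issuetype={"name": "Bug"},
>>>>             components=[{"name": component}, {"name": "test-failures"}],
>>>>             fixVersions=[{"name": "2.10.0"}],  # attach to the release
>>>>             summary="Failing tests in %s for release 2.10.0" % component,
>>>>             description="\n".join(tests))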
>>>>
>>>>
>>>> -Sam
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jan 8, 2019 at 10:25 AM Ahmet Altay <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>>>>>>
>>>>>>> For reference, there are currently 34 unresolved JIRA issues under
>>>>>>> the test-failures component [1].
>>>>>>>
>>>>>>> [1]
>>>>>>> https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>>>
>>>>>>
>>>>>> And there are 19 labeled with flake or sickbay:
>>>>>> https://issues.apache.org/jira/issues/?filter=12343195
>>>>>>
>>>>>>
>>>>>>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>>>>>>
>>>>>>>> This is a good idea. Some suggestions:
>>>>>>>> - It would be nicer if we could figure out a process to act on flaky
>>>>>>>> tests more frequently than releases.
>>>>>>>>
>>>>>>>
>>>>>> Any ideas? We could just have some cadence and try to establish the
>>>>>> practice of having a deflake thread every couple of weeks? How about we
>>>>>> add it to release verification as a first step and then continue to
>>>>>> discuss?
>>>>>>
>>>>>
>>>>> Sounds great. I do not know JIRA well enough, but I am hoping that a
>>>>> solution can come in the form of tooling. If we could configure JIRA with
>>>>> SLOs per issue type, we could have customized reports on which issues are
>>>>> not getting enough attention and then load-balance among us.
>>>>>
>>>>>
>>>>>>
>>>>>>>> - Another improvement in the process would be having actual owners of
>>>>>>>> issues rather than auto-assigned component owners. A few folks have
>>>>>>>> 100+ assigned issues. Unassigning those issues and finding owners who
>>>>>>>> would have time to work on identified flaky tests would be helpful.
>>>>>>>>
>>>>>>>
>>>>>> Yikes. Two issues here:
>>>>>>
>>>>>>  - sounds like Jira component owners aren't really working for us as
>>>>>> a first point of contact for triage
>>>>>>  - a person shouldn't really have more than 5 Jiras assigned, or if
>>>>>> you get really loose, maybe 20 (I am guilty of having 30 at this
>>>>>> moment...)
>>>>>>
>>>>>> Maybe this is one or two separate threads?
>>>>>>
>>>>>
>>>>> I can fork this to another thread. I think both issues are related
>>>>> because component owners are more likely to be in this situation. I agree
>>>>> with the assessment of the two issues.
>>>>>
>>>>>
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I love this idea. It can easily feel like bugs filed for Jenkins
>>>>>>>>> flakes/failures just get lost if there is no process for looking them
>>>>>>>>> over regularly.
>>>>>>>>>
>>>>>>>>> I would suggest that test failures / flakes all get filed with Fix
>>>>>>>>> Version = whatever release is next. Then at release time we can triage
>>>>>>>>> the list, making sure none might be a symptom of something that should
>>>>>>>>> block the release. One modification to your proposal is that after
>>>>>>>>> manual verification that it is safe to release, I would move Fix
>>>>>>>>> Version to the next release instead of closing, unless the issue
>>>>>>>>> really is fixed or otherwise not reproducible.
>>>>>>>>>
>>>>>>>>> For automation, I wonder if there's something automatic already
>>>>>>>>> available somewhere that would:
>>>>>>>>>
>>>>>>>>>  - mark the Jenkins build to "Keep This Build Forever"
>>>>>>>>>  - be *very* careful to try to find an existing bug, else it will
>>>>>>>>> be spam
>>>>>>>>>  - file bugs to "test-failures" component
>>>>>>>>>  - set Fix Version to the "next" - right now we have 2.7.1 (LTS),
>>>>>>>>> 2.11.0 (next mainline), 3.0.0 (dreamy incompatible ideas) so need the
>>>>>>>>> smarts to choose 2.11.0
>>>>>>>>>
>>>>>>>>> If not, I think doing this stuff manually is not that bad,
>>>>>>>>> assuming we can stay fairly green.
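>>>>>>>>>
>>>>>>>>> A very rough sketch of that automation, under loud assumptions: the
>>>>>>>>> third-party "jira" Python client, Jenkins' toggleLogKeep endpoint
>>>>>>>>> (which toggles "Keep This Build Forever"), CSRF crumbs ignored, and
>>>>>>>>> a naive summary-match for the dedup step:
>>>>>>>>>
>>>>>>>>>     import requests
>>>>>>>>>     from jira import JIRA
>>>>>>>>>
>>>>>>>>>     def file_flake(job, build, test_name, next_version="2.11.0"):
>>>>>>>>>         # keep the failing Jenkins build around as evidence
>>>>>>>>>         requests.post(
>>>>>>>>>             "https://builds.apache.org/job/%s/%d/toggleLogKeep"
>>>>>>>>>             % (job, build), auth=("user", "api-token"))
>>>>>>>>>
>>>>>>>>>         jira = JIRA("https://issues.apache.org/jira",
>>>>>>>>>                     basic_auth=("user", "password"))
>>>>>>>>>         # be *very* careful to find an existing bug, else it is spam
>>>>>>>>>         dupes = jira.search_issues(
>>>>>>>>>             'project = BEAM AND component = test-failures AND '
>>>>>>>>>             'resolution = Unresolved AND summary ~ "%s"' % test_name)
>>>>>>>>>         if dupes:
>>>>>>>>>             return dupes[0]
>>>>>>>>>         return jira.create_issue(
>>>>>>>>>             project="BEAM",
>>>>>>>>>             issuetype={"name": "Bug"},
>>>>>>>>>             components=[{"name": "test-failures"}],
>>>>>>>>>             fixVersions=[{"name": next_version}],  # needs the smarts
>>>>>>>>>             summary="Flaky test: %s" % test_name,
>>>>>>>>>             description="Filed automatically from Jenkins %s #%d."
>>>>>>>>>                         % (job, build))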
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> There are a number of tests in our system that are either flaky or
>>>>>>>>>> permanently red. I am suggesting adding most, if not all, of the
>>>>>>>>>> tests (style, unit, integration, etc.) to the release validation
>>>>>>>>>> step. In this way, we will add a regular cadence for ensuring
>>>>>>>>>> greenness and the absence of flaky tests in Beam.
>>>>>>>>>>
>>>>>>>>>> There are a number of ways of implementing this, but what I think
>>>>>>>>>> might work best is to set up a process that either manually or
>>>>>>>>>> automatically creates a JIRA for each failing test and assigns it to
>>>>>>>>>> a component tagged with the release number. The release can then
>>>>>>>>>> continue when all JIRAs are closed by either fixing the failure or
>>>>>>>>>> manually testing to ensure no adverse side effects (in case there
>>>>>>>>>> are environmental issues in the testing infrastructure or
>>>>>>>>>> otherwise).
>>>>>>>>>>
>>>>>>>>>> Thanks for reading, what do you think?
>>>>>>>>>> - Is there another, easier way to ensure that no test failures go
>>>>>>>>>> unfixed?
>>>>>>>>>> - Can the process be automated?
>>>>>>>>>> - What am I missing?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>
>>>
>>
