Since you brought up the entirety of the process, I would suggest moving the release branch cut up, like so:

 - Decide to release
 - Create a new version in JIRA
 - Find a recent green commit (according to postcommit)
 - Create a release branch from that commit
 - Bump the version on master (green PR w/ parent at the green commit)
 - Triage release-blocking JIRAs
 - ...

Notes:

 - Choosing the postcommit signal to cut means we already have the signal and we aren't tempted to wait on master
 - Cutting before triage starts the stabilization process ASAP and gives a clear signal on the burndown

Kenn
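(To make the "find a recent green commit" step concrete: a minimal sketch against the Jenkins JSON API. This is an illustration, not existing tooling — the job name is a placeholder, and a real version would intersect the signal across all postcommit jobs rather than trusting a single one.)

    # green_commit.py - commit of the last green run of a postcommit job.
    import requests

    # Placeholder job; the real check would consult every postcommit job.
    JOB = "https://builds.apache.org/job/beam_PostCommit_Java"

    def last_green_commit(job_url=JOB):
        # lastSuccessfulBuild is a standard Jenkins API alias.
        build = requests.get(f"{job_url}/lastSuccessfulBuild/api/json").json()
        # The Git plugin records the built revision as a build "action".
        for action in build.get("actions", []):
            revision = action.get("lastBuiltRevision")
            if revision:
                return revision["SHA1"]
        return None

    if __name__ == "__main__":
        print(last_green_commit())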
On Tue, Jan 15, 2019 at 1:25 PM Sam Rohde <[email protected]> wrote:

> +Boyuan Zhang <[email protected]> who is modifying the rc validation script
>
> I'm thinking of a small change to the proposed process, brought to my attention by Boyuan.
>
> Instead of running the additional validation tests during RC validation, run the tests (and the proposed process) after the release branch has been cut. A couple of reasons why:
>
> - The additional validation tests (PostCommit and PreCommit) don't run against the RC and are instead run against the branch. This is confusing, considering the other tests in the RC validation step are per RC.
> - The additional validation tests are expensive.
>
> The final release process would look like:
>
> - Decide to release
> - Create a new version in JIRA
> - Triage release-blocking issues in JIRA
> - Review release notes in JIRA
> - Create a release branch
> - Verify that a release builds
> - >>> Verify that a release passes its tests <<< (this is where the new process would be added)
> - Build/test/fix RCs
> - >>> Fix any issues <<< (all JIRAs created during the new process will have to be closed by here)
> - Finalize the release
> - Promote the release
>
> On Thu, Jan 10, 2019 at 4:32 PM Kenneth Knowles <[email protected]> wrote:
>
>> What do you think about crowd-sourcing?
>>
>> 1. Fix Version = 2.10.0
>> 2. If assigned, ping the ticket and maybe the assignee; unassign if unresponsive
>> 3. If unassigned, assign it to yourself while thinking about it
>> 4. If you can route it a bit closer to someone who might know, great
>> 5. If it doesn't look like a blocker (after routing the best you can), Fix Version = 2.11.0
>>
>> I think this has enough mutexes that there should be no duplicated work if it is followed. And every step is a standard use of Fix Version and Assignee, so there's not really any special policy needed.
>>
>> Kenn
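(For illustration, a minimal sketch of that crowd-sourced pass against JIRA, assuming the Python jira client (pip install jira). It only surfaces the work list; steps 2-5 stay human judgment calls, so the unassign/route actions are deliberately left manual.)

    # triage_queue.py - list open issues on the release and who is on the hook.
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira")  # anonymous is fine for reads

    # Step 1: everything still open against the release being cut.
    blockers = jira.search_issues(
        "project = BEAM AND fixVersion = 2.10.0 AND resolution = Unresolved")

    for issue in blockers:
        assignee = issue.fields.assignee
        if assignee is None:
            # Step 3: unassigned - take it while you think about it.
            print(f"{issue.key}: unassigned, grab it")
        else:
            # Step 2: assigned - ping the ticket and maybe the assignee.
            print(f"{issue.key}: ping {assignee.displayName}")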
>> On Thu, Jan 10, 2019 at 4:25 PM Mikhail Gryzykhin <[email protected]> wrote:
>>
>>> +1
>>>
>>> Although we should be cautious when enabling this policy: we have a decent backlog of bugs that we need to plumb through.
>>>
>>> --Mikhail
>>>
>>> On Thu, Jan 10, 2019 at 11:44 AM Scott Wegner <[email protected]> wrote:
>>>
>>>> +1, this sounds good to me.
>>>>
>>>> I believe the next step would be to open a PR to add this to the release guide: https://github.com/apache/beam/blob/master/website/src/contribute/release-guide.md
>>>>
>>>> On Wed, Jan 9, 2019 at 12:04 PM Sam Rohde <[email protected]> wrote:
>>>>
>>>>> Cool, thanks for all of the replies. Does this summary sound reasonable?
>>>>>
>>>>> *Problem:* there are a number of failing tests (including flaky ones) that don't get looked at and aren't necessarily green upon cutting a new Beam release.
>>>>>
>>>>> *Proposed Solution:*
>>>>>
>>>>> - Add all tests to the release validation
>>>>> - For every failing test (including flaky ones), create a JIRA attached to the Beam release and add it to the "test-failures" component*
>>>>> - If a test is continuously failing:
>>>>>   - fix it
>>>>>   - add the fix to the release
>>>>>   - close out the JIRA
>>>>> - If a test is flaky:
>>>>>   - try to fix it
>>>>>   - if fixed:
>>>>>     - add the fix to the release
>>>>>     - close out the JIRA
>>>>>   - else:
>>>>>     - manually test it
>>>>>     - move "Fix Version" to the next release
>>>>> - The release validation can continue when all JIRAs are closed out.
>>>>>
>>>>> *Why this is an improvement:*
>>>>>
>>>>> - Ensures that every test is a valid signal (as opposed to disabling failing tests)
>>>>> - Creates an incentive to automate tests (no longer on the hook to manually test)
>>>>> - Creates a forcing function to fix flaky tests (once fixed, a test no longer needs manual testing)
>>>>> - Ensures that every failing test gets looked at
>>>>>
>>>>> *Why this may not be an improvement:*
>>>>>
>>>>> - More effort for release validation
>>>>> - May slow down release velocity
>>>>>
>>>>> * for brevity, it might be better to create one JIRA per component containing a summary of its failing tests
>>>>>
>>>>> -Sam
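(The "release validation can continue when all JIRAs are closed out" gate above is mechanically checkable. A minimal sketch, assuming the Python jira client and the component/version names used in this thread:)

    # release_gate.py - the release proceeds only when the burndown hits zero.
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira")

    def release_can_proceed(version="2.10.0"):
        # Any unresolved test-failure JIRA on this release blocks validation.
        open_failures = jira.search_issues(
            f"project = BEAM AND component = test-failures "
            f"AND fixVersion = {version} AND resolution = Unresolved")
        for issue in open_failures:
            print(f"still blocking: {issue.key} {issue.fields.summary}")
        return len(open_failures) == 0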
>>>>> On Tue, Jan 8, 2019 at 10:25 AM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]> wrote:
>>>>>>
>>>>>>> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>>>>>>>
>>>>>>>> For reference, there are currently 34 unresolved JIRA issues under the test-failures component [1].
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>>>
>>>>>>> And there are 19 labeled with flake or sickbay: https://issues.apache.org/jira/issues/?filter=12343195
>>>>>>>
>>>>>>>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> This is a good idea. Some suggestions:
>>>>>>>>> - It would be nicer if we could figure out a process to act on flaky tests more frequently than releases.
>>>>>>>
>>>>>>> Any ideas? We could just pick some cadence and try to establish the practice of having a deflake thread every couple of weeks. How about we add it to release verification as a first step and then continue to discuss?
>>>>>>
>>>>>> Sounds great. I do not know JIRA well enough, but I am hoping that a solution can come in the form of tooling. If we could configure JIRA with SLOs per issue type, we could have customized reports on which issues are not getting enough attention, and then load-balance among us.
>>>>>>
>>>>>>>>> - Another improvement to the process would be having actual owners of issues rather than auto-assigned component owners. A few folks have 100+ assigned issues. Unassigning those issues and finding owners who have time to work on identified flaky tests would be helpful.
>>>>>>>
>>>>>>> Yikes. Two issues here:
>>>>>>>
>>>>>>> - sounds like Jira component owners aren't really working for us as a first point of contact for triage
>>>>>>> - a person shouldn't really have more than 5 Jiras assigned, or if you get really loose, maybe 20 (I am guilty of having 30 at this moment...)
>>>>>>>
>>>>>>> Maybe this is one or two separate threads?
>>>>>>
>>>>>> I can fork this to another thread. I think the two issues are related, because component owners are more likely to be in this situation. I agree with the assessment that there are two issues.
>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I love this idea. It can easily feel like bugs filed for Jenkins flakes/failures just get lost if there is no process for looking them over regularly.
>>>>>>>>>>
>>>>>>>>>> I would suggest that test failures/flakes all get filed with Fix Version = whatever release is next. Then at release time we can triage the list, making sure none might be a symptom of something that should block the release. One modification to your proposal: after manual verification that it is safe to release, I would move Fix Version to the next release instead of closing, unless the issue really is fixed or otherwise not reproducible.
>>>>>>>>>>
>>>>>>>>>> For automation, I wonder if there's something automatic already available somewhere that would:
>>>>>>>>>>
>>>>>>>>>> - mark the Jenkins build "Keep This Build Forever"
>>>>>>>>>> - be *very* careful to try to find an existing bug, else it will be spam
>>>>>>>>>> - file bugs to the "test-failures" component
>>>>>>>>>> - set Fix Version to the "next" release - right now we have 2.7.1 (LTS), 2.11.0 (next mainline), and 3.0.0 (dreamy incompatible ideas), so it needs the smarts to choose 2.11.0
>>>>>>>>>>
>>>>>>>>>> If not, I think doing this stuff manually is not that bad, assuming we can stay fairly green.
>>>>>>>>>>
>>>>>>>>>> Kenn
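(If nothing ready-made exists, a minimal sketch of such a filer, assuming the Python jira client plus the Jenkins HTTP API. The dedupe JQL, job URL, and auth handling are placeholders; in practice both the JIRA write and the "Keep This Build Forever" toggle need credentials.)

    # file_flake.py - dedupe, preserve the evidence, then file the bug.
    import requests
    from jira import JIRA

    jira = JIRA(server="https://issues.apache.org/jira")  # needs auth for writes

    def file_test_failure(test_name, job_url, build_number, next_version="2.11.0"):
        summary = f"Failure of {test_name}"
        # Be *very* careful to find an existing bug first, else it is spam.
        existing = jira.search_issues(
            f'project = BEAM AND component = test-failures '
            f'AND resolution = Unresolved AND summary ~ "{summary}"')
        if existing:
            return existing[0]
        # Keep the failing build around ("Keep This Build Forever").
        requests.post(f"{job_url}/{build_number}/toggleLogKeep")
        # File against test-failures with Fix Version = next mainline release.
        return jira.create_issue(
            project="BEAM",
            summary=summary,
            description=f"Seen at {job_url}/{build_number}",
            issuetype={"name": "Bug"},
            components=[{"name": "test-failures"}],
            fixVersions=[{"name": next_version}])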
>>>>>>>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> There are a number of tests in our system that are either flaky or permanently red. I am suggesting that we add most, if not all, of the tests (style, unit, integration, etc.) to the release validation step. In this way, we add a regular cadence for ensuring greenness and an absence of flaky tests in Beam.
>>>>>>>>>>>
>>>>>>>>>>> There are a number of ways of implementing this, but what I think might work best is to set up a process that, either manually or automatically, creates a JIRA for each failing test and assigns it to a component tagged with the release number. The release can then continue when all JIRAs are closed, either by fixing the failure or by manually testing to ensure no adverse side effects (in case there are environmental issues in the testing infrastructure or otherwise).
>>>>>>>>>>>
>>>>>>>>>>> Thanks for reading, what do you think?
>>>>>>>>>>>
>>>>>>>>>>> - Is there another, easier way to ensure that no test failures go unfixed?
>>>>>>>>>>> - Can the process be automated?
>>>>>>>>>>> - What am I missing?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Sam
