+1

Although we should be cautious when enabling this policy: we have a decent
backlog of bugs that we need to plumb through.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Jan 10, 2019 at 11:44 AM Scott Wegner <[email protected]> wrote:

> +1, this sounds good to me.
>
> I believe the next step would be to open a PR to add this to the release
> guide:
> https://github.com/apache/beam/blob/master/website/src/contribute/release-guide.md
>
> On Wed, Jan 9, 2019 at 12:04 PM Sam Rohde <[email protected]> wrote:
>
>> Cool, thanks for all of the replies. Does this summary sound reasonable?
>>
>> *Problem:* there are a number of failing tests (including flaky) that
>> don't get looked at, and aren't necessarily green upon cutting a new Beam
>> release.
>>
>> *Proposed Solution:*
>>
>>    - Add all tests to the release validation
>>    - For all failing tests (including flaky) create a JIRA attached to
>>    the Beam release and add to the "test-failures" component*
>>    - If a test is continuously failing
>>       - fix it
>>       - add fix to release
>>       - close out JIRA
>>    - If a test is flaky
>>       - try and fix it
>>       - If fixed
>>          - add fix to release
>>          - close out JIRA
>>       - else
>>          - manually test it
>>          - modify "Fix Version" to next release
>>    - The release validation can continue when all JIRAs are closed
>>    out.
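
The decision flow in the list above could be sketched roughly as follows. This is only an illustration in Python: the names `FailingTest`, `Outcome`, `triage`, and `validation_can_continue` are hypothetical, and the `BLOCKED` outcome is an assumption for a continuously failing test that has not been fixed yet (the summary only says "fix it").

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CLOSE_JIRA = "add fix to release, close out JIRA"
    DEFER = "manually test, move Fix Version to next release"
    # Assumption: an unfixed, continuously failing test holds up validation.
    BLOCKED = "still needs a fix before validation can continue"


@dataclass
class FailingTest:
    name: str
    flaky: bool  # False means the test fails continuously
    fixed: bool  # True once a fix has landed


def triage(test: FailingTest) -> Outcome:
    """Map one failing test to the step the summary proposes for it."""
    if test.fixed:
        return Outcome.CLOSE_JIRA
    if test.flaky:
        return Outcome.DEFER
    return Outcome.BLOCKED


def validation_can_continue(tests) -> bool:
    """Validation continues once no test is left in the BLOCKED state."""
    return all(triage(t) is not Outcome.BLOCKED for t in tests)
```

Under these assumptions, a flaky test that could not be fixed does not block the release by itself; it is manually tested and carried over to the next release.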
>>
>> *Why this is an improvement:*
>>
>>    - Ensures that every test is a valid signal (as opposed to disabling
>>    failing tests)
>>    - Creates an incentive to automate tests (no longer on the hook to
>>    manually test)
>>    - Creates a forcing-function to fix flaky tests (once fixed, no
>>    longer needs to be manually tested)
>>    - Ensures that every failing test gets looked at
>>
>> *Why this may not be an improvement:*
>>
>>    - More effort for release validation
>>    - May slow down release velocity
>>
>> * For brevity, it might be better to create one JIRA per component
>> containing a summary of failing tests
>>
>>
>> -Sam
>>
>>
>>
>>
>> On Tue, Jan 8, 2019 at 10:25 AM Ahmet Altay <[email protected]> wrote:
>>
>>>
>>>
>>> On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>>>>
>>>>> For reference, there are currently 34 unresolved JIRA issues under the
>>>>> test-failures component [1].
>>>>>
>>>>> [1]
>>>>> https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
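
For readers who want to see the query behind the link above, the URL-encoded `jql` parameter can be decoded with the Python standard library. This is a small sketch; the URL is copied from the message and only split across lines for readability.

```python
from urllib.parse import parse_qs, urlsplit

# URL from the message above, split across lines only for readability.
url = (
    "https://issues.apache.org/jira/browse/BEAM-6280"
    "?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved"
    "%20AND%20component%20%3D%20test-failures"
    "%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC"
)

# parse_qs percent-decodes the values of the query string.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)
```

This prints `project = BEAM AND resolution = Unresolved AND component = test-failures ORDER BY priority DESC, updated DESC`.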
>>>>>
>>>>
>>>> And there are 19 labeled with flake or sickbay:
>>>> https://issues.apache.org/jira/issues/?filter=12343195
>>>>
>>>>
>>>>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> This is a good idea. Some suggestions:
>>>>>> - It would be nicer if we could figure out a process to act on flaky
>>>>>> tests more frequently than releases.
>>>>>>
>>>>>
>>>> Any ideas? We could just have some cadence and try to establish the
>>>> practice of having a deflake thread every couple of weeks? How about we add
>>>> it to release verification as a first step and then continue to discuss?
>>>>
>>>
>>> Sounds great. I do not know enough JIRA, but I am hoping that a solution
>>> can come in the form of tooling. If we could configure JIRA with SLOs per
>>> issue type, we could have customized reports on which issues are not
>>> getting enough attention and then do a load balance among us.
>>>
>>>
>>>>
>>>>>> - Another improvement in the process would be having actual owners of
>>>>>> issues rather than auto assigned component owners. A few folks have 100+
>>>>>> assigned issues. Unassigning those issues, and finding owners who would
>>>>>> have time to work on identified flaky tests would be helpful.
>>>>>>
>>>>>
>>>> Yikes. Two issues here:
>>>>
>>>>  - sounds like Jira component owners aren't really working for us as a
>>>> first point of contact for triage
>>>>  - a person shouldn't really have more than 5 Jira assigned, or if you
>>>> get really loose maybe 20 (I am guilty of having 30 at this moment...)
>>>>
>>>> Maybe this is one or two separate threads?
>>>>
>>>
>>> I can fork this to another thread. I think both issues are related
>>> because component owners are more likely to be in this situation. I agree
>>> with the assessment of two issues.
>>>
>>>
>>>>
>>>> Kenn
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I love this idea. It can easily feel like bugs filed for Jenkins
>>>>>>> flakes/failures just get lost if there is no process for looking them
>>>>>>> over regularly.
>>>>>>>
>>>>>>> I would suggest that test failures / flakes all get filed with Fix
>>>>>>> Version = whatever release is next. Then at release time we can triage
>>>>>>> the list, making sure none might be a symptom of something that should
>>>>>>> block the release. One modification to your proposal is that after
>>>>>>> manual verification that it is safe to release I would move Fix Version
>>>>>>> to the next release instead of closing, unless the issue really is fixed
>>>>>>> or otherwise not reproducible.
>>>>>>>
>>>>>>> For automation, I wonder if there's something automatic already
>>>>>>> available somewhere that would:
>>>>>>>
>>>>>>>  - mark the Jenkins build to "Keep This Build Forever"
>>>>>>>  - be *very* careful to try to find an existing bug, else it will be
>>>>>>> spam
>>>>>>>  - file bugs to "test-failures" component
>>>>>>>  - set Fix Version to the "next" - right now we have 2.7.1 (LTS),
>>>>>>> 2.11.0 (next mainline), 3.0.0 (dreamy incompatible ideas) so need the
>>>>>>> smarts to choose 2.11.0
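
One possible heuristic for the "smarts" mentioned in the last bullet, as a sketch only: among the unreleased versions, skip patch releases (such as the LTS 2.7.1) and other major lines (such as 3.0.0), then take the lowest remaining version. The function name `next_mainline` and the `current_major` parameter are hypothetical, and the rule itself is an assumption rather than an agreed Beam policy.

```python
def next_mainline(unreleased, current_major):
    """Pick the lowest unreleased X.Y.0 version within the current major line."""
    candidates = []
    for version in unreleased:
        major, minor, patch = (int(part) for part in version.split("."))
        # Patch releases (LTS fixes) and future major lines are skipped.
        if major == current_major and patch == 0:
            candidates.append((major, minor, patch))
    if not candidates:
        raise ValueError("no unreleased mainline version found")
    # Tuples compare numerically, so 2.11.0 sorts after 2.9.0 as expected.
    return ".".join(str(part) for part in min(candidates))
```

With the versions listed above, `next_mainline(["2.7.1", "2.11.0", "3.0.0"], current_major=2)` returns `"2.11.0"`.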
>>>>>>>
>>>>>>> If not, I think doing this stuff manually is not that bad, assuming
>>>>>>> we can stay fairly green.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> There are a number of tests in our system that are either flaky or
>>>>>>>> permanently red. I suggest adding most, if not all, of the tests
>>>>>>>> (style, unit, integration, etc.) to the release validation step. In
>>>>>>>> this way, we will add a regular cadence for ensuring greenness and no
>>>>>>>> flaky tests in Beam.
>>>>>>>>
>>>>>>>> There are a number of ways of implementing this, but what I think
>>>>>>>> might work the best is to set up a process that either manually or
>>>>>>>> automatically creates a JIRA for the failing test and assigns it to a
>>>>>>>> component tagged with the release number. The release can then continue
>>>>>>>> when all JIRAs are closed by either fixing the failure or manually
>>>>>>>> testing to ensure no adverse side effects (this is in case there are
>>>>>>>> environmental issues in the testing infrastructure or otherwise).
>>>>>>>>
>>>>>>>> Thanks for reading, what do you think?
>>>>>>>> - Is there another, easier way to ensure that no test failures go
>>>>>>>> unfixed?
>>>>>>>> - Can the process be automated?
>>>>>>>> - What am I missing?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Got feedback? tinyurl.com/swegner-feedback
>>>>>
>>>>
>
