On Tue, Jan 8, 2019 at 8:25 AM Kenneth Knowles <[email protected]> wrote:

>
>
> On Tue, Jan 8, 2019 at 7:52 AM Scott Wegner <[email protected]> wrote:
>
>> For reference, there are currently 34 unresolved JIRA issues under the
>> test-failures component [1].
>>
>> [1]
>> https://issues.apache.org/jira/browse/BEAM-6280?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>
>
> And there are 19 labeled with flake or sickbay:
> https://issues.apache.org/jira/issues/?filter=12343195
>
>
>> On Mon, Jan 7, 2019 at 4:03 PM Ahmet Altay <[email protected]> wrote:
>>
>>> This is a good idea. Some suggestions:
>>> - It would be nicer if we could figure out a process to act on flaky
>>> tests more frequently than releases.
>>>
>>
> Any ideas? We could just have some cadence and try to establish the
> practice of having a deflake thread every couple of weeks? How about we add
> it to release verification as a first step and then continue to discuss?
>

Sounds great. I do not know JIRA well enough, but I am hoping that a
solution can come in the form of tooling. If we could configure JIRA with
SLOs per issue type, we could have customized reports on which issues are
not getting enough attention and then load-balance them among us.
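
To make the tooling idea a bit more concrete, here is a rough sketch of the
kind of report I have in mind. It assumes the open-source "jira" Python
client and an illustrative 14-day window, so treat the names and numbers as
placeholders rather than a working tool:

    # Rough sketch only: assumes the open-source "jira" Python client
    # (pip install jira) and an illustrative 14-day SLO per issue.
    from jira import JIRA

    client = JIRA(server="https://issues.apache.org/jira")

    # Flaky / sickbayed / test-failure issues that nobody has touched within
    # the SLO window (this extends the test-failures filter linked above).
    jql = (
        "project = BEAM AND resolution = Unresolved "
        "AND (labels in (flake, sickbay) OR component = test-failures) "
        "AND updated <= -14d "  # the illustrative 14-day SLO window
        "ORDER BY priority DESC"
    )
    stale = client.search_issues(jql, maxResults=200)

    # Group by assignee so we can see who is overloaded and rebalance.
    by_assignee = {}
    for issue in stale:
        name = getattr(issue.fields.assignee, "displayName", "Unassigned")
        by_assignee.setdefault(name, []).append(issue.key)

    for name, keys in sorted(by_assignee.items(), key=lambda kv: -len(kv[1])):
        print("%s: %d stale issues (%s)" % (name, len(keys), ", ".join(keys)))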


>
>>> - Another improvement in the process would be having actual owners of
>>> issues rather than auto-assigned component owners. A few folks have 100+
>>> assigned issues. Unassigning those issues and finding owners who would
>>> have time to work on the identified flaky tests would be helpful.
>>>
>>
> Yikes. Two issues here:
>
>  - sounds like Jira component owners aren't really working for us as a
> first point of contact for triage
>  - a person shouldn't really have more than 5 Jira issues assigned, or, if
> you are being really loose, maybe 20 (I am guilty of having 30 at this moment...)
>
> Maybe this is one or two separate threads?
>

I can fork this off to another thread. I think both issues are related,
because component owners are more likely to end up in this situation. I
agree with the assessment that these are two separate issues.


>
> Kenn
>
>
>>
>>>
>>> On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <[email protected]> wrote:
>>>
>>>> I love this idea. It can easily feel like bugs filed for Jenkins
>>>> flakes/failures just get lost if there is no process for looking them over
>>>> regularly.
>>>>
>>>> I would suggest that test failures / flakes all get filed with Fix
>>>> Version = whatever release is next. Then at release time we can triage the
>>>> list, making sure none of them is a symptom of something that should block
>>>> the release. One modification to your proposal: after manually verifying
>>>> that it is safe to release, I would move the Fix Version to the next
>>>> release instead of closing the issue, unless it really is fixed or
>>>> otherwise not reproducible.
>>>>
>>>> For automation, I wonder if there's something already available somewhere
>>>> that would:
>>>>
>>>>  - mark the Jenkins build as "Keep This Build Forever"
>>>>  - be *very* careful to find an existing bug first, otherwise it will
>>>> generate spam
>>>>  - file bugs to the "test-failures" component
>>>>  - set Fix Version to the "next" release - right now we have 2.7.1 (LTS),
>>>> 2.11.0 (next mainline), and 3.0.0 (dreamy incompatible ideas), so it needs
>>>> the smarts to choose 2.11.0
>>>>
>>>> If not, I think doing this stuff manually is not that bad, assuming we
>>>> can stay fairly green.
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <[email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> There are a number of tests in our system that are either flaky or
>>>>> permanently red. I am suggesting that we add most, if not all, of the
>>>>> tests (style, unit, integration, etc.) to the release validation step.
>>>>> This would add a regular cadence for ensuring greenness and eliminating
>>>>> flaky tests in Beam.
>>>>>
>>>>> There are a number of ways to implement this, but what I think might
>>>>> work best is to set up a process that either manually or automatically
>>>>> creates a JIRA issue for each failing test, assigns it to a component,
>>>>> and tags it with the release number. The release can then continue once
>>>>> all of those JIRAs are closed, either by fixing the failure or by manually
>>>>> testing to ensure there are no adverse side effects (in case the failure
>>>>> comes from environmental issues in the testing infrastructure or elsewhere).
>>>>>
>>>>> Thanks for reading. What do you think?
>>>>> - Is there another, easier way to ensure that no test failures go
>>>>> unfixed?
>>>>> - Can the process be automated?
>>>>> - What am I missing?
>>>>>
>>>>> Regards,
>>>>> Sam
>>>>>
>>>>>
>>
>> --
>> Got feedback? tinyurl.com/swegner-feedback
>>
>
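
One more thought on Kenn's automation wish list above: the JIRA half of it
(look hard for an existing issue first, then file into the test-failures
component with the next Fix Version) does not look too hard to prototype.
Something along these lines, again assuming the "jira" Python client; the
duplicate check, field values, and helper name are illustrative only:

    # Illustrative sketch of the "file a bug unless one already exists" step
    # from the wish list above. Assumes the open-source "jira" Python client
    # and an authenticated connection; nothing here has been tested.
    from jira import JIRA

    def file_test_failure(client, test_name, build_url, next_version="2.11.0"):
        # Be *very* careful about duplicates: look for an open issue that
        # already mentions this test before filing anything new.
        existing = client.search_issues(
            'project = BEAM AND component = test-failures '
            'AND resolution = Unresolved AND summary ~ "%s"' % test_name,
            maxResults=1,
        )
        if existing:
            # Just record the new failure on the existing issue.
            client.add_comment(existing[0], "Seen again in %s" % build_url)
            return existing[0]

        return client.create_issue(
            project="BEAM",
            issuetype={"name": "Bug"},
            summary="Flaky/failing test: %s" % test_name,
            description="Observed failing in %s" % build_url,
            components=[{"name": "test-failures"}],
            # Picking the "next mainline" version automatically needs the
            # smarts Kenn mentions (skip 2.7.1 LTS and 3.0.0).
            fixVersions=[{"name": next_version}],
        )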
