On 10/22/2014 06:07 AM, Thierry Carrez wrote:
Ihar Hrachyshka wrote:
[...]
For stable branches, we have so-called periodic jobs that are
triggered once in a while against the current code in a stable branch,
and report to the openstack-stable-maint@ mailing list. An example of
a failing periodic job report can be found at [2]. I envision that a
similar approach can be applied to test auxiliary features in the gate.
So once something is broken in master, the interested parties behind
the auxiliary feature will be informed in due time.
[...]
The main issue with periodic jobs is that since they are non-blocking,
they can get ignored really easily. It takes a bit of organization and
process to get those failures addressed.
It's only recently (and a lot thanks to you) that failures in the
periodic jobs for stable branches are being taken into account quickly
and seriously. For years the failures just lingered until they blocked
someone's work enough for that person to go and fix them.
So while I think periodic jobs are a good way to increase corner case
testing coverage, I am skeptical of our collective ability to have the
discipline necessary for them not to become a pain. We'll need a strict
process around them: identified groups of people signed up to act on
failure, and failure stats so that we can remove jobs that don't get
enough attention.
While I share some of your skepticism, we have to find a way to make
this work.
Claiming that we are doing our best to ensure the quality of upstream
OpenStack with a single tier of testing (the gate) that is limited to
40-minute runs is not plausible. Of course a lot more testing happens
downstream, but we can do better as a community. I think we should
rephrase this subject as
"non-gating" jobs. We could have various kinds of stress and longevity
jobs running to good effect if we can solve this process problem.
Following up on your process suggestion, in practice the most likely way
this could actually work is to have a rotation of "build guardians" that
agree to keep an eye on jobs for a short period of time. There would
need to be a separate rotation list for each project that has
non-gating, project-specific jobs. This will likely happen as we move
towards deeper functional testing in projects. The QA team would be the
logical pool for a rotation of more global jobs of the kind I think Ihar
was referring to.
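To make the rotation idea concrete, here is a rough sketch of how a
per-project guardian schedule could be generated. Everything in it is
an assumption for illustration: the function name, the one-week shift
length, and the guardian names are hypothetical, not an existing tool.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def guardian_schedule(guardians, start, shifts, shift_days=7):
    """Assign guardians round-robin to consecutive shifts.

    guardians:  list of names signed up for one project's rotation
                (hypothetical data)
    start:      date on which the first shift begins
    shifts:     number of shifts to schedule
    shift_days: length of one shift in days (assumed: one week)
    Returns a list of (shift_start_date, guardian) tuples.
    """
    people = islice(cycle(guardians), shifts)
    return [
        (start + timedelta(days=i * shift_days), person)
        for i, person in enumerate(people)
    ]


# One rotation list per project that owns non-gating jobs.
schedule = guardian_schedule(["alice", "bob"], date(2014, 11, 3), 3)
for shift_start, person in schedule:
    print(shift_start.isoformat(), person)
```

A simple round-robin keeps each shift short, which matches the "short
period of time" constraint; swaps and vacations would need handling on
top of this.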
As for failure stats, each of these non-gating jobs would have its
own name, so logstash could be used to debug failures. Do we already have
anything that tracks failure rates of jobs?
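If nothing exists, the bookkeeping itself is small. The sketch below
is only an illustration of per-job failure rates computed from a list
of run results. The input format, the job names, and the 50% threshold
are all assumptions; it does not query logstash or any real service.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return {job_name: fraction_of_failed_runs}.

    runs is an iterable of (job_name, succeeded) pairs; this input
    format is an assumption made for the example.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job_name, succeeded in runs:
        totals[job_name] += 1
        if not succeeded:
            failures[job_name] += 1
    return {job: failures[job] / totals[job] for job in totals}


def jobs_needing_attention(runs, threshold=0.5):
    """List jobs whose failure rate is at or above the threshold."""
    rates = failure_rates(runs)
    return sorted(job for job, rate in rates.items() if rate >= threshold)


# Hypothetical results for two non-gating jobs.
sample_runs = [
    ("periodic-stress", True),
    ("periodic-stress", False),
    ("periodic-stress", False),
    ("periodic-longevity", True),
    ("periodic-longevity", True),
]

print(jobs_needing_attention(sample_runs))
```

Jobs that stay above the threshold with nobody acting on them would be
the candidates for removal that Thierry describes.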
-David
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev