> -----Original Message-----
> From: Anita Kuno [mailto:ante...@anteaya.info]
> Sent: 01 July 2014 14:42
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success" exactly?
>
> On 06/30/2014 09:13 PM, Jay Pipes wrote:
> > On 06/30/2014 07:08 PM, Anita Kuno wrote:
> >> On 06/30/2014 04:22 PM, Jay Pipes wrote:
> >>> Hi Stackers,
> >>>
> >>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up some legitimate questions around how a newly-proposed Stackalytics report page for Neutron External CI systems [3] represented the results of an external CI system as "successful" or not.
> >>>
> >>> First, I want to say that Ilya and all those involved in the Stackalytics program simply want to provide the most accurate information to developers in a format that is easily consumed. While there need to be some changes in how data is shown (and the wording of things like "Tests Succeeded"), I hope that the community knows there isn't any ill intent on the part of Mirantis or anyone who works on Stackalytics. OK, so let's keep the conversation civil -- we're all working towards the same goals of transparency and accuracy. :)
> >>>
> >>> Alright, now, Anita and Kurt Taylor were asking a very poignant question:
> >>>
> >>> "But what does CI tested really mean? Just running tests? Or tested to pass some level of requirements?"
> >>>
> >>> In this nascent world of external CI systems, we have a set of issues that we need to resolve:
> >>>
> >>> 1) All of the CI systems are different.
> >>>
> >>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate scripts. Others run custom Python code that spawns VMs and publishes logs to some public domain.
> >>>
> >>> As a community, we need to decide whether it is worth putting in the effort to create a single, unified, installable and runnable CI system, so that we can legitimately say "all of the external systems are identical, with the exception of the driver code for vendor X being substituted in the Neutron codebase."
> >>>
> >>> If the goal of the external CI systems is to produce reliable, consistent results, I feel the answer to the above is "yes", but I'm interested to hear what others think. Frankly, in the world of benchmarks, it would be unthinkable to say "go ahead and everyone run your own benchmark suite", because you would get wildly different results. A similar problem has emerged here.
> >>>
> >>> 2) There is no mediation or verification that the external CI system is actually testing anything at all.
> >>>
> >>> As a community, we need to decide whether the current system of self-policing should continue. If it should, then language on reports like [3] should be very clear that any numbers derived from such systems should be taken with a grain of salt. Use of the word "Success" should be avoided, as it has connotations (in English, at least) that the result has been verified, which is simply not the case as long as no verification or mediation occurs for any external CI system.
> >>>
> >>> 3) There is no clear indication of what tests are being run, and therefore there is no clear indication of what "success" is.
> >>>
> >>> I think we can all agree that a test has three possible outcomes: pass, fail, and skip.
> >>> The results of a test suite run are therefore nothing more than the aggregation of which tests passed, which failed, and which were skipped.
> >>>
> >>> As a community, we must document, for each project, the expected set of tests that must be run for each patch merged into the project's source tree. This documentation should be discoverable so that reports like [3] can be crystal-clear on what the data shown actually means. The report is simply displaying the data it receives from Gerrit. The community needs to be proactive in saying "this is what is expected to be tested." This alone would allow the report to give information such as "External CI system ABC performed the expected tests. X tests passed. Y tests failed. Z tests were skipped." Likewise, it would also make it possible for the report to give information such as "External CI system DEF did not perform the expected tests.", which is excellent information in and of itself.
> >>>
> >>> ===
> >>>
> >>> In thinking about the likely answers to the above questions, I believe it would be prudent to change the Stackalytics report in question [3] in the following ways:
> >>>
> >>> a. Change the "Success %" column header to "% Reported +1 Votes"
> >>> b. Change the phrase "Green cell - tests ran successfully, red cell - tests failed" to "Green cell - System voted +1, red cell - System voted -1"
> >>>
> >>> and then, when we have more and better data (for example, # tests passed, failed, skipped, etc.), we can provide more detailed information than just "reported +1" or not.
> >>>
> >>> Thoughts?
> >>>
> >>> Best,
> >>> -jay
> >>>
> >>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> >>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> >>> [3] http://stackalytics.com/report/ci/neutron/7
> >>
> >> Hi Jay:
> >>
> >> Thanks for starting this thread. You raise some interesting questions.
> >>
> >> The question I had identified as needing definition is "what algorithm do we use to assess fitness of a third-party CI system?"
> >>
> >> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log
> >> timestamp 2014-06-30T19:23:40
> >>
> >> This is the question that is top of mind for me.
> >
> > Right, my email above is written to say "unless there is a) uniformity of the external CI systems, b) agreement on mediation or verification of said systems, and c) agreement on what tests shall be expected to pass and be skipped for each project, then no such algorithm is really possible."
> >
> > Now, if the community is willing to agree to a), b), and c), then certainly there is the ability to determine the fitness of a CI system -- at least in regards to its output (test results and the voting on the Gerrit system).
> >
> > Barring agreement on any or all of those three things, I recommended changing the language on the report due to the inability to have any consistently-applied algorithm to determine fitness.
> >
> > Best,
> > -jay

+1 to all of your points above, Jay. Well-written, thank you.
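To make point 3 concrete, here is the sort of summary I would hope a report could derive once each project documents its expected tests. This is only a sketch under that assumption -- the test names and the helper below are invented for illustration, not anything Stackalytics or infra actually runs:

    # Rough sketch: given the set of tests a project documents as required and
    # the results an external CI run reported, summarise pass/fail/skip and
    # whether the expected tests were run at all.

    EXPECTED_TESTS = {  # would come from the per-project documentation Jay describes
        "tempest.api.network.test_networks",
        "tempest.api.network.test_ports",
        "tempest.scenario.test_network_basic_ops",
    }

    def summarize_run(results):
        """results: mapping of test name -> 'pass' | 'fail' | 'skip'."""
        counts = {"pass": 0, "fail": 0, "skip": 0}
        for outcome in results.values():
            counts[outcome] += 1
        missing = sorted(EXPECTED_TESTS - set(results))
        return {
            "ran_expected_tests": not missing,
            "passed": counts["pass"],
            "failed": counts["fail"],
            "skipped": counts["skip"],
            "missing": missing,
        }

    # Example: a system that ran only two of the three expected tests.
    print(summarize_run({
        "tempest.api.network.test_networks": "pass",
        "tempest.api.network.test_ports": "skip",
    }))

The "% Reported +1 Votes" relabelling in (a) is then just arithmetic over the votes the system actually cast -- +1 votes divided by total votes -- with no claim that anything has been verified.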
> I've been mulling this over and looking at how I assess feedback I get from different human reviewers, since I don't know the basis of how they arrive at their decisions unless they tell me and/or I have experience with their criteria for how they review my patches.
>
> I get different value from different human reviewers based upon my experience of them reviewing my patches, my experience of them reviewing other people's patches, my experience reviewing their code, and my discussions with them in channel, on the mailing list and in person, as well as my experience reading or becoming aware of other decisions they make.
>
> It would be really valuable for me personally to have a page in Gerrit for each third-party CI account, where I could sign in and leave comments or vote +/-1 or 0 as a way of giving feedback to the maintainers of that system. Others could do the same, and I could read their feedback. For instance, yesterday someone linked me to logs that forced me to download them to read. I hadn't been made aware this account had been doing this, but this developer was aware. Currently we have no system for a developer, in the course of their normal workflow, to leave a comment and/or vote on a third-party CI system to give those maintainers feedback about how they are doing at providing consumable artifacts from their system.
>
> It also would remove the perception that I'm just a big meany, since developers could comment for themselves, directly on the account, how they feel about having to download tarballs or sign into other systems to trigger a recheck. The community of developers would say how fit a system is or isn't, since they are the individuals having to dig through logs and evaluate "did this build fail because the code needs adjustment" or not, and they can reflect their findings in a comment and vote on the system.
>
> The other thing I really value about Gerrit is that votes can change; systems can improve, given motivation and accurate feedback for making changes.
>
> I have no idea how hard this would be to create, but I think having direct feedback from developers on systems would help both the developers and the maintainers of CI systems.
>
> There are a number of people working really hard to do a good job in this area. This sort of structure would also provide support and encouragement to those providing leadership in this space: asking good questions, helping other system maintainers, starting discussions, offering patches to infra (and reviewing infra patches) in accordance with the goals of the third-party meeting [0], and making other hard-to-measure decisions that provide value for the community. I'd really like a way we all can demonstrate the extent to which we value these contributions.
>
> So far, those are my thoughts.
>
> Thanks,
> Anita.

+1 - this sounds like a really good idea. How is feedback on the OpenStack check/gate retrieved and moderated? Can that provide a model for doing what you suggest here?
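On the retrieval side, at least, the raw material already seems to be in Gerrit itself: every comment and vote a CI account leaves is queryable through the standard /changes/ REST endpoint. A very rough sketch of a starting point, using the requests library -- the CI account name below is made up, and this is not an existing infra tool -- would be to pull the recent comments a given third-party CI account has left on Neutron changes, which is the data a per-account feedback page could be built around:

    # Rough sketch: list recent comments from a third-party CI account on
    # openstack/neutron changes via Gerrit's /changes/ REST endpoint.
    import json
    import requests

    GERRIT = "https://review.openstack.org"
    CI_ACCOUNT = "Example Vendor CI"  # hypothetical third-party CI account name

    def recent_ci_comments(account, limit=25):
        resp = requests.get(
            GERRIT + "/changes/",
            params={
                "q": 'reviewer:"%s" project:openstack/neutron' % account,
                "n": limit,
                "o": ["MESSAGES", "DETAILED_ACCOUNTS"],
            })
        resp.raise_for_status()
        # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; strip it.
        changes = json.loads(resp.text.split("\n", 1)[1])
        for change in changes:
            for message in change.get("messages", []):
                if message.get("author", {}).get("name") != account:
                    continue  # keep only the CI account's own comments
                print("%s  %s  %s" % (change["_number"], message["date"],
                                      message["message"][:80].replace("\n", " ")))

    recent_ci_comments(CI_ACCOUNT)

The harder part looks like the moderation side -- who gets to vote on an account and how those votes are weighed -- which is where lessons from the check/gate feedback process would be most useful.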
>
> [0] https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Party_meetings

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev