On Jul 3, 2014 8:57 AM, "Anita Kuno" <[email protected]> wrote:
>
> On 07/03/2014 06:22 AM, Sullivan, Jon Paul wrote:
> >> -----Original Message-----
> >> From: Anita Kuno [mailto:[email protected]]
> >> Sent: 01 July 2014 14:42
> >> To: [email protected]
> >> Subject: Re: [openstack-dev] [third-party-ci][neutron] What is "Success" exactly?
> >>
> >> On 06/30/2014 09:13 PM, Jay Pipes wrote:
> >>> On 06/30/2014 07:08 PM, Anita Kuno wrote:
> >>>> On 06/30/2014 04:22 PM, Jay Pipes wrote:
> >>>>> Hi Stackers,
> >>>>>
> >>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up some legitimate questions around how a newly-proposed Stackalytics report page for Neutron External CI systems [3] represented the results of an external CI system as "successful" or not.
> >>>>>
> >>>>> First, I want to say that Ilya and all those involved in the Stackalytics program simply want to provide the most accurate information to developers in a format that is easily consumed. While there need to be some changes in how data is shown (and the wording of things like "Tests Succeeded"), I hope that the community knows there isn't any ill intent on the part of Mirantis or anyone who works on Stackalytics. OK, so let's keep the conversation civil -- we're all working towards the same goals of transparency and accuracy. :)
> >>>>>
> >>>>> Alright, now, Anita and Kurt Taylor were asking a very poignant question:
> >>>>>
> >>>>> "But what does CI tested really mean? just running tests? or tested to pass some level of requirements?"
> >>>>>
> >>>>> In this nascent world of external CI systems, we have a set of issues that we need to resolve:
> >>>>>
> >>>>> 1) All of the CI systems are different.
> >>>>>
> >>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate scripts. Others run custom Python code that spawns VMs and publishes logs to some public domain.
> >>>>>
> >>>>> As a community, we need to decide whether it is worth putting in the effort to create a single, unified, installable and runnable CI system, so that we can legitimately say "all of the external systems are identical, with the exception of the driver code for vendor X being substituted in the Neutron codebase."
> >>>>>
> >>>>> If the goal of the external CI systems is to produce reliable, consistent results, I feel the answer to the above is "yes", but I'm interested to hear what others think. Frankly, in the world of benchmarks, it would be unthinkable to say "go ahead and everyone run your own benchmark suite", because you would get wildly different results. A similar problem has emerged here.
> >>>>>
> >>>>> 2) There is no mediation or verification that the external CI system is actually testing anything at all.
> >>>>>
> >>>>> As a community, we need to decide whether the current system of self-policing should continue. If it should, then language on reports like [3] should be very clear that any numbers derived from such systems should be taken with a grain of salt. Use of the word "Success" should be avoided, as it has connotations (in English, at least) that the result has been verified, which is simply not the case as long as no verification or mediation occurs for any external CI system.
> >>>>>
> >>>>> 3) There is no clear indication of what tests are being run, and therefore there is no clear indication of what "success" is.
> >>>>>
> >>>>> I think we can all agree that a test has three possible outcomes: pass, fail, and skip. The results of a test suite run are therefore nothing more than the aggregation of which tests passed, which failed, and which were skipped.
> >>>>>
> >>>>> As a community, we must document, for each project, the expected set of tests that must be run for each patch merged into the project's source tree. This documentation should be discoverable so that reports like [3] can be crystal-clear on what the data shown actually means. The report is simply displaying the data it receives from Gerrit. The community needs to be proactive in saying "this is what is expected to be tested." This alone would allow the report to give information such as "External CI system ABC performed the expected tests. X tests passed. Y tests failed. Z tests were skipped." Likewise, it would also make it possible for the report to give information such as "External CI system DEF did not perform the expected tests.", which is excellent information in and of itself.
> >>>>>
> >>>>> ===
> >>>>>
> >>>>> In thinking about the likely answers to the above questions, I believe it would be prudent to change the Stackalytics report in question [3] in the following ways:
> >>>>>
> >>>>> a. Change the "Success %" column header to "% Reported +1 Votes"
> >>>>> b. Change the phrase "Green cell - tests ran successfully, red cell - tests failed" to "Green cell - System voted +1, red cell - System voted -1"
> >>>>>
> >>>>> and then, when we have more and better data (for example, # tests passed, failed, skipped, etc.), we can provide more detailed information than just "reported +1" or not.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Best,
> >>>>> -jay
> >>>>>
> >>>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> >>>>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> >>>>> [3] http://stackalytics.com/report/ci/neutron/7
> >>>>
> >>>> Hi Jay:
> >>>>
> >>>> Thanks for starting this thread. You raise some interesting questions.
> >>>>
> >>>> The question I had identified as needing definition is "what algorithm do we use to assess fitness of a third party ci system".
> >>>>
> >>>> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2014-06-30.log
> >>>>
> >>>> timestamp 2014-06-30T19:23:40
> >>>>
> >>>> This is the question that is top of mind for me.
> >>>
> >>> Right, my email above is written to say "unless there is a) uniformity of the external CI system, b) agreement on mediation or verification of said systems, and c) agreement on what tests shall be expected to pass and be skipped for each project, then no such algorithm is really possible."
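
Chiming in inline here: to make (c) a bit more concrete, below is a rough sketch of what a per-project "expected tests" manifest plus a pass/fail/skip summary could look like. The manifest format, project key, and test names are all made up for illustration; this is not anything the projects define today.

# Rough sketch only: the manifest, project key, and test names below are
# invented for illustration, not anything the projects publish today.
from collections import Counter

# Hypothetical per-project manifest of the tests every CI run is expected to report.
EXPECTED = {
    "neutron": {
        "tempest.api.network.test_networks",  # hypothetical test IDs
        "tempest.api.network.test_ports",
    },
}

def summarize(project, results):
    """Summarize a CI run; results maps test ID -> 'pass' | 'fail' | 'skip'."""
    expected = EXPECTED[project]
    missing = expected - results.keys()
    counts = Counter(status for test, status in results.items() if test in expected)
    return {
        "ran_expected_tests": not missing,
        "passed": counts["pass"],
        "failed": counts["fail"],
        "skipped": counts["skip"],
        "missing": sorted(missing),
    }

# A report could then say "External CI system ABC performed the expected
# tests: 1 passed, 0 failed, 1 skipped" instead of just showing a +1.
print(summarize("neutron", {
    "tempest.api.network.test_networks": "pass",
    "tempest.api.network.test_ports": "skip",
}))

With something like that published per project, a report could state "performed the expected tests: X passed, Y failed, Z skipped" (or flag the missing ones) rather than collapsing everything into a +1 or -1.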
> >>>
> >>> Now, if the community is willing to agree to a), b), and c), then certainly there is the ability to determine the fitness of a CI system -- at least in regards to its output (test results and the voting on the Gerrit system).
> >>>
> >>> Barring agreement on any or all of those three things, I recommended changing the language on the report due to the inability to have any consistently-applied algorithm to determine fitness.
> >>>
> >>> Best,
> >>> -jay
> >
> > +1 to all of your points above, Jay. Well-written, thank you.
> >
> >> I've been mulling this over and looking at how I assess feedback I get from different human reviewers, since I don't know the basis of how they arrive at their decisions unless they tell me and/or I have experience with their criteria for how they review my patches.
> >>
> >> I get different value from different human reviewers based upon my experience of them reviewing my patches, my experience of them reviewing other people's patches, my experience reviewing their code, and my discussions with them in channel, on the mailing list and in person, as well as my experience reading or becoming aware of other decisions they make.
> >>
> >> It would be really valuable for me personally to have a page in gerrit for each third party ci account, where I could sign in and leave comments or vote +/-1 or 0 as a way of giving feedback to the maintainers of that system. Others could do the same and I could read their feedback. For instance, yesterday someone linked me to logs that I had to download before I could read them. I hadn't been made aware this account had been doing this, but this developer was aware. Currently we have no system for a developer, in the course of their normal workflow, to leave a comment and/or vote on a third party ci system to give those maintainers feedback about how they are doing at providing consumable artifacts from their system.
> >>
> >> It also would remove the perception that I'm just a big meany, since developers could comment for themselves, directly on the account, how they feel about having to download tarballs or sign into other systems to trigger a recheck. The community of developers would say how fit a system is or isn't, since they are the individuals having to dig through logs and evaluate "did this build fail because the code needs adjustment" or not, and they can reflect their findings in a comment and vote on the system.
> >>
> >> The other thing I really value about gerrit is that votes can change; systems can improve, given motivation and accurate feedback for making changes.
> >>
> >> I have no idea how hard this would be to create, but I think having direct feedback from developers on systems would help both the developers and the maintainers of ci systems.
> >>
> >> There are a number of people working really hard to do a good job in this area.
> >> This sort of structure would also provide support and encouragement to those people providing leadership in this space: people asking good questions, helping other system maintainers, starting discussions, offering patches to infra (and reviewing infra patches) in accordance with the goals of the third party meeting [0], and making other hard-to-measure decisions that provide value for the community. I'd really like a way we all can demonstrate the extent to which we value these contributions.
> >>
> >> So far, those are my thoughts.
> >>
> >> Thanks,
> >> Anita.
> >
> > +1 - this sounds like a really good idea.
> >
> > How is feedback on the OpenStack check/gate retrieved and moderated? Can that provide a model for doing what you suggest here?
>
> Hi Jon Paul: (Is it Jon Paul or Jon?)
>
> The OpenStack check/gate pipelines are assessed using a system we call elastic recheck: http://status.openstack.org/elastic-recheck/
>
> We use logstash to index log output and elasticsearch to compose queries that evaluate the incidence of a given error message (for example). Sample query: http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries/1097592.yaml
>
> The elastic-recheck repo is here: http://git.openstack.org/cgit/openstack-infra/elastic-recheck/
>
> The GUI is available at logstash.openstack.org.
>
> All the queries are written manually as yaml files and named with a corresponding bug number: http://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries
>
> Here is some documentation about elastic-recheck and how to write queries: http://docs.openstack.org/infra/elastic-recheck/readme.html
>
> Joe Gordon has actually created some great graphs (where are those hosted again, Joe?) to be able to evaluate failure rates in the pipelines (check and gate) based on test groups (tempest, unit tests).

http://jogo.github.io/gate

The data comes from graphite.openstack.org and is hosted off site because the data requires some interpretation and should be viewed with a grain of salt.

> Clark Boylan and Sean Dague did, and do, the majority of the heavy lifting setting up and maintaining elastic-recheck (with lots of help from others, thank you!), so perhaps they could offer their opinion on whether this is a reasonable choice for evaluating third party ci systems.
>
> Thanks Jon Paul, this is a good question,
> Anita.
>
> >> [0] https://wiki.openstack.org/wiki/Meetings/ThirdParty#Goals_for_Third_Party_meetings
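
To make the elastic-recheck part above a bit more concrete, here is a rough sketch (not the actual elastic-recheck code) of how one of those yaml queries can be counted against the indexed logs. The endpoint, index pattern, and error message are invented for illustration; the real query files, named after bug numbers like queries/1097592.yaml linked above, live in the repo Anita pointed to.

# Rough sketch only: the host, index pattern, and error message are made up.
import json
import urllib.request

import yaml  # PyYAML

# A query file in the style Anita describes (cf. queries/1097592.yaml).
QUERY_YAML = """
query: >
  message:"Timed out waiting for thing to become ACTIVE"
  AND tags:"console"
"""

def count_hits(es_url, lucene_query):
    """Ask elasticsearch how many indexed log lines match a Lucene-style query string."""
    body = json.dumps(
        {"query": {"query_string": {"query": lucene_query}}}
    ).encode("utf-8")
    req = urllib.request.Request(
        es_url + "/_count",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["count"]

query = yaml.safe_load(QUERY_YAML)["query"]
# Hypothetical endpoint; substitute whatever elasticsearch cluster holds the
# logstash indexes you want to query.
print(count_hits("http://logstash.example.org:9200/logstash-*", query))

The point is just that each query boils down to a query string in a small yaml file, which is what keeps them easy to review and add.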
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
