> This allows the viewer to see categories of reviews based upon their
> divergence from OpenStack's Jenkins results. I think evaluating
> divergence from Jenkins might be a metric worth consideration.
I think the only thing this really reflects, though, is how much the third-party
CI system is mirroring Jenkins. A system that frequently diverges may be
functioning perfectly fine and just has a vastly different code path that it is
integration testing, so it is legitimately detecting failures that the OpenStack
CI cannot.

--
Kevin Benton
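As a purely illustrative sketch of the "divergence from Jenkins" measurement being
discussed, the snippet below compares a third-party CI's votes against Jenkins's
votes on the same patch sets. The vote dictionaries, change numbers, and the
agreement_with_jenkins helper are all invented for illustration; this is not real
Gerrit or Stackalytics code.

# Invented sketch: how often does a third-party CI's vote on a patch set
# match the vote Jenkins left on the same patch set?

def agreement_with_jenkins(ci_votes, jenkins_votes):
    """Return (agreement_rate, compared) over patch sets both systems voted on.

    ci_votes / jenkins_votes: dicts mapping a (change, patchset) key to +1 or -1.
    """
    common = set(ci_votes) & set(jenkins_votes)
    if not common:
        return None, 0
    agree = sum(1 for key in common if ci_votes[key] == jenkins_votes[key])
    return float(agree) / len(common), len(common)


# Invented example data keyed by (change number, patch set number).
jenkins = {(101, 1): +1, (102, 3): -1, (103, 2): +1}
vendor_ci = {(101, 1): +1, (102, 3): +1, (103, 2): +1, (104, 1): -1}

rate, compared = agreement_with_jenkins(vendor_ci, jenkins)
print("agreed with Jenkins on %.0f%% of %d shared patch sets" % (rate * 100, compared))

As noted above, a low agreement rate by itself does not mean the CI is broken; it
may simply be exercising a code path that the OpenStack CI never touches.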
On Thu, Jul 3, 2014 at 6:49 AM, Anita Kuno <ante...@anteaya.info> wrote:

> On 07/03/2014 07:12 AM, Salvatore Orlando wrote:
> > Apologies for quoting again the top post of the thread.
> >
> > Comments inline (mostly thinking aloud)
> > Salvatore
> >
> > On 30 June 2014 22:22, Jay Pipes <jaypi...@gmail.com> wrote:
> >
> >> Hi Stackers,
> >>
> >> Some recent ML threads [1] and a hot IRC meeting today [2] brought up
> >> some legitimate questions around how a newly-proposed Stackalytics
> >> report page for Neutron External CI systems [3] represented the
> >> results of an external CI system as "successful" or not.
> >>
> >> First, I want to say that Ilya and all those involved in the
> >> Stackalytics program simply want to provide the most accurate
> >> information to developers in a format that is easily consumed. While
> >> there need to be some changes in how data is shown (and the wording of
> >> things like "Tests Succeeded"), I hope that the community knows there
> >> isn't any ill intent on the part of Mirantis or anyone who works on
> >> Stackalytics. OK, so let's keep the conversation civil -- we're all
> >> working towards the same goals of transparency and accuracy. :)
> >>
> >> Alright, now, Anita and Kurt Taylor were asking a very poignant
> >> question:
> >>
> >> "But what does CI tested really mean? Just running tests? Or tested to
> >> pass some level of requirements?"
> >>
> >> In this nascent world of external CI systems, we have a set of issues
> >> that we need to resolve:
> >>
> >> 1) All of the CI systems are different.
> >>
> >> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
> >> scripts. Others run custom Python code that spawns VMs and publishes
> >> logs to some public domain.
> >>
> >> As a community, we need to decide whether it is worth putting in the
> >> effort to create a single, unified, installable and runnable CI
> >> system, so that we can legitimately say "all of the external systems
> >> are identical, with the exception of the driver code for vendor X
> >> being substituted in the Neutron codebase."
> >
> > I think such a system already exists, and it's documented here:
> > http://ci.openstack.org/
> > Still, understanding it is quite a learning curve, and running it is
> > not exactly straightforward. But I guess that's pretty much
> > understandable given the complexity of the system, isn't it?
> >
> >> If the goal of the external CI systems is to produce reliable,
> >> consistent results, I feel the answer to the above is "yes", but I'm
> >> interested to hear what others think. Frankly, in the world of
> >> benchmarks, it would be unthinkable to say "go ahead and everyone run
> >> your own benchmark suite", because you would get wildly different
> >> results. A similar problem has emerged here.
> >
> > I don't think the particular infrastructure, which might range from an
> > openstack-ci clone to a 100-line bash script, would have an impact on
> > the "reliability" of the quality assessment regarding a particular
> > driver or plugin. This is determined, in my opinion, by the quantity
> > and nature of tests one runs on a specific driver. In Neutron, for
> > instance, there is a wide range of choices, from a few test cases in
> > tempest.api.network to the full smoketest job. As long as there is no
> > minimal standard here, it will be difficult to assess the quality of
> > the evaluation from a CI system, unless we explicitly take coverage
> > into account in the evaluation.
> >
> > On the other hand, different CI infrastructures will have different
> > levels in terms of % of patches tested and % of infrastructure
> > failures. I think it might not be a terrible idea to use these
> > parameters to evaluate how good a CI is from an infra standpoint.
> > However, there are still open questions. For instance, a CI might have
> > a low patch % score because it only needs to test patches affecting a
> > given driver.
> >
> >> 2) There is no mediation or verification that the external CI system
> >> is actually testing anything at all
> >>
> >> As a community, we need to decide whether the current system of
> >> self-policing should continue. If it should, then language on reports
> >> like [3] should be very clear that any numbers derived from such
> >> systems should be taken with a grain of salt. Use of the word
> >> "Success" should be avoided, as it has connotations (in English, at
> >> least) that the result has been verified, which is simply not the case
> >> as long as no verification or mediation occurs for any external CI
> >> system.
> >>
> >> 3) There is no clear indication of what tests are being run, and
> >> therefore there is no clear indication of what "success" is
> >>
> >> I think we can all agree that a test has three possible outcomes:
> >> pass, fail, and skip. The result of a test suite run is therefore
> >> nothing more than the aggregation of which tests passed, which failed,
> >> and which were skipped.
> >>
> >> As a community, we must document, for each project, the expected set
> >> of tests that must be run for each patch merged into the project's
> >> source tree. This documentation should be discoverable so that reports
> >> like [3] can be crystal-clear on what the data shown actually means.
> >> The report is simply displaying the data it receives from Gerrit. The
> >> community needs to be proactive in saying "this is what is expected to
> >> be tested." This alone would allow the report to give information such
> >> as "External CI system ABC performed the expected tests. X tests
> >> passed. Y tests failed. Z tests were skipped." Likewise, it would also
> >> make it possible for the report to give information such as "External
> >> CI system DEF did not perform the expected tests.", which is excellent
> >> information in and of itself.
> >
> > Agreed. In Neutron we have enforced CIs but not yet agreed on the
> > minimum set of tests we expect them to run. I reckon this will be fixed
> > soon.
> >
> > I'll try to look at what "SUCCESS" is from a naive standpoint: a CI
> > says "SUCCESS" if the test suite it ran passed; then one should have
> > means to understand whether a CI might blatantly lie or tell "half
> > truths". For instance, saying it passes tempest.api.network while
> > tempest.scenario.test_network_basic_ops has not been executed is a
> > half truth, in my opinion.
> >
> > Stackalytics can help here, I think. One could create "CI classes"
> > according to how close they are to the level of the upstream gate, and
> > then parse posted results to classify CIs. Now, before cursing me, I
> > totally understand that this won't be easy at all to implement!
> > Furthermore, I don't know how this should be reflected in Gerrit.
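As a purely illustrative sketch of the classification idea described above (and of
how a "half truth" could be flagged against a documented expected set of tests),
consider the following. The expected-suite list, the reported results, and the
classify_ci_run helper are all invented; none of this is existing Stackalytics or
Gerrit code.

# Invented sketch: compare the suites a CI reports against a documented
# expected set, so a "SUCCESS" vote that skipped required suites can be
# flagged as partial coverage rather than full coverage.

EXPECTED_SUITES = {
    "tempest.api.network",
    "tempest.scenario.test_network_basic_ops",
}

def classify_ci_run(reported_suites):
    """Classify one CI run by its coverage of the expected suites."""
    reported = set(reported_suites)
    missing = EXPECTED_SUITES - reported
    if not missing:
        return "full-coverage"
    if reported & EXPECTED_SUITES:
        return "partial-coverage (missing: %s)" % ", ".join(sorted(missing))
    return "no-expected-coverage"

print(classify_ci_run(["tempest.api.network",
                       "tempest.scenario.test_network_basic_ops"]))
print(classify_ci_run(["tempest.api.network"]))  # the "half truth" case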
> >
> >> ===
> >>
> >> In thinking about the likely answers to the above questions, I believe
> >> it would be prudent to change the Stackalytics report in question [3]
> >> in the following ways:
> >>
> >> a. Change the "Success %" column header to "% Reported +1 Votes"
> >> b. Change the phrase "Green cell - tests ran successfully, red cell -
> >> tests failed" to "Green cell - System voted +1, red cell - System
> >> voted -1"
> >
> > That makes sense to me.
> >
> >> and then, when we have more and better data (for example, # tests
> >> passed, failed, skipped, etc.), we can provide more detailed
> >> information than just "reported +1" or not.
> >
> > I think it should not be too hard to start adding minimal measures
> > such as "% of voted patches".
> >
> >> Thoughts?
> >>
> >> Best,
> >> -jay
> >>
> >> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> >> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> >> [3] http://stackalytics.com/report/ci/neutron/7
>
> Thanks for sharing your thoughts, Salvatore.
>
> Some additional things to look at:
>
> Sean Dague has created a tool in stackforge, gerrit-dash-creator:
> http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst
> which has the ability to make interesting queries on gerrit results. One
> such example can be found here: http://paste.openstack.org/show/85416/
> (Note: when this url was created there was a bug in the syntax, so it
> works in Chrome but not Firefox. Sean tells me the Firefox bug has been
> addressed, though the url hasn't been altered with the new syntax yet.)
>
> This allows the viewer to see categories of reviews based upon their
> divergence from OpenStack's Jenkins results. I think evaluating
> divergence from Jenkins might be a metric worth consideration.
>
> Also, a gui representation worth looking at is Mikal Still's gui for
> Neutron ci health:
> http://www.rcbops.com/gerrit/reports/neutron-cireport.html
> and Nova ci health:
> http://www.rcbops.com/gerrit/reports/nova-cireport.html
>
> I don't know the details of how the graphs are calculated in these
> pages, but being able to view passed/failed/missed and compare them to
> Jenkins is an interesting approach, and I feel it has some merit.
>
> Thanks, I think we are getting some good information out in this thread,
> and I look forward to hearing more thoughts.
>
> Thank you,
> Anita.
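To make the proposed report wording concrete, here is a minimal, invented-data
sketch of the two measures discussed above, "% Reported +1 Votes" and "% of voted
patches". The vote list, the patch count, and the report_row helper are made up
for illustration and do not describe how Stackalytics actually computes its
numbers.

# Invented sketch: share of a CI's votes that were +1, and share of
# candidate patches the CI voted on at all. Deliberately avoids the
# word "Success", per the wording change suggested above.

def report_row(votes, total_candidate_patches):
    """votes: list of +1/-1 integers left by one CI system."""
    if not votes:
        return {"% Reported +1 Votes": None, "% of voted patches": 0.0}
    plus_one_pct = 100.0 * sum(1 for v in votes if v > 0) / len(votes)
    voted_pct = 100.0 * len(votes) / total_candidate_patches
    return {"% Reported +1 Votes": round(plus_one_pct, 1),
            "% of voted patches": round(voted_pct, 1)}

print(report_row([+1, +1, -1, +1], total_candidate_patches=10))
# {'% Reported +1 Votes': 75.0, '% of voted patches': 40.0}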
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev