On 07/03/2014 04:34 PM, Kevin Benton wrote:
> Yes, I can propose a spec for that. It probably won't be until Monday.
> Is that okay?
>
Sure, that's fine. Thanks Kevin, I look forward to your spec once it is
up. Enjoy tomorrow. :D

Thanks Kevin,
Anita.
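P.S. To make the periodic-check idea quoted below a bit more concrete, here is a
rough Python sketch of the kind of per-CI statistic such runs would enable. It is
not part of any existing tool, and the record format and field names are made up
for illustration:

    # Hypothetical sketch: estimate each CI system's false-failure rate from
    # periodic runs against the head of master, which should always pass.
    # The run records (ci_name/result) are an assumed format, not an
    # existing schema.
    from collections import defaultdict

    def master_head_stats(runs):
        """runs: iterable of dicts with 'ci_name' and 'result' keys."""
        counts = defaultdict(lambda: {"pass": 0, "fail": 0})
        for run in runs:
            outcome = "pass" if run["result"] == "SUCCESS" else "fail"
            counts[run["ci_name"]][outcome] += 1
        stats = {}
        for ci, c in counts.items():
            total = c["pass"] + c["fail"]
            # A failure against master head is presumed to be the CI system's
            # problem, since master is expected to be passing.
            stats[ci] = {"runs": total,
                         "false_failure_rate": c["fail"] / float(total)}
        return stats

    sample = [
        {"ci_name": "vendor-x-ci", "result": "SUCCESS"},
        {"ci_name": "vendor-x-ci", "result": "SUCCESS"},
        {"ci_name": "vendor-x-ci", "result": "FAILURE"},
        {"ci_name": "vendor-y-ci", "result": "SUCCESS"},
    ]
    print(master_head_stats(sample))

The value of anchoring on the head of master is that a failure there can
reasonably be attributed to the CI system itself rather than to the patch under
test, which is exactly the "take the CI system's word for it" problem Kevin
mentions below.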
>
> On Thu, Jul 3, 2014 at 11:42 AM, Anita Kuno <[email protected]> wrote:
>
>> On 07/03/2014 02:33 PM, Kevin Benton wrote:
>>> Maybe we can require periodic checks against the head of the master
>>> branch (which should always pass) and build statistics based on the
>>> results of that.
>> I like this suggestion. I really like this suggestion.
>>
>> Hmmmm, what to do with a good suggestion? I wonder if we could capture
>> it in an infra-spec and work on it from there.
>>
>> Would you feel comfortable offering a draft as an infra-spec and then
>> perhaps we can discuss the design through the spec?
>>
>> What do you think?
>>
>> Thanks Kevin,
>> Anita.
>>
>>> Otherwise it seems like we have to take a CI system's word for it
>>> that a particular patch indeed broke that system.
>>>
>>> --
>>> Kevin Benton
>>>
>>> On Thu, Jul 3, 2014 at 11:07 AM, Anita Kuno <[email protected]> wrote:
>>>
>>>> On 07/03/2014 01:27 PM, Kevin Benton wrote:
>>>>>> This allows the viewer to see categories of reviews based upon their
>>>>>> divergence from OpenStack's Jenkins results. I think evaluating
>>>>>> divergence from Jenkins might be a metric worth consideration.
>>>>>
>>>>> I think the only thing this really reflects, though, is how much the
>>>>> third party CI system is mirroring Jenkins.
>>>>> A system that frequently diverges may be functioning perfectly fine and
>>>>> just has a vastly different code path that it is integration testing, so
>>>>> it is legitimately detecting failures the OpenStack CI cannot.
>>>> Great.
>>>>
>>>> How do we measure the degree to which it is legitimately detecting
>>>> failures?
>>>>
>>>> Thanks Kevin,
>>>> Anita.
>>>>>
>>>>> --
>>>>> Kevin Benton
>>>>>
>>>>> On Thu, Jul 3, 2014 at 6:49 AM, Anita Kuno <[email protected]> wrote:
>>>>>
>>>>>> On 07/03/2014 07:12 AM, Salvatore Orlando wrote:
>>>>>>> Apologies for quoting again the top post of the thread.
>>>>>>>
>>>>>>> Comments inline (mostly thinking aloud)
>>>>>>> Salvatore
>>>>>>>
>>>>>>> On 30 June 2014 22:22, Jay Pipes <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Stackers,
>>>>>>>>
>>>>>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up
>>>>>>>> some legitimate questions around how a newly-proposed Stackalytics
>>>>>>>> report page for Neutron External CI systems [3] represented the
>>>>>>>> results of an external CI system as "successful" or not.
>>>>>>>>
>>>>>>>> First, I want to say that Ilya and all those involved in the
>>>>>>>> Stackalytics program simply want to provide the most accurate
>>>>>>>> information to developers in a format that is easily consumed. While
>>>>>>>> there need to be some changes in how data is shown (and the wording
>>>>>>>> of things like "Tests Succeeded"), I hope that the community knows
>>>>>>>> there isn't any ill intent on the part of Mirantis or anyone who
>>>>>>>> works on Stackalytics. OK, so let's keep the conversation civil --
>>>>>>>> we're all working towards the same goals of transparency and
>>>>>>>> accuracy. :)
>>>>>>>>
>>>>>>>> Alright, now, Anita and Kurt Taylor were asking a very poignant
>>>>>>>> question:
>>>>>>>>
>>>>>>>> "But what does CI tested really mean? Just running tests? Or tested
>>>>>>>> to pass some level of requirements?"
>>>>>>>>
>>>>>>>> In this nascent world of external CI systems, we have a set of issues
>>>>>>>> that we need to resolve:
>>>>>>>>
>>>>>>>> 1) All of the CI systems are different.
>>>>>>>>
>>>>>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
>>>>>>>> scripts. Others run custom Python code that spawns VMs and publishes
>>>>>>>> logs to some public domain.
>>>>>>>>
>>>>>>>> As a community, we need to decide whether it is worth putting in the
>>>>>>>> effort to create a single, unified, installable and runnable CI
>>>>>>>> system, so that we can legitimately say "all of the external systems
>>>>>>>> are identical, with the exception of the driver code for vendor X
>>>>>>>> being substituted in the Neutron codebase."
>>>>>>>
>>>>>>> I think such a system already exists, and it's documented here:
>>>>>>> http://ci.openstack.org/
>>>>>>> Still, understanding it is quite a learning curve, and running it is
>>>>>>> not exactly straightforward. But I guess that's pretty much
>>>>>>> understandable given the complexity of the system, isn't it?
>>>>>>>
>>>>>>>> If the goal of the external CI systems is to produce reliable,
>>>>>>>> consistent results, I feel the answer to the above is "yes", but I'm
>>>>>>>> interested to hear what others think. Frankly, in the world of
>>>>>>>> benchmarks, it would be unthinkable to say "go ahead and everyone run
>>>>>>>> your own benchmark suite", because you would get wildly different
>>>>>>>> results. A similar problem has emerged here.
>>>>>>>
>>>>>>> I don't think the particular infrastructure, which might range from an
>>>>>>> openstack-ci clone to a 100-line bash script, would have an impact on
>>>>>>> the "reliability" of the quality assessment regarding a particular
>>>>>>> driver or plugin. This is determined, in my opinion, by the quantity
>>>>>>> and nature of tests one runs on a specific driver. In Neutron, for
>>>>>>> instance, there is a wide range of choices - from a few test cases in
>>>>>>> tempest.api.network to the full smoketest job. As long as there is no
>>>>>>> minimal standard here, it would be difficult to assess the quality of
>>>>>>> the evaluation from a CI system, unless we explicitly take coverage
>>>>>>> into account in the evaluation.
>>>>>>>
>>>>>>> On the other hand, different CI infrastructures will have different
>>>>>>> levels in terms of % of patches tested and % of infrastructure
>>>>>>> failures. I think it might not be a terrible idea to use these
>>>>>>> parameters to evaluate how good a CI is from an infra standpoint.
>>>>>>> However, there are still open questions. For instance, a CI might have
>>>>>>> a low patch % score because it only needs to test patches affecting a
>>>>>>> given driver.
>>>>>>>
>>>>>>>> 2) There is no mediation or verification that the external CI system
>>>>>>>> is actually testing anything at all.
>>>>>>>>
>>>>>>>> As a community, we need to decide whether the current system of
>>>>>>>> self-policing should continue. If it should, then language on reports
>>>>>>>> like [3] should be very clear that any numbers derived from such
>>>>>>>> systems should be taken with a grain of salt. Use of the word
>>>>>>>> "Success" should be avoided, as it has connotations (in English, at
>>>>>>>> least) that the result has been verified, which is simply not the
>>>>>>>> case as long as no verification or mediation occurs for any external
>>>>>>>> CI system.
>>>>>>>>
>>>>>>>> 3) There is no clear indication of what tests are being run, and
>>>>>>>> therefore there is no clear indication of what "success" is.
>>>>>>>>
>>>>>>>> I think we can all agree that a test has three possible outcomes:
>>>>>>>> pass, fail, and skip. The results of a test suite run are therefore
>>>>>>>> nothing more than the aggregation of which tests passed, which
>>>>>>>> failed, and which were skipped.
>>>>>>>>
>>>>>>>> As a community, we must document, for each project, the expected set
>>>>>>>> of tests that must be run for each patch merged into the project's
>>>>>>>> source tree. This documentation should be discoverable so that
>>>>>>>> reports like [3] can be crystal-clear on what the data shown actually
>>>>>>>> means. The report is simply displaying the data it receives from
>>>>>>>> Gerrit. The community needs to be proactive in saying "this is what
>>>>>>>> is expected to be tested." This alone would allow the report to give
>>>>>>>> information such as "External CI system ABC performed the expected
>>>>>>>> tests. X tests passed. Y tests failed. Z tests were skipped."
>>>>>>>> Likewise, it would also make it possible for the report to give
>>>>>>>> information such as "External CI system DEF did not perform the
>>>>>>>> expected tests.", which is excellent information in and of itself.
>>>>>>>
>>>>>>> Agreed. In Neutron we have enforced CIs but not yet agreed on what's
>>>>>>> the minimum set of tests we expect them to run. I reckon this will be
>>>>>>> fixed soon.
>>>>>>>
>>>>>>> I'll try to look at what "SUCCESS" is from a naive standpoint: a CI
>>>>>>> says "SUCCESS" if the test suite it ran passed; then one should have
>>>>>>> means to understand whether a CI might blatantly lie or tell "half
>>>>>>> truths". For instance, saying it passes tempest.api.network while
>>>>>>> tempest.scenario.test_network_basic_ops has not been executed is a
>>>>>>> half truth, in my opinion.
>>>>>>> Stackalytics can help here, I think. One could create "CI classes"
>>>>>>> according to how close they are to the level of the upstream gate, and
>>>>>>> then parse the posted results to classify CIs. Now, before cursing me,
>>>>>>> I totally understand that this won't be easy at all to implement!
>>>>>>> Furthermore, I don't know how this should be reflected in gerrit.
>>>>>>>
>>>>>>>> ===
>>>>>>>>
>>>>>>>> In thinking about the likely answers to the above questions, I
>>>>>>>> believe it would be prudent to change the Stackalytics report in
>>>>>>>> question [3] in the following ways:
>>>>>>>>
>>>>>>>> a. Change the "Success %" column header to "% Reported +1 Votes"
>>>>>>>> b. Change the phrase "Green cell - tests ran successfully, red cell -
>>>>>>>> tests failed" to "Green cell - System voted +1, red cell - System
>>>>>>>> voted -1"
>>>>>>>
>>>>>>> That makes sense to me.
>>>>>>>
>>>>>>>> and then, when we have more and better data (for example, # tests
>>>>>>>> passed, failed, skipped, etc.), we can provide more detailed
>>>>>>>> information than just "reported +1" or not.
>>>>>>>
>>>>>>> I think it should not be too hard to start adding minimal measures
>>>>>>> such as "% of voted patches".
>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> -jay
>>>>>>>>
>>>>>>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
>>>>>>>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
>>>>>>>> [3] http://stackalytics.com/report/ci/neutron/7
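The per-patch summary Jay describes above maps naturally to a small amount of
code. A rough sketch, assuming the test results are already available as a simple
mapping of test name to outcome and that the project documents its expected test
set; the test names below are only examples, and parsing a real subunit stream or
log artifact is left out:

    # Rough sketch of the summary Jay describes: check what a CI system ran
    # against the documented expected set and aggregate pass/fail/skip.
    # EXPECTED_TESTS and the results mapping are assumed inputs for
    # illustration, not an existing interface.
    EXPECTED_TESTS = {
        "tempest.api.network.test_networks",
        "tempest.scenario.test_network_basic_ops",
    }

    def summarize(ci_name, results):
        """results: dict mapping test name -> 'pass', 'fail' or 'skip'."""
        missing = EXPECTED_TESTS - set(results)
        if missing:
            return ("External CI system %s did not perform the expected tests "
                    "(missing: %s)." % (ci_name, ", ".join(sorted(missing))))
        counts = {"pass": 0, "fail": 0, "skip": 0}
        for outcome in results.values():
            counts[outcome] += 1
        return ("External CI system %s performed the expected tests. "
                "%d tests passed. %d tests failed. %d tests were skipped."
                % (ci_name, counts["pass"], counts["fail"], counts["skip"]))

    print(summarize("ABC", {
        "tempest.api.network.test_networks": "pass",
        "tempest.scenario.test_network_basic_ops": "skip",
    }))

Whatever the real implementation looks like, the point stands that the expected
test set has to be documented and discoverable before any report can say
something stronger than "voted +1".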
>>>>>>
>>>>>> Thanks for sharing your thoughts, Salvatore.
>>>>>>
>>>>>> Some additional things to look at:
>>>>>>
>>>>>> Sean Dague has created a tool in stackforge, gerrit-dash-creator:
>>>>>> http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst
>>>>>> which has the ability to make interesting queries on gerrit results.
>>>>>> One such example can be found here:
>>>>>> http://paste.openstack.org/show/85416/
>>>>>> (Note: when this url was created there was a bug in the syntax, so it
>>>>>> works in Chrome but not Firefox. Sean tells me the Firefox bug has been
>>>>>> addressed, though this url hasn't been updated to the new syntax yet.)
>>>>>>
>>>>>> This allows the viewer to see categories of reviews based upon their
>>>>>> divergence from OpenStack's Jenkins results. I think evaluating
>>>>>> divergence from Jenkins might be a metric worth consideration.
>>>>>>
>>>>>> Also a gui representation worth looking at is Mikal Still's gui for
>>>>>> Neutron ci health:
>>>>>> http://www.rcbops.com/gerrit/reports/neutron-cireport.html
>>>>>> and Nova ci health:
>>>>>> http://www.rcbops.com/gerrit/reports/nova-cireport.html
>>>>>>
>>>>>> I don't know the details of how the graphs are calculated in these
>>>>>> pages, but being able to view passed/failed/missed and compare them to
>>>>>> Jenkins is an interesting approach, and I feel it has some merit.
>>>>>>
>>>>>> Thanks, I think we are getting some good information out in this
>>>>>> thread, and I look forward to hearing more thoughts.
>>>>>>
>>>>>> Thank you,
>>>>>> Anita.
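To make the divergence-from-Jenkins idea a little more concrete, here is a rough
sketch that compares a third-party CI account's votes with Jenkins' votes on the
same patch sets. It assumes the votes have already been pulled out of Gerrit into
a simple list of records; the field names below are illustrative, not Gerrit's
actual schema:

    # Illustrative divergence metric: for each patch set where both Jenkins
    # and a given third-party CI voted, count agreements and disagreements.
    # The vote records (change/patchset/account/vote) are assumed to have
    # been extracted from Gerrit beforehand; this is not Gerrit's schema.
    from collections import defaultdict

    def divergence(votes, ci_account, jenkins_account="jenkins"):
        by_patchset = defaultdict(dict)
        for v in votes:
            by_patchset[(v["change"], v["patchset"])][v["account"]] = v["vote"]
        agree = disagree = 0
        for accounts in by_patchset.values():
            if ci_account in accounts and jenkins_account in accounts:
                if accounts[ci_account] == accounts[jenkins_account]:
                    agree += 1
                else:
                    disagree += 1
        compared = agree + disagree
        return {"compared": compared,
                "divergence": disagree / float(compared) if compared else None}

    # One agreement and one disagreement -> divergence of 0.5.
    sample = [
        {"change": 1, "patchset": 1, "account": "jenkins", "vote": 1},
        {"change": 1, "patchset": 1, "account": "vendor-ci", "vote": 1},
        {"change": 2, "patchset": 3, "account": "jenkins", "vote": 1},
        {"change": 2, "patchset": 3, "account": "vendor-ci", "vote": -1},
    ]
    print(divergence(sample, "vendor-ci"))

As Kevin points out earlier in the thread, a high divergence number on its own
does not prove a CI system is wrong; it only flags systems whose results differ
from Jenkins and deserve a closer look, which is where something like the
periodic runs against master head would help.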

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
