On Wed, Aug 14, 2013 at 06:13:18PM -0300, Thierry Carrez wrote:
> Matthew Treinish wrote:
> > Also, if anyone has any input on what threshold they feel is good enough
> > for this I'd welcome any input on that. For example, do we want to ensure
> > a >= 1:1 match for job success? Or would something like 90% as stable as
> > the serial job be good enough considering the speed advantage. (The
> > parallel runs take about half as much time as a full serial run, the
> > parallel job normally finishes in ~25-30min) Since this affects almost
> > every project I don't want to define this threshold without input from
> > everyone.
>
> I guess 90% would be the limit where we'd start questioning it. 95% as
> stable then the speed improvement makes it definitely worth it IMHO. At
> 85% we would introduce way too many new false negatives in the tests,
> and those are painful to work around...
So after fighting with the numbers a bit I'm having a hard time quantifying how much flakier the testr runs are in practice. There has been too much variability in the gate lately. It's also hard to quantify the failure rate vs serial when there are gate resets, since the parallel runs don't always show up as aborted because of the speed difference with the serial runs. That, combined with there being multiple serial jobs in different configurations that are all gating, makes just getting a percentage not the most straightforward problem. I can try to invest more time figuring out how to best visualize this using graphite. But, my gut feeling after watching zuul and the jenkins jobs is that it'll probably end up with a few more random failures (< 5) every couple of hours.

The most common random failures that I've seen are documented in these 3 bugs:

https://bugs.launchpad.net/tempest/+bug/1213209
https://bugs.launchpad.net/tempest/+bug/1213212
https://bugs.launchpad.net/tempest/+bug/1213215

However, the issues occurring in those bugs are subtle enough that I'm having trouble debugging them just from the gate logs, and so far I haven't been able to reproduce them locally. I think the only way to get them the attention they need is to start gating with them as known random failures. I'm thinking that it would be better to start gating on parallel now and debug these as they come up.

Another option that I've thought about is making the testr-full jobs voting on the check queue. This way it will raise parallel failures to people's attention but not increase the number of gate resets. The only tradeoff here is that it will make the voting gate jobs differ from the voting check jobs, which is something that we try to avoid. So I'm not sure it's a real option.

Assuming everyone is ok with green-lighting parallel tempest with a couple of known bugs, then the only real blocker right now is that neutron does not work with tempest in parallel.
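Just to make the threshold discussion above concrete: what we're really talking about is the parallel job's pass rate expressed as a fraction of the serial job's pass rate. A rough sketch (the function name and the run counts below are made-up placeholders, not real gate data):

```python
# Hypothetical sketch of the "X% as stable as serial" metric being
# discussed. The counts are illustrative, not actual gate statistics.

def relative_stability(parallel_passes, parallel_runs,
                       serial_passes, serial_runs):
    """Return the parallel job's pass rate as a fraction of the
    serial job's pass rate (1.0 means exactly as stable as serial)."""
    parallel_rate = parallel_passes / parallel_runs
    serial_rate = serial_passes / serial_runs
    return parallel_rate / serial_rate

# Example: 88 of 100 parallel runs passed vs. 95 of 100 serial runs.
ratio = relative_stability(88, 100, 95, 100)
print(round(ratio * 100, 1))  # prints 92.6, i.e. ~93% as stable as serial
```

By that measure the 90% line Thierry mentions would be a parallel job that fails roughly one extra run in ten compared to serial; the complication described above is that gate resets and multiple serial job configurations make the raw pass counts themselves hard to collect cleanly.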
I'm looking into getting this working with neutron (there are a couple of issues with it right now), but if the demand for faster tempest is there we can keep the neutron-smoke jobs serial for the time being, and bring them parallel once they're ready.

I really don't want to summarily decide whether it's ok to leave neutron serial for the time being, or whether the random failure rate is low enough to make the switch now, since it's a decision that affects almost all the projects. But, at the same time, I don't think that anyone else is really watching the gate-tempest-devstack-vm-testr-full jobs. Does anyone else have an opinion on how we should proceed here?

Thanks,

Matt Treinish

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev