Hi All, I'm trying to get a handle on what needs to happen before getting tripleo-ci (toci) into the gate. I realize this may take some time, but I'm trying to map out how to get to the end goal of putting multi-node tripleo-based deployments in the gate, which should cover a lot of use cases that devstack-gate doesn't. Here are some of the stages I think we need to achieve before being in the gate, along with some questions where people may be able to fill in the blanks.
Stage 1: check - tripleo projects
This is what we currently have running: 5 separate jobs running non-voting checks against tripleo projects.

Stage 2 (a). reliability
Obviously keeping both the results and the CI system reliable is a must, and we should always aim for 0% false test results, but is there, for example, a rate of false negatives that infra would consider acceptable? What are the numbers on the gate at the moment? We should aim to match those at the very least (maybe we already have). And for how long do we need to maintain those levels before considering the system proven?

Stage 2 (b). speedup
How long can the longest jobs take? We have plans in place to speed up our current jobs, but what should the target be?

3. More Capacity
I'm going to talk about RAM here as it's probably the resource where we will hit our infrastructure limits first. Each time a suite of toci jobs is kicked off we currently kick off 5 jobs (which will double once Fedora is added [1]). In total these jobs spawn 15 VMs consuming 80G of RAM (it's actually 120G to work around a bug we should soon have fixed [2]); we also have plans that will reduce this 80G further, but let's stick with it for the moment. Some of these jobs complete after about 30 minutes, but let's say our target is an overall average of 45 minutes. With Fedora that means each run will tie up 160G for 45 minutes, i.e. 160G can provide us with 32 runs (each including 10 jobs) per day. So to kick off 500 (I made this number up) runs per day, we would need (500 / 32.0) * 160G = 2500G of RAM. We then need to double this number to allow for redundancy, so that's 5000G of RAM (a rough script for redoing this arithmetic with different numbers is included after the references below). We probably have about 3/4 of this available to us at the moment, but it's not evenly balanced between the 2 clouds, so we're not covered from a redundancy point of view. So we need more hardware (either by expanding the clouds we have or adding new clouds). I'd like us to start a separate effort to map out exactly what our medium-term goals should be, including:
o jobs we want to run
o how long we expect each of them to take
o how much RAM each one would take
so that we can roughly put together an idea of what our HW requirements will be.

4. check - all openstack projects
Once we're happy we have the required capacity, I think we can then move to check on all openstack projects.

5. voting check - all projects
Once we're happy that everybody is happy with reliability, I think we can move to a voting check.

6. gate on all openstack projects
And then finally, when everything else lines up, I think we can be added to the gate.

A) Gating with Ironic
I bring this up because there was some confusion about Ironic's status in the gate at a recent tripleo meeting [3]: when can tripleo's Ironic jobs be part of the gate?

Any thoughts? Am I way off with any of my assumptions? Is my maths correct?

thanks,
Derek.

[1] https://review.openstack.org/#/q/status:open+topic:add-f20-jobs,n,z
[2] https://bugs.launchpad.net/diskimage-builder/+bug/1289582
[3] http://eavesdrop.openstack.org/meetings/tripleo/2014/tripleo.2014-03-11-19.01.log.html
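
PS: in case anyone wants to play with the capacity numbers from section 3, here's a rough back-of-envelope sketch (plain Python, not part of toci itself; the 160G per run, 45 minute average and 500 runs/day are the same assumed/target figures as above):

  # back-of-envelope CI capacity estimate; all inputs are assumptions
  MINUTES_PER_DAY = 24 * 60.0

  ram_per_run_gb = 160        # 10 jobs per run once the Fedora jobs are added
  avg_run_minutes = 45.0      # target average run time
  target_runs_per_day = 500   # made-up demand figure
  redundancy_factor = 2       # double everything so one cloud can carry the load alone

  # how many runs one 160G "slot" can serve in a day (32 at a 45 minute average)
  runs_per_slot_per_day = MINUTES_PER_DAY / avg_run_minutes

  # RAM needed to serve the target demand, then doubled for redundancy
  ram_needed_gb = (target_runs_per_day / runs_per_slot_per_day) * ram_per_run_gb
  print("RAM for %d runs/day: %dG" % (target_runs_per_day, ram_needed_gb))   # 2500G
  print("with redundancy: %dG" % (ram_needed_gb * redundancy_factor))        # 5000G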