On 08/19/2016 12:12 PM, Erno Kuvaja wrote:
On Fri, Aug 19, 2016 at 10:53 AM, Hugh Brock <hbr...@redhat.com> wrote:
On Fri, Aug 19, 2016 at 11:41 AM, Derek Higgins <der...@redhat.com> wrote:
On 19 August 2016 at 00:07, Sagi Shnaidman <sshna...@redhat.com> wrote:
Hi,

We again have a problem with not enough memory in the HA jobs; all of
them are constantly failing in CI: http://status-tripleoci.rhcloud.com/

Do we have any idea why we need more memory all of a sudden? For months
the overcloud nodes have had 5G of RAM, then last week[1] we bumped it
to 5.5G, and now we need it bumped to 6G.

If a new service has been added that is needed on the overcloud, then
bumping to 6G is expected and probably the correct answer, but I'd like
to see us avoid blindly increasing the resources each time we see
out-of-memory errors without investigating whether a regression is
causing something to start hogging memory.

Sorry if it seems like I'm being picky about this (I seem to resist
these bumps every time they come up), but there are two good reasons to
avoid this if possible:
o At peak we are currently configured to run 75 simultaneous jobs
(although we probably don't reach that at the moment), and each HA job
has 5 baremetal nodes, so bumping from 5G to 6G increases the amount
of RAM CI can use at peak by 375G.
o When we bump the RAM usage of baremetal nodes from 5G to 6G, what
we're actually doing is increasing the minimum requirements for
developers from 28G (or whatever the number is now) to 32G.
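
As a quick sanity check of the arithmetic in the first point, here is a
minimal Python sketch. The function name and its parameters are
illustrative only and are not part of tripleo-ci; the inputs are the
figures quoted above (75 jobs, 5 nodes per HA job, 5G to 6G per node).

```python
def peak_ram_increase_gb(max_jobs, nodes_per_job, old_gb, new_gb):
    """Extra RAM (in GB) needed at peak when every node grows from old_gb to new_gb."""
    return max_jobs * nodes_per_job * (new_gb - old_gb)


# Figures taken from the message above: 75 simultaneous jobs, 5 baremetal
# nodes per HA job, per-node RAM going from 5G to 6G.
increase = peak_ram_increase_gb(max_jobs=75, nodes_per_job=5, old_gb=5, new_gb=6)
print(f"Peak CI RAM increase: {increase}G")
```

The same function can be used to gauge any future bump before proposing
it, assuming every running job is an HA job with the same node count.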

So before we bump the number, can we just check first whether it's
justified? I've watched this number increase from 2G since we started
running tripleo-ci.

thanks,
Derek.

[1] - https://review.openstack.org/#/c/353655/

Wondering if it makes sense to enable any but the most basic overcloud
services in TripleO CI. The idea of using some type of on-demand job
for services other than the ones needed for the ping test has been
proposed elsewhere -- maybe this should be our default mode for
TripleO CI. Thoughts?

--Hugh

The problem with periodic jobs is that the results are a bit hidden and
only 1 to 2 people care about them, when they happen to have time. OTOH,
if I understand correctly, we don't test the services even now, just
that their deployment goes through without failures.

We do some testing of the overcloud in the gate jobs: we actually deploy
a heat stack in the overcloud [1], creating a volume-based nova guest
(backed by Ceph for the HA job), setting up some routing and pinging it
(in network isolation!).

1. https://github.com/openstack-infra/tripleo-ci/blob/master/templates/tenantvm_floatingip.yaml
--
Giulio Fidente
GPG KEY: 08D733BA | IRC: gfidente

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
