On Mon, Jan 22, 2018 at 6:55 AM, Or Idgar <oid...@redhat.com> wrote:
> Hi,
> Still having timeouts, but now in tripleo-heat-templates experimental gates
> (tripleo-ci-centos-7-ovb-fakeha-caserver and tripleo-ci-centos-7-ovb-ha-tempest-oooq).
>
> See examples:
> http://logs.openstack.org/31/518331/23/experimental-tripleo/tripleo-ci-centos-7-ovb-fakeha-caserver/7502e82/
> http://logs.openstack.org/31/518331/23/experimental-tripleo/tripleo-ci-centos-7-ovb-ha-tempest-oooq/46e8e0d/
>
> Does anyone have an idea what we can do to fix it?
>
> Thanks,
> Idgar
>
> On Sat, Jan 20, 2018 at 4:38 AM, Paul Belanger <pabelan...@redhat.com> wrote:
>
>> On Fri, Jan 19, 2018 at 11:23:45AM -0600, Ben Nemec wrote:
>> >
>> > On 01/18/2018 09:45 AM, Emilien Macchi wrote:
>> > > On Thu, Jan 18, 2018 at 6:34 AM, Or Idgar <oid...@redhat.com> wrote:
>> > > > Hi,
>> > > > we're encountering many timeouts for Zuul gates in TripleO.
>> > > > For example, see
>> > > > http://logs.openstack.org/95/508195/28/check-tripleo/tripleo-ci-centos-7-ovb-ha-oooq/c85fcb7/.
>> > > >
>> > > > Rechecks don't help; sometimes a specific gate ends successfully and
>> > > > sometimes it doesn't. The problem is that after a recheck it's not
>> > > > always the same gate that fails.
>> > > >
>> > > > Is there someone who has access to the servers' load who can see what
>> > > > is causing this? Alternatively, is there something we can do to
>> > > > reduce the running time of each gate?
>> > >
>> > > We're migrating to RDO Cloud for OVB jobs:
>> > > https://review.openstack.org/#/c/526481/
>> > > It's a work in progress but will help a lot with OVB timeouts on RH1.
>> > >
>> > > I'll let the CI folks comment on that topic.
>> > >
>> >
>> > I noticed that the timeouts on rh1 have been especially bad as of late,
>> > so I did a little testing and found that it did seem to be running more
>> > slowly than it should. After some investigation I found that 6 of our
>> > compute nodes have warning messages that the CPU was throttled due to
>> > high temperature. I've disabled 4 of them that had a lot of warnings.
>> > The other 2 only had a handful of warnings, so I'm hopeful we can leave
>> > them active without affecting job performance too much. It won't
>> > accomplish much if we disable the overheating nodes only to overload
>> > the remaining ones.
>> >
>> > I'll follow up with our hardware people and see if we can determine why
>> > these specific nodes are overheating. They seem to be running 20
>> > degrees C hotter than the rest of the nodes.
>> >
>> Did tripleo-test-cloud-rh1 get new kernels applied for meltdown/spectre?
>> Possible that is impacting performance too?
>>
>> -Paul
>>
>
> --
> Best regards,
> Or Idgar

FYI: we created a Launchpad bug to track decommissioning the OVB jobs on rh1 and moving them to third-party CI. Up for comments:
https://bugs.launchpad.net/tripleo/+bug/1744763
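
For reference, a minimal sketch of the checks discussed above, assuming shell
access to the rh1 compute nodes. The host name and disable reason are
illustrative, and the sysfs vulnerabilities files only exist on kernels that
expose them:

    # Look for thermal-throttling warnings in the kernel log of a compute node
    dmesg | grep -i "cpu clock throttled"

    # Take an overheating hypervisor out of scheduling
    # ("compute-4.example" is a hypothetical host name)
    openstack compute service set --disable \
        --disable-reason "thermal throttling" compute-4.example nova-compute

    # Check whether a meltdown/spectre-patched kernel is running
    uname -r
    grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null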