https://review.openstack.org/575264 just landed (and it didn't time out in check or gate, with no recheck needed, so that's a good sign it helped to mitigate).
I've restored and rechecked some of the patches that I evacuated from the gate. Please do not restore, recheck or approve anything else for now; let's see how it goes with a few patches first.

We're still working with Steve on his patches to optimize the way we deploy containers on the registry, and we are investigating how we could make it faster with a proxy.

Stay tuned and thanks for your patience.

On Wed, Jun 13, 2018 at 5:50 PM, Emilien Macchi <emil...@redhat.com> wrote:

> TL;DR: gate queue was 25h+, we put all patches from gate on standby, do
> not restore/recheck until further announcement.
>
> We recently enabled the containerized undercloud for multinode jobs and we
> believe this was a bit premature as the container download process wasn't
> optimized so it's not pulling the mirrors for the same containers multiple
> times yet.
> It caused the job runtime to increase and probably the load on docker.io
> mirrors hosted by OpenStack Infra to be a bit slower to provide the same
> containers multiple times. The time taken to prepare containers on the
> undercloud and then for the overcloud caused the jobs to randomly timeout
> therefore the gate to fail in a high amount of times, so we decided to
> remove all jobs from the gate by abandoning the patches temporarily (I have
> them in my browser and will restore when things are stable again, please do
> not touch anything).
>
> Steve Baker has been working on a series of patches that optimize the way
> we prepare the containers but basically the workflow will be:
> - pull containers needed for the undercloud into a local registry, using
> infra mirror if available
> - deploy the containerized undercloud
> - pull containers needed for the overcloud minus the ones already pulled
> for the undercloud, using infra mirror if available
> - update containers on the overcloud
> - deploy the containerized undercloud
>
> With that process, we hope to reduce the runtime of the deployment and
> therefore reduce the timeouts in the gate.
> To enable it, we need to land in that order:
> https://review.openstack.org/#/c/571613/,
> https://review.openstack.org/#/c/574485/,
> https://review.openstack.org/#/c/571631/ and
> https://review.openstack.org/#/c/568403.
>
> In the meantime, we are disabling the containerized undercloud recently
> enabled on all scenarios: https://review.openstack.org/#/c/575264/ for
> mitigation with the hope to stabilize things until Steve's patches land.
> Hopefully, we can merge Steve's work tonight/tomorrow and re-enable the
> containerized undercloud on scenarios after checking that we don't have
> timeouts and reasonable deployment runtimes.
>
> That's the plan we came with, if you have any question / feedback please
> share it.
> --
> Emilien, Steve and Wes
>

--
Emilien Macchi
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev