The latest recurring problem that is failing a lot of the nonha ssl jobs in tripleo-ci is:
https://bugs.launchpad.net/tripleo/+bug/1616144
tripleo-ci: nonha jobs failing with Unable to establish connection to https://192.0.2.2:13004/v1/a90407df1e7f4f80a38a1b1671ced2ff/stacks/overcloud/f9f6f712-8e89-4ea9-a34b-6084dc74b5c1

This error happens while tripleoclient is polling for events from the overcloud stack. I can reproduce it very easily locally by deploying with an ssl undercloud with 6GB ram and 2 vcpus. If I don't enable swap, something gets OOM killed. If I do enable swap, swap gets used (< 1GB) and then I hit this error almost every time. The stack keeps deploying but the client has died, so the job fails.

My investigation so far has only shown that it's the swap allocation that is delaying things enough to cause the failure. We do not see this error in the ha job even though it deploys more nodes. As of now, my only suspicion is that the overhead of the initial SSL connections is causing the error.

If I test with 6GB ram and 4 vcpus I can't reproduce the error, although much more swap is used due to the increased number of default workers for each API service.

However, I suggest we just raise the undercloud specs in our jobs to 8GB ram and 4 vcpus. These seem reasonable to me because they are the default specs used by infra in all of their devstack single and multinode jobs spawned on all their other cloud providers. Our own multinode job for the undercloud/overcloud and our undercloud only job are already running on instances of this size.

Yes, this is just sidestepping the problem by throwing more resources at it. The reality is that we do not prioritize working on optimizing for speed/performance/resources. We prioritize feature work that indirectly (or maybe it's directly?) makes everything slower, especially at this point in the development cycle. We should therefore expect to have to continue to provide more and more resources to our CI jobs until we prioritize optimizing them to run with less.
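For anyone who wants to watch swap usage on the undercloud while reproducing this, here is a minimal sketch. It assumes a Linux host with /proc/meminfo; the function names are made up for illustration and are not part of tripleo-ci or tripleoclient.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Field: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kib(meminfo):
    """Swap in use, computed as SwapTotal - SwapFree (both in kB)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_swap_used_mib(path="/proc/meminfo"):
    """Read the given meminfo file and return swap in use, in MiB."""
    with open(path) as f:
        return swap_used_kib(parse_meminfo(f.read())) / 1024.0
```

Calling read_swap_used_mib() every few seconds during the deploy and logging the result should make it easier to see whether the client failure lines up with swap starting to be used.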

Let me know if there is any disagreement on making these changes. If there isn't, I'll apply them in the next day or so. If there are any other ideas on how to address this particular bug for some immediate short term relief, please let me know.

--
-- James Slagle
--

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev