Hi,

I found that my controller nodes were somewhat overloaded with 16 nova-api-os-compute uwsgi processes each. After reducing the nova-api-os-compute uwsgi processes to 10, the timeouts and slowdowns were eliminated: the cloud became stable and response times dropped. I have 20 vCPUs on a Xeon(R) CPU E5-2630 v4 @ 2.20GHz (10 physical cores with hyperthreading).

For openstack-ansible I need to change this variable from 16 to 10: nova_wsgi_processes_max: 10. It seems I need to set it to match the number of physical CPU cores.
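A minimal override sketch for an OSA deployment, assuming the usual /etc/openstack_deploy/user_variables.yml override file (the nova playbook has to be re-run afterwards for the new worker count to take effect):

    # /etc/openstack_deploy/user_variables.yml
    # cap the nova-api-os-compute uwsgi worker count at 10 (one per physical core)
    nova_wsgi_processes_max: 10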
Regards,
Robert Varjasi
consultant@Component Soft Ltd.
Tel: +36/30-259-9221

On 10/08/2018 06:33 PM, Robert Varjasi wrote:
> Hi,
>
> After a few tempest runs I noticed slowdowns in the nova-api-os-compute
> uwsgi processes. I checked the processes with py-spy and found that a lot
> of processes were blocked on read(). Here is my py-spy output from one of
> my nova-api-os-compute uwsgi processes: http://paste.openstack.org/show/731677/
>
> And the stack trace:
>
> thread_id = Thread-2 filename = /usr/lib/python2.7/threading.py lineno = 774 function = __bootstrap line = self.__bootstrap_inner()
> thread_id = Thread-2 filename = /usr/lib/python2.7/threading.py lineno = 801 function = __bootstrap_inner line = self.run()
> thread_id = Thread-2 filename = /usr/lib/python2.7/threading.py lineno = 754 function = run line = self.__target(*self.__args, **self.__kwargs)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py lineno = 382 function = poll line = self.conn.consume(timeout=current_timeout)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py lineno = 1083 function = consume line = error_callback=_error_callback)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py lineno = 807 function = ensure line = ret, channel = autoretry_method()
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/kombu/connection.py lineno = 494 function = _ensured line = return fun(*args, **kwargs)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/kombu/connection.py lineno = 570 function = __call__ line = return fun(*args, channel=channels[0], **kwargs), channels[0]
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py lineno = 796 function = execute_method line = method()
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py lineno = 1068 function = _consume line = self.connection.drain_events(timeout=poll_timeout)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/kombu/connection.py lineno = 301 function = drain_events line = return self.transport.drain_events(self.connection, **kwargs)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/kombu/transport/pyamqp.py lineno = 103 function = drain_events line = return connection.drain_events(**kwargs)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/amqp/connection.py lineno = 471 function = drain_events line = while not self.blocking_read(timeout):
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/amqp/connection.py lineno = 476 function = blocking_read line = frame = self.transport.read_frame()
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/amqp/transport.py lineno = 226 function = read_frame line = frame_header = read(7, True)
> thread_id = Thread-2 filename = /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/amqp/transport.py lineno = 346 function = _read line = s = recv(n - len(rbuf))  # see note above
> thread_id = Thread-2 filename = /usr/lib/python2.7/ssl.py lineno = 643 function = read line = v = self._sslobj.read(len)
>
> I am using nova 17.0.4.dev1, amqp (2.2.2), oslo.messaging (5.35.0),
> kombu (4.1.0). I have 3 controller nodes. The cloud was deployed with
> OSA 17.0.4.
>
> I can reproduce the read() block if I click on "Log" in Horizon to see
> the console output of one of my VMs, or if I run a tempest test:
> tempest.api.compute.admin.test_hypervisor.HypervisorAdminTestJSON.test_get_hypervisor_uptime.
>
> The nova-api response time increases as more and more nova-api
> processes get blocked at this read. Is this normal behavior?
>
> ---
> Regards,
> Robert Varjasi
> consultant@Component Soft Ltd.
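A side note for anyone trying to reproduce this: the single tempest case quoted above can be run on its own against an existing cloud. A rough invocation, assuming an already configured tempest workspace (the exact command may vary with how tempest is installed):

    tempest run --regex tempest.api.compute.admin.test_hypervisor.HypervisorAdminTestJSON.test_get_hypervisor_uptime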