This issue was fixed in the openstack/nova victoria-eom release.

** Changed in: nova/victoria
       Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1964149

Title:
  nova dns lookups can block the nova api process leading to 503 errors.

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  Fix Released
Status in OpenStack Compute (nova) wallaby series:
  Fix Released
Status in OpenStack Compute (nova) xena series:
  Fix Released

Bug description:
  We currently have 4 possibly related downstream bugs whereby DNS lookups can result in 503 errors, because we do not monkey patch green DNS and that can result in blocking behavior. Specifically, we have seen calls to socket.getaddrinfo in py-amqp block the API when using IPv6.

  https://bugzilla.redhat.com/show_bug.cgi?id=2037690
  https://bugzilla.redhat.com/show_bug.cgi?id=2050867
  https://bugzilla.redhat.com/show_bug.cgi?id=2051631
  https://bugzilla.redhat.com/show_bug.cgi?id=2056504

  Copying a summary of the RCA from one of the bugs:

  What happens:

  - A request comes in which requires RPC, so a new connection to rabbitmq has to be established.
  - The hostname(s) from the transport_url setting are ultimately passed to py-amqp, which attempts to resolve the hostname to an IP address so it can set up the underlying socket and connect.
  - py-amqp explicitly tries to resolve with AF_INET first, and only if that fails does it try with AF_INET6[1].
  - The customer environment is primarily IPv6. Attempting to resolve the hostname via AF_INET fails in nss_hosts (the /etc/hosts file only has IPv6 addresses) and falls through to nss_dns.
  - Something about the customer DNS infrastructure is slow, so it takes a long time (~10 seconds) for this IPv4 lookup to fail.
  - py-amqp finally tries with AF_INET6, and the hostname is resolved immediately via nss_hosts because the entry is in /etc/hosts.

  Critically, because nova explicitly disables greendns[2] with eventlet, the *entire* nova-api worker is blocked for the duration of the slow name resolution, because socket.getaddrinfo is a blocking call into glibc.

  [1] https://github.com/celery/py-amqp/blob/1f599c7213b097df07d0afd7868072ff9febf4da/amqp/transport.py#L155-L208
  [2] https://github.com/openstack/nova/blob/master/nova/monkey_patch.py#L25-L36

  nova currently disables the greendns monkey patch because of a very old bug on CentOS 6 with Python 2.6 and the Havana release of nova:
  https://bugs.launchpad.net/nova/+bug/1164822

  IPv6 support was added in eventlet v0.17, the same release that added Python 3 support, back in 2015:
  https://github.com/eventlet/eventlet/issues/8#issuecomment-75490457

  So we should not need to work around the lack of IPv6 support anymore.

  https://review.opendev.org/c/openstack/nova/+/830966

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1964149/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
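
For anyone trying to reproduce the behaviour described in the RCA summary, here is a minimal sketch of the "AF_INET first, AF_INET6 only on failure" lookup order, with per-family timing. It is not py-amqp's actual code (see [1] for that). The names resolve_ipv4_then_ipv6 and fake_ipv6_only_resolver, the host rabbit.example.test, the address fd00::10 and the 0.05 second delay are all hypothetical, chosen only for illustration. The fake resolver stands in for an IPv6-only host whose IPv4 lookup fails slowly.

```python
import socket
import time


def resolve_ipv4_then_ipv6(host, port, getaddrinfo=socket.getaddrinfo):
    """Try AF_INET first and fall back to AF_INET6 only if that fails.

    Simplified illustration of the lookup order in the RCA, not py-amqp code.
    Returns (family, addrinfo_list, timings), where timings maps each
    attempted family to the seconds spent in the resolver call.
    """
    timings = {}
    last_error = None
    for family in (socket.AF_INET, socket.AF_INET6):
        start = time.monotonic()
        try:
            infos = getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            last_error = exc
            infos = []
        finally:
            # Without green DNS, the whole worker is blocked for this long.
            timings[family] = time.monotonic() - start
        if infos:
            return family, infos, timings
    raise last_error or socket.gaierror(socket.EAI_NONAME, "no addresses found")


def fake_ipv6_only_resolver(host, port, family=0, type=0, proto=0, flags=0):
    """Hypothetical resolver: slow IPv4 failure, immediate IPv6 answer."""
    if family == socket.AF_INET:
        time.sleep(0.05)  # stands in for the ~10 second DNS failure
        raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")
    return [(socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("fd00::10", port, 0, 0))]


family, infos, timings = resolve_ipv4_then_ipv6(
    "rabbit.example.test", 5672, getaddrinfo=fake_ipv6_only_resolver
)
print(family, infos[0][4][0], timings)
```

Dropping the getaddrinfo argument makes the function use the real socket.getaddrinfo, which should show how long the IPv4 attempt takes on an affected host. When that call is not green, the time recorded for AF_INET is time during which no other request in the same worker can make progress.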