Public bug reported:

Description
===========

This is the current master branch (Wallaby) of OpenStack.

We see this regularly, but it is intermittent. We are seeing nova
instances that do not transition to ACTIVE within five minutes.
Investigating this led us to find that libvirtd appears to go into a
tight loop on an instance delete.
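For context, the failure mode is a timeout in a status wait loop along
the lines of the following. This is a minimal sketch using
openstacksdk, not the actual test code; the cloud name and server ID
are placeholders:

    # Minimal sketch of the kind of wait that times out: poll the
    # server status until ACTIVE or a five-minute deadline.
    # NOTE: cloud name and server ID are placeholders, not from the job.
    import time

    import openstack


    def wait_for_active(conn, server_id, timeout=300, interval=5):
        """Poll nova until the server goes ACTIVE or the deadline passes."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            server = conn.compute.get_server(server_id)
            if server.status == "ACTIVE":
                return server
            if server.status == "ERROR":
                raise RuntimeError("server %s went to ERROR" % server_id)
            time.sleep(interval)
        raise TimeoutError(
            "server %s not ACTIVE after %ss" % (server_id, timeout))


    conn = openstack.connect(cloud="devstack")  # assumed clouds.yaml entry

In the failing runs the deadline passes while the instance is still in
BUILD, even though nothing has gone to ERROR.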
The 136MB libvirtd log is here:
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c77/759973/3/check/octavia-v2-dsvm-scenario/c77fe63/controller/logs/libvirt/libvirtd_log.txt

The overall job logs are here:
https://zuul.opendev.org/t/openstack/build/c77fe63a94ef4298872ad5f40c5df7d4/logs

When running the Octavia scenario test suite, we occasionally see nova
instances fail to become ACTIVE in a timely manner, causing timeouts
and failures. While investigating this issue we found the libvirtd log
was 136MB. Most of the file consists of this repeating block:

2020-10-28 23:45:06.330+0000: 20852: debug : qemuMonitorIO:767 : Error on monitor internal error: End of file from qemu monitor
2020-10-28 23:45:06.330+0000: 20852: debug : qemuMonitorIO:788 : Triggering EOF callback
2020-10-28 23:45:06.330+0000: 20852: debug : qemuProcessHandleMonitorEOF:301 : Received EOF on 0x7f6278014ca0 'instance-00000001'
2020-10-28 23:45:06.330+0000: 20852: debug : qemuProcessHandleMonitorEOF:305 : Domain is being destroyed, EOF is expected

Here is a snippet of the lead-in to the repeated lines:
http://paste.openstack.org/show/799559/

It appears to be a tight loop, repeating many times per second.
Eventually it does stop and things seem to return to normal in nova.
Here is a snippet of the end of the loop in the log:
http://paste.openstack.org/show/799560/
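For anyone reproducing the analysis, this is roughly how the repeat
rate can be confirmed from the log linked above. A minimal sketch; it
assumes the log has been downloaded locally as libvirtd_log.txt (the
file name is a placeholder):

    # Rough count of how often the EOF block repeats per second in the
    # libvirtd log, by bucketing qemuProcessHandleMonitorEOF lines on
    # their timestamp truncated to the second.
    import re
    from collections import Counter

    counts = Counter()
    with open("libvirtd_log.txt", errors="replace") as log:
        for line in log:
            if "qemuProcessHandleMonitorEOF" not in line:
                continue
            # Lines begin with e.g. "2020-10-28 23:45:06.330+0000:";
            # key on the whole second.
            match = re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", line)
            if match:
                counts[match.group(0)] += 1

    for second, hits in counts.most_common(5):
        print("%s -> %d EOF callbacks" % (second, hits))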
** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1902276

Title:
  libvirtd going into a tight loop causing instances to not transition
  to ACTIVE

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1902276/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp