[Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10

ChristianEhrhardt Tue, 22 Aug 2017 04:37:17 -0700

Stack from qemu_fill_buffer to qio_channel_socket_readv
#0  qio_channel_socket_readv (ioc=<optimized out>, iov=<optimized out>, 
niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0)
    at ./io/channel-socket.c:477
#1  0x0000001486ec97e2 in qio_channel_read (ioc=ioc@entry=0x148a73a6c0, 
    buf=buf@entry=0x1489844c00 "\060\nLw", buflen=buflen@entry=28728, errp=errp@entry=0x0) at 
./io/channel.c:112
#2  0x0000001486e005ec in channel_get_buffer (opaque=<optimized out>, 
    buf=0x1489844c00 "\060\nLw", pos=<optimized out>, size=28728) at 
./migration/qemu-file-channel.c:80
#3  0x0000001486dff095 in qemu_fill_buffer (f=f@entry=0x1489843c00) at 
./migration/qemu-file.c:293


I checked that the fd in recvmsg(sioc->fd, &msg, sflags) is in fact the socket.
E.g. with this fd being 27:
tcp    ESTAB      1405050 0      ::ffff:10.22.69.30:49152                   
::ffff:10.22.69.157:49804                 
users:(("qemu-system-x86",pid=29273,fd=27)) ino:3345152 sk:30 <->
         skmem:(r1420644,rb1495660,t0,tb332800,f668,w0,o0,bl0,d14) ts sack 
cubic wscale:7,7 rto:200 rtt:0.04/0.02 ato:80 mss:8948 cwnd:10 
bytes_received:1981460 segs_out:37 segs_in:247 data_segs_in:231 send 
17896.0Mbps lastsnd:254728 lastrcv:250372 lastack:250372 rcv_rtt:0.205 
rcv_space:115461 minrtt:0.04
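
To make the socket memory counters in that ss output easier to read, the skmem field can be split into named values. This is only a small sketch: `parse_skmem` and `SKMEM_FIELDS` are hypothetical names, and the field meanings are taken from the ss(8) man page, so treat them as an assumption if your iproute2 version differs. For the sample above it shows that the receive queue holds roughly 95% of the receive buffer.

```python
import re

# Field names as documented in ss(8); an assumption if your iproute2
# version prints a different set of counters.
SKMEM_FIELDS = {
    "r": "rmem_alloc",    # memory allocated for receiving
    "rb": "rcv_buf",      # receive buffer limit
    "t": "wmem_alloc",    # memory used for sending (sent to layer 3)
    "tb": "snd_buf",      # send buffer limit
    "f": "fwd_alloc",     # memory kept as cache for future use
    "w": "wmem_queued",   # memory queued for sending
    "o": "opt_mem",       # memory for socket options
    "bl": "back_log",     # memory used for the backlog queue
    "d": "sock_drop",     # packets dropped before demultiplexing
}


def parse_skmem(text):
    """Return a dict of named skmem counters found in one ss record."""
    match = re.search(r"skmem:\(([^)]*)\)", text)
    if match is None:
        return {}
    counters = {}
    for item in match.group(1).split(","):
        field = re.fullmatch(r"([a-z]+)(\d+)", item.strip())
        if field is None:
            continue  # skip anything that is not <letters><digits>
        key, value = field.groups()
        counters[SKMEM_FIELDS.get(key, key)] = int(value)
    return counters


if __name__ == "__main__":
    sample = "skmem:(r1420644,rb1495660,t0,tb332800,f668,w0,o0,bl0,d14)"
    mem = parse_skmem(sample)
    fill = 100.0 * mem["rmem_alloc"] / mem["rcv_buf"]
    print(mem)
    print("receive buffer fill: %.1f%%" % fill)
```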

I need to break on the failure of that recvmsg in qio_channel_socket_readv.
# The following does not work: due to optimization the ret value is only
# available later
b io/channel-socket.c:478 if ret < 0
But catching it "inside" the if works:
b io/channel-socket.c:479


Take the following with a grain of salt, this is very threaded and noisy to 
debug.

Once I hit it, recvmsg returned -1; that was at f->pos = 311641887.
But at the same time I could confirm (via ss) that the socket itself is still 
open on source and target of the migration.

The -1 comes with errno EAGAIN, so qio_channel_socket_readv returns 
QIO_CHANNEL_ERR_BLOCK.
That seems to arrive in nbd_rwv (nbd/common.c:44)
and leads to "qio_channel_yield".
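
The EAGAIN handling on a non-blocking socket can be illustrated outside of QEMU. The following is a minimal Python sketch, not QEMU code: `ERR_BLOCK`, `readv_once` and `read_with_retry` are hypothetical names, with `ERR_BLOCK` standing in for QIO_CHANNEL_ERR_BLOCK and a `select()` wait standing in for qio_channel_yield. It only shows that a failed recvmsg with EAGAIN means "no data yet" rather than a broken connection, and that a retry after waiting can succeed.

```python
import select
import socket

ERR_BLOCK = -2  # hypothetical stand-in for QIO_CHANNEL_ERR_BLOCK


def readv_once(sock, buflen):
    """One recvmsg() attempt; map EAGAIN to ERR_BLOCK instead of failing."""
    try:
        data, _ancdata, _flags, _addr = sock.recvmsg(buflen)
    except BlockingIOError:  # errno EAGAIN / EWOULDBLOCK
        return ERR_BLOCK
    return data  # b"" means the peer closed the connection


def read_with_retry(sock, buflen, timeout=5.0):
    """Retry after waiting for readability (stand-in for a coroutine yield)."""
    while True:
        ret = readv_once(sock, buflen)
        if ret != ERR_BLOCK:
            return ret
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError("socket never became readable")


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.setblocking(False)
    print(readv_once(b, 4096) == ERR_BLOCK)  # nothing has been sent yet
    a.sendall(b"payload")
    print(read_with_retry(b, 4096))
    a.close()
    b.close()
```

A read that keeps returning the "would block" value while the socket is still established, as in the trace below, points at the waiting and retry logic rather than at the connection itself.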

There are a few coroutine switches in between so I hope I'm not losing 
anything.
But that first ret<0 actually worked, it seems the yield and retry got it 
working.

I got back to qemu_fill_buffer iterating further after this.
This hit ret<0 in qio_channel_socket_readv again at f->pos 311641887.

This time on returning the QIO_CHANNEL_ERR_BLOCK it returned to 
"./migration/qemu-file-channel.c:81".
That was interesting as it is different than before.
After this it seemed to become a death spiral: recvmsg returned -1 every time 
(still on the same offset).
It passed back through nbd_rwv, which called qio_channel_yield multiple 
times.

Then it continued, and later on 321998304 is the last offset I saw.
It did not pass the breakpoint at io/channel-socket.c:479 any more, but then 
led to the exit.

Hmm, I might have lost myself on the coroutine switches - but it is odd at 
least.
Trying to redo less interactive and with a bit more prep ...
Maybe the results are more reliable then ...

Getting back with more later ...

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602

Title:
  --copy-storage-all failing with qemu 2.10

Status in QEMU:
  New
Status in libvirt package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  We fixed an issue around disk locking already in regard to qemu-nbd
  [1], but there still seem to be issues.

  $ virsh migrate --live --copy-storage-all kvmguest-artful-normal 
qemu+ssh://10.22.69.196/system
  error: internal error: qemu unexpectedly closed the monitor: 
2017-08-18T12:10:29.800397Z qemu-system-x86_64: -chardev pty,id=charserial0: 
char device redirected to /dev/pts/0 (label charserial0)
  2017-08-18T12:10:48.545776Z qemu-system-x86_64: load of migration failed: 
Input/output error

  Source libvirt log for the guest:
  2017-08-18 12:09:08.251+0000: initiating migration
  2017-08-18T12:09:08.809023Z qemu-system-x86_64: Unable to read from socket: 
Connection reset by peer
  2017-08-18T12:09:08.809481Z qemu-system-x86_64: Unable to read from socket: 
Connection reset by peer

  Target libvirt log for the guest:
  2017-08-18T12:09:08.730911Z qemu-system-x86_64: load of migration failed: 
Input/output error
  2017-08-18 12:09:09.010+0000: shutting down, reason=crashed

  Given the timing it seems that the actual copy now works (it is busy for ~10 
seconds in my environment, which would be the copy).
  Also we don't see the old errors we saw before, but afterwards, on the actual 
take-over, it fails.

  Dmesg has no related denials (checked because apparmor is often in the mix).

  Need to check libvirt logs of source [2] and target [3] in detail.

  [1]: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02200.html
  [2]: http://paste.ubuntu.com/25339356/
  [3]: http://paste.ubuntu.com/25339358/

