On 04/20/2016 04:38 PM, Lutz Vieweg wrote:
> I've now a
>   strace -f -p 10727 -e trace=pwrite,pwritev,fdatasync,file -t 2>&1 | gzip -1 -c >trace.gz
> attached to the qemu-process.
> If the incident rate stays the same, by tomorrow I should be able
> to correlate newly emitted I/O-errors in the guest with that log.

Ok, mystery solved:
[pid 18241] 00:17:15 pwritev(16, [{..., 4096}, {..., 4096}], 2, 6585417728) = -1 ENOSPC (No space left on device)
[pid 18241] 00:17:15 pwrite(16, ..., 4096, 6581915648) = -1 ENOSPC (No space left on device)
[pid 18241] 00:17:15 pwrite(16, ..., 4096, 1048576) = -1 ENOSPC (No space left on device)
[pid 18241] 00:17:15 pwrite(16, ..., 4096, 1048576) = -1 ENOSPC (No space left on device)
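
For anyone who wants to pull such failures out of a long trace, here is a
minimal sketch of a filter. The function name enospc_writes is made up for
illustration, and the pattern assumes the one-call-per-line layout shown
above.

```shell
# Sketch only: print the write calls that failed with ENOSPC.
# Reads a strace log on stdin; "enospc_writes" is a hypothetical name.
enospc_writes() {
    # Matches any pwrite variant (pwrite, pwritev, ...) whose result
    # is "-1 ENOSPC"; successful calls and other errors are dropped.
    grep -E 'pwrite.*= -1 ENOSPC'
}
```

It could be used as "gzip -dc trace.gz | enospc_writes". If strace is still
running, the end of the compressed stream may not be readable yet, so the
most recent calls can be missing.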
File descriptor 16 belongs to a raw image file that actually resides
on a btrfs filesystem: a constant-size 16GB file with its attributes
set to not use copy-on-write semantics.
Nevertheless, writes to such files can still yield ENOSPC due to a bug in btrfs:
http://www.spinics.net/lists/linux-btrfs/msg52691.html
And indeed, the errors occurred exactly when a backup procedure
was preparing a read-only snapshot with "btrfs subvolume snapshot -r".
So until I can upgrade to a mainline kernel that includes the fix, I'll
pause the qemu process while the "btrfs subvolume snapshot -r" runs.
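
A minimal sketch of such a workaround, assuming the pause is done with
SIGSTOP/SIGCONT (the monitor's "stop" and "cont" commands would be another
way to do it). The helper name run_while_paused is hypothetical, not part
of qemu or btrfs-progs.

```shell
# Sketch only: stop a process, run a command, then let the process continue.
# Usage: run_while_paused PID COMMAND [ARGS...]
run_while_paused() {
    pid=$1
    shift
    # Give up without running the command if the process cannot be stopped.
    kill -STOP "$pid" || return 1
    "$@"
    cmd_rc=$?
    # Resume the process whether or not the command succeeded.
    kill -CONT "$pid"
    return "$cmd_rc"
}
```

With the PID from the strace call above, this would be invoked as
"run_while_paused 10727 btrfs subvolume snapshot -r SOURCE DEST", where
SOURCE and DEST are placeholders for the real subvolume and snapshot paths.
The sketch does not handle being interrupted between the two signals, so a
killed backup script could leave the guest stopped.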
Thanks for the hints.
Sorry this turned out to be a btrfs bug rather than a qemu bug - I was
at first misled into believing the image was on XFS.
Nevertheless, I think qemu could be somewhat more verbose and report
when and why it stops emulation. A message to the monitor or to
standard output would be a helpful start.
Regards,
Lutz Vieweg