On 2020-10-12 13:47, Max Reitz wrote:
On 09.10.20 14:55, Jakob Bohm wrote:
On 2020-10-09 10:48, Max Reitz wrote:
On 08.10.20 18:49, Jakob Bohm wrote:
(Top posting because previous reply did so):
If the bug was closed as "can't reproduce", why was a very similar bug
listed as fixed in RHSA-2019:2553-01?
Hi,
Which very similar bug do you mean? I can only guess that perhaps you
mean 1603104 or 1551486.
Bug 1603104 was about qemu not ignoring errors when releasing file locks
fails (we should ignore errors then, because they're not fatal, and we
often cannot return errors, so they ended up as aborts). (To give more
context, this error generally appeared only when the storage the image
is on somehow disappeared while qemu was running, e.g. when the
connection to an NFS server was lost.)
Bug 1551486 entailed a bit of a rewrite of the whole locking code, which
may have resulted in bug 1655408 no longer appearing for our QE
team. But it was a different bug, as it wasn’t about any error, but
just about the fact that qemu used more FDs than necessary.
(Although I see 1655408 was reported for RHEL 8, whereas 1603104 and
1551486 (as part of RHSA-2019:2553) were reported for RHEL 7. The
corresponding RHEL 8 bug for those two is 1694148.)
Either way, both of those bugs are fixed in 5.0.
1655408 in contrast reports an error at startup; locking itself failed.
I couldn’t reproduce it, and I still can’t; neither with the image
mounted concurrently, nor with an RO NFS mount.
(For example:
exports:
[...]/test-nfs-ro 127.0.0.1(ro,sync,no_subtree_check,fsid=0,insecure,crossmnt)
$ for i in $(seq 100); do \
    echo -e '\033[1m---\033[0m'; \
    x86_64-softmmu/qemu-system-x86_64 \
      -drive \
        if=none,id=drv0,readonly=on,file=/mnt/tmp/arch.iso,format=raw \
      -device ide-cd,drive=drv0 \
      -enable-kvm -m 2048 -display none &; \
    pid=$!; \
    sleep 1; \
    kill $pid; \
  done
(Where x86_64-softmmu/qemu-system-x86_64 is upstream 5.0.1.)
All I see is something like:
---
qemu-system-x86_64: terminating on signal 15 from pid 7278 (/bin/zsh)
[2] 34103
[3] - 34095 terminated x86_64-softmmu/qemu-system-x86_64 -drive
-device ide-cd,drive=drv0 -m 2048
So no file locking errors.)
The error I got was specifically "Failed to lock byte 100", and the VM
did not start. The ISO file was on a R/W NFS3 share, but was itself R/O
for the user that root was mapped to by linux-nfs-server via /etc/exports
options. Specifically, the ISO file was mode 0444 in a 0755
directory, and the exports line was (simplified):
/share1 xxxx:xxxx:xxxx:xxxx/64(ro,sync,mp,subtree_check,anonuid=1000,anongid=1000)
where xxxx:xxxx:xxxx:xxxx/64 is the numeric IPv6 prefix of the LAN.
The NFS kernel server ran Debian Stretch, kernel 4.19.0-0.bpo.8-amd64 #1 SMP
Debian 4.19.98-1~bpo9+1 (2020-03-09) x86_64 GNU/Linux.
NFS client mount options were:
rw,nosuid,nodev,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,
soft,proto=tcp6,timeo=600,retrans=6,sec=sys,mountaddr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx,
mountvers=3,mountport=45327,mountproto=udp6,local_lock=none,addr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx
I’ve tried using these settings, but still can’t reproduce the bug.
Nothing else uses the image when you try to attach it to qemu, right?
(Your last email noted something about a loop mount, but I’m not sure
whether that just referred to the RH Bugzilla entry.)
(local_lock=none means that all locks are relayed to the server, correct?)
Max
Nothing else was supposed to access that ISO at the time, but at various
times that ISO has been accessed by different virtualization systems for
different virtual machines, and maybe something didn't release its own
locks from much earlier (virtualization hosts tend to accumulate a lot
of uptime).
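
One way to test for such a leftover lock is to try taking a lock on the
byte named in the error message, from the same client and as the same
user. The following is only a minimal sketch in Python, not what qemu
itself does: probe_byte_lock is a hypothetical helper name, the default
offset of 100 is taken from the "Failed to lock byte 100" message, and it
uses ordinary POSIX record locks via fcntl.lockf, which may not be the
same lock type qemu requests.

```python
import errno
import fcntl
import os


def probe_byte_lock(path, offset=100):
    """Try a non-blocking shared lock on a single byte of `path`.

    Returns None if the lock was obtained (it is released again right
    away), otherwise the symbolic errno name of the failure.
    The helper name and the default offset are assumptions made for
    illustration only.
    """
    # Open read-only, since the image is mode 0444 for the mapped user.
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # Shared, non-blocking lock on exactly one byte at `offset`.
            fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        # Lock obtained: drop it again so the probe leaves nothing behind.
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
        return None
    finally:
        os.close(fd)
```

Calling print(probe_byte_lock("/mnt/tmp/arch.iso")) on the affected mount
would print None if the byte can be locked, or an errno name otherwise.
A shared lock only conflicts with an exclusive one held elsewhere, so a
failure here would at least show that the problem is not specific to
qemu.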
Coordinating locking of shared disk images between multiple qemu
instances should ideally try to emulate what happens when a SCSI disk is
shared over a SAN (fibre channel, iSCSI, shared parallel SCSI bus etc.),
so if a VM issues the SCSI lock management commands, they should behave
as they would for real hardware.
My reference to loop mounts refers to the (common) scenario where
someone tries to mount a raw image file using both qemu and OS methods,
with the loop block driver being the traditional POSIX method that would
be invoked by not using the qemu NBD server.
My large batch job is still running...
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded