On 22.08.19 17:40, Max Reitz wrote:
> On 22.08.19 17:25, Max Reitz wrote:
>> On 22.08.19 14:09, Max Reitz wrote:
>>> (CC-ing Paolo because of the XFS connection, and Stefan because why not.)
>>>
>>> On 22.08.19 13:27, Lukáš Doktor wrote:
>>>> On 21.08.19 19:51, Max Reitz wrote:
>>>>> On 21.08.19 16:14, Lukáš Doktor wrote:
>>>>>> Hello guys,
>>>>>>
>>>>>> First attempt was rejected due to zip attachment, let's try it again
>>>>>> with just Avocado-vt debug.log and serial console log files attached.
>>>>>>
>>>>>> I bisected a regression on aarch64 all the way to this commit: "qcow2:
>>>>>> skip writing zero buffers to empty COW areas"
>>>>>> c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a. Would you please have a look
>>>>>> at it?
>>>>>
>>>>> I think I can see the issue on my x64 system (I don’t see the XFS
>>>>> corruption, but the installation fails because of some segfaults).
>>>>>
>>>>> I haven’t found a simpler way to reproduce the problem yet, though,
>>>>> which is a pain... :-/
>>>>>
>>>>> It looks like the problem disappears when I configure qemu with
>>>>> “--disable-xfsctl”. Can you try that?
>>>>>
>>>>> Max
>>>>>
>>>>
>>>> Hello Max,
>>>>
>>>> yes, I'm getting the same behavior. With "--disable-xfsctl" it works well.
>>>> Also looking at the option I understand why it only failed on aarch64 for
>>>> me: I don't have the libs installed on the other machines, therefore it was
>>>> disabled by "./configure" there. Anyway I guess disabling it in my builds
>>>> won't really fix the issue, right? :-)
>>>
>>> Thanks!
>>>
>>> No, it won’t, but it means the actual root of the problem is probably
>>> rather in some XFS-related code (be it because qemu uses it the wrong
>>> way or because of XFS kernel code) than in the pure qcow2 commit that
>>> made the problem surface by exercising it heavily. (Or in an
>>> interaction between the two.)
>>
>> OK, I got a simpler reproducer now:
>>
>> $ ./qemu-img create -f qcow2 test.qcow2 1M
>> $ (for i in $(seq 15 -1 0); do \
>>        echo "aio_write -P 42 $((i * 64 + 1))k 62k"; \
>>    done) \
>>   | ./qemu-io test.qcow2
>> $ for i in $(seq 0 15); do \
>>       echo $i; \
>>       ofs=$((i * 64)); \
>>       ./qemu-io -c "read -P 0 ${ofs}k 1k" \
>>                 -c "read -P 42 $((ofs + 1))k 62k" \
>>                 -c "read -P 0 $((ofs + 63))k 1k" \
>>                 test.qcow2 \
>>           | grep 'verification'; \
>>   done
>>
>> On XFS with --enable-xfsctl, this basically always gives me some
>> verification failure somewhere. (On tmpfs or with --disable-xfsctl, it
>> never fails.)
>>
>> So it seems to be related to I/O from back to front.
>>
>> (You can also reproduce it with a plain “qemu-img bench” invocation,
>> like “./qemu-img bench -w --pattern=42 -o 1k -S 64k -s 62k test.qcow2”
>> (on, say, a 4 GB image), but then the failure appears much later in the
>> image, because you have to wait for some requests to come in reverse
>> (by chance) first.)
>
> The problem is the ftruncate() in xfs_write_zeroes(). It is possible
> for it to yield, then other requests come in, and the data they write
> may get discarded once the ftruncate() settles.
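
A minimal, hypothetical replay of how an ftruncate() like this can discard another request's data, forced into a single sequential script. It assumes (this is not stated above) that the write-zeroes path compares the end of its range against a file length it sampled earlier and calls ftruncate() to extend the file when the range ends past EOF. truncate(1) stands in for that ftruncate() call, the offsets are made up for illustration, and GNU coreutils is assumed. Byte 42 is ASCII '*', matching the -P 42 pattern in the reproducer.

```shell
#!/bin/sh
# Hypothetical replay (not QEMU code). Request A: write-zeroes for
# [64k, 128k). Request B: a data write of 1k at 960k.
f=$(mktemp)

# Request A samples the current file length: the file is still empty.
len=$(( $(wc -c < "$f") ))

# Request B runs in between and writes 1k of byte 42 ('*') at 960k,
# extending the file to 961k.
head -c 1024 /dev/zero | tr '\0' '*' \
    | dd of="$f" bs=1024 seek=960 conv=notrunc status=none

# Request A resumes with its stale length. Its range end (128k) lies past
# the EOF it saw, so it "extends" the file, which now shrinks it instead.
end=$(( 64 * 1024 + 64 * 1024 ))
if [ "$end" -gt "$len" ]; then
    truncate -s "$end" "$f"
fi

# B's data is gone: nothing is left at or beyond offset 960k.
size=$(( $(wc -c < "$f") ))
lost=$(( $(tail -c +$(( 960 * 1024 + 1 )) "$f" | wc -c) ))
echo "file size: $size, bytes left at 960k: $lost"
rm -f "$f"
```

Run as written, it reports a 131072-byte file with nothing left where request B wrote, i.e. a "size extension" based on a stale length acts as a truncation once another request has grown the file in the meantime.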

I’ve just sent a patch: “block/file-posix: Fix xfs_write_zeroes()”,
Message-ID <20190822162618.27670-1-mre...@redhat.com>:

https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg01148.html

Max