Am 08.04.2023 um 00:11 hat Lukas Tschoke geschrieben: > The corruption occurs when a BAT entry aligned to 4096 bytes is changed. > > Specifically, the corruption occurs during the creation of the LOG Data > Descriptor. The incorrect behavior involves copying 4088 bytes from the > original 4096 bytes aligned offset to `tmp[8..4096]` and then copying > the new value for the first BAT entry to the beginning `tmp[0..8]`. > This results in all existing BAT entries inside the 4K region being > incorrectly moved by 8 bytes and the last entry being lost. > > This bug did not cause noticeable corruption when only sequentially > writing once to an empty dynamic VHDX (e.g. > using `qemu-img convert -O vhdx -o subformat=dynamic ...`), but it > still resulted in invalid values for the (unused) Sector Bitmap BAT > entries. > > Importantly, this corruption would only become noticeable after the > corrupted BAT is re-read from the file. > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/727 > Signed-off-by: Lukas Tschoke <lukts...@gmail.com>
Thanks, applied to the block branch. Reviewing this function was a bit confusing, but after understanding what each variable means, your change is clearly fixing a local bug and therefore an improvement. But now I'm wondering if it's really right that we don't have to handle the case where we write only a few bytes and therefore can have a leading and a trailing part in the same log sector. In fact, having everything in the same sector actually seems to be the only case that really happens because vhdx_log_write_and_flush() is only called with length = 8 and offset = bat_entry_offset, which is a multiple of 8. Most of the cases should be in the middle of the BAT. vhdx_log_write() uses the leading_length != 0 code path for them. This reads the part before the written entry from the image, replaces the entry itself, but seems to leave the buffer uninitialised for everything after the entry. So does that part get corrupted? I suspect that while your patch fixes some cases, the function still isn't entirely right. Do you happen to have a simple reproducer using only qemu-io and qemu-img for the case you fixed? If so, maybe it could be relatively easily adjusted to cover these other cases, too. Kevin