On 15.02.23 22:47, John Snow wrote:
> Hm, I'm not sure I see any pattern that might help. Could be that AHCI
> is just bugged during load, but it's tough to know in what way.
If we ever get a backtrace where the bad write actually goes through
QEMU, I'll let you know. We are considering providing a custom build to
affected users, in the hope of catching it if it triggers again (using
GDB hooks leads to too much slowdown in these performance-critical
paths; I've appended a sketch of the kind of cheap in-process check we
have in mind below). We can't really roll it out to all users, because
most writes to sector zero are legitimate after all, and most users are
not affected.

> What versions of QEMU are in use here? Is there a date on which you
> noticed an increased frequency of these reports?

There were a few reports around the time we rolled out 4.2 and 5.0
(Q2/Q3 of 2020), but the frequency was always very low. AFAICT, there
are about 20-40 reports that could be this issue in total. The earliest
I know of with lost partitions, but not much more information, are
forum threads from 2017/2018.

With 4.2, there was a rework of our backup patches, so naturally I
suspected that. Before 4.2, we had extended the backup job to allow
using a callback to handle the writes instead of the BlockDriverState
target. But starting from 4.2, we are not messing with that anymore;
instead, we use a custom driver as the backup target (a toy
illustration of that split is also appended below). That custom driver
doesn't even know about the source. The source is handled by the usual
backup job mechanisms. If there were some general mix-up there, I'd not
expect it to work for >99.99% of backups and only trigger in
combination with AHCI, but who knows?

Best Regards,
Fiona
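
To give an idea of what such a custom build could do, here is a minimal
stand-alone sketch (not actual QEMU code; the function name and the
idea of calling it from the patched write path are just for
illustration). The fast path is a single comparison, so it should avoid
the GDB-hook slowdown; every write overlapping sector zero, legitimate
or not, would be logged together with a native backtrace:

    /*
     * Illustrative sketch only: log a native backtrace whenever a
     * write overlaps sector 0, to be called from the patched write
     * path. Build with -g -rdynamic so backtrace_symbols_fd() can
     * resolve symbol names.
     */
    #include <execinfo.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define SECTOR_SIZE 512

    static void check_sector_zero_write(int64_t offset, int64_t bytes)
    {
        void *frames[32];
        int n;

        /* Fast path: almost all writes start past the first sector. */
        if (offset >= SECTOR_SIZE || bytes <= 0) {
            return;
        }

        fprintf(stderr,
                "write overlapping sector 0: offset=%" PRId64
                " bytes=%" PRId64 "\n", offset, bytes);

        /* The slow path only runs for the rare sector-0 write. */
        n = backtrace(frames, 32);
        backtrace_symbols_fd(frames, n, fileno(stderr));
    }

    int main(void)
    {
        /* Example: a 4 KiB write at byte offset 0 overlaps sector 0. */
        check_sector_zero_write(0, 4096);
        return 0;
    }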
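
And purely to illustrate the post-4.2 split described above, a toy
stand-alone analogue (hypothetical names, not the real driver): the
target is a plain write sink with no pointer back to the source, so a
write through the target cannot land on the source disk; only the
backup job knows about both sides:

    /*
     * Hypothetical analogue of the post-4.2 backup layout, not the
     * real Proxmox driver: the target holds no reference to the
     * source, so it cannot redirect a write back at the source disk.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* The target sees only (offset, buffer, length). */
    typedef struct BackupTarget {
        uint8_t *data;
        size_t   size;
    } BackupTarget;

    static int target_pwrite(BackupTarget *t, int64_t offset,
                             const void *buf, size_t bytes)
    {
        if (offset < 0 || (uint64_t)offset + bytes > t->size) {
            return -1;
        }
        memcpy(t->data + offset, buf, bytes);
        return 0;
    }

    typedef struct SourceDisk {
        const uint8_t *data;
        size_t         size;
    } SourceDisk;

    /* The backup job is the only place that knows about both sides:
     * it reads a cluster from the source and pushes it into the
     * target. */
    static int backup_cluster(const SourceDisk *src, BackupTarget *tgt,
                              int64_t offset, size_t bytes)
    {
        if (offset < 0 || (uint64_t)offset + bytes > src->size) {
            return -1;
        }
        return target_pwrite(tgt, offset, src->data + offset, bytes);
    }

    int main(void)
    {
        uint8_t src_buf[4096] = { [0] = 0x55, [1] = 0xaa };
        uint8_t dst_buf[4096] = { 0 };
        SourceDisk src = { src_buf, sizeof(src_buf) };
        BackupTarget tgt = { dst_buf, sizeof(dst_buf) };

        if (backup_cluster(&src, &tgt, 0, 512) != 0) {
            return EXIT_FAILURE;
        }
        printf("copied: %02x %02x\n", dst_buf[0], dst_buf[1]);
        return EXIT_SUCCESS;
    }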