Jan Johansson writes:

> Mike Larkin <mlar...@nested.page> wrote:
>> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
>> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
>> > > If I try to cp or dd the disk image on the host it fails
>> > >
>> > >     dd if=disk.raw.old of=disk.raw.bak bs=1m
>> > >     dd: disk.raw.old: Input/output error
>> > >     8858+0 records in
>> > >     8858+0 records out
>> > >     9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
>> > >
>> > > The host shows no other signs of failing hardware.
>> > >
>> > > Is this a software or a hardware error?
>> >
>> > Given that it gives an error outside the VM, it's likely hardware.
>> >
>>
>> Agreed. Sorta hard to fault vmd(8) if it's not even running.
>
> Since these are sparse files, could the vioblk(4) somehow write
> incorrect data that later will make it unreadable such as a
> pointer pointing into nothingness?
>
> The messages
>
>     vmd[39543]: vioblk write error: Input/output error
>     vmd[39543]: wr vioblk: disk write error
>
> were produced at 01:30 when all the 4 guests and the host all run
> the daily script (which makes backup and other maintenance tasks)
> if that could have any impact.
>
> Should there not be anything on the host logging errors to
> dmesg/syslog such as sd(4) or ahci(4)?
>
> (If it is not obvious my understanding of how the virtio/vioblk
> stuff hooks in to the disk stack is very limited)
>

vmd(8) reads/writes to the disk image files (both raw and qcow2) using
pread(2)/pwrite(2) calls. The qcow2 handling is a bit more complex, but
they're still just calling pread/pwrite as far as I'm aware.

Have you run fsck(8) on your host?

> This drive was installed in August 2020 and if I recall correctly
> it was because of this issue. So I am thinking cable or
> motherboard.
>
> If I decide to replace would it make sense to make this a
> softraid mirror (RAID1) to avoid or get better indication of this
> kind of problems in the future or would it only add more parts that
> can break?
>
> I am currently trying to provoke the drive from the host with
>
>     dd if=/dev/random of=test.raw bs=1m count=17000
>
> then cp/dd and cmp to see if I can make it break for real.

I'd say maybe make sure you have backups of anything important first if
you're purposely going to break things. :-)

--
-Dave Voutila
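
As a rough illustration of the pread(2) point above: since vmd(8) only issues pread(2)/pwrite(2) against the image file, an Input/output error it logs is the same error any other host process would get from those calls. The Python sketch below is not part of vmd(8), and the names find_unreadable_chunks and describe are made up for this example. It walks a file with os.pread() in fixed-size chunks and records every offset where the kernel returns an error, rather than stopping at the first one as dd does by default. The 1 MiB chunk size is an arbitrary choice.

```python
import errno
import os


def find_unreadable_chunks(path, chunk_size=1024 * 1024):
    """Scan `path` with pread() and return (bad, size).

    `bad` is a list of (offset, errno) pairs, one per chunk that the
    kernel refused to read; `size` is the file size in bytes.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError as exc:
                # Typically EIO for a failing disk; note it and skip ahead.
                bad.append((offset, exc.errno))
                offset += want
                continue
            if not data:
                break  # file shrank while we were scanning
            offset += len(data)
    finally:
        os.close(fd)
    return bad, size


def describe(bad):
    """Format the (offset, errno) pairs as human-readable lines."""
    return [
        f"offset {offset}: {errno.errorcode.get(err, err)} ({os.strerror(err)})"
        for offset, err in bad
    ]
```

For example, find_unreadable_chunks("disk.raw.old") would return the failing offsets, if any, together with the file size. An empty list only means every chunk was readable at the time of the scan.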