Must be the holiday season *sigh*... my OpenBSD server is suddenly giving occasional read timeouts on the /var slice of the main hard disk:

-------
wd0(pciide0:0:0): timeout
        type: ata
        c_bcount: 65536
        c_skip: 0
wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 (wd0 bn 67334928; cn 66800 tn 8 sn 24), retrying
wd0: soft error (corrected)
-------

Is this the disk itself, or the controller or some other hardware? Either way, it needs fixing.

My problem is that this is a live system that is not close by. I would very much prefer to 'fix' this remotely, to buy some time until I can replace the machine completely. I do have offsite backups of essential data, but not a spare system in the rack at this very moment.
Not to mention I would like to avoid spending Xmas alone in the datacenter.

There is a second hard disk installed, with OpenBSD-formatted slices of different sizes. This (larger) disk is unused, so its data and layout can be wiped, and it seems like a smart idea to at least copy the data over.

Can I "just copy /var (wd0g) to /var2 (wd1i) and remount", or should I proceed differently? Or would copying and remounting /var simply not work on a live system?

Or could I perhaps 'clone' the whole wd0 disk to wd1 and use that instead of wd0? I understand you need to boot into single-user mode for this [1] and/or have identical disks [2]. Or is there another (remote-safe) way?

Any advice is highly appreciated!

Thanks, and happy holidays,

Matt

[1] http://unixsadm.blogspot.com/2007/08/cloning-disk-in-openbsd.html
[2] http://monkey.org/openbsd/archive/tech/0112/msg00079.html