Must be the holiday season *sigh*.... my OpenBSD server is suddenly
giving the occasional read timeout on the /var slice of the main hard disk:
-------
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 65536
c_skip: 0
wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 (wd0 bn
67334928; cn 66800 tn 8 sn 24), retrying
wd0: soft error (corrected)
-------
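A quick sanity check on the numbers in that log, as a minimal sketch only. It assumes 512-byte sectors, and it assumes that "fsbn" is counted from the start of wd0g while "wd0 bn" is the absolute sector on the disk; if that reading is right, the difference is the offset of the g slice. The helper name parse_timeout is made up for illustration.

```python
import re

# Kernel message copied from the log above, joined onto one line.
LOG = ("wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 "
       "(wd0 bn 67334928; cn 66800 tn 8 sn 24), retrying")

SECTOR_BYTES = 512  # assumption: 512-byte sectors


def parse_timeout(line):
    """Pull the block numbers out of one wd(4) timeout message."""
    m = re.search(r"fsbn (\d+) of (\d+)-(\d+) \(\w+ bn (\d+);", line)
    if m is None:
        raise ValueError("unrecognised log line")
    fsbn, first, last, disk_bn = map(int, m.groups())
    sectors = last - first + 1  # the range is inclusive
    return {
        "failing_fsbn": fsbn,
        "sectors": sectors,
        "bytes": sectors * SECTOR_BYTES,
        # assumption: absolute block minus slice-relative block
        "partition_offset": disk_bn - fsbn,
    }


info = parse_timeout(LOG)
print(info)
```

Under those assumptions the transfer covers 128 sectors, i.e. 65536 bytes, which matches the c_bcount value in the log, so the message describes a single 64 KB read.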
Is this the actual disk or the controller/other hardware? Either way it
needs a fix.
My problem is this is a live system that is not close by. I would very
much prefer to 'fix' this remotely to buy some time to replace the
machine completely.
I do have offsite backups of essential data but not a spare system in
the rack at this very moment.
Not to mention I would like to avoid spending X-mas alone in the datacenter.
There is a second hard disk installed, with OpenBSD-formatted slices, but
of different proportions. This (larger) disk is unused, so its data and
layout may be wiped, and it seems like a smart idea to at least copy the
data over to it.
Can I "just copy /var (wd0g) to /var2 (wd1i) and remount", or should I
proceed differently? Or would copying and remounting /var simply not
work on a live system?
Or, possibly, I could 'clone' the whole wd0 disk to wd1 and use that
instead of wd0?
I understand you need to boot into single-user mode for this [1] and/or
have identical disks [2]. Or is there another (remote-safe) way?
Any advice is highly appreciated!
Thanks, and happy holidays,
Matt
[1] http://unixsadm.blogspot.com/2007/08/cloning-disk-in-openbsd.html
[2] http://monkey.org/openbsd/archive/tech/0112/msg00079.html