wd0 read timeouts - how to proceed?

Webcharge Fri, 24 Dec 2010 02:11:21 -0800

Must be the holiday season *sigh*... my OpenBSD server is suddenly giving the occasional read timeout on the /var slice of the main hard disk:


-------
wd0(pciide0:0:0): timeout
        type: ata
        c_bcount: 65536
        c_skip: 0

wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 (wd0 bn67334928; cn 66800 tn 8 sn 24), retrying

wd0: soft error (corrected)
-------
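For reference, every number in the timeout message is derived from one block address, and a little arithmetic confirms that they agree. Below is a minimal Python sketch. It assumes 512-byte sectors and a disklabel geometry of 16 tracks (heads) by 63 sectors per track. Neither value appears in the log, so check them against `disklabel wd0`. The sketch decodes three things:

- the transfer size,
- the sector offset of wd0g within wd0,
- the absolute block number, recomputed from the cylinder/track/sector triple.

`decode` and the constants are hypothetical names chosen for illustration. The sketch only pins down where the failing read landed. It cannot say whether the disk or the controller is at fault.

```python
import re

# Timeout message copied from the kernel log above.
LOG_LINE = ("wd0g: device timeout reading fsbn 17002464 of 17002464-17002591 "
            "(wd0 bn67334928; cn 66800 tn 8 sn 24), retrying")

# ASSUMPTIONS (not in the log): 512-byte sectors and a disklabel
# geometry of 16 tracks per cylinder and 63 sectors per track.
SECTOR_SIZE = 512
HEADS = 16
SECTORS_PER_TRACK = 63

# The space after "bn" is optional because the pasted log lacks it.
PATTERN = re.compile(
    r"fsbn (\d+) of (\d+)-(\d+) \(wd0 bn ?(\d+); cn (\d+) tn (\d+) sn (\d+)\)"
)


def decode(line):
    """Extract the block numbers from a wd timeout message and derive
    the transfer size, partition offset and CHS-based block number."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognised timeout message")
    fsbn, first, last, bn, cn, tn, sn = map(int, m.groups())
    sectors = last - first + 1
    return {
        "sectors_in_transfer": sectors,
        "bytes_in_transfer": sectors * SECTOR_SIZE,
        # Block number on the whole disk minus block number within wd0g
        # gives the start sector of wd0g on wd0.
        "partition_offset": bn - fsbn,
        # Cylinder/track/sector are zero-based here, so this should
        # reproduce the absolute block number if the geometry is right.
        "bn_from_chs": (cn * HEADS + tn) * SECTORS_PER_TRACK + sn,
        "bn": bn,
    }


if __name__ == "__main__":
    print(decode(LOG_LINE))
```

For the message above this gives 128 sectors, or 65536 bytes, which matches `c_bcount`. It also gives a wd0g offset of 50332464 sectors and a CHS-derived block number equal to bn 67334928.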

Is this the actual disk, or the controller/other hardware? Either way, it needs a fix.

My problem is that this is a live system that is not close by. I would very much prefer to 'fix' this remotely to buy some time to replace the machine completely. I do have offsite backups of essential data, but not a spare system in the rack at this very moment.

Not to mention I would like to avoid spending X-mas alone in the datacenter.

There is a second hard disk installed, with OpenBSD-formatted slices but of different proportions. This (larger) disk is unused, so its data and layout may be wiped. It therefore seems like a smart idea to at least copy the data over to it.

Can I "just copy /var (wd0g) to /var2 (wd1i) and remount", or should I proceed otherwise? Or would copying and remounting /var simply not work on a live system?

Or, possibly, I could 'clone' the whole wd0 disk to wd1 and use that instead of wd0? I understand you need to boot into single-user mode for this [1] and/or have identical disks [2]. Or is there another (remote-safe) way?


Any advice is highly appreciated!

Thanks, and happy holidays,

Matt

[1] http://unixsadm.blogspot.com/2007/08/cloning-disk-in-openbsd.html
[2] http://monkey.org/openbsd/archive/tech/0112/msg00079.html
