On Thu, Aug 24, 2006 at 06:09:11PM +0200, Hans van Leeuwen wrote:
> Hello misc,
> 
> 
> I run a server with two hard disks running as a software RAID1 using ccd.

Erm... search the archives for why you shouldn't use ccd to mirror and
then think you have a RAID.

> Yesterday I started to import a large database into PostgreSQL, and found a lot 
> of these errors in my logs:
> 
> error reading: Processor VRM
>  error code: ae
>  error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> kcs_sendmsg: 10 27 b8
>  error code: ae
>  error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
>  error code: ae
>  error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
>  error code: ae
>  error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> 
> 
> I'm guessing that one of the disks is broken, but how can I find out which 
> one? And is the data still stored correctly, or does this mean the database 
> will be corrupt?

If you can find out which disk is broken, and it's only that one disk,
the data on the other disk may still be fine.

> Below you will (hopefully) find all relevant information.

> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571 
> tn 11 sn 61), retrying
> wd0d: DMA error reading fsbn 503952 of 503952-503967 (wd0 bn 2592402; cn 2571 
> tn 13 sn 15), retrying
> wd1: transfer error, downgrading to Ultra-DMA mode 4
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 4
> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571 
> tn 11 sn 61), retrying
...
> wd0: transfer error, downgrading to PIO mode 4
> wd0(pciide0:0:0): using PIO mode 4
> wd0d: DMA error reading fsbn 504016 of 504016-504031 (wd0 bn 2592466; cn 2571 
> tn 14 sn 16), retrying
> wd0: soft error (corrected)

> [EMAIL PROTECTED]:~] dmesg |tail -n 50
> bmc_io_wait fails : v=88 m=03 b=01 read_data
>  error code: ae
>  error code: ae
> kcs_sendmsg: 18 22
...

And those errors are a strong indication that your drive is dying. On
some boxes it appears to be normal to get a few downgrades (the box I'm
typing at downgrades twice), but that happens at boot and doesn't go
all the way down to PIO with no DMA at all.
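
A rough way to see which drive is logging the errors is to tally the
kernel messages per disk. The sketch below is only an illustration: the
name count_disk_errors and the regular expression are assumptions derived
from the lines quoted in this thread, not part of any OpenBSD tool. Note
that in the quoted excerpt both wd0 and wd1 show errors, so a tally alone
may not single out one drive.

```python
import re
from collections import Counter

# Hypothetical pattern, derived only from the messages quoted in this thread.
# It matches lines such as "wd1d: DMA error reading fsbn ..." and
# "wd0: transfer error, downgrading to PIO mode 4".
ERROR_LINE = re.compile(r"^(wd\d+)[a-p]?: (DMA error|transfer error|soft error)")


def count_disk_errors(lines):
    """Count error messages per (disk, kind) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # Drop mail quoting ("> ") in case the log was pasted from a message.
        text = line.lstrip("> ").strip()
        match = ERROR_LINE.match(text)
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines from the quoted dmesg output, cut where the mail wrapped them.
SAMPLE = [
    "> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571",
    "> wd0d: DMA error reading fsbn 503952 of 503952-503967 (wd0 bn 2592402; cn 2571",
    "> wd1: transfer error, downgrading to Ultra-DMA mode 4",
    "> wd0: transfer error, downgrading to PIO mode 4",
    "> wd0: soft error (corrected)",
]

for (disk, kind), n in sorted(count_disk_errors(SAMPLE).items()):
    print(f"{disk}: {n} x {kind}")
```

To use it on a live box, the same function could be fed the lines of
dmesg output instead of SAMPLE.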

In your case, I'd also be inclined to restore a backup database instead
of continuing with the one you have, but it should be possible to save
it.

                Joachim
