Those ipmi messages (kcs_sendmsg, bmc_io_wait) are intermittent errors that are not relevant to your failure. I did fix those in -current.
You simply have a dying HDD.

On Thu, Aug 24, 2006 at 06:09:11PM +0200, Hans van Leeuwen wrote:
> Hello misc,
>
> I run a server with two hard disks running as a software RAID1 using ccd.
> Yesterday I started to import a large database into PostgreSQL, and found a lot
> of these errors in my logs:
>
> error reading: Processor VRM
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> kcs_sendmsg: 10 27 b8
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
>
> I'm guessing that one of the disks is broken, but how can I find out which
> one? And is the data still stored correctly, or does this mean the database
> will be corrupt?
>
> Below you will (hopefully) find all relevant information.
>
> Thanks,
>
> Hans
>
> [EMAIL PROTECTED]:~] cat /etc/fstab
> /dev/wd0a / ffs rw 1 1
> /dev/wd1a /altroot ffs xx 0 0
> /dev/ccd0a /home ffs rw,nodev,nosuid 1 2
> /dev/ccd0b /usr ffs rw,nodev 1 2
> /dev/ccd0d /var ffs rw,nosuid 1 2
>
> [EMAIL PROTECTED]:~] cat /etc/ccd.conf
> # $OpenBSD: ccd.conf,v 1.1 1996/08/24 20:52:22 deraadt Exp $
> # Configuration file for concatenated disk devices
> #
> # ccd ileave flags component devices
> #ccd0 16 none /dev/sd2e /dev/sd3e
> ccd0 16 CCDF_MIRROR /dev/wd0d /dev/wd1d
>
> [EMAIL PROTECTED]:~] cat /var/run/dmesg.boot
> OpenBSD 3.9 (GENERIC) #617: Thu Mar 2 02:26:48 MST 2006
> [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Pentium(R) III CPU family 1266MHz ("GenuineIntel" 686-class)
> 1.27 GHz
> cpu0:
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
> real mem = 536166400 (523600K)
> avail mem = 482222080 (470920K)
> using 4278 buffers containing 26910720 bytes (26280K) of memory
> mainbus0 (root)
> bios0 at mainbus0: AT/286+(00) BIOS, date 10/18/01, BIOS32 rev. 0 @ 0xfda54
> pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf3bb0/240 (13 entries)
> pcibios0: PCI Interrupt Router at 000:15:0 ("ServerWorks CSB5" rev 0x00)
> pcibios0: PCI bus #0 is the last bus
> bios0: ROM list: 0xc0000/0x8000 0xc8000/0x8800 0xd0800/0x1800 0xd2000/0x1800
> ipmi0 at mainbus0: version 1.5 interface KCS iobase 0xca2/2 spacing 1 irq 0
> cpu0 at mainbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 "ServerWorks CNB20HE Host" rev 0x23
> pci1 at pchb0 bus 1
> pchb1 at pci0 dev 0 function 1 "ServerWorks CNB20HE Host" rev 0x01
> pchb2 at pci0 dev 0 function 2 "ServerWorks CNB20HE Host" rev 0x01
> pchb3 at pci0 dev 0 function 3 "ServerWorks CNB20HE Host" rev 0x01
> pci2 at pchb3 bus 2
> pciide0 at pci0 dev 2 function 0 "Promise PDC20267" rev 0x02: DMA, channel 0
> configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using irq 11 for native-PCI interrupt
> wd0 at pciide0 channel 0 drive 0: <ST340016A>
> wd0: 16-sector PIO, LBA, 38166MB, 78165360 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> wd1 at pciide0 channel 1 drive 1: <ST340016A>
> wd1: 16-sector PIO, LBA, 38166MB, 78165360 sectors
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 5
> fxp0 at pci0 dev 3 function 0 "Intel 8255x" rev 0x0d, i82550: irq 9, address
> 00:03:47:bd:45:47
> inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
> fxp1 at pci0 dev 4 function 0 "Intel 8255x" rev 0x0d, i82550: irq 5, address
> 00:03:47:bd:45:48
> inphy1 at fxp1 phy 1: i82555 10/100 PHY, rev. 4
> vga1 at pci0 dev 12 function 0 "ATI Rage XL" rev 0x27
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> piixpm0 at pci0 dev 15 function 0 "ServerWorks CSB5" rev 0x92
> iic0 at piixpm0: disabled to avoid ipmi0 interactions
> pciide1 at pci0 dev 15 function 1 "ServerWorks CSB5 IDE" rev 0x92: DMA
> atapiscsi0 at pciide1 channel 0 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <SAMSUNG, CD-ROM SN-124, QM15> SCSI0 5/cdrom
> removable
> cd0(pciide1:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2
> ohci0 at pci0 dev 15 function 2 "ServerWorks OSB4/CSB5 USB" rev 0x05: irq 10,
> version 1.0, legacy support
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0
> uhub0: ServerWorks OHCI root hub, rev 1.00/1.00, addr 1
> uhub0: 4 ports with 4 removable, self powered
> pchb4 at pci0 dev 15 function 3 "ServerWorks CSB5 LPC" rev 0x00
> isa0 at mainbus0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> npx0 at isa0 port 0xf0/16: using exception 16
> pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> biomask fdc5 netmask ffe5 ttymask ffe7
> pctr: 686-class user-level performance counters enabled
> mtrr: Pentium Pro MTRR support
> dkcsum: wd0 matches BIOS drive 0x80
> dkcsum: wd1 matches BIOS drive 0x81
> root on wd0a
> rootdev=0x0 rrootdev=0x300 rawdev=0x302
> WARNING: / was not properly unmounted
> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571
> tn 11 sn 61), retrying
> wd0d: DMA error reading fsbn 503952 of 503952-503967 (wd0 bn 2592402; cn 2571
> tn 13 sn 15), retrying
> wd1: transfer error, downgrading to Ultra-DMA mode 4
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 4
> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571
> tn 11 sn 61), retrying
> wd0: transfer error, downgrading to Ultra-DMA mode 4
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
> wd0d: DMA error reading fsbn 503952 of 503952-503967 (wd0 bn 2592402; cn 2571
> tn 13 sn 15), retrying
> wd1: transfer error, downgrading to Ultra-DMA mode 3
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 3
> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571
> tn 11 sn 61), retrying
> wd0: transfer error, downgrading to Ultra-DMA mode 3
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 3
> wd0d: DMA error reading fsbn 503952 of 503952-503967 (wd0 bn 2592402; cn 2571
> tn 13 sn 15), retrying
> wd1: transfer error, downgrading to Ultra-DMA mode 2
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> wd1d: DMA error reading fsbn 503872 of 503872-503887 (wd1 bn 2592322; cn 2571
> tn 11 sn 61), retrying
> wd0: transfer error, downgrading to Ultra-DMA mode 2
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
> wd0d: DMA error reading fsbn 503952 of 503952-503967 (wd0 bn 2592402; cn 2571
> tn 13 sn 15), retrying
> wd1: soft error (corrected)
> wd1: transfer error, downgrading to Ultra-DMA mode 1
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 1
> wd1d: DMA error reading fsbn 503888 of 503888-503903 (wd1 bn 2592338; cn 2571
> tn 12 sn 14), retrying
> wd0: soft error (corrected)
> wd1: transfer error, downgrading to Ultra-DMA mode 0
> wd1(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 0
> wd1d: DMA error reading fsbn 503888 of 503888-503903 (wd1 bn 2592338; cn 2571
> tn 12 sn 14), retrying
> wd1: transfer error, downgrading to DMA mode 2
> wd1(pciide0:1:1): using PIO mode 4, DMA mode 2
> wd1d: DMA error reading fsbn 503888 of 503888-503903 (wd1 bn 2592338; cn 2571
> tn 12 sn 14), retrying
> wd1: transfer error, downgrading to PIO mode 4
> wd1(pciide0:1:1): using PIO mode 4
> wd1d: DMA error reading fsbn 503888 of 503888-503903 (wd1 bn 2592338; cn 2571
> tn 12 sn 14), retrying
> wd1: soft error (corrected)
> wd0: transfer error, downgrading to Ultra-DMA mode 1
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1
> wd0d: DMA error reading fsbn 504016 of 504016-504031 (wd0 bn 2592466; cn 2571
> tn 14 sn 16), retrying
> wd0: transfer error, downgrading to Ultra-DMA mode 0
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 0
> wd0d: DMA error reading fsbn 504016 of 504016-504031 (wd0 bn 2592466; cn 2571
> tn 14 sn 16), retrying
> wd0: transfer error, downgrading to DMA mode 2
> wd0(pciide0:0:0): using PIO mode 4, DMA mode 2
> wd0d: DMA error reading fsbn 504016 of 504016-504031 (wd0 bn 2592466; cn 2571
> tn 14 sn 16), retrying
> wd0: transfer error, downgrading to PIO mode 4
> wd0(pciide0:0:0): using PIO mode 4
> wd0d: DMA error reading fsbn 504016 of 504016-504031 (wd0 bn 2592466; cn 2571
> tn 14 sn 16), retrying
> wd0: soft error (corrected)
>
> [EMAIL PROTECTED]:~] dmesg |tail -n 50
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> kcs_sendmsg: 10 27 b8
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> kcs_sendmsg: 10 2d b8
> error reading: Processor VRM
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> kcs_sendmsg: 10 27 b8
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
> error code: ae
> error code: ae
> kcs_sendmsg: 18 22
> bmc_io_wait fails : v=88 m=03 b=01 read_data
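
As for "which one": one rough way to look at it is to tally the wd error lines per device in the dmesg text. The following is only a sketch in Python, assuming an interpreter is installed on the box. The function name count_wd_errors and the regular expression are made up for illustration and are based solely on the message formats quoted in this thread; they are not part of any OpenBSD tool. Note that in the dmesg.boot quoted above both wd0 and wd1 log DMA errors, so a tally may not single out one drive.

```python
import re
import sys
from collections import Counter

# Matches kernel lines such as (formats taken from the quoted dmesg only):
#   wd1d: DMA error reading fsbn 503872 of ...
#   wd0: transfer error, downgrading to Ultra-DMA mode 4
#   wd1: soft error (corrected)
# The optional [a-p] covers a partition letter after the unit number.
WD_ERROR = re.compile(r"\b(wd\d+)[a-p]?: (DMA error|transfer error|soft error)")


def count_wd_errors(dmesg_text):
    """Return a Counter keyed by (disk, error kind)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        # Mail quoting prefixes lines with "> "; drop it if present.
        line = line.lstrip("> ").strip()
        match = WD_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Reads dmesg output from standard input and prints one line per tally.
    for (disk, kind), n in sorted(count_wd_errors(sys.stdin.read()).items()):
        print(f"{disk}: {n} x {kind}")
```

Saved under a hypothetical name such as wderrors.py, it could be fed with "dmesg | python3 wderrors.py" or pointed at /var/run/dmesg.boot. Lines like "wd1(pciide0:1:1): using PIO mode 4" are deliberately not counted, since they only report the mode in use.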