Hi,
I accidentally discovered that my software RAID5 array was degraded:
one of the three disks had been kicked out of the array.
It turned out that /dev/sda2 had been disabled five days earlier! I was
completely unaware of the problem, which is odd.
The syslog shows the moment the array got into trouble:
2010-02-23T17:10:31.974066+01:00 home07 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
2010-02-23T17:10:31.974200+01:00 home07 kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
2010-02-23T17:10:31.974213+01:00 home07 kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
2010-02-23T17:10:31.974222+01:00 home07 kernel: ata1.00: status: { DRDY }
2010-02-23T17:10:31.974230+01:00 home07 kernel: ata1: hard resetting link
2010-02-23T17:10:31.974240+01:00 home07 kernel: ata1: nv: skipping hardreset on occupied port
2010-02-23T17:10:37.479060+01:00 home07 kernel: ata1: link is slow to respond, please be patient (ready=0)
2010-02-23T17:10:42.018051+01:00 home07 kernel: ata1: SRST failed (errno=-16)
2010-02-23T17:10:42.018074+01:00 home07 kernel: ata1: hard resetting link
2010-02-23T17:10:42.018081+01:00 home07 kernel: ata1: nv: skipping hardreset on occupied port
2010-02-23T17:10:47.521060+01:00 home07 kernel: ata1: link is slow to respond, please be patient (ready=0)
2010-02-23T17:10:52.062048+01:00 home07 kernel: ata1: SRST failed (errno=-16)
2010-02-23T17:10:52.062072+01:00 home07 kernel: ata1: hard resetting link
2010-02-23T17:10:52.062079+01:00 home07 kernel: ata1: nv: skipping hardreset on occupied port
2010-02-23T17:10:57.564052+01:00 home07 kernel: ata1: link is slow to respond, please be patient (ready=0)
2010-02-23T17:11:27.094054+01:00 home07 kernel: ata1: SRST failed (errno=-16)
2010-02-23T17:11:27.094086+01:00 home07 kernel: ata1: limiting SATA link speed to 1.5 Gbps
2010-02-23T17:11:27.094096+01:00 home07 kernel: ata1: hard resetting link
2010-02-23T17:11:27.094107+01:00 home07 kernel: ata1: nv: skipping hardreset on occupied port
2010-02-23T17:11:32.139070+01:00 home07 kernel: ata1: SRST failed (errno=-16)
2010-02-23T17:11:32.139103+01:00 home07 kernel: ata1: reset failed, giving up
2010-02-23T17:11:32.139112+01:00 home07 kernel: ata1.00: disabled
2010-02-23T17:11:32.139147+01:00 home07 kernel: ata1.00: device reported invalid CHS sector 0
2010-02-23T17:11:32.139158+01:00 home07 kernel: end_request: I/O error, dev sda, sector 164842814
2010-02-23T17:11:32.139168+01:00 home07 kernel: md: super_written gets error=-5, uptodate=0
2010-02-23T17:11:32.139177+01:00 home07 kernel: raid5: Disk failure on sda2, disabling device.
2010-02-23T17:11:32.139186+01:00 home07 kernel: raid5: Operation continuing on 2 devices.
2010-02-23T17:11:32.139194+01:00 home07 kernel: ata1: EH complete
2010-02-23T17:11:32.154040+01:00 home07 kernel: RAID5 conf printout:
2010-02-23T17:11:32.154064+01:00 home07 kernel: --- rd:3 wd:2
2010-02-23T17:11:32.154069+01:00 home07 kernel: disk 0, o:0, dev:sda2
2010-02-23T17:11:32.154074+01:00 home07 kernel: disk 1, o:1, dev:sdb2
2010-02-23T17:11:32.154079+01:00 home07 kernel: disk 2, o:1, dev:sdc2
2010-02-23T17:11:32.157289+01:00 home07 kernel: RAID5 conf printout:
2010-02-23T17:11:32.157419+01:00 home07 kernel: --- rd:3 wd:2
2010-02-23T17:11:32.157426+01:00 home07 kernel: disk 1, o:1, dev:sdb2
2010-02-23T17:11:32.157431+01:00 home07 kernel: disk 2, o:1, dev:sdc2
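The "RAID5 conf printout" lines above summarize the array state. My reading (not checked against the kernel source, so treat it as an assumption) is that rd is the number of member slots, wd the number still working, and a member printed with o:0 is one md has flagged as faulty. A minimal Python sketch that extracts this from the first printout; the function name summarize_conf is hypothetical:

```python
import re

# Lines copied from the first "RAID5 conf printout" in the syslog excerpt.
CONF_PRINTOUT = """\
--- rd:3 wd:2
disk 0, o:0, dev:sda2
disk 1, o:1, dev:sdb2
disk 2, o:1, dev:sdc2
"""

HEADER_RE = re.compile(r"rd:(\d+) wd:(\d+)")
DISK_RE = re.compile(r"disk (\d+), o:(\d), dev:(\S+)")


def summarize_conf(text):
    """Return (raid_disks, working_disks, [devices printed with o:0])."""
    header = HEADER_RE.search(text)
    if header is None:
        raise ValueError("no 'rd:N wd:M' header found")
    raid_disks, working_disks = int(header.group(1)), int(header.group(2))
    # Assumption: o:0 marks a member that md has flagged as faulty.
    flagged = [dev for _slot, ok, dev in DISK_RE.findall(text) if ok == "0"]
    return raid_disks, working_disks, flagged


rd, wd, failed = summarize_conf(CONF_PRINTOUT)
print(f"{wd}/{rd} working, degraded={wd < rd}, flagged: {failed}")
```

For this printout it reports two of three members working, with sda2 flagged.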
As soon as I found out, I rebuilt the array, of course:
2010-03-01T10:21:13.944903+01:00 home07 kernel: md: bind<sda2>
2010-03-01T10:21:13.962118+01:00 home07 kernel: RAID5 conf printout:
2010-03-01T10:21:13.962272+01:00 home07 kernel: --- rd:3 wd:2
2010-03-01T10:21:13.962284+01:00 home07 kernel: disk 0, o:1, dev:sda2
2010-03-01T10:21:13.962293+01:00 home07 kernel: disk 1, o:1, dev:sdb2
2010-03-01T10:21:13.962302+01:00 home07 kernel: disk 2, o:1, dev:sdc2
2010-03-01T10:21:13.968175+01:00 home07 kernel: md: recovery of RAID array md1
2010-03-01T10:21:13.968215+01:00 home07 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
2010-03-01T10:21:13.968228+01:00 home07 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
2010-03-01T10:21:13.968239+01:00 home07 kernel: md: using 128k window, over a total of 81923328 blocks.
2010-03-01T10:42:39.093334+01:00 home07 dhclient[1804]: DHCPREQUEST on br0 to 192.168.254.254 port 67
2010-03-01T10:42:39.095010+01:00 home07 dhclient[1804]: DHCPACK from 192.168.254.254
2010-03-01T10:42:39.846397+01:00 home07 dhclient[1804]: bound to 192.168.254.7 -- renewal in 1797 seconds.
2010-03-01T10:46:41.279910+01:00 home07 kernel: md: md1: recovery done.
2010-03-01T10:46:41.320932+01:00 home07 kernel: RAID5 conf printout:
2010-03-01T10:46:41.320973+01:00 home07 kernel: --- rd:3 wd:3
2010-03-01T10:46:41.320984+01:00 home07 kernel: disk 0, o:1, dev:sda2
2010-03-01T10:46:41.320996+01:00 home07 kernel: disk 1, o:1, dev:sdb2
2010-03-01T10:46:41.321004+01:00 home07 kernel: disk 2, o:1, dev:sdc2
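The timestamps in the two excerpts also show how long the array ran on two disks and how fast the rebuild went. A small Python sketch of that arithmetic, assuming md's "blocks" are 1 KiB units (my assumption, not stated in the log):

```python
from datetime import datetime

# Timestamps and block count copied from the syslog excerpts.
FAILED_AT = "2010-02-23T17:11:32.139177+01:00"       # raid5: Disk failure on sda2
RECOVERY_START = "2010-03-01T10:21:13.968239+01:00"  # md: using 128k window ...
RECOVERY_DONE = "2010-03-01T10:46:41.279910+01:00"   # md: md1: recovery done.
TOTAL_BLOCKS = 81923328  # assumed to be 1 KiB blocks

failed = datetime.fromisoformat(FAILED_AT)
start = datetime.fromisoformat(RECOVERY_START)
done = datetime.fromisoformat(RECOVERY_DONE)

# Time between the disk being kicked out and the start of the rebuild.
degraded_for = start - failed
# Duration and average speed of the rebuild itself.
rebuild_seconds = (done - start).total_seconds()
rate_kib_s = TOTAL_BLOCKS / rebuild_seconds

print(f"ran degraded for {degraded_for}")
print(f"rebuild took {rebuild_seconds / 60:.1f} min, ~{rate_kib_s / 1024:.0f} MiB/s")
```

With these values it reports a little over five days running degraded and a rebuild of about 25 minutes at roughly 52 MiB/s.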
What surprises me is that the system did not warn me (apart from the
syslog messages) that something was seriously wrong. I would have
expected it to do so with alarming popups in X, or something similar.
Perhaps I simply misconfigured something, but maybe Fedora has no
software installed (or even available) to alert the user to such
serious events.
Any suggestions?
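For reference, mdadm has a --monitor mode that can send mail (via MAILADDR in /etc/mdadm.conf) when an array degrades, and that is the usual tool for alerting. Just to illustrate what such a check looks at, here is a minimal Python sketch that scans /proc/mdstat for arrays whose member map (for example [_UU]) shows a missing disk. It assumes the usual mdstat layout; the function name degraded_arrays is hypothetical, and the print is a placeholder for whatever notification you prefer:

```python
import re
from pathlib import Path

# Array header line, e.g. "md1 : active raid5 ...".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Status line of an array, e.g. "... algorithm 2 [3/2] [_UU]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: member_map} for arrays whose member map contains '_'."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status and "_" in status.group(3):
            result[current] = status.group(3)
    return result


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        for name, members in degraded_arrays(mdstat.read_text()).items():
            # Placeholder alert: swap in mail, a desktop notification, etc.
            print(f"WARNING: {name} is degraded: [{members}]")
```

Run periodically (from cron, say), this would have flagged md1 as [_UU] during those five days.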
Thanks,
Rolf
--
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines