I have a RAID1 (using md) running on two USB disks. (I'm working on moving to eSATA, but it's USB for now.) That means I can't get SMART data from the drives. Meanwhile, I've been getting occasional fail events. Unfortunately, they don't tell me which disk is failing.
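(I gather smartmontools can sometimes tunnel SMART through a USB bridge via SAT passthrough, depending on the bridge chip. A sketch of what I mean, in case someone can confirm whether it's worth trying:

    # Works only if the USB-ATA bridge supports SAT passthrough.
    smartctl -d sat -a /dev/sdb
    # Some bridges need a vendor-specific device type instead:
    smartctl -d usbjmicron -a /dev/sdb

Whether either works seems to depend entirely on the bridge chip in the enclosure.)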
When the system comes up, it seems to be entirely random which disk comes up as /dev/sda and which comes up as /dev/sdb. In fact, since my root disk is on SATA, at least one time it came up as /dev/sda and the USB drives came up as /dev/sdb and /dev/sdc, though I think that was under a different kernel version. When I get a failure email, it tells me that it might be due to /dev/sda1 failing -- except when it tells me that it might be due to /dev/sdb1 failing. (The first P.S. below sketches how I'm thinking of tying those names to physical disks.)

When things are working, mdadm -D /dev/md0 looks like this:

    /dev/md0:
            Version : 00.90
      Creation Time : Wed Feb 22 20:50:29 2006
         Raid Level : raid1
         Array Size : 312496256 (298.02 GiB 320.00 GB)
      Used Dev Size : 312496256 (298.02 GiB 320.00 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 0
        Persistence : Superblock is persistent

        Update Time : Thu Jul 22 07:30:46 2010
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

               UUID : e4feee4a:6b6be6d2:013f88ab:1b80cac5
             Events : 0.17961786

        Number   Major   Minor   RaidDevice State
           0       8       17        0      active sync   /dev/sdb1
           1       8        1        1      active sync   /dev/sda1

When it fails, however, the device names disappear: it just reports the array as clean, degraded and shows an active disk, a removed disk, and a faulty spare, without any device names. I even tried doing dd if=/dev/md0 of=/dev/null to see if I could get the light flickering on one drive and not the other, but I just get I/O errors. (The second P.S. has a variant of that test I'm wondering about.)

Once a disk fails, the RAID goes into a nasty state: it reads properly through the crypto loop and LVM I have on top of it, but the filesystems become read-only and the block devices just give errors. Worse, the first indication that something is wrong (even before the mdadm email) is a message on the console that an ext3 journal write failed.

What I've been doing (which makes me tremendously uncomfortable, since I know a disk is failing) is to reboot and bring everything back up. This has been working, but I know it's just a matter of time before the failing disk becomes a failed disk. I could wait until then, since presumably I'll then know which is which, but who knows what data corruption is possible between now and then?

So, um, help?

--Greg
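P.S. Would something like this be a sane way to tie the shifting sdX names to physical disks? Just a sketch; the idea is to lean on serial numbers rather than discovery order:

    # /dev/disk/by-id encodes vendor/model/serial in the symlink
    # names, so they stay stable even when sda and sdb swap.
    ls -l /dev/disk/by-id/

    # Or ask udev for a given device's serial number directly.
    udevadm info --query=all --name=/dev/sda | grep -i serial

    # And check which array each partition claims to belong to.
    mdadm --examine /dev/sda1
    mdadm --examine /dev/sdb1

Then when a failure email names /dev/sda1, I could at least match the name to a serial number before anything renumbers.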
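P.P.S. Instead of reading through /dev/md0, is it safe to read the raw members one at a time? That way only one drive's light should blink, and any I/O errors would implicate a specific disk:

    # Exercise each member separately (reads only; writing to a
    # member while the array is assembled would be destructive).
    dd if=/dev/sda1 of=/dev/null bs=1M count=1024
    dd if=/dev/sdb1 of=/dev/null bs=1M count=1024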
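P.P.P.S. I assume the kernel logs which member md kicked out at the moment of the failure, so checking right after a fail event might beat the ambiguous email:

    # Current member list and array state, degraded or not.
    cat /proc/mdstat

    # md usually logs which device it disabled; grepping for the
    # array name should narrow it down.
    dmesg | grep md0
    grep md0 /var/log/syslog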