I have a newish high-end machine here that's causing me some problems
with RAID, but looking at log files and dmesg I don't think the
problem is actually RAID; it looks more like udev. I'm looking for
some ideas on how to debug this.

The hardware:
Asus Rampage II Extreme
Intel Core i7-980x
12GB DRAM
5 WD5002ABYS RAID Edition 500GB drives

The drives are arranged as a 3-drive RAID1 and a 2-drive RAID0 using mdadm.

The issue is that when booting gets to the point where mdadm starts,
about 50% of the time mdadm fails to find some of the partitions, and
hence either starts the RAID1 with drives missing or, in the case of
the RAID0, won't start the array at all. For instance, /dev/md5 might
start degraded because one of /dev/sda5, sdb5, or sdc5 isn't found.
Once the problem has occurred, nothing short of a reboot seems to fix
it.
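One check I've been trying after a bad boot is comparing what the
kernel registered with what udev created; a rough sketch (the device
names are just what's on my box):

```shell
#!/bin/sh
# Compare partitions the kernel registered against the nodes udev created.
# Present in /proc/partitions but absent from /dev  -> points at udev.
# Absent from /proc/partitions as well              -> points at the kernel/driver.
awk 'NR > 2 && $4 != "" { print $4 }' /proc/partitions | while read -r part; do
    [ -e "/dev/$part" ] || echo "kernel sees $part but /dev/$part is missing"
done
```

So far the missing partition seems absent from both places, which is
part of why I'm confused about where to point the finger.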

Investigating dmesg when there is a failure, I never actually see the
missing partition being identified, and the corresponding device node
is absent from the /dev directory as well.
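In case it helps, here is roughly what I'm grepping for in dmesg (the
exact patterns are my own guesses at what matters):

```shell
#!/bin/sh
# Did the kernel ever run the partition scan for each disk? A line like
# "sda: sda1 sda5" appears once per disk when its partitions are read.
dmesg 2>/dev/null | grep -E 'sd[a-e]:' || echo "no partition-scan lines found"

# Any libata link errors or resets that might explain a disk vanishing?
dmesg 2>/dev/null | grep -iE 'ata[0-9]+.*(error|failed|reset)' \
    || echo "no ata errors logged"
```

On the bad boots I see neither a partition-scan line for the missing
partition nor any ata errors, which is what makes me suspect udev
rather than the hardware.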

Personally I don't think the problem is with the drives themselves:
the drive table BIOS shows before booting _always_ lists all 5
drives. If I drop into BIOS proper and use its tools to look at the
drives I can _always_ read SMART data, and all drives respond to
DOS-based tools like SpinRite. It's only once I get into Linux that
they aren't found.

The problem hasn't changed much with different kernels from 2.6.32
through 2.6.34, nor do I see any difference between vanilla-sources
and gentoo-sources.

Currently I'm using udev-149 with devfs-compat and extra flags enabled.
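My next thought is to turn on udev's debug logging; if I'm reading
the udev-149 docs right, udev.conf still honors udev_log (treat that
as my guess):

```
# /etc/udev/udev.conf
udev_log="debug"
```

and then diff the syslog from a good boot against a bad one. Running
`udevadm monitor --kernel --udev` during coldplug should presumably
also show whether the add events for the missing partition ever
arrive at all.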

Where might I start looking for the root cause of a problem like this?

Let me know what other info would be helpful.

Thanks,
Mark
