Adesanya, Adeyemi wrote:
> I'm discussing the proposed architecture for two new Lustre 1.8.x
> filesystems. We plan to use a failover pair of MDS nodes (active-active),
> with each MDS serving an MDT. The MDTs will be housed in external storage
> but we would like to implement redundancy across more than one storage array
> by using software RAID1.
>
> The Lustre documentation mentions using linux md to set up software RAID1 or
> RAID10 for MDTs. Does the RAID1 implementation in the Lustre 1.8.x RHEL5
> kernel do an adequate job of ensuring consistency across mirrored devices
> (compared to a hardware RAID1 implementation)?
>
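[For reference, the setup being asked about, mirroring an MDT across two storage arrays with Linux md, would look roughly like the sketch below. All device names, the fsname, and the MGS NID are placeholders, not anything specified in the thread:]

```shell
# Hypothetical sketch: build a RAID1 MDT from one LUN on each storage array.
# /dev/sdb and /dev/sdc stand in for LUNs on two different arrays.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Format the mirror as an MDT (fsname and MGS NID are made up for illustration).
mkfs.lustre --mdt --fsname=testfs --mgsnode=mgs@tcp0 /dev/md0
```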
Adequate? Probably. As correct as hardware RAID? Doubtful. Without special
hardware, or doing things that kill performance, there will always remain
some corner cases.

The issue is what happens to writes that are in flight when you have a
crash/reboot/power loss: it is possible for them to make it to one disk but
not the other. So it is possible to believe they are on disk, and proceed
accordingly, when they are only on one copy, and they are lost if that disk
fails. Even worse, Linux alternates reads between the mirror halves, so in
theory a block could be there one time and gone the next.

The good news is that writes should(!) not be marked as "on disk" until
both disks have acknowledged them. So you could run an md "check", and if
needed a "repair", before doing anything that reads the device (e.g.,
replaying the journal, mounting the filesystem, running fsck). Even if the
md resync takes the older copy and undoes a write, it should not have been
a write that was expected to have made it to stable storage, so the normal
Lustre recovery mechanisms should be able to replay it. Assuming, that is,
that this is done _before_ you mount the device.

Kevin
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
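[The check-then-repair sequence Kevin describes can be sketched roughly as below, using the md sysfs interface. The array name (md0) and mount point (/mnt/mdt) are placeholders, and a DRY_RUN guard (on by default) is added so the sketch only prints what it would do rather than touching a real array:]

```shell
#!/bin/sh
# Hypothetical sketch: verify an md RAID1 mirror is consistent before
# mounting the MDT. md0 and /mnt/mdt are placeholders for your devices.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Kick off a consistency check of the mirror.
run sh -c 'echo check > /sys/block/md0/md/sync_action'

# 2. Wait for the check to finish, then read the mismatch count.
run sh -c 'while grep -q check /sys/block/md0/md/sync_action; do sleep 5; done'
run cat /sys/block/md0/md/mismatch_cnt

# 3. If mismatches were reported, resynchronise the mirror.
run sh -c 'echo repair > /sys/block/md0/md/sync_action'

# 4. Only now mount the MDT, so Lustre recovery replays the journal
#    against a consistent device.
run mount -t lustre /dev/md0 /mnt/mdt
```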
