Hi all, I'm having trouble working out how to get mdadm to restart a RAID array after a disk failure. It refuses to stop the array, saying it's in use, and it refuses to let me start the array again, saying the devices are already part of another array:

  $ mdadm --manage /dev/md10 --stop
  mdadm: Cannot get exclusive access to /dev/md10:Perhaps a running process, mounted filesystem or active volume group?
  $ mdadm --manage /dev/md10 --fail
  $ mdadm --manage /dev/md10 --stop
  mdadm: Cannot get exclusive access to /dev/md10:Perhaps a running process, mounted filesystem or active volume group?
  $ cat /proc/mdstat
  Personalities : [raid0]
  md10 : active raid0 sde1[0] sdd1[1]
        5860268032 blocks super 1.2 512k chunks

Why is it still telling me the array is active after I have tried to mark it failed? If I try to name one of the devices that make up the array explicitly, that doesn't work either:

  $ mdadm --manage /dev/md10 --fail /dev/sdd1
  mdadm: Cannot find /dev/sdd1: No such file or directory

This is because /dev/sdd doesn't exist anymore: it's an external drive, and when I replugged it, it came back as /dev/sdf. The manpage says you can use the special word "detached" for this situation, but that doesn't work either:

  $ mdadm --manage /dev/md10 --fail detached
  mdadm: set device faulty failed for 8:65: Device or resource busy

8:65 corresponds to /dev/sde1, so it appears to pick the right device, but why is it busy? Isn't the point of --fail to simulate a drive failure, which could occur at any time, even while a drive is busy?

The two disks (sdd and sde) reappeared as sdf and sdg after replugging, so I thought I could just assemble the array under a new name and ignore the old failed one:

  $ mdadm --assemble /dev/md11 /dev/sdf1 /dev/sdg1
  mdadm: Found some drive for an array that is already active: /dev/md/10
  mdadm: giving up.

I'm not sure how it considers the drive part of an active array when it's a different device node. I guess it's matching serial numbers or something similar, which gives the wrong answer in this case. That said, it wouldn't be a problem if there were some way to remove the old array that is refusing to die!

Is there any way to solve this problem, or do you just have to reboot a machine after a disk failure?

Thanks,
Adam.
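
P.S. For anyone following along, here is a minimal sketch of how the 8:65 in the error message maps to /dev/sde1. It assumes the conventional static numbering for sd disks (major 8, 16 minors per disk), which may not hold on every setup. The function name sd_name is made up for illustration and is not part of mdadm.

```shell
#!/bin/sh
# Decode a "major:minor" pair into an sd device name.
# Assumption: major 8 only, with 16 minors per disk, so
# minor / 16 selects the disk letter and minor % 16 the partition.
sd_name() {
    major=${1%%:*}
    minor=${1##*:}
    if [ "$major" -ne 8 ]; then
        echo "unsupported major $major" >&2
        return 1
    fi
    disk=$((minor / 16))
    part=$((minor % 16))
    # Major 8 covers minors 0-255, i.e. disks sda through sdp.
    letters=abcdefghijklmnop
    letter=$(printf '%s' "$letters" | cut -c $((disk + 1)))
    if [ "$part" -eq 0 ]; then
        echo "/dev/sd$letter"
    else
        echo "/dev/sd$letter$part"
    fi
}

sd_name 8:65
```

On a live system the same pair can usually be checked against sysfs with "ls -l /sys/dev/block/8:65", as long as the kernel still has the device registered.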