WARNING: it is late at night here and I am somewhat sleepy, so make damn
sure what is written in this email actually makes sense before you
attempt it.

Debian-boot removed from CC since this reply is OT there.

On Fri, 26 Mar 2004, Antony Gelberg wrote:
> I had a nice Woody system up and running on hda. I created /dev/md0
> with hdc, and a missing drive. I did a cp -ax to copy everything on hda
> to md0. So far so good.

It's best to use mdadm and drop raidtools. Using mdadm [without a config
file] is much less error-prone than raidtools can ever hope to be.

I have done what you attempted at least five times now, and every time
I have to use a checklist and be extremely careful to avoid trouble. As
others said, "missing disk installs are a sure way to lose data".

> All I needed was to get the boot loader sorted. I put boot=/dev/md0 and
> root=/dev/md0 in lilo.conf, and changed fstab to mount / on md0. Lilo

Lilo promptly corrupted some of your data.

The thing to understand about lilo is that whoever wrote the initial
RAID1 support for lilo was either a moron, or a very sick person. The
same goes for whoever had the "brilliant" idea to suggest that people
use "boot=/dev/hda1" in the howtos, I should add.

New lilo can be told to act in a sane way, but I don't think it is the
default. If you tell lilo that "boot=/dev/md0", you MUST give it
"raid-extra-boot=mbr-only" too.

/etc/lilo.conf:
    boot=/dev/md0
    raid-extra-boot=mbr-only

Otherwise, the dumb PoS will overwrite the first sector(s?) of whatever
is in your raid array, and that can be quite fatal to whatever is in
there. Indeed, something quite stupid to have as a default for anything
IMHO. Make sure you are using a new enough lilo to get the mbr-only
option.

> came back with some errors. Unfortunately I don't have them to hand,
> but it was something like the boot map not being on the root device
> (this is vague, sorry).

You made a snafu of the bootloader-and-kernel side of things too. Do
this to repair:

1. Get a boot disk that supports RAID1, your SATA drives, and whatever
   filesystem you used. A good bet is a knoppix live CD.

2. Boot from it.
   Verify that you can see your SATA disks.

3. Manually start the RAID.

4. Repair whatever is the first thing in your RAID array. For
   filesystems, that means *_repair or fsck. For an lvm1 or lvm2 PV, I
   have no idea.

5. Mount /dev/md0 somewhere. Go in there, fix etc/lilo.conf with the
   mbr-only option, run "chroot .", then run lilo.

If the RAID array already has the two disks in it, umount everything,
shut down the raid, reboot and you're done. If it does not, go to step 6
of the procedure further below.

> (I have another box booting off RAID-1, and it doesn't need the
> raid-extra-boot line.)

It never does. You can tell lilo to install the MBR in /dev/hda, and
that will work just fine. However, you won't have a copy of the boot
loader in /dev/hdb, so should /dev/hda fail, you will not survive a
reboot.

The problem was caused because you did not correctly move the root fs
and kernel to the RAID array. The process goes more or less like this:

1. Create the RAID with the missing disk. Prepare the filesystems in the
   RAID array. Double-check everything.

2. Go single user. Copy over all filesystems to the new ones in the RAID
   array. Fix etc/fstab and etc/lilo.conf (to avoid mistakes later) in
   the new root filesystem.

3. Change only the "root" option of lilo (or give the correct one during
   the reboot) in the current filesystem, and run lilo to update that.
   sync, umount everything, reboot.

   Now you have your system running entirely from the RAID array *BUT
   THE KERNEL AND BOOT LOADERS ARE STILL LOADED FROM THE OLD DISK*. If
   you get an unexpected reboot from this point on, you will need a
   bootdisk to get the system up again. You've been warned.

   MAKE VERY VERY SURE that you did boot with the root partition set to
   the RAID array. Otherwise, step 6 will block and you WILL NEED TO
   REBOOT WITH A BOOT DISK/CD TO RECOVER.

4. Verify that /etc/lilo.conf is sane (root points to the fs in the raid
   array, boot=/dev/md0 is there, and raid-extra-boot=mbr-only is also
   there). DO NOT RUN LILO YET.

5. Make very damn sure nothing has anything on the old disk open. One
   safe way to check that is to run good old fdisk on the disk, print
   the partition table, change nothing, and tell it to write to the
   disk. IF it complains that the kernel did not re-read the partition
   table, go find what you did wrong before you kill your system.
   Rebooting right now should still work.

6. Re-partition the old disk. From now on, reboots are impossible. If
   the kernel doesn't update the partition table, you are screwed: get
   your rescue CD, reboot, and do everything else from the CD.

   Add the newly partitioned disk to the RAID array.

   Make sure all RAID partitions on both disks are of type 0xFD, so that
   the kernel can autorun the raid array. It doesn't matter if you have
   to update the partition table of the currently active second disk:
   you are just changing partition types, and for that it doesn't matter
   whether the kernel reloads the partition table or not.

7. Now the raid is syncing. Run lilo. It will write all the crap to the
   proper places. Run "sync". Chances are that a forced reboot now will
   not hose your system anymore.

8. Wait until the RAID sync finishes. You're done.

You have to test the hard way to know for sure if your system will
reboot correctly from the second disk if the first one dies. With new
lilo, in any recent system, it should. On older ones, only God knows
what the BIOS will do.

If you are going to do that test, run
"mdadm --manage /dev/md0 --fail /dev/hda" before you unplug the disk to
simulate a boot with a broken first disk. That way, you know you will
not lose any data. If the system boots with the first disk unplugged,
shut it down cleanly, replug the disk, boot, and remove the "failed"
drive from the RAID set and re-add it to get it back online. If the
system doesn't boot, plug the drive back and reboot.
It will boot (lilo doesn't care that md will think that drive is hosed),
and after the boot is complete, do the remove-and-readd dance to get the
disk back online in the RAID array.

One last warning: I never do something like this without a bootdisk/CD
handy, and console access. If anything goes wrong in the kernel move,
you will need it. Out of the 5 times I did this, I had to use the rescue
CD 3 times to get the kernel and lilo onto the first disk. It is tricky
to get it right.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
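
PS: the lilo.conf sanity check from step 4 of the procedure is easy to
get wrong by eye, so here is a rough sketch of it as a script. Nothing
here is part of lilo or mdadm: the function names (parse_lilo_globals,
check_lilo_conf) and the SAMPLE text are made up for illustration. It
assumes the array is /dev/md0 as in the example above, and it only looks
at the global key=value lines before the first image= or other= section,
so a root= set per image is not seen.

```python
def parse_lilo_globals(text):
    """Collect global key=value options from lilo.conf text.

    Hypothetical helper: stops at the first image=/other= section and
    ignores anything after a '#' on a line.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in ("image", "other"):
            break  # per-image sections start here; globals are done
        options.setdefault(key, value.strip().strip('"'))
    return options


def check_lilo_conf(text, array="/dev/md0"):
    """Return a list of problems; an empty list means the checks passed."""
    opts = parse_lilo_globals(text)
    problems = []
    if opts.get("boot") != array:
        problems.append(f"boot is not {array}")
    if opts.get("root") != array:
        problems.append(f"root is not {array}")
    if opts.get("raid-extra-boot") != "mbr-only":
        problems.append("raid-extra-boot=mbr-only is missing")
    return problems


# Made-up sample: boot and root point at the array, mbr-only is absent.
SAMPLE = """\
boot=/dev/md0
root=/dev/md0
image=/vmlinuz
    label=Linux
"""

if __name__ == "__main__":
    for problem in check_lilo_conf(SAMPLE):
        print("lilo.conf:", problem)
```

Feed it the contents of the lilo.conf in the new root filesystem before
you run lilo in step 7. An empty list only means those three settings
are present; it says nothing about whether your lilo is new enough to
understand mbr-only.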