On 3/7/08, John Fleming <[EMAIL PROTECTED]> wrote:
> On 3/7/08, Douglas A. Tutty <[EMAIL PROTECTED]> wrote:
> > On Fri, Mar 07, 2008 at 08:29:08AM -0500, John Fleming wrote:
> > > Background - I had a well-established LAMP server that was giving some
> > > filesystem errors on boot, with the "hit control-D to continue or give
> > > root password to fix manually" message.  It would go ahead and work
> > > normally if I hit Control-D.
> >
> > You should have fixed it manually.  Control-D is in case the person
> > sitting there doesn't have the root password and is generic to Debian's
> > single-user mode.
> >
> > > However, I wanted to try to get rid of the error and need for human
> > > intervention in the event of the need for a remote reboot, so I tried
> > > to fix the errors with fsck.  Somehow I ended up with a badly trashed
> > > filesystem and inability to reboot.
> > >
> > > After much gnashing of teeth and consideration of my options, I
> > > installed a new etch system.  I installed several benign packages like
> > > apache.  I ran update and dist-upgrade to bring the system up to date.
> > > When I ran the upgrade, it told me that it was trying to install an
> > > identical kernel image.
> >
> > Well, it was a new kernel image with the same version code so that the
> > new modules would be going in the same directory as the old modules.
> > The new modules won't work with the old kernel so that if you do
> > anything to trigger a module load, bad things can happen, which is why
> > you reboot as soon as the upgrade is complete.
> >
> > > It explained some things about what it was doing about modules, and
> > > then said to be sure to reboot.  I did that, but then it gave me that
> > > now-familiar message about how the filesystem has errors, hit
> > > control-D to continue...
> > >
> > > I booted with Knoppix, made sure my filesystem on /dev/hda1 was NOT
> > > mounted, and ran fsck -f.  It did the 5 passes without mention of
> > > errors.  I ran it a second time with same results.  However, when I
> > > boot from /dev/hda1, I still get the error about a filesystem with
> > > errors!
> > >
> > > Trying to rebuild the server as it was is painful enough - Why would I
> > > be having these filesystem errors?  The HDD is relatively new.
> > >
> > > Any other way to try to get rid of the boot error before I reinstall
> > > etch again?  I hate to do that because I don't understand how these
> > > errors originate, so I don't know why I shouldn't expect them to crop
> > > up again at some point later after another fresh install.
> > >
> > > Why does the fsck during boot find errors when the fsck run via
> > > knoppix on the same filesystem return clean?
> >
> > Don't know why.  Here's how I'd proceed:
> >
> > 1.  boot with the kernel command line: init=/bin/sh since Debian's
> >     single-user mode gives you most filesystems already mounted.
> >
> > 2.  run fsck (read the man page to give you the options appropriate
> >     to your root fs); run it on all your filesystems.
> >
> > 3.  shutdown -h and power-cycle.
> >
> > 4.  run aptitude update then upgrade anything required.
> >
> > 5.  reboot.  Watch the screen for any errors on shutdown that would
> >     suggest that the system isn't, e.g. remounting the / fs ro
> >     before halt/reboot.  If in doubt, set up a serial console and
> >     log the output or set up the console output to go to a printer.
> >
> > 6.  If you still have problems, boot knoppix (I use grml) and run
> >     fsck.  If this is ext2/3, I'd run -c -c so that the entire disk
> >     gets read to force the drive firmware to re-map any bad sectors.
> >     While this is running, I'd be watching /var/log/syslog for any
> >     errors from the drive.
> >
> > 7.  Ensure that you have smartmontools installed and run a long
> >     SMART test and when it's complete, check the results on the
> >     drive.
> >
> > ---
> >
> > If all else fails, plan for a reinstall (ensure that you have backups).
> > Then boot knoppix and run wipe on the drive.  This fully exercises the
> > drive to exorcise any gremlins.  Then install etch minimal (don't select
> > any tasks), ensure that aptitude is installed (if it isn't, then apt-get
> > install aptitude), get aptitude set up the way you like with only
> > necessary packages marked as manual, the rest as automatic, then do an
> > update and upgrade before you install any other packages.
> >
> > At each stage, do a shutdown -rF.
> >
> > Doug.
>
> Doug, thanks for the good ideas - I learned some things from your
> considered response.  I ended up finally reinstalling etch again.
> I've now captured the pertinent part of the boot messages and will
> copy below.  Why does the filesystem check clean once and then come up
> with errors the 2nd time?  You mentioned that I should fix it manually
> - Well, if I enter the root password at the prompt and try to run fsck
> manually, it warns me about the damage I might do to the MOUNTED
> filesystem.  I mentioned in my earlier post that if I boot into
> Knoppix and run fsck, it comes back CLEAN.  So I can't seem to repair
> it with Knoppix fsck, yet I get the error when I boot from my
> /dev/hda1 - the second time in the fsck sequence.  Can you shed any
> light on this?
>
> Here is the pertinent boot sequence:
>
> Checking root filesystem...fsck 1.40-WIP (14-Nov-2006)
> /dev/hda1: clean, 126072/19218432 files, 1420493/38409399 blocks
> done.
>
> Setting up system clock..
> Cleaning up ifupdown....
> Loading kernel modules...loop: loaded (max 8 devices)
> done.
>
> Loading device-mapper supportdevice-mapper: ioctl: 4.7.0-ioctl
> (2006-06-24) initialized: [EMAIL PROTECTED]
>
> Checking file systems...fsck 1.40-WIP (14-Nov-2006)
> / contains a file system with errors, check forced.
> /:
> Inodes that were part of a corrupted orphan linked list found.
> /: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> fsck died with exit status 4
>
> THANKS! - John
Sorry to answer my own post, but it's FIXED!  When it got to the
"enter root password to enter maintenance" prompt, I did that, and at
the shell prompt entered fsck.  It warned me about running e2fsck on a
mounted filesystem, and I entered "n" and saw "no" echoed.  However, it
then went ahead and ran anyway.  Will someone please explain that?  It
seemed to fix a million things (or at least a few hundred), but now it
is actually fixed!

Why does this work like this, and why didn't it work running fsck from
a live CD?

Thanks again! - John
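
A note for anyone reading this thread later: the "exit status 4" in the quoted boot log is not an arbitrary number. As far as I know from the fsck(8) man page, the exit status is a sum of bit flags, and 4 means "filesystem errors left uncorrected", which matches the "RUN fsck MANUALLY" message. Below is a minimal sketch in Python of decoding such a status. The names FSCK_EXIT_BITS and decode_fsck_status are made up for illustration and are not part of any fsck tooling.

```python
# Bit values as listed in the EXIT STATUS section of the fsck(8) man page.
# The table and function names are hypothetical, chosen for this example.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    # The status is a bitwise OR of the flags above, so test each bit.
    return [meaning for bit, meaning in FSCK_EXIT_BITS.items() if status & bit]


if __name__ == "__main__":
    # The status reported in the boot log from this thread.
    for meaning in decode_fsck_status(4):
        print(meaning)
```

A status of 5, for example, would decode to both "errors corrected" and "errors left uncorrected", since it is 1 + 4.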