For the last few days I've been trying out a script (attached) that
does something similar to the idea in my previous comment.  It's a shell
script that can be put in cron.daily and/or called from an @reboot cron
job.  The script checks each of your LVM-based filesystems in turn, and
won't start a new check once it has been running for more than 10 minutes.

The short version of the story is that fsck'ing a snapshot of a live
filesystem is possible, but we might want to get at least a little input
from LVM or FS developers first.

The main problem with this script is that it trips over temporary
files.  It's common for programs (via mkstemp(), I think) to create a
temporary file, open it, then delete it.  The inode that was previously
associated with the file continues to exist so long as a file descriptor
to it remains open.  When a snapshot of the filesystem is created while
such a file is open, the snapshot still contains those inodes, and
nothing ever removes them there, so they become orphans.  fsck notices
this minor problem in the snapshot and flags the filesystem as needing
to be checked.
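
The open-then-delete pattern described above can be seen from a shell.
This is only a minimal sketch: it uses mktemp rather than mkstemp(), and
the descriptor numbers (3 and 4) and variable names are arbitrary choices
made for illustration.

```sh
#!/bin/sh
# Sketch: a file that has been deleted but is still held open.
tmp=$(mktemp)           # create a temporary file
exec 3>"$tmp"           # fd 3: open it for writing
exec 4<"$tmp"           # fd 4: open it for reading (separate file offset)
rm -f "$tmp"            # remove the name; the inode survives while fds are open

echo "still here" >&3   # writing through the open descriptor still works
read -r line <&4        # and the data can be read back through fd 4
echo "$line"

exec 3>&- 4<&-          # closing the last descriptors lets the inode be freed
```

A snapshot taken between the rm and the final exec would contain an
inode with no directory entry, which is presumably the situation fsck is
reporting.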

Steps to reproduce this problem:

$ sudo /etc/init.d/mysql start # creates temporary files on my system
$ sudo lvcreate -L1024M -s /dev/your-volgroup/your-root-device
$ sudo fsck -v -n -f /dev/your-volgroup/lvol0
$ sudo lvremove /dev/your-volgroup/lvol0

fsck should complain about orphaned files.  I get this:

$ sudo fsck -v -n -f /dev/nautilus/lvol0
fsck 1.40.8 (13-Mar-2008)
e2fsck 1.40.8 (13-Mar-2008)
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 180229 has zero dtime.  Fix? no

Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 180230 was part of the orphaned inode list.  IGNORED.
Inode 180231 was part of the orphaned inode list.  IGNORED.
Inode 180232 was part of the orphaned inode list.  IGNORED.
Inode 180233 was part of the orphaned inode list.  IGNORED.
Inode 180251 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Inode bitmap differences:  -(180229--180233) -180251
Fix? no


root: ********** WARNING: Filesystem still has errors **********


   23381 inodes used (8.92%)
     518 non-contiguous inodes (2.2%)
         # of inodes with ind/dind/tind blocks: 2563/15/0
  211424 blocks used (40.33%)
       0 bad blocks
       1 large file

   13390 regular files
    2902 directories
    1258 character device files
    4553 block device files
       1 fifo
      16 links
    1216 symbolic links (1137 fast symbolic links)
      46 sockets
--------
   23382 files


To my untrained eye, it looks like this could be argued to be a bug in
ext2 or LVM (because they're not deleting inodes properly), or a bug in
fsck (because it doesn't have an "errors remain, but who cares?" return
code).  Alternatively, it could be argued that the fsck script I've
written should parse the output of fsck and decide which filesystem
errors are really important.
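
On the return-code point: according to the fsck(8) manual page, fsck's
exit status is a bit mask, so a wrapper can at least tell "errors left
uncorrected" (4) apart from an operational error (8).  A rough sketch of
decoding it follows.  The function name describe_fsck_status is made up
for illustration, and this does not solve the harder problem of deciding
which uncorrected errors are harmless.

```sh
#!/bin/sh
# Sketch: decode fsck's exit status, which fsck(8) documents as a bit mask.
# describe_fsck_status is a hypothetical helper, not part of the attached script.
describe_fsck_status() {
        status=$1
        if [ "$status" -eq 0 ]; then
                echo "no errors"
                return 0
        fi
        result=""
        # Each entry is "bit:meaning", as listed in the fsck(8) manual page
        for pair in \
                "1:errors corrected" \
                "2:system should be rebooted" \
                "4:errors left uncorrected" \
                "8:operational error" \
                "16:usage or syntax error" \
                "32:cancelled by user request" \
                "128:shared-library error"
        do
                bit=${pair%%:*}
                meaning=${pair#*:}
                if [ $((status & bit)) -ne 0 ]; then
                        # Join the meanings of all set bits with ", "
                        result="${result:+$result, }$meaning"
                fi
        done
        echo "$result"
}

describe_fsck_status 4
describe_fsck_status 12
```

With "fsck -n" on the snapshot above, I would expect bit 4 to be set,
since the errors are found but deliberately not fixed.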

I've gone as far as I can go with this idea - if someone with more of a
clue is interested, could you suggest the best way of solving this issue?

        - Andrew
#!/bin/sh
# Check filesystems without rebooting, using LVM
# Andrew Sayers, 14 August 2008
# [EMAIL PROTECTED]
#
# This script aims to be FS-agnostic, although it currently calls "tune2fs" in
# two places, to reset the mount-count and check-time.

# What to tell the user if an error occurs
TITLE="Filesystem problem detected"
MESSAGE="Your hard disk has a problem,
Please reboot your system to fix it"


check_filesystem() {
        # (I think) LVM escapes dashes in volume names by doubling them (--)
        # The following gets the volume group, even if it has --s in it
        export VOLDEV="$1"
        export VOLGROUP=$(echo "$VOLDEV" | sed -e 's/^\(\(\(\([^-]*\)--\)*\)[^-]*\)-\([^-].*\)/\1/' -e 's/--/-/g')
        export VOLUME=$(  echo "$VOLDEV" | sed -e 's/^\(\(\(\([^-]*\)--\)*\)[^-]*\)-\([^-].*\)/\5/' -e 's/--/-/g')

        export BACKUP=$(lvcreate -L1024M -s "/dev/$VOLGROUP/$VOLUME" | cut -d\" -f2)
        if ERRORS=$(fsck -v -n -f "/dev/$VOLGROUP/$BACKUP" 2>&1)
        then
                tune2fs -T now -C 0 "/dev/mapper/$VOLDEV" >/dev/null
                lvremove -f "/dev/$VOLGROUP/$BACKUP" >/dev/null

                # Note: in the success case, success isn't reported until after tune2fs has completed
                # (in case tune2fs fails)
                touch "/var/cache/fsck/$VOLDEV"
                logger -p cron.info "snapshot fsck of \"/dev/$VOLGROUP/$VOLUME\" reported a healthy filesystem"
        else
                RETURN_VALUE=$?

                # TODO: check whether $BACKUP has gone away (due to too much FS activity), and handle that somehow
                # TODO: write a co-operating GUI app to handle messages something like:
                # notify-send -u critical -t 6000 --category=device.error "$TITLE" "$MESSAGE"
                # TODO: automatically remove $BACKUP after reboot

                # Note: in the failure case, failure must be reported before tune2fs has completed
                # (in case tune2fs fails)
                cat <<END | mail -s "$TITLE" root
$MESSAGE.

Once your system has been recovered, please do:

        lvremove -f "/dev/$VOLGROUP/$BACKUP"

fsck returned return value $RETURN_VALUE while scanning /dev/$VOLGROUP/$VOLUME
The following errors were reported:

$ERRORS
END

                logger -p cron.alert "snapshot fsck of \"/dev/$VOLGROUP/$VOLUME\" reported a damaged filesystem - reboot to fix it"

                # Force an fsck on the next reboot (for vaguely sane systems)
                # 16000 is the highest allowed value for -C
                tune2fs -C 16000 "/dev/mapper/$VOLDEV" >/dev/null
        fi
}




# Create the directory that will remember which devices were most-recently checked
if ! [ -d /var/cache/fsck ]
then
        mkdir /var/cache/fsck
fi

# Add files for any block devices that have been created since the last time the program ran
cd /dev/mapper
for fs in *
do

        # checkable files must have exactly one '-' that isn't part of a doubled '--'
        # They must also not be swap partitions
        # Files with no dashes, or more than one dash, are internal LVM files
        # Files with "-cow" counterparts are copy-on-write snapshots
        if   echo "$fs" | grep -q -- -                   && \
           ! echo "$fs" | grep -q '[^-]-[^-].*[^-]-[^-]' && \
           ! swapon -s  | grep -q "^/dev/mapper/$fs" && \
           ! [ -e "$fs-cow" ]
        then
                # Newly created filesystems must have been created since the last run of this script
                if [ ! -e "/var/cache/fsck/$fs" ]
                then
                        touch -d "$(date -d '-1 day')" "/var/cache/fsck/$fs"
                fi
        fi

done
cd - >/dev/null

# Delete files for any block devices that have been deleted since the last time the program ran
cd /var/cache/fsck
for fs in *
do
        if ! [ -e "/dev/mapper/$fs" ]
        then
                rm -f "$fs"
        fi
done
cd - >/dev/null

# Find the least-recently-fsck'd filesystem.
# Use the directory itself as the default
#
# A more intelligent solution might be to see which FS is nearest to its
# max-mount-count/interval-time, but that would be hard and FS-specific
#check_filesystem "$(ls -r -t /var/cache/fsck/ | head -1)"

# fsck all files, in order of which was least-recently checked
# If this takes longer than 10 minutes, it finishes the current FS then quits
STOP_DATE=$(date -d "+10 minutes" +%s)
ls -r -t /var/cache/fsck/ | while [ $(date +%s) -lt $STOP_DATE ] && read "VOLUME" ; do check_filesystem "$VOLUME" ; done
-- 
Ubuntu-devel-discuss mailing list
Ubuntu-devel-discuss@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel-discuss
