Hello

While running a test that copies lots of small files (and some large
ones) around, I experienced filesystem corruption with 2.4.1-pre3.

The system is an ASUS P2B-DS (onboard Adaptec controller) with two P2-350
CPUs, 256MB (one module) of PC-100 222 SDRAM with ECC, and 4 SCSI disks and
one IDE disk put together as one big SW RAID5 array. It runs SuSE 6.4 with
the following:
    Linux cube 2.4.1-pre3 #3 SMP Sun Jan 14 14:19:02 CET 2001 i686 unknown
    Kernel modules    2.3.24
    Gnu C             2.95.2
    Gnu Make          3.78.1
    Binutils          2.9.5.0.24
    Linux C Library   x   1 root     root      4061504 Mar 11  2000 /lib/libc.so.6
    Dynamic linker    ldd (GNU libc) 2.1.3
    Procps            2.0.6
    Mount             2.10r
    Net-tools         1.54
    Kbd               0.99
    Sh-utils          2.0
    Modules Loaded

I know my modutils are not up to date, but everything relevant (SCSI,
filesystem, RAID) was compiled in.
Here are some messages from syslog:

    Jan 14 18:50:00 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (613512), 0
    Jan 14 18:56:19 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (613533), 0
    Jan 14 18:56:20 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (613510), 0
    Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
    Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1753106892, limit=8449536
    Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
    Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1635361196, limit=8449536
        .
        .
        .
    Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
    Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=127799040, limit=8449536
    Jan 14 18:57:14 cube kernel: attempt to access beyond end of device
    Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1004451972, limit=8449536
    Jan 14 19:09:05 cube -- MARK --
    Jan 14 19:29:05 cube -- MARK --
    Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145947), 0
    Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145948), 0
    Jan 14 19:32:55 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145949), 0
        .
        .
        .
    Jan 14 19:33:18 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145945), 0
    Jan 14 19:33:18 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (145946), 0
    Jan 14 19:49:06 cube -- MARK --
    Jan 14 19:53:36 cube kernel: __alloc_pages: 2-order allocation failed.
    Jan 14 19:53:39 cube last message repeated 8 times
    Jan 14 20:09:06 cube -- MARK --
    Jan 14 20:10:52 cube kernel: EXT2-fs error (device md(9,1)): ext2_readdir: bad entry in directory #929061: rec_len is smaller than minimal - offset=4056, inode=0, rec_len=0, name_len=0
    Jan 14 20:10:52 cube kernel: EXT2-fs error (device md(9,1)): empty_dir: bad entry in directory #929061: rec_len is smaller than minimal - offset=4056, inode=0, rec_len=0, name_len=0
    Jan 14 20:30:20 cube -- MARK --
    Jan 14 20:50:24 cube -- MARK --
    Jan 14 21:10:06 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1402395
    Jan 14 21:10:06 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1438368
    Jan 14 21:11:57 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1439021
    Jan 14 21:11:57 cube kernel: EXT2-fs error (device md(9,1)): ext2_free_blocks: bit already cleared for block 1435690
    Jan 14 21:27:01 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (698429), 0
        .
        .
        .
    Jan 14 21:27:03 cube kernel: EXT2-fs warning (device md(9,1)): ext2_unlink: Deleting nonexistent file (698429), 0
    Jan 14 21:30:02 cube nscd: 175: cannot stat() file `/etc/group': No such file or directory
    Jan 14 21:35:38 cube /usr/sbin/gpm[113]: oops() invoked from gpm.c(508)
    Jan 14 21:35:38 cube /usr/sbin/gpm[113]: get_shift_state: Inappropriate ioctl for device
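
To put the "attempt to access beyond end of device" entries in the log
above into perspective, here is a minimal Python sketch that pulls the
want/limit pairs out of such lines and shows how far past the limit each
request is. The function name parse_beyond_end and the regular expression
are my own illustration, not part of any kernel tool, and the sample data
is only the four lines quoted above.

```python
import re

# The four "want/limit" lines quoted in the syslog excerpt above.
LOG = """\
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1753106892, limit=8449536
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1635361196, limit=8449536
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=127799040, limit=8449536
Jan 14 18:57:14 cube kernel: 09:01: rw=1, want=1004451972, limit=8449536
"""

# Matches e.g. "09:01: rw=1, want=1753106892, limit=8449536".
# The device field is kept as the raw string printed by the kernel.
PATTERN = re.compile(
    r"([0-9a-f]+:[0-9a-f]+): rw=(\d+), want=(\d+), limit=(\d+)"
)


def parse_beyond_end(text):
    """Return one dict per 'want=..., limit=...' line found in text."""
    records = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            dev, rw, want, limit = match.groups()
            records.append(
                {"dev": dev, "rw": int(rw), "want": int(want), "limit": int(limit)}
            )
    return records


records = parse_beyond_end(LOG)
for r in records:
    factor = r["want"] // r["limit"]  # integer ratio, rounded down
    print(f'{r["dev"]}: want={r["want"]} is about {factor}x the limit {r["limit"]}')
```

Every one of the quoted requests is many times larger than the device
limit, so these are not small overruns near the end of the array.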

At this point I could still log into the system.
After killing all processes with SysRq+i, I noticed that something (I
assume the kernel) was eating my memory:

    ps aux

    USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
    root         1  0.0  0.0   344  200 ?        S    14:48   0:09 init
    root         2  0.0  0.0     0    0 ?        SW   14:48   0:00 [keventd]
    root         4  0.0  0.0     0    0 ?        SW   14:48   0:23 [kswapd]
    root         5  0.0  0.0     0    0 ?        SW   14:48   0:03 [kreclaimd]
    root         6  0.7  0.0     0    0 ?        SW   14:48   2:59 [bdflush]
    root         7  0.3  0.0     0    0 ?        SW   14:48   1:19 [kupdate]
    root         8  0.0  0.0     0    0 ?        SW<  14:48   0:00 [mdrecoveryd]
    root         9  2.2  0.0     0    0 ?        SW<  14:48   9:16 [raid5d]
    root        10  0.0  0.0     0    0 ?        SW<  14:48   0:00 [raid1d]
    root     11847  0.0  0.2  1160  524 tty6     S    21:38   0:00 /sbin/mingetty tty6
    root     11848  0.0  0.2  1160  528 tty5     S    21:38   0:00 /sbin/mingetty tty5
    root     11854  0.0  0.2  1160  524 tty3     S    21:38   0:00 /sbin/mingetty tty3
    root     11855  0.0  0.2  1160  524 tty4     S    21:38   0:00 /sbin/mingetty tty4
    root     11856  0.0  0.4  1804 1112 tty1     S    21:38   0:00 login -- root
    root     11857  0.0  0.2  1160  524 tty2     S    21:38   0:00 /sbin/mingetty tty2
    root     11858  0.2  0.5  2164 1316 tty1     S    21:39   0:00 -bash
    root     11867  0.0  0.4  2760 1156 tty1     R    21:39   0:00 ps aux

    free

                 total       used       free     shared    buffers     cached
    Mem:        255284     226000      29284          0      17828     117788
    -/+ buffers/cache:      90384     164900
    Swap:       267228        540     266688
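
For anyone reading the free output above: the "-/+ buffers/cache" line is
derived from the "Mem:" line by counting buffers and page cache as free
rather than used. A small Python sketch with the numbers from the output
(kilobytes, which is the default unit of free); the variable names are
only for illustration:

```python
# Values copied from the "Mem:" line of the free output above (in KB).
total, used, free = 255284, 226000, 29284
buffers, cached = 17828, 117788

# "-/+ buffers/cache" line: move buffers and page cache from used to free.
used_without_cache = used - buffers - cached
free_with_cache = free + buffers + cached

print("used minus buffers/cache:", used_without_cache)
print("free plus buffers/cache: ", free_with_cache)
```

That leaves roughly 88 MB in use that is neither buffers nor page cache,
with only the handful of processes shown in the ps listing running.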

One ps invocation even dumped core, and I still have the core file. I
don't know if this is of any help:

    afdbench@cube:~$ gdb /bin/ps core.ps
    GNU gdb 4.18
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "i386-suse-linux"...(no debugging symbols found)...
    Core was generated by `ps x'.
    Program terminated with signal 11, Segmentation fault.
    Reading symbols from /lib/libc.so.6...done.
    Reading symbols from /lib/ld-linux.so.2...done.
    #0  0x80525ac in strcpy () at ../sysdeps/generic/strcpy.c:43
    43      ../sysdeps/generic/strcpy.c: No such file or directory.
    (gdb) where
    #0  0x80525ac in strcpy () at ../sysdeps/generic/strcpy.c:43
    #1  0x81490e0 in __ctype_b ()
    #2  0x8052f28 in strcpy () at ../sysdeps/generic/strcpy.c:43
    #3  0x8050472 in strcpy () at ../sysdeps/generic/strcpy.c:43
    #4  0x80509a5 in strcpy () at ../sysdeps/generic/strcpy.c:43
    #5  0x40034a5e in __libc_start_main () at ../sysdeps/generic/libc-start.c:93

I have had this system for several years and it has always been very
stable under 2.2.x. In fact this is my first filesystem corruption.

If I forgot anything or more information is required, please tell me.

Holger
