On Mon, Oct 23, 2000 at 02:20:17PM -0700, H. Peter Anvin wrote:
> Hi there,
> 
> I wanted to let you know that I was trying 2.2.18-pre17 on
> hera.kernel.org, a uniprocessor with an SMP motherboard.  After about six
> hours, it went catatonic, responding to pings and TCP SYNs but not doing
> anything that required user space.
> 
> On the console, it had multiple copies of the message:
> 
> "Kernel panic: LRU list corrupted"    [fs/buffer.c:438]
> 
> ... but no register dump.
> 
> I have fallen back to 2.2.17 and it has run stably for a few days now.

I found one bug that can cause that kind of corruption and lockup, and it is
in 2.2.17 too. (It was also in the 2.2.18pre*aa kernels, although some VM
changes I made there made it extremely hard to reproduce.)

I fixed it in 2.2.18pre17aa1 (I suggest giving 2.2.18pre17aa1 a try, btw).

I also included the fix in a new VM-global patch against vanilla 2.2.18pre17.
(The VM-global patch is also available as a single patch inside the
2.2.18pre17aa1/ directory, but I have to maintain a separate version of it
against clean 2.2.18pre17 due to silly rejects that I can't avoid.)

        
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre17/VM-global-2.2.18pre17-7.bz2

(The way I could reproduce the hang with 2.2.18pre17aa1 was by testing
LVM snapshotting, because while an LV is under snapshot [as also while using
raid5] WRITEA will block too.)

Vanilla 2.2.18pre17 can reproduce this bug an order of magnitude more easily,
since it blocks there all the time; I had to partly change that blocking
behaviour in my tree for performance reasons. That's why people reported that
the VM-global patch "cured" the problem. But really it had a small window for
that bug too.

So now I have ported the strict fix to clean 2.2.18pre17.  It's untested, but
I'm almost sure it will fix the problem there too.

--- 2.2.18pre17/fs/buffer.c.~1~ Tue Sep  5 02:28:47 2000
+++ 2.2.18pre17/fs/buffer.c     Wed Oct 25 04:38:34 2000
@@ -1468,10 +1468,13 @@
 #define BUFFER_BUSY_BITS       ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
 #define buffer_busy(bh)                ((bh)->b_count || ((bh)->b_state & BUFFER_BUSY_BITS))
 
-static int sync_page_buffers(struct buffer_head *bh, int wait)
+static int sync_page_buffers(struct page * page, int wait)
 {
+       struct buffer_head * bh = page->buffers;
        struct buffer_head * tmp = bh;
 
+       page->buffers = NULL;
+
        do {
                struct buffer_head *p = tmp;
                tmp = tmp->b_this_page;
@@ -1482,6 +1485,8 @@
                        ll_rw_block(WRITE, 1, &p);
        } while (tmp != bh);
 
+       page->buffers = bh;
+
        do {
                struct buffer_head *p = tmp;
                tmp = tmp->b_this_page;
@@ -1533,7 +1538,7 @@
  busy:
        too_many = (nr_buffers * bdf_prm.b_un.nfract/100);
 
-       if (!sync_page_buffers(bh, wait)) {
+       if (!sync_page_buffers(page_map, wait)) {
 
                /* If a high percentage of the buffers are dirty, 
                 * wake kflushd 


The above strict version of the fix is downloadable from here too:

        
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre17/strict-VM-corruption-fix-1
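
For readers who want the idea behind the patch in isolation, here is a small
userspace sketch. It is only an illustration under assumptions, not kernel
code: the cut-down structs, reclaim_page_buffers(), blocking_write() and the
"hide" flag are all hypothetical, and the assumed failure mode (another path
reclaiming the page's buffers while the writer sleeps in the submission loop)
is a reading of the patch rather than something spelled out above. The sketch
shows why clearing page->buffers around the loop keeps such a path away from
the ring that is being walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's real structures. */
struct buffer_head {
    struct buffer_head *b_this_page; /* circular ring of buffers on one page */
    int freed;                       /* set when another path reclaimed it */
};

struct page {
    struct buffer_head *buffers;     /* ring head, NULL when hidden or absent */
};

/* Link n buffers into a ring and attach the ring to the page. */
static void make_ring(struct page *page, struct buffer_head *bhs, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        bhs[i].b_this_page = &bhs[(i + 1) % n];
        bhs[i].freed = 0;
    }
    page->buffers = &bhs[0];
}

/* Models a second path that runs while the writer sleeps. It only touches
 * pages whose buffers pointer is visible (assumption for illustration). */
static int reclaim_page_buffers(struct page *page)
{
    struct buffer_head *bh = page->buffers, *tmp = bh;

    if (!bh)
        return 0;                    /* nothing visible, leave the page alone */
    do {
        tmp->freed = 1;              /* stands in for really freeing it */
        tmp = tmp->b_this_page;
    } while (tmp != bh);
    page->buffers = NULL;
    return 1;
}

/* Models a write submission that may sleep and let the other path run. */
static void blocking_write(struct page *page)
{
    reclaim_page_buffers(page);
}

/* Walk the ring, "submitting" each buffer. With hide != 0 the ring is
 * detached from the page for the duration of the walk, as in the patch.
 * Returns how many buffers were reclaimed under the walker (0 means safe). */
static int sync_page_buffers_sketch(struct page *page, int hide)
{
    struct buffer_head *bh = page->buffers, *tmp = bh;
    int damaged = 0;

    if (!bh)
        return 0;
    if (hide)
        page->buffers = NULL;        /* make the page look buffer-less */
    do {
        struct buffer_head *p = tmp;
        tmp = tmp->b_this_page;
        blocking_write(page);        /* may "sleep" here */
        if (p->freed)
            damaged++;
    } while (tmp != bh);
    if (hide)
        page->buffers = bh;          /* re-attach once the walk is done */
    return damaged;
}
```

Calling sync_page_buffers_sketch() on a three-buffer ring with hide set
leaves every buffer intact and page->buffers restored; with hide cleared,
every buffer is reclaimed while the walk is still in progress.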

Andrea