Re: [patch(?)] question wrt context switching during disk i/o

Mike Galbraith Sat, 21 Oct 2000 21:25:17 -0700
On Sat, 21 Oct 2000, Mark Hahn wrote:

> > } > bdflush is broken in current kernels.  I posted to linux-mm about this,
> > } > but Rik et al haven't shown any interest.  I normally see bursts of 
> > } > up to around 40K cs/second when doing writes; I hacked a little 
> > } > premption counter into the kernel and verified that they're practially
> > } > all bdflush...
> > } 
> > There's some strangness in bdflush(). The comment says:
> > 
> >                 /*
> >                  * If there are still a lot of dirty buffers around,
> >                  * skip the sleep and flush some more. Otherwise, we
> >                  * go to sleep waiting a wakeup.
> >                  */
> >                 if (!flushed || balance_dirty_state(NODEV) < 0) {
> >                         run_task_queue(&tq_disk);
> >                         schedule();
> >                 }
> 
> to me, that says: we haven't succeeded in flushing anything,
> and our balance looks OK, so unplug the disks and sleep.  this logic
> must have got accidentally inverted at some point.
> 
> I think we want to unplug if we've flushed and are low on memory:

To me, whether we suceeded in flushing something is meaningless.
balance_dirty_state() tells us everything we need to know.. that
we still have too many dirty buffers despite having tried to flush.
We should then unplug to accelerate io completion. I don't see why
bdflush() even calls page_launder().. that calls wakeup_bdflush()
when it hasn't been able to free enough.

Something else that looks strange to me is wakeup_bdflush(1) usage.
In those spots, we add a SCHED_YIELD and schedule() after returning
from wakeup_bdflush().. where we've already scheduled.  I moved the
SCHED_YIELD addition into the wakeup_bdflush() blocking portion and
killed the extra schedule, seemingly without doing harm.

>       if (flushed && balance_dirty_state(NODEV) >= 0)
> 
> > Which leads me to believe that the `<' should be either `==' or `<='. I
> > tried it with the `<=' and it doesn't seem to be so bad...Here's a patch
> > to see if it helps you?
> 
> definitely.  actually, I think there's some major code rot in the tuning
> logic of the VM.  specifically, free_shortage() and inactive_shortage()
> both return an int that tells you how many pages we're short.  negative
> returns mean we're not short.  but all calls to these functions treat the
> return as a boolean!  so for example, the following is wrong:
> 
>       if (can_get_io_locks && !launder_loop && free_shortage()) {
> 
> (vmscan.c:page_launder)  should be "free_shortage > 0".  there are 
> about a dozen other similar places, for which I'll shortly post a patch.

Looking forward to trying your patch.

        -Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch(?)] question wrt context switching during disk i/o

Reply via email to