On Mon, 11 Jun 2007, John Baldwin wrote:

On Friday 08 June 2007 10:20:19 pm Bruce Evans wrote:
On Fri, 8 Jun 2007, John Baldwin wrote:

On Friday 08 June 2007 10:50:15 am Bruce Evans wrote:
On Thu, 7 Jun 2007, John Baldwin wrote:

["this" is in vm_page_zero_idle() where the vm mutex is dropped]

I think this is as good a place to preempt as any.  In fact, the code
can be simplified by preempting here and not in the main loop.  The
...

Preemption happens when a thread is made runnable, i.e. usually when an
interrupt comes in, but also when releasing locks or doing
wakeup/cv_broadcast, etc.  The only thing the idle thread does other than
interrupts is release the lock.

I'm not sure what you mean in the last sentence.  This is a special idle
thread, and it doesn't do interrupts.
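
To make the trigger concrete: a sketch of the check that making a thread
runnable performs (names like maybe_preempt() are from the FreeBSD
scheduler, but this is a simplification, not the actual sched_add() code):

    #include <sys/param.h>
    #include <sys/proc.h>
    #include <sys/sched.h>

    /*
     * Sketch: preemption is driven by making a thread runnable.
     * Under PREEMPTION, sched_add() ends up doing roughly this after
     * enqueueing td (lower priority value = more important):
     */
    static void
    preempt_check_sketch(struct thread *td)
    {

	    if (td->td_priority < curthread->td_priority)
		    maybe_preempt(td);	/* may mi_switch() to td right here */
    }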

One side effect of using a critical section
here is that we are potentially adding interrupt latency, so this may be a
rather bad thing. :-/  Probably you are seeing the number of context switches
go down because we are "batching" up switches.  Say you get two interrupts
during an iteration of the main loop for this process; previously you'd get 4
context switches out of that: zeroidle -> first ithread -> zeroidle, and then
later zeroidle -> second ithread -> zeroidle.  Now you block those switches
until the critical exit, at which point both interrupt threads are pending,
so you do: zeroidle -> first ithread -> second ithread -> zeroidle, i.e. 3
context switches.  However, the cost is increased interrupt latency and that
may explain why the "real" time went up.  Lots of context switches is fairly
lame, but if the box is idle, then I'm less worried about wasting time in a
context switch (well, I'm concerned because it means we can't put the CPU to
sleep since we are executing something still, but extra overhead in an idle
thread doing a background task is not nearly as worrisome as overhead
in "important" work like ithreads).

I think I want to batch up switches a little in general, and only
a little batching occurs here.  pmap_zero_page_idle() takes about 1 us
on my main test system (Turion X2 with relatively slow DDR2 memory
which can nevertheless be zeroed at 4 or 5 GB/s).  An extra 1 us of
interrupt latency here and there won't make much difference.  It's
less than the extra latency for 1 ATPIC access if not using the APIC.
Also, for the makeworld benchmark, the interrupt handler runs for about
2% of the time, and pagezero runs for about 1% of the time.  These
threads just don't run long enough to have much contention.

I think the main cause of extra context switches that I saw in some
misconfigurations is switching while holding the vm mutex.  The thread
switched to is quite likely to block on this mutex and switch back.

As to why preemption doesn't work for SMP, a thread only knows to preempt if
it makes a higher priority thread runnable.  This happens in mtx_unlock when
we wake up a thread waiting on the lock, in wakeup(), or when an interrupt
thread is scheduled (the interrupted thread "sees" the ithread being
scheduled).  If another thread on another CPU makes a thread runnable, the
thread on the first CPU has no idea unless the second CPU explicitly sends a
message (i.e. IPI) to the first CPU asking it to yield instead.

I believe SCHED_ULE does the IPI.
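
Roughly, the sender side would look like this (a sketch only; pcpu_find(),
ipi_selected(), and IPI_PREEMPT exist, but this is not the actual SCHED_ULE
code):

    #include <sys/param.h>
    #include <sys/proc.h>
    #include <sys/pcpu.h>
    #include <sys/smp.h>

    /*
     * Sketch: after putting td on another CPU's run queue, check
     * whether td beats what that CPU is running now and, if so,
     * poke it with an IPI so it can preempt.
     */
    static void
    remote_preempt_sketch(struct thread *td, int cpu)
    {
	    struct pcpu *pc = pcpu_find(cpu);

	    if (td->td_priority < pc->pc_curthread->td_priority)
		    ipi_selected(1 << cpu, IPI_PREEMPT);
    }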

Specifically, suppose you have threads A, B, and C with priorities A < B < C.
Suppose A is running on CPU 0, and C is running on CPU 1.  If C does a wakeup
that awakens B, C isn't going to preempt to B because C is more important
(assume > means more important, even though priority values are opposite
that, which is annoying).  If A had awakened B it would have preempted
though, so in theory C should look at the other CPUs, notice that A < B, and
send an IPI to CPU 0 to ask it to preempt from A to B.  One thing is that
IPI_PREEMPT should set td_owepreempt if the target A is in a critical
section; I haven't checked to see if we do that currently.
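
Something like this on the receiving side would cover that case (again a
sketch; whether the real handler does this is exactly what's unverified
above):

    /*
     * Sketch of an IPI_PREEMPT handler: if the interrupted thread is
     * inside a critical section, only mark it as owing a preemption
     * so that critical_exit() does the switch later; otherwise
     * switch now.
     */
    static void
    ipi_preempt_sketch(void)
    {
	    struct thread *td = curthread;

	    if (td->td_critnest > 0)
		    td->td_owepreempt = 1;
	    else {
		    thread_lock(td);
		    mi_switch(SW_INVOL, NULL);
		    thread_unlock(td);
	    }
    }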

Would it be worth checking a preemption flag in mtx_unlock()?  This
would bloat the macro a bit.  However, we already have the check and the
bloat for spinlocks, at least on i386's, by checking in critical_exit()
via spinlock_exit().
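
For reference, the deferred check in critical_exit() is roughly this
(a simplified sketch, with the kernel's recursion-avoidance details elided):

    /*
     * Simplified sketch of critical_exit(): when the nesting count
     * drops to zero and a preemption was deferred while we were in
     * the critical section (td_owepreempt), do the switch now.
     */
    void
    critical_exit_sketch(void)
    {
	    struct thread *td = curthread;

	    if (td->td_critnest == 1 && td->td_owepreempt) {
		    td->td_critnest = 0;
		    thread_lock(td);
		    td->td_owepreempt = 0;
		    mi_switch(SW_INVOL, NULL);
		    thread_unlock(td);
	    } else
		    td->td_critnest--;
    }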

Using critical sections should have the side effect of getting the flag
checked in critical_exit().  This doesn't seem to work (for SCHED_4BSD),
so there seems to be a problem setting the flag.

Next I will try moving the PREEMPTION code to where the vm mutex is dropped
naturally.  I will try the following order:

I like this idea a lot.

Actually, I would keep the sched_runnable(), etc. as #ifndef PREEMPTION; the
critical_exit() already does that check (modulo concerns above).  Also, I

Testing shows critical_exit() doesn't seem to be doing the preemption.  On
UP, depending on PREEMPTION makes little difference, but on 2-way SMP with
no voluntary yielding, pagezero is too active.  The critical sections don't
seem to be doing much.

originally wanted to not hold the critical section while zeroing the page so as
not to impede interrupts during that operation.  I was actually trying to just
avoid preempting while holding the lock.  However, given my comments about
how this harms interrupt latency, maybe this is a bad idea and we should just
let priority propagation handle that for us.  Moving the context switch is
probably a good idea though.

The 1 us extra latency on my main test machine wouldn't matter, but go back
to a 486 with 10 MB/s memory and the latency would be a problem -- the 1 us
becomes 400 us, which is a lot even for a 486.
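(That's just the page size over the zeroing bandwidth: assuming 4 KB pages,
4096 bytes / 4 GB/s is about 1 us, while 4096 bytes / 10 MB/s is about
400 us.)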

The reason I wanted to avoid preempting while holding the lock is you can get
this case:

zeroidle -> some ithread -> some top-half thread in kernel which needs the
vm page queue mtx -> zeroidle (with top-half thread's priority; until
mtx_unlock) -> top-half thread in kernel -> zeroidle

which can be many context switches.  By not switching while holding the lock,
one can reduce this to:

zeroidle -> some ithread -> some top-half thread -> zeroidle

but at the cost of adding latency to "some ithread" and "some top-half thread".

Maybe preemption should be inhibited a bit when any mutex is held.
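
E.g. something like this (purely hypothetical -- td_lockcnt is an invented
per-thread count of held mutexes that mtx_lock()/mtx_unlock() would have to
maintain; it is not an existing field):

    /*
     * Hypothetical sketch: decline immediate preemption while the
     * current thread holds any blocking mutex, and just flag a
     * reschedule for later instead.
     */
    static int
    maybe_preempt_inhibit_sketch(struct thread *td)
    {

	    if (td->td_priority >= curthread->td_priority)
		    return (0);			/* td is not more important */
	    if (curthread->td_lockcnt > 0) {	/* invented field */
		    curthread->td_flags |= TDF_NEEDRESCHED;
		    return (0);			/* defer the switch */
	    }
	    thread_lock(curthread);
	    mi_switch(SW_INVOL, NULL);
	    thread_unlock(curthread);
	    return (1);
    }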

Here is my current version.  I got tired of using a dynamic enable for
the PREEMPTION ifdef code and removed all the conditionals after the
most recent test showed that the voluntary switch is still needed.

% Index: vm_zeroidle.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/vm/vm_zeroidle.c,v
% retrieving revision 1.47
% diff -u -2 -r1.47 vm_zeroidle.c
% --- vm_zeroidle.c     5 Jun 2007 00:00:57 -0000       1.47
% +++ vm_zeroidle.c     15 Jun 2007 19:30:13 -0000
% @@ -111,5 +111,13 @@
%               mtx_unlock(&vm_page_queue_free_mtx);
%               pmap_zero_page_idle(m);
% +             if (sched_runnable()) {
% +                     thread_lock(curthread);
% +                     critical_exit();
% +                     mi_switch(SW_VOL, NULL);
% +                     thread_unlock(curthread);
% +             } else
% +                     critical_exit();
%               mtx_lock(&vm_page_queue_free_mtx);
% +             critical_enter();
%               m->flags |= PG_ZERO;
%               vm_pageq_enqueue(PQ_FREE + m->pc, m);
% @@ -141,18 +149,14 @@
%
%               mtx_lock(&vm_page_queue_free_mtx);
% +     critical_enter();
%       for (;;) {
%               if (vm_page_zero_check()) {
%                       vm_page_zero_idle();
% -#ifndef PREEMPTION
% -                     if (sched_runnable()) {
% -                             thread_lock(curthread);
% -                             mi_switch(SW_VOL, NULL);
% -                             thread_unlock(curthread);
% -                     }
% -#endif
%               } else {
%                       wakeup_needed = TRUE;
% +                     critical_exit();
%                       msleep(&zero_state, &vm_page_queue_free_mtx, 0,
%                           "pgzero", hz * 300);
% +                     critical_enter();
%               }
%       }

Bruce