Re: ule+smp: small optimization for turnstile priority lending

2012-09-20 Thread David Xu

On 2012/09/18 22:05, Andriy Gapon wrote:


Here is a snippet that demonstrates the issue on a supposedly fully loaded
2-processor system:

136794   0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid 102818",
state:"running", attributes: prio:122

136793   0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid 111916",
state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)"

136792   1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid 14",
state:"running", attributes: prio:255

136791   1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load", counter:0,
attributes: none

136790   1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid 113473",
state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx"

136789   1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load", counter:2,
attributes: none

136788   1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid 113473",
point:"wokeup", attributes: linkedto:"Xorg tid 102818"

136787   1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid 102818",
state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473"

136786   1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load", counter:1,
attributes: none

136785   1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid 102818",
state:"runq rem", attributes: prio:176

136784   1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid 102818",
point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid 113473"

136783   1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid 113473",
state:"running", attributes: prio:104

See how how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted
cc1plus thread (I do have preemption enabled) only to leave CPU 1 with zero 
load.

Here is a proposed solution:

 turnstile_wait: optimize priority lending to a thread on a runqueue

 As the current thread is definitely going into mi_switch, it now removes
 its load before doing priority propagation which can potentially result
 in sched_add.  In the SMP && ULE case the latter searches for the
 least loaded CPU to place a boosted thread, which is supposedly about
 to run.

diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
index 8e466cd..3299cae 100644
--- a/sys/kern/sched_ule.c
+++ b/sys/kern/sched_ule.c
@@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread *newtd, int
flags)
/* This thread must be going to sleep. */
TDQ_LOCK(tdq);
mtx = thread_lock_block(td);
-   tdq_load_rem(tdq, td);
+#if defined(SMP)
+   if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE)
+#endif
+   tdq_load_rem(tdq, td);
}
/*
 * We enter here with the thread blocked and assigned to the
@@ -2412,6 +2415,21 @@ sched_rem(struct thread *td)
tdq_setlowpri(tdq, NULL);
  }

+void
+sched_load_rem(struct thread *td)
+{
+   struct tdq *tdq;
+
+   KASSERT(td == curthread,
+   ("sched_rem_load: only curthread is supported"));
+   KASSERT(td->td_oncpu == td->td_sched->ts_cpu,
+   ("thread running on cpu different from ts_cpu"));
+   tdq = TDQ_CPU(td->td_sched->ts_cpu);
+   TDQ_LOCK_ASSERT(tdq, MA_OWNED);
+   MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
+   tdq_load_rem(tdq, td);
+}
+
  /*
   * Fetch cpu utilization information.  Updates on demand.
   */
diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c
index 31d16fe..d1d68e9 100644
--- a/sys/kern/subr_turnstile.c
+++ b/sys/kern/subr_turnstile.c
@@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread *owner,
int queue)
LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash);
}
thread_lock(td);
+#if defined(SCHED_ULE) && defined(SMP)
+   /*
+* Remove load earlier so that it does not affect cpu selection
+* for a thread waken up due to priority lending, if any.
+*/
+   sched_load_rem(td);
+#endif
thread_lock_set(td, &ts->ts_lock);
td->td_turnstile = NULL;

diff --git a/sys/sys/sched.h b/sys/sys/sched.h
index 4b8387c..b1ead1b 100644
--- a/sys/sys/sched.h
+++ b/sys/sys/sched.h
@@ -110,6 +110,9 @@ voidsched_preempt(struct thread *td);
  void  sched_add(struct thread *td, int flags);
  void  sched_clock(struct thread *td);
  void  sched_rem(struct thread *td);
+#if defined(SCHED_ULE) && defined(SMP)
+void   sched_load_rem(struct thread *td);
+#endif
  void  sched_tick(int cnt);
  void  sched_relinquish(struct thread *td);
  struct thread *sched_choose(void);



I found another scenario in taskqueue, in the function
taskqueue_terminate, current thread tries to wake
another thread up and sleep immediately, the tq_mutex sometimes
is a spinlock. So if you remove one thread load from current cpu
before wakeup, the resumed thread may be put on same cpu,
so it will optimize the cpu scheduling too.


Re: fdgrowtable() cleanup

2012-09-20 Thread Konstantin Belousov
On Wed, Sep 19, 2012 at 09:04:30PM +0200, Dag-Erling Sm??rgrav wrote:
> Konstantin Belousov  writes:
> > "Dag-Erling Sm??rgrav"  writes:
> > > I assume you mean assignments, not calculations.  I trust the compiler
> > > to move them to where they are needed - a trivial optimization with SSA.
> > It is a dereference before the assignment, so I perceive it as the
> > calculation. No, I do not think that compiler can move the dereference,
> > because there are many sequential points between the calculation and
> > use of the result.
> 
> Sequence points are a language feature and are only meaningful in the
> translation phase.  Once the code is in SSA form or some other
> equivalent intermediate representation, the compiler can see that the
> variables are only used in one specific case and move the assignments
> inside that block.  In fact, it will probably optimize them away,
> because they are completely unnecessary - I added them solely for
> readability after Niclas called my attention to the fact that it is
> almost impossible to understand fdgrowtable() at a first reading.
Compiler cannot change the semantic of the program regardless of the phase
of compilation the change happens at.

Compiler that would reorder reads from the global memory across
sequential points seems to be not disallowed by c99, but it certainly
contradicts to our semantic of the lock releases, which the compiler
cannot infer from the non-inlined function calls. Malloc call is
the sequential point and does have the release semantic FWIW.
> 
> > > Correct, thanks for pointing it out.  The easiest solution is to place
> > > the struct freetable between the file array and the flag array.
> > As I know, for all ABIs FreeBSD run on, the structure alignment is the
> > alignment of the most requiring member. You would introduce very tacit
> > dependency between struct file and struct freetable.
> 
> The existing code *already* places the struct freetable immediately
> after the struct file array.  What I'm proposing now is to leave the
> struct freetable where it was but move the fileflags array so they don't
> overlap.  The fileflags array is actually a char[] and has no alignment
> requirement.
Ok.

> 
> Memory usage will not increase, because the code already allocates
> additional space for the struct freetable to make sure it will fit even
> if onfiles < sizeof(struct freetable).
> 
> BTW, I just noticed that there is some dead code in fdgrowtable():
> 
>   nnfiles = NDSLOTS(nfd) * NDENTRIES; /* round up */
>   if (nnfiles <= onfiles)
>   /* the table is already large enough */
>   return;
> 
> /* ... */
> 
>   /* allocate new bitmaps if necessary */
>   if (NDSLOTS(nnfiles) > NDSLOTS(onfiles))
>   nmap = malloc(NDSLOTS(nnfiles) * NDSLOTSIZE,
>   M_FILEDESC, M_ZERO | M_WAITOK);
>   else
>   nmap = NULL;
> 
> Since neither nnflags nor onflags are modified between these two chunks
> of code, the condition in the second if will always be true.

You mean that new bitmap shall be always allocated, making the
nmap = NULL assignment not needed ? I agree. It would also make the
code in the last if () executed unconditionally.


pgpxF1S7g8LNi.pgp
Description: PGP signature


Re: ule+smp: small optimization for turnstile priority lending

2012-09-20 Thread Andriy Gapon
on 19/09/2012 10:33 Attilio Rao said the following:
> It is hard for me to tell if this is subject to same issues because I
> do not entirely understand where the locking was happening in your
> patch.
> Can you try testing this with your already KTR instrumented test and
> possibly report?

The patch works well as far as I can tell.  Thank you!
There is one warning with full witness enables but it appears to be harmless (so
far):

Acquiring duplicate lock of same type: "sched lock"
 1st sched lock 1 @ /usr/src/sys/kern/subr_turnstile.c:212
 2nd sched lock 0 @ /usr/src/sys/kern/sched_ule.c:1244
KDB: stack backtrace:
db_trace_self_wrapper() at 0x802d238a = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x80555f4a = kdb_backtrace+0x3a
_witness_debugger() at 0x8056e2bc = _witness_debugger+0x2c
witness_checkorder() at 0x8056f759 = witness_checkorder+0x579
_mtx_lock_spin_flags() at 0x80504bcd = _mtx_lock_spin_flags+0x10d
sched_pickcpu() at 0x80547829 = sched_pickcpu+0x199
sched_add() at 0x80547bdb = sched_add+0x14b
sched_thread_priority() at 0x80548059 = sched_thread_priority+0x1c9
sched_lend_prio() at 0x80548344 = sched_lend_prio+0x14
propagate_priority() at 0x8056801e = propagate_priority+0x1ce
turnstile_wait() at 0x8056959f = turnstile_wait+0x4ef
_mtx_lock_sleep() at 0x805045f6 = _mtx_lock_sleep+0x486
_mtx_lock_flags() at 0x80504814 = _mtx_lock_flags+0x104
lock_mtx() at 0x805049ca = lock_mtx+0x1a
_sleep() at 0x80524589 = _sleep+0x4f9
taskqueue_thread_loop() at 0x805664c8 = taskqueue_thread_loop+0xa8
fork_exit() at 0x804e5d6a = fork_exit+0x1aa
fork_trampoline() at 0x806ea2ce = fork_trampoline+0xe


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: ule+smp: small optimization for turnstile priority lending

2012-09-20 Thread Andriy Gapon
on 20/09/2012 14:04 Andriy Gapon said the following:
> on 19/09/2012 10:33 Attilio Rao said the following:
>> It is hard for me to tell if this is subject to same issues because I
>> do not entirely understand where the locking was happening in your
>> patch.
>> Can you try testing this with your already KTR instrumented test and
>> possibly report?
> 
> The patch works well as far as I can tell.  Thank you!

After more testing it seems that the idea was not complete.  While loads are set
better tdq_lowpri (which is generally <= newpri) on the current CPU still
prevents the boosted thread from taking advantage of the lowered load.
Not sure how to "refresh" tdq_lowpri in this scenario...

> There is one warning with full witness enables but it appears to be harmless 
> (so
> far):
> 
> Acquiring duplicate lock of same type: "sched lock"
>  1st sched lock 1 @ /usr/src/sys/kern/subr_turnstile.c:212
>  2nd sched lock 0 @ /usr/src/sys/kern/sched_ule.c:1244
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x802d238a = db_trace_self_wrapper+0x2a
> kdb_backtrace() at 0x80555f4a = kdb_backtrace+0x3a
> _witness_debugger() at 0x8056e2bc = _witness_debugger+0x2c
> witness_checkorder() at 0x8056f759 = witness_checkorder+0x579
> _mtx_lock_spin_flags() at 0x80504bcd = _mtx_lock_spin_flags+0x10d
> sched_pickcpu() at 0x80547829 = sched_pickcpu+0x199
> sched_add() at 0x80547bdb = sched_add+0x14b
> sched_thread_priority() at 0x80548059 = sched_thread_priority+0x1c9
> sched_lend_prio() at 0x80548344 = sched_lend_prio+0x14
> propagate_priority() at 0x8056801e = propagate_priority+0x1ce
> turnstile_wait() at 0x8056959f = turnstile_wait+0x4ef
> _mtx_lock_sleep() at 0x805045f6 = _mtx_lock_sleep+0x486
> _mtx_lock_flags() at 0x80504814 = _mtx_lock_flags+0x104
> lock_mtx() at 0x805049ca = lock_mtx+0x1a
> _sleep() at 0x80524589 = _sleep+0x4f9
> taskqueue_thread_loop() at 0x805664c8 = taskqueue_thread_loop+0xa8
> fork_exit() at 0x804e5d6a = fork_exit+0x1aa
> fork_trampoline() at 0x806ea2ce = fork_trampoline+0xe
> 
> 


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: ule+smp: small optimization for turnstile priority lending

2012-09-20 Thread Andriy Gapon
on 20/09/2012 14:04 Andriy Gapon said the following:
> on 19/09/2012 10:33 Attilio Rao said the following:
>> It is hard for me to tell if this is subject to same issues because I
>> do not entirely understand where the locking was happening in your
>> patch.
>> Can you try testing this with your already KTR instrumented test and
>> possibly report?
> 
> The patch works well as far as I can tell.  Thank you!

After more testing it seems that the idea was not complete.  While loads are set
better tdq_lowpri (which is generally <= newpri) on the current CPU still
prevents the boosted thread from taking advantage of the lowered load.
Not sure how to "refresh" tdq_lowpri in this scenario...

> There is one warning with full witness enables but it appears to be harmless 
> (so
> far):
> 
> Acquiring duplicate lock of same type: "sched lock"
>  1st sched lock 1 @ /usr/src/sys/kern/subr_turnstile.c:212
>  2nd sched lock 0 @ /usr/src/sys/kern/sched_ule.c:1244
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x802d238a = db_trace_self_wrapper+0x2a
> kdb_backtrace() at 0x80555f4a = kdb_backtrace+0x3a
> _witness_debugger() at 0x8056e2bc = _witness_debugger+0x2c
> witness_checkorder() at 0x8056f759 = witness_checkorder+0x579
> _mtx_lock_spin_flags() at 0x80504bcd = _mtx_lock_spin_flags+0x10d
> sched_pickcpu() at 0x80547829 = sched_pickcpu+0x199
> sched_add() at 0x80547bdb = sched_add+0x14b
> sched_thread_priority() at 0x80548059 = sched_thread_priority+0x1c9
> sched_lend_prio() at 0x80548344 = sched_lend_prio+0x14
> propagate_priority() at 0x8056801e = propagate_priority+0x1ce
> turnstile_wait() at 0x8056959f = turnstile_wait+0x4ef
> _mtx_lock_sleep() at 0x805045f6 = _mtx_lock_sleep+0x486
> _mtx_lock_flags() at 0x80504814 = _mtx_lock_flags+0x104
> lock_mtx() at 0x805049ca = lock_mtx+0x1a
> _sleep() at 0x80524589 = _sleep+0x4f9
> taskqueue_thread_loop() at 0x805664c8 = taskqueue_thread_loop+0xa8
> fork_exit() at 0x804e5d6a = fork_exit+0x1aa
> fork_trampoline() at 0x806ea2ce = fork_trampoline+0xe

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: ule+smp: small optimization for turnstile priority lending

2012-09-20 Thread Attilio Rao
On 9/20/12, Andriy Gapon  wrote:
> on 19/09/2012 10:33 Attilio Rao said the following:
>> It is hard for me to tell if this is subject to same issues because I
>> do not entirely understand where the locking was happening in your
>> patch.
>> Can you try testing this with your already KTR instrumented test and
>> possibly report?
>
> The patch works well as far as I can tell.  Thank you!
> There is one warning with full witness enables but it appears to be harmless
> (so
> far):

Andiy,
thanks a lot for your testing and reports you made so far.
Unfortunately I'm going off for 2 weeks now and I won't work on
FreeBSD for that timeframe. I will get back to those in 2 weeks then.
If you want  to play more with this idea feel free to extend/fix/etc.
this patch.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: FreeBSD 8.0 suddenly freezing

2012-09-20 Thread Dieter BSD
> In the last 72 hours, we've had two different systems freeze; they don't
> apparently recognize any interrupts, they won't respond to ping, and
> they require a powercycle to reboot. We can't easily generate an NMI on
> these boxes.

> Just on the off chance, does this sound like a familiar -- and
> preferably soluble -- problem?

Only theory I can think of is that some device driver shuts off
all interrupts and then goes into an infinite loop.

I have had trouble with at least four different device drivers
locking everything out long enough to lose incoming data.
Usually it recovers without a reboot. Not an infinte loop, just
way way too long to leave interrupts shut off.

See also the "How to diagnose system freezes?" thread from July-August.
Yuri found that a particular rev of the nvidia driver went into
an infinite loop.
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: FreeBSD 8.0 suddenly freezing

2012-09-20 Thread Erich Dollansky
Hi,

On Mon, 17 Sep 2012 15:21:59 -0600
Charles R Martin  wrote:

> We have a web host installation using FreeBSD 8.0, Tomcat, and Java.
> We ship many copies of this, and haven't changed the OS load in
> several months.
> 
> In the last 72 hours, we've had two different systems freeze; they
> don't apparently recognize any interrupts, they won't respond to
> ping, and they require a powercycle to reboot.  We can't easily
> generate an NMI on these boxes.
> 
> No crashdumps were generated.
> 
> Just on the off chance, does this sound like a familiar -- and 
> preferably soluble -- problem?
> 
I have had 8.0 to 8.3 running on a machine without a single crash, hang
or whatever except of problems with sound. A dmesg might will tell some
people of problematic hardware.

As you write on top of this that you have shipped many copies of it,
are the problems happening with the same or at least identical hardware?

Erich
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"