Re: ule+smp: small optimization for turnstile priority lending
On 2012/09/18 22:05, Andriy Gapon wrote: Here is a snippet that demonstrates the issue on a supposedly fully loaded 2-processor system: 136794 0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid 102818", state:"running", attributes: prio:122 136793 0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid 111916", state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)" 136792 1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid 14", state:"running", attributes: prio:255 136791 1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load", counter:0, attributes: none 136790 1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid 113473", state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx" 136789 1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load", counter:2, attributes: none 136788 1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid 113473", point:"wokeup", attributes: linkedto:"Xorg tid 102818" 136787 1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid 102818", state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473" 136786 1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load", counter:1, attributes: none 136785 1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid 102818", state:"runq rem", attributes: prio:176 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid 102818", point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid 113473" 136783 1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid 113473", state:"running", attributes: prio:104 See how how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted cc1plus thread (I do have preemption enabled) only to leave CPU 1 with zero load. Here is a proposed solution: turnstile_wait: optimize priority lending to a thread on a runqueue As the current thread is definitely going into mi_switch, it now removes its load before doing priority propagation which can potentially result in sched_add. In the SMP && ULE case the latter searches for the least loaded CPU to place a boosted thread, which is supposedly about to run. diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c index 8e466cd..3299cae 100644 --- a/sys/kern/sched_ule.c +++ b/sys/kern/sched_ule.c @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread *newtd, int flags) /* This thread must be going to sleep. */ TDQ_LOCK(tdq); mtx = thread_lock_block(td); - tdq_load_rem(tdq, td); +#if defined(SMP) + if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE) +#endif + tdq_load_rem(tdq, td); } /* * We enter here with the thread blocked and assigned to the @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td) tdq_setlowpri(tdq, NULL); } +void +sched_load_rem(struct thread *td) +{ + struct tdq *tdq; + + KASSERT(td == curthread, + ("sched_rem_load: only curthread is supported")); + KASSERT(td->td_oncpu == td->td_sched->ts_cpu, + ("thread running on cpu different from ts_cpu")); + tdq = TDQ_CPU(td->td_sched->ts_cpu); + TDQ_LOCK_ASSERT(tdq, MA_OWNED); + MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); + tdq_load_rem(tdq, td); +} + /* * Fetch cpu utilization information. Updates on demand. */ diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c index 31d16fe..d1d68e9 100644 --- a/sys/kern/subr_turnstile.c +++ b/sys/kern/subr_turnstile.c @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread *owner, int queue) LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash); } thread_lock(td); +#if defined(SCHED_ULE) && defined(SMP) + /* +* Remove load earlier so that it does not affect cpu selection +* for a thread waken up due to priority lending, if any. +*/ + sched_load_rem(td); +#endif thread_lock_set(td, &ts->ts_lock); td->td_turnstile = NULL; diff --git a/sys/sys/sched.h b/sys/sys/sched.h index 4b8387c..b1ead1b 100644 --- a/sys/sys/sched.h +++ b/sys/sys/sched.h @@ -110,6 +110,9 @@ voidsched_preempt(struct thread *td); void sched_add(struct thread *td, int flags); void sched_clock(struct thread *td); void sched_rem(struct thread *td); +#if defined(SCHED_ULE) && defined(SMP) +void sched_load_rem(struct thread *td); +#endif void sched_tick(int cnt); void sched_relinquish(struct thread *td); struct thread *sched_choose(void); I found another scenario in taskqueue, in the function taskqueue_terminate, current thread tries to wake another thread up and sleep immediately, the tq_mutex sometimes is a spinlock. So if you remove one thread load from current cpu before wakeup, the resumed thread may be put on same cpu, so it will optimize the cpu scheduling too.
Re: fdgrowtable() cleanup
On Wed, Sep 19, 2012 at 09:04:30PM +0200, Dag-Erling Sm??rgrav wrote: > Konstantin Belousov writes: > > "Dag-Erling Sm??rgrav" writes: > > > I assume you mean assignments, not calculations. I trust the compiler > > > to move them to where they are needed - a trivial optimization with SSA. > > It is a dereference before the assignment, so I perceive it as the > > calculation. No, I do not think that compiler can move the dereference, > > because there are many sequential points between the calculation and > > use of the result. > > Sequence points are a language feature and are only meaningful in the > translation phase. Once the code is in SSA form or some other > equivalent intermediate representation, the compiler can see that the > variables are only used in one specific case and move the assignments > inside that block. In fact, it will probably optimize them away, > because they are completely unnecessary - I added them solely for > readability after Niclas called my attention to the fact that it is > almost impossible to understand fdgrowtable() at a first reading. Compiler cannot change the semantic of the program regardless of the phase of compilation the change happens at. Compiler that would reorder reads from the global memory across sequential points seems to be not disallowed by c99, but it certainly contradicts to our semantic of the lock releases, which the compiler cannot infer from the non-inlined function calls. Malloc call is the sequential point and does have the release semantic FWIW. > > > > Correct, thanks for pointing it out. The easiest solution is to place > > > the struct freetable between the file array and the flag array. > > As I know, for all ABIs FreeBSD run on, the structure alignment is the > > alignment of the most requiring member. You would introduce very tacit > > dependency between struct file and struct freetable. > > The existing code *already* places the struct freetable immediately > after the struct file array. What I'm proposing now is to leave the > struct freetable where it was but move the fileflags array so they don't > overlap. The fileflags array is actually a char[] and has no alignment > requirement. Ok. > > Memory usage will not increase, because the code already allocates > additional space for the struct freetable to make sure it will fit even > if onfiles < sizeof(struct freetable). > > BTW, I just noticed that there is some dead code in fdgrowtable(): > > nnfiles = NDSLOTS(nfd) * NDENTRIES; /* round up */ > if (nnfiles <= onfiles) > /* the table is already large enough */ > return; > > /* ... */ > > /* allocate new bitmaps if necessary */ > if (NDSLOTS(nnfiles) > NDSLOTS(onfiles)) > nmap = malloc(NDSLOTS(nnfiles) * NDSLOTSIZE, > M_FILEDESC, M_ZERO | M_WAITOK); > else > nmap = NULL; > > Since neither nnflags nor onflags are modified between these two chunks > of code, the condition in the second if will always be true. You mean that new bitmap shall be always allocated, making the nmap = NULL assignment not needed ? I agree. It would also make the code in the last if () executed unconditionally. pgpxF1S7g8LNi.pgp Description: PGP signature
Re: ule+smp: small optimization for turnstile priority lending
on 19/09/2012 10:33 Attilio Rao said the following: > It is hard for me to tell if this is subject to same issues because I > do not entirely understand where the locking was happening in your > patch. > Can you try testing this with your already KTR instrumented test and > possibly report? The patch works well as far as I can tell. Thank you! There is one warning with full witness enables but it appears to be harmless (so far): Acquiring duplicate lock of same type: "sched lock" 1st sched lock 1 @ /usr/src/sys/kern/subr_turnstile.c:212 2nd sched lock 0 @ /usr/src/sys/kern/sched_ule.c:1244 KDB: stack backtrace: db_trace_self_wrapper() at 0x802d238a = db_trace_self_wrapper+0x2a kdb_backtrace() at 0x80555f4a = kdb_backtrace+0x3a _witness_debugger() at 0x8056e2bc = _witness_debugger+0x2c witness_checkorder() at 0x8056f759 = witness_checkorder+0x579 _mtx_lock_spin_flags() at 0x80504bcd = _mtx_lock_spin_flags+0x10d sched_pickcpu() at 0x80547829 = sched_pickcpu+0x199 sched_add() at 0x80547bdb = sched_add+0x14b sched_thread_priority() at 0x80548059 = sched_thread_priority+0x1c9 sched_lend_prio() at 0x80548344 = sched_lend_prio+0x14 propagate_priority() at 0x8056801e = propagate_priority+0x1ce turnstile_wait() at 0x8056959f = turnstile_wait+0x4ef _mtx_lock_sleep() at 0x805045f6 = _mtx_lock_sleep+0x486 _mtx_lock_flags() at 0x80504814 = _mtx_lock_flags+0x104 lock_mtx() at 0x805049ca = lock_mtx+0x1a _sleep() at 0x80524589 = _sleep+0x4f9 taskqueue_thread_loop() at 0x805664c8 = taskqueue_thread_loop+0xa8 fork_exit() at 0x804e5d6a = fork_exit+0x1aa fork_trampoline() at 0x806ea2ce = fork_trampoline+0xe -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: ule+smp: small optimization for turnstile priority lending
on 20/09/2012 14:04 Andriy Gapon said the following: > on 19/09/2012 10:33 Attilio Rao said the following: >> It is hard for me to tell if this is subject to same issues because I >> do not entirely understand where the locking was happening in your >> patch. >> Can you try testing this with your already KTR instrumented test and >> possibly report? > > The patch works well as far as I can tell. Thank you! After more testing it seems that the idea was not complete. While loads are set better tdq_lowpri (which is generally <= newpri) on the current CPU still prevents the boosted thread from taking advantage of the lowered load. Not sure how to "refresh" tdq_lowpri in this scenario... > There is one warning with full witness enables but it appears to be harmless > (so > far): > > Acquiring duplicate lock of same type: "sched lock" > 1st sched lock 1 @ /usr/src/sys/kern/subr_turnstile.c:212 > 2nd sched lock 0 @ /usr/src/sys/kern/sched_ule.c:1244 > KDB: stack backtrace: > db_trace_self_wrapper() at 0x802d238a = db_trace_self_wrapper+0x2a > kdb_backtrace() at 0x80555f4a = kdb_backtrace+0x3a > _witness_debugger() at 0x8056e2bc = _witness_debugger+0x2c > witness_checkorder() at 0x8056f759 = witness_checkorder+0x579 > _mtx_lock_spin_flags() at 0x80504bcd = _mtx_lock_spin_flags+0x10d > sched_pickcpu() at 0x80547829 = sched_pickcpu+0x199 > sched_add() at 0x80547bdb = sched_add+0x14b > sched_thread_priority() at 0x80548059 = sched_thread_priority+0x1c9 > sched_lend_prio() at 0x80548344 = sched_lend_prio+0x14 > propagate_priority() at 0x8056801e = propagate_priority+0x1ce > turnstile_wait() at 0x8056959f = turnstile_wait+0x4ef > _mtx_lock_sleep() at 0x805045f6 = _mtx_lock_sleep+0x486 > _mtx_lock_flags() at 0x80504814 = _mtx_lock_flags+0x104 > lock_mtx() at 0x805049ca = lock_mtx+0x1a > _sleep() at 0x80524589 = _sleep+0x4f9 > taskqueue_thread_loop() at 0x805664c8 = taskqueue_thread_loop+0xa8 > fork_exit() at 0x804e5d6a = fork_exit+0x1aa > fork_trampoline() at 0x806ea2ce = fork_trampoline+0xe > > -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: ule+smp: small optimization for turnstile priority lending
on 20/09/2012 14:04 Andriy Gapon said the following: > on 19/09/2012 10:33 Attilio Rao said the following: >> It is hard for me to tell if this is subject to same issues because I >> do not entirely understand where the locking was happening in your >> patch. >> Can you try testing this with your already KTR instrumented test and >> possibly report? > > The patch works well as far as I can tell. Thank you! After more testing it seems that the idea was not complete. While loads are set better tdq_lowpri (which is generally <= newpri) on the current CPU still prevents the boosted thread from taking advantage of the lowered load. Not sure how to "refresh" tdq_lowpri in this scenario... > There is one warning with full witness enables but it appears to be harmless > (so > far): > > Acquiring duplicate lock of same type: "sched lock" > 1st sched lock 1 @ /usr/src/sys/kern/subr_turnstile.c:212 > 2nd sched lock 0 @ /usr/src/sys/kern/sched_ule.c:1244 > KDB: stack backtrace: > db_trace_self_wrapper() at 0x802d238a = db_trace_self_wrapper+0x2a > kdb_backtrace() at 0x80555f4a = kdb_backtrace+0x3a > _witness_debugger() at 0x8056e2bc = _witness_debugger+0x2c > witness_checkorder() at 0x8056f759 = witness_checkorder+0x579 > _mtx_lock_spin_flags() at 0x80504bcd = _mtx_lock_spin_flags+0x10d > sched_pickcpu() at 0x80547829 = sched_pickcpu+0x199 > sched_add() at 0x80547bdb = sched_add+0x14b > sched_thread_priority() at 0x80548059 = sched_thread_priority+0x1c9 > sched_lend_prio() at 0x80548344 = sched_lend_prio+0x14 > propagate_priority() at 0x8056801e = propagate_priority+0x1ce > turnstile_wait() at 0x8056959f = turnstile_wait+0x4ef > _mtx_lock_sleep() at 0x805045f6 = _mtx_lock_sleep+0x486 > _mtx_lock_flags() at 0x80504814 = _mtx_lock_flags+0x104 > lock_mtx() at 0x805049ca = lock_mtx+0x1a > _sleep() at 0x80524589 = _sleep+0x4f9 > taskqueue_thread_loop() at 0x805664c8 = taskqueue_thread_loop+0xa8 > fork_exit() at 0x804e5d6a = fork_exit+0x1aa > fork_trampoline() at 0x806ea2ce = fork_trampoline+0xe -- Andriy Gapon ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: ule+smp: small optimization for turnstile priority lending
On 9/20/12, Andriy Gapon wrote: > on 19/09/2012 10:33 Attilio Rao said the following: >> It is hard for me to tell if this is subject to same issues because I >> do not entirely understand where the locking was happening in your >> patch. >> Can you try testing this with your already KTR instrumented test and >> possibly report? > > The patch works well as far as I can tell. Thank you! > There is one warning with full witness enables but it appears to be harmless > (so > far): Andiy, thanks a lot for your testing and reports you made so far. Unfortunately I'm going off for 2 weeks now and I won't work on FreeBSD for that timeframe. I will get back to those in 2 weeks then. If you want to play more with this idea feel free to extend/fix/etc. this patch. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: FreeBSD 8.0 suddenly freezing
> In the last 72 hours, we've had two different systems freeze; they don't > apparently recognize any interrupts, they won't respond to ping, and > they require a powercycle to reboot. We can't easily generate an NMI on > these boxes. > Just on the off chance, does this sound like a familiar -- and > preferably soluble -- problem? Only theory I can think of is that some device driver shuts off all interrupts and then goes into an infinite loop. I have had trouble with at least four different device drivers locking everything out long enough to lose incoming data. Usually it recovers without a reboot. Not an infinte loop, just way way too long to leave interrupts shut off. See also the "How to diagnose system freezes?" thread from July-August. Yuri found that a particular rev of the nvidia driver went into an infinite loop. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: FreeBSD 8.0 suddenly freezing
Hi, On Mon, 17 Sep 2012 15:21:59 -0600 Charles R Martin wrote: > We have a web host installation using FreeBSD 8.0, Tomcat, and Java. > We ship many copies of this, and haven't changed the OS load in > several months. > > In the last 72 hours, we've had two different systems freeze; they > don't apparently recognize any interrupts, they won't respond to > ping, and they require a powercycle to reboot. We can't easily > generate an NMI on these boxes. > > No crashdumps were generated. > > Just on the off chance, does this sound like a familiar -- and > preferably soluble -- problem? > I have had 8.0 to 8.3 running on a machine without a single crash, hang or whatever except of problems with sound. A dmesg might will tell some people of problematic hardware. As you write on top of this that you have shipped many copies of it, are the problems happening with the same or at least identical hardware? Erich ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"