On 10/29/12, Attilio Rao <atti...@freebsd.org> wrote:
> On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao <atti...@freebsd.org> wrote:
>> On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon <a...@freebsd.org> wrote:
>>> on 20/09/2012 16:14 Attilio Rao said the following:
>>>> On 9/20/12, Andriy Gapon <a...@freebsd.org> wrote:
>>> [snip]
>>>>> The patch works well as far as I can tell.  Thank you!
>>>>> There is one warning with full witness enabled, but it appears to
>>>>> be harmless (so far):
>>>>
>>>> Andriy,
>>>> thanks a lot for your testing and the reports you have made so far.
>>>> Unfortunately I'm going off for 2 weeks now and I won't work on
>>>> FreeBSD during that timeframe.  I will get back to this in 2 weeks,
>>>> then.  If you want to play more with this idea, feel free to
>>>> extend/fix/etc. this patch.
>>>
>>> Unfortunately I haven't found time to work on this further, but I
>>> have some additional thoughts.
>>>
>>> First, I think that the witness message was not benign and it
>>> actually warns about the same kind of deadlock that I originally had.
>>> The problem is that sched_lend_prio() is called with the target
>>> thread's td_lock held, which is the lock of the tdq on the thread's
>>> CPU.  Then, in your patch, we acquire the current tdq's lock to
>>> modify its load.  But if two tdq locks are to be held at the same
>>> time, then they must be locked using tdq_lock_pair(), so that the
>>> lock order is maintained.  With the patch no tdq lock order can be
>>> maintained (it can be arbitrary) and thus it is possible to run into
>>> a deadlock.
>>
>> Indeed.  I realized this after thinking about this problem while I
>> was on holiday.
>>
>>> I see two possibilities here, but don't rule out that there can be
>>> more.
>>>
>>> 1. Do something like my patch does.  That is, manipulate the current
>>> thread's tdq in advance, before any other thread/tdq locks are
>>> acquired (or td_lock is changed to point to a different lock and the
>>> current tdq is unlocked).  The API can be made more generic in
>>> nature.  E.g. it can look like this:
>>>
>>> void
>>> sched_thread_block(struct thread *td, int inhibitor)
>>> {
>>>         struct tdq *tdq;
>>>
>>>         THREAD_LOCK_ASSERT(td, MA_OWNED);
>>>         KASSERT(td == curthread,
>>>             ("sched_thread_block: only curthread is supported"));
>>>         tdq = TDQ_SELF();
>>>         TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>>>         MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
>>>         TD_SET_INHIB(td, inhibitor);
>>>         tdq_load_rem(tdq, td);
>>>         tdq_setlowpri(tdq, td);
>>> }
>>>
>>> 2. Try to do things from sched_lend_prio() based on curthread's
>>> state.  This, as it seems, would require completely lock-less
>>> manipulation of the current tdq.  E.g. for tdq_load we could use
>>> atomic operations (it is already accessed locklessly, but not
>>> modified).  But for tdq_lowpri a more elaborate trick would be
>>> required, like having a separate field for a temporary value.
>>
>> No, this is not going to work, because tdq_lowpri and tdq_load are
>> updated and used in the same critical path (and possibly also other
>> tdq_* members in tdq_runq_add(), for example, but I didn't bother to
>> also check that).
>
> Ok, so here I wasn't completely right, because tdq_lowpri and tdq_load
> are not read in the same critical path (cpu_search() and
> sched_pickcpu() in the self case).
> However, I was over-estimating a factor in this patch: I don't think
> we need to fix the real tdq_load for self in that case before
> cpu_search(), because self is handled in a very special way, with its
> own comparison.
> I then think that a patch local to sched_pickcpu() on the self path
> should be enough.
>
> This brings us to another, bigger issue, however: tdq_lowpri handling
> is not perfect in that patch.  Think about the case where the thread
> to be switched out is, in fact, the lowest-priority one.  In that
> case, what we should do is recalculate tdq_lowpri, which is usually a
> very expensive operation and requires TDQ self locking to be in place
> in order to be carried out correctly.
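To make the quoted points concrete: the lock order in question is the
one tdq_lock_pair() imposes whenever two tdq locks have to be held at
the same time.  It always takes them in a fixed, address-based order,
roughly like this (a sketch reconstructed from sched_ule.c, not
verbatim):

static inline void
tdq_lock_pair(struct tdq *one, struct tdq *two)
{

	/*
	 * Lock in a global order (by lock address), so that two CPUs
	 * taking the same pair of locks can never deadlock each other.
	 */
	if (one < two) {
		TDQ_LOCK(one);
		TDQ_LOCK_FLAGS(two, MTX_DUPOK);
	} else {
		TDQ_LOCK(two);
		TDQ_LOCK_FLAGS(one, MTX_DUPOK);
	}
}

And recalculating tdq_lowpri is expensive precisely because it has to
search the runqueues again, with the tdq lock held, along the lines of
tdq_setlowpri() (again a rough reconstruction from memory):

static void
tdq_setlowpri(struct tdq *tdq, struct thread *ctd)
{
	struct thread *td;

	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
	if (ctd == NULL)
		ctd = pcpu_find(TDQ_ID(tdq))->pc_curthread;
	/* tdq_choose() has to walk the runqueues: the costly part. */
	td = tdq_choose(tdq);
	if (td == NULL || td->td_priority > ctd->td_priority)
		tdq->tdq_lowpri = ctd->td_priority;
	else
		tdq->tdq_lowpri = td->td_priority;
}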
I think that the easy fix for this is to implement a second field,
called tdq_sec_lowpri, where we store the priority of the thread to be
scheduled next after the tdq_lowpri one.  This is not really going to
change anything, because we will get it for free when we already
calculate tdq_lowpri, and it will let us implement a definitive logic
that helps in the case you describe (a rough sketch is in the P.S.
below).

However, I still want to think about the basic idea behind the current
algorithm in the self/likely CPU pickup.

Attilio
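P.S.: Here is a rough sketch of the tdq_sec_lowpri bookkeeping I have
in mind.  The helper name is purely illustrative and none of this is
committed code; remember that a numerically lower td_priority means a
more important thread:

static void
tdq_update_lowpri(struct tdq *tdq, struct thread *td)
{

	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
	if (td->td_priority < tdq->tdq_lowpri) {
		/* New best thread; the old best becomes the runner-up. */
		tdq->tdq_sec_lowpri = tdq->tdq_lowpri;
		tdq->tdq_lowpri = td->td_priority;
	} else if (td->td_priority < tdq->tdq_sec_lowpri) {
		/* Only the runner-up slot changes. */
		tdq->tdq_sec_lowpri = td->td_priority;
	}
}

Since this runs in the same pass that already maintains tdq_lowpri, the
second field really does come for free, and when the tdq_lowpri thread
is switched out we can promote tdq_sec_lowpri instead of immediately
rescanning the runqueues.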