Re: [RFC][PATCH 03/16] sched: Wrap rq::lock access

Subhra Mazumdar Tue, 19 Mar 2019 19:32:45 -0700


On 3/18/19 8:41 AM, Julien Desfossez wrote:

The case where we try to acquire the lock on 2 runqueues belonging to 2
different cores requires the rq_lockp wrapper as well otherwise we
frequently deadlock in there.

This fixes the crash reported in
[email protected]

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 76fee56..71bb71f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2078,7 +2078,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
                raw_spin_lock(rq_lockp(rq1));
                __acquire(rq2->lock);        /* Fake it out ;) */
        } else {
-               if (rq1 < rq2) {
+               if (rq_lockp(rq1) < rq_lockp(rq2)) {
                        raw_spin_lock(rq_lockp(rq1));
                        raw_spin_lock_nested(rq_lockp(rq2), SINGLE_DEPTH_NESTING);
                } else {
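
To illustrate why the comparison has to be made on the lock addresses rather
than on the runqueue addresses, here is a minimal user-space sketch. It assumes
that sibling runqueues of one core share a single lock, as the rq_lockp()
wrapper implies. The toy_* names and the pthread mutexes are hypothetical
stand-ins, not kernel code. If two CPUs order by runqueue address while the
locks they resolve to are ordered the other way round, they can take the same
two locks in opposite order (ABBA); ordering by lock address avoids that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (not the real struct rq). */
struct toy_core {
	pthread_mutex_t lock;		/* one lock shared by all siblings of a core */
};

struct toy_rq {
	struct toy_core *core;		/* siblings point at the same toy_core */
};

/* Analogue of rq_lockp(): the lock that actually protects this runqueue. */
static pthread_mutex_t *toy_rq_lockp(struct toy_rq *rq)
{
	return &rq->core->lock;
}

/*
 * Lock both runqueues. The order is decided by the address of the lock,
 * not of the runqueue, so every caller agrees on it. Returns the lock
 * that was taken first, purely so the ordering can be inspected.
 */
static pthread_mutex_t *toy_double_rq_lock(struct toy_rq *rq1, struct toy_rq *rq2)
{
	pthread_mutex_t *l1, *l2, *tmp;

	assert(rq1 && rq2);
	l1 = toy_rq_lockp(rq1);
	l2 = toy_rq_lockp(rq2);

	if (l1 == l2) {			/* same core: take the lock once */
		pthread_mutex_lock(l1);
		return l1;
	}
	if ((uintptr_t)l1 > (uintptr_t)l2) {
		tmp = l1;
		l1 = l2;
		l2 = tmp;
	}
	pthread_mutex_lock(l1);		/* lower address first */
	pthread_mutex_lock(l2);
	return l1;
}

static void toy_double_rq_unlock(struct toy_rq *rq1, struct toy_rq *rq2)
{
	pthread_mutex_t *l1 = toy_rq_lockp(rq1);
	pthread_mutex_t *l2 = toy_rq_lockp(rq2);

	pthread_mutex_unlock(l1);
	if (l1 != l2)			/* release the second lock only if distinct */
		pthread_mutex_unlock(l2);
}
```

Whichever order the two runqueues are passed in, the lock with the lower
address is taken first, and two runqueues on the same core take it only once.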

With this fix and my previous NULL pointer fix, my stress tests are surviving.
I re-ran my 2 DB instance setup on a 44-core, 2-socket system, putting each DB
instance in a separate core scheduling group. The numbers look much worse now.


users  baseline  %stdev  %idle  core_sched  %stdev  %idle
16     1         0.3     66     -73.4%      136.8   82
24     1         1.6     54     -95.8%      133.2   81
32     1         1.5     42     -97.5%      124.3   89

I also notice that if I enable a bunch of debug configs related to mutexes,
spinlocks, lockdep etc. (which I did earlier to debug the deadlock), it opens
up a can of worms with multiple crashes.
