Hi, On Tue, Jun 30, 2020 at 8:05 AM Daniel Thompson <daniel.thomp...@linaro.org> wrote: > > On Mon, Jun 29, 2020 at 02:03:52PM -0700, Doug Anderson wrote: > > Hi, > > > > On Mon, Jun 29, 2020 at 10:15 AM Daniel Thompson > > <daniel.thomp...@linaro.org> wrote: > > > > > > Currently kgdb_register_callbacks() and kgdb_unregister_callbacks() > > > are called outside the scope of the kgdb_registration_lock. This > > > allows them to race with each other. This could do all sorts of crazy > > > things up to and including dbg_io_ops becoming NULL partway through the > > > execution of the kgdb trap handler (which isn't allowed and would be > > > fatal). > > > > > > Fix this by bringing the trap handler setup and teardown into the scope > > > of the registration lock. > > > > > > Signed-off-by: Daniel Thompson <daniel.thomp...@linaro.org> > > > --- > > > kernel/debug/debug_core.c | 8 +++++--- > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c > > > index 9e5934780f41..9799f2c6dc94 100644 > > > --- a/kernel/debug/debug_core.c > > > +++ b/kernel/debug/debug_core.c > > > @@ -1117,9 +1117,8 @@ int kgdb_register_io_module(struct kgdb_io > > > *new_dbg_io_ops) > > > > > > dbg_io_ops = new_dbg_io_ops; > > > > > > - spin_unlock(&kgdb_registration_lock); > > > - > > > if (old_dbg_io_ops) { > > > + spin_unlock(&kgdb_registration_lock); > > > old_dbg_io_ops->deinit(); > > > return 0; > > > } > > > @@ -1129,6 +1128,8 @@ int kgdb_register_io_module(struct kgdb_io > > > *new_dbg_io_ops) > > > /* Arm KGDB now. */ > > > kgdb_register_callbacks(); > > > > > > + spin_unlock(&kgdb_registration_lock); > > > > From looking at code paths, I think this is illegal, isn't it? You're > > now calling kgdb_register_callbacks() while holding a spinlock, but: > > > > kgdb_register_callbacks() > > -> register_console() > > --> console_lock() > > ---> might_sleep() > > ----> <boom!> > > Thanks. > > I very nearly didn't press "Send" yesterday because I was worried I was > rushing it too much (in order to avoid forgetting it ;-) ). Should have > listened to myself! > > > > I'm a little curious about the exact race we're trying to solve. > > Calling unregister on an IO module before register even finished seems > > like an error on the caller, so I guess it would be calling register > > from a 2nd thread for a different IO module while the first thread was > > partway through unregistering? Even that seems awfully sketchy since > > you're risking registering a 2nd IO ops while the first is still there > > and that's illegal enough that we do a pr_err() for it (though we > > don't crash), but let's say we're trying to solve that one. > > I didn't follow all the possible paths. Utlimately the > (un)register_callbacks() functions use a flag variable without a lock > and that can interact in lots of different ways. > > To be honest none are especially likely because the normal case is to > register once during boot and never unregister. However we can trigger > register/unregister from userspace so I think they can happen > in parallel.
This is for kgdboc or one of the other IO modules? I do know that, at least for kgdboc, we have the "config_mutex". I won't promise that there are no bugs there but in the very least it should mostly prevent a host of these types of issues. ...so I guess you'd have to in parallel be spamming a register of a non kgdboc IO module together with an unregister of kgdboc? > Double unregister can lead to some especially nasty schedules... > although they still remain pretty unlikely since we need the double > unregister to coincide with a breakpoint: > > > kgdb_unregister_callbacks() kgdb_unregister_callbacks() > . . > test flag . > set flag to 0 . > . test flag > . spin_lock() > *** kgdb trap *** . > . paranoid dbg_io_ops check . > . dbg_io_ops = NULL > . stop other CPUs > . try to use NULL dbg_io_ops > > > I have drawn the kgdb trap in the first column because otherwise things > get too wide but the trap could trigger on any CPU in the system and > provoke the problem. > > > > > > Looking at it closely, I _think_ the only race in this case is if the > > one we're trying to unregister had a deinit() function and we going to > > replace it? If it didn't have a deinit function: > > > > cpu1 (unregister) cpu2 (register): > > ----------------- ---------------------- > > kgdb_unregister_callbacks() > > spin_lock() <got> > > spin_lock() <blocked> > > if (old_dbg_io_ops) <true> > > if (has dinit) <false> > > print error > > spin_unlock() > > return -EBUSY > > <finish unregister> > > > > The above is fine and is the same thing that would happen if the > > whole register function ran before the unregister even started, right? > > > > Also: if the unregister won the race that should also be fine. > > > > So really the problem is this: > > > > cpu1 (unregister) cpu2 (register): > > ----------------- ---------------------- > > kgdb_unregister_callbacks() > > spin_lock() <got> > > spin_lock() <blocked> > > if (old_dbg_io_ops) <true> > > if (has dinit) <true> > > print Replacing > > init new IO ops > > spin_unlock() > > if (old_dbg_io_ops) <true> > > finish deinit of old > > return true > > WARN_ON() <hits and shouts!> > > dbg_io_ops = NULL > > spin_unlock() > > if (deinit) <true> > > double-call to deinit of old > > > > So in this case we'll hit a WARN_ON(), incorrectly unregister the new > > IO ops, and call deinit twice. > > To be honest I was simply working on "it is racy" and "there's not a > good reason to allow that", especially as we start to develop tools to > bring races to the surfaces someone will yell at us about it sooner or > later ;-). > > Of course, implementing it correctly would have been better... Yeah, still wouldn't hurt to try to figure out how to make it cleaner. :-) -Doug