On Thu, Jul 19, 2018 at 02:14:56PM +0200, Markus Armbruster wrote: > Peter Xu <pet...@redhat.com> writes: > > > On Thu, Jul 19, 2018 at 11:05:34AM +0200, Markus Armbruster wrote: > >> Peter Xu <pet...@redhat.com> writes: > >> > >> > On Thu, Jul 19, 2018 at 09:20:34AM +0200, Markus Armbruster wrote: > >> >> Peter Xu <pet...@redhat.com> writes: > >> >> > >> >> > On Wed, Jul 18, 2018 at 05:38:11PM +0200, Markus Armbruster wrote: > >> >> >> Peter Xu <pet...@redhat.com> writes: > >> >> >> > >> >> >> > After the Out-Of-Band work, the monitor iothread may be accessing > >> >> >> > the > >> >> >> > cur_mon as well (via monitor_qmp_dispatch_one()). > >> > >> Since renamed to monitor_qmp_dispatch(). > > > > True. > > > >> > >> Further down, we concluded that cur_mon isn't actually used from the I/O > >> thread, didn't we? > > > > I think so; if not we should either fix it or apply this patch. :) > > > >> > >> >> >> > Let's convert > >> >> >> > the > >> >> >> > cur_mon variable to be a per-thread variable to make sure there > >> >> >> > won't be > >> >> >> > a race between threads when accessing the variable. > >> >> >> > >> >> >> Hmm... why hasn't the OOB work created such a race already? > >> >> >> > >> >> >> A monitor reads, parses, dispatches and executes commands, formats > >> >> >> and > >> >> >> sends replies. > >> >> >> > >> >> >> Before OOB, all of that ran in the main thread. Any access of > >> >> >> cur_mon > >> >> >> should therefore be from the main thread. No races. > >> >> >> > >> >> >> OOB moves read, parse, format and send to an I/O thread. Dispatch > >> >> >> and > >> >> >> execute remain in the main thread. *Except* for commands executed > >> >> >> OOB, > >> >> >> dispatch and execute move to the I/O thread, too. > >> >> >> > >> >> >> Why is this not racy? I guess it relies on careful non-use of > >> >> >> cur_mon > >> >> >> in any part that may now execute in the I/O thread. Scary... > >> >> > > >> >> > I think it's because cur_mon is not really used in out-of-band command > >> >> > executions - now we only have a few out-of-band enabled commands, and > >> >> > IIUC none of them is using cur_mon (for example, in > >> >> > qmp_migrate_recover() we don't even call error_report, and the code > >> >> > path is quite straight forward to make sure of that). So IIUC cur_mon > >> >> > variable is still only touched by main thread for now hence we should > >> >> > be safe. However that condition might change in the future when we > >> >> > add more out-of-band capable commands. > >> >> > > >> >> > (not to mention that I don't even know whether there are real users of > >> >> > out-of-band if we haven't yet started to support that for libvirt...) > >> >> > >> >> It's not just the actual OOB commands (there are just two), it's also > >> >> the monitor code to read, parse, format and send. > >> > > >> > My understanding is that read, parse, format, send will not touch > >> > cur_mon (it was touched before but some patches in the out-of-band > >> > series should have removed the last users when parsing). So IIUC only > >> > the dispatcher would touch that now. I didn't consider the callers > >> > like net_init_socket() and I'm only considering the monitor code (and > >> > those callers should be only in the main thread too after all). > >> > >> There *is* cur_mon use outside dispatch & execute, e.g. > >> > >> void error_vprintf(const char *fmt, va_list ap) > >> { > >> if (cur_mon && !monitor_cur_is_qmp()) { > >> monitor_vprintf(cur_mon, fmt, ap); > >> } else { > >> vfprintf(stderr, fmt, ap); > >> } > >> } > >> > >> Obviously unsafe to use outside the main thread. Consider: > >> > >> bool monitor_cur_is_qmp(void) > >> { > >> return cur_mon && monitor_is_qmp(cur_mon); > >> } > >> > >> static inline bool monitor_is_qmp(const Monitor *mon) > >> { > >> return (mon->flags & MONITOR_USE_CONTROL); > >> } > >> > >> If monitor_cur_is_qmp() reads cur_mon twice (which it is entitled to > >> do), this crashes when the main thread sets cur_mon back to null in > >> between. > > > > Yes, but I thought we should not even use these error_vprintf() or > > sister functions outside the QMP handlers, or at least that's what I > > thought. For example, in parsers, we should always use error_setg() > > or something similar but never error_report(). > > error_report() & friends are for general use, not just for monitor > handlers. They are for reporting errors to a human user. error_setg() > is for passing errors to code.
Ah yes, error_report() will dump to stderr if without @cur_mon, which I obviously forgot. > > >> Did the OOB work make things any worse? Let's see. > >> > >> @cur_mon is null unless the main thread is running monitor code, either > >> HMP within monitor_read(): > >> > >> cur_mon = opaque; > >> > >> if (cur_mon->rs) { > >> for (i = 0; i < size; i++) > >> readline_handle_byte(cur_mon->rs, buf[i]); > >> } else { > >> if (size == 0 || buf[size - 1] != 0) > >> monitor_printf(cur_mon, "corrupted command\n"); > >> else > >> handle_hmp_command(cur_mon, (char *)buf); > >> } > >> > >> cur_mon = old_mon; > >> > >> or QMP within monitor_qmp_dispatch(): > >> > >> old_mon = cur_mon; > >> cur_mon = mon; > >> > >> rsp = qmp_dispatch(mon->qmp.commands, req, qmp_oob_enabled(mon)); > >> > >> cur_mon = old_mon; > >> > >> In both cases, old_mon is always null. > >> > >> Fine print: before commit 227a07552f3 "monitor: move the cur_mon hack > >> deeper for QMP", we ran more code for QMP with cur_mon set, namely the > >> JSON parser, but that doesn't matter here. > >> > >> More fine print: there's also qmp_human_monitor_command(), which stacks > >> an HMP monitor on top of the QMP monitor. Also doesn't matter here. > >> > >> The OOB work doesn't add any new races as long as > >> > >> * it doesn't add assignments to @cur_mon, and > >> > >> * none of the code it moves out of the main thread accesses @cur_mon. > >> > >> The first condition obviously holds. The second one isn't obvious, but > >> I figure it holds, too. > >> > >> Okay, I think I've convince myself the OOB work didn't add > >> cur_mon-related races. > > > > Hopefully, yes. Thanks for the double check. > > Wait! I think I found one. > > Say we have an HMP monitor running in the main thread (they always do), > and a QMP monitor running in @mon_iothread. > > Now both threads write to @cur_mon, without synchronization! The main > thread writes in monitor_read() when reading HMP input, and in > monitor_qmp_dispatch() when dispatching in-band QMP commands. > @mon_iothread writes in monitor_qmp_dispatch() when dispatching > out-of-band QMP commands. > > Example: > > in main thread: > > chardev reads input, calls > monitor_read() > old_mon = cur_mon; // null > cur_mon = opaque; // HMP monitor > ... > > in mon_iothread: > > chardev reads input, calls > monitor_qmp_read(), calls > handle_qmp_command() via JSON parser > calls monitor_qmp_dispatch() to execute OOB command > old_mon = cur_mon; // HMP monitor! > cur_mon = mon; // QMP monitor > ... > > in main thread > ... HMP code runs with @cur_mon pointing to QMP monitor > cur_mon = old_mon; // null > > in mon_iothread > old_mon = cur_mon; // HMP monitor! > > all threads > non-monitor code runs with @cur_mon pointing to HMP monitor Hmm seems so, then we might need this patch even eagerer... > > >> >> >> Should this go into 3.0 to reduce the risk of bugs? > >> >> > > >> >> > Yes I think it would be good to have that even for 3.0, since it still > >> >> > can be seen as a bug fix of existing code. > >> >> > >> >> Agreed. > >> >> > >> >> > Regards, > >> >> > > >> >> >> > Note that thread variables are not initialized to a valid value > >> >> >> > when new > >> >> >> > thread is created. > >> >> > >> >> Confusing. It sounds like @cur_mon's initial value would be > >> >> indeterminate, like an automatic variable's. Not true. Variables with > >> >> thread storage duration are initialized when the thread is created. > >> >> Since @cur_mon's declaration lacks an initializer, it'll be initialized > >> >> to a null pointer. Your sentence is correct when you consider that null > >> >> pointer not a valid value. > >> > > >> > Yes that's what I meant. So how about this? > >> > > >> > Note that the per-thread @cur_mon variable is not initialized to > >> > point to a valid Monitor struct when a new thread is created (the > >> > default value will be NULL). > >> > > >> > Please feel free to tune it up. > >> > >> I think what the patch really changes is the value of @cur_mon outside > >> the main thread: it remains null there now. Before, it depended on what > >> the main thread was doing, and therefore could not be used safely. > >> > >> In other words, the patch makes uses of @cur_mon like the one in > >> error_vprintf() shown above safe. > >> > >> I think that's what we should explain in the commit message. I can try > >> rewriting it, > > > > I'll appreciate that if so. > > Here's my try: > > monitor: Fix unsafe sharing of @cur_mon among threads > > @cur_mon is null unless the main thread is running monitor code, either > HMP code within monitor_read(), or QMP code within > monitor_qmp_dispatch(). > > Use of @cur_mon outside the main thread is therefore unsafe. > > Most of its uses are in monitor command handlers. These run in the main > thread. > > However, there are also uses hiding elsewhere, such as in > error_vprintf(), and thus error_report(), making these functions unsafe > outside the main thread. No such unsafe uses are known at this time. > Regardless, this is an unnecessary trap. It's an ancient trap, though. > > More recently, commit cf869d53172 "qmp: support out-of-band (oob) > execution" spiced things up: the monitor I/O thread assigns to @cur_mon > when executing commands out-of-band. Having two threads save, set and > restore @cur_mon without synchronization is definitely unsafe. We can > end up with @cur_mon null while the main thread runs monitor code, or > non-null while it runs non-monitor code. > > We could fix this by making the I/O thread not mess with @cur_mon, but > that would leave the trap armed and ready. > > Instead, make @cur_mon thread-local. It's now reliably null unless the > thread is running monitor code. Looks good to me. > > >> but right now I got to run. > > > > Must be lunch time! :) > > Time to prepare lunch for the family, actually :) Yeah I even knew that you cooked! :) Regards, -- Peter Xu