On Wed, 19 Aug 2015 17:38:30 +0200 Jan Kara <j...@suse.com> wrote: > From: Jan Kara <j...@suse.cz> > > When there are lots of messages accumulated in printk buffer, printing > them (especially over serial console) can take a long time (tens of > seconds). stop_machine() will effectively make all cpus spin in > multi_cpu_stop() waiting for the CPU doing printing to print all the > messages which triggers NMI softlockup watchdog and RCU stall detector > which add even more to the messages to print. Since machine doesn't do > anything (except serving interrupts) during this time, also network > connections are dropped and other disturbances may happen. > > Paper over the problem by waiting for printk buffer to be empty before > starting to stop CPUs. In theory a burst of new messages can be appended > to the printk buffer before CPUs enter multi_cpu_stop() so this isn't a 100% > solution but it works OK in practice and I'm not aware of a reasonably > simple better solution.
Confused. Why don't patches 1 and 2 already fix this problem? > > ... > > @@ -2489,6 +2489,28 @@ struct tty_driver *console_device(int *index) > } > > /* > + * Wait until all messages accumulated in the printk buffer are printed to > + * console. Note that as soon as this function returns, new messages may be > + * added to the printk buffer by other CPUs. > + */ > +void console_flush(void) This doesn't seem a very good name. We already have console_cont_flush(), cont_flush(), etc. Can we think of something more specific? printk_log_buf_drain() perhaps. > +{ > + bool retry; > + unsigned long flags; > + > + while (1) { > + raw_spin_lock_irqsave(&logbuf_lock, flags); > + retry = console_seq != log_next_seq; > + raw_spin_unlock_irqrestore(&logbuf_lock, flags); Does this lock/unlock do anything useful? > + if (!retry || console_suspended) > + break; > + /* Cycle console_sem to wait for outstanding printing */ > + console_lock(); > + console_unlock(); > + } > +} > + > +/* > * Prevent further output on the passed console device so that (for example) > * serial drivers can disable console output before suspending a port, and > can > * re-enable output afterwards. > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > index fd643d8c4b42..016d34621d2e 100644 > --- a/kernel/stop_machine.c > +++ b/kernel/stop_machine.c > @@ -21,6 +21,7 @@ > #include <linux/smpboot.h> > #include <linux/atomic.h> > #include <linux/lglock.h> > +#include <linux/console.h> > > /* > * Structure to determine completion condition and record errors. May > @@ -543,6 +544,14 @@ int __stop_machine(int (*fn)(void *), void *data, const > struct cpumask *cpus) > return ret; > } > > + /* > + * If there are lots of outstanding messages, printing them can take a > + * long time and all cpus would be spinning waiting for the printing to > + * finish thus triggering NMI watchdog, RCU lockups etc. Wait for the > + * printing here to avoid these. > + */ > + console_flush(); This is pretty pointless if num_possible_cpus==1. I'd suggest setting printk_offload_chars=0 in this case, add some early bale-out into console_flush(). Or something along those lines. And make console_flush() go away altogether if CONFIG_SMP=n - it's pointless bloat. > /* Set the initial state and stop all online cpus. */ > set_state(&msdata, MULTI_STOP_PREPARE); > return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/