On 8/28/19 6:32 PM, Corey Minyard wrote:
> On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
>> From: Jes Sorensen <jsoren...@fb.com>
>>
>> I came across this in 4.16, but I believe the bug is still present
>> in current 5.x, even if it is less likely to trigger.
>>
>> Basically stop_timer_and_thread() only calls del_timer_sync() if
>> timer_running == true. However smi_mod_timer enables the timer before
>> setting timer_running = true.
>
> All the modifications/checks for timer_running should be done under
> the si_lock. It looks like a lock is missing in shutdown_smi(),
> probably starting before setting interrupt_disabled to true and
> after stop_timer_and_thread. I think that is the right fix for
> this problem.
Hi Corey,

I agree a spin lock could deal with this specific issue too, but
del_timer_sync() is safe to call on an already disabled timer. The
whole flagging of timer_running really doesn't make much sense in the
first place either.

As for interrupt_disabled, that is even worse. There are multiple places
in the code where interrupt_disabled is checked, and some of them are not
protected by a spin lock, including shutdown_smi(), where you have this
sequence:

	while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
		poll(smi_info);
		schedule_timeout_uninterruptible(1);
	}
	if (smi_info->handlers)
		disable_si_irq(smi_info);
	while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) {
		poll(smi_info);
		schedule_timeout_uninterruptible(1);
	}

In this case you'll have to drop and retake the lock several times.

You also have this call sequence, which leads to disable_si_irq(), which
checks interrupt_disabled:

	flush_messages()
	smi_event_handler()
	handle_transaction_done()
	handle_flags()
	alloc_msg_handle_irq()
	disable_si_irq()

{disable,enable}_si_irq() themselves are racy:

	static inline bool disable_si_irq(struct smi_info *smi_info)
	{
		if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
			smi_info->interrupt_disabled = true;

Basically interrupt_disabled needs to be atomic here to have any value,
unless you ensure a spin lock is held around every access to it.

Cheers,
Jes
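
To illustrate the point about interrupt_disabled, here is a minimal
userspace sketch of an atomic test-and-set on the flag, so that the
check and the update cannot be split by another context. It is not the
driver code: struct smi_sketch, sketch_disable_irq() and
sketch_enable_irq() are hypothetical names, and C11 <stdatomic.h> stands
in for whatever kernel primitive would actually be used (for example
xchg() or test_and_set_bit()).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct smi_info. */
struct smi_sketch {
	int irq;                        /* non-zero when an IRQ is configured */
	atomic_bool interrupt_disabled; /* flag shared between contexts */
};

/*
 * Returns true only for the caller that actually flipped the flag from
 * enabled to disabled. The read and the write happen as one atomic
 * exchange, so two racing callers cannot both see "not disabled".
 */
static bool sketch_disable_irq(struct smi_sketch *s)
{
	if (!s->irq)
		return false;
	/* atomic_exchange() returns the previous value of the flag. */
	return !atomic_exchange(&s->interrupt_disabled, true);
}

/* Mirror image: true only for the caller that re-enabled the flag. */
static bool sketch_enable_irq(struct smi_sketch *s)
{
	if (!s->irq)
		return false;
	return atomic_exchange(&s->interrupt_disabled, false);
}
```

Only the caller that gets true back would go on to touch the hardware,
which is the property the plain check-then-set sequence cannot give
without a lock held around every access.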