Tejun,

On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen <adrian.w...@gmail.com> wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo <t...@kernel.org> wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We found a race condition as below:
>>>
>>> CPU0                                           CPU1
>>> timer interrupt happens
>>>   __run_timers
>>>     __run_timers::spin_lock_irq(&base->lock)
>>>     __run_timers::spin_unlock_irq(&base->lock)
>>>                                                __cancel_work_timer
>>>                                                __cancel_work_timer::del_timer
>>>                                                __cancel_work_timer::wait_on_work
>>>                                                __cancel_work_timer::clear_work_data
>>>     __run_timers::call_timer_fn(timer, fn, data);
>>>       delayed_work_timer_fn::get_work_cwq
>>>     __run_timers::spin_lock_irq(&base->lock)
>>>
>>> It is possible for __cancel_work_timer to run on cpu1 __BEFORE__ cpu0
>>> is ready to run the timer callback, which is delayed_work_timer_fn in
>>> our case.
>>
>> If del_timer() happens after the timer starts running, del_timer()
>> would return 0 and try_to_grab_pending() will be called, which will
>> return >=0 iff it successfully steals the PENDING bit (ie. it's the
>> sole owner of the work item).  If del_timer() happens before the timer
>> starts running, the timer function would never run.
>
> If del_timer() happens before __run_timers() is called, while the timer
> irq has already fired, would it return 1, since the timer is still not
> detached by __run_timers()?  If that is possible, then we would call
> try_to_grab_pending(), so that work->data would be cleared that way.
>
>>
>> clear_work_data() happens iff the work item is confirmed to be idle.
>> At this point, I'm pretty skeptical this is a bug in workqueue itself
>> and strongly suggest looking at the crashing workqueue user.
>
> Also, I am not very familiar with the workqueue mechanism; how many
> places in the kernel clear work->data besides clear_work_data()?
>
> From the memory dump, I cannot find any hint of the work structure
> being destroyed.  So the only possibility, it seems to me, is that
> work->data was set by someone on purpose.
>
> crash> struct delayed_work 0xbf03d544 -x
> struct delayed_work {
>   work = {
>     data = {
>       counter = 0x300
>     },
>     entry = {
>       next = 0xbf03d548,
>       prev = 0xbf03d548
>     },
>     func = 0xbf014b00
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       prev = 0x200200
>     },
>     expires = 0x12b638b,
>     base = 0xc0844e01,
>     function = 0xc014c7a0 <delayed_work_timer_fn>,
>     data = 0xbf03d544,
>     slack = 0xffffffff,
>     start_pid = 0xffffffff,
>     start_site = 0x0,
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }
> }
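For reference, the timer callback involved is the old-style
delayed_work_timer_fn (paraphrased below from the workqueue code that
still uses get_work_cwq(); exact details depend on the kernel version).
As far as I can tell, once work->data has been cleared, get_work_cwq()
no longer finds a cwq pointer and returns NULL, so the cwq->wq
dereference faults:

	/* paraphrase of the old delayed_work_timer_fn(), for illustration only */
	void delayed_work_timer_fn(unsigned long __data)
	{
		struct work_struct *work = (struct work_struct *)__data;
		struct cpu_workqueue_struct *cwq = get_work_cwq(work);

		/* if work->data was already cleared, cwq is NULL and this faults */
		__queue_work(smp_processor_id(), cwq->wq, work);
	}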
I captured a trace log, which shows my previous suspicion is true:
__cancel_work_timer is called before delayed_work_timer_fn, so work->data
is cleared.  And when __cancel_work_timer is called, the timer is still
pending, so del_timer returns 1 and try_to_grab_pending is never called.

But it is very strange that __run_timers still gets the same timer.  Its
callback, delayed_work_timer_fn, is then called, which causes the issue.
The detach_timer in __cancel_work_timer should already have removed the
timer from all lists, am I right?  Could the timer_list somehow be queued
twice, e.g. queued on two cpus?  If not, how could this happen?  (The
__cancel_work_timer path, as I read it, is pasted below.)

Thanks,
Lei
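For reference, the __cancel_work_timer() path I am reading is roughly the
following (paraphrased from the older 3.x workqueue code; exact details
may differ in other versions, so please check against the actual tree):

	static bool __cancel_work_timer(struct work_struct *work,
					struct timer_list *timer)
	{
		int ret;

		do {
			/* returns 1 if the timer was still pending */
			ret = (timer && likely(del_timer(timer)));
			if (!ret)
				/* only reached when del_timer() found nothing */
				ret = try_to_grab_pending(work);
			wait_on_work(work);
		} while (unlikely(ret < 0));

		/* runs unconditionally once the work is believed idle */
		clear_work_data(work);
		return ret;
	}

So when del_timer() returns 1, try_to_grab_pending() is skipped and
clear_work_data() still runs, which matches the ordering in the trace.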