Hi,

>>> Even if the total lock time can be reduced, it's possible that the interrupt
>>> handler is executed while the interrupted code is still holding the
>>> semaphore.
>>> I think your method only decreases the frequency of this problem.
>>> Why does reducing the lock time solve this problem?
>>
>> There are several problems here that need addressing. It's not acceptable
>> for our driver to wait up to 15 seconds, and we can (presumably) reduce it
>> to milliseconds, so that would help a lot. We should in no case at all hold
>> it for any period longer than (give or take) half a second, so working
>> towards that is a very good step in the right direction.
>>
>> Adding the timer task back may also help, as we would no longer be trying to
>> acquire the sw_fw_semaphore in interrupt context, but we removed it for a
>> reason, and I need to dig up what exactly that reason was before we can
>> revert it. Jesse might know, so I'll talk to him. But this will not fix the
>> fact that the semaphore is held for a long time :)
>
> Timer tasks that reschedule themselves are a pain. The watchdog timer task
> had a couple of race conditions that were thought to be better fixed by
> removing it altogether. Please, let's not go down that road again!
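The concern in the first quoted paragraph can be made concrete with a toy single-CPU model, written as userspace C rather than driver code. This is only a sketch under stated assumptions: `struct cpu_model`, `irq_handler()`, `process_path()` and `HANDLER_MAX_POLLS` are hypothetical names, the interrupt-enable flag is a plain boolean standing in for the kernel's `local_irq_save()`/`local_irq_restore()`, and the bounded poll loop merely stands in for the driver's semaphore timeout. It compares an interrupt that arrives while the semaphore is held with interrupts enabled against the same interrupt with interrupts masked around the critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. Nothing here is e1000 or kernel code; every
 * name below is hypothetical and exists only for illustration. */

#define HANDLER_MAX_POLLS 1000 /* hypothetical stand-in for the driver timeout */

struct cpu_model {
    bool sem_held;      /* models the hardware SW/FW semaphore */
    bool irqs_enabled;  /* models the CPU's interrupt-enable flag */
    bool irq_pending;   /* an interrupt raised while masked */
    int  handler_polls; /* how long the handler waited */
    bool handler_ok;    /* whether the handler got the semaphore */
};

/* Interrupt handler: polls the semaphore with a bounded timeout. On a
 * single CPU the interrupted holder cannot run until the handler
 * returns, so a semaphore held on entry stays held for the whole loop,
 * no matter how short the holder's own critical section is. */
static void irq_handler(struct cpu_model *c)
{
    c->handler_polls = 0;
    while (c->sem_held && c->handler_polls < HANDLER_MAX_POLLS)
        c->handler_polls++;
    c->handler_ok = !c->sem_held;
    /* On success the handler would take, use and release the
     * semaphore; that work is omitted from the model. */
}

/* The device raises an interrupt: it is handled immediately if
 * interrupts are enabled, otherwise it is left pending. */
static void raise_irq(struct cpu_model *c)
{
    if (c->irqs_enabled)
        irq_handler(c);
    else
        c->irq_pending = true;
}

/* Re-enabling interrupts delivers any interrupt that arrived while masked. */
static void set_irqs_enabled(struct cpu_model *c, bool enable)
{
    c->irqs_enabled = enable;
    if (enable && c->irq_pending) {
        c->irq_pending = false;
        irq_handler(c);
    }
}

/* Process context: take the semaphore, get interrupted inside the
 * critical section, then release. With mask_irqs set, interrupts are
 * disabled for exactly the time the semaphore is held. */
void process_path(struct cpu_model *c, bool mask_irqs)
{
    assert(!c->sem_held); /* the model starts with the semaphore free */
    if (mask_irqs)
        set_irqs_enabled(c, false);
    c->sem_held = true;
    raise_irq(c);         /* interrupt arrives mid critical section */
    c->sem_held = false;
    if (mask_irqs)
        set_irqs_enabled(c, true); /* the deferred handler runs here */
}
```

With interrupts left enabled, `process_path(&c, false)` ends with `handler_ok` false after `HANDLER_MAX_POLLS` polls; shortening the holder's critical section does not change that, because the holder cannot resume until the handler gives up. With `process_path(&c, true)`, the interrupt is delivered only after the release, and the handler gets the semaphore on its first poll. Masking is local to one CPU, so in a real driver a handler running on another CPU would still have to wait for the holder, although there the holder can make progress; masking for the whole hold time is also only tolerable if that time is short.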
I understand that the watchdog_task could cause a race when the timer task
and e1000_down run concurrently, resulting in a double free of memory.

I think the semaphore problem occurs because the interrupt handler is
executed on the same CPU as the process that acquires the semaphore. How
about disabling interrupts while the process is holding the semaphore? I
think this is possible if the total lock time has been reduced.

--
Kenzi Iwami ([EMAIL PROTECTED])