Re: watchdog timeout panic in e1000 driver

Kenzo Iwami Wed, 01 Nov 2006 05:23:47 -0800

Hi,

>>> Even if the total lock time can be reduced, it's possible that the interrupt
>>> handler is executed while the interrupted code is still holding the
>>> semaphore.
>>> I think your method only decreases the frequency of this problem.
>>> Why does reducing the lock time solve this problem?
>> there are several problems here that need addressing. It's not acceptable 
>> for our driver to wait up to 15 seconds, and we can (presumably) reduce it 
>> to milliseconds, so that would help a lot. We should in no case at all hold 
>> it for any period longer than (give or take) half a second, so working 
>> towards that is a very good step in the right direction.
>>
>> Adding the timer task back may also help, as we are no longer trying to 
>> acquire the sw_fw_semaphore in interrupt context, but we removed it for a 
>> reason, and I need to dig up exactly what that reason was before we can 
>> revert it. Jesse might know, so I'll talk to him. But this will not fix the 
>> fact that the semaphore is held for a long time :)
> 
> Timer tasks that reschedule themselves are a pain.  The watchdog timer task
> had a couple of race conditions that were thought to be better fixed by
> removing it altogether.  Please, let's not go down that road again!


I understand that the watchdog_task could cause a race when the timer task
and e1000_down run concurrently, resulting in a double free of memory.

I think the semaphore problem occurs because the interrupt handler is
executed on the same CPU as the process that holds the semaphore.
How about disabling interrupts while the process is holding the semaphore?
I think this is possible if the total lock time has been reduced.
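
To illustrate the idea, here is a rough user-space analogy, not driver code.
A signal stands in for the interrupt, a plain flag stands in for the
semaphore, and blocking the signal with sigprocmask() plays the role that
something like local_irq_save()/local_irq_restore() would play in the kernel.
The names fake_irq, sem_held and run_critical_section are made up for this
sketch and do not exist in e1000.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the SW/FW semaphore (not the real register). */
static volatile sig_atomic_t sem_held;
/* How many times the "interrupt handler" found the semaphore already held. */
static volatile sig_atomic_t contended;
/* How many times the handler ran at all. */
static volatile sig_atomic_t handled;

/* Plays the role of the interrupt handler that needs the semaphore. */
static void fake_irq(int signo)
{
    (void)signo;
    handled++;
    if (sem_held) {
        /* A real handler would spin here until it times out, because the
         * holder it interrupted cannot run again on this CPU. */
        contended++;
        return;
    }
    sem_held = 1;   /* acquire */
    /* ... short hardware access would go here ... */
    sem_held = 0;   /* release */
}

/* Number of handler invocations during the last run. */
int irq_handled(void)
{
    return handled;
}

/*
 * Process-context path. When mask_irq is non-zero, the "interrupt" is
 * blocked for the whole time the semaphore is held and is delivered only
 * after the semaphore has been released.
 * Returns how many times the handler hit a held semaphore.
 */
int run_critical_section(int mask_irq)
{
    struct sigaction sa;
    sigset_t block, old;

    sa.sa_handler = fake_irq;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    assert(sigaction(SIGUSR1, &sa, NULL) == 0);

    contended = 0;
    handled = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    if (mask_irq)
        sigprocmask(SIG_BLOCK, &block, &old);    /* "disable interrupts" */

    sem_held = 1;        /* acquire the semaphore */
    raise(SIGUSR1);      /* "interrupt" arrives while the semaphore is held */
    sem_held = 0;        /* release the semaphore */

    if (mask_irq)
        sigprocmask(SIG_SETMASK, &old, NULL);    /* pending signal runs now */

    return contended;
}
```

Calling run_critical_section(0) lets the handler run while the flag is set,
so it sees the contention; calling run_critical_section(1) defers the handler
until after the release. This only works as a real fix if the hold time is
short, since the interrupt stays pending for the whole critical section.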

-- 
  Kenzo Iwami ([EMAIL PROTECTED])
