Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-29 Thread Shaw Vrana
Hi Auke, On 4/26/06, Auke Kok <[EMAIL PROTECTED]> wrote: > I'm concerned about the addition of the netif_running check to > e1000_down. While something like this is needed, I'm not familiar > enough w/ the code to know if this is okay. > All explanations and comments are greatly appreciated. W

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-26 Thread Auke Kok
Shaw wrote: On 4/21/06, Andy Gospodarek <[EMAIL PROTECTED]> wrote: On 4/21/06, Michael Chan <[EMAIL PROTECTED]> wrote: On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote: I just hate to see extra resources used to solve problems that good coding can solve (not that my suggestion is nece

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-26 Thread Shaw
On 4/21/06, Andy Gospodarek <[EMAIL PROTECTED]> wrote: > On 4/21/06, Michael Chan <[EMAIL PROTECTED]> wrote: > > On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote: > > > > > I just hate to see extra resources used to solve problems that good > > > coding can solve (not that my suggestion is

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-21 Thread Andy Gospodarek
On 4/21/06, Michael Chan <[EMAIL PROTECTED]> wrote: > On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote: > > > I just hate to see extra resources used to solve problems that good > > coding can solve (not that my suggestion is necessarily a 'good' one), > > so I was trying to think of a way

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-21 Thread Michael Chan
On Fri, 2006-04-21 at 16:01 -0400, Andy Gospodarek wrote: > I just hate to see extra resources used to solve problems that good > coding can solve (not that my suggestion is necessarily a 'good' one), > so I was trying to think of a way to resolve this without explicitly > adding another workqueue

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-21 Thread Andy Gospodarek
On 4/21/06, Michael Chan <[EMAIL PROTECTED]> wrote: > On Fri, 2006-04-21 at 09:27 -0400, Andy Gospodarek wrote: > > > > > Isn't the only possibility for a linkwatch deadlock when the > > __LINK_STATE_LINKWATCH_PENDING but is set in dev->state? > > This device that you're about to close may be on th

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-21 Thread Michael Chan
On Fri, 2006-04-21 at 09:27 -0400, Andy Gospodarek wrote: > > Isn't the only possibility for a linkwatch deadlock when the > __LINK_STATE_LINKWATCH_PENDING but is set in dev->state? This device that you're about to close may be on the linkwatch list, or other devices may also be on the linkwat

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-21 Thread Andy Gospodarek
On Thu, Apr 20, 2006 at 06:24:36PM -0700, Michael Chan wrote: > On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote: > > > One simple solution is to establish a separate queue for RTNL-holding > > users or vice versa for non-RTNL holding networking users. That > > would allow the drivers to safel

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Thu, 2006-04-20 at 19:42 -0700, Shaw Vrana wrote: > I'll bite! Here's a patch to add a call to flush_scheduled_work() in > e1000_down. It's against 2.6.16.9. > You're not following our discussion. It is not safe to call flush_scheduled_work() in a driver's close() because it is holding the

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread shaw
I've replied to this once before, but haven't seen my last two emails on the list, so I'm sending again with different settings. Sorry for the noise. On Thursday 20 April 2006 17:10, Michael Chan wrote: > In tg3_remove_one(), we call flush_scheduled_work() in case the > reset_task is still pendi

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Fri, 2006-04-21 at 12:40 +1000, Herbert Xu wrote: > One simple solution is to establish a separate queue for RTNL-holding > users or vice versa for non-RTNL holding networking users. That > would allow the drivers to safely flush the non-RTNL queue while > holding the RTNL. You mean a separat

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Fri, Apr 21, 2006 at 12:37:36PM +1000, Herbert Xu wrote: > > Rather than dealing with this individually in each driver perhaps we should > come up with a more centralised solution? One simple solution is to establish a separate queue for RTNL-holding users or vice versa for non-RTNL holding ne

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Shaw Vrana
On Thursday 20 April 2006 17:10, Michael Chan wrote: > In tg3_remove_one(), we call flush_scheduled_work() in case the > reset_task is still pending. Here, it is safe to call > flush_scheduled_work() because we're not holding the rtnl. Again, when > it runs, nothing bad will happen because it will

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
Michael Chan <[EMAIL PROTECTED]> wrote: > > In tg3_remove_one(), we call flush_scheduled_work() in case the > reset_task is still pending. Here, it is safe to call Great. > flush_scheduled_work() because we're not holding the rtnl. Again, when Hmm doing a quick grep seems to indicate that quite

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Fri, 2006-04-21 at 11:33 +1000, Herbert Xu wrote: > Actually, what if the tg3_close is followed by a tg3_open? That could > produce a spurious reset which I suppose isn't that bad. Yes, an extra reset. And yes, it isn't too bad. > Also if the > module is unloaded bad things will happen as wel

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Fri, Apr 21, 2006 at 11:27:01AM +1000, herbert wrote: > On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote: > > > > If we're in tg3_close() and the reset task isn't running yet, tg3_close > > () will proceed. However, when the reset task finally runs, it will see > > that netif_running

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Thu, Apr 20, 2006 at 03:36:57PM -0700, Michael Chan wrote: > > If we're in tg3_close() and the reset task isn't running yet, tg3_close > () will proceed. However, when the reset task finally runs, it will see > that netif_running() is zero and will just return. Yes you're absolutely right. Tha

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Michael Chan
On Fri, 2006-04-21 at 09:51 +1000, Herbert Xu wrote: > > Actually TG3 is buggy too. If the reset task is scheduled but > isn't running yet there is no synchronisation here to prevent the > reset task from running after tg3_close releases the tp lock. > If we're in tg3_close() and the reset tas

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Fri, Apr 21, 2006 at 09:36:31AM +1000, herbert wrote: > > Yes that's definitely buggy. There needs to be some form of > synchronisation as the TG3 driver does. However, to be frank > I'm not too fond of what the TG3 driver does either. Is there > no better way than an msleep loop? Actually

Re: e1000_down and tx_timeout worker race cleaning the transmit buffers

2006-04-20 Thread Herbert Xu
On Thu, Apr 20, 2006 at 05:35:00PM +, [EMAIL PROTECTED] wrote: > > If the e1000_tx_timeout_task were running concurrently with e1000_down, it > seems that they could both attempt to kfree_skb concurrently when running > e1000_unmap_and_free_tx_resource. I googled around to find mention of