> -----Original Message-----
> From: Luis Claudio R. Goncalves [mailto:[email protected]]
> Sent: Thursday, July 11, 2013 11:45 AM
> To: [email protected]
> Cc: Clark Williams
> Subject: Re: [E1000-devel] [RFC] igb: minimize busy loop on igb_get_hw_semaphore
>
> Hello,
>
> A customer noticed a strange issue on his setup, a bonding interface composed
> of two igb nics. After several debug sessions we are pretty sure the specific
> symptom reported is caused by a busy loop on igb_get_hw_semaphore(). The
> problem was reported on a 3.0.25 kernel but the patch below was written on
> 3.8.13.
>
> The complete scenario is described below and there is a great chance that this
> issue is only present (or at least more likely to be triggered) on the
> PREEMPT_RT enabled kernels... but I would like to confirm whether this
> solution is valid or if there is a better way to mitigate the problem.
>
> Thanks,
> Luis

Hello Luis,

This is a complicated setup and not something we'd be doing much testing on.
The semaphore calls are intended to serialize access to certain areas in the
hw, usually the PHY. Making the delays preemptible does not necessarily
accomplish the same thing. Have you tested the proposed patch, and does it
speed things up enough to find what you need to find? Another thing to try is
to reduce the time value rather than change the type of delay being used, and
see if you can speed things up that way.

Let me know if there is more info I can provide. I can review your full
lspci -vvv and ethtool ethX output and your .config for anything else to
check and, of course, a full dmesg that shows the problem you are seeing. I'm
no bonding expert though, so if the problem is there, I may not have much to
offer.

Hope this helps.

Carolyn

Carolyn Wyborny
Linux Development
Networking Division
Intel Corporation

>
> ----
>
> igb: minimize busy loop on igb_get_hw_semaphore
>
> Bugzilla: 976912
>
> In drivers/net/ethernet/intel/igb/e1000_82575.c, function
> igb_release_swfw_sync_82575() there is this line:
>
>     while (igb_get_hw_semaphore(hw) != 0);
>
> That is basically a busy loop waiting on a HW semaphore.
>
> A customer has a setup where two igb NICs are part of a bonding interface.
> This customer also has a monitoring script that calls ifconfig often. It was
> observed that in this scenario there is a chance that this ifconfig, that
> happens to hold the bond->lock while collecting statistics, enters this busy
> loop waiting for another thread to clear that HW semaphore.
>
> Meanwhile, the irq/xxx-ethY-Tx threads, running at FIFO:85, try to acquire the
> bond lock, held by ifconfig. As it happens on RT, a Priority Inheritance
> operation is started and ifconfig is boosted to FIFO:85 so that it may be
> able to finish its work sooner and release the bond->lock, desired by the
> aforementioned threads.
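
For readers following the thread, the contrast between the unbounded spin quoted above and a bounded retry that yields the CPU can be sketched in userspace C. This is only an illustration under stated assumptions, not driver code and not a proposed fix: `fake_hw_sem`, `try_get_semaphore()`, `put_semaphore()`, `sleep_us()` and `acquire_with_retry()` are hypothetical names, an `atomic_flag` stands in for the SWSM register bit, and `nanosleep()` only approximates what a sleeping delay does in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the hardware semaphore bit (not the real SWSM). */
atomic_flag fake_hw_sem = ATOMIC_FLAG_INIT;

/* One acquisition attempt; returns true if the semaphore was free. */
bool try_get_semaphore(void)
{
        return !atomic_flag_test_and_set(&fake_hw_sem);
}

/* Release the semaphore so another caller can take it. */
void put_semaphore(void)
{
        atomic_flag_clear(&fake_hw_sem);
}

/* Sleep for roughly usec microseconds, letting other threads run meanwhile. */
void sleep_us(long usec)
{
        struct timespec ts = {
                .tv_sec = usec / 1000000L,
                .tv_nsec = (usec % 1000000L) * 1000L,
        };

        /* Resume with the remaining time if a signal interrupts the sleep. */
        while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
                ;
}

/*
 * Bounded retry: give up after max_tries attempts instead of spinning
 * forever. Returns 0 on success, -1 if the semaphore was still held.
 */
int acquire_with_retry(int max_tries, long delay_us)
{
        for (int i = 0; i < max_tries; i++) {
                if (try_get_semaphore())
                        return 0;
                sleep_us(delay_us);
        }
        return -1;
}
```

A caller would check the return value of `acquire_with_retry()` and handle the failure case, rather than looping until success. Whether a sleeping delay is acceptable in the real driver depends on the calling context and on the serialization the hardware semaphore is meant to provide, which this sketch does not model.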
>
> As ifconfig is running in a busy loop, waiting for the HW semaphore, this
> thread now runs a busy loop at a very high priority, preventing other threads
> on that CPU from progressing.
>
> In that scenario, it seems that the thread holding the HW semaphore is also
> waiting for a lock held by another task. This whole scenario leads to RCU
> stall warnings, which have the side effect of a growing number of threads
> getting stuck. As this progresses, the livelock reaches threads on other CPUs
> and the system becomes more and more unresponsive.
>
> This little patch aims to prevent the busy loop at a high priority (the code
> called by ifconfig in this example) from starving the threads on the same
> CPU. It may not solve the issue but will at least lead us closer to the real
> issue, masked by the RCU stalls created by the busy loop.
>
> This is mostly a debug patch for a testing kernel.
>
> Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
>
> diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c b/drivers/net/ethernet/intel/igb/e1000_mac.c
> index a5c7200..ec0be87 100644
> --- a/drivers/net/ethernet/intel/igb/e1000_mac.c
> +++ b/drivers/net/ethernet/intel/igb/e1000_mac.c
> @@ -1225,7 +1225,7 @@ s32 igb_get_hw_semaphore(struct e1000_hw *hw)
>                  if (!(swsm & E1000_SWSM_SMBI))
>                          break;
>
> -                udelay(50);
> +                usleep_range(50,51);
>                  i++;
>          }
>
> @@ -1244,7 +1244,7 @@ s32 igb_get_hw_semaphore(struct e1000_hw *hw)
>                  if (rd32(E1000_SWSM) & E1000_SWSM_SWESMBI)
>                          break;
>
> -                udelay(50);
> +                usleep_range(50,51);
>          }
>
>          if (i == timeout) {
> --
> [ Luis Claudio R. Goncalves                    Bass - Gospel - RT ]
> [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9 2696 7203 D980 A448 C8F8 ]
>
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> E1000-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired
