On Fri, Jul 12, 2013 at 06:35:04PM +0000, Ronciak, John wrote:
| The request for stats should be happening only once every 2 seconds.  Do
| you have a script pounding on getting stats repeatedly?  Are you sure that
| it's the request for stats that is causing the issue you are seeing or are
| you guessing that this is the case?  Can you make this happen without
| bonding being involved (i.e. using multiple interfaces)?

The scenario in question has two bonding interfaces: one composed of two
netxen NICs and the other, the primary one, composed of two igb NICs.
There is a script (they call it a heartbeat) that verifies the health of
that bonding interface. If that interface fails, the connection fails over
to the other bonding interface.

The issue, as far as we could piece the scenario together from a vmcore,
is indeed more likely to happen with bonding involved. It also happens
that the threaded IRQs we use on RT, along with the priority inheritance
support, help a lot to trigger the issue. But please keep in mind that
threaded IRQs are already available upstream and there is no shortage of
users of bonding interfaces.

I have been trying to reproduce the issue in simpler scenarios but, so far,
no luck.

| Also, we see that at least some of you work for RH.  If this is on RHEL
| this is the incorrect forum for this and it should be handled through
| bugzillas and the weekly engineering call with Peter Martuccelli.   If it's
| for Fedora what's the RT stuff being used for there?  Please explain.

It is not a bug observed in RHEL. In fact, a similar email has been sent
to linux-rt-users, as the code at the center of our analysis is present in
3.8.13-rt13 (the latest official Preempt RT patch release).

There is still a missing piece in the scenario: who is holding the HW
semaphore for such a long period? It could be a thread that gets preempted
by higher priority threads, or even the HW itself... but the lack of this
detail prevents us from consistently reproducing the issue.
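
One thing I may try, to help spot the long holder, is some crude
instrumentation around the SMBI wait in igb_get_hw_semaphore(). A rough,
untested sketch follows; the iteration threshold and the trace_printk()
call are arbitrary choices of mine, not part of the proposed patch:

	/* Count how long we spin on SMBI and leave a breadcrumb in the
	 * trace buffer when the wait gets suspiciously long, so a later
	 * trace or vmcore can point at whoever held the HW semaphore.
	 */
	while (i < timeout) {
		swsm = rd32(E1000_SWSM);
		if (!(swsm & E1000_SWSM_SMBI))
			break;

		udelay(50);
		i++;
	}

	if (i > 200)	/* more than ~10ms worth of 50us delays */
		trace_printk("igb: spun ~%d us waiting for SMBI\n", i * 50);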

The patch sent in the first message was an attempt at minimizing an issue
that, as of now, may be unique to the Real Time kernel. I have no problem
wrapping the suggested change in #ifdef / #else so that it only takes
effect on Real Time kernels (see the sketch below)... but I sent the
original email seeking advice on whether that patch would hurt the general
case or not.

The whole idea was to avoid busy waiting (at a high priority) if someone
else has the HW semaphore.
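
To be concrete, the RT-only variant I have in mind would be along these
lines in both hunks of the patch (assuming CONFIG_PREEMPT_RT_FULL is the
right symbol to key on):

	/* Keep the original busy delay on non-RT kernels; only sleep on
	 * Preempt RT, where this busy wait can end up being boosted to a
	 * high RT priority through priority inheritance.
	 */
#ifdef CONFIG_PREEMPT_RT_FULL
		usleep_range(50, 51);
#else
		udelay(50);
#endif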

Thanks again,
Luis

| Cheers,
| John
| 
| 
| > -----Original Message-----
| > From: Luis Claudio R. Goncalves [mailto:[email protected]]
| > Sent: Friday, July 12, 2013 11:08 AM
| > To: Wyborny, Carolyn
| > Cc: [email protected]; Clark Williams
| > Subject: Re: [E1000-devel] [RFC] igb: minimize busy loop on
| > igb_get_hw_semaphore
| > 
| > On Thu, Jul 11, 2013 at 10:46:31PM +0000, Wyborny, Carolyn wrote:
| > | > -----Original Message-----
| > | > From: Luis Claudio R. Goncalves [mailto:[email protected]]
| > | > Sent: Thursday, July 11, 2013 11:45 AM
| > | > To: [email protected]
| > | > Cc: Clark Williams
| > | > Subject: Re: [E1000-devel] [RFC] igb: minimize busy loop on
| > | > igb_get_hw_semaphore
| > | >
| > | > Hello,
| > | >
| > | > A customer noticed a strange issue on his setup, a bonding interface
| > | > composed of two igb NICs. After several debug sessions we are pretty
| > | > sure the specific symptom reported is caused by a busy loop in
| > | > igb_get_hw_semaphore(). The problem was reported on a 3.0.25 kernel
| > | > but the patch below was written on 3.8.13.
| > | >
| > | > The complete scenario is described below and there is a great chance
| > | > that this issue is only present (or at least more likely to be
| > | > triggered) on PREEMPT_RT enabled kernels... but I would like to
| > | > confirm whether this solution is valid or if there is a better way to
| > | > mitigate the problem.
| > | >
| > | > Thanks,
| > | > Luis
| > |
| > | Hello Luis,
| > |
| > | This is a complicated setup and not something we'd be doing much
| > | testing on.  The semaphore calls are intended to serialize access to
| > | certain areas in the hw, usually the PHY.  Making the delays
| > | pre-emptible does not necessarily accomplish the same thing.
| > |
| > | Have you tested the proposed patch and does it speed things up enough
| > | to find what you need to find?  Another thing to try is to reduce the
| > | time value rather than change the type of delay being used and see if
| > | you can find a way to speed things up that way.
| > 
| > First of all, thanks for replying! :)
| > 
| > I have the impression that reducing the delay time on
| > igb_get_hw_semaphore() wouldn't help much here because
| > igb_release_swfw_sync_82575() has this piece of code:
| > 
| >     while (igb_get_hw_semaphore(hw) != 0);
| > 
| > So, even if the udelay used there was 1us, in cases like the one
| > described below, you would be subjected to unbounded busy waits.
| > 
| > I can see that the issue happened because someone else (maybe even the
| > HW) was holding the semaphore for a long time.
| > 
| > Busy waits/loops are dangerous on RT when the process is running at
| > higher priorities. In this case ifconfig was a regular process but when
| > it requested the NIC stats, it held the bond->lock.
| > 
| > Then, while ifconfig was busy waiting on 'while
| > (igb_get_hw_semaphore(hw) != 0);'
| > the igb TX threads (we use threaded IRQs on RT) needed that lock. As
| > these IRQ threads run at higher RT priorities, in order to be blocked
| > for shorter periods while waiting for lower priority threads they
| > perform a Priority Inheritance operation: they lend their priority to
| > the lower priority thread until it releases the lock.
| > 
| > This way, the regular process 'ifconfig' busy waiting for the HW
| > semaphore becomes a Real Time thread, running (in this example) at
| > FIFO:85 and therefore preventing any other thread of equal or lower
| > priority from getting any CPU time. If this persists for a long time,
| > several subsystems may experience problems and even collapse. One such
| > example is RCU.
| > 
| > Sorry if this email is getting a bit too big. While I understand the
| > need for serialization and the way it was done on
| > igb_get_hw_semaphore(), I would like to see if there is another way,
| > less likely to create a corner case in RT.
| > 
| > Again, this was observed only once and may not be easy to reproduce.
| > But it seems to be a real issue. All this scenario data was gathered by
| > debugging the vmcores (created by kdump) using crash.
| > 
| > Luis
| > 
| > | Let me know if there is more info I can provide.  I can review your
| > | full lspci -vvv, ethtool ethX output and your .config for anything
| > | else to check and, of course, a full dmesg that shows the problem you
| > | are seeing.
| > | I'm no bonding expert though, so if the problem is there, I may not
| > | have much to offer.
| > |
| > | Hope this helps.
| > |
| > | Carolyn
| > |
| > | Carolyn Wyborny
| > | Linux Development
| > | Networking Division
| > | Intel Corporation
| > |
| > |
| > | >
| > | > ----
| > | >
| > | > igb: minimize busy loop on igb_get_hw_semaphore
| > | >
| > | > Bugzilla: 976912
| > | >
| > | > In drivers/net/ethernet/intel/igb/e1000_82575.c, function
| > | > igb_release_swfw_sync_82575() there is this line:
| > | >
| > | >         while (igb_get_hw_semaphore(hw) != 0);
| > | >
| > | > That is basically a busy loop waiting on a HW semaphore.
| > | >
| > | > A customer has a setup where two igb NICs are part of a bonding
| > | > interface.
| > | > This customer also has a monitoring script that calls ifconfig
| > | > often. It was observed that in this scenario there is a chance that
| > | > this ifconfig, which happens to hold the bond->lock while collecting
| > | > statistics, enters this busy loop waiting for another thread to clear
| > | > that HW semaphore.
| > | >
| > | > Meanwhile, the irq/xxx-ethY-Tx threads, running at FIFO:85, try to
| > | > acquire the bond lock, held by ifconfig. As it happens on RT, a
| > | > Priority Inheritance operation is started and ifconfig is boosted to
| > | > FIFO:85 so that it may be able to finish its work sooner and release
| > | > the bond->lock, desired by the aforementioned threads.
| > | >
| > | > As ifconfig is running on a busy loop, waiting for the HW semaphore,
| > | > this thread now runs a busy loop at a very high priority, preventing
| > | > other threads on that CPU from progressing.
| > | >
| > | > In that scenario, it seems that the thread holding the HW semaphore
| > | > is also waiting for a lock held by another task. This whole scenario
| > | > leads to RCU stall warnings, which have as a side effect a growing
| > | > number of threads being stuck.
| > | > As this progresses, the livelock reaches threads on other CPUs and
| > | > the system becomes more and more unresponsive.
| > | >
| > | > This little patch aims to prevent the busy loop at a high priority
| > | > (the code called by ifconfig in this example) from starving the
| > | > threads on the same CPU. It may not solve the issue but will at
| > | > least lead us closer to the real issue, masked by the RCU stalls
| > | > created by the busy loop.
| > | >
| > | > This is mostly a debug patch for a testing kernel.
| > | >
| > | > Signed-off-by: Luis Claudio R. Goncalves <[email protected]>
| > | >
| > | > diff --git a/drivers/net/ethernet/intel/igb/e1000_mac.c
| > | > b/drivers/net/ethernet/intel/igb/e1000_mac.c
| > | > index a5c7200..ec0be87 100644
| > | > --- a/drivers/net/ethernet/intel/igb/e1000_mac.c
| > | > +++ b/drivers/net/ethernet/intel/igb/e1000_mac.c
| > | > @@ -1225,7 +1225,7 @@ s32 igb_get_hw_semaphore(struct e1000_hw *hw)
| > | >                 if (!(swsm & E1000_SWSM_SMBI))
| > | >                         break;
| > | >
| > | > -               udelay(50);
| > | > +               usleep_range(50,51);
| > | >                 i++;
| > | >         }
| > | >
| > | > @@ -1244,7 +1244,7 @@ s32 igb_get_hw_semaphore(struct e1000_hw *hw)
| > | >                 if (rd32(E1000_SWSM) & E1000_SWSM_SWESMBI)
| > | >                         break;
| > | >
| > | > -               udelay(50);
| > | > +               usleep_range(50,51);
| > | >         }
| > | >
| > | >         if (i == timeout) {
| > | > --
| > --
| > [ Luis Claudio R. Goncalves             Red Hat  -  Realtime Team ]
| > [ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]
| > 
| > 
---end quoted text---

-- 
[ Luis Claudio R. Goncalves             Red Hat  -  Realtime Team ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]

