Hi, Emil

Thanks for your patch.
After I applied your patch, I received the following feedback from my users:

"
Users had tested the latest patch that you provided and it is much improved 
now. However it’s still not good enough as the users are planning field 
deployment. Here are their findings:

So close, but not quite 100%. I did run over 2500 re-negotiations on one 
interface of a bonded pair and got the 0 MBps status total of three times. The 
longest run without single error was something like 1800 re-negotiations or so. 
So, this version seems to improve the situation immensely (the unpatched driver 
fails like 25% of the time), but there still seems to remain some tiny race 
somewhere.

So it seems the failure occurs once every 600-900 connections.
"

I dug into the source code, and I found that the following race window may 
result in this problem:

bonding                ixgbe 
  |                     |
  |                    carrier_on
  |                     |
  |    <----------------|
 link_up                |
  |                     |
  |                    carrier_off
  |                     |
 get_link_speed ------->|
  |                     |

In this sequence, the bonding driver considers the link up while the speed is 
link_speed_unknown because of the link flap.
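
For reference, here is a minimal sketch of the kind of speed query the bonding
driver performs, roughly modelled on bond_update_speed_duplex() (the helper name
slave_query_speed and the exact checks are simplified for illustration, not the
upstream code). If ixgbe drops the carrier between the link-up notification and
this call, the query comes back with an unknown speed even though bonding has
already recorded the link as up:

#include <linux/ethtool.h>
#include <linux/netdevice.h>

/* Simplified bonding-style speed/duplex query.  When the slave's carrier
 * flaps between the link-up notification and this call, the speed comes
 * back unknown and the slave is left "up" with speed 0/unknown.
 */
static int slave_query_speed(struct net_device *slave_dev,
                             u32 *speed, u8 *duplex)
{
        struct ethtool_link_ksettings ecmd;

        *speed = SPEED_UNKNOWN;
        *duplex = DUPLEX_UNKNOWN;

        if (__ethtool_get_link_ksettings(slave_dev, &ecmd) < 0)
                return -1;      /* query failed: keep unknown values */

        if (ecmd.base.speed == 0 || ecmd.base.speed == (u32)SPEED_UNKNOWN)
                return -1;      /* mid-flap: the speed is not valid */

        *speed = ecmd.base.speed;
        *duplex = ecmd.base.duplex;
        return 0;
}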

For a standalone NIC, it is meaningless to query the link speed while the carrier 
is off. But for a bonding slave it may be helpful, especially when the NIC link 
flaps.
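
To make that concrete, a driver could in principle keep reporting the last
negotiated speed from its get_link_ksettings() callback even while the carrier
is transiently off. This is only an illustration of the idea, not the ixgbe
patch itself; the demo_adapter structure and its cached fields are hypothetical,
and the same includes as in the sketch above are assumed:

/* Hypothetical driver-side sketch: report the last negotiated speed even
 * when the carrier is transiently off, so that a bonding slave does not
 * end up "up" with an unknown speed during a link flap.
 */
struct demo_adapter {
        u32 cached_speed;       /* last speed negotiated on link-up */
        u8  cached_duplex;      /* last duplex negotiated on link-up */
};

static int demo_get_link_ksettings(struct net_device *netdev,
                                   struct ethtool_link_ksettings *cmd)
{
        struct demo_adapter *adapter = netdev_priv(netdev);

        if (netif_carrier_ok(netdev) ||
            adapter->cached_speed != SPEED_UNKNOWN) {
                cmd->base.speed = adapter->cached_speed;
                cmd->base.duplex = adapter->cached_duplex;
        } else {
                cmd->base.speed = SPEED_UNKNOWN;
                cmd->base.duplex = DUPLEX_UNKNOWN;
        }
        return 0;
}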

Maybe this patch can close the race window described above.

Any reply is appreciated.

Best Regards!
Zhu Yanjun
