> -----Original Message----- > From: Luca Boccassi [mailto:lboccass at Brocade.com] > Sent: Thursday, July 7, 2016 6:21 PM > To: Lu, Wenzhuo > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link > > On Thu, 2016-07-07 at 01:09 +0000, Lu, Wenzhuo wrote: > > Hi Luca, > > > > > > > -----Original Message----- > > > From: Luca Boccassi [mailto:lboccass at Brocade.com] > > > Sent: Thursday, July 7, 2016 12:23 AM > > > To: Lu, Wenzhuo > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link > > > > > > On Wed, 2016-07-06 at 00:45 +0000, Lu, Wenzhuo wrote: > > > > Hi Luca, > > > > > > > > > -----Original Message----- > > > > > From: Luca Boccassi [mailto:lboccass at Brocade.com] > > > > > Sent: Tuesday, July 5, 2016 5:53 PM > > > > > To: Lu, Wenzhuo > > > > > Cc: dev at dpdk.org > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link > > > > > > > > > > On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote: > > > > > > Hi Luca, > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Luca Boccassi [mailto:lboccass at Brocade.com] > > > > > > > Sent: Monday, July 4, 2016 11:48 PM > > > > > > > To: Lu, Wenzhuo > > > > > > > Cc: dev at dpdk.org > > > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF > > > > > > > link > > > > > > > > > > > > > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote: > > > > > > > > If the PF link is down and up, VF link will not work > > > > > > > > accordingly. > > > > > > > > This patch set addes the support of VF link reset. So, > > > > > > > > when VF receices the messges of physical link down/up. APP > > > > > > > > can reset the VF link and let it recover. > > > > > > > > > > > > > > > > PS: This patch set is splitted from a previous patch set, > > > > > > > > *automatic link recovery on ixgbe/igb VF*, and it's base > > > > > > > > on the patch set *support mailbox interruption on ixgbe/igb VF*. > > > > > > > > > > > > > > > > Wenzhuo Lu (3): > > > > > > > > lib/librte_ether: support device reset > > > > > > > > ixgbe: implement device reset on VF > > > > > > > > igb: implement device reset on VF > > > > > > > > > > > > > > > > Zhe Tao (1): > > > > > > > > i40e: implement device reset on VF > > > > > > > > > > > > > > > > v1: > > > > > > > > - Added the implementation for the VF reset functionality. > > > > > > > > v2: > > > > > > > > - Changed the i40e related operations during VF reset. > > > > > > > > v3: > > > > > > > > - Resent the patches because of the mail sent issue. > > > > > > > > v4: > > > > > > > > - Removed some VF reset emulation code. > > > > > > > > v5: > > > > > > > > - Removed all the code related with lock. > > > > > > > > v6: > > > > > > > > - Updated the NIC feature overview matrix. > > > > > > > > - Added more explanation in the doxygen comment of reset API. > > > > > > > > > > > > > > > > doc/guides/nics/overview.rst | 1 + > > > > > > > > doc/guides/rel_notes/release_16_07.rst | 13 ++++++ > > > > > > > > drivers/net/e1000/igb_ethdev.c | 59 > ++++++++++++++++++++++++ > > > > > > > > drivers/net/i40e/i40e_ethdev.h | 4 ++ > > > > > > > > drivers/net/i40e/i40e_ethdev_vf.c | 83 > > > > > > > ++++++++++++++++++++++++++++++++++ > > > > > > > > drivers/net/i40e/i40e_rxtx.c | 10 ++++ > > > > > > > > drivers/net/i40e/i40e_rxtx.h | 4 ++ > > > > > > > > drivers/net/ixgbe/ixgbe_ethdev.c | 64 > > > +++++++++++++++++++++++++- > > > > > > > > drivers/net/ixgbe/ixgbe_ethdev.h | 2 +- > > > > > > > > drivers/net/ixgbe/ixgbe_rxtx.c | 12 +++-- > > > > > > > > lib/librte_ether/rte_ethdev.c | 17 +++++++ > > > > > > > > lib/librte_ether/rte_ethdev.h | 24 ++++++++++ > > > > > > > > lib/librte_ether/rte_ether_version.map | 7 +++ > > > > > > > > 13 files changed, 295 insertions(+), 5 deletions(-) > > > > > > > > > > > > > > Hello Wenzhuo, > > > > > > > > > > > > > > I'm testing this patchset, but I am sporadically running > > > > > > > into an issue where the VFs reset fails after the PF flaps. > > > > > > > > > > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs > > > > > > > in. > > > > > > > > > > > > > > I am using calling rte_eth_dev_reset in response to a > > > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors > > > > > > > appear in the > > > > > > > log: > > > > > > > > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link. > > > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed > > > > > > > queue_id=0 > > > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware > > > > > > > (-12) > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device. > > > > > > > > > > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call > > > > > > > in ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of > 2048. > > > > > > > The application has ~500 2MB hugepages, and there's 2GB of > > > > > > > free memory available on top of that. > > > > > > > > > > > > > > Have you seen this before? Any pointer or suggestion for > > > > > > > debugging? > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > -- > > > > > > > Kind regards, > > > > > > > Luca Boccassi > > > > > > I think the problem is the mbuf occupied by the packets is not > > > > > > released. This > > > > > memory has to be released by the APP, so my patches haven?t covered > this. > > > > > Actually an example is needed to show how to use the reset API. > > > > > I plan to modify the testpmd. > > > > > > You may notice this feature is postponed to 16.11. Would you > > > > > > like to wait for > > > > > the new version that will include an example? > > > > > > > > > > Hi, > > > > > > > > > > Unfortunately we need the VF reset working sooner than that, so > > > > > one way or the other I'll need to sort it out. Given I've got a > > > > > use case where this is happening, if it can be helpful for you > > > > > I'm more than happy to help as a guinea pig. If you could please > > > > > give some guidance/guidelines with regards to which API to use > > > > > to sort the mbuf > > > problem, I can try it out and give back some feedback. > > > > > > > > > > Thanks! > > > > I made a stupid mistake and deleted all my code. So, I have to > > > > take some time to rewrite it :( Attached the example I used to > > > > test the reset API. It's > > > modified from the l2fwd example. So you can compare it with l2fwd to > > > see what need to be added. > > > > Hopefully it can help :) > > > > > > Thanks! That made me understand a couple of things more, and I've > > > got past the problem. > > > > > > Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking > call. > > > the _RESET event callback is fired when the PF goes down, but when I > > > call rte_eth_dev_reset it will block until the PF goes back up. > > > There is no way, as far as I can see, to know if the PF is back up before > calling rte_eth_dev_reset. > > > > > > This is a problem because, as far as I understand, I have to call > > > all the rte_eth_dev_ APIs from the same thread, in my case the > > > master thread, and I can't have that block potentially indefinitely. > > > > > > Would it be possible to have 2 events instead of 1, one when the PF > > > goes down and one when it goes up? This way an application would be > > > able to soft-stop the port (drain queues, etc) when the PF is down, > > > and then call the reset API when it goes back up. > > > > > > Thanks! > > Sorry we cannot have 2 events now. There're 2 problems to have 2 events. > > 1, Normally we use kernel driver for PF. Now the kernel driver only have one > kind of message for link down and up. So we cannot tell if it's down or up. > > 2, When the PF is down, if we don't reset the VF, VF is not working. > > It cannot receive any message from PF. So we cannot know that when PF > > is up. It means normally we have to reset VF twice when PF down and > > up. (Surely we can wait a while when we receive the message from PF > > until PF is up. But we cannot tell how long the time is appropriate. > > So this *wait a while* may work for flash.) > > Thanks for the clarification, I understand. > > The problem with a blocking call is that we basically need to spawn one thread > per rte_eth_dev_reset call, since there is no way of knowing if a PF is down > for > good or just flapping, and we can't have a single thread managing all the > interfaces being blocked forever (EG: PF 1 and 2 go down, thread blocks on PF > 1 > reset call but it never returns, meanwhile PF 2 goes back up but call is never > made). > > A colleague of mine, Eric Kinzie, suggested to add a blocking boolean > parameter > to rte_eth_dev_reset API. If set to false, then the call will not block and > just does > one try and return an error (EAGAIN ?). Would this be an acceptable > proposition? It's a good suggestion. And I think if the parameter is set to false and the link is not up after trying once, it will be APP's responsibility to setup a timer or something like that to keep trying to bring up the link.
> > -- > Kind regards, > Luca Boccassi