> -----Original Message-----
> From: Xing, Beilei <beilei.x...@intel.com>
> Sent: Tuesday, December 8, 2020 8:14 AM
> To: David Marchand <david.march...@redhat.com>; Guo, Jia <jia....@intel.com>
> Cc: dev@dpdk.org; Kinsella, Ray <ray.kinse...@intel.com>; Andrew Yourtchenko
> (ayourtch) <ayour...@cisco.com>; Juraj Linkeš <juraj.lin...@pantheon.tech>;
> Yigit, Ferruh <ferruh.yi...@intel.com>
> Subject: RE: [dpdk-dev] Faulty VF initialization during DPDK startup when
> multiple DPDK instances use different VFs with the same PF
>
> > -----Original Message-----
> > From: dev <dev-boun...@dpdk.org> On Behalf Of David Marchand
> > Sent: Monday, December 7, 2020 6:55 PM
> > To: Xing, Beilei <beilei.x...@intel.com>; Guo, Jia <jia....@intel.com>
> > Cc: dev@dpdk.org; Kinsella, Ray <ray.kinse...@intel.com>; Andrew
> > Yourtchenko (ayourtch) <ayour...@cisco.com>; Juraj Linkeš
> > <juraj.lin...@pantheon.tech>; Yigit, Ferruh <ferruh.yi...@intel.com>
> > Subject: Re: [dpdk-dev] Faulty VF initialization during DPDK startup
> > when multiple DPDK instances use different VFs with the same PF
> >
> > On Mon, Dec 7, 2020 at 11:49 AM Juraj Linkeš
> > <juraj.lin...@pantheon.tech> wrote:
> > >
> > > Hi DPDK devs,
> > >
> > > A while back I submitted this bug:
> > > https://bugs.dpdk.org/show_bug.cgi?id=578
> > > and now we have a pretty good idea where the issue stems from. TL;DR:
> > > it seems to be in either the XL710 firmware or the i40e driver, with
> > > downstream effects which we may need to address in DPDK.
> > >
> > > What is the issue?
> > > We're using an XL710 NIC with SR-IOV set up with multiple virtual
> > > functions (VFs) that belong to the same physical function (PF). We're
> > > observing intermittent failures when multiple DPDK EAL instances are
> > > trying to initialize different VFs from the PF.
> > > One of the failures looks like this:
> > > i40evf_check_api_version(): PF/VF API version mismatch:(0.0)-(1.1)
> > >
> > > This results in VPP (which uses DPDK to initialize these VFs) not being
> > > able to use the VFs. There is an associated syslog message:
> > >
> > > [Thu Dec 3 02:30:56 2020] i40e 0000:05:00.1: Unable to send the message to VF 49 aq_err 12
> > >
> > > Digging in the sources we've found that this is the error message:
> > > https://elixir.bootlin.com/linux/v4.15/source/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h#L115
> > >
> > > This suggests it's an issue with either the driver or firmware, and
> > > that leads us to two questions:
> > > 1) Is this an expected condition to happen? What is the reason for this
> > > contention and is it normal to have it, and what is the expected
> > > correct behavior of the calling code?
>
> aq_err 12 is I40E_AQ_RC_EBUSY, which is returned by the firmware. It
> indicates that the mailbox is full and the device is too busy to handle
> other requests. So when multiple DPDK instances are trying to initialize
> different VFs from the PF, there will be many requests from the PF to the
> firmware, and it is easy to fill the mailbox.
>
> > > 2) If "yes" to the previous question - then, since the caller in this
> > > case is the initialization code of DPDK, should we address it there
> > > (e.g. some retries or a lock)?
>
> I agree that a retry or a lock should be used to address it, but it should
> be addressed in the kernel driver, not in DPDK, since the kernel PF is
> responsible for communicating with the firmware. When aq_err 12 is
> returned, the PF should retry sending the AQ command to the firmware.
>
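To make the suggested retry concrete, here is a minimal sketch of retrying a command while the reply is EBUSY, with a doubling delay between attempts. It is plain userspace C, not i40e driver code: the `aq_send_fn` callback, `aq_send_with_retry()`, `sleep_us()` and `mock_send()` are hypothetical names made up for illustration, and only the value 12 for the busy code is taken from the thread above. How the real PF driver would hook this into its admin queue path is not something this sketch claims to show.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Busy code as quoted in the thread: aq_err 12 is I40E_AQ_RC_EBUSY. */
#define AQ_RC_OK    0
#define AQ_RC_EBUSY 12

/* Hypothetical callback standing in for "send one AQ command". */
typedef int (*aq_send_fn)(void *ctx);

/* Sleep for the given number of microseconds. */
static void sleep_us(long us)
{
    struct timespec ts = {
        .tv_sec = us / 1000000L,
        .tv_nsec = (us % 1000000L) * 1000L,
    };
    nanosleep(&ts, NULL);
}

/*
 * Call send() until it stops reporting EBUSY or max_tries attempts
 * have been made. The delay doubles after every busy reply.
 * Returns the status code of the last attempt.
 */
static int aq_send_with_retry(aq_send_fn send, void *ctx,
                              int max_tries, long delay_us)
{
    int rc = AQ_RC_EBUSY;

    assert(send != NULL && max_tries > 0);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        rc = send(ctx);
        if (rc != AQ_RC_EBUSY)
            break; /* success, or an error that a retry will not fix */
        if (attempt + 1 < max_tries) {
            sleep_us(delay_us);
            delay_us *= 2;
        }
    }
    return rc;
}

/* Mock sender: reports EBUSY until the counter in ctx reaches zero. */
static int mock_send(void *ctx)
{
    int *remaining_busy = ctx;

    if (*remaining_busy > 0) {
        (*remaining_busy)--;
        return AQ_RC_EBUSY;
    }
    return AQ_RC_OK;
}
```

With the mock, a sender that is busy twice succeeds on the third of five attempts, while one that stays busy gives up after `max_tries` and hands EBUSY back to the caller, which is roughly the point where the VF would see the failure today.
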
Thanks, Beilei, for the clarification. Do you know how/where I should raise the bug with the i40e driver? The kernel bugzilla [0]?

[0] https://bugzilla.kernel.org/

> > > Are there any Intel (or SR-IOV) experts who could help with answering
> > > the first question? Or is it possible that, no matter what the expected
> > > behavior is, we should address it in DPDK?
> >
> > Added i40e maintainers.
> >
> >
> > --
> > David Marchand