> -----Original Message-----
> From: Juraj Linkeš <juraj.lin...@pantheon.tech>
> Sent: Tuesday, December 8, 2020 5:27 PM
> To: Xing, Beilei <beilei.x...@intel.com>; David Marchand
> <david.march...@redhat.com>; Guo, Jia <jia....@intel.com>
> Cc: dev@dpdk.org; Kinsella, Ray <ray.kinse...@intel.com>; Andrew
> Yourtchenko (ayourtch) <ayour...@cisco.com>; Yigit, Ferruh
> <ferruh.yi...@intel.com>
> Subject: RE: [dpdk-dev] Faulty VF initialization during DPDK startup when
> multiple DPDK instances use different VFs with the same PF
>
> > -----Original Message-----
> > From: Xing, Beilei <beilei.x...@intel.com>
> > Sent: Tuesday, December 8, 2020 8:14 AM
> > To: David Marchand <david.march...@redhat.com>; Guo, Jia
> > <jia....@intel.com>
> > Cc: dev@dpdk.org; Kinsella, Ray <ray.kinse...@intel.com>; Andrew
> > Yourtchenko (ayourtch) <ayour...@cisco.com>; Juraj Linkeš
> > <juraj.lin...@pantheon.tech>; Yigit, Ferruh <ferruh.yi...@intel.com>
> > Subject: RE: [dpdk-dev] Faulty VF initialization during DPDK startup
> > when multiple DPDK instances use different VFs with the same PF
> >
> > > -----Original Message-----
> > > From: dev <dev-boun...@dpdk.org> On Behalf Of David Marchand
> > > Sent: Monday, December 7, 2020 6:55 PM
> > > To: Xing, Beilei <beilei.x...@intel.com>; Guo, Jia
> > > <jia....@intel.com>
> > > Cc: dev@dpdk.org; Kinsella, Ray <ray.kinse...@intel.com>; Andrew
> > > Yourtchenko (ayourtch) <ayour...@cisco.com>; Juraj Linkeš
> > > <juraj.lin...@pantheon.tech>; Yigit, Ferruh <ferruh.yi...@intel.com>
> > > Subject: Re: [dpdk-dev] Faulty VF initialization during DPDK startup
> > > when multiple DPDK instances use different VFs with the same PF
> > >
> > > On Mon, Dec 7, 2020 at 11:49 AM Juraj Linkeš
> > > <juraj.lin...@pantheon.tech> wrote:
> > > >
> > > > Hi DPDK devs,
> > > >
> > > > A while back I submitted this bug:
> > > > https://bugs.dpdk.org/show_bug.cgi?id=578 and now we have a pretty
> > > > good idea where the issue stems from.
> > > > TL;DR: it seems to be in either the XL710 firmware or the i40e
> > > > driver, with downstream effects which we may need to address in DPDK.
> > > >
> > > > What is the issue?
> > > > We're using an XL710 NIC in an SR-IOV setup with multiple virtual
> > > > functions (VFs) that belong to the same physical function (PF).
> > > > We're observing intermittent failures when multiple DPDK EAL
> > > > instances are trying to initialize different VFs from the same PF.
> > > > One of the failures looks like this:
> > > > i40evf_check_api_version(): PF/VF API version mismatch:(0.0)-(1.1)
> > > >
> > > > This results in VPP (which uses DPDK to initialize these VFs) not
> > > > being able to use the VFs. There is an associated syslog message:
> > > >
> > > > [Thu Dec 3 02:30:56 2020] i40e 0000:05:00.1: Unable to send the message to VF 49 aq_err 12
> > > >
> > > > Digging in the sources, we've found that this is the error message:
> > > > https://elixir.bootlin.com/linux/v4.15/source/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h#L115
> > > >
> > > > This suggests it's an issue with either the driver or the firmware,
> > > > and that leads us to two questions:
> > > > 1) Is this an expected condition? What is the reason for this
> > > > contention, is it normal to have it, and what is the expected
> > > > correct behavior of the calling code?
> >
> > aq_err 12 is I40E_AQ_RC_EBUSY, which is returned by firmware. It
> > indicates that the mailbox is full and the device is too busy to handle
> > other requests. So when multiple DPDK instances are trying to
> > initialize different VFs from the same PF, the PF sends many requests
> > to firmware, and the mailbox can easily fill up.
> >
> > > > 2) If "yes" to the previous question, then, since the caller in this
> > > > case is the DPDK initialization code, should we address it there
> > > > (e.g. with some retries or a lock)?
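As a rough illustration of the "lock" option from question 2, cooperating DPDK processes on the same host could serialize VF initialization per PF with an advisory flock(2) lock. This is a minimal sketch, not existing DPDK code: the helper names (pf_lock_path, pf_init_lock, pf_init_unlock), the /tmp path scheme and keying the lock on the PF PCI address (0000:05:00.1 in the syslog above) are all assumptions. It only coordinates processes that can see the same lock file, so it would not help if the DPDK instances run in separate VMs.

```c
#define _GNU_SOURCE            /* for flock() and O_CLOEXEC on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Build a per-PF lock file path. The /tmp location and naming scheme are
 * hypothetical; cooperating processes only need to agree on them. */
static int pf_lock_path(char *buf, size_t len, const char *pf_pci_addr)
{
    int n = snprintf(buf, len, "/tmp/i40e-pf-%s.lock", pf_pci_addr);
    return (n > 0 && (size_t)n < len) ? 0 : -1;   /* -1 on truncation */
}

/* Open (creating if needed) the lock file and take an exclusive advisory
 * lock, blocking until no other process holds it. Returns the locked fd,
 * or -1 on error. */
static int pf_init_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock; closing the fd alone would also release it. */
static void pf_init_unlock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

Each process would take the lock before initializing its VF (for example around rte_eal_init() in the application) and release it once initialization has finished, so only one VF on a given PF talks to the PF at a time during startup. The trade-off is slower startup when many instances start together, in exchange for fewer concurrent mailbox requests.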
> >
> > I agree that retries or a lock can address it, but it should be
> > addressed in the kernel driver, not in DPDK, since the kernel PF driver
> > is responsible for communicating with firmware. When aq_err 12 is
> > returned, the PF should retry sending the AQ command to firmware.
> >
>
> Thanks, Beilei, for the clarification. Do you know how/where I should
> raise the bug with the i40e driver? The kernel bugzilla [0]?
>
> [0] https://bugzilla.kernel.org/
>
I think so. You should report it to the kernel community or to Intel PAE.

> > > >
> > > > Are there any Intel (or SR-IOV) experts who could help with
> > > > answering the first question? Or, no matter what the expected
> > > > behavior is, should we address it in DPDK?
> > >
> > > Added i40e maintainers.
> > >
> > >
> > > --
> > > David Marchand
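Beilei's suggestion that the PF driver retry an admin-queue command when firmware returns aq_err 12 (I40E_AQ_RC_EBUSY) could be sketched as a bounded retry with exponential backoff. This is a standalone illustration, not i40e driver code: aq_send_fn, aq_send_with_retry, the attempt count and the delays are hypothetical, and the only value taken from the thread is the EBUSY code 12. It is written as user-space C so it can run on its own; a kernel implementation would use the kernel's own delay primitives (such as usleep_range() or msleep()) instead of nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Only value taken from the thread: aq_err 12 == I40E_AQ_RC_EBUSY. */
#define AQ_RC_OK    0
#define AQ_RC_EBUSY 12

/* Hypothetical callback that sends one admin-queue command and returns
 * the firmware's AQ return code. */
typedef int (*aq_send_fn)(void *ctx);

static void sleep_us(long us)
{
    struct timespec ts = { us / 1000000, (us % 1000000) * 1000 };
    nanosleep(&ts, NULL);
}

/* Call send() up to max_attempts times, retrying only while firmware
 * reports EBUSY and doubling the delay between attempts. Success or any
 * other error code is returned immediately. */
static int aq_send_with_retry(aq_send_fn send, void *ctx,
                              int max_attempts, long initial_delay_us)
{
    assert(max_attempts > 0);
    long delay_us = initial_delay_us;
    int rc = AQ_RC_EBUSY;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = send(ctx);
        if (rc != AQ_RC_EBUSY)
            return rc;                  /* done, or a non-retryable error */
        if (attempt < max_attempts) {
            sleep_us(delay_us);         /* give the mailbox time to drain */
            delay_us *= 2;              /* exponential backoff */
        }
    }
    return rc;                          /* still EBUSY after all attempts */
}

/* Test double: reports EBUSY for the first busy_left calls, then OK. */
struct fake_fw {
    int busy_left;
    int calls;
};

static int fake_send(void *ctx)
{
    struct fake_fw *fw = ctx;
    fw->calls++;
    if (fw->busy_left > 0) {
        fw->busy_left--;
        return AQ_RC_EBUSY;
    }
    return AQ_RC_OK;
}
```

Retrying only on EBUSY keeps genuine errors visible to the caller, and the attempt limit bounds how long a VF request can be held up; if EBUSY persists, the caller still has to report the failure, as the "Unable to send the message to VF" log does today.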