On Mon, Dec 7, 2020 at 11:49 AM Juraj Linkeš
<juraj.lin...@pantheon.tech> wrote:
>
> Hi DPDK devs,
>
> A while back I submitted this bug:
> https://bugs.dpdk.org/show_bug.cgi?id=578 and now we have a pretty good idea
> where the issue stems from. TL;DR: it seems to be in either the XL710
> firmware or the i40e driver, with downstream effects which we may need to
> address in DPDK.
>
> What is the issue?
> We're using an XL710 NIC with an SR-IOV setup with multiple virtual functions
> (VFs) that belong to the same physical function (PF). We're observing
> intermittent failures when multiple DPDK EAL instances try to initialize
> different VFs from the same PF. One of the failures looks like this:
> i40evf_check_api_version(): PF/VF API version mismatch:(0.0)-(1.1)
>
> This results in VPP (which uses DPDK to initialize these VFs) not being able
> to use the VFs. There is an associated syslog entry:
>
> [Thu Dec 3 02:30:56 2020] i40e 0000:05:00.1: Unable to send the message to
> VF 49 aq_err 12
>
> Digging through the sources, we found that this is the error in question:
> https://elixir.bootlin.com/linux/v4.15/source/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h#L115
>
> This suggests it's an issue with either the driver or the firmware, and that
> leads us to two questions:
> 1) Is this condition expected to happen? What is the reason for this
> contention, is it normal to have it, and what is the expected correct
> behavior of the calling code?
> 2) If "yes" to the previous question, then, since the caller in this case is
> the initialization code of DPDK, should we address it there (e.g. with some
> retries or a lock)?
>
> Are there any Intel (or SR-IOV) experts who could help with answering the
> first question? Or, no matter what the expected behavior is, should we
> address it in DPDK anyway?
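
For reference on question 2, the "some retries" option could look roughly like the sketch below. This is only an illustration under assumptions, not the actual i40evf code: retry_with_backoff(), vf_check_fn, vf_wait_fn, fake_pf, fake_check_api_version() and fake_wait() are all hypothetical names, and the fake check merely stands in for the PF/VF API version exchange. In DPDK the wait callback would presumably be something like rte_delay_ms().

```c
#include <stddef.h>

/* Hypothetical callback types: the check returns 0 on success and
 * nonzero on failure; the wait callback pauses for `ms` milliseconds. */
typedef int (*vf_check_fn)(void *ctx);
typedef void (*vf_wait_fn)(unsigned int ms);

/* Call `check` up to `max_attempts` times. After each failed attempt
 * except the last, wait `delay_ms` and then double the delay.
 * Returns 0 on the first success, -1 if every attempt failed. */
static int retry_with_backoff(vf_check_fn check, void *ctx,
                              int max_attempts, unsigned int delay_ms,
                              vf_wait_fn wait)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (check(ctx) == 0)
            return 0;
        if (attempt < max_attempts) {
            wait(delay_ms);
            delay_ms *= 2;
        }
    }
    return -1;
}

/* Stand-in for a busy PF: fails a fixed number of times, then succeeds. */
struct fake_pf {
    int failures_left;
    int calls;
};

static int fake_check_api_version(void *ctx)
{
    struct fake_pf *pf = ctx;

    pf->calls++;
    if (pf->failures_left > 0) {
        pf->failures_left--;
        return -1;
    }
    return 0;
}

/* Stand-in for a real delay: only records how long we would have slept. */
static unsigned int total_waited_ms;

static void fake_wait(unsigned int ms)
{
    total_waited_ms += ms;
}
```

A caller would wrap the version check in retry_with_backoff() with a small attempt limit, so that a PF that never answers still fails initialization in bounded time. Whether retrying is the right fix at all depends on the answer to question 1.
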

Added i40e maintainers.

-- 
David Marchand