On Mon, Dec 7, 2020 at 11:49 AM Juraj Linkeš <juraj.lin...@pantheon.tech> wrote:
>
> Hi DPDK devs,
>
> A while back I submitted this bug:
> https://bugs.dpdk.org/show_bug.cgi?id=578 and we now have a pretty good idea
> where the issue stems from. TL;DR: it seems to be in either the XL710 firmware
> or the i40e driver, with downstream effects that we may need to address in DPDK.
>
> What is the issue?
> We're using an XL710 NIC with SR-IOV set up and multiple virtual functions
> (VFs) that belong to the same physical function (PF). We're observing
> intermittent failures when multiple DPDK EAL instances try to initialize
> different VFs of that PF. One of the failures looks like this:
> i40evf_check_api_version(): PF/VF API version mismatch:(0.0)-(1.1)
>
> This results in VPP (which uses DPDK to initialize these VFs) not being able
> to use the VFs. There is an associated syslog entry:
>
> [Thu Dec  3 02:30:56 2020] i40e 0000:05:00.1: Unable to send the message to 
> VF 49 aq_err 12
>
> Digging through the sources, we found that the error code is defined here:
> https://elixir.bootlin.com/linux/v4.15/source/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h#L115
>
> This suggests it's an issue with either the driver or the firmware, which
> leads us to two questions:
> 1) Is this condition expected? What causes the contention, is it normal to
> see it, and what is the correct behavior for the calling code?
> 2) If the answer to the previous question is "yes": since the caller in this
> case is the DPDK initialization code, should we address it there (e.g. with
> some retries or a lock)?
>
> Are there any Intel (or SR-IOV) experts who could help with answering the
> first question? Or should we address it in DPDK regardless of what the
> expected behavior is?
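
For illustration only, the "some retries" option from question 2 could look roughly like the sketch below. It is not taken from the i40e PMD: retry_vf_check(), vf_check_fn, struct fake_pf and fake_check() are hypothetical names, and the code assumes the API version check can safely be repeated after a transient failure, which is exactly what question 1 asks.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical callback: returns 0 on success, -1 on a transient failure
 * (for example, the PF not answering the API version request). */
typedef int (*vf_check_fn)(void *ctx);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(unsigned int ms)
{
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (long)(ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
}

/* Call check() up to max_attempts times, doubling the delay after each
 * failed attempt. Returns 0 on the first success, otherwise the result
 * of the last attempt (-1 if max_attempts is 0). */
int retry_vf_check(vf_check_fn check, void *ctx,
                   unsigned int max_attempts, unsigned int delay_ms)
{
    int ret = -1;

    for (unsigned int attempt = 0; attempt < max_attempts; attempt++) {
        ret = check(ctx);
        if (ret == 0)
            return 0;
        if (attempt + 1 < max_attempts) {
            sleep_ms(delay_ms);
            delay_ms *= 2;
        }
    }
    return ret;
}

/* Stand-in for a busy PF: fails busy_replies times, then succeeds. */
struct fake_pf {
    int busy_replies;
    int calls;
};

int fake_check(void *ctx)
{
    struct fake_pf *pf = ctx;

    pf->calls++;
    if (pf->busy_replies > 0) {
        pf->busy_replies--;
        return -1;
    }
    return 0;
}
```

With a fake PF that is busy twice, retry_vf_check(fake_check, &pf, 5, 1) succeeds on the third call. A lock shared between EAL instances would instead serialize the VF initializations; whether either approach is appropriate depends on the answer to question 1.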

Added i40e maintainers.


-- 
David Marchand
