Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages

Bjorn Helgaas Tue, 27 Aug 2013 16:02:54 -0700

On Fri, Aug 23, 2013 at 3:41 PM, Skidmore, Donald C
<donald.c.skidm...@intel.com> wrote:
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelg...@google.com]
>> Sent: Friday, August 23, 2013 1:43 PM
>> To: Skidmore, Donald C
>> Cc: e1000-de...@lists.sourceforge.net; linux-...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; Don Dutile
>> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
>> to PF Nacked" messages
>>
>> On Fri, Aug 23, 2013 at 2:37 PM, Skidmore, Donald C
>> <donald.c.skidm...@intel.com> wrote:
>> >> -----Original Message-----
>> >> From: Bjorn Helgaas [mailto:bhelg...@google.com]
>> >> Sent: Friday, August 23, 2013 11:53 AM
>> >> To: Skidmore, Donald C
>> >> Cc: e1000-de...@lists.sourceforge.net; linux-...@vger.kernel.org;
>> >> linux- ker...@vger.kernel.org; Don Dutile
>> >> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of
>> >> type 00 to PF Nacked" messages
>> >>
>> >> On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
>> >> > > -----Original Message-----
>> >> > > From: Bjorn Helgaas [mailto:bhelg...@google.com]
>> >> > > Sent: Friday, August 23, 2013 9:53 AM
>> >> > > To: Skidmore, Donald C
>> >> > > Cc: e1000-de...@lists.sourceforge.net; linux-...@vger.kernel.org;
>> >> > > linux- ker...@vger.kernel.org; Don Dutile
>> >> > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last
>> >> > > Request of type 00 to PF Nacked" messages
>> >> > >
>> >> > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelg...@google.com> wrote:
>> >> > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas <bhelg...@google.com> wrote:
>> >> > > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelg...@google.com> wrote:
>> >> > > >
>> >> > > >>> I played with this a little more and found this:
>> >> > > >>>
>> >> > > >>> 1) Magma card in z420, connected to chassis containing X540:
>> >> > > >>> fails (original report)
>> >> > > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
>> >> > > >>> fails
>> >> > > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis:
>> >> > > >>> works
>> >> > > >
>> >> > > > For what it's worth, I tried config 3 again with v3.11-rc6, and
>> >> > > > it failed the same way.  I haven't bothered with config 2.
>> >> > > > It's not 100% reproducible, but at least it doesn't seem
>> >> > > > related to the expansion chassis.
>> >> > > >
>> >> > > > I attached the logs from config 3 to
>> >> > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
>> >> > >
>> >> > > Is there anything I can do to help debug this?  Add
>> >> > > instrumentation, etc.?  It seems like I'm doing the simplest
>> >> > > possible thing -- just writing to the sysfs sriov_num_vfs file to
>> >> > > enable VFs.
>> >> > >
>> >> > > I almost think it must be related to my config somehow if nobody
>> >> > > else is seeing this, but at the same time, my config also seems
>> >> > > the simplest possible, so I don't know what I could be doing
>> >> > > that's unusual.
>> >> > >
>> >> > > Bjorn
>> >> >
>> >> > Hey Bjorn,
>> >> >
>> >> > I may be a little confused, so bear with me.
>> >> >
>> >> > Option 1 = (your normal setup), Magma card plugged into chassis, X540
>> >> > in chassis.
>> >> > Option 2 = Magma card plugged into chassis, X540 in z420 system.
>> >> > Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
>> >> >
>> >> > Options 1 & 2 - always fail
>> >> > Option 3 - sometimes fails (unsure at what rate failure occurs)
>> >> >
>> >> > Please correct me if I messed any of that up. :)
>> >>
>> >> Generally correct.  I've seen failures in all three configs, so I'm
>> >> only concerned with the simplest for now (config 3, no expansion chassis).
>> >>
>> >> > Another question I have relates to the lspci output you supplied in
>> >> > the bugzilla.  I'm not seeing the VF devices (i.e. 08:10.0); did you
>> >> > run lspci before you created the VFs?  If so, could we see one while
>> >> > the failure was occurring?
>> >>
>> >> That's correct, I collected the lspci output before reproducing the
>> >> problem.  I can't easily collect lspci afterwards because the machine
>> >> isn't responsive after the problem starts.
>> >>
>> >> > Also, could you download the latest ixgbevf from SourceForge?
>> >> >
>> >> > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
>> >> >
>> >> > If we add debugging messages it will be easier to patch this driver,
>> >> > and it contains our latest validated code base.
>> >>
>> >> I can do that if it turns out to be necessary.  But John Haller gave
>> >> me a good clue off-list:
>> >>
>> >> John wrote:
>> >> > I assume you want the VFs to be instantiated in a VM.  To do this,
>> >> > you need to blacklist the ixgbevf driver in the host (or not
>> >> > compile it into the host), or it will try to associate the driver
>> >> > in the host, rather than in the VM where you want it.  Then, the VM
>> >> > needs the ixgbevf driver, which will hopefully do a better job of
>> >> > talking to the mailbox in the host.  There is some work to assign
>> >> > the VF(s) to the VM, but I don't remember that offhand.
>> >>
>> >> I don't have any VMs (I started this whole thing because I was
>> >> looking at a PCI hotplug issue related to SR-IOV, so I don't really
>> >> care about VMs).
>> >>
>> >> So the ixgbevf driver on the *host* is claiming the new VFs, and it
>> >> sounds like maybe it can't handle that?
>> >>
>> >> Bjorn
>> >
>> > Not to speak for John, but I believe he was saying if you want to use
>> > your VF's in a VM you need to make sure you don't run the ixgbevf
>> > driver on the host as it will "claim" the VF's.  If you are NOT running
>> > any VM's then it is perfectly fine to have both ixgbe and ixgbevf loaded.
>>
>> OK.  It certainly *seemed* surprising to have the ixgbevf driver blow up,
>> even if it was an error on my part to load it in the host.  Just let me
>> know if there's any more testing I can do.
>>
>> Bjorn
>
> Something is leading to the mbx messages being messed up, as evidenced by
> the "Last Request of type 03 to PF Nacked" messages.  Have you tried
> resetting the ixgbevf port (ethtool -r <your port>)?  Is it even possible
> to do this, given that you mentioned the machine isn't very responsive in
> the failure state?
>
> It might be worthwhile to add logging to the ixgbevf and ixgbe drivers
> around the mbx messages, in the hope that it would help show what is
> going on between the two.  There have been some changes in that area of
> the ixgbevf code as of late, so working off the latest SourceForge driver
> would be the easiest for me to send you a patch on.  Sadly we haven't been
> able to recreate the failure here, so it is rather hard to debug.


I haven't been able to reproduce the problem with the 2.10.3 ixgbevf
driver from http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/

I did notice what looks like a printk format problem and what appears
to be a bare MAC address with no label:

[  316.699504] ixgbevf: eth%d: ixgbevf_init_interrupt_scheme:
Multiqueue Disabled: Rx Queue count = 1, Tx Queue count = 1
[  316.710897] ixgbevf: eth3: ixgbevf_probe: Intel(R) X540 Virtual Function
[  316.717608] 08:88:ff:ff:0d:ec
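
For anyone who wants to check a larger dmesg capture for the same two
symptoms, here is a rough sketch. The function name and the regular
expressions are my own assumptions, not anything from the driver, and the
timestamp pattern assumes the usual "[  seconds.micros]" prefix shown above.

```python
import re

# Assumption: kernel log lines start with a "[  316.699504] " style timestamp.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# A printf-style conversion left unexpanded in a name, e.g. "eth%d".
RAW_FORMAT = re.compile(r"\b\w+%[dsux]\b")
# A line whose entire payload is a MAC address and nothing else.
BARE_MAC = re.compile(r"^(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$")


def scan(lines):
    """Return (line_number, problem) pairs for suspicious log lines."""
    findings = []
    for number, line in enumerate(lines, start=1):
        payload = TIMESTAMP.sub("", line.strip())
        if RAW_FORMAT.search(payload):
            findings.append((number, "unexpanded format specifier"))
        if BARE_MAC.match(payload):
            findings.append((number, "bare MAC address"))
    return findings


# The three lines quoted above, with the wrapped first line rejoined.
log = [
    "[  316.699504] ixgbevf: eth%d: ixgbevf_init_interrupt_scheme: "
    "Multiqueue Disabled: Rx Queue count = 1, Tx Queue count = 1",
    "[  316.710897] ixgbevf: eth3: ixgbevf_probe: Intel(R) X540 Virtual Function",
    "[  316.717608] 08:88:ff:ff:0d:ec",
]
for number, problem in scan(log):
    print(number, problem)
```

This only flags the lines; it says nothing about why the driver printed
them that way.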

Sorry for wasting so much time on something that appears to be already fixed.

Bjorn
