Hi Dou, On Tue, 13 Sep 2016 14:56:15 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:
> Hi Alex, > > I am Dou. > I am testing these patches. > > At 09/01/2016 03:44 AM, Alex Williamson wrote: > > [...] > > >> + > >> + pcie_cap_deverr_init(pdev); > >> + return pcie_aer_init(pdev, pos, size); > > > > pcie_aer_init() adds a v2 AER capability regardless of the version of > > the AER capability on the device. Is this expected? > > I think it is not good. > > v2 defines a lot > > more bits in various registers than v1, so are we simply hoping that > > devices have reserved bits as zero like they're supposed to? > > I read the Capability Version in PCIe Spec. it says: > > Capability Version – This field is a PCI-SIG defined version number > that indicates the version of the Capability structure present. > OR > This field must be 2h if the End-End TLP Prefix Supported bit (see > Section 7.8.15) is Set and must be 1h or 2h otherwise. > > In my option, I guess that the spec may mean that v1/v2 is used for the > End-End TLP Prefix Supported bit supported. > > And I find that Kernel and QEmu don't do some special work for it. this > feature may not affect the registers in AER. > > can you give me some AER bits that are defined in v2 ,but not in v1? There are quite a lot. In the uncorrectable error status register bits 21-25 are only defined for v2, same for the mask and severity for this register. Bits 14 & 15 are new for v2 in the correctable error status registers, status and mask. Bits 9-11 are new in the control register. > It's a > > bit strange that for an Intel 82576 NIC I get a v1 AER capability w/o > > aer=on and a v2 with. Thanks, > > > > Yes, I do the test, and get the same result for me. > > I use the following device to test: > > 06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network > Connection (rev 01) > > Firstly, In host, I use the command "lspci -vvv -s 06:00.0": > > [...] > Capabilities: [100 v1] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- > MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- > MalfTLP- ECRC- UnsupReq+ ACSViol- > UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ > MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- > CEMsk: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr+ > AERCap: First Error Pointer: 12, GenCap- CGenEn- ChkCap- ChkEn- > [...] > > Secondly, I pass through the device to the guest. In the guest: > > [...] > Capabilities: [100 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- > MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- > MalfTLP- ECRC- UnsupReq+ ACSViol- > UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ > MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- > CEMsk: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr+ > AERCap: First Error Pointer: 12, GenCap- CGenEn- ChkCap- ChkEn- > [...] > > They look the same, except the capabilities version. > > Thirdly, I drop the patches and do the test. After I launch QEmu, > In guest: > > [...] > Capabilities: [100 v1] Advanced Error Reporting > [...] > > Fourthly, I modify the PCI_ERR_VER: #define PCI_ERR_VER 1. > In guest: > [...] > Capabilities: [100 v1] Advanced Error Reporting > [...] > > They also have the same features. > > At last, I think the v1/v2 is not influence on our AER function. > But, it is actually strange that we can get a v1 AER capability > w/o aer=on and a v2 with. > I suggest that we add a capability version parameter to extend the > pcie_aer_init(),or add a new pcie_aer_v1_init() for the devices with > v1 AER cap. Agreed. Thanks, Alex