On 10/19/2017 3:43 PM, Patil, Harish wrote: > -----Original Message----- > From: Harish Patil <harish.pa...@cavium.com> > Date: Tuesday, October 17, 2017 at 9:50 PM > To: Thomas Monjalon <tho...@monjalon.net>, Ferruh Yigit > <ferruh.yi...@intel.com> > Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng....@intel.com>, > Jingjing Wu <jingjing...@intel.com>, "Thotton, Shijith" > <shijith.thot...@cavium.com>, Gregory Etelson <greg...@weka.io>, George > Prekas <george.pre...@epfl.ch>, "sta...@dpdk.org" <sta...@dpdk.org> > Subject: Re: [PATCH] igb_uio: revert open and release operations >> >> >> -----Original Message----- >> From: Thomas Monjalon <tho...@monjalon.net> >> Date: Tuesday, October 17, 2017 at 1:33 PM >> To: Ferruh Yigit <ferruh.yi...@intel.com>, Harish Patil >> <harish.pa...@cavium.com> >> Cc: "dev@dpdk.org" <dev@dpdk.org>, Jianfeng Tan <jianfeng....@intel.com>, >> Jingjing Wu <jingjing...@intel.com>, "Thotton, Shijith" >> <shijith.thot...@cavium.com>, Gregory Etelson <greg...@weka.io>, George >> Prekas <george.pre...@epfl.ch>, "sta...@dpdk.org" <sta...@dpdk.org> >> Subject: Re: [PATCH] igb_uio: revert open and release operations >> >>> 17/10/2017 22:14, Ferruh Yigit: >>>> There were bug reports about terminated application may leave device in >>>> undesired state: >>>> http://dpdk.org/ml/archives/dev/2016-November/049745.html >>>> http://dpdk.org/ml/archives/dev/2016-November/050932.html >>>> >>>> And a proposal to fix: >>>> http://dpdk.org/ml/archives/dev/2016-December/051844.html >>>> >>>> Later another proposal triggered the discussion: >>>> http://dpdk.org/ml/archives/dev/2017-May/066317.html >>>> >>>> Finally a fix patch pushed into v17.08: >>>> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of >>>> device file") >>>> >>>> Later a regression report sent related to the pushed patch: >>>> http://dpdk.org/ml/archives/dev/2017-September/075236.html >>>> >>>> And a fix for regression integrated into v17.11-rc1: >>>> http://dpdk.org/ml/archives/dev/2017-October/079166.html >>>> Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in >>>> VM") >>>> Commit: 6b9ed026a870 ("igb_uio: fix build with kernel <= 3.17") >>>> >>>> Even after the fix qede PMD reported to be broken: >>>> http://dpdk.org/ml/archives/dev/2017-October/079359.html >>>> >>>> So this patch reverts original fix and related commits. The related >>>> igb_uio code part turns back to v17.05 base. >>> [...] >>>> --- >>>> It would be nice to solve this issue in LTS release, but being close to >>>> the release and the error report without details makes it hard to work >>>> more on this issue. >>> >>> With this revert, we are back to the original issue. >>> We must really try the proposed solution. >>> Harish, please describe your issue and think how it could be fixed. >>> Jingjing made it work for i40e. >>> I know it is less effort to request a simple revert. >>> Please let's try to fix it once for all. >> >> Hi Ferruh/Thomas, >> I’m discussing it internally, so please hold on committing this patch till >> I revert back to you. >> >> Thanks. > >> [Harish] > >> 1) With the introduction of: >> Commit: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of >> device file”) > We saw failures with qede PF & SR-IOV VF initialization in PCI passthru > mode. > >> >> PF PCI passthru mode initialization failure was resolved by: >> “Commit: 5f6ff30dc507 ("igb_uio: fix interrupt enablement after FLR in >> VM")
Thank you for the update. > >> SR-IOV VF PCI passthru mode initialization issue is that PCI FLR and >> related device cleanup is not completed by the time VF driver starts >> loading. It results in the mbox command failure sent over the HW channel >> between VF and PF. This seems same reason why i40e added a check and wait loop. >> >> Even though pci_reset_function() waits for the stipulated amount of time >> per standards, VF FLR takes longer than that and pci_reset_function() & >> igb_uio_open() call returns before FLR completes and VF PMD driver tries >> to load before FLR completes leading to VF PMD initialization failure. >> >> >> We can work around this problem by adding driver delay/retry logic since >> there is no deterministic way of detecting FLR completion. But this is >> going to increase the driver load time. >> > >> 2) With the above patch ("igb_uio: issue FLR during open and release of >> device file), FLR is going to be issued unconditionally on all devices >> during igb_uio_open. We think it’s an over kill. FLR is required only for >> previous abnormal app termination. We already handle the abnormal app >> termination by doing necessary cleanup in the driver during load. This >> cleanup is more efficient as it is done only when required. So we feel >> that the drivers/devices needing such cleanup (the two cases listed >> below) should do it conditionally when required rather than >> igb_uio_open() unconditionally performing FLR all the time. >> >> e.g., > - cdb166963cae ("net/liquidio: add API for VF FLR”) Both 1) and 2) related to the pci_reset during open(). But the main functionality we are looking for is the pci_reset in release(). So we can remove reset during open() [1]. Will disabling pci_reset in open() [2] solve your problems? Jingjing, Jianfeng, What do you think? [1] Perhaps will need to revert or partially revert liquidio patch after this. [2] disable line "pci_reset_function(dev);" in the igbuio_pci_open() http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/igb_uio/igb_uio.c#n339 >> >> >> - http://dpdk.org/ml/archives/dev/2017-May/066317.html > > Thanks, > Harish >> >