On Thu, Aug 04, 2016 at 04:47:25PM +0100, Ferruh Yigit wrote:
> On 8/4/2016 3:54 PM, Igor Ryzhov wrote:
> >
> >> On 4 Aug 2016, at 16:21, Ferruh Yigit <ferruh.yigit at intel.com> wrote:
> >>
> >> On 8/4/2016 12:51 PM, Igor Ryzhov wrote:
> >>> Hello Ferruh,
> >>>
> >>>> On 4 Aug 2016, at 14:33, Ferruh Yigit <ferruh.yigit at intel.com> wrote:
> >>>>
> >>>> Hi Igor,
> >>>>
> >>>> On 8/3/2016 5:58 PM, Igor Ryzhov wrote:
> >>>>> Hello.
> >>>>>
> >>>>> Function rte_eth_dev_attach can return a false positive result.
> >>>>> It happens because rte_eal_pci_probe_one returns zero if no driver
> >>>>> is found for the device:
> >>>>>     ret = pci_probe_all_drivers(dev);
> >>>>>     if (ret < 0)
> >>>>>         goto err_return;
> >>>>>     return 0;
> >>>>> (pci_probe_all_drivers returns 1 in that case)
> >>>>>
> >>>>> For example, it can be easily reproduced by trying to attach a
> >>>>> virtio device managed by a kernel driver.
> >>>>
> >>>> You are right, and I was able to reproduce this issue with virtio
> >>>> as you suggested.
> >>>>
> >>>> But I wonder why rte_eth_dev_get_port_by_addr() is not catching this.
> >>>> Perhaps a dev->attached check needs to be added into this function.
> >>
> >> With a second check, rte_eth_dev_get_port_by_addr() catches it if the
> >> driver is missing.
> >>
> >> But for the virtio case, the problem is not a missing driver.
> >> The problem is that eth_virtio_dev_init() returns a positive value
> >> on failure.
> >>
> >> Call stack is:
> >> rte_eal_pci_probe_one
> >>   pci_probe_all_drivers
> >>     rte_eal_pci_probe_one_driver
> >>       rte_eth_dev_init
> >>         eth_virtio_dev_init
> >>
> >> So rte_eal_pci_probe_one_driver() also returns a positive value, the
> >> same as when no driver is found, and rte_eth_dev_get_port_by_addr()
> >> returns a valid port_id, since rte_eth_dev_init() allocated an eth_dev.
> >>
> >> In short, this can be fixed in the virtio PMD instead of EAL PCI.
> >>
> >>>>
> >>>>>
> >>>>> I think it should be:
> >>>>>     ret = pci_probe_all_drivers(dev);
> >>>>>     if (ret)
> >>>>>         goto err_return;
> >>>>>     return 0;
> >>>>
> >>>> Your proposal looks good to me. Will you send a patch?
> >>>
> >>
> >> The original code silently ignores the case where the driver is
> >> missing for that dev. Although that is still questionable, I think
> >> we can keep this as it is.
> >>
> >>> Patch sent.
> >>
> >> Sorry for this, but can you please test with the following
> >> modification in virtio:
> >> index 07d6449..c74eeee 100644
> >> --- a/drivers/net/virtio/virtio_ethdev.c
> >> +++ b/drivers/net/virtio/virtio_ethdev.c
> >> @@ -1156,7 +1156,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
> >>         if (pci_dev) {
> >>                 ret = vtpci_init(pci_dev, hw, &dev_flags);
> >>                 if (ret)
> >> -                       return ret;
> >> +                       return -1;
> >>         }
> >>
> >>         /* Reset the device although not necessary at startup */
> >
> > I think it's not a good change, because it will break the idea of this
> > patch - http://dpdk.org/browse/dpdk/commit/?id=ac5e1d83
>
> Yes, it breaks this one; I wasn't aware of this patch. But in this patch,
> the commit log says: "return 1 to tell the upper layer we
> don't take over this device." I am not sure the upper layer is designed
> for this.
>
> >
> > Also, with your patch the application will not start, because
> > rte_eal_pci_probe will fail:
> >
> > if (ret < 0)
> >         rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
> >                  " cannot be used\n", dev->addr.domain, dev->addr.bus,
> >                  dev->addr.devid, dev->addr.function);
>
> Yes, it fails, and this looks like intended behavior. This failure is
> correct according to the code.
>
> >
> > And now I think that maybe we should change the way rte_eal_pci_probe
> > works. I think we shouldn't stop the application if just one of the
> > PCI devices is not probed successfully.
>
> Agreed. Overall rte_exit() usage has already been discussed a few times.
>
> I think the best option is:
> - don't exit app if rte_eal_pci_probe() fails, only print an error.

Whether or not the PCI probe exits the app, I think it should signal a
serious error if the probe fails and a device was explicitly whitelisted
on the command line. Given that the user explicitly requested the device,
a failure to use it is probably a problem that the user must fix before
running the app.

/Bruce
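
For reference, the return-value convention under discussion can be
sketched as a small, self-contained C example. This is not DPDK code:
probe_result, attach_status() and startup_status() are hypothetical names
used only for illustration, and the policy in startup_status() is an
assumption that combines the two suggestions above (print an error and
continue for ordinary probe failures, but signal a serious error for an
explicitly whitelisted device).

```c
#include <stdio.h>

/*
 * Hypothetical model of the three-way probe result discussed in this
 * thread: negative = error, zero = a driver claimed the device,
 * positive = no driver took over the device.
 */
enum probe_result {
    PROBE_ERROR = -1,
    PROBE_OK = 0,
    PROBE_NOT_CLAIMED = 1
};

/*
 * Attach path (assumption): any non-zero probe result means there is no
 * usable port, so it is reported as a failure instead of being mapped
 * to 0. This mirrors the proposed "if (ret)" check.
 */
static int attach_status(int probe_ret)
{
    return probe_ret ? -1 : 0;
}

/*
 * Startup scan (assumption): returns 0 to continue and -1 to signal a
 * serious error. An unclaimed or failed device is only reported, unless
 * the user explicitly whitelisted it.
 */
static int startup_status(int probe_ret, int whitelisted)
{
    if (probe_ret == PROBE_OK)
        return 0;

    if (whitelisted) {
        fprintf(stderr, "requested device cannot be used (ret=%d)\n",
                probe_ret);
        return -1;
    }

    if (probe_ret < 0)
        fprintf(stderr, "probe failed (ret=%d), continuing\n", probe_ret);

    return 0;
}
```

With this split, attach_status(PROBE_NOT_CLAIMED) reports a failure
rather than a false positive, while startup_status() keeps the
application running unless the failing device was explicitly requested.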