On Mon, Feb 10, 2020 at 8:04 PM Gaetan Rivet <gr...@u256.net> wrote: > > Hi Somnath, > > Reformating the mails, to keep with the inner-posting (mixing top-posting and > inner-posting > makes it hard to follow). See the end of the mail. > > > >> On Wed, Feb 5, 2020 at 2:47 PM Gaetan Rivet <gr...@u256.net> wrote: > >>> > >>> On 05/02/2020 09:52, Somnath Kotur wrote: > >>>> Hello Gaetan, > >>>> > >>>> > >>>> On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <gr...@u256.net> wrote: > >>>>> > >>>>> On 04/02/2020 10:15, Somnath Kotur wrote: > >>>>>> As per the comments in this code section, "since there is a matching > >>>>>> device, > >>>>>> it is now its responsibility to manage the devargs we've just > >>>>>> inserted." > >>>>>> But the matching device ptr's devargs is still uninitialized or not > >>>>>> pointing > >>>>>> to the newest dev_args that were passed as a parameter to > >>>>>> local_dev_probe(). > >>>>>> This is needed particularly in the case when *probe is called again* > >>>>>> on an > >>>>>> already probed device(the parent device for the representor) as part > >>>>>> of adding > >>>>>> a representor port to an OVS switch(OVS-DPDK) like so: > >>>>>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \ > >>>>>> options:dpdk-devargs=0000:06:02.0,representor=[1] > >>>>>> > >>>>>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove") > >>>>>> > >>>>>> Signed-off-by: Somnath Kotur <somnath.ko...@broadcom.com> > >>>>>> --- > >>>>>> lib/librte_eal/common/eal_common_dev.c | 1 + > >>>>>> 1 file changed, 1 insertion(+) > >>>>>> > >>>>>> diff --git a/lib/librte_eal/common/eal_common_dev.c > >>>>>> b/lib/librte_eal/common/eal_common_dev.c > >>>>>> index 9e4f09d..311eef5 100644 > >>>>>> --- a/lib/librte_eal/common/eal_common_dev.c > >>>>>> +++ b/lib/librte_eal/common/eal_common_dev.c > >>>>>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device > >>>>>> *dev, const void *_name) > >>>>>> * those devargs shouldn't be removed manually anymore. > >>>>>> */ > >>>>>> > >>>>>> + dev->devargs = da; > >>>>>> ret = dev->bus->plug(dev); > >>>>>> if (ret > 0) > >>>>>> ret = -ENOTSUP; > >>>>>> > >>>>> > >>>>> Hello Somnath, > >>>>> > >>>>> On a surface level, the fix does not seem correct. > >>>>> The comment > >>>>> > >>>>> "Since there is a matching device, it is now its responsibility > >>>>> to manage the devargs we've just inserted. From this point, > >>>>> those devargs should'nt be removed manually anymore." > >>>>> > >>>>> means that the err_devarg label is not correct, on further error in the > >>>>> function, returning the error code and cleaning up the device is > >>>>> sufficient. > >>>>> > >>>>> Setting the devargs for a device is the responsibility of the bus scan > >>>>> function. > >>>>> In the PCI bus for example, this is done in pci_name_set(), called once > >>>>> a device name is fully qualified > >>>>> after scanning the system and thus being able to match a devargs to the > >>>>> device name. > >>>>> > >>>>> Can you please give more information about the device bus, and maybe > >>>>> trace the path taken > >>>>> by the line "ret = da->bus->scan();" a few lines above your edit? If > >>>>> your dev->devargs is not set > >>>>> afterward, it seems the bug would be there. > >>>>> > >>>> Sure, here is the stack trace from the pci_name_set() > >>>> pci_name_set (dev=0x570eab0) at > >>>> /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65 > >>>> 65 dev->name, sizeof(dev->name)); > >>>> (gdb) p /x *dev > >>>> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next = > >>>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus = > >>>> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus > >>>> = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000, > >>>> vendor_id = 0x14e4, > >>>> device_id = 0x16dc, subsystem_vendor_id = 0x14e4, > >>>> subsystem_device_id = 0x16d7}......... > >>>> (gdb) bt > >>>> #0 pci_name_set (dev=0x570eab0) at > >>>> /root/dpdk19.11/drivers/bus/pci/pci_common.c:65 > >>>> #1 0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230 > >>>> "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at > >>>> /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305 > >>>> #2 0x0000000000458188 in rte_pci_scan () at > >>>> /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488 > >>>> #3 0x00000000004b9530 in local_dev_probe (devargs=0x527cc10 > >>>> "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at > >>>> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158 > >>>> #4 0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10 > >>>> "0000:06:02.0,representor=[1]") at > >>>> /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227 > >>>> #5 0x00000000033d196a in netdev_dpdk_process_devargs > >>>> (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]", > >>>> errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798 > >>>> #6 0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880, > >>>> args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921 > >>>> #7 0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880, > >>>> args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494 > >>>> #8 0x00000000031e9a4e in iface_set_netdev_config > >>>> (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at > >>>> vswitchd/bridge.c:2015 > >>>> #9 0x00000000031e9bd1 in iface_do_create (br=0x52801d0, > >>>> iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8, > >>>> errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049 > >>>> #10 0x00000000031e9da6 in iface_create (br=0x52801d0, > >>>> iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100 > >>>> #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0, > >>>> wanted_ports=0x52802b0, with_requested_port=false) at > >>>> vswitchd/bridge.c:1164 > >>>> #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0, > >>>> wanted_ports=0x52802b0) at vswitchd/bridge.c:1180 > >>>> #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at > >>>> vswitchd/bridge.c:893 > >>>> #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324 > >>>> #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at > >>>> vswitchd/ovs-vswitchd.c:127 > >>>> > >>>>> It could be the bus that is not properly implemented, the devargs name > >>>>> not fully qualified, a special path taken by a vdev (I'm not sure what > >>>>> the ovsbr0 device is for example), or a specific representor pluging > >>>>> function, difficult to pinpoint exactly right now. > >>>> As you can see , the DBDF along with the 'representor=[1]' is passed > >>>> all the way down to local_dev_probe(). However when i traced further > >>>> inside pci_name_set() =>pci_devargs_lookup() > >>>> > >>>> static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev) > >>>> { > >>>> struct rte_devargs *devargs; > >>>> struct rte_pci_addr addr; > >>>> > >>>> RTE_EAL_DEVARGS_FOREACH("pci", devargs) { > >>>> devargs->bus->parse(devargs->name, &addr); > >>>> ======================================> pci_parse() [1] > >>>> > >>>> if (!rte_pci_addr_cmp(&dev->addr, &addr)) > >>>> return devargs; > >>>> } > >>>> return NULL; > >>>> } > >>>> > >>>> [1]: This is the value of dev_args at this point in time - > >>>> p *devargs > >>>> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>}, > >>>> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name > >>>> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00 > >>>> "representor=[1]", drv_str = 0x56eec00 "representor=[1]"}, > >>>> bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str = > >>>> 0x0, data = 0x0} > >>>> > >>>> It has the 'representor' keyword still intact, but what gets passed to > >>>> pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0", > >>>> addr=0x7ffdb46ed180) i.e just the original DBDF of the function > >>>> that was already probed once during OVS-DPDK init... I'm not sure if > >>>> that is the reason , but eventually it walks the entire for_each loop > >>>> and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this > >>>> is where the 'devargs' value passed is getting lost > >>>> Hope that gives what you were looking for ? > >>>> > >>>> > >>>> Thanks > >>>> Som > >>>> > >>> > >>> Hello, > >>> > >>> Yes, thank you for the thorough response. > >>> Your second device has the BDF 06:02.01 while its parent has 06:02.00. > >>> The parent device already matched the devargs and its rte_device should > >>> already point to it. > >>> > >>> I think it is not correct to force link a parent device devargs to > >>> virtual functions spawned from it. > >>> The devargs is NULL, because there are none that were used to spawn the > >>> virtual functions. > >>> > >>> Now, can you describe your overarching issue more precisely? You are able > >>> to call rte_dev_probe twice on the same virtual function, because the > >>> check against it does not work without the devargs being linked? If > >>> that's correct, this is the real bug IMO, and there could be other ways > >>> to mitigate this. > >>> > >>> Thanks > >>> Gaetan > > > > On 07/02/2020 06:52, Somnath Kotur wrote: > > HI Gaetan, > > > > > > On Thu, Feb 6, 2020 at 9:36 AM Somnath Kotur <somnath.ko...@broadcom.com> > > wrote: > >> > >> Hello Gaetan, > >> Sorry my bad, i think i captured it for the wrong BDF ... > >> So the overarching goal here is that i want to create/add a > >> representor port for 06:02.01 using 06:02:00 as the backing/parent > >> device on the vswitch. > >> > >> I recaptured the debug with 06:02:00 > >> > >> Breakpoint 1, pci_name_set (dev=0x5d44e00) at > >> /root/dpdk-19.11/drivers/bus/pci/pci_common.c:65 > >> 65 dev->name, sizeof(dev->name)); > >> (gdb) p /x *dev > >> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next = > >> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus = > >> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus > >> = 0x6, devid = 0x2, function = 0x0}, id = {class_id = 0x20000, > >> vendor_id = 0x14e4, > >> device_id = 0x16dc, subsystem_vendor_id = 0x14e4, > >> subsystem_device_id = 0x16d7} > >> > >> devargs = pci_devargs_lookup(dev); > >> > >> p *devargs > >> $5 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>}, > >> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name > >> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0 > >> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"}, > >> bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str = > >> 0x0, data = 0x0} > >> > >> So as you can see here , this time the devargs is *not NULL* and has > >> the value of 'representor=[1]' > >> > >> local_dev_probe (devargs=0x5cefec0 "0000:06:02.0,representor=[1]", > >> new_dev=0x7ffd3e872758) at > >> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:162 > >> 162 dev = da->bus->find_device(NULL, cmp_dev_name, da->name); > >> (gdb) p *da > >> $7 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>}, > >> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name > >> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0 > >> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"}, > >> bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str = > >> 0x0, data = 0x0} > >> > >> (gdb) p *dev > >> $8 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5c6df00 > >> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00 > >> <rte_pci_bus>, numa_node = 0, devargs = 0x0} > >> > >> So it appears that 'find_device' is returning NULL devargs ? > >> > >> I found that indeed is the case in pci_find_device() > >> > >> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>, > >> data=0x5bb53f8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426 > >> 426 return &pdev->device; > >> > >> gdb) p pdev->device > >> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5b14f00 > >> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00 > >> <rte_pci_bus>, numa_node = 0, devargs = 0x0} > >> > > Any updates on this ? > > My thoughts on this are just as I'd suspected / suggested earlier the > > parent device (06:02:0) gets probed once > > during ovs-vswitchd bring up right after these 2 cmds > > > > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true > > # Now start ovs-vswitchd (--no-ovsdb-server, since it is already started) > > ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start > > > > After this point the pci probe was invoked on the parent device, but > > without any devargs hence it is NULL. > > > > Now when the 'ovs add-port' cmd is issued to add a representor port > > for example say 06:02:01 (using 06:02:00 as parent device) > > ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk > > options:dpdk-devargs=0000:06:02.0,representor=[1] > > > > What happens is > > > > *before* pci_name_set() > > p dev > > $1 = (struct rte_pci_device *) 0x55bd6b0 > > (gdb) p /x *dev > > $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next = > > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus = > > 0x3a59a00, numa_node = 0x0, > > devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2, > > function = 0x0}.......} > > > > *after* pci_name_set() > > > > p /x *dev > > $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next = > > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus = > > 0x3a59a00, numa_node = 0x0, > > devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2, > > function = 0x0}......} > > > > As you can see, 'devargs' for the pci_device is now populated .. > > > > But if you go further down in pci_scan_one() > > > > pci_scan_one (dirname=0x7ffea330a5e0 > > "/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0) > > at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351 > > 351 if > > (!rte_dev_is_probed(&dev2->device)) { > > (gdb) p dev2 > > $5 = (struct rte_pci_device *) 0x54de5e0 > > (gdb) p /x *dev2 > > $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device = > > {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver = > > 0x3a6b7f0, > > bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = > > 0x0, bus = 0x6, devid = 0x2, function = 0x0} > > > > we hit the pci_device ( 0x54de5e0) that was created during the first > > time probe of the parent device ? > > Since it is already probed, it goes to line 381 where it does frees > > the just allocated 'pci_device' inside pci_scan_one() via 'free(dev)' > > > > As you can see this pci_device does not have it's devargs set (rightly > > so as initially there were no devargs for this device) > > > > But as shown in the stack trace in my previous reply, when > > pci_find_device() walks the rte_pci_devices list , it finds this > > earlier probed device (without devargs) > > > > Alright, I think you are right, this is what is happening. > The issue IMO, is that the PCI scan is thus hitting an already existing > device, but we have > missed the case where the new device has more info that the previous one > (linked devargs). > That is correct
> > pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>, > > data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426 > > 426 return &pdev->device; > > (gdb) p pdev > > $15 = (struct rte_pci_device *) 0x54de5e0 > > > > (gdb) p /x *pdev > > $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device = > > {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver = > > 0x3a6b7f0, > > bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = > > 0x0, bus = 0x6, devid = 0x2, function = 0x0}, > > > > Hope this explains why the pci_device has NULL devargs at this point > > and how my fix to set it at this point helps solve the problem? > > > > Please let me know your thoughts > > > > I think the proper fix is instead to have a clean update during the PCI scan. > The proper way is to keep the old device, but override its fields as new info > was found. Agreed > > Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be > updated for consistency. > > My issue with your patch is that I think this issue is very specific to the > PCI bus and the capabilities > of some devices on it. It would be better to have a fully compliant scan + > probe ops considering > the supported capabilities, instead of forcing this on all buses. Sure, so i'm guessing you meant something like this ? dex 740a2cd..92a41c2 100644 --- a/drivers/bus/pci/linux/pci.c +++ b/drivers/bus/pci/linux/pci.c @@ -377,6 +377,8 @@ */ RTE_LOG(ERR, EAL, "Unexpected device scan at %s!\n", filename); + else if (dev2->device.devargs != dev->device.devargs) + pci_name_set(dev2); } free(dev); } Which is basically checking for devargs mismatch between the 'new' device and a match with an already probed device and return the old device while adjusting the new info (devargs) Is that OK? > > I'm wondering whether this can happen with an already existing devargs? If > so, is it more correct to keep > the new one, or ignore the probe, free the new devargs and report an error? > If the former, > please clean up properly the devargs using rte_devargs_remove() (before > calling pci_name_set() of course). > > > Thanks > > Som > > > > > > > > > >> Som > >> > > BR, > Gaetan