On Mon, Feb 10, 2020 at 8:04 PM Gaetan Rivet <gr...@u256.net> wrote:
>
> Hi Somnath,
>
> Reformating the mails, to keep with the inner-posting (mixing top-posting and 
> inner-posting
> makes it hard to follow). See the end of the mail.
>
>
> >> On Wed, Feb 5, 2020 at 2:47 PM Gaetan Rivet <gr...@u256.net> wrote:
> >>>
> >>> On 05/02/2020 09:52, Somnath Kotur wrote:
> >>>> Hello Gaetan,
> >>>>
> >>>>
> >>>> On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <gr...@u256.net> wrote:
> >>>>>
> >>>>> On 04/02/2020 10:15, Somnath Kotur wrote:
> >>>>>> As per the comments in this code section, "since there is a matching 
> >>>>>> device,
> >>>>>> it is now its responsibility to manage the devargs we've just 
> >>>>>> inserted."
> >>>>>> But the matching device ptr's devargs is still uninitialized or not 
> >>>>>> pointing
> >>>>>> to the newest dev_args that were passed as a parameter to 
> >>>>>> local_dev_probe().
> >>>>>> This is needed particularly in the case when *probe is called again* 
> >>>>>> on an
> >>>>>> already probed device(the parent device for the representor) as part 
> >>>>>> of adding
> >>>>>> a representor port to an OVS switch(OVS-DPDK) like so:
> >>>>>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
> >>>>>> options:dpdk-devargs=0000:06:02.0,representor=[1]
> >>>>>>
> >>>>>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> >>>>>>
> >>>>>> Signed-off-by: Somnath Kotur <somnath.ko...@broadcom.com>
> >>>>>> ---
> >>>>>>     lib/librte_eal/common/eal_common_dev.c | 1 +
> >>>>>>     1 file changed, 1 insertion(+)
> >>>>>>
> >>>>>> diff --git a/lib/librte_eal/common/eal_common_dev.c 
> >>>>>> b/lib/librte_eal/common/eal_common_dev.c
> >>>>>> index 9e4f09d..311eef5 100644
> >>>>>> --- a/lib/librte_eal/common/eal_common_dev.c
> >>>>>> +++ b/lib/librte_eal/common/eal_common_dev.c
> >>>>>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device 
> >>>>>> *dev, const void *_name)
> >>>>>>          * those devargs shouldn't be removed manually anymore.
> >>>>>>          */
> >>>>>>
> >>>>>> +     dev->devargs = da;
> >>>>>>         ret = dev->bus->plug(dev);
> >>>>>>         if (ret > 0)
> >>>>>>                 ret = -ENOTSUP;
> >>>>>>
> >>>>>
> >>>>> Hello Somnath,
> >>>>>
> >>>>> On a surface level, the fix does not seem correct.
> >>>>> The comment
> >>>>>
> >>>>>       "Since there is a matching device, it is now its responsibility
> >>>>>        to manage the devargs we've just inserted. From this point,
> >>>>>        those devargs should'nt be removed manually anymore."
> >>>>>
> >>>>> means that the err_devarg label is not correct, on further error in the
> >>>>> function, returning the error code and cleaning up the device is 
> >>>>> sufficient.
> >>>>>
> >>>>> Setting the devargs for a device is the responsibility of the bus scan 
> >>>>> function.
> >>>>> In the PCI bus for example, this is done in pci_name_set(), called once 
> >>>>> a device name is fully qualified
> >>>>> after scanning the system and thus being able to match a devargs to the 
> >>>>> device name.
> >>>>>
> >>>>> Can you please give more information about the device bus, and maybe 
> >>>>> trace the path taken
> >>>>> by the line "ret = da->bus->scan();" a few lines above your edit? If 
> >>>>> your dev->devargs is not set
> >>>>> afterward, it seems the bug would be there.
> >>>>>
> >>>> Sure, here is the stack trace from the pci_name_set()
> >>>> pci_name_set (dev=0x570eab0) at
> >>>> /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
> >>>> 65                              dev->name, sizeof(dev->name));
> >>>> (gdb) p /x *dev
> >>>> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> >>>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> >>>> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> >>>> = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
> >>>> vendor_id = 0x14e4,
> >>>>       device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> >>>> subsystem_device_id = 0x16d7}.........
> >>>> (gdb) bt
> >>>> #0  pci_name_set (dev=0x570eab0) at
> >>>> /root/dpdk19.11/drivers/bus/pci/pci_common.c:65
> >>>> #1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
> >>>> "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
> >>>> /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
> >>>> #2  0x0000000000458188 in rte_pci_scan () at
> >>>> /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
> >>>> #3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
> >>>> "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
> >>>> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
> >>>> #4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
> >>>> "0000:06:02.0,representor=[1]") at
> >>>> /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
> >>>> #5  0x00000000033d196a in netdev_dpdk_process_devargs
> >>>> (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
> >>>> errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
> >>>> #6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
> >>>> args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
> >>>> #7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
> >>>> args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
> >>>> #8  0x00000000031e9a4e in iface_set_netdev_config
> >>>> (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
> >>>> vswitchd/bridge.c:2015
> >>>> #9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
> >>>> iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
> >>>> errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
> >>>> #10 0x00000000031e9da6 in iface_create (br=0x52801d0,
> >>>> iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
> >>>> #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
> >>>> wanted_ports=0x52802b0, with_requested_port=false) at
> >>>> vswitchd/bridge.c:1164
> >>>> #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
> >>>> wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
> >>>> #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
> >>>> vswitchd/bridge.c:893
> >>>> #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
> >>>> #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
> >>>> vswitchd/ovs-vswitchd.c:127
> >>>>
> >>>>> It could be the bus that is not properly implemented, the devargs name 
> >>>>> not fully qualified, a special path taken by a vdev (I'm not sure what 
> >>>>> the ovsbr0 device is for example), or a specific representor pluging 
> >>>>> function, difficult to pinpoint exactly right now.
> >>>> As you can see , the DBDF along with the 'representor=[1]' is passed
> >>>> all the way down to local_dev_probe(). However when i traced further
> >>>> inside pci_name_set() =>pci_devargs_lookup()
> >>>>
> >>>> static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
> >>>> {
> >>>>           struct rte_devargs *devargs;
> >>>>           struct rte_pci_addr addr;
> >>>>
> >>>>           RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
> >>>>                   devargs->bus->parse(devargs->name, &addr);
> >>>> ======================================> pci_parse() [1]
> >>>>
> >>>>                   if (!rte_pci_addr_cmp(&dev->addr, &addr))
> >>>>                           return devargs;
> >>>>           }
> >>>>           return NULL;
> >>>> }
> >>>>
> >>>> [1]: This is the value of dev_args at this point in time -
> >>>> p *devargs
> >>>> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> >>>> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> >>>> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
> >>>> "representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
> >>>>     bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> >>>> 0x0, data = 0x0}
> >>>>
> >>>> It has the 'representor' keyword still intact, but what gets passed to
> >>>> pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
> >>>> addr=0x7ffdb46ed180) i.e just the original DBDF of the function
> >>>> that was already probed once during OVS-DPDK init... I'm not sure if
> >>>> that is the reason , but eventually it walks the entire for_each loop
> >>>> and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
> >>>> is where the 'devargs' value passed is getting lost
> >>>> Hope that gives what you were looking for ?
> >>>>
> >>>>
> >>>> Thanks
> >>>> Som
> >>>>
> >>>
> >>> Hello,
> >>>
> >>> Yes, thank you for the thorough response.
> >>> Your second device has the BDF 06:02.01 while its parent has 06:02.00. 
> >>> The parent device already matched the devargs and its rte_device should 
> >>> already point to it.
> >>>
> >>> I think it is not correct to force link a parent device devargs to 
> >>> virtual functions spawned from it.
> >>> The devargs is NULL, because there are none that were used to spawn the 
> >>> virtual functions.
> >>>
> >>> Now, can you describe your overarching issue more precisely? You are able 
> >>> to call rte_dev_probe twice on the same virtual function, because the 
> >>> check against it does not work without the devargs being linked? If 
> >>> that's correct, this is the real bug IMO, and there could be other ways 
> >>> to mitigate this.
> >>>
> >>> Thanks
> >>> Gaetan
> >
> > On 07/02/2020 06:52, Somnath Kotur wrote:
> > HI Gaetan,
> >
> >
> > On Thu, Feb 6, 2020 at 9:36 AM Somnath Kotur <somnath.ko...@broadcom.com> 
> > wrote:
> >>
> >> Hello Gaetan,
> >>              Sorry my bad, i think i captured it for the wrong BDF ...
> >> So the overarching goal here is that i want to create/add a
> >> representor port for 06:02.01 using 06:02:00 as the backing/parent
> >> device on the vswitch.
> >>
> >> I recaptured the debug with 06:02:00
> >>
> >> Breakpoint 1, pci_name_set (dev=0x5d44e00) at
> >> /root/dpdk-19.11/drivers/bus/pci/pci_common.c:65
> >> 65                              dev->name, sizeof(dev->name));
> >> (gdb) p /x *dev
> >> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> >> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> >> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> >> = 0x6, devid = 0x2, function = 0x0}, id = {class_id = 0x20000,
> >> vendor_id = 0x14e4,
> >>      device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> >> subsystem_device_id = 0x16d7}
> >>
> >> devargs = pci_devargs_lookup(dev);
> >>
> >>   p *devargs
> >> $5 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> >> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> >> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
> >> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
> >>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> >> 0x0, data = 0x0}
> >>
> >> So as you can see here , this time the devargs is *not NULL* and has
> >> the value of 'representor=[1]'
> >>
> >> local_dev_probe (devargs=0x5cefec0 "0000:06:02.0,representor=[1]",
> >> new_dev=0x7ffd3e872758) at
> >> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:162
> >> 162             dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
> >> (gdb) p *da
> >> $7 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> >> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> >> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
> >> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
> >>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> >> 0x0, data = 0x0}
> >>
> >> (gdb) p *dev
> >> $8 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5c6df00
> >> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
> >> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
> >>
> >> So it appears that 'find_device' is returning NULL devargs ?
> >>
> >> I found that indeed is the case in pci_find_device()
> >>
> >> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> >> data=0x5bb53f8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> >> 426                             return &pdev->device;
> >>
> >> gdb) p pdev->device
> >> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5b14f00
> >> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
> >> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
> >>
> > Any updates on this ?
> > My thoughts on this are just as I'd suspected / suggested earlier the
> > parent device (06:02:0) gets probed once
> > during ovs-vswitchd bring up right after these 2 cmds
> >
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > # Now start ovs-vswitchd (--no-ovsdb-server, since it is already started)
> > ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start
> >
> > After this point the pci probe was invoked on the parent device, but
> > without any devargs hence it is NULL.
> >
> > Now when the 'ovs add-port' cmd is issued to add a representor port
> > for example say 06:02:01 (using 06:02:00 as parent device)
> > ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk
> > options:dpdk-devargs=0000:06:02.0,representor=[1]
> >
> > What happens is
> >
> > *before* pci_name_set()
> > p dev
> > $1 = (struct rte_pci_device *) 0x55bd6b0
> > (gdb) p /x *dev
> > $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> > 0x3a59a00, numa_node = 0x0,
> >      devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2,
> > function = 0x0}.......}
> >
> > *after* pci_name_set()
> >
> > p /x *dev
> > $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> > 0x3a59a00, numa_node = 0x0,
> >      devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2,
> > function = 0x0}......}
> >
> > As you can see, 'devargs' for the pci_device is now populated ..
> >
> > But if you go further down in pci_scan_one()
> >
> > pci_scan_one (dirname=0x7ffea330a5e0
> > "/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0)
> >      at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351
> > 351                                     if 
> > (!rte_dev_is_probed(&dev2->device)) {
> > (gdb) p dev2
> > $5 = (struct rte_pci_device *) 0x54de5e0
> > (gdb) p /x *dev2
> > $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> > {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> > 0x3a6b7f0,
> >      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> > 0x0, bus = 0x6, devid = 0x2, function = 0x0}
> >
> > we hit the pci_device ( 0x54de5e0) that was created during the first
> > time probe of the parent device ?
> > Since it is already probed, it goes to line 381 where it does frees
> > the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'
> >
> > As you can see this pci_device does not have it's devargs set (rightly
> > so as initially there were no devargs for this device)
> >
> > But as shown in the stack trace in my previous reply, when
> > pci_find_device() walks the rte_pci_devices list , it finds this
> > earlier probed device (without devargs)
> >
>
> Alright, I think you are right, this is what is happening.
> The issue IMO, is that the PCI scan is thus hitting an already existing 
> device, but we have
> missed the case where the new device has more info that the previous one 
> (linked devargs).
>
That is correct

> > pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> > data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> > 426                             return &pdev->device;
> > (gdb) p pdev
> > $15 = (struct rte_pci_device *) 0x54de5e0
> >
> > (gdb) p /x *pdev
> > $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> > {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> > 0x3a6b7f0,
> >      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> > 0x0, bus = 0x6, devid = 0x2, function = 0x0},
> >
> > Hope this explains why the pci_device has NULL devargs at this point
> > and how my fix to set it at this point helps solve the problem?
> >
> > Please let me know your thoughts
> >
>
> I think the proper fix is instead to have a clean update during the PCI scan.
> The proper way is to keep the old device, but override its fields as new info 
> was found.

Agreed
>
> Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be 
> updated for consistency.
>
> My issue with your patch is that I think this issue is very specific to the 
> PCI bus and the capabilities
> of some devices on it. It would be better to have a fully compliant scan + 
> probe ops considering
> the supported capabilities, instead of forcing this on all buses.
Sure, so i'm guessing you meant something like this ?

dex 740a2cd..92a41c2 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -377,6 +377,8 @@
                                                 */
                                                RTE_LOG(ERR, EAL,
"Unexpected device scan at %s!\n",
                                                        filename);
+                                       else if (dev2->device.devargs
!= dev->device.devargs)
+                                               pci_name_set(dev2);
                                }
                                free(dev);
                        }

Which is basically checking for devargs mismatch between the 'new'
device and a match with an
already probed device and  return the old device while adjusting the
new info (devargs)
Is that OK?

>
> I'm wondering whether this can happen with an already existing devargs? If 
> so, is it more correct to keep
> the new one, or ignore the probe, free the new devargs and report an error? 
> If the former,
> please clean up properly the devargs using rte_devargs_remove() (before 
> calling pci_name_set() of course).
>
> > Thanks
> > Som
> >
> >
> >
> >
> >> Som
> >>
>
> BR,
> Gaetan

Reply via email to