On 10/02/2020 15:34, Gaetan Rivet wrote:
Hi Somnath,

Reformating the mails, to keep with the inner-posting (mixing top-posting and 
inner-posting
makes it hard to follow). See the end of the mail.



[...]

Any updates on this ?
My thoughts on this are just as I'd suspected / suggested earlier the
parent device (06:02:0) gets probed once
during ovs-vswitchd bring up right after these 2 cmds

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
# Now start ovs-vswitchd (--no-ovsdb-server, since it is already started)
ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start

After this point the pci probe was invoked on the parent device, but
without any devargs hence it is NULL.

Now when the 'ovs add-port' cmd is issued to add a representor port
for example say 06:02:01 (using 06:02:00 as parent device)
ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk
options:dpdk-devargs=0000:06:02.0,representor=[1]

What happens is

*before* pci_name_set()
p dev
$1 = (struct rte_pci_device *) 0x55bd6b0
(gdb) p /x *dev
$2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
{tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
0x3a59a00, numa_node = 0x0,
     devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2,
function = 0x0}.......}

*after* pci_name_set()

p /x *dev
$3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
{tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
0x3a59a00, numa_node = 0x0,
     devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2,
function = 0x0}......}

As you can see, 'devargs' for the pci_device is now populated ..

But if you go further down in pci_scan_one()

pci_scan_one (dirname=0x7ffea330a5e0
"/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0)
     at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351
351                                     if (!rte_dev_is_probed(&dev2->device)) {
(gdb) p dev2
$5 = (struct rte_pci_device *) 0x54de5e0
(gdb) p /x *dev2
$6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
{next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
0x3a6b7f0,
     bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
0x0, bus = 0x6, devid = 0x2, function = 0x0}

we hit the pci_device ( 0x54de5e0) that was created during the first
time probe of the parent device ?
Since it is already probed, it goes to line 381 where it does frees
the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'

As you can see this pci_device does not have it's devargs set (rightly
so as initially there were no devargs for this device)

But as shown in the stack trace in my previous reply, when
pci_find_device() walks the rte_pci_devices list , it finds this
earlier probed device (without devargs)


Alright, I think you are right, this is what is happening.
The issue IMO, is that the PCI scan is thus hitting an already existing device, 
but we have
missed the case where the new device has more info that the previous one 
(linked devargs).

pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
426                             return &pdev->device;
(gdb) p pdev
$15 = (struct rte_pci_device *) 0x54de5e0

(gdb) p /x *pdev
$16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
{next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
0x3a6b7f0,
     bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
0x0, bus = 0x6, devid = 0x2, function = 0x0},

Hope this explains why the pci_device has NULL devargs at this point
and how my fix to set it at this point helps solve the problem?

Please let me know your thoughts


I think the proper fix is instead to have a clean update during the PCI scan.
The proper way is to keep the old device, but override its fields as new info 
was found.

Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be 
updated for consistency.

My issue with your patch is that I think this issue is very specific to the PCI 
bus and the capabilities
of some devices on it. It would be better to have a fully compliant scan + 
probe ops considering
the supported capabilities, instead of forcing this on all buses.

I'm wondering whether this can happen with an already existing devargs? If so, 
is it more correct to keep
the new one, or ignore the probe, free the new devargs and report an error? If 
the former,
please clean up properly the devargs using rte_devargs_remove() (before calling 
pci_name_set() of course).

Unfortunately, this could probably happen with some devargs. In particular, the 
rte_devargs_insert() function will test whether devargs name are matching 
before inserting a new devargs. There is an ambiguity with the BDF / DBDF 
notation for the PCI devices in particular, which could lead to devargs already 
existing for some devices, but with a new one being inserted.

The fix would be to have the bus parse the device, and compare the binary blob 
resulting from this parsing.
The current bus->parse() API might help, but is currently too limited (no way 
to know how many bytes long the result is).

This is fastidious to have to write this because some leeway was allowed to the 
PCI back in the day. For now at least, mind this possibility when writing your 
fix, those above considerations could maybe be worked out with future API 
evolutions.

Reply via email to