On 11/25/2020 10:01 AM, David Marchand wrote:
On Tue, Nov 24, 2020 at 4:15 PM Timothy Redaelli <tredae...@redhat.com> wrote:

Commit 49b536fc3060 ("eal: load only shared libs from driver plugin 
directories")
introduced a check that any shared library must ends with .so, but it can't
work, at least, on Fedora/CentOS/RHEL since .so symlinks are not installed
when you install dpdk package, but only when you install dpdk-devel package.

This commit adds also a check for .so.ABI_VERSION to check for shared
lib.

See Fedora Packaging Guidelines for more informations:
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_devel_packages

Fixes: 49b536fc3060 ("eal: load only shared libs from driver plugin 
directories")
Cc: bruce.richard...@intel.com
Signed-off-by: Timothy Redaelli <tredae...@redhat.com>
---
  lib/librte_eal/common/eal_common_options.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index e6f77ad217..e9e2362c53 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -402,8 +402,10 @@ eal_plugindir_init(const char *path)
                 struct stat sb;
                 int nlen = strnlen(dent->d_name, sizeof(dent->d_name));

-               /* check if name ends in .so */
-               if (strcmp(&dent->d_name[nlen - 3], ".so") != 0)
+               /* check if name ends in .so or .so.ABI_VERSION */
+               if (strcmp(&dent->d_name[nlen - 3], ".so") != 0 &&
+                   strcmp(&dent->d_name[nlen - 4 - strlen(ABI_VERSION)],
+                          ".so."ABI_VERSION) != 0)
                         continue;

                 snprintf(sopath, sizeof(sopath), "%s/%s", path, dent->d_name);

A first doubt I had about this patch is about multiple loading of the
same driver, and its effects.
So I first had a look at the main branch current state (using
devtools/test-null.sh) and I can see multiple loading already happens
for statically linked drivers like the bond driver.

$ LD_LIBRARY_PATH=build/lib gdb build/app/dpdk-testpmd -ex "b main"
-ex "run -c 3 --no-huge -m 20 -d build/drivers '--log-level=*:debug'
-a 0:0.0 --vdev net_null1 --vdev net_null2 -- --no-mlockall
--total-num-mbufs=2048 -ia"
...
(gdb) info sharedlibrary
...
0x00007ffff796d5b0  0x00007ffff797d2d6  Yes
/home/dmarchan/dpdk/build/app/../drivers/librte_net_bond.so.21
...
(gdb) b rte_eal_init
(gdb) c
(gdb) finish
...
EAL: open shared lib build/drivers/librte_net_avp.so
EAL: open shared lib build/drivers/librte_net_bond.so
EAL: open shared lib build/drivers/librte_net_ixgbe.so
...
(gdb) set $elm=*(struct shared_driver *)solib_list.tqh_first
(gdb) while $elm
p $elm
set $elm=*(struct shared_driver *)($elm.next.tqe_next)
end
...
$67 = {next = {tqe_next = 0x5c3350, tqe_prev = 0x5c1310}, name =
"build/drivers/librte_net_avp.so", '\000' <repeats 4064 times>,
lib_handle = 0x616870}
$68 = {next = {tqe_next = 0x5c4370, tqe_prev = 0x5c2330}, name =
"build/drivers/librte_net_bond.so", '\000' <repeats 4063 times>,
lib_handle = 0x7ffff7995a80}
$69 = {next = {tqe_next = 0x5c5390, tqe_prev = 0x5c3350}, name =
"build/drivers/librte_net_ixgbe.so", '\000' <repeats 4062 times>,
lib_handle = 0x7ffff7942fc0}
...

I could not confirm the 0x7ffff7995a80 handle points at the first load
of the bond driver.
gdb does not seem to know of the linker internal structures, a bit
hard to tell but the addresses are in the same range, so likely the
same object.

I confirmed the bond constructor only being called once when the bond
driver is loaded because of static link:

$ LD_LIBRARY_PATH=build/lib gdb build/app/dpdk-testpmd -ex "b
vdrvinitfn_pmd_bond_drv" -ex "run -c 3 --no-huge -m 20 -d
build/drivers '--log-level=*:debug' -a 0:0.0 --vdev net_null1 --vdev
net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia"
GNU gdb (GDB) Fedora 8.3.50.20190824-30.fc31
...
Reading symbols from build/app/dpdk-testpmd...
Function "vdrvinitfn_pmd_bond_drv" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (vdrvinitfn_pmd_bond_drv) pending.
Starting program: /home/dmarchan/dpdk/build/app/dpdk-testpmd -c 3
--no-huge -m 20 -d build/drivers '--log-level=*:debug' -a 0:0.0 --vdev
net_null1 --vdev net_null2 -- --no-mlockall --total-num-mbufs=2048 -ia
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".

Breakpoint 1, vdrvinitfn_pmd_bond_drv () at
../drivers/net/bonding/rte_eth_bond_pmd.c:3755
3755    RTE_PMD_REGISTER_VDEV(net_bonding, pmd_bond_drv);
Missing separate debuginfos, use: dnf debuginfo-install
elfutils-libelf-0.179-2.fc31.x86_64 glibc-2.30-13.fc31.x86_64
jansson-2.12-4.fc31.x86_64 libbsd-0.9.1-4.fc31.x86_64
libfdt-1.6.0-1.fc31.x86_64 numactl-libs-2.0.12-3.fc31.x86_64
zlib-1.2.11-20.fc31.x86_64

(gdb) c
Continuing.
...
EAL: open shared lib build/drivers/librte_net_avp.so
EAL: open shared lib build/drivers/librte_net_bond.so
EAL: open shared lib build/drivers/librte_net_ixgbe.so
...
testpmd>


Now, about this patch.
In this test, it has the effect of loading all drivers twice (or
thrice if statically linked).
I see no problem with it since the linker is intelligent enough and
constructors are only called once.


My only remaining doubt is whether we should be looking for
.ABI_VERSION or .ABI_MAJOR extension.
This could be discussed, but in its current form, this patch lgtm.

Acked-by: David Marchand <david.march...@redhat.com>



I also checked the same, yes same file processed twice, with actual name and symlink, but since eal using 'realpath()' before 'dlopen()' the path passed to the 'dlopen()' is exact same for both symlink and actual file.

And 'dlopen()' has following in the man page:
"
If the same shared object is opened again with dlopen(), the same object handle is returned. The dynamic linker maintains reference counts for object handles, so a dynamically loaded shared object is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it. Constructors (see below) are called only when the object is actually loaded into memory (i.e., when the reference count increases to 1).
"

As I run/debug with or without the symlinks, it looks OK with patch,
Tested-by: Ferruh Yigit <ferruh.yi...@intel.com>

Reply via email to