Hi. I am testing a build of Open vSwitch with DPDK that we package for
our Debian Linux distribution as 'openvswitch-switch-dpdk'. It is the
normal Debian package, with ovs-vswitchd managed through the Debian
alternatives system (not too important). We are trying to support the
Intel Niantic and the Mellanox ConnectX-3 Pro. We have seen no issues
with the Niantic. With the Mellanox card, however, ovs-vswitchd fails
to add the DPDK ports when it is started from its init script (the
standard init script in the debian/ directory), and ovs-vsctl show
reports:
4f412dee-e2e5-42e5-be7e-dbee94c42652
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (Cannot allocate memory)"
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                error: "could not open network device dpdk1 (Cannot allocate memory)"
    ovs_version: "2.5.1"
There wasn't anything particularly enlightening in the syslog:
2016-07-11T19:28:38.783Z|00015|dpdk|INFO|Interface dpdk1 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.783Z|00016|dpdk|ERR|Interface dpdk1(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.783Z|00017|bridge|WARN|could not open network
device dpdk1 (Cannot allocate memory)
2016-07-11T19:28:38.784Z|00018|bridge|INFO|bridge br0: added interface
br0 on port 65534
2016-07-11T19:28:38.795Z|00019|dpdk|INFO|Interface dpdk0 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.795Z|00020|dpdk|ERR|Interface dpdk0(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.795Z|00021|bridge|WARN|could not open network
device dpdk0 (Cannot allocate memory)
2016-07-11T19:28:38.795Z|00022|bridge|INFO|bridge br0: using datapath
ID 000036b6cbb99b41
2016-07-11T19:28:38.795Z|00023|connmgr|INFO|br0: added service
controller "punix:/var/run/openvswitch/br0.mgmt"
2016-07-11T19:28:38.888Z|00024|dpdk|INFO|Interface dpdk1 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.888Z|00025|dpdk|ERR|Interface dpdk1(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.888Z|00026|bridge|WARN|could not open network
device dpdk1 (Cannot allocate memory)
2016-07-11T19:28:38.899Z|00027|dpdk|INFO|Interface dpdk0 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.899Z|00028|dpdk|ERR|Interface dpdk0(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.899Z|00029|bridge|WARN|could not open network
device dpdk0 (Cannot allocate memory)
2016-07-11T19:28:38.902Z|00030|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.1
2016-07-11T19:28:43.767Z|00031|memory|INFO|247496 kB peak resident set
size after 10.2 seconds
2016-07-11T19:28:43.767Z|00032|memory|INFO|handlers:17 ports:1
revalidators:7 rules:5
This error doesn't occur with the same versions of OVS and DPDK
compiled and run as described in INSTALL.DPDK.md. However, as I will
explain later, there is a difference between the way you run it when
testing according to INSTALL.DPDK.md and when doing distribution-type
testing.
Since this does not occur with the Niantic, I looked for Mellanox
errors in the logs (I compiled the PMD with the DBG option):
# journalctl --full | grep -i mlx
Jul 11 13:27:28 bl460gen9-04 kernel: mlx_compat: module verification
failed: signature and/or required key missing - tainting kernel
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core
driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: device is working in
RoCE mode: Roce V1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs
is not supported by the devicegid_type 0 was chosen instead
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
speed is 8.0GT/s, device supports 8.0GT/s
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
width is x8, device supports x8
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA
Ethernet driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 256 TX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 16 RX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Initializing port
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered
PHC clock
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 256 TX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 16 RX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Initializing port
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5:
renamed from eth3
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4:
renamed from eth2
Jul 11 13:27:28 bl460gen9-04 logger[930]: openibd: start(): Detected
'mlx4_core' loaded with 'log_num_mgm_entry_size=-10' instead of
'log_num_mgm_entry_size=-7' as configured in '', calling stop...
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: removed PHC
Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core
driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: device is working in
RoCE mode: Roce V1
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs
is not supported by the devicegid_type 0 was chosen instead
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
speed is 8.0GT/s, device supports 8.0GT/s
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
width is x8, device supports x8
Jul 11 13:27:37 bl460gen9-04 kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib:
Mellanox ConnectX InfiniBand driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0:
mlx4_ib_add: allocated counter index 1 for port 1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0:
mlx4_ib_add: allocated counter index 3 for port 2
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA
Ethernet driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 256 TX rings
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 16 RX rings
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Initializing port
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered
PHC clock
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4:
renamed from eth0
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 256 TX rings
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 16 RX rings
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Initializing port
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5:
renamed from eth0
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: eth4: Link Up
Jul 11 13:27:59 bl460gen9-04 logger[1527]: openibd: Set node_desc for
mlx4_0: bl460gen9-04 HCA-1
Jul 11 13:28:38 bl460gen9-04 ovs-vswitchd[2004]: EAL: probe driver:
15b3:1007 librte_pmd_mlx4
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: EAL: probe
driver: 15b3:1007 librte_pmd_mlx4
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5430: mlx4_pci_devinit():
using driver device index 0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5452: mlx4_pci_devinit():
checking device "mlx4_0"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5463: mlx4_pci_devinit():
PCI information matches, using device "mlx4_0" (VF: false)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5483: mlx4_pci_devinit():
device opened
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5486: mlx4_pci_devinit(): 2
port(s) detected
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit():
using port 1 (00000001)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit():
port 1 is not active: "down" (1)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit():
device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit():
maximum RSS indirection table size: 256
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit():
checksum offloading is supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2
tunnel checksum offloads are supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit():
port 1 MAC address is 24:be:05:c0:d2:a0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit():
port 1 ifname is "eth4"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit():
port 1 MTU is 1500
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit():
forcing Ethernet interface up
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: frag:0 - size:1522
prefix:0 stride:1536
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: Setting RSS
context tunnel type to RSS on inner headers
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit():
using port 2 (00000002)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit():
port 2 is not active: "down" (1)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit():
device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit():
maximum RSS indirection table size: 256
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit():
checksum offloading is supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2
tunnel checksum offloads are supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit():
port 2 MAC address is 24:be:05:c0:d2:a8
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit():
port 2 ifname is "eth5"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit():
port 2 MTU is 1500
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit():
forcing Ethernet interface up
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: frag:0 - size:1522
prefix:0 stride:1536
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: Setting RSS
context tunnel type to RSS on inner headers
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure():
0x840248: TX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure():
0x840248: RX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x840248: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 kernel: Modules linked in: tun
openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c crc32c_generic nfsd
auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc rdma_ucm(OE)
ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE)
ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) inet_lro mlx4_ib(OE) ib_sa(OE)
mlx4_en(OE) ib_mad(OE) ptp ib_core(OE) ib_addr(OE) ib_netlink(OE)
pps_core mlx4_core(OE) mlx_compat(OE) x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel vfat fat kvm iTCO_wdt irqbypass
iTCO_vendor_support crc32_pclmul hmac drbg ansi_cprng aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper mgag200 cryptd ttm
pcspkr drm_kms_helper evdev drm sb_edac i2c_algo_bit edac_core
fb_sys_fops syscopyarea sysfillrect sysimgblt lpc_ich i2c_core
mfd_core hpilo hpwdt ioatdma dca wmi ipmi_si ipmi_msghandler
Jul 11 13:28:38 bl460gen9-04 kernel: pcc_cpufreq acpi_cpufreq
processor acpi_power_meter button knem(OE) autofs4 ext4 crc16 mbcache
jbd2 usb_storage hid_generic usbhid hid sd_mod sg crc32c_intel
xhci_pci hpsa uhci_hcd ehci_pci xhci_hcd ehci_hcd scsi_transport_sas
scsi_mod usbcore be2net usb_common [last unloaded: mlx_compat]
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure():
0x83c200: TX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure():
0x83c200: RX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x83c200: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x840248: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x83c200: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
This is after rebooting the system. The kicker is that if I launch
ovs-vswitchd manually from the shell without --detach:
# ovs-vswitchd --dpdk -c 0x3 -- unix:/var/run/openvswitch/db.sock \
    -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir \
    --log-file=/var/log/openvswitch/ovs-vswitchd.log \
    --pidfile=/var/run/openvswitch/ovs-vswitchd.pid
I get no errors. This is the exact same binary, and the command line
is copied from `ps -ef | grep ovs-vswitchd` after a failed run, minus
the '--monitor --detach' options. I then have a happy bridge, at least
in the sense that ovs-vsctl reports no errors and there is nothing bad
in the logs: no 'error:' field in ovs-vsctl show, and no error from
# ovs-vsctl add-port br0 dpdkN -- set interface dpdkN type=dpdk
# ovs-vsctl show
4f412dee-e2e5-42e5-be7e-dbee94c42652
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
    ovs_version: "2.5.1"
So I looked into vswitchd/ovs-vswitchd.c and thought that perhaps the
issue is that daemonizing after rte_eal_init() kills the child threads
spawned by rte_eal_init() (?), so I made the following patch, which
defers dpdk_init() until after daemonizing:
Index: openvswitch/vswitchd/ovs-vswitchd.c
===================================================================
--- openvswitch.orig/vswitchd/ovs-vswitchd.c
+++ openvswitch/vswitchd/ovs-vswitchd.c
@@ -58,6 +58,16 @@ static bool want_mlockall;
static unixctl_cb_func ovs_vswitchd_exit;
+#define DPDK_OPTS_SIZ 2048
+/*
+ * variables/function for saving DPDK options off of the command line,
+ * to run dpdk_init _after_ daemonize is called.
+ */
+char *dpdk_argv[DPDK_OPTS_SIZ];
+int dpdk_argc;
+static int save_dpdk_opts(int argc, char **argv);
+
+
static char *parse_options(int argc, char *argv[], char **unixctl_path);
OVS_NO_RETURN static void usage(void);
@@ -71,7 +81,8 @@ main(int argc, char *argv[])
int retval;
set_program_name(argv[0]);
- retval = dpdk_init(argc,argv);
+
+ retval = save_dpdk_opts(argc, argv);
if (retval < 0) {
return retval;
}
@@ -97,6 +108,12 @@ main(int argc, char *argv[])
#endif
}
+ retval = dpdk_init(dpdk_argc, dpdk_argv);
+ if (retval < 0) {
+ return retval;
+ }
+
+
retval = unixctl_server_create(unixctl_path, &unixctl);
if (retval) {
exit(EXIT_FAILURE);
@@ -140,6 +157,38 @@ main(int argc, char *argv[])
return 0;
}
+
+static int
+save_dpdk_opts(int argc, char *argv[])
+{
+ int i=0;
+
+ memset(dpdk_argv, 0, DPDK_OPTS_SIZ*sizeof(char *));
+ dpdk_argc=0;
+
+ if (strcmp(argv[1], "--dpdk"))
+ return 0;
+
+ dpdk_argv[0] = argv[0];
+ dpdk_argc++;
+
+ for(i=1; i < argc; i++) {
+ if (!strcmp(argv[i], "--")) {
+ break;
+ }
+ dpdk_argv[i] = argv[i];
+ dpdk_argc++;
+ }
+
+ if (i < 2) {
+ return -1;
+ }
+
+ argv[i] = argv[0];
+
+ return i;
+}
+
static char *
parse_options(int argc, char *argv[], char **unixctl_pathp)
{
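A note on save_dpdk_opts() in the patch above: it reads argv[1]
without first checking argc, and it copies into dpdk_argv without
checking against DPDK_OPTS_SIZ. A minimal sketch of a bounds-checked
variant follows. The name split_dpdk_args, its out/out_cap parameters,
and the choice to return -1 when no "--" separator is present are
assumptions made for illustration; they are not part of Open vSwitch
or of the tested patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Open vSwitch. Copies argv[0] and every
 * argument up to (not including) the first "--" into out[], NULL-terminates
 * out[], and returns the number of entries copied. Returns 0 when "--dpdk"
 * is not the first option, and -1 when out[] is too small or no "--"
 * separator is found (that last rule is an assumption of this sketch). */
static int
split_dpdk_args(int argc, char *argv[], char *out[], size_t out_cap)
{
    int i;

    assert(out != NULL && out_cap > 0);
    out[0] = NULL;

    /* Check argc before touching argv[1]. */
    if (argc < 2 || strcmp(argv[1], "--dpdk") != 0) {
        return 0;
    }

    for (i = 0; i < argc; i++) {
        if (i > 0 && strcmp(argv[i], "--") == 0) {
            out[i] = NULL;      /* slot i was kept free by the check below */
            return i;
        }
        /* Keep one slot free for the terminating NULL. */
        if ((size_t) i + 1 >= out_cap) {
            return -1;
        }
        out[i] = argv[i];
    }
    return -1;                  /* no "--" separator found */
}
```

For the command line shown earlier, the helper would copy
"ovs-vswitchd --dpdk -c 0x3" and stop at "--". The patch as posted
above, not this sketch, is what was actually tested.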
With this patch applied, the error miraculously went away, i.e. the
ports stay after reboots. Without it, if I launch ovs-vswitchd without
--detach, get a good bridge with dpdk{0,1} ports, and then reboot, I
get the above error state again. I have no idea why this might occur.
The DPDK apps all seem to work fine with the Mellanox card, albeit
with a very noticeable lag as they add the ports, in (subjective)
comparison with the Niantic card. Other than the fact that the patch
works, I can't find any better evidence to substantiate my hypothesis
that daemonizing after rte_eal_init() is causing the problem, so it is
currently just a best guess.
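
The general mechanism behind that hypothesis can be illustrated on its
own: after fork(), the child process contains only the thread that
called fork(), so any threads started earlier do not run in the child.
The sketch below is a standalone demonstration and uses neither DPDK
nor Open vSwitch. The names worker, ticks, stop_worker and
worker_runs_in_child are made up for illustration. It does not show
that this is what makes CQ creation fail in the mlx4 PMD.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_ulong ticks;      /* bumped by the worker thread */
static atomic_int stop_worker;  /* set by the parent to end the worker */

static void *
worker(void *arg)
{
    (void) arg;
    while (!atomic_load(&stop_worker)) {
        atomic_fetch_add(&ticks, 1);
        poll(NULL, 0, 1);       /* sleep for about 1 ms */
    }
    return NULL;
}

/* Starts a worker thread, forks, and reports whether the worker is still
 * advancing 'ticks' inside the child: 1 if yes, 0 if no, -1 on error. */
static int
worker_runs_in_child(void)
{
    pthread_t tid;
    pid_t pid;
    int status, rc, result = -1;

    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        return -1;
    }
    poll(NULL, 0, 20);          /* give the worker time to start */

    pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        unsigned long before = atomic_load(&ticks);
        poll(NULL, 0, 50);
        _exit(atomic_load(&ticks) != before ? 1 : 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        result = WEXITSTATUS(status);
    }

    /* Parent: the worker is still alive here, so stop and join it. */
    atomic_store(&stop_worker, 1);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void) rc;
    return result;
}
```

Calling worker_runs_in_child() from a small main() and printing the
result is enough to try this; depending on the toolchain it may need to
be built with -pthread.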
Thanks,
John