Hi. I am testing a build of Open vSwitch with DPDK that we package for our Debian Linux distribution as 'openvswitch-switch-dpdk'. It is the normal Debian package, with ovs-vswitchd managed through the Debian alternatives system (not too important here). We are trying to support the Intel Niantic and the Mellanox ConnectX-3 Pro. We have seen no issues with the Niantic. With the Mellanox card, however, ovs-vswitchd fails to add the DPDK ports when it is started from its init script (the standard init script in the debian/ directory). I get this:

4f412dee-e2e5-42e5-be7e-dbee94c42652
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (Cannot allocate memory)"
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                error: "could not open network device dpdk1 (Cannot allocate memory)"
    ovs_version: "2.5.1"



There wasn't anything particularly enlightening in the syslog:

2016-07-11T19:28:38.783Z|00015|dpdk|INFO|Interface dpdk1 txq(0) setup error: Cannot allocate memory
2016-07-11T19:28:38.783Z|00016|dpdk|ERR|Interface dpdk1(rxq:1 txq:1) configure error: Cannot allocate memory
2016-07-11T19:28:38.783Z|00017|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2016-07-11T19:28:38.784Z|00018|bridge|INFO|bridge br0: added interface br0 on port 65534
2016-07-11T19:28:38.795Z|00019|dpdk|INFO|Interface dpdk0 txq(0) setup error: Cannot allocate memory
2016-07-11T19:28:38.795Z|00020|dpdk|ERR|Interface dpdk0(rxq:1 txq:1) configure error: Cannot allocate memory
2016-07-11T19:28:38.795Z|00021|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2016-07-11T19:28:38.795Z|00022|bridge|INFO|bridge br0: using datapath ID 000036b6cbb99b41
2016-07-11T19:28:38.795Z|00023|connmgr|INFO|br0: added service controller "punix:/var/run/openvswitch/br0.mgmt"
2016-07-11T19:28:38.888Z|00024|dpdk|INFO|Interface dpdk1 txq(0) setup error: Cannot allocate memory
2016-07-11T19:28:38.888Z|00025|dpdk|ERR|Interface dpdk1(rxq:1 txq:1) configure error: Cannot allocate memory
2016-07-11T19:28:38.888Z|00026|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2016-07-11T19:28:38.899Z|00027|dpdk|INFO|Interface dpdk0 txq(0) setup error: Cannot allocate memory
2016-07-11T19:28:38.899Z|00028|dpdk|ERR|Interface dpdk0(rxq:1 txq:1) configure error: Cannot allocate memory
2016-07-11T19:28:38.899Z|00029|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2016-07-11T19:28:38.902Z|00030|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.1
2016-07-11T19:28:43.767Z|00031|memory|INFO|247496 kB peak resident set size after 10.2 seconds
2016-07-11T19:28:43.767Z|00032|memory|INFO|handlers:17 ports:1 revalidators:7 rules:5

This error doesn't occur with the same versions of OVS/DPDK compiled and run as described in INSTALL.DPDK.md. However, as I explain below, there is a difference between running it the way INSTALL.DPDK.md describes and doing distribution-type testing.

Since this does not occur with the Niantic, I looked for Mellanox errors in the logs (I compiled the PMD with the DBG option):

# journalctl --full | grep -i mlx
Jul 11 13:27:28 bl460gen9-04 kernel: mlx_compat: module verification failed: signature and/or required key missing - tainting kernel
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: device is working in RoCE mode: Roce V1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link width is x8, device supports x8
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA Ethernet driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: Using 256 TX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: Using 16 RX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: Initializing port
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered PHC clock
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: Using 256 TX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: Using 16 RX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: Initializing port
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5: renamed from eth3
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4: renamed from eth2
Jul 11 13:27:28 bl460gen9-04 logger[930]: openibd: start(): Detected 'mlx4_core' loaded with 'log_num_mgm_entry_size=-10' instead of 'log_num_mgm_entry_size=-7' as configured in '', calling stop...
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: removed PHC
Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: device is working in RoCE mode: Roce V1
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link width is x8, device supports x8
Jul 11 13:27:37 bl460gen9-04 kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: mlx4_ib_add: allocated counter index 1 for port 1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: mlx4_ib_add: allocated counter index 3 for port 2
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA Ethernet driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: Using 256 TX rings
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: Using 16 RX rings
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1: Initializing port
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered PHC clock
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4: renamed from eth0
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: Using 256 TX rings
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: Using 16 RX rings
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2: Initializing port
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5: renamed from eth0
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: eth4: Link Up
Jul 11 13:27:59 bl460gen9-04 logger[1527]: openibd: Set node_desc for mlx4_0: bl460gen9-04 HCA-1
Jul 11 13:28:38 bl460gen9-04 ovs-vswitchd[2004]: EAL: probe driver: 15b3:1007 librte_pmd_mlx4
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: EAL: probe driver: 15b3:1007 librte_pmd_mlx4
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5430: mlx4_pci_devinit(): using driver device index 0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5452: mlx4_pci_devinit(): checking device "mlx4_0"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5463: mlx4_pci_devinit(): PCI information matches, using device "mlx4_0" (VF: false)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5483: mlx4_pci_devinit(): device opened
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5486: mlx4_pci_devinit(): 2 port(s) detected
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit(): using port 1 (00000001)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit(): port 1 is not active: "down" (1)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit(): device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit(): maximum RSS indirection table size: 256
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit(): checksum offloading is supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2 tunnel checksum offloads are supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit(): port 1 MAC address is 24:be:05:c0:d2:a0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit(): port 1 ifname is "eth4"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit(): port 1 MTU is 1500
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit(): forcing Ethernet interface up
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: Setting RSS context tunnel type to RSS on inner headers
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit(): using port 2 (00000002)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit(): port 2 is not active: "down" (1)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit(): device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit(): maximum RSS indirection table size: 256
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit(): checksum offloading is supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2 tunnel checksum offloads are supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit(): port 2 MAC address is 24:be:05:c0:d2:a8
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit(): port 2 ifname is "eth5"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit(): port 2 MTU is 1500
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit(): forcing Ethernet interface up
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: Setting RSS context tunnel type to RSS on inner headers
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure(): 0x840248: TX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure(): 0x840248: RX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup(): 0x840248: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 kernel: Modules linked in: tun openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c crc32c_generic nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) inet_lro mlx4_ib(OE) ib_sa(OE) mlx4_en(OE) ib_mad(OE) ptp ib_core(OE) ib_addr(OE) ib_netlink(OE) pps_core mlx4_core(OE) mlx_compat(OE) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel vfat fat kvm iTCO_wdt irqbypass iTCO_vendor_support crc32_pclmul hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper mgag200 cryptd ttm pcspkr drm_kms_helper evdev drm sb_edac i2c_algo_bit edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt lpc_ich i2c_core mfd_core hpilo hpwdt ioatdma dca wmi ipmi_si ipmi_msghandler
Jul 11 13:28:38 bl460gen9-04 kernel: pcc_cpufreq acpi_cpufreq processor acpi_power_meter button knem(OE) autofs4 ext4 crc16 mbcache jbd2 usb_storage hid_generic usbhid hid sd_mod sg crc32c_intel xhci_pci hpsa uhci_hcd ehci_pci xhci_hcd ehci_hcd scsi_transport_sas scsi_mod usbcore be2net usb_common [last unloaded: mlx_compat]
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248: CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup(): cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts(): 0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure(): 0x83c200: TX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure(): 0x83c200: RX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup(): 0x83c200: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200: CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup(): cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts(): 0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup(): 0x840248: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248: CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup(): cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts(): 0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup(): 0x83c200: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200: CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup(): cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts(): 0x7ffc112649e0: freeing WRs


This is after rebooting the system. The kicker is that if I launch ovs-vswitchd manually from the shell, without --detach:

# ovs-vswitchd --dpdk -c 0x3 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid

I get no errors. This is the exact same binary, and the command line is copied from `ps -ef | grep ovs-vswitchd` after a failed run, minus the '--monitor --detach' options. I have a happy bridge, at least in the sense that ovs-vsctl reports no errors and there is nothing bad in the logs: there is no 'error:' field in the ovs-vsctl show output, and there is no error from:

# ovs-vsctl add-port br0 dpdkN -- set interface dpdkN type=dpdk
# ovs-vsctl show
4f412dee-e2e5-42e5-be7e-dbee94c42652
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
    ovs_version: "2.5.1"

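So I looked at vswitchd/ovs-vswitchd.c and wondered whether the issue is that ovs-vswitchd daemonizes after rte_eal_init() has already run, which might kill the threads spawned by rte_eal_init() (?), since after a fork() only the calling thread exists in the child. I made the following patch, which saves the DPDK options and calls dpdk_init() only after daemonizing: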
Index: openvswitch/vswitchd/ovs-vswitchd.c
===================================================================
--- openvswitch.orig/vswitchd/ovs-vswitchd.c
+++ openvswitch/vswitchd/ovs-vswitchd.c
@@ -58,6 +58,16 @@ static bool want_mlockall;

 static unixctl_cb_func ovs_vswitchd_exit;

+#define DPDK_OPTS_SIZ 2048
+/*
+ * variables/function for saving DPDK options off of the command line,
+ * to run dpdk_init _after_ daemonize is called.
+ */
+static char *dpdk_argv[DPDK_OPTS_SIZ];
+static int dpdk_argc;
+static int save_dpdk_opts(int argc, char **argv);
+
+
 static char *parse_options(int argc, char *argv[], char **unixctl_path);
 OVS_NO_RETURN static void usage(void);

@@ -71,7 +81,8 @@ main(int argc, char *argv[])
     int retval;

     set_program_name(argv[0]);
-    retval = dpdk_init(argc,argv);
+
+    retval = save_dpdk_opts(argc, argv);
     if (retval < 0) {
         return retval;
     }
@@ -97,6 +108,12 @@ main(int argc, char *argv[])
 #endif
     }

+    retval = dpdk_init(dpdk_argc, dpdk_argv);
+    if (retval < 0) {
+        return retval;
+    }
+
+
     retval = unixctl_server_create(unixctl_path, &unixctl);
     if (retval) {
         exit(EXIT_FAILURE);
@@ -140,6 +157,38 @@ main(int argc, char *argv[])
     return 0;
 }

+
+static int
+save_dpdk_opts(int argc, char *argv[])
+{
+    int i = 0;
+
+    memset(dpdk_argv, 0, DPDK_OPTS_SIZ * sizeof(char *));
+    dpdk_argc = 0;
+
+    if (argc < 2 || strcmp(argv[1], "--dpdk"))
+        return 0;
+
+    dpdk_argv[0] = argv[0];
+    dpdk_argc++;
+
+    for (i = 1; i < argc && i < DPDK_OPTS_SIZ - 1; i++) {
+        if (!strcmp(argv[i], "--")) {
+            break;
+        }
+        dpdk_argv[i] = argv[i];
+        dpdk_argc++;
+    }
+
+    if (i < 2) {
+      return -1;
+    }
+
+    argv[i] = argv[0];
+
+    return i;
+}
+
 static char *
 parse_options(int argc, char *argv[], char **unixctl_pathp)
 {
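To make the argument handling in save_dpdk_opts() easier to check in isolation, here is a hypothetical standalone version of the same split. split_dpdk_args() is a name made up for this sketch; it is not an OVS or DPDK function. It copies argv[0] and everything before the first "--" into a separate, NULL-terminated array for the EAL, and returns the index of that "--" so that main() can skip past the EAL options, which is what the patch relies on when it later calls dpdk_init() with the saved arguments. Like the patch, it assumes the DPDK options start with "--dpdk" in argv[1].

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone version of the split done by save_dpdk_opts():
 * copy argv[0] and every argument before the first "--" into eal_argv
 * (at most eal_max - 1 entries, then a NULL terminator), store the count
 * in *eal_argc and return the index where copying stopped (the "--").
 * Returns 0, with *eal_argc == 0, when argv[1] is not "--dpdk". */
static int
split_dpdk_args(int argc, char *argv[], char *eal_argv[], int eal_max,
                int *eal_argc)
{
    int i;

    *eal_argc = 0;
    if (argc < 2 || strcmp(argv[1], "--dpdk") != 0 || eal_max < 2) {
        return 0;
    }

    eal_argv[(*eal_argc)++] = argv[0];      /* program name first */
    for (i = 1; i < argc && *eal_argc < eal_max - 1; i++) {
        if (strcmp(argv[i], "--") == 0) {
            break;                          /* the rest belongs to OVS */
        }
        eal_argv[(*eal_argc)++] = argv[i];
    }
    eal_argv[*eal_argc] = NULL;
    return i;
}
```

For the command line above (ovs-vswitchd --dpdk -c 0x3 -- unix:...), the saved EAL vector is "ovs-vswitchd --dpdk -c 0x3" and the returned index is 4, i.e. the position of the "--".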


And it miraculously made the error go away, i.e. the ports survive reboots. Normally, if I launch ovs-vswitchd without --detach, get a good bridge with dpdk{0,1} ports and then reboot, I get the error state above again. I have no idea why this happens. The DPDK apps all seem to work fine with the Mellanox card, albeit with a very noticeable lag while the ports are added, in (subjective) comparison with the Niantic card. Apart from the fact that the patch works, I can't find any better evidence for my hypothesis that daemonizing after rte_eal_init() causes the problem; for now it is just a best guess.
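The part of the hypothesis that is easy to demonstrate is the POSIX fork() rule: the child process gets a copy of only the thread that called fork(). Daemonizing with --detach involves a fork(), so any threads rte_eal_init() started beforehand (e.g. the EAL lcore threads) would not exist in the daemon. The sketch below shows only that rule, using a plain pthread; it does not show that this is what breaks the mlx4 PMD. thread_survives_fork() and its heartbeat counter are made-up names for this illustration.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Incremented by the worker thread roughly once per millisecond. */
static atomic_int heartbeat;

static void *
worker(void *arg)
{
    (void) arg;
    for (;;) {
        atomic_fetch_add(&heartbeat, 1);
        poll(NULL, 0, 1);       /* sleep ~1 ms; also a cancellation point */
    }
    return NULL;
}

/* Start a thread, fork(), and check inside the child whether the
 * thread's heartbeat still advances there.  Returns 1 if the thread kept
 * running in the child, 0 if it did not, -1 on error. */
static int
thread_survives_fork(void)
{
    pthread_t tid;
    pid_t pid;
    int status, result = -1;

    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        return -1;
    }
    poll(NULL, 0, 20);          /* let the worker start ticking */

    pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() was duplicated, so
         * nothing should be incrementing heartbeat here.  Only
         * async-signal-safe calls are used after fork(). */
        int before = atomic_load(&heartbeat);
        poll(NULL, 0, 50);
        _exit(atomic_load(&heartbeat) != before);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        result = WEXITSTATUS(status);
    }

    pthread_cancel(tid);        /* stop the worker in the parent */
    pthread_join(tid, NULL);
    return result;
}
```

On Linux, thread_survives_fork() should return 0: the counter keeps advancing in the parent but stays frozen in the child, because the worker thread was never copied.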


Thanks,
  John







_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
