It isn't present - building the master branch against our regular DPDK package caused the issue to go away.

Thanks,
  John

On 07/11/2016 08:50 PM, Aaron Conole wrote:

Hi John,

Can you try with the latest openvswitch master branch and confirm
whether this issue is present?  The way DPDK is initialized has changed
in the code after 2.5 (nothing with the change has been released yet),
so it would be good to know whether your issue still occurs there.

Thanks,
Aaron

John Phillips <john.philli...@hpe.com> writes:

Hi. I am testing a build of openvswitch with DPDK that we package for
our debian linux distribution as 'openvswitch-switch-dpdk'. It is the
normal debian package, with ovs-vswitchd managed through the debian
alternatives system (<- not too important). We are trying to support
the Intel Niantic and the Mellanox ConnectX3-Pro. We have seen no
issues with the Niantic. With the Mellanox card, however, ovs-vswitchd
fails to add the DPDK ports when it is started from its init script
(the standard init script in the debian/ directory), and I get this:

4f412dee-e2e5-42e5-be7e-dbee94c42652
     Bridge "br0"
         Port "br0"
             Interface "br0"
                 type: internal
         Port "dpdk0"
             Interface "dpdk0"
                 type: dpdk
                 error: "could not open network device dpdk0 (Cannot
allocate memory)"
         Port "dpdk1"
             Interface "dpdk1"
                 type: dpdk
                 error: "could not open network device dpdk1 (Cannot
allocate memory)"
     ovs_version: "2.5.1"



There wasn't anything particularly enlightening in the syslog:

2016-07-11T19:28:38.783Z|00015|dpdk|INFO|Interface dpdk1 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.783Z|00016|dpdk|ERR|Interface dpdk1(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.783Z|00017|bridge|WARN|could not open network
device dpdk1 (Cannot allocate memory)
2016-07-11T19:28:38.784Z|00018|bridge|INFO|bridge br0: added interface
br0 on port 65534
2016-07-11T19:28:38.795Z|00019|dpdk|INFO|Interface dpdk0 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.795Z|00020|dpdk|ERR|Interface dpdk0(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.795Z|00021|bridge|WARN|could not open network
device dpdk0 (Cannot allocate memory)
2016-07-11T19:28:38.795Z|00022|bridge|INFO|bridge br0: using datapath
ID 000036b6cbb99b41
2016-07-11T19:28:38.795Z|00023|connmgr|INFO|br0: added service
controller "punix:/var/run/openvswitch/br0.mgmt"
2016-07-11T19:28:38.888Z|00024|dpdk|INFO|Interface dpdk1 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.888Z|00025|dpdk|ERR|Interface dpdk1(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.888Z|00026|bridge|WARN|could not open network
device dpdk1 (Cannot allocate memory)
2016-07-11T19:28:38.899Z|00027|dpdk|INFO|Interface dpdk0 txq(0) setup
error: Cannot allocate memory
2016-07-11T19:28:38.899Z|00028|dpdk|ERR|Interface dpdk0(rxq:1 txq:1)
configure error: Cannot allocate memory
2016-07-11T19:28:38.899Z|00029|bridge|WARN|could not open network
device dpdk0 (Cannot allocate memory)
2016-07-11T19:28:38.902Z|00030|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.1
2016-07-11T19:28:43.767Z|00031|memory|INFO|247496 kB peak resident set
size after 10.2 seconds
2016-07-11T19:28:43.767Z|00032|memory|INFO|handlers:17 ports:1
revalidators:7 rules:5

This error doesn't occur with the same versions of ovs/dpdk compiled
and run as in INSTALL.DPDK.md. However as I will explain later there
is a difference between the way you run it when testing according to
INSTALL.DPDK.md and doing distribution-type testing.

Since this does not occur with niantic I looked for mellanox log
errors (I compiled the PMD with the DBG option):

# journalctl --full | grep -i mlx
Jul 11 13:27:28 bl460gen9-04 kernel: mlx_compat: module verification
failed: signature and/or required key missing - tainting kernel
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core
driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: device is working in
RoCE mode: Roce V1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs
is not supported by the devicegid_type 0 was chosen instead
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
speed is 8.0GT/s, device supports 8.0GT/s
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
width is x8, device supports x8
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA
Ethernet driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 256 TX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 16 RX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Initializing port
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered
PHC clock
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 256 TX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 16 RX rings
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Initializing port
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5:
renamed from eth3
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4:
renamed from eth2
Jul 11 13:27:28 bl460gen9-04 logger[930]: openibd: start(): Detected
'mlx4_core' loaded with 'log_num_mgm_entry_size=-10' instead of
'log_num_mgm_entry_size=-7' as configured in '', calling stop...
Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: removed PHC
Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core
driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: device is working in
RoCE mode: Roce V1
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs
is not supported by the devicegid_type 0 was chosen instead
Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
speed is 8.0GT/s, device supports 8.0GT/s
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
width is x8, device supports x8
Jul 11 13:27:37 bl460gen9-04 kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib:
Mellanox ConnectX InfiniBand driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0:
mlx4_ib_add: allocated counter index 1 for port 1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0:
mlx4_ib_add: allocated counter index 3 for port 2
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA
Ethernet driver v3.3-1.0.0 (31 May 2016)
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 256 TX rings
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Using 16 RX rings
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
Initializing port
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered
PHC clock
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4:
renamed from eth0
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 256 TX rings
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Using 16 RX rings
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
frag:0 - size:1522 prefix:0 stride:1536
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
Initializing port
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5:
renamed from eth0
Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: eth4: Link Up
Jul 11 13:27:59 bl460gen9-04 logger[1527]: openibd: Set node_desc for
mlx4_0: bl460gen9-04 HCA-1
Jul 11 13:28:38 bl460gen9-04 ovs-vswitchd[2004]: EAL: probe driver:
15b3:1007 librte_pmd_mlx4
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: EAL:   probe
driver: 15b3:1007 librte_pmd_mlx4
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5430: mlx4_pci_devinit():
using driver device index 0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5452: mlx4_pci_devinit():
checking device "mlx4_0"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5463: mlx4_pci_devinit():
PCI information matches, using device "mlx4_0" (VF: false)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5483: mlx4_pci_devinit():
device opened
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5486: mlx4_pci_devinit(): 2
port(s) detected
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit():
using port 1 (00000001)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit():
port 1 is not active: "down" (1)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit():
device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit():
maximum RSS indirection table size: 256
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit():
checksum offloading is supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2
tunnel checksum offloads are supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit():
port 1 MAC address is 24:be:05:c0:d2:a0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit():
port 1 ifname is "eth4"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit():
port 1 MTU is 1500
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit():
forcing Ethernet interface up
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: frag:0 - size:1522
prefix:0 stride:1536
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: Setting RSS
context tunnel type to RSS on inner headers
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit():
using port 2 (00000002)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit():
port 2 is not active: "down" (1)
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit():
device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit():
maximum RSS indirection table size: 256
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit():
checksum offloading is supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2
tunnel checksum offloads are supported
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit():
port 2 MAC address is 24:be:05:c0:d2:a8
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit():
port 2 ifname is "eth5"
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit():
port 2 MTU is 1500
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit():
forcing Ethernet interface up
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: frag:0 - size:1522
prefix:0 stride:1536
Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: Setting RSS
context tunnel type to RSS on inner headers
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure():
0x840248: TX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure():
0x840248: RX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x840248: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 kernel: Modules linked in: tun
openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c crc32c_generic nfsd
auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc rdma_ucm(OE)
ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE)
ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) inet_lro mlx4_ib(OE) ib_sa(OE)
mlx4_en(OE) ib_mad(OE) ptp ib_core(OE) ib_addr(OE) ib_netlink(OE)
pps_core mlx4_core(OE) mlx_compat(OE) x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel vfat fat kvm iTCO_wdt irqbypass
iTCO_vendor_support crc32_pclmul hmac drbg ansi_cprng aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper mgag200 cryptd ttm
pcspkr drm_kms_helper evdev drm sb_edac i2c_algo_bit edac_core
fb_sys_fops syscopyarea sysfillrect sysimgblt lpc_ich i2c_core
mfd_core hpilo hpwdt ioatdma dca wmi ipmi_si ipmi_msghandler
Jul 11 13:28:38 bl460gen9-04 kernel:  pcc_cpufreq acpi_cpufreq
processor acpi_power_meter button knem(OE) autofs4 ext4 crc16 mbcache
jbd2 usb_storage hid_generic usbhid hid sd_mod sg crc32c_intel
xhci_pci hpsa uhci_hcd ehci_pci xhci_hcd ehci_hcd scsi_transport_sas
scsi_mod usbcore be2net usb_common [last unloaded: mlx_compat]
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure():
0x83c200: TX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure():
0x83c200: RX queues number update: 0 -> 1
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x83c200: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x840248: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
0x83c200: configuring queue 0 for 2048 descriptors
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200:
CQ creation failure: Cannot allocate memory
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
cleaning up 0x7ffc112649e0
Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
/build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
0x7ffc112649e0: freeing WRs


This is after rebooting the system. The kicker is what happens if I
launch ovs-vswitchd manually from the shell without --detach:

# ovs-vswitchd --dpdk -c 0x3 -- unix:/var/run/openvswitch/db.sock \
    -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir \
    --log-file=/var/log/openvswitch/ovs-vswitchd.log \
    --pidfile=/var/run/openvswitch/ovs-vswitchd.pid

I get no errors. This is the exact same binary, and the command line
is copied from `ps -ef | grep ovs-vswitchd` after a failed run, minus
the '--monitor --detach' options. The bridge is healthy, at least in
the sense that the logs show nothing bad, 'ovs-vsctl show' has no
'error:' field, and the add-port command below reports no error:

# ovs-vsctl add-port br0 dpdkN -- set interface dpdkN type=dpdk
# ovs-vsctl show
4f412dee-e2e5-42e5-be7e-dbee94c42652
     Bridge "br0"
         Port "br0"
             Interface "br0"
                 type: internal
         Port "dpdk0"
             Interface "dpdk0"
                 type: dpdk
         Port "dpdk1"
             Interface "dpdk1"
                 type: dpdk
     ovs_version: "2.5.1"

So I looked into vswitchd/ovs-vswitchd.c and thought that perhaps the
issue had to do with daemonizing after rte_eal_init() possibly killing
child threads spawned by rte_eal_init (?) and made the following
patch:
Index: openvswitch/vswitchd/ovs-vswitchd.c
===================================================================
--- openvswitch.orig/vswitchd/ovs-vswitchd.c
+++ openvswitch/vswitchd/ovs-vswitchd.c
@@ -58,6 +58,16 @@ static bool want_mlockall;

  static unixctl_cb_func ovs_vswitchd_exit;

+#define DPDK_OPTS_SIZ 2048
+/*
+ * variables/function for saving DPDK options off of the command line,
+ * to run dpdk_init _after_ daemonize is called.
+ */
+char *dpdk_argv[DPDK_OPTS_SIZ];
+int dpdk_argc;
+static int save_dpdk_opts(int argc, char **argv);
+
+
  static char *parse_options(int argc, char *argv[], char **unixctl_path);
  OVS_NO_RETURN static void usage(void);

@@ -71,7 +81,8 @@ main(int argc, char *argv[])
      int retval;

      set_program_name(argv[0]);
-    retval = dpdk_init(argc,argv);
+
+    retval = save_dpdk_opts(argc, argv);
      if (retval < 0) {
          return retval;
      }
@@ -97,6 +108,12 @@ main(int argc, char *argv[])
  #endif
      }

+    retval = dpdk_init(dpdk_argc, dpdk_argv);
+    if (retval < 0) {
+        return retval;
+    }
+
+
      retval = unixctl_server_create(unixctl_path, &unixctl);
      if (retval) {
          exit(EXIT_FAILURE);
@@ -140,6 +157,38 @@ main(int argc, char *argv[])
      return 0;
  }

+
+static int
+save_dpdk_opts(int argc, char *argv[])
+{
+    int i=0;
+
+    memset(dpdk_argv, 0, DPDK_OPTS_SIZ*sizeof(char *));
+    dpdk_argc=0;
+
+    if (strcmp(argv[1], "--dpdk"))
+        return 0;
+
+    dpdk_argv[0] = argv[0];
+    dpdk_argc++;
+
+    for (i = 1; i < argc; i++) {
+        if (!strcmp(argv[i], "--")) {
+            break;
+        }
+        dpdk_argv[i] = argv[i];
+        dpdk_argc++;
+    }
+
+    if (i < 2) {
+        return -1;
+    }
+
+    argv[i] = argv[0];
+
+    return i;
+}
+
  static char *
  parse_options(int argc, char *argv[], char **unixctl_pathp)
  {
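
Editorial aside on the patch above: the argument-saving step can be
sketched on its own with a few extra guards. This is only a sketch, not
the code that was tested in this thread. The names split_dpdk_args and
DPDK_OPTS_MAX are hypothetical. Unlike save_dpdk_opts() it checks argc
before reading argv[1], refuses to write past the output array,
NULL-terminates the result, and does not modify argv.

```c
#include <stddef.h>
#include <string.h>

#define DPDK_OPTS_MAX 64    /* hypothetical cap, not taken from the patch */

/*
 * If argv[1] is "--dpdk", copies argv[0] and every argument before the
 * first "--" into out[] and NULL-terminates it.  Returns the number of
 * arguments stored, 0 if "--dpdk" is absent, or -1 if out[] is too small.
 */
static int
split_dpdk_args(int argc, char *argv[], char *out[], size_t out_cap)
{
    int i;
    int n = 0;

    if (argc < 2 || strcmp(argv[1], "--dpdk") != 0) {
        return 0;               /* nothing to hand to DPDK */
    }
    if (out_cap < 2) {
        return -1;
    }

    out[n++] = argv[0];
    for (i = 1; i < argc && strcmp(argv[i], "--") != 0; i++) {
        if ((size_t) n + 1 >= out_cap) {
            return -1;          /* keep one slot for the NULL terminator */
        }
        out[n++] = argv[i];
    }
    out[n] = NULL;
    return n;
}
```

For the command line shown earlier (ovs-vswitchd --dpdk -c 0x3 -- ...),
this would store four entries: the program name, "--dpdk", "-c" and
"0x3".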


With this patch the error went away: the ports now survive reboots.
Without it, if I launch ovs-vswitchd without --detach, get a good
bridge with dpdk{0,1} ports, and then reboot, I am back in the error
state shown above. I have no idea why this might occur. The dpdk apps
all seem to work fine with the mellanox card, albeit with a very
noticeable lag as they add the ports in (subjective) comparison with
the niantic card. Other than the fact that the patch works, I have no
better evidence for my hypothesis that daemonizing after rte_eal_init
is causing the problem; for now it is just a best guess.
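
Editorial aside: the general behaviour behind this hypothesis is that
fork() duplicates only the calling thread, so threads created before a
fork-based daemonize do not exist in the child. The sketch below
illustrates that with plain POSIX threads. It does not use DPDK or OVS
code, and it does not show that this is what caused the failure
reported here. The names worker, sleep_ms and heartbeat_survives_fork
are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_int heartbeat;    /* incremented by the worker thread */
static atomic_int stop_worker;  /* set to 1 to ask the worker to exit */

static void
sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *
worker(void *arg)
{
    (void) arg;
    while (!atomic_load(&stop_worker)) {
        atomic_fetch_add(&heartbeat, 1);
        sleep_ms(1);
    }
    return NULL;
}

/*
 * Starts a worker thread, forks, and lets the child watch the heartbeat
 * for 100 ms.  Returns 1 if the child saw it advance, 0 if it stayed
 * frozen, and -1 on error.
 */
static int
heartbeat_survives_fork(void)
{
    pthread_t tid;
    pid_t pid;
    int status = 0;
    int result = -1;

    atomic_store(&heartbeat, 0);
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        return -1;
    }
    while (atomic_load(&heartbeat) == 0) {
        sleep_ms(1);            /* wait until the worker is running */
    }

    pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        int before = atomic_load(&heartbeat);
        sleep_ms(100);
        _exit(atomic_load(&heartbeat) != before);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        result = WEXITSTATUS(status);
    }

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return result;
}
```

On a POSIX system heartbeat_survives_fork() is expected to return 0,
because the worker thread is not carried over into the forked child.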


Thanks,
   John







_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
