Hello,

I'm having some trouble with the linux-cp netlink plugin. After building it from the patch set (<https://gerrit.fd.io/r/c/vpp/+/31122>), it correctly receives netlink messages and inserts routes from the Linux kernel table into the VPP FIB. When loading a large number of routes, however (a full IPv4 table), VPP crashes after loading about 400k routes.

It appears to be receiving a SIGABRT that terminates the VPP process:

May 27 06:10:33 pdx1rtr1 vnet[2232]: received signal SIGABRT, PC 0x7fe9b99bdce1
May 27 06:10:33 pdx1rtr1 vnet[2232]: #0 0x00007fe9b9de1a7b 0x7fe9b9de1a7b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #1 0x00007fe9b9d13140 0x7fe9b9d13140
May 27 06:10:33 pdx1rtr1 vnet[2232]: #2 0x00007fe9b99bdce1 gsignal + 0x141
May 27 06:10:33 pdx1rtr1 vnet[2232]: #3 0x00007fe9b99a7537 abort + 0x123
May 27 06:10:33 pdx1rtr1 vnet[2232]: #4 0x000055d43480a1f3 0x55d43480a1f3
May 27 06:10:33 pdx1rtr1 vnet[2232]: #5 0x00007fe9b9c9c8d5 vec_resize_allocate_memory + 0x285
May 27 06:10:33 pdx1rtr1 vnet[2232]: #6 0x00007fe9b9d71feb vlib_validate_combined_counter + 0xdb
May 27 06:10:33 pdx1rtr1 vnet[2232]: #7 0x00007fe9ba4f1e55 load_balance_create + 0x205
May 27 06:10:33 pdx1rtr1 vnet[2232]: #8 0x00007fe9ba4c639d fib_entry_src_mk_lb + 0x38d
May 27 06:10:33 pdx1rtr1 vnet[2232]: #9 0x00007fe9ba4c64a4 fib_entry_src_action_install + 0x44
May 27 06:10:33 pdx1rtr1 vnet[2232]: #10 0x00007fe9ba4c681b fib_entry_src_action_activate + 0x17b
May 27 06:10:33 pdx1rtr1 vnet[2232]: #11 0x00007fe9ba4c3780 fib_entry_create + 0x70
May 27 06:10:33 pdx1rtr1 vnet[2232]: #12 0x00007fe9ba4b9afc fib_table_entry_update + 0x29c
May 27 06:10:33 pdx1rtr1 vnet[2232]: #13 0x00007fe935fcedce 0x7fe935fcedce
May 27 06:10:33 pdx1rtr1 vnet[2232]: #14 0x00007fe935fd2ab5 0x7fe935fd2ab5
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Failed with result 'signal'.
May 27 06:10:33 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Scheduled restart job, restart counter is at 2.
May 27 06:10:34 pdx1rtr1 systemd[1]: Stopped vector packet processing engine.
May 27 06:10:34 pdx1rtr1 systemd[1]: vpp.service: Consumed 12.505s CPU time.
May 27 06:10:34 pdx1rtr1 systemd[1]: Starting vector packet processing engine...
May 27 06:10:34 pdx1rtr1 systemd[1]: Started vector packet processing engine.

Here's what I'm working with:

root@pdx1rtr1:~# uname -a
Linux pdx1rtr1 5.10.0-7-amd64 #1 SMP Debian 5.10.38-1 (2021-05-20) x86_64 GNU/Linux
root@pdx1rtr1:~# vppctl show ver
vpp v21.10-rc0~3-g3f3da0d27 built by nate on altair at 2021-05-27T01:21:58
root@pdx1rtr1:~# bird --version
BIRD version 2.0.7

And some adjusted sysctl params:

net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
vm.nr_hugepages = 1024
vm.max_map_count = 3096
vm.hugetlb_shm_group = 0
kernel.shmmax = 2147483648

In case it's at all helpful, I ran a "sh ip fib sum" every second and restarted BIRD to observe when the routes start processing, and to get the last known fib state before the crash:

Thu May 27 06:10:20 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
   Prefix length         Count
                  0               1
                  4               2
                  8               3
                  9               5
                 10              29
                 11              62
                 12             169
                 13             357
                 14             702
                 15            1140
                 16            7110
                 17            4710
                 18            7763
                 19           13814
                 20           22146
                 21           26557
                 22           51780
                 23           43914
                 24          227173
                 27               1
                 32               6
Thu May 27 06:10:21 UTC 2021
clib_socket_init: connect (fd 3, '/run/vpp/cli.sock'): Connection refused
Thu May 27 06:10:22 UTC 2021
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[default-route:1, ]
   Prefix length         Count
                  0               1
                  4               2
                 32               2
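
For anyone wanting to reproduce the polling above, here is a minimal sketch. It is not the exact script that produced this output. It assumes vppctl is on the PATH and uses the "sh ip fib sum" command quoted earlier. The names VPPCTL_CMD, poll_fib and fib_total are hypothetical and exist only for this illustration. fib_total sums the Count column of a single summary; applied to the last table before the crash it gives 407444, which matches the "about 400k" figure.

```shell
#!/bin/sh
# Sketch only: VPPCTL_CMD, poll_fib and fib_total are hypothetical names.
# The default command is the one quoted in the message above.
VPPCTL_CMD="${VPPCTL_CMD:-vppctl sh ip fib sum}"

# Sum the Count column of one FIB summary read from stdin.
# Only rows consisting of exactly two integers (prefix length, count) are added.
fib_total() {
    awk 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { total += $2 }
         END { print total + 0 }'
}

# Print a UTC timestamp followed by the FIB summary, once per second.
# $1 = number of samples to take; 0 or unset means run until interrupted.
poll_fib() {
    samples="${1:-0}"
    taken=0
    while [ "$samples" -eq 0 ] || [ "$taken" -lt "$samples" ]; do
        date -u
        # stderr is merged so connection errors during a restart are kept too
        $VPPCTL_CMD 2>&1
        taken=$((taken + 1))
        sleep 1
    done
}
```

Running "poll_fib > fib.log" in one terminal while restarting BIRD in another gives a log in the same shape as the output above. Piping a single summary through fib_total gives the total number of routes installed at that sample.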


I'm new to VPP so let me know if there are other logs that would be useful too.

Cheers,
Nate


