Hi vpp-dev,

I'm hitting an assert in vlib_next_frame_change_ownership (vlib/main.c):

 ASSERT (vec_len (node->next_nodes) == node_runtime->n_next_nodes);

I added some code just before the assert to see what was going on:

 #ifdef CLIB_ASSERT_ENABLE
   if (vec_len (node->next_nodes) != node_runtime->n_next_nodes)
     {
       clib_warning ("%s: in %s: vec_len (node->next_nodes) %u != %u node_runtime->n_next_nodes)",
                     __FUNCTION__, node->name, vec_len (node->next_nodes),
                     node_runtime->n_next_nodes);
       for (int i = 0; i < vec_len (node->next_nodes); i++)
         clib_warning ("%s: next %u: %u %s", __FUNCTION__, i,
                       node->next_nodes[i],
                       vlib_get_node (vm, node->next_nodes[i])->name);
     }
 #endif
   ASSERT (vec_len (node->next_nodes) == node_runtime->n_next_nodes);

And this was the output:

 2019-09-17 12:21:18.375523: 1: vlib_next_frame_change_ownership:293: vlib_next_frame_change_ownership: in ip4-arp: vec_len (node->next_nodes) 3 != 2 node_runtime->n_next_nodes)
 2019-09-17 12:21:18.375600: 1: vlib_next_frame_change_ownership:296: vlib_next_frame_change_ownership: next 0: 595 error-drop
 2019-09-17 12:21:18.375630: 1: vlib_next_frame_change_ownership:296: vlib_next_frame_change_ownership: next 1: 611 UnknownEthernet1-output
 2019-09-17 12:21:18.375654: 1: vlib_next_frame_change_ownership:296: vlib_next_frame_change_ownership: next 2: 0 null-node
 2019-09-17 12:21:18.375678: 1: /var/build/mb-build/openwrt-dd/build_dir/target-aarch64_cortex-a53+neon-vfpv4_glibc-2.22/vpp-19.04.2/src/vlib/main.c:300 (vlib_next_frame_change_ownership) assertion `vec_len (node->next_nodes) == node_runtime->n_next_nodes' fails

The use case is that I've got an ipsec tunnel that has *just* been brought up
(i.e., admin up). I immediately start sending traffic on it from a worker
thread (polling output routine) to the other end of the tunnel, and thus also
receive from the other end, which is doing the same thing. Depending on the
order in which the tunnel endpoint interfaces are brought up, I may or may not
hit the above, but it happens most of the time on at least one endpoint.

In any case, is this hitting some sort of race condition with node/graph
construction? I ask because I would not expect a node's next array to ever be
larger than the node runtime's count of the same, yet here it is, apparently
only for some short period of time.

Thanks,
Chris.
