Hi all,

TL;DR: when running tests vs vpp with multiple workers, roughly 25% of tests fail or crash vpp. It looks like buffer management is still not completely thread safe.

I've pushed a work-in-progress make test modification which runs the tests against both single-thread and multiple-worker vpp. There are quite a few failures and/or coredumps while running against multiple-worker vpp.

These test cases are failing at this time:

    ACLPluginConnTestCase
    BFD4TestCase
    BFDFIBTestCase
    TestDHCP
    Datapath
    DisableFP
    DisableIPFIX
    Flowprobe
    ReenableFP
    ReenableIPFIX
    TestGRE
    TestIPv4FibCrud
    TestIp4VrfMultiInst
    TestIP6VrfMultiInst
    TestL2fib
    TestL2bdArpTerm
    TestL2bdMultiInst
    TestLB
    TestNAT64
    TestSNAT
    TestSpan
    TestVxlanGpe

It seems that there are still some thread safety issues with the buffer management, based on the TestSpan crash:

    #2  0x0000000000406d1e in os_exit (code=code@entry=1)
        at /home/ksekera/vpp/build-data/../src/vpp/vnet/main.c:287
    #3  0x00007f139af0c2fa in unix_signal_handler (signum=<optimized out>, si=<optimized out>, uc=<optimized out>)
        at /home/ksekera/vpp/build-data/../src/vlib/unix/main.c:118
    #4  <signal handler called>
    #5  mheap_put (v=0x7f1356bdf000, uoffset=18446744073709549696)
        at /home/ksekera/vpp/build-data/../src/vppinfra/mheap.c:797
    #6  0x00007f139aeb6574 in vlib_buffer_add_to_free_list (do_init=1 '\001', buffer_index=<optimized out>, f=0x7f1359d0f780, vm=0x7f139b1252e0 <vlib_global_main>)
        at /home/ksekera/vpp/build-data/../src/vlib/buffer_funcs.h:861
    #7  vlib_buffer_free_inline (follow_buffer_next=1, n_buffers=256, buffers=<optimized out>, vm=0x7f139b1252e0 <vlib_global_main>)
        at /home/ksekera/vpp/build-data/../src/vlib/buffer.c:705
    #8  vlib_buffer_free_internal (vm=0x7f139b1252e0 <vlib_global_main>, buffers=0x7f135b504110, n_buffers=<optimized out>)
        at /home/ksekera/vpp/build-data/../src/vlib/buffer.c:730
    #9  0x00007f139aaba427 in vlib_buffer_free (n_buffers=256, buffers=<optimized out>, vm=0x7f139b1252e0 <vlib_global_main>)
        at /home/ksekera/vpp/build-data/../src/vlib/buffer_funcs.h:327
    #10 pg_output (vm=0x7f139b1252e0 <vlib_global_main>, node=<optimized out>, frame=<optimized out>)
        at /home/ksekera/vpp/build-data/../src/vnet/pg/output.c:83
    (gdb)
    #5  mheap_put (v=0x7f1356bdf000, uoffset=18446744073709549696)
        at /home/ksekera/vpp/build-data/../src/vppinfra/mheap.c:797
    797       if (e->n_user_data != n->prev_n_user_data)
    (gdb) p *n
    Cannot access memory at address 0x7f1556bde87c
    (gdb)

Here is the patch set if you want to try it out:

    https://gerrit.fd.io/r/#/c/8090

It's still a bit clunky, as the testing is done in two phases: first the full suite is run vs single-thread vpp, then vs multiple-worker vpp. It's not straightforward to do this in one run (so that instead of running A, B, C vs single and A, B, C vs multi we run A (vs single), A (vs multi), B (vs single), B (vs multi), C (vs single), C (vs multi)), so that's why it's implemented this way for now.

If you want to skip the single-thread tests to speed up your own testing, run it like this:

    env VPP_TEST_SKIP_SINGLE_THREAD=y make test

Currently, the number of worker threads is set to the core count minus two, with a cap of 8. A higher number causes the ACL plugin to freak out (memory allocation failure) and vpp refuses to start, ruining the day for everybody.

Regards,
Klement
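
For reference, the worker-thread rule described in the message (core count minus two, capped at 8) can be sketched roughly as follows. This is only an illustration, not the code from the patch set: the function name `worker_count`, the constant `MAX_WORKERS`, and the clamp to zero on machines with two or fewer cores are all assumptions made for the example.

```python
import multiprocessing

# Cap taken from the message; the constant name itself is hypothetical.
MAX_WORKERS = 8


def worker_count(cores=None):
    """Return the number of worker threads: cores minus two, capped at 8.

    Clamping to zero for small machines is an assumption of this sketch;
    the message does not say what happens with two or fewer cores.
    """
    if cores is None:
        cores = multiprocessing.cpu_count()
    return max(0, min(cores - 2, MAX_WORKERS))
```

With this rule a 4-core machine would get 2 workers, and anything with 10 or more cores would get 8.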