Hello all,

I recently found that the TLS half-open locking is not correctly implemented. With multiple worker threads enabled, there is a chance of hitting the following core dump in debug mode:
(gdb) where
#0  0x00007f7a0848e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f7a0849002a in __GI_abort () at abort.c:89
#2  0x0000000000407f0b in os_panic () at /home/pyu4/git-home/vpp_clean/vpp/src/vpp/vnet/main.c:331
#3  0x00007f7a08867bd0 in debugger () at /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:84
#4  0x00007f7a08868008 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7f7a0a0add78 "%s:%d (%s) assertion `%s' fails") at /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:143
#5  0x00007f7a09e10be0 in tls_ctx_half_open_get (ctx_index=48) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:126
#6  0x00007f7a09e11889 in tls_session_connected_callback (tls_app_index=0, ho_ctx_index=48, tls_session=0x7f79c9b6d1c0, is_fail=0 '\000') at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:404
#7  0x00007f7a09d5ea6e in session_stream_connect_notify (tc=0x7f79c9b655fc, is_fail=0 '\000') at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/session/session.c:648
#8  0x00007f7a099cb969 in tcp46_syn_sent_inline (vm=0x7f79c8a25100, node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0, is_ip4=1) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2306
#9  0x00007f7a099cbe00 in tcp4_syn_sent (vm=0x7f79c8a25100, node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2387
#10 0x00007f7a08fefa35 in dispatch_node (vm=0x7f79c8a25100, node=0x7f79c9a60500, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f79c8b2a9c0, last_time_stamp=902372436923868) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:988
#11 0x00007f7a08feffee in dispatch_pending_node (vm=0x7f79c8a25100, pending_frame_index=7, last_time_stamp=902372436923868) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1138
#12 0x00007f7a08ff1bed in vlib_main_or_worker_loop (vm=0x7f79c8a25100, is_main=0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1554
#13 0x00007f7a08ff240c in vlib_worker_loop (vm=0x7f79c8a25100) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1634
#14 0x00007f7a09035541 in vlib_worker_thread_fn (arg=0x7f79ca4a41c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:1760
#15 0x00007f7a0888aa38 in clib_calljmp () from /home/pyu4/git-home/vpp_clean/vpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so
#16 0x00007f7761198d70 in ?? ()
#17 0x00007f7a090300be in vlib_worker_thread_bootstrap_fn (arg=0x7f79ca4a41c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:684

It seems the current code has a race condition: two workers can be handed the same half-open context index, and once one of them frees that index, the other triggers the assertion and core dump above.

I did a simple fix that takes one big lock around the whole allocation, as shown below, and this kind of core dump never happens again. Do you have a better solution for the locking?

+  clib_rwlock_writer_lock (&tm->half_open_rwlock);
   pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
   if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
     {
-      clib_rwlock_writer_lock (&tm->half_open_rwlock);
       pool_get (tm->half_open_ctx_pool, ctx);
       memset (ctx, 0, sizeof (*ctx));
       ctx_index = ctx - tm->half_open_ctx_pool;
-      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
     }
   else
     {
@@ -104,6 +103,8 @@ tls_ctx_half_open_alloc (void)
       memset (ctx, 0, sizeof (*ctx));
       ctx_index = ctx - tm->half_open_ctx_pool;
     }
+  clib_rwlock_writer_unlock (&tm->half_open_rwlock);
   return ctx_index;
 }
View/Reply Online (#10302): https://lists.fd.io/g/vpp-dev/message/10302