Hi Ping,

Looks like a free on thread 2 happens just before the duplicate alloc. Could you try protecting the allocation of a half-open ctx that does not expand the pool with a reader lock, to see if this keeps happening?
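Something along these lines, maybe. This is only an untested sketch of tls_ctx_half_open_alloc in src/vnet/tls/tls.c, and it assumes the free path keeps taking the writer lock, so the reader lock on the non-expanding path is enough to serialize allocs on main against frees coming from the workers:

static u32
tls_ctx_half_open_alloc (void)
{
  tls_main_t *tm = &tls_main;
  u8 will_expand = 0;
  tls_ctx_t *ctx;
  u32 ctx_index;

  pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
  if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
    {
      /* Pool may be reallocated: take the writer lock to exclude all
       * readers while it grows. */
      clib_rwlock_writer_lock (&tm->half_open_rwlock);
      pool_get (tm->half_open_ctx_pool, ctx);
      memset (ctx, 0, sizeof (*ctx));
      ctx_index = ctx - tm->half_open_ctx_pool;
      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
    }
  else
    {
      /* Sketch only: reader lock so the get cannot interleave with a
       * free that holds the writer lock (assumes tls_ctx_half_open_free
       * grabs the writer lock). */
      clib_rwlock_reader_lock (&tm->half_open_rwlock);
      pool_get (tm->half_open_ctx_pool, ctx);
      memset (ctx, 0, sizeof (*ctx));
      ctx_index = ctx - tm->half_open_ctx_pool;
      clib_rwlock_reader_unlock (&tm->half_open_rwlock);
    }
  return ctx_index;
}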
Florin

> On Aug 28, 2018, at 2:28 AM, Yu, Ping <ping...@intel.com> wrote:
>
> Hi, Florin,
>
> Yes, you are right, all alloc operations are performed by thread 0. An interesting thing is that if I run “test echo clients nclients 300 uri tls://10.10.1.1/1111” in the client with 4 threads, I can easily catch a case where the same index is allocated twice by thread 0.
>
> thread 0: alloc: 145
> thread 0: alloc: 69
> thread 4 free: 151
> thread 0: alloc: 151
> thread 2 free: 149
> thread 3 free: 155
> thread 0: alloc: 149
> thread 0: alloc: 149
> thread 0: alloc: 58
> thread 0: alloc: 9
> thread 0: alloc: 29
> thread 3 free: 146
> thread 0: alloc: 146
> thread 2 free: 153
> thread 0: alloc: 144
> thread 0: alloc: 153
> thread 0: alloc: 124
> thread 3 free: 25
> thread 0: alloc: 25
>
> From: Florin Coras [mailto:fcoras.li...@gmail.com]
> Sent: Tuesday, August 28, 2018 10:24 AM
> To: Yu, Ping <ping...@intel.com>
> Cc: Florin Coras (fcoras) <fco...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] TLS half open lock
>
> Hi Ping,
>
> The expectation is that all connects/listens come on the main thread (with the worker barrier held). In other words, we only need to support a one-writer, multiple-readers scenario.
>
> Florin
>
> On Aug 27, 2018, at 6:29 PM, Yu, Ping <ping...@intel.com> wrote:
>
> Hi, Florin,
>
> The check for whether the pool is about to expand is also lockless. Is there any issue if two threads check the pool simultaneously when just one slot is available, so one thread does a normal get while the other thread is expanding the pool?
>
> Thanks
> Ping
>
> From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of Florin Coras
> Sent: Tuesday, August 28, 2018 12:51 AM
> To: Yu, Ping <ping...@intel.com>
> Cc: Florin Coras (fcoras) <fco...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] TLS half open lock
>
> Hi Ping,
>
> The current implementation only locks the half-open pool if the pool is about to expand. This is done to increase speed by avoiding unnecessary locking, i.e., if the pool is not about to expand, it should be safe to get a new element from it without affecting readers. Now the thing to figure out is why this is happening. Does the slowdown due to the “big lock” avoid some race, or is there more to it?
>
> First of all, how many workers do you have configured and how many sessions are you allocating/connecting? Do you see failed connects?
>
> Your tls_ctx_half_open_get line numbers don’t match my code. Did you by chance modify something else?
>
> Thanks,
> Florin
>
> On Aug 27, 2018, at 9:22 AM, Yu, Ping <ping...@intel.com> wrote:
>
> Hello, all
>
> Recently I found that the TLS half-open lock is not well implemented, and if multiple threads are enabled, there are chances to get the following core dump info in debug mode.
>
> (gdb) where
> #0  0x00007f7a0848e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007f7a0849002a in __GI_abort () at abort.c:89
> #2  0x0000000000407f0b in os_panic () at /home/pyu4/git-home/vpp_clean/vpp/src/vpp/vnet/main.c:331
> #3  0x00007f7a08867bd0 in debugger () at /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:84
> #4  0x00007f7a08868008 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7f7a0a0add78 "%s:%d (%s) assertion `%s' fails") at /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:143
> #5  0x00007f7a09e10be0 in tls_ctx_half_open_get (ctx_index=48) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:126
> #6  0x00007f7a09e11889 in tls_session_connected_callback (tls_app_index=0, ho_ctx_index=48, tls_session=0x7f79c9b6d1c0, is_fail=0 '\000') at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:404
> #7  0x00007f7a09d5ea6e in session_stream_connect_notify (tc=0x7f79c9b655fc, is_fail=0 '\000') at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/session/session.c:648
> #8  0x00007f7a099cb969 in tcp46_syn_sent_inline (vm=0x7f79c8a25100, node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0, is_ip4=1) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2306
> #9  0x00007f7a099cbe00 in tcp4_syn_sent (vm=0x7f79c8a25100, node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2387
> #10 0x00007f7a08fefa35 in dispatch_node (vm=0x7f79c8a25100, node=0x7f79c9a60500, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f79c8b2a9c0, last_time_stamp=902372436923868) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:988
> #11 0x00007f7a08feffee in dispatch_pending_node (vm=0x7f79c8a25100, pending_frame_index=7, last_time_stamp=902372436923868) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1138
> #12 0x00007f7a08ff1bed in vlib_main_or_worker_loop (vm=0x7f79c8a25100, is_main=0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1554
> #13 0x00007f7a08ff240c in vlib_worker_loop (vm=0x7f79c8a25100) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1634
> #14 0x00007f7a09035541 in vlib_worker_thread_fn (arg=0x7f79ca4a41c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:1760
> #15 0x00007f7a0888aa38 in clib_calljmp () from /home/pyu4/git-home/vpp_clean/vpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so
> #16 0x00007f7761198d70 in ?? ()
> #17 0x00007f7a090300be in vlib_worker_thread_bootstrap_fn (arg=0x7f79ca4a41c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:684
>
> It seems the current code design may have a race condition that lets the same index be returned twice; after one caller frees that index, the other will cause a core dump. I did a simple fix to add a big lock as follows, and this kind of core dump never happens again. Do you guys have a better solution for the lock?
>
> +  clib_rwlock_writer_lock (&tm->half_open_rwlock);
>    pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
>    if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
>      {
> -      clib_rwlock_writer_lock (&tm->half_open_rwlock);
>        pool_get (tm->half_open_ctx_pool, ctx);
>        memset (ctx, 0, sizeof (*ctx));
>        ctx_index = ctx - tm->half_open_ctx_pool;
> -      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
>      }
>    else
>      {
> @@ -104,6 +103,8 @@ tls_ctx_half_open_alloc (void)
>        memset (ctx, 0, sizeof (*ctx));
>        ctx_index = ctx - tm->half_open_ctx_pool;
>      }
> +  clib_rwlock_writer_unlock (&tm->half_open_rwlock);
>    return ctx_index;
>  }