Hi Ping,

Looks like a free on thread 2 happens just before the duplicate alloc. Could you try protecting the allocation of a half-open ctx that does not expand the pool with a reader lock, to see if this keeps happening?
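Something along these lines, maybe. This is only an untested sketch of tls_ctx_half_open_alloc in src/vnet/tls/tls.c, and it assumes the free path keeps taking the writer lock, so the reader lock on the non-expanding path is enough to serialize allocs on main against frees coming from the workers:

static u32
tls_ctx_half_open_alloc (void)
{
  tls_main_t *tm = &tls_main;
  u8 will_expand = 0;
  tls_ctx_t *ctx;
  u32 ctx_index;

  pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
  if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
    {
      /* Pool may be reallocated: take the writer lock to exclude all
       * readers while it grows. */
      clib_rwlock_writer_lock (&tm->half_open_rwlock);
      pool_get (tm->half_open_ctx_pool, ctx);
      memset (ctx, 0, sizeof (*ctx));
      ctx_index = ctx - tm->half_open_ctx_pool;
      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
    }
  else
    {
      /* Sketch only: reader lock so the get cannot interleave with a
       * free that holds the writer lock (assumes tls_ctx_half_open_free
       * grabs the writer lock). */
      clib_rwlock_reader_lock (&tm->half_open_rwlock);
      pool_get (tm->half_open_ctx_pool, ctx);
      memset (ctx, 0, sizeof (*ctx));
      ctx_index = ctx - tm->half_open_ctx_pool;
      clib_rwlock_reader_unlock (&tm->half_open_rwlock);
    }
  return ctx_index;
}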
Florin

> On Aug 28, 2018, at 2:28 AM, Yu, Ping <ping...@intel.com> wrote:
>
> Hi, Florin,
>
> Yes, you are right, all alloc operations are performed by thread 0. An interesting thing is that if I run “test echo clients nclients 300 uri tls://10.10.1.1/1111” in the client with 4 threads, I can easily catch a case where the same index is allocated twice by thread 0.
>
> thread 0: alloc: 145
> thread 0: alloc: 69
> thread 4 free: 151
> thread 0: alloc: 151
> thread 2 free: 149
> thread 3 free: 155
> thread 0: alloc: 149
> thread 0: alloc: 149
> thread 0: alloc: 58
> thread 0: alloc: 9
> thread 0: alloc: 29
> thread 3 free: 146
> thread 0: alloc: 146
> thread 2 free: 153
> thread 0: alloc: 144
> thread 0: alloc: 153
> thread 0: alloc: 124
> thread 3 free: 25
> thread 0: alloc: 25
>
> From: Florin Coras [mailto:fcoras.li...@gmail.com]
> Sent: Tuesday, August 28, 2018 10:24 AM
> To: Yu, Ping <ping...@intel.com>
> Cc: Florin Coras (fcoras) <fco...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] TLS half open lock
>
> Hi Ping,
>
> The expectation is that all connects/listens come on the main thread (with the worker barrier held). In other words, we only need to support a one-writer, multiple-readers scenario.
>
> Florin
>
> On Aug 27, 2018, at 6:29 PM, Yu, Ping <ping...@intel.com> wrote:
>
> Hi, Florin,
>
> The check for whether the pool is about to expand is also lockless. Is there any issue if two threads check the pool simultaneously when just one slot is available, so one thread does a normal get while the other thread is expanding the pool?
>
> Thanks
> Ping
>
> From: vpp-dev@lists.fd.io [mailto:vpp-dev@lists.fd.io] On Behalf Of Florin Coras
> Sent: Tuesday, August 28, 2018 12:51 AM
> To: Yu, Ping <ping...@intel.com>
> Cc: Florin Coras (fcoras) <fco...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] TLS half open lock
>
> Hi Ping,
>
> The current implementation only locks the half-open pool if the pool is about to expand. This is done to increase speed by avoiding unnecessary locking, i.e., if the pool is not about to expand, it should be safe to get a new element from it without affecting readers. Now the thing to figure out is why this is happening. Does the slowdown due to the “big lock” avoid some race, or is there more to it?
>
> First of all, how many workers do you have configured and how many sessions are you allocating/connecting? Do you see failed connects?
>
> Your tls_ctx_half_open_get line numbers don’t match my code. Did you by chance modify something else?
>
> Thanks,
> Florin
>
> On Aug 27, 2018, at 9:22 AM, Yu, Ping <ping...@intel.com> wrote:
>
> Hello, all
>
> Recently I found that the TLS half-open lock is not well implemented, and if multiple threads are enabled, there are chances to get the following core dump info in debug mode.
>
> (gdb) where
> #0  0x00007f7a0848e428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x00007f7a0849002a in __GI_abort () at abort.c:89
> #2  0x0000000000407f0b in os_panic () at /home/pyu4/git-home/vpp_clean/vpp/src/vpp/vnet/main.c:331
> #3  0x00007f7a08867bd0 in debugger () at /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:84
> #4  0x00007f7a08868008 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7f7a0a0add78 "%s:%d (%s) assertion `%s' fails") at /home/pyu4/git-home/vpp_clean/vpp/src/vppinfra/error.c:143
> #5  0x00007f7a09e10be0 in tls_ctx_half_open_get (ctx_index=48) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:126
> #6  0x00007f7a09e11889 in tls_session_connected_callback (tls_app_index=0, ho_ctx_index=48, tls_session=0x7f79c9b6d1c0, is_fail=0 '\000') at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tls/tls.c:404
> #7  0x00007f7a09d5ea6e in session_stream_connect_notify (tc=0x7f79c9b655fc, is_fail=0 '\000') at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/session/session.c:648
> #8  0x00007f7a099cb969 in tcp46_syn_sent_inline (vm=0x7f79c8a25100, node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0, is_ip4=1) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2306
> #9  0x00007f7a099cbe00 in tcp4_syn_sent (vm=0x7f79c8a25100, node=0x7f79c9a60500, from_frame=0x7f79c8b2a9c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vnet/tcp/tcp_input.c:2387
> #10 0x00007f7a08fefa35 in dispatch_node (vm=0x7f79c8a25100, node=0x7f79c9a60500, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f79c8b2a9c0, last_time_stamp=902372436923868) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:988
> #11 0x00007f7a08feffee in dispatch_pending_node (vm=0x7f79c8a25100, pending_frame_index=7, last_time_stamp=902372436923868) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1138
> #12 0x00007f7a08ff1bed in vlib_main_or_worker_loop (vm=0x7f79c8a25100, is_main=0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1554
> #13 0x00007f7a08ff240c in vlib_worker_loop (vm=0x7f79c8a25100) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/main.c:1634
> #14 0x00007f7a09035541 in vlib_worker_thread_fn (arg=0x7f79ca4a41c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:1760
> #15 0x00007f7a0888aa38 in clib_calljmp () from /home/pyu4/git-home/vpp_clean/vpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so
> #16 0x00007f7761198d70 in ?? ()
> #17 0x00007f7a090300be in vlib_worker_thread_bootstrap_fn (arg=0x7f79ca4a41c0) at /home/pyu4/git-home/vpp_clean/vpp/src/vlib/threads.c:684
>
> It seems the current code design may have a race condition that lets the same index be returned twice; after one caller frees that index, the other will cause a core dump. I did a simple fix to add a big lock as follows, and this kind of core dump never happens again. Do you guys have a better solution for the lock?
>
> +  clib_rwlock_writer_lock (&tm->half_open_rwlock);
>    pool_get_aligned_will_expand (tm->half_open_ctx_pool, will_expand, 0);
>    if (PREDICT_FALSE (will_expand && vlib_num_workers ()))
>      {
> -      clib_rwlock_writer_lock (&tm->half_open_rwlock);
>        pool_get (tm->half_open_ctx_pool, ctx);
>        memset (ctx, 0, sizeof (*ctx));
>        ctx_index = ctx - tm->half_open_ctx_pool;
> -      clib_rwlock_writer_unlock (&tm->half_open_rwlock);
>      }
>    else
>      {
> @@ -104,6 +103,8 @@ tls_ctx_half_open_alloc (void)
>        memset (ctx, 0, sizeof (*ctx));
>        ctx_index = ctx - tm->half_open_ctx_pool;
>      }
> +  clib_rwlock_writer_unlock (&tm->half_open_rwlock);
>    return ctx_index;
>  }