Thank you Florin!

On Tue, Nov 24, 2020 at 8:54 PM Florin Coras <fcoras.li...@gmail.com> wrote:

> Hi Bill,
>
> Ah, you must manually enable “life cycle” events since those are
> responsible for track initialization, i.e., tcp debug group 0 level 1. To
> avoid the hassle, here’s a patch [1] that enables them whenever events for
> another group are enabled.
>
> Regards,
> Florin
>
> [1] https://gerrit.fd.io/r/c/vpp/+/30116
>
> On Nov 24, 2020, at 7:35 AM, Bill Vaughan <billvaug...@gmail.com> wrote:
>
> Thanks for the feedback Florin. I am just now getting back to this.
>
> Perhaps I am doing something wrong to enable the TCP debug, but it looks
> pretty simple.
>
> I enable the flags in tcp_debug.h:
>
> diff --git a/src/vnet/tcp/tcp_debug.h b/src/vnet/tcp/tcp_debug.h
> index f4e7c3971..8806d8e14 100644
> --- a/src/vnet/tcp/tcp_debug.h
> +++ b/src/vnet/tcp/tcp_debug.h
> @@ -23,14 +23,14 @@
>   * debug configuration. Comes with some overhead so it's not recommended
> for
>   * production/performance scenarios. Takes priority over TCP_DEBUG_ENABLE.
>   */
> -#define TCP_DEBUG_ALWAYS (0)
> +#define TCP_DEBUG_ALWAYS (1)
>  /**
>   * Build debugging infra only if enabled. Debug components controlled via
>   * macros that follow.
>   */
>  #define TCP_DEBUG_ENABLE (0)
>
> -#define TCP_DEBUG_SM (0)
> +#define TCP_DEBUG_SM (1)
>  #define TCP_DEBUG_CC (0)
>  #define TCP_DEBUG_CS (0)
>  #define TCP_DEBUG_LC (0 || TCP_DEBUG_SM || TCP_DEBUG_CC || TCP_DEBUG_CS)
>
>
> Do a full rebuild (I have tried incremental and full).
>
> Then enable TCP debug via the CLI:
>
> spoke1# tcp debug ?
>   tcp debug                                tcp [show] [debug group <N>
> level <N>]
> spoke1# tcp debug show
> Index     Group                         Level
> 0         life cycle                    0
> 1         state machine                 0
> 2         congestion control            0
> 3         cc stats                      0
> spoke1# tcp debug group 1 level 10
> spoke1# tcp debug show
> Index     Group                         Level
> 0         life cycle                    0
> 1         state machine                 10
> 2         congestion control            0
> 3         cc stats                      0
>
> <---- Start traffic ---->
>
> spoke1# /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.c:208
> (elog_track_register) assertion `t->name' fails
>
> Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) up
> #1  0x00007ffff46c68b1 in __GI_abort () at abort.c:79
> 79 abort.c: No such file or directory.
> (gdb)
> #2  0x0000000000407113 in os_panic () at
> /home/bvaughan/espace/grid_vpp/src/vpp/vnet/main.c:368
> 368  abort ();
> (gdb)
> #3  0x00007ffff55a5ad9 in debugger () at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/error.c:84
> 84  os_panic ();
> (gdb)
> #4  0x00007ffff55a5857 in _clib_error (how_to_die=2, function_name=0x0,
> line_number=0,
>     fmt=0x7ffff56753af "%s:%d (%s) assertion `%s' fails")
>     at /home/bvaughan/espace/grid_vpp/src/vppinfra/error.c:143
> 143    debugger ();
> (gdb)
> #5  0x00007ffff5598e54 in elog_track_register (em=0x7ffff6432f40
> <vlib_global_main+960>, t=0x7fffbd40e260)
>     at /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.c:208
> 208  ASSERT (t->name);
> (gdb) p /x *t
> *$1 = {name = 0x0, track_index_plus_one = 0x3}*
> (gdb) where
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007ffff46c68b1 in __GI_abort () at abort.c:79
> #2  0x0000000000407113 in os_panic () at
> /home/bvaughan/espace/grid_vpp/src/vpp/vnet/main.c:368
> #3  0x00007ffff55a5ad9 in debugger () at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/error.c:84
> #4  0x00007ffff55a5857 in _clib_error (how_to_die=2, function_name=0x0,
> line_number=0, fmt=0x7ffff56753af "%s:%d (%s) assertion `%s' fails") at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/error.c:143
> #5  0x00007ffff5598e54 in elog_track_register (em=0x7ffff6432f40
> <vlib_global_main+960>, t=0x7fffbd40e260) at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.c:208
> #6  0x00007ffff5597f49 in elog_event_data_inline (em=0x7ffff6432f40
> <vlib_global_main+960>, type=0x7ffff7b34be8 <tcp_send_synack._e>,
> track=0x7fffbd40e260, cpu_time=2715086657921106) at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.h:312
> #7  0x00007ffff5597e5d in elog_event_data (em=0x7ffff6432f40
> <vlib_global_main+960>, type=0x7ffff7b34be8 <tcp_send_synack._e>,
> track=0x7fffbd40e260, cpu_time=2715086657921106) at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.c:68
> #8  0x00007ffff707a9e0 in elog_event_data_not_inline (em=0x7ffff6432f40
> <vlib_global_main+960>, type=0x7ffff7b34be8 <tcp_send_synack._e>,
> track=0x7fffbd40e260, cpu_time=2715086657921106) at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.h:358
> #9  0x00007ffff706d135 in elog_data (em=0x7ffff6432f40
> <vlib_global_main+960>, type=0x7ffff7b34be8 <tcp_send_synack._e>,
> track=0x7fffbd40e260) at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/elog.h:428
> #10 0x00007ffff706db40 in tcp_send_synack (tc=0x7fffbd40e200) at
> /home/bvaughan/espace/grid_vpp/src/vnet/tcp/tcp_output.c:856
> #11 0x00007ffff7082cd3 in tcp46_listen_inline (vm=0x7ffff6432b80
> <vlib_global_main>, node=0x7fffb61bbe80, from_frame=0x7fffbd3b7a40,
> is_ip4=1) at /home/bvaughan/espace/grid_vpp/src/vnet/tcp/tcp_input.c:2641
> #12 0x00007ffff70826ba in tcp4_listen_node_fn (vm=0x7ffff6432b80
> <vlib_global_main>, node=0x7fffb61bbe80, from_frame=0x7fffbd3b7a40) at
> /home/bvaughan/espace/grid_vpp/src/vnet/tcp/tcp_input.c:2667
> #13 0x00007ffff6178d36 in dispatch_node (vm=0x7ffff6432b80
> <vlib_global_main>, node=0x7fffb61bbe80, type=VLIB_NODE_TYPE_INTERNAL,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fffbd3b7a40,
> last_time_stamp=2715086657730646) at
> /home/bvaughan/espace/grid_vpp/src/vlib/main.c:1194
> #14 0x00007ffff6179817 in dispatch_pending_node (vm=0x7ffff6432b80
> <vlib_global_main>, pending_frame_index=6,
> last_time_stamp=2715086657730646) at
> /home/bvaughan/espace/grid_vpp/src/vlib/main.c:1353
> #15 0x00007ffff6173400 in vlib_main_or_worker_loop (vm=0x7ffff6432b80
> <vlib_global_main>, is_main=1) at
> /home/bvaughan/espace/grid_vpp/src/vlib/main.c:1846
> #16 0x00007ffff617518a in vlib_main_loop (vm=0x7ffff6432b80
> <vlib_global_main>) at /home/bvaughan/espace/grid_vpp/src/vlib/main.c:1974
> #17 0x00007ffff6174f57 in vlib_main (vm=0x7ffff6432b80 <vlib_global_main>,
> input=0x7fffb6bf3fa8) at /home/bvaughan/espace/grid_vpp/src/vlib/main.c:2220
> #18 0x00007ffff61fd84e in thread0 (arg=140737324985216) at
> /home/bvaughan/espace/grid_vpp/src/vlib/unix/main.c:660
> #19 0x00007ffff55c9c64 in clib_calljmp () at
> /home/bvaughan/espace/grid_vpp/src/vppinfra/longjmp.S:123
> #20 0x00007fffffffd190 in ?? ()
> #21 0x00007ffff61fd3c3 in vlib_unix_main (argc=71, argv=0x71e510) at
> /home/bvaughan/espace/grid_vpp/src/vlib/unix/main.c:733
> #22 0x00000000004067f6 in main (argc=71, argv=0x71e510) at
> /home/bvaughan/espace/grid_vpp/src/vpp/vnet/main.c:293
> (gdb)
>
>
>
> t->name is set in the TCPEVT_INIT_HANDLER macro which does not appear to
> be getting called. I added a print to that macro and it was never hit.
>
>
> Regarding the main issue of the the tcp dispatch error:
>
> tcp_input_dispatch_buffer:2808: tcp conn 138 disp error state TIME_WAIT
>> flags 0x12 SYN ACK
>>
>>
> You mentioned that this could be a threading issue.  I have been running
> only with the main thread for all of this testing (no workers).  I plan to
> add those, but I want to test for correctness first then add performance.
> My understanding is that if I do not have any workers, then VPP runs
> everything out of the main thread including the Process Nodes. I am using
> the Process Nodes right now without locks since I don't have threading
> enabled.  Please let me know if my threading assumption is incorrect.
>
> Also, the client on the other end of the TCP connection is from Ubuntu
> 18.04. I am testing with ssh/scp/iperf3.
>
> Thanks,
> -Bill
>
>
>
>
>
>
>
> On Sun, Nov 22, 2020 at 1:24 PM Florin Coras <fcoras.li...@gmail.com>
> wrote:
>
>> Hi Bill,
>>
>> Inline.
>>
>> On Nov 22, 2020, at 6:17 AM, Bill Vaughan <billvaug...@gmail.com> wrote:
>>
>> I should have stated that I am running 2009.
>>
>> On Sun, Nov 22, 2020 at 8:48 AM Bill Vaughan via lists.fd.io
>> <billvaughan=gmail....@lists.fd.io> wrote:
>>
>>> Hi all,
>>>
>>> I am working on a project where I need to initiate TCP connections from
>>> the VPP session layer. In my overnight test runs I am seeing a TCP dispatch
>>> error that I am trying to track down. This happens after thousands of test
>>> runs. It looks like I am able to issue a Connect call but the SYN-ACK from
>>> the other side is coming back to a TCP conn that is in a TIME_WAIT state.
>>>
>>> tcp_input_dispatch_buffer:2808: tcp conn 138 disp error state TIME_WAIT
>>> flags 0x12 SYN ACK
>>>
>>>
>> That’s somewhat unexpected, if the peer is correctly implemented. The
>> only thing that comes to mind is to check that your app closes sessions
>> from the right threads.
>>
>>
>>> I wanted to get some TCP state machine debug data but it appears the
>>> tcp_debug macros are not hooked up.
>>>
>>
>> That’s on purpose because of performance degradation although there’s an
>> option of leaving it always on by enabling TCP_DEBUG_ALWAYS and then
>> configuring the features you want debugged.
>>
>> I have enabled all of the compile time debug flags in tcp_debug.h
>>> but tcp_evt_track_register() is not getting called and this causes an
>>> assert in the elog code. Looking through the TCP connection setup code I
>>> don't see where the debug handler macros are getting called
>>> (TCP_EVT_OPEN_HANDLER, TCP_EVT_CLOSE_HANDLER, etc).  I tried
>>> adding TCP_EVT_OPEN_HANDLER to tcp_session_open()
>>> and tcp_evt_track_register() was hit, but then it asserted somewhere else.
>>>
>>
>> TCP_EVT (TCP_EVT_OPEN, tc) is called in tcp_session_open and TCP_EVT
>> (TCP_EVT_SYN_RCVD, child, 1) is called in tcp46_listen_inline. That is,
>> when connections are set up, that’s when we setup the elog track.
>>
>> Regards,
>> Florin
>>
>>
>>> Before I spend too much more time going down this rabbit hole, I wanted
>>> to see if anyone knows why the debug macros are not hooked up.
>>>
>>> Thanks,
>>> -Bill
>>>
>>>
>>>
>>>
>>>
>>>
>> 
>>
>>
>>
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18134): https://lists.fd.io/g/vpp-dev/message/18134
Mute This Topic: https://lists.fd.io/mt/78430523/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to