Hi Dave,
We are using VPP version *21.10*.

Thanks and regards,
Sudhir

On Fri, Mar 10, 2023 at 5:31 PM Dave Barach <v...@barachs.net> wrote:

> I should have had the sense to ask this earlier: which version of vpp are
> you using?
>
>
>
> The line number in your debug snippet is more than 100 lines off from
> master/latest. The timer wheel code has been relatively untouched, but
> there have been several important fixes over the years...
>
>
>
> D.
>
>
>
>
>
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Sudhir
> CR via lists.fd.io
> *Sent:* Thursday, March 9, 2023 4:00 AM
> *To:* vpp-dev@lists.fd.io
> *Cc:* rtbrick....@lists.fd.io
> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>
>
>
> Hi Dave,
>
> Please excuse my delayed response. It took some time to recreate this
> issue.
>
> I made changes to our process node as per your suggestion. Our process
> node code now looks like this:
>
>
>
> while (1) {
>
>         vlib_process_wait_for_event_or_clock (vm,
> RTB_VPP_EPOLL_PROCESS_NODE_TIMER);
>         event_type = vlib_process_get_events (vm, &event_data);
>         vec_reset_length(event_data);
>
>         switch (event_type) {
>             case ~0: /* handle timer expirations */
>                 rtb_event_loop_run_once ();
>                 break;
>
>             default: /* bug! */
>                 ASSERT (0);
>         }
>     }
>
> After these changes we did not observe any assertion failures, but we still
> hit the process node suspend issue. This makes it clear that the process is
> not receiving any events other than timeouts.
>
>
>
> In the issue state I collected the flags value of the vlib_process node
> (rtb_vpp_epoll_process), and it seems to be correct (flags = 11).
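
As a cross-check on flags = 11, here is a minimal sketch that decodes the value bit by bit. The bit assignments below are written from memory of src/vlib/node.h and are an assumption; please verify them against the 21.10 tree. The helper name decode_process_flags is made up for this example and is not a VPP function.

```c
#include <string.h>

/* Assumed bit values of vlib_process_t.flags (verify in src/vlib/node.h). */
#define VLIB_PROCESS_IS_SUSPENDED_WAITING_FOR_CLOCK (1 << 0)
#define VLIB_PROCESS_IS_SUSPENDED_WAITING_FOR_EVENT (1 << 1)
#define VLIB_PROCESS_RESUME_PENDING (1 << 2)
#define VLIB_PROCESS_IS_RUNNING (1 << 3)

/* Hypothetical helper: append the name of every set bit to buf.
   buf must have room for at least one byte (len > 0). */
static const char *
decode_process_flags (unsigned flags, char *buf, size_t len)
{
  buf[0] = 0;
  if (flags & VLIB_PROCESS_IS_SUSPENDED_WAITING_FOR_CLOCK)
    strncat (buf, "WAITING_FOR_CLOCK ", len - strlen (buf) - 1);
  if (flags & VLIB_PROCESS_IS_SUSPENDED_WAITING_FOR_EVENT)
    strncat (buf, "WAITING_FOR_EVENT ", len - strlen (buf) - 1);
  if (flags & VLIB_PROCESS_RESUME_PENDING)
    strncat (buf, "RESUME_PENDING ", len - strlen (buf) - 1);
  if (flags & VLIB_PROCESS_IS_RUNNING)
    strncat (buf, "IS_RUNNING ", len - strlen (buf) - 1);
  return buf;
}
```

Under these assumptions, 11 (binary 1011) decodes as waiting-for-clock plus waiting-for-event plus running, with RESUME_PENDING clear. That would be consistent with a process parked in vlib_process_wait_for_event_or_clock that has not been put on the resume vector.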
>
>
>
> Please find the vlib_process_t and vlib_node_t data structure values
> collected in the issue state below.
>
>
>
> vlib_process_t:
>
> ============
>
> $38 = {
>   cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
>   node_runtime = {
>     cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
>     function = 0x7f9b2bbc7e80 <rtb_vpp_epoll_process>,
>     errors = 0x7f9b3076a560,
>     clocks_since_last_overflow = 0,
>     max_clock = 3785970526,
>     max_clock_n = 0,
>     calls_since_last_overflow = 0,
>     vectors_since_last_overflow = 0,
>     next_frame_index = 1668,
>     node_index = 437,
>     input_main_loops_per_call = 0,
>     main_loop_count_last_dispatch = 4147405645,
>     main_loop_vector_stats = {0, 0},
>     flags = 0,
>     state = 0,
>     n_next_nodes = 0,
>     cached_next_index = 0,
>     thread_index = 0,
>     runtime_data = 0x7f9b2da503c6 ""
>   },
>   return_longjmp = {
>     regs = {94502584873984, 140304430422064, 140306731463680,
> 94502584874048, 94502640552512, 0, 140304430422032, 140306703608766}
>   },
>   resume_longjmp = {
>     regs = {94502584873984, 140304161734368, 140306731463680,
> 94502584874048, 94502640552512, 0, 140304161734272, 140304430441787}
>   },
>   *flags = 11, *
>   log2_n_stack_bytes = 16,
>   suspended_process_frame_index = 0,
>   n_suspends = 0,
>   pending_event_data_by_type_index = 0x7f9b307b8310,
>   non_empty_event_type_bitmap = 0x7f9b307b8390,
>   one_time_event_type_bitmap = 0x0,
>   event_type_index_by_type_opaque = 0x7f9b2dab8bd8,
>   event_type_pool = 0x7f9b2dcb5978,
>   resume_clock_interval = 1000,
>   stop_timer_handle = 3098,
>   output_function = 0x0,
>   output_function_arg = 0,
>   stack = 0x7f9b1bb78000
> }
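
A note on resume_clock_interval = 1000 in the dump above: assuming the vlib timer wheel runs at 1e5 ticks per second (10 us per tick, which is my recollection of VLIB_TW_TICKS_PER_SECOND and should be checked against the tree), and assuming RTB_VPP_EPOLL_PROCESS_NODE_TIMER is still 10 ms, the value matches the requested wait. A minimal sketch of the conversion, with a made-up helper name:

```c
#include <stdint.h>

/* Assumption: 1e5 timer wheel ticks per second, i.e. one tick is 10 us. */
#define TW_TICKS_PER_SECOND 1e5

/* Hypothetical helper: convert a wait time in seconds to timer wheel ticks,
   truncating any fractional tick. */
static uint64_t
wait_seconds_to_ticks (double dt)
{
  return (uint64_t) (dt * TW_TICKS_PER_SECOND);
}
```

With these assumptions, wait_seconds_to_ticks (10e-3) gives 1000, so the interval stored in the process looks sane and the problem is more likely in whether the timer actually fires.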
>
>
>
> vlib_node_t
>
> =========
>
>  (gdb) p *n
>
> $17 = {
>   function = 0x7f9b2bbc7e80 <rtb_vpp_epoll_process>,
>   name = 0x7f9b3076a3f0 "rtb-vpp-epoll-process",
>   name_elog_string = 11783,
>   stats_total = {
>     calls = 0,
>     vectors = 0,
>     clocks = 1971244932732,
>     suspends = 6847366,
>     max_clock = 3785970526,
>     max_clock_n = 0
>   },
>   stats_last_clear = {
>     calls = 0,
>     vectors = 0,
>     clocks = 0,
>     suspends = 0,
>     max_clock = 0,
>     max_clock_n = 0
>   },
>   type = VLIB_NODE_TYPE_PROCESS,
>   index = 437,
>   runtime_index = 40,
>   runtime_data = 0x0,
>   flags = 0,
>   state = 0 '\000',
>   runtime_data_bytes = 0 '\000',
>   protocol_hint = 0 '\000',
>   n_errors = 0,
>   scalar_size = 0,
>   vector_size = 0,
>   error_heap_handle = 0,
>   error_heap_index = 0,
>   error_counters = 0x0,
>   next_node_names = 0x7f9b3076a530,
>   next_nodes = 0x0,
>   sibling_of = 0x0,
>   sibling_bitmap = 0x0,
>   n_vectors_by_next_node = 0x0,
>   next_slot_by_node = 0x0,
>   prev_node_bitmap = 0x0,
>   owner_node_index = 4294967295,
>   owner_next_index = 4294967295,
>   format_buffer = 0x0,
>   unformat_buffer = 0x0,
>   format_trace = 0x0,
>   validate_frame = 0x0,
>   state_string = 0x0,
>   node_fn_registrations = 0x0
> }
>
>
>
> I added an assert statement before clearing the *VLIB_PROCESS_IS_RUNNING* flag
> in the *dispatch_suspended_process* function.
>
> But this assert is never hit.
>
>
>
> diff --git a/src/vlib/main.c b/src/vlib/main.c
> index af0fcd1cb..55c231d8b 100644
> --- a/src/vlib/main.c
> +++ b/src/vlib/main.c
> @@ -1490,6 +1490,9 @@ dispatch_suspended_process (vlib_main_t * vm,
>      }
>    else
>      {
> +           if (strcmp((char *)node->name, "rtb-vpp-epoll-process") == 0) {
> +                   ASSERT(0);
> +           }
>        p->flags &= ~VLIB_PROCESS_IS_RUNNING;
>        pool_put_index (nm->suspended_process_frames,
>                       p->suspended_process_frame_index);
>
>
>
> I am not able to figure out why this process node stays suspended in some
> scenarios. Can you please help me by providing some pointers to debug and
> resolve this issue?
>
>
>
> Hi Jinsh,
>
> I applied your patch to my code. The issue is not solved with your patch.
> Thank you for helping me out.
>
>
>
> Thanks and Regards,
>
> Sudhir
>
>
>
>
>
> On Fri, Mar 3, 2023 at 12:53 PM Sudhir CR via lists.fd.io <sudhir=
> rtbrick....@lists.fd.io> wrote:
>
> Hi Chetan,
>
> In our case we observe this issue only occasionally; the exact steps to
> recreate it are not known.
>
> I made changes to our process node as suggested by Dave, and with these
> changes I am trying to recreate the issue.
>
> Soon I will update my results and findings in this mail thread.
>
>
>
> Thanks and Regards,
>
> Sudhir
>
>
>
> On Fri, Mar 3, 2023 at 12:37 PM chetan bhasin <chetan.bhasin...@gmail.com>
> wrote:
>
> Hi Sudhir,
>
>
>
> Is your issue resolved?
>
>
>
> Actually we are facing the same issue on VPP 21.06.
>
> In our case "api-rx-ring" is not getting called.
>
> In our use case, workers call some functions in main-thread context,
> which leads to RPC messages whose memory is allocated from the API segment.
>
> As a result the API segment memory is fully used, which leads to a crash.
>
>
>
> Thanks,
>
> Chetan
>
>
>
> On Mon, Feb 20, 2023, 18:24 Sudhir CR via lists.fd.io <sudhir=
> rtbrick....@lists.fd.io> wrote:
>
> Hi Dave,
>
> Thank you very much for your inputs. I will try this out and get back to
> you with the results.
>
>
>
> Regards,
>
> Sudhir
>
>
>
> On Mon, Feb 20, 2023 at 6:01 PM Dave Barach <v...@barachs.net> wrote:
>
> Please try something like this, to eliminate the possibility that some bit
> of code is sending this process an event. It’s not a good idea to skip the
> vec_reset_length (event_data) step.
>
>
>
> while (1)
>
> {
>
>    uword event_type, * event_data = 0;
>
>    int i;
>
>
>
>    vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);
>
>
>
>    event_type = vlib_process_get_events (vm, &event_data);
>
>
>
>    switch (event_type) {
>
>   case ~0: /* handle timer expirations */
>
>        rtb_event_loop_run_once ();
>
>        break;
>
>
>
>    default: /* bug! */
>
>        ASSERT (0);
>
>    }
>
>
>
>    vec_reset_length(event_data);
>
> }
>
>
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Sudhir
> CR via lists.fd.io
> *Sent:* Monday, February 20, 2023 4:02 AM
> *To:* vpp-dev@lists.fd.io
> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>
>
>
> Hi Dave,
> Thank you for your response and help.
>
>
>
> Please find the additional details below.
>
> VPP Version *21.10*
>
>
> We are creating a process node *rtb-vpp-epoll-process* to handle control
> plane events like interface add/delete and route add/delete.
> This process node waits for *10 ms* (it is not interested in any events);
> once the 10 ms have expired, it processes the control plane events mentioned above.
>
> The code snippet looks like this:
>
>
>
> ```
>
> static uword
> rtb_vpp_epoll_process (vlib_main_t                 *vm,
>                        vlib_node_runtime_t  *rt,
>                        vlib_frame_t         *f)
> {
>
>     ...
>     ...
>     while (1) {
>         vlib_process_wait_for_event_or_clock (vm, 10e-3);
>         vlib_process_get_events (vm, NULL);
>
>         rtb_event_loop_run_once();   /* <---- control plane event handling */
>     }
> }
> ```
>
> What we observed is that sometimes (when there is a high control plane load,
> such as a request to install more routes) "rtb-vpp-epoll-process" is suspended
> and never scheduled again. We found this by using "show runtime
> rtb-vpp-epoll-process": in the issue state, the suspends counter in the
> command output does not increment.
>
> *show runtime output in working case:*
>
> ```
> DBGvpp# show runtime rtb-vpp-epoll-process
>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
> rtb-vpp-epoll-process           any wait                 0               0          192246          1.91e6            0.00
> DBGvpp#
>
> DBGvpp# show runtime rtb-vpp-epoll-process
>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
> rtb-vpp-epoll-process           any wait                 0               0          193634          1.89e6            0.00
> DBGvpp#
> ```
>
>
> *show runtime output in issue case:*
>
> ```
> DBGvpp# show runtime rtb-vpp-epoll-process
>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
> rtb-vpp-epoll-process           any wait                 0               0           81477          7.08e6            0.00
>
> DBGvpp# show runtime rtb-vpp-epoll-process
>              Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
> rtb-vpp-epoll-process           any wait                 0               0           81477          7.08e6            0.00
> ```
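
The by-eye comparison of the two "show runtime" samples can also be written down as a tiny check. This is only a sketch: the helper name is made up, and it assumes the two samples are taken far enough apart (well over the 10 ms wait) that a healthy process must have suspended at least once in between.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: given two samples of the Suspends counter taken some
   time apart, report whether the process made no progress in between. */
static bool
process_looks_stalled (uint64_t suspends_before, uint64_t suspends_after)
{
  return suspends_after == suspends_before;
}
```

With the numbers from the outputs above, the working case (192246 then 193634) is not flagged, while the issue case (81477 then 81477) is.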
>
> Other process nodes like lldp-process, ip4-neighbor-age-process and
> ip6-ra-process are running without any issue. Only the
> "rtb-vpp-epoll-process" process node is suspended forever.
>
>
>
> Please let me know if any additional information is required.
>
> Hi Jinsh,
> Thanks for pointing me to the issue you faced. The issue I am facing looks
> similar.
> I will verify with the given patch.
>
>
> Thanks and Regards,
>
> Sudhir
>
>
>
> On Sun, Feb 19, 2023 at 6:19 AM jinsh11 <jins...@chinatelecom.cn> wrote:
>
> Hi,
>
> I have the same problem: the bfd process node stops running. I raised this
> issue here:
>
> https://lists.fd.io/g/vpp-dev/message/22380
> I think there is a problem with the process scheduling module when using
> the timer wheel.
>
>
>
>
>
> NOTICE TO RECIPIENT This e-mail message and any attachments are
> confidential and may be privileged. If you received this e-mail in error,
> any review, use, dissemination, distribution, or copying of this e-mail is
> strictly prohibited. Please notify us immediately of the error by return
> e-mail and please delete this message from your system. For more
> information about Rtbrick, please visit us at www.rtbrick.com
>
> 
>
>

View/Reply Online (#22691): https://lists.fd.io/g/vpp-dev/message/22691