Hi Dave,

We are using VPP version 21.10.

Thanks and regards,
Sudhir
On Fri, Mar 10, 2023 at 5:31 PM Dave Barach <v...@barachs.net> wrote:
> I should have had the sense to ask this earlier: which version of vpp are
> you using?
>
> The line number in your debug snippet is more than 100 lines off from
> master/latest. The timer wheel code has been relatively untouched, but
> there have been several important fixes over the years...
>
> D.
>
> diff --git a/src/vlib/main.c b/src/vlib/main.c
> index af0fcd1cb..55c231d8b 100644
> --- a/src/vlib/main.c
> +++ b/src/vlib/main.c
> @@ -1490,6 +1490,9 @@ dispatch_suspended_process (vlib_main_t * vm,
>          }
>        else
>          {
> +          if (strcmp((char *)node->name, "rtb-vpp-epoll-process") == 0) {
> +            ASSERT(0);
> +          }
>
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Sudhir CR via lists.fd.io
> Sent: Thursday, March 9, 2023 4:00 AM
> To: vpp-dev@lists.fd.io
> Cc: rtbrick....@lists.fd.io
> Subject: Re: [vpp-dev] process node suspended indefinitely
>
> Hi Dave,
>
> Please excuse my delayed response. It took some time to recreate this
> issue.
>
> I made changes to our process node as per your suggestion. Our process
> node code now looks like this:
>
> while (1) {
>   vlib_process_wait_for_event_or_clock (vm,
>                                         RTB_VPP_EPOLL_PROCESS_NODE_TIMER);
>   event_type = vlib_process_get_events (vm, &event_data);
>   vec_reset_length (event_data);
>
>   switch (event_type) {
>   case ~0: /* handle timer expirations */
>     rtb_event_loop_run_once ();
>     break;
>
>   default: /* bug! */
>     ASSERT (0);
>   }
> }
>
> After these changes we didn't observe any assertions, but we still hit the
> process node suspend issue. So it is clear that, other than the timeout, we
> are not getting any other events.
>
> In the issue state I collected the vlib_process_t flags value of the
> process node (rtb_vpp_epoll_process), and it seems to be correct
> (flags = 11).
>
> Please find the vlib_process_t and vlib_node_t data structure values
> collected in the issue state below.
>
> vlib_process_t:
> ============
>
> $38 = {
>   cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
>   node_runtime = {
>     cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
>     function = 0x7f9b2bbc7e80 <rtb_vpp_epoll_process>,
>     errors = 0x7f9b3076a560,
>     clocks_since_last_overflow = 0,
>     max_clock = 3785970526,
>     max_clock_n = 0,
>     calls_since_last_overflow = 0,
>     vectors_since_last_overflow = 0,
>     next_frame_index = 1668,
>     node_index = 437,
>     input_main_loops_per_call = 0,
>     main_loop_count_last_dispatch = 4147405645,
>     main_loop_vector_stats = {0, 0},
>     flags = 0,
>     state = 0,
>     n_next_nodes = 0,
>     cached_next_index = 0,
>     thread_index = 0,
>     runtime_data = 0x7f9b2da503c6 ""
>   },
>   return_longjmp = {
>     regs = {94502584873984, 140304430422064, 140306731463680,
>       94502584874048, 94502640552512, 0, 140304430422032, 140306703608766}
>   },
>   resume_longjmp = {
>     regs = {94502584873984, 140304161734368, 140306731463680,
>       94502584874048, 94502640552512, 0, 140304161734272, 140304430441787}
>   },
>   flags = 11,
>   log2_n_stack_bytes = 16,
>   suspended_process_frame_index = 0,
>   n_suspends = 0,
>   pending_event_data_by_type_index = 0x7f9b307b8310,
>   non_empty_event_type_bitmap = 0x7f9b307b8390,
>   one_time_event_type_bitmap = 0x0,
>   event_type_index_by_type_opaque = 0x7f9b2dab8bd8,
>   event_type_pool = 0x7f9b2dcb5978,
>   resume_clock_interval = 1000,
>   stop_timer_handle = 3098,
>   output_function = 0x0,
>   output_function_arg = 0,
>   stack = 0x7f9b1bb78000
> }
>
> vlib_node_t
> =========
>
> (gdb) p *n
>
> $17 = {
>   function = 0x7f9b2bbc7e80 <rtb_vpp_epoll_process>,
>   name = 0x7f9b3076a3f0 "rtb-vpp-epoll-process",
>   name_elog_string = 11783,
>   stats_total = {
>     calls = 0,
>     vectors = 0,
>     clocks = 1971244932732,
>     suspends = 6847366,
>     max_clock = 3785970526,
>     max_clock_n = 0
>   },
>   stats_last_clear = {
>     calls = 0,
>     vectors = 0,
>     clocks = 0,
>     suspends = 0,
>     max_clock = 0,
>     max_clock_n = 0
>   },
>   type = VLIB_NODE_TYPE_PROCESS,
>   index = 437,
>   runtime_index = 40,
>   runtime_data = 0x0,
>   flags = 0,
>   state = 0 '\000',
>   runtime_data_bytes = 0 '\000',
>   protocol_hint = 0 '\000',
>   n_errors = 0,
>   scalar_size = 0,
>   vector_size = 0,
>   error_heap_handle = 0,
>   error_heap_index = 0,
>   error_counters = 0x0,
>   next_node_names = 0x7f9b3076a530,
>   next_nodes = 0x0,
>   sibling_of = 0x0,
>   sibling_bitmap = 0x0,
>   n_vectors_by_next_node = 0x0,
>   next_slot_by_node = 0x0,
>   prev_node_bitmap = 0x0,
>   owner_node_index = 4294967295,
>   owner_next_index = 4294967295,
>   format_buffer = 0x0,
>   unformat_buffer = 0x0,
>   format_trace = 0x0,
>   validate_frame = 0x0,
>   state_string = 0x0,
>   node_fn_registrations = 0x0
> }
>
> I added an assert statement just before the VLIB_PROCESS_IS_RUNNING flag is
> cleared in the dispatch_suspended_process function, but this assert is
> never hit.
>
> diff --git a/src/vlib/main.c b/src/vlib/main.c
> index af0fcd1cb..55c231d8b 100644
> --- a/src/vlib/main.c
> +++ b/src/vlib/main.c
> @@ -1490,6 +1490,9 @@ dispatch_suspended_process (vlib_main_t * vm,
>          }
>        else
>          {
> +          if (strcmp((char *)node->name, "rtb-vpp-epoll-process") == 0) {
> +            ASSERT(0);
> +          }
>            p->flags &= ~VLIB_PROCESS_IS_RUNNING;
>            pool_put_index (nm->suspended_process_frames,
>                            p->suspended_process_frame_index);
>
> I am not able to figure out why this process node gets suspended in some
> scenarios. Can you please give me some pointers to debug and resolve this
> issue?
>
> Hi Jinsh,
>
> I applied your patch to my code. The issue is not solved by your patch.
> Thank you for helping me out.
>
> Thanks and Regards,
> Sudhir
>
> On Fri, Mar 3, 2023 at 12:53 PM Sudhir CR via lists.fd.io <sudhir=rtbrick....@lists.fd.io> wrote:
>
> Hi Chetan,
>
> In our case we observe this issue only occasionally; the exact steps to
> recreate it are not known.
>
> I made changes to our process node as suggested by Dave and am now trying
> to recreate the issue with these changes.
>
> I will update this mail thread with my results and findings soon.
>
> Thanks and Regards,
> Sudhir
>
> On Fri, Mar 3, 2023 at 12:37 PM chetan bhasin <chetan.bhasin...@gmail.com> wrote:
>
> Hi Sudhir,
>
> Is your issue resolved?
>
> We are actually facing the same issue on VPP 21.06.
> In our case "api-rx-ring" is not getting called.
> In our use case, workers call some functions in the main-thread context,
> which generates RPC messages whose memory is allocated from the API segment.
> As a result the API segment memory is used up completely, which leads to a
> crash.
>
> Thanks,
> Chetan
>
> On Mon, Feb 20, 2023, 18:24 Sudhir CR via lists.fd.io <sudhir=rtbrick....@lists.fd.io> wrote:
>
> Hi Dave,
>
> Thank you very much for your inputs. I will try this out and get back to
> you with the results.
>
> Regards,
> Sudhir
>
> On Mon, Feb 20, 2023 at 6:01 PM Dave Barach <v...@barachs.net> wrote:
>
> Please try something like this, to eliminate the possibility that some bit
> of code is sending this process an event. It’s not a good idea to skip the
> vec_reset_length (event_data) step.
>
> while (1)
>   {
>     uword event_type, * event_data = 0;
>     int i;
>
>     vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);
>
>     event_type = vlib_process_get_events (vm, &event_data);
>
>     switch (event_type) {
>     case ~0: /* handle timer expirations */
>       rtb_event_loop_run_once ();
>       break;
>
>     default: /* bug! */
>       ASSERT (0);
>     }
>
>     vec_reset_length(event_data);
>   }
>
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Sudhir CR via lists.fd.io
> Sent: Monday, February 20, 2023 4:02 AM
> To: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] process node suspended indefinitely
>
> Hi Dave,
> Thank you for your response and help.
>
> Please find the additional details below.
>
> VPP Version 21.10
>
> We have created a process node, rtb-vpp-epoll-process, to handle
> control-plane events such as interface add/delete and route add/delete.
> This process node waits for 10 ms (it is not interested in any events);
> once the 10 ms expire, it processes the control-plane events mentioned
> above.
>
> The code snippet looks like this:
>
> ```
> static uword
> rtb_vpp_epoll_process (vlib_main_t *vm,
>                        vlib_node_runtime_t *rt,
>                        vlib_frame_t *f)
> {
>   ...
>   ...
>   while (1) {
>     vlib_process_wait_for_event_or_clock (vm, 10e-3);
>     vlib_process_get_events (vm, NULL);
>
>     rtb_event_loop_run_once (); /* <---- control-plane event handling */
>   }
> }
> ```
>
> What we observed is that sometimes, under high control-plane load (for
> example, a request to install more routes), "rtb-vpp-epoll-process" is
> suspended and never scheduled again. We found this with "show runtime
> rtb-vpp-epoll-process": in that command's output, the Suspends counter
> stops incrementing.
>
> show runtime output in the working case:
>
> ```
> DBGvpp# show runtime rtb-vpp-epoll-process
> Name                    State     Calls  Vectors  Suspends  Clocks  Vectors/Call
> rtb-vpp-epoll-process   any wait      0        0    192246  1.91e6          0.00
> DBGvpp#
>
> DBGvpp# show runtime rtb-vpp-epoll-process
> Name                    State     Calls  Vectors  Suspends  Clocks  Vectors/Call
> rtb-vpp-epoll-process   any wait      0        0    193634  1.89e6          0.00
> DBGvpp#
> ```
>
> show runtime output in the issue case:
>
> ```
> DBGvpp# show runtime rtb-vpp-epoll-process
> Name                    State     Calls  Vectors  Suspends  Clocks  Vectors/Call
> rtb-vpp-epoll-process   any wait      0        0     81477  7.08e6          0.00
>
> DBGvpp# show runtime rtb-vpp-epoll-process
> Name                    State     Calls  Vectors  Suspends  Clocks  Vectors/Call
> rtb-vpp-epoll-process   any wait      0        0     81477  7.08e6          0.00
> ```
>
> Other process nodes such as lldp-process, ip4-neighbor-age-process and
> ip6-ra-process run without any issue.
> Only the "rtb-vpp-epoll-process" process node stays suspended forever.
>
> Please let me know if any additional information is required.
>
> Hi Jinsh,
> Thanks for pointing me to the issue you faced. The issue I am facing looks
> similar. I will verify with the given patch.
>
> Thanks and Regards,
> Sudhir
>
> On Sun, Feb 19, 2023 at 6:19 AM jinsh11 <jins...@chinatelecom.cn> wrote:
>
> Hi,
>
> - I have the same problem: the bfd process node stops running. I raised
> this issue here:
>
> https://lists.fd.io/g/vpp-dev/message/22380
>
> I think there is a problem with the process scheduling module when using
> the timer wheel.
>
> NOTICE TO RECIPIENT This e-mail message and any attachments are
> confidential and may be privileged. If you received this e-mail in error,
> any review, use, dissemination, distribution, or copying of this e-mail is
> strictly prohibited. Please notify us immediately of the error by return
> e-mail and please delete this message from your system. For more
> information about Rtbrick, please visit us at www.rtbrick.com
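For readers decoding the Mar 9 gdb dump and the show runtime samples quoted above, here is a minimal, self-contained C sketch; it is not VPP code and does not call any vlib API. It assumes the vlib_process_t flag bits in src/vlib/node.h of VPP 21.10 are WAITING_FOR_CLOCK = 1, WAITING_FOR_EVENT = 2, RESUME_PENDING = 4 and IS_RUNNING = 8, and that vlib_process_wait_for_event_or_clock() stores dt * 1e5 (10 us timer-wheel ticks) in resume_clock_interval; verify both against your own tree. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local copies of the vlib_process_t flag bits. The values are
 * assumed to match src/vlib/node.h in VPP 21.10; verify in your own tree. */
enum
{
  DEMO_WAITING_FOR_CLOCK = 1u << 0, /* VLIB_PROCESS_IS_SUSPENDED_WAITING_FOR_CLOCK */
  DEMO_WAITING_FOR_EVENT = 1u << 1, /* VLIB_PROCESS_IS_SUSPENDED_WAITING_FOR_EVENT */
  DEMO_RESUME_PENDING = 1u << 2,    /* VLIB_PROCESS_RESUME_PENDING */
  DEMO_IS_RUNNING = 1u << 3,        /* VLIB_PROCESS_IS_RUNNING */
};

/* Write the names of the set flag bits, separated by spaces, into buf. */
static void
demo_format_process_flags (unsigned flags, char *buf, size_t len)
{
  static const struct
  {
    unsigned bit;
    const char *name;
  } bits[] = {
    { DEMO_WAITING_FOR_CLOCK, "WAITING_FOR_CLOCK" },
    { DEMO_WAITING_FOR_EVENT, "WAITING_FOR_EVENT" },
    { DEMO_RESUME_PENDING, "RESUME_PENDING" },
    { DEMO_IS_RUNNING, "IS_RUNNING" },
  };

  assert (buf != NULL && len > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof (bits) / sizeof (bits[0]); i++)
    {
      if (!(flags & bits[i].bit))
        continue;
      /* strncat's limit keeps the result (plus NUL) within len bytes. */
      if (buf[0] != '\0')
        strncat (buf, " ", len - strlen (buf) - 1);
      strncat (buf, bits[i].name, len - strlen (buf) - 1);
    }
}

/* Convert a wait in seconds to timer-wheel ticks, assuming the
 * "resume_clock_interval = dt * 1e5" conversion (10 us per tick). */
static uint32_t
demo_wait_ticks (double dt_seconds)
{
  assert (dt_seconds >= 0.0);
  return (uint32_t) (dt_seconds * 1e5);
}

/* Heuristic from the thread: a process waiting on a 10 ms timer should
 * suspend many times between two "show runtime" samples taken seconds
 * apart, so an unchanged Suspends counter suggests it is stuck. */
static int
demo_process_looks_stalled (uint64_t suspends_before, uint64_t suspends_after)
{
  return suspends_after == suspends_before;
}
```

Under these assumptions, flags = 11 decodes to WAITING_FOR_CLOCK, WAITING_FOR_EVENT and IS_RUNNING with RESUME_PENDING clear, and a 10 ms wait (10e-3 or 1e-2) maps to 1000 ticks, which matches resume_clock_interval = 1000 in the dump. That combination would be consistent with a process parked on its timer whose wakeup was never delivered, and the flat Suspends counter (81477 in both samples) points the same way, but the sketch alone cannot prove where the wakeup was lost.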
View/Reply Online (#22691): https://lists.fd.io/g/vpp-dev/message/22691