Hi Eelco,

Thank you for the quick response.

I did not add those logs: to reproduce the issue we have to send a large number of packets to the host, so far too many ukeys are created and deleted for per-ukey logging to be practical.

Also, can we be sure that this [1] is the only place where OVS frees a ukey?

[1] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084

Regarding my last mail: could this issue be a concurrent read vs. update/delete race? revalidator_sweep__() is trying to lock ukey->mutex while another thread is destroying that mutex (ovs_mutex_destroy() sets its "where" field to NULL) and freeing the ukey.

LIU Yulong

On Mon, Feb 26, 2024 at 5:41 PM Eelco Chaudron <echau...@redhat.com> wrote:
>
> On 26 Feb 2024, at 9:33, LIU Yulong wrote:
>
> > Hi,
> >
> > I have carefully read the code, comparing it with the call stacks in the core files, and found a potential race condition. Please confirm whether the following 3 threads can race. Based on my code tracing, can such a race condition happen?
> >
> > * PMD thread1 ===================================:
> > -> pmd_thread_main
> > -> dp_netdev_process_rxq_port
> > -> dp_netdev_input
> > -> dp_netdev_input__
> > -> handle_packet_upcall
> > -> dp_netdev_upcall
> > -> upcall_cb
> > -> ukey_install
> > -> ukey_install__
> > -> try_ukey_replace:
> >        ovs_mutex_lock(&new_ukey->mutex);
> >        <---------- the CMAP_FOR_EACH loop in revalidator_sweep__ runs a bit earlier than the cmap_replace() on the next line, so old_ukey can still be iterated. [1]
> >        cmap_replace(&umap->cmap, &old_ukey->cmap_node,
> >                     &new_ukey->cmap_node, new_ukey->hash);
> >        ovsrcu_postpone(ukey_delete__, old_ukey);
> >        <-------- delete the ukey asynchronously. [2]
> >        transition_ukey(old_ukey, UKEY_DELETED);
> >        <-------- transition the ukey state to UKEY_DELETED; most core files show that the ukey's last state change happened at this line. [3]
> >        transition_ukey(new_ukey, UKEY_VISIBLE);
> >
> > [1] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1892
> > [2] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1896
> > [3] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
> >
> > try_ukey_replace() has been called many times, as the non-zero `counter_upcall_ukey_replace` coverage counter shows. For instance:
> > {
> >   name = 0x55ba9755206b "upcall_ukey_replace",
> >   count = 0x55ba971c7610 <upcall_ukey_replace_count>,
> >   total = 2287997,
> >   last_total = 2287997,
> >   min = {221, 247, 444, 278, 324, 570, 379, 464, 283, 280, 0, 427},
> >   hr = {3300, 4378, 3557, 4554, 3748, 3710, 4340, 3559, 4296, 3759,
> >         3522, 4136, 3660, 4428, 3802, 3652, 3880, 3375, 4806, 4221, 4158,
> >         3816, 3750, 3846, 3761, 3653, 4293, 3816, 3723, 3691, 4033, 468, 4117,
> >         3659, 4007, 3536, 3439, 4440, 3388, 4079, 3876, 3865, 4339, 3757,
> >         3481, 4027, 3989, 3633, 3737, 3564, 3403, 3992, 3793, 4390, 4124,
> >         4354, 4164, 4383, 4237, 3667}
> > }
> >
> > * RCU thread2 ===================================:
> > -> ovsrcu_postpone_thread
> > -> ovsrcu_call_postponed
> > -> ukey_delete__  <------------ This function is not thread-safe IMO; it is marked OVS_NO_THREAD_SAFETY_ANALYSIS. [4]
> >
> >        recirc_refs_unref(&ukey->recircs);
> >        xlate_cache_delete(ukey->xcache);
> >        ofpbuf_delete(ovsrcu_get(struct ofpbuf *, &ukey->actions));
> >        ovs_mutex_destroy(&ukey->mutex);  <-------------- Just sets the mutex's "where" field to NULL. [5][6][7]
> >        free(ukey);
> >
> > [4] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2074
> > [5] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2083
> > [6] https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L131
> > [7] https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L124
> >
> > * revalidator thread3 ===================================:
> >
> > -> udpif_revalidator
> > -> revalidator_sweep
> > -> revalidator_sweep__
> >
> >        CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
> >            enum ukey_state ukey_state;
> >
> >            if (ovs_mutex_trylock(&ukey->mutex)) {  <-------------- Cores here, because of the NULL pointer. [8]
> >                continue;
> >            }
> >
> > [8] https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2900
> >
> > CMIIW, but if this race condition can happen, IMO it is most likely because the umap is not locked during the CMAP_FOR_EACH loop in the sweep, or because some RCU protection did not work properly.
>
> The code looks fine to me. We take the lock when we update the cmap, which is a requirement; however, iterating over the cmap as done above does not require a lock.
>
> If the RCU protection were broken, we would have seen a lot more errors. Did you have any luck adding the traces I suggested earlier, to see if it’s a use-after-free?
>
> Cheers,
>
> Eelco
>
> > On Wed, Feb 21, 2024 at 6:40 PM Eelco Chaudron <echau...@redhat.com> wrote:
> >>
> >> On 21 Feb 2024, at 4:26, LIU Yulong wrote:
> >>
> >>> Thank you very much for your reply.
> >>>
> >>> The problem is not easy to reproduce; we have to wait a random, often long, time to see whether the issue happens again. It can take more than a day.
> >>> OVS 2.17 with DPDK 20.11 had crashed (dumped core) before, so it's hard to say whether it is related to DPDK.
> >>> I'm running OVS without offload to see whether the issue happens again in the coming days.
> >>>
> >>> And again, this is long (TL;DR): here are more thread call stacks.
> >>> Most of the threads are in sched_yield, nanosleep, epoll_wait or poll.
> >>
> >> If this looks like a memory corruption issue, it might be hard to figure out. Does the ukey show any kind of pattern, i.e., does the corrupted data look like anything known?
> >> Maybe it’s a use-after-free, so you could add some debugging code that logs/records every free and xmalloc of the ukey structure, to check that the ukey was actually allocated when it crashes?
> >>
> >> Hope this helps you get started.
> >>
> >> //Eelco
> >>
> >>> The following threads were actively running, so hopefully they give some clues for the investigation.
> >>>
> >>> Thread 14 (Thread 0x7fd34002b700 (LWP 91928)):
> >>> #0 0x00007fd344487b6d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
> >>> #1 0x0000562773cb8d03 in mp_handle ()
> >>> #2 0x00007fd344480e65 in start_thread (arg=0x7fd34002b700) at pthread_create.c:307
> >>> #3 0x00007fd34260988d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>
> >>> Thread 13 (Thread 0x7fd3359d7700 (LWP 91929)):
> >>> #0 0x00007fd34448799d in accept () at ../sysdeps/unix/syscall-template.S:81
> >>> #1 0x0000562773cd8f3c in socket_listener ()
> >>> #2 0x00007fd344480e65 in start_thread (arg=0x7fd3359d7700) at pthread_create.c:307
> >>> #3 0x00007fd34260988d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>
> >>> Thread 6 (Thread 0x7fd304663700 (LWP 91965)):
> >>> #0 0x00007fd34448771d in read () at ../sysdeps/unix/syscall-template.S:81
> >>> #1 0x00007fd343b42bfb in _mlx5dv_devx_get_event () from /lib64/libmlx5.so.1
> >>> #2 0x0000562773936d86 in mlx5_vdpa_event_handle ()
> >>> #3 0x00007fd344480e65 in start_thread (arg=0x7fd304663700) at pthread_create.c:307
> >>> #4 0x00007fd34260988d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>
> >>> Thread 2 (Thread 0x7fd305730700 (LWP 91943)):
> >>> #0 ccmap_find_slot_protected (count=<synthetic pointer>, hash=hash@entry=1669671676, b=b@entry=0x7fd2f8012a80) at lib/ccmap.c:278
> >>> #1 ccmap_inc_bucket_existing (b=b@entry=0x7fd2f8012a80, hash=hash@entry=1669671676, inc=inc@entry=1) at lib/ccmap.c:281
> >>> #2 0x0000562773d4b015 in ccmap_try_inc (impl=impl@entry=0x7fd2f8012a40, hash=hash@entry=1669671676, inc=inc@entry=1) at lib/ccmap.c:464
> >>> #3 0x0000562773d4b224 in ccmap_inc (ccmap=ccmap@entry=0x7fd2f802a7e8, hash=1669671676) at lib/ccmap.c:485
> >>> #4 0x0000562773d4975a in classifier_replace (cls=<optimized out>, rule=rule@entry=0x7fd2fac70e28, version=<optimized out>, conjs=<optimized out>, n_conjs=<optimized out>) at lib/classifier.c:579
> >>> #5 0x0000562773d49e99 in classifier_insert (cls=<optimized out>, rule=rule@entry=0x7fd2fac70e28, version=<optimized out>, conj=<optimized out>, n_conj=<optimized out>) at lib/classifier.c:694
> >>> #6 0x0000562773d00fc8 in replace_rule_start (ofproto=ofproto@entry=0x5627778cc420, ofm=ofm@entry=0x7fd3057235f0, old_rule=<optimized out>, new_rule=new_rule@entry=0x7fd2fac70e20) at ofproto/ofproto.c:5645
> >>> #7 0x0000562773d010e4 in add_flow_start (ofproto=0x5627778cc420, ofm=0x7fd3057235f0) at ofproto/ofproto.c:5256
> >>> #8 0x0000562773d0122d in modify_flows_start__ (ofproto=ofproto@entry=0x5627778cc420, ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:5824
> >>> #9 0x0000562773d01eac in modify_flow_start_strict (ofm=0x7fd3057235f0, ofproto=0x5627778cc420) at ofproto/ofproto.c:5953
> >>> #10 ofproto_flow_mod_start (ofproto=0x5627778cc420, ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:8112
> >>> #11 0x0000562773d0225a in ofproto_flow_mod_learn_start (ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:5491
> >>> #12 0x0000562773d040ad in ofproto_flow_mod_learn (ofm=ofm@entry=0x7fd3057235f0, keep_ref=<optimized out>, limit=<optimized out>, below_limitp=below_limitp@entry=0x7fd305723510) at ofproto/ofproto.c:5576
> >>> #13 0x0000562773d2641e in xlate_learn_action (ctx=ctx@entry=0x7fd305729a60, learn=learn@entry=0x562777db4618) at ofproto/ofproto-dpif-xlate.c:5547
> >>> #14 0x0000562773d2aafb in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7232
> >>> #15 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777db4470, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #16 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #17 0x0000562773d2ab8d in xlate_ofpact_resubmit (resubmit=0x56277781db28, resubmit=0x56277781db28, resubmit=0x56277781db28, is_last_action=true, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4823
> >>> #18 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7107
> >>> #19 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777ab9220, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #20 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #21 0x0000562773d2ab8d in xlate_ofpact_resubmit (resubmit=0x562777b9f3c8, resubmit=0x562777b9f3c8, resubmit=0x562777b9f3c8, is_last_action=true, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4823
> >>> #22 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7107
> >>> #23 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777d6dc90, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #24 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #25 0x0000562773d2ab8d in xlate_ofpact_resubmit (resubmit=0x562777c13c58, resubmit=0x562777c13c58, resubmit=0x562777c13c58, is_last_action=true, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4823
> >>> #26 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7107
> >>> #27 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562778046000, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #28 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #29 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #30 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777ad0ac0, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #31 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #32 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #33 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x56277785c0d0, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #34 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #35 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #36 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777a86ce0, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #37 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #38 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #39 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x56277781b710, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #40 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #41 0x0000562773d2ac1c in do_xlate_actions (ofpacts=ofpacts@entry=0x562777833a38, ofpacts_len=ofpacts_len@entry=32, ctx=ctx@entry=0x7fd305729a60, is_last_action=is_last_action@entry=true, group_bucket_action=group_bucket_action@entry=false) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #42 0x0000562773d30f68 in clone_xlate_actions (actions=0x562777833a38, actions_len=32, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:5809
> >>> #43 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d30d70 <clone_xlate_actions>, is_last_action=true, deepens=true, rule=0x562777ad5640, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #44 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d30d70 <clone_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #45 0x0000562773d2cf20 in patch_port_output (ctx=ctx@entry=0x7fd305729a60, out_dev=0x562777ebeec0, is_last_action=is_last_action@entry=true, in_dev=0x562777f43ee0, in_dev=0x562777f43ee0) at ofproto/ofproto-dpif-xlate.c:3890
> >>> #46 0x0000562773d2d2f7 in compose_output_action__ (ctx=ctx@entry=0x7fd305729a60, ofp_port=2, xr=xr@entry=0x0, check_stp=check_stp@entry=true, is_last_action=is_last_action@entry=true, truncate=truncate@entry=false) at ofproto/ofproto-dpif-xlate.c:4205
> >>> #47 0x0000562773d2fdd0 in compose_output_action (truncate=false, is_last_action=true, xr=0x0, ofp_port=<optimized out>, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4360
> >>> #48 xlate_output_action (ctx=ctx@entry=0x7fd305729a60, port=<optimized out>, controller_len=<optimized out>, may_packet_in=may_packet_in@entry=true, is_last_action=is_last_action@entry=true, truncate=truncate@entry=false, group_bucket_action=group_bucket_action@entry=false) at ofproto/ofproto-dpif-xlate.c:5305
> >>> #49 0x0000562773d2972f in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:6960
> >>> #50 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777ad5430, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #51 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #52 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #53 0x0000562773d26c85 in xlate_recursively (actions_xlator=0x562773d29490 <do_xlate_actions>, is_last_action=true, deepens=false, rule=0x562777ac2fb0, ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4383
> >>> #54 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x562773d29490 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4512
> >>> #55 0x0000562773d2ac1c in do_xlate_actions (ofpacts=ofpacts@entry=0x56277797b318, ofpacts_len=ofpacts_len@entry=8, ctx=ctx@entry=0x7fd305729a60, is_last_action=is_last_action@entry=true, group_bucket_action=group_bucket_action@entry=false) at ofproto/ofproto-dpif-xlate.c:7110
> >>> #56 0x0000562773d330d6 in xlate_actions (xin=xin@entry=0x7fd30572a920, xout=xout@entry=0x7fd30572ad38) at ofproto/ofproto-dpif-xlate.c:7924
> >>> #57 0x0000562773d2241b in upcall_xlate (wc=0x7fd30572bfe0, odp_actions=0x7fd30572b7b0, upcall=0x7fd30572acd0, udpif=0x562777850cf0) at ofproto/ofproto-dpif-upcall.c:1340
> >>> #58 process_upcall (udpif=udpif@entry=0x562777850cf0, upcall=upcall@entry=0x7fd30572acd0, odp_actions=odp_actions@entry=0x7fd30572b7b0, wc=wc@entry=0x7fd30572bfe0) at ofproto/ofproto-dpif-upcall.c:1602
> >>> #59 0x0000562773d22c39 in upcall_cb (packet=<optimized out>, flow=0x7fd30572bd40, ufid=<optimized out>, pmd_id=<optimized out>, type=<optimized out>, userdata=<optimized out>, actions=0x7fd30572b7b0, wc=0x7fd30572bfe0, put_actions=0x7fd30572b7f0, aux=0x562777850cf0) at ofproto/ofproto-dpif-upcall.c:1461
> >>> #60 0x0000562773d52198 in dp_netdev_upcall (pmd=pmd@entry=0x7fd305731010, packet_=packet_@entry=0x21152c180, flow=flow@entry=0x7fd30572bd40, wc=wc@entry=0x7fd30572bfe0, ufid=ufid@entry=0x7fd30572b790, type=type@entry=DPIF_UC_MISS, userdata=userdata@entry=0x0, actions=actions@entry=0x7fd30572b7b0, put_actions=put_actions@entry=0x7fd30572b7f0) at lib/dpif-netdev.c:9141
> >>> #61 0x0000562773d66f13 in handle_packet_upcall (put_actions=0x7fd30572b7f0, actions=0x7fd30572b7b0, key=0x7fd30572cc40, packet=0x21152c180, pmd=0x7fd305731010) at lib/dpif-netdev.c:11303
> >>> #62 fast_path_processing (pmd=pmd@entry=0x7fd305731010, packets_=packets_@entry=0x7fd30572d0c0, keys=keys@entry=0x7fd30572cc30, flow_map=flow_map@entry=0x7fd30572cae0, index_map=index_map@entry=0x7fd30572cad0 "", in_port=<optimized out>) at lib/dpif-netdev.c:11426
> >>> #63 0x0000562773d680c1 in dp_netdev_input__ (pmd=<optimized out>, packets=<optimized out>, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at lib/dpif-netdev.c:11520
> >>> #64 0x0000562773d6a15d in dp_netdev_input (pmd=<optimized out>, packets=<optimized out>, port_no=<optimized out>) at lib/dpif-netdev.c:11558
> >>> #65 0x0000562773d6a2cf in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fd305731010, rxq=0x56277796a5e0, port_no=3) at lib/dpif-netdev.c:6660
> >>> #66 0x0000562773d6a759 in pmd_thread_main (f_=<optimized out>) at lib/dpif-netdev.c:8267
> >>> #67 0x0000562773df805f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:422
> >>> #68 0x00007fd344480e65 in start_thread (arg=0x7fd305730700) at pthread_create.c:307
> >>> #69 0x00007fd34260988d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>
> >>> Thread 1 (Thread 0x7fd334307700 (LWP 91942)):
> >>> #0 0x00007fd342541337 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> >>> #1 0x00007fd342542a28 in __GI_abort () at abort.c:90
> >>> #2 0x0000562773e2c4ee in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=args@entry=0x7fd334302340) at lib/util.c:499
> >>> #3 0x0000562773e2c584 in ovs_abort (err_no=err_no@entry=0, format=format@entry=0x5627740cad18 "%s: %s() passed uninitialized ovs_mutex") at lib/util.c:491
> >>> #4 0x0000562773df72e1 in ovs_mutex_lock_at (l_=l_@entry=0x7fd2f907df68, where=where@entry=0x5627740a7fc0 "ofproto/ofproto-dpif-upcall.c:2214") at lib/ovs-thread.c:75
> >>> #5 0x0000562773d1e98d in ukey_delete (umap=umap@entry=0x562777853400, ukey=ukey@entry=0x7fd2f907df20) at ofproto/ofproto-dpif-upcall.c:2214
> >>> #6 0x0000562773d202da in revalidator_sweep__ (revalidator=revalidator@entry=0x562777897b00, purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3048
> >>> #7 0x0000562773d241a6 in revalidator_sweep (revalidator=0x562777897b00) at ofproto/ofproto-dpif-upcall.c:3072
> >>> #8 udpif_revalidator (arg=0x562777897b00) at ofproto/ofproto-dpif-upcall.c:1086
> >>> #9 0x0000562773df805f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:422
> >>> #10 0x00007fd344480e65 in start_thread (arg=0x7fd334307700) at pthread_create.c:307
> >>> #11 0x00007fd34260988d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
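Frames #3 and #4 of Thread 1 show that the abort is the "passed uninitialized ovs_mutex" check in ovs_mutex_lock_at(), reached from ukey_delete() during revalidator_sweep__(). Reading lib/ovs-thread.c, ovs_mutex_destroy() clears the mutex's "where" field and the lock functions abort when they find it NULL, so this message is what locking an already-destroyed ukey mutex looks like; it could also appear if the ukey memory had been reused or zeroed. The sketch below is a hypothetical, simplified re-creation of that check, not the OVS source: the sketch_* names are made up, a plain flag stands in for the pthread mutex, and where OVS aborts, the sketch returns -1 and writes the message into a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified re-creation (not the OVS source) of the
 * "where" bookkeeping behind "passed uninitialized ovs_mutex". */
struct sketch_mutex {
    bool locked;        /* Stands in for the underlying pthread mutex. */
    const char *where;  /* NULL: never initialized, or already destroyed. */
};

static void sketch_mutex_init(struct sketch_mutex *m)
{
    m->locked = false;
    m->where = "<unlocked>";
}

/* Like ovs_mutex_destroy(): clears 'where', so later lock attempts fail. */
static void sketch_mutex_destroy(struct sketch_mutex *m)
{
    assert(m->where != NULL);
    m->where = NULL;
}

/* Like ovs_mutex_lock_at(), except that where OVS would abort with
 * "<where>: <function>() passed uninitialized ovs_mutex", this writes the
 * message to 'msg' and returns -1.  Returns 0 after taking the lock. */
static int sketch_mutex_lock_at(struct sketch_mutex *m, const char *where,
                                char *msg, size_t msg_len)
{
    if (m->where == NULL) {
        snprintf(msg, msg_len, "%s: %s() passed uninitialized ovs_mutex",
                 where, __func__);
        return -1;
    }
    assert(!m->locked);   /* Single-threaded sketch: no real blocking. */
    m->locked = true;
    m->where = where;     /* Record the call site of the current holder. */
    return 0;
}

static void sketch_mutex_unlock(struct sketch_mutex *m)
{
    assert(m->locked);
    m->locked = false;
    m->where = "<unlocked>";
}
```

For example, after init, lock, unlock and destroy, calling sketch_mutex_lock_at(&m, "ofproto/ofproto-dpif-upcall.c:2214", msg, sizeof msg) returns -1 and leaves a message that starts with that call site and ends in "passed uninitialized ovs_mutex", the same shape as the format string in frame #3. The check only proves "where" was NULL at lock time; it cannot tell a destroyed mutex from overwritten memory.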
> >>>
> >>> Thanks
> >>> LIU Yulong
> >>>
> >>> On Mon, Feb 19, 2024 at 8:12 PM Eelco Chaudron <echau...@redhat.com> wrote:
> >>>>
> >>>> On 19 Feb 2024, at 13:09, Ilya Maximets wrote:
> >>>>
> >>>>> On 2/19/24 11:14, Eelco Chaudron wrote:
> >>>>>>
> >>>>>> On 19 Feb 2024, at 10:34, LIU Yulong wrote:
> >>>>>>
> >>>>>>> Hi OVS experts,
> >>>>>>>
> >>>>>>> Our ovs-vswitchd crashes with a core dump at ovs_mutex_trylock(&ukey->mutex) in the function revalidator_sweep__.
> >>>>>>>
> >>>>>>> I sent this mail before but got no response:
> >>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-August/052604.html
> >>>>>>>
> >>>>>>> So I'm sending it again. I apologize in advance for its length: I want to post as much useful information as possible to help identify potential issues.
> >>>>>>>
> >>>>>>> Compared to the mail 2023-August/052604.html, we upgraded OVS to 2.17.8 and DPDK to 22.11, hoping the community had already fixed this issue. Unfortunately, ovs-vswitchd still crashes with a core dump.
> >>>>>>
> >>>>>> As you mentioned, it looks like some memory corruption, which I have not seen before.
> >>>>>>
> >>>>>> Have you tried this without rte offload? This is the one feature I have never used.
> >>>>>> There is a 2.17.9 with DPDK 22.11.6 you could try.
> >>>>>
> >>>>> OVS 2.17 is not supposed to work with DPDK 22.11; it's supposed to work with 21.11.
> >>>>> See the compatibility table here:
> >>>>> https://docs.openvswitch.org/en/latest/faq/releases/
> >>>>>
> >>>>> Though it's hard to tell whether the DPDK version is related to the issue at all.
> >>>>
> >>>> My mistake, I meant to type 21.11.6 :( But yes, if they are using 22.11, that could also be the problem. I would suggest using the supported version and seeing if the problem goes away.
> >>>>
> >>>> //Eelco
> >>>>
> >>>>> Best regards, Ilya Maximets.
> >>>>
> >>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss