Hi Eelco, Thank you.
Patch sent to the mail list: https://mail.openvswitch.org/pipermail/ovs-dev/2024-March/412474.html On Wed, Mar 13, 2024 at 5:34 PM Eelco Chaudron <echau...@redhat.com> wrote: > > > > On 13 Mar 2024, at 10:19, LIU Yulong wrote: > > > Hi guys, > > > > Send a pull request with that try_lock movement fix based on the former > > tests: > > https://github.com/openvswitch/ovs/pull/421 > > > > Does that make sense to you? > > I’m a bit behind emails, etc. so did not look at your emails yet. But for OVS > we use an email-based workflow, see here; > https://docs.openvswitch.org/en/latest/internals/contributing/submitting-patches/. > If you use this more people will see your patch and can review it. > > Cheers, > > Eelco > > > > > On Tue, Mar 12, 2024 at 3:11 PM LIU Yulong <liuyulong...@gmail.com> wrote: > >> > >> Updates: > >> > >> Ukey attributes we already have: > >> > >> long long int created OVS_GUARDED; /* Estimate of creation > >> time. */ > >> unsigned int state_thread OVS_GUARDED; /* Thread that transitions. > >> */ > >> > >> Added more attributes [1] to the ukey: > >> > >> const char *state_before OVS_GUARDED; /* locator state before > >> (last) transition. */ > >> long long int modified; /* Time of last transition. */ > >> unsigned create_tid; /* Ukey created thread id. */ > >> > >> [1] > >> https://github.com/gotostack/ovs/commit/8ddc4f512783e6b883b102b821e0f05916a9c255 > >> > >> After that, a core file shows: > >> > >> 1) The pmd ctx-> now: > >> p ((struct dp_netdev_pmd_thread *) 0x7f804b733010)->ctx > >> $10 = {now = 12529082556818, last_rxq = 0x55f009029720, emc_insert_min > >> = 42949672, smc_enable_db = false} > >> > >> 2)ukey in the core code call stack > >> p * (struct udpif_key *) 0x7f803c360710 > >> $11 = { created = 12529082056, modified = 12529082553, create_tid = 9} > >> > >> 3) Circular buffer same address for free action > >> ukey_addr = 0x7f803c360710, timestamp = 12529082556703 > >> > >> PMD cxt->now 12529082556818 is near the ukey free time 12529082556703, > >> it's about 115us. > >> > >> Adding more timesmap [2] to every ukey state to record the ukey state > >> transition: > >> long long int ukey_create_time; /* Time of ukey creation. */ > >> long long int ukey_visible_time; /* Time of ukey visible. */ > >> long long int ukey_operational_time; /* Time of ukey operational. */ > >> long long int ukey_evicting_time; /* Time of ukey evicting. */ > >> long long int ukey_evicted_time; /* Time of ukey evicted. */ > >> long long int ukey_deleted_time; /* Time of ukey deleted. */ > >> long long int ukey_destroy_time; /* Time of ukey destroy. */ > >> long long int ukey_replace_time; /* Time of ukey replace. */ > >> > >> [2] > >> https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050 > >> > >> And a core file shows: > >> > >> ukey_create_time = 13217283578366, > >> ukey_visible_time = 13217283578366, > >> ukey_operational_time = 13217283583044, > >> ukey_evicting_time = 13217289145192, > >> ukey_evicted_time = 13217289145245, > >> ukey_deleted_time = 13217289154654, > >> ukey_destroy_time = 13217289156490, This is set just before the > >> ovs_mutex_destroy(&ukey->mutex); > >> ukey_replace_time = 13217289154654 > >> > >> pmd->ctx: > >> $4 = { > >> now = 13217289156482, > >> last_rxq = 0x55b34db74f50, > >> emc_insert_min = 42949672, > >> smc_enable_db = false > >> } > >> > >> ukey_replace_time and ukey_deleted_time are the same. > >> > >> ukey_destroy_time - pmd-ctx.now = 8 (13217289156490 - 13217289156482) > >> > >> And also added a seep_now just before the mostly core code line: > >> https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050#diff-be6e2339300cb2a7efa8eca531a668a94ce9f06dd717ba73bb1b508fee27e887R3030 > >> sweep_now = time_usec(); > >> if (ovs_mutex_trylock(&ukey->mutex)) { > >> continue; > >> } > >> > >> ukey_destroy_time - sweep_now = -78 (13217289156490 - 13217289156568) > >> > >> Means that ukey_destory is a bit earlier than revalidator_sweep__ try_lock. > >> > >> > >> > >> According to these informations, I assume that the umap and ukey > >> iteration has race condition between > >> PMD thread, RCU thread and the revalidator thread. And based on the > >> core/abort point in the code > >> stack. I moved the umap lock to outside of CMAP_FOR_EACH loop [3]. > >> [3] > >> https://github.com/gotostack/ovs/commit/2919a242be7d0ee079c278a8488188694f20f827 > >> > >> No more core was seen during that revalidator_sweep__ procedure for 4 days > >> now. > >> > >> But if I revert this lock movement, the core can show again in a few hours. > >> > >> So, please take a look at this lock movement patch, if it make sense to > >> you. > >> > >> > >> Regards, > >> > >> LIU Yulong > >> > >> > >> On Fri, Mar 1, 2024 at 6:06 PM LIU Yulong <liuyulong...@gmail.com> wrote: > >>> > >>> Hi, > >>> > >>> Add some updates: > >>> > >>> 1. > >>> We added a debug attribute `state_before ` to the ukey to record more > >>> life cycle details of a ukey: > >>> state_where = 0x55576027b868 "ofproto/ofproto-dpif-upcall.c:XXXX", > >>> [1], it is UKEY_DELETED. > >>> state_before = 0x55576027b630 "ofproto/ofproto-dpif-upcall.c:XXXX", > >>> [2], it was UKEY_EVICTED. > >>> > >>> [1] > >>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897 > >>> [2] > >>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2470 > >>> > >>> Still, all of the ukeys did the replace action. > >>> > >>> 2. The ukey circular buffer [1] does not work well, the buffer still > >>> has {0} after a long time run, and the number is absolutely less than > >>> `counter_upcall_ukey_free`. > >>> [1] > >>> https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46 > >>> And, can not add an `allocate` entry to this buffer for "ukey > >>> xmalloc". The circular buffer > >>> mutex seems not to work well, core many times at > >>> `ovs_mutex_unlock(&ukey_free_buffer.mutex)`. > >>> > >>> 3. Ilya's patch [2] was applied, but I have not seen the abort log for > >>> now. > >>> [2] > >>> https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5 > >>> > >>> 4. dump all ukeys from the core file, we noticed that almost all > >>> UKEY_EVICTED ukeys are changed state at `transition_ukey_at` by the > >>> revalidator thread. > >>> (The `state_thread` attribute of the ukey) > >>> But, the core bt shows the related ukey was changed state at PMD thread. > >>> For instance: > >>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> normal ukey and the revalidator > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thread: > >>> (struct umap *) 0x55cce9556140: > >>> (struct udpif_key *) 0x7f3aad584a80: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3aac24ce20: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3aac6526e0: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3aad731970: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3aac91ce50: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3aadd69be0: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3aad759040: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3a8c0d6d50: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> (struct udpif_key *) 0x7f3a8c851300: > >>> state = UKEY_EVICTED > >>> state_thread = 5 > >>> > >>> #8 0x000055cce5d7005f in ovsthread_wrapper (aux_=<optimized out>) at > >>> lib/ovs-thread.c:422 > >>> auxp = <optimized out> > >>> aux = {start = 0x55cce5c9c0d0 <udpif_revalidator>, arg = > >>> 0x55cce9595780, name = "revalidator\000\000\000\000"} > >>> id = 5 > >>> subprogram_name = 0x7f3ad00008c0 "\020 " > >>> #9 0x00007f3af2afee65 in start_thread (arg=0x7f3ae2986700) at > >>> pthread_create.c:307 > >>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > >>> > >>>>>>>>>>>>>>>>>>> core ukey and PMD thread > >>> > >>> p * (struct udpif_key *) 0x7f3aac156e80 > >>> $2 = {cmap_node = {next = {p = 0x7f3aaec2b3a0}}, key = 0x7f3aac402810, > >>> key_len = 0, mask = 0x0, mask_len = 172, ufid = {u32 = {2563111187, > >>> 2445823588, 3143939231, 3011838433}, u64 = {lo = 10504732324808489235, > >>> hi = 12935747573714826399}}, ufid_present = true, hash = > >>> 2623373230, pmd_id = 35, mutex = {lock = {__data = {__lock = 0, > >>> __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, > >>> __elision = 0, __list = { > >>> __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 > >>> times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}, > >>> where = 0x0}, stats = {n_packets = 3, n_bytes = 852, used = 871199854, > >>> tcp_flags = 16}, > >>> created = 871199014, dump_seq = 8822382946, reval_seq = 8822381178, > >>> state = UKEY_DELETED, state_thread = 8 > >>> > >>> PMD thread ID: > >>> #6 0x000055cce5d7005f in ovsthread_wrapper (aux_=<optimized out>) at > >>> lib/ovs-thread.c:422 > >>> auxp = <optimized out> > >>> aux = {start = 0x55cce5ce2460 <pmd_thread_main>, arg = > >>> 0x7f3ab2e6a010, name = "pmd-c35/id:\000:\177\000"} > >>> id = 8 > >>> subprogram_name = 0x7f3aac0008c0 "p\v\"\255:\177" > >>> #7 0x00007f3af2afee65 in start_thread (arg=0x7f3ae0582700) at > >>> pthread_create.c:307 > >>>>>>>>>>>>>>>>>>>>>>> > >>> > >>> The running threads are: > >>> > >>> # ps -T -o spid,comm $(pidof ovs-vswitchd) > >>> SPID COMMAND > >>> 100866 ovs-vswitchd > >>> 100867 eal-intr-thread > >>> 100868 rte_mp_handle > >>> 100872 ovs-vswitchd > >>> 100873 dpdk_watchdog1 > >>> 100876 urcu2 > >>> 100888 ct_clean7 > >>> 100889 ipf_clean6 > >>> 100890 hw_offload3 > >>> 100891 handler4 > >>> 100892 revalidator5 # 1 revalidator thread > >>> 100893 pmd-c03/id:9 > >>> 100894 pmd-c35/id:8 # Mostly 1 PMD thread is working! Another is idle > >>> forever. > >>> 100925 vhost_reconn > >>> 100926 vhost-events > >>> > >>> So, this can prove that there are two threads were trying to > >>> manipulate the same ukey? > >>> > >>> * PMD thread replaced the old_ukey and transitioned the state. > >>> * RCU thread freed the ukey mutex. > >>> * The revalidator thread tries to lock the old_ukey mutex. > >>> > >>> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052960.html > >>> Any idea to simulate the race contion? > >>> > >>> Thank you. > >>> > >>> Regards, > >>> LIU Yulong > >>> > >>> > >>> On Tue, Feb 27, 2024 at 6:14 PM Eelco Chaudron <echau...@redhat.com> > >>> wrote: > >>>> > >>>> > >>>> > >>>> On 27 Feb 2024, at 9:49, LIU Yulong wrote: > >>>> > >>>>> Yes, that makes sense. > >>>>> > >>>>> Another question is how to distinguish the core at line of > >>>>> ovs_mutex_trylock in revalidator_sweep__ is after the free(ukey), > >>>>> since the core trace has no timestamp. > >>>> > >>>> This is hard to figure out without adding a time variable (and make sure > >>>> it’s not optimized out) in revalidator_sweep__() > >>>> > >>>> As you are using OVS-DPDK, you can read the values from the pmd->ctx.now > >>>> context, which should be close. > >>>> > >>>>> This line in the function 'ukey_create__' should be the only place > >>>>> where ovs allocated the memory for ukey: > >>>>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L1777 > >>>>> > >>>>> Right? > >>>> > >>>> Yes, this should be the only place. > >>>> > >>>>> > >>>>> If it is true, I will update the buffer structure, and a > >>>>> counter_upcall_ukey_allocate as well. > >>>>> > >>>>> On Tue, Feb 27, 2024 at 3:34 PM Eelco Chaudron <echau...@redhat.com> > >>>>> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 27 Feb 2024, at 4:44, LIU Yulong wrote: > >>>>>> > >>>>>>> @Eelco, as you suggested, added such circular buffer to my local OVS: > >>>>>>> https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46 > >>>>>> > >>>>>> I should also add allocate logging, or else you might not know if a > >>>>>> buffer was allocated at the same address. > >>>>>> Maybe add a bool to the record structure to indicate if it’s an > >>>>>> allocate or free. > >>>>>> > >>>>>> //Eelco > >>>>>> > >>>>>>> gdb shows such data structure: > >>>>>>> 2232 ukey_free_buffer.index = (ukey_free_buffer.index + 1) % (1024 > >>>>>>> * 1024); // Circular buffer > >>>>>>> (gdb) p ukey_free_buffer > >>>>>>> $1 = { > >>>>>>> records = {{ > >>>>>>> ukey_addr = 0x7f8a0d871700, > >>>>>>> timestamp = 1709003328 > >>>>>>> }, { > >>>>>>> ukey_addr = 0x7f8a0f969120, > >>>>>>> timestamp = 1709003365 > >>>>>>> }, { > >>>>>>> ukey_addr = 0x7f8a0defe190, > >>>>>>> timestamp = 1709003393 > >>>>>>> }, { > >>>>>>> ukey_addr = 0x7f8a0984aea0, > >>>>>>> timestamp = 1709003452 > >>>>>>> }...}, > >>>>>>> index = 3, > >>>>>>> mutex = { > >>>>>>> lock = { > >>>>>>> __data = { > >>>>>>> __lock = 1, > >>>>>>> __count = 0, > >>>>>>> __owner = 45210, > >>>>>>> __nusers = 1, > >>>>>>> __kind = 2, > >>>>>>> __spins = 0, > >>>>>>> __elision = 0, > >>>>>>> __list = { > >>>>>>> __prev = 0x0, > >>>>>>> __next = 0x0 > >>>>>>> } > >>>>>>> }, > >>>>>>> __size = > >>>>>>> "\001\000\000\000\000\000\000\000\232\260\000\000\001\000\000\000\002", > >>>>>>> '\000' <repeats 22 times>, > >>>>>>> __align = 1 > >>>>>>> }, > >>>>>>> where = 0x55c35a347d18 "ofproto/ofproto-dpif-upcall.c:2229" > >>>>>>> } > >>>>>>> } > >>>>>>> > >>>>>>> and counter_upcall_ukey_free is: > >>>>>>> $2 = {name = 0x5622b448f612 "upcall_ukey_free", count = 0x5622b41047f0 > >>>>>>> <upcall_ukey_free_count>, total = 79785, last_total = 79785, min = {0, > >>>>>>> 0, 0, 0, 0, 55, 22681, 11703, 13877, 12750, 0, 18719}, hr = {79785, > >>>>>>> 0 <repeats 59 times>}} > >>>>>>> > >>>>>>> Let's see how this goes. > >>>>>>> > >>>>>>> Thank you. > >>>>>>> > >>>>>>> On Tue, Feb 27, 2024 at 9:05 AM LIU Yulong <liuyulong...@gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> @Ilya, thank you, I will add that patch. > >>>>>>>> > >>>>>>>> @Eelco, thank you again, I will add a RL log to the free(ukey). Hope > >>>>>>>> we can get something useful. > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, Feb 26, 2024 at 7:55 PM Ilya Maximets <i.maxim...@ovn.org> > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>> On 2/26/24 11:20, Eelco Chaudron wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 26 Feb 2024, at 11:10, LIU Yulong wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi Eelco, > >>>>>>>>>>> > >>>>>>>>>>> Thank you for the quick response. > >>>>>>>>>>> > >>>>>>>>>>> I did not add those logs, because in order to reproduce the > >>>>>>>>>>> issue, we > >>>>>>>>>>> have to send lots of packets to the host. > >>>>>>>>>>> So there are too many ukeys created/deleted to do logging. > >>>>>>>>>> > >>>>>>>>>> Maybe a circular buffer with all alloc/free (+ 1ukey address, and > >>>>>>>>>> timestamp), 1 or 2 Mb of memory can hold a lot. > >>>>>>>>>> > >>>>>>>>>>> And can we ensure that this [1] is the only place for ovs to free > >>>>>>>>>>> the ukey? > >>>>>>>>>>> > >>>>>>>>>>> [1] > >>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084 > >>>>>>>>>> > >>>>>>>>>> Yes, this should be the only place, and should always be done > >>>>>>>>>> through an RCU delayed delete. > >>>>>>>>>> > >>>>>>>>>>> For last mail, can this issue be concurrent > >>>>>>>>>>> read-and-update/delete? > >>>>>>>>>>> The revalidator_sweep__ is trying to lock the ukey->mutex, while > >>>>>>>>>>> another thread is updating the ukey->mutex to NULL and free ukey. > >>>>>>>>>> > >>>>>>>>>> This should not happen as the delete should happen by the delayed > >>>>>>>>>> RCU delete, and if the ukey is still in the cmap after the delayed > >>>>>>>>>> delete (quiescent state) something is wrong. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I agree with Eelco and I don't see any abvious issues with the > >>>>>>>>> current > >>>>>>>>> implementation. > >>>>>>>>> > >>>>>>>>> However, the ususal suspect for RCU problems is entering quiescent > >>>>>>>>> state > >>>>>>>>> while iterating RCU-protected structure. Though I'm not sure how > >>>>>>>>> that can > >>>>>>>>> happen in the revalidator, usually such issues are hiding somewhere > >>>>>>>>> way > >>>>>>>>> down the call stack. I made a small patch that can help to be sure > >>>>>>>>> that > >>>>>>>>> this doesn't actually happen in your setup: > >>>>>>>>> https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5 > >>>>>>>>> > >>>>>>>>> Could you try it? > >>>>>>>>> > >>>>>>>>> The change will log an error message and abort the process if we > >>>>>>>>> happen > >>>>>>>>> to enter quiescent state while iterating over the hash map. Core > >>>>>>>>> dump > >>>>>>>>> will point to a problematic call. > >>>>>>>>> > >>>>>>>>> Best regards, Ilya Maximets. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> LIU Yulong > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Feb 26, 2024 at 5:41 PM Eelco Chaudron > >>>>>>>>>>> <echau...@redhat.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On 26 Feb 2024, at 9:33, LIU Yulong wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I have read the code by comparing the call stack of the core > >>>>>>>>>>>>> files > >>>>>>>>>>>>> carefully, and found > >>>>>>>>>>>>> a potential race condition. Please confirm whether the > >>>>>>>>>>>>> following 3 threads > >>>>>>>>>>>>> have a race condition. Just did some code trace, can such > >>>>>>>>>>>>> race condition happen? > >>>>>>>>>>>>> > >>>>>>>>>>>>> * PMD thread1 ===================================: > >>>>>>>>>>>>> -> pmd_thread_main > >>>>>>>>>>>>> -> dp_netdev_process_rxq_port > >>>>>>>>>>>>> -> dp_netdev_input > >>>>>>>>>>>>> -> dp_netdev_input__ > >>>>>>>>>>>>> -> handle_packet_upcall > >>>>>>>>>>>>> -> dp_netdev_upcall > >>>>>>>>>>>>> -> upcall_cb > >>>>>>>>>>>>> -> ukey_install > >>>>>>>>>>>>> -> ukey_install__ > >>>>>>>>>>>>> -> try_ukey_replace: > >>>>>>>>>>>>> ovs_mutex_lock(&new_ukey->mutex); > >>>>>>>>>>>>> <---------- the CMAP_FOR_EACH loop in the revalidator_sweep__ > >>>>>>>>>>>>> run a > >>>>>>>>>>>>> bit earlier than the cmap_replace next line, so the old_ukey > >>>>>>>>>>>>> can be > >>>>>>>>>>>>> iterated. [1] > >>>>>>>>>>>>> cmap_replace(&umap->cmap, &old_ukey->cmap_node, > >>>>>>>>>>>>> &new_ukey->cmap_node, new_ukey->hash); > >>>>>>>>>>>>> ovsrcu_postpone(ukey_delete__, old_ukey); > >>>>>>>>>>>>> <-------- delete the ukey asynchronously. [2] > >>>>>>>>>>>>> transition_ukey(old_ukey, UKEY_DELETED); > >>>>>>>>>>>>> <-------- > >>>>>>>>>>>>> transition the ukey state to UKEY_DELETED, most core files show > >>>>>>>>>>>>> that > >>>>>>>>>>>>> the ukey last state change was at this line. [3] > >>>>>>>>>>>>> transition_ukey(new_ukey, UKEY_VISIBLE); > >>>>>>>>>>>>> > >>>>>>>>>>>>> [1] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1892 > >>>>>>>>>>>>> [2] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1896 > >>>>>>>>>>>>> [3] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897 > >>>>>>>>>>>>> > >>>>>>>>>>>>> This function try_ukey_replace was called many times, because > >>>>>>>>>>>>> the > >>>>>>>>>>>>> `counter_upcall_ukey_replace` is not zero. > >>>>>>>>>>>>> For instance: > >>>>>>>>>>>>> { > >>>>>>>>>>>>> name = 0x55ba9755206b "upcall_ukey_replace", > >>>>>>>>>>>>> count = 0x55ba971c7610 <upcall_ukey_replace_count>, > >>>>>>>>>>>>> total = 2287997, > >>>>>>>>>>>>> last_total = 2287997, > >>>>>>>>>>>>> min = {221, 247, 444, 278, 324, 570, 379, 464, 283, 280, 0, > >>>>>>>>>>>>> 427}, > >>>>>>>>>>>>> hr = {3300, 4378, 3557, 4554, 3748, 3710, 4340, 3559, 4296, > >>>>>>>>>>>>> 3759, > >>>>>>>>>>>>> 3522, 4136, 3660, 4428, 3802, 3652, 3880, 3375, 4806, 4221, > >>>>>>>>>>>>> 4158, > >>>>>>>>>>>>> 3816, 3750, 3846, 3761, 3653, 4293, 3816, 3723, 3691, 4033, > >>>>>>>>>>>>> 468, 4117, > >>>>>>>>>>>>> 3659, 4007, 3536, > >>>>>>>>>>>>> 3439, 4440, 3388, 4079, 3876, 3865, 4339, 3757, 3481, 4027, > >>>>>>>>>>>>> 3989, > >>>>>>>>>>>>> 3633, 3737, 3564, 3403, 3992, 3793, 4390, 4124, 4354, 4164, > >>>>>>>>>>>>> 4383, > >>>>>>>>>>>>> 4237, 3667} > >>>>>>>>>>>>> } > >>>>>>>>>>>>> > >>>>>>>>>>>>> * RCU thread2 ===================================: > >>>>>>>>>>>>> -> ovsrcu_postpone_thread > >>>>>>>>>>>>> -> ovsrcu_call_postponed > >>>>>>>>>>>>> -> ukey_delete__ <------------ > >>>>>>>>>>>>> This > >>>>>>>>>>>>> function is not thead safe IMO, it has mark > >>>>>>>>>>>>> OVS_NO_THREAD_SAFETY_ANALYSIS. [4] > >>>>>>>>>>>>> > >>>>>>>>>>>>> recirc_refs_unref(&ukey->recircs); > >>>>>>>>>>>>> xlate_cache_delete(ukey->xcache); > >>>>>>>>>>>>> ofpbuf_delete(ovsrcu_get(struct ofpbuf *, > >>>>>>>>>>>>> &ukey->actions)); > >>>>>>>>>>>>> ovs_mutex_destroy(&ukey->mutex); > >>>>>>>>>>>>> <-------------- Just > >>>>>>>>>>>>> set ukey mutex to NULL. [5][6][7] > >>>>>>>>>>>>> free(ukey); > >>>>>>>>>>>>> > >>>>>>>>>>>>> [4] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2074 > >>>>>>>>>>>>> [5] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2083 > >>>>>>>>>>>>> [6] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L131 > >>>>>>>>>>>>> [7] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L124 > >>>>>>>>>>>>> > >>>>>>>>>>>>> * revalidator thread3 ===================================: > >>>>>>>>>>>>> > >>>>>>>>>>>>> -> udpif_revalidator > >>>>>>>>>>>>> -> revalidator_sweep > >>>>>>>>>>>>> -> revalidator_sweep__ > >>>>>>>>>>>>> > >>>>>>>>>>>>> CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) { > >>>>>>>>>>>>> enum ukey_state ukey_state; > >>>>>>>>>>>>> > >>>>>>>>>>>>> if (ovs_mutex_trylock(&ukey->mutex)) { > >>>>>>>>>>>>> <-------------- > >>>>>>>>>>>>> Core at here, because of the NULL pointer. [8] > >>>>>>>>>>>>> continue; > >>>>>>>>>>>>> } > >>>>>>>>>>>>> [8] > >>>>>>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2900 > >>>>>>>>>>>>> > >>>>>>>>>>>>> CMIIW, if this race condition can happen, IMO, it is mostly > >>>>>>>>>>>>> because > >>>>>>>>>>>>> the umap is not locked during the sweep CMAP_FOR_EACH loop. > >>>>>>>>>>>>> Or some RCU protection did not work properly. > >>>>>>>>>>>> > >>>>>>>>>>>> The code looks fine to me, we are taking the lock when we update > >>>>>>>>>>>> the cmap, which is a requirement, however iterating over the > >>>>>>>>>>>> cmap as done above does not require a lock. > >>>>>>>>>>>> > >>>>>>>>>>>> If the RCU protection would have been broken, we would have seen > >>>>>>>>>>>> a lot more errors. Did you have any luck adding the traces I > >>>>>>>>>>>> suggested earlier, to see if it’s a use after free? > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> > >>>>>>>>>>>> Eelco > >>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, Feb 21, 2024 at 6:40 PM Eelco Chaudron > >>>>>>>>>>>>> <echau...@redhat.com> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 21 Feb 2024, at 4:26, LIU Yulong wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you very much for your reply. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The problem is not easy to reproduce, we have to wait a > >>>>>>>>>>>>>>> random long time to see > >>>>>>>>>>>>>>> if the issue happens again. It can be more than one day or > >>>>>>>>>>>>>>> longer. > >>>>>>>>>>>>>>> OVS 2.17 with dpdk 20.11 had run to core before, so it's hard > >>>>>>>>>>>>>>> to say > >>>>>>>>>>>>>>> if it is related to DPDK. > >>>>>>>>>>>>>>> I'm running the ovs without offload to see if the issue can > >>>>>>>>>>>>>>> happen in > >>>>>>>>>>>>>>> recent days. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> And again, TLDR, paste more thread call stacks. > >>>>>>>>>>>>>>> Most of the threads are in the state of sched_yield, > >>>>>>>>>>>>>>> nanosleep, > >>>>>>>>>>>>>>> epoll_wait and poll. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> If this looks like a memory trash issue, it might be hard to > >>>>>>>>>>>>>> figure out. Does the ukey show any kind of pattern, i.e. does > >>>>>>>>>>>>>> the trashed data look like anything known? > >>>>>>>>>>>>>> Maybe it’s a use after free, so you could add some debugging > >>>>>>>>>>>>>> code logging/recording all free and xmalloc of the ukey > >>>>>>>>>>>>>> structure, to see that when it crashes it was actually > >>>>>>>>>>>>>> allocated? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hope this helps you getting started. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> //Eelco > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The following threads are in working state. So hope this can > >>>>>>>>>>>>>>> have > >>>>>>>>>>>>>>> clues for investigation. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thread 14 (Thread 0x7fd34002b700 (LWP 91928)): > >>>>>>>>>>>>>>> #0 0x00007fd344487b6d in recvmsg () at > >>>>>>>>>>>>>>> ../sysdeps/unix/syscall-template.S:81 > >>>>>>>>>>>>>>> #1 0x0000562773cb8d03 in mp_handle () > >>>>>>>>>>>>>>> #2 0x00007fd344480e65 in start_thread (arg=0x7fd34002b700) at > >>>>>>>>>>>>>>> pthread_create.c:307 > >>>>>>>>>>>>>>> #3 0x00007fd34260988d in clone () at > >>>>>>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thread 13 (Thread 0x7fd3359d7700 (LWP 91929)): > >>>>>>>>>>>>>>> #0 0x00007fd34448799d in accept () at > >>>>>>>>>>>>>>> ../sysdeps/unix/syscall-template.S:81 > >>>>>>>>>>>>>>> #1 0x0000562773cd8f3c in socket_listener () > >>>>>>>>>>>>>>> #2 0x00007fd344480e65 in start_thread (arg=0x7fd3359d7700) at > >>>>>>>>>>>>>>> pthread_create.c:307 > >>>>>>>>>>>>>>> #3 0x00007fd34260988d in clone () at > >>>>>>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thread 6 (Thread 0x7fd304663700 (LWP 91965)): > >>>>>>>>>>>>>>> #0 0x00007fd34448771d in read () at > >>>>>>>>>>>>>>> ../sysdeps/unix/syscall-template.S:81 > >>>>>>>>>>>>>>> #1 0x00007fd343b42bfb in _mlx5dv_devx_get_event () from > >>>>>>>>>>>>>>> /lib64/libmlx5.so.1 > >>>>>>>>>>>>>>> #2 0x0000562773936d86 in mlx5_vdpa_event_handle () > >>>>>>>>>>>>>>> #3 0x00007fd344480e65 in start_thread (arg=0x7fd304663700) at > >>>>>>>>>>>>>>> pthread_create.c:307 > >>>>>>>>>>>>>>> #4 0x00007fd34260988d in clone () at > >>>>>>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thread 2 (Thread 0x7fd305730700 (LWP 91943)): > >>>>>>>>>>>>>>> #0 ccmap_find_slot_protected (count=<synthetic pointer>, > >>>>>>>>>>>>>>> hash=hash@entry=1669671676, b=b@entry=0x7fd2f8012a80) at > >>>>>>>>>>>>>>> lib/ccmap.c:278 > >>>>>>>>>>>>>>> #1 ccmap_inc_bucket_existing (b=b@entry=0x7fd2f8012a80, > >>>>>>>>>>>>>>> hash=hash@entry=1669671676, inc=inc@entry=1) at > >>>>>>>>>>>>>>> lib/ccmap.c:281 > >>>>>>>>>>>>>>> #2 0x0000562773d4b015 in ccmap_try_inc > >>>>>>>>>>>>>>> (impl=impl@entry=0x7fd2f8012a40, hash=hash@entry=1669671676, > >>>>>>>>>>>>>>> inc=inc@entry=1) at lib/ccmap.c:464 > >>>>>>>>>>>>>>> #3 0x0000562773d4b224 in ccmap_inc > >>>>>>>>>>>>>>> (ccmap=ccmap@entry=0x7fd2f802a7e8, > >>>>>>>>>>>>>>> hash=1669671676) at lib/ccmap.c:485 > >>>>>>>>>>>>>>> #4 0x0000562773d4975a in classifier_replace (cls=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> rule=rule@entry=0x7fd2fac70e28, version=<optimized out>, > >>>>>>>>>>>>>>> conjs=<optimized out>, n_conjs=<optimized out>) > >>>>>>>>>>>>>>> at lib/classifier.c:579 > >>>>>>>>>>>>>>> #5 0x0000562773d49e99 in classifier_insert (cls=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> rule=rule@entry=0x7fd2fac70e28, version=<optimized out>, > >>>>>>>>>>>>>>> conj=<optimized out>, n_conj=<optimized out>) > >>>>>>>>>>>>>>> at lib/classifier.c:694 > >>>>>>>>>>>>>>> #6 0x0000562773d00fc8 in replace_rule_start > >>>>>>>>>>>>>>> (ofproto=ofproto@entry=0x5627778cc420, > >>>>>>>>>>>>>>> ofm=ofm@entry=0x7fd3057235f0, > >>>>>>>>>>>>>>> old_rule=<optimized out>, > >>>>>>>>>>>>>>> new_rule=new_rule@entry=0x7fd2fac70e20) > >>>>>>>>>>>>>>> at ofproto/ofproto.c:5645 > >>>>>>>>>>>>>>> #7 0x0000562773d010e4 in add_flow_start > >>>>>>>>>>>>>>> (ofproto=0x5627778cc420, > >>>>>>>>>>>>>>> ofm=0x7fd3057235f0) at ofproto/ofproto.c:5256 > >>>>>>>>>>>>>>> #8 0x0000562773d0122d in modify_flows_start__ > >>>>>>>>>>>>>>> (ofproto=ofproto@entry=0x5627778cc420, > >>>>>>>>>>>>>>> ofm=ofm@entry=0x7fd3057235f0) > >>>>>>>>>>>>>>> at ofproto/ofproto.c:5824 > >>>>>>>>>>>>>>> #9 0x0000562773d01eac in modify_flow_start_strict > >>>>>>>>>>>>>>> (ofm=0x7fd3057235f0, ofproto=0x5627778cc420) at > >>>>>>>>>>>>>>> ofproto/ofproto.c:5953 > >>>>>>>>>>>>>>> #10 ofproto_flow_mod_start (ofproto=0x5627778cc420, > >>>>>>>>>>>>>>> ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:8112 > >>>>>>>>>>>>>>> #11 0x0000562773d0225a in ofproto_flow_mod_learn_start > >>>>>>>>>>>>>>> (ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:5491 > >>>>>>>>>>>>>>> #12 0x0000562773d040ad in ofproto_flow_mod_learn > >>>>>>>>>>>>>>> (ofm=ofm@entry=0x7fd3057235f0, keep_ref=<optimized out>, > >>>>>>>>>>>>>>> limit=<optimized out>, > >>>>>>>>>>>>>>> below_limitp=below_limitp@entry=0x7fd305723510) > >>>>>>>>>>>>>>> at ofproto/ofproto.c:5576 > >>>>>>>>>>>>>>> #13 0x0000562773d2641e in xlate_learn_action > >>>>>>>>>>>>>>> (ctx=ctx@entry=0x7fd305729a60, > >>>>>>>>>>>>>>> learn=learn@entry=0x562777db4618) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:5547 > >>>>>>>>>>>>>>> #14 0x0000562773d2aafb in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7232 > >>>>>>>>>>>>>>> #15 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777db4470, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #16 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #17 0x0000562773d2ab8d in xlate_ofpact_resubmit > >>>>>>>>>>>>>>> (resubmit=0x56277781db28, resubmit=0x56277781db28, > >>>>>>>>>>>>>>> resubmit=0x56277781db28, is_last_action=true, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4823 > >>>>>>>>>>>>>>> #18 do_xlate_actions (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized > >>>>>>>>>>>>>>> out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7107 > >>>>>>>>>>>>>>> #19 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ab9220, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #20 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #21 0x0000562773d2ab8d in xlate_ofpact_resubmit > >>>>>>>>>>>>>>> (resubmit=0x562777b9f3c8, resubmit=0x562777b9f3c8, > >>>>>>>>>>>>>>> resubmit=0x562777b9f3c8, is_last_action=true, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4823 > >>>>>>>>>>>>>>> #22 do_xlate_actions (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized > >>>>>>>>>>>>>>> out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7107 > >>>>>>>>>>>>>>> #23 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777d6dc90, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #24 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> ---Type <return> to continue, or q <return> to quit--- > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #25 0x0000562773d2ab8d in xlate_ofpact_resubmit > >>>>>>>>>>>>>>> (resubmit=0x562777c13c58, resubmit=0x562777c13c58, > >>>>>>>>>>>>>>> resubmit=0x562777c13c58, is_last_action=true, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4823 > >>>>>>>>>>>>>>> #26 do_xlate_actions (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized > >>>>>>>>>>>>>>> out>, ctx=0x7fd305729a60, is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7107 > >>>>>>>>>>>>>>> #27 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562778046000, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #28 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #29 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #30 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ad0ac0, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #31 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #32 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #33 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x56277785c0d0, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #34 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #35 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #36 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777a86ce0, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #37 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #38 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #39 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x56277781b710, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #40 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #41 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=ofpacts@entry=0x562777833a38, > >>>>>>>>>>>>>>> ofpacts_len=ofpacts_len@entry=32, > >>>>>>>>>>>>>>> ctx=ctx@entry=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=is_last_action@entry=true, > >>>>>>>>>>>>>>> group_bucket_action=group_bucket_action@entry=false) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #42 0x0000562773d30f68 in clone_xlate_actions > >>>>>>>>>>>>>>> (actions=0x562777833a38, > >>>>>>>>>>>>>>> actions_len=32, ctx=0x7fd305729a60, is_last_action=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:5809 > >>>>>>>>>>>>>>> #43 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d30d70 <clone_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=true, rule=0x562777ad5640, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #44 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d30d70 > >>>>>>>>>>>>>>> <clone_xlate_actions>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #45 0x0000562773d2cf20 in patch_port_output > >>>>>>>>>>>>>>> (ctx=ctx@entry=0x7fd305729a60, out_dev=0x562777ebeec0, > >>>>>>>>>>>>>>> is_last_action=is_last_action@entry=true, > >>>>>>>>>>>>>>> in_dev=0x562777f43ee0, > >>>>>>>>>>>>>>> in_dev=0x562777f43ee0) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:3890 > >>>>>>>>>>>>>>> #46 0x0000562773d2d2f7 in compose_output_action__ > >>>>>>>>>>>>>>> (ctx=ctx@entry=0x7fd305729a60, ofp_port=2, xr=xr@entry=0x0, > >>>>>>>>>>>>>>> check_stp=check_stp@entry=true, > >>>>>>>>>>>>>>> is_last_action=is_last_action@entry=true, > >>>>>>>>>>>>>>> truncate=truncate@entry=false) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4205 > >>>>>>>>>>>>>>> #47 0x0000562773d2fdd0 in compose_output_action > >>>>>>>>>>>>>>> (truncate=false, > >>>>>>>>>>>>>>> is_last_action=true, xr=0x0, ofp_port=<optimized out>, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4360 > >>>>>>>>>>>>>>> #48 xlate_output_action (ctx=ctx@entry=0x7fd305729a60, > >>>>>>>>>>>>>>> port=<optimized > >>>>>>>>>>>>>>> out>, controller_len=<optimized out>, > >>>>>>>>>>>>>>> may_packet_in=may_packet_in@entry=true, > >>>>>>>>>>>>>>> is_last_action=is_last_action@entry=true, > >>>>>>>>>>>>>>> truncate=truncate@entry=false, > >>>>>>>>>>>>>>> group_bucket_action=group_bucket_action@entry=false) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:5305 > >>>>>>>>>>>>>>> ---Type <return> to continue, or q <return> to quit--- > >>>>>>>>>>>>>>> #49 0x0000562773d2972f in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:6960 > >>>>>>>>>>>>>>> #50 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ad5430, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #51 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #52 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=<optimized out>, > >>>>>>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=<optimized out>, > >>>>>>>>>>>>>>> group_bucket_action=<optimized out>) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #53 0x0000562773d26c85 in xlate_recursively > >>>>>>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>, > >>>>>>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ac2fb0, > >>>>>>>>>>>>>>> ctx=0x7fd305729a60) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4383 > >>>>>>>>>>>>>>> #54 xlate_table_action (ctx=0x7fd305729a60, > >>>>>>>>>>>>>>> in_port=<optimized out>, > >>>>>>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>, > >>>>>>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> is_last_action=true, xlator=0x562773d29490 > >>>>>>>>>>>>>>> <do_xlate_actions>) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512 > >>>>>>>>>>>>>>> #55 0x0000562773d2ac1c in do_xlate_actions > >>>>>>>>>>>>>>> (ofpacts=ofpacts@entry=0x56277797b318, > >>>>>>>>>>>>>>> ofpacts_len=ofpacts_len@entry=8, ctx=ctx@entry=0x7fd305729a60, > >>>>>>>>>>>>>>> is_last_action=is_last_action@entry=true, > >>>>>>>>>>>>>>> group_bucket_action=group_bucket_action@entry=false) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:7110 > >>>>>>>>>>>>>>> #56 0x0000562773d330d6 in xlate_actions > >>>>>>>>>>>>>>> (xin=xin@entry=0x7fd30572a920, > >>>>>>>>>>>>>>> xout=xout@entry=0x7fd30572ad38) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:7924 > >>>>>>>>>>>>>>> #57 0x0000562773d2241b in upcall_xlate (wc=0x7fd30572bfe0, > >>>>>>>>>>>>>>> odp_actions=0x7fd30572b7b0, upcall=0x7fd30572acd0, > >>>>>>>>>>>>>>> udpif=0x562777850cf0) at ofproto/ofproto-dpif-upcall.c:1340 > >>>>>>>>>>>>>>> #58 process_upcall (udpif=udpif@entry=0x562777850cf0, > >>>>>>>>>>>>>>> upcall=upcall@entry=0x7fd30572acd0, > >>>>>>>>>>>>>>> odp_actions=odp_actions@entry=0x7fd30572b7b0, > >>>>>>>>>>>>>>> wc=wc@entry=0x7fd30572bfe0) > >>>>>>>>>>>>>>> at ofproto/ofproto-dpif-upcall.c:1602 > >>>>>>>>>>>>>>> #59 0x0000562773d22c39 in upcall_cb (packet=<optimized out>, > >>>>>>>>>>>>>>> flow=0x7fd30572bd40, ufid=<optimized out>, pmd_id=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> type=<optimized out>, userdata=<optimized out>, > >>>>>>>>>>>>>>> actions=0x7fd30572b7b0, wc=0x7fd30572bfe0, > >>>>>>>>>>>>>>> put_actions=0x7fd30572b7f0, aux=0x562777850cf0) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:1461 > >>>>>>>>>>>>>>> #60 0x0000562773d52198 in dp_netdev_upcall > >>>>>>>>>>>>>>> (pmd=pmd@entry=0x7fd305731010, > >>>>>>>>>>>>>>> packet_=packet_@entry=0x21152c180, > >>>>>>>>>>>>>>> flow=flow@entry=0x7fd30572bd40, wc=wc@entry=0x7fd30572bfe0, > >>>>>>>>>>>>>>> ufid=ufid@entry=0x7fd30572b790, > >>>>>>>>>>>>>>> type=type@entry=DPIF_UC_MISS, > >>>>>>>>>>>>>>> userdata=userdata@entry=0x0, > >>>>>>>>>>>>>>> actions=actions@entry=0x7fd30572b7b0, > >>>>>>>>>>>>>>> put_actions=put_actions@entry=0x7fd30572b7f0) > >>>>>>>>>>>>>>> at lib/dpif-netdev.c:9141 > >>>>>>>>>>>>>>> #61 0x0000562773d66f13 in handle_packet_upcall > >>>>>>>>>>>>>>> (put_actions=0x7fd30572b7f0, actions=0x7fd30572b7b0, > >>>>>>>>>>>>>>> key=0x7fd30572cc40, packet=0x21152c180, pmd=0x7fd305731010) at > >>>>>>>>>>>>>>> lib/dpif-netdev.c:11303 > >>>>>>>>>>>>>>> #62 fast_path_processing (pmd=pmd@entry=0x7fd305731010, > >>>>>>>>>>>>>>> packets_=packets_@entry=0x7fd30572d0c0, > >>>>>>>>>>>>>>> keys=keys@entry=0x7fd30572cc30, > >>>>>>>>>>>>>>> flow_map=flow_map@entry=0x7fd30572cae0, > >>>>>>>>>>>>>>> index_map=index_map@entry=0x7fd30572cad0 "", > >>>>>>>>>>>>>>> in_port=<optimized > >>>>>>>>>>>>>>> out>) at lib/dpif-netdev.c:11426 > >>>>>>>>>>>>>>> #63 0x0000562773d680c1 in dp_netdev_input__ (pmd=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> packets=<optimized out>, md_is_valid=md_is_valid@entry=false, > >>>>>>>>>>>>>>> port_no=<optimized out>) at lib/dpif-netdev.c:11520 > >>>>>>>>>>>>>>> #64 0x0000562773d6a15d in dp_netdev_input (pmd=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> packets=<optimized out>, port_no=<optimized out>) at > >>>>>>>>>>>>>>> lib/dpif-netdev.c:11558 > >>>>>>>>>>>>>>> #65 0x0000562773d6a2cf in dp_netdev_process_rxq_port > >>>>>>>>>>>>>>> (pmd=pmd@entry=0x7fd305731010, rxq=0x56277796a5e0, port_no=3) > >>>>>>>>>>>>>>> at > >>>>>>>>>>>>>>> lib/dpif-netdev.c:6660 > >>>>>>>>>>>>>>> #66 0x0000562773d6a759 in pmd_thread_main (f_=<optimized > >>>>>>>>>>>>>>> out>) at > >>>>>>>>>>>>>>> lib/dpif-netdev.c:8267 > >>>>>>>>>>>>>>> #67 0x0000562773df805f in ovsthread_wrapper (aux_=<optimized > >>>>>>>>>>>>>>> out>) at > >>>>>>>>>>>>>>> lib/ovs-thread.c:422 > >>>>>>>>>>>>>>> #68 0x00007fd344480e65 in start_thread (arg=0x7fd305730700) at > >>>>>>>>>>>>>>> pthread_create.c:307 > >>>>>>>>>>>>>>> #69 0x00007fd34260988d in clone () at > >>>>>>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thread 1 (Thread 0x7fd334307700 (LWP 91942)): > >>>>>>>>>>>>>>> #0 0x00007fd342541337 in __GI_raise (sig=sig@entry=6) at > >>>>>>>>>>>>>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:55 > >>>>>>>>>>>>>>> #1 0x00007fd342542a28 in __GI_abort () at abort.c:90 > >>>>>>>>>>>>>>> #2 0x0000562773e2c4ee in ovs_abort_valist (err_no=<optimized > >>>>>>>>>>>>>>> out>, > >>>>>>>>>>>>>>> format=<optimized out>, args=args@entry=0x7fd334302340) at > >>>>>>>>>>>>>>> lib/util.c:499 > >>>>>>>>>>>>>>> #3 0x0000562773e2c584 in ovs_abort (err_no=err_no@entry=0, > >>>>>>>>>>>>>>> format=format@entry=0x5627740cad18 "%s: %s() passed > >>>>>>>>>>>>>>> uninitialized > >>>>>>>>>>>>>>> ovs_mutex") at lib/util.c:491 > >>>>>>>>>>>>>>> #4 0x0000562773df72e1 in ovs_mutex_lock_at > >>>>>>>>>>>>>>> (l_=l_@entry=0x7fd2f907df68, where=where@entry=0x5627740a7fc0 > >>>>>>>>>>>>>>> "ofproto/ofproto-dpif-upcall.c:2214") at lib/ovs-thread.c:75 > >>>>>>>>>>>>>>> #5 0x0000562773d1e98d in ukey_delete > >>>>>>>>>>>>>>> (umap=umap@entry=0x562777853400, > >>>>>>>>>>>>>>> ukey=ukey@entry=0x7fd2f907df20) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:2214 > >>>>>>>>>>>>>>> #6 0x0000562773d202da in revalidator_sweep__ > >>>>>>>>>>>>>>> (revalidator=revalidator@entry=0x562777897b00, > >>>>>>>>>>>>>>> purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3048 > >>>>>>>>>>>>>>> #7 0x0000562773d241a6 in revalidator_sweep > >>>>>>>>>>>>>>> (revalidator=0x562777897b00) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:3072 > >>>>>>>>>>>>>>> #8 udpif_revalidator (arg=0x562777897b00) at > >>>>>>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:1086 > >>>>>>>>>>>>>>> #9 0x0000562773df805f in ovsthread_wrapper (aux_=<optimized > >>>>>>>>>>>>>>> out>) at > >>>>>>>>>>>>>>> lib/ovs-thread.c:422 > >>>>>>>>>>>>>>> #10 0x00007fd344480e65 in start_thread (arg=0x7fd334307700) at > >>>>>>>>>>>>>>> pthread_create.c:307 > >>>>>>>>>>>>>>> #11 0x00007fd34260988d in clone () at > >>>>>>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>> LIU Yulong > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Mon, Feb 19, 2024 at 8:12 PM Eelco Chaudron > >>>>>>>>>>>>>>> <echau...@redhat.com> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 19 Feb 2024, at 13:09, Ilya Maximets wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On 2/19/24 11:14, Eelco Chaudron wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 19 Feb 2024, at 10:34, LIU Yulong wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hi OVS experts, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Our ovs-vswitchd runs to core at the > >>>>>>>>>>>>>>>>>>> ovs_mutex_trylock(&ukey->mutex) in the > >>>>>>>>>>>>>>>>>>> function revalidator_sweep__. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I've sent the mail before but have no response. > >>>>>>>>>>>>>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-August/052604.html > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> So I'm trying to send this mail again. And I may > >>>>>>>>>>>>>>>>>>> apologize in advance because > >>>>>>>>>>>>>>>>>>> I would like to post as much useful information as > >>>>>>>>>>>>>>>>>>> possible to help identify > >>>>>>>>>>>>>>>>>>> potential issues. So this mail will have a really long > >>>>>>>>>>>>>>>>>>> text. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Compared to the mail 2023-August/052604.html, we upgrade > >>>>>>>>>>>>>>>>>>> the OVS to 2.17.8 > >>>>>>>>>>>>>>>>>>> and DPDK to 22.11 to pray for good luck that maybe the > >>>>>>>>>>>>>>>>>>> community has potential > >>>>>>>>>>>>>>>>>>> fixes for this issue. But unfortunately, the ovs-vswitchd > >>>>>>>>>>>>>>>>>>> still runs to core. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> As you mentioned it looks like some memory corruption, > >>>>>>>>>>>>>>>>>> which I have not seen before. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Have you tried this without rte offload? This is the only > >>>>>>>>>>>>>>>>>> feature I never used. > >>>>>>>>>>>>>>>>>> There is a 2.17.9 with DPDK 22.11.6 you could try. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> OVS 2.17 is not supposed to work with DPDK 22.11, it's > >>>>>>>>>>>>>>>>> supposed to work with 21.11. > >>>>>>>>>>>>>>>>> See the compatibility table here: > >>>>>>>>>>>>>>>>> https://docs.openvswitch.org/en/latest/faq/releases/ > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Though it's hard to tell if DPDK version is anyhow related > >>>>>>>>>>>>>>>>> to the issue. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> My mistake, I was supposed to type 21.11.6 :( But yes if > >>>>>>>>>>>>>>>> they are using 22.11, that could also be the problem. I > >>>>>>>>>>>>>>>> would suggest using the supported version and see if the > >>>>>>>>>>>>>>>> problem goes away. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> //Eelco > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best regards, Ilya Maximets. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > >>>> > _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss