Thread cancellation is a pain point. Emulating it properly is nearly impossible (without hooking into various OS calls which are supposed to be "cancellation points"). I do like the cancellation token idea, but I'm not sure how existing clients, who rely on existing phtread_cancel() semantic, will react to that - it will require a code re-write.
How about we defer fixing this to another follow-up change? -----Original Message----- From: Dmitry Kozlyuk <dmitry.kozl...@gmail.com> Sent: Thursday, April 29, 2021 1:44 PM To: Narcisa Ana Maria Vasile <navas...@linux.microsoft.com> Cc: dev@dpdk.org; thomas <tho...@monjalon.net>; Khoa To <k...@microsoft.com>; Narcisa Ana Maria Vasile <narcisa.vas...@microsoft.com>; Dmitry Malloy <dmit...@microsoft.com>; Tyler Retzlaff <roret...@microsoft.com>; tal...@nvidia.com; Omar Cardona <ocard...@microsoft.com>; bruce.richard...@intel.com; david.march...@redhat.com; Kadam, Pallavi <pallavi.ka...@intel.com> Subject: [EXTERNAL] Re: [PATCH v6 06/10] eal: add thread lifetime management 2021-04-02 18:39 (UTC-0700), Narcisa Ana Maria Vasile: [...] > diff --git a/lib/librte_eal/windows/rte_thread.c > b/lib/librte_eal/windows/rte_thread.c > index f61103bbc..86bbd7bc2 100644 > --- a/lib/librte_eal/windows/rte_thread.c > +++ b/lib/librte_eal/windows/rte_thread.c > @@ -329,6 +329,131 @@ rte_thread_attr_set_priority(rte_thread_attr_t > *thread_attr, > return 0; > } > > +int > +rte_thread_create(rte_thread_t *thread_id, > + const rte_thread_attr_t *thread_attr, > + void *(*thread_func)(void *), void *args) { > + int ret = 0; > + HANDLE thread_handle = NULL; > + GROUP_AFFINITY thread_affinity; > + > + thread_handle = CreateThread(NULL, 0, > + (LPTHREAD_START_ROUTINE)(ULONG_PTR)thread_func, thread_func returns void* (8 bytes), LPTHREAD_START_ROUTING returns DWORD (4 byte), is this cast safe? It seems to be on x86, can't say for ARM64. > + args, 0, thread_id); > + if (thread_handle == NULL) { > + ret = rte_thread_translate_win32_error(); > + RTE_LOG_WIN32_ERR("CreateThread()"); > + goto cleanup; > + } After this point, in case of errors the thread remains running, while the function reports failure. It's better to create the thread suspended, configure its attributes (destroying the thread on failure), then resume (start) the thread. > + > + if (thread_attr != NULL) { > + if (CPU_COUNT(&thread_attr->cpuset) > 0) { > + ret = > rte_convert_cpuset_to_affinity(&thread_attr->cpuset, &thread_affinity); > + if (ret != 0) { > + RTE_LOG(DEBUG, EAL, "Unable to convert cpuset > to thread affinity\n"); > + goto cleanup; > + } > + > + if (!SetThreadGroupAffinity(thread_handle, > &thread_affinity, NULL)) { > + ret = rte_thread_translate_win32_error(); > + RTE_LOG_WIN32_ERR("SetThreadGroupAffinity()"); > + goto cleanup; > + } > + } > + ret = rte_thread_set_priority(*thread_id, > thread_attr->priority); > + if (ret != 0) { > + RTE_LOG(DEBUG, EAL, "Unable to set thread priority\n"); > + goto cleanup; > + } > + } > + > + return 0; > + > +cleanup: > + if (thread_handle != NULL) { > + CloseHandle(thread_handle); > + thread_handle = NULL; > + } > + return ret; > +} [...] > + > +int > +rte_thread_cancel(rte_thread_t thread_id) { > + int ret = 0; > + HANDLE thread_handle = NULL; > + > + thread_handle = OpenThread(THREAD_TERMINATE, FALSE, thread_id); > + if (thread_handle == NULL) { > + ret = rte_thread_translate_win32_error(); > + RTE_LOG_WIN32_ERR("OpenThread()"); > + goto cleanup; > + } > + > + /* > + * TODO: Behavior is different between POSIX and Windows threads. > + * POSIX threads wait for a cancellation point. > + * Current Windows emulation kills thread at any point. > + */ > + ret = TerminateThread(thread_handle, 0); This is not acceptable. There is a handful of pthread_cancel() usages in DPDK. The threads they cancel take spinlocks and mutexes, so interrupt at arbitrary point may cause a deadlock. User Cancellation points Notes ---- ------------------- ----- net/ipn3ke libfdt functions? net/kni usleep Linux-only raw/ifpga open/read/write/close vdpa/ifc read vdpa/mlx5 usleep, nanosleep lib/eventdev rte_epoll_wait no pthread CP Possible solutions: 1. Make specific DPDK functions cancellation points for this API and ensure DPDK users of pthread_cancel() call it. This way is almost non-invasive, but it's more work in EAL. 2. Divert from pthread style: add cancellation token concept, so that DPDK users can check themselves if they're cancelled. This is invasive, but very explicit and flexible. > + if (ret != 0) { > + ret = rte_thread_translate_win32_error(); > + RTE_LOG_WIN32_ERR("TerminateThread()"); > + goto cleanup; > + } > + > +cleanup: > + if (thread_handle != NULL) { > + CloseHandle(thread_handle); > + thread_handle = NULL; This line is not needed. > + } > + return ret; > +} > + > int > rte_thread_key_create(rte_thread_key *key, > __rte_unused void (*destructor)(void *))