----- On Jun 14, 2022, at 11:49 PM, Minlan Wang wangmin...@szsandstone.com wrote:
> Hi, Mathieu,
> The commit on branch stable-0.12 that corresponds to the tarball we
> downloaded is this:
>
> commit d5277e807192178ddb79f56ecbbd5ac3c4994f60 (HEAD -> v0.12.1.b, tag: v0.12.1)
> Author: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
> Date: Wed Apr 22 08:51:41 2020 -0400
>
>     Version 0.12.1
>
>     Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
>
> The OS we are using is CentOS Linux release 7.9.2009 (Core), not CentOS 8.2
> as mentioned before. And the kernel version is: 3.10.0-1160.el7.x86_64.

Can you reproduce with an up-to-date kernel? AFAIU, kernel version
3.10.0-1160.el7.x86_64 was built on Mon Oct 19 18:34:26 2020 [1], whereas
3.10.0-1160.66.1.el7 was built on Wed May 18 18:19:46 2022 [2].

[1] https://rpmfind.net/linux/RPM/centos/7.9.2009/x86_64/Packages/kernel-3.10.0-1160.el7.x86_64.html
[2] https://rpmfind.net/linux/RPM/centos/updates/7.9.2009/x86_64/Packages/kernel-3.10.0-1160.66.1.el7.x86_64.html

I just want to make sure we are not hitting a bug in the futex system call.

Can you also try to reproduce with an up-to-date Linux kernel (5.18.4)?

Can you provide a reproducer I could try on my own system?

Thanks,

Mathieu

>
> On Tue, Jun 14, 2022 at 11:53:16AM -0400, Mathieu Desnoyers wrote:
>> Also, I notice that you appear to be using an internal liburcu API (not
>> public) from outside of the liburcu project, which is not really expected.
> We are trying to move some Linux kernel module functions into userspace, and
> found that the urcu internal workqueue.h has everything we need to replace
> the kernel workqueue, so we decided to give it a try.
>
>>
>> If your process forks without exec, make sure you wire up the equivalent of
>> rculfhash pthread_atfork functions which call urcu_workqueue_pause_worker(),
>> urcu_workqueue_resume_worker() and urcu_workqueue_create_worker().
> There's no fork/exec in the process that calls alloc_workqueue, and the
> threads that enqueue work into the workqueue are created by calling
> pthread_create.
>
>>
>> Also, can you validate if you have many workqueue worker threads trying to
>> dequeue from the same workqueue in parallel ? This is unsupported and would
>> cause the kind of issues you are observing here.
> The workqueue thread is created by calling urcu_workqueue_create in the code
> below, and it is the only thread which will dequeue work from the workqueue.
> However, there are multiple threads that enqueue work by calling
> urcu_workqueue_queue_work(wq, work, work->func).
> ---
> static void workqueue_init_fn(struct urcu_workqueue *workqueue, void *priv)
> {
>         pthread_t tid;
>         const char *name;
>         char thread_name[16] = {0};
>
>         if (!priv)
>                 return;
>
>         name = (const char *)priv;
>         tid = pthread_self();
>
>         memcpy(thread_name, name, 15);
>         if (pthread_setname_np(tid, thread_name)) {
>                 pr_err("failed to set thread name for workqueue %s\n", name);
>         }
>
>         urcu_memb_register_thread();
> }
>
> static void workqueue_finalize_fn(struct urcu_workqueue *workqueue, void *priv)
> {
>         urcu_memb_unregister_thread();
>         if (priv)
>                 free(priv);
> }
>
> struct workqueue_struct *alloc_workqueue(const char *fmt,
>                 unsigned int flags,
>                 int max_active, ...)
> {
>         const char *name;
>
>         name = strdup(fmt);
>         if (!name) {
>                 pr_err("failed to dup name for workqueue %s\n", fmt);
>                 return NULL;
>         }
>
>         return urcu_workqueue_create(0, -1, (void *)name,
>                         NULL, /* grace */
>                         workqueue_init_fn, /* init */
>                         workqueue_finalize_fn, /* finalize */
>                         NULL, /* before wait */
>                         NULL, /* after wake up */
>                         NULL, /* before pause */
>                         NULL); /* after resume */
> }
> ---
>
> B.R
> Minlan

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
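
For anyone following the pthread_atfork advice quoted above, here is a minimal sketch of that wiring in plain C. It does not use liburcu at all: demo_pause_worker(), demo_resume_worker() and demo_create_worker() are hypothetical stand-ins for urcu_workqueue_pause_worker(), urcu_workqueue_resume_worker() and urcu_workqueue_create_worker(). The mapping used here (pause in the prepare handler, resume in the parent handler, create in the child handler) is an assumption about what "the equivalent of rculfhash pthread_atfork functions" means, not something stated in this thread.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker state; a real program would hold a workqueue handle. */
static int worker_paused;	/* 1 while the worker is quiesced around fork() */
static int prepare_calls, parent_calls, child_calls;

/* Stand-ins (assumptions) for the urcu_workqueue_*_worker() calls. */
static void demo_pause_worker(void)  { worker_paused = 1; }
static void demo_resume_worker(void) { worker_paused = 0; }
/* Only the forking thread survives in the child, so start a fresh worker. */
static void demo_create_worker(void) { worker_paused = 0; }

/* Runs in the parent just before fork(). */
static void atfork_prepare(void) { prepare_calls++; demo_pause_worker(); }
/* Runs in the parent just after fork(). */
static void atfork_parent(void)  { parent_calls++; demo_resume_worker(); }
/* Runs in the child just after fork(). */
static void atfork_child(void)   { child_calls++; demo_create_worker(); }

/* Register the three handlers; returns 0 on success. */
static int demo_install_atfork(void)
{
	return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/*
 * Fork once. The child exits with 0 if its handler ran and left the
 * worker unpaused; the parent returns the child's exit status, or -1
 * on error.
 */
static int demo_fork_once(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit((child_calls == 1 && !worker_paused) ? 0 : 1);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

To try it, call demo_install_atfork() once at startup and then demo_fork_once() from main(). Since the process described in this thread never forks, the sketch only matters if that changes.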