Re: [lttng-dev] [PATCH lttng-modules] Add new tracepoints for dma_fence
On 2022-09-08 04:23, Rouven Czerwinski via lttng-dev wrote:
> Allows usage of dma_fence tracepoints from lttng.

Hi Rouven,

This patch looks good. Merged into the master branch of lttng-modules.

Thanks,

Mathieu

> Signed-off-by: Rouven Czerwinski
> ---
>  include/instrumentation/events/dma_fence.h | 83 ++
>  src/probes/Kbuild                          |  7 ++
>  src/probes/lttng-probe-dma-fence.c         | 32 +
>  3 files changed, 122 insertions(+)
>  create mode 100644 include/instrumentation/events/dma_fence.h
>  create mode 100644 src/probes/lttng-probe-dma-fence.c
>
> diff --git a/include/instrumentation/events/dma_fence.h b/include/instrumentation/events/dma_fence.h
> new file mode 100644
> index ..95d94ed5
> --- /dev/null
> +++ b/include/instrumentation/events/dma_fence.h
> @@ -0,0 +1,83 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM dma_fence
> +
> +#if !defined(LTTNG_TRACE_DMA_FENCE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define LTTNG_TRACE_DMA_FENCE_H
> +
> +#include
> +
> +LTTNG_TRACEPOINT_EVENT_CLASS(dma_fence_class,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence),
> +
> +	TP_FIELDS(
> +		ctf_string(driver, fence->ops->get_driver_name(fence))
> +		ctf_string(timeline, fence->ops->get_timeline_name(fence))
> +		ctf_integer(unsigned int, context, fence->context)
> +		ctf_integer(unsigned int, seqno, fence->seqno)
> +	)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_emit,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_init,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_destroy,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_enable_signal,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_signaled,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_wait_start,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +LTTNG_TRACEPOINT_EVENT_INSTANCE(dma_fence_class,
> +	dma_fence_wait_end,
> +
> +	TP_PROTO(struct dma_fence *fence),
> +
> +	TP_ARGS(fence)
> +)
> +
> +#endif /* LTTNG_TRACE_DMA_FENCE_H */
> +
> +/* This part must be outside protection */
> +#include
> diff --git a/src/probes/Kbuild b/src/probes/Kbuild
> index aa002534..7597389b 100644
> --- a/src/probes/Kbuild
> +++ b/src/probes/Kbuild
> @@ -97,6 +97,13 @@ endif # CONFIG_X86
>
>  obj-$(CONFIG_LTTNG) += lttng-probe-signal.o
>
> +ifneq ($(CONFIG_DMA_SHARED_BUFFER),)
> +  obj-$(CONFIG_LTTNG) += $(shell \
> +    if [ $(VERSION) -ge 5 \
> +      -o \( $VERSION -eq 4 -a $(PATCHLEVEL) -ge 9 \) ] ; then \
> +      echo "lttng-probe-dma-fence.o" ; fi;)
> +endif # CONFIG_DMA_SHARED_BUFFER
> +
>  ifneq ($(CONFIG_BLOCK),)
>    # need blk_cmd_buf_len
>    ifneq ($(CONFIG_EVENT_TRACING),)
> diff --git a/src/probes/lttng-probe-dma-fence.c b/src/probes/lttng-probe-dma-fence.c
> new file mode 100644
> index ..a6c9cd12
> --- /dev/null
> +++ b/src/probes/lttng-probe-dma-fence.c
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only or LGPL-2.1-only)
> + *
> + * probes/lttng-probe-dma-fence.c
> + *
> + * LTTng dma-fence probes.
> + *
> + * Copyright (C) 2022 Pengutronix, Rouven Czerwinski
> + */
> +
> +#include
> +/*
> + * Create the tracepoint static inlines from the kernel to validate that our
> + * trace event macros match the kernel we run on.
> + */
> +#include
> +
> +/*
> + * Create LTTng tracepoint probes.
> + */
> +#define LTTNG_PACKAGE_BUILD
> +#define CREATE_TRACE_POINTS
> +#define TRACE_INCLUDE_PATH instrumentation/events
> +
> +#include
> +
> +MODULE_LICENSE("GPL and additional rights");
> +MODULE_AUTHOR("Rouven Czerwinski ");
> +MODULE_DESCRIPTION("LTTng dma-fence probes");
> +MODULE_VERSION(__stringify(LTTNG_MODULES_MAJOR_VERSION) "."
> +	__stringify(LTTNG_MODULES_MINOR_VERSION) "."
> +	__stringify(LTTNG_MODULES_PATCHLEVEL_VERSION)
> +	LTTNG_MODULES_EXTRAVERSION);

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
Re: [lttng-dev] [PATCH lttng-modules] Add new tracepoints for drm_scheduler
On 2022-09-08 04:25, Rouven Czerwinski via lttng-dev wrote:
> Allows usage of the drm_gpu_scheduler tracepoints within lttng.
>
> Signed-off-by: Rouven Czerwinski
> ---
>  .../events/drm_gpu_scheduler.h| 63 +++

[...]

> diff --git a/src/probes/Kbuild b/src/probes/Kbuild
> index 7597389b..2846b0c7 100644
> --- a/src/probes/Kbuild
> +++ b/src/probes/Kbuild
> @@ -104,6 +104,13 @@ ifneq ($(CONFIG_DMA_SHARED_BUFFER),)
>        echo "lttng-probe-dma-fence.o" ; fi;)
>  endif # CONFIG_DMA_SHARED_BUFFER
>
> +ifneq ($(CONFIG_DRM_SCHED),)
> +  obj-$(CONFIG_LTTNG) += $(shell \
> +    if [ $(VERSION) -ge 5 \
> +      -o \( $VERSION -eq 4 -a $(PATCHLEVEL) -ge 16 \) ] ; then \
> +      echo "lttng-probe-drm-sched.o" ; fi;)
> +endif # CONFIG_DRM_SCHED
> +
>  ifneq ($(CONFIG_BLOCK),)
>    # need blk_cmd_buf_len
>    ifneq ($(CONFIG_EVENT_TRACING),)
> diff --git a/src/probes/lttng-probe-drm-sched.c b/src/probes/lttng-probe-drm-sched.c
> new file mode 100644
> index ..fe8f9cb2
> --- /dev/null
> +++ b/src/probes/lttng-probe-drm-sched.c
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only or LGPL-2.1-only)
> + *
> + * probes/lttng-probe-drm-sched.c
> + *
> + * LTTng drm-sched probes.
> + *
> + * Copyright (C) 2022 Pengutronix, Rouven Czerwinski
> + */
> +#include
> +

This patch is missing an important piece here, see similar situation
for regmap:

/*
 * Create the tracepoint static inlines from the kernel to validate that our
 * trace event macros match the kernel we run on.
 */
#include <../../drivers/base/regmap/trace.h>

and its associated checks in Kbuild:

ifneq ($(CONFIG_REGMAP),)
  regmap_dep_4_1 = $(srctree)/drivers/base/regmap/trace.h
  ifneq ($(wildcard $(regmap_dep_4_1)),)
    obj-$(CONFIG_LTTNG) += lttng-probe-regmap.o
  else
    $(warning File $(regmap_dep_4_1) not found. Probe "regmap" is disabled. Need Linux 4.1+ kernel source tree to enable it.)
  endif # $(wildcard $(regmap_dep_4_1)),
endif # CONFIG_REGMAP

This is required to validate that the tracepoint signature we build
against indeed matches the types expected by the tracepoint probe
callbacks.

Unfortunately, the drm-sched instrumentation is located in
drivers/gpu/drm/scheduler/gpu_scheduler_trace.h which is not available
with installed kernel headers. So we really need access to the kernel
sources to validate this signature.

If we don't validate this at compile-time, this can generate kernel
modules that will crash the kernel at runtime if the tracepoint
signature changes in future kernels. This is something that is not
acceptable.

Thanks,

Mathieu

> +/*
> + * Create LTTng tracepoint probes.
> + */
> +#define LTTNG_PACKAGE_BUILD
> +#define CREATE_TRACE_POINTS
> +#define TRACE_INCLUDE_PATH instrumentation/events
> +
> +#include
> +
> +MODULE_LICENSE("GPL and additional rights");
> +MODULE_AUTHOR("Rouven Czerwinski ");
> +MODULE_DESCRIPTION("LTTng drm-gpu-scheduler probes");
> +MODULE_VERSION(__stringify(LTTNG_MODULES_MAJOR_VERSION) "."
> +	__stringify(LTTNG_MODULES_MINOR_VERSION) "."
> +	__stringify(LTTNG_MODULES_PATCHLEVEL_VERSION)
> +	LTTNG_MODULES_EXTRAVERSION);

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
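[Editor's illustration] The compile-time validation described in the review can be sketched in plain userspace C. This is only an analogy, not lttng-modules or kernel code: every name below (demo_job, demo_probe_fn, demo_register_trace, demo_trace, demo_probe, demo_run) is hypothetical. The point is that when the registration function's declaration is visible to the code defining the probe, the compiler can compare the two prototypes; without that declaration, a changed signature would go unnoticed until runtime.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel structure passed to the tracepoint. */
struct demo_job {
	unsigned long id;
};

/*
 * What the instrumented side's trace header would declare: the probe
 * callback type and a registration function taking exactly that type.
 */
typedef void (*demo_probe_fn)(void *data, struct demo_job *job);

static demo_probe_fn demo_registered_probe;
static void *demo_registered_data;

static int demo_register_trace(demo_probe_fn probe, void *data)
{
	demo_registered_probe = probe;
	demo_registered_data = data;
	return 0;
}

/* Instrumented call site: invoke the probe if one is attached. */
static void demo_trace(struct demo_job *job)
{
	if (demo_registered_probe)
		demo_registered_probe(demo_registered_data, job);
}

/*
 * Tracer-side probe, written from the tracer's own copy of the prototype.
 * If demo_probe_fn gained or changed an argument, passing demo_probe to
 * demo_register_trace() below would no longer type-check, and the
 * compiler would diagnose the mismatch at build time.
 */
static void demo_probe(void *data, struct demo_job *job)
{
	unsigned long *last_id = data;

	*last_id = job->id;
}

/* Attach the probe, fire once, and return the id the probe recorded. */
static unsigned long demo_run(unsigned long id)
{
	unsigned long last_id = 0;
	struct demo_job job = { .id = id };
	int ret;

	ret = demo_register_trace(demo_probe, &last_id);
	assert(!ret);
	demo_trace(&job);
	return last_id;
}
```

Calling demo_run(42) attaches the probe, fires the hypothetical tracepoint once, and returns the id the probe saw.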
Re: [lttng-dev] URCU background threads vs signalfd
On 2022-09-22 05:15, Eric Wong via lttng-dev wrote:
> Hello, I'm using urcu-bp + rculfhash + call_rcu to implement
> malloc instrumentation (via LD_PRELOAD) on an existing
> single-threaded Perl codebase which uses Linux signalfd.
>
> signalfd depends on signals being blocked in all threads
> of the process, otherwise threads with unblocked signals
> can receive them and starve the signalfd.
>
> While some threads in URCU do block signals (e.g. workqueue
> worker for rculfhash), the call_rcu thread and rculfhash
> partition_resize_helper threads do not...
>
> Should all threads URCU creates block signals (aside from SIGRCU)?

Yes, I think you are right. The SIGRCU signal is only needed for the
urcu-signal flavor though.

Would you like to submit a patch ?

Thanks,

Mathieu

> There might be other places, too, I haven't looked too closely...
>
> Anyways, I'm currently relying on this workaround to ensure
> initial calls to call_rcu and cds_lfht_resize are run with
> signals blocked:
>
> /* error-checking omitted for brevity */
> __attribute__((constructor)) static void my_ctor(void)
> {
> 	sigset_t set, old;
> 	struct foo_hdr *h;
>
> 	sigfillset(&set);
> 	pthread_sigmask(SIG_SETMASK, &set, &old);
> 	g_tbl = cds_lfht_new(8192, 1, 0, CDS_LFHT_AUTO_RESIZE, 0);
> 	h = malloc(sizeof(struct foo_hdr));
> 	if (h) /* force call_rcu to start background thread */
> 		call_rcu(&h->as.dead, free_foo_hdr_rcu);
>
> 	/* start more background threads before unblocking signals */
> 	cds_lfht_resize(g_tbl, 16384);
> 	pthread_sigmask(SIG_SETMASK, &old, NULL);
> }
>
> But I suspect it's better to ensure signals are blocked in
> all URCU-created threads...
>
> Thanks.

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] URCU background threads vs signalfd
Mathieu Desnoyers wrote:
> On 2022-09-22 05:15, Eric Wong via lttng-dev wrote:
> > Hello, I'm using urcu-bp + rculfhash + call_rcu to implement
> > malloc instrumentation (via LD_PRELOAD) on an existing
> > single-threaded Perl codebase which uses Linux signalfd.
> >
> > signalfd depends on signals being blocked in all threads
> > of the process, otherwise threads with unblocked signals
> > can receive them and starve the signalfd.
> >
> > While some threads in URCU do block signals (e.g. workqueue
> > worker for rculfhash), the call_rcu thread and rculfhash
> > partition_resize_helper threads do not...
> >
> > Should all threads URCU creates block signals (aside from SIGRCU)?
>
> Yes, I think you are right. The SIGRCU signal is only needed for the
> urcu-signal flavor though.
>
> Would you like to submit a patch ?

Sure.

Is there a way to detect at runtime when urcu-signal is in use
so SIGRCU (SIGUSR1) doesn't get unblocked when using other flavors?

I actually use SIGUSR1 in my signalfd-using codebase.

I also want to remove cds_lfht_worker_init entirely since it's racy.
Signal blocking needs to be done in the parent before pthread_create
to avoid a window where the child has unblocked signals.

Thanks.
Anyways, this is my work-in-progress:

diff --git a/src/rculfhash.c b/src/rculfhash.c
index 7c0b9fb..5f455af 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -1251,6 +1251,7 @@ void partition_resize_helper(struct cds_lfht *ht, unsigned long i,
 	struct partition_resize_work *work;
 	int ret;
 	unsigned long thread, nr_threads;
+	sigset_t newmask, oldmask;
 
 	urcu_posix_assert(nr_cpus_mask != -1);
 	if (nr_cpus_mask < 0 || len < 2 * MIN_PARTITION_PER_THREAD)
@@ -1273,6 +1274,12 @@ void partition_resize_helper(struct cds_lfht *ht, unsigned long i,
 		dbg_printf("error allocating for resize, single-threading\n");
 		goto fallback;
 	}
+
+	ret = sigfillset(&newmask);
+	urcu_posix_assert(!ret);
+	ret = pthread_sigmask(SIG_BLOCK, &newmask, &oldmask);
+	urcu_posix_assert(!ret);
+
 	for (thread = 0; thread < nr_threads; thread++) {
 		work[thread].ht = ht;
 		work[thread].i = i;
@@ -1294,6 +1301,10 @@ void partition_resize_helper(struct cds_lfht *ht, unsigned long i,
 		}
 		urcu_posix_assert(!ret);
 	}
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldmask, NULL);
+	urcu_posix_assert(!ret);
+
 	for (thread = 0; thread < nr_threads; thread++) {
 		ret = pthread_join(work[thread].thread_id, NULL);
 		urcu_posix_assert(!ret);
diff --git a/src/urcu-call-rcu-impl.h b/src/urcu-call-rcu-impl.h
index e9366b4..9f85d55 100644
--- a/src/urcu-call-rcu-impl.h
+++ b/src/urcu-call-rcu-impl.h
@@ -434,6 +434,7 @@ static void call_rcu_data_init(struct call_rcu_data **crdpp,
 {
 	struct call_rcu_data *crdp;
 	int ret;
+	sigset_t newmask, oldmask;
 
 	crdp = malloc(sizeof(*crdp));
 	if (crdp == NULL)
@@ -448,9 +449,18 @@ static void call_rcu_data_init(struct call_rcu_data **crdpp,
 	crdp->gp_count = 0;
 	cmm_smp_mb(); /* Structure initialized before pointer is planted. */
 	*crdpp = crdp;
+
+	ret = sigfillset(&newmask);
+	urcu_posix_assert(!ret);
+	ret = pthread_sigmask(SIG_BLOCK, &newmask, &oldmask);
+	urcu_posix_assert(!ret);
+
 	ret = pthread_create(&crdp->tid, NULL, call_rcu_thread, crdp);
 	if (ret)
 		urcu_die(ret);
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldmask, NULL);
+	urcu_posix_assert(!ret);
 }
 
 /*
diff --git a/src/urcu-defer-impl.h b/src/urcu-defer-impl.h
index b5d7926..1c96287 100644
--- a/src/urcu-defer-impl.h
+++ b/src/urcu-defer-impl.h
@@ -409,9 +409,18 @@ void defer_rcu(void (*fct)(void *p), void *p)
 static void start_defer_thread(void)
 {
 	int ret;
+	sigset_t newmask, oldmask;
+
+	ret = sigfillset(&newmask);
+	urcu_posix_assert(!ret);
+	ret = pthread_sigmask(SIG_BLOCK, &newmask, &oldmask);
+	urcu_posix_assert(!ret);
 
 	ret = pthread_create(&tid_defer, NULL, thr_defer, NULL);
 	urcu_posix_assert(!ret);
+
+	ret = pthread_sigmask(SIG_SETMASK, &oldmask, NULL);
+	urcu_posix_assert(!ret);
 }
 
 static void stop_defer_thread(void)
diff --git a/src/workqueue.c b/src/workqueue.c
index b6361ad..1039d72 100644
--- a/src/workqueue.c
+++ b/src/workqueue.c
@@ -284,6 +284,7 @@ struct urcu_workqueue *urcu_workqueue_create(unsigned long flags,
 {
 	struct urcu_workqueue *workqueue;
 	int ret;
+	sigset_t newmask, oldmask;
 
 	workqueue = malloc(sizeof(*workqueue));
 	if (workqueue == NULL)
@@ -304,10 +305,20 @@ struct urcu_workqueue *urcu_workqueue_create(unsigned long flags,
 	workqueue->cpu_affinity = cpu_affinity;
 	workqueue->loop_count = 0;
 	cmm_smp_mb(); /* Structure initialized before pointer i
Re: [lttng-dev] URCU background threads vs signalfd
On 2022-09-23 13:55, Eric Wong wrote:
> Mathieu Desnoyers wrote:
> > On 2022-09-22 05:15, Eric Wong via lttng-dev wrote:
> > > Hello, I'm using urcu-bp + rculfhash + call_rcu to implement
> > > malloc instrumentation (via LD_PRELOAD) on an existing
> > > single-threaded Perl codebase which uses Linux signalfd.
> > >
> > > signalfd depends on signals being blocked in all threads
> > > of the process, otherwise threads with unblocked signals
> > > can receive them and starve the signalfd.
> > >
> > > While some threads in URCU do block signals (e.g. workqueue
> > > worker for rculfhash), the call_rcu thread and rculfhash
> > > partition_resize_helper threads do not...
> > >
> > > Should all threads URCU creates block signals (aside from SIGRCU)?
> >
> > Yes, I think you are right. The SIGRCU signal is only needed for the
> > urcu-signal flavor though.
> >
> > Would you like to submit a patch ?
>
> Sure.
>
> Is there a way to detect at runtime when urcu-signal is in use
> so SIGRCU (SIGUSR1) doesn't get unblocked when using other flavors?
>
> I actually use SIGUSR1 in my signalfd-using codebase.
>
> I also want to remove cds_lfht_worker_init entirely since it's racy.
> Signal blocking needs to be done in the parent before pthread_create
> to avoid a window where the child has unblocked signals.
>
> Thanks. Anyways, this is my work-in-progress:

Perhaps with this on top of your wip patch ? The idea is to always block all
signals before creating threads, and only unblock SIGRCU when registering a
urcu-signal thread. (compile-tested only)

diff --git a/src/rculfhash.c b/src/rculfhash.c
index 5f455af3..a41cac83 100644
--- a/src/rculfhash.c
+++ b/src/rculfhash.c
@@ -2174,29 +2174,6 @@ static struct urcu_atfork cds_lfht_atfork = {
 	.after_fork_child = cds_lfht_after_fork_child,
 };
 
-/*
- * Block all signals for the workqueue worker thread to ensure we don't
- * disturb the application. The SIGRCU signal needs to be unblocked for
- * the urcu-signal flavor.
- */
-static void cds_lfht_worker_init(
-		struct urcu_workqueue *workqueue __attribute__((unused)),
-		void *priv __attribute__((unused)))
-{
-	int ret;
-	sigset_t mask;
-
-	ret = sigfillset(&mask);
-	if (ret)
-		urcu_die(errno);
-	ret = sigdelset(&mask, SIGRCU);
-	if (ret)
-		urcu_die(errno);
-	ret = pthread_sigmask(SIG_SETMASK, &mask, NULL);
-	if (ret)
-		urcu_die(ret);
-}
-
 static void cds_lfht_init_worker(const struct rcu_flavor_struct *flavor)
 {
 	flavor->register_rculfhash_atfork(&cds_lfht_atfork);
@@ -2205,7 +2182,7 @@ static void cds_lfht_init_worker(const struct rcu_flavor_struct *flavor)
 	if (cds_lfht_workqueue_user_count++)
 		goto end;
 	cds_lfht_workqueue = urcu_workqueue_create(0, -1, NULL,
-			NULL, cds_lfht_worker_init, NULL, NULL, NULL, NULL, NULL);
+			NULL, NULL, NULL, NULL, NULL, NULL, NULL);
 end:
 	mutex_unlock(&cds_lfht_fork_mutex);
 }
diff --git a/src/urcu.c b/src/urcu.c
index 59f2e8f1..cf4d6d03 100644
--- a/src/urcu.c
+++ b/src/urcu.c
@@ -110,6 +110,8 @@ static int init_done;
 
 void __attribute__((constructor)) rcu_init(void);
 void __attribute__((destructor)) rcu_exit(void);
+
+static DEFINE_URCU_TLS(int, rcu_signal_was_blocked);
 #endif
 
 /*
@@ -537,8 +539,52 @@ int rcu_read_ongoing(void)
 	return _rcu_read_ongoing();
 }
 
+#ifdef RCU_SIGNAL
+/*
+ * Make sure the signal used by the urcu-signal flavor is unblocked
+ * while the thread is registered.
+ */
+static
+void urcu_signal_unblock(void)
+{
+	sigset_t mask, oldmask;
+	int ret;
+
+	ret = sigemptyset(&mask);
+	urcu_posix_assert(!ret);
+	ret = sigaddset(&mask, SIGRCU);
+	urcu_posix_assert(!ret);
+	ret = pthread_sigmask(SIG_UNBLOCK, &mask, &oldmask);
+	urcu_posix_assert(!ret);
+	URCU_TLS(rcu_signal_was_blocked) = sigismember(&oldmask, SIGRCU);
+}
+
+static
+void urcu_signal_restore(void)
+{
+	sigset_t mask;
+	int ret;
+
+	if (!URCU_TLS(rcu_signal_was_blocked))
+		return;
+	ret = sigemptyset(&mask);
+	urcu_posix_assert(!ret);
+	ret = sigaddset(&mask, SIGRCU);
+	urcu_posix_assert(!ret);
+	ret = pthread_sigmask(SIG_BLOCK, &mask, NULL);
+	urcu_posix_assert(!ret);
+}
+#else
+static
+void urcu_signal_unblock(void) { }
+static
+void urcu_signal_restore(void) { }
+#endif
+
 void rcu_register_thread(void)
 {
+	urcu_signal_unblock();
+
 	URCU_TLS(rcu_reader).tid = pthread_self();
 	urcu_posix_assert(URCU_TLS(rcu_reader).need_mb == 0);
 	urcu_posix_assert(!(URCU_TLS(rcu_reader).ctr & URCU_GP_CTR_NEST_MASK));
@@ -558,6 +604,8 @@ void rcu_unregister_thread(void)
 	URCU_TLS(rcu_reader).registered = 0;
 	cds_list_del(&URCU_TLS(rcu_reader).node);
 	mutex_unlock(&rcu_registry_lock);
+
+	urcu_signal_restore();
 }
 
 #ifdef RCU_MEMBARRIER

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [lttng-dev] URCU background threads vs signalfd
Mathieu Desnoyers wrote:
> On 2022-09-23 13:55, Eric Wong wrote:
> > Mathieu Desnoyers wrote:
> > > On 2022-09-22 05:15, Eric Wong via lttng-dev wrote:
> > > > Hello, I'm using urcu-bp + rculfhash + call_rcu to implement
> > > > malloc instrumentation (via LD_PRELOAD) on an existing
> > > > single-threaded Perl codebase which uses Linux signalfd.
> > > >
> > > > signalfd depends on signals being blocked in all threads
> > > > of the process, otherwise threads with unblocked signals
> > > > can receive them and starve the signalfd.
> > > >
> > > > While some threads in URCU do block signals (e.g. workqueue
> > > > worker for rculfhash), the call_rcu thread and rculfhash
> > > > partition_resize_helper threads do not...
> > > >
> > > > Should all threads URCU creates block signals (aside from SIGRCU)?
> > >
> > > Yes, I think you are right. The SIGRCU signal is only needed for the
> > > urcu-signal flavor though.
> > >
> > > Would you like to submit a patch ?
> >
> > Sure.
> >
> > Is there a way to detect at runtime when urcu-signal is in use
> > so SIGRCU (SIGUSR1) doesn't get unblocked when using other flavors?
> >
> > I actually use SIGUSR1 in my signalfd-using codebase.
> >
> > I also want to remove cds_lfht_worker_init entirely since it's racy.
> > Signal blocking needs to be done in the parent before pthread_create
> > to avoid a window where the child has unblocked signals.
> >
> > Thanks. Anyways, this is my work-in-progress:
> >
>
> Perhaps with this on top of your wip patch ? The idea is to always block all
> signals before creating threads, and only unblock SIGRCU when registering a
> urcu-signal thread. (compile-tested only)

Thanks, that makes sense. It passes: make check short_bench

My original signalfd + urcu-bp case works well, too, with my
constructor workarounds reverted. (I ported our patches to
0.10.2 for Debian buster (oldstable).)

I don't know if the existing test coverage is sufficient, though.
Waiting on regtest...