[lttng-dev] Unexport of kvm_x86_ops vs tracer modules

2022-04-08 Thread Mathieu Desnoyers via lttng-dev
Hi Sean, Hi Paolo,

I have a question regarding an unexport of kvm_x86_ops that made its
way into 5.18-rc (commit dfc4e6ca04 ("KVM: x86: Unexport kvm_x86_ops")).
This is in the context of tracing. In particular, LTTng implements probes
for x86 kvm events, e.g. x86 kvm_exit. It receives a struct kvm_vcpu *
as parameter, and uses kvm_x86_ops.get_exit_info() to translate this
into meaningful fields.

LTTng is an out-of-tree kernel module, which currently relies on the export.
Indeed, arch/x86/kvm/x86.c exports a set of tracepoints to kernel modules, e.g.:

EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry)

But any probe implementation hooking on that tracepoint would need kvm_x86_ops
to translate the struct kvm_vcpu * into meaningful tracing data.

I could work around this on my side in ugly ways, but I would like to discuss
how kernel module tracers are expected to implement kvm event probes without
the kvm_x86_ops symbol. Perhaps there is an alternative way to convert the
fields in this structure to meaningful information without using the
kvm_x86_ops callbacks that I am not aware of?

The LTTng kernel tracer uses get_exit_info() and get_segment_base() callbacks
from kvm_x86_ops.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Unexport of kvm_x86_ops vs tracer modules

2022-04-08 Thread Paolo Bonzini via lttng-dev

On 4/8/22 17:36, Mathieu Desnoyers wrote:

> LTTng is an out of tree kernel module, which currently relies on the export.
> Indeed, arch/x86/kvm/x86.c exports a set of tracepoints to kernel modules, e.g.:
>
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry)
>
> But any probe implementation hooking on that tracepoint would need kvm_x86_ops
> to translate the struct kvm_vcpu * into meaningful tracing data.
>
> I could work-around this on my side in ugly ways, but I would like to discuss
> how kernel module tracers are expected to implement kvm events probes without
> the kvm_x86_ops symbol ?


The conversion is done in the TP_fast_assign snippets, which are part of 
kvm.ko and therefore do not need the export.  As I understand it, the 
issue is that LTTng cannot use the TP_fast_assign snippets, because they 
are embedded in the trace_event_raw_event_* symbols?


We cannot do the extraction before calling trace_kvm_exit, because it's 
expensive.


Paolo



Re: [lttng-dev] Unexport of kvm_x86_ops vs tracer modules

2022-04-08 Thread Mathieu Desnoyers via lttng-dev
On Apr 8, 2022, at 12:24 PM, Paolo Bonzini pbonz...@redhat.com wrote:

> On 4/8/22 17:36, Mathieu Desnoyers wrote:
>> LTTng is an out of tree kernel module, which currently relies on the export.
>> Indeed, arch/x86/kvm/x86.c exports a set of tracepoints to kernel modules, e.g.:
>> 
>> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry)
>> 
>> But any probe implementation hooking on that tracepoint would need kvm_x86_ops
>> to translate the struct kvm_vcpu * into meaningful tracing data.
>> 
>> I could work-around this on my side in ugly ways, but I would like to discuss
>> how kernel module tracers are expected to implement kvm events probes without
>> the kvm_x86_ops symbol ?
> 
> The conversion is done in the TP_fast_assign snippets, which are part of
> kvm.ko and therefore do not need the export.  As I understand it, the
> issue is that LTTng cannot use the TP_fast_assign snippets, because they
> are embedded in the trace_event_raw_event_* symbols?

Indeed, the fact that the TP_fast_assign snippets are embedded in the
trace_event_raw_event_* symbols is an issue for LTTng. This ties those
to ftrace.

AFAIK, TP_fast_assign copies directly into ftrace ring buffers, and then
afterwards things like dynamic filters are applied, which then "uncommits" the
events if need be (and if possible). Also, TP_fast_assign is tied to the
ftrace ring buffer event layout. The fact that TP_STRUCT__entry() (description)
and TP_fast_assign() (open-coded C) are separate fields reflects a focus on a
use case where all data is serialized to a ring buffer.

In LTTng, the event fields are made available to a filter interpreter prior to
being copied into LTTng's ring buffer. This is made possible by implementing
our own LTTNG_TRACEPOINT_EVENT code generation headers. In addition, we have
recently released an event notification mechanism (lttng 2.13) which captures
specific event fields to send with an immediate notification (thus bypassing the
tracer buffering). We are also currently working on an LTTng trace hit counter
mechanism, which performs aggregation through per-cpu counters and doesn't
even allocate a ring buffer.
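
To make the ordering difference concrete, here is a small userspace sketch
modelled only on the description above, not on the actual ftrace or LTTng
code. Every name in it (demo_event, demo_filter(), demo_ring_write(), ...) is
hypothetical. It contrasts copying an event into a buffer and then discarding
it when the filter rejects it with evaluating the filter on the fields first:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event fields and buffer; not LTTng or ftrace code. */
struct demo_event {
	unsigned int exit_reason;
	unsigned long guest_rip;
};

static unsigned char demo_ring[256];
static size_t demo_ring_used;
static int demo_copies;	/* number of copies made into the buffer */

/* Stand-in for a dynamic filter evaluated on the event fields. */
static bool demo_filter(const struct demo_event *ev)
{
	return ev->exit_reason == 12;
}

/* Serialize one event into the buffer; returns false when it is full. */
static bool demo_ring_write(const struct demo_event *ev)
{
	if (demo_ring_used + sizeof(*ev) > sizeof(demo_ring))
		return false;
	memcpy(demo_ring + demo_ring_used, ev, sizeof(*ev));
	demo_ring_used += sizeof(*ev);
	demo_copies++;
	return true;
}

/* Copy first, then filter, discarding the record if it does not match. */
static void demo_copy_then_filter(const struct demo_event *ev)
{
	size_t start = demo_ring_used;

	if (!demo_ring_write(ev))
		return;
	if (!demo_filter(ev))
		demo_ring_used = start;	/* "uncommit" the record */
}

/* Filter on the fields first, copy only the events that match. */
static void demo_filter_then_copy(const struct demo_event *ev)
{
	if (demo_filter(ev))
		demo_ring_write(ev);
}
```

With an event that the filter rejects, demo_copy_then_filter() still pays for
one copy before rolling it back, whereas demo_filter_then_copy() never touches
the buffer.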

For those reasons, LTTng reimplements its own tracepoint probe callbacks. All
those sit within LTTng kernel modules, which means we currently need the
exported kvm_x86_ops callbacks.

> We cannot do the extraction before calling trace_kvm_exit, because it's
> expensive.

I suspect that extracting relevant data prior to calling trace_kvm_exit
is too expensive because it cannot be skipped when the tracepoint is
disabled. This is because trace_kvm_exit() is a static inline function,
and the check to figure out if the event is enabled is within that function.
Unfortunately, even if the tracepoint is disabled, the side-effects of the
parameters passed to trace_kvm_exit() must happen.

I've solved this in LTTng-UST by implementing an lttng_ust_tracepoint()
macro, which basically "lifts" the tracepoint enabled check before the
evaluation of the arguments.

You could achieve something similar by using trace_kvm_exit_enabled() in the
kernel like so:

  if (trace_kvm_exit_enabled())
          trace_kvm_exit();

This would skip evaluation of the argument side-effects when the tracepoint is
disabled.
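
As a rough userspace illustration of that point (this is not kernel code, and
every demo_* name below is hypothetical, merely mimicking the shape of a static
inline trace_*() wrapper and its *_enabled() helper), the following sketch
counts how often a costly argument is computed with and without lifting the
enabled check:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static bool demo_tp_enabled;	/* mimics the tracepoint enabled state */
static int demo_extract_calls;	/* times the costly extraction ran */
static int demo_probe_calls;	/* times a probe was invoked */

/* Mimics an expensive translation, e.g. one going through kvm_x86_ops. */
static int demo_extract_exit_reason(void)
{
	demo_extract_calls++;
	return 42;
}

/* Mimics a static inline trace_*() wrapper: the check lives inside it. */
static inline void demo_trace_exit(int reason)
{
	(void)reason;
	if (demo_tp_enabled)
		demo_probe_calls++;
}

/* Mimics trace_*_enabled(). */
static inline bool demo_trace_exit_enabled(void)
{
	return demo_tp_enabled;
}

/* The argument is evaluated even when the tracepoint is disabled. */
static void demo_hit_unguarded(void)
{
	demo_trace_exit(demo_extract_exit_reason());
}

/* Enabled check lifted: the argument is evaluated only when needed. */
static void demo_hit_guarded(void)
{
	if (demo_trace_exit_enabled())
		demo_trace_exit(demo_extract_exit_reason());
}
```

With demo_tp_enabled left false, demo_hit_unguarded() still increments
demo_extract_calls, while demo_hit_guarded() leaves it untouched.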

By doing that, when multiple tracers are attached to a kvm tracepoint, the
translation from pointer-to-internal-structure to meaningful fields would only
need to be done once when a tracepoint is hit. And this would remove the need
for using kvm_x86_ops callbacks from tracer probe functions.
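
A minimal userspace sketch of that idea, assuming a hypothetical probe
signature that receives already-translated fields (struct demo_exit_fields,
demo_register_probe() and the other demo_* names are made up for illustration
and are not an existing kernel or LTTng interface):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-translated fields handed to every attached probe. */
struct demo_exit_fields {
	unsigned int exit_reason;
	unsigned long info1;
};

typedef void (*demo_probe_fn)(const struct demo_exit_fields *fields);

#define DEMO_MAX_PROBES 4
static demo_probe_fn demo_probes[DEMO_MAX_PROBES];
static size_t demo_nr_probes;
static int demo_translate_calls;	/* times the translation ran */

/* Attach a probe; returns -1 when the table is full. */
static int demo_register_probe(demo_probe_fn fn)
{
	if (demo_nr_probes >= DEMO_MAX_PROBES)
		return -1;
	demo_probes[demo_nr_probes++] = fn;
	return 0;
}

/* Stands in for the translation owned by the instrumented code. */
static void demo_translate(const void *opaque_vcpu,
			   struct demo_exit_fields *out)
{
	(void)opaque_vcpu;
	demo_translate_calls++;
	out->exit_reason = 1;
	out->info1 = 0x1000;
}

/* Translate once per hit, then pass the same fields to each probe. */
static void demo_tracepoint_hit(const void *opaque_vcpu)
{
	struct demo_exit_fields fields;
	size_t i;

	if (demo_nr_probes == 0)
		return;	/* nothing attached: skip the translation */
	demo_translate(opaque_vcpu, &fields);
	for (i = 0; i < demo_nr_probes; i++)
		demo_probes[i](&fields);
}

/* Two independent "tracers" consuming the translated fields. */
static unsigned long demo_sum_a, demo_sum_b;

static void demo_probe_a(const struct demo_exit_fields *f)
{
	demo_sum_a += f->exit_reason;
}

static void demo_probe_b(const struct demo_exit_fields *f)
{
	demo_sum_b += f->info1;
}
```

After registering demo_probe_a and demo_probe_b, one call to
demo_tracepoint_hit() runs demo_translate() a single time, and neither probe
needs access to the callbacks behind it.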

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com