Hi Zach,

the change to address the enable-event performance issue has been merged into both lttng-ust master and stable-2.13, in case you would like to build from source before the next release.
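
In case it's useful, building from the stable-2.13 branch looks roughly like this (a sketch only, assuming the usual autotools workflow and that the liburcu development headers are already installed):

# Sketch: build and install lttng-ust from the stable-2.13 branch.
git clone https://git.lttng.org/lttng-ust.git
cd lttng-ust
git checkout stable-2.13
./bootstrap
./configure
make
sudo make install
sudo ldconfig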

thanks,
kienan


On 2023-10-09 06:29, Kramer, Zach wrote:
Hi Kienan,

Thank you for the thorough explanation and proposed changeset.

My use case was to synchronize with a user-configurable set of events at 
runtime, which was easiest to do by enabling them individually.
In the meantime I worked around the performance hit by ensuring all available 
events have already been created on the session before starting it.
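
Roughly, the workaround looks like this (a sketch; the session, channel and provider names are placeholders):

# Enable everything of interest while the session is still inactive...
lttng create foo
lttng enable-channel -u -s foo bar
lttng enable-event -u 'my_provider:*' -s foo -c bar
# ...and only start tracing afterwards, so enabling events stays cheap.
lttng start foo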

Looking forward to the changeset.

Thanks,
Zach

-----Original Message-----
From: Kienan Stewart <kstew...@efficios.com>
Sent: Friday, 6 October 2023 18:13
To: lttng-dev@lists.lttng.org; Kramer, Zach <zach.kra...@cognex.com>
Subject: [EXTERNAL] Re: [lttng-dev] LTTng enable-event performance issue during 
active session

Hi Zach,

there's a proposed change to lttng-ust which addresses the issue you are seeing, 
but it hasn't finished clearing review & testing yet.
https://review.lttng.org/c/lttng-ust/+/11006

thanks,
kienan

On 2023-10-05 18:04, Kienan Stewart via lttng-dev wrote:
Hi Zach,

apologies for the delay in replying to your question.

On 2023-09-26 08:27, Kramer, Zach via lttng-dev wrote:
Hi,

I am observing a performance issue when enabling events
while a session is active and was wondering if this is expected.

LTTng versions:

   * lttng-tools: 2.13.9
   * lttng-ust: 2.13.6
Steps to reproduce:

  1. Ensure many userspace tracepoints are available in `lttng list -u`
     e.g. 100
  2. Create a new session
  3. Start session
  4. Enable new events on session

The time it takes to enable each new event has an increasing cost, e.g.:

  1. Event 1: 1ms
  2. Event 100: 15ms
  3. Event 1000: 150ms
  4. In total: about 1.5 minutes to enable 1000 events

If either:

  1. No userspace tracepoints are available, or
  2. The session is not started until after the events are enabled,

then the time it takes to enable new events is constant (e.g. 1ms).

Below is a bash script demonstrating this behavior:

# Pre-requisite: have many userspace tracepoints available

lttng create foo
lttng enable-channel -u -s foo bar
lttng start foo

total_t1=$(date +%s%3N);
for i in {1..100}
do
        t1=$(date +%s%3N);
        lttng enable-event -u lttng_iter_$i -s foo -c bar > /dev/null
        t2=$(date +%s%3N);
        echo "Event #$i took $((t2-t1)) ms"
done
total_t2=$(date +%s%3N);

echo "----------------------------------------------------"
echo "Enabling events on active session took $((total_t2-total_t1)) ms"
echo "----------------------------------------------------"

lttng destroy foo

lttng create foo
lttng enable-channel -u -s foo bar

total_t1=$(date +%s%3N);
for i in {1..100}
do
        t1=$(date +%s%3N);
        lttng enable-event -u lttng_iter_$i -s foo -c bar > /dev/null
        t2=$(date +%s%3N);
        echo "Event #$i took $((t2-t1)) ms"
done
total_t2=$(date +%s%3N);

echo "----------------------------------------------------"
echo "Enabling events on inactive session took $((total_t2-total_t1)) ms"
echo "----------------------------------------------------"

lttng destroy foo

Is this reproducible for you? Any insight is appreciated.


I'm able to reproduce what you're seeing, and it's not completely
unexpected based on my understanding of the architecture of lttng.

When a session is active and an event rule is enabled/disabled, the
sessiond will notify each of the registered applications of the event
rule state change. This communication goes through either Unix or TCP
sockets, and that portion of the protocol seems to struggle with many
small changes, as it doesn't support batching.

Enabling multiple events in a single call reduces some of the
overhead, but the changes will still be communicated serially. This
means a single call activating thousands of events can still take a
long time to process (e.g. `lttng enable-event -s foo -c bar -u
tp0,tp1,...,tpN`).
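
For instance, something like this batches them into a single call (a sketch only; the lttng_iter_* names are just the placeholders from your script):

# Build a comma-separated event list and enable it with one command.
events=$(printf 'lttng_iter_%s,' {1..1000})
lttng enable-event -u "${events%,}" -s foo -c bar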

Using glob patterns or `--all` will be significantly faster, as the UST
applications can digest those types of event rule changes with a
single set of messages. In cases where you want most but not all
events, flipping the logic, enabling "*" and listing the events or
patterns to omit with "--exclude", is going to be a better strategy.
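
As a sketch, with a hypothetical provider and tracepoint name:

# Enable every event from one provider in a single event rule,
# excluding one tracepoint ("my_provider" and "noisy_tp" are
# hypothetical names).
lttng enable-event -u 'my_provider:*' -s foo -c bar --exclude 'my_provider:noisy_tp'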

Do you have cases where you need to activate many events that couldn't
be covered by using glob patterns and/or exclusions?

thanks,
kienan

Many thanks,

Zach


_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
