Hi Georg,
On 6/25/25 1:27 PM, Georg Muck wrote:
Hi Kienan,
thanks for your quick response. :)
I replaced "--daemonize" with "--background".
I guess I should briefly describe my goal to avoid confusion: I want to
trace a ROS2 session online (I tried LTTng live, but it consumed too many
resources, hence the aggressive 1 s rotation). LTTng gives me the path
to a newly created trace archive as soon as it is created. This path is
forwarded to a thread pool that does the rest (each trace file is analyzed
on its own). The trace files are removed at the end because I'm not
interested in them myself; only the program should be interested in them (I
forgot to remove that part in the example). The problem only occurs when I
start too many ROS2 instances at once, i.e., when too many tracepoints fire
within a very short amount of time. It works fine for 1-2 instances, but it
doesn't work for 16 instances.
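For reference, a 1 s periodic rotation like the one you describe can be
scheduled through liblttng-ctl roughly like this. This is a minimal sketch
of my understanding, not your actual code; the helper name
`add_one_second_rotation` is made up, and the session is assumed to already
exist:
```
#include <lttng/lttng.h>

/* Sketch: attach a 1 s periodic rotation schedule to an existing session. */
int add_one_second_rotation(const char *session_name)
{
	struct lttng_rotation_schedule *schedule;
	enum lttng_rotation_status status;
	int ret = -1;

	schedule = lttng_rotation_schedule_periodic_create();
	if (!schedule) {
		return -1;
	}

	/* The rotation period is expressed in microseconds: 1 s = 1000000 us. */
	status = lttng_rotation_schedule_periodic_set_period(schedule, 1000000);
	if (status != LTTNG_ROTATION_STATUS_OK) {
		goto end;
	}

	status = lttng_session_add_rotation_schedule(session_name, schedule);
	if (status != LTTNG_ROTATION_STATUS_OK) {
		goto end;
	}
	ret = 0;
end:
	lttng_rotation_schedule_destroy(schedule);
	return ret;
}
```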
1. *Before any traced application is run, babeltrace2 has the following
error output:* Ah ok, thought I would be on the right track.
2. *After a traced application is run, whether it produces enabled events
or not, I see the events recorded*: What do you mean by that? There is no
babeltrace2 error if traced instances are running, even though they are
not creating events?
This is what I observed with a fresh start:
- before running any traced application: bt2 produced an error, as there
was no trace content
- after running a traced application with no events: bt2 no longer
produced an error, but output 0 events (as expected). In this case I
expect that it is because the consumer started and basic metadata was
available, so that was written in the snapshot. If I recall correctly, the
consumer daemon only starts when the first application connects.
- after running a traced application with some events: as usual
3. *When your session script stops, the output folder is deleted*: as
described above, I forgot to remove that in the example, but it is on purpose
I tried *tp:tptest*, but there is no output shown by babeltrace2 (is that
intended?):
I had the 100 events produced by the gen-ust-events binary that I invoked.
You ran it but the events weren't recorded? Regardless, I think we're on
the same page that the session setup program you are using is fine right now.
[georg@KARTOFF8xE-zenbook /tmp/structural_traces]$ babeltrace2 archives
[georg@KARTOFF8xE-zenbook /tmp/structural_traces]$
However, I figured something out:
When I trace the "ros2:rcl_node_init" tracepoint and start 12 ROS2
instances, I receive the following output for the entire archive directory:
[georg@KARTOFF8xE-zenbook /tmp/structural_traces/archives]$ babeltrace2 .
[19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(24897)
ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897
}, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name =
"saturn_listener", namespace = "/hallo/saturn" }
[19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898)
ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 },
{ node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name =
"erde_talker", namespace = "/hallo/erde" }
All of it coming from archive number 10:
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10]$
babeltrace2 .
[19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(*24897*)
ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897
}, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name =
"saturn_listener", namespace = "/hallo/saturn" }
[19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898)
ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 },
{ node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name =
"erde_talker", namespace = "/hallo/erde" }
However, looking deeper into that archive I saw that all 12 instances I
started (plus some from ros2 itself) are present within trace directory
number 10!
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
ls
listener-24897-20250625-190141 listener-24903-20250625-190141
ros2-24447-20250625-190131 ros2-24887-20250625-190141
talker-24900-20250625-190141 talker-24916-20250625-190141
listener-24899-20250625-190141 listener-24914-20250625-190141
ros2-24451-20250625-190131 talker-24896-20250625-190141
talker-24902-20250625-190141
listener-24901-20250625-190141 listener-24917-20250625-190141
ros2-24834-20250625-190134 talker-24898-20250625-190141
talker-24904-20250625-190141
BUT:
I looked into the specific directory of a traced instance (marked with asterisks below)
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
ls listener-*24897*-20250625-190141/
index structuralChannel_0 structuralChannel_2 structuralChannel_4
structuralChannel_6 structuralChannel_8 structuralChannel_10
*metadata* structuralChannel_1 structuralChannel_3 structuralChannel_5
structuralChannel_7 structuralChannel_9 structuralChannel_11
and of a non-traced instance:
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
ls listener-24899-20250625-190141
index structuralChannel_1 structuralChannel_3
structuralChannel_5 structuralChannel_7 structuralChannel_9
structuralChannel_11
structuralChannel_0 structuralChannel_2 structuralChannel_4
structuralChannel_6 structuralChannel_8 structuralChannel_10
Turns out there is no metadata file within the second one, so babeltrace2
is not able to parse that trace. Do you have an idea why?
From your later e-mail with the updated sessiond log, which shows the
PERROR for too many open files, it sounds like you have a thread to pull
on. You can try using the per-UID buffering scheme to reduce the number of
FDs required when running 12 concurrent instances of ROS; see the sketch
below.
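With the liblttng-ctl C API that your session setup program uses, the
buffer ownership is chosen on the domain before the channels are created.
A minimal sketch, assuming the session and channel names from this thread;
the helper name `enable_per_uid_channel` is made up for illustration:
```
#include <string.h>
#include <lttng/lttng.h>

/*
 * Sketch: create a UST channel whose buffers are owned per UID, so all
 * processes of one user share a buffer set (and its file descriptors)
 * instead of every PID getting its own.
 */
int enable_per_uid_channel(void)
{
	struct lttng_domain domain;
	struct lttng_handle *handle = NULL;
	struct lttng_channel *channel = NULL;
	int ret = -1;

	memset(&domain, 0, sizeof(domain));
	domain.type = LTTNG_DOMAIN_UST;
	domain.buf_type = LTTNG_BUFFER_PER_UID;

	handle = lttng_create_handle("structuralSession", &domain);
	if (!handle) {
		goto end;
	}

	/* Channel with the domain's default attributes. */
	channel = lttng_channel_create(&domain);
	if (!channel) {
		goto end;
	}
	strcpy(channel->name, "structuralChannel");

	ret = lttng_enable_channel(handle, channel);
end:
	if (channel) {
		lttng_channel_destroy(channel);
	}
	if (handle) {
		lttng_destroy_handle(handle);
	}
	return ret;
}
```
(The CLI equivalent is the `--buffers-uid` option of `lttng enable-channel`.)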
Best regards,
Georg
Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
On Wed, Jun 25, 2025 at 5:50 PM Kienan Stewart <
kstew...@efficios.com> wrote:
Hi Georg,
On 6/25/25 9:36 AM, Georg Muck wrote:
Hi Kienan,
I did exactly that, without success.
The log of the lttng-sessiond (lttng.log) did not change after starting
the
ROS2 instances. I have also added the output of "lttng list
structuralSession"
I start my tracing instance as follows:
At first there is a bash-script that sets everything up, it does the
following:
lttng destroy -a # to destroy all existing sessions
lttng-sessiond --daemonize --verbose-consumer -vvv &> "/tmp/lttng.log" #
starting a new session daemon
When started with '--daemonize', the file descriptors are closed [1]
after forking, which is why you don't see any further output to
/tmp/lttng.log. Try with '--background' instead.
./exemplary_tracing_session # starting my exemplary tracing session
The exemplary tracing session is written with the help of the C API
(see appendix).
I modified your script to add 'tp:tptest' (a tracepoint from the
lttng-tools test suite program gen-ust-events) and did the following:
```
$ lttng-sessiond --background --verbose-consumer -vvv &> sessiond.log
$ ./exemplary_tracing_session &
$ lttng-tools/./tests/utils/testapp/gen-ust-events/gen-ust-events
$ babeltrace2 /tmp/structural_traces | wc -l
100
```
It seems to be working fine for me. A couple of notes:
1. Before any traced application is run, babeltrace2 has the following
error output: No trace was found based on input `/tmp/structural_traces`
2. After a traced application is run, whether it produces enabled events
or not, I see the events recorded.
3. When your session script stops, the output folder is deleted.
I started the same procedure on another machine. The same fault occurred.
The fault is that babeltrace2 has an error, right? Are you willing
to share your trace that has the problem and/or show the output of
babeltrace2's error?
I think I need to configure more within the `lttngDomain`, but I
don't know how. Would you have an idea/hint?
Thanks for your help.
Best regards,
Georg
PS: I also added the log of my tracing tool (structural.log)
Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
On Mon, Jun 23, 2025 at 4:26 PM Kienan Stewart <
kstew...@efficios.com> wrote:
Hi Georg,
a good first step is to run lttng-sessiond with `-vvv --verbose-consumer`
and inspect the log file.
It may also be useful to run your traced applications with
`LTTNG_UST_DEBUG=1` set in the environment.
Do you have steps that reproduce the issue? E.g., all the commands to
set up the tracing session and so forth.
thanks,
kienan
On 6/19/25 10:58 AM, Georg Muck via lttng-dev wrote:
Hey,
I am using the LTTng tracer (2.13.11, C API, rotation=1s) to trace my ROS2
system (userspace only). Everything works just fine within my Docker
container (Ubuntu 24.04), but if I start the same procedure on the host
itself (also Ubuntu 24.04), it does not create a trace file if too many
tracepoints occur (roughly 1500+ per second). Sometimes the trace file was
malformed, but I was not able to reproduce that; if I remember correctly,
babeltrace2 threw an error saying that there was no metadata file. I
played around with some configurations (like increasing the buffer size
from ~16 kB to ~512 kB, or a less aggressive rotation), but the problem
stayed.
I would be thankful for help. I guess I provided too little information;
please let me know what else would be of interest here, I am not sure
about that :).
Best regards,
Georg
Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
thanks,
kienan
[1]: https://lttng.org/man/8/lttng-sessiond/v2.13/#doc-opt--daemonize