Hi Georg,
On 6/25/25 1:27 PM, Georg Muck wrote:
Hi Kienan,
thanks for your quick response. :)
I replaced "--daemonize" with "--background".
I guess I should briefly describe my goal to avoid confusion: I want to
trace a ROS2 session online (I tried LTTng live, but it consumed too many
resources, hence the aggressive 1 s rotation). LTTng gives me the path
to a newly created trace archive as soon as it is created. This path is
forwarded to a thread pool that does the rest (each trace file is analyzed
on its own). The trace files are removed at the end because I'm not
interested in them myself; only the program should be interested in them (I
forgot to remove that part in the example). The problem only occurs when I
start too many ROS2 instances at once, i.e., when too many tracepoints fire
within a very short amount of time. It works fine for 1-2 instances, but it
doesn't work for 16 instances.
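For reference, a 1 s periodic rotation like the one you describe can be
scheduled through liblttng-ctl roughly like this. This is a minimal sketch
of my understanding, not your actual code; the helper name
`add_one_second_rotation` is made up, and the session is assumed to already
exist:
```
#include <lttng/lttng.h>

/* Sketch: attach a 1 s periodic rotation schedule to an existing session. */
int add_one_second_rotation(const char *session_name)
{
	struct lttng_rotation_schedule *schedule;
	enum lttng_rotation_status status;
	int ret = -1;

	schedule = lttng_rotation_schedule_periodic_create();
	if (!schedule) {
		return -1;
	}

	/* The rotation period is expressed in microseconds: 1 s = 1000000 us. */
	status = lttng_rotation_schedule_periodic_set_period(schedule, 1000000);
	if (status != LTTNG_ROTATION_STATUS_OK) {
		goto end;
	}

	status = lttng_session_add_rotation_schedule(session_name, schedule);
	if (status != LTTNG_ROTATION_STATUS_OK) {
		goto end;
	}
	ret = 0;
end:
	lttng_rotation_schedule_destroy(schedule);
	return ret;
}
```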
1. *Before any traced application is run, babeltrace2 has the following
error output:* Ah ok, thought I would be on the right track.
2. *After a traced application is run, whether it produces enabled events
or not, I see the events recorded*: What do you mean by that? There is no
babeltrace2 error if traced instances are running, even though they are
not creating events?
This is what I observed with a fresh start:
- before running any traced application: bt2 produced an error, as there
was no trace content
- after running a traced application with no events: bt2 no longer
produced an error, but output 0 events (as expected). In this case I
expect that it is because the consumer started and basic metadata was
available, so that was written in the snapshot. If I recall correctly, the
consumer daemon only starts when the first application connects.
- after running a traced application with some events: as usual
3. *When your session script stops, the output folder is deleted*: as
described above, I forgot to remove that in the example, but it is on purpose
I tried *tp:tptest*, but there is no output shown by babeltrace2 (is that
intended?):
I had the 100 events produced by the gen-ust-events binary that I invoked.
You ran it but the events weren't recorded? Regardless, I think we're on
the same page that the session setup program you are using is fine right now.
[georg@KARTOFF8xE-zenbook /tmp/structural_traces]$ babeltrace2 archives
[georg@KARTOFF8xE-zenbook /tmp/structural_traces]$
However, I figured something out:
When I trace the "ros2:rcl_node_init" tracepoint and start 12 ROS2
instances, I receive the following output for the entire archive directory:
[georg@KARTOFF8xE-zenbook /tmp/structural_traces/archives]$ babeltrace2 .
[19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(24897)
ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897
}, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name =
"saturn_listener", namespace = "/hallo/saturn" }
[19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898)
ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 },
{ node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name =
"erde_talker", namespace = "/hallo/erde" }
All of it coming from archive number 10:
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10]$
babeltrace2 .
[19:01:41.474172508] (+?.?????????) KARTOFF8xE-zenbook:listener:(*24897*)
ros2:rcl_node_init: { cpu_id = 1 }, { procname = "listener", vpid = 24897
}, { node_handle = 0x5CCBD0EFA280, rmw_handle = 0x5CCBD107A420, node_name =
"saturn_listener", namespace = "/hallo/saturn" }
[19:01:41.493613263] (+0.019440755) KARTOFF8xE-zenbook:talker:(24898)
ros2:rcl_node_init: { cpu_id = 4 }, { procname = "talker", vpid = 24898 },
{ node_handle = 0x5A8E10DFC6E0, rmw_handle = 0x5A8E10F8AA60, node_name =
"erde_talker", namespace = "/hallo/erde" }
However, looking deeper into that archive I saw that all 12 instances I
started (plus some from ros2 itself) are present within trace directory
number 10!
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
ls
listener-24897-20250625-190141 listener-24903-20250625-190141
ros2-24447-20250625-190131 ros2-24887-20250625-190141
talker-24900-20250625-190141 talker-24916-20250625-190141
listener-24899-20250625-190141 listener-24914-20250625-190141
ros2-24451-20250625-190131 talker-24896-20250625-190141
talker-24902-20250625-190141
listener-24901-20250625-190141 listener-24917-20250625-190141
ros2-24834-20250625-190134 talker-24898-20250625-190141
talker-24904-20250625-190141
BUT:
I looked into the specific directory of a traced instance (marked with asterisks below)
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
ls listener-*24897*-20250625-190141/
index structuralChannel_0 structuralChannel_2 structuralChannel_4
structuralChannel_6 structuralChannel_8 structuralChannel_10
*metadata* structuralChannel_1 structuralChannel_3 structuralChannel_5
structuralChannel_7 structuralChannel_9 structuralChannel_11
and of a non-traced instance:
[georg@KARTOFF8xE-zenbook
/tmp/structural_traces/archives/20250625T190141+0200-20250625T190142+0200-10/ust/pid]$
ls listener-24899-20250625-190141
index structuralChannel_1 structuralChannel_3
structuralChannel_5 structuralChannel_7 structuralChannel_9
structuralChannel_11
structuralChannel_0 structuralChannel_2 structuralChannel_4
structuralChannel_6 structuralChannel_8 structuralChannel_10
Turns out there is no metadata file within the second one, so babeltrace2
is not able to parse that trace. Do you have an idea why?
From your later e-mail with the updated sessiond log, which shows the
PERROR for too many open files, it sounds like you have a thread to pull
on. You can try using the per-UID buffering scheme to reduce the number of
FDs required when running 12 concurrent instances of ROS; see the sketch
below.
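With the liblttng-ctl C API that your session setup program uses, the
buffer ownership is chosen on the domain before the channels are created.
A minimal sketch, assuming the session and channel names from this thread;
the helper name `enable_per_uid_channel` is made up for illustration:
```
#include <string.h>
#include <lttng/lttng.h>

/*
 * Sketch: create a UST channel whose buffers are owned per UID, so all
 * processes of one user share a buffer set (and its file descriptors)
 * instead of every PID getting its own.
 */
int enable_per_uid_channel(void)
{
	struct lttng_domain domain;
	struct lttng_handle *handle = NULL;
	struct lttng_channel *channel = NULL;
	int ret = -1;

	memset(&domain, 0, sizeof(domain));
	domain.type = LTTNG_DOMAIN_UST;
	domain.buf_type = LTTNG_BUFFER_PER_UID;

	handle = lttng_create_handle("structuralSession", &domain);
	if (!handle) {
		goto end;
	}

	/* Channel with the domain's default attributes. */
	channel = lttng_channel_create(&domain);
	if (!channel) {
		goto end;
	}
	strcpy(channel->name, "structuralChannel");

	ret = lttng_enable_channel(handle, channel);
end:
	if (channel) {
		lttng_channel_destroy(channel);
	}
	if (handle) {
		lttng_destroy_handle(handle);
	}
	return ret;
}
```
(The CLI equivalent is the `--buffers-uid` option of `lttng enable-channel`.)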
Best regards,
Georg
Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
On Wed, Jun 25, 2025 at 5:50 PM Kienan Stewart <
kstew...@efficios.com> wrote:
Hi Georg,
On 6/25/25 9:36 AM, Georg Muck wrote:
Hi Kienan,
I did exactly that, without success.
The log of the lttng-sessiond (lttng.log) did not change after starting
the
ROS2 instances. I have also added the output of "lttng list
structuralSession"
I start my tracing instance as follows:
At first there is a bash-script that sets everything up, it does the
following:
lttng destroy -a # to destroy all existing sessions
lttng-sessiond --daemonize --verbose-consumer -vvv &> "/tmp/lttng.log" #
starting a new session daemon
When started with '--daemonize', the file descriptors are closed [1]
after forking, which is why you don't see any further output to
/tmp/lttng.log. Try with '--background' instead.
./exemplary_tracing_session # starting my exemplary tracing session
The exemplary tracing session is written with the help of the C API
(see appendix).
I modified your script to add 'tp:tptest' (a tracepoint from the
lttng-tools test suite program gen-ust-events) and did the following:
```
$ lttng-sessiond --background --verbose-consumer -vvv &> sessiond.log
$ ./exemplary_tracing_session &
$ lttng-tools/./tests/utils/testapp/gen-ust-events/gen-ust-events
$ babeltrace2 /tmp/structural_traces | wc -l
100
```
It seems to be working fine for me. A couple of notes:
1. Before any traced application is run, babeltrace2 has the following
error output: No trace was found based on input `/tmp/structural_traces`
2. After a traced application is run, whether it produces enabled events
or not, I see the events recorded.
3. When your session script stops, the output folder is deleted.
I started the same procedure on another machine. The same fault occurred.
The fault is that babeltrace2 has an error, right? Are you willing
to share your trace that has the problem and/or show the output of
babeltrace2's error?
I think I need to configure more within the `lttngDomain`, but I
don't know how. Would you have an idea/hint?
Thanks for your help.
Best regards,
Georg
PS: I also added the log of my tracing tool (structural.log)
Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
On Mon, Jun 23, 2025 at 4:26 PM Kienan Stewart <
kstew...@efficios.com> wrote:
Hi Georg,
a good first step is to run lttng-sessiond with `-vvv --verbose-consumer`
and inspect the log file.
It may also be useful to run your traced applications with
`LTTNG_UST_DEBUG=1` set in the environment.
Do you have steps that reproduce the issue? E.g., all the commands to
set up the tracing session and so forth.
thanks,
kienan
On 6/19/25 10:58 AM, Georg Muck via lttng-dev wrote:
Hey,
I am using the LTTng tracer (2.13.11, C API, rotation=1s) to trace my ROS2
system (userspace only). Everything works just fine within my Docker
container (Ubuntu 24.04), but if I start the same procedure on the host
itself (also Ubuntu 24.04), it does not create a trace file if too many
tracepoints occur (roughly 1500+ per second). Sometimes the trace file was
malformed, but I was not able to reproduce that; if I remember correctly,
babeltrace2 threw an error saying that there was no metadata file. I
played around with some configurations (like increasing the buffer size
from ~16 kB to ~512 kB, or a less aggressive rotation), but the problem
stayed.
I would be thankful for help. I guess I provided too little information;
please let me know what else would be of interest here, I am not sure
about that :).
Best regards,
Georg
Never burn the bridges before you
If you've burned the bridges behind
Never burn the one that you're walking upon
Or you're sure to hit the water in time. *(Savatage, Streets, 1991)*
thanks,
kienan
[1]: https://lttng.org/man/8/lttng-sessiond/v2.13/#doc-opt--daemonize