Hi Aditya,
I want to add a further detail.
While neither arm32 nor arm64 supports quick lookups of the current
CPU id in the vDSO, I think that if you are using a recent Linux
kernel (>= 4.18) with a recent glibc (>= 2.32 with RSEQ_SIG defined),
sched_getcpu should use RSEQ instead, which is much faster than the
getcpu syscall.
Cf.
https://github.com/bminor/glibc/commit/6e29cb3f61ff5432c78a1c84b0d9b123a350ab36
thanks,
kienan
On 6/18/24 10:00 AM, Kienan Stewart via lttng-dev wrote:
Hi Aditya,
On 6/18/24 8:55 AM, Aditya Kurdunkar via lttng-dev wrote:
Hello,
Please bear with me if this is a naive question. I am working on an
embedded ARM chip (1GB ram, 2CPUs) where I want to collect trace
events for a long duration of time. From the research that I have done
(mostly reading papers on LTTng tracing, conference talks and
documentation) I have seen it mentioned that for ARM the overhead is
greater because the system call to get the CPU is quite slow. In my
use case I am okay with not having this information. The current
benchmarks show a 3 microsecond overhead of a single tracepoint on ARM
in comparison to
Is there a specific detail that leads you to believe that getcpu is
taking the bulk of the time?
Is the performance of your embedded ARM chip and the x86_64 system you
are comparing it to at all similar otherwise?
Regardless, I think this comparison may be misleading. It sounds like
what you want to measure is the time + resources required to run your
application with and without tracing on the same platform, rather than
comparing two dissimilar platforms?
Please note that the UST startup overhead (e.g. spawning the application,
launching the UST thread, connecting to the sessiond, transferring
configuration and buffer pointers) is comparatively large when measured
against a single event, but is amortized over the course of a
long-running application.
The default behaviour is to block main program execution until the
registration completes or times out. In many cases you may want to
disable that timeout for quicker startup at the cost of potentially
losing event(s) right at the beginning. Cf. LTTNG_UST_REGISTER_TIMEOUT
in https://lttng.org/man/3/lttng-ust/v2.13/#doc-_environment_variables
150ns on an x86 machine. Hence, my question is: Is it possible to
disable recording the CPU somehow? Any suggestions for decreasing the
overhead other than this are welcome.
It is always enabled, cf.
https://lttng.org/man/3/lttng-ust/v2.13/#doc-_context_information
However, I suppose you could try to use a custom getcpu plugin, e.g.
https://github.com/lttng/lttng-ust/tree/master/doc/examples/getcpu-override to return a dummy value.
If you detailed your tracing and benchmark setup it might be possible to
provide additional guidance.
Regards,
Aditya
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
thanks,
kienan