Re: [lttng-dev] lttng-consumerd crash on aarch64 due to x86 arch specific optimization

Mathieu Desnoyers via lttng-dev Tue, 31 Jan 2023 08:18:20 -0800

On 2023-01-31 11:08, Mathieu Desnoyers wrote:

On 2023-01-30 01:50, Beckius, Mikael via lttng-dev wrote:
Hello Matthieu!
I have looked at this in place of Anders and as far as I can tell this is not an arm64 issue but an arm issue. And even on arm __ARM_FEATURE_UNALIGNED is 1, so it seems the problem only occurs if size equals 8.
So for ARM, perhaps we should do the following in include/lttng/ust-arch.h:

#if defined(LTTNG_UST_ARCH_ARM) && defined(__ARM_FEATURE_UNALIGNED)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif
And refer to https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options
Based on that documentation, it is possible to build with -mno-unaligned-access,
and for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures,
unaligned accesses are not enabled.

I would only push this kind of change into the master branch though, due to
its impact and the fact that this is only a performance improvement.


But setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for arm32
when __ARM_FEATURE_UNALIGNED is defined would still cause issues for
8-byte lttng_inline_memcpy with my proposed patch, right?

AFAIU 32-bit arm with __ARM_FEATURE_UNALIGNED supports unaligned 2-byte
and 4-byte accesses, but somehow traps on unaligned 8-byte
accesses?
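
To make the question above concrete, here is a minimal sketch of a size-switched copy that takes the direct-access path for unaligned 2-byte and 4-byte copies but only takes it for 8 bytes when both pointers are 8-byte aligned. It is not the lttng-ust implementation: sketch_inline_memcpy, sketch_direct_ok and SKETCH_MAX_UNALIGNED_SIZE are hypothetical names, and the per-architecture limits are assumptions drawn from this thread. The pointer casts follow the style being discussed and rely on compiler and hardware behaviour beyond ISO C.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical knob for this sketch only (not an lttng-ust macro): the
 * largest size, in bytes, assumed safe for a plain load/store on an
 * unaligned address.
 */
#if defined(__x86_64__) || defined(__i386__) || defined(__aarch64__)
# define SKETCH_MAX_UNALIGNED_SIZE 8
#elif defined(__arm__) && defined(__ARM_FEATURE_UNALIGNED)
# define SKETCH_MAX_UNALIGNED_SIZE 4	/* 8-byte case excluded, per the thread */
#else
# define SKETCH_MAX_UNALIGNED_SIZE 1
#endif

/*
 * Nonzero when a single len-byte access is assumed safe for both
 * pointers. len must be a power of two (2, 4 or 8 here).
 */
static inline int sketch_direct_ok(const void *dest, const void *src,
		unsigned long len)
{
	if (len <= SKETCH_MAX_UNALIGNED_SIZE)
		return 1;
	/* Otherwise require both addresses to be len-aligned. */
	return (((uintptr_t) dest | (uintptr_t) src) & (len - 1)) == 0;
}

static inline void sketch_inline_memcpy(void *dest, const void *src,
		unsigned long len)
{
	switch (len) {
	case 1:
		*(uint8_t *) dest = *(const uint8_t *) src;
		return;
	case 2:
		if (sketch_direct_ok(dest, src, len)) {
			*(uint16_t *) dest = *(const uint16_t *) src;
			return;
		}
		break;
	case 4:
		if (sketch_direct_ok(dest, src, len)) {
			*(uint32_t *) dest = *(const uint32_t *) src;
			return;
		}
		break;
	case 8:
		if (sketch_direct_ok(dest, src, len)) {
			*(uint64_t *) dest = *(const uint64_t *) src;
			return;
		}
		break;
	default:
		break;
	}
	/* Fallback: other sizes, or alignment not known to be safe. */
	memcpy(dest, src, len);
}
```

With this shape, a 32-bit arm build with __ARM_FEATURE_UNALIGNED would skip the memcpy fallback for sizes 2 and 4 while still routing unaligned 8-byte copies through memcpy.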

Thanks,

Mathieu

In addition I did some performance testing of lttng_inline_memcpy by extracting it and adding it to a simple test program. It appears that the general performance increases on arm, arm64, arm on arm64 hardware and x86-64. But it also appears that on arm if you end up in memcpy the old code where you call memcpy directly is actually slightly faster.
Nothing unexpected here. Just make sure that your test program does not call lttng_inline_memcpy
with constant size values which end up optimizing away branches. In the context where lttng_inline_memcpy
is used, most of the time its arguments are not constants.
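
One way to keep the size from being a compile-time constant in such a test program is to read it through a volatile variable on every iteration. The following is only a sketch under that assumption: sketch_bench_ns and the two buffers are hypothetical names, plain memcpy stands in for the function under test, and the empty asm statement is a GCC/Clang-specific compiler barrier.

```c
#include <string.h>
#include <time.h>

static unsigned char sketch_src[64];
static unsigned char sketch_dst[64];

/*
 * Returns the average CPU time per copy, in nanoseconds.
 * The caller must ensure offset + len <= 64 and iters > 0.
 */
static double sketch_bench_ns(unsigned long len, unsigned long offset,
		unsigned long iters)
{
	/* volatile: the compiler cannot treat the size as a constant. */
	volatile unsigned long vlen = len;
	clock_t start, end;
	unsigned long i;

	start = clock();
	for (i = 0; i < iters; i++) {
		unsigned long n = vlen;	/* fresh read on every iteration */

		/* Stand-in for the copy function being measured. */
		memcpy(sketch_dst + offset, sketch_src + offset, n);
		/* Compiler barrier so the copy is not optimized away. */
		__asm__ __volatile__("" : : : "memory");
	}
	end = clock();
	return (double) (end - start) * 1e9 / CLOCKS_PER_SEC / (double) iters;
}
```

Calling it with a nonzero offset, for example sketch_bench_ns(8, 1, 10000000), measures the unaligned case; swapping the memcpy line for the inline variant gives the comparison discussed above.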
Skipping the memcpy fallback on arm for unaligned copies of sizes 2 and 4 further improves the performance
This would be naturally done on your board if we conditionally
set LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 for __ARM_FEATURE_UNALIGNED,
right?
and setting LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1 yields the best performance on arm64.
This could go into lttng-ust master branch as well, e.g.:

#if defined(LTTNG_UST_ARCH_AARCH64)
#define LTTNG_UST_ARCH_HAS_EFFICIENT_UNALIGNED_ACCESS 1
#endif

Thanks!

Mathieu
Micke
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
