Hi,

I did some test runs and you are right: the difference is negligible.
At least in my test with the -O3 flag, the compiler produces code that
handles the unaligned beginning first, then processes the large middle
part with aligned reads and SIMD instructions, and finally the rest.
Even in a synthetic benchmark of just the single function that converts
the data, the difference is less than 2%.
Maybe this is different on very old processors; I tested on AMD Zen 3
and Intel Ivy Bridge. But of course SSE code will still crash on direct
unaligned reads, i.e. when an instruction takes an unaligned memory
address as an operand.
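To illustrate the loop structure I mean, here is a rough scalar sketch in C of what the compiler's output does in principle. The names head_count and widen_samples and the 16-bit to 32-bit conversion are made up for this example and are not taken from my code or from libFLAC; the 16-byte boundary and the block size of 8 samples are assumptions matching one SSE register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many leading samples must be handled one by one
 * before src sits on a 16-byte boundary (never more than n).
 * Assumes src has the natural 2-byte alignment of int16_t. */
static size_t head_count(const int16_t *src, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)src & 15u);
    size_t head = misalign ? (16u - misalign) / sizeof(int16_t) : 0;
    return head < n ? head : n;
}

/* Hypothetical conversion routine: widen 16-bit samples to 32 bit. */
static void widen_samples(int32_t *dst, const int16_t *src, size_t n)
{
    size_t i = 0;
    size_t head = head_count(src, n);

    /* 1. unaligned beginning, sample by sample */
    for (; i < head; i++)
        dst[i] = src[i];

    /* 2. large middle part: blocks of 8 samples (16 bytes) that start on
     *    a 16-byte boundary; this is where aligned SIMD loads would go */
    for (; i + 8 <= n; i += 8) {
        assert(((uintptr_t)(src + i) & 15u) == 0);
        for (size_t k = 0; k < 8; k++)
            dst[i + k] = src[i + k];
    }

    /* 3. the rest that does not fill a whole block */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

The result is the same whatever the alignment of src is; only the split between the three loops changes.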

Best regards
Stefan


On 17.03.25 at 08:37, Martijn van Beurden wrote:
On Mon, 17 Mar 2025 at 02:58, Stefan Oltmanns <stefan-oltma...@gmx.net> wrote:

Hi,

yes, SSE requires aligned buffers for operations that directly read an
operand from memory

libFLAC does unaligned SIMD all the time, both SSE and AVX, so I don't
think that is true. See

https://c9x.me/x86/html/file_module_x86_id_184.html

I'm not sure what you want from me here. On fairly modern CPUs,
unaligned memory access isn't really slower in most use cases. I
highly doubt this would really be a performance problem in your code
in any way. Maybe you can present some numbers?

Kind regards, Martijn van Beurden

_______________________________________________
flac-dev mailing list
flac-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/flac-dev
