On Tue, Sep 23, 2025 at 1:06 PM Damien Stewart wrote:
>
> On 22/9/25 4:33 am, John Paul Adrian Glaubitz wrote:
> > Modern compilers are already extremely good at optimizing code such that
> > they use specific CPU extensions such that you often don't need
> > handwritten assembly for optimal performance.
> > [...]
>
> You've likely read an article about The Byte Order Fallacy. While I
> agree in principle I disagree in practice. No one in this day and age or
> any other I can imagine would split a scalar load/store into discrete
> parts by the byte to be cross portable. It also doesn't produce
> efficient code.
As someone who has written a lot of SIMD and cryptographic code -- and timed
the operations -- I can say it does matter in practice. The operation may be a
no-op on one byte order and a byte swap on the other, but it does matter.

> Even on x64 the latest GCC doesn't even know or doesn't
> figure out what code is doing that reads data bytes and shifts it in
> place. For either endian. It doesn't see what is going on to reduce it
> to direct access or a movbe. It lacks endian AI.

Yeah, it is hit or miss whether Clang and GCC will produce optimal code for
some endian-related patterns, like reading a byte array and turning it into a
machine word.

Jeff

