I'm thinking of applying a patch like this:

--- i/zdbsp.h
+++ w/zdbsp.h
@@ -237,23 +237,47 @@ inline fixed_t DMulScale32 (fixed_t a, fixed_t b, fixed_t c, fixed_t d)
 #define LittleShort(x) CFSwapInt16LittleToHost(x)
 #define LittleLong(x) CFSwapInt32LittleToHost(x)
 #else
-#ifdef __BIG_ENDIAN__
+
+inline char is_little_endian()
+{
+	const unsigned short one = 1;
+	return *(char *)(&one) == 1;
+}
 
 // Swap 16bit, that is, MSB and LSB byte.
 // No masking with 0xFF should be necessary.
 inline short LittleShort (short x)
 {
+	if(is_little_endian()) {
+		return x;
+	}
 	return (short)((((unsigned short)x)>>8) | (((unsigned short)x)<<8));
 }
(etc.), i.e., remove the preprocessor branching, provide just one implementation of each of the relevant functions, add the variants that are currently missing (long etc.), and do a runtime branch on endianness. From initial testing, GCC completely removes the branching with -O1 and presumably higher; I expect LLVM does too.
In my initial tests this was the same speed on LE but took 4 times as much memory. It might be that I'm not comparing apples to apples wrt compile flags, though -- this is "cmake build" versus whatever the debhelper wrappers choose, so I'll try to correct for that and re-test.
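
For reference, a minimal, self-contained sketch of the single-implementation approach described above. The names is_little_endian and LittleShort follow the patch; the LittleLong body is only an assumption about what the elided "(etc)" part might look like. The fixed-width types and the memcpy-based byte inspection are substitutions made for this sketch, not what the patch itself does.

```cpp
#include <cstdint>
#include <cstring>

// Runtime endianness probe, as in the patch. memcpy is used here instead of
// the pointer cast (a choice made for this sketch only).
inline bool is_little_endian()
{
    const std::uint16_t one = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

// Convert a 16-bit little-endian value to host order.
inline std::int16_t LittleShort(std::int16_t x)
{
    if (is_little_endian())
        return x; // already in host order
    const std::uint16_t u = static_cast<std::uint16_t>(x);
    return static_cast<std::int16_t>(
        static_cast<std::uint16_t>((u >> 8) | (u << 8)));
}

// Convert a 32-bit little-endian value to host order.
// Assumed shape of the "missing" long variant; not taken from the patch.
inline std::int32_t LittleLong(std::int32_t x)
{
    if (is_little_endian())
        return x; // already in host order
    const std::uint32_t u = static_cast<std::uint32_t>(x);
    return static_cast<std::int32_t>((u >> 24) |
                                     ((u >> 8) & 0x0000ff00u) |
                                     ((u << 8) & 0x00ff0000u) |
                                     (u << 24));
}
```

Because is_little_endian() depends only on a local constant, an optimising compiler can usually fold the check away, which would be consistent with the -O1 observation above.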
-- Jonathan Dowland [email protected] https://jmtd.net

