I'm thinking of applying a patch like this:

--- i/zdbsp.h
+++ w/zdbsp.h
@@ -237,23 +237,47 @@ inline fixed_t DMulScale32 (fixed_t a, fixed_t b, fixed_t c, fixed_t d)
 #define LittleShort(x)         CFSwapInt16LittleToHost(x)
 #define LittleLong(x)          CFSwapInt32LittleToHost(x)
 #else
-#ifdef __BIG_ENDIAN__
+
+inline char is_little_endian()
+{
+  const unsigned short one = 1;
+  return *(char *)(&one) == 1;
+}
 // Swap 16bit, that is, MSB and LSB byte.
 // No masking with 0xFF should be necessary.
 inline short LittleShort (short x)
 {
+       if(is_little_endian()) {
+               return x;
+       }
        return (short)((((unsigned short)x)>>8) | (((unsigned short)x)<<8));
 }

(etc.) That is: remove the preprocessor branching, provide just one implementation of each of the relevant functions, add the variants that are currently missing (long etc.), and branch on endianness at runtime. From initial testing, GCC completely removes the branch at -O1 (and presumably at higher levels); I expect LLVM does too.
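
To make the idea concrete outside of diff form, here is a minimal standalone sketch of what the single-implementation version could look like. This is an illustration only, not the final patch: the LittleLong body and the use of memcpy for the probe are my assumptions here, and the real header has more variants (unsigned etc.) than shown.

```cpp
#include <cstdint>
#include <cstring>

// Runtime probe: look at the first byte in memory of the value 1.
// On a little-endian host the least significant byte comes first.
// (memcpy is used here instead of a pointer cast; an assumption for this sketch.)
inline bool is_little_endian()
{
    const std::uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Convert a 16-bit little-endian value to host order (and back).
inline short LittleShort(short x)
{
    if (is_little_endian())
        return x;                       // already in host order
    const unsigned short u = static_cast<unsigned short>(x);
    return static_cast<short>(static_cast<unsigned short>((u >> 8) | (u << 8)));
}

// Convert a 32-bit little-endian value to host order (and back).
// Assumes int is 32 bits wide.
inline int LittleLong(int x)
{
    if (is_little_endian())
        return x;                       // already in host order
    const std::uint32_t u = static_cast<std::uint32_t>(x);
    return static_cast<int>((u >> 24)
                          | ((u >> 8) & 0x0000FF00u)
                          | ((u << 8) & 0x00FF0000u)
                          | (u << 24));
}
```

Since is_little_endian() depends only on a constant, the optimiser can fold it away and drop the untaken branch, which matches what I saw with GCC above.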

In my initial tests this was the same speed on LE but took 4 times as much memory. It might be that I'm not comparing like with like wrt compile flags, though -- this is "cmake build" versus whatever the debhelper wrappers choose, so I'll try to correct for that and re-test.



--

Jonathan Dowland
[email protected]
https://jmtd.net
