On 09/12/2017 02:32 AM, Jürg Billeter wrote:
To support applications that assume big-endian memory layout on little-endian systems, I'm considering adding support for reversing the storage order to GCC. In contrast to the existing scalar storage order support for structs, the goal is to reverse the storage order for all memory operations to achieve maximum compatibility with the behavior on big-endian systems, as far as observable by the application.
Intel has support for this in icc. It took about 5 years for a small team to make it work on a very large application. That includes both the compiler development and application development time. There are a lot of complicated issues that need to be solved to make this work on real code, both in the compiler and in the application code. There is a Dr. Dobb's article about some of it; search for "Writing a Bi-Endian Compiler" if you are interested.
Even though they got it working, it was painful to use. Icc goes to a lot of trouble to optimize away unnecessary byte-swapping to improve performance, but that meant any variable could be big or little endian regardless of how it was declared, could have different endianness at different places in the code, and could even exist in both endiannesses (stored in two locations) at the same time if the code needed both. Sometimes we'd find a bug, and it would take a week to figure out if it was a compiler bug or an application bug.
To facilitate byte swapping at endian boundaries (kernel or libraries), I'm also considering developing a new GCC builtin that can byte-swap whole structs in memory. There are limitations to this, e.g., unions could not be supported in general. However, I still expect this to be very useful.
There is a lot more stuff that will cause problems. Byte-swapping FP doesn't make sense. You can only byte swap a variable if you know its type, but you don't know the type of a va_list ap argument, so you can't call a big-endian vprintf from little-endian code and vice versa. If you have a template expanded in both big and little endian code, you will run into problems unless name mangling changes to include endian info, which means you lose ABI compatibility with the current name mangling scheme.
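To illustrate why the type has to be known, here is a minimal sketch (the function names are made up for this example, and it assumes GCC's `__builtin_bswap32` and `__builtin_bswap64`). The same 8 bytes come out differently depending on whether they are treated as one 64-bit value or as two 32-bit values, so a raw buffer or an untyped va_list argument gives the compiler nothing to go on.

```c
#include <stdint.h>
#include <string.h>

/* Reverse all 8 bytes, as if the buffer held one 64-bit integer. */
static void swap_as_u64(unsigned char buf[8])
{
    uint64_t v;
    memcpy(&v, buf, sizeof v);
    v = __builtin_bswap64(v);
    memcpy(buf, &v, sizeof v);
}

/* Reverse each 4-byte half separately, as if the buffer held
   two 32-bit integers. */
static void swap_as_two_u32(unsigned char buf[8])
{
    for (int i = 0; i < 2; i++) {
        uint32_t v;
        memcpy(&v, buf + 4 * i, sizeof v);
        v = __builtin_bswap32(v);
        memcpy(buf + 4 * i, &v, sizeof v);
    }
}
```

Starting from the bytes 01 02 03 04 05 06 07 08, the first function yields 08 07 06 05 04 03 02 01 and the second yields 04 03 02 01 08 07 06 05. Only one of them can be right for a given object, and which one depends entirely on the declared type.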
There will also be trouble with variables in shared libraries that get initialized by the dynamic linker. You will either have to add a new set of other-endian relocations, or else you will have to add code to byte-swap data after relocations are performed, probably via an init routine, which will have to run before the other init routines. There is also the same issue with static linking, but that one is a little easier to handle, as you can use a post-linking pass to edit the binary and byte swap stuff that needs to be byte swapped after relocations are performed.
To handle endian boundaries, you will need to force all declarations to have an endianness, and you will need to convert when calling a big-endian function from a little-endian function, and vice versa, and you will need to give an error if you see something you can't convert, like a va_list argument. Besides the issue of the C library not changing endianness, you will likely also have third party libraries that you can't change the endianness of, and that need to be linked into your application.
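As a rough idea of what a conversion at such a boundary involves, here is a hand-written sketch. The `struct record` layout and the `record_bswap` helper are hypothetical, invented for this example; it is not how icc or any proposed GCC builtin does it. It assumes GCC's `__builtin_bswap16` and `__builtin_bswap32`.

```c
#include <stdint.h>

/* Hypothetical record passed across an endian boundary. */
struct record {
    uint32_t id;
    uint16_t flags;
    uint16_t count;
    uint8_t  tag[4];   /* byte array: has no byte order, must not be swapped */
};

/* Swap every multi-byte scalar field in place.
   Calling it twice restores the original contents. */
static void record_bswap(struct record *r)
{
    r->id    = __builtin_bswap32(r->id);
    r->flags = __builtin_bswap16(r->flags);
    r->count = __builtin_bswap16(r->count);
    /* r->tag is deliberately left untouched. */
}
```

A caller would run `record_bswap(&r)` before handing `r` to other-endian code and again on anything that code writes back. Every field needs its own width, and byte arrays must be skipped, so the conversion cannot be done without the full type. If one of the fields were a union, there would be no way to tell from the type alone which member is live, which is why unions can't be handled in general.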
Before you start, you should give some thought to how debugging will work. DWARF does have an endianity attribute (DW_AT_endianity), and you will need to set it correctly or debugging will be hopeless. Even if you set it correctly, if you have optimizations to remove unnecessary byte swapping, debugging optimized code will still be hard, and people using the compiler will have to be trained on how to deal with endianness issues.
And there are lots of other problems; I don't have time to document them all, or even remember them all. Personally, I think you are better off trying to fix the application to make it more portable. Fixing the compiler is not a magic solution, and it is not any easier than fixing the application.
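For what it's worth, the usual portable fix in application code looks something like the sketch below (the helper names `load_be32` and `store_be32` are just illustrative). Instead of casting a byte buffer to an integer pointer and relying on the host's byte order, the code assembles the value from individual bytes, which behaves the same on big-endian and little-endian hosts.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer,
   independent of the host's byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in big-endian order. */
static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

Code written this way needs no compiler support and no swapping at library boundaries, because the byte order is confined to the few places where data enters or leaves in an external format.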
Jim