http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46128
--- Comment #2 from Siarhei Siamashka <siarhei.siamashka at gmail dot com> 2010-10-25 14:43:52 UTC ---
(In reply to comment #1)
> Note that there may be problems clobbering D registers. See bug 43440. I
> don't think Richard Earnshaw's patch
> <http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00978.html> ever got
> reviewed or pinged - it probably needs pinging. (In general, unreviewed
> patches are best pinged about weekly.)

Yes, that's a very well known bug. But there should be no problems with D
registers, only Q registers are affected. They say codesourcery already has it
fixed (so I assume the patch has been at least reviewed):

http://www.beagleboard.org/irclogs/index.php?date=2010-06-27

# [11:19:58] <ssvb> "raster: check gcc bugzilla - http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440"
# [11:19:59] <mru> "the gcc "design" makes it very hard to do this conversion"
# [11:20:01] <raster> "unliek a cpu - gcc can be fixed and updated easily :)"
# [11:20:14] <raster> "mru: seems codesourcery managed it"

> > More generally, it would be beneficial to be able to optimize routines using
> > specific VFPv3 instructions (such as VMOV's immediate-operand form), or to
> > make use of VFPv4's fused-mulitply-accumulate instructions.
>
> For fused multiply-add, the best approach is to describe them in the ARM
> .md files using the new fma: RTL facility, so that calls to fma / fmaf /
> __builtin_fma / __builtin_fmaf use the instructions automatically as on
> other targets whose .md files have been updated like this.

But still there are cases when performance is actually important and
builtins/intrinsics are ruled out because of this. Inline assembly is
convenient because it can be added directly to C sources, without any need to
tweak makefiles or build scripts. This makes inline assembly a good choice for
small non-intrusive performance patches.

Another inconvenience is that in order to check whether for example ARMv6
instructions are supported, one has to use constructs like this (identifiers
fished out from gcc sources):

    #if defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
        defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || \
        defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) || \
        defined(__ARM_ARCH_6M__) || defined(__ARM_ARCH_7__) || \
        defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || \
        defined(__ARM_ARCH_7M__)
    [...]
    #endif

And this is not very maintainable because future gcc versions may introduce
more predefined symbols for newer arm architecture variants. It would be much
nicer if it was possible to just do something like:

    #if defined(__arm__) && (__ARM_ARCH__ >= 6)
    [...]
    #endif

It's basically the same problem as VFP variant identification.
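
Until a numeric macro is available, one possible workaround is to fold the long list into a single project-local number once, in one header. The following is only a sketch: SKETCH_ARM_ARCH, sketch_arm_arch() and sketch_has_armv6() are hypothetical names invented for this illustration and are not provided by gcc. It assumes that a compiler implementing the ARM C Language Extensions defines __ARM_ARCH as an integer, and otherwise falls back to the per-variant symbols listed above.

```c
/* Sketch only: SKETCH_ARM_ARCH is a hypothetical, project-local name. */
#if defined(__ARM_ARCH)
/* Assumption: an ACLE-conforming compiler provides the level directly. */
#  define SKETCH_ARM_ARCH __ARM_ARCH
#elif defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || \
      defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__)
#  define SKETCH_ARM_ARCH 7
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || \
      defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || \
      defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) || \
      defined(__ARM_ARCH_6M__)
#  define SKETCH_ARM_ARCH 6
#elif defined(__arm__)
/* ARM target that matched nothing above: assume a conservative
 * pre-ARMv6 lower bound rather than guessing the exact level. */
#  define SKETCH_ARM_ARCH 4
#else
/* Not an ARM target at all. */
#  define SKETCH_ARM_ARCH 0
#endif

/* Return the detected architecture level (0 on non-ARM targets). */
static int sketch_arm_arch(void)
{
    return SKETCH_ARM_ARCH;
}

/* Return 1 if ARMv6 or later instructions may be used, 0 otherwise. */
static int sketch_has_armv6(void)
{
#if SKETCH_ARM_ARCH >= 6
    return 1;
#else
    return 0;
#endif
}
```

With such a header, the check in the sources shrinks to "#if SKETCH_ARM_ARCH >= 6", which is close to the form requested above. The variant list still has to be maintained by hand in that one place, so this only contains the maintenance problem rather than solving it.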