https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69538
--- Comment #6 from avieira at gcc dot gnu.org ---

I had a look at this and, after some digging, found that the bug is not due to LTO but to "local" functions. If you make bar static you end up with the same faulty behavior.

After some more digging I went through the 'untyped_call' code in arm.md. There I found that neither 'untyped_call' nor 'untyped_return' had been adjusted to cope with HardFP ABIs. I wrote a patch to mend this; it needs a bit more work, but I think it's correct and I might put it on gcc-patches at a later time.

However, when I started thinking about how to "fix" this wrong-code generation, I realized that, because of the way untyped_calls and untyped_returns are constructed and the nature of '__builtin_return' and '__builtin_apply', you do not know which registers are actually used to return the value; you only know it might be 'r0-r4' and 'd0-d7'. So even though I know the call site expects a return of type 'double' in 'r0-r1', because this is a local function (aka 'ARM_PCS_AAPCS_LOCAL') and the target does not support double precision, there is no way for me to know which registers the function is actually returning in, so I don't know which registers to move to 'r0-r1'.

So I don't think we can get this builtin to work for single-precision VFPs without compromising on the way we use local function returns.
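For readers unfamiliar with these builtins, here is a minimal sketch of the forwarding pattern under discussion. It is not the testcase from this report: the bodies of foo and bar, the check_forwarding helper, and the 16-byte stack-argument bound are all assumptions chosen for illustration. The point is that bar never names the return type when it forwards the result, so the back end has to save and restore every register that might hold a return value. The builtin path is guarded so it is only used with GCC; with other compilers the sketch falls back to a plain call.

```c
#include <assert.h>

/* Hypothetical callee that returns a double. */
__attribute__((noinline)) static double foo(int arg)
{
    return arg + 1.0;
}

/* Hypothetical local (static) forwarder. */
__attribute__((noinline)) static double bar(int arg)
{
#if defined(__GNUC__) && !defined(__clang__)
    /* Re-issue the call with bar's own incoming arguments, then return
       whatever foo left in the return registers.  The 16 is an assumed
       upper bound on the bytes of stack arguments to copy. */
    __builtin_return(__builtin_apply((void (*)()) foo,
                                     __builtin_apply_args(), 16));
#else
    /* Fallback for compilers without __builtin_apply: a direct call. */
    return foo(arg);
#endif
}

/* Hypothetical self-check: passes on targets where forwarding works. */
static void check_forwarding(void)
{
    assert(bar(116) == 117.0);
}
```

On a target where the untyped call and return patterns agree with the callee on the return register, check_forwarding() passes. The case described above is the one where they cannot agree, because nothing in bar tells the back end whether the double came back in core or VFP registers.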