On Tue, Dec 8, 2015 at 9:13 PM, Bernd Edlinger <bernd.edlin...@hotmail.de> wrote: > Hi, > > Am 08.12.2015 um 19:23 schrieb Uros Bizjak: >> On Tue, Dec 8, 2015 at 7:10 PM, Uros Bizjak <ubiz...@gmail.com> wrote: >>> Hello! >>> >>>> this patch fixes the asm statements in the gcc.target/i386/sse4_1-round* >>>> test cases. >>>> >>>> They do lots of things that are just absolutely forbidden, like clobber >>>> registers >>>> that are not mentioned in the clobber list, and create a hidden data flow. >>>> >>>> The test cases work just by chance, and You can see the asm statements >>>> ripped completely apart by the loop optimizer if you try to do the >>>> assembler >>>> part in a loop: >>>> >>>> for (i = 0; i < 10; i++) { >>>> __asm__ ("fld" ASM_SUFFIX " %0" : : "m" (*&f)); >>>> >>>> __asm__ ("fstcw %0" : "=m" (*&saved_cw)); >>>> new_cw = saved_cw & clr_mask; >>>> new_cw |= type; >>>> __asm__ ("fldcw %0" : : "m" (*&new_cw)); >>>> >>>> __asm__ ("frndint\n" >>>> "fstp" ASM_SUFFIX " %0\n" : "=m" (*&ret)); >>>> __asm__ ("fldcw %0" : : "m" (*&saved_cw)); >>>> } >>>> return ret; >>>> >>>> So this patch avoids the hidden data flow, and >>>> adds "st" to the clobber list. >>>> >>>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu >>>> OK for trunk? >>> Uh, no. >>> >>> x87 is a strange beast, and even your patch is wrong. The assembly >>> should be written in this way: >>> >>> __asm__ ("fnstcw %0" : "=m" (saved_cw)); >>> >>> new_cw = saved_cw & clr_mask; >>> new_cw |= type; >>> >>> __asm__ ("fldcw %2\n\t" >>> "frndint\n\t" >>> "fldcw %3" : "=t" (ret) >>> : "0" (f), "m" (new_cw), "m" (saved_cw)); >>> >>> I'm testing the attached patch. > > Thanks. > > That is certainly a good idea to use the "=t"(ret) : "0" (f) ! > > But could you explain, just for my curiosity, why my approach > was wrong? It may have to do with the semantic of "st" clobber, right? > I thought that allows the inline-assembler to push one value on the > fpu-stack, > and it is responsible for removing that value again.
Not "wrong" in the sense that it won't work, but it is not written in the optimal way. The st(0) register is not clobbered, it is loaded with the input argument, and the value is returned there. So, when asm input arguments are described in the right way, there is no need to explicitly move values to the stack, the compiler will do it for you. Saying that, I see we don't need to define ASM_SUFFIX anymore. I'll prepare the patch that removes these #defines. Uros.