Hi there,
Unfortunately, the assembler instructions themselves don't allow the target to be the same as the source, so removing the '&' is difficult. (If I instead enforce the same restriction without a '&', by using builtins and building the expression manually so that a new reg rtx is generated whenever the dest and source are the same, do you think it will optimize better?)
However, I don't see why it isn't eliminating the move that is generated once it realises that the temporary source is discarded. It seems to do this OK if the pattern is just a define_insn with raw multi-line assembly, but I can't use multi-line assembly because it destroys the optimizations that occur when sub-register access is performed. For example, if I overwrite the second v4sf in a v16sf type, gcc nicely gets rid of the move of that particular sub-register when it copies the entire v16sf around, something I was quite impressed by.
Regards
Dylan
"James E Wilson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
Dylan Cuthbert wrote:
asm( "some_instruction %0,%1,%2,%3" : "=&j" (blob) : "j" (vec1), "j" (vec2), "j" (frog) );
asm( "some_instruction2 %0,%1" : "=&j" (frog) : "j" (blob) );
It is the goal of the register allocator to use as few registers as possible, which means that we will try to use the same register for input and output here. That holds until we get to reload, where we see the early clobber (&) and are then forced to add a copy so that the instruction has separate input and output registers.
Early clobbers are bad. Don't ever use them unless you have to. Just because the instruction operates on pieces of the input does not mean & is necessary. You only add the & if the input and output operands must be in different non-overlapping registers.
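To make the distinction concrete, here is a minimal sketch of when & is and is not needed. It assumes GCC extended asm on an x86-64 target, which is not the port discussed in this thread: the "j" constraint is replaced by "r", and the function names add_two_step and add_one_step are hypothetical. On other targets the sketch falls back to plain C.

```c
/* Two-instruction template: %0 is written by the first instruction
   while %2 is still needed by the second. If the allocator gave %0
   and %2 the same register, b would be destroyed before it is read,
   so the early clobber '&' is required here. */
static long add_two_step(long a, long b)
{
    long out;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("mov %1, %0\n\t"   /* out = a (output written early) */
            "add %2, %0"       /* out += b (b read after the write) */
            : "=&r"(out)
            : "r"(a), "r"(b)
            : "cc");
#else
    out = a + b;               /* portable fallback, no asm */
#endif
    return out;
}

/* Single instruction: both inputs are read before the output is
   written, so the output may share a register with an input and
   no '&' is needed. */
static long add_one_step(long a, long b)
{
    long out;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("lea (%1,%2), %0"  /* out = a + b in one instruction */
            : "=r"(out)
            : "r"(a), "r"(b));
#else
    out = a + b;               /* portable fallback, no asm */
#endif
    return out;
}
```

Both functions return the same sum. The difference is only in how much freedom the register allocator has: the second form lets it reuse an input register for the output, while the first forbids that and may cost an extra copy.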
This is just a guess. Try compiling with -da and looking at the register assignments in the .lreg and .greg files, and also at what reload did. It is possible that there could be something else going on.
--
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com