On Saturday, 4 June 2005 15:04, Paolo Bonzini wrote:
> > (parallel [
> >  (use (operands[0]))
> >  (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> >  (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operands 1
> > and 2 while in fact this instruction does nothing except from giving
> > hints to GCSE.")
> > ])
>
> This seems define_insn_and_split, but it is a lot more complex than what
> you probably can do...
I have already confirmed that GCSE is smart enough to deal with this kind of 
expression. It effectively ignores the (use) operand when searching for 
common expressions, and I know that it *will* optimize away a later 
instruction that has the same "set" statement: I have seen this work while 
experimenting with a sub-optimal divmodsi4 expand pattern that was 
supplemented by this type of (parallel [(use) (set)]) instruction.
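To make the claim concrete, here is a toy model in C. It is not GCC's GCSE and does not use any GCC data structures. Each hypothetical `toy_insn` records an operation and its operands, and a flag stands in for the extra (use ...) in the parallel. The matching key deliberately leaves that flag out, so a later instruction with the same "set" expression is marked redundant. The sketch assumes that no operand is redefined between the two instructions, which a real pass must verify.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: dest = src1 OP src2, optionally carrying a (use ...). */
typedef enum { OP_MINUS, OP_PLUS } op_kind;

typedef struct {
    op_kind op;
    int dest, src1, src2;   /* hypothetical pseudo-register numbers */
    bool has_use;           /* models the extra (use ...) in the parallel */
    bool deleted;           /* set when the toy CSE finds the insn redundant */
    int copy_from;          /* register holding the reused value, or -1 */
} toy_insn;

/* Marks each later insn whose (op, src1, src2) matches an earlier live insn
   as redundant. has_use is intentionally NOT part of the match key.
   Assumption: no operand is redefined between the two insns. */
void toy_cse(toy_insn *insns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        insns[i].deleted = false;
        insns[i].copy_from = -1;
        for (size_t j = 0; j < i; j++) {
            if (insns[j].deleted)
                continue;
            if (insns[j].op == insns[i].op
                && insns[j].src1 == insns[i].src1
                && insns[j].src2 == insns[i].src2) {
                insns[i].deleted = true;
                insns[i].copy_from = insns[j].dest;
                break;
            }
        }
    }
}
```

In this model the first instruction (with the use) survives, and a later plain subtraction of the same operands becomes a copy from its destination register.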

Concerning the apparent similarity with "define_insn_and_split": the *huge* 
benefit of subreg lowering at expand time compared to define_insn_and_split 
is that the full power of the optimizers before reload can work on the 
resulting instruction sequences. E.g., IMO there is no way to implement

"
uint8_t a; // automatic variable in registers
int8_t b; // automatic variable in registers
int16_t c; // variable in static memory

c = a | (b << 8);
" 

efficiently for AVR when splitting after expand. Before reload, GCC will first 
allocate four additional 8-bit registers: it will store the sign-extended b 
in two of them and the zero-extended a in the other two. To do this, it will 
insert instruction sequences that compute the sign extension of b and the 
zero extension of a. It will then emit a shift instruction for the resulting 
16-bit value of the sign-extended b. After that comes a 16-bit "ior" 
instruction on the two new 16-bit values, and finally, after all this 
unnecessary work, it will emit two QImode memory moves for the two bytes of 
the 16-bit result.
With appropriate use of Richard Henderson's patch, all that comes out of 
the optimizers before reload is "two QImode moves to memory".
The optimizer passes after reload are not smart enough to identify these 
optimization opportunities. Besides, by then it would be too late to recover 
the four registers that were allocated without actually being needed.
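Why two byte stores suffice can be checked directly: in `c = a | (b << 8)` the low byte of the 16-bit result is exactly `a` and the high byte is exactly the bit pattern of `b`, so the extensions, the shift and the 16-bit ior contribute nothing. The following C sketch checks this for all 65536 input pairs. The function names are made up for illustration. It writes the shift as `b * 256` because left-shifting a negative signed value is undefined in ISO C, and it assumes a two's-complement representation, as GCC uses.

```c
#include <stdint.h>

/* c = a | (b << 8) from the example above, with the shift written as
   b * 256 so that negative b does not trigger undefined behaviour. */
int16_t compose(uint8_t a, int8_t b)
{
    return (int16_t)(a | (b * 256));
}

/* Counts (a, b) pairs for which the low byte of the 16-bit result is not a
   or the high byte is not the bit pattern of b. Expected to be zero. */
long count_byte_mismatches(void)
{
    long mismatches = 0;
    for (int ia = 0; ia <= UINT8_MAX; ia++) {
        for (int ib = INT8_MIN; ib <= INT8_MAX; ib++) {
            uint16_t u = (uint16_t)compose((uint8_t)ia, (int8_t)ib);
            if ((u & 0xFFu) != (unsigned)ia || (u >> 8) != (uint8_t)ib)
                mismatches++;
        }
    }
    return mismatches;
}
```

A zero count means that storing `a` to the low byte of `c` and `b` to its high byte (two QImode moves) produces the same 16-bit value as the full computation.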

> IIRC, s390 does use add with carry and subtract with borrow instructions
> effectively (alc and slb in IBM360^W s390-ese).  Search the archives on
> google or gmane.
Thanks for the hint. I'll have a look at the s390 port.
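For reference, this is what lowering a 16-bit (HImode) subtraction into 8-bit (QImode) pieces amounts to: a plain subtract on the low bytes, followed by a subtract-with-borrow on the high bytes that consumes the borrow produced by the first step. This mirrors AVR's SUB/SBC pair and the s390 slb instruction mentioned above. The following is a minimal C model under those assumptions. The helper names are hypothetical and this is not GCC code.

```c
#include <stdint.h>

/* 8-bit subtract with borrow-in; *borrow_out is set to 1 if the operation
   had to borrow from the next-higher byte, else 0. */
static uint8_t sub8_with_borrow(uint8_t x, uint8_t y, unsigned borrow_in,
                                unsigned *borrow_out)
{
    *borrow_out = (unsigned)x < (unsigned)y + borrow_in;
    return (uint8_t)(x - y - borrow_in);   /* wraps modulo 256 */
}

/* 16-bit subtraction expressed as two byte-sized operations. */
uint16_t sub16_lowered(uint16_t x, uint16_t y)
{
    unsigned borrow;
    /* Low bytes: plain subtract (like AVR SUB), produces the borrow. */
    uint8_t lo = sub8_with_borrow((uint8_t)x, (uint8_t)y, 0, &borrow);
    /* High bytes: subtract with borrow (like AVR SBC), consumes it. */
    uint8_t hi = sub8_with_borrow((uint8_t)(x >> 8), (uint8_t)(y >> 8),
                                  borrow, &borrow);
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

The final borrow out of the high byte is discarded, which gives the usual modulo-2^16 result of a 16-bit subtraction.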

> > - combine -re-run
>
> No way, combine is too expensive...  Its simplification engine is fine,
> but a lot of things ought to be redone from scratch so that it becomes a
> serious instruction selection pass.

I agree that this might not be a good choice for targets like x86, and I am 
not suggesting including this pass in the list of passes run at any of the 
standard optimization levels such as -O0 or -O3. Maybe it could be enabled 
with -fexpensive-optimizations.

Concerning compile times, one should keep in mind that a typical AVR target 
has only 8 KB to 64 KB of program memory. IMO, compile-time degradation on 
the host machine would readily be accepted by all AVR users if it improves 
the code. Build times for my entire projects amount to roughly 30 seconds, 
most of which is spent on checking dependencies, linking and generating 
report files. In my personal opinion, everyone using the AVR port would be 
happy even if compile times increased by a factor of 10!

Yours,

Björn
