On Saturday, 4 June 2005 15:04, Paolo Bonzini wrote:
> > (parallel [
> > (use (operands[0]))
> > (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> > (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operands 1
> > and 2 while in fact this instruction does nothing except from giving
> > hints to GCSE.")
> > ])
>
> This seems define_insn_and_split, but it is a lot more complex than what
> you probably can do...

I have already confirmed that GCSE is smart enough to deal with this
kind of expression. It effectively ignores the (use) operand when
searching for common expressions, and I know that it *will* optimize
away a later instruction that has the same "set" statement: I have seen
it work when experimenting with a sub-optimal divmodsi4 expand pattern
that was supplemented by this type of (parallel [(use) (set)])
instruction.

Concerning the apparent similarity with "define_insn_and_split": The
*huge* benefit of subreg lowering at expand in comparison to
define_insn_and_split is that all of the power of the optimizers before
reload can work on the resulting instruction sequences. E.g. IMO there
is no way to implement

"
  uint8_t a;  // automatic variable in registers
  int8_t b;   // automatic variable in registers
  int16_t c;  // variable in static memory

  c = a | (b << 8);
"

efficiently for AVR when splitting after expand: GCC before reload will
first allocate four additional 8-bit registers. It will store the
sign-extended b in two of them and the zero-extended a in the other
two. To do this, it will insert instruction sequences for calculating
the sign-extension of b and for the zero-extension of a. It will emit a
shift instruction for the resulting 16-bit value of the sign-extended
b. Afterwards comes a 16-bit "ior" instruction on the two new 16-bit
values and finally, after all this unnecessary work, it will emit two
QImode memory moves for the two bytes of the 16-bit result.

With appropriate use of Richard Henderson's patch, all that comes out
after the optimizers before reload is "two QImode moves to memory". The
optimizer passes after reload are not smart enough to identify these
optimization opportunities. It would also be too late to benefit from
the four registers that were allocated without actually being needed.

> IIRC, s390 does use add with carry and subtract with borrow instructions
> effectively (alc and slb in IBM360^W s390-ese). Search the archives on
> google or gmane.

Thanks for the hint. I'll have a look at the 360 port.

> > - combine -re-run
>
> No way, combine is too expensive... Its simplification engine is fine,
> but a lot of things ought to be redone from scratch so that it becomes a
> serious instruction selection pass.

I agree with you that this might not be a good choice for targets like
x86, and I am not suggesting to include this option in the list of
passes run with any of the default options like -O0 and -O3. Maybe it
could be included with "expensive-optimizations".

Concerning compile times, one should keep in mind that a typical AVR
target has only 8k to 64k of program memory. IMO, a compile time
degradation on the host machine would be readily accepted by all AVR
users if it improves the code. The build time for my entire projects
amounts to roughly 30 seconds, most of which is spent on checking
dependencies, linking and report file generation. In my personal
opinion: Everyone using the AVR port would be happy even if the
degradation amounted to a factor of 10!

Yours,

Björn
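
For readers who want to check the c = a | (b << 8) example above: the
following is a minimal host-side C sketch, not AVR code and not taken
from any patch, that compares the "wide" 16-bit computation with the
two plain byte stores it is claimed to reduce to. The function names
(combine_wide, combine_bytes, combine_variants_agree) are hypothetical
and chosen only for this illustration. Since left-shifting a negative
value is formally undefined in ISO C, the sketch multiplies by 256
instead of writing b << 8.

```c
#include <stdint.h>

/* The expression from the mail, computed the "wide" way: both operands
 * are extended first, then shifted and or-ed together. b * 256 stands
 * in for b << 8 so that negative b stays well-defined in ISO C. */
static int16_t combine_wide(uint8_t a, int8_t b)
{
    return (int16_t)(a | (b * 256));
}

/* The same result written as two independent byte stores:
 * out[0] receives the low byte, out[1] the high byte. */
static void combine_bytes(uint8_t a, int8_t b, uint8_t out[2])
{
    out[0] = a;           /* low byte is simply a                  */
    out[1] = (uint8_t)b;  /* high byte is the bit pattern of b     */
}

/* Exhaustively compares both variants over all 256 * 256 inputs.
 * Returns 1 if they agree everywhere, 0 otherwise. */
static int combine_variants_agree(void)
{
    for (int a = 0; a <= 255; a++) {
        for (int b = -128; b <= 127; b++) {
            uint8_t bytes[2];
            uint16_t wide = (uint16_t)combine_wide((uint8_t)a, (int8_t)b);

            combine_bytes((uint8_t)a, (int8_t)b, bytes);
            if ((wide & 0xFF) != bytes[0] || (wide >> 8) != bytes[1])
                return 0;
        }
    }
    return 1;
}
```

Calling combine_variants_agree() from a small main() is enough to run
the check. It only shows that the two forms compute the same bytes; it
says nothing about which instructions a particular GCC version actually
emits for AVR.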