The following patch implements general spilling of pseudos of one class into hard registers of another class *instead of memory* in LRA.

Currently, the patch implements spilling of general reg pseudos into SSE regs for the Intel Core architecture, as recommended by the Intel optimization guide. This optimization improves both the performance and the size of the code generated with LRA. The size improves because a movd insn (moving general regs to/from SSE regs) is smaller than an x86 load/store from the stack with an address offset bigger than 128. There is also a steady improvement in code performance when the optimization is used for Intel Core processors.

The optimization worsens code performance for AMD processors (Phenom and Bulldozer) because a movd insn is less profitable there than st/ld, which is evidently why X86_TUNE_INTER_UNIT_MOVES is off for such processors.

The optimization also worsens code performance for Intel Atom, although one could expect the opposite since X86_TUNE_INTER_UNIT_MOVES is on for this processor. Interestingly, switching X86_TUNE_INTER_UNIT_MOVES off for Atom makes practically no difference to code performance when the optimization is not used.

The optimization might be useful for some other processors that have direct move insns between the two classes considered, in cases where IRA for some reason did not use the class union. At least I see that we could try this for ARM (spilling general regs into VFP regs) and for the extended PowerPC architecture (spilling general regs into fp regs). All that is necessary is to define two macros. I am going to do it for ARM and see whether this optimization is beneficial for OMAP4, although I think it is not, as the fp units with VFP regs in the ARM implementations I know are too separate from the integer units.

The patch was successfully bootstrapped on x86/x86-64 with additional options -mtune=corei7 -march=corei7.

Committed as rev. 185884.

2012-03-27  Vladimir Makarov  <vmaka...@redhat.com>

	* common.opt (flra-reg-spill): New option.
	* doc/tm.texi (TARGET_SPILL_CLASS, TARGET_SPILL_CLASS_MODE): New
	hooks.
	* target.def (spill_class, spill_class_mode): New hooks.
	* target.h: Include tm.h.
	* lra-int.h (lra_reg_spill_p): New external.
	* lra.c (lra_reg_spill_p): New global var.
	(setup_reg_spill_flag): New function.
	(lra): Call setup_reg_spill_flag.  Use lra_reg_spill_p as an
	argument for lra_create_live_ranges before spill sub-pass.
	* lra-spills.c: Include ira.h.
	(spill_hard_reg): New array.
	(struct slot): Add new member hard_regno.
	(assign_slot): Rename to assign_mem_slot.
	(assign_spill_hard_regs): New function.
	(add_pseudo_to_slot): Ditto.
	(assign_stack_slot_num_and_sort_pseudos): Rewrite using
	add_pseudo_to_slot.
	(remove_pseudos): Use spill_hard_reg.
	(lra_spill): Allocate, initialize, and free spill_hard_reg.  Sort
	pseudo_regnos and call assign_spill_hard_regs.
	* lra-assign.c (assign_hard_regno): Use the biggest mode instead
	of the pseudo mode.
	* Makefile.in (lra-spills.c): Add dependence on ira.h.
	* config/i386/i386.h (enum ix86_tune_indices): Add
	X86_TUNE_GENERAL_REGS_SSE_SPILL.
	(TARGET_GENERAL_REGS_SSE_SPILL): New macro.
	* config/i386/i386.c (initial_ix86_tune_features): Add entry for
	X86_TUNE_GENERAL_REGS_SSE_SPILL.
	(ix86_spill_class): New function.
	(ix86_spill_class_mode): Ditto.
	(TARGET_SPILL_CLASS, TARGET_SPILL_CLASS_MODE): Define macros.
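
To illustrate what defining the two hooks for a target might look like, here is a minimal self-contained sketch in C. It is not the code from the patch: the enum values, the tune_general_regs_sse_spill flag, the sketch_* function names, and the parameter lists are all stand-ins assumed for illustration, and the real hooks work on GCC's own reg class and machine mode types.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's register classes and machine modes.  */
enum reg_class { NO_REGS, GENERAL_REGS, SSE_REGS };
enum machine_mode { SImode, DImode, SFmode };

/* Hypothetical tuning switch, playing the role that
   TARGET_GENERAL_REGS_SSE_SPILL plays in the ChangeLog above.  */
bool tune_general_regs_sse_spill = true;

/* Sketch of a spill-class hook (signature assumed): return the class
   whose hard registers may hold spilled pseudos of class RCLASS in
   MODE, or NO_REGS to fall back to spilling into memory.  */
enum reg_class
sketch_spill_class (enum reg_class rclass, enum machine_mode mode)
{
  if (tune_general_regs_sse_spill
      && rclass == GENERAL_REGS
      && (mode == SImode || mode == DImode))
    return SSE_REGS;
  return NO_REGS;
}

/* Sketch of a spill-class-mode hook (signature assumed): return the
   mode in which a pseudo of RCLASS is kept in a SPILL_CLASS register.
   This sketch simply keeps the pseudo's own mode.  */
enum machine_mode
sketch_spill_class_mode (enum reg_class rclass,
                         enum reg_class spill_class,
                         enum machine_mode mode)
{
  (void) rclass;
  (void) spill_class;
  return mode;
}
```

With the flag on, integer-mode pseudos of the general class are directed to SSE regs and everything else still goes to memory; with the flag off (as one would want for the AMD and Atom tunings discussed above), the first function always returns NO_REGS.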