This patch series fixes a number of issues in rs6000_rtx_costs, the aim being to provide costing somewhat closer to reality. Probably the most important patch of the series is patch 4, which just adds a comment. Without the analysis that went into that comment, I found myself making what seemed to be good changes but which introduced regressions.

So far these changes have not introduced any testsuite regressions on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on powerpc64le-linux. Pat spec tested on power9 against a baseline master from a few months ago, seeing a few small improvements and no degradations above the noise.

Some notes:

Examination of varasm.o shows quite a number of cases where if-conversion succeeds due to different seq_cost. One example:

    extern int foo ();
    int
    default_assemble_integer (unsigned size)
    {
      extern unsigned long rs6000_isa_flags;

      if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8))
        return 0;
      return foo ();
    }

This rather horrible code turns the rs6000_isa_flags value into either 4 or 8:

    rldicr 9,9,28,0
    srdi 9,9,28
    addic 9,9,-1
    subfe 9,9,9
    rldicr 9,9,0,61
    addi 9,9,8

Better would be

    rldicl 9,9,29,63
    sldi 9,9,2
    addi 9,9,4

There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60", and "li r31,0x4000; rotldi r31,r31,17" vs. "lis r31,0x8000; clrldi r31,r31,32". Neither of these is a real change. I saw one occurrence of a 5 insn sequence being replaced with a load from memory in default_function_rodata_section, for ".rodata", and others elsewhere.

Sometimes correct insn cost leads to unexpected results. For example:

    extern unsigned bar (void);
    unsigned
    f1 (unsigned a)
    {
      if ((a & 0x01000200) == 0x01000200)
        return bar ();
      return 0;
    }

emits for a & 0x01000200

    (set (reg) (and (reg) (const_int 0x01000200)))

at expand time (two rlwinm insns) rather than the older

    (set (reg) (const_int 0x01000200))
    (set (reg) (and (reg) (reg)))

which is three insns. However, since 0x01000200 is needed later the older code after optimisation is smaller.
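For the default_assemble_integer example above, the claim that the three-insn sequence computes the same 4-or-8 value as the six-insn one can be checked with a small C model. This is only a sketch: the function names (limit_reference, limit_six_insn, limit_three_insn, check_sequences) are made up for illustration, and each instruction is modelled by hand from its Power ISA definition rather than taken from compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit tested by the example (1UL << 35 in the source). */
#define ISA_FLAG_BIT 35

/* Reference: the ternary from default_assemble_integer. */
static unsigned limit_reference(uint64_t flags)
{
    return !((flags & (UINT64_C(1) << ISA_FLAG_BIT)) != 0) ? 4 : 8;
}

/* Hand-written model of the six-insn sequence. */
static unsigned limit_six_insn(uint64_t flags)
{
    /* rldicr 9,9,28,0 ; srdi 9,9,28: isolate bit 35 in place. */
    uint64_t r = flags & (UINT64_C(1) << ISA_FLAG_BIT);
    /* addic 9,9,-1: r = r - 1, with CA set exactly when r was nonzero. */
    uint64_t carry = (r != 0);
    r = r - 1;
    /* subfe 9,9,9: ~r + r + CA, i.e. 0 if CA is set, all ones otherwise. */
    r = ~r + r + carry;
    /* rldicr 9,9,0,61: clear the low two bits, giving 0 or -4. */
    r &= ~UINT64_C(3);
    /* addi 9,9,8: 8 or 4. */
    r += 8;
    return (unsigned) r;
}

/* Hand-written model of the suggested three-insn sequence. */
static unsigned limit_three_insn(uint64_t flags)
{
    /* rldicl 9,9,29,63: rotate left 29 (right 35), keep only the low bit. */
    uint64_t r = (flags >> ISA_FLAG_BIT) & 1;
    /* sldi 9,9,2: 0 or 4. */
    r <<= 2;
    /* addi 9,9,4: 4 or 8. */
    r += 4;
    return (unsigned) r;
}

/* Compare all three on a few flag values, with and without bit 35 set. */
static void check_sequences(void)
{
    const uint64_t samples[] = {
        0,
        UINT64_C(1) << ISA_FLAG_BIT,
        ~UINT64_C(0),
        ~(UINT64_C(1) << ISA_FLAG_BIT),
        UINT64_C(0x123456789abcdef0),
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        unsigned want = limit_reference(samples[i]);
        assert(limit_six_insn(samples[i]) == want);
        assert(limit_three_insn(samples[i]) == want);
    }
}
```

Calling check_sequences () from a test main exercises both models against the ternary. It says nothing about which sequence the compiler actually picks, only that the two are interchangeable for this expression.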