Hi Richard,

Fortunately, it's (moderately) safe to mix COSTS_N_INSNS and COSTS_N_BYTES in the i386 backend.  The average length of an x86_64 instruction in typical code is between 2 and 3 bytes, so the definitions of N*4 for COSTS_N_INSNS(N) and N*2 for COSTS_N_BYTES(N) allow these to be mixed with no more approximation error than the more common problem, that rtx_costs usually encodes cycle counts (but converted to units of COSTS_N_INSNS).
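For reference, the two scale factors mentioned above look roughly like the following (paraphrased from rtl.h and the i386 backend from memory, so the exact comments and placement may differ):

  /* rtl.h: cost of N insns, the baseline unit used by rtx_costs.  */
  #define COSTS_N_INSNS(N) ((N) * 4)

  /* i386: cost of N bytes of code, expressed in the same units.  */
  #define COSTS_N_BYTES(N) ((N) * 2)

i.e. COSTS_N_BYTES (2) == COSTS_N_INSNS (1), so wherever the two units meet, one "instruction" of cost is effectively modelled as two bytes.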
The thing that I like about STV's use of a "gain calculator" is that it allows much more accurate fine tuning.  Many passes, particularly combine, base their decisions on computing total cost estimates (large approximate values) and then comparing those two totals.  As any physicist will confirm, it's much better to parameterize on the observed delta directly than on the difference between two large approximate values.

But you're also right that I need to run CSiBE with -m32 and get back to you with the results.

Roger
--

-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com>
Sent: 20 August 2021 08:29
To: Roger Sayle <ro...@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Doh!  ENOPATCH.
>
> -----Original Message-----
> From: Roger Sayle <ro...@nextmovesoftware.com>
> Sent: 19 August 2021 16:59
> To: 'GCC Patches' <gcc-patches@gcc.gnu.org>
> Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
>
> Back in June I briefly mentioned in one of my gcc-patches posts that a
> change that should always have reduced code size would mysteriously,
> on occasion, result in slightly larger code (according to CSiBE):
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
>
> Investigating further, the cause turns out to be that x86_64's
> scalar-to-vector (stv) pass is relying on poor estimates of the size
> costs/benefits.  This patch tweaks the backend's compute_convert_gain
> method to provide slightly more accurate values when compiling with -Os.
> Compilation without -Os is (should be) unaffected.  And for
> completeness, I'll mention that the stv pass is a net win for code
> size, so it's much better to improve its heuristics than simply gate
> the pass on !optimize_for_size.
>
> The net effect of this change is to save 1399 bytes on the CSiBE code
> size benchmark when compiling with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

+      /* xor (2 bytes) vs. xorps (3 bytes).  */
+      if (src == const0_rtx)
+        igain -= COSTS_N_BYTES (1);
+      /* movdi_internal vs. movv2di_internal.  */
+      /* => mov (5 bytes) vs. movaps (7 bytes).  */
+      else if (x86_64_immediate_operand (src, SImode))
+        igain -= COSTS_N_BYTES (2);

Doesn't it need two GPR xors for 32-bit DImode, and two movs?  Thus the non-SSE cost should be times 'm'?  For const0_rtx we may eventually re-use the zero reg for the high part, so that is eventually correct.

Also, I'm missing an 'else' - in the default case there's no cost/benefit of using SSE vs. GPR regs?  For SSE it would be a constant pool load.

I also wonder, since I now see COSTS_N_BYTES for the first time (heh), whether with -Os we'd need to replace all COSTS_N_INSNS (1) scaling with COSTS_N_BYTES scaling?  OTOH costs_add_n_insns uses COSTS_N_INSNS for the size part as well.  That said, it looks like we're eventually mixing apples and oranges now, or even previously?

Thanks,
Richard.

>
>
> 2021-08-19  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.c (compute_convert_gain): Provide
>         more accurate values for CONST_INT, when optimizing for size.
>         * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
>         * config/i386/i386.h (COSTS_N_BYTES): to here.
>
> Roger
> --
>