On Thu, Dec 23, 2021 at 3:42 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Mon, Dec 20, 2021 at 2:22 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Mon, Dec 20, 2021 at 12:38 PM Jakub Jelinek <ja...@redhat.com> wrote: > > > > > > On Mon, Dec 20, 2021 at 11:44:08AM -0800, H.J. Lu wrote: > > > > The problem is in > > > > > > > > (define_memory_constraint "TARGET_MEM_CONSTRAINT" > > > > "Matches any valid memory." > > > > (and (match_code "mem") > > > > (match_test "memory_address_addr_space_p (GET_MODE (op), XEXP > > > > (op, 0), > > > > MEM_ADDR_SPACE > > > > (op))"))) > > > > > > > > define_register_constraint allows LRA to convert the operand to the form > > > > '(mem (reg X))', where X is a base register. I am testing the v2 patch > > > > with > > > > > > If you mean replacing an immediate with a MEM containing that immediate, > > > isn't that often the right thing though? > > > I mean, if the register pressure is high and options are either spill some > > > register, set it to immediate, use it in one instruction and then fill the > > > spilled register (i.e. 2 memory loads), compared to a MEM use on the > > > arithmetic instruction one MEM seems cheaper to me. With -fPIC and the > > > cst needing runtime relocation slightly less so of course. > > > > > > > We will check the performance impact on SPEC CPU 2017. > > Here is the v2 patch. Liwei, can you help collect SPEC CPU 2017 > > impact of the enclosed patch? Thanks. > > We checked SPEC CPU 2017 performance with -O2 and -Ofast. > There is no performance regression. OK for master?
OK if there are no further comments from Jakub. Thanks, Uros. > > > The code due to ivopts is trying to have something like > > > size_t a = (size_t) &tunable_list; > > > size_t b = 0xffffffffffffffa8 - a; > > > size_t c = x + b; > > > and for that cst - &symbol one needs actually 2 registers, one to hold the > > > constant and one to hold the (%rip) based address. > > > (insn 790 789 791 111 (set (reg:DI 292) > > > (const_int -88 [0xffffffffffffffa8])) "dl-tunables.c":304:15 76 > > > {*movdi_internal} > > > (nil)) > > > (insn 791 790 792 111 (set (reg:DI 293) > > > (symbol_ref:DI ("tunable_list") [flags 0x2] <var_decl > > > 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal} > > > (nil)) > > > (insn 792 791 793 111 (parallel [ > > > (set (reg:DI 291) > > > (minus:DI (reg:DI 292) > > > (reg:DI 293))) > > > (clobber (reg:CC 17 flags)) > > > ]) "dl-tunables.c":304:15 299 {*subdi_1} > > > (nil)) > > > (insn 793 792 794 111 (parallel [ > > > (set (reg:DI 294) > > > (plus:DI (reg:DI 291) > > > (reg:DI 198 [ ivtmp.176 ]))) > > > (clobber (reg:CC 17 flags)) > > > ]) "dl-tunables.c":304:15 226 {*adddi_1} > > > (nil)) > > > It would be smarter to rewrite the above into a lea > > > 88+tunable_list(%rip), %temp1 > > > and use a subtraction instead of addition in the last insn above, or of > > > course in the particular case even consider the following 2 instructions > > > that do: > > > (insn 794 793 795 111 (set (reg:DI 296) > > > (symbol_ref:DI ("tunable_list") [flags 0x2] <var_decl > > > 0x7f3460aa9cf0 tunable_list>)) "dl-tunables.c":304:15 76 {*movdi_internal} > > > (nil)) > > > (insn 795 794 796 111 (parallel [ > > > (set (reg:DI 295 [ cur ]) > > > (plus:DI (reg:DI 294) > > > (reg:DI 296))) > > > (clobber (reg:CC 17 flags)) > > > ]) "dl-tunables.c":304:15 226 {*adddi_1} > > > (nil)) > > > and find out that &tuneble_list - &tuneble_list is 0 and we don't need it > > > at > > > all. Guess we don't figure that out due to the cast of one of those > > > addresses to size_t and the other one used in POINTER_PLUS_EXPR as normal > > > pointer. > > > > > > Jakub > > > > > > > > > -- > > H.J. > > Thanks. > > -- > H.J.