On 7/5/2021 5:17 AM, Richard Biener via Gcc-patches wrote:
On Fri, Jul 2, 2021 at 6:13 PM Jeff Law <j...@tachyum.com> wrote:
This is a minor missed optimization we found with our internal port.
Given this code:
typedef struct {short a; short b;} T;
extern void g1();
void f(T s)
{
if (s.a < 0)
g1();
}
"s" is passed in a register, but it's still a BLKmode object because the
alignment of T is smaller than the alignment that an integer of the same
size would have (16 bits vs 32 bits).
Because "s" is BLKmode function.c is going to store it into a stack slot
and we'll load it from the that slot for each reference. So on the v850
(just to pick a port that likely has the same behavior we're seeing) we
have this RTL from CSE2:
(insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
(const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
(reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 6 r6)
(nil)))
(note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
(mem/c:HI (plus:SI (reg/f:SI 34 .fp)
(const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 A32]))
"j.c":7:5 3 {*movhi_internal}
(nil))
(insn 9 8 10 2 (parallel [
(set (reg:SI 45)
(ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
(const_int 16 [0x10])))
(clobber (reg:CC 32 psw))
]) "j.c":7:5 94 {ashlsi3_clobber_flags}
(expr_list:REG_DEAD (reg:HI 44 [ s.a ])
(expr_list:REG_UNUSED (reg:CC 32 psw)
(nil))))
(insn 10 9 11 2 (parallel [
(set (reg:SI 43)
(ashiftrt:SI (reg:SI 45)
(const_int 16 [0x10])))
(clobber (reg:CC 32 psw))
]) "j.c":7:5 104 {ashrsi3_clobber_flags}
(expr_list:REG_DEAD (reg:SI 45)
(expr_list:REG_UNUSED (reg:CC 32 psw)
(nil))))
Insn 2 is the store into the stack. insn 8 is the load for s.a in the
conditional. DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6)
has the value we want. After that the store at insn 2 is dead. Sadly
DSE never removes the store.
The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
which keeps the MEM live. The lack of a MEM_EXPR is due to call to
change_address to twiddle the mode on the MEM for the store at insn 2.
It should be safe to copy the MEM_EXPR (which should always be a
PARM_DECL) from the original memory to the memory returned by
change_address. Doing so results in DSE1 removing the store at insn 2.
It would be nice to remove the stack setup/teardown. I'm not offhand
aware of mechanisms to remove the setup/teardown after we've already
allocated a slot, even if the slot is no longer used.
Bootstrapped and regression tested on x86, though I don't think that's a
particularly useful test. So I also ran it through my tester across
those pesky embedded targets without regressions as well.
I didn't include a test simply because I didn't want to have an insane
target selector. I guess if we really wanted a test we could look after
DSE1 is done and verify there aren't any MEMs left at all. Willing to
try that if the consensus is we want this tested.
OK for the trunk?
I wonder why the code doesn't use adjust_address instead? That
handles most cases already and the code doesn't change the
address but just the mode (and access size)?
No idea. It should be easy enough to try that approach though.
jeff