On 10.06.24 at 18:35, Paul Koning wrote:
On Jun 10, 2024, at 11:48 AM, Georg-Johann Lay <a...@gjlay.de> wrote:
On 08.06.24 at 11:32, Mikael Pettersson via Gcc wrote:
On Thu, Jun 6, 2024 at 8:59 PM Dimitar Dimitrov <dimi...@dinux.eu> wrote:
Have you tried defining TARGET_LEGITIMIZE_ADDRESS for your target? From
a quick search I see that the iq2000 and rx backends rewrite some
PLUS expression addresses into an insn sequence that calculates the address.
I have partial success.
The key was to define both TARGET_LEGITIMATE_ADDRESS_P and an
addptr<Pmode>3 insn.
I had tried TARGET_LEGITIMATE_ADDRESS_P before, together with various
combinations of TARGET_LEGITIMIZE_ADDRESS and
LEGITIMIZE_RELOAD_ADDRESS, but they all threw gcc into reload loops.
My add<Pmode>3 insn clobbers the CC register. The docs say to define
addptr<Pmode>3 in this case, and that eliminated the reload loops.
The issue now is that the machine cannot perform an add without
clobbering the CC register, so I'll have to hide that somehow. When
emitting the asm code, can one check if the CC register is LIVE-OUT
from the insn? If it isn't I shouldn't have to generate code to
preserve it.
/Mikael
There is a different approach, like the one taken by AVR (and maybe some
other targets):
Don't introduce CC until after reload, i.e. keep cbranch insns
and only split them to compare + branch after reload in the
first post reload split pass.
It's some effort because the number of insns is effectively
doubled: one pre-reload version of the insn without CC,
and a post-reload version with CC. On AVR, most insns don't
set CCmode in a usable way, so that works, though not as
well as the removed cc0 representation.
Yes, PDP11 does this also. And it uses define_subst to create two post-reload
flavors, one that clobbers CC, one that sets it. (The CC set from most
instructions is pretty useable in the sense that it's generally what's needed
for a compare against zero.)
Then I am not sure whether TARGET_LEGITIMIZE_ADDRESS works in
all situations, in particular when it comes to accessing frame
locations. It might be required to fake-supply reg+offset
addressing, and then split that post-reload.
An example is the Reduced Tiny cores (-mavrtiny) of the AVR
port that only support POST_INC or byte addressing. Splitting
locations after reload improved code quality a lot.
Did you find a good way to handle POST_INC or similar modes in LRA? PDP11
would like to use that (and PRE_DEC) but it seems that LRA is even less willing
than recent versions of old reload to generate such modes.
paul
avr still uses reload, and even there POST_INC-only and REG-only
addressing isn't well supported. I'd guess that LRA isn't going
to be an improvement in that regard.
Even when you expand to always POST_INC, the rtl passes will
ignore that and load the address to a new register and do
arithmetic there instead of using the post-inc address.
The problem with X-only addressing is that after regalloc, you
end up with code that does / must do fake addressing modes like
;; insn1: reg1 = FP[off1]
FP += off1 * size1
reg1 = *FP++
FP -= (off1+1) * size1
;; insn2: reg2 = FP[off2]
FP += off2 * size2
reg2 = *FP++
FP -= (off2+1) * size2
You cannot change the frame-pointer like that in regalloc
or LEGITIMATE_ADDRESS etc, and when you post-reload split
into real instructions in order to combine
FP -= (off1+1) * size1
FP += off2 * size2
into one PLUS insn, then you run into bugs like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208
where DSE deletes a store that is NOT dead.
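
As a rough illustration of the FP adjustments described above, here is a toy
Python model. It is not GCC code: the function names expand,
merge_adjacent_adds and run, the op names, and the example offsets are all
made up for this sketch. It only shows the arithmetic: each access restores
FP, and folding the trailing adjustment of one access into the leading
adjustment of the next saves an add.

```python
# Toy model of X-only (POST_INC-only) addressing for frame accesses.
# Hypothetical names throughout; this does not use or mirror any GCC API.

def expand(accesses):
    """Expand (off, size) frame accesses into primitive ops, following
    FP += off*size ; reg = *FP++ ; FP -= (off+1)*size."""
    ops = []
    for off, size in accesses:
        ops.append(("add", off * size))           # move FP to the slot
        ops.append(("load_postinc", size))        # load, FP advances by size
        ops.append(("add", -(off + 1) * size))    # restore FP
    return ops

def merge_adjacent_adds(ops):
    """Fold consecutive FP adjustments into one and drop zero adjustments."""
    out = []
    for op in ops:
        if op[0] == "add" and out and out[-1][0] == "add":
            total = out.pop()[1] + op[1]
            if total != 0:
                out.append(("add", total))
        elif op[0] == "add" and op[1] == 0:
            continue
        else:
            out.append(op)
    return out

def run(ops, fp=0):
    """Execute ops; return (list of addresses loaded from, final FP)."""
    loaded = []
    for kind, arg in ops:
        if kind == "add":
            fp += arg
        else:                                     # load_postinc
            loaded.append(fp)
            fp += arg
    return loaded, fp

accesses = [(2, 1), (5, 1)]   # (offset, size) pairs, chosen arbitrarily
naive = expand(accesses)
merged = merge_adjacent_adds(naive)
print(len(naive), len(merged))
print(run(naive, fp=100) == run(merged, fp=100))
```

For these two accesses the merged sequence needs five ops instead of six,
loads from the same addresses, and still leaves FP unchanged at the end. The
model says nothing about why DSE misbehaves in the PR above; it only
illustrates which adjustments one would like to combine.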
For POST_INC optimization in general, there was an announcement
quite some time ago that someone would have a go at
improving pre- / post-modify situations, but I don't know
whether that was a dead end somewhere in the loop passes...
The old lreg/greg allocator had some astonishing tricks up
its sleeves, but those days are over since SSA.
Johann