On Fri, Oct 11, 2019 at 5:17 PM Wilco Dijkstra <wilco.dijks...@arm.com> wrote: > > Hi Ramana, > > >On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra <wilco.dijks...@arm.com> wrote: > >> > >> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting > >> in > >> inefficient code. Since Thumb-2 supports similar address offsets use the > >> Arm > >> legitimization code for Thumb-2 to get significant codesize and performance > >> gains. SPECINT2006 shows 0.4% gain on Cortex-A57, while SPECFP improves > >> 0.2%. > > > > What were the sort of code size gains ? It did end up piquing my > > curiosity as to how we missed something so basic. For instance ldr > > h264ref is 1% smaller, various other benchmarks a few hundred bytes to 1KB > smaller (~0.1%). > > > r0, [r0, #-4080] is valid in Arm state but not in Thumb2. Thus if > > there was an illegitimate address given here, would we end up > > producing plus (r0, -4080) ? Yeah a simple testcase doesn't work out. > > Scratching my head a bit , it's late at night. > > If the proposed offsets are not legal, GCC tries something different that is > legal, hence there is no requirement to only propose legal splits. In any > case we don't optimize the negative range even on Arm - the offsets are > always split into 4KB ranges (not 8KB as would be possible). > > The negative offset splitting doesn't appear to have changed on Thumb-2 > so there is apparently a different mechanism that deals with negative offsets. > So only positive offsets are improved with my patch: > > int f(int *p) { return p[1025] + p[1026]; } > > before: > movw r3, #4104 > movw r2, #4100 > ldr r2, [r0, r2] > ldr r0, [r0, r3] > add r0, r0, r2 > bx lr > > after: > add r3, r0, #4096 > ldrd r0, r3, [r3, #4] > add r0, r0, r3 > bx lr > > > Orthogonally it looks like you can clean up the MINUS handling here > > and in legitimate_address_p , I'm not sure what the status of LRA with > > MINUS is either and thus we should now look to clean this up or look > > to turn this on and see what happens. However that's a subject of a > > future patch. > > Well there are lots of cases that aren't handled correctly or optimally > yet - I'd copy the way I wrote aarch64_legitimize_address_displacement, > but that's for a future patch indeed. > > > For the record, bootstrap with Thumb2 presumably and the testruns were > > clean ? > > Yes, at the time I ran them.
Yeah it dropped about 2660 bytes out of a Thumb2 bootstrap - so it seems worth while. Ok please apply but as usual watch out for any fallout from this. regards Ramana > > Cheers, > Wilco