On Wed, Jun 17, 2026 at 10:21 AM Ciprian Arbone <[email protected]> wrote:
>
> On Tue, 16 Jun 2026 at 16:48, Richard Biener <[email protected]> wrote:
> >
> > On Fri, Jun 12, 2026 at 3:31 PM Ciprian Arbone via Gcc <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > We recently enabled LDMIA/STMIA instructions for Thumb-1 (Cortex-M0+) by
> > > modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow auto-increment addressing
> > > for THUMB1 targets. However, we've discovered that IVOPTs generates
> > > suboptimal code for simple loops due to incorrect addressing mode
> > > selection.
> > >
> > > Consider this test case:
> > >
> > > void test(int *a, int *b, int size)
> > > {
> > >     for (int i = 0; i < size; i++)
> > >     {
> > >         a[i] = b[i] * a[i];
> > >     }
> > > }
> > >
> > > GCC currently generates:
> > >
> > >     ldmia   r0!, {r4}
> > >     ldmia   r1!, {r6}
> > >     subs    r5, r0, #4
> > >     ...
> > >     str     r4, [r5, #0]
> > >
> > > The issue occurs because IVOPTs selects the lowest-cost candidate, which
> > > has the following structure:
> > >
> > > Candidate xxx:
> > >   Incr POS: after use 0
> > >   IV struct:
> > >     Type:       unsigned int
> > >     Base:       (unsigned int) a_13(D)
> > >     Step:       4
> > >
> > > This results in the following loop structure:
> > >
> > > loop-preheader:
> > >     r0 = a
> > >     jump loop-exiting
> > >
> > > loop-header:
> > >     load-from  [r0]
> > >     increment  r0
> > >     store-to   [r0, #-4]
> > >
> > > loop-exiting:
> > >     jump loop-header
> > >
> > > **Issue 1:** IVOPTs recognizes both patterns as valid post-increment with
> > > offset zero:
> > >   - "load-from [r0]; increment r0" → recognized as post-inc from offset 0
> > >   - "increment r0; store-to [r0, #-4]" → also recognized as post-inc from
> > >     offset 0
> > >
> > > The code in tree-ssa-loop-ivopts.cc:get_address_cost() applies the
> > > adjustment:
> > >
> > >     if (stmt_after_increment (data->current_loop, cand, use->stmt))
> > >         ainc_offset += ainc_step;
> > >     cost = get_address_cost_ainc (ainc_step, ainc_offset,
> > >                                   addr_mode, mem_mode, as, speed);
> > >
> > > However, Thumb-1 does not support negative immediate offsets in addressing
> > > modes. The pattern "increment r0; store-to [r0, #-4]" can never be
> > > realized as a post-increment store on Thumb-1, yet IVOPTs assigns it a
> > > low cost.
> > >
> > > **Question 1:** Should get_address_cost() verify that an addressing mode
> > > is actually valid on the target before assigning auto-increment cost?
> > > Currently, it appears to assume validity without checking target
> > > constraints.
> >
> > It should end up calling the legitimate_address_p target hook to verify
> > validity.
>
> The legitimate_address_p target hook is invoked, but relatively
> late, during the GIMPLE-to-RTL conversion.
> At that point, after ivopts, the GIMPLE already contains code that may
> not be optimal.

IVOPTs checks valid_mem_ref_p, which calls memory_address_addr_space_p
with artificially generated RTL, and that in turn calls legitimate_address_p.
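
To make the target constraint under discussion concrete, here is a minimal
sketch in plain C. It is not GCC code: the function name
thumb1_word_offset_ok is hypothetical and only models the Thumb-1
LDR/STR (immediate) encoding for word accesses with a low base register,
where the offset is a 5-bit unsigned field scaled by 4. A real
legitimate_address_p implementation considers much more (mode, base
register class, SP-relative forms, and so on).

```c
#include <stdbool.h>

/* Toy model, not GCC code (the name is hypothetical): can `offset` be
   encoded as the immediate of a Thumb-1 word-sized LDR/STR with a low
   base register?  The encoding holds a 5-bit unsigned field scaled by
   4, so the encodable offsets are 0, 4, ..., 124.  Negative offsets
   cannot be encoded at all.  */
bool
thumb1_word_offset_ok (long offset)
{
  return offset >= 0 && offset <= 124 && (offset % 4) == 0;
}
```

Under this model, thumb1_word_offset_ok (0) holds while
thumb1_word_offset_ok (-4) does not, which is the case the
"store-to [r0, #-4]" pattern runs into.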

Richard.

>
> _24 = (void *) ivtmp.15_20;
> _6 = MEM[(int *)_24];
> ivtmp.15_21 = ivtmp.15_20 + 4;
>
> . . .
>
> _25 = (void *) ivtmp.15_21;
> MEM[(int *)_25 + 4294967292 * 1] = _7; <------ 4294967292 is -4 as a 32-bit unsigned value
>
> Would it make sense to check whether the addressing mode is valid at this
> point
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4732),
> similar to the validation performed later
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4753)?
>
> >
> > > **Issue 2:** IVOPTs also assigns low cost to another candidate:
> > >
> > > Candidate yyy:
> > >   Incr POS: before exit test
> > >   IV struct:
> > >     Type:       unsigned int
> > >     Base:       (unsigned int) a_13(D)
> > >     Step:       4
> > >
> > > This produces:
> > >
> > > loop-preheader:
> > >     r0 = &a[0]
> > >     jump loop-exiting
> > >
> > > loop-header:
> > >     load-from  [r0, #-4]
> > >     store-to   [r0]
> > >
> > > loop-exiting:
> > >     increment  r0
> > >     jump loop-header
> > >
> > > IVOPTs considers that the increment in the loop-exiting block can be
> > > paired with "load-from [r0, #-4]" in the loop-header block, despite them
> > > being in different basic blocks.
> > >
> > > **Question 2:** Should get_address_cost() verify that the candidate
> > > increment and use->stmt are in the same basic block when
> > > cand->pos == IP_NORMAL? Cross-block pairing seems problematic for
> > > post-increment addressing mode costing.
> >
> > If the pairing is wrong it should be fixed, where's that pairing done?
> >
>
> The pairing takes place at this point
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4739),
> where the candidate (the `increment r0` statement from the
> loop-exiting basic block) is combined with the use
> (the `load-from [r0, #-4]` statement from the loop-header basic
> block), resulting in can_autoinc being set to true.
>
> loop-header:
>     load-from [r0, #-4]
>     store-to [r0]
>
> loop-exiting:
>     increment r0
>     bcc loop-header
>
> > >
> > > Both issues suggest that IVOPTs may need additional validation to ensure:
> > > 1. The selected addressing mode is actually supported by the target
> > > 2. The increment and memory operation are properly co-located for
> > >    IP_NORMAL candidates
> > >
> > > Best regards,
> > > Ciprian Arbone
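
The offset adjustment quoted from get_address_cost() can be sketched as
a small stand-alone C model. This is not GCC code and every toy_* name
is hypothetical. The rule "adjusted offset 0 with step equal to the
access size means post-increment" is an assumption made for this sketch,
based on the description in the report. toy_classify_strict adds the two
checks proposed in the report, again only as an illustration of the idea.

```c
#include <stdbool.h>

/* Hypothetical classification used only in this sketch.  */
enum toy_ainc_kind { TOY_AINC_NONE, TOY_AINC_POST_INC };

/* Mirrors the quoted adjustment: a use placed after the increment sees
   the already-bumped IV, so the step is added back to its offset.  */
long
toy_adjusted_offset (long offset, long step, bool use_after_increment)
{
  return use_after_increment ? offset + step : offset;
}

/* Costing as described in the report (assumption): an adjusted offset
   of 0 with a step equal to the access size looks like post-increment,
   regardless of the offset that is actually written in the address.  */
enum toy_ainc_kind
toy_classify (long offset, long step, long access_size,
              bool use_after_increment)
{
  long adj = toy_adjusted_offset (offset, step, use_after_increment);
  if (adj == 0 && step == access_size)
    return TOY_AINC_POST_INC;
  return TOY_AINC_NONE;
}

/* Stricter variant sketching the two proposed checks: the offset as
   written must be encodable (modelled here as non-negative), and the
   increment and the use must be in the same basic block.  */
enum toy_ainc_kind
toy_classify_strict (long offset, long step, long access_size,
                     bool use_after_increment, bool same_block)
{
  if (offset < 0 || !same_block)
    return TOY_AINC_NONE;
  return toy_classify (offset, step, access_size, use_after_increment);
}
```

With a step and access size of 4, toy_classify accepts both the load at
offset 0 before the increment and the store at offset -4 after it, since
-4 + 4 = 0. toy_classify_strict keeps the first and rejects the second,
and also rejects any pairing where same_block is false.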
