On Fri, Jun 12, 2026 at 3:31 PM Ciprian Arbone via Gcc <[email protected]> wrote:
>
> Hello,
>
> We recently enabled LDMIA/STMIA instructions for Thumb-1 (Cortex-M0+) by
> modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow auto-increment addressing
> for THUMB1 targets. However, we've discovered that IVOPTs generates
> suboptimal code for simple loops due to incorrect addressing mode selection.
>
> Consider this test case:
>
> void test(int *a, int *b, int size)
> {
>     for (int i = 0; i < size; i++)
>     {
>         a[i] = b[i] * a[i];
>     }
> }
>
> GCC currently generates:
>
>     ldmia   r0!, {r4}
>     ldmia   r1!, {r6}
>     subs    r5, r0, #4
>     ...
>     str     r4, [r5, #0]
>
> The issue occurs because IVOPTs selects a candidate with the lowest cost that
> has the following structure:
>
> Candidate xxx:
>   Incr POS: after use 0
>   IV struct:
>     Type:       unsigned int
>     Base:       (unsigned int) a_13(D)
>     Step:       4
>
> This results in the following loop structure:
>
> loop-preheader:
>     r0 = a
>     jump loop-exiting
>
> loop-header:
>     load-from  [r0]
>     increment  r0
>     store-to   [r0, #-4]
>
> loop-exiting:
>     jump loop-header
>
> **Issue 1:** IVOPTs recognizes both patterns as valid post-increment with
> offset zero:
>   - "load-from [r0]; increment r0" → recognized as post-inc from offset 0
>   - "increment r0; store-to [r0, #-4]" → also recognized as post-inc from
>     offset 0
>
> The code in tree-ssa-loop-ivopts.cc:get_address_cost() applies the adjustment:
>
>     if (stmt_after_increment (data->current_loop, cand, use->stmt))
>         ainc_offset += ainc_step;
>     cost = get_address_cost_ainc (ainc_step, ainc_offset,
>                                   addr_mode, mem_mode, as, speed);
>
> However, Thumb-1 does not support negative immediate offsets in addressing
> modes. The pattern "increment r0; store-to [r0, #-4]" can never be realized
> as a post-increment store on Thumb-1, yet IVOPTs assigns it a low cost.
>
> **Question 1:** Should get_address_cost() verify that an addressing mode is
> actually valid on the target before assigning auto-increment cost? Currently,
> it appears to assume validity without checking target constraints.

It should end up calling the legitimate_address_p target hook
(TARGET_LEGITIMATE_ADDRESS_P) to verify validity.
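
To make the question concrete, here is a small stand-alone toy model in C.
It is not GCC code: every name in it (thumb1_word_offset_ok, toy_naive_cost,
toy_checked_cost and the TOY_COST_* values) is hypothetical. It assumes only
that Thumb-1 word loads and stores with an immediate offset can encode
0, 4, ..., 124, and it mirrors the "ainc_offset += ainc_step" adjustment
quoted above to show how the store at [r0, #-4] ends up looking like a
post-increment unless validity is checked first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost values, chosen only for illustration.  */
enum { TOY_COST_AINC = 0, TOY_COST_PLAIN = 1, TOY_COST_INVALID = 1000 };

/* Toy validity check: a Thumb-1 word LDR/STR with an immediate offset
   encodes imm5 * 4, so only 0, 4, ..., 124 are representable.  */
static bool
thumb1_word_offset_ok (long offset)
{
  return offset >= 0 && offset <= 124 && offset % 4 == 0;
}

/* Mirrors the quoted adjustment: a use placed after the increment has
   the step added back before the auto-increment test.  OFFSET is the
   byte offset of the access from the IV register at the access.  */
static int
toy_naive_cost (long offset, bool after_increment, long step)
{
  assert (step != 0);
  long ainc_offset = offset;
  if (after_increment)
    ainc_offset += step;
  return ainc_offset == 0 ? TOY_COST_AINC : TOY_COST_PLAIN;
}

/* Same costing, but the address actually emitted must be encodable
   before any auto-increment cost is handed out.  */
static int
toy_checked_cost (long offset, bool after_increment, long step)
{
  if (!thumb1_word_offset_ok (offset))
    return TOY_COST_INVALID;
  return toy_naive_cost (offset, after_increment, step);
}
```

With step 4, the load (offset 0, before the increment) gets TOY_COST_AINC
from both functions. The store (offset -4, after the increment) gets
TOY_COST_AINC from toy_naive_cost but TOY_COST_INVALID from
toy_checked_cost.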

> **Issue 2:** IVOPTs also assigns low cost to another candidate:
>
> Candidate yyy:
>   Incr POS: before exit test
>   IV struct:
>     Type:       unsigned int
>     Base:       (unsigned int) a_13(D)
>     Step:       4
>
> This produces:
>
> loop-preheader:
>     r0 = &a[0]
>     jump loop-exiting
>
> loop-header:
>     load-from  [r0, #-4]
>     store-to   [r0]
>
> loop-exiting:
>     increment  r0
>     jump loop-header
>
> IVOPTs considers that the increment in the loop-exiting block can be paired
> with "load-from [r0, #-4]" in the loop-header block, despite them being in
> different basic blocks.
>
> **Question 2:** Should get_address_cost() verify that the candidate increment
> and use->stmt are in the same basic block when cand->pos == IP_NORMAL?
> Cross-block pairing seems problematic for post-increment addressing mode
> costing.

If the pairing is wrong, it should be fixed. Where is that pairing done?
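
For reference, the condition proposed in Question 2 can be written down as
a tiny stand-alone C sketch. This is a toy model, not IVOPTs code:
toy_can_pair_postinc and its integer block/position encoding are
hypothetical, and it assumes a post-increment access is only possible when
the increment follows the memory access within the same basic block.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a statement is identified by a basic-block index and a
   position inside that block.  Returns true when the increment could
   be folded into the memory access as a post-increment, i.e. both are
   in the same block and the increment comes after the access.  */
static bool
toy_can_pair_postinc (int use_bb, int use_pos, int incr_bb, int incr_pos)
{
  assert (use_pos >= 0 && incr_pos >= 0);
  return use_bb == incr_bb && use_pos < incr_pos;
}
```

Taking loop-header as block 1 and loop-exiting as block 2, candidate xxx
(load at position 0, increment at position 1, both in block 1) is
pairable, while candidate yyy (load in block 1, increment in block 2) is
not.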

>
> Both issues suggest that IVOPTs may need additional validation to ensure:
> 1. The selected addressing mode is actually supported by the target
> 2. The increment and memory operation are properly co-located for IP_NORMAL
>    candidates
>
> Best regards,
> Ciprian Arbone
