https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007

            Bug ID: 93007
           Summary: [10 regression] pr77698.c testcase fails due to block
                    commoning
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wilco at gcc dot gnu.org
  Target Milestone: ---

Since r276960 we see this failure on Arm:

FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments "internal loop
alignment added" 1

The issue appears to be that basic block commoning is applied to an unrolled
loop, so that code outside the loop (.L17) now jumps into the middle of the
loop body at .L27. This is unlikely to be beneficial for performance:

.L17:
        adds    r0, r0, #1
        b       .L27
.L6:
        ldr     r4, [r2, #12]
        adds    r0, r0, #4
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
.L27:
        ldr     r4, [r2, #12]
        cmp     ip, r0
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        bne     .L6
        pop     {r4, pc}
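
For readers without the testcase at hand, a loop of roughly this shape would
give the load/load/store pattern seen above. This is only a sketch inferred
from the assembly, not a copy of pr77698.c; the names foo, a, b, n and j are
assumptions:

```c
/* Hypothetical reconstruction from the assembly above; not taken from
   pr77698.c.  All names here are assumptions.  */
volatile long j = 0;    /* volatile, so it is reloaded on every iteration
                           (compare: ldr r4, [r2, #12]) */

void
foo (long *a, long *b, long n)
{
  for (long i = 0; i < n; i++)
    a[j] = *b;          /* compare: ldr lr, [r1]
                                    str lr, [r3, r4, lsl #2] */
}
```

Unrolled four times, each iteration of the main loop runs four copies of this
body. In the assembly above the fourth copy sits at .L27 and is shared with
the path coming from .L17.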

The test could be easily fixed, but ensuring block commoning takes loops and
execution frequencies into account would be better overall.
