http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

           Summary: Horrible code generation for trivial decrement with
                    test
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: torva...@linux-foundation.org


This trivial code:

  extern void fncall(void *);

  int main(int argc, char **argv)
  {
    if (!--*argv)
        fncall(argv);
    return 0;
  }

compiles into this ridiculous x86-64 assembly language:

    movq    (%rsi), %rax
    subq    $1, %rax
    testq   %rax, %rax
    movq    %rax, (%rsi)
    je      .L4

for the "decrement and test result" at -O2. 

I'd have expected that any reasonable compiler would generate something like

    decq    (%rsi)
    je      .L4

instead, which would be smaller and faster (even a "subq $1" would be fine, but
the decq is one byte shorter).

The problem is more noticeable when the memory location is a structure offset:
the "load+decrement+store" model then results in noticeably bigger code, because
the memory address is needlessly repeated, for absolutely no advantage.
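
To make the structure-offset case concrete, here is a minimal sketch of the
same "decrement and test" pattern applied to a field at a non-zero offset. The
names (struct object, refcount, release, put_object) are hypothetical and are
not taken from any real code base; they only illustrate the shape of the
problem. With the load/modify/store sequence, the field's displacement has to
be encoded in both the load and the store, whereas a single read-modify-write
instruction would encode it once.

```c
#include <stddef.h>

/* Hypothetical structure: the counter sits at a non-zero offset. */
struct object {
    void *payload;
    long  refcount;
};

/* Hypothetical stand-in for the external call in the original test case. */
static int released;

static void release(struct object *obj)
{
    (void)obj;
    released++;
}

/*
 * Decrement the counter and test the result, as in the original report.
 * Returns 1 if the counter reached zero (and release() was called),
 * 0 otherwise.
 */
int put_object(struct object *obj)
{
    if (!--obj->refcount) {
        release(obj);
        return 1;
    }
    return 0;
}
```

Compiling a function like this with -O2 -S and looking at how often the field
offset shows up in the output should show whether the compiler in use emits the
separate load and store or a single rmw instruction.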

Is there some way that I haven't found to make gcc use the rmw instructions?
