https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #10)
> Created attachment 58650 [details]
> Testcase that illustrates performance issue

Without the ustrunc{m}{n}2 optab, the loop in the testcase compiles to (gcc -O2):

.L7:
        movl    12(%rsp), %eax
.L4:
        testl   %eax, %eax
        jne     .L2
        movl    $4294967295, %eax
        cmpq    %rax, %rbx
        cmovbe  %rbx, %rax
        movl    %eax, 12(%rsp)
        subq    %rax, %rbx
.L2:
        leaq    12(%rsp), %rdi
        call    deflate
        testl   %eax, %eax
        je      .L7

where the relevant part of the _.optimized tree dump reads:

  <bb 4> [local count: 536870913]:
  _13 = MIN_EXPR <left_4, 4294967295>;
  iftmp.0_6 = (unsigned int) _13;
  stream.avail_out = iftmp.0_6;
  left_15 = left_4 - _13;

and when the ustrunc{m}{n}2 optab is present, the same loop compiles to:

.L7:
        movl    12(%rsp), %eax
.L4:
        testl   %eax, %eax
        jne     .L2
        movl    $4294967295, %eax
        movl    %ebp, %edx
        cmpq    %rax, %rbx
        cmovbe  %rbx, %rax
        cmpq    %rbx, %rbp   <---
        cmovnc  %ebx, %edx   <---
        subq    %rax, %rbx
        movl    %edx, 12(%rsp)
.L2:
        leaq    12(%rsp), %rdi
        call    deflate
        testl   %eax, %eax
        je      .L7

where the relevant part of the _.optimized tree dump reads:

  <bb 4> [local count: 536870912]:
  _12 = MIN_EXPR <left_3, 4294967295>;
  iftmp.0_5 = .SAT_TRUNC (left_3);
  stream.avail_out = iftmp.0_5;
  left_14 = left_3 - _12;

Please note the two new instructions (marked with <---) in the second asm
dump. These are expanded from .SAT_TRUNC and are not present in the first
asm dump.

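The problem here is that the presence of the ustrunc{m}{n}2 optab in i386.md
prevents some optimization involving MIN_EXPR that would result in better
code. In the first dump the stored value is simply the truncated MIN_EXPR
result, which is computed anyway for the subtraction.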
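
For illustration, here is a minimal C sketch of the source pattern behind
both dumps. It is reconstructed from the tree dumps above, not copied from the
attachment, and the function names (clamp_via_min, sat_trunc, take_chunk) are
hypothetical. It shows that, for a 64-bit to 32-bit unsigned narrowing, the
saturating truncation has the same value as the narrowed MIN_EXPR result, so
one compare and cmov would in principle be enough for both uses:

```c
#include <stdint.h>

/* Shape seen in the first dump: MIN_EXPR followed by a narrowing cast. */
static uint32_t clamp_via_min(uint64_t left)
{
    uint64_t m = left < UINT32_MAX ? left : UINT32_MAX; /* MIN_EXPR <left, 4294967295> */
    return (uint32_t)m;
}

/* Shape that appears as .SAT_TRUNC (left) in the second dump. */
static uint32_t sat_trunc(uint64_t left)
{
    return left > UINT32_MAX ? UINT32_MAX : (uint32_t)left;
}

/* One loop iteration: choose the next chunk size and consume it from *left. */
static uint32_t take_chunk(uint64_t *left)
{
    uint32_t chunk = sat_trunc(*left);
    *left -= chunk; /* subtracts the same value as the MIN_EXPR result */
    return chunk;
}
```

Both helpers return the same value for every input, which is why the extra
cmpq/cmovnc pair in the second asm dump looks redundant.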
