https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863
--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #10)
> Created attachment 58650 [details]
> Testcase that illustrates performance issue

Without the ustrunc{m}{n}2 optab, the loop in the testcase compiles to
(gcc -O2):

.L7:
        movl    12(%rsp), %eax
.L4:
        testl   %eax, %eax
        jne     .L2
        movl    $4294967295, %eax
        cmpq    %rax, %rbx
        cmovbe  %rbx, %rax
        movl    %eax, 12(%rsp)
        subq    %rax, %rbx
.L2:
        leaq    12(%rsp), %rdi
        call    deflate
        testl   %eax, %eax
        je      .L7

where the relevant part of the _.optimized tree dump reads:

  <bb 4> [local count: 536870913]:
  _13 = MIN_EXPR <left_4, 4294967295>;
  iftmp.0_6 = (unsigned int) _13;
  stream.avail_out = iftmp.0_6;
  left_15 = left_4 - _13;

When ustrunc{m}{n}2 is present, the same loop compiles to:

.L7:
        movl    12(%rsp), %eax
.L4:
        testl   %eax, %eax
        jne     .L2
        movl    $4294967295, %eax
        movl    %ebp, %edx
        cmpq    %rax, %rbx
        cmovbe  %rbx, %rax
        cmpq    %rbx, %rbp      <---
        cmovnc  %ebx, %edx      <---
        subq    %rax, %rbx
        movl    %edx, 12(%rsp)
.L2:
        leaq    12(%rsp), %rdi
        call    deflate
        testl   %eax, %eax
        je      .L7

where the relevant part of the _.optimized tree dump reads:

  <bb 4> [local count: 536870912]:
  _12 = MIN_EXPR <left_3, 4294967295>;
  iftmp.0_5 = .SAT_TRUNC (left_3);
  stream.avail_out = iftmp.0_5;
  left_14 = left_3 - _12;

Please note the two new instructions (marked with <---) in the second asm
dump. They are expanded from .SAT_TRUNC and are not present in the first asm
dump. In the first tree dump the truncated value is derived from the MIN_EXPR
result (_13), so one cmp/cmov pair serves both the store to stream.avail_out
and the subtraction. In the second dump .SAT_TRUNC recomputes the clamp from
left_3 while MIN_EXPR is still needed for the subtraction, so the clamp is
performed twice.

The problem here is that the presence of the ustrunc{m}{n}2 optab in i386.md
prevents some optimization involving MIN_EXPR that would result in better
code.
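For readers without access to the attachment, the source pattern behind the two tree dumps can be sketched in C as follows. This is an assumed reconstruction from the dumps, not the attached testcase: the function names (min_then_truncate, sat_trunc, consume_chunk) are hypothetical, and only the shape of the computation (a 64-bit remaining count clamped to 32 bits, stored, then subtracted) is taken from the comment.

```c
#include <stdint.h>

/* Form seen in the first dump: MIN_EXPR followed by a plain truncation.
   The clamped 64-bit value stays available for later reuse. */
static inline uint32_t min_then_truncate(uint64_t left)
{
    uint64_t clamped = left < UINT32_MAX ? left : UINT32_MAX;
    return (uint32_t)clamped;
}

/* Form seen in the second dump: an unsigned saturating truncation,
   which is what .SAT_TRUNC computes for a 64-bit to 32-bit narrowing. */
static inline uint32_t sat_trunc(uint64_t left)
{
    return left > UINT32_MAX ? UINT32_MAX : (uint32_t)left;
}

/* Hypothetical loop step modeled on the dumps: the clamp result is
   used twice, once truncated (the store to avail_out) and once at
   full width (the subtraction from left). */
static inline uint64_t consume_chunk(uint64_t left, uint32_t *avail_out)
{
    uint64_t chunk = left < UINT32_MAX ? left : UINT32_MAX;
    *avail_out = (uint32_t)chunk;   /* truncating use */
    return left - chunk;            /* full-width use */
}
```

Both forms return the same value for every input, so recognizing the truncating use as .SAT_TRUNC is valid. The cost shows up only because the full-width use still needs the MIN_EXPR result, so the comparison ends up being done twice.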