https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

--- Comment #17 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #16)

> But you can't elide the checks in the peeled copies and for 4-times
> unrolling you have most cases exiting on the first or fourth check.

See comment #8 for an example of how it should be unrolled (it needs a simple
check at entry and a trailing loop as well, of course).
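
A minimal sketch of that shape, for illustration only. This is not the code
from comment #8; the function name match_len_unrolled and the byte-compare
loop are assumptions chosen to resemble an early-exit loop. The unrolled body
keeps every early-exit check but tests the loop bound only once per 4
iterations:

```c
#include <stddef.h>

/* Hypothetical example: length of the common prefix of a and b (at most n). */
size_t match_len_unrolled(const unsigned char *a, const unsigned char *b,
                          size_t n)
{
    size_t i = 0;

    /* Entry check: only enter the unrolled loop if 4 iterations remain. */
    if (n >= 4) {
        /* One bound check per 4 iterations; each copy keeps its early exit. */
        for (; i <= n - 4; i += 4) {
            if (a[i]     != b[i])     return i;
            if (a[i + 1] != b[i + 1]) return i + 1;
            if (a[i + 2] != b[i + 2]) return i + 2;
            if (a[i + 3] != b[i + 3]) return i + 3;
        }
    }

    /* Trailing loop: handles the remaining 0-3 iterations. */
    for (; i < n; i++)
        if (a[i] != b[i])
            break;
    return i;
}
```

Nothing is peeled in front of the loop here: small counts fall straight through
to the trailing loop.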

> Duff's device simply merges the prologue iterations for unrolling
> with the loop body so I don't see why it can't be used.  It's
> 
>   switch (n % 4)
>    {
>     loop:
>        iter
>        n--;
>     case 3:
>        iter
>        n--;
>     case 2:
>        iter
>        n--;
>     case 1:
>        iter
>        n--;
>     case 0:
>        if (n != 0)
>          goto loop;
>    } 
> 
> its cost is mainly the computed jump into the loop body.  Then
> you have a four-fold reduction in branches without the overhead
> of having another three loop body copies in the prologue with
> retained early exit checks.

Duff's device is a bad idea because it adds extra checks and dependencies that
aren't necessary if you unroll properly. There is never a need to merge the
trailing loop into the unrolled copy, nor should we peel off 3 iterations for
no gain.
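
For comparison, here is the quoted pseudocode written out as compilable C. It
is only a sketch: the function name sum_duff and the summing body are
assumptions standing in for "iter". Entry goes through a switch on n % 4, and
every copy of the body updates the shared counter n as well as doing the
actual work:

```c
#include <stddef.h>

/* Hypothetical example: sum of a[0..n-1] using the quoted Duff's device. */
long sum_duff(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    switch (n % 4) {        /* jump into the middle of the loop body */
    loop:
        s += a[i++]; n--;
        /* fall through */
    case 3:
        s += a[i++]; n--;
        /* fall through */
    case 2:
        s += a[i++]; n--;
        /* fall through */
    case 1:
        s += a[i++]; n--;
        /* fall through */
    case 0:
        if (n != 0)         /* n is a multiple of 4 at this point */
            goto loop;
    }
    return s;
}
```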
