I played a bit with SMS (-fmodulo-sched) on ia64 and I can't understand
what it is doing.
Instead of moving only the first few insns of the loop into a prologue,
it seems to peel an entire iteration out of the loop, and the loop body
stays approximately the same.

For example, this is what I see for the following function, compiled
with ./cc1 ./a.c -O3 -fmodulo-sched:

void x(long long *y, long long *x)
{
    int i;
    for (i = 0; i < 100; i++)
    {
        *x = *y;
        x += 20;
        y += 30;
    }
}

Can someone show an example where it actually works as it should?
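
To make the difference concrete, here is a hand-written, source-level
sketch of the two shapes discussed above. It is not GCC output. The
function names, the temporaries and the stride constants are made up
for illustration, and it assumes the source and destination arrays do
not overlap. copy_peeled is the shape described as observed (a whole
iteration pulled out, loop body unchanged); copy_pipelined is the shape
a software pipeliner would be expected to aim for (only the load moved
into the prologue, so the kernel pairs the store of iteration i with
the load of iteration i+1).

```c
/* Illustrative constants, taken from the loop in the message. */
enum { N = 100, SRC_STRIDE = 30, DST_STRIDE = 20 };

/* Original loop shape (parameter names changed for readability). */
static void copy_plain(const long long *src, long long *dst)
{
    for (int i = 0; i < N; i++) {
        *dst = *src;
        dst += DST_STRIDE;
        src += SRC_STRIDE;
    }
}

/* Shape described as observed: one full iteration peeled off,
   the remaining loop body is the same load-then-store chain. */
static void copy_peeled(const long long *src, long long *dst)
{
    *dst = *src;
    dst += DST_STRIDE;
    src += SRC_STRIDE;
    for (int i = 1; i < N; i++) {
        *dst = *src;
        dst += DST_STRIDE;
        src += SRC_STRIDE;
    }
}

/* Expected shape (assumes src and dst do not overlap): the load of
   iteration i+1 and the store of iteration i no longer depend on
   each other, so they could issue in the same cycle. */
static void copy_pipelined(const long long *src, long long *dst)
{
    long long t = *src;              /* prologue: load for iteration 0 */
    src += SRC_STRIDE;

    for (int i = 0; i < N - 1; i++) {
        long long next = *src;       /* load for iteration i+1 */
        *dst = t;                    /* store for iteration i  */
        t = next;
        src += SRC_STRIDE;
        dst += DST_STRIDE;
    }

    *dst = t;                        /* epilogue: store for the last iteration */
}
```

All three functions write the same values; only the dependency
structure inside the loop body differs.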

Roy.

2010/11/10 Andrey Belevantsev <a...@ispras.ru>:
> Hi,
>
> On 10.11.2010 12:32, roy rosen wrote:
>>
>> Hi,
>>
>> I was wondering if gcc has software pipelining.
>> I saw options -fsel-sched-pipelining -fselective-scheduling
>> -fselective-scheduling2 but I don't see any pipelining happening
>> (tried with ia64).
>> Is there a gcc VLIW port in which I can see it working?
>
> You need to try -fmodulo-sched.  Selective scheduling works by default on
> ia64 with -O3, otherwise you need -fselective-scheduling2
> -fsel-sched-pipelining.  Note that selective scheduling disables autoinc
> generation for the pipelining to work, and modulo scheduling will likely
> refuse to pipeline a loop with autoincs.
>
> Modulo scheduling implementation in GCC may be improved, but that's a
> different topic.
>
> Andrey
>
>>
>> For an example function like
>>
>> int nor(char* __restrict__ c, char* __restrict__ d)
>> {
>>     int i, sum = 0;
>>     for (i = 0; i < 256; i++)
>>         d[i] = c[i] << 3;
>>     return sum;
>> }
>>
>> with no pipelining a code like
>>
>> r1 = 0
>> r2 = c
>> r3 = d
>> _startloop
>> if r1 == 256 jmp _end
>> r4 = [r2]+
>> r4 <<= 3
>> [r3]+ = r4
>> r1++
>> jmp _startloop
>> _end
>>
>> Here, inside the loop, there is a data dependency chain through all
>> 3 insns (only the r1++ is independent), which does not permit any
>> parallelism.
>>
>> With pipelining I expect code like
>>
>> r1 = 2
>> r2 = c
>> r3 = d
>> // peel first iteration
>> r4 = [r2]+
>> r4 <<= 3
>> r5 = [r2]+
>> _startloop
>> if r1 == 256 jmp _end
>> [r3]+ = r4 ; r4 = r5 << 3 ; r5 = [r2]+
>> r1++
>> jmp _startloop
>> _end
>>
>> Now the data dependency is broken and parallelism is possible.
>> As I said, I could not see that happening.
>> Can someone please tell me on which port and with what options can I
>> get such a result?
>>
>> Thanks, Roy.
>
>
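
For reference, the three-stage schedule in the quoted pseudo-assembly
(load, shift, store) can be written out at the source level roughly as
follows. This is a hand-written sketch, not compiler output. The
function names and temporaries are made up, unsigned char is used
instead of char so the shift is well defined, and the two arrays are
assumed not to overlap (as __restrict__ promises in the quoted code).
Unlike the pseudo-assembly, it also spells out the epilogue that drains
the last two iterations.

```c
enum { LEN = 256 };

/* Straightforward loop: load, shift and store form one dependency chain. */
static void shl3_plain(const unsigned char *restrict c,
                       unsigned char *restrict d)
{
    for (int i = 0; i < LEN; i++)
        d[i] = (unsigned char)(c[i] << 3);
}

/* Hand-pipelined version: in the kernel, the store belongs to
   iteration i, the shift to iteration i+1 and the load to iteration
   i+2, so none of the three depends on another one in the same pass. */
static void shl3_pipelined(const unsigned char *restrict c,
                           unsigned char *restrict d)
{
    /* Prologue: fill the pipeline. */
    int loaded  = c[0];              /* load,  iteration 0 */
    int shifted = loaded << 3;       /* shift, iteration 0 */
    loaded = c[1];                   /* load,  iteration 1 */

    /* Kernel. */
    for (int i = 0; i < LEN - 2; i++) {
        d[i] = (unsigned char)shifted;   /* store, iteration i   */
        shifted = loaded << 3;           /* shift, iteration i+1 */
        loaded = c[i + 2];               /* load,  iteration i+2 */
    }

    /* Epilogue: drain the pipeline. */
    d[LEN - 2] = (unsigned char)shifted;
    shifted = loaded << 3;
    d[LEN - 1] = (unsigned char)shifted;
}
```

Both functions produce the same output array; the pipelined one only
reorders which iteration each operation in the loop body belongs to.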
