Hello,
> Well, the target architecture is actually quite peculiar: it's a
> parallel SPMD machine. The only similarity with MIPS is the ISA. The
> latency I'm trying to hide is somewhere around 24 cycles, but because it
> is a parallel machine, up to 1024 threads have to stall for 24 cycles in
George Caragea wrote:
So my initial question remains: is there any way to tell the scheduler
not to place the prefetch instruction after the actual read?
You can try changing sched_analyze_2 in sched-deps.c to handle PREFETCH
specially.
You could perhaps handle it similarly to how PRE_DEC is
Zdenek Dvorak wrote:
2. Right now I am inserting a __builtin_prefetch(...) call immediately
before the actual read, getting something like:
D.1117_12 = &A[D.1101_14];
__builtin_prefetch (D.1117_12, 0, 1);
D.1102_16 = A[D.1101_14];
However, if I enable the instruction scheduler pass, it does
Hello,
> 2. Right now I am inserting a __builtin_prefetch(...) call immediately
> before the actual read, getting something like:
> D.1117_12 = &A[D.1101_14];
> __builtin_prefetch (D.1117_12, 0, 1);
> D.1102_16 = A[D.1101_14];
>
> However, if I enable the instruction scheduler pass, it doesn
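
For reference, the source-level pattern behind the quoted dump can be
sketched roughly as follows. This is only an illustration under
assumptions: the function name sum_with_prefetch and its parameters are
hypothetical and do not come from the thread. Only the
__builtin_prefetch call with the arguments (addr, 0, 1), placed
immediately before the read, is taken from the quoted dump.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): sums n ints from a,
   issuing a prefetch for each element right before it is read.
   The arguments mirror the quoted dump: rw = 0 (prefetch for a read),
   locality = 1 (low temporal locality). Both must be compile-time
   constants. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(&a[i], 0, 1);  /* hint only, no visible side effect */
        total += a[i];                    /* the actual read */
    }
    return total;
}
```

The prefetch is only a hint, so the function returns the same result
whether or not the hardware acts on it. Whether the instruction
scheduler keeps the prefetch ahead of the load is exactly the open
question in this thread; the sketch shows only the intended ordering at
the source level.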