Hi Alexander,
 
> So essentially the main issue is not a hardware peculiarity, but rather the
> bad schedule being totally wrong (it could only make sense if loads had
> 1-cycle latency, which they do not).

The scheduling is only bad because the specific intrinsics used are mapped
onto asm statements, so they are ignored by the scheduler and modelled
with zero latencies.

> I think this highlights how implementing this autoprefetch heuristic via the
> dfa_lookahead_guard interface looks questionable in the first place, but the
> patch itself makes sense to me.

Yes, I'm still not sure what this autoprefetch heuristic is trying to
accomplish...
We could try disabling it and see whether it actually helps.

Wilco
