On 09/08/2015 02:51 PM, Jeff Law wrote:
On 09/08/2015 12:39 PM, Aditya K wrote:
IIUC, in the haifa-sched.c, the default scheduling algorithm seems to
be top-down (before reload). Is there a way to schedule the other way
(bottom up), or both ways?
Not that I'm aware of. Note that region scheduling allows insns to
move between basic blocks to help fill the bubbles that can occur at
the end of a block.
Also the current scheduler has a lot of algorithms to decrease the
problem as backtracking scheduler written by Bernd Schmidt or some form
of lookahead.
As a use case for bottom-up or some other heuristic: Currently, the
first priority in the selection is given to the longest path, in some
cases this may produce code with stalls at the end of the basic
block. Whereas in the case of combined top-down + bottom-up
scheduling we would end up having stalls in the middle of the basic
block.
GCC's original scheduler worked bottom-up until ~1997. IBM Haifa's
work turned it into a top-down model and was a small, but clear
improvement.
As I remember it is was written by Mike Tiemann. Bottom-up scheduler as
a rule generates worse code than top-down one. By the way, implementing
bottom-up scheduler in GCC would require implementing reverse (N)DFA
from a processor description to recognize resource constraints.
There's certainly better things that can be done than strictly
top-down or bottom-up, but revamping the scheduler again hasn't been
seen as a major win for the most common processors GCC targets these
days. Thus it hasn't been a significant area of focus.
Yes, that is true for OOO execution processors which can rearrange insns
and execute them speculatively looking through several branches. For
such processors, software pipelining is more important as the processors
can look only through a few branches as software pipelining could look
through any number of branches. That is why Intel compiler did not have
any insn scheduler (but had software pipelining) until Intel Atom
introduction which was originally in-order processor.
Actually, I believe dealing with variable/unknown latency of load insns
(depending where data are placed in a cache or memory) would be more
important than bottom-up or hybrid scheduler. A balanced scheduling
dealing with this problem was implemented by Alexander Monakov about 7-8
years ago as a google internship work but it was not included as at that
time its advantages was not confirmed on SPEC2000. It would be
interesting to reconsider and re-evaluate it on modern processors and
scientific benchmarks with big data.
For in-order processors, we also have another scheduler (selective one)
which does additional transformations (like register renaming and
non-modulo software pipelining) which could be more important than
top-down/bottom-up scheduling. And it gave 1-2% improvement on Itanium
SPEC2000 in comparison with haifa scheduler.