Sorry, my previous answer was cut off.
The motivation for prepass global code motion is indeed that after register
allocation, inter-block scheduling is even more restricted due to
anti-dependencies, including those due to live-out on side exit branches.
Global code motion is a key performance enabler especially for the non-temporal
loads (i.e. L1 cache bypass loads), which have an exposed latency close to 20
cycles on the current kvx cores.
The dataflow issue encountered with SEL_SCHED in prepass with control
speculation enabled was inconsistent liveness reported by the compiler. I am
running a test suite to reproduce it (I last saw it 3 months ago).
Here again is a motivating example where I expect the scheduler to speculate
loads from the second block of the loop into the first block, which dominates
it, so in principle SCHED_RGN should do it:
typedef struct list_cell_ {
  struct list_cell_ *next;
  float payload;
} list_cell_, *list_cell;

float
list_sum(list_cell_ *list)
{
  float result = 0.0;
  while (list->next) {
    list = list->next;
    result += 1.0f/list->payload;
    if (!list->next) break;
    list = list->next;
    result += 1.0f/list->payload;
  }
  return result;
}
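To make the expected motion concrete, here is a hand-written source-level sketch of what I would like the scheduler to produce. The helpers spec_load_next() and spec_load_payload() are hypothetical: they only model a dismissible load (a NULL address yields a dummy value instead of a fault) and are not a real kvx or GCC API. The function names list_sum_ref and list_sum_spec are likewise made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_cell_ {
  struct list_cell_ *next;
  float payload;
} list_cell_;

/* Hypothetical models of dismissible loads: a NULL address returns a
   dummy value instead of faulting.  The dummy is never consumed.  */
static list_cell_ *
spec_load_next (const list_cell_ *c)
{
  return c ? c->next : NULL;
}

static float
spec_load_payload (const list_cell_ *c)
{
  return c ? c->payload : 0.0f;
}

/* Reference version, same control flow as list_sum.  */
float
list_sum_ref (list_cell_ *list)
{
  float result = 0.0f;
  while (list->next) {
    list = list->next;
    result += 1.0f / list->payload;
    if (!list->next) break;
    list = list->next;
    result += 1.0f / list->payload;
  }
  return result;
}

/* Hand-speculated version: the two loads of the second block are
   issued in the first block, above the side exit.  */
float
list_sum_spec (list_cell_ *list)
{
  assert (list != NULL);
  float result = 0.0f;
  list_cell_ *next = list->next;
  while (next) {
    list_cell_ *first = next;
    list_cell_ *second = first->next;         /* needed by the branch */
    float p2 = spec_load_payload (second);    /* speculated from block 2 */
    list_cell_ *after = spec_load_next (second); /* speculated from block 2 */
    result += 1.0f / first->payload;
    if (!second) break;                       /* side exit */
    result += 1.0f / p2;
    next = after;
  }
  return result;
}
```

The additions happen in the same order in both versions, so the results should match; only the position of the loads relative to the side exit changes.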
Here is the TARGET_SCHED_SET_SCHED_FLAGS implementation, with comments that
reflect my understanding of what to do. The commented-out line prevents
SEL_SCHED with control speculation unless postpass (as in ia64):
static void
kvx_sched_set_sched_flags (struct spec_info_def *spec_info)
{
  unsigned int *flags = &(current_sched_info->flags);
  // Speculative scheduling is enabled by non-zero spec_info->mask.
  spec_info->mask = 0;
  if (*flags & (SEL_SCHED | SCHED_RGN))
    {
      //if (!sel_sched_p () || reload_completed)
        {
          // Must do this in case of speculation.
          *flags |= USE_DEPS_LIST | DO_SPECULATION;
          // Do control speculation only.
          spec_info->mask = BEGIN_CONTROL;
          // Speculative scheduling without CHECK.
          spec_info->flags = SEL_SCHED_SPEC_DONT_CHECK_CONTROL;
          // Dump into the sched_dump.
          spec_info->dump = sched_dump;
        }
    }
}
TARGET_SCHED_SPECULATE_INSN is implemented by the following (it should memoize
and return 0 if the insn was already speculated with the same ts; I assume
this is not relevant here):
static int
kvx_sched_speculate_insn (rtx_insn *insn, ds_t ts, rtx *new_pat)
{
  rtx pattern = PATTERN (insn);
  if (GET_CODE (pattern) == SET)
    {
      rtx src = SET_SRC (pattern);
      if (GET_CODE (src) == MEM)
        {
          *new_pat = pattern;
          return 1;
        }
    }
  return -1;
}
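For reference, the memoization mentioned above could look roughly like the standalone model below. It is a sketch of the return convention only (1 for a new speculative pattern, 0 if already speculated with the same ts, -1 otherwise), not GCC code: insn_model, ds_model_t, MAX_UID and the speculated_ts table are all hypothetical stand-ins, and it assumes that recording the request at hook time is acceptable, which may not hold if the scheduler calls the hook without committing the pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_UID = 64 };             /* hypothetical table size */

typedef unsigned int ds_model_t;   /* stand-in for ds_t */

struct insn_model {
  int uid;                         /* stand-in for INSN_UID (insn) */
  bool is_load;                    /* stand-in for a SET whose source is a MEM */
};

/* Last speculation status recorded per insn; 0 means none yet.  */
static ds_model_t speculated_ts[MAX_UID];

static int
speculate_insn_model (const struct insn_model *insn, ds_model_t ts)
{
  assert (insn != NULL);
  if (!insn->is_load || insn->uid < 0 || insn->uid >= MAX_UID || ts == 0)
    return -1;                     /* no speculative form */
  if (speculated_ts[insn->uid] == ts)
    return 0;                      /* already speculated with the same ts */
  speculated_ts[insn->uid] = ts;   /* remember the request */
  return 1;                        /* caller would receive a new pattern */
}
```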
And TARGET_SCHED_NEEDS_BLOCK_P always returns false.
When I compile the motivating example above for the KVX,
kvx_sched_speculate_insn() is indeed called with reload_completed==0 (prepass)
for the two loads of the second block, but no code motion to the first block
happens. Generated code is the same for SCHED_RGN (default) or SEL_SCHED
(-fselective-scheduling), up to a renaming of the registers, although SEL_SCHED
calls kvx_sched_speculate_insn() several times for each load.
For ia64, it seems there is no prepass control speculation on the motivating
example either:
./gcc/ia64/gcc/cc1 -fpreprocessed list_sum2.i -quiet -dumpbase list_sum2.c
-dp -auxbase list_sum2 -O3 -version -ffast-math -o list_sum2.s -da -dp
-msched-control-spec -msched-in-control-spec
grep _speculative list_sum2.c.*
list_sum2.c.298r.mach:] UNSPEC_LDS)) 24 {movsf_speculative}
...
I noticed that the ia64 target uses the undocumented target hooks
TARGET_SCHED_GET_INSN_SPEC_DS and TARGET_SCHED_GET_INSN_CHECKED_DS whose code
is actually executed on this example.
Any recommendations?