On 9/9/21 12:15 PM, Richard Biener wrote:
On Thu, Sep 9, 2021 at 11:21 AM Aldy Hernandez <al...@redhat.com> wrote:
On 9/9/21 10:58 AM, Richard Biener wrote:
I ran some experiments a while back, and my current work on the enhanced
solver/threader, can fold virtually everything the DOM/threader gets
(even with its use of const_and_copies, avail_exprs, and
evrp_range_analyzer), while getting 5% more DOM threads and 1% more
overall threads. That is, I've been testing if the path solver can
solve everything the DOM threader needs (the hybrid approach I mentioned).
Unfortunately, replacing the forward threader right now is not feasible
for a few reasons:
a) The const_and_copies/avail_exprs relation framework can do floats,
and that's next year's ranger work.
Hmm, it sounds like relying fully on ranger is a bit premature then.
For floats, most definitely. Ranger doesn't do them. They're for the
next release. However, const_and_copies/avail_exprs for integers we can
do just fine, that's why evrp/VRP are much easier targets.
b) Even though we can seemingly fold everything DOM/threader does, in
order to replace it with a backward threader instance we'd have to merge
the cost/profitability code scattered throughout the forward threader,
as well as the EDGE_FSM* / EDGE_COPY* business.
c) DOM changes the IL as it goes. Though we could conceivably divorce
do the threading after DOM is done.
Yeah, it does not actually process/simplify the blocks copied by threading.
In fact I think it only registers jump threading opportunities during the DOM
walk and commits them only later. But it of course uses its avail / copies
stack to find them - that part you cannot easily divorce.
Yes, all the threaders register paths ahead of time and only do the
actual CFG shuffling at the end with thread_through_all_blocks().
DOM is also yet another value-numbering framework - one could think
of stripping it down from that side, keeping the threading bits only
(but you'd still have the avail / copies bits).
Ughh, yeah. I've seen some random missed cases because of the
value-numbering it does :-/. It's very frustrating that we have various
ad-hoc VN methods.
That said, it has one nice property it can leverage due to its incredibly
simple memory redundancy handling, in that it usually performs way less
alias queries than FRE (even when you run the latter in non-iterative mode).
But the same way DOM can register jump threading opportunities FRE
could do as well.
My plan for DOM/threading for this release is to replace its evrp
analyzer with a path solver (path_range_query). We can still use the
avail/copies, and everything else should just work. I'm hand waiving on
a few missed cases due to IL changing, but last time I checked we got
way more in return. Andrew even thought we should get most things even
in the presence of changing IL, but I haven't distilled testcases for him.
For VRP/threading, the same hybrid thing, but replacing the
ASSERT_EXPRs, and the avail/copies with a path solver. There are no
floats here. Things are looking good in my tests.
Once we do floats, if we could merge the EDGE_*_THREAD and the
cost/profit bits between both threaders, we should be able to replace
the entire forward threader with a backward client. Hopefully I don't
run out of steam by then ;-).
Thanks for your insight on DOM. It's very helpful.
Aldy