On Tue, Oct 10, 2023 at 2:43 PM Joern Rennecke <joern.renne...@embecosm.com> wrote:
>
> I'm working on implementing hardware loops for the CORE-V CV32E40P
> https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest/corev_hw_loop.html
>
> This core supports nested hardware loops, but does not allow any other flow
> control inside hardware loops.  I found that our existing interfaces do not
> allow sufficient control over when to emit doloop patterns, i.e. allowing
> nested doloops while rejecting other flow control inside the loop.
>
> TARGET_CAN_USE_DOLOOP_P does not get passed anything to look at the
> individual loop.  Most convenient would be the loop structure, although
> that would cause tight coupling of the target port with the internal data
> structures of the loop optimizers.

I don't think this would really be an issue; the loop structure is really
part of the CFG structure nowadays.

> OTOH we already have a precedent with TARGET_PREDICT_DOLOOP_P .
>
> TARGET_INVALID_WITHIN_DOLOOP is missing context.  We neither know the loop
> nesting depth, nor if any jump instruction under consideration is the final
> branch to jump back to the loop latch.  Actually, the second part is the
> main problem for the CV32E40P: inner doloops that have been transformed
> can be recognized as such, but un-transformed condjumps could either be
> spaghetti code inside the loop or the final jump instruction of the loop.
>
> The doloop_end pattern is also missing context to make meaningful decisions.
> Although we know the label where the pattern is supposed to jump to,
> we don't know where the original branch is.  Even if we scan the insn
> stream, this is ambiguous, since there can be two (or more) nested doloop
> candidates.
> What we could do here is add optional arguments; there is precedent, e.g.
> for the call pattern.  The advantage of this approach is that ports that
> are fine with the current interface need not be patched.
> To make it possible to scrutinize the control flow of the loop, the branch
> at the end of the loop makes a good optional argument.
>
> There is also the issue that loop setup is a bit more costly for large loops,
> and it would be nice to weigh that against the iteration count.  We had
> information about the iteration count at TARGET_CAN_USE_DOLOOP_P, but
> nothing to allow us to analyze the loop body.  Although the port could
> stash away the iteration count into a global variable or machine_function
> member, it would be more straightforward and robust to pass the information
> together so that it can be considered in context.
>
> Attached is a patch for an optional 3rd parameter to doloop_end .
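
To illustrate why knowing the final branch matters, here is a rough toy sketch in C.  It is not GCC code: the toy_insn type, the toy_insn_kind values and toy_doloop_ok are all hypothetical names made up for this example, and the loop body is modelled as a flat array rather than an insn stream.  It only shows the decision described above: an inner doloop that has already been transformed is accepted, the loop's own back-edge branch is accepted, and any other conditional jump causes rejection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified insn kinds; these are not GCC rtx codes.  */
enum toy_insn_kind
{
  TOY_PLAIN,            /* Any insn without flow control.  */
  TOY_COND_JUMP,        /* An un-transformed conditional jump.  */
  TOY_INNER_DOLOOP_END  /* End of an inner loop already made a doloop.  */
};

struct toy_insn
{
  enum toy_insn_kind kind;
};

/* Return true if the loop body INSNS[0..N-1] may become a hardware loop.
   END_BRANCH is the index of the loop's own final branch, standing in for
   the information an optional doloop_end operand could supply.  */
bool
toy_doloop_ok (const struct toy_insn *insns, size_t n, size_t end_branch)
{
  assert (n == 0 || insns != NULL);

  /* The claimed final branch must exist and must be a conditional jump.  */
  if (end_branch >= n || insns[end_branch].kind != TOY_COND_JUMP)
    return false;

  for (size_t i = 0; i < n; i++)
    {
      /* Without END_BRANCH we could not tell the back-edge branch apart
         from spaghetti code inside the loop.  */
      if (insns[i].kind == TOY_COND_JUMP && i != end_branch)
        return false;
    }

  /* Only plain insns, inner doloops and the final branch remain.  */
  return true;
}
```

With a body of { plain, inner doloop end, cond jump } and end_branch pointing at the last element this returns true; with an extra conditional jump earlier in the body it returns false.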