On Tue, Oct 10, 2023 at 2:43 PM Joern Rennecke
<joern.renne...@embecosm.com> wrote:
>
> I'm working on implementing hardware loops for the CORE-V CV32E40P
> https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/latest/corev_hw_loop.html
>
> This core supports nested hardware loops, but does not allow any other flow
> control inside hardware loops.  I found that our existing interfaces do not
> allow sufficient control over when to emit doloop patterns, i.e. allowing
> nested doloops while rejecting other flow control inside the loop.
>
> TARGET_CAN_USE_DOLOOP_P does not get passed anything to look at the
> individual loop.  Most convenient would be the loop structure, although
> that would cause tight coupling of the target port with the internal data
> structures of the loop optimizers.

I don't think this would really be an issue; the loop structure is really
part of the CFG structure nowadays.

> OTOH we already have a precedent with TARGET_PREDICT_DOLOOP_P.
>
> TARGET_INVALID_WITHIN_DOLOOP is missing context.  We neither know the loop
> nesting depth, nor if any jump instruction under consideration is the final
> branch to jump back to the loop latch.  Actually, the second part is the
> main problem for the CV32E40P: inner doloops that have been transformed
> can be recognized as such, but un-transformed condjumps could either be
> spaghetti code inside the loop or the final jump instruction of the loop.
>
> The doloop_end pattern is also missing context to make meaningful decisions.
> Although we know the label where the pattern is supposed to jump to,
> we don't know where the original branch is.  Even if we scan the insn
> stream, this is ambiguous, since there can be two (or more) nested doloop
> candidates.
> What we could do here is add optional arguments; there is precedent, e.g.
> for the call pattern.  The advantage of this approach is that ports that
> are fine with the current interface need not be patched.
> To make it possible to scrutinize the control flow of the loop, the branch
> at the end of the loop makes a good optional argument.
>
> There is also the issue that loop setup is a bit more costly for large loops,
> and it would be nice to weigh that against the iteration count.  We had
> information about the iteration count at TARGET_CAN_USE_DOLOOP_P, but
> nothing to allow us to analyze the loop body.  Although the port could
> stash away the iteration count into a global variable or machine_function
> member, it would be more straightforward and robust to pass the information
> together so that it can be considered in context.
>
> Attached is a patch for an optional 3rd parameter to doloop_end.
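
For illustration only, here is a minimal sketch of the kind of decision the
quoted text describes: accept nested, already-transformed doloops, reject any
other conditional jump except the loop's own final branch, and weigh setup
cost against the iteration count.  This is a toy model, not GCC code: the
InsnKind enum, body_ok_for_hwloop, worth_hwloop and the cost parameters are
all hypothetical names made up for the example, and the cost comparison is an
assumed heuristic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds; a toy stand-in, not GCC's rtx codes.
enum class InsnKind { Plain, CondJump, InnerDoloopEnd };

// Returns true if the loop body fits the rule described above: nested
// (already transformed) doloops are allowed, but the only untransformed
// conditional jump permitted is the loop's own final branch, identified
// here by its index in the body.
bool body_ok_for_hwloop(const std::vector<InsnKind> &body,
                        std::size_t final_branch)
{
  assert(final_branch < body.size());
  for (std::size_t i = 0; i < body.size(); ++i)
    if (body[i] == InsnKind::CondJump && i != final_branch)
      return false;  // other flow control inside the loop
  return true;
}

// Assumed heuristic: the hardware loop pays off only if the total
// per-iteration saving exceeds the one-time setup cost.
bool worth_hwloop(unsigned long iterations, unsigned long setup_cost,
                  unsigned long saving_per_iter)
{
  return iterations * saving_per_iter > setup_cost;
}
```

The point of passing the final branch alongside the body is that the same
conditional jump is acceptable as the loop-closing branch and unacceptable
anywhere else, which is exactly the context the quoted text says is missing.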
