Hello,
Le 06/08/2024 à 22:05, Harald Anlauf a écrit :
Hi Mikael,
thanks for this nice set of patches!
I've played around a bit, and it seems to look good.
I have only minor comments left (besides the nan issue raised):
- inline expansion is inhibited at -Os. But wouldn't it be good if
we make this expansion also dependent on -ffrontend-optimize?
(This was the case for rank-1 before your patch).
See my answer to Thomas' message.
- in the case where two sets of loop(nest)s are generated, i.e. for
non-integral argument x, I wonder if the predictors for conditionals
(-> _likely/_unlikely) are chosen optimally. E.g. for this code:
subroutine minloc_real (x, m, back)
implicit none
real, intent(in), contiguous :: x(:)
integer :: m(*)
logical, optional :: back
m(1:rank(x)) = minloc (x,back=back)
end
the first loop becomes:
S.10 = second_loop_entry.9 ? idx0.7 : 1;
while (1)
{
if (S.10 > D.4310) goto L.3;
if (__builtin_expect ((integer(kind=8)) ((*x.0)[S.10 * D.4314
+ D.4309] <= limit.8), 0, 8))
{
limit.8 = (*x.0)[S.10 * D.4314 + D.4309];
pos0.5 = S.10 + offset0.6;
idx0.7 = S.10;
second_loop_entry.9 = 1;
goto L.1;
}
S.10 = S.10 + 1;
}
This results from this code in trans-intrinsic.cc:
if (!lab1 || HONOR_NANS (DECL_MODE (limit)))
{
...
cond = gfc_unlikely (cond, PRED_BUILTIN_EXPECT);
ifbody = build3_v (COND_EXPR, cond, ifbody,
build_empty_stmt (input_location));
}
As the reason for this separate loop is finding a first non-nan value,
I would expect gfc_likely to be more reasonable for the common case.
I think it is unlikely in the sense that it will be true at most once,
because of the goto breaking out of the loop in the if body.
Your expectation is reasonable, I can't tell which is the better choice.
It seems also reasonable to generate code that is fast when there are
many NANs and is slow but without visible effect when there are none,
because there is a single slow iteration in that case.
(There is also the oddity S.10 = second_loop_entry.9 ? ..., where
idx0.7 seems to be not initialized, but luckily it seems to be
handled by the optimizer and seen that this is no problem.)
Yeah, I considered adding initializations, but finally decided to not
pollute the frontend code (already complicated enough) and the generated
code with useless stuff, that the optimizers would have to remove.
Having gfc_unlikely in the second set of loops is fine, as this passes
over all array elements.
Note that this is pre-existing without/before your patch, but since you
are at it, you may want to check.
Not sure how to check that. I need a (realistic) benchmark.
Otherwise this is fine for mainline with the said issues considered.
Thanks for the review.
Will send a second version once we have settled on the topic of the
frontend optimization flag.
Mikael