Hello,
Le 19/08/2024 à 21:44, Harald Anlauf a écrit :
Hi Mikael,
apart from patch #04/10, which did not apply cleanly here, I was
able to test your patch. It seems to work with a manual workaround
(-fno-frontend-optimize) to work around this problem.
Might be a local issue...
Huh? That's unexpected, patches were rebased before submitting, and I
don't think there was any recent activity in that area of the compiler
anyway.
That said, it works as advertised. Thanks for separating out the
IEEE-NaN tests.
What I did not fully get is the way you deal with -finline-intrinsics= .
Currently there are only MINLOC and MAXLOC, but in the future there
could be many more intrinsics. Given that you need 2 bits per
intrinsic in flag_inline_intrinsics, how future-proof is that?
Well, I don't expect that many candidate intrinsics for the flags;
currently SUM and PRODUCT could be added, and probably IALL, IANY and
IPARITY as well. Remember that having both a libgfortran and a frontend
implementation is a prerequisite.
For the future, 2 bits gives room to 16 intrinsics, and if we extend to
64 bits, to 32 intrinsics without much hassle. Having only 1 bit per
intrinsic would be certainly more future proof but I haven't found how
to do it. Zero bit (aka no per-intrinsic setting) would be even more
future-proof, but less convenient. As a last resort possibility the
.opt framework gives the possibility to accumulate options in a vector
and have them processed "by hand". It seemed more convenient to me to
just use 2 bits per intrinsic, but we can fall back to manual processing
if we get out of bits at some point.
Using only 1 bit per intrinsic was what I tried first, but then the
default value has to be set before processing the flags, which means no
default value depending on optimization, as optimization level is not
known at that time. I tried using EnabledBy(O || Ofast || Og) to set
default value dependending on optimization, but this doesn't work either
because the "O" covers all levels (-O0, -O1, -O2, -O3) without
distinction. And finally EnabledBy(O1 || O2 || O3 || Ofast || Og) is
not accepted because "O1", "O2" and "O3" are not valid option names.
I can certainly drop the per-intrinsic setting to have unlimited room
for future intrinsics. Or if you have ideas on how to only use one bit
per intrinsic, I'm all ears. Or maybe the testcases are too cumbersome,
and with a slight modification of behaviour we can save one bit (for
example, also generate inline code for -O0). What do you think?
In the documentation, you have:
+The set of intrinsics permitting the choice of implementation variant
through
+@code{-finline-intrinsics} is currently limited to non-scalar
@code{MAXLOC} and
+@code{MINLOC}.
Why do you say "non-scalar"? The new inlining is done for these
intrinsics when the DIM argument is absent. The result characteristics
however is:
"If DIM does not appear, the result is an array of rank one and of
size equal to the rank of ARRAY; ..."
and I thought the implementation does just that and does that right.
(With DIM present, the result is an array of rank rank(arg)-1.)
Can you clarify the wording in a way that is better understandable?
Yeah, these patches are all about non-scalar MINLOC/MAXLOC. But there
is also the scalar MINLOC/MAXLOC case which has pre-existing inline code
support (and on which these patches are based). The scalar case (aka
with DIM present, and ARRAY of rank 1) is always inlined, so the flag
has no effect on it.
Does this sound better?:
The set of intrinsics allowed as argument to @code{-finline-intrinsics=}
is currently limited to @code{MAXLOC} and @code{MINLOC}. The effect of
the flag is moreover limited to calls of those intrinsics without
@code{DIM} argument and with @code{ARRAY} of a non-@code{CHARACTER} type.
Otherwise the Fortran parts look fine to me.
Thanks for the review.
For the changes to gcc/flag-types.h you might need an OK from the
gcc maintainers.
Thanks,
Harald