On 20.11.19 at 23:18, Janne Blomqvist wrote:
On Wed, Nov 20, 2019 at 11:35 PM Thomas König <t...@tkoenig.net> wrote:
On 20.11.19 at 21:45, Janne Blomqvist wrote:
BTW, since this is done for the purpose of optimization, have you tested
on a suitable benchmark suite such as Polyhedron whether it a) generates
any different code and b) makes it go faster?
I haven't run any actual benchmarks.
However, there is a simple example which shows its advantages.
Consider
subroutine foo(n,m)
  m = 0
  do 100 i=1,100
    call bar
    m = m + n
100 continue
end
(I used old-style DO loops just because :-)
Without the optimization, the inner loop is translated to
.L2:
        xorl    %eax, %eax
        call    bar_
        movl    (%r12), %eax
        addl    %eax, 0(%rbp)
        subl    $1, %ebx
        jne     .L2
and with the optimization to
.L2:
        xorl    %eax, %eax
        call    bar_
        addl    %r12d, 0(%rbp)
        subl    $1, %ebx
        jne     .L2
so the load of n through its address (movl (%r12), %eax) is gone. (Why
do we zero %eax before each call? It should not be a variadic call, right?)
Not sure. Maybe some belt-and-suspenders thing? I guess someone better
versed in ABI minutiae knows. It's not Fortran-specific, though; the
C frontend does the same when calling a void function.
OK, so considering your other e-mail, this is a separate issue that
we can fix another time.
AFAIK, on reasonably current out-of-order (OoO) CPUs, xor'ing a register
with itself is handled by the renamer and doesn't consume an execution
slot, so it's in effect a zero-cycle instruction. It still bloats the
code slightly, though.
Of course, Fortran language rules specify that the call to bar
cannot do anything to n
Hmm, does it? What about the following modification to your testcase:
module nmod
  integer :: n
end module nmod

subroutine foo(n,m)
  m = 0
  do 100 i=1,100
    call bar
    m = m + n
100 continue
end subroutine foo

subroutine bar()
  use nmod
  n = 0
end subroutine bar

program main
  use nmod
  implicit none
  integer :: m
  n = 1
  m = 0
  call foo(n, m)
  print *, m
end program main
That is not allowed:
# 15.5.2.13 Restrictions on entities associated with dummy arguments
[...]
# (3) Action that affects the value of the entity or any subobject of it
# shall be taken only through the dummy argument unless
[one of the listed exceptions applies, e.g. the dummy argument has the
POINTER or TARGET attribute; none of them applies here].
(So, copy-in/copy-out for variables where we cannot be sure that
no value is assigned? Does anybody see a downside to that?)
In principle it sounds good, unless my concerns above are real and
affect this case too.
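To make the copy-in/copy-out idea concrete, here is a hand-written, source-level sketch of what it would amount to for foo. This is only an illustration under the assumption that the transformation looks roughly like this; in the actual proposal the compiler would introduce the temporaries itself, and the names n_loc and m_loc are hypothetical. n is never assigned in foo, so it is only copied in; m is assigned, so it is copied in at entry and copied out once at exit. Assuming the optimizer then keeps m_loc in a register, the per-iteration update of 0(%rbp) seen in the assembly above would presumably no longer be needed.

```fortran
! Hand-written sketch of copy-in/copy-out for the scalar dummy
! arguments of foo.  n_loc and m_loc are hypothetical names; a
! compiler implementing the idea would create such temporaries itself.
subroutine foo(n, m)
  implicit none
  integer :: n, m          ! dummy arguments, as in the original example
  integer :: n_loc, m_loc  ! local copies; bar has no way to reach them
  integer :: i

  n_loc = n                ! copy in; n is never assigned, so no copy-out
  m_loc = m                ! copy in (overwritten next, mirroring m = 0)
  m_loc = 0
  do i = 1, 100
    call bar               ! may not legally change n or m (15.5.2.13)
    m_loc = m_loc + n_loc  ! both operands are locals, free to stay in registers
  end do
  m = m_loc                ! copy out, once, at procedure exit
end subroutine foo

subroutine bar()
  ! Stand-in for an arbitrary external procedure (hypothetical empty body).
end subroutine bar
```

Called with n = 1, foo returns m = 100 and leaves n unchanged, exactly as the original loop does.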
So, how to proceed? Commit the patch with the maximum length for a
mangled symbol, and then maybe try for the copy-out variant in a
follow-up patch?
I agree with Tobias that dealing with this in the middle end is probably
the right thing to do in the long run (especially since we could then
also handle arrays and structs). Until we get around to that (GCC 11 at
the earliest), we could still profit somewhat from this optimization.
Regards
Thomas