https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
Bug ID: 101053
Summary: Incorrect code at -O1 on arm64
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: gilles.gouaillardet at gmail dot com
Target Milestone: ---
Created attachment 51003
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51003&action=edit
A simple reproducer
This issue was initially reported at
https://github.com/numpy/numpy/issues/18422
Bottom line: since the gcc-9 series(!), gfortran generates incorrect code for
OpenBLAS at -O1 and above on arm64.
Here is how to reproduce the issue:
# set the local prefix (to be customized)
prefix=...
# Download OpenBLAS
wget https://github.com/xianyi/OpenBLAS/releases/download/v0.3.15/OpenBLAS-0.3.15.tar.gz
# Build and install OpenBLAS
tar xfz OpenBLAS-0.3.15.tar.gz
cd OpenBLAS-0.3.15/
make -j 56 libs netlib shared BINARY='64' CC='gcc' FC='gfortran' \
    MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' COMMON_OPT="-g -O1"
make install PREFIX=$prefix
cd ..
# Build and execute the attached reproducer
gfortran dgehd2.f90 -o dgehd2 -L$prefix/lib -Wl,-rpath,$prefix/lib -lopenblas
./dgehd2
Expected result (obtained with gfortran 8.3.1 (from rhel8) and 8.5.0, or if
OpenBLAS is built with COMMON_OPT="-g -O0"):
INFO = 0
1.0000000000000000 -8.0622577482985491 0.58032253547122137
-3.5970073030870449 11.461538461538458 -3.6923076923076938
-0.24806946917841688 4.3076923076923075 2.5384615384615383
Current result (from gfortran 9.1.0 up to the gcc-12-20210606 snapshot):
INFO = 0
1.0000000000000000 -8.0622577482985491 0.58032253547122137
-Infinity NaN NaN
-Infinity NaN NaN
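To compare compiler versions without reading the output by eye, the run can be reduced to a pass/fail check. The following is only a minimal sketch: it assumes the reproducer prints non-finite values as "NaN" and "Infinity", as in the output above, and the helper name has_nonfinite is hypothetical (it is not part of the attachment).

```shell
# Hypothetical helper (not part of the attachment): succeeds if stdin
# contains NaN or Infinity. The match is case-sensitive on purpose, so
# that the "INFO = 0" line is not mistaken for "Inf".
has_nonfinite() {
    grep -Eq 'NaN|Infinity'
}

# Usage: classify the current OpenBLAS build, if the reproducer was built.
if [ -x ./dgehd2 ]; then
    if ./dgehd2 | has_nonfinite; then
        echo "BAD: non-finite values in the result"
    else
        echo "OK: result is finite"
    fi
fi
```

A check like this makes it easier to rebuild OpenBLAS with several gfortran versions or optimization levels and compare the outcomes.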
The faulty code is in the dgehd2 subroutine:
      PARAMETER ( ONE = 1.0D+0 )
      DO 10 I = ILO, IHI - 1
         CALL DLARFG( IHI-I, A( I+1, I ), A( MIN( I+2, N ), I ), 1,
     $                TAU( I ) )
         AII = A( I+1, I )
         A( I+1, I ) = ONE
         CALL DLARF( 'Right', IHI, IHI-I, A( I+1, I ), 1, TAU( I ),
     $               A( 1, I+1 ), LDA, WORK )
         CALL DLARF( 'Left', IHI-I, N-I, A( I+1, I ), 1, TAU( I ),
     $               A( I+1, I+1 ), LDA, WORK )
         A( I+1, I ) = AII
   10 CONTINUE
The problem shows up at the following line:
A( I+1, I ) = ONE
Here is a snippet of the corresponding assembly (generated with gfortran 10.3.0):
.LBE9:
.loc 1 206 72 view .LVU34
fmov d9, 1.0e+0
.LBB10:
.loc 1 211 72 view .LVU35
adrp x0, .LC1
add x0, x0, :lo12:.LC1
str x0, [sp, 192]
.LBE10:
.LBB11:
.loc 1 216 72 view .LVU36
adrp x0, .LC2
add x0, x0, :lo12:.LC2
str x0, [sp, 200]
.LVL20:
.L7:
.loc 1 216 72 is_stmt 0 view .LVU37
.LBE11:
.LBB12:
.loc 1 204 72 is_stmt 1 view .LVU38
ldr w0, [x22]
sub w0, w0, w20
str w0, [sp, 224]
add w0, w20, 2
ldr w2, [x26]
cmp w2, w0
csel w2, w2, w0, le
mov w24, w20
add w20, w20, 1
.LVL21:
.loc 1 204 72 is_stmt 0 view .LVU39
add x2, x23, x2, sxtw
mov x4, x21
mov x3, x25
ldr x0, [sp, 136]
add x2, x0, x2, lsl 3
mov x1, x19
ldr x0, [sp, 184]
bl dlarfg_
.LVL22:
.LBE12:
.loc 1 205 72 is_stmt 1 view .LVU40
ldr d8, [x19]
.LVL23:
.loc 1 206 72 view .LVU41
str d9, [x19]
The constant 1.0D+0 is loaded into $d9 once, before the loop label .L7, but
the register is only read **after** the call to dlarfg_, and it turns out
that this call modifies $d9.
At the point where $d9 is stored to [x19], its value is
(gdb) p $d9
$1 = ( f = inf, u = 9218868437227405312, s = 9218868437227405312 )
If I set a breakpoint at that instruction, and manually
(gdb) set $d9=1.0
then the program behaves as expected.
Bottom line, something goes wrong with gfortran 9 and later on arm64 at -O1 and above. Open questions:
- Did gfortran incorrectly assume $d9 will not be modified (or at least, will
be restored) by other subroutines?
- Did dlarfg_ forget to restore $d9?
- Something else?
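
On the first two questions: the AArch64 procedure call standard (AAPCS64) makes the low 64 bits of v8-v15, i.e. d8-d15, callee-saved. A caller is therefore allowed to keep a value in d9 across a call, and any code reached through dlarfg_ that writes d9 has to save and restore it. One way to narrow this down is to list, per function, the instructions that mention FP/SIMD register 9 and then check whether each such function also saves and restores d9. This is a rough sketch only: it assumes GNU objdump output, the helper name uses_reg9 is hypothetical, and the match is a textual heuristic rather than a real analysis.

```shell
# Hypothetical helper: read `objdump -d` output on stdin and print, for each
# function, the instructions that mention b9/h9/s9/d9/q9/v9. A function that
# writes d9 without saving and restoring it (for example with stp/ldp of
# d8, d9 on the stack) would not follow the calling convention.
uses_reg9() {
    awk '
        # Function header, e.g. "0000000000001000 <dlarfg_>:"
        /^[0-9a-f]+ <[^>]+>:$/ { fn = $2; next }
        # Pad the line so the register name can be matched at either end.
        (" " $0 " ") ~ /[^a-z0-9][bhsdqv]9[^a-z0-9]/ { print fn, $0 }
    '
}

# Usage (assumes the install prefix from the steps above):
#   objdump -d --no-show-raw-insn "$prefix/lib/libopenblas.so" | uses_reg9
```

The --no-show-raw-insn option keeps the instruction encodings out of the listing, so hex bytes cannot be mistaken for register names.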