On 5/7/24 20:44, Toon Moene wrote:

On 5/7/24 20:35, Andrew Pinski wrote:

On Tue, May 7, 2024 at 11:31 AM Toon Moene <t...@moene.org> wrote:

On 5/7/24 00:02, Toon Moene wrote:

OK, perhaps on the aarch64 I need the following option to make the
comparison fair:

‘rdma’

      Enable Round Double Multiply Accumulate instructions. This is on by
default for -march=armv8.1-a.

I.e., -mno-rdma

(I hope that's correct - I'll try that when the Sun rises again and
I have some power to run the AArch64 machine ...).

Well, I did two independent runs with gfortran-13.2 and the following
options:

-O3 -march=armv8.1-a+rdma

and

-O3 -march=armv8.1-a+nordma

No difference in the number of error runs exceeding the prescribed
thresholds.

So, unless I made a mistake in the option specification (or the compiler
silently ignored them because they were not applicable to my machine -
ugh), the cause of the problem lies elsewhere.


AARCH64 armv8-a has FMA as part of its base ISA.
So you want to try with `-ffp-contract=off` instead.
RDMA turns on/off instructions which are not used by the
auto-vectorizer (yet) and are only reachable through their intrinsics
(if I read the code correctly).

Ah, thanks - I'll try that tomorrow.

Yep, that did it:


                        -->   LAPACK TESTING SUMMARY  <--
        Processing LAPACK Testing output found in the TESTING directory

SUMMARY               nb test run   numerical error      other error
================      ===========   =================    ================
REAL                  1327023       0  (0.000%)          0  (0.000%)
DOUBLE PRECISION      1327845       0  (0.000%)          0  (0.000%)
COMPLEX               786775        0  (0.000%)          0  (0.000%)
COMPLEX16             787842        0  (0.000%)          0  (0.000%)

--> ALL PRECISIONS    4229485       0  (0.000%)          0  (0.000%)

So, obviously, the threshold values for these tests were derived on a machine without fused multiply-add (FMA) instructions, or without using them if present.

This is perhaps not surprising, as the default build-and-test setup (make.inc.example) of the LAPACK package as distributed from netlib.org lists as the compiler choice:

FC = gfortran
FFLAGS = -O2 -frecursive
FFLAGS_DRV = $(FFLAGS)
FFLAGS_NOOPT = -O0 -frecursive

which means that the choice of architecture on x86-64 would be "generic", which doesn't include FMA instructions. If the authors used that setup to derive the thresholds, it is not surprising that you need -ffp-contract=off on architectures that include FMA instructions by default.
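
To illustrate why contraction can change results at all, here is a minimal Python sketch. It is not what gfortran emits and not part of the LAPACK test suite: it only emulates a fused multiply-add by evaluating a*b + c exactly with `fractions.Fraction` and rounding once, and compares that with the usual two-rounding evaluation. The function names and the input values are made up for this illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c with two roundings, as separate multiply and add do."""
    product = a * b      # first rounding to double precision
    return product + c   # second rounding


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: evaluate a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single, correctly rounded conversion


# Hypothetical inputs chosen so that a*b = 1 - 2**-54 exactly, which lies
# halfway between two doubles and rounds to 1.0 in the unfused product.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

if __name__ == "__main__":
    print("unfused:", unfused_multiply_add(a, b, c))
    print("fused:  ", fused_multiply_add(a, b, c))
```

With these inputs the unfused form gives exactly 0.0, while the emulated fused form keeps the residual -2**-54. Neither answer is "wrong"; they simply differ, and a test threshold tuned against one evaluation order may presumably be exceeded by the other.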

Thanks for helping me out with this !

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
