Hi Prathamesh!
On 11.07.24 at 12:16, Prathamesh Kulkarni wrote:
-----Original Message-----
From: Harald Anlauf <anl...@gmx.de>
Sent: Thursday, July 11, 2024 12:53 AM
To: Prathamesh Kulkarni <prathame...@nvidia.com>;
gcc-patc...@gcc.gnu.org; fort...@gcc.gnu.org
Subject: Re: Lower zeroing array assignment to memset for allocatable
arrays
Hi Prathamesh,
On 10.07.24 at 13:22, Prathamesh Kulkarni wrote:
Hi,
The attached patch lowers zeroing array assignment to memset for
allocatable arrays.
For example:
subroutine test(z, n)
  implicit none
  integer :: n
  real(4), allocatable :: z(:,:,:)
  allocate(z(n, 8192, 2048))
  z = 0
end subroutine
results in the following call to memset instead of three nested loops
for z = 0:
(void) __builtin_memset ((void *) z->data, 0,
  (unsigned long) ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
                   * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
                   * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1))
                   * 4));
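For readers who find the dump hard to parse, the size argument is just the product of the three extents times the element size, with each extent clamped to zero for an empty dimension. A minimal C sketch of that arithmetic follows. The struct and function names are hypothetical and are not the actual array descriptor layout:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one dimension of an array descriptor;
   not the real descriptor layout.  */
struct dim_bounds
{
  long lbound;
  long ubound;
};

/* Extent of one dimension.  An empty dimension (ubound < lbound)
   yields 0, mirroring MAX_EXPR <ubound - lbound, -1> + 1.  */
long
dim_extent (struct dim_bounds d)
{
  long diff = d.ubound - d.lbound;
  return (diff > -1 ? diff : -1) + 1;
}

/* Number of bytes to clear: product of all extents times the
   element size.  */
size_t
zero_fill_bytes (const struct dim_bounds *dims, int rank, size_t elem_size)
{
  assert (rank > 0);
  long count = 1;
  for (int i = 0; i < rank; i++)
    count *= dim_extent (dims[i]);
  return (size_t) count * elem_size;
}
```

With n = 4 in the example, the bounds are 1:4, 1:8192 and 1:2048, so the call clears 4 * 8192 * 2048 * 4 bytes.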
The patch gives a significant speedup for an internal Fortran
application on AArch64 with -mcpu=grace (and potentially on other
AArch64 cores too).
Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit?
No, it is NOT OK.
Consider:
subroutine test0 (n, z)
  implicit none
  integer :: n
  real, pointer :: z(:,:,:) ! need not be contiguous!
  z = 0
end subroutine
After your patch this also generates a memset, but that is not valid
in general. The array would need to be tested for contiguity before
memset can be used.
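To make the contiguity point concrete, here is a minimal C sketch of a strided one-dimensional view, roughly what a pointer to an array section such as a(1:5:2) refers to. The type and function names are hypothetical, and this is not how the front end represents arrays. It only illustrates why an unconditional memset touches the wrong storage:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-dimensional view: `count` elements starting at
   `base`, `stride` elements apart.  stride == 1 means contiguous.  */
struct strided_view
{
  float *base;
  size_t count;
  size_t stride;
};

int
view_is_contiguous (const struct strided_view *v)
{
  return v->stride == 1 || v->count <= 1;
}

/* Zero exactly the elements of the view: one memset when the view
   is contiguous, an element-wise loop otherwise.  */
void
view_zero (const struct strided_view *v)
{
  if (view_is_contiguous (v))
    memset (v->base, 0, v->count * sizeof (float));
  else
    for (size_t i = 0; i < v->count; i++)
      v->base[i * v->stride] = 0.0f;
}

/* What an unchecked lowering would do: clear the first `count`
   storage elements, ignoring the stride.  */
void
view_zero_unchecked (const struct strided_view *v)
{
  memset (v->base, 0, v->count * sizeof (float));
}
```

For a view of three elements with stride 2 over six floats, view_zero clears elements 0, 2 and 4 and leaves the rest alone, while view_zero_unchecked clears elements 0, 1 and 2: it destroys element 1 and never reaches element 4.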
In principle this is a nice idea, and IIRC there exists a very old PR
on this (by Thomas König?). So it might be worth pursuing.
Hi Harald,
Thanks for the suggestions!
The attached patch checks gfc_is_simply_contiguous(expr, true, false)
before lowering to memset, which avoids generating memset for your
example above.
This is much better, as it avoids generating invalid memsets.
However, you now miss cases where the array is a component
reference, as in:
subroutine test_dt (dt)
  implicit none
  type t
    real, allocatable :: x(:,:,:) ! contiguous!
    real, pointer, contiguous :: y(:,:,:) ! contiguous!
    real, pointer :: z(:,:,:) ! need not be contiguous!
  end type t
  type(t) :: dt
  dt% x = 0 ! memset possible!
  dt% y = 0 ! memset possible!
  dt% z = 0 ! memset NOT possible!
end subroutine
You'll need to cycle through the component references and
apply the check for contiguity to the ultimate component,
not the top level.
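As a rough illustration of that walk, here is a toy model in C. The types and names are hypothetical and far simpler than the front end's real reference chain. The sketch covers only whole-array assignments through a scalar parent, and conservatively reports false for anything that is neither allocatable nor a contiguous pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical attribute set, not the real one.  */
struct sym_attr
{
  bool allocatable;
  bool pointer;
  bool contiguous;
};

/* One component reference in a chain such as dt%x.  */
struct comp_ref
{
  struct sym_attr attr;        /* attributes of this component */
  const struct comp_ref *next; /* next component, or NULL */
};

/* Attributes that decide contiguity: those of the last component
   in the chain, or of the base symbol if the chain is empty.  */
struct sym_attr
ultimate_attr (struct sym_attr base, const struct comp_ref *refs)
{
  struct sym_attr attr = base;
  for (const struct comp_ref *r = refs; r != NULL; r = r->next)
    attr = r->attr;
  return attr;
}

/* True if a whole-array zero assignment to the ultimate component
   may be lowered to memset in this simplified model.  */
bool
whole_array_memset_ok (struct sym_attr base, const struct comp_ref *refs)
{
  struct sym_attr a = ultimate_attr (base, refs);
  return a.allocatable || (a.pointer && a.contiguous);
}
```

Applied to the example, the model accepts dt%x (allocatable) and dt%y (contiguous pointer) and rejects dt%z (plain pointer), even though dt itself has none of these attributes.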
Can you have another look?
Thanks,
Harald
Bootstrapped+tested on aarch64-linux-gnu.
Does the attached patch look OK?
Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>
Thanks,
Prathamesh