Hi Prathamesh,
On 10.07.24 at 13:22, Prathamesh Kulkarni wrote:
Hi,
The attached patch lowers zeroing array assignments to memset for allocatable
arrays.
For example:
subroutine test(z, n)
implicit none
integer :: n
real(4), allocatable :: z(:,:,:)
allocate(z(n, 8192, 2048))
z = 0
end subroutine
results in the following call to memset instead of 3 nested loops for z = 0:
(void) __builtin_memset ((void *) z->data, 0,
  (unsigned long) ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
                     * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
                     * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
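A minimal C sketch of what the size argument above computes, assuming a simplified, hypothetical `struct dim_bounds` that holds only the two bounds used by the dump (it is not gfortran's real descriptor type). Each `MAX_EXPR <ubound - lbound, -1> + 1` factor is one dimension's extent, clamped to 0 for an empty dimension, so a zero-sized array yields a zero-byte memset:

```c
/* Hypothetical stand-in for the bounds of one descriptor dimension.
   Not the real gfortran descriptor layout. */
struct dim_bounds {
    long lbound;
    long ubound;
};

/* Mirrors MAX_EXPR <ubound - lbound, -1> + 1: the extent of one
   dimension, which is 0 whenever ubound < lbound. */
static long extent(struct dim_bounds d)
{
    long diff = d.ubound - d.lbound;
    return (diff > -1 ? diff : -1) + 1;
}

/* Byte count passed to memset: the product of all extents times the
   element size (4 for real(4)). */
static unsigned long memset_bytes(const struct dim_bounds dims[], int rank,
                                  unsigned long elem_size)
{
    unsigned long n = 1;
    for (int i = 0; i < rank; i++)
        n *= (unsigned long) extent(dims[i]);
    return n * elem_size;
}
```

For z(3, 8192, 2048) with default lower bounds this gives 3 * 8192 * 2048 * 4 bytes. With n = 0 the first extent is 0, so nothing is written.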
The patch gives a significant speedup for an internal Fortran application on
AArch64 with -mcpu=grace (and potentially on other AArch64 cores too).
Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit?
No, it is NOT ok.
Consider:
subroutine test0 (n, z)
implicit none
integer :: n
real, pointer :: z(:,:,:) ! need not be contiguous!
z = 0
end subroutine
With your patch this also generates a memset, but that cannot be correct
in general, since z need not be contiguous. One would need a test for
contiguity of the array before memset can be used.
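A minimal C sketch of such a guard, assuming a simplified, hypothetical descriptor (`struct array_desc`, with strides counted in elements and `data` pointing at the first element). It is not gfortran's real descriptor layout or its generated code. The memset path is taken only when every stride equals the product of the preceding extents; otherwise it falls back to a strided element-by-element walk, roughly what the existing nested loops do. Allocatable arrays are always contiguous, so for them the test could be omitted.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_RANK 7

/* Hypothetical, simplified descriptor: offset, span and dtype omitted. */
struct dim_info {
    long stride;   /* distance in elements between consecutive indices */
    long lbound;
    long ubound;
};

struct array_desc {
    float *data;   /* real(4) elements; points at the first element */
    int rank;
    struct dim_info dim[MAX_RANK];
};

/* Extent of one dimension, clamped to 0 when ubound < lbound. */
static long extent(const struct dim_info *d)
{
    long diff = d->ubound - d->lbound;
    return (diff > -1 ? diff : -1) + 1;
}

/* True if the elements form one dense column-major block. An empty array
   counts as contiguous; dimensions of extent 1 impose no constraint. */
static bool is_contiguous(const struct array_desc *a)
{
    for (int i = 0; i < a->rank; i++)
        if (extent(&a->dim[i]) == 0)
            return true;

    long expected = 1;
    for (int i = 0; i < a->rank; i++) {
        long ext = extent(&a->dim[i]);
        if (ext > 1 && a->dim[i].stride != expected)
            return false;
        expected *= ext;
    }
    return true;
}

/* Zero every element: one memset if contiguous, else a strided walk. */
static void zero_array(struct array_desc *a)
{
    long total = 1;
    for (int i = 0; i < a->rank; i++)
        total *= extent(&a->dim[i]);
    if (total == 0)
        return;

    if (is_contiguous(a)) {
        memset(a->data, 0, (size_t) total * sizeof *a->data);
        return;
    }

    /* Odometer over the index space, first dimension varying fastest. */
    long idx[MAX_RANK] = {0};
    for (long n = 0; n < total; n++) {
        long off = 0;
        for (int i = 0; i < a->rank; i++)
            off += idx[i] * a->dim[i].stride;
        a->data[off] = 0.0f;
        for (int i = 0; i < a->rank; i++) {
            if (++idx[i] < extent(&a->dim[i]))
                break;
            idx[i] = 0;
        }
    }
}
```

For a pointer to the section p(1:4:2, :) of a 4x3 array, the first stride is 2, so `is_contiguous` returns false and only every other element of the parent is cleared, whereas a plain memset would also overwrite the elements in between.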
In principle this is a nice idea, and IIRC there is a very
old PR about this (by Thomas König?), so it might be worth
pursuing.
Thanks,
Harald
Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>
Thanks,
Prathamesh