Hi Prathamesh,

On 10.07.24 at 13:22, Prathamesh Kulkarni wrote:
Hi,
The attached patch lowers zeroing array assignments to memset for allocatable
arrays.

For example:
subroutine test(z, n)
     implicit none
     integer :: n
     real(4), allocatable :: z(:,:,:)

     allocate(z(n, 8192, 2048))
     z = 0
end subroutine

results in the following call to memset instead of three nested loops for z = 0:
     (void) __builtin_memset ((void *) z->data, 0,
         (unsigned long) ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
                          * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
                          * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));

The patch significantly improves the performance of an internal Fortran
application on AArch64 with -mcpu=grace (and potentially on other AArch64 cores too).
Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit?

No, it is NOT OK.

Consider:

subroutine test0 (n, z)
  implicit none
  integer :: n
  real, pointer :: z(:,:,:)     ! need not be contiguous!
  z = 0
end subroutine

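With your patch this also generates a memset, but that is not valid
in general: z may point to a non-contiguous section such as a(::2,:,:),
and a memset over the computed size would then clear elements that z
does not refer to while leaving some of its own elements untouched.
One would need a test on the contiguity of the array before memset
can be used.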

In principle this is a nice idea, and IIRC there exists a very
old PR on this (by Thomas König?).  So it might be worth
pursuing.

Thanks,
Harald


Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>

Thanks,
Prathamesh
