Hi Prathamesh!

On 11.07.24 at 12:16, Prathamesh Kulkarni wrote:


-----Original Message-----
From: Harald Anlauf <anl...@gmx.de>
Sent: Thursday, July 11, 2024 12:53 AM
To: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-patc...@gcc.gnu.org; fort...@gcc.gnu.org
Subject: Re: Lower zeroing array assignment to memset for allocatable arrays



Hi Prathamesh,

On 10.07.24 at 13:22, Prathamesh Kulkarni wrote:
Hi,
The attached patch lowers zeroing array assignment to memset for
allocatable arrays.

For example:
subroutine test(z, n)
      implicit none
      integer :: n
      real(4), allocatable :: z(:,:,:)

      allocate(z(n, 8192, 2048))
      z = 0
end subroutine

results in the following call to memset instead of three nested loops for z = 0:
      (void) __builtin_memset ((void *) z->data, 0, (unsigned long)
        ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
           * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
           * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
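For readers less familiar with the tree dump: the size argument is the product of the three extents times the element size (4 bytes for real(4)), and MAX_EXPR <ub - lb, -1> + 1 clamps the extent of an empty dimension to 0, so a zero-size array yields a zero-byte memset. A minimal C sketch of the same arithmetic follows. It assumes a hypothetical, simplified descriptor (struct desc3, extent, zero_bytes and zero_contiguous are illustrative names, not gfortran's) and assumes the data is contiguous:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified rank-3 descriptor, loosely modeled on the
   fields visible in the tree dump (data, dim[i].lbound, dim[i].ubound).
   gfortran's real descriptor carries more fields. */
struct dim3 { long lbound, ubound; };
struct desc3 { void *data; struct dim3 dim[3]; };

/* Extent of one dimension: MAX_EXPR <ub - lb, -1> + 1, i.e. never < 0. */
static long extent(const struct dim3 *d)
{
  long diff = d->ubound - d->lbound;
  return (diff > -1 ? diff : -1) + 1;
}

/* Byte count handed to memset: product of extents times element size. */
static size_t zero_bytes(const struct desc3 *a, size_t elem_size)
{
  return (size_t) (extent(&a->dim[0]) * extent(&a->dim[1])
                   * extent(&a->dim[2])) * elem_size;
}

/* Equivalent of "z = 0", valid only if the array is contiguous. */
static void zero_contiguous(struct desc3 *a, size_t elem_size)
{
  memset(a->data, 0, zero_bytes(a, elem_size));
}
```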

The patch significantly improves the performance of an internal Fortran
application on AArch64 with -mcpu=grace (and potentially on other AArch64
cores too).
Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit?

No, it is NOT OK.

Consider:

subroutine test0 (n, z)
    implicit none
    integer :: n
    real, pointer :: z(:,:,:)     ! need not be contiguous!
    z = 0
end subroutine

After your patch, this also generates a memset, which is not correct
in general.  One would need a test on the contiguity of the array
before memset can be used.
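A minimal C sketch of what such a contiguity test could look like if it were done at run time. It assumes a hypothetical strided descriptor (struct sdesc3, with strides counted in elements and data pointing at the first element; all names are illustrative, not gfortran's). It only shows why a strided pointer target, e.g. one associated with parent(1:4:2,:,:), must fall back to loops:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strided rank-3 descriptor; strides are in elements. */
struct sdim { long lbound, ubound, stride; };
struct sdesc3 { float *data; struct sdim dim[3]; };

static long sextent(const struct sdim *d)
{
  long diff = d->ubound - d->lbound;
  return (diff > -1 ? diff : -1) + 1;
}

/* True if the elements form one gap-free block in column-major order. */
static bool is_contiguous(const struct sdesc3 *a)
{
  long expected = 1;
  for (int i = 0; i < 3; i++)
    {
      long n = sextent(&a->dim[i]);
      if (n == 0)
        return true;               /* empty array: nothing to zero */
      if (n > 1 && a->dim[i].stride != expected)
        return false;              /* gap between consecutive elements */
      expected *= n;
    }
  return true;
}

/* "z = 0": memset only when the check allows it, else nested loops. */
static void zero_array(struct sdesc3 *a)
{
  long n0 = sextent(&a->dim[0]), n1 = sextent(&a->dim[1]),
       n2 = sextent(&a->dim[2]);
  if (is_contiguous(a))
    {
      memset(a->data, 0, (size_t) (n0 * n1 * n2) * sizeof(float));
      return;
    }
  for (long k = 0; k < n2; k++)
    for (long j = 0; j < n1; j++)
      for (long i = 0; i < n0; i++)
        a->data[i * a->dim[0].stride + j * a->dim[1].stride
                + k * a->dim[2].stride] = 0.0f;
}
```

A memset over the strided view would also clobber the odd-numbered elements of the parent that the view does not cover; the loop fallback leaves them alone.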

In principle this is a nice idea, and IIRC there exists a very old PR
on this (by Thomas König?).  So it might be worth pursuing.

Hi Harald,
Thanks for the suggestions!
The attached patch checks gfc_is_simply_contiguous(expr, true, false) before
lowering to memset, which avoids generating memset for your example above.

This is much better, as it avoids generating memsets where they
would be wrong.  However, you now miss cases where the array is a
component reference, as in:

subroutine test_dt (dt)
  implicit none
  type t
     real, allocatable         :: x(:,:,:)     ! contiguous!
     real, pointer, contiguous :: y(:,:,:)     ! contiguous!
     real, pointer             :: z(:,:,:)     ! need not be contiguous!
  end type t
  type(t) :: dt
  dt% x = 0  ! memset possible!
  dt% y = 0  ! memset possible!
  dt% z = 0  ! memset NOT possible!
end subroutine

You'll need to cycle through the component references and
apply the check for contiguity to the ultimate component,
not the top level.
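A minimal C sketch of that idea, using a hypothetical, simplified model of a reference chain (struct ref, struct expr, ultimate_attrs and memset_ok are illustrative only and are not gfortran's gfc_ref/gfc_expr API). It only models component references; array subscripts or sections along the chain, such as dt(:)%x, would need additional checks:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a designator such as dt%x: a base symbol plus
   a linked list of component references. NOT gfortran's data types. */
struct attrs { bool allocatable, pointer, contiguous; };
struct ref { struct attrs comp; const struct ref *next; };
struct expr { struct attrs sym; const struct ref *refs; };

/* Attributes that decide contiguity: those of the ultimate component,
   or of the symbol itself when there are no component references. */
static struct attrs ultimate_attrs(const struct expr *e)
{
  struct attrs a = e->sym;
  for (const struct ref *r = e->refs; r != NULL; r = r->next)
    a = r->comp;
  return a;
}

/* memset is only safe for allocatable or CONTIGUOUS pointer arrays;
   a plain pointer may be associated with a strided section.
   (Other array kinds are conservatively rejected in this sketch.) */
static bool memset_ok(const struct expr *e)
{
  struct attrs a = ultimate_attrs(e);
  if (a.allocatable)
    return true;
  if (a.pointer)
    return a.contiguous;
  return false;
}
```

With this model, dt%x and dt%y qualify and dt%z does not, matching the three cases in the example; checking only the top-level symbol dt (neither allocatable nor pointer) would miss the first two.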

Can you have another look?

Thanks,
Harald

Bootstrapped+tested on aarch64-linux-gnu.
Does the attached patch look OK?

Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>

Thanks,
Prathamesh



