RE: Lower zeroing array assignment to memset for allocatable arrays

Prathamesh Kulkarni Fri, 12 Jul 2024 06:31:44 -0700


> -----Original Message-----
> From: Harald Anlauf <anl...@gmx.de>
> Sent: Friday, July 12, 2024 1:52 AM
> To: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-
> patc...@gcc.gnu.org; fortran@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Prathamesh!
Hi Harald,
> 
> Am 11.07.24 um 12:16 schrieb Prathamesh Kulkarni:
> >
> >
> >> -----Original Message-----
> >> From: Harald Anlauf <anl...@gmx.de>
> >> Sent: Thursday, July 11, 2024 12:53 AM
> >> To: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-
> >> patc...@gcc.gnu.org; fortran@gcc.gnu.org
> >> Subject: Re: Lower zeroing array assignment to memset for
> allocatable
> >> arrays
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> Hi Prathamesh,
> >>
> >> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni:
> >>> Hi,
> >>> The attached patch lowers zeroing array assignment to memset for
> >> allocatable arrays.
> >>>
> >>> For example:
> >>> subroutine test(z, n)
> >>>       implicit none
> >>>       integer :: n
> >>>       real(4), allocatable :: z(:,:,:)
> >>>
> >>>       allocate(z(n, 8192, 2048))
> >>>       z = 0
> >>> end subroutine
> >>>
> >>> results in following call to memset instead of 3 nested loops for
> z
> >> = 0:
> >>>       (void) __builtin_memset ((void *) z->data, 0, (unsigned
> long)
> >>> ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1) *
> >>> (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1)) *
> >> (MAX_EXPR
> >>> <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
> >>>
> >>> The patch significantly improves speedup for an internal Fortran
> >> application on AArch64 -mcpu=grace (and potentially on other
> AArch64
> >> cores too).
> >>> Bootstrapped+tested on aarch64-linux-gnu.
> >>> Does the patch look OK to commit ?
> >>
> >> no, it is NOT ok.
> >>
> >> Consider:
> >>
> >> subroutine test0 (n, z)
> >>     implicit none
> >>     integer :: n
> >>     real, pointer :: z(:,:,:)     ! need not be contiguous!
> >>     z = 0
> >> end subroutine
> >>
> >> After your patch this also generates a memset, but this cannot be
> >> true in general.  One would need to have a test on contiguity of
> the
> >> array before memset can be used.
> >>
> >> In principle this is a nice idea, and IIRC there exists a very old
> PR
> >> on this (by Thomas König?).  So it might be worth pursuing.
> > Hi Harald,
> > Thanks for the suggestions!
> > The attached patch checks gfc_is_simply_contiguous(expr, true,
> false)
> > before lowering to memset, which avoids generating memset for your
> example above.
> 
> This is much better, as it avoids generating false memsets where it
> should not.  However, you now miss cases where the array is a
> component reference, as in:
> 
> subroutine test_dt (dt)
>    implicit none
>    type t
>       real, allocatable         :: x(:,:,:)     ! contiguous!
>       real, pointer, contiguous :: y(:,:,:)     ! contiguous!
>       real, pointer             :: z(:,:,:)     ! need not be
> contiguous!
>    end type t
>    type(t) :: dt
>    dt% x = 0  ! memset possible!
>    dt% y = 0  ! memset possible!
>    dt% z = 0  ! memset NOT possible!
> end subroutine
> 
> You'll need to cycle through the component references and apply the
> check for contiguity to the ultimate component, not the top level.
> 
> Can you have another look?
Thanks for the review!
It seems that component references are not currently handled even for static 
size arrays ?
For eg:
subroutine test_dt (dt, y)
   implicit none
   real :: y (10, 20, 30)
   type t
      real :: x(10, 20, 30)
   end type t
   type(t) :: dt
   y = 0
   dt% x = 0
end subroutine


With trunk, it generates memset for 'y' but not for dt%x.
That happens because copyable_array_p returns false for dt%x,
because expr->ref->next is non NULL:

  /* First check it's an array.  */
  if (expr->rank < 1 || !expr->ref || expr->ref->next)
    return false;

and gfc_full_array_ref_p(expr) bails out if expr->ref->type != REF_ARRAY.
Looking thru git history, it seems both the checks were added in 18eaa2c0cd20 
to fix PR33370.
(Even after removing these checks, the previous patch bails out from 
gfc_trans_zero_assign because
GFC_DESCRIPTOR_TYPE_P (type) returns false for component ref and ends up 
returning NULL_TREE)
I am working on extending the patch to handle component refs for statically 
sized as well as allocatable arrays.

Since it looks like a bigger change and an extension to current functionality, 
will it be OK to commit the previous patch as-is (if it looks correct)
and address component refs in follow up one ?

Thanks,
Prathamesh                                                                      
 
> 
> Thanks,
> Harald
> 
> > Bootstrapped+tested on aarch64-linux-gnu.
> > Does the attached patch look OK ?
> >
> > Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Harald
> >>
> >>
> >>> Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>
> >>>
> >>> Thanks,
> >>> Prathamesh
> >

RE: Lower zeroing array assignment to memset for allocatable arrays

Reply via email to