> -----Original Message----- > From: Harald Anlauf <anl...@gmx.de> > Sent: Friday, July 12, 2024 1:52 AM > To: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc- > patc...@gcc.gnu.org; fortran@gcc.gnu.org > Subject: Re: Lower zeroing array assignment to memset for allocatable > arrays > > External email: Use caution opening links or attachments > > > Hi Prathamesh! Hi Harald, > > Am 11.07.24 um 12:16 schrieb Prathamesh Kulkarni: > > > > > >> -----Original Message----- > >> From: Harald Anlauf <anl...@gmx.de> > >> Sent: Thursday, July 11, 2024 12:53 AM > >> To: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc- > >> patc...@gcc.gnu.org; fortran@gcc.gnu.org > >> Subject: Re: Lower zeroing array assignment to memset for > allocatable > >> arrays > >> > >> External email: Use caution opening links or attachments > >> > >> > >> Hi Prathamesh, > >> > >> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni: > >>> Hi, > >>> The attached patch lowers zeroing array assignment to memset for > >> allocatable arrays. > >>> > >>> For example: > >>> subroutine test(z, n) > >>> implicit none > >>> integer :: n > >>> real(4), allocatable :: z(:,:,:) > >>> > >>> allocate(z(n, 8192, 2048)) > >>> z = 0 > >>> end subroutine > >>> > >>> results in following call to memset instead of 3 nested loops for > z > >> = 0: > >>> (void) __builtin_memset ((void *) z->data, 0, (unsigned > long) > >>> ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1) * > >>> (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1)) * > >> (MAX_EXPR > >>> <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4)); > >>> > >>> The patch significantly improves speedup for an internal Fortran > >> application on AArch64 -mcpu=grace (and potentially on other > AArch64 > >> cores too). > >>> Bootstrapped+tested on aarch64-linux-gnu. > >>> Does the patch look OK to commit ? > >> > >> no, it is NOT ok. > >> > >> Consider: > >> > >> subroutine test0 (n, z) > >> implicit none > >> integer :: n > >> real, pointer :: z(:,:,:) ! need not be contiguous! > >> z = 0 > >> end subroutine > >> > >> After your patch this also generates a memset, but this cannot be > >> true in general. One would need to have a test on contiguity of > the > >> array before memset can be used. > >> > >> In principle this is a nice idea, and IIRC there exists a very old > PR > >> on this (by Thomas König?). So it might be worth pursuing. > > Hi Harald, > > Thanks for the suggestions! > > The attached patch checks gfc_is_simply_contiguous(expr, true, > false) > > before lowering to memset, which avoids generating memset for your > example above. > > This is much better, as it avoids generating false memsets where it > should not. However, you now miss cases where the array is a > component reference, as in: > > subroutine test_dt (dt) > implicit none > type t > real, allocatable :: x(:,:,:) ! contiguous! > real, pointer, contiguous :: y(:,:,:) ! contiguous! > real, pointer :: z(:,:,:) ! need not be > contiguous! > end type t > type(t) :: dt > dt% x = 0 ! memset possible! > dt% y = 0 ! memset possible! > dt% z = 0 ! memset NOT possible! > end subroutine > > You'll need to cycle through the component references and apply the > check for contiguity to the ultimate component, not the top level. > > Can you have another look? Thanks for the review! It seems that component references are not currently handled even for static size arrays ? For eg: subroutine test_dt (dt, y) implicit none real :: y (10, 20, 30) type t real :: x(10, 20, 30) end type t type(t) :: dt y = 0 dt% x = 0 end subroutine
With trunk, it generates memset for 'y' but not for dt%x. That happens because copyable_array_p returns false for dt%x, because expr->ref->next is non NULL: /* First check it's an array. */ if (expr->rank < 1 || !expr->ref || expr->ref->next) return false; and gfc_full_array_ref_p(expr) bails out if expr->ref->type != REF_ARRAY. Looking thru git history, it seems both the checks were added in 18eaa2c0cd20 to fix PR33370. (Even after removing these checks, the previous patch bails out from gfc_trans_zero_assign because GFC_DESCRIPTOR_TYPE_P (type) returns false for component ref and ends up returning NULL_TREE) I am working on extending the patch to handle component refs for statically sized as well as allocatable arrays. Since it looks like a bigger change and an extension to current functionality, will it be OK to commit the previous patch as-is (if it looks correct) and address component refs in follow up one ? Thanks, Prathamesh > > Thanks, > Harald > > > Bootstrapped+tested on aarch64-linux-gnu. > > Does the attached patch look OK ? > > > > Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> > > > > Thanks, > > Prathamesh > >> > >> Thanks, > >> Harald > >> > >> > >>> Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> > >>> > >>> Thanks, > >>> Prathamesh > >