Hi, the stup^H^H^H^Hhighly benevolent mailing list filter rejected my patch because it was too large...
Hello world,

now that the release of gcc 8 is really around the corner, it's time to
add some new features :-)

I had mostly completed this a few months ago, but by then we were too
deep into regression fixes only mode.

The attached patch implements the BACK argument for minloc and maxloc,
in all three parts - simplification, inline and library.

I didn't actually go backwards through the array with BACK set to true.
Instead, I used comparisons which included equality.

At first, I was a bit concerned that adding the argument would lead to
inefficient code. Here, it seems that adding gfc_unlikely to the
condition really helped. It would probably make sense to add this to
minval and maxval as well, but that is for another time.

With the short benchmark program

program main
  call x()
contains
  subroutine x()
    integer, dimension(5000,5000) :: a
    integer, dimension(5000*5000) :: b
    real, dimension(5000*5000) :: r
    character(len=100) :: line
    call random_number(r)
    b = r * 100000
    a = reshape(b,shape(a))
    call cpu_time (t1)
    write (unit=line,fmt=*) maxloc(a)
    call cpu_time (t2)
    print '(A25, F10.6)',"maxloc rank 2:", t2-t1
    call cpu_time (t1)
    write (unit=line,fmt=*) maxloc(b,dim=1)
    call cpu_time (t2)
    print '(A25,F10.6)',"maxloc inline:", t2-t1
    call cpu_time (t1)
    write (unit=line,fmt=*) sum(maxloc(a,dim=1))
    call cpu_time (t2)
    print '(A25, F10.6)',"maxloc rank 2, dim=1:", t2-t1
    call cpu_time (t1)
    write (unit=line,fmt=*) sum(maxloc(a,dim=2))
    call cpu_time (t2)
    print '(A25, F10.6)',"maxloc rank 2, dim=2:", t2-t1
  end subroutine x
end program main

I got with the patch on current trunk

$ gfortran -Ofast -march=native ibench.f90 && ./a.out
maxloc rank 2: 0.009175
maxloc inline: 0.007585
maxloc rank 2, dim=1: 0.014779
maxloc rank 2, dim=2: 0.038609

and with gfortran-7 (haven't built a gcc 8 yet)

$ /usr/bin/gfortran-7 -Ofast -march=native ibench.f90 && ./a.out
maxloc rank 2: 0.011980
maxloc inline: 0.013796
maxloc rank 2, dim=1: 0.021252
maxloc rank 2, dim=2: 0.037773

Regression-tested.
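To illustrate the idea of including equality in the comparison instead of walking the array backwards, here is a minimal C sketch for the rank-1 integer case. It is not taken from the patch: the function names maxloc_1d and minloc_1d are made up for illustration, positions are 1-based as in Fortran, and an empty array is assumed to give 0.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patch: 1-based position of the
   maximum of a[0..n-1], or 0 if the array is empty.
   back == 0 returns the first occurrence, back != 0 the last one. */
size_t maxloc_1d(const int *a, size_t n, int back)
{
    if (n == 0)
        return 0;

    size_t pos = 1;
    int best = a[0];

    for (size_t i = 1; i < n; i++) {
        /* With back set, a tie also replaces the current position,
           so the last occurrence wins in a single forward pass. */
        if (back ? (a[i] >= best) : (a[i] > best)) {
            best = a[i];
            pos = i + 1;
        }
    }
    return pos;
}

/* Same idea for the minimum: "<=" with back, "<" without. */
size_t minloc_1d(const int *a, size_t n, int back)
{
    if (n == 0)
        return 0;

    size_t pos = 1;
    int best = a[0];

    for (size_t i = 1; i < n; i++) {
        if (back ? (a[i] <= best) : (a[i] < best)) {
            best = a[i];
            pos = i + 1;
        }
    }
    return pos;
}
```

For the array [1,2,3,1,2,3] this gives 4 and 1 for minloc with and without back, and 6 and 3 for maxloc, which are the values the attached test cases expect for d1 to d4.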
OK for trunk? I'll hold off committing until gcc 8 is actually
released, just in case.

Regards

        Thomas

2018-05-01  Thomas Koenig  <tkoe...@gcc.gnu.org>

        PR fortran/54613
        * check.c (gfc_check_minmaxloc): Remove error for BACK not being
        implemented.  Use gfc_logical_4_kind for BACK.
        * simplify.c (min_max_choose): Add optional argument back_val.
        Handle it.
        (simplify_minmaxloc_to_scalar): Add argument back_val.  Pass
        back_val to min_max_choose.
        (simplify_minmaxloc_to_nodim): Likewise.
        (simplify_minmaxloc_to_array): Likewise.
        (gfc_simplify_minmaxloc): Add argument back, handle it.
        Pass back_val to specific simplification functions.
        (gfc_simplify_minloc): Remove ATTRIBUTE_UNUSED from argument back,
        pass it on to gfc_simplify_minmaxloc.
        (gfc_simplify_maxloc): Likewise.
        * trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc): Adjust
        comment.  If BACK is true, use greater or equal (or lesser or
        equal) instead of greater (or lesser).  Mark the condition of
        having found a value which exceeds the limit as unlikely.

2018-05-01  Thomas Koenig  <tkoe...@gcc.gnu.org>

        PR fortran/54613
        * m4/iforeach-s.m4: Remove assertion that back is zero.
        * m4/iforeach.m4: Likewise.  Remove leading 'do'
        before implementation start.
        * m4/ifunction-s.m4: Remove assertion that back is zero.
        * m4/ifunction.m4: Likewise.  Remove for loop if HAVE_BACK_ARG
        is defined.
        * m4/maxloc0.m4: Reorganize loops.  Split loops between >= and =,
        depending on whether back is true.  Mark the condition of having
        found a value which exceeds the limit as unlikely.
        * m4/minloc0.m4: Likewise.
        * m4/maxloc1.m4: Likewise.
        * m4/minloc1.m4: Likewise.
        * m4/maxloc1s.m4: Handle back argument.
        * m4/minloc1s.m4: Likewise.
        * m4/maxloc2s.m4: Remove assertion that back is zero.
        Remove special handling of loop start.  Handle back argument.
        * m4/minloc2s.m4: Likewise.
        * generated/iall_i1.c: Regenerated.
        * generated/iall_i16.c: Regenerated.
        * generated/iall_i2.c: Regenerated.
        * generated/iall_i4.c: Regenerated.
        * generated/iall_i8.c: Regenerated.
        * generated/iany_i1.c: Regenerated.
        * generated/iany_i16.c: Regenerated.
        * generated/iany_i2.c: Regenerated.
        * generated/iany_i4.c: Regenerated.
        * generated/iany_i8.c: Regenerated.
        * generated/iparity_i1.c: Regenerated.
        * generated/iparity_i16.c: Regenerated.
        * generated/iparity_i2.c: Regenerated.
        * generated/iparity_i4.c: Regenerated.
        * generated/iparity_i8.c: Regenerated.
        * generated/maxloc0_16_i1.c: Regenerated.
        * generated/maxloc0_16_i16.c: Regenerated.
        * generated/maxloc0_16_i2.c: Regenerated.
        * generated/maxloc0_16_i4.c: Regenerated.
        * generated/maxloc0_16_i8.c: Regenerated.
        * generated/maxloc0_16_r10.c: Regenerated.
        * generated/maxloc0_16_r16.c: Regenerated.
        * generated/maxloc0_16_r4.c: Regenerated.
        * generated/maxloc0_16_r8.c: Regenerated.
        * generated/maxloc0_16_s1.c: Regenerated.
        * generated/maxloc0_16_s4.c: Regenerated.
        * generated/maxloc0_4_i1.c: Regenerated.
        * generated/maxloc0_4_i16.c: Regenerated.
        * generated/maxloc0_4_i2.c: Regenerated.
        * generated/maxloc0_4_i4.c: Regenerated.
        * generated/maxloc0_4_i8.c: Regenerated.
        * generated/maxloc0_4_r10.c: Regenerated.
        * generated/maxloc0_4_r16.c: Regenerated.
        * generated/maxloc0_4_r4.c: Regenerated.
        * generated/maxloc0_4_r8.c: Regenerated.
        * generated/maxloc0_4_s1.c: Regenerated.
        * generated/maxloc0_4_s4.c: Regenerated.
        * generated/maxloc0_8_i1.c: Regenerated.
        * generated/maxloc0_8_i16.c: Regenerated.
        * generated/maxloc0_8_i2.c: Regenerated.
        * generated/maxloc0_8_i4.c: Regenerated.
        * generated/maxloc0_8_i8.c: Regenerated.
        * generated/maxloc0_8_r10.c: Regenerated.
        * generated/maxloc0_8_r16.c: Regenerated.
        * generated/maxloc0_8_r4.c: Regenerated.
        * generated/maxloc0_8_r8.c: Regenerated.
        * generated/maxloc0_8_s1.c: Regenerated.
        * generated/maxloc0_8_s4.c: Regenerated.
        * generated/maxloc1_16_i1.c: Regenerated.
        * generated/maxloc1_16_i16.c: Regenerated.
        * generated/maxloc1_16_i2.c: Regenerated.
        * generated/maxloc1_16_i4.c: Regenerated.
        * generated/maxloc1_16_i8.c: Regenerated.
        * generated/maxloc1_16_r10.c: Regenerated.
        * generated/maxloc1_16_r16.c: Regenerated.
        * generated/maxloc1_16_r4.c: Regenerated.
        * generated/maxloc1_16_r8.c: Regenerated.
        * generated/maxloc1_16_s1.c: Regenerated.
        * generated/maxloc1_16_s4.c: Regenerated.
        * generated/maxloc1_4_i1.c: Regenerated.
        * generated/maxloc1_4_i16.c: Regenerated.
        * generated/maxloc1_4_i2.c: Regenerated.
        * generated/maxloc1_4_i4.c: Regenerated.
        * generated/maxloc1_4_i8.c: Regenerated.
        * generated/maxloc1_4_r10.c: Regenerated.
        * generated/maxloc1_4_r16.c: Regenerated.
        * generated/maxloc1_4_r4.c: Regenerated.
        * generated/maxloc1_4_r8.c: Regenerated.
        * generated/maxloc1_4_s1.c: Regenerated.
        * generated/maxloc1_4_s4.c: Regenerated.
        * generated/maxloc1_8_i1.c: Regenerated.
        * generated/maxloc1_8_i16.c: Regenerated.
        * generated/maxloc1_8_i2.c: Regenerated.
        * generated/maxloc1_8_i4.c: Regenerated.
        * generated/maxloc1_8_i8.c: Regenerated.
        * generated/maxloc1_8_r10.c: Regenerated.
        * generated/maxloc1_8_r16.c: Regenerated.
        * generated/maxloc1_8_r4.c: Regenerated.
        * generated/maxloc1_8_r8.c: Regenerated.
        * generated/maxloc1_8_s1.c: Regenerated.
        * generated/maxloc1_8_s4.c: Regenerated.
        * generated/maxloc2_16_s1.c: Regenerated.
        * generated/maxloc2_16_s4.c: Regenerated.
        * generated/maxloc2_4_s1.c: Regenerated.
        * generated/maxloc2_4_s4.c: Regenerated.
        * generated/maxloc2_8_s1.c: Regenerated.
        * generated/maxloc2_8_s4.c: Regenerated.
        * generated/maxval_i1.c: Regenerated.
        * generated/maxval_i16.c: Regenerated.
        * generated/maxval_i2.c: Regenerated.
        * generated/maxval_i4.c: Regenerated.
        * generated/maxval_i8.c: Regenerated.
        * generated/maxval_r10.c: Regenerated.
        * generated/maxval_r16.c: Regenerated.
        * generated/maxval_r4.c: Regenerated.
        * generated/maxval_r8.c: Regenerated.
        * generated/minloc0_16_i1.c: Regenerated.
        * generated/minloc0_16_i16.c: Regenerated.
        * generated/minloc0_16_i2.c: Regenerated.
        * generated/minloc0_16_i4.c: Regenerated.
        * generated/minloc0_16_i8.c: Regenerated.
        * generated/minloc0_16_r10.c: Regenerated.
        * generated/minloc0_16_r16.c: Regenerated.
        * generated/minloc0_16_r4.c: Regenerated.
        * generated/minloc0_16_r8.c: Regenerated.
        * generated/minloc0_16_s1.c: Regenerated.
        * generated/minloc0_16_s4.c: Regenerated.
        * generated/minloc0_4_i1.c: Regenerated.
        * generated/minloc0_4_i16.c: Regenerated.
        * generated/minloc0_4_i2.c: Regenerated.
        * generated/minloc0_4_i4.c: Regenerated.
        * generated/minloc0_4_i8.c: Regenerated.
        * generated/minloc0_4_r10.c: Regenerated.
        * generated/minloc0_4_r16.c: Regenerated.
        * generated/minloc0_4_r4.c: Regenerated.
        * generated/minloc0_4_r8.c: Regenerated.
        * generated/minloc0_4_s1.c: Regenerated.
        * generated/minloc0_4_s4.c: Regenerated.
        * generated/minloc0_8_i1.c: Regenerated.
        * generated/minloc0_8_i16.c: Regenerated.
        * generated/minloc0_8_i2.c: Regenerated.
        * generated/minloc0_8_i4.c: Regenerated.
        * generated/minloc0_8_i8.c: Regenerated.
        * generated/minloc0_8_r10.c: Regenerated.
        * generated/minloc0_8_r16.c: Regenerated.
        * generated/minloc0_8_r4.c: Regenerated.
        * generated/minloc0_8_r8.c: Regenerated.
        * generated/minloc0_8_s1.c: Regenerated.
        * generated/minloc0_8_s4.c: Regenerated.
        * generated/minloc1_16_i1.c: Regenerated.
        * generated/minloc1_16_i16.c: Regenerated.
        * generated/minloc1_16_i2.c: Regenerated.
        * generated/minloc1_16_i4.c: Regenerated.
        * generated/minloc1_16_i8.c: Regenerated.
        * generated/minloc1_16_r10.c: Regenerated.
        * generated/minloc1_16_r16.c: Regenerated.
        * generated/minloc1_16_r4.c: Regenerated.
        * generated/minloc1_16_r8.c: Regenerated.
        * generated/minloc1_16_s1.c: Regenerated.
        * generated/minloc1_16_s4.c: Regenerated.
        * generated/minloc1_4_i1.c: Regenerated.
        * generated/minloc1_4_i16.c: Regenerated.
        * generated/minloc1_4_i2.c: Regenerated.
        * generated/minloc1_4_i4.c: Regenerated.
        * generated/minloc1_4_i8.c: Regenerated.
        * generated/minloc1_4_r10.c: Regenerated.
        * generated/minloc1_4_r16.c: Regenerated.
        * generated/minloc1_4_r4.c: Regenerated.
        * generated/minloc1_4_r8.c: Regenerated.
        * generated/minloc1_4_s1.c: Regenerated.
        * generated/minloc1_4_s4.c: Regenerated.
        * generated/minloc1_8_i1.c: Regenerated.
        * generated/minloc1_8_i16.c: Regenerated.
        * generated/minloc1_8_i2.c: Regenerated.
        * generated/minloc1_8_i4.c: Regenerated.
        * generated/minloc1_8_i8.c: Regenerated.
        * generated/minloc1_8_r10.c: Regenerated.
        * generated/minloc1_8_r16.c: Regenerated.
        * generated/minloc1_8_r4.c: Regenerated.
        * generated/minloc1_8_r8.c: Regenerated.
        * generated/minloc1_8_s1.c: Regenerated.
        * generated/minloc1_8_s4.c: Regenerated.
        * generated/minloc2_16_s1.c: Regenerated.
        * generated/minloc2_16_s4.c: Regenerated.
        * generated/minloc2_4_s1.c: Regenerated.
        * generated/minloc2_4_s4.c: Regenerated.
        * generated/minloc2_8_s1.c: Regenerated.
        * generated/minloc2_8_s4.c: Regenerated.
        * generated/minval_i1.c: Regenerated.
        * generated/minval_i16.c: Regenerated.
        * generated/minval_i2.c: Regenerated.
        * generated/minval_i4.c: Regenerated.
        * generated/minval_i8.c: Regenerated.
        * generated/minval_r10.c: Regenerated.
        * generated/minval_r16.c: Regenerated.
        * generated/minval_r4.c: Regenerated.
        * generated/minval_r8.c: Regenerated.
        * generated/norm2_r10.c: Regenerated.
        * generated/norm2_r16.c: Regenerated.
        * generated/norm2_r4.c: Regenerated.
        * generated/norm2_r8.c: Regenerated.
        * generated/parity_l1.c: Regenerated.
        * generated/parity_l16.c: Regenerated.
        * generated/parity_l2.c: Regenerated.
        * generated/parity_l4.c: Regenerated.
        * generated/parity_l8.c: Regenerated.
        * generated/product_c10.c: Regenerated.
        * generated/product_c16.c: Regenerated.
        * generated/product_c4.c: Regenerated.
        * generated/product_c8.c: Regenerated.
        * generated/product_i1.c: Regenerated.
        * generated/product_i16.c: Regenerated.
        * generated/product_i2.c: Regenerated.
        * generated/product_i4.c: Regenerated.
        * generated/product_i8.c: Regenerated.
        * generated/product_r10.c: Regenerated.
        * generated/product_r16.c: Regenerated.
        * generated/product_r4.c: Regenerated.
        * generated/product_r8.c: Regenerated.
        * generated/sum_c10.c: Regenerated.
        * generated/sum_c16.c: Regenerated.
        * generated/sum_c4.c: Regenerated.
        * generated/sum_c8.c: Regenerated.
        * generated/sum_i1.c: Regenerated.
        * generated/sum_i16.c: Regenerated.
        * generated/sum_i2.c: Regenerated.
        * generated/sum_i4.c: Regenerated.
        * generated/sum_i8.c: Regenerated.
        * generated/sum_r10.c: Regenerated.
        * generated/sum_r16.c: Regenerated.
        * generated/sum_r4.c: Regenerated.
        * generated/sum_r8.c: Regenerated.

2018-01-15  Thomas Koenig  <tkoe...@gcc.gnu.org>

        PR fortran/54613
        * gfortran.dg/minmaxloc_12.f90: New test case.
        * gfortran.dg/minmaxloc_13.f90: New test case.
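
The library entries above mention splitting the loops depending on whether back is true and marking the update condition as unlikely. A rough C sketch of that shape, assuming GCC or Clang for __builtin_expect, could look like the following. SKETCH_UNLIKELY and maxloc_split are hypothetical names chosen for this example, not the ones used in the m4 sources, and the sketch covers only a rank-1 integer array.

```c
#include <stddef.h>

/* Hypothetical macro for this sketch: tell the compiler that the
   condition is expected to be false most of the time.  Falls back
   to the plain condition on compilers without __builtin_expect. */
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

/* 1-based position of the maximum of a[0..n-1], or 0 if n == 0.
   The test on back is hoisted out of the loop, so each loop body
   contains a single comparison. */
size_t maxloc_split(const int *a, size_t n, int back)
{
    if (n == 0)
        return 0;

    size_t pos = 1;
    int best = a[0];

    if (back) {
        /* Last occurrence: ties update the position too. */
        for (size_t i = 1; i < n; i++) {
            if (SKETCH_UNLIKELY(a[i] >= best)) {
                best = a[i];
                pos = i + 1;
            }
        }
    } else {
        /* First occurrence: only a strictly larger value updates. */
        for (size_t i = 1; i < n; i++) {
            if (SKETCH_UNLIKELY(a[i] > best)) {
                best = a[i];
                pos = i + 1;
            }
        }
    }
    return pos;
}
```

Whether the hint pays off depends on the data and the compiler. For long arrays of random values a new maximum becomes rare after the first few elements, which is the situation in the benchmark from the message above.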
pi-9a.diff.gz
Description: application/gzip
! { dg-do run }
! Test compile-time simplification of minloc and maxloc with BACK argument
program main
  integer, parameter :: i1(*) = [ 1,2,3,1,2,3];
  integer, parameter :: d1 = minloc(i1,dim=1,back=.true.)
  integer, parameter :: d2 = minloc(i1,dim=1,back=.false.)
  integer, parameter :: d3 = maxloc(i1,dim=1,back=.true.)
  integer, parameter :: d4 = maxloc(i1,dim=1,back=.false.)
  integer, parameter :: i2(4,4) = reshape([1,2,1,2,2,3,3,2,3,4,4,3,4,5,5,4], &
       [4,4]);
  integer, parameter :: d5(2) = minloc(i2,back=.true.)
  integer, parameter :: d6(2) = maxloc(i2,back=.true.)
  integer, parameter :: d7(4) = minloc(i2,dim=1,back=.true.)
  integer, parameter :: d25(4) = minloc(i2,dim=2,mask=i2<2,back=.true.)
  integer, parameter :: d26(4) = maxloc(i2,dim=1,mask=i2<3,back=.true.)
  integer, parameter :: i3(4,4) = transpose(i2)
  integer, parameter :: d8(4) = minloc(i3,dim=2,back=.true.)
  integer, parameter :: i4(4,4) = reshape([1,2,1,2,2,1,2,1,1,2,1,2,2,1,2,1],&
       ([4,4]))
  integer, parameter :: d9(4) = minloc(i4,dim=1,mask=i4>1,back=.true.)
  integer, parameter :: d10(4) = maxloc(i4,dim=1,mask=i4>1,back=.true.)
  character(len=2), parameter :: c0(9) = ["aa", "bb", "aa", &
       "cc", "bb", "cc", "aa", "bb", "aa"]
  character(len=2), parameter :: c1 (3,3) = reshape(c0, [3,3]);
  integer, parameter :: d11(2) = minloc(c1,back=.true.)
  integer, parameter :: d12(2) = maxloc(c1,back=.true.)
  integer, parameter :: d13(2) = minloc(c1,mask=c1>"aa",back=.true.)
  integer, parameter :: d14(2) = maxloc(c1,mask=c1<"cc",back=.true.)
  integer, parameter :: d15(3) = minloc(c1,dim=1,back=.true.)
  integer, parameter :: d16(3) = maxloc(c1,dim=1,back=.true.)
  integer, parameter :: d17(3) = minloc(c1,dim=2,back=.true.)
  integer, parameter :: d18(3) = maxloc(c1,dim=2,back=.true.)
  integer, parameter :: d19 = minloc(c0,dim=1,back=.true.)
  integer, parameter :: d20 = maxloc(c0,dim=1,back=.true.)
  integer, parameter :: d21 = minloc(c0,dim=1,mask=c0>"aa",back=.true.)
  integer, parameter :: d22 = maxloc(c0,dim=1,mask=c0<"cc",back=.true.)
  integer, parameter :: d23(3) = minloc(c1,dim=2,mask=c1>"aa",back=.true.)
  integer, parameter :: d24(3) = maxloc(c1,dim=2,mask=c1<"cc",back=.true.)

  if (d1 /= 4) call abort
  if (d2 /= 1) call abort
  if (d3 /= 6) call abort
  if (d4 /= 3) call abort
  if (any (d5 /= [3,1])) call abort
  if (any (d6 /= [3,4])) call abort
  if (any (d7 /= [3,4,4,4])) call abort
  if (any (d8 /= d7)) call abort
  if (any (d9 /= [4,3,4,3])) call abort
  if (any (d10 /= d9)) call abort
  if (any(d11 /= [3,3])) call abort
  if (any(d12 /= [3,2])) call abort
  if (any(d13 /= [2,3])) call abort
  if (any(d14 /= [2,3])) call abort
  if (any(d15 /= [3,2,3])) call abort
  if (any(d16 /= [2,3,2])) call abort
  if (any(d17 /= [3,3,3])) call abort
  if (any(d18 /= [2,3,2])) call abort
  if (d19 /= 9) call abort
  if (d20 /= 6) call abort
  if (d21 /= 8 .or. d22 /= 8) call abort
  if (any(d23 /= [2,3,2])) call abort
  if (any(d24 /= 3)) call abort
  if (any(d25 /= [1,0,1,0])) call abort
  if (any(d26 /= [4,4,0,0])) call abort
end program main
! { dg-do run }
! Test run-time of MINLOC and MAXLOC with BACK
program main
  implicit none
  integer:: i1(6)
  integer:: d1
  integer:: d2
  integer:: d3
  integer:: d4
  integer:: i2(4,4)
  integer:: d5(2)
  integer:: d6(2)
  integer:: d7(4)
  integer:: d25(4)
  integer:: d26(4)
  integer:: i3(4,4)
  integer:: d8(4)
  integer:: i4(4,4)
  integer:: d9(4)
  integer:: d10(4)
  character(len=2) :: c0(9)
  character(len=2) :: c1(3,3)
  integer:: d11(2)
  integer:: d12(2)
  integer:: d13(2)
  integer:: d14(2)
  integer:: d15(3)
  integer:: d16(3)
  integer:: d17(3)
  integer:: d18(3)
  integer:: d19
  integer:: d20
  integer:: d21
  integer:: d22
  integer:: d23(3)
  integer:: d24(3)

  i1 = [ 1,2,3,1,2,3];
  d1 = minloc(i1,dim=1,back=.true.)
  d2 = minloc(i1,dim=1,back=.false.)
  d3 = maxloc(i1,dim=1,back=.true.)
  d4 = maxloc(i1,dim=1,back=.false.)
  i2 = reshape([1,2,1,2,2,3,3,2,3,4,4,3,4,5,5,4], &
       [4,4]);
  d5 = minloc(i2,back=.true.)
  d6 = maxloc(i2,back=.true.)
  d7= minloc(i2,dim=1,back=.true.)
  d25 = minloc(i2,dim=2,mask=i2<2,back=.true.)
  d26 = maxloc(i2,dim=1,mask=i2<3,back=.true.)
  i3 = transpose(i2)
  d8 = minloc(i3,dim=2,back=.true.)
  i4 = reshape([1,2,1,2,2,1,2,1,1,2,1,2,2,1,2,1],&
       ([4,4]))
  d9 = minloc(i4,dim=1,mask=i4>1,back=.true.)
  d10 = maxloc(i4,dim=1,mask=i4>1,back=.true.)
  c0 = ["aa", "bb", "aa", &
       "cc", "bb", "cc", "aa", "bb", "aa"]
  c1 = reshape(c0, [3,3]);
  d11 = minloc(c1,back=.true.)
  d12 = maxloc(c1,back=.true.)
  d13 = minloc(c1,mask=c1>"aa",back=.true.)
  d14 = maxloc(c1,mask=c1<"cc",back=.true.)
  d15 = minloc(c1,dim=1,back=.true.)
  d16 = maxloc(c1,dim=1,back=.true.)
  d17 = minloc(c1,dim=2,back=.true.)
  d18 = maxloc(c1,dim=2,back=.true.)
  d19 = minloc(c0,dim=1,back=.true.)
  d20 = maxloc(c0,dim=1,back=.true.)
  d21 = minloc(c0,dim=1,mask=c0>"aa",back=.true.)
  d22 = maxloc(c0,dim=1,mask=c0<"cc",back=.true.)
  d23 = minloc(c1,dim=2,mask=c1>"aa",back=.true.)
  d24 = maxloc(c1,dim=2,mask=c1<"cc",back=.true.)

  if (d1 /= 4) call abort
  if (d2 /= 1) call abort
  if (d3 /= 6) call abort
  if (d4 /= 3) call abort
  if (any (d5 /= [3,1])) call abort
  if (any (d6 /= [3,4])) call abort
  if (any (d7 /= [3,4,4,4])) call abort
  if (any (d8 /= d7)) call abort
  if (any (d9 /= [4,3,4,3])) call abort
  if (any (d10 /= d9)) call abort
  if (any(d11 /= [3,3])) call abort
  if (any(d12 /= [3,2])) call abort
  if (any(d13 /= [2,3])) call abort
  if (any(d14 /= [2,3])) call abort
  if (any(d15 /= [3,2,3])) call abort
  if (any(d16 /= [2,3,2])) call abort
  if (any(d17 /= [3,3,3])) call abort
  if (any(d18 /= [2,3,2])) call abort
  if (d19 /= 9) call abort
  if (d20 /= 6) call abort
  if (d21 /= 8 .or. d22 /= 8) call abort
  if (any(d23 /= [2,3,2])) call abort
  if (any(d24 /= 3)) call abort
  if (any(d25 /= [1,0,1,0])) call abort
  if (any(d26 /= [4,4,0,0])) call abort
end program