Issue 129779
Summary [flang] surprising performance loss with nested type operator overloading
Labels flang
Assignees
Reporter ivan-pi
I've written a performance benchmark that sums an array of numbers in several different ways, to measure the overhead of operator overloading for simple value types:

[abstraction_penalty.F90.txt](https://github.com/user-attachments/files/19077916/abstraction_penalty.F90.txt)

When I run the program, I see the output:

```
$ flang-new -O2 abstraction_penalty.F90 
$ ./a.out
[info] compiler: Homebrew flang version 19.1.4 (https://github.com/Homebrew/homebrew-core/issues)
[info] compiler options: flang-new -O2 abstraction_penalty.F90
[info] using naive sum
[info] number of iterations: 25000

        test    absolute   additions  ratio with
      number  time (sec)  per second       test0

           0      0.0532   9.400E+02       1.000
           1      0.0498   1.003E+03       0.937
           2      0.0493   1.015E+03       0.926
           3      0.0526   9.511E+02       0.988
           4      0.0595   8.410E+02       1.118
           5      0.0515   9.700E+02       0.969
           6      0.0486   1.029E+03       0.913
           7      0.0485   1.031E+03       0.912
           8      0.0490   1.020E+03       0.922
           9      0.0472   1.059E+03       0.888
          10      0.0483   1.036E+03       0.907
          11      0.0485   1.031E+03       0.912
          12      0.0479   1.044E+03       0.901
          13      0.0481   1.039E+03       0.905
          14      6.7735   7.382E+00     127.336
          15      6.7167   7.444E+00     126.267
          16      0.0467   1.071E+03       0.878
          17      0.0452   1.105E+03       0.850
          18      0.0451   1.108E+03       0.849
          19      0.0452   1.105E+03       0.850
          20      0.0476   1.050E+03       0.895
          21      0.0469   1.066E+03       0.882
          22      0.0467   1.071E+03       0.877
          23      0.0461   1.086E+03       0.866
          24      0.0454   1.101E+03       0.853
          25      0.0452   1.105E+03       0.851
          26      0.0456   1.097E+03       0.857
          27      0.0454   1.102E+03       0.853
          28      6.6540   7.514E+00     125.089
          29      6.5274   7.660E+00     122.709
------------------------------------------------
        mean      0.0928   5.386E+02        1.75
```

The slow cases (14, 15, 28, 29) all call the procedure `test_ddd`, which calls `dsum` for `type(ddd)`. That type is really just a double value, defined in an obscure way:

```fortran
    integer, parameter :: dp = c_double

    ! Double wrapper
    type :: dd
        real(dp) :: val
    end type

    ! Double wrapper child with TBP
    type, extends(dd) :: ddi
    contains
        procedure :: get => get_ddi_val
    end type

    ! Double wrapper wrapper
    type :: ddd
        type(dd) :: val
    end type
```

The sum procedure looks as follows:
```fortran
    pure function ddd_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        real(dp), pointer :: t(:)
#if USE_INTRINSIC_SUM
        res%val%val = sum(a%val%val)
#else
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = res + a(i)
        end do
#endif
    end function
``` 
where the `+` is the overloaded `operator(+)` defined as,

```fortran
    pure function ddd_add(a,b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function
```
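
For anyone who wants to try this without the attachment, the pieces above can be assembled into a standalone program. This is only a minimal sketch, not the attached benchmark: the module name `ddd_mod`, the program name `repro`, the array length of 2000, and the `system_clock` timing are all assumptions made for illustration, and the real harness may prevent the optimizer from hoisting the repeated calls in ways this sketch does not.

```fortran
module ddd_mod
    use, intrinsic :: iso_c_binding, only: c_double
    implicit none
    private
    public :: dp, dd, ddd, operator(+), ddd_sum

    integer, parameter :: dp = c_double

    ! Double wrapper
    type :: dd
        real(dp) :: val
    end type

    ! Double wrapper wrapper
    type :: ddd
        type(dd) :: val
    end type

    interface operator(+)
        module procedure ddd_add
    end interface

contains

    pure function ddd_add(a, b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function

    ! Naive sum through the overloaded operator(+)
    pure function ddd_sum(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = res + a(i)
        end do
    end function

end module

program repro
    use ddd_mod
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    integer, parameter :: n = 2000        ! assumed array length
    integer, parameter :: iters = 25000   ! iteration count reported in the output
    real(dp) :: x(n), s, total_plain
    type(ddd) :: y(n), total_ddd
    integer(int64) :: t0, t1, t2, rate
    integer :: i, k

    x = 1.0_dp
    do i = 1, n
        y(i) = ddd(dd(x(i)))
    end do

    ! Baseline: plain real(dp) accumulation loop
    call system_clock(t0, rate)
    total_plain = 0.0_dp
    do k = 1, iters
        s = 0.0_dp
        do i = 1, n
            s = s + x(i)
        end do
        total_plain = total_plain + s
    end do
    call system_clock(t1)

    ! Same work through the nested type and overloaded operator(+)
    total_ddd = ddd(dd(0.0_dp))
    do k = 1, iters
        total_ddd = total_ddd + ddd_sum(y)
    end do
    call system_clock(t2)

    ! Both totals are exactly iters*n, so an exact comparison is safe here
    if (total_ddd%val%val /= total_plain) error stop "sums differ"

    print '(a,f10.4)', 'plain real(dp) loop   (s): ', real(t1 - t0, dp) / real(rate, dp)
    print '(a,f10.4)', 'type(ddd) operator(+) (s): ', real(t2 - t1, dp) / real(rate, dp)
end program
```

Building it with the same `-O2` option as above and comparing the two printed times should show whether the gap is visible outside the full benchmark.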

If the intrinsic `sum` is used instead (`-DUSE_INTRINSIC_SUM`), there is no observable penalty. There are other switches too: `-DUSE_INTRINSIC_REDUCE`, which also performs well, and `-DUSE_STRUCTURE_CONSTRUCTOR`, which makes the performance even worse (300x slower than the baseline).
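
The code behind the `-DUSE_INTRINSIC_REDUCE` and `-DUSE_STRUCTURE_CONSTRUCTOR` switches lives in the attachment and is not reproduced in this report. The sketch below is a guess at what such variants could look like, so every name in it other than `dd`, `ddd`, and `ddd_add` is hypothetical (`ddd_add_ctor`, `sum_intrinsic`, `sum_reduce`, `sum_ctor_loop`, and the module and program names). It only checks that the variants agree on the result; it does not measure anything.

```fortran
module ddd_variants
    use, intrinsic :: iso_c_binding, only: c_double
    implicit none
    integer, parameter :: dp = c_double

    type :: dd
        real(dp) :: val
    end type

    type :: ddd
        type(dd) :: val
    end type

contains

    ! Component-wise addition, as shown in the report
    pure function ddd_add(a, b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c%val%val = a%val%val + b%val%val
    end function

    ! Assumed shape of a structure-constructor variant of the addition
    pure function ddd_add_ctor(a, b) result(c)
        type(ddd), intent(in) :: a, b
        type(ddd) :: c
        c = ddd(dd(a%val%val + b%val%val))
    end function

    ! Intrinsic SUM over the innermost component
    pure function sum_intrinsic(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        res%val%val = sum(a%val%val)
    end function

    ! Fortran 2018 REDUCE with the user-defined addition
    pure function sum_reduce(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        res = reduce(a, ddd_add, identity=ddd(dd(0.0_dp)))
    end function

    ! Explicit loop using the constructor-based addition
    pure function sum_ctor_loop(a) result(res)
        type(ddd), intent(in) :: a(:)
        type(ddd) :: res
        integer :: i
        res = ddd(dd(0.0_dp))
        do i = 1, size(a)
            res = ddd_add_ctor(res, a(i))
        end do
    end function

end module

program check_variants
    use ddd_variants
    implicit none
    integer, parameter :: n = 100
    type(ddd) :: y(n), r1, r2, r3
    integer :: i

    do i = 1, n
        y(i) = ddd(dd(real(i, dp)))
    end do

    r1 = sum_intrinsic(y)
    r2 = sum_reduce(y)
    r3 = sum_ctor_loop(y)

    ! 1 + 2 + ... + 100 = 5050, exactly representable in real(dp)
    if (r1%val%val /= 5050.0_dp) error stop "intrinsic sum mismatch"
    if (r2%val%val /= 5050.0_dp) error stop "reduce mismatch"
    if (r3%val%val /= 5050.0_dp) error stop "constructor loop mismatch"
    print '(a)', 'all variants agree'
end program
```

Since all of these compute the same value, the different timings reported above would come from how each form is lowered and optimized rather than from the arithmetic itself.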