https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91018

            Bug ID: 91018
           Summary: std::??clusive_scan vectorization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

In the following testcase with -O2 -fopenmp-simd -std=c++17 only foo and bar
are vectorized:

#include <execution>
#include <functional>
#include <numeric>
#include <vector>

auto
foo (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin());
}

auto
bar (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), 0);
}

auto
baz (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), std::multiplies<int>{}, 1);
}

auto
qux (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), 1, std::multiplies<int>{});
}

auto
corge (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), [](int x, int y){ return x + y; });
}

auto
grault (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), 0,
                             [](int x, int y){ return x + y; });
}
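
For reference, corge and grault compute the same values as foo and bar; only
the type of the binary operation differs. A minimal sketch of that
equivalence, using the sequential overloads (no execution policy) so that it
does not depend on a PSTL backend. The helper names scan_with_plus and
scan_with_lambda are hypothetical and exist only for this illustration:

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: inclusive scan with the default operation, which
// the standard defines as addition (std::plus<>).
inline std::vector<int>
scan_with_plus (const std::vector<int> &in)
{
  std::vector<int> out (in.size ());
  std::inclusive_scan (in.begin (), in.end (), out.begin ());
  return out;
}

// Hypothetical helper: the same scan, but with the addition spelled as a
// lambda, as in corge above. The result is identical; only the type of
// the operation passed to the algorithm differs.
inline std::vector<int>
scan_with_lambda (const std::vector<int> &in)
{
  std::vector<int> out (in.size ());
  std::inclusive_scan (in.begin (), in.end (), out.begin (),
                       [] (int x, int y) { return x + y; });
  return out;
}
```

For the input {1, 2, 3, 4} both helpers produce {1, 3, 6, 10}.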

Any deep reason why __simd_scan isn't called from __brick_transform_scan when
_BinaryOperation is not std::plus?

It seems the PSTL header has some code for it, non-std::plus variants of
__simd_scan which do:
    typedef _Combiner<_Tp, _BinaryOperation> _CombinerType;
    _CombinerType __init_{__init, &__binary_op};

    _PSTL_PRAGMA_DECLARE_REDUCTION(__bin_op, _CombinerType)

    _PSTL_PRAGMA_SIMD_SCAN(__bin_op : __init_)
    for (_Size __i = 0; __i < __n; ++__i)
    {
        __result[__i] = __init_.__value;
        _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(__init_)
        _PSTL_PRAGMA_FORCEINLINE
        __init_.__value = __binary_op(__init_.__value,
                                      __unary_op(__first[__i]));
    }
    return std::make_pair(__result + __n, __init_.__value);
but I (with my limited C++-fu) can't figure out whether that one is ever
invoked, and for what.
Note, I don't think we can actually vectorize that ATM. I wanted to file a PR
about whether SRA could help with that case, but as it is never even tried,
I'll need to write that by hand.
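
A hand-written variant could start from something like the sketch below. It
is a simplification, not the PSTL code: it uses the built-in * reduction on a
plain int rather than a declared reduction over a _Combiner object, so it
does not exercise the struct case that SRA would be relevant for. The
function name exclusive_product_scan is hypothetical. The pragmas follow the
OpenMP 5.0 scan syntax; a compiler that does not enable OpenMP ignores them
and runs the loop serially with the same result. Whether the loop is actually
vectorized depends on the compiler version and target.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of libstdc++ or PSTL: exclusive scan with
// multiplication, expressed with OpenMP 5.0 inscan reduction directives.
// Returns the product of all input elements.
inline int
exclusive_product_scan (const std::vector<int> &in, std::vector<int> &out)
{
  int acc = 1; // identity element for multiplication
  const std::size_t n = in.size ();
  out.resize (n);
  const int *src = in.data ();
  int *dst = out.data ();

#pragma omp simd reduction (inscan, * : acc)
  for (std::size_t i = 0; i < n; ++i)
    {
      // Use phase: acc holds the product of all elements before index i.
      dst[i] = acc;
#pragma omp scan exclusive (acc)
      // Input phase: fold element i into the running product.
      acc *= src[i];
    }
  return acc;
}
```

For the input {1, 2, 3, 4} this fills the output with {1, 1, 2, 6} and
returns 24, matching what qux computes for the same data.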
