https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91018
Bug ID: 91018
Summary: std::??clusive_scan vectorization
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---
In the following testcase, compiled with -O2 -fopenmp-simd -std=c++17, only foo
and bar are vectorized:
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

auto
foo (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin());
}

auto
bar (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), 0);
}

auto
baz (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(),
                             std::multiplies<int>{}, 1);
}

auto
qux (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), 1,
                             std::multiplies<int>{});
}

auto
corge (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(),
                             [](int x, int y){ return x + y; });
}

auto
grault (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(),
                             co.begin(), 0,
                             [](int x, int y){ return x + y; });
}
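For reference, a sequential (policy-free) sketch of what baz, qux, corge and
grault compute on a small input. The input values and the name
check_scan_semantics are illustrative only. Note that corge and grault compute
exactly what foo and bar compute; the only difference is that _BinaryOperation
is a lambda rather than std::plus.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: runs the non-std::plus scans from the testcase
// sequentially and checks the results on a tiny input.
void check_scan_semantics ()
{
  const std::vector<int> ca{1, 2, 3, 4};
  std::vector<int> co(ca.size());

  // baz: inclusive scan with multiplication; init 1 is folded into co[0].
  std::inclusive_scan(ca.begin(), ca.end(), co.begin(),
                      std::multiplies<int>{}, 1);
  assert((co == std::vector<int>{1, 2, 6, 24}));

  // qux: exclusive scan; co[i] combines init with ca[0..i-1] only.
  std::exclusive_scan(ca.begin(), ca.end(), co.begin(), 1,
                      std::multiplies<int>{});
  assert((co == std::vector<int>{1, 1, 2, 6}));

  // corge: inclusive scan with a lambda that is just addition (same as foo).
  std::inclusive_scan(ca.begin(), ca.end(), co.begin(),
                      [](int x, int y){ return x + y; });
  assert((co == std::vector<int>{1, 3, 6, 10}));

  // grault: exclusive scan with the same lambda and init 0 (same as bar).
  std::exclusive_scan(ca.begin(), ca.end(), co.begin(), 0,
                      [](int x, int y){ return x + y; });
  assert((co == std::vector<int>{0, 1, 3, 6}));
}
```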
Is there any deep reason why __simd_scan isn't called from
__brick_transform_scan when _BinaryOperation is not std::plus?
The PSTL headers do seem to have code for this case, namely non-std::plus
variants of __simd_scan, which do:
    typedef _Combiner<_Tp, _BinaryOperation> _CombinerType;
    _CombinerType __init_{__init, &__binary_op};
    _PSTL_PRAGMA_DECLARE_REDUCTION(__bin_op, _CombinerType)
    _PSTL_PRAGMA_SIMD_SCAN(__bin_op : __init_)
    for (_Size __i = 0; __i < __n; ++__i)
    {
        __result[__i] = __init_.__value;
        _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(__init_)
        _PSTL_PRAGMA_FORCEINLINE
        __init_.__value = __binary_op(__init_.__value,
                                      __unary_op(__first[__i]));
    }
    return std::make_pair(__result + __n, __init_.__value);
but I (with my limited C++-fu) can't figure out whether that overload is ever
invoked, and if so, for what.
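A rough standalone analogue of that non-std::plus __simd_scan for the qux case,
written with plain OpenMP pragmas instead of the _PSTL_PRAGMA_* macros. The
combiner struct, the bin_op reduction and combiner_exclusive_scan are
hypothetical names, not the PSTL's own code, and the reduction identity 1 is an
assumption that only holds for multiplication. It assumes OpenMP 5.0 inscan
reductions under -fopenmp-simd; without that flag the pragmas are ignored and
the loop is simply a sequential exclusive scan.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for PSTL's _Combiner: an aggregate holding the
// running value together with a pointer to the binary operation.
struct combiner
{
  int value;
  const std::multiplies<int> *op;
};

// User-defined reduction over the aggregate, in the spirit of
// _PSTL_PRAGMA_DECLARE_REDUCTION.  The identity 1 is only valid because
// the operation here is multiplication.
#pragma omp declare reduction (bin_op : combiner :                    \
    omp_out.value = (*omp_out.op) (omp_out.value, omp_in.value))      \
    initializer (omp_priv = combiner{1, omp_orig.op})

// Exclusive product scan (the qux case) shaped like PSTL's non-std::plus
// __simd_scan: store the running value, then fold in the next element.
// Returns the final accumulated value.
int
combiner_exclusive_scan (const std::vector<int> &in, std::vector<int> &out)
{
  assert (out.size () >= in.size ());
  const std::multiplies<int> mul{};
  combiner acc{1, &mul};                  // init 1, as in qux
  const int *first = in.data ();
  int *result = out.data ();
  const std::size_t n = in.size ();
#pragma omp simd reduction (inscan, bin_op : acc)
  for (std::size_t i = 0; i < n; ++i)
    {
      result[i] = acc.value;              // scan phase: prefix before first[i]
#pragma omp scan exclusive (acc)
      acc.value = mul (acc.value, first[i]);  // input phase
    }
  return acc.value;
}
```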
Note that I don't think we can actually vectorize that at the moment; I wanted
to file a PR about whether SRA could help in this case, but as that path is
never even tried, I'll need to write such a loop by hand.
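For comparison, a sketch of the qux loop with the aggregate already scalarized:
a plain int accumulator and the built-in * reduction, i.e. roughly the form one
would hope SRA (scalar replacement of aggregates) could reduce the
combiner-based loop to. scalar_exclusive_scan is a hypothetical name, the
OpenMP 5.0 inscan support under -fopenmp-simd is an assumption, and nothing is
claimed here about whether a given GCC version vectorizes it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hand-written qux: exclusive product scan with a scalar
// accumulator instead of a _Combiner-like struct.
int
scalar_exclusive_scan (const std::vector<int> &in, std::vector<int> &out)
{
  assert (out.size () >= in.size ());
  const int *first = in.data ();
  int *result = out.data ();
  const std::size_t n = in.size ();
  int acc = 1;                            // init 1, as in qux
#pragma omp simd reduction (inscan, * : acc)
  for (std::size_t i = 0; i < n; ++i)
    {
      result[i] = acc;                    // scan phase: product of first[0..i-1]
#pragma omp scan exclusive (acc)
      acc *= first[i];                    // input phase
    }
  return acc;
}
```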