https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91018
            Bug ID: 91018
           Summary: std::??clusive_scan vectorization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

In the following testcase, compiled with -O2 -fopenmp-simd -std=c++17, only foo and bar are vectorized:

#include <execution>
#include <functional>
#include <numeric>
#include <vector>

auto
foo (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(), co.begin());
}

auto
bar (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(), co.begin(), 0);
}

auto
baz (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(), co.begin(), std::multiplies<int>{}, 1);
}

auto
qux (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(), co.begin(), 1, std::multiplies<int>{});
}

auto
corge (std::vector<int> &ca, std::vector<int> &co)
{
  return std::inclusive_scan(std::execution::unseq, ca.begin(), ca.end(), co.begin(), [](int x, int y){ return x + y; });
}

auto
grault (std::vector<int> &ca, std::vector<int> &co)
{
  return std::exclusive_scan(std::execution::unseq, ca.begin(), ca.end(), co.begin(), 0, [](int x, int y){ return x + y; });
}

Is there any deep reason why __simd_scan isn't called from __brick_transform_scan when _BinaryOperation is not std::plus?
It seems the PSTL header has some code for it: there are non-std::plus variants of __simd_scan which do:

  typedef _Combiner<_Tp, _BinaryOperation> _CombinerType;
  _CombinerType __init_{__init, &__binary_op};

  _PSTL_PRAGMA_DECLARE_REDUCTION(__bin_op, _CombinerType)

  _PSTL_PRAGMA_SIMD_SCAN(__bin_op : __init_)
  for (_Size __i = 0; __i < __n; ++__i)
    {
      __result[__i] = __init_.__value;
      _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(__init_)
      _PSTL_PRAGMA_FORCEINLINE
      __init_.__value = __binary_op(__init_.__value, __unary_op(__first[__i]));
    }
  return std::make_pair(__result + __n, __init_.__value);

but I (with my limited C++-fu) can't figure out whether that one is ever invoked, and for what.

Note, I don't think we can actually vectorize that at the moment. I wanted to file a PR asking whether SRA could help with this case, but since this path is never even tried, I'll need to write that by hand.
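For illustration, a hand-written equivalent of the loop above might look roughly like the sketch below. It assumes that the _PSTL_PRAGMA_* macros expand to the OpenMP 5.0 declare reduction, simd reduction(inscan, ...) and scan exclusive(...) directives. The names Combiner, mul_op and exclusive_scan_mul are hypothetical and not taken from the PSTL sources, and the operation is hard-coded to multiplication instead of going through a _BinaryOperation pointer. Without -fopenmp or -fopenmp-simd the pragmas are ignored and the loop is an ordinary serial exclusive scan.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PSTL's _Combiner: a struct wrapping the
// running value of the scan (assumption, not the real PSTL type).
struct Combiner
{
  int value;
};

// User-defined reduction that combines two partial results by
// multiplication; the identity element for each private copy is 1.
#pragma omp declare reduction(mul_op : Combiner : omp_out.value *= omp_in.value) \
  initializer(omp_priv = Combiner{1})

// Exclusive scan with multiplication over n elements.
// Writes the running product *before* element i into out[i] and
// returns the product of init and all n inputs.
int
exclusive_scan_mul (const int *in, int *out, std::size_t n, int init)
{
  Combiner acc{init};
  #pragma omp simd reduction(inscan, mul_op : acc)
  for (std::size_t i = 0; i < n; ++i)
    {
      // Scan phase: read the value accumulated over elements [0, i).
      out[i] = acc.value;
      #pragma omp scan exclusive(acc)
      // Input phase: fold element i into the accumulator.
      acc.value *= in[i];
    }
  return acc.value;
}
```

With input {1, 2, 3, 4} and init 1 this writes {1, 1, 2, 6} and returns 24, which is what qux in the testcase computes. Whether GCC actually vectorizes a struct-typed inscan reduction like this is exactly the open question of this report.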