Richard Biener <richard.guent...@gmail.com> writes: > On Wed, Jun 21, 2023 at 11:32 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> Richard Sandiford <richard.sandif...@arm.com> writes: >> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> On Fri, Jun 2, 2023 at 3:01 AM liuhongt via Gcc-patches >> >> <gcc-patches@gcc.gnu.org> wrote: >> >>> >> >>> We have already use intermidate type in case WIDEN, but not for NONE, >> >>> this patch extended that. >> >>> >> >>> I didn't do that in pattern recog since we need to know whether the >> >>> stmt belongs to any slp_node to decide the vectype, the related optabs >> >>> are checked according to vectype_in and vectype_out. For non-slp case, >> >>> vec_pack/unpack are always used when lhs has different size from rhs, >> >>> for slp case, sometimes vec_pack/unpack is used, somethings >> >>> direct conversion is used. >> >>> >> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >> >>> Ok for trunk? >> >>> >> >>> gcc/ChangeLog: >> >>> >> >>> PR target/110018 >> >>> * tree-vect-stmts.cc (vectorizable_conversion): Use >> >>> intermiediate integer type for float_expr/fix_trunc_expr when >> >>> direct optab is not existed. >> >>> >> >>> gcc/testsuite/ChangeLog: >> >>> >> >>> * gcc.target/i386/pr110018-1.c: New test. >> >>> --- >> >>> gcc/testsuite/gcc.target/i386/pr110018-1.c | 94 ++++++++++++++++++++++ >> >>> gcc/tree-vect-stmts.cc | 56 ++++++++++++- >> >>> 2 files changed, 149 insertions(+), 1 deletion(-) >> >>> create mode 100644 gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> >> >>> diff --git a/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> new file mode 100644 >> >>> index 00000000000..b1baffd7af1 >> >>> --- /dev/null >> >>> +++ b/gcc/testsuite/gcc.target/i386/pr110018-1.c >> >>> @@ -0,0 +1,94 @@ >> >>> +/* { dg-do compile } */ >> >>> +/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mavx512dq" } */ >> >>> +/* { dg-final { scan-assembler-times {(?n)vcvttp[dsh]2[dqw]} 5 } } */ >> >>> +/* { dg-final { scan-assembler-times {(?n)vcvt[dqw]*2p[dsh]} 5 } } */ >> >>> + >> >>> +void >> >>> +foo (double* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo1 (float* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo2 (_Float16* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> + a[4] = b[4]; >> >>> + a[5] = b[5]; >> >>> + a[6] = b[6]; >> >>> + a[7] = b[7]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo3 (double* __restrict a, short* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo4 (float* __restrict a, char* b) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo5 (double* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo6 (float* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo7 (_Float16* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> + a[4] = b[4]; >> >>> + a[5] = b[5]; >> >>> + a[6] = b[6]; >> >>> + a[7] = b[7]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo8 (double* __restrict b, short* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> +} >> >>> + >> >>> +void >> >>> +foo9 (float* __restrict b, char* a) >> >>> +{ >> >>> + a[0] = b[0]; >> >>> + a[1] = b[1]; >> >>> + a[2] = b[2]; >> >>> + a[3] = b[3]; >> >>> +} >> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> >>> index bd3b07a3aa1..1118c89686d 100644 >> >>> --- a/gcc/tree-vect-stmts.cc >> >>> +++ b/gcc/tree-vect-stmts.cc >> >>> @@ -5162,6 +5162,49 @@ vectorizable_conversion (vec_info *vinfo, >> >>> return false; >> >>> if (supportable_convert_operation (code, vectype_out, vectype_in, >> >>> &code1)) >> >>> break; >> >> >> >> A comment would be nice here. Like >> >> >> >> /* For conversions between float and smaller integer types try whether >> >> we can >> >> use intermediate signed integer types to support the conversion. */ >> >> >> >>> + if ((code == FLOAT_EXPR >> >>> + && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode)) >> >>> + || (code == FIX_TRUNC_EXPR >> >>> + && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode))) >> > >> > Is the FIX_TRUNC_EXPR case safe without some flag? >> > >> > #include <stdint.h> >> > int32_t x = (int32_t)0x1.0p32; >> > int32_t y = (int32_t)(int64_t)0x1.0p32; >> > >> > sets x to 2147483647 and y to 0. > > Hmm, good question. GENERIC has a direct truncation to unsigned char > for example, the C standard generally says if the integral part cannot > be represented then the behavior is undefined. So I think we should be > safe here (0x1.0p32 doesn't fit an int).
OK. >> Also, I think multi_step_cvt should influence the costs, since at >> the moment we cost one statement but generate two. This makes a >> difference for SVE with VECT_COMPARE_COSTS. Would changing it to: >> >> vect_model_simple_cost (vinfo, stmt_info, >> ncopies * (multi_step_cvt + 1), >> dt, ndts, slp_node, >> cost_vec); >> >> be OK? > > Yeah, I guess so. Thanks, will send a patch. >> There again, I wonder if we should handle this using patterns instead. >> That makes both conversions explicit and therefore easier to cost. > > But we don't do this for the other multi-step conversions either ... Yeah, agree it's not a new problem. > but sure, I also suggested that but the complaint was that with > BB SLP this would get us a vector type with off lanes? Ah, sorry, missed the previous discussion. I only became interested once some SVE tests started failing :) >> E.g. for SVE, an integer extension is free if the source is a load, >> and we do try to model that. But it's difficult to handle if the >> conversion is only implicit. > > In the distant future I hope that vectorizable_conversion will > upon analysis produce an SLP sub-graph for the suggested > code generation which we'd then cost. Sounds good. Agree we can live with the compound operation until then. > My current idea is > to get this part live before switching the analysis to all-SLP > because it "seems" easier (fingers crossing). Still I'm > struggling to even find time to start that effort. Yeah, can sympthaise :/ Don't know how you find time for all the reviews you currently do. Thanks, Richard