Hi everyone,

I have bootstrapped and regtested the patch below on s390.  For the
64-bit target I do not see any changes regarding the testsuite.  For the
31-bit target I see the following failures:

FAIL: gcc.dg/vect/no-scevccp-outer-14.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/no-scevccp-outer-14.c (test for excess errors)
FAIL: gcc.dg/vect/pr50451.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr50451.c (test for excess errors)
FAIL: gcc.dg/vect/pr50451.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr50451.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr53773.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr53773.c (test for excess errors)
FAIL: gcc.dg/vect/pr53773.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr53773.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr71407.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr71407.c (test for excess errors)
FAIL: gcc.dg/vect/pr71407.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr71407.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr71416-1.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr71416-1.c (test for excess errors)
FAIL: gcc.dg/vect/pr71416-1.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr71416-1.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr94443.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr94443.c (test for excess errors)
FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr97558.c (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr97558.c (test for excess errors)
FAIL: gcc.dg/vect/pr97558.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/pr97558.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-reduc-pattern-3.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313)
FAIL: gcc.dg/vect/vect-reduc-pattern-3.c -flto -ffat-lto-objects (test for excess errors)
UNRESOLVED: gcc.dg/vect/no-scevccp-outer-14.c compilation failed to produce executable
UNRESOLVED: gcc.dg/vect/pr53773.c -flto -ffat-lto-objects  scan-tree-dump-times optimized "\\* 10" 2
UNRESOLVED: gcc.dg/vect/pr53773.c scan-tree-dump-times optimized "\\* 10" 2
UNRESOLVED: gcc.dg/vect/pr71416-1.c -flto -ffat-lto-objects compilation failed to produce executable
UNRESOLVED: gcc.dg/vect/pr71416-1.c compilation failed to produce executable
UNRESOLVED: gcc.dg/vect/vect-reduc-pattern-3.c -flto -ffat-lto-objects compilation failed to produce executable

I randomly picked pr50451.c and ran gcc against it, which results in:

during GIMPLE pass: vect
dump file: pr50451.c.174t.vect
/gcc-verify-workdir/patched/src/gcc/testsuite/gcc.dg/vect/pr50451.c: In function ‘foo’:
/gcc-verify-workdir/patched/src/gcc/testsuite/gcc.dg/vect/pr50451.c:5:1: internal compiler error: in require, at machmode.h:313
0x1265d21 opt_mode<scalar_int_mode>::require() const
        /gcc-verify-workdir/patched/src/gcc/machmode.h:313
0x1d7e4e9 opt_mode<machine_mode>::require() const
        /gcc-verify-workdir/patched/src/gcc/vec.h:955
0x1d7e4e9 vect_verify_loop_lens
        /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:1471
0x1da29ab vect_analyze_loop_2
        /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:2929
0x1da40c7 vect_analyze_loop_1
        /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:3330
0x1da499d vect_analyze_loop(loop*, vec_info_shared*)
        /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:3484
0x1deed27 try_vectorize_loop_1
        /gcc-verify-workdir/patched/src/gcc/tree-vectorizer.cc:1064
0x1deed27 try_vectorize_loop
        /gcc-verify-workdir/patched/src/gcc/tree-vectorizer.cc:1180
0x1def5c1 execute
        /gcc-verify-workdir/patched/src/gcc/tree-vectorizer.cc:1296
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
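
A side note on the failure mode, as a standalone sketch rather than GCC code: require() is the accessor that insists a mode exists, so the backtrace suggests that a mode lookup reached from vect_verify_loop_lens came back empty on the 31-bit target.  I have not confirmed the root cause yet.  All toy_* names below are hypothetical, and the exception merely stands in for the internal compiler error.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a machine mode; not GCC's real type.
enum class toy_mode { QI, HI, SI, DI };

// Simplified model of an optional mode: either empty or holding a mode.
class toy_opt_mode {
public:
  toy_opt_mode() : has_mode_(false), mode_(toy_mode::QI) {}
  explicit toy_opt_mode(toy_mode m) : has_mode_(true), mode_(m) {}

  bool exists() const { return has_mode_; }

  // Models require(): the caller claims a mode must exist.  The exception
  // here stands in for the internal compiler error.
  toy_mode require() const {
    if (!has_mode_)
      throw std::logic_error("in require: no mode available");
    return mode_;
  }

private:
  bool has_mode_;
  toy_mode mode_;
};

// Hypothetical lookup: smallest integer mode with at least BITS bits that
// does not exceed MAX_BITS.  Returns an empty toy_opt_mode if none fits.
toy_opt_mode toy_smallest_int_mode(unsigned bits, unsigned max_bits) {
  assert(bits > 0);
  struct entry { toy_mode mode; unsigned bits; };
  static const entry table[] = {
    {toy_mode::QI, 8}, {toy_mode::HI, 16},
    {toy_mode::SI, 32}, {toy_mode::DI, 64}};
  for (const entry &e : table)
    if (e.bits >= bits && e.bits <= max_bits)
      return toy_opt_mode(e.mode);
  return toy_opt_mode();
}
```

In this model a caller that cannot guarantee a result would test exists() before calling require(); whether that is the right fix for the patch is a separate question.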

I will come back to this tomorrow.

Cheers,
Stefan

On Mon, Aug 14, 2023 at 08:45:21PM +0800, Kewen.Lin via Gcc-patches wrote:
> Hi Juzhe,
> 
> on 2023/8/14 20:08, juzhe.zh...@rivai.ai wrote:
> > Hi, Kewin.
> > 
> > Could you test 'can_vec_extract_var_idx_p' and send V5 patch when you pass 
> > the testing?
> 
> The below diff was bootstrapped and regress-tested on Power10 LE.  Compared
> to the previous v4, the only changes should be the proposed
> can_vec_extract_var_idx_p and its required new includes as below:
> 
> +#include "memmodel.h"
> +#include "optabs.h"
>  
> Could you have a double check?
> 
> Since I just tested it on Power10, you have the full ownership on the patch,
> I'd leave the v5 posting to you.  Thanks!
> 
> BR,
> Kewen
> -----
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index bc3063c3615..5ae9f69c7eb 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "ssa.h"
>  #include "optabs-tree.h"
> +#include "memmodel.h"
> +#include "optabs.h"
>  #include "diagnostic-core.h"
>  #include "fold-const.h"
>  #include "stor-layout.h"
> @@ -10300,17 +10302,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        /* No transformation required.  */
>        if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>       {
> -       if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
> -                                            OPTIMIZE_FOR_SPEED))
> -         {
> -           if (dump_enabled_p ())
> -             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                              "can't operate on partial vectors "
> -                              "because the target doesn't support extract "
> -                              "last reduction.\n");
> -           LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -         }
> -       else if (slp_node)
> +       if (slp_node)
>           {
>             if (dump_enabled_p ())
>               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -10330,9 +10322,26 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>         else
>           {
>             gcc_assert (ncopies == 1 && !slp_node);
> -           vect_record_loop_mask (loop_vinfo,
> -                                  &LOOP_VINFO_MASKS (loop_vinfo),
> -                                  1, vectype, NULL);
> +           if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
> +                                               OPTIMIZE_FOR_SPEED))
> +             vect_record_loop_mask (loop_vinfo,
> +                                    &LOOP_VINFO_MASKS (loop_vinfo),
> +                                    1, vectype, NULL);
> +           else if (can_vec_extract_var_idx_p (
> +                      TYPE_MODE (vectype), TYPE_MODE (TREE_TYPE (vectype))))
> +             vect_record_loop_len (loop_vinfo,
> +                                   &LOOP_VINFO_LENS (loop_vinfo),
> +                                   1, vectype, 1);
> +           else
> +             {
> +               if (dump_enabled_p ())
> +                 dump_printf_loc (
> +                   MSG_MISSED_OPTIMIZATION, vect_location,
> +                   "can't operate on partial vectors "
> +                   "because the target doesn't support extract "
> +                   "last reduction.\n");
> +               LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> +             }
>           }
>       }
>        /* ???  Enable for loop costing as well.  */
> @@ -10358,7 +10367,9 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>    gimple *vec_stmt;
>    if (slp_node)
>      {
> -      gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
> +      gcc_assert (!loop_vinfo
> +               || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +                   && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
> 
>        /* Get the correct slp vectorized stmt.  */
>        vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
> @@ -10402,7 +10413,42 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> 
>        gimple_seq stmts = NULL;
>        tree new_tree;
> -      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +     {
> +       /* Emit:
> +
> +            SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
> +          where VEC_LHS is the vectorized live-out result, LEN is the
> +          loop length for the final iteration and BIAS is the partial
> +          load/store bias.  */
> +       gcc_assert (ncopies == 1 && !slp_node);
> +       gimple_seq tem = NULL;
> +       gimple_stmt_iterator gsi = gsi_last (tem);
> +       tree len
> +         = vect_get_loop_len (loop_vinfo, &gsi,
> +                              &LOOP_VINFO_LENS (loop_vinfo),
> +                              1, vectype, 0, 0);
> +
> +       /* BIAS - 1.  */
> +       signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +       tree bias_minus_one
> +         = int_const_binop (MINUS_EXPR,
> +                            build_int_cst (TREE_TYPE (len), biasval),
> +                            build_one_cst (TREE_TYPE (len)));
> +
> +       /* LAST_INDEX = LEN + (BIAS - 1).  */
> +       tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> +                                       len, bias_minus_one);
> +
> +       /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
> +       tree scalar_res
> +         = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> +                         vec_lhs_phi, last_index);
> +
> +       /* Convert the extracted vector element to the scalar type.  */
> +       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> +     }
> +      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>       {
>         /* Emit:

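For reference, the index arithmetic in the quoted length-based branch (SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>) can be modelled outside the compiler.  This is only a sketch under assumptions: extract_last_active is a hypothetical helper, a std::vector stands in for the vector register, and the example values use a bias of 0; how a nonzero bias should interact with LEN is not settled by this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.
// LEN is the length used by the final loop iteration and BIAS is the
// partial load/store bias, mirroring the names in the patch comment.
int extract_last_active(const std::vector<int> &vec_lhs, int len, int bias) {
  // LAST_INDEX = LEN + (BIAS - 1), computed the same way as in the patch.
  int last_index = len + (bias - 1);
  assert(last_index >= 0
         && static_cast<std::size_t>(last_index) < vec_lhs.size());
  return vec_lhs[static_cast<std::size_t>(last_index)];
}
```

With a four-lane vector {10, 20, 30, 40}, a final length of 3 and a bias of 0, the helper reads lane 2, which is the last lane the final iteration actually processed.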