On Wed, 6 Dec 2023, Andre Vieira (lists) wrote: > Hi, > > This patch addresses the issue reported in PR target/112787 by improving the > compute type selection. We do this by not considering types with more > elements > than the type we are lowering since we'd reject such types anyway. > > gcc/ChangeLog: > > PR target/112787 > * tree-vect-generic (type_for_widest_vector_mode): Add a parameter to > control maximum amount of elements in resulting vector mode. > (get_compute_type): Restrict vector_compute_type to a mode no wider > than the original compute type. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/pr112787.c: New test. > > Bootstrapped and regression tested on aarch64-unknown-linux-gnu and > x86_64-pc-linux-gnu. > > Is this OK for trunk?
@@ -1347,7 +1347,7 @@ optimize_vector_constructor (gimple_stmt_iterator *gsi) TYPE, or NULL_TREE if none is found. */ Can you improve the function comment? It also doesn't mention OP ... static tree -type_for_widest_vector_mode (tree type, optab op) +type_for_widest_vector_mode (tree type, optab op, poly_int64 max_nunits = 0) { machine_mode inner_mode = TYPE_MODE (type); machine_mode best_mode = VOIDmode, mode; @@ -1371,7 +1371,9 @@ type_for_widest_vector_mode (tree type, optab op) FOR_EACH_MODE_FROM (mode, mode) if (GET_MODE_INNER (mode) == inner_mode && maybe_gt (GET_MODE_NUNITS (mode), best_nunits) - && optab_handler (op, mode) != CODE_FOR_nothing) + && optab_handler (op, mode) != CODE_FOR_nothing + && (known_eq (max_nunits, 0) + || known_lt (GET_MODE_NUNITS (mode), max_nunits))) max_nunits suggests that known_le would be appropriate instead. I see the only other caller with similar "problems": } /* Can't use get_compute_type here, as supportable_convert_operation doesn't necessarily use an optab and needs two arguments. */ tree vec_compute_type = type_for_widest_vector_mode (TREE_TYPE (arg_type), mov_optab); if (vec_compute_type && VECTOR_MODE_P (TYPE_MODE (vec_compute_type)) && subparts_gt (arg_type, vec_compute_type)) so please do not default to 0 but adjust this one as well. It also seems you then can remove the subparts_gt guards on both vec_compute_type uses. I think the API would be cleaner if we'd pass the original vector type we can then extract TYPE_VECTOR_SUBPARTS from, avoiding the extra arg. No? Thanks, Richard.