On Tue, Jan 26, 2021 at 1:25 AM Joern Wolfgang Rennecke <joern.renne...@riscy-ip.com> wrote: > > optabs.c:expand_unop_direct can expand a popcount builtin without a call > under certain conditions even without a popcount pattern of the required > data width: > > if (unoptab == popcount_optab > && is_a <scalar_int_mode> (mode, &int_mode) > && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD > && optab_handler (unoptab, word_mode) != CODE_FOR_nothing > && optimize_insn_for_speed_p ()) > { > temp = expand_doubleword_popcount (int_mode, op0, target); > if (temp) > return temp; > } > > > However, the match.pd recognition of popcount arithmetic using & / + is > tied to having an exactly matching operation. This causes a failure for > gcc.dg/tree-ssa/popcount4l.c for 16-bit targets that have a 16 bit > popcount operation (and no wider). > Likewise, not recognizing a 64 bit popcount for a 32 bit target with > 32 bit popcount could be rectified by synthesizing the wide popcount > operations with two narrower popcount operations. > The attached patch implements this.
Few comments. + (with { tree half_type = (prec <= BITS_PER_WORD || (prec & 1) ? NULL_TREE + : lang_hooks.types.type_for_size (prec/2, 1)); + gcc_assert (prec > 2 || half_type == NULL_TREE); + } + (if (half_type != NULL_TREE type_for_size can return a type with > prec/2 precision, I suppose that's OK here. In the end we're probably looking for the "next" narrower mode with at least half the number of bits as type and support for popcount, not sure how likely it is to have say a 24bit PSImode popcount only so that we'd still fail to recognize a 32bit popcount since we only will try 16bit halves. That said, I don't like including langhooks here very much and I'd prefer sth like (given there's no GET_MODE_NARROWER) FOR_EACH_MODE_IN_CLASS (m, MODE_INT) if (m == TYPE_MODE (type)) break; else if (known_ge (GET_MODE_PRECISION (m), prec/2)) { half_type = build_nonstandard_integer_type (GET_MODE_PRECISION (m), 1); if (direct_internal_fn_supported_p (IFN_POPCOUNT, half_type, OPTIMIZE_FOR_SPEED)) break; half_type = NULL_TREE; ^^^ IMHO we should be conservative with -Os and use OPTIMIZE_FOR_SPEED when we need two popcount ops? } } (if (half_type) (.... + (IFN_POPCOUNT:half_type (convert (rshift @0 + { wide_int_to_tree (half_type, prec/2); } ))))))))))) please use build_int_cst (integer_type_node, prec/2) for the shift amount. Otherwise this looks reasonable but since it doesn't fix a regression it has to wait for stage1 now. Thanks, Richard.