On Tue, Jan 26, 2021 at 1:25 AM Joern Wolfgang Rennecke
<joern.renne...@riscy-ip.com> wrote:
>
> optabs.c:expand_unop_direct can expand a popcount builtin without a call
> under certain conditions even without a popcount pattern of the required
> data width:
>
>   if (unoptab == popcount_optab
>       && is_a <scalar_int_mode> (mode, &int_mode)
>       && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
>       && optab_handler (unoptab, word_mode) != CODE_FOR_nothing
>       && optimize_insn_for_speed_p ())
>     {
>       temp = expand_doubleword_popcount (int_mode, op0, target);
>       if (temp)
>         return temp;
>     }
>
>
> However, the match.pd recognition of popcount arithmetic using & / + is
> tied to having an exactly matching operation.  This causes a failure for
> gcc.dg/tree-ssa/popcount4l.c for 16-bit targets that have a 16 bit
> popcount operation (and no wider).
> Likewise, not recognizing a 64 bit popcount for a 32 bit target with
> 32 bit popcount could be rectified by synthesizing the wide popcount
> operations with two narrower popcount operations.
> The attached patch implements this.

Few comments.

+     (with { tree half_type = (prec <= BITS_PER_WORD || (prec & 1) ? NULL_TREE
+                              : lang_hooks.types.type_for_size (prec/2, 1));
+          gcc_assert (prec > 2 || half_type == NULL_TREE);
+      }
+      (if (half_type != NULL_TREE

type_for_size can return a type with > prec/2 precision, I suppose that's OK
here.  In the end we're probably looking for the "next" narrower mode
with at least half the number of bits as type and support for popcount, not
sure how likely it is to have say a 24bit PSImode popcount only so that
we'd still fail to recognize a 32bit popcount since we only will try
16bit halves.
That said, I don't like including langhooks here very much and I'd prefer
sth like (given there's no GET_MODE_NARROWER)

    FOR_EACH_MODE_IN_CLASS (m, MODE_INT)
       if (m == TYPE_MODE (type))
         break;
       else if (known_ge (GET_MODE_PRECISION (m), prec/2))
          {
             half_type = build_nonstandard_integer_type
(GET_MODE_PRECISION (m), 1);
             if (direct_internal_fn_supported_p (IFN_POPCOUNT, half_type,

OPTIMIZE_FOR_SPEED))
               break;
            half_type = NULL_TREE;
^^^ IMHO we should be conservative with -Os and use OPTIMIZE_FOR_SPEED when
we need two popcount ops?

          }
    }
  (if (half_type)
    (....

+        (IFN_POPCOUNT:half_type (convert (rshift @0
+           { wide_int_to_tree (half_type, prec/2); } )))))))))))

please use build_int_cst (integer_type_node, prec/2) for the shift amount.

Otherwise this looks reasonable but since it doesn't fix a regression it has to
wait for stage1 now.

Thanks,
Richard.

Reply via email to