On Wed, 25 Jun 2014, Jakub Jelinek wrote:
> Since we've started folding (a << n) | (a >> (bitsize - n)) etc.
> into rotates even for vectors, we've regressed code quality on
> targets where we do have vector shifts, but don't have vector
> rotates.  The following patch attempts to fix it, by telling the
> veclower pass not to actually lower such rotates if we have vector
> shifts and bitwise or, and by fixing up the expansion of vector
> rotates.
Thanks for handling this. It looks good to me; I am just wondering about a
few unimportant details below.
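For reference, a minimal sketch of the idiom in question, using a GNU C
vector-extension type (my own illustration, not one of the patch's
testcases):

/* fold now turns the shift/or pair below into a single vector
   (L)ROTATE_EXPR, which the veclower pass then has to expand on
   targets that have vector shifts but no vector rotates.  */
typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si
rotl3 (v4si a)
{
  return (a << 3) | (a >> (32 - 3));
}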
>  static void
>  expand_vector_operations_1 (gimple_stmt_iterator *gsi)
>  {
>    gimple stmt = gsi_stmt (*gsi);
> -  tree lhs, rhs1, rhs2 = NULL, type, compute_type;
> +  tree lhs, rhs1, rhs2 = NULL, type, compute_type = NULL_TREE;
>    enum tree_code code;
> -  enum machine_mode compute_mode;
>    optab op = unknown_optab;
>    enum gimple_rhs_class rhs_class;
>    tree new_rhs;
> @@ -1461,6 +1502,41 @@ expand_vector_operations_1 (gimple_stmt_
>            && optab_handler (opv, TYPE_MODE (type)) != CODE_FOR_nothing)
>          return;
>      }
> +
> +  if (code == LROTATE_EXPR || code == RROTATE_EXPR)
> +    {
> +      compute_type = get_compute_type (code, op, type);
> +      if (compute_type == type)
> +        return;
> +      /* Before splitting vector rotates into scalar rotates,
> +         see if we can't use vector shifts and BIT_IOR_EXPR
> +         instead.  For vector by vector rotates we'd also
> +         need to check BIT_AND_EXPR and NEGATE_EXPR, punt there
> +         for now, fold doesn't seem to create such rotates anyway.  */
> +      if (compute_type == TREE_TYPE (type)
> +          && !VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
> +        {
> +          optab oplv, opl, oprv, opr, opo;
> +          oplv = optab_for_tree_code (LSHIFT_EXPR, type, optab_vector);
> +          oprv = optab_for_tree_code (RSHIFT_EXPR, type, optab_vector);
> +          opo = optab_for_tree_code (BIT_IOR_EXPR, type, optab_default);
> +          opl = optab_for_tree_code (LSHIFT_EXPR, type, optab_scalar);
> +          opr = optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar);
How well are we separating lshiftrt from ashiftrt? Is ROTATE_EXPR always
used on an unsigned type, so it is fine? Or do we expand one in terms of
the other when it isn't available, so it doesn't matter?
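To illustrate the concern (my own sketch, not from the patch): a rotate
synthesized from shifts needs a *logical* right shift; with a signed
element type, an arithmetic shift would smear the sign bit into the bits
that the BIT_IOR_EXPR is supposed to fill:

unsigned int
rotl_ok (unsigned int a)
{
  return (a << 3) | (a >> (32 - 3));   /* logical shift: correct */
}

int
rotl_bad (int a)
{
  return (a << 3) | (a >> (32 - 3));   /* arithmetic shift: wrong when a < 0 */
}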
> +          if (optab_handler (oplv, TYPE_MODE (type)) != CODE_FOR_nothing
> +              && optab_handler (opl, TYPE_MODE (type)) == CODE_FOR_nothing
> +              && optab_handler (oprv, TYPE_MODE (type)) != CODE_FOR_nothing
> +              && optab_handler (opr, TYPE_MODE (type)) == CODE_FOR_nothing)
> +            {
> +              opl = oplv;
> +              opr = oprv;
> +            }
Maybe arrange the conditions in order (oplv != && oprv != && (opl == ||
opr ==)), or separate the replacement of opl and of opr into two separate
ifs?
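That is, something like this for the first variant (a sketch of the
reordering, untested):

          if (optab_handler (oplv, TYPE_MODE (type)) != CODE_FOR_nothing
              && optab_handler (oprv, TYPE_MODE (type)) != CODE_FOR_nothing
              && (optab_handler (opl, TYPE_MODE (type)) == CODE_FOR_nothing
                  || optab_handler (opr, TYPE_MODE (type)) == CODE_FOR_nothing))
            {
              opl = oplv;
              opr = oprv;
            }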
Somehow, it feels like those checks should be somewhere in
get_compute_type so we test both scalar and vector versions for each size,
or we could call get_compute_type for both and pick the best.
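Roughly (a hypothetical sketch of the "call it for both and pick the
best" idea, untested): query both optabs and keep whichever yields the
wider vector compute type:

          tree ts = get_compute_type (LSHIFT_EXPR, opl, type);
          tree tv = get_compute_type (LSHIFT_EXPR, oplv, type);
          if (TREE_CODE (tv) == VECTOR_TYPE
              && (TREE_CODE (ts) != VECTOR_TYPE
                  || TYPE_VECTOR_SUBPARTS (tv) > TYPE_VECTOR_SUBPARTS (ts)))
            {
              compute_type = tv;
              opl = oplv;
            }
          else
            compute_type = ts;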
> +          compute_type = get_compute_type (LSHIFT_EXPR, opl, type);
> +          if (compute_type == TREE_TYPE (type)
> +              || compute_type != get_compute_type (RSHIFT_EXPR, opr, type)
> +              || compute_type != get_compute_type (BIT_IOR_EXPR, opo, type))
> +            compute_type = TREE_TYPE (type);
Since we have determined compute_type from ashift (let's assume that's the
one least likely to exist), I would just check that the optab is OK with
using this mode for the other 2 ops. As written, if we have shifts in 128
bits and ior in both 128 and 256 bits, we will fail (I thought that might
be the case with AVX, but apparently not). Plus it is faster ;-)
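Something along these lines (a sketch of the alternative, untested):
settle compute_type from the left shift alone, then merely verify that
the right-shift and ior optabs can handle that same mode:

          compute_type = get_compute_type (LSHIFT_EXPR, opl, type);
          if (compute_type != TREE_TYPE (type)
              && (optab_handler (opr, TYPE_MODE (compute_type))
                  == CODE_FOR_nothing
                  || optab_handler (opo, TYPE_MODE (compute_type))
                     == CODE_FOR_nothing))
            compute_type = TREE_TYPE (type);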
When we move to having vector rotate recognition in forwprop or match.pd, I
guess we'll have to make it conditional on !PROP_gimple_lvec to be safe.
Does rotate hit PR 56873? (I noticed the -mno-xop and the lack of an -mxop
test.)
--
Marc Glisse