On Thu, Oct 20, 2011 at 11:42:01AM +0200, Richard Guenther wrote:
> > +	  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
> > +	      && is_pattern_stmt_p (stmt_info))
> > +	    scalar_dest = TREE_OPERAND (scalar_dest, 0);
> >	  if (TREE_CODE (scalar_dest) != ARRAY_REF
> >	      && TREE_CODE (scalar_dest) != INDIRECT_REF
> >	      && TREE_CODE (scalar_dest) != COMPONENT_REF
>
> Just change the if () stmt to
>
>   if (!handled_component_p (scalar_dest)
>       && TREE_CODE (scalar_dest) != MEM_REF)
>     return false;
That will accept BIT_FIELD_REF and ARRAY_RANGE_REF (as well as VCEs
outside of pattern stmts).  The VCEs I hope don't appear, but the first
two might, and I'm not sure we are prepared to handle them.  Certainly
not BIT_FIELD_REFs.

> > +	  rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE,
> > +				     stmts);
> > +	  if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
> > +	    {
> > +	      lhs = copy_node (lhs);
>
> We don't handle TARGET_MEM_REF in vectorizable_store, so no need to
> do it here.  In fact, just unconditionally do ...
>
> > +	      TREE_TYPE (lhs) = TREE_TYPE (vectype);
> > +	    }
> > +	  else
> > +	    lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
>
> ... this (wrap it in a V_C_E).  No need to special-case any
> MEM_REFs.

Ok.  After all, it seems vectorizable_store pretty much ignores it
(except for the scalar_dest check above).  For aliasing it uses the
type from DR_REF and otherwise it uses the vectorized type.

> > +	  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
>
> This should never be false, so you can as well unconditionally build
> the conversion stmt.

You mean because currently adjust_bool_pattern prefers signed types
over unsigned, while here lhs will be unsigned?  I guess I should
change it to use a signed type for the memory store too, to avoid the
extra cast instead.  The two types can certainly have the same
precision, e.g. for:

  unsigned char a[N], b[N];
  unsigned int d[N], e[N];
  bool c[N];
  ...
  for (i = 0; i < N; ++i)
    c[i] = a[i] < b[i];

or different precision, e.g. for:

  for (i = 0; i < N; ++i)
    c[i] = d[i] < e[i];

> > @@ -347,6 +347,28 @@ vect_determine_vectorization_factor (loo
> >  	  gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
> >  		      || is_pattern_stmt_p (stmt_info));
> >  	  vectype = STMT_VINFO_VECTYPE (stmt_info);
> > +	  if (STMT_VINFO_DATA_REF (stmt_info))
> > +	    {
> > +	      struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> > +	      tree scalar_type = TREE_TYPE (DR_REF (dr));
> > +	      /* vect_analyze_data_refs will allow bool writes through,
> > +		 in order to allow vect_recog_bool_pattern to transform
> > +		 those.  If they couldn't be transformed, give up now.  */
> > +	      if (((TYPE_PRECISION (scalar_type) == 1
> > +		    && TYPE_UNSIGNED (scalar_type))
> > +		   || TREE_CODE (scalar_type) == BOOLEAN_TYPE)
>
> Shouldn't it be always possible to vectorize those?  For loads
> we can assume the memory contains only 1 or 0 (we assume that for
> scalar loads), for stores we can mask out all other bits explicitly
> if you add support for truncating conversions to non-mode precision
> (in fact, we could support non-mode precision vectorization that way,
> if not support bitfield loads or extending conversions).

Not without the pattern recognizer transforming it into something.
That is something we discussed on IRC before I started working on the
first vect_recog_bool_pattern patch: we'd need to special-case bool
and one-bit precision types in way too many places all around the
vectorizer.  Another reason was that what vect_recog_bool_pattern
currently emits is certainly way faster than what we would end up with
if we just handled bool as unsigned (or signed?) char with masking on
casts and stores - the ability to use any integer type for the bools,
rather than always char, means we can avoid many VEC_PACK_TRUNC_EXPRs
and the corresponding VEC_UNPACK_{LO,HI}_EXPRs.  So the chosen
solution was to attempt to transform some bool patterns into something
the vectorizer can handle easily, and what it handles can be extended
over time.
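To make the tradeoff concrete, here is a hand-written sketch of the
shape of the pattern statements for the second loop above (not an
actual GIMPLE dump, and the temporary names are made up):

  patt_1 = d[i] < e[i] ? 1 : 0;    /* kept in a type as wide as d/e */
  patt_2 = (signed char) patt_1;   /* narrowed only once, for the store */
  VIEW_CONVERT_EXPR<signed char>(c[i]) = patt_2;

Treating bool as a plain char from the start would instead force every
intermediate bool value through char width immediately, requiring the
pack/unpack statements mentioned above whenever it is combined with
wider operands.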
The check in the vect_determine_vectorization_factor hunk above just
reflects that; probably just me trying to be overly cautious - the
vectorization would likely fail anyway on the stmt feeding the store,
because get_vectype_for_scalar_type would fail on it.

If we wanted to support general
TYPE_PRECISION != GET_MODE_BITSIZE (TYPE_MODE (type)) vectorization
(hopefully still preserving the bool pattern recognizer, for the
reasons stated above), we'd start by changing
get_vectype_for_scalar_type to handle those types (then the
tree-vect-data-refs.c and tree-vect-loop.c changes from this patch
would be unnecessary), but then we'd need to handle it in other places
too (I guess loads would be fine (unless BIT_FIELD_REF loads), but
casts and stores would need extra code).
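For reference, the explicit masking you describe for stores would
conceptually look like this at the vector level (again only a sketch
of the idea, not something the current patch generates):

  /* The 1-bit value lives in wider vector elements whose upper bits
     are not guaranteed to be clear; mask them out explicitly so the
     store writes only 0 or 1 into each element:  */
  vect_patt_2 = vect_patt_1 & { 1, 1, 1, ... };
  MEM[&c] = vect_patt_2;

	Jakub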