On 16 August 2011 15:20, Ramana Radhakrishnan <ramana.radhakrish...@linaro.org> wrote:
> Hi,
>
> While looking at a failure with regrename and
> mvectorize-with-neon-quad I noticed that the early-clobber in this
> vec_pack_trunc pattern is superfluous given that we can use
> reg_overlap_mentioned_p to decide in which order we want to emit these
> 2 instructions. While it works around the problem in regrename.c I
> still think that the behaviour in regrename is a bit suspicious and
> needs some more investigation.
>
RichardS has since fixed the underlying problem in data-flow, so we
should be able to turn on vectorize_with_quad anyway.

Here's the patch that I had meant to commit as a workaround. However, I
think it's better to split this further in the case where the two source
registers are equal: otherwise the vmov that fills the high half of the
destination has to wait for the preceding vmovn result, which pointlessly
creates a stall in the Neon pipeline. Hence I'm not committing this
patch. Tests finished OK for this patch, btw.

cheers
Ramana

index 24dd941..2c60c5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5631,14 +5631,29 @@
 ; the semantics of the instructions require.
 
 (define_insn "vec_pack_trunc_<mode>"
-  [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+  [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w")
         (vec_concat:<V_narrow_pack>
                 (truncate:<V_narrow>
                         (match_operand:VN 1 "register_operand" "w"))
                 (truncate:<V_narrow>
                         (match_operand:VN 2 "register_operand" "w"))))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN"
-  "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+  {
+    /* If operand1 and operand2 are identical, then the second
+       narrowing operation isn't needed as the values obtained
+       in both parts of the destination q register are identical.
+       This precludes the need for an early clobber in the destination
+       operand. */
+    if (rtx_equal_p (operands[1], operands[2]))
+      return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmov.i<V_sz_elem>\\t%f0, %e0";
+    else
+      {
+        if (reg_overlap_mentioned_p (operands[0], operands[2]))
+          return "vmovn.i<V_sz_elem>\\t%f0, %q2\;vmovn.i<V_sz_elem>\\t%e0, %q1";
+        else
+          return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmovn.i<V_sz_elem>\\t%f0, %q2";
+      }
+  }
   [(set_attr "neon_type" "neon_shift_1")
    (set_attr "length" "8")]
 )