Hi Segher, on 2019/11/20 1:29 AM, Segher Boessenkool wrote: > Hi! > > On Tue, Nov 12, 2019 at 06:41:07PM +0800, Kewen.Lin wrote: >> +;; code iterators and attributes for vector FP comparison operators: >> +(define_code_iterator vector_fp_comparison_operator [lt le ne >> + ungt unge unlt unle]) > > Let's indent that differently? > > (define_code_iterator > vector_fp_comparison_operator [lt le ne ungt unge unlt unle]) > > is nice I think. >
Done. >> +; There are 14 possible vector FP comparison operators, gt and eq of them >> have >> +; been expanded above, so just support 12 remaining operators here. >> >> +; 0. For ge: > > (Don't number comments please). > Done. >> +(define_expand "vector_ge<mode>" >> + [(set (match_operand:VEC_F 0 "vlogical_operand") >> + (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand") >> + (match_operand:VEC_F 2 "vlogical_operand")))] >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> + "") > > This already exists? Line 764 in vector.md? > Yes, that one is only for VEC_F (aka vector FP); I moved it here to centralize these patterns. Does that make sense? >> +; 1. For lt/le/ne/ungt/unge/unlt/unle: >> +; lt(a,b) = gt(b,a) >> +; le(a,b) = ge(b,a) >> +; unge(a,b) = ~lt(a,b) >> +; unle(a,b) = ~gt(a,b) >> +; ne(a,b) = ~eq(a,b) >> +; ungt(a,b) = ~le(a,b) >> +; unlt(a,b) = ~ge(a,b) >> +(define_insn_and_split "vector_<code><mode>" >> [(set (match_operand:VEC_F 0 "vfloat_operand") >> + (vector_fp_comparison_operator:VEC_F >> + (match_operand:VEC_F 1 "vfloat_operand") >> + (match_operand:VEC_F 2 "vfloat_operand")))] >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> "#" >> + "can_create_pseudo_p ()" >> + [(pc)] >> { >> + enum rtx_code cond = <CODE>; >> + rtx res = gen_reg_rtx (<MODE>mode); >> + bool need_invert = false; >> + switch (cond) >> + { >> + case LT: >> + case LE: >> + cond = swap_condition (cond); >> + std::swap (operands[1], operands[2]); >> + break; >> + case UNLE: >> + case UNLT: >> + case NE: >> + case UNGE: >> + case UNGT: >> + cond = reverse_condition_maybe_unordered (cond); >> + need_invert = true; >> + break; >> + default: >> + gcc_unreachable (); >> + } >> + >> + rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); >> + emit_insn (gen_rtx_SET (res, comp)); >> + >> + if (need_invert) >> + emit_insn (gen_one_cmpl<mode>2 (res, res)); >> + >> + emit_insn (gen_rtx_SET (operands[0], res)); >> }) > > Does this work for (say) ungt? 
I would do two switch statements: the > first only to do the invert, and then the second to do the swap (with > the modified cond!) So: > Yes, it works, but it has to call further splitting for "le". It seems better to do the full split at once. Thanks a lot for your example! > { > enum rtx_code cond = <CODE>; > bool need_invert = false; > > if (cond == UNLE || cond == UNLT || cond == NE > || cond == UNGE || cond == UNGT) > { > cond = reverse_condition_maybe_unordered (cond); > need_invert = true; > } > > if (cond == LT || cond == LE) > { > cond = swap_condition (cond); > std::swap (operands[1], operands[2]); > } > I added an assert here to guard the code. > rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); > > if (need_invert) > { > rtx res = gen_reg_rtx (<MODE>mode); > emit_insn (gen_rtx_SET (res, comp)); > emit_insn (gen_one_cmpl<mode>2 (operands[0], res)); > } > else > emit_insn (gen_rtx_SET (operands[0], comp)); > }) > >> +; 2. For ltgt/uneq/ordered/unordered: >> +; ltgt: gt(a,b) | gt(b,a) >> +; uneq: ~gt(a,b) & ~gt(b,a) >> +; ordered: ge(a,b) | ge(b,a) >> +; unordered: ~ge(a,b) & ~ge(b,a) > > You could write that as > ~(ge(a,b) | ge(b,a)) > which may show the structure better? > Done. >> +(define_insn_and_split "vector_<code><mode>" >> [(set (match_operand:VEC_F 0 "vfloat_operand") >> + (vector_fp_extra_comparison_operator:VEC_F >> + (match_operand:VEC_F 1 "vfloat_operand") >> + (match_operand:VEC_F 2 "vfloat_operand")))] >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> "#" >> + "can_create_pseudo_p ()" > > This should be "&& can_create_pseudo ()". > Done. > Also, can_create_pseudo in the split condition can fail, in theory > anyway. It should be part of the insn condition, instead, and even > then it can fail: after the last split pass, but before reload, such > an insn can in principle be created, and then it will never be split. 
> > In this case, maybe you should add a scratch register; in the previous > case, you can reuse operands[0] for res instead of making a new reg > for it, if !can_create_pseudo (). > I've updated the previous case to reuse operands[0] under a !can_create_pseudo check. But for the latter one I hit an ICE when I added two clobbers with scratch into the RTL pattern like: [(set (match_operand:VEC_F 0 "vfloat_operand") (vector_fp_extra_comparison_operator:VEC_F (match_operand:VEC_F 1 "vfloat_operand") (match_operand:VEC_F 2 "vfloat_operand"))) (clobber (match_scratch:VEC_F 3)) (clobber (match_scratch:VEC_F 4))] A pattern without the clobbers, as below, can't be recognized (early, in vregs): (insn 24 23 25 4 (set (reg:V4SF 142) (uneq:V4SF (reg:V4SF 141 [ vect__4.11 ]) (reg:V4SF 127 [ vect_cst__47 ]))) -1 (nil)) I may be missing something, but I can't find a way to satisfy both, i.e. to recognize the simplified pattern as well as the pattern with clobbers. So for now I kept it like the existing code, only handling can_create_pseudo (the old define_insn_and_split doesn't even check can_create_pseudo, but it should, since it calls gen_reg_rtx?). Looking forward to your further suggestions! >> { >> + enum rtx_code cond = <CODE>; >> + rtx res1 = gen_reg_rtx (<MODE>mode); >> + rtx res2 = gen_reg_rtx (<MODE>mode); >> + bool need_invert = false; >> + switch (cond) >> + { >> + case UNEQ: >> + need_invert = true; >> + /* Fall through. */ >> + case LTGT: >> + cond = GT; >> + break; >> + case UNORDERED: >> + need_invert = true; >> + /* Fall through. 
*/ >> + case ORDERED: >> + cond = GE; >> + break; >> + default: >> + gcc_unreachable (); >> + } >> + >> + rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); >> + emit_insn (gen_rtx_SET (res1, comp1)); >> + rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]); >> + emit_insn (gen_rtx_SET (res2, comp2)); >> + >> + emit_insn (gen_ior<mode>3 (res1, res1, res2)); >> + >> + if (need_invert) >> + emit_insn (gen_one_cmpl<mode>2 (res1, res1)); >> + >> + emit_insn (gen_rtx_SET (operands[0], res1)); >> }) > > You can do this similarly: > > { > enum rtx_code cond = <CODE>; > bool need_invert = false; > > if (cond == UNORDERED || cond == UNEQ) > { > cond = reverse_condition_maybe_unordered (cond); > need_invert = true; > } > > switch (cond) > { > case LTGT: > cond = GT; > break; > case ORDERED: > cond = GE; > break; > default: > gcc_unreachable (); > } > > rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); > rtx res1 = gen_reg_rtx (<MODE>mode); > emit_insn (gen_rtx_SET (res1, comp1)); > rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]); > rtx res2 = gen_reg_rtx (<MODE>mode); > emit_insn (gen_rtx_SET (res2, comp2)); > > if (need_invert) > { > rtx comp3 = gen_rtx_fmt_ee (IOR, <MODE>mode, res1, res2); > rtx comp4 = gen_rtx_fmt_e (NOT, <MODE>mode, comp3); I had to add one more gen_rtx_SET for the IOR here, otherwise I get an error that the pattern can't be recognized. > emit_insn (gen_rtx_SET (operands[0], comp4)); > } > else > emit_insn (gen_ior<mode>3 (operands[0], res1, res2)); > }) > > (the result of a split does not get combine etc. optimisations anymore, > so you have to generate pretty much ideal code directly). > > HTH, sorry for the late reply, Yeah, it helps a lot, thanks again! I have attached the updated version for your further review. BR, Kewen
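[Editorial aside, not part of the patch: the per-operator identities in the comments above (lt(a,b) = gt(b,a), unge(a,b) = ~lt(a,b), uneq(a,b) = ~(gt(a,b) | gt(b,a)), ordered(a,b) = ge(a,b) | ge(b,a), ...) can be sanity-checked on scalars under IEEE 754 NaN semantics, since each vector lane behaves the same way. A small Python sketch:]

```python
import itertools
import math

# Scalar stand-ins for the per-lane vector comparisons; under IEEE 754,
# any ordered comparison involving a NaN operand yields false.
def gt(a, b): return a > b
def ge(a, b): return a >= b

samples = [1.0, -2.5, 0.0, math.nan]
for a, b in itertools.product(samples, repeat=2):
    ordered = not (math.isnan(a) or math.isnan(b))
    # lt(a,b) = gt(b,a) and le(a,b) = ge(b,a)
    assert (a < b) == gt(b, a)
    assert (a <= b) == ge(b, a)
    # unge(a,b) = ~lt(a,b): "unordered or >=" is the negation of "<"
    assert ((not ordered) or a >= b) == (not (a < b))
    # ltgt(a,b) = gt(a,b) | gt(b,a): ordered and not equal
    assert (gt(a, b) or gt(b, a)) == (ordered and a != b)
    # uneq(a,b) = ~(gt(a,b) | gt(b,a)): unordered or equal
    assert ((not ordered) or a == b) == (not (gt(a, b) or gt(b, a)))
    # ordered(a,b) = ge(a,b) | ge(b,a)
    assert ordered == (ge(a, b) or ge(b, a))
```

All assertions hold for every pair drawn from the samples, including the NaN combinations, which is exactly why the splitters can implement the unordered operators via a complemented gt/ge.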
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index b132037..deeab9f 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -107,6 +107,12 @@ (smin "smin") (smax "smax")]) +;; code iterators and attributes for vector FP comparison operators: +(define_code_iterator + vector_fp_comparison_operator [lt le ne ungt unge unlt unle]) +(define_code_iterator + vector_fp_extra_comparison_operator [ltgt uneq unordered ordered]) + ;; Vector move instructions. Little-endian VSX loads and stores require ;; special handling to circumvent "element endianness." @@ -665,88 +671,6 @@ DONE; }) -; lt(a,b) = gt(b,a) -(define_expand "vector_lt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (lt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_gt<mode> (operands[0], operands[2], operands[1])); - DONE; -}) - -; le(a,b) = ge(b,a) -(define_expand "vector_le<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (le:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_ge<mode> (operands[0], operands[2], operands[1])); - DONE; -}) - -; ne(a,b) = ~eq(a,b) -(define_expand "vector_ne<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (ne:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_eq<mode> (operands[0], operands[1], operands[2])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; unge(a,b) = ~gt(b,a) -(define_expand "vector_unge<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (unge:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn 
(gen_vector_gt<mode> (operands[0], operands[2], operands[1])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; ungt(a,b) = ~ge(b,a) -(define_expand "vector_ungt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (ungt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_ge<mode> (operands[0], operands[2], operands[1])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; unle(a,b) = ~gt(a,b) -(define_expand "vector_unle<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (unle:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_gt<mode> (operands[0], operands[1], operands[2])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - -; unlt(a,b) = ~ge(a,b) -(define_expand "vector_unlt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (unlt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" -{ - emit_insn (gen_vector_ge<mode> (operands[0], operands[1], operands[2])); - emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0])); - DONE; -}) - (define_expand "vector_eq<mode>" [(set (match_operand:VEC_C 0 "vlogical_operand") (eq:VEC_C (match_operand:VEC_C 1 "vlogical_operand") @@ -761,13 +685,6 @@ "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" "") -(define_expand "vector_ge<mode>" - [(set (match_operand:VEC_F 0 "vlogical_operand") - (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand") - (match_operand:VEC_F 2 "vlogical_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" - "") - ; >= for integer vectors: swap operands and apply not-greater-than (define_expand "vector_nlt<mode>" [(set (match_operand:VEC_I 3 "vlogical_operand") @@ -829,88 +746,118 @@ operands[3] 
= gen_reg_rtx_and_attrs (operands[0]); }) -(define_insn_and_split "vector_uneq<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (uneq:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" - "#" - "" - [(set (match_dup 3) - (gt:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (gt:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - (and:VEC_F (not:VEC_F (match_dup 3)) - (not:VEC_F (match_dup 4))))] -{ - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); -}) +; There are 14 possible vector FP comparison operators, gt and eq of them have +; been expanded above, so just support 12 remaining operators here. -(define_insn_and_split "vector_ltgt<mode>" - [(set (match_operand:VEC_F 0 "vfloat_operand") - (ltgt:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] +; For ge: +(define_expand "vector_ge<mode>" + [(set (match_operand:VEC_F 0 "vlogical_operand") + (ge:VEC_F (match_operand:VEC_F 1 "vlogical_operand") + (match_operand:VEC_F 2 "vlogical_operand")))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" - "#" - "" - [(set (match_dup 3) - (gt:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (gt:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - (ior:VEC_F (match_dup 3) - (match_dup 4)))] -{ - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); -}) + "") -(define_insn_and_split "vector_ordered<mode>" +; For lt/le/ne/ungt/unge/unlt/unle: +; lt(a,b) = gt(b,a) +; le(a,b) = ge(b,a) +; unge(a,b) = ~lt(a,b) +; unle(a,b) = ~gt(a,b) +; ne(a,b) = ~eq(a,b) +; ungt(a,b) = ~le(a,b) +; unlt(a,b) = ~ge(a,b) +(define_insn_and_split "vector_<code><mode>" [(set (match_operand:VEC_F 0 "vfloat_operand") - (ordered:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] + (vector_fp_comparison_operator:VEC_F + 
(match_operand:VEC_F 1 "vfloat_operand") + (match_operand:VEC_F 2 "vfloat_operand")))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" "#" - "" - [(set (match_dup 3) - (ge:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (ge:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - (ior:VEC_F (match_dup 3) - (match_dup 4)))] + "&& 1" + [(pc)] { - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); + enum rtx_code cond = <CODE>; + bool need_invert = false; + + if (cond == UNLE || cond == UNLT || cond == NE || cond == UNGE + || cond == UNGT) + { + cond = reverse_condition_maybe_unordered (cond); + need_invert = true; + } + + if (cond == LT || cond == LE) + { + cond = swap_condition (cond); + std::swap (operands[1], operands[2]); + } + + gcc_assert (cond == EQ || cond == GE || cond == GT); + + rtx comp = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); + + if (need_invert) + { + rtx res + = (!can_create_pseudo_p () ? operands[0] : gen_reg_rtx (<MODE>mode)); + emit_insn (gen_rtx_SET (res, comp)); + emit_insn (gen_one_cmpl<mode>2 (operands[0], res)); + } + else + emit_insn (gen_rtx_SET (operands[0], comp)); }) -(define_insn_and_split "vector_unordered<mode>" +; For ltgt/uneq/ordered/unordered: +; ltgt: gt(a,b) | gt(b,a) +; uneq: ~(gt(a,b) | gt(b,a)) +; ordered: ge(a,b) | ge(b,a) +; unordered: ~(ge(a,b) | ge(b,a)) +(define_insn_and_split "vector_<code><mode>" [(set (match_operand:VEC_F 0 "vfloat_operand") - (unordered:VEC_F (match_operand:VEC_F 1 "vfloat_operand") - (match_operand:VEC_F 2 "vfloat_operand")))] - "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" + (vector_fp_extra_comparison_operator:VEC_F + (match_operand:VEC_F 1 "vfloat_operand") + (match_operand:VEC_F 2 "vfloat_operand")))] + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()" "#" - "" - [(set (match_dup 3) - (ge:VEC_F (match_dup 1) - (match_dup 2))) - (set (match_dup 4) - (ge:VEC_F (match_dup 2) - (match_dup 1))) - (set (match_dup 0) - 
(and:VEC_F (not:VEC_F (match_dup 3)) - (not:VEC_F (match_dup 4))))] + "&& 1" + [(pc)] { - operands[3] = gen_reg_rtx (<MODE>mode); - operands[4] = gen_reg_rtx (<MODE>mode); + enum rtx_code cond = <CODE>; + bool need_invert = false; + + if (cond == UNORDERED || cond == UNEQ) + { + cond = reverse_condition_maybe_unordered (cond); + need_invert = true; + } + + switch (cond) + { + case LTGT: + cond = GT; + break; + case ORDERED: + cond = GE; + break; + default: + gcc_unreachable (); + } + + rtx comp1 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[1], operands[2]); + rtx res1 = gen_reg_rtx (<MODE>mode); + emit_insn (gen_rtx_SET (res1, comp1)); + rtx comp2 = gen_rtx_fmt_ee (cond, <MODE>mode, operands[2], operands[1]); + rtx res2 = gen_reg_rtx (<MODE>mode); + emit_insn (gen_rtx_SET (res2, comp2)); + + if (need_invert) + { + rtx comp3 = gen_rtx_fmt_ee (IOR, <MODE>mode, res1, res2); + emit_insn (gen_rtx_SET (operands[0], comp3)); + rtx comp4 = gen_rtx_fmt_e (NOT, <MODE>mode, operands[0]); + emit_insn (gen_rtx_SET (operands[0], comp4)); + } + else + emit_insn (gen_ior<mode>3 (operands[0], res1, res2)); }) ;; Note the arguments for __builtin_altivec_vsel are op2, op1, mask
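[Editorial aside: the two-step condition rewrite performed by the first vector_<code><mode> splitter in the patch — first negate NaN-aware, then swap operands — can be modelled with plain tables. REVERSE and SWAP below are illustrative stand-ins for GCC's reverse_condition_maybe_unordered and swap_condition, not the real helpers:]

```python
# REVERSE maps a condition to its NaN-aware negation: c(a,b) == ~REVERSE[c](a,b).
# SWAP maps a condition to the one obtained by exchanging operands:
# c(a,b) == SWAP[c](b,a).  Hypothetical tables, covering only the codes
# this splitter handles.
REVERSE = {"unle": "gt", "unlt": "ge", "ne": "eq", "unge": "lt", "ungt": "le"}
SWAP = {"lt": "gt", "le": "ge"}

def split(cond):
    """Return (cond2, swapped, invert): cond(a,b) is computed as
    cond2(a,b) (or cond2(b,a) if swapped), complemented if invert."""
    invert = cond in REVERSE
    if invert:
        cond = REVERSE[cond]       # e.g. ungt -> ~le
    swapped = cond in SWAP
    if swapped:
        cond = SWAP[cond]          # e.g. le(a,b) -> ge(b,a)
    # Mirrors the gcc_assert in the splitter: only eq/ge/gt remain,
    # i.e. the comparisons the hardware provides directly.
    assert cond in ("eq", "ge", "gt")
    return cond, swapped, invert

# ungt(a,b) -> ~le(a,b) -> ~ge(b,a)
```

For example, split("ungt") yields ("ge", True, True), matching the ungt(a,b) = ~ge(b,a) identity the patch's comment lists.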