On Wed, Sep 9, 2020 at 5:51 PM Anton Youdkevitch
<anton.youdkevi...@bell-sw.com> wrote:
>
> ThunderxT2 chip has an odd property that nested scalar FP min and max are
> slower than logically the same sequence of compares and branches.

Always for any input data?

> Here is the patch where I'm trying to implement that transformation.
> Please advise if the "combine" pass (actually after the pass itself) is the
> appropriate place to do this.
>
> I was considering the possibility to implement this in aarch64.md
> (which would be much cleaner) but didn't manage to figure out how
> to make fmin/fmax survive until later passes and replace them only
> then.

+             || !SCALAR_FLOAT_MODE_P (GET_MODE (SET_SRC (PATTERN (insn)))))
+           continue;
...
+         if (code1 != SMIN && code1 != UMIN &&
+             code1 != SMAX && code1 != UMAX)
+           continue;

You shouldn't see U{MIN,MAX} for float data.

May I suggest doing this in a peephole2 or in another late
machine-specific pass instead?

Are nested vector FP min/max fast?

Richard.


>
> --
>   Thanks,
>   Anton
