On 7/12/22 07:27, Ilya Leoshkevich wrote:
+/*
+ * vfmin/vfmax code generation.
+ */
+extern const char vfminmax_template[];
+extern const int vfminmax_template_size;
+extern const int vfminmax_offset;
+asm(".globl vfminmax_template\n"
+    "vfminmax_template:\n"
+    "vl %v25,0(%r3)\n"
+    "vl %v26,0(%r4)\n"
+    "0: vfmax %v24,%v25,%v26,2,0,0\n"
+    "vst %v24,0(%r2)\n"
+    "br %r14\n"
+    "1: .align 4\n"
+    ".globl vfminmax_template_size\n"
+    "vfminmax_template_size: .long 1b - vfminmax_template\n"
+    ".globl vfminmax_offset\n"
+    "vfminmax_offset: .long 0b - vfminmax_template\n");
...
+
+#define VFMIN 0xEE
+#define VFMAX 0xEF
+
+static void vfminmax(unsigned char *buf, unsigned int op,
+                     unsigned int m4, unsigned int m5, unsigned int m6,
+                     void *v1, const void *v2, const void *v3)
+{
+    memcpy(buf, vfminmax_template, vfminmax_template_size);
+    buf[vfminmax_offset + 3] = (m6 << 4) | m5;
+    buf[vfminmax_offset + 4] &= 0x0F;
+    buf[vfminmax_offset + 4] |= (m4 << 4);
+    buf[vfminmax_offset + 5] = op;
+    ((void (*)(void *, const void *, const void *))buf)(v1, v2, v3);
+}
This works, of course. It could be simpler with EXECUTE: store just the one patched instruction in ordinary data memory and skip the executable mapping entirely. But I guess it doesn't matter.
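For illustration, here is a rough sketch of the "store just the one instruction" half of that idea. It is not part of the patch. The helper name vfminmax_encode, the VFMINMAX_INSN_SIZE constant and the vfmax_base array are all made up for this example, and the base bytes are hand-assembled from the VRR-c layout, so check them against the assembler output before relying on them. The byte positions mirror the quoted buf[vfminmax_offset + 3..5] writes. Actually running the result would still take an s390x EX from inline asm, which is left out here.

```c
#include <assert.h>
#include <string.h>

#define VFMIN 0xEE
#define VFMAX 0xEF
#define VFMINMAX_INSN_SIZE 6   /* hypothetical name; VRR-c instructions are 6 bytes */

/*
 * Assumption: hand-assembled image of "vfmax %v24,%v25,%v26,2,0,0".
 * Verify against real assembler output before use.
 */
static const unsigned char vfmax_base[VFMINMAX_INSN_SIZE] = {
    0xE7, 0x89, 0xA0, 0x00, 0x2E, 0xEF
};

/*
 * Hypothetical helper: build a single patched instruction in a plain
 * data buffer, using the same byte edits as the quoted vfminmax().
 * The buffer never needs to be executable if it is only used as the
 * target of EXECUTE.
 */
static void vfminmax_encode(unsigned char insn[VFMINMAX_INSN_SIZE],
                            const unsigned char base[VFMINMAX_INSN_SIZE],
                            unsigned int op, unsigned int m4,
                            unsigned int m5, unsigned int m6)
{
    /* Each mask field is 4 bits wide. */
    assert(m4 < 16 && m5 < 16 && m6 < 16);

    memcpy(insn, base, VFMINMAX_INSN_SIZE);
    insn[3] = (m6 << 4) | m5;   /* M6 in the high nibble, M5 in the low */
    insn[4] &= 0x0F;            /* keep the low nibble (RXB) from the base */
    insn[4] |= (m4 << 4);       /* M4 in the high nibble */
    insn[5] = op;               /* low opcode byte: VFMIN or VFMAX */
}
```

A caller would fill a 6-byte local with vfminmax_encode(insn, vfmax_base, VFMIN, m4, m5, m6) and hand its address to EX between the vl and vst instructions, so nothing beyond that one instruction has to be generated at run time.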
Reviewed-by: Richard Henderson <richard.hender...@linaro.org>

r~