qiucf added a comment. In D116015#3326148 <https://reviews.llvm.org/D116015#3326148>, @shchenz wrote:
>> hiding the semantics from the optimizer is sometimes a good thing and >> sometimes a bad thing). > > Agree. Imagining a case when the neg and fma (from fnmsub) can both be CSE-ed > with another neg and fma, so we can totally eliminate the fnmsub. But after > we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub. > >> Here's a pretty simple case: vector float foo(vector float a, vector float >> b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, >> a*b); } >> It current produces xvnegsp+xvmulsp+xvnmaddasp, after this patch it produces >> xvmulsp+xvnmsubasp. In some complicated cases, we can see much more >> unexpected instructions generated. > > This is narrowed down from a real-world case. After CSE some part of the > fnmsub, it is hard to optimize it back to a single hardware fnmsub > instruction as normally we check the use number of a register and if the user > number is not 1, we may exit the combine. > > Is it possible to get some perf data for some float workloads with this > patch? @qiucf Thanks. I did not see performance change in some common benchmarks. ================ Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:1737 + [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>], + [IntrNoMem]>; def int_ppc_fre ---------------- shchenz wrote: > When `llvm_anyfloat_ty` is `f32` or `f64`, we will generate two intrinsics > with same semantic. `llvm.ppc.nmsub.f32` + `llvm.ppc.fnmsubs` and > `llvm.ppc.nmsub.f64` + `llvm.ppc.fnmsub`. At first glance, we seems can not > delete the `int_ppc_fnmsub` and `int_ppc_fnmsubs`, because they are for XL > compatibility and XL has seperated fnmsub for float and double and we need to > map them 1 by 1. Better to check if it is possible to replace > `int_ppc_fnmsub` and `int_ppc_fnmsubs` with `int_ppc_nmsub`. And if it can be > replaced, we can use a meaningful name like `int_ppc_fnmsub` for the new > intrinsic. We can do that, but that requires more work and seems beyond this patch's scope. See D105930, we'll need to handle the builtin in Clang. And the builtin explicitly generates type-M VSX instructions (I guess to reduce copy in simple cases). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D116015/new/ https://reviews.llvm.org/D116015 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits