qiucf added a comment.

In D116015#3326148 <https://reviews.llvm.org/D116015#3326148>, @shchenz wrote:

>> hiding the semantics from the optimizer is sometimes a good thing and 
>> sometimes a bad thing).
>
> Agree. Imagining a case when the neg and fma (from fnmsub) can both be CSE-ed 
> with another neg and fma, so we can totally eliminate the fnmsub. But after 
> we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.
>
>> Here's a pretty simple case: vector float foo(vector float a, vector float 
>> b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, 
>> a*b); }
>> It current produces xvnegsp+xvmulsp+xvnmaddasp, after this patch it produces 
>> xvmulsp+xvnmsubasp. In some complicated cases, we can see much more 
>> unexpected instructions generated.
>
> This is narrowed down from a real-world case. After CSE some part of the 
> fnmsub, it is hard to optimize it back to a single hardware fnmsub 
> instruction as normally we check the use number of a register and if the user 
> number is not 1, we may exit the combine.
>
> Is it possible to get some perf data for some float workloads with this 
> patch? @qiucf

Thanks. I did not see performance change in some common benchmarks.



================
Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:1737
+                  [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
+                  [IntrNoMem]>;
   def int_ppc_fre
----------------
shchenz wrote:
> When `llvm_anyfloat_ty` is `f32` or `f64`, we will generate two intrinsics 
> with same semantic. `llvm.ppc.nmsub.f32` + `llvm.ppc.fnmsubs` and 
> `llvm.ppc.nmsub.f64` + `llvm.ppc.fnmsub`. At first glance, we seems can not 
> delete the `int_ppc_fnmsub` and `int_ppc_fnmsubs`, because they are for XL 
> compatibility and XL has seperated fnmsub for float and double and we need to 
> map them 1 by 1. Better to check if it is possible to replace 
> `int_ppc_fnmsub` and `int_ppc_fnmsubs` with `int_ppc_nmsub`. And if it can be 
> replaced, we can use a meaningful name like `int_ppc_fnmsub` for the new 
> intrinsic.
We can do that, but that requires more work and seems beyond this patch's 
scope. See D105930, we'll need to handle the builtin in Clang. And the builtin 
explicitly generates type-M VSX instructions (I guess to reduce copy in simple 
cases).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116015/new/

https://reviews.llvm.org/D116015

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to