jaykang10 added a comment.

> Yes. This would make sense. I am guessing that in vec3->vec4, we will have 3 
> loads and 4 stores and in vec4->vec3 we will have 4 loads and 3 stores?

It depends on the implementation. If you scalarize all vector operations at the
LLVM IR level before entering LLVM's codegen, vec3->vec4 generates 3 loads and
4 stores, and vec4->vec3 generates 4 loads and 3 stores. I guess your
implementation follows this approach. The AMDGPU target keeps the vector form
at the LLVM IR level and handles it during SelectionDAG legalization in LLVM's
codegen. In that case, vec3->vec4 generates 4 loads and 4 stores, because the
type legalizer widens the vec3 load to a vec4 load (the alignment is 16), and
vec4->vec3 generates 4 loads and 3 stores after type legalization. So the
output can differ depending on the LLVM backend implementation.
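
To make the two shapes concrete, here is a hand-written sketch of the IR being
compared (function names and the undef fourth lane are only for illustration,
not output of this patch): the first function is the scalarized form (3 loads +
4 stores for vec3->vec4), the second keeps the vector form, where the type
legalizer may widen the <3 x float> load to <4 x float> in the backend, as
described above for AMDGPU.

  ; Hand-written illustration only, not actual compiler output.
  ; Scalarized form: 3 element loads + 4 element stores for vec3 -> vec4.
  ; The value of the 4th destination lane is unspecified by the
  ; reinterpretation; undef is stored here just to show the store count.
  define void @vec3_to_vec4_scalarized(float* %src, float* %dst) {
  entry:
    %s0.addr = getelementptr inbounds float, float* %src, i64 0
    %s1.addr = getelementptr inbounds float, float* %src, i64 1
    %s2.addr = getelementptr inbounds float, float* %src, i64 2
    %s0 = load float, float* %s0.addr, align 4
    %s1 = load float, float* %s1.addr, align 4
    %s2 = load float, float* %s2.addr, align 4
    %d0.addr = getelementptr inbounds float, float* %dst, i64 0
    %d1.addr = getelementptr inbounds float, float* %dst, i64 1
    %d2.addr = getelementptr inbounds float, float* %dst, i64 2
    %d3.addr = getelementptr inbounds float, float* %dst, i64 3
    store float %s0, float* %d0.addr, align 4
    store float %s1, float* %d1.addr, align 4
    store float %s2, float* %d2.addr, align 4
    store float undef, float* %d3.addr, align 4
    ret void
  }

  ; Vector form kept at the IR level: one vec3 load + one vec4 store.
  ; With 16-byte alignment, the SelectionDAG type legalizer may widen the
  ; <3 x float> load to <4 x float>, which is why a backend such as AMDGPU
  ; ends up with 4 loads and 4 stores as described above.
  define void @vec3_to_vec4_vector(<3 x float>* %src, <4 x float>* %dst) {
  entry:
    %v3 = load <3 x float>, <3 x float>* %src, align 16
    %v4 = shufflevector <3 x float> %v3, <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
    store <4 x float> %v4, <4 x float>* %dst, align 16
    ret void
  }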

> Although, I am guessing the load/store optimizations would come from your 
> current change? I don't see anything related to it in the VisitAsTypeExpr 
> implementation. But I think it might be a good idea to add this to the test 
> to make sure the IR output is as we expect.

We should generate the IR under an option. I will update the patch with it.


https://reviews.llvm.org/D30810


