jaykang10 added a comment.

> Yes. This would make sense. I am guessing that in vec3->vec4, we will have 3
> loads and 4 stores and in vec4->vec3 we will have 4 loads and 3 stores?
It depends on the implementation. If you scalarize all vector operations at the LLVM IR level before entering LLVM's codegen, vec3->vec4 generates 3 loads and 4 stores, and vec4->vec3 generates 4 loads and 3 stores. I guess your implementation follows this approach. The AMDGPU target keeps the vector form at the LLVM IR level and handles it through SelectionDAG legalization in LLVM's codegen. In that case, vec3->vec4 generates 4 loads and 4 stores, because the type legalizer widens the vec3 load to a vec4 load given the 16-byte alignment, while vec4->vec3 generates 4 loads and 3 stores after type legalization. In short, the output can differ depending on the backend implementation.

> Although, I am guessing the load/store optimizations would come from your
> current change? I don't see anything related to it in the VisitAsTypeExpr
> implementation. But I think it might be a good idea to add this to the test
> to make sure the IR output is as we expect.

Right, we should check the IR output with the option in the test. I will update the patch with it.
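Just to make the load/store counts above concrete, a fully scalarized vec3 -> vec4 conversion could look roughly like the IR below. This is only an illustrative sketch with made-up names and old-style typed pointers, not the exact output of this patch:

define void @vec3_to_vec4(<3 x float>* %src, <4 x float>* %dst) {
entry:
  %s = bitcast <3 x float>* %src to float*
  %d = bitcast <4 x float>* %dst to float*
  ; 3 scalar loads from the vec3 source (16-byte aligned base)
  %s0p = getelementptr inbounds float, float* %s, i64 0
  %e0 = load float, float* %s0p, align 16
  %s1p = getelementptr inbounds float, float* %s, i64 1
  %e1 = load float, float* %s1p, align 4
  %s2p = getelementptr inbounds float, float* %s, i64 2
  %e2 = load float, float* %s2p, align 8
  ; 4 scalar stores to the vec4 destination; the 4th lane is undefined
  ; for a vec3 -> vec4 as_type, shown here as undef
  %d0p = getelementptr inbounds float, float* %d, i64 0
  store float %e0, float* %d0p, align 16
  %d1p = getelementptr inbounds float, float* %d, i64 1
  store float %e1, float* %d1p, align 4
  %d2p = getelementptr inbounds float, float* %d, i64 2
  store float %e2, float* %d2p, align 8
  %d3p = getelementptr inbounds float, float* %d, i64 3
  store float undef, float* %d3p, align 4
  ret void
}

With the vector form kept at the IR level, the AMDGPU path described above would instead widen the vec3 load to a vec4 load during type legalization, giving the 4-load/4-store result.

https://reviews.llvm.org/D30810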