Anastasia added a comment.

In https://reviews.llvm.org/D30810#710069, @jaykang10 wrote:

> > I believe the argument lacks numbers (or at least you have them but didn't 
> > mention). I didn't hear about performance results, or validation that this 
> > was actually tested for correctness. Small test cases prove a point, but 
> > can't be considered general.
> > 
> > OTOH, it seems like this is exactly why you want the flag, to hide any 
> > potential issues and play with it. I'm not opposed to adding the flag if 
> > there's a commitment to actually get the result and change the default to 
> > whatever seems to be better, does that seems reasonable?
>
> To be honest, I did not start this discussion to get better performance for 
> GPU. But the goal has moved while we discussing it. I think I have showed you 
> the gain from AMD GPU target on previous comments. On current version of 
> clang/llvm, it generates one less load/store to preserve vec3 type instead of 
> vec4 type on GPU target. But it could cause more load/store on another 
> targets which have vector register because they usually expect aligned vector 
> load/store. It means that if we want to change default  to vec3 on clang, we 
> need to change llvm's codegen's default legalization or backend targets need 
> to be modified in order to get aligned vector load/store with vec3 type. I 
> think It is too big change at this moment. In the future, we could move this 
> discussion to llvm-dev. Up to that time,  as you mentioned, I would like to 
> hide the potential issues with vec3 type and play with it.


Just to summarize. There seems to be no issue in adding this feature by a 
special flag considering that changing default behaviors requires changing the 
backend that we might consider at some later point of time.  We can start a 
separate discussion on that.

The change can be used to generate more generic and optimal IR resulting in 1/4 
less instructions generated in the final binary per each load and store of vec2 
types in some architectures (e.g. AMD) . I am not sure what kind of further 
numbers are needed and if there is any measuring infrastructure LLVM project 
provides for that.

Considering all the above I don't feel there is any reason this can't be 
committed.


https://reviews.llvm.org/D30810



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to