[PATCH] D30810: Preserve vec3 type.

JinGu Kang via Phabricator via cfe-commits Tue, 21 Mar 2017 03:03:38 -0700

jaykang10 added a comment.

In https://reviews.llvm.org/D30810#702637, @jaykang10 wrote:


> In https://reviews.llvm.org/D30810#702614, @Anastasia wrote:
>
> > In https://reviews.llvm.org/D30810#702443, @bruno wrote:
> >
> > > > As a result, I think it would be good for clang to have both of 
> > > > features and I would like to stick to the option "-fpresereve-vec3' to 
> > > > change the behavior easily.
> > >
> > > The motivation doesn't seem solid to me, who else is going to benefit 
> > > from this flag?
> >
> >
> > There are some off the main tree implementation that would benefit. But in 
> > the case of AMD GPU 3 loads/stores will be produced instead of 4. Sounds 
> > like a good optimization to me. As I said in my previous comment I think it 
> > should have been the default behavior from the beginning, but since 
> > different implementation landed first we can integrate this one now with an 
> > additional option.
>
>
> Additionally, Here is assembly output from vec3 with amdgcn target. :)
>
> LLVM IR
>
>   define void @foo(<3 x float>* nocapture %a, <3 x float>* nocapture readonly 
> %b) {
>   entry:
>     %0 = load <3 x float>, <3 x float>* %b, align 16
>     store <3 x float> %0, <3 x float>* %a, align 16
>     ret void
>   }
>
>
> Assembly Output
>
>       .text
>       .section        .AMDGPU.config
>       .long   47176
>       .long   11272256
>       .long   47180
>       .long   132
>       .long   47200
>       .long   0
>       .long   4
>       .long   0
>       .long   8
>       .long   0
>       .text
>       .globl  foo
>       .p2align        8
>       .type   foo,@function
>   foo:                                    ; @foo
>   ; BB#0:                                 ; %entry
>       s_load_dword s2, s[0:1], 0x9
>       s_load_dword s0, s[0:1], 0xa
>       s_mov_b32 s4, SCRATCH_RSRC_DWORD0
>       s_mov_b32 s5, SCRATCH_RSRC_DWORD1
>       s_mov_b32 s6, -1
>       s_mov_b32 s8, s3
>       s_mov_b32 s7, 0xe8f000
>       s_waitcnt lgkmcnt(0)
>       v_mov_b32_e32 v0, s0
>       buffer_load_dword v2, v0, s[4:7], s8 offen
>       buffer_load_dword v3, v0, s[4:7], s8 offen offset:8
>       buffer_load_dword v0, v0, s[4:7], s8 offen offset:4
>       v_mov_b32_e32 v1, s2
>       s_waitcnt vmcnt(0)
>       buffer_store_dword v0, v1, s[4:7], s8 offen offset:4
>       buffer_store_dword v2, v1, s[4:7], s8 offen
>       buffer_store_dword v3, v1, s[4:7], s8 offen offset:8
>       s_endpgm
>   .Lfunc_end0:
>       .size   foo, .Lfunc_end0-foo
>  
>       .section        .AMDGPU.csdata
>   ; Kernel info:
>   ; codeLenInByte = 112
>   ; NumSgprs: 9
>   ; NumVgprs: 4
>   ; FloatMode: 192
>   ; IeeeMode: 1
>   ; ScratchSize: 0
>   ; LDSByteSize: 0 bytes/workgroup (compile time only)
>   ; SGPRBlocks: 1
>   ; VGPRBlocks: 0
>   ; NumSGPRsForWavesPerEU: 9
>   ; NumVGPRsForWavesPerEU: 4
>   ; ReservedVGPRFirst: 0
>   ; ReservedVGPRCount: 0
>   ; COMPUTE_PGM_RSRC2:USER_SGPR: 2
>   ; COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0
>   ; COMPUTE_PGM_RSRC2:TGID_X_EN: 1
>   ; COMPUTE_PGM_RSRC2:TGID_Y_EN: 0
>   ; COMPUTE_PGM_RSRC2:TGID_Z_EN: 0
>   ; COMPUTE_PGM_RSRC2:TIDIG_COMP_CNT: 0
>  
>       .section        ".note.GNU-stack"
>


Hi Anastasia, Bruno,

Do you still have other opinion? or Can we go ahead and commit this patch?


https://reviews.llvm.org/D30810



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30810: Preserve vec3 type.

Reply via email to