On Fri, Apr 07, 2017 at 04:30:49PM -0700, Jason Ekstrand wrote:
> On Fri, Apr 7, 2017 at 3:19 PM, Chris Wilson <ch...@chris-wilson.co.uk>
> wrote:
>
> > On Fri, Apr 07, 2017 at 02:41:13PM -0700, Jason Ekstrand wrote:
> > > On Fri, Apr 7, 2017 at 1:26 PM, Chris Wilson <ch...@chris-wilson.co.uk>
> > > wrote:
> > >
> > > > On Fri, Apr 07, 2017 at 12:55:53PM -0700, Jason Ekstrand wrote:
> > > > > +#define _ANV_MULTIALLOC_UPDATE_POINTER(_i) \
> > > > > +   if ((_i) < ma->ptr_count) \
> > > > > +      *ma->ptrs[_i] = ptr + (uintptr_t)*ma->ptrs[_i]
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(0);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(1);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(2);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(3);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(4);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(5);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(6);
> > > > > +   _ANV_MULTIALLOC_UPDATE_POINTER(7);
> > > > > +#undef _ANV_MULTIALLOC_UPDATE_POINTER
> > > >
> > > > #define _ANV_MULTIALLOC_UPDATE_POINTER(_i) \
> > > >    case _i + 1: *ma->ptrs[_i] = ptr + (uintptr_t)*ma->ptrs[_i]
> > > >
> > > >    switch (ma->ptr_count) {
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(7);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(6);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(5);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(4);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(3);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(2);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(1);
> > > >    _ANV_MULTIALLOC_UPDATE_POINTER(0);
> > > >    }
> > > >
> > > > #undef _ANV_MULTIALLOC_UPDATE_POINTER
> > >
> > > If ma->ptr_count is constant, they generate exactly the same code. If it
> > > isn't (i.e. if one of the multialloc_adds is predicated), then they still
> > > generate basically the same code, with the code for the if version being
> > > slightly more straightforward.
> >
> > Took a look at this with https://godbolt.org/g/UwrMk1
>
> Weird... That's not at all what I'm seeing with my demo file. In fact,
> when I try to compile your demo file with GCC on my local machine, it
> reduces the entire thing down to less than a dozen instructions.
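To make the if-chain versus switch comparison concrete, here is a small sketch that runs both fix-up styles over the same set of stashed offsets and checks that they resolve to identical pointers. Only the two macro bodies follow the quoted code; struct demo_multialloc, fixup_if(), fixup_switch() and fixup_variants_agree() are hypothetical names, and the struct is a simplified assumption rather than Mesa's real anv_multialloc layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PTRS 8

/* Hypothetical, simplified stand-in for the multialloc state (not Mesa's
 * actual struct). Until the block exists, each caller pointer referenced
 * by ptrs[i] holds an offset into the future block, stored as a pointer. */
struct demo_multialloc {
   uint32_t ptr_count;
   void **ptrs[DEMO_MAX_PTRS];
};

/* Variant 1: a bounds-checked if-chain, as in the original patch. */
static void fixup_if(struct demo_multialloc *ma, char *ptr)
{
#define UPDATE_POINTER(_i) \
   if ((_i) < ma->ptr_count) \
      *ma->ptrs[_i] = ptr + (uintptr_t)*ma->ptrs[_i]
   UPDATE_POINTER(0);
   UPDATE_POINTER(1);
   UPDATE_POINTER(2);
   UPDATE_POINTER(3);
   UPDATE_POINTER(4);
   UPDATE_POINTER(5);
   UPDATE_POINTER(6);
   UPDATE_POINTER(7);
#undef UPDATE_POINTER
}

/* Variant 2: jump to case ptr_count, then deliberately fall through every
 * lower case, as suggested in review. ptr_count == 0 matches no case. */
static void fixup_switch(struct demo_multialloc *ma, char *ptr)
{
#define UPDATE_POINTER(_i) \
   case (_i) + 1: *ma->ptrs[_i] = ptr + (uintptr_t)*ma->ptrs[_i]
   switch (ma->ptr_count) {
   UPDATE_POINTER(7);
   UPDATE_POINTER(6);
   UPDATE_POINTER(5);
   UPDATE_POINTER(4);
   UPDATE_POINTER(3);
   UPDATE_POINTER(2);
   UPDATE_POINTER(1);
   UPDATE_POINTER(0);
   }
#undef UPDATE_POINTER
}

/* Applies both variants to the same offsets (0, 16, 32, ...) within one
 * block and returns 1 if they produce identical, correct pointers. */
static int fixup_variants_agree(uint32_t count)
{
   enum { BLOCK_SIZE = 256 };
   void *a[DEMO_MAX_PTRS], *b[DEMO_MAX_PTRS];
   struct demo_multialloc ma_if = { .ptr_count = count };
   struct demo_multialloc ma_sw = { .ptr_count = count };
   char *block;
   int ok = 1;

   assert(count <= DEMO_MAX_PTRS);
   block = malloc(BLOCK_SIZE);
   if (block == NULL)
      return 0;

   for (uint32_t i = 0; i < count; i++) {
      a[i] = b[i] = (void *)(uintptr_t)(i * 16);
      ma_if.ptrs[i] = &a[i];
      ma_sw.ptrs[i] = &b[i];
   }

   fixup_if(&ma_if, block);
   fixup_switch(&ma_sw, block);

   /* Compare before freeing, while the pointers are still valid. */
   for (uint32_t i = 0; i < count; i++) {
      if (a[i] != b[i] || (char *)a[i] != block + i * 16)
         ok = 0;
   }

   free(block);
   return ok;
}
```

Both forms are semantically the same; as noted in the thread, with a constant ptr_count the compiler ends up generating the same code for either.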
Yes, if I force-inline the add, gcc and clang both realise that the
function doesn't use any of the values and discard everything. In the end,
gcc actually generates very smart code:

consume_pointer:
        movl    $0, (%rdi)
        ret
main:
        subq    $8, %rsp
        movl    $200, %edi
        call    malloc
        testq   %rax, %rax
        je      .L5
        movq    %rax, %rdi
        call    consume_pointer
        leaq    4(%rax), %rdi
        call    consume_pointer
        leaq    72(%rax), %rdi
        call    consume_pointer
        xorl    %eax, %eax
.L3:
        addq    $8, %rsp
        ret
.L5:
        orl     $-1, %eax
        jmp     .L3

It has generated a single allocation, yet it still passes the various
offsets within that block around without having to store them.
anv_multialloc_add() definitely needs __attribute__((always_inline)).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
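As a sketch of the always_inline suggestion above, here is a minimal multialloc-style helper whose add function is forced inline, so the compiler can see every add at the call site and fold the running size and offsets into constants. The names demo_multialloc_add(), demo_multialloc_alloc() and demo_usage() are hypothetical, and the struct layout, the alignment handling and the use of plain malloc() are assumptions for illustration, not Mesa's actual anv_multialloc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PTRS 8

/* Hypothetical, simplified multialloc state; not Mesa's actual layout. */
struct demo_multialloc {
   size_t size;          /* running size of the combined block */
   uint32_t ptr_count;   /* number of sub-allocations recorded */
   void **ptrs[DEMO_MAX_PTRS];
};

/* always_inline (a GCC/Clang attribute) exposes each add to the caller,
 * so the size and offsets can become compile-time constants instead of
 * being stored and reloaded. */
static inline __attribute__((always_inline)) void
demo_multialloc_add(struct demo_multialloc *ma, void **ptr,
                    size_t size, size_t align)
{
   /* Round the current size up to align (assumed to be a power of two). */
   size_t offset = (ma->size + align - 1) & ~(align - 1);

   assert(ma->ptr_count < DEMO_MAX_PTRS);
   ma->size = offset + size;

   /* Stash the offset in the caller's pointer until the block exists. */
   *ptr = (void *)(uintptr_t)offset;
   ma->ptrs[ma->ptr_count++] = ptr;
}

/* One malloc for everything, then turn each stashed offset into a real
 * pointer. malloc only guarantees alignof(max_align_t), so larger
 * per-object alignments are not supported by this sketch. */
static void *
demo_multialloc_alloc(struct demo_multialloc *ma)
{
   char *block = malloc(ma->size);

   if (block == NULL)
      return NULL;
   for (uint32_t i = 0; i < ma->ptr_count; i++)
      *ma->ptrs[i] = block + (uintptr_t)*ma->ptrs[i];
   return block;
}

/* Example: an int, 16 floats and a 32-byte name in a single block.
 * Returns 1 if every sub-object lands in order inside the block. */
static int demo_usage(void)
{
   struct demo_multialloc ma = { 0 };
   void *count_mem, *values_mem, *name_mem;

   demo_multialloc_add(&ma, &count_mem, sizeof(int), _Alignof(int));
   demo_multialloc_add(&ma, &values_mem, 16 * sizeof(float), _Alignof(float));
   demo_multialloc_add(&ma, &name_mem, 32, 1);

   char *block = demo_multialloc_alloc(&ma);
   if (block == NULL)
      return 0;

   int *count = count_mem;
   float *values = values_mem;
   char *name = name_mem;

   *count = 16;
   values[*count - 1] = 1.0f;
   name[31] = '\0';

   int ok = (char *)count == block &&
            (char *)values >= (char *)(count + 1) &&
            name >= (char *)(values + 16) &&
            name + 32 <= block + ma.size;

   free(block);
   return ok;
}
```

With the add forced inline, an optimizing compiler is free to reduce this to one malloc plus constant offsets, in the spirit of the listing above; the exact offsets will differ because the sub-object sizes here are made up, and whether a given build actually folds them is not guaranteed.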