FWIW, memcpy() and a for() loop are handled quite differently with respect to address alignment. I don't know how much it will matter here, but the last time I read the assembly output, copying an int[] with a for() loop did not produce a code path for 16-byte-aligned addresses (which would allow SSE streaming), while memcpy() contains a lot of such logic. This won't matter much unless there is a lot to copy, and of course compiler optimizations change over time, so the situation may be different by now.
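To make the comparison concrete, here is a minimal sketch of the two copy paths in plain C. The names store_le32(), copy_code_loop() and copy_code_memcpy() are made up for illustration and are not Mesa functions; store_le32() only stands in for the effect of util_cpu_to_le32() followed by a store. Which alignment paths the compiler emits for either variant depends on the compiler and flags, so treat this as something to inspect with -S rather than as a statement about the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit value as little-endian bytes,
 * whatever the host byte order is. */
static void store_le32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Word-by-word copy with conversion to little-endian.
 * On a little-endian host this yields the same bytes as memcpy(),
 * but the compiler sees a loop and has to vectorize it by itself. */
static void copy_code_loop(void *dst, const unsigned char *code,
                           size_t code_size)
{
    unsigned char *out = dst;
    size_t i;

    assert(code_size % 4 == 0);
    for (i = 0; i < code_size / 4; ++i) {
        uint32_t word;

        /* Load through memcpy() so the read is valid even if
         * code + i*4 is not 4-byte aligned. */
        memcpy(&word, code + i * 4, sizeof(word));
        store_le32(out + i * 4, word);
    }
}

/* Bulk copy: the C library picks its own alignment-aware code paths. */
static void copy_code_memcpy(void *dst, const unsigned char *code,
                             size_t code_size)
{
    memcpy(dst, code, code_size);
}
```

Compiling both functions with optimizations enabled and comparing the assembly would show whether the loop version gets an aligned or streaming path on a given compiler.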
Patrick

On Thu, Feb 20, 2014 at 8:11 PM, Michel Dänzer <mic...@daenzer.net> wrote:

> On Don, 2014-02-20 at 10:21 -0800, Tom Stellard wrote:
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> > index 54270cd..9b04e6b 100644
> > --- a/src/gallium/drivers/radeonsi/si_shader.c
> > +++ b/src/gallium/drivers/radeonsi/si_shader.c
> > @@ -2335,7 +2335,7 @@ int si_compile_llvm(struct si_context *sctx, struct si_pipe_shader *shader,
> >       ptr = (uint32_t*)sctx->b.ws->buffer_map(shader->bo->cs_buf, sctx->b.rings.gfx.cs, PIPE_TRANSFER_WRITE);
> >       if (0 /*SI_BIG_ENDIAN*/) {
> >               for (i = 0; i < binary.code_size / 4; ++i) {
> > -                     ptr[i] = util_bswap32(*(uint32_t*)(binary.code + i*4));
> > +                     ptr[i] = util_cpu_to_le32((*(uint32_t*)(binary.code + i*4)));
> >               }
> >       } else {
> >               memcpy(ptr, binary.code, binary.code_size);
>
> We could get rid of the separate *_ENDIAN paths using util_cpu_to_le*().
>
> Either way, the non-clover patches are
>
> Reviewed-by: Michel Dänzer <michel.daen...@amd.com>
>
>
> --
> Earthling Michel Dänzer            |                  http://www.amd.com
> Libre software enthusiast          |                Mesa and X developer
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>