On Thu, May 23, 2013 at 07:49:39PM -0500, Aaron Watry wrote:
> I've implemented the OpenCL vload/vstore builtin functions in two parts.
> 1) Pure CL C implementation. No Assembly
> 2) Add assembly optimizations for 32-bit int/uint loads/stores of 4+ component
> vectors
>
> Note: The vstore impl
Note: The vstore implementation assumes that the hardware back end supports
byte-addressable s