I've implemented the OpenCL vload/vstore builtin functions in two parts:
1) A pure OpenCL C implementation, with no assembly.
2) Assembly optimizations for 32-bit int/uint loads/stores of vectors with 4 or more components.
Note: The vstore implementation assumes that the hardware backend supports byte-addressable stores, which may not be optimal for every target.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev