On Thu, 2014-10-23 at 09:20 -0600, Brian Paul wrote: > Nice, just a few comments below. > > > On 10/22/2014 10:02 PM, Timothy Arceri wrote: > > Makes use of SSE to speed up compute of min and max elements > > > > Callgrind cpu usage results from pts benchmarks: > > > > Openarena 0.8.8: 3.67% -> 1.03% > > UrbanTerror: 2.36% -> 0.81% > > > > Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au> > > --- > > src/mesa/Makefile.am | 3 +- > > src/mesa/main/sse_minmax.c | 75 > > +++++++++++++++++++++++++++++++++++++++++++ > > src/mesa/main/sse_minmax.h | 29 +++++++++++++++++ > > src/mesa/vbo/vbo_exec_array.c | 8 +++++ > > 4 files changed, 114 insertions(+), 1 deletion(-) > > create mode 100644 src/mesa/main/sse_minmax.c > > create mode 100644 src/mesa/main/sse_minmax.h > > > > I almost wasn't going to bother sending this out since it uses SSE4.1 > > and its recommended to use glDrawRangeElements anyway. But since these games > > are still ofter used for benchmarking I thought I'd see if anyone is > > interested in this. I only optimised GL_UNSIGNED_INT as that was the > > only place these games were hitting but I guess it wouldn't hurt > > to optimse the other cases too. > > > > I think it would probably make sense too to just combine > > streaming-load-memcpy.c and sse_minmax.c into a single file > > something like sse_opt.c for example. > > > > As far a frame rates go I couldn't see any concrete improments > > on my Ivybridge machine. Maybe it would help more on a high end radeon > > card where cpu apparently is more of a concern?? > > > > Finally its seems to run fine but its only the second time I've tried > > these type of optimisations so let me know if there are any obvious > > mistakes or improvements. > > > > diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am > > index e71bccb..932db4f 100644 > > --- a/src/mesa/Makefile.am > > +++ b/src/mesa/Makefile.am > > @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \ > > $(ARCH_LIBS) > > > > libmesa_sse41_la_SOURCES = \ > > - main/streaming-load-memcpy.c > > + main/streaming-load-memcpy.c \ > > + main/sse_minmax.c > > libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 > > > > pkgconfigdir = $(libdir)/pkgconfig > > I think you also need to add the new file to Makefile.sources for SCons > (also check Android). > > Alternately, since the function is pretty small, and I'm not sure if it > would be used elsewhere, you might put it in the vbo_exec_array.c file.
Correct me if I'm wrong but I don't think I can do this or the compiler might end up generating SSE4.1 instructions for the other functions in vbo_exec_array.c the same would apply with your later suggestion for moving the c code into the sse_minmax.c file. > > > > diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c > > new file mode 100644 > > index 0000000..1625407 > > --- /dev/null > > +++ b/src/mesa/main/sse_minmax.c > > @@ -0,0 +1,75 @@ > > +/* > > + * Copyright © 2014 Timothy Arceri > > + * > > + * Permission is hereby granted, free of charge, to any person obtaining a > > + * copy of this software and associated documentation files (the > > "Software"), > > + * to deal in the Software without restriction, including without > > limitation > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > > + * and/or sell copies of the Software, and to permit persons to whom the > > + * Software is furnished to do so, subject to the following conditions: > > + * > > + * The above copyright notice and this permission notice (including the > > next > > + * paragraph) shall be included in all copies or substantial portions of > > the > > + * Software. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS > > OR > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > > OTHER > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > > DEALINGS > > + * IN THE SOFTWARE. > > + * > > + * Author: > > + * Timothy Arceri <t_arc...@yahoo.com.au> > > + * > > + */ > > + > > +#ifdef __SSE4_1__ > > +#include "main/glheader.h" > > +#include "main/sse_minmax.h" > > +#include <smmintrin.h> > > + > > +void > > +sse_minmax(const GLuint *ui_indices, GLuint *min_index, GLuint *max_index, > > const GLuint count) > > We normally prefix non-static functions with _mesa_. > > > > +{ > > + GLuint i = 0; > > + GLuint max_ui = 0; > > + GLuint min_ui = ~0U; > > + GLuint max_arr[4] = {0}; > > + GLuint min_arr[4] = {0}; > > Do those two vars really need to be initialized? > > > > + GLuint vec_count; > > + __m128i max_ui4 = _mm_setzero_si128(); > > + __m128i min_ui4 = _mm_set1_epi32(~0U); > > + __m128i ui_indices4; > > + __m128i *ui_indices_ptr; > > Some of those declarations could be moved inside the if statement. > > > + > > + if (count >= 4) { > > + vec_count = count - (count % 4); > > + ui_indices_ptr = (__m128i*)ui_indices; > > + for (i = 0; i < vec_count/4; i++) { > > + ui_indices4 = _mm_loadu_si128(&ui_indices_ptr[i]); > > + max_ui4 = _mm_max_epu32(ui_indices4, max_ui4); > > + min_ui4 = _mm_min_epu32(ui_indices4, min_ui4); > > + } > > + > > + _mm_store_ps((float*)max_arr, _mm_castsi128_ps(max_ui4)); > > + _mm_store_ps((float*)min_arr, _mm_castsi128_ps(min_ui4)); > > + > > + for (i = 0; i < 4; i++) { > > + if (max_arr[i] > max_ui) max_ui = max_arr[i]; > > + if (min_arr[i] < min_ui) min_ui = min_arr[i]; > > Let's put the condition and the assignment on separate lines (I know > it's like this in the original code). > > > > + } > > + i = vec_count; > > + } > > + > > + for (; i < count; i++) { > > + if (ui_indices[i] > max_ui) max_ui = ui_indices[i]; > > + if (ui_indices[i] < min_ui) min_ui = ui_indices[i]; > > + } > > + > > + *min_index = min_ui; > > + *max_index = max_ui; > > +} > > > BTW, another way to structure this would be something like this: > > void > _mesa_uint_array_min_max(const GLuint *ui_indices, GLuint *min_index, > GLuint *max_index, GLuint count) > { > #ifdef __SSE4_1__ > // sse code > #else > // C code > #endif > } > > Then you could get rid of the #ifdef __SSE4_1__ at the call site. > > Also, it would be easy to implement debug/cross-check code in the > function where the C code would verify the result of the SSE code to > make sure it's correct. > > Not a big deal though. > > Can something similar be done for 16-bit values? > > -Brian > > > > + > > +#endif > > diff --git a/src/mesa/main/sse_minmax.h b/src/mesa/main/sse_minmax.h > > new file mode 100644 > > index 0000000..d14db26 > > --- /dev/null > > +++ b/src/mesa/main/sse_minmax.h > > @@ -0,0 +1,29 @@ > > +/* > > + * Copyright © 2014 Timothy Arceri > > + * > > + * Permission is hereby granted, free of charge, to any person obtaining a > > + * copy of this software and associated documentation files (the > > "Software"), > > + * to deal in the Software without restriction, including without > > limitation > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > > + * and/or sell copies of the Software, and to permit persons to whom the > > + * Software is furnished to do so, subject to the following conditions: > > + * > > + * The above copyright notice and this permission notice (including the > > next > > + * paragraph) shall be included in all copies or substantial portions of > > the > > + * Software. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS > > OR > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > > OTHER > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > > DEALINGS > > + * IN THE SOFTWARE. > > + * > > + * Authors: > > + * Timothy Arceri <t_arc...@yahoo.com.au> > > + * > > + */ > > + > > +void > > +sse_minmax(const GLuint *ui_indices, GLuint *min_index, GLuint *max_index, > > const GLuint count); > > diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c > > index 045dbb5..0399348 100644 > > --- a/src/mesa/vbo/vbo_exec_array.c > > +++ b/src/mesa/vbo/vbo_exec_array.c > > @@ -36,6 +36,9 @@ > > #include "main/enums.h" > > #include "main/macros.h" > > #include "main/transformfeedback.h" > > +#ifdef __SSE4_1__ > > +#include "main/sse_minmax.h" > > +#endif > > > > #include "vbo_context.h" > > > > @@ -110,6 +113,7 @@ vbo_get_minmax_index(struct gl_context *ctx, > > const GLuint *ui_indices = (const GLuint *)indices; > > GLuint max_ui = 0; > > GLuint min_ui = ~0U; > > + > > if (restart) { > > for (i = 0; i < count; i++) { > > if (ui_indices[i] != restartIndex) { > > @@ -119,10 +123,14 @@ vbo_get_minmax_index(struct gl_context *ctx, > > } > > } > > else { > > +#ifdef __SSE4_1__ > > + sse_minmax(ui_indices, &min_ui, &max_ui, count); > > +#else > > for (i = 0; i < count; i++) { > > if (ui_indices[i] > max_ui) max_ui = ui_indices[i]; > > if (ui_indices[i] < min_ui) min_ui = ui_indices[i]; > > } > > +#endif > > } > > *min_index = min_ui; > > *max_index = max_ui; > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev