On Sun, 2014-10-26 at 11:16 +0000, Bruno Jimenez wrote: > On Sun, 2014-10-26 at 10:47 +1100, Timothy Arceri wrote: > > Makes use of SSE to speed up compute of min and max elements > > > > Callgrind cpu usage results from pts benchmarks: > > > > Openarena 0.8.8: 3.67% -> 1.03% > > UrbanTerror: 2.36% -> 0.81% > > > > Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au> > > --- > > src/mesa/Android.libmesa_dricore.mk | 3 +- > > src/mesa/Makefile.am | 3 +- > > src/mesa/main/sse_minmax.c | 92 > > +++++++++++++++++++++++++++++++++++++ > > src/mesa/main/sse_minmax.h | 30 ++++++++++++ > > src/mesa/vbo/vbo_exec_array.c | 14 ++++-- > > 5 files changed, 137 insertions(+), 5 deletions(-) > > create mode 100644 src/mesa/main/sse_minmax.c > > create mode 100644 src/mesa/main/sse_minmax.h > > > > Thanks again to everyone for the feedback. > > > > V3: > > > > - Removed sse_minmax.c from Makefile.sources > > - handle the first few values without SSE until the pointer is aligned > > and use _mm_load_si128 rather than _mm_loadu_si128 > > - guard the call to the SSE code better at build time > > > > diff --git a/src/mesa/Android.libmesa_dricore.mk > > b/src/mesa/Android.libmesa_dricore.mk > > index 1e6d948..52d626f 100644 > > --- a/src/mesa/Android.libmesa_dricore.mk > > +++ b/src/mesa/Android.libmesa_dricore.mk > > @@ -51,7 +51,8 @@ endif # MESA_ENABLE_ASM > > > > ifeq ($(ARCH_X86_HAVE_SSE4_1),true) > > LOCAL_SRC_FILES += \ > > - $(SRCDIR)main/streaming-load-memcpy.c > > + $(SRCDIR)main/streaming-load-memcpy.c \ > > + $(SRCDIR)main/sse_minmax.c > > LOCAL_CFLAGS := -msse4.1 > > endif > > > > diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am > > index e71bccb..932db4f 100644 > > --- a/src/mesa/Makefile.am > > +++ b/src/mesa/Makefile.am > > @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \ > > $(ARCH_LIBS) > > > > libmesa_sse41_la_SOURCES = \ > > - main/streaming-load-memcpy.c > > + main/streaming-load-memcpy.c \ > > + main/sse_minmax.c > > libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 > > > > pkgconfigdir = $(libdir)/pkgconfig > > diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c > > new file mode 100644 > > index 0000000..8f05982 > > --- /dev/null > > +++ b/src/mesa/main/sse_minmax.c > > @@ -0,0 +1,92 @@ > > +/* > > + * Copyright © 2014 Timothy Arceri > > + * > > + * Permission is hereby granted, free of charge, to any person obtaining a > > + * copy of this software and associated documentation files (the > > "Software"), > > + * to deal in the Software without restriction, including without > > limitation > > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > > + * and/or sell copies of the Software, and to permit persons to whom the > > + * Software is furnished to do so, subject to the following conditions: > > + * > > + * The above copyright notice and this permission notice (including the > > next > > + * paragraph) shall be included in all copies or substantial portions of > > the > > + * Software. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS > > OR > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > > OTHER > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > > DEALINGS > > + * IN THE SOFTWARE. > > + * > > + * Author: > > + * Timothy Arceri <t_arc...@yahoo.com.au> > > + * > > + */ > > + > > +#ifdef __SSE4_1__ > > +#include "main/sse_minmax.h" > > +#include <smmintrin.h> > > +#include <stdint.h> > > + > > +void > > +_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index, > > + unsigned *max_index, const unsigned count) > > +{ > > + unsigned max_ui = 0; > > + unsigned min_ui = ~0U; > > + unsigned i = 0; > > + > > + /* handle the first few values without SSE until the pointer is aligned > > */ > > + while (((uintptr_t)ui_indices & 15) && (i < count)) { > > + if (ui_indices[i] > max_ui) > > + max_ui = ui_indices[i]; > > + if (ui_indices[i] < min_ui) > > + min_ui = ui_indices[i]; > > + > > + i++; > > + ui_indices++; > > + } > > Hi, > > I think that this is wrong. If you advance the pointer and the index you > will be jumping over some values. Maybe what you wanted was something > like: > > if (*ui_indices > max_ui) > max_ui = *ui_indices; > if (*ui_indices < min_ui) > min_ui = *ui_indices; >
Yeah thanks. I think I've messed things up big time by advancing the pointer, as well as what you have pointed out I think I need to also recalculate the count in the code that follows. It was working in my testing as the values so far have just happened to be aligned. > - Bruno _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev