On Thu, Oct 23, 2014 at 2:13 AM, Timothy Arceri <t_arc...@yahoo.com.au> wrote:
> On Wed, 2014-10-22 at 22:49 -0700, Matt Turner wrote:
>> On Wed, Oct 22, 2014 at 10:30 PM, Matt Turner <matts...@gmail.com> wrote:
>> > On Wed, Oct 22, 2014 at 9:02 PM, Timothy Arceri <t_arc...@yahoo.com.au> wrote:
>> >> I almost wasn't going to bother sending this out since it uses SSE4.1
>> >> and it's recommended to use glDrawRangeElements anyway. But since these
>> >> games are still often used for benchmarking I thought I'd see if anyone
>> >> is interested in this. I only optimised GL_UNSIGNED_INT as that was the
>> >> only place these games were hitting but I guess it wouldn't hurt
>> >> to optimise the other cases too.
>> >
>> > I think it's kind of neat!
>> >
>> > It might also be fun to try to do this with OpenMP. OpenMP 3.1
>> > (supported since gcc-4.7) supports min/max reduction operators.
>
> I've never really looked into OpenMP before, but very cool :)
>
> It seems simd support wasn't added until 4.0 (gcc-4.9) so using 3.1
> would require threading. Probably best just to go with 4.0.

Oh, that's unfortunate. I didn't notice because I'm using 4.9.1 and was
too preoccupied with finding out when min/max reductions had been added.

>> I think all you'd need to do for that is to add this pragma
>> immediately before the for loop in vbo_exec_array.c:
>>
>> #if _OPENMP > ... (have to figure out the date for OMP 3.1)
>> #pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
>> #endif
>>
>> and then change the inner loop to use ternary for min/max:
>>
>> max_ui = ui_indices[i] > max_ui ? ui_indices[i] : max_ui;
>> min_ui = ui_indices[i] < min_ui ? ui_indices[i] : min_ui;
>>
>> I tested it with a little function and confirmed that it generates
>> SSE4.1/AVX2 instructions (and even a bunch of SSE2 instructions when
>> 4.1 isn't available!) depending on the -march= value I pass.
>
> I assume this means there isn't a way to tell OpenMP to build multiple
> versions and select the best one at runtime, so distros would always
> just ship SSE2? Anyway I'm going to give the SSE2 code a run on my (6
> year old) desktop and see how it performs. I will also compare it to my
> SSE4.1 code on my laptop; maybe it won't be too big of a difference.

I couldn't find a way. :( I suspect the SSE4.1 path you proposed will
be the best solution since we can use it with runtime detection.

We might also simply try using OpenMP in the sse_minmax.c file, since
it'll be built with -msse4.1, and see how the generated code compares.

While on x86-64 we can at least assume SSE2, we can't make any
assumptions on 32-bit, which most games still are.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
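
For reference, the OpenMP variant discussed in this thread could look
roughly like the little function below. This is only a sketch, not the
actual Mesa code: the name minmax_uint and its signature are made up for
illustration, and the guard value 201307 is the _OPENMP date for OpenMP
4.0 (chosen here because the simd construct needs 4.0). When built
without -fopenmp the pragma is skipped and the loop stays scalar, or is
left to the compiler's auto-vectorizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the function in
 * vbo_exec_array.c.  Finds the smallest and largest value in an
 * array of GL_UNSIGNED_INT-style indices. */
static void
minmax_uint(const uint32_t *indices, size_t count,
            uint32_t *min_out, uint32_t *max_out)
{
   uint32_t min_ui = UINT32_MAX;
   uint32_t max_ui = 0;

   assert(count > 0);

   /* 201307 is the _OPENMP value for OpenMP 4.0, which added simd. */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(max:max_ui) reduction(min:min_ui)
#endif
   for (size_t i = 0; i < count; i++) {
      /* Ternaries, as suggested in the thread, keep the loop body in a
       * form that maps directly onto the min/max reductions. */
      max_ui = indices[i] > max_ui ? indices[i] : max_ui;
      min_ui = indices[i] < min_ui ? indices[i] : min_ui;
   }

   *min_out = min_ui;
   *max_out = max_ui;
}
```

Which instructions come out of this still depends on the -march= (or
-msse4.1) value the file is built with, so it does not by itself solve
the runtime-selection problem raised above.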