On 10/24/2014 06:28 AM, Timothy Arceri wrote:
On Thu, 2014-10-23 at 09:20 -0600, Brian Paul wrote:
Nice, just a few comments below.


On 10/22/2014 10:02 PM, Timothy Arceri wrote:
Makes use of SSE to speed up compute of min and max elements

Callgrind cpu usage results from pts benchmarks:

Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%

Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au>
---
   src/mesa/Makefile.am          |  3 +-
   src/mesa/main/sse_minmax.c    | 75 
+++++++++++++++++++++++++++++++++++++++++++
   src/mesa/main/sse_minmax.h    | 29 +++++++++++++++++
   src/mesa/vbo/vbo_exec_array.c |  8 +++++
   4 files changed, 114 insertions(+), 1 deletion(-)
   create mode 100644 src/mesa/main/sse_minmax.c
   create mode 100644 src/mesa/main/sse_minmax.h

I almost wasn't going to bother sending this out since it uses SSE4.1
and its recommended to use glDrawRangeElements anyway. But since these games
are still ofter used for benchmarking I thought I'd see if anyone is
interested in this. I only optimised GL_UNSIGNED_INT as that was the
only place these games were hitting but I guess it wouldn't hurt
to optimse the other cases too.

I think it would probably make sense too to just combine
streaming-load-memcpy.c and sse_minmax.c into a single file
something like sse_opt.c for example.

As far a frame rates go I couldn't see any concrete improments
on my Ivybridge machine. Maybe it would help more on a high end radeon
card where cpu apparently is more of a concern??

Finally its seems to run fine but its only the second time I've tried
these type of optimisations so let me know if there are any obvious
mistakes or improvements.

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..932db4f 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
        $(ARCH_LIBS)

   libmesa_sse41_la_SOURCES = \
-       main/streaming-load-memcpy.c
+       main/streaming-load-memcpy.c \
+       main/sse_minmax.c
   libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1

   pkgconfigdir = $(libdir)/pkgconfig

I think you also need to add the new file to Makefile.sources for SCons
(also check Android).

Alternately, since the function is pretty small, and I'm not sure if it
would be used elsewhere, you might put it in the vbo_exec_array.c file.

Correct me if I'm wrong but I don't think I can do this or the compiler
might end up generating SSE4.1 instructions for the other functions in
vbo_exec_array.c the same would apply with your later suggestion for
moving the c code into the sse_minmax.c file.

I see, but if we're concerned about that, shouldn't we at least check for SSE4.1 support at runtime for the new code? "if (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)" should do it.

Otherwise, I think we should at least respect the USE_SSE_ASM switch: #if defined(__SSE4_1__) && defined(USE_SSE_ASM)

-Brian

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to