On 31.10.2014 20:30, Roland Scheidegger wrote: > Am 31.10.2014 um 18:17 schrieb Matt Turner: >> On Fri, Oct 31, 2014 at 3:13 AM, Juha-Pekka Heikkila >> <juhapekka.heikk...@gmail.com> wrote: >>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com> >>> --- >>> src/mesa/main/colormac.h | 20 +++++++++++++++ >>> src/mesa/main/pixeltransfer.c | 59 >>> ++++++++++++++++++++++++++++++++----------- >>> 2 files changed, 64 insertions(+), 15 deletions(-) >>> >>> diff --git a/src/mesa/main/colormac.h b/src/mesa/main/colormac.h >>> index c8adca6..da5e094 100644 >>> --- a/src/mesa/main/colormac.h >>> +++ b/src/mesa/main/colormac.h >>> @@ -51,6 +51,26 @@ _mesa_unclamped_float_rgba_to_ubyte(GLubyte dst[4], >>> const GLfloat src[4]) >>> >>> >>> /** >>> + * Clamp four float values to [min,max] >>> + */ >>> +#if defined(__SSE2__) && defined(__GNUC__) >>> +static inline void >>> +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min, >>> + const float max) >>> +{ >>> + __m128 operand, minval, maxval; >>> + >>> + operand = _mm_loadu_ps(src); >> >> Surely 128-bit pixels will be 128-bit aligned? I think we can use an >> aligned load here. > I can't see why that would be the case (sure in some cases it might be > but I don't see anything which would actually guarantee that). temp > images themselves don't seem to have any forced alignment (that could be > fixed though I didn't check fi there's other paths into it which can't > be easily fixed).
On SNB 64-bit build with my standard build options I did see the pixels aligned at 64-bit at worst thus I have unaligned load/store here. > >> >>> + minval = _mm_set1_ps(min); >>> + maxval = _mm_set1_ps(max); >>> + operand = _mm_max_ps(operand, minval); >>> + operand = _mm_min_ps(operand, maxval); >>> + _mm_storeu_ps(result, operand); >> >> And an aligned store here. >> >>> +} >>> +#endif >>> + >>> + >>> +/** >>> * \name Generic color packing macros. All inputs should be GLubytes. >>> * >>> * \todo We may move these into texstore.h at some point. >>> diff --git a/src/mesa/main/pixeltransfer.c b/src/mesa/main/pixeltransfer.c >>> index 8bbeeb8..e16eb59 100644 >>> --- a/src/mesa/main/pixeltransfer.c >>> +++ b/src/mesa/main/pixeltransfer.c >>> @@ -35,7 +35,7 @@ >>> #include "pixeltransfer.h" >>> #include "imports.h" >>> #include "mtypes.h" >>> - >>> +#include "x86/common_x86_asm.h" >>> >>> /* >>> * Apply scale and bias factors to an array of RGBA pixels. >>> @@ -89,16 +89,34 @@ _mesa_map_rgba( const struct gl_context *ctx, GLuint n, >>> GLfloat rgba[][4] ) >>> const GLfloat *bMap = ctx->PixelMaps.BtoB.Map; >>> const GLfloat *aMap = ctx->PixelMaps.AtoA.Map; >>> GLuint i; >>> - for (i=0;i<n;i++) { >>> - GLfloat r = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F); >>> - GLfloat g = CLAMP(rgba[i][GCOMP], 0.0F, 1.0F); >>> - GLfloat b = CLAMP(rgba[i][BCOMP], 0.0F, 1.0F); >>> - GLfloat a = CLAMP(rgba[i][ACOMP], 0.0F, 1.0F); >>> - rgba[i][RCOMP] = rMap[F_TO_I(r * rscale)]; >>> - rgba[i][GCOMP] = gMap[F_TO_I(g * gscale)]; >>> - rgba[i][BCOMP] = bMap[F_TO_I(b * bscale)]; >>> - rgba[i][ACOMP] = aMap[F_TO_I(a * ascale)]; >>> + >>> +#if defined(__SSE2__) && defined(__GNUC__) >>> + if (cpu_has_xmm2) { >> >> #ifdef __SSE2__ means the compiler is free to use SSE2 instructions >> whenever it pleases. That's not what you want here, if you're also >> doing runtime checking (cpu_has_xmm2). >> >> The typical way to do this is to put the function containing SSE >> instructions in a separate file that is compiled with -msse2. See >> streaming-load-memcpy.c for example. gcc has a way to mark specific >> functions, but since we have to compile with MSVC... >> >> I think you just want copy-and-paste the SSE4.1 testing code in >> configure.ac for SSE2 and then wrap these uses in #ifdef USE_SSE2. >> >>> + for (i=0;i<n;i++) { >>> + GLfloat rgba_temp[4]; >>> + _mesa_clamp_float_rgba(rgba[i], rgba_temp, 0.0F, 1.0F); >>> + rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * rscale)]; >>> + rgba[i][GCOMP] = gMap[F_TO_I(rgba_temp[GCOMP] * gscale)]; >>> + rgba[i][BCOMP] = bMap[F_TO_I(rgba_temp[BCOMP] * bscale)]; >>> + rgba[i][ACOMP] = aMap[F_TO_I(rgba_temp[ACOMP] * ascale)]; >> >> Oh, but we shouldn't be bothering to store the floats back to memory >> anyway. We should just do this part with SSE as well. > > Indeed the mul scale and F_TO_I look like obvious candidates for > vectorization (though the table part unfortunately not). I'll rework this patch, it'll be a bit different looking with these suggested changes. /Juha-Pekka _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev