Re: [Mesa-dev] [PATCH 1/2] i965/fs: Remove FS_OPCODE_BLORP_FB_WRITE.

2014-11-06 Thread Matt Turner
On Thu, Aug 28, 2014 at 12:50 PM, Matt Turner  wrote:
> On Tue, Aug 12, 2014 at 9:48 AM, Kenneth Graunke wrote:
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> index d3509a0..19f7ef8 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> @@ -186,7 +186,7 @@ fs_generator::generate_fb_write(fs_inst *inst)
>>brw_imm_ud(0x1 << 11));
>>   }
>>
>> -if (inst->target > 0) {
>> +if (fp && inst->target > 0) {
>> /* Set the render target index for choosing BLEND_STATE. */
>> brw_MOV(p, retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
>>inst->base_mrf, 2),
>
> I don't understand this change. I don't see fp being used elsewhere in
> this patch.
>
> Patch 2 is
>
> Reviewed-by: Matt Turner 
>
> and so is this one once this hunk is sorted out.

Ping?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Michel Dänzer

On 05.11.2014 20:14, Joonas Lahtinen wrote:


Modified to not refer to DRI3; it just uses the Present extension to get rid
of the excess buffer invalidations.


AFAICT there's no fallback from your changes to the current behaviour if 
the X server doesn't support the Present extension. There probably needs 
to be such a fallback.



--
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #49 from Iago Toral  ---
Jason you had this commit in your original branch: "MAYBEREVERT: Fill X
components with 1"

That basically makes packing to padded formats (like RGBX) set X=1. In the
commit message you mention that you are not sure you like this... my opinion is
that this should not be necessary, and it degrades performance when packing to
these formats, so I would be in favour of removing this one. What do you think?

FWIW, I have checked that removing this commit does not affect piglit results.
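
To make the trade-off concrete, here is a minimal sketch of packing into an 8-bit RGBX pixel with and without filling the padding byte. It is not the code from Jason's branch: unorm8() and pack_rgbx8() are hypothetical helpers, and "X = 1" is assumed to mean 0xff for an 8-bit unorm channel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa's pack code: float in [0,1] to 8 bits. */
static uint8_t unorm8(float f)
{
   if (f <= 0.0f) return 0;
   if (f >= 1.0f) return 255;
   return (uint8_t)(f * 255.0f + 0.5f);
}

/* Pack RGB into a 4-byte RGBX pixel. With fill_x set, the padding byte
 * is forced to 0xff ("X = 1"), which is one more store per pixel.
 * Without it, whatever is already in dst[3] is left untouched. */
static void pack_rgbx8(uint8_t dst[4], const float rgb[3], int fill_x)
{
   assert(dst && rgb);
   dst[0] = unorm8(rgb[0]);
   dst[1] = unorm8(rgb[1]);
   dst[2] = unorm8(rgb[2]);
   if (fill_x)
      dst[3] = 0xff;
}
```

Since readers of a padded format are not supposed to interpret X, skipping the extra store should not change visible results, which would be consistent with the unchanged piglit run.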

-- 
You are receiving this mail because:
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-06 Thread Juha-Pekka Heikkila
On 05.11.2014 21:21, Ian Romanick wrote:
> On 11/04/2014 01:24 PM, Roland Scheidegger wrote:
>> On 04.11.2014 at 13:05, Juha-Pekka Heikkila wrote:
>>> +   for(i = 0; i < n; i++) {
>>> +  _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
>>> +
>>> +  *operand = _mm_mul_ps(multiplier, *operand);
>>> +  truncated_integers = _mm_cvttps_epi32(*operand);
>>> +  mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
>>> + gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
>>> +
>>> +  _mm_storeu_ps(rgba_dst[i], mmove);
>> The sse2 code at the end looks counterproductive to me. Not sure what
>> gcc will generate but I'd suspect it involves some simd->int domain
>> transition for the table lookups, plus another int->simd transition to
>> get the values back into simd domain (alternatively it might use
>> stores/load here) just so you can store them again...
>> It would probably be better to just store the values directly after the
>> table lookups.
>> But in any case actually I'm beginning to suspect no one really cares
>> about performance anyway for that path (who the hell uses these
>> scale/map features?) so whatever works...
> 
> Which raises another question... do we have any piglit tests that
> actually exercise this path?

No, we don't. I made a small test for this to see how it works; I was
planning to move my test to Piglit later.
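
For reference, Roland's suggestion (store each looked-up value directly instead of repacking the four values into a SIMD register first) could look roughly like the scalar sketch below. All names are illustrative and only mirror the shape of the quoted loop; this is not the Mesa pixel-transfer code, and the clamp range of [0,1] is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp to [0,1], scale to a table index and look the value up. */
static float map_component(float v, const float *map, int map_size)
{
   assert(map && map_size > 0);
   if (!(v > 0.0f)) v = 0.0f;   /* also catches NaN */
   if (v > 1.0f) v = 1.0f;
   return map[(int)(v * (float)(map_size - 1))];   /* truncating conversion */
}

/* maps[c] / sizes[c] describe the lookup table for component c. */
static void scale_and_map_rgba(size_t n, const float (*src)[4], float (*dst)[4],
                               const float *maps[4], const int sizes[4])
{
   for (size_t i = 0; i < n; i++) {
      for (int c = 0; c < 4; c++) {
         /* Write each looked-up value straight to memory. */
         dst[i][c] = map_component(src[i][c], maps[c], sizes[c]);
      }
   }
}
```

The table index is an integer and the lookup is a scalar load, so writing the result straight to dst avoids moving the values back into the SIMD domain just to store them.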

/Juha-Pekka



Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Joonas Lahtinen
On Thu, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:
> On 05.11.2014 20:14, Joonas Lahtinen wrote:
> >
> > Modified to not refer to DRI3; it just uses the Present extension to get rid
> > of the excess buffer invalidations.
> 
> AFAICT there's no fallback from your changes to the current behaviour if 
> the X server doesn't support the Present extension. There probably needs 
> to be such a fallback.
> 
> 

It gets rid of such a nasty hack (the intel_viewport one) that I thought
there was no point in making a fallback, because without this the egl dri2
backend is fundamentally broken anyway.

Regards, Joonas



Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Joonas Lahtinen
On Wed, 2014-11-05 at 15:19 +, Emil Velikov wrote:
> Hi Joonas,
> 
> Does getting rid of the viewport hack give you any noticeable
> performance improvement?

Yes, it significantly reduces the CPU load when multiple glViewport
calls are made per frame (4x4 grid or so).

> Is there any interest in converting the
> egl_dri2 backend to dri3, rather than just copying over the present bits ?
> 

This could be one thing to do. But in the meantime, I would commit this
Present extension patch so that the affected use cases get the
improvements.

Regards, Joonas

> On 05/11/14 11:14, Joonas Lahtinen wrote:
> > Hi,
> > 
> > Modified to not refer to DRI3; it just uses the Present extension to get rid
> > of the excess buffer invalidations.
> > 
> > Regards, Joonas
> > 
> > From 257e2a8c750f9dcf868cce9da8632df3cae0fcec Mon Sep 17 00:00:00 2001
> > From: Joonas Lahtinen 
> > Date: Wed, 5 Nov 2014 12:25:32 +0200
> > Subject: [PATCH] egl: dri2: Use present extension.
> > 
> > Present extension is used to avoid excess buffer invalidations, because
> > the XCB interface doesn't supply that information.
> > 
> > Signed-off-by: Daniel van der Wath 
> > Signed-off-by: Joonas Lahtinen 
> > ---
> >  configure.ac|5 +-
> >  src/egl/drivers/dri2/egl_dri2.c |2 +-
> >  src/egl/drivers/dri2/egl_dri2.h |   24 ++-
> >  src/egl/drivers/dri2/platform_x11.c |  247 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.c |9 +-
> >  5 files changed, 262 insertions(+), 25 deletions(-)
> > 
> > diff --git a/configure.ac b/configure.ac
> > index fc7d372..75d90c0 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -952,7 +952,8 @@ xyesno)
> >  fi
> >  
> >  if test x"$enable_dri" = xyes; then
> > -   dri_modules="$dri_modules xcb-dri2 >= $XCBDRI2_REQUIRED"
> > +   PKG_CHECK_MODULES([PRESENTPROTO], [presentproto >= 
> > $PRESENTPROTO_REQUIRED])
> > +   dri_modules="$dri_modules xcb-dri2 >= $XCBDRI2_REQUIRED 
> > xcb-present"
> Afaics you are not changing anything on the dri modules (or glx/dri2) to
> require the above changes. Perhaps you need to push the presentproto
> check in the x11 case below ?
> 
> >  fi
> >  
> >  if test x"$enable_dri3" = xyes; then
> > @@ -1564,7 +1565,7 @@ for plat in $egl_platforms; do
> > ;;
> >  
> > x11)
> > -   PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 >= 
> > $XCBDRI2_REQUIRED xcb-xfixes])
> > +   PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 >= 
> > $XCBDRI2_REQUIRED xcb-xfixes xcb-present])
> > ;;
> >  
> > drm)
> [snip]
> > diff --git a/src/egl/drivers/dri2/platform_x11.c 
> > b/src/egl/drivers/dri2/platform_x11.c
> > index f8c4b70..a1445b2 100644
> > --- a/src/egl/drivers/dri2/platform_x11.c
> > +++ b/src/egl/drivers/dri2/platform_x11.c
> > @@ -188,6 +188,205 @@ get_xcb_screen(xcb_screen_iterator_t iter, int screen)
> >  return NULL;
> >  }
> >  
> > +/*
> > + * Called by the XCB_PRESENT_COMPLETE_NOTIFY case.
> > + */
> > +static void
> > +dri2_update_num_back(struct dri2_egl_surface *priv)
> > +{
> > +   priv->num_back = 1;
> > +   if (priv->flipping)
> > +  priv->num_back++;
> > +   if (priv->base.SwapInterval == 0)
> > +  priv->num_back++;
> > +}
> > +
> This seems to be out of sync with dri3_glx. Don't you need something
> similar to commit f7a36ef5fe23056299a77414f9ad8b5e5a1d ?
> 
> [snip]
> > +/**
> > + *
> > + * Process any present events that have been received from the X server
> > + *
> > + * From glx, we additionally invalidate the drawable here if there has a 
> > been a special event.
> > + */
> > +static void
> > +dri2_flush_present_events(struct dri2_egl_display *dri2_dpy, struct 
> > dri2_egl_surface *priv)
> > +{
> > +   xcb_connection_t *c = dri2_dpy->conn;
> > +
> > +   /* Check to see if any configuration changes have occurred
> > +* since we were last invoked
> > +*/
> > +   if (priv->special_event) {
> > +  xcb_generic_event_t*ev;
> > +
> > +  while ((ev = xcb_poll_for_special_event(c, priv->special_event)) != 
> > NULL) {
> > + xcb_present_generic_event_t *ge = (void *) ev;
> > + dri2_handle_present_event(priv, ge);
> > + _eglLog(_EGL_INFO, "DRI: Invalidating buffer 0x%x\n", 
> > priv->dri_drawable);
> > + (*dri2_dpy->flush->invalidate)(priv->dri_drawable);
> Hmm, why does one need to invalidate at this stage? I take it that this is
> related to the lack of fence objects?
> 
> [snip]
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> > b/src/mesa/drivers/dri/i965/brw_context.c
> > index e1a994a..dbadd10 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.c
> > +++ b/src/mesa/drivers/dri/i965/brw_context.c
> > @@ -148,6 +148,9 @@ intel_viewport(struct gl_context *ctx)
> > __DRIcontext *driContext = brw->driContext;
> >  
> > if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) {
> >

[Mesa-dev] [Bug 74563] Surfaceless contexts are not properly released by DRI drivers

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=74563

Tapani Pälli  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Tapani Pälli  ---
was pushed to master

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

2014-11-06 Thread Timothy Arceri
Signed-off-by: Timothy Arceri 
---
 configure.ac   | 6 ++
 src/mesa/x86/common_x86.c  | 4 
 src/mesa/x86/common_x86_features.h | 4 +++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 03f1bca..cc0a661 100644
--- a/configure.ac
+++ b/configure.ac
@@ -258,6 +258,12 @@ if test "x$SSE41_SUPPORTED" = x1; then
 fi
 AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
 
+AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
+if test "x$SSSE3_SUPPORTED" = x1; then
+DEFINES="$DEFINES -DUSE_SSSE3"
+fi
+AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
+
 dnl Can't have static and shared libraries, default to static if user
 dnl explicitly requested. If both disabled, set to static since shared
 dnl was explicitly requested.
diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
index 25f5c40..ad0648a 100644
--- a/src/mesa/x86/common_x86.c
+++ b/src/mesa/x86/common_x86.c
@@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
 
   __get_cpuid(1, &eax, &ebx, &ecx, &edx);
 
+#ifdef bit_SSSE3
+  if (ecx & bit_SSSE3)
+ _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
+#endif
   if (ecx & bit_SSE4_1)
  _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
}
diff --git a/src/mesa/x86/common_x86_features.h 
b/src/mesa/x86/common_x86_features.h
index 66f2cf6..6eb2b38 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -43,7 +43,8 @@
 #define X86_FEATURE_XMM2   (1<<6)
 #define X86_FEATURE_3DNOWEXT   (1<<7)
 #define X86_FEATURE_3DNOW  (1<<8)
-#define X86_FEATURE_SSE4_1 (1<<9)
+#define X86_FEATURE_SSSE3  (1<<9)
+#define X86_FEATURE_SSE4_1 (1<<10)
 
 /* standard X86 CPU features */
 #define X86_CPU_FPU(1<<0)
@@ -65,6 +66,7 @@
 #define cpu_has_xmm2   (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
 #define cpu_has_3dnow  (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
 #define cpu_has_3dnowext   (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
+#define cpu_has_ssse3  (_mesa_x86_cpu_features & X86_FEATURE_SSSE3)
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 
 #endif
-- 
1.9.3
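
A minimal sketch of the decoding step this patch adds, written as a pure function so it can be read without <cpuid.h>. CPUID leaf 1 reports SSSE3 in ECX bit 9 and SSE4.1 in ECX bit 19; the CPUID1_ECX_* names and features_from_cpuid1_ecx() are local stand-ins, not identifiers from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Feature flags as renumbered by the patch. */
#define X86_FEATURE_SSSE3   (1 << 9)
#define X86_FEATURE_SSE4_1  (1 << 10)

/* Stand-ins for bit_SSSE3 / bit_SSE4_1 from the compiler's <cpuid.h>. */
#define CPUID1_ECX_SSSE3    (UINT32_C(1) << 9)
#define CPUID1_ECX_SSE4_1   (UINT32_C(1) << 19)

/* Translate the ECX value returned by CPUID leaf 1 into Mesa-style flags. */
static int features_from_cpuid1_ecx(uint32_t ecx)
{
   int features = 0;

   /* The patch moves SSE4.1 to a new bit; the two flags must not collide. */
   assert((X86_FEATURE_SSSE3 & X86_FEATURE_SSE4_1) == 0);

   if (ecx & CPUID1_ECX_SSSE3)
      features |= X86_FEATURE_SSSE3;
   if (ecx & CPUID1_ECX_SSE4_1)
      features |= X86_FEATURE_SSE4_1;
   return features;
}
```

Note that the hardware bit position (ECX bit 9) and the internal flag value (1<<9) coincide for SSSE3 only by accident; the two numbering schemes are independent.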



[Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Timothy Arceri
Also cleans up some if statements in the *faster functions.

Callgrind cpu usage results from pts benchmarks:

For ytile_copy_faster()

Nexuiz 1.6.1: 2.16% -> 1.20%

Signed-off-by: Timothy Arceri 
---
 src/mesa/Makefile.am   |  8 +++
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 ++
 src/mesa/main/fast_rgba8_copy.c| 78 
 src/mesa/main/fast_rgba8_copy.h| 37 
 4 files changed, 141 insertions(+), 64 deletions(-)
 create mode 100644 src/mesa/main/fast_rgba8_copy.c
 create mode 100644 src/mesa/main/fast_rgba8_copy.h

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..2402096 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
 
 ARCH_LIBS =
 
+if SSSE3_SUPPORTED
+ARCH_LIBS += libmesa_ssse3.la
+endif
+
 if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
@@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
main/streaming-load-memcpy.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
+libmesa_ssse3_la_SOURCES = \
+   main/fast_rgba8_copy.c
+libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = gl.pc
 
diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index cb5738a..0deeb75 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -27,6 +27,7 @@
  **/
 
 #include "main/bufferobj.h"
+#include "main/fast_rgba8_copy.h"
 #include "main/image.h"
 #include "main/macros.h"
 #include "main/mtypes.h"
@@ -42,9 +43,7 @@
 #include "intel_mipmap_tree.h"
 #include "intel_blit.h"
 
-#ifdef __SSSE3__
-#include 
-#endif
+#include "x86/common_x86_asm.h"
 
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
@@ -175,18 +174,6 @@ err:
return false;
 }
 
-#ifdef __SSSE3__
-static const uint8_t rgba8_permutation[16] =
-   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
-
-/* NOTE: dst must be 16 byte aligned */
-#define rgba8_copy_16(dst, src) \
-   *(__m128i *)(dst) = _mm_shuffle_epi8(\
-  (__m128i) _mm_loadu_ps((float *)(src)),   \
-  *(__m128i *) rgba8_permutation\
-   )
-#endif
-
 /**
  * Copy RGBA to BGRA - swap R and B.
  */
@@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
uint8_t *d = dst;
uint8_t const *s = src;
 
-#ifdef __SSSE3__
-   /* Fast copying for tile spans.
-*
-* As long as the destination texture is 16 aligned,
-* any 16 or 64 spans we get here should also be 16 aligned.
-*/
-
-   if (bytes == 16) {
-  assert(!(((uintptr_t)dst) & 0xf));
-  rgba8_copy_16(d+ 0, s+ 0);
-  return dst;
-   }
-
-   if (bytes == 64) {
-  assert(!(((uintptr_t)dst) & 0xf));
-  rgba8_copy_16(d+ 0, s+ 0);
-  rgba8_copy_16(d+16, s+16);
-  rgba8_copy_16(d+32, s+32);
-  rgba8_copy_16(d+48, s+48);
-  return dst;
-   }
-#endif
-
while (bytes >= 4) {
   d[0] = s[2];
   d[1] = s[1];
@@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
   mem_copy_fn mem_copy)
 {
if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
-  if (mem_copy == memcpy)
- return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
-   } else {
-  if (mem_copy == memcpy)
- return xtile_copy(x0, x1, x2, x3, y0, y1,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return xtile_copy(x0, x1, x2, x3, y0, y1,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
+  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
+dst, src, src_pitch, swizzle_bit, mem_copy);
}
xtile_copy(x0, x1, x2, x3, y0, y1,
   dst, src, src_pitch, swizzle_bit, mem_copy);
@@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
   mem_copy_fn mem_copy)
 {
if (x0 == 0 && x3 == ytile_width && y0 == 0 && y1 == ytile_height) {
-  if (mem_copy == memcpy)
- return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
-   } else {
-  if (mem_copy == memcpy)
-
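
The contents of the new fast_rgba8_copy.c are not shown in this excerpt, so the following is only a sketch of the idea under stated assumptions: the copy routine is selected at run time from a feature flag (cpu_has_ssse3 in the patch, a plain int here), and the SSSE3 path is modelled by a scalar loop that applies the same shuffle table. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *(*mem_copy_fn)(void *dst, const void *src, size_t bytes);

/* Same table as in the removed macro: output byte i takes input byte
 * rgba8_permutation[i], so bytes 0 and 2 of every pixel trade places. */
static const uint8_t rgba8_permutation[16] =
   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };

/* Portable copy that swaps R and B one pixel at a time. */
static void *rgba8_copy_generic(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;

   assert(bytes % 4 == 0);
   for (; bytes >= 4; bytes -= 4, d += 4, s += 4) {
      d[0] = s[2];
      d[1] = s[1];
      d[2] = s[0];
      d[3] = s[3];
   }
   return dst;
}

/* Scalar stand-in for the SSSE3 routine: applies the shuffle table to
 * 16-byte blocks, the job _mm_shuffle_epi8 does in one instruction. */
static void *rgba8_copy_shuffle16(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;

   assert(bytes % 16 == 0);
   for (size_t off = 0; off < bytes; off += 16)
      for (int i = 0; i < 16; i++)
         d[off + i] = s[off + rgba8_permutation[i]];
   return dst;
}

/* Pick the implementation once, based on a runtime feature flag. */
static mem_copy_fn select_rgba8_copy(int has_ssse3)
{
   return has_ssse3 ? rgba8_copy_shuffle16 : rgba8_copy_generic;
}
```

Both paths produce identical output; the removed compile-time code only took the fast path for 16- and 64-byte spans and asserted that dst was 16-byte aligned.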

Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds

2014-11-06 Thread Emil Velikov
On 05/11/14 21:11, Jose Fonseca wrote:
>> How many people/companies use EGL for Windows/fbdev, how about OpenVG on
> any platform ?
> 
> I already said this privately to Marek when he was RFC'ing on this change: 
> I'm fine if Linux-specific drivers abandon st/egl to focus solely on st/dri, 
> but removing st/egl altogether seems unnecessary and short-sighted: EGL is a 
> cross-platform API, Mesa is a cross-platform implementation of OpenGL and 
> friends, so sooner or later people will want to have Mesa's EGL support on 
> platforms other than Linux.
> 
> This is not hypothetical:
> - See https://bugs.freedesktop.org/show_bug.cgi?id=40920 for an example of a 
> bug reported by a user running llvmpipe + egl + openvg on Windows.
> - VMware doesn't currently ship or support EGL on Windows, but I suspect 
> we'll eventually want to support EGL on non-Linux platforms.
> 
> Even if OpenVG is losing popularity, maybe Khronos will come up with 
> another cross-platform graphics API (maybe OpenGL NG) that's tied to EGL.
> 
> So a cross-platform implementation of EGL is bound to be useful.
> 
> 
> I don't test, but I build egl-static and OpenVG on Windows nightly w/ 
> llvmpipe.  It's like a superset of OSMesa, and it seems more useful, as it 
> gives one more APIs than OSMesa, and through a standard API to create/bind 
> contexts .
> 
> 
> In short, stop caring about st/egl on Linux, maybe even remove DRI support 
> from st/egl if you must, but please don't go out of your way to break EGL on 
> non-Linux platforms.
> 
So let me justify why I brought this in the first place:

1. Over the last two years st/egl had the following patches
 * Build fixes & related - most of the patches (80%?)
 * Interface changes - ~10 patches
 * Bugfixes - ~3 patches
 * New "features" - 1 patch (already present with the dri2 backend)
2. Over the last two years I've not seen any bug reports from people
using either st/egl or st/vega. I must admit I was not looking too closely.
3. Afaict neither VMware nor other commercial products use it.

So based on those my naive question was "Is there anyone actually using
those state-trackers, rather than just building them" - i.e. it was
meant as a question, rather than a message of hate wrt the code-base :-)

I must admit I cannot predict the future (i.e. what VMware, Khronos
and/or others have planned), but based on the lack of testers,
maintainers and new improvements, imho it makes sense to remove the stale
code. As soon as any of that changes we can always bring it back.

So I would not call it short-sighted, but imho it does not make sense to
cling onto something in the hopes that one day someone may use it.

Cheers,
Emil




Re: [Mesa-dev] [PATCH 2/2] megadriver: explicitly link against glapi, link with -no-undefined

2014-11-06 Thread Emil Velikov
Hi Frank,

On 31/10/14 01:11, Frank Henigman wrote:
> I was too hasty with my "Tested-by."  While it worked in a
> shared-glapi configuration, it broke the build with the following:
> 
> ./configure --prefix=/usr --build=x86_64-pc-linux-gnu
> --host=x86_64-cros-linux-gnu --mandir=/usr/share/man
> --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
> --localstatedir=/var/lib --libdir=/usr/lib64
> --disable-dependency-tracking --disable-option-checking
> --with-driver=dri --disable-glu --disable-glut --without-demos
> --enable-texture-float --disable-dri3 --disable-llvm-shared-libs
> --enable-glx --enable-llvm-gallium --disable-egl --disable-gbm
> --disable-gles1 --disable-gles2 --disable-shared-glapi
> --disable-gallium --enable-debug --enable-glx-tls --enable-asm
> --disable-xlib-glx --enable-dri --with-dri-drivers=i965
> --with-gallium-drivers=
> 
> the output is:
> ...
> make[5]: Entering directory
> `/build/lumpy/tmp/portage/media-libs/mesa-10.3-r11/work/Mesa-10.3/src/mesa/drivers/dri'
>   CXXLD  mesa_dri_drivers.la
> ../../../../src/mapi/glapi/.libs/libglapi.a(glapi_libglapi_la-glapi_getproc.o):glapi_getproc.c:static_functions:
> error: undefined reference to 'glAreTexturesResidentEXT'
> ../../../../src/mapi/glapi/.libs/libglapi.a(glapi_libglapi_la-glapi_getproc.o):glapi_getproc.c:static_functions:
> error: undefined reference to 'glDeleteTexturesEXT'
> ...
> 
Indeed. static-glapi does not get much testing, plus it seems that
it has been broken (in a way) for a long time.
It seems that we'll have to (temporarily) resort to shoving
dlopen(libglapi.so) into gbm, so that in time programs can nuke it from
their codebase. glapi is an internal (implementation) detail that they
should not need to bother with or know about.

-Emil



Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

2014-11-06 Thread Emil Velikov
Humble ping x2

On 14/10/14 15:25, Emil Velikov wrote:
> Humble ping.
> 
> On 23/09/14 01:25, Emil Velikov wrote:
>> From: Sjoerd Simons 
>>
>> When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that
>> applications can decide whether they want to use RGBA (and have the
>> compositor blend their windows).
>>
>> On my system with this change EGLConfigs with a 24 bit visual come up
>> first, so applications blindly picking the first EGLConfig will
>> still get an RGB X visual.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
>> ---
>>
>> Hello gents,
>>
>> This patch has been stuck in bugzilla since February this year. Bringing 
>> it around here to gather greater exposure and perhaps some 
>> comments/reviews.
>>
>> -Emil
>>
>>  src/egl/drivers/dri2/egl_dri2.c |  5 +
>>  src/egl/drivers/dri2/platform_x11.c | 17 +
>>  2 files changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/egl/drivers/dri2/egl_dri2.c 
>> b/src/egl/drivers/dri2/egl_dri2.c
>> index 20a7243..2ed90a7 100644
>> --- a/src/egl/drivers/dri2/egl_dri2.c
>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>> @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
>>  static EGLBoolean
>>  dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
>>  {
>> +
>> +   if (criteria->NativeVisualID != EGL_DONT_CARE &&
>> +conf->NativeVisualID != criteria->NativeVisualID)
>> +  return EGL_FALSE;
>> +
>> if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
>>return EGL_FALSE;
>>  
>> diff --git a/src/egl/drivers/dri2/platform_x11.c 
>> b/src/egl/drivers/dri2/platform_x11.c
>> index a7a7338..3395fb7 100644
>> --- a/src/egl/drivers/dri2/platform_x11.c
>> +++ b/src/egl/drivers/dri2/platform_x11.c
>> @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
>> dri2_egl_display *dri2_dpy,
>>  dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
>>  surface_type, config_attrs, rgba_masks);
>>  
>> -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
>> - * Otherwise it will only match a 32-bit RGBA visual.  On a
>> - * composited window manager on X11, this will make all of the
>> - * EGLConfigs with destination alpha get blended by the
>> - * compositor.  This is probably not what the application
>> - * wants... especially on drivers that only have 32-bit RGBA
>> - * EGLConfigs! */
>> -if (d.data->depth == 24) {
>> +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 
>> 32-bit
>> + * RGBA EGLConfig.  Otherwise it will only match a 32-bit RGBA
>> + * visual.  On a composited window manager on X11, this will 
>> make
>> + * all of the EGLConfigs with destination alpha get blended by 
>> the
>> + * compositor.  This is probably not what the application 
>> wants...
>> + * especially on drivers that only have 32-bit RGBA EGLConfigs!
>> + * Allowing both allows applications to make the decision 
>> whether
>> + * 32 bit visuals are intended */
>> +if (d.data->depth == 24 || d.data->depth == 32) {
>> rgba_masks[3] =
>>~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
>> dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
>>
> 
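
The two pieces of logic in this patch are small enough to model in isolation. The sketch below assumes a typical 8:8:8 X visual and uses SKETCH_DONT_CARE as a stand-in for EGL_DONT_CARE (which is -1 in the EGL headers); both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DONT_CARE (-1)   /* stand-in for EGL_DONT_CARE */

/* Derive the alpha mask as "every bit not used by R, G or B", the same
 * expression the quoted hunk assigns to rgba_masks[3]. */
static uint32_t alpha_mask_from_rgb(const uint32_t rgb_masks[3])
{
   assert(rgb_masks);
   return ~(rgb_masks[0] | rgb_masks[1] | rgb_masks[2]);
}

/* Mirror of the new early-out in dri2_match_config(): a config is
 * rejected only when the caller asked for a specific visual and this
 * config carries a different one. */
static int visual_id_matches(int config_visual, int wanted_visual)
{
   return wanted_visual == SKETCH_DONT_CARE || config_visual == wanted_visual;
}
```

With both depth-24 and depth-32 visuals now producing an RGBA EGLConfig, the visual ID check is what lets an application that passes EGL_NATIVE_VISUAL_ID choose between them explicitly.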



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Juha-Pekka Heikkila
On 29.10.2014 14:05, Timothy Arceri wrote:
> Makes use of SSE to speed up compute of min and max elements
> 
> Callgrind cpu usage results from pts benchmarks:
> 
> Openarena 0.8.8: 3.67% -> 1.03%
> UrbanTerror: 2.36% -> 0.81%
> 
> V5:
> - actually make use of the optimisation in android (Emil Velikov)
> - set a better array size limit for using SSE and added TODO
> 
> V4:
> - fixed bugs with incrementing pointer and updating counters
> 
> V3:
> - Removed sse_minmax.c from Makefile.sources
> - handle the first few values without SSE until the pointer is aligned
>  and use _mm_load_si128 rather than _mm_loadu_si128
> - guard the call to the SSE code better at build time
> 
> V2:
> - removed GL* types
> - use _mm_store_si128() rather than _mm_store_ps()
> - add runtime check for SSE
> - use aligned attribute for local min/max
> - bunch of tidyups
> 
> Signed-off-by: Timothy Arceri 
> ---
>  src/mesa/Android.libmesa_dricore.mk |  8 ++-
>  src/mesa/Android.libmesa_st_mesa.mk |  5 ++
>  src/mesa/Makefile.am|  3 +-
>  src/mesa/main/sse_minmax.c  | 97 
> +
>  src/mesa/main/sse_minmax.h  | 30 
>  src/mesa/vbo/vbo_exec_array.c   | 14 --
>  6 files changed, 152 insertions(+), 5 deletions(-)
>  create mode 100644 src/mesa/main/sse_minmax.c
>  create mode 100644 src/mesa/main/sse_minmax.h
> 
> diff --git a/src/mesa/Android.libmesa_dricore.mk 
> b/src/mesa/Android.libmesa_dricore.mk
> index 1e6d948..2ab593d 100644
> --- a/src/mesa/Android.libmesa_dricore.mk
> +++ b/src/mesa/Android.libmesa_dricore.mk
> @@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM
>  
>  ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
>  LOCAL_SRC_FILES += \
> - $(SRCDIR)main/streaming-load-memcpy.c
> + $(SRCDIR)main/streaming-load-memcpy.c \
> + $(SRCDIR)main/sse_minmax.c
>  LOCAL_CFLAGS := -msse4.1
>  endif
>  
> +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
> +LOCAL_CFLAGS += \
> +   -DUSE_SSE41
> +endif
> +
>  LOCAL_C_INCLUDES := \
>   $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
>   $(MESA_TOP)/src \
> diff --git a/src/mesa/Android.libmesa_st_mesa.mk 
> b/src/mesa/Android.libmesa_st_mesa.mk
> index 8b8d652..618d6bf 100644
> --- a/src/mesa/Android.libmesa_st_mesa.mk
> +++ b/src/mesa/Android.libmesa_st_mesa.mk
> @@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86)
>  endif # x86
>  endif # MESA_ENABLE_ASM
>  
> +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
> +LOCAL_CFLAGS := \
> +   -DUSE_SSE41
> +endif
> +
>  LOCAL_C_INCLUDES := \
>   $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
>   $(MESA_TOP)/src/gallium/auxiliary \
> diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
> index e71bccb..932db4f 100644
> --- a/src/mesa/Makefile.am
> +++ b/src/mesa/Makefile.am
> @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
>   $(ARCH_LIBS)
>  
>  libmesa_sse41_la_SOURCES = \
> - main/streaming-load-memcpy.c
> + main/streaming-load-memcpy.c \
> + main/sse_minmax.c
>  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
>  
>  pkgconfigdir = $(libdir)/pkgconfig
> diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
> new file mode 100644
> index 000..91a55e5
> --- /dev/null
> +++ b/src/mesa/main/sse_minmax.c
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright © 2014 Timothy Arceri
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Author:
> + *Timothy Arceri 
> + *
> + */
> +
> +#ifdef __SSE4_1__
> +#include "main/sse_minmax.h"
> +#include 
> +#include 
> +
> +void
> +_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
> + unsigned *max_index, const unsigned count)
> +{
> +   unsigned max_ui = 0;
> +   unsigned min_ui = ~0U;
> +   unsigned i = 0;
> +   unsigned aligned_count = count;
> +
> +   /* handle the first few values without SSE until the pointer is
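
The quoted function is cut off here. As a rough illustration of the structure described in the V3 notes (scalar steps until the pointer is 16-byte aligned, then aligned blocks, then a scalar tail), here is a portable sketch with no intrinsics. It assumes a 4-byte unsigned so that four lanes stand in for one 128-bit load; it is not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

static void update_min_max(unsigned v, unsigned *min, unsigned *max)
{
   if (v < *min) *min = v;
   if (v > *max) *max = v;
}

static void uint_array_min_max(const unsigned *idx, unsigned count,
                               unsigned *min_index, unsigned *max_index)
{
   unsigned lane_min[4] = { ~0U, ~0U, ~0U, ~0U };
   unsigned lane_max[4] = { 0, 0, 0, 0 };
   unsigned min_ui = ~0U, max_ui = 0, i = 0;

   assert(idx && min_index && max_index);

   /* Prologue: scalar steps until idx + i sits on a 16-byte boundary. */
   while (i < count && ((uintptr_t)(idx + i) & 15) != 0)
      update_min_max(idx[i++], &min_ui, &max_ui);

   /* Main loop: four lanes per step, standing in for one aligned
    * 128-bit load followed by packed unsigned min/max. */
   for (; count - i >= 4; i += 4)
      for (unsigned lane = 0; lane < 4; lane++)
         update_min_max(idx[i + lane], &lane_min[lane], &lane_max[lane]);

   /* Reduce the four lanes, then finish the tail in scalar code. */
   for (unsigned lane = 0; lane < 4; lane++) {
      if (lane_min[lane] < min_ui) min_ui = lane_min[lane];
      if (lane_max[lane] > max_ui) max_ui = lane_max[lane];
   }
   for (; i < count; i++)
      update_min_max(idx[i], &min_ui, &max_ui);

   *min_index = min_ui;
   *max_index = max_ui;
}
```

For very short arrays the prologue and tail do all the work, which is presumably why V5 mentions an array size limit below which the SSE path is not used.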

Re: [Mesa-dev] [PATCH v3 3/9] gallium/auxiliary: implement sw_probe_wrapped

2014-11-06 Thread Emil Velikov
Hi David,

Just a few nitpicks which I missed previously

On 02/11/14 18:31, David Heidelberg wrote:
> Implement pipe_loader_sw_probe_wrapped, which allows the wrapped
> software renderer backend to be used with the pipe loader.
> 
> Signed-off-by: David Heidelberg 
> ---
>  src/gallium/auxiliary/pipe-loader/pipe_loader.h| 11 +++
>  src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c | 22
> ++
>  src/gallium/targets/gbm/Makefile.am|  1 +
>  src/gallium/targets/opencl/Makefile.am |  1 +
>  src/gallium/targets/xa/Makefile.am |  1 +
>  src/gallium/tests/trivial/Makefile.am  |  1 +
>  6 files changed, 37 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.h
> b/src/gallium/auxiliary/pipe-loader/pipe_loader.h
> index 6127a6a..9f43f17 100644
> --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.h
> +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.h
> @@ -166,6 +166,17 @@ pipe_loader_sw_probe_null(struct pipe_loader_device
> **devs);
>  int
>  pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev);
>  +/**
> + * Get a software device wrapped atop another device.
> + *
> + * This function is platform-specific.
> + *
> + * \sa pipe_loader_probe
> + */
> +boolean
> +pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
> + struct pipe_screen *screen);
> +
>  #ifdef HAVE_PIPE_LOADER_DRM
>   /**
> diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> index b1b1ca6..b152f60 100644
> --- a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
> @@ -29,8 +29,11 @@
>   #include "util/u_memory.h"
>  #include "util/u_dl.h"
> +#ifdef HAVE_PIPE_LOADER_DRI
>  #include "sw/dri/dri_sw_winsys.h"
> +#endif
Does this bring any benefit? AFAICS the header does not pull in any
dependencies (drm, Xlib, etc.), so it should be safe as is.

>  #include "sw/null/null_sw_winsys.h"
> +#include "sw/wrapper/wrapper_sw_winsys.h"
>  #ifdef HAVE_PIPE_LOADER_XLIB
>  /* Explicitly wrap the header to ease build without X11 headers */
>  #include "sw/xlib/xlib_sw_winsys.h"
> @@ -140,6 +143,25 @@ pipe_loader_sw_probe(struct pipe_loader_device
> **devs, int ndev)
> return i;
>  }
>  +boolean
> +pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
> + struct pipe_screen *screen)
> +{
> +   struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device);
> +
Can we check for CALLOC_STRUCT failure?
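
A minimal sketch of the kind of check being asked for. It uses plain calloc() and hypothetical stand-in types (sw_winsys, sw_device, wrap_screen) instead of the real Gallium structures, so it only illustrates the control flow, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the Gallium types; only the shape matters. */
struct sw_winsys { int unused; };
struct sw_device {
   const char *driver_name;
   struct sw_winsys *ws;
};

/* Stand-in for the winsys wrapper call; returns NULL on failure. */
static struct sw_winsys *
wrap_screen(void *screen)
{
   return screen ? calloc(1, sizeof(struct sw_winsys)) : NULL;
}

static bool
probe_wrapped(struct sw_device **dev, void *screen)
{
   struct sw_device *sdev;

   assert(dev != NULL);
   sdev = calloc(1, sizeof(*sdev));

   /* Bail out before touching sdev if the allocation failed. */
   if (!sdev)
      return false;

   sdev->driver_name = "swrast";
   sdev->ws = wrap_screen(screen);
   if (!sdev->ws) {
      free(sdev);
      return false;
   }

   *dev = sdev;
   return true;
}
```

The only addition relative to the quoted hunk is the early return when the allocation yields NULL; the existing cleanup path for a failed winsys wrap stays the same.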

> +   sdev->base.type = PIPE_LOADER_DEVICE_SOFTWARE;
> +   sdev->base.driver_name = "swrast";
> +   sdev->base.ops = &pipe_loader_sw_ops;
> +   sdev->ws = wrapper_sw_winsys_wrap_pipe_screen(screen);
> +
> +   if (!sdev->ws) {
> +  FREE(sdev);
> +  return FALSE;
> +   }
> +   *dev = &sdev->base;
> +   return TRUE;
> +}
> +
>  static void
>  pipe_loader_sw_release(struct pipe_loader_device **dev)
>  {
> diff --git a/src/gallium/targets/gbm/Makefile.am
> b/src/gallium/targets/gbm/Makefile.am
> index 2c9b425..679c994 100644
> --- a/src/gallium/targets/gbm/Makefile.am
> +++ b/src/gallium/targets/gbm/Makefile.am
> @@ -34,6 +34,7 @@ gbm_gallium_drm_la_SOURCES =
>  gbm_gallium_drm_la_LIBADD = \
>  $(top_builddir)/src/gallium/state_trackers/gbm/libgbm.la \
>  $(top_builddir)/src/gallium/auxiliary/libgallium.la \
> +$(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \
You can avoid adding this for each target by adding it to
GALLIUM_PIPE_LOADER_WINSYS_LIBS. Something like the following would do
the job.


--- a/src/gallium/Automake.inc
+++ b/src/gallium/Automake.inc

GALLIUM_PIPE_LOADER_WINSYS_LIBS = \
+  $(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \
   $(top_builddir)/src/gallium/winsys/sw/null/libws_null.la


Cheers,
Emil


Re: [Mesa-dev] [PATCH 0/3][RFC v2] Clamp rgba floats with sse

2014-11-06 Thread Juha-Pekka Heikkila
On 05.11.2014 17:25, Emil Velikov wrote:
> Hi Juha-Pekka,
> 
> On 04/11/14 12:05, Juha-Pekka Heikkila wrote:
> [snip]
>> I made 'x86' folder under
>> src/mesa/main. The idea here being if there is optimization targeting
>> architecture it'd exist directly under the place where it was used, in its
> >> own subdirectory indicating targeted architecture. I don't think majority
>> of such code would be generic code thus this approach.
>>
> IMHO adding x86 brings no benefit at this point considering all the
> optimisations are x86 based. Yet if you insist on having it can you move
> the other optimisations in there please.

This just followed the comments on my first RFC round for this SSE
work, where the consensus was that these should be collected somewhere
separate. I'll remove the x86 folder; if there are later wishes to move
these to their own location, I'll let someone else continue with it.

/Juha-Pekka



Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

2014-11-06 Thread Emil Velikov
Hi Ken,

From what I've gathered, the proposed patch is incorrect and is (most
likely) working around buggy application behaviour. AFAICS Ian
suggested that we add a driconf option for such applications.

Should I consider this patch for the stable branch, or does the above sound
about right, in which case we can drop it?

Thanks
Emil

On 14/10/14 23:42, Kenneth Graunke wrote:
> According to INTEL_DEBUG=perf, "Borderlands: The Pre-Sequel" was
> stalling on nearly every glBufferSubData call, with very slightly
> overlapping busy ranges.
> 
> It turns out the draw upload code was accidentally including an extra
> stride's worth of data in the vertex buffer size due to a simple
> off-by-one error.  We considered this extra bit of buffer space to be
> busy (in use by the GPU), when it was actually idle.
> 
> The new diagram should make it easier to understand the formula.  It's
> basically what I drew on paper when working through an actual
> glDrawRangeElements call.
> 
> Eliminates all glBufferSubData stalls in "Borderlands: The Pre-Sequel."
> 
> Signed-off-by: Kenneth Graunke 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> No Piglit regressions on Haswell.  This might help Dota 2 and Serious Sam 3
> as well, but I haven't checked.
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
> b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> index 5a12439..6cb653c 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> @@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw)
>offset = 0;
>size = intel_buffer->Base.Size;
> } else {
> +  /* Compute the size/amount of data referenced by the GPU.
> +   * If the data is interleaved, StrideB may be larger than
> +   * _ElementSize.  As an example, assume we have 2 interleaved
> +   * attributes A and B.  The data is organized like this:
> +   *
> +   *   StrideEltSize
> +   *_,,_,
> +   *   /\  / \
> +   *A: ---   ---   ---   ---   ---   ---
> +   *B:---   ---   ---   ---   ---   ---
> +   *
> +   *   |= 4 elts ==|  (4-1) * Stride + EltSize
> +   *
> +   * max_index - min_index gives the number of elements that
> +   * will be referenced.  Say we're drawing 4 elements.  On
> +   * the first three, we need the full stride in order to get
> +   * to the next element.  But on the last, we only want the
> +   * element size, since we don't actually read the other
> +   * interleaved vertex attributes stored beyond that.
> +   */
>offset = buffer->offset + min_index * buffer->stride;
> -  size = (buffer->stride * (max_index - min_index) +
> +  size = (buffer->stride * MAX2(max_index - min_index - 1, 0) +
>glarray->_ElementSize);
> }
>  }
> 
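
The "(n-1) * Stride + EltSize" rule from the comment in the hunk above can be sketched as below. The helper names are hypothetical, and the second function assumes min_index and max_index form an inclusive range, which the thread itself does not spell out:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched when reading `count` interleaved elements: every element but
 * the last advances by a full stride, the last only needs its own size. */
static size_t
referenced_bytes(size_t count, size_t stride, size_t elt_size)
{
   assert(stride >= elt_size);
   if (count == 0)
      return 0;
   return (count - 1) * stride + elt_size;
}

/* Same computation for an index range, ASSUMING both ends are inclusive. */
static size_t
referenced_bytes_for_range(unsigned min_index, unsigned max_index,
                           size_t stride, size_t elt_size)
{
   assert(max_index >= min_index);
   return referenced_bytes((size_t)(max_index - min_index) + 1,
                           stride, elt_size);
}
```

Under the inclusive-range assumption the element count is max_index - min_index + 1, so the result works out to stride * (max_index - min_index) + elt_size. Whether the indices are inclusive is therefore the detail that decides which form of the expression is off by one.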



[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=85918

--- Comment #4 from Emil Velikov  ---
Roland,
Indeed, this issue does not seem to be related to the version of MSVC, yet
it's a nice reminder about the topic, plus a humble ping for Michael :)



[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=85918

Emil Velikov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
Version|unspecified |10.3
 Resolution|--- |FIXED

--- Comment #5 from Emil Velikov  ---
Pushed to the 10.3 branch. Issue will be fixed in mesa 10.3.3.

commit d5700dc276008decb2a5d63bfa38522c5f4ad3f3
Author: Brian Paul 
Date:   Wed Sep 10 08:16:24 2014 -0600

mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC

MSVC replaces the "F" in "255.0F" with the macro argument which leads
to an error.  s/F/FLT/ to avoid that.

It turns out we weren't using this macro at all on MSVC until the
recent "mesa: Drop USE_IEEE define." change.

Reviewed-by: Roland Scheidegger 
(cherry picked from commit 9608193cbc6ea14e49adcd0193f9e7c6058d5a2f)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85918
Nominated-by: Roland Scheidegger 
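
A simplified sketch of the renaming described in the commit message. CLAMP_01 and FLOAT_TO_UBYTE are hypothetical stand-ins, not the Mesa macros; the point is only that the parameter is called FLT so it cannot collide with the F suffix of the literals in the body:

```c
#include <assert.h>

#define CLAMP_01(X) ((X) < 0.0F ? 0.0F : ((X) > 1.0F ? 1.0F : (X)))

/* The parameter is named FLT rather than F, so a preprocessor that
 * mishandles the literal suffix has nothing to substitute into "255.0F". */
#define FLOAT_TO_UBYTE(UB, FLT) \
   ((UB) = (unsigned char)(CLAMP_01(FLT) * 255.0F))

static unsigned char
float_to_ubyte(float value)
{
   unsigned char ub;

   assert(value == value); /* reject NaN, which has no sensible byte value */
   FLOAT_TO_UBYTE(ub, value);
   return ub;
}
```

A conforming preprocessor treats 255.0F as a single token either way, so this sketch cannot reproduce the MSVC failure; it only shows the shape of the workaround.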



Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Marek Olšák
I'd say it's a spec bug. ARB_texture_buffer_range should say that the
offset should be a multiple of an element size, but it doesn't. The
question is, what should the element size be? One component or the
whole pixel?

Marek

On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger  wrote:
> Trying to fix some bug due to alignment issues in llvmpipe's vertex
> fetch, I came across some issue with ARB_texture_buffer_range.
> Namely, it looks like the offsets specified there are always in bytes,
> regardless the actual format (hence, as long as the
> TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
> offset of 15 bytes for a rgba32f format for instance making all fetches
> quite unaligned).
> However in gallium we actually have first_elem and last_elem parameters
> in the sampler views which are specified in number of elements (so takes
> the format into account), which is what d3d10 does and the state tracker
> translates to that apparently. IMHO d3d10 makes way more sense there
> because that way the necessary alignment scales automatically depending
> on the format (so, if the format is 2x16bit for instance you'd need 4
> byte alignment for the offset, and only need 16 bytes alignment for
> 4x32bit, ensuring all lookups are always aligned). This means that 15
> byte offset in the example above is completely untranslatable.
> But if I see that right, OpenGL doesn't work like that, meaning
> effectively gallium drivers (and I doubt most other drivers neither)
> cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
> than 16, even if they'd only need that for 4x32bit formats. Though most
> gallium drivers indeed claim 1 right now.
> Looks quite messy...
>
> Roland


[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=85918

--- Comment #6 from Michael Bachmann  ---
Thank you for taking care of this issue and for including the fix in version
10.3.3.



Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Roland Scheidegger
On 06.11.2014 at 16:15, Marek Olšák wrote:
> I'd say it's a spec bug. ARB_texture_buffer_range should say that the
> offset should be a multiple of an element size, but it doesn't. The
> question is, what should the element size be? One component or the
> whole pixel?
IMHO the whole pixel (for block-compressed formats that would be the full
block, and likewise for things like packed 565, but neither is possible in
GL), i.e. "format granularity". That said, the whole alignment thing is
problematic for rgb32 (the possibility of which was added later by
ARB_texture_buffer_object_rgb32), so maybe things like that are why the
offset can be just byte-aligned (in other words, I'm not convinced it's
just a spec bug; d3d10 doesn't have that problem with alignment).
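
A rough sketch of the "format granularity" idea, using a hypothetical buffer_format description rather than any real Gallium or GL type. It shows why a byte offset can only be translated to a first-element index when it is a whole multiple of the element size:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format description: channel count and bits per channel. */
struct buffer_format {
   unsigned channels;
   unsigned bits_per_channel;
};

/* Size of one whole element ("format granularity"), in bytes. */
static unsigned
element_size(struct buffer_format fmt)
{
   assert(fmt.bits_per_channel % 8 == 0);
   return fmt.channels * (fmt.bits_per_channel / 8);
}

/* Translate a byte offset to an element index; fails if the offset does
 * not fall on an element boundary. */
static bool
offset_to_first_element(unsigned byte_offset, struct buffer_format fmt,
                        unsigned *first_element)
{
   unsigned size = element_size(fmt);

   if (byte_offset % size != 0)
      return false;
   *first_element = byte_offset / size;
   return true;
}
```

With these definitions a 4x32-bit format needs 16-byte multiples, a 2x16-bit format needs 4-byte multiples, and a 3x32-bit format needs 12-byte multiples, which is not a power of two and so does not fit a single power-of-two alignment cap.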

Roland


> 
> Marek
> 
> On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger  wrote:
>> Trying to fix some bug due to alignment issues in llvmpipe's vertex
>> fetch, I came across some issue with ARB_texture_buffer_range.
>> Namely, it looks like the offsets specified there are always in bytes,
>> regardless the actual format (hence, as long as the
>> TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
>> offset of 15 bytes for a rgba32f format for instance making all fetches
>> quite unaligned).
>> However in gallium we actually have first_elem and last_elem parameters
>> in the sampler views which are specified in number of elements (so takes
>> the format into account), which is what d3d10 does and the state tracker
>> translates to that apparently. IMHO d3d10 makes way more sense there
>> because that way the necessary alignment scales automatically depending
>> on the format (so, if the format is 2x16bit for instance you'd need 4
>> byte alignment for the offset, and only need 16 bytes alignment for
>> 4x32bit, ensuring all lookups are always aligned). This means that 15
>> byte offset in the example above is completely untranslatable.
>> But if I see that right, OpenGL doesn't work like that, meaning
>> effectively gallium drivers (and I doubt most other drivers neither)
>> cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
>> than 16, even if they'd only need that for 4x32bit formats. Though most
>> gallium drivers indeed claim 1 right now.
>> Looks quite messy...
>>
>> Roland


[Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375

2014-11-06 Thread Jan Vesely
Signed-off-by: Jan Vesely 
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index e953822..3a4fcf0 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -282,7 +282,11 @@ namespace {
 
   for (unsigned i = 0; i < kernel_node->getNumOperands(); ++i) {
  kernels.push_back(llvm::dyn_cast(
+#if HAVE_LLVM >= 0x0306
+                                       kernel_node->getOperandAsMDNode(i)->getOperand(0)));
+#else
                                        kernel_node->getOperand(i)->getOperand(0)));
+#endif
   }
}
 
-- 
1.9.3



Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Marek Olšák
For radeonsi, I think only x8, x8y8, and x16 fetches can be
byte-aligned. Everything else is dword-aligned (the 2 lowest bits are
ignored). I guess the cap should be 4 then.
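
A small sketch of what "the 2 lowest bits are ignored" means for a byte offset. The function names are hypothetical and this is not radeonsi code; it only shows the masking arithmetic behind an alignment cap of 4:

```c
#include <assert.h>
#include <stdint.h>

/* Address actually fetched when the two lowest bits are ignored. */
static uint32_t
dword_aligned_address(uint32_t byte_offset)
{
   return byte_offset & ~UINT32_C(3);
}

/* An offset is honoured exactly only if it is already a multiple of
 * `alignment`, which must be a power of two. */
static int
offset_is_honoured(uint32_t byte_offset, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (byte_offset & (alignment - 1)) == 0;
}
```

For example, an offset of 15 would silently be fetched from 12, so advertising an alignment of 4 keeps applications from requesting offsets that would be rounded down.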

Marek

On Thu, Nov 6, 2014 at 4:55 PM, Roland Scheidegger  wrote:
> On 06.11.2014 at 16:15, Marek Olšák wrote:
>> I'd say it's a spec bug. ARB_texture_buffer_range should say that the
>> offset should be a multiple of an element size, but it doesn't. The
>> question is, what should the element size be? One component or the
>> whole pixel?
> Imho whole pixel (for block compressed that would be full block, for
> things like packed 565 too but neither are possible in GL), i.e. "format
> granularity". That said, the whole alignment thing is problematic for
> rgb32 (and the possibility of that was added later,
> ARB_texture_buffer_object_rgb32), so maybe it's things like that why the
> offset can be just byte aligned (in other words, I'm not convinced it's
> just a spec bug, d3d10 doesn't have that problem with alignment).
>
> Roland
>
>
>>
>> Marek
>>
>> On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger  
>> wrote:
>>> Trying to fix some bug due to alignment issues in llvmpipe's vertex
>>> fetch, I came across some issue with ARB_texture_buffer_range.
>>> Namely, it looks like the offsets specified there are always in bytes,
>>> regardless the actual format (hence, as long as the
>>> TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
>>> offset of 15 bytes for a rgba32f format for instance making all fetches
>>> quite unaligned).
>>> However in gallium we actually have first_elem and last_elem parameters
>>> in the sampler views which are specified in number of elements (so takes
>>> the format into account), which is what d3d10 does and the state tracker
>>> translates to that apparently. IMHO d3d10 makes way more sense there
>>> because that way the necessary alignment scales automatically depending
>>> on the format (so, if the format is 2x16bit for instance you'd need 4
>>> byte alignment for the offset, and only need 16 bytes alignment for
>>> 4x32bit, ensuring all lookups are always aligned). This means that 15
>>> byte offset in the example above is completely untranslatable.
>>> But if I see that right, OpenGL doesn't work like that, meaning
>>> effectively gallium drivers (and I doubt most other drivers neither)
>>> cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
>>> than 16, even if they'd only need that for 4x32bit formats. Though most
>>> gallium drivers indeed claim 1 right now.
>>> Looks quite messy...
>>>
>>> Roland


Re: [Mesa-dev] [PATCH 3/4] i965: Add code to verify the CFG is sane.

2014-11-06 Thread Jason Ekstrand
In general, it seems as if this can miss several things.  For instance, it
checks that all the predecessors are valid but never that we have all the
predecessors.  The same goes for successors.  If we really want to be able to
validate a CFG, maybe a stack-based approach like calculate_cfg would work
better?  Also, did you run this on piglit/shader-db to ensure that
everything coming out of calculate_cfg actually passes?
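
One partial check along these lines is to verify that the predecessor and successor lists agree with each other. The sketch below uses a hypothetical toy_cfg with adjacency matrices, not the bblock_t lists from brw_cfg.h:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical CFG: edge lists stored as boolean adjacency matrices. */
struct toy_cfg {
   int num_blocks;
   bool child[MAX_BLOCKS][MAX_BLOCKS];  /* child[a][b]: b is a successor of a */
   bool parent[MAX_BLOCKS][MAX_BLOCKS]; /* parent[b][a]: a is a predecessor of b */
};

/* Every successor edge a->b must have a matching predecessor entry on b,
 * and vice versa. */
static bool
edges_are_symmetric(const struct toy_cfg *cfg)
{
   assert(cfg->num_blocks <= MAX_BLOCKS);
   for (int a = 0; a < cfg->num_blocks; a++) {
      for (int b = 0; b < cfg->num_blocks; b++) {
         if (cfg->child[a][b] != cfg->parent[b][a])
            return false;
      }
   }
   return true;
}
```

This only catches an edge recorded on one side and missing on the other. An edge absent from both lists would still pass, which is why recomputing the expected edges independently remains the stronger test.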

More comments inline

On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner  wrote:

> ---
>  src/mesa/drivers/dri/i965/test_verify_cfg.cpp | 273
> ++
>  src/mesa/drivers/dri/i965/test_verify_cfg.h   |  26 +++
>  2 files changed, 299 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.cpp
>  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.h
>
> diff --git a/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
> b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
> new file mode 100644
> index 000..0aa74c5
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
> @@ -0,0 +1,273 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include 
> +#include "test_verify_cfg.h"
> +#include "brw_cfg.h"
> +
> +static bool
> +is_unconditional_jump(const backend_instruction *inst)
> +{
> +   return (inst->opcode == BRW_OPCODE_BREAK ||
> +   inst->opcode == BRW_OPCODE_CONTINUE ||
> +   inst->opcode == BRW_OPCODE_WHILE) &&
> +  inst->predicate == BRW_PREDICATE_NONE;
> +}
> +
> +void
> +verify_cfg(backend_visitor *v)
> +{
> +   foreach_block(block, v->cfg) {
> +  switch (block->start()->opcode) {
> +  case BRW_OPCODE_ENDIF: {
> + /* Has two predecessors:
> +  *- the previous block is always a predecessor
> +  *- always a predecessor ending in an IF or an ELSE
> +  *
> +  * Note that if the body of the if block is empty, then the
> +  * previous block *is* the block that ends with IF, so the ENDIF
> +  * block will have the same predecessor twice.
> +  */
> + if (is_unconditional_jump(block->prev()->end())) {
> +EXPECT_EQ(block->parents.length(), 1u);
> + } else {
> +EXPECT_EQ(block->parents.length(), 2u);
> + }
>

This obviously catches

if (foo) {
/* Do stuff */
} else {
break;
}

but what about

if (foo) {
break;
} else {
/* Do stuff */
}

Or worse, what about

if (foo) {
break;
} else {
continue;
}

Maybe there's something I'm missing here.


> +
> + bool found_prev = false, found_if = false, found_else = false,
> +  found_other = false;
> + foreach_list_typed(bblock_link, parent, link, &block->parents) {
> +if (parent->block == block->prev() && !found_prev)
> +   found_prev = true;
> +else if (parent->block->end()->opcode == BRW_OPCODE_IF)
> +   found_if = true;
> +else if (parent->block->end()->opcode == BRW_OPCODE_ELSE)
> +   found_else = true;
> +else
> +   found_other = true;
> + }
>

How do we detect if a parent is simply missing?  Or are we trusting in the
check above?


> + EXPECT_NE(found_prev, is_unconditional_jump(block->prev()->end()));
> + EXPECT_NE(found_if, found_else);
> + EXPECT_FALSE(found_other);
> + break;
> +  }
> +  case BRW_OPCODE_DO: {
> + /* Has two or more predecessors
> +  *- the previous block is always a predecessor
> +  *- always a predecessor ending in a WHILE
> +  *- some number of predecessors ending in continue
> +  */
> + if (is_unconditional_jump(block->prev()->end())) {
> +EXPECT_GE(block->parents.length(), 1u);
> + }

Re: [Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375

2014-11-06 Thread Tom Stellard
On Thu, Nov 06, 2014 at 11:46:41AM -0500, Jan Vesely wrote:
> Signed-off-by: Jan Vesely 

I've pushed this, thanks!

-Tom

> ---
>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index e953822..3a4fcf0 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -282,7 +282,11 @@ namespace {
>  
>for (unsigned i = 0; i < kernel_node->getNumOperands(); ++i) {
>   kernels.push_back(llvm::dyn_cast(
> +#if HAVE_LLVM >= 0x0306
> +                                       kernel_node->getOperandAsMDNode(i)->getOperand(0)));
> +#else
>                                         kernel_node->getOperand(i)->getOperand(0)));
> +#endif
>}
> }
>  
> -- 
> 1.9.3
> 


Re: [Mesa-dev] [PATCH 1/4] i965/cfg: Add functions to get first and last non-CF instructions.

2014-11-06 Thread Jason Ekstrand
1 and 4 are Reviewed-by: Jason Ekstrand 

On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner  wrote:

> ---
>  src/mesa/drivers/dri/i965/brw_cfg.h | 74
> +
>  1 file changed, 74 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h
> b/src/mesa/drivers/dri/i965/brw_cfg.h
> index e6403bd..c2029cc 100644
> --- a/src/mesa/drivers/dri/i965/brw_cfg.h
> +++ b/src/mesa/drivers/dri/i965/brw_cfg.h
> @@ -71,6 +71,12 @@ struct bblock_t {
> const bblock_t *next() const;
> bblock_t *prev();
> const bblock_t *prev() const;
> +
> +   bool starts_with_control_flow() const;
> +   bool ends_with_control_flow() const;
> +
> +   backend_instruction *first_non_control_flow_inst();
> +   backend_instruction *last_non_control_flow_inst();
>  #endif
>
> struct exec_node link;
> @@ -142,6 +148,50 @@ bblock_prev_const(const struct bblock_t *block)
> return (const struct bblock_t *)block->link.prev;
>  }
>
> +static inline bool
> +bblock_starts_with_control_flow(const struct bblock_t *block)
> +{
> +   enum opcode op = bblock_start_const(block)->opcode;
> +   return op == BRW_OPCODE_DO || op == BRW_OPCODE_ENDIF;
> +}
> +
> +static inline bool
> +bblock_ends_with_control_flow(const struct bblock_t *block)
> +{
> +   enum opcode op = bblock_end_const(block)->opcode;
> +   return op == BRW_OPCODE_IF ||
> +  op == BRW_OPCODE_ELSE ||
> +  op == BRW_OPCODE_WHILE ||
> +  op == BRW_OPCODE_BREAK ||
> +  op == BRW_OPCODE_CONTINUE;
> +}
> +
> +static inline struct backend_instruction *
> +bblock_first_non_control_flow_inst(struct bblock_t *block)
> +{
> +   struct backend_instruction *inst = bblock_start(block);
> +   if (bblock_starts_with_control_flow(block))
> +#ifdef __cplusplus
> +  inst = (struct backend_instruction *)inst->next;
> +#else
> +  inst = (struct backend_instruction *)inst->link.next;
> +#endif
> +   return inst;
> +}
> +
> +static inline struct backend_instruction *
> +bblock_last_non_control_flow_inst(struct bblock_t *block)
> +{
> +   struct backend_instruction *inst = bblock_end(block);
> +   if (bblock_ends_with_control_flow(block))
> +#ifdef __cplusplus
> +  inst = (struct backend_instruction *)inst->prev;
> +#else
> +  inst = (struct backend_instruction *)inst->link.prev;
> +#endif
> +   return inst;
> +}
> +
>  #ifdef __cplusplus
>  inline backend_instruction *
>  bblock_t::start()
> @@ -190,6 +240,30 @@ bblock_t::prev() const
>  {
> return bblock_prev_const(this);
>  }
> +
> +inline bool
> +bblock_t::starts_with_control_flow() const
> +{
> +   return bblock_starts_with_control_flow(this);
> +}
> +
> +inline bool
> +bblock_t::ends_with_control_flow() const
> +{
> +   return bblock_ends_with_control_flow(this);
> +}
> +
> +inline backend_instruction *
> +bblock_t::first_non_control_flow_inst()
> +{
> +   return bblock_first_non_control_flow_inst(this);
> +}
> +
> +inline backend_instruction *
> +bblock_t::last_non_control_flow_inst()
> +{
> +   return bblock_last_non_control_flow_inst(this);
> +}
>  #endif
>
>  struct cfg_t {
> --
> 2.0.4
>


[Mesa-dev] [PATCH 6/6] st/va: MPEG4 call vlVaDecoderFixMPEG4Startcode()

2014-11-06 Thread Leo Liu
From: Michael Varga 

If the VOP and GOV headers were truncated they will be regenerated.

Signed-off-by: Michael Varga 
---
 src/gallium/state_trackers/va/picture.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 7107231..9b94b39 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -503,6 +503,13 @@ handleVASliceDataBufferType(vlVaContext *context, 
vlVaBuffer *buf)
  buffers[num_buffers] = (void *const)&start_code_vc1;
  sizes[num_buffers++] = sizeof(start_code_vc1);
   break;
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  if (bufHasStartcode(buf, 0x01, 24))
+ break;
+
+  vlVaDecoderFixMPEG4Startcode(context);
+  buffers[num_buffers] = (void *)context->mpeg4.start_code;
+  sizes[num_buffers++] = context->mpeg4.start_code_size;
default:
   break;
}
-- 
1.9.1



Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 4:18 AM, Timothy Arceri  wrote:
> Signed-off-by: Timothy Arceri 
> ---
>  configure.ac   | 6 ++
>  src/mesa/x86/common_x86.c  | 4 
>  src/mesa/x86/common_x86_features.h | 4 +++-
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 03f1bca..cc0a661 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -258,6 +258,12 @@ if test "x$SSE41_SUPPORTED" = x1; then
>  fi
>  AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
>
> +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
> +if test "x$SSSE3_SUPPORTED" = x1; then
> +DEFINES="$DEFINES -DUSE_SSSE3"
> +fi
> +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
> +
>  dnl Can't have static and shared libraries, default to static if user
>  dnl explicitly requested. If both disabled, set to static since shared
>  dnl was explicitly requested.
> diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
> index 25f5c40..ad0648a 100644
> --- a/src/mesa/x86/common_x86.c
> +++ b/src/mesa/x86/common_x86.c
> @@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
>
>__get_cpuid(1, &eax, &ebx, &ecx, &edx);
>
> +#ifdef bit_SSSE3

Why did you put this in an #ifdef bit_SSSE3? That's defined everywhere.
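
For reference, a runtime SSSE3 test with the GCC/Clang <cpuid.h> helpers might look like the sketch below. The function name cpu_has_ssse3 is hypothetical, and the architecture guard is an addition of this sketch so that it still compiles on non-x86 targets, not something taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Runtime SSSE3 check via CPUID leaf 1; bit_SSSE3 selects the ECX bit. */
static bool
cpu_has_ssse3(void)
{
#if defined(__i386__) || defined(__x86_64__)
   unsigned int eax, ebx, ecx, edx;

   /* __get_cpuid() returns 0 when the requested leaf is unsupported. */
   if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return false;
   return (ecx & bit_SSSE3) != 0;
#else
   return false;
#endif
}
```

Note that bit_SSSE3 is used here without an #ifdef around it, in line with the comment that the header defines it everywhere.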


[Mesa-dev] [PATCH 2/6] st/va: MPEG4 populate the PPS structure

2014-11-06 Thread Leo Liu
From: Michael Varga 

Signed-off-by: Michael Varga 
---
 src/gallium/state_trackers/va/picture.c| 72 ++
 src/gallium/state_trackers/va/va_private.h |  9 
 2 files changed, 81 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 8775681..a4eb26b 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -80,6 +80,12 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
VAPictureParameterBufferMPEG2 *mpeg2;
VAPictureParameterBufferH264 *h264;
VAPictureParameterBufferVC1 * vc1;
+   VAPictureParameterBufferMPEG4 *mpeg4;
+   vlVaSurface *surf_forward;
+   vlVaSurface *surf_backward;
+   unsigned int i;
+   static const uint8_t default_intra_quant_matrix[64] = { 0 };
+   static const uint8_t default_non_intra_quant_matrix[64] = { 0 };
 
switch (u_reduce_video_profile(context->decoder->profile)) {
case PIPE_VIDEO_FORMAT_MPEG12:
@@ -214,6 +220,72 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
   context->desc.vc1.pquant = 
vc1->pic_quantizer_fields.bits.pic_quantizer_scale;
   break;
 
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  assert(buf->size >= sizeof(VAPictureParameterBufferMPEG4) && 
buf->num_elements == 1);
+  mpeg4 = buf->data;
+
+  context->mpeg4.pps = *mpeg4;
+
+  /* vop_width */
+  /* vop_height */
+  /* forward_reference_picture */
+  /* backward_reference_picture */
+  context->desc.mpeg4.short_video_header =
+mpeg4->vol_fields.bits.short_video_header;
+  /* chroma_format */
+  context->desc.mpeg4.interlaced = mpeg4->vol_fields.bits.interlaced;
+  /* obmc_disable */
+  /* sprite_enable */
+  /* sprite_warping_accuracy */
+  context->desc.mpeg4.quant_type = mpeg4->vol_fields.bits.quant_type;
+  context->desc.mpeg4.quarter_sample = 
mpeg4->vol_fields.bits.quarter_sample;
+  /* data_partitioned */
+  /* reversible_vlc */
+  context->desc.mpeg4.resync_marker_disable =
+mpeg4->vol_fields.bits.resync_marker_disable;
+  /* no_of_sprite_warping_points */
+  /* sprite_trajectory_du */
+  /* sprite_trajectory_dv */
+  /* quant_precision */
+  context->desc.mpeg4.vop_coding_type = 
mpeg4->vop_fields.bits.vop_coding_type;
+  /* backward_reference_vop_coding_type */
+  /* vop_rounding_type */
+  /* intra_dc_vlc_thr */
+  context->desc.mpeg4.top_field_first =
+mpeg4->vop_fields.bits.top_field_first;
+  context->desc.mpeg4.alternate_vertical_scan_flag =
+mpeg4->vop_fields.bits.alternate_vertical_scan_flag;
+  context->desc.mpeg4.vop_fcode_forward = mpeg4->vop_fcode_forward;
+  context->desc.mpeg4.vop_fcode_backward = mpeg4->vop_fcode_backward;
+  context->desc.mpeg4.vop_time_increment_resolution =
+mpeg4->vop_time_increment_resolution;
+  /* num_gobs_in_vop */
+  /* num_macroblocks_in_gob */
+  context->desc.mpeg4.trb[0] = mpeg4->TRB;
+  context->desc.mpeg4.trb[1] = mpeg4->TRB;
+  context->desc.mpeg4.trd[0] = mpeg4->TRD;
+  context->desc.mpeg4.trd[1] = mpeg4->TRD;
+
+  /* default [non-]intra quant matrix because mpv does not set these
+ matrices */
+  if (!context->desc.mpeg4.intra_matrix)
+ context->desc.mpeg4.intra_matrix = default_intra_quant_matrix;
+  if (!context->desc.mpeg4.non_intra_matrix)
+ context->desc.mpeg4.non_intra_matrix = default_non_intra_quant_matrix;
+
+  surf_forward = handle_table_get(drv->htab, 
mpeg4->forward_reference_picture);
+  if (surf_forward)
+ context->desc.mpeg4.ref[0] = surf_forward->buffer;
+  surf_backward = handle_table_get(drv->htab, 
mpeg4->backward_reference_picture);
+  if (surf_backward)
+ context->desc.mpeg4.ref[1] = surf_backward->buffer;
+
+  context->mpeg4.vti_bits = 0;
+  for (i = context->desc.mpeg4.vop_time_increment_resolution; i > 0; i /= 
2)
+ ++context->mpeg4.vti_bits;
+
+  break;
+
default:
   break;
}
diff --git a/src/gallium/state_trackers/va/va_private.h 
b/src/gallium/state_trackers/va/va_private.h
index 060a1fa..7d2fc24 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -162,6 +162,15 @@ typedef struct {
   struct pipe_vc1_picture_desc vc1;
   struct pipe_h264_picture_desc h264;
} desc;
+
+   struct {
+  unsigned long long int frame_num;
+  unsigned int start_code_size;
+  unsigned int vti_bits;
+  unsigned int quant_scale;
+  VAPictureParameterBufferMPEG4 pps;
+  uint8_t start_code[32];
+   } mpeg4;
 } vlVaContext;
 
 typedef struct {
-- 
1.9.1
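
Editorial sketch, not part of the patch: the loop that fills vti_bits above counts how many times vop_time_increment_resolution can be halved before it reaches zero, which is the number of bits needed to store that value. The helper name below is hypothetical and only isolates that loop so it can be tried on its own.

```c
#include <assert.h>

/* Hypothetical helper mirroring the vti_bits loop in the patch:
 * one iteration per significant bit of `resolution`. */
static unsigned int
count_vti_bits(unsigned int resolution)
{
   unsigned int bits = 0;
   unsigned int i;

   for (i = resolution; i > 0; i /= 2)
      ++bits;

   return bits;
}
```

For example, a resolution of 25 is halved through 25, 12, 6, 3, 1, so count_vti_bits(25) returns 5.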



Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri  wrote:
> Also cleans up some if statements in the *faster functions.
>
> Callgrind cpu usage results from pts benchmarks:
>
> For ytile_copy_faster()
>
> Nexuiz 1.6.1: 2.16% -> 1.20%
>
> Signed-off-by: Timothy Arceri 
> ---
>  src/mesa/Makefile.am   |  8 +++
>  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 
> ++
>  src/mesa/main/fast_rgba8_copy.c| 78 
>  src/mesa/main/fast_rgba8_copy.h| 37 
>  4 files changed, 141 insertions(+), 64 deletions(-)
>  create mode 100644 src/mesa/main/fast_rgba8_copy.c
>  create mode 100644 src/mesa/main/fast_rgba8_copy.h
>
> diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
> index e71bccb..2402096 100644
> --- a/src/mesa/Makefile.am
> +++ b/src/mesa/Makefile.am
> @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
>
>  ARCH_LIBS =
>
> +if SSSE3_SUPPORTED
> +ARCH_LIBS += libmesa_ssse3.la
> +endif
> +
>  if SSE41_SUPPORTED
>  ARCH_LIBS += libmesa_sse41.la
>  endif
> @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
> main/streaming-load-memcpy.c
>  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
>
> +libmesa_ssse3_la_SOURCES = \
> +   main/fast_rgba8_copy.c
> +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
> +
>  pkgconfigdir = $(libdir)/pkgconfig
>  pkgconfig_DATA = gl.pc
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
> b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> index cb5738a..0deeb75 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> @@ -27,6 +27,7 @@
>   **/
>
>  #include "main/bufferobj.h"
> +#include "main/fast_rgba8_copy.h"
>  #include "main/image.h"
>  #include "main/macros.h"
>  #include "main/mtypes.h"
> @@ -42,9 +43,7 @@
>  #include "intel_mipmap_tree.h"
>  #include "intel_blit.h"
>
> -#ifdef __SSSE3__
> -#include <tmmintrin.h>
> -#endif
> +#include "x86/common_x86_asm.h"
>
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>
> @@ -175,18 +174,6 @@ err:
> return false;
>  }
>
> -#ifdef __SSSE3__
> -static const uint8_t rgba8_permutation[16] =
> -   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
> -
> -/* NOTE: dst must be 16 byte aligned */
> -#define rgba8_copy_16(dst, src) \
> -   *(__m128i *)(dst) = _mm_shuffle_epi8(\
> -  (__m128i) _mm_loadu_ps((float *)(src)),   \
> -  *(__m128i *) rgba8_permutation\
> -   )
> -#endif
> -
>  /**
>   * Copy RGBA to BGRA - swap R and B.
>   */
> @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
> uint8_t *d = dst;
> uint8_t const *s = src;
>
> -#ifdef __SSSE3__
> -   /* Fast copying for tile spans.
> -*
> -* As long as the destination texture is 16 aligned,
> -* any 16 or 64 spans we get here should also be 16 aligned.
> -*/
> -
> -   if (bytes == 16) {
> -  assert(!(((uintptr_t)dst) & 0xf));
> -  rgba8_copy_16(d+ 0, s+ 0);
> -  return dst;
> -   }
> -
> -   if (bytes == 64) {
> -  assert(!(((uintptr_t)dst) & 0xf));
> -  rgba8_copy_16(d+ 0, s+ 0);
> -  rgba8_copy_16(d+16, s+16);
> -  rgba8_copy_16(d+32, s+32);
> -  rgba8_copy_16(d+48, s+48);
> -  return dst;
> -   }
> -#endif
> -
> while (bytes >= 4) {
>d[0] = s[2];
>d[1] = s[1];
> @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
> uint32_t x3,
>mem_copy_fn mem_copy)
>  {
> if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
> -  if (mem_copy == memcpy)
> - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> -   dst, src, src_pitch, swizzle_bit, memcpy);
> -  else if (mem_copy == rgba8_copy)
> - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
> -   } else {
> -  if (mem_copy == memcpy)
> - return xtile_copy(x0, x1, x2, x3, y0, y1,
> -   dst, src, src_pitch, swizzle_bit, memcpy);
> -  else if (mem_copy == rgba8_copy)
> - return xtile_copy(x0, x1, x2, x3, y0, y1,
> -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
> +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> +dst, src, src_pitch, swizzle_bit, mem_copy);
> }
> xtile_copy(x0, x1, x2, x3, y0, y1,
>dst, src, src_pitch, swizzle_bit, mem_copy);
> @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
> uint32_t x3,
>mem_copy_fn mem_copy)
>  {
> if (x0 == 0 && x3 == ytile_width && y0 == 0 && y1 == ytile_height) {
> -  if (mem_copy == memcpy)
> - return ytile_copy(0, 0, ytile_width, ytile_width

Re: [Mesa-dev] [PATCH 2/4] i965/fs: Wire up control flow correctly in predicated break pass.

2014-11-06 Thread Jason Ekstrand
On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner  wrote:

> When the earlier block ended with control flow, we'd mistakenly remove
> some of its links to its children. The same happened with the later
> block.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp | 10
> +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git
> a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
> index b7a1d7e..047c2c0 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
> @@ -107,10 +107,14 @@ fs_visitor::opt_peephole_predicated_break()
>}
>endif_inst->remove(endif_block);
>
> -  earlier_block->children.make_empty();
> -  later_block->parents.make_empty();
> +  if (!earlier_block->ends_with_control_flow()) {
> + earlier_block->children.make_empty();
> + earlier_block->add_successor(cfg->mem_ctx, jump_block);
> +  }
>
> -  earlier_block->add_successor(cfg->mem_ctx, jump_block);
> +  if (!later_block->starts_with_control_flow()) {
> + later_block->parents.make_empty();
> +  }
>

I *think* this is correct.  I didn't really understand it before and I
don't 100% now.

Acked-by: Jason Ekstrand 


>jump_block->add_successor(cfg->mem_ctx, later_block);
>
>if (earlier_block->can_combine_with(jump_block)) {
> --
> 2.0.4
>


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #50 from Jason Ekstrand  ---
(In reply to Iago Toral from comment #49)
> Jason you had this commit in your original branch: "MAYBEREVERT: Fill X
> components with 1"
> 
> That basically makes packing to padded formats (like RGBX) set X=1. In the
> commit message you mention that you are not sure you like this... my opinion
> is that this should not be necessary and it degrades performance when
> packing to these format, so I would be more for removing this one. What do
> you think?
> 
> FWIW, I have checked that removing this commit does not affect piglit
> results.

Go ahead and remove it.



[Mesa-dev] [PATCH 1/6] st/va: refactored handleVASliceDataBufferType

2014-11-06 Thread Leo Liu
From: Michael Varga 

This patch cleans up the function handleVASliceDataBufferType() for better
readability.

Signed-off-by: Michael Varga 
---
 src/gallium/state_trackers/va/picture.c | 75 ++---
 1 file changed, 40 insertions(+), 35 deletions(-)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 57d1fb1..8775681 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -272,51 +272,56 @@ handleSliceParameterBuffer(vlVaContext *context, 
vlVaBuffer *buf)
}
 }
 
+static unsigned int
+bufHasStartcode(vlVaBuffer *buf, unsigned int code, unsigned int bits)
+{
+   struct vl_vlc vlc = {0};
+   int i;
+
+   /* search the first 64 bytes for a startcode */
+   vl_vlc_init(&vlc, 1, (const void * const*)&buf->data, &buf->size);
+   for (i = 0; i < 64 && vl_vlc_bits_left(&vlc) >= bits; ++i) {
+  if (vl_vlc_peekbits(&vlc, bits) == code)
+ return 1;
+  vl_vlc_eatbits(&vlc, 8);
+  vl_vlc_fillbits(&vlc);
+   }
+
+   return 0;
+}
+
 static void
 handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf)
 {
+   enum pipe_video_format format;
unsigned num_buffers = 0;
void * const *buffers[2];
unsigned sizes[2];
-   enum pipe_video_format format;
+   static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 };
+   static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d };
 
format = u_reduce_video_profile(context->decoder->profile);
-   if (format == PIPE_VIDEO_FORMAT_MPEG4_AVC ||
-   format == PIPE_VIDEO_FORMAT_VC1) {
-  struct vl_vlc vlc = {0};
-  bool found = false;
-  int peek_bits, i;
-
-  /* search the first 64 bytes for a startcode */
-  vl_vlc_init(&vlc, 1, (const void * const*)&buf->data, &buf->size);
-  peek_bits = (format == PIPE_VIDEO_FORMAT_MPEG4_AVC) ? 24 : 32;
-  for (i = 0; i < 64 && vl_vlc_bits_left(&vlc) >= peek_bits; ++i) {
- uint32_t value = vl_vlc_peekbits(&vlc, peek_bits);
- if ((format == PIPE_VIDEO_FORMAT_MPEG4_AVC && value == 0x01) ||
-(format == PIPE_VIDEO_FORMAT_VC1 && (value == 0x010d ||
-value == 0x010c || value == 0x010b))) {
-found = true;
+   switch (format) {
+   case PIPE_VIDEO_FORMAT_MPEG4_AVC:
+ if (bufHasStartcode(buf, 0x01, 24))
 break;
- }
- vl_vlc_eatbits(&vlc, 8);
- vl_vlc_fillbits(&vlc);
-  }
-  /* none found, ok add one manually */
-  if (!found) {
- static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 };
- static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d };
-
- if (format == PIPE_VIDEO_FORMAT_MPEG4_AVC) {
-buffers[num_buffers] = (void *const)&start_code_h264;
-sizes[num_buffers] = sizeof(start_code_h264);
- }
- else {
-buffers[num_buffers] = (void *const)&start_code_vc1;
-sizes[num_buffers] = sizeof(start_code_vc1);
- }
- ++num_buffers;
-  }
+
+ buffers[num_buffers] = (void *const)&start_code_h264;
+ sizes[num_buffers++] = sizeof(start_code_h264);
+  break;
+   case PIPE_VIDEO_FORMAT_VC1:
+  if (bufHasStartcode(buf, 0x010d, 32) ||
+  bufHasStartcode(buf, 0x010c, 32) ||
+  bufHasStartcode(buf, 0x010b, 32))
+ break;
+
+ buffers[num_buffers] = (void *const)&start_code_vc1;
+ sizes[num_buffers++] = sizeof(start_code_vc1);
+  break;
+   default:
+  break;
}
+
buffers[num_buffers] = buf->data;
sizes[num_buffers] = buf->size;
++num_buffers;
-- 
1.9.1
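
Editorial sketch, not part of the patch: bufHasStartcode() above looks at each of the first 64 byte offsets and compares the next `bits` bits against `code`. The version below does the same walk over a plain byte array instead of a struct vl_vlc. The function name, the byte-array interface and the sample buffers are assumptions made for illustration, and it only handles widths that are a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-oriented stand-in for the vl_vlc based search.
 * Assumes `bits` is 8, 16, 24 or 32. Returns 1 if `code` is found at
 * one of the first 64 byte offsets, 0 otherwise. */
static int
has_startcode(const uint8_t *data, size_t size, uint32_t code, unsigned bits)
{
   size_t nbytes = bits / 8;
   size_t i, j;

   for (i = 0; i < 64 && i + nbytes <= size; ++i) {
      uint32_t value = 0;

      /* Assemble the next nbytes bytes, most significant byte first. */
      for (j = 0; j < nbytes; ++j)
         value = (value << 8) | data[i + j];

      if (value == code)
         return 1;
   }

   return 0;
}

/* Illustrative buffers only, not real bitstream captures. */
static const uint8_t sample_h264[] = { 0x00, 0x00, 0x01, 0x65, 0x88 };
static const uint8_t sample_vc1[]  = { 0x00, 0x00, 0x01, 0x0d, 0x42 };
static const uint8_t sample_bare[] = { 0x65, 0x88, 0x84, 0x00 };
```

With these buffers, the 24-bit code 0x01 is found in sample_h264 but not in sample_bare, which is the case where the caller would prepend a start code itself.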



[Mesa-dev] [PATCH 5/6] st/va: MPEG4 generate GOV and VOP header

2014-11-06 Thread Leo Liu
From: Michael Varga 

Also implements a small, locally used interface for writing bits to a buffer.

Signed-off-by: Michael Varga 
---
 src/gallium/state_trackers/va/picture.c | 92 +
 1 file changed, 92 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index ff13bc6..7107231 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -366,6 +366,97 @@ handleSliceParameterBuffer(vlVaContext *context, 
vlVaBuffer *buf)
}
 }
 
+struct bit_stream
+{
+   uint8_t *data;
+   unsigned int length; /* bits */
+   unsigned int pos;/* bits */
+};
+
+static inline void
+write_bit(struct bit_stream *writer, unsigned int bit)
+{
+   assert(writer->length > (writer)->pos);
+   writer->data[writer->pos>>3] |= ((bit & 1)<<(7 - (writer->pos & 7)));
+   writer->pos++;
+}
+
+static inline void
+write_bits(struct bit_stream *writer, unsigned int bits, unsigned int len)
+{
+   int i;
+   assert(len <= sizeof(bits)*8);
+   for (i = len - 1; i >= 0; i--)
+  write_bit(writer, bits>>i);
+}
+
+static void
+vlVaDecoderFixMPEG4Startcode(vlVaContext *context)
+{
+   uint8_t vop[] = { 0x00, 0x00, 0x01, 0xb6, 0x00, 0x00, 0x00, 0x00, 0x00 };
+   struct bit_stream bs_vop = {vop, sizeof(vop)*8, 32};
+   unsigned int vop_time_inc;
+   int mod_time;
+   unsigned int vop_size;
+   unsigned int vop_coding_type = context->desc.mpeg4.vop_coding_type;
+
+   context->mpeg4.start_code_size = 0;
+   memset(context->mpeg4.start_code, 0, sizeof(context->mpeg4.start_code));
+   if (vop_coding_type+1 == PIPE_MPEG12_PICTURE_CODING_TYPE_I) {
+  unsigned int vop_time = context->mpeg4.frame_num/
+context->desc.mpeg4.vop_time_increment_resolution;
+  unsigned int vop_hour = vop_time / 3600;
+  unsigned int vop_minute = (vop_time / 60) % 60;
+  unsigned int vop_second = vop_time % 60;
+  uint8_t group_of_vop[] = { 0x00, 0x00, 0x01, 0xb3, 0x00, 0x00, 0x00 };
+  struct bit_stream bs_gvop = {group_of_vop, sizeof(group_of_vop)*8, 32};
+
+  write_bits(&bs_gvop, vop_hour, 5);
+  write_bits(&bs_gvop, vop_minute, 6);
+  write_bit(&bs_gvop, 1); /* marker_bit */
+  write_bits(&bs_gvop, vop_second, 6);
+  write_bit(&bs_gvop, 0); /* closed_gov */ /* TODO replace magic */
+  write_bit(&bs_gvop, 0); /* broken_link */
+  write_bit(&bs_gvop, 0); /* padding */
+  write_bits(&bs_gvop, 7, 3); /* padding */
+
+  memcpy(context->mpeg4.start_code, group_of_vop, sizeof(group_of_vop));
+  context->mpeg4.start_code_size += sizeof(group_of_vop);
+   }
+
+   write_bits(&bs_vop, vop_coding_type, 2);
+   mod_time = context->mpeg4.frame_num %
+ context->desc.mpeg4.vop_time_increment_resolution == 0 &&
+ vop_coding_type+1 != PIPE_MPEG12_PICTURE_CODING_TYPE_I;
+   while (mod_time--)
+  write_bit(&bs_vop, 1); /* modulo_time_base */
+   write_bit(&bs_vop, 0); /* modulo_time_base */
+
+   write_bit(&bs_vop, 1); /* marker_bit */
+   vop_time_inc = context->mpeg4.frame_num %
+ context->desc.mpeg4.vop_time_increment_resolution;
+   write_bits(&bs_vop, vop_time_inc, context->mpeg4.vti_bits);
+   write_bit(&bs_vop, 1); /* marker_bit */
+   write_bit(&bs_vop, 1); /* vop_coded */
+   if (vop_coding_type+1 == PIPE_MPEG12_PICTURE_CODING_TYPE_P)
+  write_bit(&bs_vop, context->mpeg4.pps.vop_fields.bits.vop_rounding_type);
+   write_bits(&bs_vop, context->mpeg4.pps.vop_fields.bits.intra_dc_vlc_thr, 3);
+   if (context->mpeg4.pps.vol_fields.bits.interlaced) {
+  write_bit(&bs_vop, context->mpeg4.pps.vop_fields.bits.top_field_first);
+  write_bit(&bs_vop, 
context->mpeg4.pps.vop_fields.bits.alternate_vertical_scan_flag);
+   }
+
+   write_bits(&bs_vop, context->mpeg4.quant_scale, 
context->mpeg4.pps.quant_precision);
+   if (vop_coding_type+1 != PIPE_MPEG12_PICTURE_CODING_TYPE_I)
+  write_bits(&bs_vop, context->desc.mpeg4.vop_fcode_forward, 3);
+   if (vop_coding_type+1 == PIPE_MPEG12_PICTURE_CODING_TYPE_B)
+  write_bits(&bs_vop, context->desc.mpeg4.vop_fcode_backward, 3);
+
+   vop_size = bs_vop.pos/8;
+   memcpy(context->mpeg4.start_code + context->mpeg4.start_code_size, vop, 
vop_size);
+   context->mpeg4.start_code_size += vop_size;
+}
+
 static unsigned int
 bufHasStartcode(vlVaBuffer *buf, unsigned int code, unsigned int bits)
 {
@@ -489,6 +580,7 @@ vlVaEndPicture(VADriverContextP ctx, VAContextID context_id)
if (!context)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
+   context->mpeg4.frame_num++;
context->decoder->end_frame(context->decoder, context->target, 
&context->desc.base);
 
return VA_STATUS_SUCCESS;
-- 
1.9.1
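
Editorial sketch, not part of the patch: the bit-writing interface above fills a zeroed byte array from the most significant bit of each byte downwards. The standalone version below uses renamed, hypothetical identifiers (bit_writer, put_bit, put_bits, pack_gov_fields) and packs the same 24 bits that the patch writes after the group-of-VOP start code, returning them as one integer so the layout is easy to inspect.

```c
#include <assert.h>
#include <stdint.h>

struct bit_writer {
   uint8_t *data;
   unsigned int length; /* capacity in bits */
   unsigned int pos;    /* index of the next bit to write */
};

/* Sets one bit, most significant bit of each byte first. The target
 * bytes must start out zeroed because bits are only ever OR-ed in. */
static void
put_bit(struct bit_writer *w, unsigned int bit)
{
   assert(w->pos < w->length);
   w->data[w->pos >> 3] |= (uint8_t)((bit & 1u) << (7 - (w->pos & 7)));
   w->pos++;
}

/* Writes the low `len` bits of `bits`, highest of those bits first. */
static void
put_bits(struct bit_writer *w, unsigned int bits, unsigned int len)
{
   int i;

   assert(len <= sizeof(bits) * 8);
   for (i = (int)len - 1; i >= 0; i--)
      put_bit(w, bits >> i);
}

/* Hypothetical wrapper: writes the fields in the same order as the
 * patch does after the GOV start code and returns the three bytes. */
static uint32_t
pack_gov_fields(unsigned int hour, unsigned int minute, unsigned int second)
{
   uint8_t buf[3] = { 0, 0, 0 };
   struct bit_writer w = { buf, (unsigned int)(sizeof(buf) * 8), 0 };

   put_bits(&w, hour, 5);
   put_bits(&w, minute, 6);
   put_bit(&w, 1);          /* marker_bit */
   put_bits(&w, second, 6);
   put_bit(&w, 0);          /* closed_gov */
   put_bit(&w, 0);          /* broken_link */
   put_bit(&w, 0);          /* padding, as in the patch */
   put_bits(&w, 7, 3);      /* padding, as in the patch */

   return ((uint32_t)buf[0] << 16) | ((uint32_t)buf[1] << 8) | buf[2];
}
```

For a time code of 1 hour, 2 minutes and 3 seconds, pack_gov_fields(1, 2, 3) yields the bytes 0x08, 0x50, 0xC7.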



[Mesa-dev] [PATCH 4/6] st/va: MPEG4 populate the SPS structure

2014-11-06 Thread Leo Liu
From: Michael Varga 

Signed-off-by: Michael Varga 
---
 src/gallium/state_trackers/va/picture.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 327c33d..ff13bc6 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -344,6 +344,7 @@ static void
 handleSliceParameterBuffer(vlVaContext *context, vlVaBuffer *buf)
 {
VASliceParameterBufferH264 *h264;
+   VASliceParameterBufferMPEG4 *mpeg4;
 
switch (u_reduce_video_profile(context->decoder->profile)) {
case PIPE_VIDEO_FORMAT_MPEG4_AVC:
@@ -354,7 +355,12 @@ handleSliceParameterBuffer(vlVaContext *context, 
vlVaBuffer *buf)
   context->desc.h264.num_ref_idx_l1_active_minus1 =
  h264->num_ref_idx_l1_active_minus1;
   break;
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  assert(buf->size >= sizeof(VASliceParameterBufferMPEG4) && 
buf->num_elements == 1);
+  mpeg4 = buf->data;
 
+  context->mpeg4.quant_scale = mpeg4->quant_scale;
+  break;
default:
   break;
}
-- 
1.9.1



[Mesa-dev] [PATCH 3/6] st/va: MPEG4 populate the iq matrix buffers

2014-11-06 Thread Leo Liu
From: Michael Varga 

Signed-off-by: Michael Varga 
---
 src/gallium/state_trackers/va/picture.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index a4eb26b..327c33d 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -296,6 +296,7 @@ handleIQMatrixBuffer(vlVaContext *context, vlVaBuffer *buf)
 {
VAIQMatrixBufferMPEG2 *mpeg2;
VAIQMatrixBufferH264 *h264;
+   VAIQMatrixBufferMPEG4 *mpeg4;
 
switch (u_reduce_video_profile(context->decoder->profile)) {
case PIPE_VIDEO_FORMAT_MPEG12:
@@ -319,6 +320,21 @@ handleIQMatrixBuffer(vlVaContext *context, vlVaBuffer *buf)
   memcpy(&context->desc.h264.pps->ScalingList8x8, h264->ScalingList8x8, 2 
* 64);
   break;
 
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  assert(buf->size >= sizeof(VAIQMatrixBufferMPEG4) && buf->num_elements 
== 1);
+  mpeg4 = buf->data;
+
+  if (mpeg4->load_intra_quant_mat)
+ context->desc.mpeg4.intra_matrix = mpeg4->intra_quant_mat;
+  else
+ context->desc.mpeg4.intra_matrix = NULL;
+
+  if (mpeg4->load_non_intra_quant_mat)
+ context->desc.mpeg4.non_intra_matrix = mpeg4->non_intra_quant_mat;
+  else
+ context->desc.mpeg4.non_intra_matrix = NULL;
+  break;
+
default:
   break;
}
-- 
1.9.1



Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

2014-11-06 Thread Marek Olšák
If there's no feedback for so long, I suggest you commit the patch
without review. If it fixes a bug, that's one more reason to commit
it.

Marek

On Thu, Nov 6, 2014 at 2:12 PM, Emil Velikov  wrote:
> Humble ping x2
>
> On 14/10/14 15:25, Emil Velikov wrote:
>> Humble ping.
>>
>> On 23/09/14 01:25, Emil Velikov wrote:
>>> From: Sjoerd Simons 
>>>
>>> When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that
>>> applications can decide whether they want to use RGBA (and have the
>>> compositor blend their windows).
>>>
>>> On my system with this change EGLConfigs with a 24 bit visual come up
>>> first, as such applications blindly picking the first EGLConfig will
>>> still get an RGB X visual.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
>>> ---
>>>
>>> Hello gents,
>>>
>>> This patch has been stuck in bugzilla since February this year. Bringing
>>> it around here to gather greater exposure and perhaps some
>>> comments/reviews.
>>>
>>> -Emil
>>>
>>>  src/egl/drivers/dri2/egl_dri2.c |  5 +
>>>  src/egl/drivers/dri2/platform_x11.c | 17 +
>>>  2 files changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/src/egl/drivers/dri2/egl_dri2.c 
>>> b/src/egl/drivers/dri2/egl_dri2.c
>>> index 20a7243..2ed90a7 100644
>>> --- a/src/egl/drivers/dri2/egl_dri2.c
>>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>>> @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
>>>  static EGLBoolean
>>>  dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
>>>  {
>>> +
>>> +   if (criteria->NativeVisualID != EGL_DONT_CARE &&
>>> +conf->NativeVisualID != criteria->NativeVisualID)
>>> +  return EGL_FALSE;
>>> +
>>> if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
>>>return EGL_FALSE;
>>>
>>> diff --git a/src/egl/drivers/dri2/platform_x11.c 
>>> b/src/egl/drivers/dri2/platform_x11.c
>>> index a7a7338..3395fb7 100644
>>> --- a/src/egl/drivers/dri2/platform_x11.c
>>> +++ b/src/egl/drivers/dri2/platform_x11.c
>>> @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
>>> dri2_egl_display *dri2_dpy,
>>>  dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
>>>  surface_type, config_attrs, rgba_masks);
>>>
>>> -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
>>> - * Otherwise it will only match a 32-bit RGBA visual.  On a
>>> - * composited window manager on X11, this will make all of the
>>> - * EGLConfigs with destination alpha get blended by the
>>> - * compositor.  This is probably not what the application
>>> - * wants... especially on drivers that only have 32-bit RGBA
>>> - * EGLConfigs! */
>>> -if (d.data->depth == 24) {
>>> +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 
>>> 32-bit
>>> + * RGBA EGLConfig.  Otherwise it will only match a 32-bit RGBA
>>> + * visual.  On a composited window manager on X11, this will 
>>> make
>>> + * all of the EGLConfigs with destination alpha get blended by 
>>> the
>>> + * compositor.  This is probably not what the application 
>>> wants...
>>> + * especially on drivers that only have 32-bit RGBA EGLConfigs!
>>> + * Allowing both allows applications to make the decision 
>>> whether
>>> + * 32 bit visuals are intended */
>>> +if (d.data->depth == 24 || d.data->depth == 32) {
>>> rgba_masks[3] =
>>>~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
>>> dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
>>>
>>
>
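
Editorial sketch, not part of the quoted patch: the hunk above derives the alpha mask as whatever bits the red, green and blue masks leave unused. The helper name below is hypothetical, and the masks in the usage note are only the common layout for a 32-bit pixel, not values read from an X server.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the alpha channel gets every bit of the 32-bit
 * pixel that the colour masks do not claim. */
static uint32_t
alpha_mask_from_rgb(uint32_t red, uint32_t green, uint32_t blue)
{
   return ~(red | green | blue);
}
```

With red 0x00ff0000, green 0x0000ff00 and blue 0x000000ff the result is 0xff000000, for a depth 24 and a depth 32 visual alike, which is why both can be matched against the same RGBA EGLConfig.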


Re: [Mesa-dev] [PATCH 3/4] i965: Add code to verify the CFG is sane.

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 9:41 AM, Jason Ekstrand  wrote:
> In general, it seems as if this can miss several things.  For instance, it
> checks that all the predecessors are valid but never that we have all the
> predecessors.

I'm not sure what you mean. That we don't validate that if A -> B then
B has an incoming edge from A? That's true.

I think that would be a good addition. I'm not sure I want to do that
in this patch. It was pretty big as is.
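
Editorial sketch of the check discussed here, namely that if A lists B as a successor then B must list A as a predecessor, and vice versa. It uses a hypothetical array-based block structure and hand-built sample graphs, not the real bblock_t and exec_list types.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EDGES 4

/* Hypothetical, simplified basic block: edges are indices into the
 * array of blocks. */
struct block {
   int succ[MAX_EDGES];
   int num_succ;
   int pred[MAX_EDGES];
   int num_pred;
};

static bool
has_edge(const int *edges, int count, int target)
{
   int i;

   for (i = 0; i < count; i++) {
      if (edges[i] == target)
         return true;
   }
   return false;
}

/* Returns true only if every successor edge has a matching predecessor
 * edge on the other block, and every predecessor edge a matching
 * successor edge. */
static bool
cfg_edges_symmetric(const struct block *blocks, int num_blocks)
{
   int a, i;

   for (a = 0; a < num_blocks; a++) {
      for (i = 0; i < blocks[a].num_succ; i++) {
         const struct block *child = &blocks[blocks[a].succ[i]];
         if (!has_edge(child->pred, child->num_pred, a))
            return false;
      }
      for (i = 0; i < blocks[a].num_pred; i++) {
         const struct block *parent = &blocks[blocks[a].pred[i]];
         if (!has_edge(parent->succ, parent->num_succ, a))
            return false;
      }
   }
   return true;
}

/* Edges 0 -> 1, 0 -> 2 and 1 -> 2, recorded on both ends. */
static const struct block good_cfg[3] = {
   { .succ = { 1, 2 }, .num_succ = 2, .num_pred = 0 },
   { .succ = { 2 }, .num_succ = 1, .pred = { 0 }, .num_pred = 1 },
   { .num_succ = 0, .pred = { 0, 1 }, .num_pred = 2 },
};

/* Same graph, but block 2 has lost its incoming edge from block 0. */
static const struct block bad_cfg[3] = {
   { .succ = { 1, 2 }, .num_succ = 2, .num_pred = 0 },
   { .succ = { 2 }, .num_succ = 1, .pred = { 0 }, .num_pred = 1 },
   { .num_succ = 0, .pred = { 1 }, .num_pred = 1 },
};
```

cfg_edges_symmetric() accepts good_cfg and rejects bad_cfg. It does not check that the edges make sense for the instructions in each block, which is what the rest of the proposed verify_cfg() is concerned with.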

>Same for successors.  If we really want to be able to
> validate a CFG, maybe a stack-based approach like calculate_cfg would work
> better?  Also, did you run this on piglit/shader-db to ensure that
> everything coming out of calculate_cfg actually passes?

I don't think it will pass because of the things you pointed out about
unconditional jumps.

> More comments inline
>
> On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner  wrote:
>>
>> ---
>>  src/mesa/drivers/dri/i965/test_verify_cfg.cpp | 273
>> ++
>>  src/mesa/drivers/dri/i965/test_verify_cfg.h   |  26 +++
>>  2 files changed, 299 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.cpp
>>  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.h
>>
>> diff --git a/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
>> b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
>> new file mode 100644
>> index 000..0aa74c5
>> --- /dev/null
>> +++ b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
>> @@ -0,0 +1,273 @@
>> +/*
>> + * Copyright © 2014 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining
>> a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the
>> next
>> + * paragraph) shall be included in all copies or substantial portions of
>> the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include <gtest/gtest.h>
>> +#include "test_verify_cfg.h"
>> +#include "brw_cfg.h"
>> +
>> +static bool
>> +is_unconditional_jump(const backend_instruction *inst)
>> +{
>> +   return (inst->opcode == BRW_OPCODE_BREAK ||
>> +   inst->opcode == BRW_OPCODE_CONTINUE ||
>> +   inst->opcode == BRW_OPCODE_WHILE) &&
>> +  inst->predicate == BRW_PREDICATE_NONE;
>> +}
>> +
>> +void
>> +verify_cfg(backend_visitor *v)
>> +{
>> +   foreach_block(block, v->cfg) {
>> +  switch (block->start()->opcode) {
>> +  case BRW_OPCODE_ENDIF: {
>> + /* Has two predecessors:
>> +  *- the previous block is always a predecessor
>> +  *- always a predecessor ending in an IF or an ELSE
>> +  *
>> +  * Note that if the body of the if block is empty, then the
>> +  * previous block *is* the block that ends with IF, so the ENDIF
>> +  * block will have the same predecessor twice.
>> +  */
>> + if (is_unconditional_jump(block->prev()->end())) {
>> +EXPECT_EQ(block->parents.length(), 1u);
>> + } else {
>> +EXPECT_EQ(block->parents.length(), 2u);
>> + }
>
>
> This obviously catches
>
> if (foo) {
> /* Do stuff */
> } else {
> break;
> }
>
> but what about
>
> if (foo) {
> break;
> } else {
> /* Do stuff */
> }
>
> Or worse, what about
>
> if (foo) {
> break;
> } else {
> continue;
> }
>
> Maybe there's something I'm missing here.

I think you're right. For purposes of testing the predicated break
pass I didn't really care about unconditional jumps (since they would
be removed by the pass!). I added the unconditional jump cases as an
afterthought.

So the question is whether I can actually test the cases you mention
in the current model. I'm not super excited about expending a bunch more
effort to rewrite some testing code that already tests the thing I
want.

>>
>> +
>> + bool found_prev = false, found_if = false, found_else = false,
>> +  found_other = false;
>> + foreach_list_typed(bblock_link, parent, link, &block->parents) {
>> +if (parent->block == block->prev() && !found_prev)
>> +   found_prev = true;
>> +else i

Re: [Mesa-dev] [PATCH 2/4] i965/fs: Wire up control flow correctly in predicated break pass.

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 10:07 AM, Jason Ekstrand  wrote:
>
>
> On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner  wrote:
>>
>> When the earlier block ended with control flow, we'd mistakenly remove
>> some of its links to its children. The same happened with the later
>> block.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp | 10
>> +++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git
>> a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> index b7a1d7e..047c2c0 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
>> @@ -107,10 +107,14 @@ fs_visitor::opt_peephole_predicated_break()
>>}
>>endif_inst->remove(endif_block);
>>
>> -  earlier_block->children.make_empty();
>> -  later_block->parents.make_empty();
>> +  if (!earlier_block->ends_with_control_flow()) {
>> + earlier_block->children.make_empty();
>> + earlier_block->add_successor(cfg->mem_ctx, jump_block);
>> +  }
>>
>> -  earlier_block->add_successor(cfg->mem_ctx, jump_block);
>> +  if (!later_block->starts_with_control_flow()) {
>> + later_block->parents.make_empty();
>> +  }
>
>
> I *think* this is correct.  I didn't really understand it before and I don't
> 100% now.

Basically, the earlier code incorrectly assumed that the blocks we
were attempting to merge did not start/end in control flow and
therefore had only a single successor. In that case it was safe to
erase all of the successor/predecessors and recreate them.

When the blocks actually start/end with control flow, they have
multiple successors/predecessors and that assumption is false. In that
case, we're not going to be able to merge blocks and we shouldn't
destroy their successor/predecessor information.
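
Editorial sketch of the rule described above, using a hypothetical flattened block structure rather than bblock_t: a block's successor list is only reset and pointed at the jump block when the block does not end in control flow. Otherwise its existing edges are left alone.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILDREN 4

/* Hypothetical, simplified block: successors are plain block numbers. */
struct simple_block {
   int children[MAX_CHILDREN];
   int num_children;
   bool ends_with_control_flow;
};

/* Replaces the successor list with a single edge to jump_block, but
 * only for blocks that do not end in control flow. Returns whether the
 * edges were rewritten. */
static bool
rewire_to_jump_block(struct simple_block *earlier, int jump_block)
{
   if (earlier->ends_with_control_flow)
      return false;

   earlier->num_children = 0;
   earlier->children[earlier->num_children++] = jump_block;
   return true;
}

/* Illustrative blocks: one falls through, one ends in an IF with two
 * successors that must be preserved. */
static struct simple_block straight_line = { { 3 }, 1, false };
static struct simple_block ends_in_if = { { 3, 5 }, 2, true };
```

Calling rewire_to_jump_block() on straight_line leaves it with the jump block as its only successor, while ends_in_if keeps both of its original successors.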


Re: [Mesa-dev] [PATCH] glsl/list: Add an exec_list_validate function

2014-11-06 Thread Matt Turner
On Wed, Nov 5, 2014 at 5:12 PM, Jason Ekstrand  wrote:
>
>
> On Wed, Nov 5, 2014 at 2:46 PM, Matt Turner  wrote:
>>
>> On Wed, Nov 5, 2014 at 2:00 PM, Jason Ekstrand 
>> wrote:
>> > This can be very useful for trying to debug list corruptions.
>> >
>> > Signed-off-by: Jason Ekstrand 
>> > Cc: Ian Romanick 
>> > ---
>> >  src/glsl/list.h | 14 ++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/src/glsl/list.h b/src/glsl/list.h
>> > index aac13fd..1d18ec9 100644
>> > --- a/src/glsl/list.h
>> > +++ b/src/glsl/list.h
>> > @@ -521,6 +521,20 @@ exec_node_insert_list_before(struct exec_node *n,
>> > struct exec_list *before)
>> > exec_list_make_empty(before);
>> >  }
>> >
>> > +static inline void
>> > +exec_list_validate(struct exec_list *list)
>> > +{
>> > +   assert(list->head->prev == (struct exec_node *) &list->head);
>> > +   assert(list->tail == NULL);
>> > +   assert(list->tail_pred->next == (struct exec_node *) &list->tail);
>> > +
>> > +   for (struct exec_node *node = list->head;
>> > +node->next != NULL; node = node->next) {
>>
>> Just use foreach_in_list().
>
>
> Sure, I can do that.
>
>> > +  assert(node->next->prev == node);
>> > +  assert(node->prev->next == node);
>> > +   }
>> > +}
>>
>> Are you intending to call this from gdb? I'm having a hard time
>> imagining committing code that *sometimes* corrupts lists, which seems
>> like why this function would be useful to call from real code.
>
>
> It is useful to call from gdb but I also call it all over nir_validate.c.  I
> spent most of today fighting linked list corruptions, and this was very
> helpful for tracking them down.  Another option would be rename it to
> exec_list_is_valid and make it return a bool.  Then the standard procedure
> would be "assert(exec_list_is_valid(list))".  Would that be better?
>
>>
>> If you just want to call it from gdb, wrap the whole thing in #ifndef
>> NDEBUG. I don't want to ever accidentally call this function and think
>> it validated something when it actually did nothing.

Yeah, ignore all this.

Reviewed-by: Matt Turner 
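
For reference, the bool-returning variant floated above
("assert(exec_list_is_valid(list))") could look roughly like the sketch
below. It is written against a simplified list with two explicit
sentinel nodes rather than the real exec_list layout, and every name in
it (example_node, example_list, example_list_is_valid, ...) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for exec_node/exec_list. */
struct example_node {
   struct example_node *next;
   struct example_node *prev;
};

struct example_list {
   struct example_node head_sentinel; /* prev is always NULL */
   struct example_node tail_sentinel; /* next is always NULL */
};

static void
example_list_init(struct example_list *list)
{
   list->head_sentinel.prev = NULL;
   list->head_sentinel.next = &list->tail_sentinel;
   list->tail_sentinel.prev = &list->head_sentinel;
   list->tail_sentinel.next = NULL;
}

static void
example_list_push_tail(struct example_list *list, struct example_node *n)
{
   n->next = &list->tail_sentinel;
   n->prev = list->tail_sentinel.prev;
   n->prev->next = n;
   list->tail_sentinel.prev = n;
}

/* Returns true when every forward link has a matching back link and
 * the walk from the head sentinel ends at the tail sentinel.
 */
static bool
example_list_is_valid(const struct example_list *list)
{
   const struct example_node *node = &list->head_sentinel;

   if (node->prev != NULL || list->tail_sentinel.next != NULL)
      return false;

   while (node->next != NULL) {
      if (node->next->prev != node)
         return false;
      node = node->next;
   }
   return node == &list->tail_sentinel;
}
```

Returning a bool keeps the check itself alive in release builds; only
the assert() wrapped around the call compiles away under NDEBUG.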


Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.

2014-11-06 Thread Matt Turner
On Wed, Oct 22, 2014 at 11:11 AM,   wrote:
> From: José Fonseca 
>
> JITMemoryManager was removed in LLVM 3.6, and replaced by its base
> class RTDyldMemoryManager.
>
> This change fixes our JIT memory managers specializations to derive
> from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
>
> This enables llvmpipe to run with LLVM 3.6.

Should these be marked for stable? Or does 10.3.x work with 3.6?


[Mesa-dev] New stable-branch 10.3 candidate pushed

2014-11-06 Thread Emil Velikov

Hello list,

Barring a slight delay, during which I was expecting more
non-freedreno patches, it's time for yet another 10.3 stable
candidate. Currently we have:
 - 102 queued (89 of which are for freedreno)
 - 4 nominated (outstanding)
 - and 3 rejected patches


In a nutshell this gives us
 - freedreno: 0ad is playable (+ 395 piglit fixes)
 - r600/radeonsi: 3 rendering fixes and one crash
 - glsl: 3 bug-fixes/crashes
 - build: fixed static linking against llvm, fix msvc and android builds

Take a look at section "Mesa stable queue" for more information.


Testing
-------
The following results are against piglit a636a3610d7.


Fixes - gallium swrast
----------------------
Tests:
 - arb_explicit_uniform_location-use-of-unused-loc (crash > pass)
 - linker/set-location-to-sampler (fail > pass)


Fixes - classic i965(snb)
-------------------------
Tests:
 - arb_explicit_uniform_location-use-of-unused-loc (crash > pass)
 - linker/set-location-to-sampler (fail > pass)
 - GLX_OML_sync_control/timing -divisor 2  (warn > pass)
 - GLX_OML_sync_control/timing -msc-delta 1 (warn > pass)


Regressions - classic i965(snb)
-------------------------------
Tests:
 - GLX_OML_sync_control/timing -divisor 1  (pass > warn)

The OML_sync_control tests continue to oscillate between warn and
pass, with the result depending on the moon cycle :P


Testing reports/general approval
--------------------------------
Any testing reports (or general approval of the state of the branch)
will be greatly appreciated.


Trivial merge conflicts
-----------------------
Here are the commits where I manually merged conflicts, (so these might
merit additional review):

commit 4956788a5f40fff2ea72d0f5bd6fcdb116492896
Author: Anuj Phogat 
Date:   Mon Sep 22 15:10:28 2014 -0700

glsl: Use signed array index in update_max_array_access()

(cherry picked from commit 7a652c41b4de4bdbb954a4ebf6cdb605d197e999)


As usual the plan is to have the next stable (10.3.3) released this
Friday, so if you have any questions or comments that you would like to
share before the release, please go ahead.


Cheers,
Emil


Mesa stable queue
-----------------

Nominated (4)
=============

Kenneth Graunke (1):
  i965: Fix an off-by-1 error in the draw upload code's size
calculation.

Tom Stellard (2):
  radeonsi: Program RASTER_CONFIG for harvested GPUs v4
  R600/SI: radeonsi: Program RASTER_CONFIG for harvested GPUs v3

Tomasz Figa (1):
  glsl: Fix no return value for non-void function


Rejected (3)
============

Anuj Phogat (2):
  glsl: No compile error for out of bounds array index
  glsl: Don't abort if array index is out of bounds

Mauro Rossi (1):
  gallium/nouveau: use std::isfinite in c++ sources


Queued (102)
============

Anuj Phogat (2):
  glsl: Fix crash due to negative array index
  glsl: Use signed array index in update_max_array_access()

Brian Paul (1):
  mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC

Ilia Mirkin (27):
  freedreno/ir3: INEG operates on src0, not src1
  freedreno/ir3: add UARL support
  freedreno/ir3: negate result of USLT/etc
  freedreno/ir3: use unsigned comparison for UIF
  freedreno/ir3: add TXL support
  freedreno/ir3: fix UCMP handling
  freedreno/ir3: implement UMUL correctly
  freedreno: add default .dir-locals.el for emacs settings
  freedreno/ir3: make texture instruction construction more dynamic
  freedreno/ir3: fix TXB/TXL to actually pull the bias/lod argument
  freedreno/ir3: add TXQ support
  freedreno/ir3: add TXB2 support
  freedreno: dual-source render targets are not supported
  freedreno: instanced drawing/compute not yet supported
  freedreno/ir3: avoid fan-in sources referring to same instruction
  freedreno/ir3: add IDIV/UDIV support
  freedreno/ir3: add UMOD support, based on UDIV
  freedreno/ir3: add MOD support
  freedreno/ir3: add ISSG support
  freedreno/ir3: add UMAD support
  freedreno/ir3: make TXQ return integers, not floats
  freedreno/ir3: shadow comes before array
  freedreno/ir3: add texture offset support
  freedreno/ir3: add TXD support and expose ARB_shader_texture_lod
  freedreno/ir3: add TXF support
  freedreno: positions come out as integers, not half-integers
  freedreno/ir3: fix FSLT/etc handling to return 0/-1 instead of 0/1.0

Jan Vesely (1):
  configure: include llvm systemlibs when using static llvm

Marek Olšák (5):
  r600g: fix polygon mode for points and lines and point/line fill
modes
  radeonsi: fix polygon mode for points and lines and point/line
fill modes
  radeonsi: fix incorrect index buffer max size for lowered 8-bit
indices
  Revert "st/mesa: set MaxUnrollIterations = 255"
  r300g: remove enabled/disabled hyperz and AA compression messages

Mauro Rossi (1):
  gallium/nouveau: fully build the driver under android

Michel Dänzer (1):

Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.

2014-11-06 Thread Emil Velikov
On 06/11/14 19:20, Matt Turner wrote:
> On Wed, Oct 22, 2014 at 11:11 AM,   wrote:
>> From: José Fonseca 
>>
>> JITMemoryManager was removed in LLVM 3.6, and replaced by its base
>> class RTDyldMemoryManager.
>>
>> This change fixes our JIT memory managers specializations to derive
>> from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
>>
>> This enables llvmpipe to run with LLVM 3.6.
> 
> Should these be marked for stable? Or does 10.3.x work with 3.6?
If I understood Michel correctly a while back - it's not the best of
ideas to add support for unreleased llvm in the mesa stable branches.

As the llvm release is made we can obviously pick them up when needed.

Cheers,
Emil


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

2014-11-06 Thread Kenneth Graunke
On Thursday, November 06, 2014 02:55:25 PM Emil Velikov wrote:
> Hi Ken,
> 
> From what I've gathered the proposed patch is incorrect and is (most
> likely) working around a buggy application behaviour. Afaics Ian
> suggested that we add a driconf option for such applications.
> 
> Should I consider this patch for the stable branch or the above sounds
> about right and we can drop it ?
> 
> Thanks
> Emil

We should drop it.  Sorry for the noise...

--Ken



Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 11:27 AM, Emil Velikov  wrote:
> On 06/11/14 19:20, Matt Turner wrote:
>> On Wed, Oct 22, 2014 at 11:11 AM,   wrote:
>>> From: José Fonseca 
>>>
>>> JITMemoryManager was removed in LLVM 3.6, and replaced by its base
>>> class RTDyldMemoryManager.
>>>
>>> This change fixes our JIT memory managers specializations to derive
>>> from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
>>>
>>> This enables llvmpipe to run with LLVM 3.6.
>>
>> Should these be marked for stable? Or does 10.3.x work with 3.6?
> If I understood Michel correctly a while back - it's not the best of
> ideas to add support for unreleased llvm in the mesa stable branches.
>
> As the llvm release is made we can obviously pick them up when needed.
>
> Cheers,
> Emil

Ah, sure. That makes sense.


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

2014-11-06 Thread Emil Velikov
On 06/11/14 19:29, Kenneth Graunke wrote:
> On Thursday, November 06, 2014 02:55:25 PM Emil Velikov wrote:
>> Hi Ken,
>>
>> From what I've gathered the proposed patch is incorrect and is (most
>> likely) working around a buggy application behaviour. Afaics Ian
>> suggested that we add a driconf option for such applications.
>>
>> Should I consider this patch for the stable branch or the above sounds
>> about right and we can drop it ?
>>
>> Thanks
>> Emil
> 
> We should drop it.  Sorry for the noise...
> 
Ack. There is nothing to apologise for :)

-Emil
> --Ken
> 



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner  wrote:
> On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri  wrote:
>> There have been quite a few eyes over this now but nobody has given it a
>> reviewed by yet.
>>
>> Would be nice to get it in before the code freeze. Any takers?
>
> Yes, I'll make sure that happens.

I made a couple of trivial changes to the commit message and added
some spaces between __m128i and * in casts and pushed it with review.

Thanks!


[Mesa-dev] [Bug 84566] Unify the format conversion code

2014-11-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #51 from Jason Ekstrand  ---
(In reply to Samuel Iglesias from comment #48)
> (In reply to Jason Ekstrand from comment #34)
> > (In reply to Samuel Iglesias from comment #33)
> > > Jason, I would like to know your opinion about the integer RGBA clamping
> > > done in pack.c (_mesa_pack_rgba_span_from_ints()).
> > > 
> > > glReadPixels() and glGetTexImage() specs said (for example, in OpenGL 3.3.
> > > core specification) that for an integer RGBA color, each component is
> > > clamped to the representable range of type.
> > > 
> > > Those GL functions end up calling pack.c's functions for performing the
> > > conversion (mesa_pack_rgba_span_from_ints() and others).
> > > 
> > > It's possible to replace some of those pack.c's conversion functions by 
> > > the
> > > master conversion but the problem is in the clamping operation. I would 
> > > like
> > > to add a boolean argument called "clamp" to the master conversion function
> > > which is passed to _mesa_swizzle_and_convert() where each of its
> > > convert_{uint,int,byte,ubyte,short,ushort}() do the right thing when 
> > > "clamp"
> > > is true.
> > > 
> > > This "clamp" parameter would be false for every call to either master
> > > conversion function or _mesa_swizzle_and_convert() except when they are
> > > being called from pack.c.
> > > 
> > > What do you think?
> > 
> > Supporting clamping is probably fine.  I think we determined we needed a
> > clamp parameter anyway for some of the float conversions.  I guess it makes
> > sense for integers too.  Let's watch out for performance when we implement
> > it though.  Changing the loop inside mesa_swizzle_and_convert without
> > hurting performance can be tricky.  The teximage-colors test in Piglit has a
> > -benchmark flag that can be used for testing that.
> 
> In the end, we did not make that change in pack.c as we could just use the
> autogenerated format pack/unpack functions. However I am retaking this topic
> again because we found another issue which would require a similar solution:
> 
> The convert_*() functions in format_utils.c convert between source and
> destination data and are used by _mesa_swizzle_and_convert. We found that
> these were not good enough for various conversions that involved
> non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc.
> Because of that, several piglit tests related to glGetTexImage() and others
> failed, like for example bin/ext_texture_integer-getteximage-clamping.
> 
> In order to fix that we added the clamp expressions for these cases [1]  and
> with that we achieved no regressions when executing a full piglit run on
> i965 driver.
> Unfortunately, when testing our patches on a couple of Gallium drivers, we
> found a regression that we tracked down back to that patch:
> bin/arb_clear_buffer_object-formats.
> Reverting the patch fixes the problem with these Gallium drivers but
> then, bin/ext_texture_integer-getteximage-clamping fails again on i965. We
> wonder if there could be more cases like this that piglit is not covering,
> since it looks weird that this affects just this one test.
> 
> So, we wonder if that patch for _mesa_swizzle_and_convert is correct and we
> should fix the failed gallium cases elsewhere, or if we should revert that
> patch and fix the cases it fixed in a different way. What do you think? Was
> _mesa_swizzle_and_convert implemented like that  on purpose or are these
> real bugs?

From my brief reading of the GL spec, it looks like clamping integers to the
max representable range is what it expects by default.  From the glTexImage
spec:

"The selected groups are transferred to the GL as described in section 3.7.2
and then clamped to the representable range of the internal format. If the
internalformat of the texture is signed or unsigned integer, components are
clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^n-1], respectively, where n is
the number of bits per component. For color component groups, if the
internalformat of the texture is signed or unsigned normalized fixed-point,
components are clamped to [-1, 1] or [0, 1], respectively."

Therefore, it seems as if we want to be clamping when we have integer
destinations.  I'm not sure why the gallium drivers are regressing when you do.

One more observation is that it doesn't look like your clamping patch is
complete.  If we're going to clamp when we have an integer destination, we
should always clamp with integer destinations, not just in the few cases that
piglit hits.  I wouldn't be surprised if piglit's coverage in those areas is
terrible.
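
As an illustration of "always clamp with integer destinations", here is
a small sketch of clamping a 32-bit signed source to an n-bit signed or
unsigned destination, using the representable ranges [-2^(n-1),
2^(n-1)-1] and [0, 2^n-1]. These helpers are hypothetical and are not
the convert_*() code in format_utils.c; they only show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v to the range of an n-bit signed integer, 1 <= n <= 32. */
static int32_t
clamp_to_signed_bits(int32_t v, unsigned n)
{
   assert(n >= 1 && n <= 32);
   const int64_t max = ((int64_t)1 << (n - 1)) - 1;
   const int64_t min = -max - 1;

   if (v < min)
      return (int32_t)min;
   if (v > max)
      return (int32_t)max;
   return v;
}

/* Clamp v to the range of an n-bit unsigned integer, 1 <= n <= 32. */
static uint32_t
clamp_to_unsigned_bits(int32_t v, unsigned n)
{
   assert(n >= 1 && n <= 32);
   const int64_t max = ((int64_t)1 << n) - 1;

   if (v < 0)
      return 0;
   if ((int64_t)v > max)
      return (uint32_t)max;
   return (uint32_t)v;
}

/* INT -> SHORT and INT -> UBYTE, two of the mixed-size conversions
 * mentioned above.
 */
static int16_t
int_to_short_clamped(int32_t v)
{
   return (int16_t)clamp_to_signed_bits(v, 16);
}

static uint8_t
int_to_ubyte_clamped(int32_t v)
{
   return (uint8_t)clamp_to_unsigned_bits(v, 8);
}
```

Without the clamp, a plain cast keeps only the low bits on typical
implementations, so an INT value of 70000 turns into 4464 as a SHORT
instead of saturating at 32767.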

> If we decide to revert our clamp patch, then a solution could be to have a
> separate implementation of _mesa_swizzle_and_convert() and its convert_*()
> functions that clamps. We would have to use that version in glGetTexImage()
> (maybe in glReadPixels too)  and use the normal version for texture uploads
> (texstore). If we do this, then we would hav

Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Roland Scheidegger
But even dword offsets aren't translatable right now to
pipe_sampler_view first_elements if the format has more than 32 bits.

Roland

Am 06.11.2014 um 18:21 schrieb Marek Olšák:
> For radeonsi, I think only x8, x8y8, and x16 fetches can be
> byte-aligned. Everything else is dword-aligned (the 2 lowest bits are
> ignored). I guess the cap should be 4 then.
> 
> Marek
> 
> On Thu, Nov 6, 2014 at 4:55 PM, Roland Scheidegger  wrote:
>> Am 06.11.2014 um 16:15 schrieb Marek Olšák:
>>> I'd say it's a spec bug. ARB_texture_buffer_range should say that the
>>> offset should be a multiple of an element size, but it doesn't. The
>>> question is, what should the element size be? One component or the
>>> whole pixel?
>> Imho whole pixel (for block compressed that would be full block, for
>> things like packed 565 too but neither are possible in GL), i.e. "format
>> granularity". That said, the whole alignment thing is problematic for
>> rgb32 (and the possibility of that was added later,
>> ARB_texture_buffer_object_rgb32), so maybe it's things like that why the
>> offset can be just byte aligned (in other words, I'm not convinced it's
>> just a spec bug, d3d10 doesn't have that problem with alignment).
>>
>> Roland
>>
>>
>>>
>>> Marek
>>>
>>> On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger  
>>> wrote:
>>>> Trying to fix some bug due to alignment issues in llvmpipe's vertex
>>>> fetch, I came across some issue with ARB_texture_buffer_range.
>>>> Namely, it looks like the offsets specified there are always in bytes,
>>>> regardless of the actual format (hence, as long as the
>>>> TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
>>>> offset of 15 bytes for a rgba32f format for instance making all fetches
>>>> quite unaligned).
>>>> However in gallium we actually have first_elem and last_elem parameters
>>>> in the sampler views which are specified in number of elements (so takes
>>>> the format into account), which is what d3d10 does and the state tracker
>>>> translates to that apparently. IMHO d3d10 makes way more sense there
>>>> because that way the necessary alignment scales automatically depending
>>>> on the format (so, if the format is 2x16bit for instance you'd need 4
>>>> byte alignment for the offset, and only need 16 bytes alignment for
>>>> 4x32bit, ensuring all lookups are always aligned). This means that 15
>>>> byte offset in the example above is completely untranslatable.
>>>> But if I see that right, OpenGL doesn't work like that, meaning
>>>> effectively gallium drivers (and I doubt most other drivers either)
>>>> cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
>>>> than 16, even if they'd only need that for 4x32bit formats. Though most
>>>> gallium drivers indeed claim 1 right now.
>>>> Looks quite messy...
>>>>
>>>> Roland
>>>> ___
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=AAIBaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=Ds_jdCUhL1dGXrkeea1fzl6_iInrZFJOSltaM6dlF9w&s=BNwWkIpsz9GFgPRoMLDU8tEVUPzmIxKINN3Uu9evnXs&e=
>>
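
To spell out the translation problem discussed in this thread: a GL
byte offset only maps onto an element-based first_element when it is a
multiple of the format's element size. The helper below is a
hypothetical sketch of that check, not state tracker code; the name and
signature are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Translate a byte offset into an element index for a format whose
 * elements are element_size_bytes wide. Returns false when the offset
 * cannot be expressed as a whole number of elements.
 */
static bool
byte_offset_to_first_element(uint32_t offset_bytes,
                             uint32_t element_size_bytes,
                             uint32_t *first_element)
{
   if (element_size_bytes == 0 ||
       offset_bytes % element_size_bytes != 0)
      return false;

   *first_element = offset_bytes / element_size_bytes;
   return true;
}
```

For a 16-byte rgba32f element, offsets of 15 bytes and of 4 bytes (a
dword) both fail, while 32 bytes maps to element 2. For a 4-byte
2x16bit element, an offset of 4 bytes maps to element 1.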



Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

2014-11-06 Thread Ian Romanick
I thought Eric and Chad already NAKed it in bugzilla.  The problem is
that applications ask for an RGBA visual for GL blending.  They use the
alpha channel to generate their images, but the final alpha values are,
basically, random... and the composited result would be pure garbage.

As Chad points out in comment #1, EGL just doesn't let applications do
the thing the patch is trying to do.

On 11/06/2014 05:12 AM, Emil Velikov wrote:
> Humble ping x2
> 
> On 14/10/14 15:25, Emil Velikov wrote:
>> Humble ping.
>>
>> On 23/09/14 01:25, Emil Velikov wrote:
>>> From: Sjoerd Simons 
>>>
>>> When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that
>>> application can decide whether they want to use RGBA (and have the
>>> compositor blend their windows).
>>>
>>> On my system with this change EGLConfigs with a 24 bit visual comes up
>>> first, as such applications blindly picking the first EGLConfig will
>>> still get an RGB X visual.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
>>> ---
>>>
>>> Hello gents,
>>>
>>> This patch has been stuck in bugzilla since February this year. Bringing 
>>> it around here to gather greater exposure and perhaps some 
>>> comments/reviews.
>>>
>>> -Emil
>>>
>>>  src/egl/drivers/dri2/egl_dri2.c |  5 +
>>>  src/egl/drivers/dri2/platform_x11.c | 17 +
>>>  2 files changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/src/egl/drivers/dri2/egl_dri2.c 
>>> b/src/egl/drivers/dri2/egl_dri2.c
>>> index 20a7243..2ed90a7 100644
>>> --- a/src/egl/drivers/dri2/egl_dri2.c
>>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>>> @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
>>>  static EGLBoolean
>>>  dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
>>>  {
>>> +
>>> +   if (criteria->NativeVisualID != EGL_DONT_CARE &&
>>> +conf->NativeVisualID != criteria->NativeVisualID)
>>> +  return EGL_FALSE;
>>> +
>>> if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
>>>return EGL_FALSE;
>>>  
>>> diff --git a/src/egl/drivers/dri2/platform_x11.c 
>>> b/src/egl/drivers/dri2/platform_x11.c
>>> index a7a7338..3395fb7 100644
>>> --- a/src/egl/drivers/dri2/platform_x11.c
>>> +++ b/src/egl/drivers/dri2/platform_x11.c
>>> @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
>>> dri2_egl_display *dri2_dpy,
>>> dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
>>> surface_type, config_attrs, rgba_masks);
>>>  
>>> -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
>>> - * Otherwise it will only match a 32-bit RGBA visual.  On a
>>> - * composited window manager on X11, this will make all of the
>>> - * EGLConfigs with destination alpha get blended by the
>>> - * compositor.  This is probably not what the application
>>> - * wants... especially on drivers that only have 32-bit RGBA
>>> - * EGLConfigs! */
>>> -if (d.data->depth == 24) {
>>> +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 
>>> 32-bit
>>> + * RGBA EGLConfig.  Otherwise it will only match a 32-bit RGBA
>>> + * visual.  On a composited window manager on X11, this will 
>>> make
>>> + * all of the EGLConfigs with destination alpha get blended by 
>>> the
>>> + * compositor.  This is probably not what the application 
>>> wants...
>>> + * especially on drivers that only have 32-bit RGBA EGLConfigs!
>>> + * Allowing both allows applications to make the decision 
>>> whether
>>> + * 32 bit visuals are intended */
>>> +if (d.data->depth == 24 || d.data->depth == 32) {
>>> rgba_masks[3] =
>>>~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
>>> dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
>>>
>>
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

2014-11-06 Thread Timothy Arceri
On Thu, 2014-11-06 at 09:59 -0800, Matt Turner wrote:
> On Thu, Nov 6, 2014 at 4:18 AM, Timothy Arceri  wrote:
> > Signed-off-by: Timothy Arceri 
> > ---
> >  configure.ac   | 6 ++
> >  src/mesa/x86/common_x86.c  | 4 
> >  src/mesa/x86/common_x86_features.h | 4 +++-
> >  3 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/configure.ac b/configure.ac
> > index 03f1bca..cc0a661 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -258,6 +258,12 @@ if test "x$SSE41_SUPPORTED" = x1; then
> >  fi
> >  AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
> >
> > +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
> > +if test "x$SSSE3_SUPPORTED" = x1; then
> > +DEFINES="$DEFINES -DUSE_SSSE3"
> > +fi
> > +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
> > +
> >  dnl Can't have static and shared libraries, default to static if user
> >  dnl explicitly requested. If both disabled, set to static since shared
> >  dnl was explicitly requested.
> > diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
> > index 25f5c40..ad0648a 100644
> > --- a/src/mesa/x86/common_x86.c
> > +++ b/src/mesa/x86/common_x86.c
> > @@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
> >
> >__get_cpuid(1, &eax, &ebx, &ecx, &edx);
> >
> > +#ifdef bit_SSSE3
> 
> Why did you put this in an #ifdef bit_SSSE3? That's defined everywhere.

I have no idea. I'll fix it.
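
For anyone following along, the runtime check boils down to reading
CPUID leaf 1 and testing bits in ECX: bit 9 is SSSE3 and bit 19 is
SSE4.1. Below is a rough standalone sketch, assuming GCC or Clang
(which ship <cpuid.h> with __get_cpuid). The EXAMPLE_* and CPUID_1_ECX_*
names are made up here and are not the X86_FEATURE_* defines from the
patch.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical feature flags for this sketch only. */
#define EXAMPLE_FEATURE_SSSE3   (1u << 0)
#define EXAMPLE_FEATURE_SSE4_1  (1u << 1)

/* CPUID leaf 1, ECX register bit positions. */
#define CPUID_1_ECX_SSSE3       (1u << 9)
#define CPUID_1_ECX_SSE4_1      (1u << 19)

/* Pure decoding step, kept separate so it can be tested anywhere. */
static uint32_t
features_from_cpuid1_ecx(uint32_t ecx)
{
   uint32_t features = 0;

   if (ecx & CPUID_1_ECX_SSSE3)
      features |= EXAMPLE_FEATURE_SSSE3;
   if (ecx & CPUID_1_ECX_SSE4_1)
      features |= EXAMPLE_FEATURE_SSE4_1;
   return features;
}

/* Query the running CPU; reports no features on non-x86 builds. */
static uint32_t
detect_cpu_features(void)
{
#if defined(__i386__) || defined(__x86_64__)
   unsigned int eax, ebx, ecx, edx;

   if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return features_from_cpuid1_ecx(ecx);
#endif
   return 0;
}
```

The compile-time side (-mssse3 support in the compiler) and this
runtime side are independent: the first decides whether the fast path
can be built at all, the second whether it may be called on the machine
at hand.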




Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Timothy Arceri
On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote:
> On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri  wrote:
> > Also cleans up some if statements in the *faster functions.
> >
> > Callgrind cpu usage results from pts benchmarks:
> >
> > For ytile_copy_faster()
> >
> > Nexuiz 1.6.1: 2.16% -> 1.20%
> >
> > Signed-off-by: Timothy Arceri 
> > ---
> >  src/mesa/Makefile.am   |  8 +++
> >  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 
> > ++
> >  src/mesa/main/fast_rgba8_copy.c| 78 
> > 
> >  src/mesa/main/fast_rgba8_copy.h| 37 
> >  4 files changed, 141 insertions(+), 64 deletions(-)
> >  create mode 100644 src/mesa/main/fast_rgba8_copy.c
> >  create mode 100644 src/mesa/main/fast_rgba8_copy.h
> >
> > diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
> > index e71bccb..2402096 100644
> > --- a/src/mesa/Makefile.am
> > +++ b/src/mesa/Makefile.am
> > @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
> >
> >  ARCH_LIBS =
> >
> > +if SSSE3_SUPPORTED
> > +ARCH_LIBS += libmesa_ssse3.la
> > +endif
> > +
> >  if SSE41_SUPPORTED
> >  ARCH_LIBS += libmesa_sse41.la
> >  endif
> > @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
> > main/streaming-load-memcpy.c
> >  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
> >
> > +libmesa_ssse3_la_SOURCES = \
> > +   main/fast_rgba8_copy.c
> > +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
> > +
> >  pkgconfigdir = $(libdir)/pkgconfig
> >  pkgconfig_DATA = gl.pc
> >
> > diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
> > b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> > index cb5738a..0deeb75 100644
> > --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> > +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> > @@ -27,6 +27,7 @@
> >   
> > **/
> >
> >  #include "main/bufferobj.h"
> > +#include "main/fast_rgba8_copy.h"
> >  #include "main/image.h"
> >  #include "main/macros.h"
> >  #include "main/mtypes.h"
> > @@ -42,9 +43,7 @@
> >  #include "intel_mipmap_tree.h"
> >  #include "intel_blit.h"
> >
> > -#ifdef __SSSE3__
> > -#include 
> > -#endif
> > +#include "x86/common_x86_asm.h"
> >
> >  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
> >
> > @@ -175,18 +174,6 @@ err:
> > return false;
> >  }
> >
> > -#ifdef __SSSE3__
> > -static const uint8_t rgba8_permutation[16] =
> > -   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
> > -
> > -/* NOTE: dst must be 16 byte aligned */
> > -#define rgba8_copy_16(dst, src) \
> > -   *(__m128i *)(dst) = _mm_shuffle_epi8(\
> > -  (__m128i) _mm_loadu_ps((float *)(src)),   \
> > -  *(__m128i *) rgba8_permutation\
> > -   )
> > -#endif
> > -
> >  /**
> >   * Copy RGBA to BGRA - swap R and B.
> >   */
> > @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
> > uint8_t *d = dst;
> > uint8_t const *s = src;
> >
> > -#ifdef __SSSE3__
> > -   /* Fast copying for tile spans.
> > -*
> > -* As long as the destination texture is 16 aligned,
> > -* any 16 or 64 spans we get here should also be 16 aligned.
> > -*/
> > -
> > -   if (bytes == 16) {
> > -  assert(!(((uintptr_t)dst) & 0xf));
> > -  rgba8_copy_16(d+ 0, s+ 0);
> > -  return dst;
> > -   }
> > -
> > -   if (bytes == 64) {
> > -  assert(!(((uintptr_t)dst) & 0xf));
> > -  rgba8_copy_16(d+ 0, s+ 0);
> > -  rgba8_copy_16(d+16, s+16);
> > -  rgba8_copy_16(d+32, s+32);
> > -  rgba8_copy_16(d+48, s+48);
> > -  return dst;
> > -   }
> > -#endif
> > -
> > while (bytes >= 4) {
> >d[0] = s[2];
> >d[1] = s[1];
> > @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t 
> > x2, uint32_t x3,
> >mem_copy_fn mem_copy)
> >  {
> > if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
> > -  if (mem_copy == memcpy)
> > - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> > -   dst, src, src_pitch, swizzle_bit, memcpy);
> > -  else if (mem_copy == rgba8_copy)
> > - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> > -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
> > -   } else {
> > -  if (mem_copy == memcpy)
> > - return xtile_copy(x0, x1, x2, x3, y0, y1,
> > -   dst, src, src_pitch, swizzle_bit, memcpy);
> > -  else if (mem_copy == rgba8_copy)
> > - return xtile_copy(x0, x1, x2, x3, y0, y1,
> > -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
> > +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> > +dst, src, src_pitch, swizzle_bit, mem_copy);
> > }
> > xtile_copy(x0, x1, x2, x3, y0, y1,
> >dst, src,

[Mesa-dev] [PATCH] gbm: dlopen libglapi so gbm_create_device works

2014-11-06 Thread Frank Henigman
From: Frank Henigman 

Dri driver libs are not linked to pull in libglapi so gbm_create_device()
fails when it tries to dlopen them (unless the application is linked
with something that does pull in libglapi, like libGL).
Until dri drivers can be fixed properly, dlopen libglapi before trying
to dlopen them.
https://bugs.freedesktop.org/show_bug.cgi?id=57702

Signed-off-by: Frank Henigman 
---
 src/gbm/backends/dri/gbm_dri.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
index f637e32..6ea2294 100644
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -311,6 +311,11 @@ dri_open_driver(struct gbm_dri_device *dri)
if (search_paths == NULL)
   search_paths = DEFAULT_DRIVER_DIR;
 
+   /* Temporarily work around dri driver libs that need symbols in libglapi
+* but don't automatically link it in.
+*/
+   dlopen("libglapi.so.0", RTLD_LAZY | RTLD_GLOBAL);
+
dri->driver = NULL;
end = search_paths + strlen(search_paths);
for (p = search_paths; p < end && dri->driver == NULL; p = next + 1) {
-- 
2.1.0.rc2.206.gedb03e5
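
For readers unfamiliar with the trick: loading a library with
RTLD_GLOBAL makes its symbols available for resolving undefined
references in libraries that are dlopen()ed later, which is what lets
the dri driver find the glapi symbols. Below is a standalone sketch of
the same idea with error reporting added. preload_library_globally() is
a hypothetical helper and not part of the patch.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

/* Load a shared library so that its symbols become visible to
 * libraries opened afterwards. The handle is deliberately kept open
 * for the lifetime of the process.
 */
static bool
preload_library_globally(const char *soname)
{
   void *handle = dlopen(soname, RTLD_LAZY | RTLD_GLOBAL);

   if (handle == NULL) {
      const char *err = dlerror();
      fprintf(stderr, "preload of %s failed: %s\n",
              soname ? soname : "(main program)",
              err ? err : "unknown error");
      return false;
   }
   return true;
}
```

On older glibc versions this needs -ldl at link time. A failed preload
is not fatal by itself: the later dlopen() of the driver will simply
fail as before if the symbols are still missing.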



Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

2014-11-06 Thread Ian Romanick
On 11/06/2014 04:18 AM, Timothy Arceri wrote:
> Signed-off-by: Timothy Arceri 
> ---
>  configure.ac   | 6 ++
>  src/mesa/x86/common_x86.c  | 4 
>  src/mesa/x86/common_x86_features.h | 4 +++-
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 03f1bca..cc0a661 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -258,6 +258,12 @@ if test "x$SSE41_SUPPORTED" = x1; then
>  fi
>  AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
>  
> +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
> +if test "x$SSSE3_SUPPORTED" = x1; then
> +DEFINES="$DEFINES -DUSE_SSSE3"
> +fi
> +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
> +

Just so people can find things, the SSSE3 check should logically come
before the SSE4.1 check.

>  dnl Can't have static and shared libraries, default to static if user
>  dnl explicitly requested. If both disabled, set to static since shared
>  dnl was explicitly requested.
> diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
> index 25f5c40..ad0648a 100644
> --- a/src/mesa/x86/common_x86.c
> +++ b/src/mesa/x86/common_x86.c
> @@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
>  
>__get_cpuid(1, &eax, &ebx, &ecx, &edx);
>  
> +#ifdef bit_SSSE3
> +  if (ecx & bit_SSSE3)
> + _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
> +#endif
>if (ecx & bit_SSE4_1)
>   _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
> }
> diff --git a/src/mesa/x86/common_x86_features.h 
> b/src/mesa/x86/common_x86_features.h
> index 66f2cf6..6eb2b38 100644
> --- a/src/mesa/x86/common_x86_features.h
> +++ b/src/mesa/x86/common_x86_features.h
> @@ -43,7 +43,8 @@
>  #define X86_FEATURE_XMM2 (1<<6)
>  #define X86_FEATURE_3DNOWEXT (1<<7)
>  #define X86_FEATURE_3DNOW(1<<8)
> -#define X86_FEATURE_SSE4_1   (1<<9)
> +#define X86_FEATURE_SSSE3(1<<9)
> +#define X86_FEATURE_SSE4_1   (1<<10)
>  
>  /* standard X86 CPU features */
>  #define X86_CPU_FPU  (1<<0)
> @@ -65,6 +66,7 @@
>  #define cpu_has_xmm2 (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
>  #define cpu_has_3dnow(_mesa_x86_cpu_features & 
> X86_FEATURE_3DNOW)
>  #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
> +#define cpu_has_ssse3(_mesa_x86_cpu_features & 
> X86_FEATURE_SSSE3)
>  #define cpu_has_sse4_1   (_mesa_x86_cpu_features & 
> X86_FEATURE_SSE4_1)
>  
>  #endif
> 



Re: [Mesa-dev] [PATCH 2/2] megadriver: explicitly link against glapi, link with -no-undefined

2014-11-06 Thread Frank Henigman
On Thu, Nov 6, 2014 at 8:09 AM, Emil Velikov  wrote:

> Indeed. static-glapi does not get too much testing, plus it seems that
> it's been broken (in a way) for a long time.
> It seems that we'll have to (temporary) resolve with shoving
> dlopen(libglapi.so) into gbm, so that in time programs can nuke it from
> their codebase. glapi is internal (implementation) detail that they
> should not need to bother/know.

Here you go
https://freedesktop.org/patch/36392/


Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 1:22 PM, Timothy Arceri  wrote:
> On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote:
>> On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri  wrote:
>> > Also cleans up some if statements in the *faster functions.
>> >
>> > Callgrind cpu usage results from pts benchmarks:
>> >
>> > For ytile_copy_faster()
>> >
>> > Nexuiz 1.6.1: 2.16% -> 1.20%
>> >
>> > Signed-off-by: Timothy Arceri 
>> > ---
>> >  src/mesa/Makefile.am   |  8 +++
>> >  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 
>> > ++
>> >  src/mesa/main/fast_rgba8_copy.c| 78 
>> > 
>> >  src/mesa/main/fast_rgba8_copy.h| 37 
>> >  4 files changed, 141 insertions(+), 64 deletions(-)
>> >  create mode 100644 src/mesa/main/fast_rgba8_copy.c
>> >  create mode 100644 src/mesa/main/fast_rgba8_copy.h
>> >
>> > diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
>> > index e71bccb..2402096 100644
>> > --- a/src/mesa/Makefile.am
>> > +++ b/src/mesa/Makefile.am
>> > @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
>> >
>> >  ARCH_LIBS =
>> >
>> > +if SSSE3_SUPPORTED
>> > +ARCH_LIBS += libmesa_ssse3.la
>> > +endif
>> > +
>> >  if SSE41_SUPPORTED
>> >  ARCH_LIBS += libmesa_sse41.la
>> >  endif
>> > @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
>> > main/streaming-load-memcpy.c
>> >  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
>> >
>> > +libmesa_ssse3_la_SOURCES = \
>> > +   main/fast_rgba8_copy.c
>> > +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
>> > +
>> >  pkgconfigdir = $(libdir)/pkgconfig
>> >  pkgconfig_DATA = gl.pc
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
>> > b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> > index cb5738a..0deeb75 100644
>> > --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> > +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> > @@ -27,6 +27,7 @@
>> >   
>> > **/
>> >
>> >  #include "main/bufferobj.h"
>> > +#include "main/fast_rgba8_copy.h"
>> >  #include "main/image.h"
>> >  #include "main/macros.h"
>> >  #include "main/mtypes.h"
>> > @@ -42,9 +43,7 @@
>> >  #include "intel_mipmap_tree.h"
>> >  #include "intel_blit.h"
>> >
>> > -#ifdef __SSSE3__
>> > -#include 
>> > -#endif
>> > +#include "x86/common_x86_asm.h"
>> >
>> >  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>> >
>> > @@ -175,18 +174,6 @@ err:
>> > return false;
>> >  }
>> >
>> > -#ifdef __SSSE3__
>> > -static const uint8_t rgba8_permutation[16] =
>> > -   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
>> > -
>> > -/* NOTE: dst must be 16 byte aligned */
>> > -#define rgba8_copy_16(dst, src) \
>> > -   *(__m128i *)(dst) = _mm_shuffle_epi8(\
>> > -  (__m128i) _mm_loadu_ps((float *)(src)),   \
>> > -  *(__m128i *) rgba8_permutation\
>> > -   )
>> > -#endif
>> > -
>> >  /**
>> >   * Copy RGBA to BGRA - swap R and B.
>> >   */
>> > @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
>> > uint8_t *d = dst;
>> > uint8_t const *s = src;
>> >
>> > -#ifdef __SSSE3__
>> > -   /* Fast copying for tile spans.
>> > -*
>> > -* As long as the destination texture is 16 aligned,
>> > -* any 16 or 64 spans we get here should also be 16 aligned.
>> > -*/
>> > -
>> > -   if (bytes == 16) {
>> > -  assert(!(((uintptr_t)dst) & 0xf));
>> > -  rgba8_copy_16(d+ 0, s+ 0);
>> > -  return dst;
>> > -   }
>> > -
>> > -   if (bytes == 64) {
>> > -  assert(!(((uintptr_t)dst) & 0xf));
>> > -  rgba8_copy_16(d+ 0, s+ 0);
>> > -  rgba8_copy_16(d+16, s+16);
>> > -  rgba8_copy_16(d+32, s+32);
>> > -  rgba8_copy_16(d+48, s+48);
>> > -  return dst;
>> > -   }
>> > -#endif
>> > -
>> > while (bytes >= 4) {
>> >d[0] = s[2];
>> >d[1] = s[1];
>> > @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t 
>> > x2, uint32_t x3,
>> >mem_copy_fn mem_copy)
>> >  {
>> > if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
>> > -  if (mem_copy == memcpy)
>> > - return xtile_copy(0, 0, xtile_width, xtile_width, 0, 
>> > xtile_height,
>> > -   dst, src, src_pitch, swizzle_bit, memcpy);
>> > -  else if (mem_copy == rgba8_copy)
>> > - return xtile_copy(0, 0, xtile_width, xtile_width, 0, 
>> > xtile_height,
>> > -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
>> > -   } else {
>> > -  if (mem_copy == memcpy)
>> > - return xtile_copy(x0, x1, x2, x3, y0, y1,
>> > -   dst, src, src_pitch, swizzle_bit, memcpy);
>> > -  else if (mem_copy == rgba8_copy)
>> > - return xtile_copy(x0, x1, x2, x3, y0, y1,
>> > -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
>> > +  return xtile_

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Ian Romanick
On 11/06/2014 02:12 PM, Matt Turner wrote:
> On Thu, Nov 6, 2014 at 1:22 PM, Timothy Arceri  wrote:
>> On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote:
>>> On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri  
>>> wrote:
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>
>>> I don't think you need these three includes for a single prototype.
>>
>> Right I can move assert to the .c
> 
> Presumably one of the others can be removed as well? I don't know what
> defines size_t.

stddef.h since C89 at least.

>>>> +
>>>> +/* Fast copying for tile spans.
>>>> + *
>>>> + * As long as the destination texture is 16 aligned,
>>>> + * any 16 or 64 spans we get here should also be 16 aligned.
>>>> + */
>>>> +void *
>>>> +_mesa_fast_rgba8_copy(void *dst, const void *src, size_t n);
>>>> --
>>>> 1.9.3




Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Marc Dietrich
Am Donnerstag, 6. November 2014, 11:55:59 schrieb Matt Turner:
> On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner  wrote:
> > On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri wrote:
> >> There have been quite a few eyes over this now but nobody has given it a
> >> reviewed by yet.
> >> 
> >> Would be nice to get it in before the code freeze. Any takers?
> > 
> > Yes, I'll make sure that happens.
> 
> I made a couple of trivial changes to the commit message and added
> some spaces between __m128i and * in casts and pushed it with review.

somehow this managed to successfully break my build - so finally a nice 
testcase :-)

Using CFLAGS="-march=native -flto". My CPU does not support SSE4.1. At 
compile time, -mno-sse4.1 -msse4.1 is passed, while at link time only 
-mno-sse4.1 is passed.

/usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h: In function 
'vbo_get_minmax_indices':
/usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h:320:20: error: 
'__builtin_ia32_pmaxud128' needs isa option -m32 -msse4.1
   return (__m128i) __builtin_ia32_pmaxud128 ((__v4si)__X, (__v4si)__Y);
^
/usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h:314:20: error: 
'__builtin_ia32_pminud128' needs isa option -m32 -msse4.1
   return (__m128i) __builtin_ia32_pminud128 ((__v4si)__X, (__v4si)__Y);
^
make[4]: *** [/tmp/cc5QZcUd.ltrans1.ltrans.o] Error 1

So, contrary to my earlier assumption, the "-m" flags are not kept per-file 
at link time. On the other hand, removing "-march=native" gives:

  CXXLDgallium_dri.la
../../src/mesa/vbo/vbo_exec_array.c: In function 'vbo_get_minmax_indices':
../../src/mesa/vbo/vbo_exec_array.c:197:0: internal compiler error: in 
propagate_rhs_into_lhs, at tree-ssa-dom.c:2913
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

with gcc-4.9.2 from opensuse/factory, so who knows what is really happening.

Trying with gcc-4.8 also fails as in the first example, even without 
"-march=native", maybe because the default -march is x86-64, which also does 
not support SSE4.1.

So this is only a problem with Link-Time-Opt.

Marc



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 2:33 PM, Marc Dietrich  wrote:
> So this is only a problem with Link-Time-Opt.

I don't actually know how you're getting to this point in the build
with LTO. It fails for me in src/mapi.

I think LTO is something we should make work at some point, but I
don't think we should gate contributions on whether LTO works or not.


Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Chad Versace

On Thu 06 Nov 2014, Timothy Arceri wrote:
> Also cleans up some if statements in the *faster functions.

I have comments about the cleanup below.

> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> index cb5738a..0deeb75 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c

/**
* Copy texture data from linear to X tile layout, faster.
*
* Same as \ref xtile_copy but faster, because it passes constant parameters
* for common cases, allowing the compiler to inline code optimized for those
* cases.
*
* \copydoc tile_copy_fn
*/
static FLATTEN void
xtile_copy_faster(...)

> @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2,
> uint32_t x3,
>   mem_copy_fn mem_copy)
> {
>    if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
> -  if (mem_copy == memcpy)
> - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> -   dst, src, src_pitch, swizzle_bit, memcpy);
> -  else if (mem_copy == rgba8_copy)
> - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
> -   } else {
> -  if (mem_copy == memcpy)
> - return xtile_copy(x0, x1, x2, x3, y0, y1,
> -   dst, src, src_pitch, swizzle_bit, memcpy);
> -  else if (mem_copy == rgba8_copy)
> - return xtile_copy(x0, x1, x2, x3, y0, y1,
> -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
> +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
> +dst, src, src_pitch, swizzle_bit, mem_copy);
>    }
>    xtile_copy(x0, x1, x2, x3, y0, y1,
>   dst, src, src_pitch, swizzle_bit, mem_copy);


The "cleanup" of this if tree concerns me. According to the function
comment, the original author of this function, fjhenigman, clearly 
created the weird 'if' tree with the intention that the compiler would 
"inline code optimized for those cases".


Without one of the following, I object to this cleanup:
   - Frank's approval, or
   - Proof that gcc never does the desired optimizations, or
   - Proof that this change does not harm Chrome's texture upload 
   performance.
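
A minimal sketch of the pattern the function comment describes, assuming a simplified row copier rather than the real tiling code. The names copy_fn, swap_rb_copy, copy_rows and copy_rows_faster are hypothetical and do not exist in Mesa; swap_rb_copy merely stands in for rgba8_copy. The point is that each branch of the dispatcher passes a literal function name, so a compiler that inlines the worker may specialize the inner loop for that callee instead of making an indirect call per span. Whether a given compiler actually does so has to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for rgba8_copy: swap bytes 0 and 2 of each
 * 4-byte pixel while copying. */
static void *
swap_rb_copy(void *dst, const void *src, size_t n)
{
   uint8_t *d = dst;
   const uint8_t *s = src;
   size_t i;

   for (i = 0; i + 4 <= n; i += 4) {
      d[i + 0] = s[i + 2];
      d[i + 1] = s[i + 1];
      d[i + 2] = s[i + 0];
      d[i + 3] = s[i + 3];
   }
   return dst;
}

/* Generic worker: copies 'rows' spans of 'span' bytes through 'copy'. */
static inline void
copy_rows(uint8_t *dst, const uint8_t *src, size_t span, size_t rows,
          copy_fn copy)
{
   size_t y;

   for (y = 0; y < rows; y++)
      copy(dst + y * span, src + y * span, span);
}

/* Dispatcher in the style of the *_faster functions: the branches look
 * redundant, but each call site hands the worker a compile-time-known
 * callee, which is what allows per-case inlining. */
static void
copy_rows_faster(uint8_t *dst, const uint8_t *src, size_t span, size_t rows,
                 copy_fn copy)
{
   if (copy == memcpy)
      copy_rows(dst, src, span, rows, memcpy);
   else if (copy == swap_rb_copy)
      copy_rows(dst, src, span, rows, swap_rb_copy);
   else
      copy_rows(dst, src, span, rows, copy);
}
```

Collapsing the branches into a single call that forwards the pointer is functionally identical, which is why the cleanup looks harmless; only a benchmark can show whether the specialization was lost.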



Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Frank Henigman
I tested your patch with the "teximage" program in mesa demos, the
same thing I used to benchmark when I developed this code.
As Matt and Chad point out, the odd-looking _faster functions are
there for a reason.  Your change causes a huge slowdown.
I tested on a Sandy Bridge system with an "Intel(R) Celeron(R) CPU 857 @
1.20GHz".  Mesa was compiled with -O2.

original code:
  TexSubImage(RGBA/ubyte 256 x 256): 9660.4 images/sec, 2415.1 MB/sec
  TexSubImage(RGBA/ubyte 1024 x 1024): 821.2 images/sec, 3284.7 MB/sec
  TexSubImage(RGBA/ubyte 4096 x 4096): 76.3 images/sec, 4884.9 MB/sec

  TexSubImage(BGRA/ubyte 256 x 256): 11307.1 images/sec, 2826.8 MB/sec
  TexSubImage(BGRA/ubyte 1024 x 1024): 944.6 images/sec, 3778.6 MB/sec
  TexSubImage(BGRA/ubyte 4096 x 4096): 76.7 images/sec, 4908.3 MB/sec

  TexSubImage(L/ubyte 256 x 256): 17847.5 images/sec, 1115.5 MB/sec
  TexSubImage(L/ubyte 1024 x 1024): 3068.2 images/sec, 3068.2 MB/sec
  TexSubImage(L/ubyte 4096 x 4096): 224.6 images/sec, 3593.0 MB/sec

your code:
  TexSubImage(RGBA/ubyte 256 x 256): 3271.6 images/sec, 817.9 MB/sec
  TexSubImage(RGBA/ubyte 1024 x 1024): 232.3 images/sec, 929.2 MB/sec
  TexSubImage(RGBA/ubyte 4096 x 4096): 47.5 images/sec, 3038.6 MB/sec

  TexSubImage(BGRA/ubyte 256 x 256): 2426.5 images/sec, 606.6 MB/sec
  TexSubImage(BGRA/ubyte 1024 x 1024): 164.1 images/sec, 656.4 MB/sec
  TexSubImage(BGRA/ubyte 4096 x 4096): 13.4 images/sec, 854.8 MB/sec

  TexSubImage(L/ubyte 256 x 256): 9514.5 images/sec, 594.7 MB/sec
  TexSubImage(L/ubyte 1024 x 1024): 864.1 images/sec, 864.1 MB/sec
  TexSubImage(L/ubyte 4096 x 4096): 59.7 images/sec, 955.2 MB/sec

This is just one run, not an average, but you can see it's slower
across the board up to a factor of around 6.
Also I couldn't configure the build after your patch.  I think you
left out a change to configure.ac to define SSSE3_SUPPORTED.

On Thu, Nov 6, 2014 at 6:08 PM, Chad Versace  wrote:
> On Thu 06 Nov 2014, Timothy Arceri wrote:
>>
>> Also cleans up some if statements in the *faster functions.
>
>
> I have comments about the cleanup below.
>
>> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> index cb5738a..0deeb75 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>
>
> /**
> * Copy texture data from linear to X tile layout, faster.
> *
> * Same as \ref xtile_copy but faster, because it passes constant parameters
> * for common cases, allowing the compiler to inline code optimized for those
> * cases.
> *
> * \copydoc tile_copy_fn
> */
> static FLATTEN void
> xtile_copy_faster(...)
>
>> @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t
>> x2, uint32_t x3,
>>   mem_copy_fn mem_copy)
>
>
>
>> {
>>if (x0 == 0 && x3 == xtile_width && y0 == 0 && y1 == xtile_height) {
>> -  if (mem_copy == memcpy)
>> - return xtile_copy(0, 0, xtile_width, xtile_width, 0,
>> xtile_height,
>> -   dst, src, src_pitch, swizzle_bit, memcpy);
>> -  else if (mem_copy == rgba8_copy)
>> - return xtile_copy(0, 0, xtile_width, xtile_width, 0,
>> xtile_height,
>> -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
>> -   } else {
>> -  if (mem_copy == memcpy)
>> - return xtile_copy(x0, x1, x2, x3, y0, y1,
>> -   dst, src, src_pitch, swizzle_bit, memcpy);
>> -  else if (mem_copy == rgba8_copy)
>> - return xtile_copy(x0, x1, x2, x3, y0, y1,
>> -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
>> +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
>> +dst, src, src_pitch, swizzle_bit, mem_copy);
>>}
>>xtile_copy(x0, x1, x2, x3, y0, y1,
>>   dst, src, src_pitch, swizzle_bit, mem_copy);
>
>
> The "cleanup" of this if tree concerns me. According to the function
> comment, the original author of this function, fjhenigman, clearly created
> the weird 'if' tree with the intention that the compiler would "inline
> code optimized for those cases".
>
> Without one of the following, I object to this cleanup:
>- Frank's approval, or
>- Proof that gcc never does the desired optimizations, or
>- Proof that this change does not harm Chrome's texture upload
> performance.


Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Frank Henigman
On Thu, Nov 6, 2014 at 7:30 PM, Frank Henigman  wrote:

> Also I couldn't configure the build after your patch.  I think you
> left out a change to configure.ac to define SSSE3_SUPPORTED.

Ah, that was in patch 1/2.


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Siavash Eliasi
How and when is "cpu_has_sse4_1" true? Is it controllable at runtime 
by setting some environment variable, or is it set once during 
startup by detecting CPU features?


I guess checking for "cpu_has_sse4_1" is unnecessary if it isn't 
controllable by user at runtime; because "USE_SSE41" is a compile time 
check and requires the target machine to be SSE 4.1 capable already.


Best regards,
Siavash Eliasi.


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 1:30 AM, Siavash Eliasi  wrote:
> How and when is "cpu_has_sse4_1" true? Is it controllable at runtime through
> setting some environmental variable? or is it set once during startup by
> detecting CPU features?

It's actually a macro, but yes, see the end of
src/mesa/x86/common_x86.c. It's set by using the CPUID instruction to
detect SSE 4.1 capabilities.

  if (ecx & bit_SSE4_1)
 _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
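
For readers who want to try the detection outside Mesa, here is a minimal sketch, assuming GCC or Clang on x86 where <cpuid.h> provides __get_cpuid. The FEATURE_* flags and the decode_ecx and detect_features helpers are hypothetical names, not Mesa's; the ECX bit positions (9 for SSSE3, 19 for SSE4.1) are the ones documented for CPUID leaf 1.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID leaf 1, ECX feature bits: bit 9 = SSSE3, bit 19 = SSE4.1. */
#define ECX_SSSE3   (1u << 9)
#define ECX_SSE4_1  (1u << 19)

/* Hypothetical feature flags; Mesa's own X86_FEATURE_* values differ. */
#define FEATURE_SSSE3   (1u << 0)
#define FEATURE_SSE4_1  (1u << 1)

/* Translate the raw ECX value into the feature mask. */
static uint32_t
decode_ecx(uint32_t ecx)
{
   uint32_t features = 0;

   if (ecx & ECX_SSSE3)
      features |= FEATURE_SSSE3;
   if (ecx & ECX_SSE4_1)
      features |= FEATURE_SSE4_1;
   return features;
}

/* Query the CPU once; returns 0 when CPUID is unavailable. */
static uint32_t
detect_features(void)
{
#ifdef HAVE_CPUID_H
   unsigned int eax, ebx, ecx, edx;

   if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return decode_ecx(ecx);
#endif
   return 0;
}
```

Keeping the decode step separate from the CPUID call makes the bit mapping testable on any machine.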

> I guess checking for "cpu_has_sse4_1" is unnecessary if it isn't
> controllable by user at runtime; because "USE_SSE41" is a compile time check
> and requires the target machine to be SSE 4.1 capable already.

Right.


Re: [Mesa-dev] [PATCH] i965/blorp: Fix hiz ops on MSAA surfaces

2014-11-06 Thread Anuj Phogat
On Tue, Nov 4, 2014 at 8:40 AM, Chris Forbes  wrote:

> Two things were broken here:
> - The depth/stencil surface dimensions were broken for MSAA.
> - Sample count was programmed incorrectly.
>
> Result was the depth resolve didn't work correctly on MSAA surfaces, and
> so sampling the surface later produced garbage.
>
> Fixes the new piglit test arb_texture_multisample-sample-depth, and
> various artifacts in 'tesseract' with msaa=4 glineardepth=0.
>
> Not observed any piglit regressions on Haswell.
>
> Signed-off-by: Chris Forbes 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.h|  4 
>  src/mesa/drivers/dri/i965/gen7_blorp.cpp | 24 +---
>  2 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
> b/src/mesa/drivers/dri/i965/brw_blorp.h
> index ff68000..c4ff0f7 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
> @@ -236,6 +236,10 @@ public:
> bool use_wm_prog;
> brw_blorp_wm_push_constants wm_push_consts;
> bool color_write_disable[4];
> +
> +   unsigned num_samples() const {
> +  return dst.mt ? dst.num_samples : depth.mt->num_samples;
> +   }
>  };
>
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> index 206a6ff..cc57ffe 100644
> --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> @@ -415,7 +415,7 @@ gen7_blorp_emit_sf_config(struct brw_context *brw,
>OUT_BATCH(_3DSTATE_SF << 16 | (7 - 2));
>OUT_BATCH(params->depth_format <<
>  GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT);
> -  OUT_BATCH(params->dst.num_samples > 1 ? GEN6_SF_MSRAST_ON_PATTERN :
> 0);
> +  OUT_BATCH(params->num_samples() > 1 ? GEN6_SF_MSRAST_ON_PATTERN :
> 0);
>OUT_BATCH(0);
>OUT_BATCH(0);
>OUT_BATCH(0);
> @@ -470,7 +470,7 @@ gen7_blorp_emit_wm_config(struct brw_context *brw,
>dw1 |= GEN7_WM_DISPATCH_ENABLE; /* We are rendering */
> }
>
> -  if (params->dst.num_samples > 1) {
> +  if (params->num_samples() > 1) {
>   dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
>   if (prog_data && prog_data->persample_msaa_dispatch)
>  dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
> @@ -661,8 +661,17 @@ gen7_blorp_emit_depth_stencil_config(struct
> brw_context *brw,
> * larger to allow the fast depth clear to fit the hardware
> * alignment requirements. (8x4)
> */
> -  surfwidth = params->depth.width;
> -  surfheight = params->depth.height;
> +
> +  if (params->num_samples() > 1) {
> + /* If this is an MSAA + HIZ op, we need to program the
> +  * aligned logical size of the depth surface.
> +  */
> + surfwidth = ALIGN(params->depth.mt->logical_width0, 8);
> + surfheight = ALIGN(params->depth.mt->logical_height0, 4);
> +  } else {
> + surfwidth = params->depth.width;
> + surfheight = params->depth.height;
> +  }
> } else {
>surfwidth = params->depth.mt->logical_width0;
>surfheight = params->depth.mt->logical_height0;
> @@ -805,10 +814,11 @@ gen7_blorp_exec(struct brw_context *brw,
> uint32_t sampler_offset = 0;
>
> uint32_t prog_offset = params->get_wm_prog(brw, &prog_data);
> -   gen6_emit_3dstate_multisample(brw, params->dst.num_samples);
> +   unsigned num_samples = params->num_samples();
> +   gen6_emit_3dstate_multisample(brw, num_samples);
> gen6_emit_3dstate_sample_mask(brw,
> - params->dst.num_samples > 1 ?
> - (1 << params->dst.num_samples) - 1 : 1);
> + num_samples > 1 ?
> + (1 << num_samples) - 1 : 1);
> gen6_blorp_emit_state_base_address(brw, params);
> gen6_blorp_emit_vertices(brw, params);
> gen7_blorp_emit_urb_config(brw, params);
> --
> 2.1.2
>
>

Looks good to me. Verified the requirement in IVB PRM.
Reviewed-by: Anuj Phogat 
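
As a quick sanity check of the arithmetic in the patch, here is a small sketch of the two computations it relies on. The helper names sample_mask and align_pot are hypothetical; align_pot assumes a power-of-two alignment, as in the 8x4 requirement mentioned in the patch comment.

```c
#include <stdint.h>

/* Round 'value' up to a multiple of 'alignment' (must be a power of two). */
static uint32_t
align_pot(uint32_t value, uint32_t alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* One mask bit per sample; single-sampled surfaces use a mask of 1. */
static uint32_t
sample_mask(unsigned num_samples)
{
   return num_samples > 1 ? (1u << num_samples) - 1 : 1;
}
```

With 4 samples the mask is 0xf, and a 13-pixel-wide surface aligned to 8 is programmed as 16.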


Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Michel Dänzer

On 06.11.2014 19:18, Joonas Lahtinen wrote:
> On Thu, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:
>> On 05.11.2014 20:14, Joonas Lahtinen wrote:
>>>
>>> Modified to not refer to DRI3; it just uses the Present extension to
>>> get rid of the excess buffer invalidations.
>>
>> AFAICT there's no fallback from your changes to the current behaviour if
>> the X server doesn't support the Present extension. There probably needs
>> to be such a fallback.
>
> It gets rid of such a nasty hack (the intel_viewport one) that I thought
> there was no point in making a fallback. Without this, the egl dri2
> backend is fundamentally broken anyway.


Well, AFAICT your code uses Present extension functionality 
unconditionally, without checking that the X server supports Present. I 
can't see how that could possibly work on an X server which doesn't 
support Present, but I think it would be better to keep it working at 
least as badly as it does now in that case. :)



--
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


Re: [Mesa-dev] [PATCH] i965/blorp: Fix hiz ops on MSAA surfaces

2014-11-06 Thread Chris Forbes
Thanks for the review. Ken has a slightly cleaner version of the patch
which avoids adding the helper function.
On Nov 7, 2014 2:29 AM, "Anuj Phogat"  wrote:

>
>
> On Tue, Nov 4, 2014 at 8:40 AM, Chris Forbes  wrote:
>
>> Two things were broken here:
>> - The depth/stencil surface dimensions were broken for MSAA.
>> - Sample count was programmed incorrectly.
>>
>> Result was the depth resolve didn't work correctly on MSAA surfaces, and
>> so sampling the surface later produced garbage.
>>
>> Fixes the new piglit test arb_texture_multisample-sample-depth, and
>> various artifacts in 'tesseract' with msaa=4 glineardepth=0.
>>
>> Not observed any piglit regressions on Haswell.
>>
>> Signed-off-by: Chris Forbes 
>> ---
>>  src/mesa/drivers/dri/i965/brw_blorp.h|  4 
>>  src/mesa/drivers/dri/i965/gen7_blorp.cpp | 24 +---
>>  2 files changed, 21 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
>> b/src/mesa/drivers/dri/i965/brw_blorp.h
>> index ff68000..c4ff0f7 100644
>> --- a/src/mesa/drivers/dri/i965/brw_blorp.h
>> +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
>> @@ -236,6 +236,10 @@ public:
>> bool use_wm_prog;
>> brw_blorp_wm_push_constants wm_push_consts;
>> bool color_write_disable[4];
>> +
>> +   unsigned num_samples() const {
>> +  return dst.mt ? dst.num_samples : depth.mt->num_samples;
>> +   }
>>  };
>>
>>
>> diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
>> b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
>> index 206a6ff..cc57ffe 100644
>> --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
>> +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
>> @@ -415,7 +415,7 @@ gen7_blorp_emit_sf_config(struct brw_context *brw,
>>OUT_BATCH(_3DSTATE_SF << 16 | (7 - 2));
>>OUT_BATCH(params->depth_format <<
>>  GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT);
>> -  OUT_BATCH(params->dst.num_samples > 1 ? GEN6_SF_MSRAST_ON_PATTERN
>> : 0);
>> +  OUT_BATCH(params->num_samples() > 1 ? GEN6_SF_MSRAST_ON_PATTERN :
>> 0);
>>OUT_BATCH(0);
>>OUT_BATCH(0);
>>OUT_BATCH(0);
>> @@ -470,7 +470,7 @@ gen7_blorp_emit_wm_config(struct brw_context *brw,
>>dw1 |= GEN7_WM_DISPATCH_ENABLE; /* We are rendering */
>> }
>>
>> -  if (params->dst.num_samples > 1) {
>> +  if (params->num_samples() > 1) {
>>   dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
>>   if (prog_data && prog_data->persample_msaa_dispatch)
>>  dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
>> @@ -661,8 +661,17 @@ gen7_blorp_emit_depth_stencil_config(struct
>> brw_context *brw,
>> * larger to allow the fast depth clear to fit the hardware
>> * alignment requirements. (8x4)
>> */
>> -  surfwidth = params->depth.width;
>> -  surfheight = params->depth.height;
>> +
>> +  if (params->num_samples() > 1) {
>> + /* If this is an MSAA + HIZ op, we need to program the
>> +  * aligned logical size of the depth surface.
>> +  */
>> + surfwidth = ALIGN(params->depth.mt->logical_width0, 8);
>> + surfheight = ALIGN(params->depth.mt->logical_height0, 4);
>> +  } else {
>> + surfwidth = params->depth.width;
>> + surfheight = params->depth.height;
>> +  }
>> } else {
>>surfwidth = params->depth.mt->logical_width0;
>>surfheight = params->depth.mt->logical_height0;
>> @@ -805,10 +814,11 @@ gen7_blorp_exec(struct brw_context *brw,
>> uint32_t sampler_offset = 0;
>>
>> uint32_t prog_offset = params->get_wm_prog(brw, &prog_data);
>> -   gen6_emit_3dstate_multisample(brw, params->dst.num_samples);
>> +   unsigned num_samples = params->num_samples();
>> +   gen6_emit_3dstate_multisample(brw, num_samples);
>> gen6_emit_3dstate_sample_mask(brw,
>> - params->dst.num_samples > 1 ?
>> - (1 << params->dst.num_samples) - 1 : 1);
>> + num_samples > 1 ?
>> + (1 << num_samples) - 1 : 1);
>> gen6_blorp_emit_state_base_address(brw, params);
>> gen6_blorp_emit_vertices(brw, params);
>> gen7_blorp_emit_urb_config(brw, params);
>> --
>> 2.1.2
>>
>>
>
> Looks good to me. Verified the requirement in IVB PRM.
> Reviewed-by: Anuj Phogat 
>


[Mesa-dev] [PATCH 2/3] egl_dri2: add a note about dri2_create_screen

2014-11-06 Thread Emil Velikov
The function is not called by platform_drm. As such one needs to
pay special attention at teardown.

Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 609afde..2094ffd 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -543,6 +543,9 @@ dri2_setup_screen(_EGLDisplay *disp)
}
 }
 
+/* All platforms but DRM call this function to create the screen, query the
+ * dri extensions, setup the vtables and populate the driver_configs.
+ * DRM inherits all that information from its display - GBM. */
 EGLBoolean
 dri2_create_screen(_EGLDisplay *disp)
 {
-- 
2.1.3



[Mesa-dev] [PATCH 1/3] egl_dri2: fix double free on drm platforms

2014-11-06 Thread Emil Velikov
An earlier commit failed to account for the fact that the drm platform
does not call dri2_create_screen: it does not create the screen and
driver_configs itself but inherits them from its "display", gbm.

As such, wrap the cleanup in Platform != _EGL_PLATFORM_DRM to prevent
the double free while still cleaning up correctly on non-drm platforms.

Cc: Kenneth Graunke 
Cc: Mark Janes 
Reported-by: Kenneth Graunke 
Reported-by: Mark Janes 
Signed-off-by: Emil Velikov 
---
 src/egl/drivers/dri2/egl_dri2.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index dcc3239..609afde 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -707,9 +707,18 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp)
   break;
}
 
+   /* The drm platform does not create the screen/driver_configs but reuses
+* the ones from the gbm device. As such the gbm itself is responsible
+* for the cleanup. */
+#ifdef HAVE_DRM_PLATFORM
+   if (disp->Platform != _EGL_PLATFORM_DRM) {
+#endif
for (i = 0; dri2_dpy->driver_configs[i]; i++)
   free((__DRIconfig *) dri2_dpy->driver_configs[i]);
free(dri2_dpy->driver_configs);
+#ifdef HAVE_DRM_PLATFORM
+   }
+#endif
free(dri2_dpy);
disp->DriverData = NULL;
 
-- 
2.1.3



[Mesa-dev] [PATCH 3/3] gbm/dri: cleanup memory leak on teardown

2014-11-06 Thread Emil Velikov
During teardown we free the driver_configs list pointer, but we forget
to deallocate each config in that list.

Signed-off-by: Emil Velikov 
---
 src/gbm/backends/dri/gbm_dri.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
index f637e32..0664266 100644
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -914,8 +914,11 @@ static void
 dri_destroy(struct gbm_device *gbm)
 {
    struct gbm_dri_device *dri = gbm_dri_device(gbm);
+   unsigned i;
 
    dri->core->destroyScreen(dri->screen);
+   for (i = 0; dri->driver_configs[i]; i++)
+      free((__DRIconfig *) dri->driver_configs[i]);
    free(dri->driver_configs);
    dlclose(dri->driver);
    free(dri->base.driver_name);
-- 
2.1.3



[Mesa-dev] Bogus bounds checking in api_validate.c?

2014-11-06 Thread Ian Romanick
While working on some other things, I came across some bounds checking
code in _mesa_validate_DrawElements (and related functions) in
api_validate.c.

  /* use indices in the buffer object */
  /* make sure count doesn't go outside buffer bounds */
  if (index_bytes(type, count) > ctx->Array.VAO->IndexBufferObj->Size) {
 _mesa_warning(ctx, "glDrawElements index out of buffer bounds");
 return GL_FALSE;
  }

index_bytes calculates how many bytes of data "count" indices will
occupy based on the type.  The problem is that this doesn't consider
the base pointer.  As far as I can tell, if I had a 64 byte buffer
object for my index data, and I did

glDrawElements(GL_POINTS, 16, GL_UNSIGNED_INT, 60);

_mesa_validate_DrawElements would say, "Ok!"

Am I missing something, or is this just broken?
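
For illustration, a check that also accounts for the base offset might look
roughly like the sketch below. This is not Mesa code: index_range_in_bounds
and its parameters are hypothetical names, and it assumes the "pointer"
argument is a byte offset into the bound index buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Mesa: returns true when `count` indices
 * of `index_size` bytes each, starting `offset` bytes into a buffer of
 * `buffer_size` bytes, lie entirely inside the buffer. */
static bool
index_range_in_bounds(size_t buffer_size, size_t offset,
                      size_t count, size_t index_size)
{
   /* A zero element size is meaningless, and an offset past the end
    * leaves no room at all. */
   if (index_size == 0 || offset > buffer_size)
      return false;

   /* Divide rather than multiply so count * index_size cannot overflow. */
   return count <= (buffer_size - offset) / index_size;
}
```

With the 64 byte buffer above, 16 GL_UNSIGNED_INT indices fit at offset 0
but not at offset 60, where only a single index is left in bounds.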


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi  wrote:
> Then I do recommend removing the "if (cpu_has_sse4_1)" from this patch and
> similar places, because there is no runtime CPU dispatching happening for
> SSE optimized code paths in action and just adds extra overhead (unnecessary
> branches) to the generated code.

No. Sorry, I realize I misread your previous question:

>> I guess checking for "cpu_has_sse4_1" is unnecessary if it isn't
>> controllable by user at runtime; because "USE_SSE41" is a compile time check
>> and requires the target machine to be SSE 4.1 capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
to build the code and then use it only on systems that actually
support it.

All of this could have been pretty easily answered by a few greps though...
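
To make the distinction concrete, here is a rough, self-contained sketch of
the two checks. It is not the code from the patch: runtime_has_sse41 and
pick_path are hypothetical names, and the runtime probe assumes a GCC or
Clang build on x86 where the __builtin_cpu_supports builtin is available.

```c
#include <stdbool.h>

enum draw_path { PATH_GENERIC, PATH_SSE41 };

/* Runtime half: ask the CPU the process is actually running on.
 * On other compilers or architectures this sketch just reports false. */
static bool
runtime_has_sse41(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
   return __builtin_cpu_supports("sse4.1") != 0;
#else
   return false;
#endif
}

/* The compile-time flag only says the SSE4.1 path could be built.
 * The path is taken only if the CPU also supports it at runtime. */
static enum draw_path
pick_path(bool built_with_sse41, bool cpu_has_sse41)
{
   return (built_with_sse41 && cpu_has_sse41) ? PATH_SSE41 : PATH_GENERIC;
}
```

A caller would pass the result of runtime_has_sse41() as the second
argument, so a binary built with SSE 4.1 support still falls back to the
generic path on older CPUs.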


Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Ian Romanick
On 11/06/2014 06:16 PM, Michel Dänzer wrote:
> On 06.11.2014 19:18, Joonas Lahtinen wrote:
>> On Thu, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:
>>> On 05.11.2014 20:14, Joonas Lahtinen wrote:

>>>> Modified to not refer to DRI3; just uses the Present extension to get rid
>>>> of the excess buffer invalidations.
>>>
>>> AFAICT there's no fallback from your changes to the current behaviour if
>>> the X server doesn't support the Present extension. There probably needs
>>> to be such a fallback.
>>
>> It gets rid of such a nasty hack (the intel_viewport one) that I thought
>> there was no point in making a fallback. Without this, the egl dri2
>> backend is fundamentally broken anyway.
> 
> Well, AFAICT your code uses Present extension functionality
> unconditionally, without checking that the X server supports Present. I
> can't see how that could possibly work on an X server which doesn't
> support Present, but I think it would be better to keep it working at
> least as badly as it does now in that case. :)

I was going to say pretty much the same thing.  Aren't there (non-Intel)
drivers that don't do Present?  If I'm not mistaken, some parts of DRI3
(not sure about Present) are even disabled in the Intel driver when SNA
is in use... or at least that was the case at one point.



[Mesa-dev] [PATCH] mesa: Remove _mesa_max_buffer_index

2014-11-06 Thread Ian Romanick
From: Ian Romanick 

It appears to be completely unused since f9be8543 (February 2012).

Signed-off-by: Ian Romanick 
Cc: Kenneth Graunke 
---
 src/mesa/main/api_validate.c | 46 
 src/mesa/main/api_validate.h |  6 --
 2 files changed, 52 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 0d206d8..d4b962f 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -77,52 +77,6 @@ index_bytes(GLenum type, GLsizei count)
 
 
 /**
- * Find the max index in the given element/index buffer
- */
-GLuint
-_mesa_max_buffer_index(struct gl_context *ctx, GLuint count, GLenum type,
-                       const void *indices,
-                       struct gl_buffer_object *elementBuf)
-{
-   const GLubyte *map = NULL;
-   GLuint max = 0;
-   GLuint i;
-
-   if (_mesa_is_bufferobj(elementBuf)) {
-      /* elements are in a user-defined buffer object.  need to map it */
-      map = ctx->Driver.MapBufferRange(ctx, 0, elementBuf->Size,
-                                       GL_MAP_READ_BIT, elementBuf,
-                                       MAP_INTERNAL);
-      /* Actual address is the sum of pointers */
-      indices = (const GLvoid *) ADD_POINTERS(map, (const GLubyte *) indices);
-   }
-
-   if (type == GL_UNSIGNED_INT) {
-      for (i = 0; i < count; i++)
-         if (((GLuint *) indices)[i] > max)
-            max = ((GLuint *) indices)[i];
-   }
-   else if (type == GL_UNSIGNED_SHORT) {
-      for (i = 0; i < count; i++)
-         if (((GLushort *) indices)[i] > max)
-            max = ((GLushort *) indices)[i];
-   }
-   else {
-      ASSERT(type == GL_UNSIGNED_BYTE);
-      for (i = 0; i < count; i++)
-         if (((GLubyte *) indices)[i] > max)
-            max = ((GLubyte *) indices)[i];
-   }
-
-   if (map) {
-      ctx->Driver.UnmapBuffer(ctx, elementBuf, MAP_INTERNAL);
-   }
-
-   return max;
-}
-
-
-/**
  * Check if OK to draw arrays/elements.
  */
 static GLboolean
diff --git a/src/mesa/main/api_validate.h b/src/mesa/main/api_validate.h
index 8238df1..0bb91c6 100644
--- a/src/mesa/main/api_validate.h
+++ b/src/mesa/main/api_validate.h
@@ -35,12 +35,6 @@ struct gl_context;
 struct gl_transform_feedback_object;
 
 
-extern GLuint
-_mesa_max_buffer_index(struct gl_context *ctx, GLuint count, GLenum type,
-                       const void *indices,
-                       struct gl_buffer_object *elementBuf);
-
-
 extern bool
 _mesa_is_valid_prim_mode(struct gl_context *ctx, GLenum mode);
 
-- 
1.8.1.4



Re: [Mesa-dev] Bogus bounds checking in api_validate.c?

2014-11-06 Thread Kenneth Graunke
On Thursday, November 06, 2014 08:09:18 PM Ian Romanick wrote:
> While working on some other things, I came across some bounds checking
> code in _mesa_validate_DrawElements (and related functions) in
> api_validate.c.
> 
>   /* use indices in the buffer object */
>   /* make sure count doesn't go outside buffer bounds */
>   if (index_bytes(type, count) > ctx->Array.VAO->IndexBufferObj->Size) {
>  _mesa_warning(ctx, "glDrawElements index out of buffer bounds");
>  return GL_FALSE;
>   }
> 
> index_bytes calculates how many bytes of data "count" indices will
> occupy based on the type.  The problem is that this doesn't consider
> the base pointer.  As far as I can tell, if I had a 64 byte buffer
> object for my index data, and I did
> 
> glDrawElements(GL_POINTS, 16, GL_UNSIGNED_INT, 60);
> 
> _mesa_validate_DrawElements would say, "Ok!"
> 
> Am I missing something, or is this just broken?

It sure seems broken to me - but, thankfully, in a conservative fashion.  (It 
will say some invalid things are OK, but won't say legal things are invalid.)

Software drivers may be relying on this working to avoid a crash.

I checked the Ivybridge documentation, and found:

"Software is responsible for ensuring that accesses outside the IB do not
 occur. This is possible as software can compute the range of IB values
 referenced by a 3DPRIMITIVE command (knowing the StartVertexLocation,
 InstanceCount, and VerticesPerInstance values) and can then compare this
 range to the IB extent."

which makes it sound like an accurate computation is necessary.  But, right 
below that, it says:

"this field contains the address of the last valid byte in the index buffer.
 Any index buffer reads past this address returns an index value of 0 (as if
 the index buffer was zero-extended)."

So the earlier statement is false; i965 will draw the in-bounds elements 
correctly, and then repeat element 0 over and over for any out-of-bounds data, 
resulting in one strange primitive and a lot of degenerate ones.
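
As a rough model of that documented behaviour (a sketch only, not driver
code; fetch_index_zero_extended is a hypothetical name and the buffer is
assumed to hold 32-bit indices):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the quoted hardware behaviour: reads past the last
 * valid element return index 0, as if the buffer were zero-extended. */
static uint32_t
fetch_index_zero_extended(const uint32_t *ib, size_t ib_count, size_t i)
{
   return i < ib_count ? ib[i] : 0;
}
```

In Ian's example (64 byte buffer, offset 60, 16 indices) only the first
fetch is in bounds; under this model the other 15 would all come back as
index 0.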

It's probably worth fixing, but I doubt it's critical either.

A more interesting thing to fix, I think, would be enforcing alignment 
restrictions (i.e. your offset has to be a multiple of the IB element size).
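
The alignment rule itself is a one-liner. A minimal sketch, assuming the
offset and element size are both in bytes (index_offset_is_aligned is a
hypothetical name, not an existing Mesa function):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the byte offset into the index buffer must be a
 * multiple of the element size (1, 2 or 4 bytes for the GL index types). */
static bool
index_offset_is_aligned(size_t offset, size_t index_size)
{
   return index_size != 0 && offset % index_size == 0;
}
```

For example, offset 62 would be acceptable for GL_UNSIGNED_SHORT but not
for GL_UNSIGNED_INT.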

--Ken



Re: [Mesa-dev] [PATCH] mesa: Remove _mesa_max_buffer_index

2014-11-06 Thread Kenneth Graunke
On Thursday, November 06, 2014 11:00:13 PM Ian Romanick wrote:
> From: Ian Romanick 
> 
> It appears to be completely unused since f9be8543 (February 2012).
> 
> Signed-off-by: Ian Romanick 
> Cc: Kenneth Graunke 

Yep, looks unused to me.

Reviewed-by: Kenneth Graunke 

My 2012 commit message in f9be8543 is not quite true - glDrawRangeElements is 
actually useful even when working with VBOs.  The driver can use the given 
range to decide what portion of a VBO is busy/going to be read by the GPU.  It 
can then optimize glBufferSubData and glMapBufferRange to avoid unnecessary 
synchronization when an application tries to overwrite an unused subregion.

It turns out that's also rather crucial for performance in many modern apps.
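
A rough sketch of that idea, under simplifying assumptions: a single
tightly packed vertex buffer with a fixed stride, and the inclusive
start/end range taken from glDrawRangeElements. The function name
write_needs_sync and its parameters are hypothetical, not a driver
interface, and overflow of the byte arithmetic is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does a write of `write_size` bytes at
 * `write_offset` touch the bytes a pending draw may read, given the
 * inclusive vertex range [start_index, end_index] and a fixed stride? */
static bool
write_needs_sync(size_t start_index, size_t end_index, size_t stride,
                 size_t write_offset, size_t write_size)
{
   /* Half-open byte range the GPU may still be reading. */
   const size_t busy_begin = start_index * stride;
   const size_t busy_end = (end_index + 1) * stride;
   const size_t write_end = write_offset + write_size;

   if (write_size == 0)
      return false;

   /* Two half-open ranges overlap when each starts before the other ends. */
   return write_offset < busy_end && busy_begin < write_end;
}
```

With a draw that uses vertices 0..99 at a 16 byte stride, a
glBufferSubData at byte 1600 would not need to wait, while one starting at
byte 1584 would.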

--Ken



Re: [Mesa-dev] [PATCH 1/3] egl_dri2: fix double free on drm platforms

2014-11-06 Thread Kenneth Graunke
On Friday, November 07, 2014 03:50:42 AM Emil Velikov wrote:
> Earlier commit failed to account for the fact that for drm platforms one
> does not call dri2_create_screen, thus it does not create the screen and
> driver_configs but inherits them from the "display" - gbm.
> 
> As such, wrap the cleanup in Platform != _EGL_PLATFORM_DRM to prevent
> the issue and still clean up correctly for non-drm platforms.
> 
> Cc: Kenneth Graunke 
> Cc: Mark Janes 
> Reported-by: Kenneth Graunke 
> Reported-by: Mark Janes 
> Signed-off-by: Emil Velikov 
> ---
>  src/egl/drivers/dri2/egl_dri2.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index dcc3239..609afde 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -707,9 +707,18 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp)
>break;
> }
>  
> +   /* The drm platform does not create the screen/driver_configs but reuses
> +* the ones from the gbm device. As such the gbm itself is responsible
> +* for the cleanup. */

Usually */ goes on its own line (same applies to later patches).

> +#ifdef HAVE_DRM_PLATFORM

I don't think you need these #ifdefs - _EGL_PLATFORM_DRM is an enum value
defined by src/egl/main/egldisplay.h (not the public EGL headers) regardless
of build options.  I imagine if you didn't build with HAVE_DRM_PLATFORM,
disp->Platform will never be _EGL_PLATFORM_DRM.

> +   if (disp->Platform != _EGL_PLATFORM_DRM) {
> +#endif
> for (i = 0; dri2_dpy->driver_configs[i]; i++)
>free((__DRIconfig *) dri2_dpy->driver_configs[i]);
> free(dri2_dpy->driver_configs);
> +#ifdef HAVE_DRM_PLATFORM
> +   }
> +#endif
> free(dri2_dpy);
> disp->DriverData = NULL;

Thanks for fixing this so quickly!  The series fixes my crashes, and is:
Reviewed-and-tested-by: Kenneth Graunke 

with, of course, the caveat that I don't know the EGL code worth beans. :)

--Ken
