date:20150708

[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1

2015-07-08 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91254

--- Comment #4 from Tomasz C.  ---
I made a test the bisection method, here's the result:

0e0e23ef537c9add672ff322f34e129a07edc55e is the first bad commit
commit 0e0e23ef537c9add672ff322f34e129a07edc55e
Author: Jordan Justen 
Date:   Wed Apr 22 11:43:50 2015 -0700

i965/state: Emit pipeline select when changing pipelines

Signed-off-by: Jordan Justen 
Reviewed-by: Kenneth Graunke 

:04 04 725572870b90c40592e3d382faef748469403f67
ae158c5c1fc9c0487dbfad8b338ebdb27300c0a1 M  src


It will be helpful?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-08 Thread Chris Wilson

On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote:
> Hi Chris,
> 
> I made a genuine effort to review this patch, hoping to better understand
> the various changes and what you were trying to accomplish.  I spent many
> hours reading and trying to enumerate changes - or potential changes I
> needed to look hard at to convince myself whether they were correct.
> 
> I came up with a frighteningly long list of changes:

* Locking completely overhauled.
 
> * Relocation handling changes considerably (the original point of
>   Kristian's endeavour which led up to this).

It's been a long time since 2011.
 
> * Fencing, busy tracking, and sync objects are completely reworked.

sync obje
That's a slight overstatement.
 
> * Render-to-texture cache flushing and dirty buffer tracking is
>   completely reworked.

And immensely simplified. A hash table? Did you notice that the current
dirty tracking misses where the same buffer is rewritten?
 
> * Gen7 SOL buffer offset resetting now uses MI_LOAD_REGISTER_IMM rather
>   than the execbuf2 parameter, requiring the command validator on Haswell.
>   This effectively bumps the kernel requirement from v3.6 to v4.2-rc1,
>   which will simply not fly with distributions at this time.

My bad, completely missed that there was EXT_transform_feedback that only
did the subset. I wondered why you had code to the pipelined loads/save
but then still flushed the batch every time.
 
> * glBufferSubData() now uses intel_upload_data() rather than allocating
>   a temporary BO.  This is the first use of the upload buffer by the
>   BLT engine, and could imply that the upload buffer's lifetime now
>   extends across batches - longer than before.  Separable change that
>   requires separate evaluation and justification.

Yes and no, it is easy to demonstrate that the buffer's lifetime is
longer than a batch current.
 
> * Per buffer cache-coherency checking rather than brw->has_llc?

We've pointed out the bugs in the current usage of brw->has_llc before.
 
> * glBufferSubData()'s prefer_stall_to_blit flag appears to depend on
>   per-buffer cache-coherency rather than being set globally.  Could
>   impact performance of buffer uploads.
> 
> * Potential missing flushes (which can cause hangs or misrendering):
> 
>   - It looks like calling brw_bo_busy() with BUSY_FLUSH causes a flush
> when necessary.  However, some instances of the old bo_busy,
> bo_references, batch_flush pattern are replaced without that flag.
> One occurrance was in BufferSubData(); I did not spend time to
> check every case.

The change is still accurate.

>   - Flushes are often done implicitly by e.g. brw_bo_read calling
> brw_bo_map with the appropriate flags, and many explicit checks
> and flushes are removed.  Not bad, but needs careful review.
>
>   - Gen6+ query object code might have dropped an implicit flush
> guaranteeing that when the GL application requests the result,
> any pending work will be kicked off so they can poll/spin
> repeatedly until the result arrives.

The flush is not missing.
 
>   - New code to avoid redundant flushes.
 
> * perf_debug() warnings are removed all over the code for some reason:

The problem is perf_debug() was a layering violation (my goal was to
keep brw_batch a mini-library, either to extract it back to libdrm or
for reuse). I have an idea of using a macro like

#define PERF_DEBUG_LOC(brw, str) ((brw->perf_debug ? &(struct loc){__FILE__,
__FUNCTION__, __LINE__, str)} : NULL))

brw_bo_map(..., PERF_DEBUG_LOC(brw));

Then pass back to a brw_context callback if a function stalls.

>   - Unsynchronized maps/BufferSubData not working on !LLC platforms?
> If they work now, that's a huge change!  If not, why drop the warning?

They have always worked on !llc. The same caveats with using multiple
aliasing to the same physical address through different modes of write
caching apply to both llc and !llc.
 
>   - Warnings about stalls on mapping buffers and miptrees are gone now.
> These have been useful in tracking down performance problems.  They
> might not always be accurate, but surely removing them should be done
> separately with justification?

See issues with calling perf_debug.
 
>   - Warnings about stalls on query objects are gone.  I've used these when
> analyzing application performance.  Why?

See issues with calling perf_debug.
 
>   - Warnings about implicit flushes are gone.

See issues with calling perf_debug.

It was at the back of my mind that I had to fix perf_debug - if I had
put in the list of issues, I would probably would have remembered to fix
it before posting.
 
> * BO unmap calls appears to be missing in some places.  A few map calls
>   have moved around in hard-to-follow ways.  Unclear how lifetimes of
>   buffers and lifetimes of maps are affected.

Bo unmap is a figment of your imagination. libdrm never unmaps a bo until
it is closed (simply to avoid the cost of refaulting the objec

[Mesa-dev] [PATCH] nir: Fix comment above nir_convert_from_ssa() prototype.

2015-07-08 Thread Kenneth Graunke

Presumably Connor renamed the parameter, inverting the sense.
Update the comment accordingly.

Cc: Connor Abbott 
Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 4cb7d2f..9e2a281 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1659,9 +1659,9 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def 
*b);
 void nir_convert_to_ssa_impl(nir_function_impl *impl);
 void nir_convert_to_ssa(nir_shader *shader);
 
-/* If convert_everything is true, convert all values (even those not involved
- * in a phi node) to registers. If false, only convert SSA values involved in
- * phi nodes to registers.
+/* If phi_webs_only is true, only convert SSA values involved in phi nodes to
+ * registers.  If false, convert all values (even those not involved in a phi
+ * node) to registers.
  */
 void nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only);
 
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices

2015-07-08 Thread Chris Wilson

On Tue, Jul 07, 2015 at 01:42:47PM +0300, Martin Peres wrote:
> On 06/07/15 19:43, Kenneth Graunke wrote:
> >On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote:
> >>Rather than allocating a fresh page every time we clear a buffer, keep
> >>that page around between invocations by tracking the last used offset
> >>and only allocating a fresh page when we wrap.
> >>
> >>Signed-off-by: Chris Wilson 
> >>---
> >>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++---
> >>  1 file changed, 14 insertions(+), 3 deletions(-)
> >This looks okay to me.  Do you have any performance data to justify the
> >extra complexity?
> 
> I actually get a negative performance improvement on a customer
> benchmark (-1.3%). Could it be because we are waiting on the VBO at
> some point?
> 
> What benchmark did you try to get a perf improvement?

For this something as simple and cpu bound as windowed glxgears:
14550 -> 15400 (+-30fps both sides).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1

2015-07-08 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91254

--- Comment #5 from Tomasz C.  ---
This confirmed yet by compiling 10.6.1 of reverse patch:
https://git.thm.de/tjkl80/mesa/commit/0e0e23ef537c9add672ff322f34e129a07edc55e.patch

It works correctly.
Do you expect the correction in the next version?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once

2015-07-08 Thread Chris Wilson

On Mon, Jul 06, 2015 at 10:01:21AM -0700, Kenneth Graunke wrote:
> On Monday, July 06, 2015 05:12:10 PM Chris Wilson wrote:
> > On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote:
> > > 
> > > 
> > > On 06/07/15 16:15, Martin Peres wrote:
> > > >On 06/07/15 16:13, Chris Wilson wrote:
> > > >>On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote:
> > > >>>On 06/07/15 13:33, Chris Wilson wrote:
> > > Move the query for the TIMESTAMP register from context init to the
> > > screen, so that it is only queried once for all contexts.
> > > 
> > > On 32bit systems, some old kernels trigger a hw bug resulting in the
> > > TIMESTAMP register being shifted and the low bits always zero. Detect
> > > this by repeating the read a few times and check the register is
> > > incrementing.
> > > >>>You do not do the latter. You only check for the low bits.
> > > >>>
> > > >>>I guess the counter is supposed to be monotonically increasing and
> > > >>>with a resolution of a few microseconds which would make this
> > > >>>perfectly valid. Could you confirm and make sure to add this
> > > >>>information in the commit message please?
> > > >>The counter should increment every 80ns. What's misleading in what I
> > > >>wrote? It describes the hw bug and how to detect it.
> > > >
> > > >Well, it is not misleading, it just lacks this information.
> > > >
> > > >If it incremented every seconds, the patch would be stupid because
> > > >the timestamp could be at 0 and polling 10 times at a few us of
> > > >interval would always yield the same result. That's all :)
> > > 
> > > Oh, forgot to say: With this information added in the commit message
> > > and the commit message duplicated as a comment in
> > > intel_detect_timestamp(), the patch is:
> > 
> > How about:
> > 
> > On 32bit systems, some old kernels trigger a hw bug resulting in the
> > TIMESTAMP register being shifted and the low 32bits always zero. Detect
> > this by repeating the read a few times and check the register is
> > incrementing every 80ns as expected and not stuck on zero (as would be 
> > the case with the buggy kernel/hw.).
> > -Chris
> > 
> > 
> 
> It would be great to put this in a comment above the loop rather than
> only in the commit message.
> 
> Also, moving the check to screen-time and fixing the check to use a loop
> are really two separate things - but they're so trivial I'm inclined to
> not be picky. :)
> 
> Reviewed-by: Kenneth Graunke 

Pushed,
To ssh://git.freedesktop.org/git/mesa/mesa
   38c2ec5..c8d3eba  c8d3ebaffc0d7d915c1c19d54dba61fd1e57b338 -> master
Thanks for the review,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-08 Thread Emil Velikov

On 08/07/15 07:33, Matt Turner wrote:
> Cc: "10.6" 
Thanks for that I've completely forgot about this.

Seems like we never got to using mesa_sha. Can we just remove it for now
and revive it as a user comes along ?

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts

2015-07-08 Thread Chris Wilson

On Mon, Jul 06, 2015 at 09:34:10AM -0700, Kenneth Graunke wrote:
> On Monday, July 06, 2015 11:33:08 AM Chris Wilson wrote:
> > Since the workaround bo is used strictly as a write-only buffer, we need
> > only allocate one per screen and use the same one from all contexts.
> > 
> > (The caveat here is during extension initialisation, where we write into
> > and read back register values from the buffer, but that is performed only
> > once for the first context - and baring synchronisation issues should not
> > be a problem. Safer would be to move that also to the screen.)
> > 
> > Signed-off-by: Chris Wilson 
> > ---
> >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++---
> >  src/mesa/drivers/dri/i965/intel_screen.c | 4 
> >  src/mesa/drivers/dri/i965/intel_screen.h | 1 +
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
> > b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > index 7ee3cb6..05e14cd 100644
> > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw,
> >  * the gen6 workaround because it involves actually writing to
> >  * the buffer, and the kernel doesn't let us write to the batch.
> >  */
> > -   brw->workaround_bo = drm_intel_bo_alloc(brw->bufmgr,
> > -   "pipe_control workaround",
> > -   4096, 4096);
> > +   brw->workaround_bo = brw->intelScreen->workaround_bo;
> > if (brw->workaround_bo == NULL)
> >return -ENOMEM;
> 
> Checking for out-of-memory conditions in code that doesn't actually
> allocate anything looks funky now.  I'd be inclined just to drop the
> -ENOMEM path and make this a void function.
> 
> Alternatively, you could just fold this into the brw_context setup and
> ditch the functions altogether.  Up to you.

Ok, will refactor this since the check should now be at the screen.
> >  
> > +   drm_intel_bo_reference(brw->workaround_bo);
> > +
> > brw->pipe_controls_since_last_cs_stall = 0;
> >  
> > return 0;
> > diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
> > b/src/mesa/drivers/dri/i965/intel_screen.c
> > index 839a984..cd8e6eb 100644
> > --- a/src/mesa/drivers/dri/i965/intel_screen.c
> > +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> > @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
> >  {
> > struct intel_screen *intelScreen = sPriv->driverPrivate;
> >  
> > +   drm_intel_bo_unreference(intelScreen->workaround_bo);
> > dri_bufmgr_destroy(intelScreen->bufmgr);
> > driDestroyOptionInfo(&intelScreen->optionCache);
> >  
> > @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
> >return false;
> > }
> >  
> > +   intelScreen->workaround_bo =
> > +  drm_intel_bo_alloc(intelScreen->bufmgr, "pipe_control w/a", 4096, 
> > 4096);
> > +
> 
> Seems a little funny to put this in intel_init_bufmgr, since it's not
> setting up libdrm - why not put it in the caller?

No reason. I was considering this to be global state associated with the
bufmgr so just liked to keep it together.

Thanks,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 17/18] loader: Look for any version of currently linked libudev.so

2015-07-08 Thread Chris Wilson

On Tue, Jul 07, 2015 at 08:40:12PM +0100, Emil Velikov wrote:
> On 06/07/15 11:33, Chris Wilson wrote:
> > Since there was an ABI break and linking twice against libudev.so.0 and
> > libudev.so.1 causes the application to quickly crash, we first check if
> > the application is currently linked against libudev before dlopening a
> > local handle. However for backwards/forwards compatability, we need to
> > inspect the application for current linkage against all known versions
> > first.
> > 
> > Signed-off-by: Chris Wilson 
> I'm ever so slightly conserned that RTLD_NOLOAD is not part of the POSIX
> standard, thus it's missing on some platforms (*BSD seems ok, while
> Solaris, MacOS are not).
> 
> Then again this code is not build for them so we are safe. Plus it does
> save nastry crashes :-) Feel free to add the Cc: mesa-stable tag.
> 
> Reviewed-by: Emil Velikov 

Included your addendum as a reference for the future and pushed,
To ssh://git.freedesktop.org/git/mesa/mesa
   c8d3eba..f241345  f2413457937f8f4a92e11379569be69e508d7477 -> maste
Thanks for the review.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control

2015-07-08 Thread Chris Wilson

On Mon, Jul 06, 2015 at 09:30:48AM -0700, Kenneth Graunke wrote:
> On Monday, July 06, 2015 11:33:07 AM Chris Wilson wrote:
> > With the exception of gen8, the sole user of the workaround bo are for
> > emitting pipe controls. Move it out of the purview of the batchbuffer
> > and into the pipecontrol.
> > 
> > Signed-off-by: Chris Wilson 
> 
> Reviewed-by: Kenneth Graunke 

Pushed,
To ssh://git.freedesktop.org/git/mesa/mesa
   f241345..f1d08c4  f1d08c4f75794add30d1714a4cd9ce2bf335148d -> master
Thanks for the review,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: Fix comment above nir_convert_from_ssa() prototype.

2015-07-08 Thread Connor Abbott

Whoops!

Reviewed-by: Connor Abbott 

On Wed, Jul 8, 2015 at 1:57 AM, Kenneth Graunke  wrote:
> Presumably Connor renamed the parameter, inverting the sense.
> Update the comment accordingly.
>
> Cc: Connor Abbott 
> Signed-off-by: Kenneth Graunke 
> ---
>  src/glsl/nir/nir.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 4cb7d2f..9e2a281 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1659,9 +1659,9 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def 
> *b);
>  void nir_convert_to_ssa_impl(nir_function_impl *impl);
>  void nir_convert_to_ssa(nir_shader *shader);
>
> -/* If convert_everything is true, convert all values (even those not involved
> - * in a phi node) to registers. If false, only convert SSA values involved in
> - * phi nodes to registers.
> +/* If phi_webs_only is true, only convert SSA values involved in phi nodes to
> + * registers.  If false, convert all values (even those not involved in a phi
> + * node) to registers.
>   */
>  void nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only);
>
> --
> 2.4.5
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] i965: Move the pipelined test for SO register access to the screen

2015-07-08 Thread Chris Wilson

Moving the test to the screen places it alongside the other global HW
feature tesst that want to be shared between contexts.

Signed-off-by: Chris Wilson 
Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_context.c  |  1 +
 src/mesa/drivers/dri/i965/intel_extensions.c | 69 ---
 src/mesa/drivers/dri/i965/intel_screen.c | 99 
 src/mesa/drivers/dri/i965/intel_screen.h |  7 ++
 4 files changed, 107 insertions(+), 69 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 3dac01b..705f042 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -728,6 +728,7 @@ brwCreateContext(gl_api api,
 
brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil;
brw->has_swizzling = screen->hw_has_swizzling;
+   brw->has_pipelined_so = screen->hw_has_pipelined_so;
 
brw->vs.base.stage = MESA_SHADER_VERTEX;
brw->gs.base.stage = MESA_SHADER_GEOMETRY;
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 489a00f..1414015 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -32,74 +32,6 @@
 #include "intel_reg.h"
 #include "utils.h"
 
-/**
- * Test if we can use MI_LOAD_REGISTER_MEM from an untrusted batchbuffer.
- *
- * Some combinations of hardware and kernel versions allow this feature,
- * while others don't.  Instead of trying to enumerate every case, just
- * try and write a register and see if works.
- */
-static bool
-can_do_pipelined_register_writes(struct brw_context *brw)
-{
-   /* Supposedly, Broadwell just works. */
-   if (brw->gen >= 8)
-  return true;
-
-   static int result = -1;
-   if (result != -1)
-  return result;
-
-   /* We use SO_WRITE_OFFSET0 since you're supposed to write it (unlike the
-* statistics registers), and we already reset it to zero before using it.
-*/
-   const int reg = GEN7_SO_WRITE_OFFSET(0);
-   const int expected_value = 0x1337d0d0;
-   const int offset = 100;
-
-   /* The register we picked only exists on Gen7+. */
-   assert(brw->gen == 7);
-
-   uint32_t *data;
-   /* Set a value in a BO to a known quantity.  The workaround BO already
-* exists and doesn't contain anything important, so we may as well use it.
-*/
-   drm_intel_bo_map(brw->workaround_bo, true);
-   data = brw->workaround_bo->virtual;
-   data[offset] = 0x;
-   drm_intel_bo_unmap(brw->workaround_bo);
-
-   /* Write the register. */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-   OUT_BATCH(reg);
-   OUT_BATCH(expected_value);
-   ADVANCE_BATCH();
-
-   brw_emit_mi_flush(brw);
-
-   /* Save the register's value back to the buffer. */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));
-   OUT_BATCH(reg);
-   OUT_RELOC(brw->workaround_bo,
- I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
- offset * sizeof(uint32_t));
-   ADVANCE_BATCH();
-
-   intel_batchbuffer_flush(brw);
-
-   /* Check whether the value got written. */
-   drm_intel_bo_map(brw->workaround_bo, false);
-   data = brw->workaround_bo->virtual;
-   bool success = data[offset] == expected_value;
-   drm_intel_bo_unmap(brw->workaround_bo);
-
-   result = success;
-
-   return success;
-}
-
 static bool
 can_write_oacontrol(struct brw_context *brw)
 {
@@ -327,7 +259,6 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_texture_compression_bptc = true;
   ctx->Extensions.ARB_texture_view = true;
 
-  brw->has_pipelined_so = can_do_pipelined_register_writes(brw);
   if (brw->has_pipelined_so) {
  ctx->Extensions.ARB_draw_indirect = true;
  ctx->Extensions.ARB_transform_feedback2 = true;
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 71f2957..9067399 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1157,6 +1157,104 @@ intel_detect_timestamp(struct intel_screen *screen)
 }
 
 /**
+ * Test if we can use MI_LOAD_REGISTER_MEM from an untrusted batchbuffer.
+ *
+ * Some combinations of hardware and kernel versions allow this feature,
+ * while others don't.  Instead of trying to enumerate every case, just
+ * try and write a register and see if works.
+ */
+static bool
+intel_detect_pipelined_register(struct intel_screen *screen,
+   int reg, uint32_t expected_value)
+{
+   const int offset = 100;
+
+   drm_intel_bo *bo;
+   uint32_t buf[100];
+   uint32_t *batch = buf;
+
+   uint32_t *data;
+   /* Set a value in a BO to a known quantity.  The workaround BO already
+* exists and doesn't contain anything important, so we may as well use it.
+*/
+   if (drm_intel_bo_map(screen->workaround_bo, true))
+  return false;
+
+   data = screen->workaround_bo->virtual;

[Mesa-dev] [PATCH 2/5] i965: Share the workaround bo between all contexts

2015-07-08 Thread Chris Wilson

Since the workaround bo is used strictly as a write-only buffer, we need
only allocate one per screen and use the same one from all contexts.

(The caveat here is during extension initialisation, where we write into
and read back register values from the buffer, but that is performed only
once for the first context - and baring synchronisation issues should not
be a problem. Safer would be to move that also to the screen.)

v2: Give the workaround bo its own init function and don't piggy back
intel_bufmgr_init() since it is not that related.

Signed-off-by: Chris Wilson 
Cc: Kenneth Graunke 
Cc: Martin Peres 
---
 src/mesa/drivers/dri/i965/brw_context.c  |  7 +--
 src/mesa/drivers/dri/i965/brw_context.h  |  4 ++--
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 13 -
 src/mesa/drivers/dri/i965/intel_screen.c | 15 +++
 src/mesa/drivers/dri/i965/intel_screen.h |  1 +
 5 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 8150b94..3dac01b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -819,12 +819,7 @@ brwCreateContext(gl_api api,
   }
}
 
-   if (brw_init_pipe_control(brw, devinfo)) {
-  *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY;
-  intelDestroyContext(driContextPriv);
-  return false;
-   }
-
+   brw_init_pipe_control(brw, devinfo);
brw_init_state(brw);
 
intelInitExtensions(ctx);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7f73b10..3397336 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -2002,8 +2002,8 @@ gen9_use_linear_1d_layout(const struct brw_context *brw,
   const struct intel_mipmap_tree *mt);
 
 /* brw_pipe_control.c */
-int brw_init_pipe_control(struct brw_context *brw,
- const struct brw_device_info *info);
+void brw_init_pipe_control(struct brw_context *brw,
+   const struct brw_device_info *info);
 void brw_fini_pipe_control(struct brw_context *brw);
 
 void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 7ee3cb6..872bfe8 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -330,26 +330,21 @@ brw_emit_mi_flush(struct brw_context *brw)
brw_render_cache_set_clear(brw);
 }
 
-int
+void
 brw_init_pipe_control(struct brw_context *brw,
   const struct brw_device_info *devinfo)
 {
if (devinfo->gen < 6)
-  return 0;
+  return;
 
/* We can't just use brw_state_batch to get a chunk of space for
 * the gen6 workaround because it involves actually writing to
 * the buffer, and the kernel doesn't let us write to the batch.
 */
-   brw->workaround_bo = drm_intel_bo_alloc(brw->bufmgr,
-   "pipe_control workaround",
-   4096, 4096);
-   if (brw->workaround_bo == NULL)
-  return -ENOMEM;
+   brw->workaround_bo = brw->intelScreen->workaround_bo;
+   drm_intel_bo_reference(brw->workaround_bo);
 
brw->pipe_controls_since_last_cs_stall = 0;
-
-   return 0;
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index c0f5c92..71f2957 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv)
 {
struct intel_screen *intelScreen = sPriv->driverPrivate;
 
+   drm_intel_bo_unreference(intelScreen->workaround_bo);
dri_bufmgr_destroy(intelScreen->bufmgr);
driDestroyOptionInfo(&intelScreen->optionCache);
 
@@ -1100,6 +1101,17 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
 }
 
 static bool
+intel_init_workaround_bo(struct intel_screen *intelScreen)
+{
+   /* A small scratch bo shared by all contexts, primarily used
+* for doing PIPECONTROL serialisation writes that are discarded.
+*/
+   intelScreen->workaround_bo =
+  drm_intel_bo_alloc(intelScreen->bufmgr, "pipe_control w/a", 4096, 4096);
+   return intelScreen->workaround_bo != NULL;
+}
+
+static bool
 intel_detect_swizzling(struct intel_screen *screen)
 {
drm_intel_bo *buffer;
@@ -1388,6 +1400,9 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
if (!intel_init_bufmgr(intelScreen))
return false;
 
+   if (!intel_init_workaround_bo(intelScreen))
+   return false;
+
intelScreen->deviceID = drm_intel_bufmgr_gem_get_devid(intelScreen->bufmgr);
intelScreen->devinfo = brw_get_device_info(intelScreen->deviceID,
   brw_get_revision(psp->fd));
diff --git a/src/mesa/drivers/dri/i965/intel_scre

[Mesa-dev] [PATCH 1/5] i965: Only flush the batchbuffer if we need to zero the SO offsets

2015-07-08 Thread Chris Wilson

If we don't have pipelined register access (e.g. Haswell before kernel
v4.2), then we can only implement EXT_transform_feedback by reseting the
SO offsets *between* batches. However, if we do have pipelined access to
the SO registers on gen7, we can simply emit an inline reset of the SO
registers without a full batch flush.

Signed-off-by: Chris Wilson 
Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/gen7_sol_state.c   | 12 +++-
 src/mesa/drivers/dri/i965/intel_extensions.c |  3 ++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 65f34c3..7f73b10 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1125,6 +1125,7 @@ struct brw_context
bool must_use_separate_stencil;
bool has_llc;
bool has_swizzling;
+   bool has_pipelined_so;
bool has_surface_tile_offset;
bool has_compr4;
bool has_negative_rhw_bug;
diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 41573a8..140752c 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -446,7 +446,7 @@ gen7_begin_transform_feedback(struct gl_context *ctx, 
GLenum mode,
/* Reset the SO buffer offsets to 0. */
if (brw->gen >= 8) {
   brw_obj->zero_offsets = true;
-   } else {
+   } else if (!brw->has_pipelined_so) {
   intel_batchbuffer_flush(brw);
   brw->batch.needs_sol_reset = true;
}
@@ -462,6 +462,16 @@ gen7_begin_transform_feedback(struct gl_context *ctx, 
GLenum mode,
   brw_obj->prims_generated[i] = 0;
}
 
+   if (brw->gen == 7 && brw->has_pipelined_so) {
+  for (int i = 0; i < 4; i++) {
+ BEGIN_BATCH(3);
+ OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
+ OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
+ OUT_BATCH(0);
+ ADVANCE_BATCH();
+  }
+   }
+
/* Store the starting value of the SO_NUM_PRIMS_WRITTEN counters. */
gen7_save_primitives_written_counters(brw, brw_obj);
 
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 6b3bd12..489a00f 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -327,7 +327,8 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_texture_compression_bptc = true;
   ctx->Extensions.ARB_texture_view = true;
 
-  if (can_do_pipelined_register_writes(brw)) {
+  brw->has_pipelined_so = can_do_pipelined_register_writes(brw);
+  if (brw->has_pipelined_so) {
  ctx->Extensions.ARB_draw_indirect = true;
  ctx->Extensions.ARB_transform_feedback2 = true;
  ctx->Extensions.ARB_transform_feedback3 = true;
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/5] i965: Combine the multiple pipelined register detection into one round-trip

2015-07-08 Thread Chris Wilson

Combining the multiple access checks into a few batches and a single
serialising read can reduce detection times from around 100us to 70us on
a fast Haswell system.

Signed-off-by: Chris Wilson 
Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_screen.c | 177 +++
 1 file changed, 109 insertions(+), 68 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index cb49e9a..595d2dc 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1156,6 +1156,12 @@ intel_detect_timestamp(struct intel_screen *screen)
return loop > 0;
 }
 
+struct detect_pipelined_register {
+   uint32_t reg;
+   uint32_t expected_value;
+   bool *result;
+};
+
 /**
  * Test if we can use MI_LOAD_REGISTER_MEM from an untrusted batchbuffer.
  *
@@ -1163,107 +1169,143 @@ intel_detect_timestamp(struct intel_screen *screen)
  * while others don't.  Instead of trying to enumerate every case, just
  * try and write a register and see if works.
  */
-static bool
-intel_detect_pipelined_register(struct intel_screen *screen,
-   int reg, uint32_t expected_value)
+static void
+__intel_detect_pipelined_registers(struct intel_screen *screen,
+   struct detect_pipelined_register *r,
+   int count)
 {
const int offset = 100;
-
-   drm_intel_bo *bo;
-   uint32_t buf[100];
-   uint32_t *batch = buf;
+   int i;
 
uint32_t *data;
+
+   if (count == 0)
+  return;
+
+   if (drm_intel_bo_map(screen->workaround_bo, true))
+  return;
+
/* Set a value in a BO to a known quantity.  The workaround BO already
 * exists and doesn't contain anything important, so we may as well use it.
 */
-   if (drm_intel_bo_map(screen->workaround_bo, true))
-  return false;
-
data = screen->workaround_bo->virtual;
-   data[offset] = 0x;
+   for (i = 0; i < count; i++)
+  data[offset+i] = 0x;
drm_intel_bo_unmap(screen->workaround_bo);
 
-   bo = drm_intel_bo_alloc(screen->bufmgr, "batchbuffer", 4096, 0);
-   if (bo == NULL)
-  return false;
+   /* Emit each access in a separate batch buffer so that if the kernel
+* rejects an individual access attempt, we don't incorrectly assume
+* all the register accesses are invalid.
+*/
+   for (i = 0; i < count; i++) {
+  drm_intel_bo *bo;
+  uint32_t buf[100];
+  uint32_t *batch = buf;
+
+  bo = drm_intel_bo_alloc(screen->bufmgr, "batchbuffer", 4096, 0);
+  if (bo == NULL)
+ continue;
+
+  /* Write the register. */
+  *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2);
+  *batch++ = r[i].reg;
+  *batch++ = r[i].expected_value;
+
+  /* Force a command barrier between the write then read */
+  *batch++ = _3DSTATE_PIPE_CONTROL | (5 - 2);
+  *batch++ = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_CS_STALL;
+  *batch++ = 0;
+  *batch++ = 0;
+  *batch++ = 0;
 
-   /* Write the register. */
-   *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2);
-   *batch++ = reg;
-   *batch++ = expected_value;
-
-   /* Force a command barrier between the write then read */
-   *batch++ = _3DSTATE_PIPE_CONTROL | (5 - 2);
-   *batch++ = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_CS_STALL;
-   *batch++ = 0;
-   *batch++ = 0;
-   *batch++ = 0;
-
-   /* Save the register's value back to the buffer. */
-   *batch++ = MI_STORE_REGISTER_MEM | (3 - 2);
-   *batch++ = reg;
-   drm_intel_bo_emit_reloc(bo, (char *)batch -(char *)buf,
-   screen->workaround_bo, offset*sizeof(uint32_t),
-   I915_GEM_DOMAIN_INSTRUCTION,
-   I915_GEM_DOMAIN_INSTRUCTION);
-   *batch++ = screen->workaround_bo->offset + offset*sizeof(uint32_t);
-
-   /* And afterwards clear the register */
-   *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2);
-   *batch++ = reg;
-   *batch++ = 0;
-
-   *batch++ = MI_BATCH_BUFFER_END;
-   if ((batch - buf) & 1)
+  /* Save the register's value back to the buffer. */
+  *batch++ = MI_STORE_REGISTER_MEM | (3 - 2);
+  *batch++ = r[i].reg;
+  drm_intel_bo_emit_reloc(bo, (char *)batch -(char *)buf,
+  screen->workaround_bo,
+  (offset+i)*sizeof(uint32_t),
+  I915_GEM_DOMAIN_INSTRUCTION,
+  I915_GEM_DOMAIN_INSTRUCTION);
+  *batch++ = screen->workaround_bo->offset + (offset+i)*sizeof(uint32_t);
+
+  /* And afterwards clear the register */
+  *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2);
+  *batch++ = r[i].reg;
   *batch++ = 0;
 
-   if (drm_intel_bo_subdata(bo, 0, (char *)batch - (char *)buf, buf) == 0)
-  drm_intel_bo_mrb_exec(bo, (char *)batch - (char *)buf,
-NULL, 0, 0,
-I915_EXEC_RENDER);
+  *batch++ = MI_BATCH_BUFFER_END;
+  if ((batch - buf) & 1)
+

[Mesa-dev] [PATCH 4/5] i965: Move the OACONTROL pipelined access check from context to screen

2015-07-08 Thread Chris Wilson

Similarly to the pipelined SO_OFFSET check, this moves the global HW
compatability check to the screen next to the other global checks.

Signed-off-by: Chris Wilson 
---
 src/mesa/drivers/dri/i965/brw_context.c  |  1 +
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/intel_extensions.c | 67 +---
 src/mesa/drivers/dri/i965/intel_screen.c | 14 ++
 src/mesa/drivers/dri/i965/intel_screen.h |  1 +
 5 files changed, 18 insertions(+), 66 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 705f042..de939f1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -729,6 +729,7 @@ brwCreateContext(gl_api api,
brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil;
brw->has_swizzling = screen->hw_has_swizzling;
brw->has_pipelined_so = screen->hw_has_pipelined_so;
+   brw->has_pipelined_oacontrol = screen->hw_has_pipelined_oacontrol;
 
brw->vs.base.stage = MESA_SHADER_VERTEX;
brw->gs.base.stage = MESA_SHADER_GEOMETRY;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3397336..61d55ff 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1126,6 +1126,7 @@ struct brw_context
bool has_llc;
bool has_swizzling;
bool has_pipelined_so;
+   bool has_pipelined_oacontrol;
bool has_surface_tile_offset;
bool has_compr4;
bool has_negative_rhw_bug;
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 1414015..3f836c7 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -32,71 +32,6 @@
 #include "intel_reg.h"
 #include "utils.h"
 
-static bool
-can_write_oacontrol(struct brw_context *brw)
-{
-   if (brw->gen < 6 || brw->gen >= 8)
-  return false;
-
-   static int result = -1;
-   if (result != -1)
-  return result;
-
-   /* Set "Select Context ID" to a particular address (which is likely not a
-* context), but leave all counting disabled.  This should be harmless.
-*/
-   const int expected_value = 0x31337000;
-   const int offset = 110;
-
-   uint32_t *data;
-   /* Set a value in a BO to a known quantity.  The workaround BO already
-* exists and doesn't contain anything important, so we may as well use it.
-*/
-   drm_intel_bo_map(brw->workaround_bo, true);
-   data = brw->workaround_bo->virtual;
-   data[offset] = 0x;
-   drm_intel_bo_unmap(brw->workaround_bo);
-
-   /* Write OACONTROL. */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-   OUT_BATCH(OACONTROL);
-   OUT_BATCH(expected_value);
-   ADVANCE_BATCH();
-
-   brw_emit_mi_flush(brw);
-
-   /* Save the register's value back to the buffer. */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));
-   OUT_BATCH(OACONTROL);
-   OUT_RELOC(brw->workaround_bo,
- I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
- offset * sizeof(uint32_t));
-   ADVANCE_BATCH();
-
-   brw_emit_mi_flush(brw);
-
-   /* Set OACONTROL back to zero (everything off). */
-   BEGIN_BATCH(3);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
-   OUT_BATCH(OACONTROL);
-   OUT_BATCH(0);
-   ADVANCE_BATCH();
-
-   intel_batchbuffer_flush(brw);
-
-   /* Check whether the value got written. */
-   drm_intel_bo_map(brw->workaround_bo, false);
-   data = brw->workaround_bo->virtual;
-   bool success = data[offset] == expected_value;
-   drm_intel_bo_unmap(brw->workaround_bo);
-
-   result = success;
-
-   return success;
-}
-
 /**
  * Initializes potential list of extensions if ctx == NULL, or actually enables
  * extensions for a context.
@@ -207,7 +142,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.EXT_shader_integer_mix = ctx->Const.GLSLVersion >= 130;
   ctx->Extensions.EXT_timer_query = true;
 
-  if (brw->gen == 5 || can_write_oacontrol(brw)) {
+  if (brw->gen == 5 || brw->has_pipelined_oacontrol) {
  ctx->Extensions.AMD_performance_monitor = true;
  ctx->Extensions.INTEL_performance_query = true;
   }
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 9067399..cb49e9a 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1254,6 +1254,18 @@ intel_detect_pipelined_so(struct intel_screen *screen)
   0x1337d0d0);
 }
 
+static bool
+intel_detect_pipelined_oacontrol(struct intel_screen *screen)
+{
+   if (screen->devinfo->gen < 6 || screen->devinfo->gen >= 8)
+  return false;
+
+   /* Set "Select Context ID" to a particular address (which is likely not a
+* context), but leave all counting disabled.  This should be harmless.
+*/
+   return intel_detect_pipelined

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-08 Thread Chris Wilson

On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote:
> > * Gen4-5 structure changes.

Did you mean brw_structs.h?

diff --git a/src/mesa/drivers/dri/i965/brw_structs.h 
b/src/mesa/drivers/dri/i965/brw_structs.h
index 55338c0..e167254 100644
--- a/src/mesa/drivers/dri/i965/brw_structs.h
+++ b/src/mesa/drivers/dri/i965/brw_structs.h
@@ -391,13 +391,16 @@ struct brw_sf_unit_state
   unsigned pad3:1;
} thread4;
 
-   struct
+   union
{
+  struct {
 unsigned front_winding:1;
 unsigned viewport_transform:1;
 unsigned pad0:3;
 unsigned sf_viewport_state_offset:27; /* Offset from 
GENERAL_STATE_BASE */
   } sf5;
+  uint32_t dw5;
+   };
 
struct
{
@@ -525,15 +528,17 @@ struct brw_wm_unit_state
struct thread2 thread2;
struct thread3 thread3;
 
+   union {
   struct {
 unsigned stats_enable:1;
 unsigned depth_buffer_clear:1;
 unsigned sampler_count:3;
 unsigned sampler_state_pointer:27;
   } wm4;
+  uint32_t dw4;
+   };
 
-   struct
-   {
+   struct {
   unsigned enable_8_pix:1;
   unsigned enable_16_pix:1;
   unsigned enable_32_pix:1;
diff --git a/src/mesa/drivers/dri/i965/brw_urb.c 
b/src/mesa/drivers/dri/i965/brw_urb.c

Or something else?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices

2015-07-08 Thread Martin Peres


On 08/07/15 12:01, Chris Wilson wrote:

On Tue, Jul 07, 2015 at 01:42:47PM +0300, Martin Peres wrote:

On 06/07/15 19:43, Kenneth Graunke wrote:

On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote:

Rather than allocating a fresh page every time we clear a buffer, keep
that page around between invocations by tracking the last used offset
and only allocating a fresh page when we wrap.

Signed-off-by: Chris Wilson 
---
  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++---
  1 file changed, 14 insertions(+), 3 deletions(-)

This looks okay to me.  Do you have any performance data to justify the
extra complexity?

I actually get a negative performance improvement on a customer
benchmark (-1.3%). Could it be because we are waiting on the VBO at
some point?

What benchmark did you try to get a perf improvement?

For this something as simple and cpu bound as windowed glxgears:
14550 -> 15400 (+-30fps both sides).
-Chris


Thanks!

I re-run the tests on the same kernel (4.0.7-2-ARCH) and another one 
(drm-intel-nighly-06-07-2015) and I can't reproduce the hit anymore... 
The hit was there, visible over the 30 rounds of the test, so this is 
weird. Let's just ignore this.


Martin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome

2015-07-08 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=90264

--- Comment #33 from Jose P.  ---
(In reply to Jose P. from comment #21)
> This seems to have been fixed, I can't see the corruption anymore. I'm using
> Chromium 43.0.2357.81 on Ubuntu 14.04 (64-bit) and Mesa 10.7.0-devel
> (git-20dca37 2015-06-23 trusty-oibaf-ppa).
This is wrong, the bug surely hasn't been fixed. As Furkan says in #18, GPU
acceleration was disabled in the Chromium package (ver 43.0.2357.81) in Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1466670/ , the
reason being "because GPU crashes (on Intel, Nvidia, and AMD) was the biggest
cause of crash reports in all of Ubuntu"...

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 3:00 AM, Emil Velikov  wrote:
> On 08/07/15 07:33, Matt Turner wrote:
>> Cc: "10.6" 
> Thanks for that I've completely forgot about this.
>
> Seems like we never got to using mesa_sha. Can we just remove it for now
> and revive it as a user comes along ?

I don't have a problem with that. I shouldn't have let in what's been committed.

I guess it mostly depends on how soon we expect to finish the shader-cache.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: bump libdrm requirement to 2.4.61 and drop in-tree workaround

2015-07-08 Thread Emil Velikov

Signed-off-by: Emil Velikov 
---
 configure.ac | 2 +-
 src/mesa/drivers/dri/i965/intel_screen.c | 5 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/configure.ac b/configure.ac
index d819bef..eb7180b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -68,7 +68,7 @@ AC_SUBST([OSMESA_VERSION])
 dnl Versions for external dependencies
 LIBDRM_REQUIRED=2.4.38
 LIBDRM_RADEON_REQUIRED=2.4.56
-LIBDRM_INTEL_REQUIRED=2.4.60
+LIBDRM_INTEL_REQUIRED=2.4.61
 LIBDRM_NVVIEUX_REQUIRED=2.4.33
 LIBDRM_NOUVEAU_REQUIRED=2.4.62
 LIBDRM_FREEDRENO_REQUIRED=2.4.57
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index c0f5c92..fd343ee 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1330,11 +1330,6 @@ set_max_gl_versions(struct intel_screen *screen)
}
 }
 
-/* drop when libdrm 2.4.61 is released */
-#ifndef I915_PARAM_REVISION
-#define I915_PARAM_REVISION 32
-#endif
-
 static int
 brw_get_revision(int fd)
 {
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 03/11] radeon: remove dri_mirror state

2015-07-08 Thread Emil Velikov

Most of the data stored(duplicated) was unused, and for the one that is
follow the approach set by other drivers.
This eliminates the use of legacy (dri1) types.

XXX: The radeon code is the only user of __DRIscreen::drm_version (the
only __DRIversion outside of dri1 land). Should we move it into radeon
and/or bump the min. required drm module version ?

Cc: Ian Romanick 
Cc: Marek Olšák 
Cc: Michel Dänzer 
Signed-off-by: Emil Velikov 
---
 src/mesa/drivers/dri/radeon/radeon_common.c | 18 +-
 src/mesa/drivers/dri/radeon/radeon_common_context.c |  7 ++-
 src/mesa/drivers/dri/radeon/radeon_common_context.h | 19 +++
 src/mesa/drivers/dri/radeon/radeon_texture.c|  2 +-
 4 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/radeon/radeon_common.c 
b/src/mesa/drivers/dri/radeon/radeon_common.c
index 2a8bd6c9..d834d9b 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common.c
+++ b/src/mesa/drivers/dri/radeon/radeon_common.c
@@ -164,7 +164,7 @@ uint32_t radeonGetAge(radeonContextPtr radeon)
 
gp.param = RADEON_PARAM_LAST_CLEAR;
gp.value = (int *)&age;
-   ret = drmCommandWriteRead(radeon->dri.fd, DRM_RADEON_GETPARAM,
+   ret = drmCommandWriteRead(radeon->radeonScreen->driScreen->fd, 
DRM_RADEON_GETPARAM,
  &gp, sizeof(gp));
if (ret) {
fprintf(stderr, "%s: drmRadeonGetParam: %d\n", __func__,
@@ -358,8 +358,8 @@ void radeonDrawBuffer( struct gl_context *ctx, GLenum mode )
* that the front-buffer has actually been allocated.
*/
if (!was_front_buffer_rendering && 
radeon->is_front_buffer_rendering) {
-   radeon_update_renderbuffers(radeon->dri.context,
-   radeon->dri.context->driDrawablePriv, GL_FALSE);
+   radeon_update_renderbuffers(radeon->driContext,
+   radeon->driContext->driDrawablePriv, GL_FALSE);
   }
}
 
@@ -375,8 +375,8 @@ void radeonReadBuffer( struct gl_context *ctx, GLenum mode )
|| (mode == GL_FRONT);
 
if (!was_front_buffer_reading && 
rmesa->is_front_buffer_reading) {
-   radeon_update_renderbuffers(rmesa->dri.context,
-   
rmesa->dri.context->driReadablePriv, GL_FALSE);
+   radeon_update_renderbuffers(rmesa->driContext,
+   
rmesa->driContext->driReadablePriv, GL_FALSE);
}
}
/* nothing, until we implement h/w glRead/CopyPixels or CopyTexImage */
@@ -399,7 +399,7 @@ void radeon_window_moved(radeonContextPtr radeon)
 void radeon_viewport(struct gl_context *ctx)
 {
radeonContextPtr radeon = RADEON_CONTEXT(ctx);
-   __DRIcontext *driContext = radeon->dri.context;
+   __DRIcontext *driContext = radeon->driContext;
void (*old_viewport)(struct gl_context *ctx);
 
if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) {
@@ -693,6 +693,7 @@ void rcommonInitCmdBuf(radeonContextPtr rmesa)
 {
GLuint size;
struct drm_radeon_gem_info mminfo = { 0 };
+   int fd = rmesa->radeonScreen->driScreen->fd;
 
/* Initialize command buffer */
size = 256 * driQueryOptioni(&rmesa->optionCache,
@@ -711,8 +712,7 @@ void rcommonInitCmdBuf(radeonContextPtr rmesa)
"Allocating %d bytes command buffer (max state is %d 
bytes)\n",
size * 4, rmesa->hw.max_state_size * 4);
 
-   rmesa->cmdbuf.csm =
-   radeon_cs_manager_gem_ctor(rmesa->radeonScreen->driScreen->fd);
+   rmesa->cmdbuf.csm = radeon_cs_manager_gem_ctor(fd);
if (rmesa->cmdbuf.csm == NULL) {
/* FIXME: fatal error */
return;
@@ -725,7 +725,7 @@ void rcommonInitCmdBuf(radeonContextPtr rmesa)
  (void (*)(void *))rmesa->glCtx.Driver.Flush, 
&rmesa->glCtx);
 
 
-   if (!drmCommandWriteRead(rmesa->dri.fd, DRM_RADEON_GEM_INFO,
+   if (!drmCommandWriteRead(fd, DRM_RADEON_GEM_INFO,
 &mminfo, sizeof(mminfo))) {
radeon_cs_set_limit(rmesa->cmdbuf.cs, RADEON_GEM_DOMAIN_VRAM,
mminfo.vram_visible);
diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.c 
b/src/mesa/drivers/dri/radeon/radeon_common_context.c
index 3d0ceda..4660d98 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_common_context.c
@@ -162,10 +162,7 @@ GLboolean radeonInitContext(radeonContextPtr radeon,
_mesa_meta_init(ctx);
 
/* DRI fields */
-   radeon->dri.context = driContextPriv;
-   radeon->dri.screen = sPriv;
-   radeon->dri.fd = sPriv->fd;
-   radeon->dri.drmMinor = sPriv->drm_version.minor;
+   radeon->driContext = driC

[Mesa-dev] [PATCH 01/11] configure.ac: do not set HAVE_DRI(23) when libdrm is missing

2015-07-08 Thread Emil Velikov

These conditionals are used to guard both dri modules and loader(s).

Currently if we try to build the gallium swrast dri module (without glx)
on a system that's missing libdrm the build will fail.

Cc: 10.6 
Signed-off-by: Emil Velikov 
---
 configure.ac | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index eb7180b..a97b739 100644
--- a/configure.ac
+++ b/configure.ac
@@ -923,8 +923,8 @@ esac
 
 AM_CONDITIONAL(HAVE_DRICOMMON, test "x$enable_dri" = xyes )
 AM_CONDITIONAL(HAVE_DRISW, test "x$enable_dri" = xyes )
-AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = 
xdrm )
-AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = 
xdrm )
+AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = 
xdrm -a "x$have_libdrm" = xyes )
+AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = 
xdrm -a "x$have_libdrm" = xyes )
 AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = 
xapple )
 
 AC_ARG_ENABLE([shared-glapi],
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 10/11] android: don't build the kms-dri winsys

2015-07-08 Thread Emil Velikov

GBM (the only user of kms-dri) is currently not available under Android.
Considering we have no way of testing/using this let's not bother
building it for now.

Cc: Chih-Wei Huang 
Cc: Eric Anholt 
Signed-off-by: Emil Velikov 
---
 src/gallium/Android.mk|  2 +-
 src/gallium/state_trackers/dri/Android.mk |  1 -
 src/gallium/targets/dri/Android.mk|  2 +-
 src/gallium/winsys/sw/kms-dri/Android.mk  | 37 ---
 4 files changed, 2 insertions(+), 40 deletions(-)
 delete mode 100644 src/gallium/winsys/sw/kms-dri/Android.mk

diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk
index b946681..0703148 100644
--- a/src/gallium/Android.mk
+++ b/src/gallium/Android.mk
@@ -34,7 +34,7 @@ SUBDIRS := auxiliary
 
 # swrast
 ifneq ($(filter swrast,$(MESA_GPU_DRIVERS)),)
-SUBDIRS += winsys/sw/dri winsys/sw/kms-dri drivers/softpipe
+SUBDIRS += winsys/sw/dri drivers/softpipe
 endif
 
 # freedreno
diff --git a/src/gallium/state_trackers/dri/Android.mk 
b/src/gallium/state_trackers/dri/Android.mk
index 4ee4338..43f0de9 100644
--- a/src/gallium/state_trackers/dri/Android.mk
+++ b/src/gallium/state_trackers/dri/Android.mk
@@ -44,7 +44,6 @@ LOCAL_STATIC_LIBRARIES := \
libmesa_dri_common \
 
 ifneq ($(filter swrast,$(MESA_GPU_DRIVERS)),)
-LOCAL_CFLAGS += -DGALLIUM_SOFTPIPE
 LOCAL_SRC_FILES += $(drisw_SOURCES)
 endif
 
diff --git a/src/gallium/targets/dri/Android.mk 
b/src/gallium/targets/dri/Android.mk
index bccc91a..7168e1d 100644
--- a/src/gallium/targets/dri/Android.mk
+++ b/src/gallium/targets/dri/Android.mk
@@ -85,7 +85,7 @@ gallium_DRIVERS += libmesa_winsys_radeon libmesa_pipe_radeon
 LOCAL_SHARED_LIBRARIES += libdrm_radeon
 endif
 ifneq ($(filter swrast,$(MESA_GPU_DRIVERS)),)
-gallium_DRIVERS += libmesa_pipe_softpipe libmesa_winsys_sw_dri 
libmesa_winsys_sw_kms_dri
+gallium_DRIVERS += libmesa_pipe_softpipe libmesa_winsys_sw_dri
 LOCAL_CFLAGS += -DGALLIUM_SOFTPIPE
 endif
 ifneq ($(filter vc4,$(MESA_GPU_DRIVERS)),)
diff --git a/src/gallium/winsys/sw/kms-dri/Android.mk 
b/src/gallium/winsys/sw/kms-dri/Android.mk
deleted file mode 100644
index b065242..000
--- a/src/gallium/winsys/sw/kms-dri/Android.mk
+++ /dev/null
@@ -1,37 +0,0 @@
-# Mesa 3-D graphics library
-#
-# Copyright (C) 2015 Chih-Wei Huang 
-# Copyright (C) 2015 Android-x86 Open Source Project
-#
-# Permission is hereby granted, free of charge, to any person obtaining a
-# copy of this software and associated documentation files (the "Software"),
-# to deal in the Software without restriction, including without limitation
-# the rights to use, copy, modify, merge, publish, distribute, sublicense,
-# and/or sell copies of the Software, and to permit persons to whom the
-# Software is furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice shall be included
-# in all copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
-# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-# DEALINGS IN THE SOFTWARE.
-
-LOCAL_PATH := $(call my-dir)
-
-include $(LOCAL_PATH)/Makefile.sources
-
-include $(CLEAR_VARS)
-
-LOCAL_SRC_FILES := $(C_SOURCES)
-
-LOCAL_MODULE := libmesa_winsys_sw_kms_dri
-
-LOCAL_SHARED_LIBRARIES := libdrm
-
-include $(GALLIUM_COMMON_MK)
-include $(BUILD_STATIC_LIBRARY)
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 06/11] loader: use HAVE_LIBDRM instead of ! __NOT_HAVE_DRM_H

2015-07-08 Thread Emil Velikov

Double negatives in English language are normally avoided, plus the
former seems cleaner and more consistent.

Signed-off-by: Emil Velikov 
---
 src/loader/Android.mk  | 6 ++
 src/loader/Makefile.am | 5 +
 src/loader/SConscript  | 2 --
 src/loader/loader.c| 8 
 src/loader/pci_id_driver_map.c | 2 +-
 5 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/src/loader/Android.mk b/src/loader/Android.mk
index 92d9fd2..8690565 100644
--- a/src/loader/Android.mk
+++ b/src/loader/Android.mk
@@ -33,10 +33,8 @@ include $(CLEAR_VARS)
 LOCAL_SRC_FILES := \
$(LOADER_C_FILES)
 
-# swrast only
-ifeq ($(MESA_GPU_DRIVERS),swrast)
-LOCAL_CFLAGS += -D__NOT_HAVE_DRM_H
-else
+ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
+LOCAL_CFLAGS += -DHAVE_LIBDRM
 LOCAL_SHARED_LIBRARIES := libdrm
 endif
 
diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am
index aef1bd6..5190f7f 100644
--- a/src/loader/Makefile.am
+++ b/src/loader/Makefile.am
@@ -48,10 +48,7 @@ libloader_la_CPPFLAGS += \
 
 endif
 
-if !HAVE_LIBDRM
-libloader_la_CPPFLAGS += \
-   -D__NOT_HAVE_DRM_H
-else
+if HAVE_LIBDRM
 libloader_la_CPPFLAGS += \
$(LIBDRM_CFLAGS)
 
diff --git a/src/loader/SConscript b/src/loader/SConscript
index 16d1053..d98f11e 100644
--- a/src/loader/SConscript
+++ b/src/loader/SConscript
@@ -8,8 +8,6 @@ env.Prepend(CPPPATH = [
 '#include'
 ])
 
-env.Append(CPPDEFINES = ['__NOT_HAVE_DRM_H'])
-
 if env['udev']:
 env.PkgUseModules('UDEV')
 env.Append(CPPDEFINES = ['HAVE_LIBUDEV'])
diff --git a/src/loader/loader.c b/src/loader/loader.c
index 8780587..4ed0a1f 100644
--- a/src/loader/loader.c
+++ b/src/loader/loader.c
@@ -85,7 +85,7 @@
 #endif
 #include "loader.h"
 
-#ifndef __NOT_HAVE_DRM_H
+#ifdef HAVE_LIBDRM
 #include 
 #endif
 
@@ -501,7 +501,7 @@ sysfs_get_pci_id_for_fd(int fd, int *vendor_id, int 
*chip_id)
 }
 #endif
 
-#if !defined(__NOT_HAVE_DRM_H)
+#if defined(HAVE_LIBDRM)
 /* for i915 */
 #include 
 /* for radeon */
@@ -584,7 +584,7 @@ loader_get_pci_id_for_fd(int fd, int *vendor_id, int 
*chip_id)
if (sysfs_get_pci_id_for_fd(fd, vendor_id, chip_id))
   return 1;
 #endif
-#if !defined(__NOT_HAVE_DRM_H)
+#if HAVE_LIBDRM
if (drm_get_pci_id_for_fd(fd, vendor_id, chip_id))
   return 1;
 #endif
@@ -695,7 +695,7 @@ loader_get_driver_for_fd(int fd, unsigned driver_types)
 
if (!loader_get_pci_id_for_fd(fd, &vendor_id, &chip_id)) {
 
-#ifndef __NOT_HAVE_DRM_H
+#if HAVE_LIBDRM
   /* fallback to drmGetVersion(): */
   drmVersionPtr version = drmGetVersion(fd);
 
diff --git a/src/loader/pci_id_driver_map.c b/src/loader/pci_id_driver_map.c
index cb6f705..3c4657f 100644
--- a/src/loader/pci_id_driver_map.c
+++ b/src/loader/pci_id_driver_map.c
@@ -23,7 +23,7 @@
 
 int is_nouveau_vieux(int fd);
 
-#ifndef __NOT_HAVE_DRM_H
+#ifdef HAVE_LIBDRM
 
 #include 
 #include 
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 00/11] Bye bye __NOT_HAVE_DRM_H

2015-07-08 Thread Emil Velikov

Hi all,

As double-negatives are not too appealing in the English language I've
decided to nuke the common ifndef __NOT_HAVE_DRM_H.

The first three patches are trivial bugfix + cleanups which I've noticed 
while browsing through, 04-09 remove all occurrences of the lastly macro 
(HAVE_LIBDRM is used instead). With the last two dropping kms_swrast 
from the scons/android build.

Note: I'm not 100% sure if the scons build sets the HAVE_LIBDRM macro, 
despite that it builds just fine. If anyone can confirm that will be 
appreciated.

Cheers,
Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 05/11] st/dri: unwrap/remove __NOT_HAVE_DRM_H magic

2015-07-08 Thread Emil Velikov

With the dri_interface.h clean of the macro, we can remove the final
only st/dri specific use of the very same.

Seemingly it was incorrectly used, as the build-time presence of dri2 is
not libdrm specific. At run-time, the code is already limited to dri2
use-cases plus returning true, when the extension is not present (or too
old) will likely lead to a crash as one tries to use it shortly after
the dri_with_format() call.

As a side effect this gives us a nice cleanup the builds.

Signed-off-by: Emil Velikov 
---
 src/gallium/state_trackers/dri/Android.mk   |  5 +
 src/gallium/state_trackers/dri/Makefile.am  |  4 
 src/gallium/state_trackers/dri/SConscript   |  3 ---
 src/gallium/state_trackers/dri/dri_screen.h | 12 
 4 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/src/gallium/state_trackers/dri/Android.mk 
b/src/gallium/state_trackers/dri/Android.mk
index 188e4a1..4ee4338 100644
--- a/src/gallium/state_trackers/dri/Android.mk
+++ b/src/gallium/state_trackers/dri/Android.mk
@@ -48,10 +48,7 @@ LOCAL_CFLAGS += -DGALLIUM_SOFTPIPE
 LOCAL_SRC_FILES += $(drisw_SOURCES)
 endif
 
-# swrast only?
-ifeq ($(MESA_GPU_DRIVERS),swrast)
-LOCAL_CFLAGS += -D__NOT_HAVE_DRM_H
-else
+ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
 LOCAL_SRC_FILES += $(dri2_SOURCES)
 LOCAL_SHARED_LIBRARIES := libdrm
 endif
diff --git a/src/gallium/state_trackers/dri/Makefile.am 
b/src/gallium/state_trackers/dri/Makefile.am
index d2c7a82..9f4deba 100644
--- a/src/gallium/state_trackers/dri/Makefile.am
+++ b/src/gallium/state_trackers/dri/Makefile.am
@@ -50,10 +50,6 @@ noinst_LTLIBRARIES = libdri.la
 libdri_la_SOURCES = $(common_SOURCES)
 
 if HAVE_DRISW
-if !HAVE_DRI2
-AM_CPPFLAGS += \
-   -D__NOT_HAVE_DRM_H
-endif
 libdri_la_SOURCES += $(drisw_SOURCES)
 endif
 
diff --git a/src/gallium/state_trackers/dri/SConscript 
b/src/gallium/state_trackers/dri/SConscript
index 89b5e61..7a7060c 100644
--- a/src/gallium/state_trackers/dri/SConscript
+++ b/src/gallium/state_trackers/dri/SConscript
@@ -5,10 +5,7 @@ Import('*')
 
 env = env.Clone()
 
-# XXX: If HAVE_DRI2
 env.PkgUseModules(['DRM'])
-# else
-#env.Append(CPPDEFINES = [('__NOT_HAVE_DRM_H', '1')])
 
 env.Append(CPPPATH = [
 '#/src',
diff --git a/src/gallium/state_trackers/dri/dri_screen.h 
b/src/gallium/state_trackers/dri/dri_screen.h
index 173f403..1263fa2 100644
--- a/src/gallium/state_trackers/dri/dri_screen.h
+++ b/src/gallium/state_trackers/dri/dri_screen.h
@@ -122,8 +122,6 @@ struct __DRIimageRec {
 
 };
 
-#ifndef __NOT_HAVE_DRM_H
-
 static INLINE boolean
 dri_with_format(__DRIscreen * sPriv)
 {
@@ -134,16 +132,6 @@ dri_with_format(__DRIscreen * sPriv)
&& (loader->getBuffersWithFormat != NULL);
 }
 
-#else
-
-static INLINE boolean
-dri_with_format(__DRIscreen * sPriv)
-{
-   return TRUE;
-}
-
-#endif
-
 void
 dri_fill_st_visual(struct st_visual *stvis, struct dri_screen *screen,
const struct gl_config *mode);
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 08/11] swrast: remove unneeded __NOT_HAVE_DRM_H define

2015-07-08 Thread Emil Velikov

No longer applicable since the cleanup of dri_interface.h.

Signed-off-by: Emil Velikov 
---
 src/mesa/drivers/dri/swrast/Makefile.am | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/swrast/Makefile.am 
b/src/mesa/drivers/dri/swrast/Makefile.am
index bfc3c10..b1ba34d 100644
--- a/src/mesa/drivers/dri/swrast/Makefile.am
+++ b/src/mesa/drivers/dri/swrast/Makefile.am
@@ -24,7 +24,6 @@
 include Makefile.sources
 
 AM_CFLAGS = \
-   -D__NOT_HAVE_DRM_H \
-I$(top_srcdir)/include \
-I$(top_srcdir)/src/ \
-I$(top_srcdir)/src/mapi \
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 04/11] dri_interface: drop __NOT_HAVE_DRM_H magic

2015-07-08 Thread Emil Velikov

Signed-off-by: Emil Velikov 
---
 include/GL/internal/dri_interface.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index c827bb6..c0545b1 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -40,20 +40,9 @@
 #ifndef DRI_INTERFACE_H
 #define DRI_INTERFACE_H
 
-/* For archs with no drm.h */
-#if defined(__APPLE__) || defined(__CYGWIN__) || defined(__GNU__)
-#ifndef __NOT_HAVE_DRM_H
-#define __NOT_HAVE_DRM_H
-#endif
-#endif
-
-#ifndef __NOT_HAVE_DRM_H
-#include 
-#else
 typedef unsigned int drm_context_t;
 typedef unsigned int drm_drawable_t;
 typedef struct drm_clip_rect drm_clip_rect_t;
-#endif
 
 /**
  * \name DRI interface structures
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 09/11] android: dri: correctly set HAVE_LIBDRM

2015-07-08 Thread Emil Velikov

Set the macro if we're not building swrast alone.

Cc: Chih-Wei Huang 
Cc: Eric Anholt 
Signed-off-by: Emil Velikov 
---
 src/gallium/targets/dri/Android.mk | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/gallium/targets/dri/Android.mk 
b/src/gallium/targets/dri/Android.mk
index 5ba129b..bccc91a 100644
--- a/src/gallium/targets/dri/Android.mk
+++ b/src/gallium/targets/dri/Android.mk
@@ -35,17 +35,15 @@ endif
 
 LOCAL_SRC_FILES := target.c
 
-LOCAL_CFLAGS := -DDRI_TARGET -DHAVE_LIBDRM
+LOCAL_CFLAGS := -DDRI_TARGET
 
 LOCAL_SHARED_LIBRARIES := \
libdl \
libglapi \
libexpat \
 
-# swrast only?
-ifeq ($(MESA_GPU_DRIVERS),swrast)
-LOCAL_CFLAGS += -D__NOT_HAVE_DRM_H
-else
+ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
+LOCAL_CFLAGS += -DHAVE_LIBDRM
 LOCAL_SHARED_LIBRARIES += libdrm
 endif
 
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 07/11] dri/common: use HAVE_LIBDRM over __NOT_HAVE_DRM_H

2015-07-08 Thread Emil Velikov

See previous commit message for details.

Signed-off-by: Emil Velikov 
---
 src/mesa/drivers/dri/common/Android.mk  | 13 ++---
 src/mesa/drivers/dri/common/Makefile.am |  2 +-
 src/mesa/drivers/dri/common/SConscript  |  4 
 src/mesa/drivers/dri/common/dri_util.c  |  4 ++--
 4 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/common/Android.mk 
b/src/mesa/drivers/dri/common/Android.mk
index 6986f5e..6e29baf 100644
--- a/src/mesa/drivers/dri/common/Android.mk
+++ b/src/mesa/drivers/dri/common/Android.mk
@@ -43,10 +43,8 @@ LOCAL_EXPORT_C_INCLUDE_DIRS := \
 $(LOCAL_PATH) \
 $(intermediates)
 
-# swrast only
-ifeq ($(MESA_GPU_DRIVERS),swrast)
-LOCAL_CFLAGS := -D__NOT_HAVE_DRM_H
-else
+ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),)
+LOCAL_CFLAGS := -DHAVE_LIBDRM
 LOCAL_SHARED_LIBRARIES := libdrm
 endif
 
@@ -110,13 +108,6 @@ LOCAL_MODULE_CLASS := STATIC_LIBRARIES
 LOCAL_C_INCLUDES := \
 $(MESA_DRI_C_INCLUDES)
 
-# swrast only
-ifeq ($(MESA_GPU_DRIVERS),swrast)
-LOCAL_CFLAGS := -D__NOT_HAVE_DRM_H
-else
-LOCAL_SHARED_LIBRARIES := libdrm
-endif
-
 LOCAL_SRC_FILES := $(megadriver_stub_FILES)
 
 include $(MESA_COMMON_MK)
diff --git a/src/mesa/drivers/dri/common/Makefile.am 
b/src/mesa/drivers/dri/common/Makefile.am
index ae19fcb..ee6c691 100644
--- a/src/mesa/drivers/dri/common/Makefile.am
+++ b/src/mesa/drivers/dri/common/Makefile.am
@@ -58,5 +58,5 @@ if DRICOMMON_NEED_LIBDRM
 AM_CFLAGS += $(LIBDRM_CFLAGS)
 libdricommon_la_LIBADD = $(LIBDRM_LIBS)
 else
-AM_CFLAGS += -D__NOT_HAVE_DRM_H
+AM_CFLAGS += -UHAVE_LIBDRM
 endif
diff --git a/src/mesa/drivers/dri/common/SConscript 
b/src/mesa/drivers/dri/common/SConscript
index b402736..adaac29 100644
--- a/src/mesa/drivers/dri/common/SConscript
+++ b/src/mesa/drivers/dri/common/SConscript
@@ -32,10 +32,7 @@ drienv.AppendUnique(LIBS = [
 'expat',
 ])
 
-# if HAVE_DRI2
 drienv.PkgUseModules('DRM')
-# else
-#env.Append(CPPDEFINES = ['__NOT_HAVE_DRM_H'])
 
 sources = drienv.ParseSourceList('Makefile.sources', ['DRI_COMMON_FILES', 
'XMLCONFIG_FILES' ])
 
@@ -57,7 +54,6 @@ env.Append(CPPPATH = [
 ])
 
 env.Append(CPPDEFINES = [
-'__NOT_HAVE_DRM_H',
 'HAVE_DLADDR',
 ])
 
diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index ae4592c..100a727 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -40,7 +40,7 @@
 
 
 #include 
-#ifndef __NOT_HAVE_DRM_H
+#ifdef HAVE_LIBDRM
 #include 
 #endif
 #include "dri_util.h"
@@ -137,7 +137,7 @@ driCreateNewScreen2(int scrn, int fd,
 
 setupLoaderExtensions(psp, extensions);
 
-#ifndef __NOT_HAVE_DRM_H
+#ifdef HAVE_LIBDRM
 if (fd != -1) {
drmVersionPtr version = drmGetVersion(fd);
if (version) {
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 11/11] scons: don't build the kms-dri winsys

2015-07-08 Thread Emil Velikov

Same as previous commit - unused (gbm is not a thing outside the
autotools build).

Signed-off-by: Emil Velikov 
---
 src/gallium/SConscript|  1 -
 src/gallium/state_trackers/dri/SConscript |  1 -
 src/gallium/targets/dri/SConscript|  4 
 src/gallium/winsys/sw/kms-dri/SConscript  | 23 ---
 4 files changed, 29 deletions(-)
 delete mode 100644 src/gallium/winsys/sw/kms-dri/SConscript

diff --git a/src/gallium/SConscript b/src/gallium/SConscript
index eeb1c78..fa5fa6e 100644
--- a/src/gallium/SConscript
+++ b/src/gallium/SConscript
@@ -46,7 +46,6 @@ if env['platform'] == 'haiku':
 if env['dri']:
 SConscript([
 'winsys/sw/dri/SConscript',
-'winsys/sw/kms-dri/SConscript',
 'winsys/svga/drm/SConscript',
 ])
 
diff --git a/src/gallium/state_trackers/dri/SConscript 
b/src/gallium/state_trackers/dri/SConscript
index 7a7060c..657300b 100644
--- a/src/gallium/state_trackers/dri/SConscript
+++ b/src/gallium/state_trackers/dri/SConscript
@@ -17,7 +17,6 @@ env.Append(CPPPATH = [
 
 env.Append(CPPDEFINES = [
 ('GALLIUM_STATIC_TARGETS', '1'),
-'GALLIUM_SOFTPIPE',
 ])
 
 sources = env.ParseSourceList('Makefile.sources', 'common_SOURCES')
diff --git a/src/gallium/targets/dri/SConscript 
b/src/gallium/targets/dri/SConscript
index a51ed56..80ce8de 100644
--- a/src/gallium/targets/dri/SConscript
+++ b/src/gallium/targets/dri/SConscript
@@ -37,7 +37,6 @@ env.Prepend(LIBS = [
 svgadrm,
 svga,
 ws_dri,
-ws_kms_dri,
 softpipe,
 libloader,
 mesautil,
@@ -58,9 +57,6 @@ module = env.LoadableModule(
 env.Command('vmwgfx_dri.so', 'gallium_dri.so', "ln -f ${SOURCE} ${TARGET}")
 # swrast_dri.so
 env.Command('swrast_dri.so', 'gallium_dri.so', "ln -f ${SOURCE} ${TARGET}")
-# kms_swrast_dri.so
-env.Command('kms_swrast_dri.so', 'gallium_dri.so', "ln -f ${SOURCE} ${TARGET}")
 
 env.Alias('dri-vmwgfx', module)
 env.Alias('dri-swrast', module)
-env.Alias('dri-kms-swrast', module)
diff --git a/src/gallium/winsys/sw/kms-dri/SConscript 
b/src/gallium/winsys/sw/kms-dri/SConscript
deleted file mode 100644
index e7dd721..000
--- a/src/gallium/winsys/sw/kms-dri/SConscript
+++ /dev/null
@@ -1,23 +0,0 @@
-###
-# SConscript for kms-dri winsys
-
-
-Import('*')
-
-if env['platform'] not in ('linux'):
-Return()
-
-env = env.Clone()
-
-env.PkgUseModules('DRM')
-
-env.Append(CPPPATH = [
-'#/src/gallium/include',
-'#/src/gallium/auxiliary',
-])
-
-ws_kms_dri = env.ConvenienceLibrary(
-target = 'ws_kms_dri',
-source = env.ParseSourceList('Makefile.sources', 'C_SOURCES'),
-)
-Export('ws_kms_dri')
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 02/11] i915; remove unused driFd variable

2015-07-08 Thread Emil Velikov

Signed-off-by: Emil Velikov 
---
 src/mesa/drivers/dri/i915/intel_context.c | 1 -
 src/mesa/drivers/dri/i915/intel_context.h | 2 --
 2 files changed, 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/intel_context.c 
b/src/mesa/drivers/dri/i915/intel_context.c
index 5618dcd..c780103 100644
--- a/src/mesa/drivers/dri/i915/intel_context.c
+++ b/src/mesa/drivers/dri/i915/intel_context.c
@@ -428,7 +428,6 @@ intelInitContext(struct intel_context *intel,
 
driContextPriv->driverPrivate = intel;
intel->driContext = driContextPriv;
-   intel->driFd = sPriv->fd;
 
intel->gen = intelScreen->gen;
 
diff --git a/src/mesa/drivers/dri/i915/intel_context.h 
b/src/mesa/drivers/dri/i915/intel_context.h
index 350d35d..4ec4015 100644
--- a/src/mesa/drivers/dri/i915/intel_context.h
+++ b/src/mesa/drivers/dri/i915/intel_context.h
@@ -273,8 +273,6 @@ struct intel_context
 
bool use_early_z;
 
-   int driFd;
-
__DRIcontext *driContext;
struct intel_screen *intelScreen;
 
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 02/11] i915; remove unused driFd variable

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 10:07 AM, Emil Velikov  wrote:
> Signed-off-by: Emil Velikov 
> ---
>  src/mesa/drivers/dri/i915/intel_context.c | 1 -
>  src/mesa/drivers/dri/i915/intel_context.h | 2 --
>  2 files changed, 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i915/intel_context.c 
> b/src/mesa/drivers/dri/i915/intel_context.c
> index 5618dcd..c780103 100644
> --- a/src/mesa/drivers/dri/i915/intel_context.c
> +++ b/src/mesa/drivers/dri/i915/intel_context.c
> @@ -428,7 +428,6 @@ intelInitContext(struct intel_context *intel,
>
> driContextPriv->driverPrivate = intel;
> intel->driContext = driContextPriv;
> -   intel->driFd = sPriv->fd;
>
> intel->gen = intelScreen->gen;
>
> diff --git a/src/mesa/drivers/dri/i915/intel_context.h 
> b/src/mesa/drivers/dri/i915/intel_context.h
> index 350d35d..4ec4015 100644
> --- a/src/mesa/drivers/dri/i915/intel_context.h
> +++ b/src/mesa/drivers/dri/i915/intel_context.h
> @@ -273,8 +273,6 @@ struct intel_context
>
> bool use_early_z;
>
> -   int driFd;
> -
> __DRIcontext *driContext;
> struct intel_screen *intelScreen;
>
> --

s/;/:/ in the subject.

Wow, that's some DRI1 stuff!

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] loader: libudev vs sysfs vs libdrm

2015-07-08 Thread Eric Anholt

Emil Velikov  writes:

> Hello all,
>
> A recent patch by Chris, fixing some libudev fun in our loader, made
> me think if we can clear it up a bit.
>
> Having three different ways of retrieving the vendor/device ID does
> feel a bit excessive. Plus as one gets fixed others are likely to
> break - and they do.
> So here is a summary of each method, from portability POV.
>  - libudev: widely common across Linux distributions (but not all).
>  - sysfs: written by Gary Wong to target GNU Hurd and *BSD. The *BSD
> folk never got to using it though :-\

Huh?  There's no sysfs on BSD.  I actually don't see a reason for this
path to exist, unless we wanted to drop libudev entirely.  We should
pick one of these two, certainly.

>  - libdrm: used as a last resource fall-back after the above two. the
> sole option used by *BSD, MacOS and Android.
>
> libdrm seems like a nice middle ground that can be used everywhere.
> Which begs the question: from a technical POV, is there any
> advantage/disadvantage of using one over the other ?

This "use the kernel driver name" thing is a bad hack, without some
mapping in userspace from kernel driver name to userspace driver name.
It's a hack that non-pci are relying on so far, though.

One of the promises of udev was that Mesa would be able to ship some
config files that attached our drivers' names to devices, so that udev
could answer "what DRI driver do I open for this fd?" in one central
way, rather than having Mesa's loader and X's loader both trying to do
replicate the same behavior.  We haven't made use of that, though, and
it seems less important now that X doesn't do the loader thing (now our
only loader variation is trying to handle old DRI drivers with new
libGL).

> I do recall Kristian and Eric participating in this discussion before,
> but the only thing I can find is along the lines of "linux distros
> should be using libudev" :-(

Honestly, I'm leaning toward "let's just use the sysfs path and drop
libudev" -- the sysfs code exists, and I think we've got more code
complexity (and time wasted) dealing with the fallout from libudev
screwing up their ABI, compared to just using Gary's open-coded stuff.

signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] xa: don't leak fences

2015-07-08 Thread Rob Clark

From: Rob Clark 

XA was never unref'ing last_fence in the various call paths to
pipe->flush().  Add this to xa_context_flush() and update the other
open-coded calls to pipe->flush() to use xa_context_flush() instead.

This fixes a memory leak reported with xf86-video-freedreno.

Reported-by: Nicolas Dechesne 
Cc: "10.5 10.6" 
Signed-off-by: Rob Clark 
---
 src/gallium/state_trackers/xa/xa_context.c | 6 +-
 src/gallium/state_trackers/xa/xa_tracker.c | 2 +-
 src/gallium/state_trackers/xa/xa_yuv.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/xa/xa_context.c 
b/src/gallium/state_trackers/xa/xa_context.c
index fd49c82..ebfb290 100644
--- a/src/gallium/state_trackers/xa/xa_context.c
+++ b/src/gallium/state_trackers/xa/xa_context.c
@@ -37,7 +37,11 @@
 XA_EXPORT void
 xa_context_flush(struct xa_context *ctx)
 {
-   ctx->pipe->flush(ctx->pipe, &ctx->last_fence, 0);
+if (ctx->last_fence) {
+struct pipe_screen *screen = ctx->xa->screen;
+screen->fence_reference(screen, &ctx->last_fence, NULL);
+}
+ctx->pipe->flush(ctx->pipe, &ctx->last_fence, 0);
 }
 
 XA_EXPORT struct xa_context *
diff --git a/src/gallium/state_trackers/xa/xa_tracker.c 
b/src/gallium/state_trackers/xa/xa_tracker.c
index a384c1c..1df1da7 100644
--- a/src/gallium/state_trackers/xa/xa_tracker.c
+++ b/src/gallium/state_trackers/xa/xa_tracker.c
@@ -464,7 +464,7 @@ xa_surface_redefine(struct xa_surface *srf,
xa_min(save_height, template->height0), &src_box);
pipe->resource_copy_region(pipe, texture,
   0, 0, 0, 0, srf->tex, 0, &src_box);
-   pipe->flush(pipe, &xa->default_ctx->last_fence, 0);
+   xa_context_flush(xa->default_ctx);
 }
 
 pipe_resource_reference(&srf->tex, texture);
diff --git a/src/gallium/state_trackers/xa/xa_yuv.c 
b/src/gallium/state_trackers/xa/xa_yuv.c
index 1519639..97a1833 100644
--- a/src/gallium/state_trackers/xa/xa_yuv.c
+++ b/src/gallium/state_trackers/xa/xa_yuv.c
@@ -154,7 +154,7 @@ xa_yuv_planar_blit(struct xa_context *r,
box++;
 }
 
-r->pipe->flush(r->pipe, &r->last_fence, 0);
+xa_context_flush(r);
 
 xa_ctx_sampler_views_destroy(r);
 xa_ctx_srf_destroy(r);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1

2015-07-08 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91254

Ian Romanick  changed:

   What|Removed |Added

 CC||i...@freedesktop.org
 QA Contact|i...@freedesktop.org |jljus...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-08 Thread Kenneth Graunke

On Wednesday, July 08, 2015 03:17:35 PM Chris Wilson wrote:
> On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote:
> > On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote:
> > > * Gen4-5 structure changes.
> 
> Did you mean brw_structs.h?
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_structs.h 
> b/src/mesa/drivers/dri/i965/brw_structs.h
> index 55338c0..e167254 100644
> --- a/src/mesa/drivers/dri/i965/brw_structs.h
> +++ b/src/mesa/drivers/dri/i965/brw_structs.h
> @@ -391,13 +391,16 @@ struct brw_sf_unit_state
>unsigned pad3:1;
> } thread4;
>  
> -   struct
> +   union
> {
> +  struct {
>  unsigned front_winding:1;
>  unsigned viewport_transform:1;
>  unsigned pad0:3;
>  unsigned sf_viewport_state_offset:27; /* Offset from 
> GENERAL_STATE_BASE */
>} sf5;
> +  uint32_t dw5;
> +   };
>  
> struct
> {
> @@ -525,15 +528,17 @@ struct brw_wm_unit_state
> struct thread2 thread2;
> struct thread3 thread3;
>  
> +   union {
>struct {
>  unsigned stats_enable:1;
>  unsigned depth_buffer_clear:1;
>  unsigned sampler_count:3;
>  unsigned sampler_state_pointer:27;
>} wm4;
> +  uint32_t dw4;
> +   };
>  
> -   struct
> -   {
> +   struct {
>unsigned enable_8_pix:1;
>unsigned enable_16_pix:1;
>unsigned enable_32_pix:1;
> diff --git a/src/mesa/drivers/dri/i965/brw_urb.c 
> b/src/mesa/drivers/dri/i965/brw_urb.c
> 
> Or something else?
> -Chris

Yes - and the changes are fine.  But...separable.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager

2015-07-08 Thread Chris Wilson

On Wed, Jul 08, 2015 at 10:49:24AM -0700, Kenneth Graunke wrote:
> On Wednesday, July 08, 2015 03:17:35 PM Chris Wilson wrote:
> > On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote:
> > > On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote:
> > > > * Gen4-5 structure changes.
> > 
> > Did you mean brw_structs.h?
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_structs.h 
> > b/src/mesa/drivers/dri/i965/brw_structs.h
> > index 55338c0..e167254 100644
> > --- a/src/mesa/drivers/dri/i965/brw_structs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_structs.h
> > @@ -391,13 +391,16 @@ struct brw_sf_unit_state
> >unsigned pad3:1;
> > } thread4;
> >  
> > -   struct
> > +   union
> > {
> > +  struct {
> >  unsigned front_winding:1;
> >  unsigned viewport_transform:1;
> >  unsigned pad0:3;
> >  unsigned sf_viewport_state_offset:27; /* Offset from 
> > GENERAL_STATE_BASE */
> >} sf5;
> > +  uint32_t dw5;
> > +   };
> >  
> > struct
> > {
> > @@ -525,15 +528,17 @@ struct brw_wm_unit_state
> > struct thread2 thread2;
> > struct thread3 thread3;
> >  
> > +   union {
> >struct {
> >  unsigned stats_enable:1;
> >  unsigned depth_buffer_clear:1;
> >  unsigned sampler_count:3;
> >  unsigned sampler_state_pointer:27;
> >} wm4;
> > +  uint32_t dw4;
> > +   };
> >  
> > -   struct
> > -   {
> > +   struct {
> >unsigned enable_8_pix:1;
> >unsigned enable_16_pix:1;
> >unsigned enable_32_pix:1;
> > diff --git a/src/mesa/drivers/dri/i965/brw_urb.c 
> > b/src/mesa/drivers/dri/i965/brw_urb.c
> > 
> > Or something else?
> > -Chris
> 
> Yes - and the changes are fine.  But...separable.

Would you like a patch to provide dword alternates for all the structs,
or just the ones used for relocation processing?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] loader: libudev vs sysfs vs libdrm

2015-07-08 Thread Axel Davy


On 08/07/2015 00:15, Emil Velikov wrote :



Can anyone shed a light/cast their 2c ?

Thanks
Emil
___



The DRI3/Wayland/Gallium Nine DRI_PRIME path uses libudev,
and I'm not sure it could work without.

It is based on the ID_PATH_TAG advertised by udev with the device.

Perhaps there is another way to compute that tag though.

Axel Davy
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-08 Thread Emil Velikov

On 8 July 2015 at 17:47, Matt Turner  wrote:
> On Wed, Jul 8, 2015 at 3:00 AM, Emil Velikov  wrote:
>> On 08/07/15 07:33, Matt Turner wrote:
>>> Cc: "10.6" 
>> Thanks for that I've completely forgot about this.
>>
>> Seems like we never got to using mesa_sha. Can we just remove it for now
>> and revive it as a user comes along ?
>
> I don't have a problem with that. I shouldn't have let in what's been 
> committed.
>
> I guess it mostly depends on how soon we expect to finish the shader-cache.
I've applied your original "revert it all" from 10.5 for the 10.6
branch and barring any regression I'll push it out tonight.

Whereas for master it'll depend on the input we get from Ken and others :-)

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/4] gallium: clarify reference counting for fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

No where was it spelled out that the state tracker may expect the pipe
driver to unref the old fence.

Signed-off-by: Rob Clark 
---
 src/gallium/include/pipe/p_context.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_context.h 
b/src/gallium/include/pipe/p_context.h
index c2eedf8..d2c2e4c 100644
--- a/src/gallium/include/pipe/p_context.h
+++ b/src/gallium/include/pipe/p_context.h
@@ -361,8 +361,14 @@ struct pipe_context {
 const void *clear_value,
 int clear_value_size);
 
-   /** Flush draw commands
+   /**
+* Flush draw commands
+*
+* NOTE: use screen->fence_reference() (or equivalent) to transfer
+* new fence ref to **fence, to ensure that previous fence is unref'd
 *
+* \param fence  if not NULL, an old fence to unref and transfer a
+*new fence reference to
 * \param flags  bitfield of enum pipe_flush_flags values.
 */
void (*flush)(struct pipe_context *pipe,
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/4] vc4: unref old fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush().  So driver
should use fence_ref() which will unref the old fence if not NULL.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/vc4/vc4_context.c | 3 ++-
 src/gallium/drivers/vc4/vc4_fence.c   | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/vc4/vc4_context.c 
b/src/gallium/drivers/vc4/vc4_context.c
index 630f8e6..7de4fad 100644
--- a/src/gallium/drivers/vc4/vc4_context.c
+++ b/src/gallium/drivers/vc4/vc4_context.c
@@ -103,9 +103,10 @@ vc4_pipe_flush(struct pipe_context *pctx, struct 
pipe_fence_handle **fence,
 vc4_flush(pctx);
 
 if (fence) {
+struct pipe_screen *screen = pctx->screen;
 struct vc4_fence *f = vc4_fence_create(vc4->screen,
vc4->last_emit_seqno);
-*fence = (struct pipe_fence_handle *)f;
+screen->fence_reference(screen, fence, (struct 
pipe_fence_handle *)f);
 }
 }
 
diff --git a/src/gallium/drivers/vc4/vc4_fence.c 
b/src/gallium/drivers/vc4/vc4_fence.c
index f2ee91d..6055ffe 100644
--- a/src/gallium/drivers/vc4/vc4_fence.c
+++ b/src/gallium/drivers/vc4/vc4_fence.c
@@ -88,7 +88,7 @@ vc4_fence_create(struct vc4_screen *screen, uint64_t seqno)
 if (!f)
 return NULL;
 
-pipe_reference_init(&f->reference, 1);
+pipe_reference_init(&f->reference, 0);
 f->seqno = seqno;
 
 return f;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/4] fence refcnting fixes

2015-07-08 Thread Rob Clark

From: Rob Clark 

This isn't at all clear for pipe driver writers currently, since it
is not documented anywhere.  But radeon/nouveau/llvmpipe seem to drop
the ref on the **fence passed in to pipe->flush() (if *fence!=NULL).
Freedreno/ilo/vc4 where not doing this.  Some state trackers do call
screen->fence_reference(screen, &fence, NULL) before pipe->flush(),
but others do not.

Add a comment for pipe->flush() to clairify what is expected of the
driver, and fixup freedreno/ilo/vc4 to comply.

Note: that ilo/vc4 patches are untested

Rob Clark (4):
  gallium: clarify reference counting for fence
  freedreno: unref old fence
  ilo: unref old fence
  vc4: unref old fence

 src/gallium/drivers/freedreno/freedreno_context.c | 2 +-
 src/gallium/drivers/freedreno/freedreno_fence.c   | 2 +-
 src/gallium/drivers/ilo/ilo_context.c | 5 -
 src/gallium/drivers/ilo/ilo_screen.c  | 2 +-
 src/gallium/drivers/vc4/vc4_context.c | 3 ++-
 src/gallium/drivers/vc4/vc4_fence.c   | 2 +-
 src/gallium/include/pipe/p_context.h  | 8 +++-
 7 files changed, 17 insertions(+), 7 deletions(-)

-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] freedreno: unref old fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush().  So driver
should use fence_ref() which will unref the old fence if not NULL.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/freedreno/freedreno_context.c | 2 +-
 src/gallium/drivers/freedreno/freedreno_fence.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_context.c 
b/src/gallium/drivers/freedreno/freedreno_context.c
index 668ef36..42f79d3 100644
--- a/src/gallium/drivers/freedreno/freedreno_context.c
+++ b/src/gallium/drivers/freedreno/freedreno_context.c
@@ -145,7 +145,7 @@ fd_context_flush(struct pipe_context *pctx, struct 
pipe_fence_handle **fence,
fd_context_render(pctx);
 
if (fence)
-   *fence = fd_fence_create(pctx);
+   fd_screen_fence_ref(pctx->screen, fence, fd_fence_create(pctx));
 }
 
 void
diff --git a/src/gallium/drivers/freedreno/freedreno_fence.c 
b/src/gallium/drivers/freedreno/freedreno_fence.c
index 375e58f..c30c3f9 100644
--- a/src/gallium/drivers/freedreno/freedreno_fence.c
+++ b/src/gallium/drivers/freedreno/freedreno_fence.c
@@ -84,7 +84,7 @@ struct pipe_fence_handle * fd_fence_create(struct 
pipe_context *pctx)
if (!fence)
return NULL;
 
-   pipe_reference_init(&fence->reference, 1);
+   pipe_reference_init(&fence->reference, 0);
 
fence->ctx = ctx;
fence->screen = ctx->screen;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] ilo: unref old fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush().  So driver
should use fence_ref() which will unref the old fence if not NULL.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/ilo/ilo_context.c | 5 -
 src/gallium/drivers/ilo/ilo_screen.c  | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/ilo/ilo_context.c 
b/src/gallium/drivers/ilo/ilo_context.c
index 3d5c7b6..8d17816 100644
--- a/src/gallium/drivers/ilo/ilo_context.c
+++ b/src/gallium/drivers/ilo/ilo_context.c
@@ -62,7 +62,10 @@ ilo_flush(struct pipe_context *pipe,
  (flags & PIPE_FLUSH_END_OF_FRAME) ? "frame end" : "user request");
 
if (f) {
-  *f = ilo_screen_fence_create(pipe->screen, ilo->cp->last_submitted_bo);
+  struct pipe_screen *screen = pipe->screen;
+  struct pipe_fence_handle *fence;
+  fence = ilo_screen_fence_create(pipe->screen, 
ilo->cp->last_submitted_bo);
+  screen->fence_reference(screen, f, fence);
}
 }
 
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index faebb92..a6abd84 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -693,7 +693,7 @@ ilo_screen_fence_create(struct pipe_screen *screen, struct 
intel_bo *bo)
if (!fence)
   return NULL;
 
-   pipe_reference_init(&fence->reference, 1);
+   pipe_reference_init(&fence->reference, 0);
 
fence->seqno_bo = intel_bo_ref(bo);
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Matt Turner

By keeping a pointer to the next available location, we reduce the
number of memory accesses needed to write to the batchbuffer.

A net ~7k reduction of .text size, 7.5k of which is from the change to
intel_batchbuffer_emit_dword().

   text data  bss  dec  hex  filename
4943740   19515226192  5165084   4ed01c  i965_dri.so before
4936804   19515226192  5158148   4eb504  i965_dri.so after

Combined with the previous patch, improves performance of Synmark
OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
---
Full disclosure: when testing on an IVB desktop, I measured a
regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
I don't have any explanation.

Testing on other benchmarks and especially on Atom systems would
be very welcome.

 src/mesa/drivers/dri/i965/brw_blorp.cpp|  4 +--
 src/mesa/drivers/dri/i965/brw_context.h|  5 ++--
 .../drivers/dri/i965/brw_performance_monitor.c |  6 ++--
 src/mesa/drivers/dri/i965/brw_state_batch.c|  4 +--
 src/mesa/drivers/dri/i965/brw_urb.c|  6 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.c  | 35 +++---
 src/mesa/drivers/dri/i965/intel_batchbuffer.h  | 11 ---
 7 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 2ccfae1..eac1f00 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -226,7 +226,7 @@ retry:
intel_batchbuffer_require_space(brw, estimated_max_batch_usage, 
RENDER_RING);
intel_batchbuffer_save_state(brw);
drm_intel_bo *saved_bo = brw->batch.bo;
-   uint32_t saved_used = brw->batch.used;
+   uint32_t saved_used = USED_BATCH(brw->batch);
uint32_t saved_state_batch_offset = brw->batch.state_batch_offset;
 
switch (brw->gen) {
@@ -245,7 +245,7 @@ retry:
 * reserved enough space that a wrap will never happen.
 */
assert(brw->batch.bo == saved_bo);
-   assert((brw->batch.used - saved_used) * 4 +
+   assert((USED_BATCH(brw->batch) - saved_used) * 4 +
   (saved_state_batch_offset - brw->batch.state_batch_offset) <
   estimated_max_batch_usage);
/* Shut up compiler warnings on release build */
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index afb714b..c7ad35e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -875,7 +875,8 @@ struct intel_batchbuffer {
 #ifdef DEBUG
uint16_t emit, total;
 #endif
-   uint16_t used, reserved_space;
+   uint16_t reserved_space;
+   uint32_t *map_next;
uint32_t *map;
uint32_t *cpu_map;
 #define BATCH_SZ (8192*sizeof(uint32_t))
@@ -887,7 +888,7 @@ struct intel_batchbuffer {
uint8_t pipe_controls_since_last_cs_stall;
 
struct {
-  uint16_t used;
+  uint32_t *map_next;
   int reloc_count;
} saved;
 };
diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c 
b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
index 0a12375..7e90e8a 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c
@@ -710,7 +710,7 @@ emit_mi_report_perf_count(struct brw_context *brw,
/* Make sure the commands to take a snapshot fits in a single batch. */
intel_batchbuffer_require_space(brw, MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4,
RENDER_RING);
-   int batch_used = brw->batch.used;
+   int batch_used = USED_BATCH(brw->batch);
 
/* Reports apparently don't always get written unless we flush first. */
brw_emit_mi_flush(brw);
@@ -754,7 +754,7 @@ emit_mi_report_perf_count(struct brw_context *brw,
brw_emit_mi_flush(brw);
 
(void) batch_used;
-   assert(brw->batch.used - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 
4);
+   assert(USED_BATCH(brw->batch) - batch_used <= 
MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4);
 }
 
 /**
@@ -1386,7 +1386,7 @@ void
 brw_perf_monitor_new_batch(struct brw_context *brw)
 {
assert(brw->batch.ring == RENDER_RING);
-   assert(brw->gen < 6 || brw->batch.used == 0);
+   assert(brw->gen < 6 || USED_BATCH(brw->batch) == 0);
 
if (brw->perfmon.oa_users == 0)
   return;
diff --git a/src/mesa/drivers/dri/i965/brw_state_batch.c 
b/src/mesa/drivers/dri/i965/brw_state_batch.c
index a405a80..d79e0ea 100644
--- a/src/mesa/drivers/dri/i965/brw_state_batch.c
+++ b/src/mesa/drivers/dri/i965/brw_state_batch.c
@@ -87,7 +87,7 @@ brw_annotate_aub(struct brw_context *brw)
drm_intel_aub_annotation annotations[annotation_count];
int a = 0;
make_annotation(&annotations[a++], AUB_TRACE_TYPE_BATCH, 0,
-   4*brw->batch.used);
+   4 * USED_BATCH(brw->batch));
for (int i = brw->state_batch_count; i-- > 0; ) {
   uint32_t type = brw->state_batch_list[i].type;
   uint32_t start_offset = brw->state_batch_list[i].offset;

[Mesa-dev] [PATCH 1/2] i965: Set brw->batch.emit only #ifdef DEBUG.

2015-07-08 Thread Matt Turner

It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more
importantly prevents a larger code size regression in the next commit
when the .used field is replaced and calculated on demand.

   text data  bss  dec  hex  filename
4945468   19515226192  5166812   4ed6dc  i965_dri.so before
4943740   19515226192  5165084   4ed01c  i965_dri.so after

And surround the emit and total fields with #ifdef DEBUG to prevent
such mistakes from happening again.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 2 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7596139..afb714b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -872,7 +872,9 @@ struct intel_batchbuffer {
/** BO for post-sync nonzero writes for gen6 workaround. */
drm_intel_bo *workaround_bo;
 
+#ifdef DEBUG
uint16_t emit, total;
+#endif
uint16_t used, reserved_space;
uint32_t *map;
uint32_t *cpu_map;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index fdd07e0..f0971e9 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -134,8 +134,8 @@ intel_batchbuffer_begin(struct brw_context *brw, int n, 
enum brw_gpu_ring ring)
 {
intel_batchbuffer_require_space(brw, n * 4, ring);
 
-   brw->batch.emit = brw->batch.used;
 #ifdef DEBUG
+   brw->batch.emit = brw->batch.used;
brw->batch.total = n;
 #endif
 }
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Chris Wilson

On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote:
> By keeping a pointer to the next available location, we reduce the
> number of memory accesses needed to write to the batchbuffer.
> 
> A net ~7k reduction of .text size, 7.5k of which is from the change to
> intel_batchbuffer_emit_dword().
> 
>text data  bss  dec  hex  filename
> 4943740   19515226192  5165084   4ed01c  i965_dri.so before
> 4936804   19515226192  5158148   4eb504  i965_dri.so after
> 
> Combined with the previous patch, improves performance of Synmark
> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
> ---
> Full disclosure: when testing on an IVB desktop, I measured a
> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
> I don't have any explanation.

The problem is that it seems to generate worse code with multiple
adjacent emit_dwords. I have seen similar regressions when doing the
same batch[index] to *batch++ elsewhere.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy

2015-07-08 Thread Ben Widawsky

WARNING: No perf data, please keep reading though)

This implements the suggestion provided by the paper, "Fast USWC to WB Memory
Copy"
(https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers).
This is described throughout the paper, but the sample code lives in Figure 3-3.
That paper purports a roughly 40% performance gain in Mbyte/second over the
original implementation done by Matt.

Section 3.1.2 is the summary of why an intermediate cache buffer is used. It
claims that if you use the naive implementation, fill buffers are contended for.
To be honest, I can't quite fathom the underlying explanation, but I'll think
about it some more. Most importantly would be to get the perf data... This patch
does need performance data. I don't currently have a platform that this would
benefit (BYT or BSW), so I can't get anything useful. As soon as I get a
platform to test it on, I will - meanwhile, maybe whomever tested the original
patch the first time around come run this through?

Cc: Matt Turner 
Cc: Chad Versace 
Cc: Kristian Høgsberg 
Signed-off-by: Ben Widawsky 
---
 src/mesa/main/streaming-load-memcpy.c | 61 +++
 1 file changed, 47 insertions(+), 14 deletions(-)

diff --git a/src/mesa/main/streaming-load-memcpy.c 
b/src/mesa/main/streaming-load-memcpy.c
index d7147af..3cd310a 100644
--- a/src/mesa/main/streaming-load-memcpy.c
+++ b/src/mesa/main/streaming-load-memcpy.c
@@ -30,6 +30,8 @@
 #include "main/streaming-load-memcpy.h"
 #include 
 
+static uint8_t rsvd_space[4096];
+
 /* Copies memory from src to dst, using SSE 4.1's MOVNTDQA to get streaming
  * read performance from uncached memory.
  */
@@ -59,23 +61,54 @@ _mesa_streaming_load_memcpy(void *restrict dst, void 
*restrict src, size_t len)
   len -= MIN2(bytes_before_alignment_boundary, len);
}
 
-   while (len >= 64) {
-  __m128i *dst_cacheline = (__m128i *)d;
-  __m128i *src_cacheline = (__m128i *)s;
+   while (len > 64) {
+  __m128i *cached_buffer = (__m128i *)rsvd_space;
+  size_t streaming_len = len > 4096 ? 4096 : len;
+
+  __asm__ volatile("mfence" ::: "memory");
+
+  while (streaming_len >= 64) {
+ __m128i *src_cacheline = (__m128i *)s;
+
+ __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
+ __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
+ __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
+ __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
+
+ _mm_store_si128(cached_buffer + 0, temp1);
+ _mm_store_si128(cached_buffer + 1, temp2);
+ _mm_store_si128(cached_buffer + 2, temp3);
+ _mm_store_si128(cached_buffer + 3, temp4);
+
+ s += 64;
+ streaming_len -= 64;
+ cached_buffer += 4;
+  }
+
+  cached_buffer = (__m128i *)rsvd_space;
+  streaming_len = len > 4096 ? 4096 : len;
+
+  __asm__ volatile("mfence" ::: "memory");
+
+  while (streaming_len >= 64) {
+ __m128i *dst_cacheline = (__m128i *)d;
+
+ __m128i temp1 = _mm_stream_load_si128(cached_buffer + 0);
+ __m128i temp2 = _mm_stream_load_si128(cached_buffer + 1);
+ __m128i temp3 = _mm_stream_load_si128(cached_buffer + 2);
+ __m128i temp4 = _mm_stream_load_si128(cached_buffer + 3);
 
-  __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
-  __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
-  __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
-  __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
+ _mm_store_si128(dst_cacheline + 0, temp1);
+ _mm_store_si128(dst_cacheline + 1, temp2);
+ _mm_store_si128(dst_cacheline + 2, temp3);
+ _mm_store_si128(dst_cacheline + 3, temp4);
 
-  _mm_store_si128(dst_cacheline + 0, temp1);
-  _mm_store_si128(dst_cacheline + 1, temp2);
-  _mm_store_si128(dst_cacheline + 2, temp3);
-  _mm_store_si128(dst_cacheline + 3, temp4);
+ d += 64;
+ streaming_len -= 64;
+ cached_buffer += 4;
 
-  d += 64;
-  s += 64;
-  len -= 64;
+ len -= 64;
+  }
}
 
/* memcpy() the tail. */
-- 
2.4.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/4] fence refcnting fixes

2015-07-08 Thread Rob Clark

Bleh, assert(pipe_is_referenced(ptr)) in pipe_reference_described() is
non-pleased in debug builds about the trick of starting out the fences
w/ refcnt=0 (so that it doesn't end up getting upref'd to 2 in
pipe->flush().. so I'll have to re-work the fd/ilo/vc4 patches
slightly..

BR,
-R

On Wed, Jul 8, 2015 at 4:34 PM, Rob Clark  wrote:
> From: Rob Clark 
>
> This isn't at all clear for pipe driver writers currently, since it
> is not documented anywhere.  But radeon/nouveau/llvmpipe seem to drop
> the ref on the **fence passed in to pipe->flush() (if *fence!=NULL).
> Freedreno/ilo/vc4 where not doing this.  Some state trackers do call
> screen->fence_reference(screen, &fence, NULL) before pipe->flush(),
> but others do not.
>
> Add a comment for pipe->flush() to clairify what is expected of the
> driver, and fixup freedreno/ilo/vc4 to comply.
>
> Note: that ilo/vc4 patches are untested
>
> Rob Clark (4):
>   gallium: clarify reference counting for fence
>   freedreno: unref old fence
>   ilo: unref old fence
>   vc4: unref old fence
>
>  src/gallium/drivers/freedreno/freedreno_context.c | 2 +-
>  src/gallium/drivers/freedreno/freedreno_fence.c   | 2 +-
>  src/gallium/drivers/ilo/ilo_context.c | 5 -
>  src/gallium/drivers/ilo/ilo_screen.c  | 2 +-
>  src/gallium/drivers/vc4/vc4_context.c | 3 ++-
>  src/gallium/drivers/vc4/vc4_fence.c   | 2 +-
>  src/gallium/include/pipe/p_context.h  | 8 +++-
>  7 files changed, 17 insertions(+), 7 deletions(-)
>
> --
> 2.4.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCHv2] i965/gen9: Use custom MOCS entries set up by the kernel.

2015-07-08 Thread Ben Widawsky

On Tue, Jul 07, 2015 at 10:21:28PM +0300, Francisco Jerez wrote:
> Instead of relying on hardware defaults the i915 kernel driver is
> going program custom MOCS tables system-wide on Gen9 hardware.  The
> "WT" entry previously used for renderbuffers had a number of problems:
> It disabled caching on eLLC, it used a reserved L3 cacheability
> setting, and it used to override the PTE controls making renderbuffers
> always WT on LLC regardless of the kernel's setting.  Instead use an
> entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
> L3CC=WB.
> 
> The "WB" entry previously used for anything other than renderbuffers
> has moved to a different index in the new MOCS tables but it should
> have the same caching semantics as the old entry.
> 
> Even though the corresponding kernel change ("drm/i915: Added
> Programming of the MOCS") is in a way an ABI break it doesn't seem
> necessary to check that the kernel is recent enough because the change
> should only affect Gen9 which is still unreleased hardware.
> 
> v2: Update MOCS values for the new Android-incompatible tables
> introduced in v7 of the kernel patch.
> 
> Cc: 10.6 

It'd be cool to get perf data, but certainly not a requirement here since the
requirement to change is pretty obvious, IMO (mostly, I'm just curious). I do
like having the References: in the commit for the kernel patch, but that's just
me, and I can live with whatever.

> ---
>  src/mesa/drivers/dri/i965/brw_defines.h| 11 ++-
>  src/mesa/drivers/dri/i965/gen8_surface_state.c |  3 +--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 66b9abc..8ab8d62 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2491,12 +2491,13 @@ enum brw_wm_barycentric_interp_mode {
>  #define BDW_MOCS_WT  0x58
>  #define BDW_MOCS_PTE 0x18
>  
> -/* Skylake: MOCS is now an index into an array of 64 different configurable
> - * cache settings.  We still use only either write-back or write-through; and
> - * rely on the documented default values.
> +/* Skylake: MOCS is now an index into an array of 62 different caching
> + * configurations programmed by the kernel.

I'd keep the '64' instead of '62' the latter is a software construct, but
whatever you like.

>   */
> -#define SKL_MOCS_WB (0b001001 << 1)
> -#define SKL_MOCS_WT (0b000101 << 1)
> +/* TC=LLC/eLLC, LeCC=WB, LRUM=3, L3CC=WB */
> +#define SKL_MOCS_WB  (2 << 1)
> +/* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */
> +#define SKL_MOCS_PTE (1 << 1)
>  
>  #define MEDIA_VFE_STATE 0x7000
>  /* GEN7 DW2, GEN8+ DW3 */
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index bd3eb00..dfaf762 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -401,8 +401,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
>irb->mt_layer : (irb->mt_layer / MAX2(mt->num_samples, 1));
> GLenum gl_target =
>rb->TexImage ? rb->TexImage->TexObject->Target : GL_TEXTURE_2D;
> -   /* FINISHME: Use PTE MOCS on Skylake. */
> -   uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_WT : BDW_MOCS_PTE;
> +   uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_PTE : BDW_MOCS_PTE;

I don't know the policy on const really, but this is a good opportunity to
const.
>  
> intel_miptree_used_for_rendering(mt);
>  

Reviewed-by: Ben Widawsky 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a

2015-07-08 Thread Ben Widawsky

On Thu, Jul 02, 2015 at 12:58:33PM -0700, Matt Turner wrote:
> On Thu, Jul 2, 2015 at 12:57 PM, Matt Turner  wrote:
> > On Wed, Jul 1, 2015 at 4:03 PM, Ben Widawsky
> >  wrote:
> >> Certain compressed formats require this setting. The docs don't go into 
> >> much
> >> detail as to why it's needed exactly.
> >>
> >> This fixes 0 piglit failures with a GBM gpu piglit run.
> >
> > That's a really weird way of saying that.
> >
> >>
> >> Signed-off-by: Ben Widawsky 
> >> ---
> >>
> >> I had this one sitting around for almost 2 months. I'm not sure why I 
> >> didn't
> >> send it out sooner. It seems like it's needed.
> >>
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_defines.h|  1 +
> >>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 26 
> >> ++
> >>  2 files changed, 27 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> >> b/src/mesa/drivers/dri/i965/brw_defines.h
> >> index 66b9abc..f55fd49 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> @@ -276,6 +276,7 @@
> >>  #define GEN8_SURFACE_TILING_W   (1 << 12)
> >>  #define GEN8_SURFACE_TILING_X   (2 << 12)
> >>  #define GEN8_SURFACE_TILING_Y   (3 << 12)
> >> +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
> >>  #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
> >>  #define BRW_SURFACE_MIPLAYOUT_SHIFT10
> >>  #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
> >> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> >> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> index ca5ed17..a245379 100644
> >> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> @@ -264,6 +264,19 @@ gen8_emit_texture_surface_state(struct brw_context 
> >> *brw,
> >>surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
> >> }
> >>
> >> +   if (brw->is_cherryview || brw->gen >= 9) {
> >> +  /* "This bit must be set for the following surface types: BC2_UNORM
> >> +   * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM"
> >
> > Don't do naked quotes -- Use the normal style.
> >
> >> +   */
> >> +  switch (format) {
> >> +  case BRW_SURFACEFORMAT_BC2_UNORM:
> >> +  case BRW_SURFACEFORMAT_BC3_UNORM:
> >> +  case BRW_SURFACEFORMAT_BC5_SNORM:
> >
> > Missing BRW_SURFACEFORMAT_BC5_UNORM.
> >
> >> +  case BRW_SURFACEFORMAT_BC7_UNORM:
> >> + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
> >
> > It wouldn't surprise me if static analysis tools complain about the
> > missing break.
> >
> > Add a break or make it an if statement (which would be two lines
> > shorter). All together, how about
> >
> >/* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
> > * bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
> > *
> > *This bit must be set for the following surface types: BC2_UNORM
> > *BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
> > */
> >if (format == BRW_SURFACEFORMAT_BC2_UNORM ||
> >format == BRW_SURFACEFORMAT_BC3_UNORM ||
> >format == BRW_SURFACEFORMAT_BC5_SNORM ||
> >format == BRW_SURFACEFORMAT_BC5_UNORM ||
> 
> Bah, would be nice to so BC5_UNORM and then BC5_SNORM to better match
> the comment.
> 

Thanks for catching the missing case.

> >format == BRW_SURFACEFORMAT_BC7_UNORM)
> >   surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
> >}
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson  wrote:
> On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote:
>> By keeping a pointer to the next available location, we reduce the
>> number of memory accesses needed to write to the batchbuffer.
>>
>> A net ~7k reduction of .text size, 7.5k of which is from the change to
>> intel_batchbuffer_emit_dword().
>>
>>text data  bss  dec  hex  filename
>> 4943740   19515226192  5165084   4ed01c  i965_dri.so before
>> 4936804   19515226192  5158148   4eb504  i965_dri.so after
>>
>> Combined with the previous patch, improves performance of Synmark
>> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
>> ---
>> Full disclosure: when testing on an IVB desktop, I measured a
>> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
>> I don't have any explanation.
>
> The problem is that it seems to generate worse code with multiple
> adjacent emit_dwords. I have seen similar regressions when doing the
> same batch[index] to *batch++ elsewhere.
> -Chris

That is in conflict with the data I've provided. In fact, I started by
noticing that if I added intel_batchbuffer_emit_dword* functions that
took multiple dwords that there was a reduction in .text size, so it's
something that I've considered.

The two attached patches demonstrate that the batch[index++] pattern
generates larger (worse) code than *batch++. 1.patch

   text   databssdechex filename
4936804 195152  26192 5158148 4eb504 mesa-release/lib/i965_dri.so
after 1.patch (no change)
4936884 195152  26192 5158228 4eb554 mesa-release/lib/i965_dri.so after 2.patch
From 03256a2898bf74b56a35f8fcc130f8fee5902b59 Mon Sep 17 00:00:00 2001
From: Matt Turner 
Date: Wed, 8 Jul 2015 15:11:07 -0700
Subject: [PATCH 1/2] 1

---
 src/mesa/drivers/dri/i965/gen7_blorp.cpp  | 17 +
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 24 
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index abace6d..74e0763 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -520,14 +520,15 @@ gen7_blorp_emit_ps_config(struct brw_context *brw,
}
 
BEGIN_BATCH(8);
-   OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
-   OUT_BATCH(params->use_wm_prog ? prog_offset : 0);
-   OUT_BATCH(dw2);
-   OUT_BATCH(0);
-   OUT_BATCH(dw4);
-   OUT_BATCH(dw5);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
+   intel_batchbuffer_emit_dword8(brw,
+ _3DSTATE_PS << 16 | (8 - 2),
+ params->use_wm_prog ? prog_offset : 0,
+ dw2,
+ 0,
+ dw4,
+ dw5,
+ 0,
+ 0);
ADVANCE_BATCH();
 }
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index a12387b..a5d8f0b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -101,6 +101,30 @@ intel_batchbuffer_emit_dword(struct brw_context *brw, GLuint dword)
 }
 
 static inline void
+intel_batchbuffer_emit_dword8(struct brw_context *brw, GLuint dword1,
+   GLuint dword2,
+   GLuint dword3,
+   GLuint dword4,
+   GLuint dword5,
+   GLuint dword6,
+   GLuint dword7,
+   GLuint dword8)
+{
+#ifdef DEBUG
+   assert(intel_batchbuffer_space(brw) >= 4);
+#endif
+   *brw->batch.map_next++ = dword1;
+   *brw->batch.map_next++ = dword2;
+   *brw->batch.map_next++ = dword3;
+   *brw->batch.map_next++ = dword4;
+   *brw->batch.map_next++ = dword5;
+   *brw->batch.map_next++ = dword6;
+   *brw->batch.map_next++ = dword7;
+   *brw->batch.map_next++ = dword8;
+   assert(brw->batch.ring != UNKNOWN_RING);
+}
+
+static inline void
 intel_batchbuffer_emit_float(struct brw_context *brw, float f)
 {
intel_batchbuffer_emit_dword(brw, float_as_int(f));
-- 
2.3.6

From 08c78f95dfbaa39303d25c3348ba1fa99b04b31b Mon Sep 17 00:00:00 2001
From: Matt Turner 
Date: Wed, 8 Jul 2015 15:30:48 -0700
Subject: [PATCH 2/2] 2

---
 src/mesa/drivers/dri/i965/brw_context.h   |  2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index c7ad35e..c2f6658 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -

Re: [Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 2:07 PM, Ben Widawsky
 wrote:
> WARNING: No perf data, please keep reading though)
>
> This implements the suggestion provided by the paper, "Fast USWC to WB Memory
> Copy"
> (https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers).
> This is described throughout the paper, but the sample code lives in Figure 
> 3-3.
> That paper purports a roughly 40% performance gain in Mbyte/second over the
> original implementation done by Matt.

Sounds interesting!

> Section 3.1.2 is the summary of why an intermediate cache buffer is used. It
> claims that if you use the naive implementation, fill buffers are contended 
> for.
> To be honest, I can't quite fathom the underlying explanation, but I'll think
> about it some more. Most importantly would be to get the perf data... This 
> patch
> does need performance data. I don't currently have a platform that this would
> benefit (BYT or BSW), so I can't get anything useful. As soon as I get a
> platform to test it on, I will - meanwhile, maybe whomever tested the original
> patch the first time around come run this through?
>
> Cc: Matt Turner 
> Cc: Chad Versace 
> Cc: Kristian Høgsberg 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/main/streaming-load-memcpy.c | 61 
> +++
>  1 file changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/src/mesa/main/streaming-load-memcpy.c 
> b/src/mesa/main/streaming-load-memcpy.c
> index d7147af..3cd310a 100644
> --- a/src/mesa/main/streaming-load-memcpy.c
> +++ b/src/mesa/main/streaming-load-memcpy.c
> @@ -30,6 +30,8 @@
>  #include "main/streaming-load-memcpy.h"
>  #include 
>
> +static uint8_t rsvd_space[4096];

This being static makes this thread-unsafe.

Also, I don't know if it's guaranteed to be 16-byte aligned.
__attribute__((__aligned__(16))) would avoid any problems, but I think
you should do 64-byte alignment to ensure the block is cache-line
aligned. Oh, in fact the linked article says as much: "This buffer
should also be 64 byte (cache line) aligned."

> +
>  /* Copies memory from src to dst, using SSE 4.1's MOVNTDQA to get streaming
>   * read performance from uncached memory.
>   */
> @@ -59,23 +61,54 @@ _mesa_streaming_load_memcpy(void *restrict dst, void 
> *restrict src, size_t len)
>len -= MIN2(bytes_before_alignment_boundary, len);
> }
>
> -   while (len >= 64) {
> -  __m128i *dst_cacheline = (__m128i *)d;
> -  __m128i *src_cacheline = (__m128i *)s;
> +   while (len > 64) {

The article suggests we align the source to a 64-byte boundary as
well. We should do some performance testing around that.

It also suggests copying data to a 64-byte aligned destination as
well, which we could ensure by making the coalignment check above more
strict, though at the cost of skipping this fast path for some other
cases.

> +  __m128i *cached_buffer = (__m128i *)rsvd_space;
> +  size_t streaming_len = len > 4096 ? 4096 : len;

MIN2(len, 4096)

> +
> +  __asm__ volatile("mfence" ::: "memory");

Use _mm_mfence() like the sample code does.

> +
> +  while (streaming_len >= 64) {
> + __m128i *src_cacheline = (__m128i *)s;
> +
> + __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
> + __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
> + __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
> + __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
> +
> + _mm_store_si128(cached_buffer + 0, temp1);
> + _mm_store_si128(cached_buffer + 1, temp2);
> + _mm_store_si128(cached_buffer + 2, temp3);
> + _mm_store_si128(cached_buffer + 3, temp4);
> +
> + s += 64;
> + streaming_len -= 64;
> + cached_buffer += 4;
> +  }
> +
> +  cached_buffer = (__m128i *)rsvd_space;
> +  streaming_len = len > 4096 ? 4096 : len;

MIN2(len, 4096)

> +
> +  __asm__ volatile("mfence" ::: "memory");

_mm_mfence()

> +
> +  while (streaming_len >= 64) {
> + __m128i *dst_cacheline = (__m128i *)d;
> +
> + __m128i temp1 = _mm_stream_load_si128(cached_buffer + 0);
> + __m128i temp2 = _mm_stream_load_si128(cached_buffer + 1);
> + __m128i temp3 = _mm_stream_load_si128(cached_buffer + 2);
> + __m128i temp4 = _mm_stream_load_si128(cached_buffer + 3);

_mm_load_si128()

>
> -  __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
> -  __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
> -  __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
> -  __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
> + _mm_store_si128(dst_cacheline + 0, temp1);
> + _mm_store_si128(dst_cacheline + 1, temp2);
> + _mm_store_si128(dst_cacheline + 2, temp3);
> + _mm_store_si128(dst_cacheline + 3, temp4);

_mm_stream_si128()

>
> -  _mm_store_si128(dst_cacheline + 0, temp1);
> -  _mm_store_si128(dst_cacheline + 1, te

Re: [Mesa-dev] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.

2015-07-08 Thread Kenneth Graunke

On Tuesday, July 07, 2015 11:33:57 PM Matt Turner wrote:
> Cc: "10.6" 
> ---
>  src/util/Makefile.am | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/util/Makefile.am b/src/util/Makefile.am
> index 2e7542e..1e087b4 100644
> --- a/src/util/Makefile.am
> +++ b/src/util/Makefile.am
> @@ -46,9 +46,9 @@ libmesautil_la_SOURCES = \
>  
>  if ENABLE_SHADER_CACHE
>  libmesautil_la_SOURCES += $(MESA_UTIL_SHADER_CACHE_FILES)
> -endif
>  
>  libmesautil_la_LIBADD = $(SHA1_LIBS)
> +endif
>  
>  roundeven_test_LDADD = -lm
>  
> 

Yeah...haven't thought through a plan yet.  This certainly seems
reasonable.  Reverting it all and pushing it back in later isn't crazy
either, but with this patch I'm not sure it's hurting anything...

Reviewed-by: Kenneth Graunke 

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Chris Wilson

On Wed, Jul 08, 2015 at 03:33:17PM -0700, Matt Turner wrote:
> On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson  wrote:
> > On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote:
> >> By keeping a pointer to the next available location, we reduce the
> >> number of memory accesses needed to write to the batchbuffer.
> >>
> >> A net ~7k reduction of .text size, 7.5k of which is from the change to
> >> intel_batchbuffer_emit_dword().
> >>
> >>text data  bss  dec  hex  filename
> >> 4943740   19515226192  5165084   4ed01c  i965_dri.so before
> >> 4936804   19515226192  5158148   4eb504  i965_dri.so after
> >>
> >> Combined with the previous patch, improves performance of Synmark
> >> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
> >> ---
> >> Full disclosure: when testing on an IVB desktop, I measured a
> >> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
> >> I don't have any explanation.
> >
> > The problem is that it seems to generate worse code with multiple
> > adjacent emit_dwords. I have seen similar regressions when doing the
> > same batch[index] to *batch++ elsewhere.
> > -Chris
> 
> That is in conflict with the data I've provided. In fact, I started by
> noticing that if I added intel_batchbuffer_emit_dword* functions that
> took multiple dwords that there was a reduction in .text size, so it's
> something that I've considered.

This is part of a disassembly of the index version of
gen6_viewport_state.c::upload_viewport_state_pointers()
(ivb with -march=native -Ofast)

   0x0287 <+103>:   lea0x1(%rax),%esi
   0x028a <+106>:   mov%si,0x22f24(%rdi)
   0x0291 <+113>:   mov%ecx,(%rdx,%rax,4)
   0x0294 <+116>:   movzwl 0x22f24(%rdi),%eax
   0x029b <+123>:   mov0x24330(%rdi),%ecx
   0x02a1 <+129>:   mov0x22f08(%rdi),%rdx
   0x02a8 <+136>:   lea0x1(%rax),%esi
   0x02ab <+139>:   mov%si,0x22f24(%rdi)
   0x02b2 <+146>:   mov%ecx,(%rdx,%rax,4)
   0x02b5 <+149>:   movzwl 0x22f24(%rdi),%eax
   0x02bc <+156>:   mov0x2480c(%rdi),%ecx
   0x02c2 <+162>:   mov0x22f08(%rdi),%rdx
   0x02c9 <+169>:   lea0x1(%rax),%esi
   0x02cc <+172>:   mov%si,0x22f24(%rdi)
   0x02d3 <+179>:   mov%ecx,(%rdx,%rax,4)
   0x02d6 <+182>:   pop%rbp
   0x02d7 <+183>:   retq   

and for comparison the pointer version:

   0x0280 <+96>:lea0x4(%rax),%rcx
   0x0284 <+100>:   mov%rcx,0x22f10(%rdi)
   0x028b <+107>:   mov%edx,(%rax)
   0x028d <+109>:   mov0x22f10(%rdi),%rax
   0x0294 <+116>:   mov0x24338(%rdi),%edx
   0x029a <+122>:   lea0x4(%rax),%rcx
   0x029e <+126>:   mov%rcx,0x22f10(%rdi)
   0x02a5 <+133>:   mov%edx,(%rax)
   0x02a7 <+135>:   mov0x22f10(%rdi),%rax
   0x02ae <+142>:   mov0x24814(%rdi),%edx
   0x02b4 <+148>:   lea0x4(%rax),%rcx
   0x02b8 <+152>:   mov%rcx,0x22f10(%rdi)
   0x02bf <+159>:   mov%edx,(%rax)
   0x02c1 <+161>:   pop%rbp
   0x02c2 <+162>:   retq 

So in neither case does gcc avoid incrementing either the index or the
pointer in the struct after emit_dword, and then reloads it for the
next.

This is what I expected to see
   0x025e <+62>:movl   $0x780d1c02,(%rcx)
   0x0264 <+68>:mov0x22f08(%rdi),%rax
   0x026b <+75>:mov0x24320(%rdi),%edx
   0x0271 <+81>:mov%edx,0x4(%rax)
   0x0274 <+84>:mov0x22f08(%rdi),%rax
   0x027b <+91>:mov0x24338(%rdi),%edx
   0x0281 <+97>:mov%edx,0x8(%rax)
   0x0284 <+100>:   mov0x22f08(%rdi),%rax
   0x028b <+107>:   mov0x24814(%rdi),%edx
   0x0291 <+113>:   mov%edx,0xc(%rax)
   0x0294 <+116>:   addq   $0x10,0x22f08(%rdi)
   0x029c <+124>:   pop%rbp
   0x029d <+125>:   retq   
with the pointer increments coalesced to the end. Generated by
opencoding emit_dwords as

static void upload_viewport_state_pointers(struct brw_context *brw)
{
   BEGIN_BATCH(4);
   brw->batch.map[0] = (_3DSTATE_VIEWPORT_STATE_POINTERS << 16 | (4 - 2) |
GEN6_CC_VIEWPORT_MODIFY |
GEN6_SF_VIEWPORT_MODIFY |
GEN6_CLIP_VIEWPORT_MODIFY);
   brw->batch.map[1] = (brw->clip.vp_offset);
   brw->batch.map[2] = (brw->sf.vp_offset);
   brw->batch.map[3] = (brw->cc.vp_offset);
   brw->batch.map += 4;
   ADVANCE_BATCH();
}
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.free

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Chris Wilson

On Thu, Jul 09, 2015 at 12:53:23AM +0100, Chris Wilson wrote:
> This is what I expected to see
>0x025e <+62>:  movl   $0x780d1c02,(%rcx)
>0x0264 <+68>:  mov0x22f08(%rdi),%rax
>0x026b <+75>:  mov0x24320(%rdi),%edx
>0x0271 <+81>:  mov%edx,0x4(%rax)
>0x0274 <+84>:  mov0x22f08(%rdi),%rax
>0x027b <+91>:  mov0x24338(%rdi),%edx
>0x0281 <+97>:  mov%edx,0x8(%rax)
>0x0284 <+100>: mov0x22f08(%rdi),%rax
>0x028b <+107>: mov0x24814(%rdi),%edx
>0x0291 <+113>: mov%edx,0xc(%rax)
>0x0294 <+116>: addq   $0x10,0x22f08(%rdi)
>0x029c <+124>: pop%rbp
>0x029d <+125>: retq   
> with the pointer increments coalesced to the end. Generated by
> opencoding emit_dwords as
> 
> static void upload_viewport_state_pointers(struct brw_context *brw)
> {
>BEGIN_BATCH(4);
>brw->batch.map[0] = (_3DSTATE_VIEWPORT_STATE_POINTERS << 16 | (4 - 2) |
> GEN6_CC_VIEWPORT_MODIFY |
> GEN6_SF_VIEWPORT_MODIFY |
> GEN6_CLIP_VIEWPORT_MODIFY);
>brw->batch.map[1] = (brw->clip.vp_offset);
>brw->batch.map[2] = (brw->sf.vp_offset);
>brw->batch.map[3] = (brw->cc.vp_offset);
>brw->batch.map += 4;
>ADVANCE_BATCH();
> }

But it is still reloading brw->batch.map everytime!
One manual local variable later,
   0x025e <+62>:movl   $0x780d1c02,(%rdx)
   0x0264 <+68>:mov0x24320(%rdi),%eax
   0x026a <+74>:mov%eax,0x4(%rdx)
   0x026d <+77>:mov0x24338(%rdi),%eax
   0x0273 <+83>:mov%eax,0x8(%rdx)
   0x0276 <+86>:mov0x24814(%rdi),%eax
   0x027c <+92>:mov%eax,0xc(%rdx)
   0x027f <+95>:addq   $0x10,0x22f08(%rdi)
   0x0287 <+103>:   pop%rbp
   0x0288 <+104>:   retq   
(I hope telling gcc to tune for atom does better.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] [v2] i965/chv|skl: Apply sampler bypass w/a

2015-07-08 Thread Ben Widawsky

Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/

v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
  surface state open coded because it seems to go better with the existing code.

Cc: Matt Turner 
Cc: Nanley Chery 
Cc: Jordan Justen  (aux-hiz needs this too)
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 29 ++
 2 files changed, 30 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 19489ab..3668967 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -276,6 +276,7 @@
 #define GEN8_SURFACE_TILING_W   (1 << 12)
 #define GEN8_SURFACE_TILING_X   (2 << 12)
 #define GEN8_SURFACE_TILING_Y   (3 << 12)
+#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
 #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
 #define BRW_SURFACE_MIPLAYOUT_SHIFT10
 #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index bd3eb00..9bbe8ae 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -132,6 +132,27 @@ horizontal_alignment(const struct brw_context *brw,
}
 }
 
+static bool
+sampler_l2_bypass_disable(struct brw_context *brw, unsigned format)
+{
+   /* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
+* bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
+*
+*This bit must be set for the following surface types: BC2_UNORM
+*BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
+*/
+   if ((brw->gen >= 9 || brw->is_cherryview) &&
+   (format == BRW_SURFACEFORMAT_BC2_UNORM ||
+format == BRW_SURFACEFORMAT_BC3_UNORM ||
+format == BRW_SURFACEFORMAT_BC5_UNORM ||
+format == BRW_SURFACEFORMAT_BC5_SNORM ||
+format == BRW_SURFACEFORMAT_BC7_UNORM)) {
+  return true;
+   }
+
+   return false;
+}
+
 static uint32_t *
 allocate_surface_state(struct brw_context *brw, uint32_t *out_offset, int 
index)
 {
@@ -238,6 +259,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
   surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
}
 
+   if (sampler_l2_bypass_disable(brw, format)) {
+  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
+   }
+
if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
   surf[0] |= GEN8_SURFACE_IS_ARRAY;
 
@@ -465,6 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
  horizontal_alignment(brw, mt, surf_type) |
  surface_tiling_mode(tiling);
 
+   if (sampler_l2_bypass_disable(brw, format)) {
+  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
+   }
+
surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
 
surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) |
-- 
2.4.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().

2015-07-08 Thread Matt Turner

On Wed, Jul 8, 2015 at 4:53 PM, Chris Wilson  wrote:
> On Wed, Jul 08, 2015 at 03:33:17PM -0700, Matt Turner wrote:
>> On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson  
>> wrote:
>> > On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote:
>> >> By keeping a pointer to the next available location, we reduce the
>> >> number of memory accesses needed to write to the batchbuffer.
>> >>
>> >> A net ~7k reduction of .text size, 7.5k of which is from the change to
>> >> intel_batchbuffer_emit_dword().
>> >>
>> >>text data  bss  dec  hex  filename
>> >> 4943740   19515226192  5165084   4ed01c  i965_dri.so before
>> >> 4936804   19515226192  5158148   4eb504  i965_dri.so after
>> >>
>> >> Combined with the previous patch, improves performance of Synmark
>> >> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell.
>> >> ---
>> >> Full disclosure: when testing on an IVB desktop, I measured a
>> >> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30).
>> >> I don't have any explanation.
>> >
>> > The problem is that it seems to generate worse code with multiple
>> > adjacent emit_dwords. I have seen similar regressions when doing the
>> > same batch[index] to *batch++ elsewhere.
>> > -Chris
>>
>> That is in conflict with the data I've provided. In fact, I started by
>> noticing that if I added intel_batchbuffer_emit_dword* functions that
>> took multiple dwords that there was a reduction in .text size, so it's
>> something that I've considered.
>
> This is part of a disassembly of the index version of
> gen6_viewport_state.c::upload_viewport_state_pointers()
> (ivb with -march=native -Ofast)
>
>0x0287 <+103>:   lea0x1(%rax),%esi
>0x028a <+106>:   mov%si,0x22f24(%rdi)
>0x0291 <+113>:   mov%ecx,(%rdx,%rax,4)
>0x0294 <+116>:   movzwl 0x22f24(%rdi),%eax
>0x029b <+123>:   mov0x24330(%rdi),%ecx
>0x02a1 <+129>:   mov0x22f08(%rdi),%rdx
>0x02a8 <+136>:   lea0x1(%rax),%esi
>0x02ab <+139>:   mov%si,0x22f24(%rdi)
>0x02b2 <+146>:   mov%ecx,(%rdx,%rax,4)
>0x02b5 <+149>:   movzwl 0x22f24(%rdi),%eax
>0x02bc <+156>:   mov0x2480c(%rdi),%ecx
>0x02c2 <+162>:   mov0x22f08(%rdi),%rdx
>0x02c9 <+169>:   lea0x1(%rax),%esi
>0x02cc <+172>:   mov%si,0x22f24(%rdi)
>0x02d3 <+179>:   mov%ecx,(%rdx,%rax,4)
>0x02d6 <+182>:   pop%rbp
>0x02d7 <+183>:   retq
>
> and for comparison the pointer version:
>
>0x0280 <+96>:lea0x4(%rax),%rcx
>0x0284 <+100>:   mov%rcx,0x22f10(%rdi)
>0x028b <+107>:   mov%edx,(%rax)
>0x028d <+109>:   mov0x22f10(%rdi),%rax
>0x0294 <+116>:   mov0x24338(%rdi),%edx
>0x029a <+122>:   lea0x4(%rax),%rcx
>0x029e <+126>:   mov%rcx,0x22f10(%rdi)
>0x02a5 <+133>:   mov%edx,(%rax)
>0x02a7 <+135>:   mov0x22f10(%rdi),%rax
>0x02ae <+142>:   mov0x24814(%rdi),%edx
>0x02b4 <+148>:   lea0x4(%rax),%rcx
>0x02b8 <+152>:   mov%rcx,0x22f10(%rdi)
>0x02bf <+159>:   mov%edx,(%rax)
>0x02c1 <+161>:   pop%rbp
>0x02c2 <+162>:   retq
>
> So in neither case does gcc avoid incrementing either the index or the
> pointer in the struct after emit_dword, and then reloads it for the
> next.
>
> This is what I expected to see
>0x025e <+62>:movl   $0x780d1c02,(%rcx)
>0x0264 <+68>:mov0x22f08(%rdi),%rax
>0x026b <+75>:mov0x24320(%rdi),%edx
>0x0271 <+81>:mov%edx,0x4(%rax)
>0x0274 <+84>:mov0x22f08(%rdi),%rax
>0x027b <+91>:mov0x24338(%rdi),%edx
>0x0281 <+97>:mov%edx,0x8(%rax)
>0x0284 <+100>:   mov0x22f08(%rdi),%rax
>0x028b <+107>:   mov0x24814(%rdi),%edx
>0x0291 <+113>:   mov%edx,0xc(%rax)
>0x0294 <+116>:   addq   $0x10,0x22f08(%rdi)
>0x029c <+124>:   pop%rbp
>0x029d <+125>:   retq
> with the pointer increments coalesced to the end. Generated by
> opencoding emit_dwords as
>
> static void upload_viewport_state_pointers(struct brw_context *brw)
> {
>BEGIN_BATCH(4);
>brw->batch.map[0] = (_3DSTATE_VIEWPORT_STATE_POINTERS << 16 | (4 - 2) |
> GEN6_CC_VIEWPORT_MODIFY |
> GEN6_SF_VIEWPORT_MODIFY |
> GEN6_CLIP_VIEWPORT_MODIFY);
>brw->batch.map[1] = (brw->clip.vp_offset);
>brw->batch.map[2] = (brw->sf.vp_offset);
>brw->batch.map[3] = (brw->cc.vp_offset);
>brw->

[Mesa-dev] [PATCH 3/4] ilo: unref old fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush().  So driver
should use fence_ref() which will unref the old fence if not NULL.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/ilo/ilo_context.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/ilo/ilo_context.c 
b/src/gallium/drivers/ilo/ilo_context.c
index 3d5c7b6..b9a16aa 100644
--- a/src/gallium/drivers/ilo/ilo_context.c
+++ b/src/gallium/drivers/ilo/ilo_context.c
@@ -62,6 +62,8 @@ ilo_flush(struct pipe_context *pipe,
  (flags & PIPE_FLUSH_END_OF_FRAME) ? "frame end" : "user request");
 
if (f) {
+  struct pipe_screen *screen = pipe->screen;
+  screen->fence_reference(screen, f, NULL);
   *f = ilo_screen_fence_create(pipe->screen, ilo->cp->last_submitted_bo);
}
 }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] freedreno: unref old fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush().  So driver
should use fence_ref() which will unref the old fence if not NULL.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/freedreno/freedreno_context.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_context.c 
b/src/gallium/drivers/freedreno/freedreno_context.c
index 668ef36..127fb5f 100644
--- a/src/gallium/drivers/freedreno/freedreno_context.c
+++ b/src/gallium/drivers/freedreno/freedreno_context.c
@@ -144,8 +144,10 @@ fd_context_flush(struct pipe_context *pctx, struct 
pipe_fence_handle **fence,
 {
fd_context_render(pctx);
 
-   if (fence)
+   if (fence) {
+   fd_screen_fence_ref(pctx->screen, fence, NULL);
*fence = fd_fence_create(pctx);
+   }
 }
 
 void
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/4] vc4: unref old fence

2015-07-08 Thread Rob Clark

From: Rob Clark 

Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush().  So driver
should use fence_ref() which will unref the old fence if not NULL.

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/vc4/vc4_context.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/vc4/vc4_context.c 
b/src/gallium/drivers/vc4/vc4_context.c
index 630f8e6..316598f 100644
--- a/src/gallium/drivers/vc4/vc4_context.c
+++ b/src/gallium/drivers/vc4/vc4_context.c
@@ -103,8 +103,10 @@ vc4_pipe_flush(struct pipe_context *pctx, struct 
pipe_fence_handle **fence,
 vc4_flush(pctx);
 
 if (fence) {
+struct pipe_screen *screen = pctx->screen;
 struct vc4_fence *f = vc4_fence_create(vc4->screen,
vc4->last_emit_seqno);
+screen->fence_reference(screen, fence, NULL);
 *fence = (struct pipe_fence_handle *)f;
 }
 }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] gallium: clarify reference counting for fence

2015-07-08 Thread Michel Dänzer

On 09.07.2015 05:34, Rob Clark wrote:
> From: Rob Clark 
> 
> No where was it spelled out that the state tracker may expect the pipe

"Nowhere"?

> driver to unref the old fence.
> 
> Signed-off-by: Rob Clark 

Either way, the series is

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 03/11] radeon: remove dri_mirror state

2015-07-08 Thread Michel Dänzer

On 09.07.2015 02:07, Emil Velikov wrote:
> Most of the data stored(duplicated) was unused, and for the one that is
> follow the approach set by other drivers.
> This eliminates the use of legacy (dri1) types.

The commentary below should have been after the --- separator, not in
the actual Git commit log. With that fixed, this patch is

Reviewed-by: Michel Dänzer 

> XXX: The radeon code is the only user of __DRIscreen::drm_version (the
> only __DRIversion outside of dri1 land). Should we move it into radeon
> and/or bump the min. required drm module version ?

Moving this into radeon sounds good to me, but I'm not sure what exactly
you mean by bumping the minimum required version, or what it's supposed
to be good for.

FWIW though, any code which is specific to radeon DRM major version 1
can be removed, because that's the UMS major version.

-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91259] FS compile failed: Register spilling not supported with m14 used

2015-07-08 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91259

Kenneth Graunke  changed:

   What|Removed |Added

  Component|Other   |Drivers/DRI/i965
   Assignee|mesa-dev@lists.freedesktop. |i...@freedesktop.org
   |org |
 QA Contact|mesa-dev@lists.freedesktop. |intel-3d-bugs@lists.freedes
   |org |ktop.org

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] vc4: unref old fence

2015-07-08 Thread Eric Anholt

Rob Clark  writes:

> From: Rob Clark 
>
> Some, but not all, state trackers will explicitly unref (and set to
> NULL) the previous *fence before calling pipe->flush().  So driver
> should use fence_ref() which will unref the old fence if not NULL.
>
> Signed-off-by: Rob Clark 

Acked-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] r600g: fix sampler/ubo indexing on cayman

2015-07-08 Thread Dave Airlie

From: Dave Airlie 

Cayman needs a different method to upload the CF IDX0/1

This fixes 31 piglits when ARB_gpu_shader5 is forced on
with cayman.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/eg_asm.c | 17 +++--
 src/gallium/drivers/r600/eg_sq.h  | 11 +++
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c 
b/src/gallium/drivers/r600/eg_asm.c
index d04921e..c32d317 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -161,6 +161,9 @@ int egcm_load_index_reg(struct r600_bytecode *bc, unsigned 
id, bool inside_alu_c
alu.op = ALU_OP1_MOVA_INT;
alu.src[0].sel = bc->index_reg[id];
alu.src[0].chan = 0;
+   if (bc->chip_class == CAYMAN)
+   alu.dst.sel = id == 0 ? CM_V_SQ_MOVA_DST_CF_IDX0 : 
CM_V_SQ_MOVA_DST_CF_IDX1;
+
alu.last = 1;
r = r600_bytecode_add_alu(bc, &alu);
if (r)
@@ -168,12 +171,14 @@ int egcm_load_index_reg(struct r600_bytecode *bc, 
unsigned id, bool inside_alu_c
 
bc->ar_loaded = 0; /* clobbered */
 
-   memset(&alu, 0, sizeof(alu));
-   alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
-   alu.last = 1;
-   r = r600_bytecode_add_alu(bc, &alu);
-   if (r)
-   return r;
+   if (bc->chip_class == EVERGREEN) {
+   memset(&alu, 0, sizeof(alu));
+   alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(bc, &alu);
+   if (r)
+   return r;
+   }
 
/* Must split ALU group as index only applies to following group */
if (inside_alu_clause) {
diff --git a/src/gallium/drivers/r600/eg_sq.h b/src/gallium/drivers/r600/eg_sq.h
index b534872..10caa07 100644
--- a/src/gallium/drivers/r600/eg_sq.h
+++ b/src/gallium/drivers/r600/eg_sq.h
@@ -521,4 +521,15 @@
 
 #define V_SQ_REL_ABSOLUTE 0
 #define V_SQ_REL_RELATIVE 1
+
+/* CAYMAN has special encoding for MOVA_INT destination */
+#define CM_V_SQ_MOVA_DST_AR_X 0
+#define CM_V_SQ_MOVA_DST_CF_PC 1
+#define CM_V_SQ_MOVA_DST_CF_IDX0 2
+#define CM_V_SQ_MOVA_DST_CF_IDX1 3
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_7_0 4
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_15_8 5
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_23_16 6
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_31_24 7
+
 #endif
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 91222] lp_test_format regression on CentOS 7

2015-07-08 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=91222

Luke  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Luke  ---
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7b06af9d3ca7310197d39d55fc52c265da4bc59e

Fixed this

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Replace illegal compacted NOP with valid compact instruction

2015-07-08 Thread Zhenyu Wang

NOP actually has no compact version, but we use it for instruction
alignment for compact kernel. Although it seems working on HW, it is
illegal and might not be valid for any future one.

This trys to get a temporary compact instruction with no effect for
alignment to replace compacted NOP. G45 spec has note that HW compact
logic could determine NENOP and drop it right away, so we can still
keep with that.

v2: rebase to master, we still need this to work with internal tool.

Signed-off-by: Zhenyu Wang 
---
 src/mesa/drivers/dri/i965/brw_eu_compact.c | 41 +-
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
b/src/mesa/drivers/dri/i965/brw_eu_compact.c
index 67f0b45..719667a 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
@@ -1367,6 +1367,39 @@ brw_init_compaction_tables(const struct brw_device_info 
*devinfo)
}
 }
 
+static void
+brw_get_noop_compact(struct brw_codegen *p, brw_compact_inst *dst)
+{
+   const struct brw_device_info *devinfo = p->devinfo;
+   brw_inst *inst, i;
+   struct brw_reg g0 = brw_vec8_grf(0, 0);
+
+   memset(dst, 0, sizeof(*dst));
+
+   /* G45 compact logic could recognize NENOP and drop right away. */
+   if (devinfo->is_g4x) {
+  brw_compact_inst_set_opcode(dst, BRW_OPCODE_NENOP);
+  brw_compact_inst_set_cmpt_control(dst, true);
+  return;
+   }
+
+   /*
+* As NOP has no legal compact version, try to use a legal compact
+* instruction for compact instruction alignment.
+*/
+   brw_push_insn_state(p);
+   brw_set_default_predicate_control(p, BRW_PREDICATE_NONE);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   inst = brw_MOV(p, g0, g0);
+   memcpy(&i, inst, sizeof(brw_inst));
+   brw_pop_insn_state(p);
+
+   if (!brw_try_compact_instruction(devinfo, dst, &i)) {
+  fprintf(stderr, "Failed to generate compact inst for alignment!\n");
+  exit(1);
+   }
+}
+
 void
 brw_compact_instructions(struct brw_codegen *p, int start_offset,
  int num_annotations, struct annotation *annotation)
@@ -1414,9 +1447,7 @@ brw_compact_instructions(struct brw_codegen *p, int 
start_offset,
  /* All uncompacted instructions need to be aligned on G45. */
  if ((offset & sizeof(brw_compact_inst)) != 0 && devinfo->is_g4x){
 brw_compact_inst *align = store + offset;
-memset(align, 0, sizeof(*align));
-brw_compact_inst_set_opcode(align, BRW_OPCODE_NENOP);
-brw_compact_inst_set_cmpt_control(align, true);
+brw_get_noop_compact(p, align);
 offset += sizeof(brw_compact_inst);
 compacted_count--;
 compacted_counts[src_offset / sizeof(brw_inst)] = compacted_count;
@@ -1523,9 +1554,7 @@ brw_compact_instructions(struct brw_codegen *p, int 
start_offset,
 */
if (p->next_insn_offset & sizeof(brw_compact_inst)) {
   brw_compact_inst *align = store + offset;
-  memset(align, 0, sizeof(*align));
-  brw_compact_inst_set_opcode(align, BRW_OPCODE_NOP);
-  brw_compact_inst_set_cmpt_control(align, true);
+  brw_get_noop_compact(p, align);
   p->next_insn_offset += sizeof(brw_compact_inst);
}
p->nr_insn = p->next_insn_offset / sizeof(brw_inst);
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] r600g: move sampler/ubo index registers before temp reg

2015-07-08 Thread Dave Airlie

From: Dave Airlie 

temp_reg needs to be last, as we increment things
away from it, otherwise on cayman some tests were overwriting
the index regs.

Fixes 2 piglit with ARB_gpu_shader5 forced on cayman.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/r600_shader.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index af7622e..1a72bf6 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1931,15 +1931,14 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
ctx.file_offset[TGSI_FILE_IMMEDIATE] = V_SQ_ALU_SRC_LITERAL;
ctx.bc->ar_reg = ctx.file_offset[TGSI_FILE_TEMPORARY] +
ctx.info.file_max[TGSI_FILE_TEMPORARY] + 1;
+   ctx.bc->index_reg[0] = ctx.bc->ar_reg + 1;
+   ctx.bc->index_reg[1] = ctx.bc->ar_reg + 2;
+
if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
-   ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 1;
-   ctx.temp_reg = ctx.bc->ar_reg + 2;
-   ctx.bc->index_reg[0] = ctx.bc->ar_reg + 3;
-   ctx.bc->index_reg[1] = ctx.bc->ar_reg + 4;
+   ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 3;
+   ctx.temp_reg = ctx.bc->ar_reg + 4;
} else {
-   ctx.temp_reg = ctx.bc->ar_reg + 1;
-   ctx.bc->index_reg[0] = ctx.bc->ar_reg + 2;
-   ctx.bc->index_reg[1] = ctx.bc->ar_reg + 3;
+   ctx.temp_reg = ctx.bc->ar_reg + 3;
}
 
shader->max_arrays = 0;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/6] r600g/sb: add support for multiple streams to SB backend

2015-07-08 Thread Dave Airlie

From: Glenn Kennard 

This adds a peephole and removes an assert that isn't
actually valid with some of the stream emit instructions.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp |  2 --
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 18 +++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 8c2cd14..48d56ac 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -761,8 +761,6 @@ void bc_finalizer::finalize_cf(cf_node* c) {
mask |= (1 << chan);
}
 
-   assert(reg >= 0 && mask);
-
if (reg >= 0)
update_ngpr(reg);
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 4879c03..25b0f55 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -757,10 +757,22 @@ int bc_parser::prepare_ir() {
c->bc.end_of_program = eop;
 
} else if (flags & CF_EMIT) {
-   c->flags |= NF_DONT_KILL | NF_DONT_HOIST | NF_DONT_MOVE;
+   /* quick peephole */
+   cf_node *prev = static_cast(c->prev);
+   if (c->bc.op == CF_OP_CUT_VERTEX &&
+   prev && prev->is_valid() &&
+   prev->bc.op == CF_OP_EMIT_VERTEX &&
+   c->bc.count == prev->bc.count) {
+   prev->bc.set_op(CF_OP_EMIT_CUT_VERTEX);
+   prev->bc.end_of_program = c->bc.end_of_program;
+   c->remove();
+   }
+   else {
+   c->flags |= NF_DONT_KILL | NF_DONT_HOIST | 
NF_DONT_MOVE;
 
-   
c->src.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
-   
c->dst.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
+   
c->src.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
+   
c->dst.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
+   }
}
}
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/6] r600g: enable ARB_gpu_shader5 on evergreen and up

2015-07-08 Thread Dave Airlie

From: Dave Airlie 

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/r600_pipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 67caa69..0db1c1c 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -299,7 +299,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
 
case PIPE_CAP_GLSL_FEATURE_LEVEL:
if (family >= CHIP_CEDAR)
-  return 330;
+  return 400;
/* pre-evergreen geom shaders need newer kernel */
if (rscreen->b.info.drm_minor >= 37)
   return 330;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [RFC] r600 geometry streams (and ARB_gpu_shader5 support)

2015-07-08 Thread Dave Airlie

This applies on top of the two patches I've sent already,
and enables the geometry streams, which is the final
piece missing for ARB_gpu_shader5 on evergreen and cayman.
(I'll do doc update patches later)

Glenn wrote most of this, I just spent some time making it work
and cleaning up the code. Though I suspect it could do with more
cleaning.

The main thing I don't really like is we try and emit all
outputs into each ring so we waste a bit of space, however
it does work, though the special casing to avoid POSITION
going into any other streams kinda makes me wonder if anything
else needs that special casing.

Dave.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/6] r600g: add support for streams to the assembler.

2015-07-08 Thread Dave Airlie

From: Glenn Kennard 

This just adds support to the assembler dumper and allows
stream instructions to be generated. Also fix up the stream
debugging to add stream info.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/eg_asm.c  | 1 +
 src/gallium/drivers/r600/r600_asm.c| 2 ++
 src/gallium/drivers/r600/r600_asm.h| 1 +
 src/gallium/drivers/r600/r600_shader.c | 6 --
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 3 +++
 5 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c 
b/src/gallium/drivers/r600/eg_asm.c
index 42e8b0b..c32d317 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -115,6 +115,7 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct 
r600_bytecode_cf *cf)
S_SQ_CF_WORD1_BARRIER(1) |
S_SQ_CF_WORD1_COND(cf->cond) |
S_SQ_CF_WORD1_POP_COUNT(cf->pop_count) |
+   S_SQ_CF_WORD1_COUNT(cf->count) |

S_SQ_CF_WORD1_END_OF_PROGRAM(cf->end_of_program);
}
}
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 762cc7f..40639d0 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2029,6 +2029,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
fprintf(stderr, "CND:%X ", cf->cond);
if (cf->pop_count)
fprintf(stderr, "POP:%X ", 
cf->pop_count);
+   if (cf->count && (cfop->flags & CF_EMIT))
+   fprintf(stderr, "STREAM%d ", cf->count);
fprintf(stderr, "\n");
}
}
diff --git a/src/gallium/drivers/r600/r600_asm.h 
b/src/gallium/drivers/r600/r600_asm.h
index e37d926..b282907 100644
--- a/src/gallium/drivers/r600/r600_asm.h
+++ b/src/gallium/drivers/r600/r600_asm.h
@@ -149,6 +149,7 @@ struct r600_bytecode_cf {
unsignedid;
unsignedcond;
unsignedpop_count;
+   unsignedcount;
unsignedcf_addr; /* control flow addr */
struct r600_bytecode_kcache kcache[4];
unsignedr6xx_uses_waterfall;
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 1a72bf6..dbff313 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -93,8 +93,10 @@ static void r600_dump_streamout(struct 
pipe_stream_output_info *so)
for (i = 0; i < so->num_outputs; i++) {
unsigned mask = ((1 << so->output[i].num_components) - 1) <<
so->output[i].start_component;
-   fprintf(stderr, "  %i: MEM_STREAM0_BUF%i[%i..%i] <- 
OUT[%i].%s%s%s%s%s\n",
-   i, so->output[i].output_buffer,
+   fprintf(stderr, "  %i: MEM_STREAM%d_BUF%i[%i..%i] <- 
OUT[%i].%s%s%s%s%s\n",
+   i,
+   so->output[i].stream,
+   so->output[i].output_buffer,
so->output[i].dst_offset, so->output[i].dst_offset + 
so->output[i].num_components - 1,
so->output[i].register_index,
mask & 1 ? "x" : "",
diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
index 5232782..631fac2 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
@@ -182,6 +182,9 @@ void bc_dump::dump(cf_node& n) {
 
if (n.bc.pop_count)
s << " POP:" << n.bc.pop_count;
+
+   if (n.bc.count && (n.bc.op_ptr->flags & CF_EMIT))
+   s << " STREAM" << n.bc.count;
}
 
if (!n.bc.barrier)
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] radeon: add streamout status 1-3 queries.

2015-07-08 Thread Dave Airlie

From: Glenn Kennard 

This adds support for queries against the non-0 vertex streams.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/radeon/r600_query.c   | 18 --
 src/gallium/drivers/radeon/r600d_common.h |  3 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index 71f4a15..9ad2452 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -54,6 +54,8 @@ struct r600_query {
uint64_t end_result;
/* Fence for GPU_FINISHED. */
struct pipe_fence_handle *fence;
+   /* For transform feedback: which stream the query is for */
+   unsigned stream;
 };
 
 
@@ -157,6 +159,17 @@ static void r600_update_occlusion_query_state(struct 
r600_common_context *rctx,
}
 }
 
+static unsigned event_type_for_stream(struct r600_query *query)
+{
+   switch (query->stream) {
+   default:
+   case 0: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS;
+   case 1: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS1; /* enum values 
snarfed from SI kernel sid.h */
+   case 2: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS2;
+   case 3: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS3;
+   }
+}
+
 static void r600_emit_query_begin(struct r600_common_context *ctx, struct 
r600_query *query)
 {
struct radeon_winsys_cs *cs = ctx->rings.gfx.cs;
@@ -191,7 +204,7 @@ static void r600_emit_query_begin(struct 
r600_common_context *ctx, struct r600_q
case PIPE_QUERY_SO_STATISTICS:
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
-   radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | 
EVENT_INDEX(3));
+   radeon_emit(cs, EVENT_TYPE(event_type_for_stream(query)) | 
EVENT_INDEX(3));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
break;
@@ -248,7 +261,7 @@ static void r600_emit_query_end(struct r600_common_context 
*ctx, struct r600_que
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
va += query->buffer.results_end + query->result_size/2;
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
-   radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | 
EVENT_INDEX(3));
+   radeon_emit(cs, EVENT_TYPE(event_type_for_stream(query)) | 
EVENT_INDEX(3));
radeon_emit(cs, va);
radeon_emit(cs, (va >> 32UL) & 0xFF);
break;
@@ -369,6 +382,7 @@ static struct pipe_query *r600_create_query(struct 
pipe_context *ctx, unsigned q
/* NumPrimitivesWritten, PrimitiveStorageNeeded. */
query->result_size = 32;
query->num_cs_dw = 6;
+   query->stream = index;
break;
case PIPE_QUERY_PIPELINE_STATISTICS:
/* 11 values on EG, 8 on R600. */
diff --git a/src/gallium/drivers/radeon/r600d_common.h 
b/src/gallium/drivers/radeon/r600d_common.h
index 74c8d87..5a56a54 100644
--- a/src/gallium/drivers/radeon/r600d_common.h
+++ b/src/gallium/drivers/radeon/r600d_common.h
@@ -66,6 +66,9 @@
 #define PKT3_SET_SH_REG0x76 /* SI and later */
 #define PKT3_SET_UCONFIG_REG   0x79 /* CIK and later */
 
+#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS1  0x1 /* EG and later */
+#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS2  0x2 /* EG and later */
+#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS3  0x3 /* EG and later */
 #define EVENT_TYPE_PS_PARTIAL_FLUSH0x10
 #define EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT 0x14
 #define EVENT_TYPE_ZPASS_DONE  0x15
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/6] r600g: add streamout support

2015-07-08 Thread Dave Airlie

From: Glenn Kennard 

This adds the main chunk of the geometry shader multiple stream
support to the r600 driver.

Glenn wrote the original pass, and I took his code and hacked
it into a working state.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/evergreen_state.c   |  29 ++--
 src/gallium/drivers/r600/r600_pipe.c |   2 +-
 src/gallium/drivers/r600/r600_shader.c   | 200 ---
 src/gallium/drivers/r600/r600_shader.h   |   6 +-
 src/gallium/drivers/r600/r600_state.c|   6 +-
 src/gallium/drivers/r600/r600_state_common.c |   7 +
 6 files changed, 180 insertions(+), 70 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 4ddbc0b..788bf54 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2988,8 +2988,12 @@ void evergreen_update_gs_state(struct pipe_context *ctx, 
struct r600_pipe_shader
struct r600_command_buffer *cb = &shader->command_buffer;
struct r600_shader *rshader = &shader->shader;
struct r600_shader *cp_shader = &shader->gs_copy_shader->shader;
-   unsigned gsvs_itemsize =
-   (cp_shader->ring_item_size * 
rshader->gs_max_out_vertices) >> 2;
+   unsigned gsvs_itemsizes[4] = {
+   (cp_shader->ring_item_sizes[0] * 
rshader->gs_max_out_vertices) >> 2,
+   (cp_shader->ring_item_sizes[1] * 
rshader->gs_max_out_vertices) >> 2,
+   (cp_shader->ring_item_sizes[2] * 
rshader->gs_max_out_vertices) >> 2,
+   (cp_shader->ring_item_sizes[3] * 
rshader->gs_max_out_vertices) >> 2
+   };
 
r600_init_command_buffer(cb, 64);
 
@@ -3008,21 +3012,24 @@ void evergreen_update_gs_state(struct pipe_context 
*ctx, struct r600_pipe_shader
S_028B90_ENABLE(rshader->gs_num_invocations > 
0));
}
r600_store_context_reg_seq(cb, R_02891C_SQ_GS_VERT_ITEMSIZE, 4);
-   r600_store_value(cb, cp_shader->ring_item_size >> 2);
-   r600_store_value(cb, 0);
-   r600_store_value(cb, 0);
-   r600_store_value(cb, 0);
+   r600_store_value(cb, cp_shader->ring_item_sizes[0] >> 2);
+   r600_store_value(cb, cp_shader->ring_item_sizes[1] >> 2);
+   r600_store_value(cb, cp_shader->ring_item_sizes[2] >> 2);
+   r600_store_value(cb, cp_shader->ring_item_sizes[3] >> 2);
 
r600_store_context_reg(cb, R_028900_SQ_ESGS_RING_ITEMSIZE,
-  (rshader->ring_item_size) >> 2);
+  (rshader->ring_item_sizes[0]) >> 2);
 
r600_store_context_reg(cb, R_028904_SQ_GSVS_RING_ITEMSIZE,
-  gsvs_itemsize);
+  gsvs_itemsizes[0] +
+  gsvs_itemsizes[1] +
+  gsvs_itemsizes[2] +
+  gsvs_itemsizes[3]);
 
r600_store_context_reg_seq(cb, R_02892C_SQ_GSVS_RING_OFFSET_1, 3);
-   r600_store_value(cb, gsvs_itemsize);
-   r600_store_value(cb, gsvs_itemsize);
-   r600_store_value(cb, gsvs_itemsize);
+   r600_store_value(cb, gsvs_itemsizes[0]);
+   r600_store_value(cb, gsvs_itemsizes[0] + gsvs_itemsizes[1]);
+   r600_store_value(cb, gsvs_itemsizes[0] + gsvs_itemsizes[1] + 
gsvs_itemsizes[2]);
 
/* FIXME calculate these values somehow ??? */
r600_store_context_reg_seq(cb, R_028A54_GS_PER_ES, 3);
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 143e98e..67caa69 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -352,7 +352,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS:
return 16384;
case PIPE_CAP_MAX_VERTEX_STREAMS:
-   return 1;
+   return family >= CHIP_CEDAR ? 4 : 1;
 
case PIPE_CAP_MAX_VERTEX_ATTRIB_STRIDE:
return 2047;
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index dbff313..ef19706 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -311,7 +311,9 @@ struct r600_shader_ctx {
int gs_out_ring_offset;
int gs_next_vertex;
struct r600_shader  *gs_for_vs;
-   int gs_export_gpr_treg;
+   int gs_export_gpr_tregs[4];
+   const struct pipe_stream_output_info*gs_stream_output_info;
+   unsignedenabled_stream_buffers_mask; /* 
See r600_streamout.enabled_stream_buffers_mask */
 };
 
 struct r600_shader_tgsi_instruction {
@@ -319,7 +321,7 @@ struct r600_shader_tgsi_instruction

[Mesa-dev] [PATCH 3/6] radeon: add support for streams to the common streamout code.

2015-07-08 Thread Dave Airlie

From: Glenn Kennard 

This just adds to the common radeon streamout code, support
for multiple streams.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
 src/gallium/drivers/radeon/r600_streamout.c   | 23 +--
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index a471426..22d940e 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -327,6 +327,7 @@ struct r600_streamout {
/* External state which comes from the vertex shader,
 * it must be set explicitly when binding a shader. */
unsigned*stride_in_dw;
+   unsignedenabled_stream_buffers_mask; /* stream0 
buffers0-3 in 4 LSB */
 
/* The state of VGT_STRMOUT_(CONFIG|EN). */
struct r600_atomenable_atom;
diff --git a/src/gallium/drivers/radeon/r600_streamout.c 
b/src/gallium/drivers/radeon/r600_streamout.c
index bc8bf97..a602dac 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -195,7 +195,11 @@ static void r600_emit_streamout_begin(struct 
r600_common_context *rctx, struct r
r600_write_context_reg(cs, rctx->chip_class >= EVERGREEN ?
   R_028B98_VGT_STRMOUT_BUFFER_CONFIG :
   R_028B20_VGT_STRMOUT_BUFFER_EN,
-  rctx->streamout.enabled_mask);
+  (rctx->streamout.enabled_mask |
+   (rctx->streamout.enabled_mask << 4) |
+   (rctx->streamout.enabled_mask << 8) |
+   (rctx->streamout.enabled_mask << 12)) &
+
rctx->streamout.enabled_stream_buffers_mask);
 
for (i = 0; i < rctx->streamout.num_targets; i++) {
if (!t[i])
@@ -326,11 +330,18 @@ static bool r600_get_strmout_en(struct 
r600_common_context *rctx)
 static void r600_emit_streamout_enable(struct r600_common_context *rctx,
   struct r600_atom *atom)
 {
-   r600_write_context_reg(rctx->rings.gfx.cs,
-  rctx->chip_class >= EVERGREEN ?
-  R_028B94_VGT_STRMOUT_CONFIG :
-  R_028AB0_VGT_STRMOUT_EN,
-  
S_028B94_STREAMOUT_0_EN(r600_get_strmout_en(rctx)));
+   unsigned reg = R_028AB0_VGT_STRMOUT_EN;
+   unsigned val = S_028B94_STREAMOUT_0_EN(r600_get_strmout_en(rctx));
+   if (rctx->chip_class >= EVERGREEN) {
+   reg = R_028B94_VGT_STRMOUT_CONFIG;
+   val |=
+   S_028B94_RAST_STREAM(0) |
+   S_028B94_STREAMOUT_1_EN(r600_get_strmout_en(rctx)) |
+   S_028B94_STREAMOUT_2_EN(r600_get_strmout_en(rctx)) |
+   S_028B94_STREAMOUT_3_EN(r600_get_strmout_en(rctx));
+   }
+
+   r600_write_context_reg(rctx->rings.gfx.cs, reg, val);
 }
 
 static void r600_set_streamout_enable(struct r600_common_context *rctx, bool 
enable)
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 16/78] i965/nir/vec4: Implement store_output intrinsic

2015-07-08 Thread Eduardo Lima Mitev

On 06/30/2015 06:51 PM, Jason Ekstrand wrote:
> On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev  wrote:
>> The index into the output_reg array where to store the destination register 
>> is
>> fetched from the nir_outputs map built during nir_setup_outputs stage.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 17 +++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> index 8a2d335..55d4490 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> @@ -520,10 +520,23 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
>> *instr)
>> }
>>
>> case nir_intrinsic_store_output_indirect:
>> +  has_indirect = true;
>>/* fallthrough */
>> -   case nir_intrinsic_store_output:
>> -  /* @TODO: Not yet implemented */
>> +   case nir_intrinsic_store_output: {
>> +  int offset = instr->const_index[0];
>> +  int output = nir_outputs[offset];
>> +
>> +  src = get_nir_src(instr->src[0], nir_output_types[offset]);
>> +  dest = dst_reg(src);
>> +
>> +  dest.writemask = brw_writemask_for_size(instr->num_components);
>> +
>> +  if (has_indirect)
>> + dest.reladdr = new(mem_ctx) src_reg(get_nir_src(instr->src[1]));
>> +
>> +  output_reg[output] = dest;
> 
> I'm very confused about the amount of indirection going on here.  It
> seems to me that we should be setting these outputs up in
> setup_outputs() rather than storring off a map from ints to other ints
> and setting it up here.  I didn't make this comment on the patch for
> setup_outputs() because I wanted to wait to see it used before I
> commented on it.
> 
> I'm guessing you did it this way because the nir_assign_var_locations
> is giving you bogus values.  If so, then it might be better to just
> assign variable locations in setup_outputs() rather than having a
> remap table.  The whole point of nir_lower_io is to make IO easy for
> the back-end.  If you need a re-map table, then it's no longer making
> it easy and we need to think more about what's going on.
> --Jason
> 

That double indirection felt bad since the beginning, but it was needed
to store the original variable's location (var->data.location). Let me
explain:

We are (re)using the plumbering in vec4_visitor to setup URB, so the
only thing we need to do is to store the out register in "output_reg"
map at the correct location. And that location is given by the original
location in the shader (var->data.location).

So, in this case, "nir_assign_var_locations" pass, which constructs
var->data.driver_location, is not useful to us, except to give us
consecutive indexes to construct the other map we have, the type map,
which is needed to carry the correct type from the original variable to
the output register.

So, before knowing that I could modify nir_lower_io, my best shot at
transferring the original variable location was to create this
nir_outputs map. Now, what I have done is to put that value in
const_index[1] of the intrinsic instruction, which was previously
unused. What do you think?

That removes the offset to offset map, but we still need the type map.

About your comment on initializing the register during setup stage, I'm
a bit confused: the register that we need to store is not available
during setup stage, because we still don't have local registers allocated.

>>break;
>> +   }
>>
>> case nir_intrinsic_load_vertex_id:
>>unreachable("should be lowered by lower_vertex_id()");
>> --

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

79 matches

Mail list logo