[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254 --- Comment #4 from Tomasz C. --- I made a test the bisection method, here's the result: 0e0e23ef537c9add672ff322f34e129a07edc55e is the first bad commit commit 0e0e23ef537c9add672ff322f34e129a07edc55e Author: Jordan Justen Date: Wed Apr 22 11:43:50 2015 -0700 i965/state: Emit pipeline select when changing pipelines Signed-off-by: Jordan Justen Reviewed-by: Kenneth Graunke :04 04 725572870b90c40592e3d382faef748469403f67 ae158c5c1fc9c0487dbfad8b338ebdb27300c0a1 M src It will be helpful? -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote: > Hi Chris, > > I made a genuine effort to review this patch, hoping to better understand > the various changes and what you were trying to accomplish. I spent many > hours reading and trying to enumerate changes - or potential changes I > needed to look hard at to convince myself whether they were correct. > > I came up with a frighteningly long list of changes: * Locking completely overhauled. > * Relocation handling changes considerably (the original point of > Kristian's endeavour which led up to this). It's been a long time since 2011. > * Fencing, busy tracking, and sync objects are completely reworked. sync obje That's a slight overstatement. > * Render-to-texture cache flushing and dirty buffer tracking is > completely reworked. And immensely simplified. A hash table? Did you notice that the current dirty tracking misses where the same buffer is rewritten? > * Gen7 SOL buffer offset resetting now uses MI_LOAD_REGISTER_IMM rather > than the execbuf2 parameter, requiring the command validator on Haswell. > This effectively bumps the kernel requirement from v3.6 to v4.2-rc1, > which will simply not fly with distributions at this time. My bad, completely missed that there was EXT_transform_feedback that only did the subset. I wondered why you had code to the pipelined loads/save but then still flushed the batch every time. > * glBufferSubData() now uses intel_upload_data() rather than allocating > a temporary BO. This is the first use of the upload buffer by the > BLT engine, and could imply that the upload buffer's lifetime now > extends across batches - longer than before. Separable change that > requires separate evaluation and justification. Yes and no, it is easy to demonstrate that the buffer's lifetime is longer than a batch current. > * Per buffer cache-coherency checking rather than brw->has_llc? We've pointed out the bugs in the current usage of brw->has_llc before. > * glBufferSubData()'s prefer_stall_to_blit flag appears to depend on > per-buffer cache-coherency rather than being set globally. Could > impact performance of buffer uploads. > > * Potential missing flushes (which can cause hangs or misrendering): > > - It looks like calling brw_bo_busy() with BUSY_FLUSH causes a flush > when necessary. However, some instances of the old bo_busy, > bo_references, batch_flush pattern are replaced without that flag. > One occurrance was in BufferSubData(); I did not spend time to > check every case. The change is still accurate. > - Flushes are often done implicitly by e.g. brw_bo_read calling > brw_bo_map with the appropriate flags, and many explicit checks > and flushes are removed. Not bad, but needs careful review. > > - Gen6+ query object code might have dropped an implicit flush > guaranteeing that when the GL application requests the result, > any pending work will be kicked off so they can poll/spin > repeatedly until the result arrives. The flush is not missing. > - New code to avoid redundant flushes. > * perf_debug() warnings are removed all over the code for some reason: The problem is perf_debug() was a layering violation (my goal was to keep brw_batch a mini-library, either to extract it back to libdrm or for reuse). I have an idea of using a macro like #define PERF_DEBUG_LOC(brw, str) ((brw->perf_debug ? &(struct loc){__FILE__, __FUNCTION__, __LINE__, str)} : NULL)) brw_bo_map(..., PERF_DEBUG_LOC(brw)); Then pass back to a brw_context callback if a function stalls. > - Unsynchronized maps/BufferSubData not working on !LLC platforms? > If they work now, that's a huge change! If not, why drop the warning? They have always worked on !llc. The same caveats with using multiple aliasing to the same physical address through different modes of write caching apply to both llc and !llc. > - Warnings about stalls on mapping buffers and miptrees are gone now. > These have been useful in tracking down performance problems. They > might not always be accurate, but surely removing them should be done > separately with justification? See issues with calling perf_debug. > - Warnings about stalls on query objects are gone. I've used these when > analyzing application performance. Why? See issues with calling perf_debug. > - Warnings about implicit flushes are gone. See issues with calling perf_debug. It was at the back of my mind that I had to fix perf_debug - if I had put in the list of issues, I would probably would have remembered to fix it before posting. > * BO unmap calls appears to be missing in some places. A few map calls > have moved around in hard-to-follow ways. Unclear how lifetimes of > buffers and lifetimes of maps are affected. Bo unmap is a figment of your imagination. libdrm never unmaps a bo until it is closed (simply to avoid the cost of refaulting the objec
[Mesa-dev] [PATCH] nir: Fix comment above nir_convert_from_ssa() prototype.
Presumably Connor renamed the parameter, inverting the sense. Update the comment accordingly. Cc: Connor Abbott Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 4cb7d2f..9e2a281 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1659,9 +1659,9 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b); void nir_convert_to_ssa_impl(nir_function_impl *impl); void nir_convert_to_ssa(nir_shader *shader); -/* If convert_everything is true, convert all values (even those not involved - * in a phi node) to registers. If false, only convert SSA values involved in - * phi nodes to registers. +/* If phi_webs_only is true, only convert SSA values involved in phi nodes to + * registers. If false, convert all values (even those not involved in a phi + * node) to registers. */ void nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only); -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices
On Tue, Jul 07, 2015 at 01:42:47PM +0300, Martin Peres wrote: > On 06/07/15 19:43, Kenneth Graunke wrote: > >On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote: > >>Rather than allocating a fresh page every time we clear a buffer, keep > >>that page around between invocations by tracking the last used offset > >>and only allocating a fresh page when we wrap. > >> > >>Signed-off-by: Chris Wilson > >>--- > >> src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++--- > >> 1 file changed, 14 insertions(+), 3 deletions(-) > >This looks okay to me. Do you have any performance data to justify the > >extra complexity? > > I actually get a negative performance improvement on a customer > benchmark (-1.3%). Could it be because we are waiting on the VBO at > some point? > > What benchmark did you try to get a perf improvement? For this something as simple and cpu bound as windowed glxgears: 14550 -> 15400 (+-30fps both sides). -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254 --- Comment #5 from Tomasz C. --- This confirmed yet by compiling 10.6.1 of reverse patch: https://git.thm.de/tjkl80/mesa/commit/0e0e23ef537c9add672ff322f34e129a07edc55e.patch It works correctly. Do you expect the correction in the next version? -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 01/18] i965: Query whether we have kernel support for the TIMESTAMP register once
On Mon, Jul 06, 2015 at 10:01:21AM -0700, Kenneth Graunke wrote: > On Monday, July 06, 2015 05:12:10 PM Chris Wilson wrote: > > On Mon, Jul 06, 2015 at 04:19:36PM +0300, Martin Peres wrote: > > > > > > > > > On 06/07/15 16:15, Martin Peres wrote: > > > >On 06/07/15 16:13, Chris Wilson wrote: > > > >>On Mon, Jul 06, 2015 at 03:10:48PM +0300, Martin Peres wrote: > > > >>>On 06/07/15 13:33, Chris Wilson wrote: > > > Move the query for the TIMESTAMP register from context init to the > > > screen, so that it is only queried once for all contexts. > > > > > > On 32bit systems, some old kernels trigger a hw bug resulting in the > > > TIMESTAMP register being shifted and the low bits always zero. Detect > > > this by repeating the read a few times and check the register is > > > incrementing. > > > >>>You do not do the latter. You only check for the low bits. > > > >>> > > > >>>I guess the counter is supposed to be monotonically increasing and > > > >>>with a resolution of a few microseconds which would make this > > > >>>perfectly valid. Could you confirm and make sure to add this > > > >>>information in the commit message please? > > > >>The counter should increment every 80ns. What's misleading in what I > > > >>wrote? It describes the hw bug and how to detect it. > > > > > > > >Well, it is not misleading, it just lacks this information. > > > > > > > >If it incremented every seconds, the patch would be stupid because > > > >the timestamp could be at 0 and polling 10 times at a few us of > > > >interval would always yield the same result. That's all :) > > > > > > Oh, forgot to say: With this information added in the commit message > > > and the commit message duplicated as a comment in > > > intel_detect_timestamp(), the patch is: > > > > How about: > > > > On 32bit systems, some old kernels trigger a hw bug resulting in the > > TIMESTAMP register being shifted and the low 32bits always zero. Detect > > this by repeating the read a few times and check the register is > > incrementing every 80ns as expected and not stuck on zero (as would be > > the case with the buggy kernel/hw.). > > -Chris > > > > > > It would be great to put this in a comment above the loop rather than > only in the commit message. > > Also, moving the check to screen-time and fixing the check to use a loop > are really two separate things - but they're so trivial I'm inclined to > not be picky. :) > > Reviewed-by: Kenneth Graunke Pushed, To ssh://git.freedesktop.org/git/mesa/mesa 38c2ec5..c8d3eba c8d3ebaffc0d7d915c1c19d54dba61fd1e57b338 -> master Thanks for the review, -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.
On 08/07/15 07:33, Matt Turner wrote: > Cc: "10.6" Thanks for that I've completely forgot about this. Seems like we never got to using mesa_sha. Can we just remove it for now and revive it as a user comes along ? Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/18] i965: Share the workaround bo between all contexts
On Mon, Jul 06, 2015 at 09:34:10AM -0700, Kenneth Graunke wrote: > On Monday, July 06, 2015 11:33:08 AM Chris Wilson wrote: > > Since the workaround bo is used strictly as a write-only buffer, we need > > only allocate one per screen and use the same one from all contexts. > > > > (The caveat here is during extension initialisation, where we write into > > and read back register values from the buffer, but that is performed only > > once for the first context - and baring synchronisation issues should not > > be a problem. Safer would be to move that also to the screen.) > > > > Signed-off-by: Chris Wilson > > --- > > src/mesa/drivers/dri/i965/brw_pipe_control.c | 6 +++--- > > src/mesa/drivers/dri/i965/intel_screen.c | 4 > > src/mesa/drivers/dri/i965/intel_screen.h | 1 + > > 3 files changed, 8 insertions(+), 3 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c > > b/src/mesa/drivers/dri/i965/brw_pipe_control.c > > index 7ee3cb6..05e14cd 100644 > > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c > > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c > > @@ -341,12 +341,12 @@ brw_init_pipe_control(struct brw_context *brw, > > * the gen6 workaround because it involves actually writing to > > * the buffer, and the kernel doesn't let us write to the batch. > > */ > > - brw->workaround_bo = drm_intel_bo_alloc(brw->bufmgr, > > - "pipe_control workaround", > > - 4096, 4096); > > + brw->workaround_bo = brw->intelScreen->workaround_bo; > > if (brw->workaround_bo == NULL) > >return -ENOMEM; > > Checking for out-of-memory conditions in code that doesn't actually > allocate anything looks funky now. I'd be inclined just to drop the > -ENOMEM path and make this a void function. > > Alternatively, you could just fold this into the brw_context setup and > ditch the functions altogether. Up to you. Ok, will refactor this since the check should now be at the screen. > > > > + drm_intel_bo_reference(brw->workaround_bo); > > + > > brw->pipe_controls_since_last_cs_stall = 0; > > > > return 0; > > diff --git a/src/mesa/drivers/dri/i965/intel_screen.c > > b/src/mesa/drivers/dri/i965/intel_screen.c > > index 839a984..cd8e6eb 100644 > > --- a/src/mesa/drivers/dri/i965/intel_screen.c > > +++ b/src/mesa/drivers/dri/i965/intel_screen.c > > @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv) > > { > > struct intel_screen *intelScreen = sPriv->driverPrivate; > > > > + drm_intel_bo_unreference(intelScreen->workaround_bo); > > dri_bufmgr_destroy(intelScreen->bufmgr); > > driDestroyOptionInfo(&intelScreen->optionCache); > > > > @@ -1096,6 +1097,9 @@ intel_init_bufmgr(struct intel_screen *intelScreen) > >return false; > > } > > > > + intelScreen->workaround_bo = > > + drm_intel_bo_alloc(intelScreen->bufmgr, "pipe_control w/a", 4096, > > 4096); > > + > > Seems a little funny to put this in intel_init_bufmgr, since it's not > setting up libdrm - why not put it in the caller? No reason. I was considering this to be global state associated with the bufmgr so just liked to keep it together. Thanks, -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 17/18] loader: Look for any version of currently linked libudev.so
On Tue, Jul 07, 2015 at 08:40:12PM +0100, Emil Velikov wrote: > On 06/07/15 11:33, Chris Wilson wrote: > > Since there was an ABI break and linking twice against libudev.so.0 and > > libudev.so.1 causes the application to quickly crash, we first check if > > the application is currently linked against libudev before dlopening a > > local handle. However for backwards/forwards compatability, we need to > > inspect the application for current linkage against all known versions > > first. > > > > Signed-off-by: Chris Wilson > I'm ever so slightly conserned that RTLD_NOLOAD is not part of the POSIX > standard, thus it's missing on some platforms (*BSD seems ok, while > Solaris, MacOS are not). > > Then again this code is not build for them so we are safe. Plus it does > save nastry crashes :-) Feel free to add the Cc: mesa-stable tag. > > Reviewed-by: Emil Velikov Included your addendum as a reference for the future and pushed, To ssh://git.freedesktop.org/git/mesa/mesa c8d3eba..f241345 f2413457937f8f4a92e11379569be69e508d7477 -> maste Thanks for the review. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/18] i965: Move pipecontrol workaround bo to brw_pipe_control
On Mon, Jul 06, 2015 at 09:30:48AM -0700, Kenneth Graunke wrote: > On Monday, July 06, 2015 11:33:07 AM Chris Wilson wrote: > > With the exception of gen8, the sole user of the workaround bo are for > > emitting pipe controls. Move it out of the purview of the batchbuffer > > and into the pipecontrol. > > > > Signed-off-by: Chris Wilson > > Reviewed-by: Kenneth Graunke Pushed, To ssh://git.freedesktop.org/git/mesa/mesa f241345..f1d08c4 f1d08c4f75794add30d1714a4cd9ce2bf335148d -> master Thanks for the review, -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Fix comment above nir_convert_from_ssa() prototype.
Whoops! Reviewed-by: Connor Abbott On Wed, Jul 8, 2015 at 1:57 AM, Kenneth Graunke wrote: > Presumably Connor renamed the parameter, inverting the sense. > Update the comment accordingly. > > Cc: Connor Abbott > Signed-off-by: Kenneth Graunke > --- > src/glsl/nir/nir.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h > index 4cb7d2f..9e2a281 100644 > --- a/src/glsl/nir/nir.h > +++ b/src/glsl/nir/nir.h > @@ -1659,9 +1659,9 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def > *b); > void nir_convert_to_ssa_impl(nir_function_impl *impl); > void nir_convert_to_ssa(nir_shader *shader); > > -/* If convert_everything is true, convert all values (even those not involved > - * in a phi node) to registers. If false, only convert SSA values involved in > - * phi nodes to registers. > +/* If phi_webs_only is true, only convert SSA values involved in phi nodes to > + * registers. If false, convert all values (even those not involved in a phi > + * node) to registers. > */ > void nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only); > > -- > 2.4.5 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/5] i965: Move the pipelined test for SO register access to the screen
Moving the test to the screen places it alongside the other global HW feature tesst that want to be shared between contexts. Signed-off-by: Chris Wilson Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.c | 1 + src/mesa/drivers/dri/i965/intel_extensions.c | 69 --- src/mesa/drivers/dri/i965/intel_screen.c | 99 src/mesa/drivers/dri/i965/intel_screen.h | 7 ++ 4 files changed, 107 insertions(+), 69 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 3dac01b..705f042 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -728,6 +728,7 @@ brwCreateContext(gl_api api, brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil; brw->has_swizzling = screen->hw_has_swizzling; + brw->has_pipelined_so = screen->hw_has_pipelined_so; brw->vs.base.stage = MESA_SHADER_VERTEX; brw->gs.base.stage = MESA_SHADER_GEOMETRY; diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 489a00f..1414015 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -32,74 +32,6 @@ #include "intel_reg.h" #include "utils.h" -/** - * Test if we can use MI_LOAD_REGISTER_MEM from an untrusted batchbuffer. - * - * Some combinations of hardware and kernel versions allow this feature, - * while others don't. Instead of trying to enumerate every case, just - * try and write a register and see if works. - */ -static bool -can_do_pipelined_register_writes(struct brw_context *brw) -{ - /* Supposedly, Broadwell just works. */ - if (brw->gen >= 8) - return true; - - static int result = -1; - if (result != -1) - return result; - - /* We use SO_WRITE_OFFSET0 since you're supposed to write it (unlike the -* statistics registers), and we already reset it to zero before using it. -*/ - const int reg = GEN7_SO_WRITE_OFFSET(0); - const int expected_value = 0x1337d0d0; - const int offset = 100; - - /* The register we picked only exists on Gen7+. */ - assert(brw->gen == 7); - - uint32_t *data; - /* Set a value in a BO to a known quantity. The workaround BO already -* exists and doesn't contain anything important, so we may as well use it. -*/ - drm_intel_bo_map(brw->workaround_bo, true); - data = brw->workaround_bo->virtual; - data[offset] = 0x; - drm_intel_bo_unmap(brw->workaround_bo); - - /* Write the register. */ - BEGIN_BATCH(3); - OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2)); - OUT_BATCH(reg); - OUT_BATCH(expected_value); - ADVANCE_BATCH(); - - brw_emit_mi_flush(brw); - - /* Save the register's value back to the buffer. */ - BEGIN_BATCH(3); - OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2)); - OUT_BATCH(reg); - OUT_RELOC(brw->workaround_bo, - I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, - offset * sizeof(uint32_t)); - ADVANCE_BATCH(); - - intel_batchbuffer_flush(brw); - - /* Check whether the value got written. */ - drm_intel_bo_map(brw->workaround_bo, false); - data = brw->workaround_bo->virtual; - bool success = data[offset] == expected_value; - drm_intel_bo_unmap(brw->workaround_bo); - - result = success; - - return success; -} - static bool can_write_oacontrol(struct brw_context *brw) { @@ -327,7 +259,6 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.ARB_texture_compression_bptc = true; ctx->Extensions.ARB_texture_view = true; - brw->has_pipelined_so = can_do_pipelined_register_writes(brw); if (brw->has_pipelined_so) { ctx->Extensions.ARB_draw_indirect = true; ctx->Extensions.ARB_transform_feedback2 = true; diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 71f2957..9067399 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1157,6 +1157,104 @@ intel_detect_timestamp(struct intel_screen *screen) } /** + * Test if we can use MI_LOAD_REGISTER_MEM from an untrusted batchbuffer. + * + * Some combinations of hardware and kernel versions allow this feature, + * while others don't. Instead of trying to enumerate every case, just + * try and write a register and see if works. + */ +static bool +intel_detect_pipelined_register(struct intel_screen *screen, + int reg, uint32_t expected_value) +{ + const int offset = 100; + + drm_intel_bo *bo; + uint32_t buf[100]; + uint32_t *batch = buf; + + uint32_t *data; + /* Set a value in a BO to a known quantity. The workaround BO already +* exists and doesn't contain anything important, so we may as well use it. +*/ + if (drm_intel_bo_map(screen->workaround_bo, true)) + return false; + + data = screen->workaround_bo->virtual;
[Mesa-dev] [PATCH 2/5] i965: Share the workaround bo between all contexts
Since the workaround bo is used strictly as a write-only buffer, we need only allocate one per screen and use the same one from all contexts. (The caveat here is during extension initialisation, where we write into and read back register values from the buffer, but that is performed only once for the first context - and baring synchronisation issues should not be a problem. Safer would be to move that also to the screen.) v2: Give the workaround bo its own init function and don't piggy back intel_bufmgr_init() since it is not that related. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Martin Peres --- src/mesa/drivers/dri/i965/brw_context.c | 7 +-- src/mesa/drivers/dri/i965/brw_context.h | 4 ++-- src/mesa/drivers/dri/i965/brw_pipe_control.c | 13 - src/mesa/drivers/dri/i965/intel_screen.c | 15 +++ src/mesa/drivers/dri/i965/intel_screen.h | 1 + 5 files changed, 23 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 8150b94..3dac01b 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -819,12 +819,7 @@ brwCreateContext(gl_api api, } } - if (brw_init_pipe_control(brw, devinfo)) { - *dri_ctx_error = __DRI_CTX_ERROR_NO_MEMORY; - intelDestroyContext(driContextPriv); - return false; - } - + brw_init_pipe_control(brw, devinfo); brw_init_state(brw); intelInitExtensions(ctx); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 7f73b10..3397336 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -2002,8 +2002,8 @@ gen9_use_linear_1d_layout(const struct brw_context *brw, const struct intel_mipmap_tree *mt); /* brw_pipe_control.c */ -int brw_init_pipe_control(struct brw_context *brw, - const struct brw_device_info *info); +void brw_init_pipe_control(struct brw_context *brw, + const struct brw_device_info *info); void brw_fini_pipe_control(struct brw_context *brw); void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags); diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index 7ee3cb6..872bfe8 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -330,26 +330,21 @@ brw_emit_mi_flush(struct brw_context *brw) brw_render_cache_set_clear(brw); } -int +void brw_init_pipe_control(struct brw_context *brw, const struct brw_device_info *devinfo) { if (devinfo->gen < 6) - return 0; + return; /* We can't just use brw_state_batch to get a chunk of space for * the gen6 workaround because it involves actually writing to * the buffer, and the kernel doesn't let us write to the batch. */ - brw->workaround_bo = drm_intel_bo_alloc(brw->bufmgr, - "pipe_control workaround", - 4096, 4096); - if (brw->workaround_bo == NULL) - return -ENOMEM; + brw->workaround_bo = brw->intelScreen->workaround_bo; + drm_intel_bo_reference(brw->workaround_bo); brw->pipe_controls_since_last_cs_stall = 0; - - return 0; } void diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index c0f5c92..71f2957 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -961,6 +961,7 @@ intelDestroyScreen(__DRIscreen * sPriv) { struct intel_screen *intelScreen = sPriv->driverPrivate; + drm_intel_bo_unreference(intelScreen->workaround_bo); dri_bufmgr_destroy(intelScreen->bufmgr); driDestroyOptionInfo(&intelScreen->optionCache); @@ -1100,6 +1101,17 @@ intel_init_bufmgr(struct intel_screen *intelScreen) } static bool +intel_init_workaround_bo(struct intel_screen *intelScreen) +{ + /* A small scratch bo shared by all contexts, primarily used +* for doing PIPECONTROL serialisation writes that are discarded. +*/ + intelScreen->workaround_bo = + drm_intel_bo_alloc(intelScreen->bufmgr, "pipe_control w/a", 4096, 4096); + return intelScreen->workaround_bo != NULL; +} + +static bool intel_detect_swizzling(struct intel_screen *screen) { drm_intel_bo *buffer; @@ -1388,6 +1400,9 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp) if (!intel_init_bufmgr(intelScreen)) return false; + if (!intel_init_workaround_bo(intelScreen)) + return false; + intelScreen->deviceID = drm_intel_bufmgr_gem_get_devid(intelScreen->bufmgr); intelScreen->devinfo = brw_get_device_info(intelScreen->deviceID, brw_get_revision(psp->fd)); diff --git a/src/mesa/drivers/dri/i965/intel_scre
[Mesa-dev] [PATCH 1/5] i965: Only flush the batchbuffer if we need to zero the SO offsets
If we don't have pipelined register access (e.g. Haswell before kernel v4.2), then we can only implement EXT_transform_feedback by reseting the SO offsets *between* batches. However, if we do have pipelined access to the SO registers on gen7, we can simply emit an inline reset of the SO registers without a full batch flush. Signed-off-by: Chris Wilson Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/gen7_sol_state.c | 12 +++- src/mesa/drivers/dri/i965/intel_extensions.c | 3 ++- 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 65f34c3..7f73b10 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1125,6 +1125,7 @@ struct brw_context bool must_use_separate_stencil; bool has_llc; bool has_swizzling; + bool has_pipelined_so; bool has_surface_tile_offset; bool has_compr4; bool has_negative_rhw_bug; diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index 41573a8..140752c 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -446,7 +446,7 @@ gen7_begin_transform_feedback(struct gl_context *ctx, GLenum mode, /* Reset the SO buffer offsets to 0. */ if (brw->gen >= 8) { brw_obj->zero_offsets = true; - } else { + } else if (!brw->has_pipelined_so) { intel_batchbuffer_flush(brw); brw->batch.needs_sol_reset = true; } @@ -462,6 +462,16 @@ gen7_begin_transform_feedback(struct gl_context *ctx, GLenum mode, brw_obj->prims_generated[i] = 0; } + if (brw->gen == 7 && brw->has_pipelined_so) { + for (int i = 0; i < 4; i++) { + BEGIN_BATCH(3); + OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2)); + OUT_BATCH(GEN7_SO_WRITE_OFFSET(i)); + OUT_BATCH(0); + ADVANCE_BATCH(); + } + } + /* Store the starting value of the SO_NUM_PRIMS_WRITTEN counters. */ gen7_save_primitives_written_counters(brw, brw_obj); diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 6b3bd12..489a00f 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -327,7 +327,8 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.ARB_texture_compression_bptc = true; ctx->Extensions.ARB_texture_view = true; - if (can_do_pipelined_register_writes(brw)) { + brw->has_pipelined_so = can_do_pipelined_register_writes(brw); + if (brw->has_pipelined_so) { ctx->Extensions.ARB_draw_indirect = true; ctx->Extensions.ARB_transform_feedback2 = true; ctx->Extensions.ARB_transform_feedback3 = true; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] i965: Combine the multiple pipelined register detection into one round-trip
Combining the multiple access checks into a few batches and a single serialising read can reduce detection times from around 100us to 70us on a fast Haswell system. Signed-off-by: Chris Wilson Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/intel_screen.c | 177 +++ 1 file changed, 109 insertions(+), 68 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index cb49e9a..595d2dc 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1156,6 +1156,12 @@ intel_detect_timestamp(struct intel_screen *screen) return loop > 0; } +struct detect_pipelined_register { + uint32_t reg; + uint32_t expected_value; + bool *result; +}; + /** * Test if we can use MI_LOAD_REGISTER_MEM from an untrusted batchbuffer. * @@ -1163,107 +1169,143 @@ intel_detect_timestamp(struct intel_screen *screen) * while others don't. Instead of trying to enumerate every case, just * try and write a register and see if works. */ -static bool -intel_detect_pipelined_register(struct intel_screen *screen, - int reg, uint32_t expected_value) +static void +__intel_detect_pipelined_registers(struct intel_screen *screen, + struct detect_pipelined_register *r, + int count) { const int offset = 100; - - drm_intel_bo *bo; - uint32_t buf[100]; - uint32_t *batch = buf; + int i; uint32_t *data; + + if (count == 0) + return; + + if (drm_intel_bo_map(screen->workaround_bo, true)) + return; + /* Set a value in a BO to a known quantity. The workaround BO already * exists and doesn't contain anything important, so we may as well use it. */ - if (drm_intel_bo_map(screen->workaround_bo, true)) - return false; - data = screen->workaround_bo->virtual; - data[offset] = 0x; + for (i = 0; i < count; i++) + data[offset+i] = 0x; drm_intel_bo_unmap(screen->workaround_bo); - bo = drm_intel_bo_alloc(screen->bufmgr, "batchbuffer", 4096, 0); - if (bo == NULL) - return false; + /* Emit each access in a separate batch buffer so that if the kernel +* rejects an individual access attempt, we don't incorrectly assume +* all the register accesses are invalid. +*/ + for (i = 0; i < count; i++) { + drm_intel_bo *bo; + uint32_t buf[100]; + uint32_t *batch = buf; + + bo = drm_intel_bo_alloc(screen->bufmgr, "batchbuffer", 4096, 0); + if (bo == NULL) + continue; + + /* Write the register. */ + *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2); + *batch++ = r[i].reg; + *batch++ = r[i].expected_value; + + /* Force a command barrier between the write then read */ + *batch++ = _3DSTATE_PIPE_CONTROL | (5 - 2); + *batch++ = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_CS_STALL; + *batch++ = 0; + *batch++ = 0; + *batch++ = 0; - /* Write the register. */ - *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2); - *batch++ = reg; - *batch++ = expected_value; - - /* Force a command barrier between the write then read */ - *batch++ = _3DSTATE_PIPE_CONTROL | (5 - 2); - *batch++ = PIPE_CONTROL_NO_WRITE | PIPE_CONTROL_CS_STALL; - *batch++ = 0; - *batch++ = 0; - *batch++ = 0; - - /* Save the register's value back to the buffer. */ - *batch++ = MI_STORE_REGISTER_MEM | (3 - 2); - *batch++ = reg; - drm_intel_bo_emit_reloc(bo, (char *)batch -(char *)buf, - screen->workaround_bo, offset*sizeof(uint32_t), - I915_GEM_DOMAIN_INSTRUCTION, - I915_GEM_DOMAIN_INSTRUCTION); - *batch++ = screen->workaround_bo->offset + offset*sizeof(uint32_t); - - /* And afterwards clear the register */ - *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2); - *batch++ = reg; - *batch++ = 0; - - *batch++ = MI_BATCH_BUFFER_END; - if ((batch - buf) & 1) + /* Save the register's value back to the buffer. */ + *batch++ = MI_STORE_REGISTER_MEM | (3 - 2); + *batch++ = r[i].reg; + drm_intel_bo_emit_reloc(bo, (char *)batch -(char *)buf, + screen->workaround_bo, + (offset+i)*sizeof(uint32_t), + I915_GEM_DOMAIN_INSTRUCTION, + I915_GEM_DOMAIN_INSTRUCTION); + *batch++ = screen->workaround_bo->offset + (offset+i)*sizeof(uint32_t); + + /* And afterwards clear the register */ + *batch++ = MI_LOAD_REGISTER_IMM | (3 - 2); + *batch++ = r[i].reg; *batch++ = 0; - if (drm_intel_bo_subdata(bo, 0, (char *)batch - (char *)buf, buf) == 0) - drm_intel_bo_mrb_exec(bo, (char *)batch - (char *)buf, -NULL, 0, 0, -I915_EXEC_RENDER); + *batch++ = MI_BATCH_BUFFER_END; + if ((batch - buf) & 1) +
[Mesa-dev] [PATCH 4/5] i965: Move the OACONTROL pipelined access check from context to screen
Similarly to the pipelined SO_OFFSET check, this moves the global HW compatability check to the screen next to the other global checks. Signed-off-by: Chris Wilson --- src/mesa/drivers/dri/i965/brw_context.c | 1 + src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/intel_extensions.c | 67 +--- src/mesa/drivers/dri/i965/intel_screen.c | 14 ++ src/mesa/drivers/dri/i965/intel_screen.h | 1 + 5 files changed, 18 insertions(+), 66 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 705f042..de939f1 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -729,6 +729,7 @@ brwCreateContext(gl_api api, brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil; brw->has_swizzling = screen->hw_has_swizzling; brw->has_pipelined_so = screen->hw_has_pipelined_so; + brw->has_pipelined_oacontrol = screen->hw_has_pipelined_oacontrol; brw->vs.base.stage = MESA_SHADER_VERTEX; brw->gs.base.stage = MESA_SHADER_GEOMETRY; diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3397336..61d55ff 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1126,6 +1126,7 @@ struct brw_context bool has_llc; bool has_swizzling; bool has_pipelined_so; + bool has_pipelined_oacontrol; bool has_surface_tile_offset; bool has_compr4; bool has_negative_rhw_bug; diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 1414015..3f836c7 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -32,71 +32,6 @@ #include "intel_reg.h" #include "utils.h" -static bool -can_write_oacontrol(struct brw_context *brw) -{ - if (brw->gen < 6 || brw->gen >= 8) - return false; - - static int result = -1; - if (result != -1) - return result; - - /* Set "Select Context ID" to a particular address (which is likely not a -* context), but leave all counting disabled. This should be harmless. -*/ - const int expected_value = 0x31337000; - const int offset = 110; - - uint32_t *data; - /* Set a value in a BO to a known quantity. The workaround BO already -* exists and doesn't contain anything important, so we may as well use it. -*/ - drm_intel_bo_map(brw->workaround_bo, true); - data = brw->workaround_bo->virtual; - data[offset] = 0x; - drm_intel_bo_unmap(brw->workaround_bo); - - /* Write OACONTROL. */ - BEGIN_BATCH(3); - OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2)); - OUT_BATCH(OACONTROL); - OUT_BATCH(expected_value); - ADVANCE_BATCH(); - - brw_emit_mi_flush(brw); - - /* Save the register's value back to the buffer. */ - BEGIN_BATCH(3); - OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2)); - OUT_BATCH(OACONTROL); - OUT_RELOC(brw->workaround_bo, - I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, - offset * sizeof(uint32_t)); - ADVANCE_BATCH(); - - brw_emit_mi_flush(brw); - - /* Set OACONTROL back to zero (everything off). */ - BEGIN_BATCH(3); - OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2)); - OUT_BATCH(OACONTROL); - OUT_BATCH(0); - ADVANCE_BATCH(); - - intel_batchbuffer_flush(brw); - - /* Check whether the value got written. */ - drm_intel_bo_map(brw->workaround_bo, false); - data = brw->workaround_bo->virtual; - bool success = data[offset] == expected_value; - drm_intel_bo_unmap(brw->workaround_bo); - - result = success; - - return success; -} - /** * Initializes potential list of extensions if ctx == NULL, or actually enables * extensions for a context. @@ -207,7 +142,7 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.EXT_shader_integer_mix = ctx->Const.GLSLVersion >= 130; ctx->Extensions.EXT_timer_query = true; - if (brw->gen == 5 || can_write_oacontrol(brw)) { + if (brw->gen == 5 || brw->has_pipelined_oacontrol) { ctx->Extensions.AMD_performance_monitor = true; ctx->Extensions.INTEL_performance_query = true; } diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 9067399..cb49e9a 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1254,6 +1254,18 @@ intel_detect_pipelined_so(struct intel_screen *screen) 0x1337d0d0); } +static bool +intel_detect_pipelined_oacontrol(struct intel_screen *screen) +{ + if (screen->devinfo->gen < 6 || screen->devinfo->gen >= 8) + return false; + + /* Set "Select Context ID" to a particular address (which is likely not a +* context), but leave all counting disabled. This should be harmless. +*/ + return intel_detect_pipelined
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote: > On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote: > > * Gen4-5 structure changes. Did you mean brw_structs.h? diff --git a/src/mesa/drivers/dri/i965/brw_structs.h b/src/mesa/drivers/dri/i965/brw_structs.h index 55338c0..e167254 100644 --- a/src/mesa/drivers/dri/i965/brw_structs.h +++ b/src/mesa/drivers/dri/i965/brw_structs.h @@ -391,13 +391,16 @@ struct brw_sf_unit_state unsigned pad3:1; } thread4; - struct + union { + struct { unsigned front_winding:1; unsigned viewport_transform:1; unsigned pad0:3; unsigned sf_viewport_state_offset:27; /* Offset from GENERAL_STATE_BASE */ } sf5; + uint32_t dw5; + }; struct { @@ -525,15 +528,17 @@ struct brw_wm_unit_state struct thread2 thread2; struct thread3 thread3; + union { struct { unsigned stats_enable:1; unsigned depth_buffer_clear:1; unsigned sampler_count:3; unsigned sampler_state_pointer:27; } wm4; + uint32_t dw4; + }; - struct - { + struct { unsigned enable_8_pix:1; unsigned enable_16_pix:1; unsigned enable_32_pix:1; diff --git a/src/mesa/drivers/dri/i965/brw_urb.c b/src/mesa/drivers/dri/i965/brw_urb.c Or something else? -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/18] i965: Reuse our VBO for streaming fast-clear vertices
On 08/07/15 12:01, Chris Wilson wrote: On Tue, Jul 07, 2015 at 01:42:47PM +0300, Martin Peres wrote: On 06/07/15 19:43, Kenneth Graunke wrote: On Monday, July 06, 2015 11:33:10 AM Chris Wilson wrote: Rather than allocating a fresh page every time we clear a buffer, keep that page around between invocations by tracking the last used offset and only allocating a fresh page when we wrap. Signed-off-by: Chris Wilson --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) This looks okay to me. Do you have any performance data to justify the extra complexity? I actually get a negative performance improvement on a customer benchmark (-1.3%). Could it be because we are waiting on the VBO at some point? What benchmark did you try to get a perf improvement? For this something as simple and cpu bound as windowed glxgears: 14550 -> 15400 (+-30fps both sides). -Chris Thanks! I re-run the tests on the same kernel (4.0.7-2-ARCH) and another one (drm-intel-nighly-06-07-2015) and I can't reproduce the hit anymore... The hit was there, visible over the 30 rounds of the test, so this is weird. Let's just ignore this. Martin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 --- Comment #33 from Jose P. --- (In reply to Jose P. from comment #21) > This seems to have been fixed, I can't see the corruption anymore. I'm using > Chromium 43.0.2357.81 on Ubuntu 14.04 (64-bit) and Mesa 10.7.0-devel > (git-20dca37 2015-06-23 trusty-oibaf-ppa). This is wrong, the bug surely hasn't been fixed. As Furkan says in #18, GPU acceleration was disabled in the Chromium package (ver 43.0.2357.81) in Ubuntu: https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1466670/ , the reason being "because GPU crashes (on Intel, Nvidia, and AMD) was the biggest cause of crash reports in all of Ubuntu"... -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.
On Wed, Jul 8, 2015 at 3:00 AM, Emil Velikov wrote: > On 08/07/15 07:33, Matt Turner wrote: >> Cc: "10.6" > Thanks for that I've completely forgot about this. > > Seems like we never got to using mesa_sha. Can we just remove it for now > and revive it as a user comes along ? I don't have a problem with that. I shouldn't have let in what's been committed. I guess it mostly depends on how soon we expect to finish the shader-cache. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: bump libdrm requirement to 2.4.61 and drop in-tree workaround
Signed-off-by: Emil Velikov --- configure.ac | 2 +- src/mesa/drivers/dri/i965/intel_screen.c | 5 - 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/configure.ac b/configure.ac index d819bef..eb7180b 100644 --- a/configure.ac +++ b/configure.ac @@ -68,7 +68,7 @@ AC_SUBST([OSMESA_VERSION]) dnl Versions for external dependencies LIBDRM_REQUIRED=2.4.38 LIBDRM_RADEON_REQUIRED=2.4.56 -LIBDRM_INTEL_REQUIRED=2.4.60 +LIBDRM_INTEL_REQUIRED=2.4.61 LIBDRM_NVVIEUX_REQUIRED=2.4.33 LIBDRM_NOUVEAU_REQUIRED=2.4.62 LIBDRM_FREEDRENO_REQUIRED=2.4.57 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index c0f5c92..fd343ee 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1330,11 +1330,6 @@ set_max_gl_versions(struct intel_screen *screen) } } -/* drop when libdrm 2.4.61 is released */ -#ifndef I915_PARAM_REVISION -#define I915_PARAM_REVISION 32 -#endif - static int brw_get_revision(int fd) { -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/11] radeon: remove dri_mirror state
Most of the data stored(duplicated) was unused, and for the one that is follow the approach set by other drivers. This eliminates the use of legacy (dri1) types. XXX: The radeon code is the only user of __DRIscreen::drm_version (the only __DRIversion outside of dri1 land). Should we move it into radeon and/or bump the min. required drm module version ? Cc: Ian Romanick Cc: Marek Olšák Cc: Michel Dänzer Signed-off-by: Emil Velikov --- src/mesa/drivers/dri/radeon/radeon_common.c | 18 +- src/mesa/drivers/dri/radeon/radeon_common_context.c | 7 ++- src/mesa/drivers/dri/radeon/radeon_common_context.h | 19 +++ src/mesa/drivers/dri/radeon/radeon_texture.c| 2 +- 4 files changed, 15 insertions(+), 31 deletions(-) diff --git a/src/mesa/drivers/dri/radeon/radeon_common.c b/src/mesa/drivers/dri/radeon/radeon_common.c index 2a8bd6c9..d834d9b 100644 --- a/src/mesa/drivers/dri/radeon/radeon_common.c +++ b/src/mesa/drivers/dri/radeon/radeon_common.c @@ -164,7 +164,7 @@ uint32_t radeonGetAge(radeonContextPtr radeon) gp.param = RADEON_PARAM_LAST_CLEAR; gp.value = (int *)&age; - ret = drmCommandWriteRead(radeon->dri.fd, DRM_RADEON_GETPARAM, + ret = drmCommandWriteRead(radeon->radeonScreen->driScreen->fd, DRM_RADEON_GETPARAM, &gp, sizeof(gp)); if (ret) { fprintf(stderr, "%s: drmRadeonGetParam: %d\n", __func__, @@ -358,8 +358,8 @@ void radeonDrawBuffer( struct gl_context *ctx, GLenum mode ) * that the front-buffer has actually been allocated. */ if (!was_front_buffer_rendering && radeon->is_front_buffer_rendering) { - radeon_update_renderbuffers(radeon->dri.context, - radeon->dri.context->driDrawablePriv, GL_FALSE); + radeon_update_renderbuffers(radeon->driContext, + radeon->driContext->driDrawablePriv, GL_FALSE); } } @@ -375,8 +375,8 @@ void radeonReadBuffer( struct gl_context *ctx, GLenum mode ) || (mode == GL_FRONT); if (!was_front_buffer_reading && rmesa->is_front_buffer_reading) { - radeon_update_renderbuffers(rmesa->dri.context, - rmesa->dri.context->driReadablePriv, GL_FALSE); + radeon_update_renderbuffers(rmesa->driContext, + rmesa->driContext->driReadablePriv, GL_FALSE); } } /* nothing, until we implement h/w glRead/CopyPixels or CopyTexImage */ @@ -399,7 +399,7 @@ void radeon_window_moved(radeonContextPtr radeon) void radeon_viewport(struct gl_context *ctx) { radeonContextPtr radeon = RADEON_CONTEXT(ctx); - __DRIcontext *driContext = radeon->dri.context; + __DRIcontext *driContext = radeon->driContext; void (*old_viewport)(struct gl_context *ctx); if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) { @@ -693,6 +693,7 @@ void rcommonInitCmdBuf(radeonContextPtr rmesa) { GLuint size; struct drm_radeon_gem_info mminfo = { 0 }; + int fd = rmesa->radeonScreen->driScreen->fd; /* Initialize command buffer */ size = 256 * driQueryOptioni(&rmesa->optionCache, @@ -711,8 +712,7 @@ void rcommonInitCmdBuf(radeonContextPtr rmesa) "Allocating %d bytes command buffer (max state is %d bytes)\n", size * 4, rmesa->hw.max_state_size * 4); - rmesa->cmdbuf.csm = - radeon_cs_manager_gem_ctor(rmesa->radeonScreen->driScreen->fd); + rmesa->cmdbuf.csm = radeon_cs_manager_gem_ctor(fd); if (rmesa->cmdbuf.csm == NULL) { /* FIXME: fatal error */ return; @@ -725,7 +725,7 @@ void rcommonInitCmdBuf(radeonContextPtr rmesa) (void (*)(void *))rmesa->glCtx.Driver.Flush, &rmesa->glCtx); - if (!drmCommandWriteRead(rmesa->dri.fd, DRM_RADEON_GEM_INFO, + if (!drmCommandWriteRead(fd, DRM_RADEON_GEM_INFO, &mminfo, sizeof(mminfo))) { radeon_cs_set_limit(rmesa->cmdbuf.cs, RADEON_GEM_DOMAIN_VRAM, mminfo.vram_visible); diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.c b/src/mesa/drivers/dri/radeon/radeon_common_context.c index 3d0ceda..4660d98 100644 --- a/src/mesa/drivers/dri/radeon/radeon_common_context.c +++ b/src/mesa/drivers/dri/radeon/radeon_common_context.c @@ -162,10 +162,7 @@ GLboolean radeonInitContext(radeonContextPtr radeon, _mesa_meta_init(ctx); /* DRI fields */ - radeon->dri.context = driContextPriv; - radeon->dri.screen = sPriv; - radeon->dri.fd = sPriv->fd; - radeon->dri.drmMinor = sPriv->drm_version.minor; + radeon->driContext = driC
[Mesa-dev] [PATCH 01/11] configure.ac: do not set HAVE_DRI(23) when libdrm is missing
These conditionals are used to guard both dri modules and loader(s). Currently if we try to build the gallium swrast dri module (without glx) on a system that's missing libdrm the build will fail. Cc: 10.6 Signed-off-by: Emil Velikov --- configure.ac | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index eb7180b..a97b739 100644 --- a/configure.ac +++ b/configure.ac @@ -923,8 +923,8 @@ esac AM_CONDITIONAL(HAVE_DRICOMMON, test "x$enable_dri" = xyes ) AM_CONDITIONAL(HAVE_DRISW, test "x$enable_dri" = xyes ) -AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm ) -AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = xdrm ) +AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes ) +AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = xdrm -a "x$have_libdrm" = xyes ) AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = xapple ) AC_ARG_ENABLE([shared-glapi], -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] android: don't build the kms-dri winsys
GBM (the only user of kms-dri) is currently not available under Android. Considering we have no way of testing/using this let's not bother building it for now. Cc: Chih-Wei Huang Cc: Eric Anholt Signed-off-by: Emil Velikov --- src/gallium/Android.mk| 2 +- src/gallium/state_trackers/dri/Android.mk | 1 - src/gallium/targets/dri/Android.mk| 2 +- src/gallium/winsys/sw/kms-dri/Android.mk | 37 --- 4 files changed, 2 insertions(+), 40 deletions(-) delete mode 100644 src/gallium/winsys/sw/kms-dri/Android.mk diff --git a/src/gallium/Android.mk b/src/gallium/Android.mk index b946681..0703148 100644 --- a/src/gallium/Android.mk +++ b/src/gallium/Android.mk @@ -34,7 +34,7 @@ SUBDIRS := auxiliary # swrast ifneq ($(filter swrast,$(MESA_GPU_DRIVERS)),) -SUBDIRS += winsys/sw/dri winsys/sw/kms-dri drivers/softpipe +SUBDIRS += winsys/sw/dri drivers/softpipe endif # freedreno diff --git a/src/gallium/state_trackers/dri/Android.mk b/src/gallium/state_trackers/dri/Android.mk index 4ee4338..43f0de9 100644 --- a/src/gallium/state_trackers/dri/Android.mk +++ b/src/gallium/state_trackers/dri/Android.mk @@ -44,7 +44,6 @@ LOCAL_STATIC_LIBRARIES := \ libmesa_dri_common \ ifneq ($(filter swrast,$(MESA_GPU_DRIVERS)),) -LOCAL_CFLAGS += -DGALLIUM_SOFTPIPE LOCAL_SRC_FILES += $(drisw_SOURCES) endif diff --git a/src/gallium/targets/dri/Android.mk b/src/gallium/targets/dri/Android.mk index bccc91a..7168e1d 100644 --- a/src/gallium/targets/dri/Android.mk +++ b/src/gallium/targets/dri/Android.mk @@ -85,7 +85,7 @@ gallium_DRIVERS += libmesa_winsys_radeon libmesa_pipe_radeon LOCAL_SHARED_LIBRARIES += libdrm_radeon endif ifneq ($(filter swrast,$(MESA_GPU_DRIVERS)),) -gallium_DRIVERS += libmesa_pipe_softpipe libmesa_winsys_sw_dri libmesa_winsys_sw_kms_dri +gallium_DRIVERS += libmesa_pipe_softpipe libmesa_winsys_sw_dri LOCAL_CFLAGS += -DGALLIUM_SOFTPIPE endif ifneq ($(filter vc4,$(MESA_GPU_DRIVERS)),) diff --git a/src/gallium/winsys/sw/kms-dri/Android.mk b/src/gallium/winsys/sw/kms-dri/Android.mk deleted file mode 100644 index b065242..000 --- a/src/gallium/winsys/sw/kms-dri/Android.mk +++ /dev/null @@ -1,37 +0,0 @@ -# Mesa 3-D graphics library -# -# Copyright (C) 2015 Chih-Wei Huang -# Copyright (C) 2015 Android-x86 Open Source Project -# -# Permission is hereby granted, free of charge, to any person obtaining a -# copy of this software and associated documentation files (the "Software"), -# to deal in the Software without restriction, including without limitation -# the rights to use, copy, modify, merge, publish, distribute, sublicense, -# and/or sell copies of the Software, and to permit persons to whom the -# Software is furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL -# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING -# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER -# DEALINGS IN THE SOFTWARE. - -LOCAL_PATH := $(call my-dir) - -include $(LOCAL_PATH)/Makefile.sources - -include $(CLEAR_VARS) - -LOCAL_SRC_FILES := $(C_SOURCES) - -LOCAL_MODULE := libmesa_winsys_sw_kms_dri - -LOCAL_SHARED_LIBRARIES := libdrm - -include $(GALLIUM_COMMON_MK) -include $(BUILD_STATIC_LIBRARY) -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/11] loader: use HAVE_LIBDRM instead of ! __NOT_HAVE_DRM_H
Double negatives in English language are normally avoided, plus the former seems cleaner and more consistent. Signed-off-by: Emil Velikov --- src/loader/Android.mk | 6 ++ src/loader/Makefile.am | 5 + src/loader/SConscript | 2 -- src/loader/loader.c| 8 src/loader/pci_id_driver_map.c | 2 +- 5 files changed, 8 insertions(+), 15 deletions(-) diff --git a/src/loader/Android.mk b/src/loader/Android.mk index 92d9fd2..8690565 100644 --- a/src/loader/Android.mk +++ b/src/loader/Android.mk @@ -33,10 +33,8 @@ include $(CLEAR_VARS) LOCAL_SRC_FILES := \ $(LOADER_C_FILES) -# swrast only -ifeq ($(MESA_GPU_DRIVERS),swrast) -LOCAL_CFLAGS += -D__NOT_HAVE_DRM_H -else +ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),) +LOCAL_CFLAGS += -DHAVE_LIBDRM LOCAL_SHARED_LIBRARIES := libdrm endif diff --git a/src/loader/Makefile.am b/src/loader/Makefile.am index aef1bd6..5190f7f 100644 --- a/src/loader/Makefile.am +++ b/src/loader/Makefile.am @@ -48,10 +48,7 @@ libloader_la_CPPFLAGS += \ endif -if !HAVE_LIBDRM -libloader_la_CPPFLAGS += \ - -D__NOT_HAVE_DRM_H -else +if HAVE_LIBDRM libloader_la_CPPFLAGS += \ $(LIBDRM_CFLAGS) diff --git a/src/loader/SConscript b/src/loader/SConscript index 16d1053..d98f11e 100644 --- a/src/loader/SConscript +++ b/src/loader/SConscript @@ -8,8 +8,6 @@ env.Prepend(CPPPATH = [ '#include' ]) -env.Append(CPPDEFINES = ['__NOT_HAVE_DRM_H']) - if env['udev']: env.PkgUseModules('UDEV') env.Append(CPPDEFINES = ['HAVE_LIBUDEV']) diff --git a/src/loader/loader.c b/src/loader/loader.c index 8780587..4ed0a1f 100644 --- a/src/loader/loader.c +++ b/src/loader/loader.c @@ -85,7 +85,7 @@ #endif #include "loader.h" -#ifndef __NOT_HAVE_DRM_H +#ifdef HAVE_LIBDRM #include #endif @@ -501,7 +501,7 @@ sysfs_get_pci_id_for_fd(int fd, int *vendor_id, int *chip_id) } #endif -#if !defined(__NOT_HAVE_DRM_H) +#if defined(HAVE_LIBDRM) /* for i915 */ #include /* for radeon */ @@ -584,7 +584,7 @@ loader_get_pci_id_for_fd(int fd, int *vendor_id, int *chip_id) if (sysfs_get_pci_id_for_fd(fd, vendor_id, chip_id)) return 1; #endif -#if !defined(__NOT_HAVE_DRM_H) +#if HAVE_LIBDRM if (drm_get_pci_id_for_fd(fd, vendor_id, chip_id)) return 1; #endif @@ -695,7 +695,7 @@ loader_get_driver_for_fd(int fd, unsigned driver_types) if (!loader_get_pci_id_for_fd(fd, &vendor_id, &chip_id)) { -#ifndef __NOT_HAVE_DRM_H +#if HAVE_LIBDRM /* fallback to drmGetVersion(): */ drmVersionPtr version = drmGetVersion(fd); diff --git a/src/loader/pci_id_driver_map.c b/src/loader/pci_id_driver_map.c index cb6f705..3c4657f 100644 --- a/src/loader/pci_id_driver_map.c +++ b/src/loader/pci_id_driver_map.c @@ -23,7 +23,7 @@ int is_nouveau_vieux(int fd); -#ifndef __NOT_HAVE_DRM_H +#ifdef HAVE_LIBDRM #include #include -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/11] Bye bye __NOT_HAVE_DRM_H
Hi all, As double-negatives are not too appealing in the English language I've decided to nuke the common ifndef __NOT_HAVE_DRM_H. The first three patches are trivial bugfix + cleanups which I've noticed while browsing through, 04-09 remove all occurrences of the lastly macro (HAVE_LIBDRM is used instead). With the last two dropping kms_swrast from the scons/android build. Note: I'm not 100% sure if the scons build sets the HAVE_LIBDRM macro, despite that it builds just fine. If anyone can confirm that will be appreciated. Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/11] st/dri: unwrap/remove __NOT_HAVE_DRM_H magic
With the dri_interface.h clean of the macro, we can remove the final only st/dri specific use of the very same. Seemingly it was incorrectly used, as the build-time presence of dri2 is not libdrm specific. At run-time, the code is already limited to dri2 use-cases plus returning true, when the extension is not present (or too old) will likely lead to a crash as one tries to use it shortly after the dri_with_format() call. As a side effect this gives us a nice cleanup the builds. Signed-off-by: Emil Velikov --- src/gallium/state_trackers/dri/Android.mk | 5 + src/gallium/state_trackers/dri/Makefile.am | 4 src/gallium/state_trackers/dri/SConscript | 3 --- src/gallium/state_trackers/dri/dri_screen.h | 12 4 files changed, 1 insertion(+), 23 deletions(-) diff --git a/src/gallium/state_trackers/dri/Android.mk b/src/gallium/state_trackers/dri/Android.mk index 188e4a1..4ee4338 100644 --- a/src/gallium/state_trackers/dri/Android.mk +++ b/src/gallium/state_trackers/dri/Android.mk @@ -48,10 +48,7 @@ LOCAL_CFLAGS += -DGALLIUM_SOFTPIPE LOCAL_SRC_FILES += $(drisw_SOURCES) endif -# swrast only? -ifeq ($(MESA_GPU_DRIVERS),swrast) -LOCAL_CFLAGS += -D__NOT_HAVE_DRM_H -else +ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),) LOCAL_SRC_FILES += $(dri2_SOURCES) LOCAL_SHARED_LIBRARIES := libdrm endif diff --git a/src/gallium/state_trackers/dri/Makefile.am b/src/gallium/state_trackers/dri/Makefile.am index d2c7a82..9f4deba 100644 --- a/src/gallium/state_trackers/dri/Makefile.am +++ b/src/gallium/state_trackers/dri/Makefile.am @@ -50,10 +50,6 @@ noinst_LTLIBRARIES = libdri.la libdri_la_SOURCES = $(common_SOURCES) if HAVE_DRISW -if !HAVE_DRI2 -AM_CPPFLAGS += \ - -D__NOT_HAVE_DRM_H -endif libdri_la_SOURCES += $(drisw_SOURCES) endif diff --git a/src/gallium/state_trackers/dri/SConscript b/src/gallium/state_trackers/dri/SConscript index 89b5e61..7a7060c 100644 --- a/src/gallium/state_trackers/dri/SConscript +++ b/src/gallium/state_trackers/dri/SConscript @@ -5,10 +5,7 @@ Import('*') env = env.Clone() -# XXX: If HAVE_DRI2 env.PkgUseModules(['DRM']) -# else -#env.Append(CPPDEFINES = [('__NOT_HAVE_DRM_H', '1')]) env.Append(CPPPATH = [ '#/src', diff --git a/src/gallium/state_trackers/dri/dri_screen.h b/src/gallium/state_trackers/dri/dri_screen.h index 173f403..1263fa2 100644 --- a/src/gallium/state_trackers/dri/dri_screen.h +++ b/src/gallium/state_trackers/dri/dri_screen.h @@ -122,8 +122,6 @@ struct __DRIimageRec { }; -#ifndef __NOT_HAVE_DRM_H - static INLINE boolean dri_with_format(__DRIscreen * sPriv) { @@ -134,16 +132,6 @@ dri_with_format(__DRIscreen * sPriv) && (loader->getBuffersWithFormat != NULL); } -#else - -static INLINE boolean -dri_with_format(__DRIscreen * sPriv) -{ - return TRUE; -} - -#endif - void dri_fill_st_visual(struct st_visual *stvis, struct dri_screen *screen, const struct gl_config *mode); -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/11] swrast: remove unneeded __NOT_HAVE_DRM_H define
No longer applicable since the cleanup of dri_interface.h. Signed-off-by: Emil Velikov --- src/mesa/drivers/dri/swrast/Makefile.am | 1 - 1 file changed, 1 deletion(-) diff --git a/src/mesa/drivers/dri/swrast/Makefile.am b/src/mesa/drivers/dri/swrast/Makefile.am index bfc3c10..b1ba34d 100644 --- a/src/mesa/drivers/dri/swrast/Makefile.am +++ b/src/mesa/drivers/dri/swrast/Makefile.am @@ -24,7 +24,6 @@ include Makefile.sources AM_CFLAGS = \ - -D__NOT_HAVE_DRM_H \ -I$(top_srcdir)/include \ -I$(top_srcdir)/src/ \ -I$(top_srcdir)/src/mapi \ -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/11] dri_interface: drop __NOT_HAVE_DRM_H magic
Signed-off-by: Emil Velikov --- include/GL/internal/dri_interface.h | 11 --- 1 file changed, 11 deletions(-) diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h index c827bb6..c0545b1 100644 --- a/include/GL/internal/dri_interface.h +++ b/include/GL/internal/dri_interface.h @@ -40,20 +40,9 @@ #ifndef DRI_INTERFACE_H #define DRI_INTERFACE_H -/* For archs with no drm.h */ -#if defined(__APPLE__) || defined(__CYGWIN__) || defined(__GNU__) -#ifndef __NOT_HAVE_DRM_H -#define __NOT_HAVE_DRM_H -#endif -#endif - -#ifndef __NOT_HAVE_DRM_H -#include -#else typedef unsigned int drm_context_t; typedef unsigned int drm_drawable_t; typedef struct drm_clip_rect drm_clip_rect_t; -#endif /** * \name DRI interface structures -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/11] android: dri: correctly set HAVE_LIBDRM
Set the macro if we're not building swrast alone. Cc: Chih-Wei Huang Cc: Eric Anholt Signed-off-by: Emil Velikov --- src/gallium/targets/dri/Android.mk | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/src/gallium/targets/dri/Android.mk b/src/gallium/targets/dri/Android.mk index 5ba129b..bccc91a 100644 --- a/src/gallium/targets/dri/Android.mk +++ b/src/gallium/targets/dri/Android.mk @@ -35,17 +35,15 @@ endif LOCAL_SRC_FILES := target.c -LOCAL_CFLAGS := -DDRI_TARGET -DHAVE_LIBDRM +LOCAL_CFLAGS := -DDRI_TARGET LOCAL_SHARED_LIBRARIES := \ libdl \ libglapi \ libexpat \ -# swrast only? -ifeq ($(MESA_GPU_DRIVERS),swrast) -LOCAL_CFLAGS += -D__NOT_HAVE_DRM_H -else +ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),) +LOCAL_CFLAGS += -DHAVE_LIBDRM LOCAL_SHARED_LIBRARIES += libdrm endif -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/11] dri/common: use HAVE_LIBDRM over __NOT_HAVE_DRM_H
See previous commit message for details. Signed-off-by: Emil Velikov --- src/mesa/drivers/dri/common/Android.mk | 13 ++--- src/mesa/drivers/dri/common/Makefile.am | 2 +- src/mesa/drivers/dri/common/SConscript | 4 src/mesa/drivers/dri/common/dri_util.c | 4 ++-- 4 files changed, 5 insertions(+), 18 deletions(-) diff --git a/src/mesa/drivers/dri/common/Android.mk b/src/mesa/drivers/dri/common/Android.mk index 6986f5e..6e29baf 100644 --- a/src/mesa/drivers/dri/common/Android.mk +++ b/src/mesa/drivers/dri/common/Android.mk @@ -43,10 +43,8 @@ LOCAL_EXPORT_C_INCLUDE_DIRS := \ $(LOCAL_PATH) \ $(intermediates) -# swrast only -ifeq ($(MESA_GPU_DRIVERS),swrast) -LOCAL_CFLAGS := -D__NOT_HAVE_DRM_H -else +ifneq ($(filter-out swrast,$(MESA_GPU_DRIVERS)),) +LOCAL_CFLAGS := -DHAVE_LIBDRM LOCAL_SHARED_LIBRARIES := libdrm endif @@ -110,13 +108,6 @@ LOCAL_MODULE_CLASS := STATIC_LIBRARIES LOCAL_C_INCLUDES := \ $(MESA_DRI_C_INCLUDES) -# swrast only -ifeq ($(MESA_GPU_DRIVERS),swrast) -LOCAL_CFLAGS := -D__NOT_HAVE_DRM_H -else -LOCAL_SHARED_LIBRARIES := libdrm -endif - LOCAL_SRC_FILES := $(megadriver_stub_FILES) include $(MESA_COMMON_MK) diff --git a/src/mesa/drivers/dri/common/Makefile.am b/src/mesa/drivers/dri/common/Makefile.am index ae19fcb..ee6c691 100644 --- a/src/mesa/drivers/dri/common/Makefile.am +++ b/src/mesa/drivers/dri/common/Makefile.am @@ -58,5 +58,5 @@ if DRICOMMON_NEED_LIBDRM AM_CFLAGS += $(LIBDRM_CFLAGS) libdricommon_la_LIBADD = $(LIBDRM_LIBS) else -AM_CFLAGS += -D__NOT_HAVE_DRM_H +AM_CFLAGS += -UHAVE_LIBDRM endif diff --git a/src/mesa/drivers/dri/common/SConscript b/src/mesa/drivers/dri/common/SConscript index b402736..adaac29 100644 --- a/src/mesa/drivers/dri/common/SConscript +++ b/src/mesa/drivers/dri/common/SConscript @@ -32,10 +32,7 @@ drienv.AppendUnique(LIBS = [ 'expat', ]) -# if HAVE_DRI2 drienv.PkgUseModules('DRM') -# else -#env.Append(CPPDEFINES = ['__NOT_HAVE_DRM_H']) sources = drienv.ParseSourceList('Makefile.sources', ['DRI_COMMON_FILES', 'XMLCONFIG_FILES' ]) @@ -57,7 +54,6 @@ env.Append(CPPPATH = [ ]) env.Append(CPPDEFINES = [ -'__NOT_HAVE_DRM_H', 'HAVE_DLADDR', ]) diff --git a/src/mesa/drivers/dri/common/dri_util.c b/src/mesa/drivers/dri/common/dri_util.c index ae4592c..100a727 100644 --- a/src/mesa/drivers/dri/common/dri_util.c +++ b/src/mesa/drivers/dri/common/dri_util.c @@ -40,7 +40,7 @@ #include -#ifndef __NOT_HAVE_DRM_H +#ifdef HAVE_LIBDRM #include #endif #include "dri_util.h" @@ -137,7 +137,7 @@ driCreateNewScreen2(int scrn, int fd, setupLoaderExtensions(psp, extensions); -#ifndef __NOT_HAVE_DRM_H +#ifdef HAVE_LIBDRM if (fd != -1) { drmVersionPtr version = drmGetVersion(fd); if (version) { -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/11] scons: don't build the kms-dri winsys
Same as previous commit - unused (gbm is not a thing outside the autotools build). Signed-off-by: Emil Velikov --- src/gallium/SConscript| 1 - src/gallium/state_trackers/dri/SConscript | 1 - src/gallium/targets/dri/SConscript| 4 src/gallium/winsys/sw/kms-dri/SConscript | 23 --- 4 files changed, 29 deletions(-) delete mode 100644 src/gallium/winsys/sw/kms-dri/SConscript diff --git a/src/gallium/SConscript b/src/gallium/SConscript index eeb1c78..fa5fa6e 100644 --- a/src/gallium/SConscript +++ b/src/gallium/SConscript @@ -46,7 +46,6 @@ if env['platform'] == 'haiku': if env['dri']: SConscript([ 'winsys/sw/dri/SConscript', -'winsys/sw/kms-dri/SConscript', 'winsys/svga/drm/SConscript', ]) diff --git a/src/gallium/state_trackers/dri/SConscript b/src/gallium/state_trackers/dri/SConscript index 7a7060c..657300b 100644 --- a/src/gallium/state_trackers/dri/SConscript +++ b/src/gallium/state_trackers/dri/SConscript @@ -17,7 +17,6 @@ env.Append(CPPPATH = [ env.Append(CPPDEFINES = [ ('GALLIUM_STATIC_TARGETS', '1'), -'GALLIUM_SOFTPIPE', ]) sources = env.ParseSourceList('Makefile.sources', 'common_SOURCES') diff --git a/src/gallium/targets/dri/SConscript b/src/gallium/targets/dri/SConscript index a51ed56..80ce8de 100644 --- a/src/gallium/targets/dri/SConscript +++ b/src/gallium/targets/dri/SConscript @@ -37,7 +37,6 @@ env.Prepend(LIBS = [ svgadrm, svga, ws_dri, -ws_kms_dri, softpipe, libloader, mesautil, @@ -58,9 +57,6 @@ module = env.LoadableModule( env.Command('vmwgfx_dri.so', 'gallium_dri.so', "ln -f ${SOURCE} ${TARGET}") # swrast_dri.so env.Command('swrast_dri.so', 'gallium_dri.so', "ln -f ${SOURCE} ${TARGET}") -# kms_swrast_dri.so -env.Command('kms_swrast_dri.so', 'gallium_dri.so', "ln -f ${SOURCE} ${TARGET}") env.Alias('dri-vmwgfx', module) env.Alias('dri-swrast', module) -env.Alias('dri-kms-swrast', module) diff --git a/src/gallium/winsys/sw/kms-dri/SConscript b/src/gallium/winsys/sw/kms-dri/SConscript deleted file mode 100644 index e7dd721..000 --- a/src/gallium/winsys/sw/kms-dri/SConscript +++ /dev/null @@ -1,23 +0,0 @@ -### -# SConscript for kms-dri winsys - - -Import('*') - -if env['platform'] not in ('linux'): -Return() - -env = env.Clone() - -env.PkgUseModules('DRM') - -env.Append(CPPPATH = [ -'#/src/gallium/include', -'#/src/gallium/auxiliary', -]) - -ws_kms_dri = env.ConvenienceLibrary( -target = 'ws_kms_dri', -source = env.ParseSourceList('Makefile.sources', 'C_SOURCES'), -) -Export('ws_kms_dri') -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/11] i915; remove unused driFd variable
Signed-off-by: Emil Velikov --- src/mesa/drivers/dri/i915/intel_context.c | 1 - src/mesa/drivers/dri/i915/intel_context.h | 2 -- 2 files changed, 3 deletions(-) diff --git a/src/mesa/drivers/dri/i915/intel_context.c b/src/mesa/drivers/dri/i915/intel_context.c index 5618dcd..c780103 100644 --- a/src/mesa/drivers/dri/i915/intel_context.c +++ b/src/mesa/drivers/dri/i915/intel_context.c @@ -428,7 +428,6 @@ intelInitContext(struct intel_context *intel, driContextPriv->driverPrivate = intel; intel->driContext = driContextPriv; - intel->driFd = sPriv->fd; intel->gen = intelScreen->gen; diff --git a/src/mesa/drivers/dri/i915/intel_context.h b/src/mesa/drivers/dri/i915/intel_context.h index 350d35d..4ec4015 100644 --- a/src/mesa/drivers/dri/i915/intel_context.h +++ b/src/mesa/drivers/dri/i915/intel_context.h @@ -273,8 +273,6 @@ struct intel_context bool use_early_z; - int driFd; - __DRIcontext *driContext; struct intel_screen *intelScreen; -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/11] i915; remove unused driFd variable
On Wed, Jul 8, 2015 at 10:07 AM, Emil Velikov wrote: > Signed-off-by: Emil Velikov > --- > src/mesa/drivers/dri/i915/intel_context.c | 1 - > src/mesa/drivers/dri/i915/intel_context.h | 2 -- > 2 files changed, 3 deletions(-) > > diff --git a/src/mesa/drivers/dri/i915/intel_context.c > b/src/mesa/drivers/dri/i915/intel_context.c > index 5618dcd..c780103 100644 > --- a/src/mesa/drivers/dri/i915/intel_context.c > +++ b/src/mesa/drivers/dri/i915/intel_context.c > @@ -428,7 +428,6 @@ intelInitContext(struct intel_context *intel, > > driContextPriv->driverPrivate = intel; > intel->driContext = driContextPriv; > - intel->driFd = sPriv->fd; > > intel->gen = intelScreen->gen; > > diff --git a/src/mesa/drivers/dri/i915/intel_context.h > b/src/mesa/drivers/dri/i915/intel_context.h > index 350d35d..4ec4015 100644 > --- a/src/mesa/drivers/dri/i915/intel_context.h > +++ b/src/mesa/drivers/dri/i915/intel_context.h > @@ -273,8 +273,6 @@ struct intel_context > > bool use_early_z; > > - int driFd; > - > __DRIcontext *driContext; > struct intel_screen *intelScreen; > > -- s/;/:/ in the subject. Wow, that's some DRI1 stuff! Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] loader: libudev vs sysfs vs libdrm
Emil Velikov writes: > Hello all, > > A recent patch by Chris, fixing some libudev fun in our loader, made > me think if we can clear it up a bit. > > Having three different ways of retrieving the vendor/device ID does > feel a bit excessive. Plus as one gets fixed others are likely to > break - and they do. > So here is a summary of each method, from portability POV. > - libudev: widely common across Linux distributions (but not all). > - sysfs: written by Gary Wong to target GNU Hurd and *BSD. The *BSD > folk never got to using it though :-\ Huh? There's no sysfs on BSD. I actually don't see a reason for this path to exist, unless we wanted to drop libudev entirely. We should pick one of these two, certainly. > - libdrm: used as a last resource fall-back after the above two. the > sole option used by *BSD, MacOS and Android. > > libdrm seems like a nice middle ground that can be used everywhere. > Which begs the question: from a technical POV, is there any > advantage/disadvantage of using one over the other ? This "use the kernel driver name" thing is a bad hack, without some mapping in userspace from kernel driver name to userspace driver name. It's a hack that non-pci are relying on so far, though. One of the promises of udev was that Mesa would be able to ship some config files that attached our drivers' names to devices, so that udev could answer "what DRI driver do I open for this fd?" in one central way, rather than having Mesa's loader and X's loader both trying to do replicate the same behavior. We haven't made use of that, though, and it seems less important now that X doesn't do the loader thing (now our only loader variation is trying to handle old DRI drivers with new libGL). > I do recall Kristian and Eric participating in this discussion before, > but the only thing I can find is along the lines of "linux distros > should be using libudev" :-( Honestly, I'm leaning toward "let's just use the sysfs path and drop libudev" -- the sysfs code exists, and I think we've got more code complexity (and time wasted) dealing with the fallout from libudev screwing up their ABI, compared to just using Gary's open-coded stuff. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] xa: don't leak fences
From: Rob Clark XA was never unref'ing last_fence in the various call paths to pipe->flush(). Add this to xa_context_flush() and update the other open-coded calls to pipe->flush() to use xa_context_flush() instead. This fixes a memory leak reported with xf86-video-freedreno. Reported-by: Nicolas Dechesne Cc: "10.5 10.6" Signed-off-by: Rob Clark --- src/gallium/state_trackers/xa/xa_context.c | 6 +- src/gallium/state_trackers/xa/xa_tracker.c | 2 +- src/gallium/state_trackers/xa/xa_yuv.c | 2 +- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/src/gallium/state_trackers/xa/xa_context.c b/src/gallium/state_trackers/xa/xa_context.c index fd49c82..ebfb290 100644 --- a/src/gallium/state_trackers/xa/xa_context.c +++ b/src/gallium/state_trackers/xa/xa_context.c @@ -37,7 +37,11 @@ XA_EXPORT void xa_context_flush(struct xa_context *ctx) { - ctx->pipe->flush(ctx->pipe, &ctx->last_fence, 0); +if (ctx->last_fence) { +struct pipe_screen *screen = ctx->xa->screen; +screen->fence_reference(screen, &ctx->last_fence, NULL); +} +ctx->pipe->flush(ctx->pipe, &ctx->last_fence, 0); } XA_EXPORT struct xa_context * diff --git a/src/gallium/state_trackers/xa/xa_tracker.c b/src/gallium/state_trackers/xa/xa_tracker.c index a384c1c..1df1da7 100644 --- a/src/gallium/state_trackers/xa/xa_tracker.c +++ b/src/gallium/state_trackers/xa/xa_tracker.c @@ -464,7 +464,7 @@ xa_surface_redefine(struct xa_surface *srf, xa_min(save_height, template->height0), &src_box); pipe->resource_copy_region(pipe, texture, 0, 0, 0, 0, srf->tex, 0, &src_box); - pipe->flush(pipe, &xa->default_ctx->last_fence, 0); + xa_context_flush(xa->default_ctx); } pipe_resource_reference(&srf->tex, texture); diff --git a/src/gallium/state_trackers/xa/xa_yuv.c b/src/gallium/state_trackers/xa/xa_yuv.c index 1519639..97a1833 100644 --- a/src/gallium/state_trackers/xa/xa_yuv.c +++ b/src/gallium/state_trackers/xa/xa_yuv.c @@ -154,7 +154,7 @@ xa_yuv_planar_blit(struct xa_context *r, box++; } -r->pipe->flush(r->pipe, &r->last_fence, 0); +xa_context_flush(r); xa_ctx_sampler_views_destroy(r); xa_ctx_srf_destroy(r); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91254] (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1
https://bugs.freedesktop.org/show_bug.cgi?id=91254 Ian Romanick changed: What|Removed |Added CC||i...@freedesktop.org QA Contact|i...@freedesktop.org |jljus...@gmail.com -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On Wednesday, July 08, 2015 03:17:35 PM Chris Wilson wrote: > On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote: > > On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote: > > > * Gen4-5 structure changes. > > Did you mean brw_structs.h? > > diff --git a/src/mesa/drivers/dri/i965/brw_structs.h > b/src/mesa/drivers/dri/i965/brw_structs.h > index 55338c0..e167254 100644 > --- a/src/mesa/drivers/dri/i965/brw_structs.h > +++ b/src/mesa/drivers/dri/i965/brw_structs.h > @@ -391,13 +391,16 @@ struct brw_sf_unit_state >unsigned pad3:1; > } thread4; > > - struct > + union > { > + struct { > unsigned front_winding:1; > unsigned viewport_transform:1; > unsigned pad0:3; > unsigned sf_viewport_state_offset:27; /* Offset from > GENERAL_STATE_BASE */ >} sf5; > + uint32_t dw5; > + }; > > struct > { > @@ -525,15 +528,17 @@ struct brw_wm_unit_state > struct thread2 thread2; > struct thread3 thread3; > > + union { >struct { > unsigned stats_enable:1; > unsigned depth_buffer_clear:1; > unsigned sampler_count:3; > unsigned sampler_state_pointer:27; >} wm4; > + uint32_t dw4; > + }; > > - struct > - { > + struct { >unsigned enable_8_pix:1; >unsigned enable_16_pix:1; >unsigned enable_32_pix:1; > diff --git a/src/mesa/drivers/dri/i965/brw_urb.c > b/src/mesa/drivers/dri/i965/brw_urb.c > > Or something else? > -Chris Yes - and the changes are fine. But...separable. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/18] i965: Introduce a context-local batch manager
On Wed, Jul 08, 2015 at 10:49:24AM -0700, Kenneth Graunke wrote: > On Wednesday, July 08, 2015 03:17:35 PM Chris Wilson wrote: > > On Wed, Jul 08, 2015 at 09:51:07AM +0100, Chris Wilson wrote: > > > On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote: > > > > * Gen4-5 structure changes. > > > > Did you mean brw_structs.h? > > > > diff --git a/src/mesa/drivers/dri/i965/brw_structs.h > > b/src/mesa/drivers/dri/i965/brw_structs.h > > index 55338c0..e167254 100644 > > --- a/src/mesa/drivers/dri/i965/brw_structs.h > > +++ b/src/mesa/drivers/dri/i965/brw_structs.h > > @@ -391,13 +391,16 @@ struct brw_sf_unit_state > >unsigned pad3:1; > > } thread4; > > > > - struct > > + union > > { > > + struct { > > unsigned front_winding:1; > > unsigned viewport_transform:1; > > unsigned pad0:3; > > unsigned sf_viewport_state_offset:27; /* Offset from > > GENERAL_STATE_BASE */ > >} sf5; > > + uint32_t dw5; > > + }; > > > > struct > > { > > @@ -525,15 +528,17 @@ struct brw_wm_unit_state > > struct thread2 thread2; > > struct thread3 thread3; > > > > + union { > >struct { > > unsigned stats_enable:1; > > unsigned depth_buffer_clear:1; > > unsigned sampler_count:3; > > unsigned sampler_state_pointer:27; > >} wm4; > > + uint32_t dw4; > > + }; > > > > - struct > > - { > > + struct { > >unsigned enable_8_pix:1; > >unsigned enable_16_pix:1; > >unsigned enable_32_pix:1; > > diff --git a/src/mesa/drivers/dri/i965/brw_urb.c > > b/src/mesa/drivers/dri/i965/brw_urb.c > > > > Or something else? > > -Chris > > Yes - and the changes are fine. But...separable. Would you like a patch to provide dword alternates for all the structs, or just the ones used for relocation processing? -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] loader: libudev vs sysfs vs libdrm
On 08/07/2015 00:15, Emil Velikov wrote : Can anyone shed a light/cast their 2c ? Thanks Emil ___ The DRI3/Wayland/Gallium Nine DRI_PRIME path uses libudev, and I'm not sure it could work without. It is based on the ID_PATH_TAG advertised by udev with the device. Perhaps there is another way to compute that tag though. Axel Davy ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.
On 8 July 2015 at 17:47, Matt Turner wrote: > On Wed, Jul 8, 2015 at 3:00 AM, Emil Velikov wrote: >> On 08/07/15 07:33, Matt Turner wrote: >>> Cc: "10.6" >> Thanks for that I've completely forgot about this. >> >> Seems like we never got to using mesa_sha. Can we just remove it for now >> and revive it as a user comes along ? > > I don't have a problem with that. I shouldn't have let in what's been > committed. > > I guess it mostly depends on how soon we expect to finish the shader-cache. I've applied your original "revert it all" from 10.5 for the 10.6 branch and barring any regression I'll push it out tonight. Whereas for master it'll depend on the input we get from Ken and others :-) -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] gallium: clarify reference counting for fence
From: Rob Clark No where was it spelled out that the state tracker may expect the pipe driver to unref the old fence. Signed-off-by: Rob Clark --- src/gallium/include/pipe/p_context.h | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h index c2eedf8..d2c2e4c 100644 --- a/src/gallium/include/pipe/p_context.h +++ b/src/gallium/include/pipe/p_context.h @@ -361,8 +361,14 @@ struct pipe_context { const void *clear_value, int clear_value_size); - /** Flush draw commands + /** +* Flush draw commands +* +* NOTE: use screen->fence_reference() (or equivalent) to transfer +* new fence ref to **fence, to ensure that previous fence is unref'd * +* \param fence if not NULL, an old fence to unref and transfer a +*new fence reference to * \param flags bitfield of enum pipe_flush_flags values. */ void (*flush)(struct pipe_context *pipe, -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] vc4: unref old fence
From: Rob Clark Some, but not all, state trackers will explicitly unref (and set to NULL) the previous *fence before calling pipe->flush(). So driver should use fence_ref() which will unref the old fence if not NULL. Signed-off-by: Rob Clark --- src/gallium/drivers/vc4/vc4_context.c | 3 ++- src/gallium/drivers/vc4/vc4_fence.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/vc4/vc4_context.c b/src/gallium/drivers/vc4/vc4_context.c index 630f8e6..7de4fad 100644 --- a/src/gallium/drivers/vc4/vc4_context.c +++ b/src/gallium/drivers/vc4/vc4_context.c @@ -103,9 +103,10 @@ vc4_pipe_flush(struct pipe_context *pctx, struct pipe_fence_handle **fence, vc4_flush(pctx); if (fence) { +struct pipe_screen *screen = pctx->screen; struct vc4_fence *f = vc4_fence_create(vc4->screen, vc4->last_emit_seqno); -*fence = (struct pipe_fence_handle *)f; +screen->fence_reference(screen, fence, (struct pipe_fence_handle *)f); } } diff --git a/src/gallium/drivers/vc4/vc4_fence.c b/src/gallium/drivers/vc4/vc4_fence.c index f2ee91d..6055ffe 100644 --- a/src/gallium/drivers/vc4/vc4_fence.c +++ b/src/gallium/drivers/vc4/vc4_fence.c @@ -88,7 +88,7 @@ vc4_fence_create(struct vc4_screen *screen, uint64_t seqno) if (!f) return NULL; -pipe_reference_init(&f->reference, 1); +pipe_reference_init(&f->reference, 0); f->seqno = seqno; return f; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] fence refcnting fixes
From: Rob Clark This isn't at all clear for pipe driver writers currently, since it is not documented anywhere. But radeon/nouveau/llvmpipe seem to drop the ref on the **fence passed in to pipe->flush() (if *fence!=NULL). Freedreno/ilo/vc4 where not doing this. Some state trackers do call screen->fence_reference(screen, &fence, NULL) before pipe->flush(), but others do not. Add a comment for pipe->flush() to clairify what is expected of the driver, and fixup freedreno/ilo/vc4 to comply. Note: that ilo/vc4 patches are untested Rob Clark (4): gallium: clarify reference counting for fence freedreno: unref old fence ilo: unref old fence vc4: unref old fence src/gallium/drivers/freedreno/freedreno_context.c | 2 +- src/gallium/drivers/freedreno/freedreno_fence.c | 2 +- src/gallium/drivers/ilo/ilo_context.c | 5 - src/gallium/drivers/ilo/ilo_screen.c | 2 +- src/gallium/drivers/vc4/vc4_context.c | 3 ++- src/gallium/drivers/vc4/vc4_fence.c | 2 +- src/gallium/include/pipe/p_context.h | 8 +++- 7 files changed, 17 insertions(+), 7 deletions(-) -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] freedreno: unref old fence
From: Rob Clark Some, but not all, state trackers will explicitly unref (and set to NULL) the previous *fence before calling pipe->flush(). So driver should use fence_ref() which will unref the old fence if not NULL. Signed-off-by: Rob Clark --- src/gallium/drivers/freedreno/freedreno_context.c | 2 +- src/gallium/drivers/freedreno/freedreno_fence.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/freedreno/freedreno_context.c b/src/gallium/drivers/freedreno/freedreno_context.c index 668ef36..42f79d3 100644 --- a/src/gallium/drivers/freedreno/freedreno_context.c +++ b/src/gallium/drivers/freedreno/freedreno_context.c @@ -145,7 +145,7 @@ fd_context_flush(struct pipe_context *pctx, struct pipe_fence_handle **fence, fd_context_render(pctx); if (fence) - *fence = fd_fence_create(pctx); + fd_screen_fence_ref(pctx->screen, fence, fd_fence_create(pctx)); } void diff --git a/src/gallium/drivers/freedreno/freedreno_fence.c b/src/gallium/drivers/freedreno/freedreno_fence.c index 375e58f..c30c3f9 100644 --- a/src/gallium/drivers/freedreno/freedreno_fence.c +++ b/src/gallium/drivers/freedreno/freedreno_fence.c @@ -84,7 +84,7 @@ struct pipe_fence_handle * fd_fence_create(struct pipe_context *pctx) if (!fence) return NULL; - pipe_reference_init(&fence->reference, 1); + pipe_reference_init(&fence->reference, 0); fence->ctx = ctx; fence->screen = ctx->screen; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] ilo: unref old fence
From: Rob Clark Some, but not all, state trackers will explicitly unref (and set to NULL) the previous *fence before calling pipe->flush(). So driver should use fence_ref() which will unref the old fence if not NULL. Signed-off-by: Rob Clark --- src/gallium/drivers/ilo/ilo_context.c | 5 - src/gallium/drivers/ilo/ilo_screen.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/ilo/ilo_context.c b/src/gallium/drivers/ilo/ilo_context.c index 3d5c7b6..8d17816 100644 --- a/src/gallium/drivers/ilo/ilo_context.c +++ b/src/gallium/drivers/ilo/ilo_context.c @@ -62,7 +62,10 @@ ilo_flush(struct pipe_context *pipe, (flags & PIPE_FLUSH_END_OF_FRAME) ? "frame end" : "user request"); if (f) { - *f = ilo_screen_fence_create(pipe->screen, ilo->cp->last_submitted_bo); + struct pipe_screen *screen = pipe->screen; + struct pipe_fence_handle *fence; + fence = ilo_screen_fence_create(pipe->screen, ilo->cp->last_submitted_bo); + screen->fence_reference(screen, f, fence); } } diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index faebb92..a6abd84 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -693,7 +693,7 @@ ilo_screen_fence_create(struct pipe_screen *screen, struct intel_bo *bo) if (!fence) return NULL; - pipe_reference_init(&fence->reference, 1); + pipe_reference_init(&fence->reference, 0); fence->seqno_bo = intel_bo_ref(bo); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().
By keeping a pointer to the next available location, we reduce the number of memory accesses needed to write to the batchbuffer. A net ~7k reduction of .text size, 7.5k of which is from the change to intel_batchbuffer_emit_dword(). text data bss dec hex filename 4943740 19515226192 5165084 4ed01c i965_dri.so before 4936804 19515226192 5158148 4eb504 i965_dri.so after Combined with the previous patch, improves performance of Synmark OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell. --- Full disclosure: when testing on an IVB desktop, I measured a regression in the same benchmark of -4.19005% +/- 1.15188% (n=30). I don't have any explanation. Testing on other benchmarks and especially on Atom systems would be very welcome. src/mesa/drivers/dri/i965/brw_blorp.cpp| 4 +-- src/mesa/drivers/dri/i965/brw_context.h| 5 ++-- .../drivers/dri/i965/brw_performance_monitor.c | 6 ++-- src/mesa/drivers/dri/i965/brw_state_batch.c| 4 +-- src/mesa/drivers/dri/i965/brw_urb.c| 6 ++-- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 35 +++--- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 11 --- 7 files changed, 38 insertions(+), 33 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp b/src/mesa/drivers/dri/i965/brw_blorp.cpp index 2ccfae1..eac1f00 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp @@ -226,7 +226,7 @@ retry: intel_batchbuffer_require_space(brw, estimated_max_batch_usage, RENDER_RING); intel_batchbuffer_save_state(brw); drm_intel_bo *saved_bo = brw->batch.bo; - uint32_t saved_used = brw->batch.used; + uint32_t saved_used = USED_BATCH(brw->batch); uint32_t saved_state_batch_offset = brw->batch.state_batch_offset; switch (brw->gen) { @@ -245,7 +245,7 @@ retry: * reserved enough space that a wrap will never happen. */ assert(brw->batch.bo == saved_bo); - assert((brw->batch.used - saved_used) * 4 + + assert((USED_BATCH(brw->batch) - saved_used) * 4 + (saved_state_batch_offset - brw->batch.state_batch_offset) < estimated_max_batch_usage); /* Shut up compiler warnings on release build */ diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index afb714b..c7ad35e 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -875,7 +875,8 @@ struct intel_batchbuffer { #ifdef DEBUG uint16_t emit, total; #endif - uint16_t used, reserved_space; + uint16_t reserved_space; + uint32_t *map_next; uint32_t *map; uint32_t *cpu_map; #define BATCH_SZ (8192*sizeof(uint32_t)) @@ -887,7 +888,7 @@ struct intel_batchbuffer { uint8_t pipe_controls_since_last_cs_stall; struct { - uint16_t used; + uint32_t *map_next; int reloc_count; } saved; }; diff --git a/src/mesa/drivers/dri/i965/brw_performance_monitor.c b/src/mesa/drivers/dri/i965/brw_performance_monitor.c index 0a12375..7e90e8a 100644 --- a/src/mesa/drivers/dri/i965/brw_performance_monitor.c +++ b/src/mesa/drivers/dri/i965/brw_performance_monitor.c @@ -710,7 +710,7 @@ emit_mi_report_perf_count(struct brw_context *brw, /* Make sure the commands to take a snapshot fits in a single batch. */ intel_batchbuffer_require_space(brw, MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4, RENDER_RING); - int batch_used = brw->batch.used; + int batch_used = USED_BATCH(brw->batch); /* Reports apparently don't always get written unless we flush first. */ brw_emit_mi_flush(brw); @@ -754,7 +754,7 @@ emit_mi_report_perf_count(struct brw_context *brw, brw_emit_mi_flush(brw); (void) batch_used; - assert(brw->batch.used - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4); + assert(USED_BATCH(brw->batch) - batch_used <= MI_REPORT_PERF_COUNT_BATCH_DWORDS * 4); } /** @@ -1386,7 +1386,7 @@ void brw_perf_monitor_new_batch(struct brw_context *brw) { assert(brw->batch.ring == RENDER_RING); - assert(brw->gen < 6 || brw->batch.used == 0); + assert(brw->gen < 6 || USED_BATCH(brw->batch) == 0); if (brw->perfmon.oa_users == 0) return; diff --git a/src/mesa/drivers/dri/i965/brw_state_batch.c b/src/mesa/drivers/dri/i965/brw_state_batch.c index a405a80..d79e0ea 100644 --- a/src/mesa/drivers/dri/i965/brw_state_batch.c +++ b/src/mesa/drivers/dri/i965/brw_state_batch.c @@ -87,7 +87,7 @@ brw_annotate_aub(struct brw_context *brw) drm_intel_aub_annotation annotations[annotation_count]; int a = 0; make_annotation(&annotations[a++], AUB_TRACE_TYPE_BATCH, 0, - 4*brw->batch.used); + 4 * USED_BATCH(brw->batch)); for (int i = brw->state_batch_count; i-- > 0; ) { uint32_t type = brw->state_batch_list[i].type; uint32_t start_offset = brw->state_batch_list[i].offset;
[Mesa-dev] [PATCH 1/2] i965: Set brw->batch.emit only #ifdef DEBUG.
It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more importantly prevents a larger code size regression in the next commit when the .used field is replaced and calculated on demand. text data bss dec hex filename 4945468 19515226192 5166812 4ed6dc i965_dri.so before 4943740 19515226192 5165084 4ed01c i965_dri.so after And surround the emit and total fields with #ifdef DEBUG to prevent such mistakes from happening again. --- src/mesa/drivers/dri/i965/brw_context.h | 2 ++ src/mesa/drivers/dri/i965/intel_batchbuffer.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 7596139..afb714b 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -872,7 +872,9 @@ struct intel_batchbuffer { /** BO for post-sync nonzero writes for gen6 workaround. */ drm_intel_bo *workaround_bo; +#ifdef DEBUG uint16_t emit, total; +#endif uint16_t used, reserved_space; uint32_t *map; uint32_t *cpu_map; diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index fdd07e0..f0971e9 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -134,8 +134,8 @@ intel_batchbuffer_begin(struct brw_context *brw, int n, enum brw_gpu_ring ring) { intel_batchbuffer_require_space(brw, n * 4, ring); - brw->batch.emit = brw->batch.used; #ifdef DEBUG + brw->batch.emit = brw->batch.used; brw->batch.total = n; #endif } -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().
On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote: > By keeping a pointer to the next available location, we reduce the > number of memory accesses needed to write to the batchbuffer. > > A net ~7k reduction of .text size, 7.5k of which is from the change to > intel_batchbuffer_emit_dword(). > >text data bss dec hex filename > 4943740 19515226192 5165084 4ed01c i965_dri.so before > 4936804 19515226192 5158148 4eb504 i965_dri.so after > > Combined with the previous patch, improves performance of Synmark > OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell. > --- > Full disclosure: when testing on an IVB desktop, I measured a > regression in the same benchmark of -4.19005% +/- 1.15188% (n=30). > I don't have any explanation. The problem is that it seems to generate worse code with multiple adjacent emit_dwords. I have seen similar regressions when doing the same batch[index] to *batch++ elsewhere. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy
WARNING: No perf data, please keep reading though) This implements the suggestion provided by the paper, "Fast USWC to WB Memory Copy" (https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers). This is described throughout the paper, but the sample code lives in Figure 3-3. That paper purports a roughly 40% performance gain in Mbyte/second over the original implementation done by Matt. Section 3.1.2 is the summary of why an intermediate cache buffer is used. It claims that if you use the naive implementation, fill buffers are contended for. To be honest, I can't quite fathom the underlying explanation, but I'll think about it some more. Most importantly would be to get the perf data... This patch does need performance data. I don't currently have a platform that this would benefit (BYT or BSW), so I can't get anything useful. As soon as I get a platform to test it on, I will - meanwhile, maybe whomever tested the original patch the first time around come run this through? Cc: Matt Turner Cc: Chad Versace Cc: Kristian Høgsberg Signed-off-by: Ben Widawsky --- src/mesa/main/streaming-load-memcpy.c | 61 +++ 1 file changed, 47 insertions(+), 14 deletions(-) diff --git a/src/mesa/main/streaming-load-memcpy.c b/src/mesa/main/streaming-load-memcpy.c index d7147af..3cd310a 100644 --- a/src/mesa/main/streaming-load-memcpy.c +++ b/src/mesa/main/streaming-load-memcpy.c @@ -30,6 +30,8 @@ #include "main/streaming-load-memcpy.h" #include +static uint8_t rsvd_space[4096]; + /* Copies memory from src to dst, using SSE 4.1's MOVNTDQA to get streaming * read performance from uncached memory. */ @@ -59,23 +61,54 @@ _mesa_streaming_load_memcpy(void *restrict dst, void *restrict src, size_t len) len -= MIN2(bytes_before_alignment_boundary, len); } - while (len >= 64) { - __m128i *dst_cacheline = (__m128i *)d; - __m128i *src_cacheline = (__m128i *)s; + while (len > 64) { + __m128i *cached_buffer = (__m128i *)rsvd_space; + size_t streaming_len = len > 4096 ? 4096 : len; + + __asm__ volatile("mfence" ::: "memory"); + + while (streaming_len >= 64) { + __m128i *src_cacheline = (__m128i *)s; + + __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0); + __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1); + __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2); + __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3); + + _mm_store_si128(cached_buffer + 0, temp1); + _mm_store_si128(cached_buffer + 1, temp2); + _mm_store_si128(cached_buffer + 2, temp3); + _mm_store_si128(cached_buffer + 3, temp4); + + s += 64; + streaming_len -= 64; + cached_buffer += 4; + } + + cached_buffer = (__m128i *)rsvd_space; + streaming_len = len > 4096 ? 4096 : len; + + __asm__ volatile("mfence" ::: "memory"); + + while (streaming_len >= 64) { + __m128i *dst_cacheline = (__m128i *)d; + + __m128i temp1 = _mm_stream_load_si128(cached_buffer + 0); + __m128i temp2 = _mm_stream_load_si128(cached_buffer + 1); + __m128i temp3 = _mm_stream_load_si128(cached_buffer + 2); + __m128i temp4 = _mm_stream_load_si128(cached_buffer + 3); - __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0); - __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1); - __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2); - __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3); + _mm_store_si128(dst_cacheline + 0, temp1); + _mm_store_si128(dst_cacheline + 1, temp2); + _mm_store_si128(dst_cacheline + 2, temp3); + _mm_store_si128(dst_cacheline + 3, temp4); - _mm_store_si128(dst_cacheline + 0, temp1); - _mm_store_si128(dst_cacheline + 1, temp2); - _mm_store_si128(dst_cacheline + 2, temp3); - _mm_store_si128(dst_cacheline + 3, temp4); + d += 64; + streaming_len -= 64; + cached_buffer += 4; - d += 64; - s += 64; - len -= 64; + len -= 64; + } } /* memcpy() the tail. */ -- 2.4.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] fence refcnting fixes
Bleh, assert(pipe_is_referenced(ptr)) in pipe_reference_described() is non-pleased in debug builds about the trick of starting out the fences w/ refcnt=0 (so that it doesn't end up getting upref'd to 2 in pipe->flush().. so I'll have to re-work the fd/ilo/vc4 patches slightly.. BR, -R On Wed, Jul 8, 2015 at 4:34 PM, Rob Clark wrote: > From: Rob Clark > > This isn't at all clear for pipe driver writers currently, since it > is not documented anywhere. But radeon/nouveau/llvmpipe seem to drop > the ref on the **fence passed in to pipe->flush() (if *fence!=NULL). > Freedreno/ilo/vc4 where not doing this. Some state trackers do call > screen->fence_reference(screen, &fence, NULL) before pipe->flush(), > but others do not. > > Add a comment for pipe->flush() to clairify what is expected of the > driver, and fixup freedreno/ilo/vc4 to comply. > > Note: that ilo/vc4 patches are untested > > Rob Clark (4): > gallium: clarify reference counting for fence > freedreno: unref old fence > ilo: unref old fence > vc4: unref old fence > > src/gallium/drivers/freedreno/freedreno_context.c | 2 +- > src/gallium/drivers/freedreno/freedreno_fence.c | 2 +- > src/gallium/drivers/ilo/ilo_context.c | 5 - > src/gallium/drivers/ilo/ilo_screen.c | 2 +- > src/gallium/drivers/vc4/vc4_context.c | 3 ++- > src/gallium/drivers/vc4/vc4_fence.c | 2 +- > src/gallium/include/pipe/p_context.h | 8 +++- > 7 files changed, 17 insertions(+), 7 deletions(-) > > -- > 2.4.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCHv2] i965/gen9: Use custom MOCS entries set up by the kernel.
On Tue, Jul 07, 2015 at 10:21:28PM +0300, Francisco Jerez wrote: > Instead of relying on hardware defaults the i915 kernel driver is > going program custom MOCS tables system-wide on Gen9 hardware. The > "WT" entry previously used for renderbuffers had a number of problems: > It disabled caching on eLLC, it used a reserved L3 cacheability > setting, and it used to override the PTE controls making renderbuffers > always WT on LLC regardless of the kernel's setting. Instead use an > entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE, > L3CC=WB. > > The "WB" entry previously used for anything other than renderbuffers > has moved to a different index in the new MOCS tables but it should > have the same caching semantics as the old entry. > > Even though the corresponding kernel change ("drm/i915: Added > Programming of the MOCS") is in a way an ABI break it doesn't seem > necessary to check that the kernel is recent enough because the change > should only affect Gen9 which is still unreleased hardware. > > v2: Update MOCS values for the new Android-incompatible tables > introduced in v7 of the kernel patch. > > Cc: 10.6 It'd be cool to get perf data, but certainly not a requirement here since the requirement to change is pretty obvious, IMO (mostly, I'm just curious). I do like having the References: in the commit for the kernel patch, but that's just me, and I can live with whatever. > --- > src/mesa/drivers/dri/i965/brw_defines.h| 11 ++- > src/mesa/drivers/dri/i965/gen8_surface_state.c | 3 +-- > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 66b9abc..8ab8d62 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -2491,12 +2491,13 @@ enum brw_wm_barycentric_interp_mode { > #define BDW_MOCS_WT 0x58 > #define BDW_MOCS_PTE 0x18 > > -/* Skylake: MOCS is now an index into an array of 64 different configurable > - * cache settings. We still use only either write-back or write-through; and > - * rely on the documented default values. > +/* Skylake: MOCS is now an index into an array of 62 different caching > + * configurations programmed by the kernel. I'd keep the '64' instead of '62' the latter is a software construct, but whatever you like. > */ > -#define SKL_MOCS_WB (0b001001 << 1) > -#define SKL_MOCS_WT (0b000101 << 1) > +/* TC=LLC/eLLC, LeCC=WB, LRUM=3, L3CC=WB */ > +#define SKL_MOCS_WB (2 << 1) > +/* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */ > +#define SKL_MOCS_PTE (1 << 1) > > #define MEDIA_VFE_STATE 0x7000 > /* GEN7 DW2, GEN8+ DW3 */ > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c > b/src/mesa/drivers/dri/i965/gen8_surface_state.c > index bd3eb00..dfaf762 100644 > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c > @@ -401,8 +401,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, >irb->mt_layer : (irb->mt_layer / MAX2(mt->num_samples, 1)); > GLenum gl_target = >rb->TexImage ? rb->TexImage->TexObject->Target : GL_TEXTURE_2D; > - /* FINISHME: Use PTE MOCS on Skylake. */ > - uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_WT : BDW_MOCS_PTE; > + uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_PTE : BDW_MOCS_PTE; I don't know the policy on const really, but this is a good opportunity to const. > > intel_miptree_used_for_rendering(mt); > Reviewed-by: Ben Widawsky ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a
On Thu, Jul 02, 2015 at 12:58:33PM -0700, Matt Turner wrote: > On Thu, Jul 2, 2015 at 12:57 PM, Matt Turner wrote: > > On Wed, Jul 1, 2015 at 4:03 PM, Ben Widawsky > > wrote: > >> Certain compressed formats require this setting. The docs don't go into > >> much > >> detail as to why it's needed exactly. > >> > >> This fixes 0 piglit failures with a GBM gpu piglit run. > > > > That's a really weird way of saying that. > > > >> > >> Signed-off-by: Ben Widawsky > >> --- > >> > >> I had this one sitting around for almost 2 months. I'm not sure why I > >> didn't > >> send it out sooner. It seems like it's needed. > >> > >> --- > >> src/mesa/drivers/dri/i965/brw_defines.h| 1 + > >> src/mesa/drivers/dri/i965/gen8_surface_state.c | 26 > >> ++ > >> 2 files changed, 27 insertions(+) > >> > >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > >> b/src/mesa/drivers/dri/i965/brw_defines.h > >> index 66b9abc..f55fd49 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_defines.h > >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h > >> @@ -276,6 +276,7 @@ > >> #define GEN8_SURFACE_TILING_W (1 << 12) > >> #define GEN8_SURFACE_TILING_X (2 << 12) > >> #define GEN8_SURFACE_TILING_Y (3 << 12) > >> +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE (1 << 9) > >> #define BRW_SURFACE_RC_READ_WRITE (1 << 8) > >> #define BRW_SURFACE_MIPLAYOUT_SHIFT10 > >> #define BRW_SURFACE_MIPMAPLAYOUT_BELOW 0 > >> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c > >> b/src/mesa/drivers/dri/i965/gen8_surface_state.c > >> index ca5ed17..a245379 100644 > >> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c > >> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c > >> @@ -264,6 +264,19 @@ gen8_emit_texture_surface_state(struct brw_context > >> *brw, > >>surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES; > >> } > >> > >> + if (brw->is_cherryview || brw->gen >= 9) { > >> + /* "This bit must be set for the following surface types: BC2_UNORM > >> + * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM" > > > > Don't do naked quotes -- Use the normal style. > > > >> + */ > >> + switch (format) { > >> + case BRW_SURFACEFORMAT_BC2_UNORM: > >> + case BRW_SURFACEFORMAT_BC3_UNORM: > >> + case BRW_SURFACEFORMAT_BC5_SNORM: > > > > Missing BRW_SURFACEFORMAT_BC5_UNORM. > > > >> + case BRW_SURFACEFORMAT_BC7_UNORM: > >> + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE; > > > > It wouldn't surprise me if static analysis tools complain about the > > missing break. > > > > Add a break or make it an if statement (which would be two lines > > shorter). All together, how about > > > >/* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0 > > * bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes): > > * > > *This bit must be set for the following surface types: BC2_UNORM > > *BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM > > */ > >if (format == BRW_SURFACEFORMAT_BC2_UNORM || > >format == BRW_SURFACEFORMAT_BC3_UNORM || > >format == BRW_SURFACEFORMAT_BC5_SNORM || > >format == BRW_SURFACEFORMAT_BC5_UNORM || > > Bah, would be nice to so BC5_UNORM and then BC5_SNORM to better match > the comment. > Thanks for catching the missing case. > >format == BRW_SURFACEFORMAT_BC7_UNORM) > > surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE; > >} ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().
On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson wrote: > On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote: >> By keeping a pointer to the next available location, we reduce the >> number of memory accesses needed to write to the batchbuffer. >> >> A net ~7k reduction of .text size, 7.5k of which is from the change to >> intel_batchbuffer_emit_dword(). >> >>text data bss dec hex filename >> 4943740 19515226192 5165084 4ed01c i965_dri.so before >> 4936804 19515226192 5158148 4eb504 i965_dri.so after >> >> Combined with the previous patch, improves performance of Synmark >> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell. >> --- >> Full disclosure: when testing on an IVB desktop, I measured a >> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30). >> I don't have any explanation. > > The problem is that it seems to generate worse code with multiple > adjacent emit_dwords. I have seen similar regressions when doing the > same batch[index] to *batch++ elsewhere. > -Chris That is in conflict with the data I've provided. In fact, I started by noticing that if I added intel_batchbuffer_emit_dword* functions that took multiple dwords that there was a reduction in .text size, so it's something that I've considered. The two attached patches demonstrate that the batch[index++] pattern generates larger (worse) code than *batch++. 1.patch text databssdechex filename 4936804 195152 26192 5158148 4eb504 mesa-release/lib/i965_dri.so after 1.patch (no change) 4936884 195152 26192 5158228 4eb554 mesa-release/lib/i965_dri.so after 2.patch From 03256a2898bf74b56a35f8fcc130f8fee5902b59 Mon Sep 17 00:00:00 2001 From: Matt Turner Date: Wed, 8 Jul 2015 15:11:07 -0700 Subject: [PATCH 1/2] 1 --- src/mesa/drivers/dri/i965/gen7_blorp.cpp | 17 + src/mesa/drivers/dri/i965/intel_batchbuffer.h | 24 2 files changed, 33 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp index abace6d..74e0763 100644 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp @@ -520,14 +520,15 @@ gen7_blorp_emit_ps_config(struct brw_context *brw, } BEGIN_BATCH(8); - OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2)); - OUT_BATCH(params->use_wm_prog ? prog_offset : 0); - OUT_BATCH(dw2); - OUT_BATCH(0); - OUT_BATCH(dw4); - OUT_BATCH(dw5); - OUT_BATCH(0); - OUT_BATCH(0); + intel_batchbuffer_emit_dword8(brw, + _3DSTATE_PS << 16 | (8 - 2), + params->use_wm_prog ? prog_offset : 0, + dw2, + 0, + dw4, + dw5, + 0, + 0); ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index a12387b..a5d8f0b 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -101,6 +101,30 @@ intel_batchbuffer_emit_dword(struct brw_context *brw, GLuint dword) } static inline void +intel_batchbuffer_emit_dword8(struct brw_context *brw, GLuint dword1, + GLuint dword2, + GLuint dword3, + GLuint dword4, + GLuint dword5, + GLuint dword6, + GLuint dword7, + GLuint dword8) +{ +#ifdef DEBUG + assert(intel_batchbuffer_space(brw) >= 4); +#endif + *brw->batch.map_next++ = dword1; + *brw->batch.map_next++ = dword2; + *brw->batch.map_next++ = dword3; + *brw->batch.map_next++ = dword4; + *brw->batch.map_next++ = dword5; + *brw->batch.map_next++ = dword6; + *brw->batch.map_next++ = dword7; + *brw->batch.map_next++ = dword8; + assert(brw->batch.ring != UNKNOWN_RING); +} + +static inline void intel_batchbuffer_emit_float(struct brw_context *brw, float f) { intel_batchbuffer_emit_dword(brw, float_as_int(f)); -- 2.3.6 From 08c78f95dfbaa39303d25c3348ba1fa99b04b31b Mon Sep 17 00:00:00 2001 From: Matt Turner Date: Wed, 8 Jul 2015 15:30:48 -0700 Subject: [PATCH 2/2] 2 --- src/mesa/drivers/dri/i965/brw_context.h | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 11 +++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index c7ad35e..c2f6658 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -
Re: [Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy
On Wed, Jul 8, 2015 at 2:07 PM, Ben Widawsky wrote: > WARNING: No perf data, please keep reading though) > > This implements the suggestion provided by the paper, "Fast USWC to WB Memory > Copy" > (https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers). > This is described throughout the paper, but the sample code lives in Figure > 3-3. > That paper purports a roughly 40% performance gain in Mbyte/second over the > original implementation done by Matt. Sounds interesting! > Section 3.1.2 is the summary of why an intermediate cache buffer is used. It > claims that if you use the naive implementation, fill buffers are contended > for. > To be honest, I can't quite fathom the underlying explanation, but I'll think > about it some more. Most importantly would be to get the perf data... This > patch > does need performance data. I don't currently have a platform that this would > benefit (BYT or BSW), so I can't get anything useful. As soon as I get a > platform to test it on, I will - meanwhile, maybe whomever tested the original > patch the first time around come run this through? > > Cc: Matt Turner > Cc: Chad Versace > Cc: Kristian Høgsberg > Signed-off-by: Ben Widawsky > --- > src/mesa/main/streaming-load-memcpy.c | 61 > +++ > 1 file changed, 47 insertions(+), 14 deletions(-) > > diff --git a/src/mesa/main/streaming-load-memcpy.c > b/src/mesa/main/streaming-load-memcpy.c > index d7147af..3cd310a 100644 > --- a/src/mesa/main/streaming-load-memcpy.c > +++ b/src/mesa/main/streaming-load-memcpy.c > @@ -30,6 +30,8 @@ > #include "main/streaming-load-memcpy.h" > #include > > +static uint8_t rsvd_space[4096]; This being static makes this thread-unsafe. Also, I don't know if it's guaranteed to be 16-byte aligned. __attribute__((__aligned__(16))) would avoid any problems, but I think you should do 64-byte alignment to ensure the block is cache-line aligned. Oh, in fact the linked article says as much: "This buffer should also be 64 byte (cache line) aligned." > + > /* Copies memory from src to dst, using SSE 4.1's MOVNTDQA to get streaming > * read performance from uncached memory. > */ > @@ -59,23 +61,54 @@ _mesa_streaming_load_memcpy(void *restrict dst, void > *restrict src, size_t len) >len -= MIN2(bytes_before_alignment_boundary, len); > } > > - while (len >= 64) { > - __m128i *dst_cacheline = (__m128i *)d; > - __m128i *src_cacheline = (__m128i *)s; > + while (len > 64) { The article suggests we align the source to a 64-byte boundary as well. We should do some performance testing around that. It also suggests copying data to a 64-byte aligned destination as well, which we could ensure by making the coalignment check above more strict, though at the cost of skipping this fast path for some other cases. > + __m128i *cached_buffer = (__m128i *)rsvd_space; > + size_t streaming_len = len > 4096 ? 4096 : len; MIN2(len, 4096) > + > + __asm__ volatile("mfence" ::: "memory"); Use _mm_mfence() like the sample code does. > + > + while (streaming_len >= 64) { > + __m128i *src_cacheline = (__m128i *)s; > + > + __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0); > + __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1); > + __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2); > + __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3); > + > + _mm_store_si128(cached_buffer + 0, temp1); > + _mm_store_si128(cached_buffer + 1, temp2); > + _mm_store_si128(cached_buffer + 2, temp3); > + _mm_store_si128(cached_buffer + 3, temp4); > + > + s += 64; > + streaming_len -= 64; > + cached_buffer += 4; > + } > + > + cached_buffer = (__m128i *)rsvd_space; > + streaming_len = len > 4096 ? 4096 : len; MIN2(len, 4096) > + > + __asm__ volatile("mfence" ::: "memory"); _mm_mfence() > + > + while (streaming_len >= 64) { > + __m128i *dst_cacheline = (__m128i *)d; > + > + __m128i temp1 = _mm_stream_load_si128(cached_buffer + 0); > + __m128i temp2 = _mm_stream_load_si128(cached_buffer + 1); > + __m128i temp3 = _mm_stream_load_si128(cached_buffer + 2); > + __m128i temp4 = _mm_stream_load_si128(cached_buffer + 3); _mm_load_si128() > > - __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0); > - __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1); > - __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2); > - __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3); > + _mm_store_si128(dst_cacheline + 0, temp1); > + _mm_store_si128(dst_cacheline + 1, temp2); > + _mm_store_si128(dst_cacheline + 2, temp3); > + _mm_store_si128(dst_cacheline + 3, temp4); _mm_stream_si128() > > - _mm_store_si128(dst_cacheline + 0, temp1); > - _mm_store_si128(dst_cacheline + 1, te
Re: [Mesa-dev] [PATCH] util: Don't link to SHA1 library if shader-cache is disabled.
On Tuesday, July 07, 2015 11:33:57 PM Matt Turner wrote: > Cc: "10.6" > --- > src/util/Makefile.am | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/util/Makefile.am b/src/util/Makefile.am > index 2e7542e..1e087b4 100644 > --- a/src/util/Makefile.am > +++ b/src/util/Makefile.am > @@ -46,9 +46,9 @@ libmesautil_la_SOURCES = \ > > if ENABLE_SHADER_CACHE > libmesautil_la_SOURCES += $(MESA_UTIL_SHADER_CACHE_FILES) > -endif > > libmesautil_la_LIBADD = $(SHA1_LIBS) > +endif > > roundeven_test_LDADD = -lm > > Yeah...haven't thought through a plan yet. This certainly seems reasonable. Reverting it all and pushing it back in later isn't crazy either, but with this patch I'm not sure it's hurting anything... Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().
On Wed, Jul 08, 2015 at 03:33:17PM -0700, Matt Turner wrote: > On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson wrote: > > On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote: > >> By keeping a pointer to the next available location, we reduce the > >> number of memory accesses needed to write to the batchbuffer. > >> > >> A net ~7k reduction of .text size, 7.5k of which is from the change to > >> intel_batchbuffer_emit_dword(). > >> > >>text data bss dec hex filename > >> 4943740 19515226192 5165084 4ed01c i965_dri.so before > >> 4936804 19515226192 5158148 4eb504 i965_dri.so after > >> > >> Combined with the previous patch, improves performance of Synmark > >> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell. > >> --- > >> Full disclosure: when testing on an IVB desktop, I measured a > >> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30). > >> I don't have any explanation. > > > > The problem is that it seems to generate worse code with multiple > > adjacent emit_dwords. I have seen similar regressions when doing the > > same batch[index] to *batch++ elsewhere. > > -Chris > > That is in conflict with the data I've provided. In fact, I started by > noticing that if I added intel_batchbuffer_emit_dword* functions that > took multiple dwords that there was a reduction in .text size, so it's > something that I've considered. This is part of a disassembly of the index version of gen6_viewport_state.c::upload_viewport_state_pointers() (ivb with -march=native -Ofast) 0x0287 <+103>: lea0x1(%rax),%esi 0x028a <+106>: mov%si,0x22f24(%rdi) 0x0291 <+113>: mov%ecx,(%rdx,%rax,4) 0x0294 <+116>: movzwl 0x22f24(%rdi),%eax 0x029b <+123>: mov0x24330(%rdi),%ecx 0x02a1 <+129>: mov0x22f08(%rdi),%rdx 0x02a8 <+136>: lea0x1(%rax),%esi 0x02ab <+139>: mov%si,0x22f24(%rdi) 0x02b2 <+146>: mov%ecx,(%rdx,%rax,4) 0x02b5 <+149>: movzwl 0x22f24(%rdi),%eax 0x02bc <+156>: mov0x2480c(%rdi),%ecx 0x02c2 <+162>: mov0x22f08(%rdi),%rdx 0x02c9 <+169>: lea0x1(%rax),%esi 0x02cc <+172>: mov%si,0x22f24(%rdi) 0x02d3 <+179>: mov%ecx,(%rdx,%rax,4) 0x02d6 <+182>: pop%rbp 0x02d7 <+183>: retq and for comparison the pointer version: 0x0280 <+96>:lea0x4(%rax),%rcx 0x0284 <+100>: mov%rcx,0x22f10(%rdi) 0x028b <+107>: mov%edx,(%rax) 0x028d <+109>: mov0x22f10(%rdi),%rax 0x0294 <+116>: mov0x24338(%rdi),%edx 0x029a <+122>: lea0x4(%rax),%rcx 0x029e <+126>: mov%rcx,0x22f10(%rdi) 0x02a5 <+133>: mov%edx,(%rax) 0x02a7 <+135>: mov0x22f10(%rdi),%rax 0x02ae <+142>: mov0x24814(%rdi),%edx 0x02b4 <+148>: lea0x4(%rax),%rcx 0x02b8 <+152>: mov%rcx,0x22f10(%rdi) 0x02bf <+159>: mov%edx,(%rax) 0x02c1 <+161>: pop%rbp 0x02c2 <+162>: retq So in neither case does gcc avoid incrementing either the index or the pointer in the struct after emit_dword, and then reloads it for the next. This is what I expected to see 0x025e <+62>:movl $0x780d1c02,(%rcx) 0x0264 <+68>:mov0x22f08(%rdi),%rax 0x026b <+75>:mov0x24320(%rdi),%edx 0x0271 <+81>:mov%edx,0x4(%rax) 0x0274 <+84>:mov0x22f08(%rdi),%rax 0x027b <+91>:mov0x24338(%rdi),%edx 0x0281 <+97>:mov%edx,0x8(%rax) 0x0284 <+100>: mov0x22f08(%rdi),%rax 0x028b <+107>: mov0x24814(%rdi),%edx 0x0291 <+113>: mov%edx,0xc(%rax) 0x0294 <+116>: addq $0x10,0x22f08(%rdi) 0x029c <+124>: pop%rbp 0x029d <+125>: retq with the pointer increments coalesced to the end. Generated by opencoding emit_dwords as static void upload_viewport_state_pointers(struct brw_context *brw) { BEGIN_BATCH(4); brw->batch.map[0] = (_3DSTATE_VIEWPORT_STATE_POINTERS << 16 | (4 - 2) | GEN6_CC_VIEWPORT_MODIFY | GEN6_SF_VIEWPORT_MODIFY | GEN6_CLIP_VIEWPORT_MODIFY); brw->batch.map[1] = (brw->clip.vp_offset); brw->batch.map[2] = (brw->sf.vp_offset); brw->batch.map[3] = (brw->cc.vp_offset); brw->batch.map += 4; ADVANCE_BATCH(); } -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.free
Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().
On Thu, Jul 09, 2015 at 12:53:23AM +0100, Chris Wilson wrote: > This is what I expected to see >0x025e <+62>: movl $0x780d1c02,(%rcx) >0x0264 <+68>: mov0x22f08(%rdi),%rax >0x026b <+75>: mov0x24320(%rdi),%edx >0x0271 <+81>: mov%edx,0x4(%rax) >0x0274 <+84>: mov0x22f08(%rdi),%rax >0x027b <+91>: mov0x24338(%rdi),%edx >0x0281 <+97>: mov%edx,0x8(%rax) >0x0284 <+100>: mov0x22f08(%rdi),%rax >0x028b <+107>: mov0x24814(%rdi),%edx >0x0291 <+113>: mov%edx,0xc(%rax) >0x0294 <+116>: addq $0x10,0x22f08(%rdi) >0x029c <+124>: pop%rbp >0x029d <+125>: retq > with the pointer increments coalesced to the end. Generated by > opencoding emit_dwords as > > static void upload_viewport_state_pointers(struct brw_context *brw) > { >BEGIN_BATCH(4); >brw->batch.map[0] = (_3DSTATE_VIEWPORT_STATE_POINTERS << 16 | (4 - 2) | > GEN6_CC_VIEWPORT_MODIFY | > GEN6_SF_VIEWPORT_MODIFY | > GEN6_CLIP_VIEWPORT_MODIFY); >brw->batch.map[1] = (brw->clip.vp_offset); >brw->batch.map[2] = (brw->sf.vp_offset); >brw->batch.map[3] = (brw->cc.vp_offset); >brw->batch.map += 4; >ADVANCE_BATCH(); > } But it is still reloading brw->batch.map everytime! One manual local variable later, 0x025e <+62>:movl $0x780d1c02,(%rdx) 0x0264 <+68>:mov0x24320(%rdi),%eax 0x026a <+74>:mov%eax,0x4(%rdx) 0x026d <+77>:mov0x24338(%rdi),%eax 0x0273 <+83>:mov%eax,0x8(%rdx) 0x0276 <+86>:mov0x24814(%rdi),%eax 0x027c <+92>:mov%eax,0xc(%rdx) 0x027f <+95>:addq $0x10,0x22f08(%rdi) 0x0287 <+103>: pop%rbp 0x0288 <+104>: retq (I hope telling gcc to tune for atom does better.) -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] [v2] i965/chv|skl: Apply sampler bypass w/a
Certain compressed formats require this setting. The docs don't go into much detail as to why it's needed exactly. This patch introduces no piglit regressions on gen9 (bsw is untested). Note that the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are WTF. The patch also fixes nothing I can find. http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/ v2: Reworded commit message (Matt); Added piglit results link. Restructured condition (Matt) Moved check out to function (Nanley). I left the setting of the bit in the surface state open coded because it seems to go better with the existing code. Cc: Matt Turner Cc: Nanley Chery Cc: Jordan Justen (aux-hiz needs this too) Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/gen8_surface_state.c | 29 ++ 2 files changed, 30 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 19489ab..3668967 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -276,6 +276,7 @@ #define GEN8_SURFACE_TILING_W (1 << 12) #define GEN8_SURFACE_TILING_X (2 << 12) #define GEN8_SURFACE_TILING_Y (3 << 12) +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE (1 << 9) #define BRW_SURFACE_RC_READ_WRITE (1 << 8) #define BRW_SURFACE_MIPLAYOUT_SHIFT10 #define BRW_SURFACE_MIPMAPLAYOUT_BELOW 0 diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index bd3eb00..9bbe8ae 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -132,6 +132,27 @@ horizontal_alignment(const struct brw_context *brw, } } +static bool +sampler_l2_bypass_disable(struct brw_context *brw, unsigned format) +{ + /* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0 +* bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes): +* +*This bit must be set for the following surface types: BC2_UNORM +*BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM +*/ + if ((brw->gen >= 9 || brw->is_cherryview) && + (format == BRW_SURFACEFORMAT_BC2_UNORM || +format == BRW_SURFACEFORMAT_BC3_UNORM || +format == BRW_SURFACEFORMAT_BC5_UNORM || +format == BRW_SURFACEFORMAT_BC5_SNORM || +format == BRW_SURFACEFORMAT_BC7_UNORM)) { + return true; + } + + return false; +} + static uint32_t * allocate_surface_state(struct brw_context *brw, uint32_t *out_offset, int index) { @@ -238,6 +259,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw, surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES; } + if (sampler_l2_bypass_disable(brw, format)) { + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE; + } + if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP) surf[0] |= GEN8_SURFACE_IS_ARRAY; @@ -465,6 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, horizontal_alignment(brw, mt, surf_type) | surface_tiling_mode(tiling); + if (sampler_l2_bypass_disable(brw, format)) { + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE; + } + surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2; surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) | -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Optimize intel_batchbuffer_emit_dword().
On Wed, Jul 8, 2015 at 4:53 PM, Chris Wilson wrote: > On Wed, Jul 08, 2015 at 03:33:17PM -0700, Matt Turner wrote: >> On Wed, Jul 8, 2015 at 2:07 PM, Chris Wilson >> wrote: >> > On Wed, Jul 08, 2015 at 02:00:02PM -0700, Matt Turner wrote: >> >> By keeping a pointer to the next available location, we reduce the >> >> number of memory accesses needed to write to the batchbuffer. >> >> >> >> A net ~7k reduction of .text size, 7.5k of which is from the change to >> >> intel_batchbuffer_emit_dword(). >> >> >> >>text data bss dec hex filename >> >> 4943740 19515226192 5165084 4ed01c i965_dri.so before >> >> 4936804 19515226192 5158148 4eb504 i965_dri.so after >> >> >> >> Combined with the previous patch, improves performance of Synmark >> >> OglBatch7 by 4.05914% +/- 1.49373% (n=270) on Haswell. >> >> --- >> >> Full disclosure: when testing on an IVB desktop, I measured a >> >> regression in the same benchmark of -4.19005% +/- 1.15188% (n=30). >> >> I don't have any explanation. >> > >> > The problem is that it seems to generate worse code with multiple >> > adjacent emit_dwords. I have seen similar regressions when doing the >> > same batch[index] to *batch++ elsewhere. >> > -Chris >> >> That is in conflict with the data I've provided. In fact, I started by >> noticing that if I added intel_batchbuffer_emit_dword* functions that >> took multiple dwords that there was a reduction in .text size, so it's >> something that I've considered. > > This is part of a disassembly of the index version of > gen6_viewport_state.c::upload_viewport_state_pointers() > (ivb with -march=native -Ofast) > >0x0287 <+103>: lea0x1(%rax),%esi >0x028a <+106>: mov%si,0x22f24(%rdi) >0x0291 <+113>: mov%ecx,(%rdx,%rax,4) >0x0294 <+116>: movzwl 0x22f24(%rdi),%eax >0x029b <+123>: mov0x24330(%rdi),%ecx >0x02a1 <+129>: mov0x22f08(%rdi),%rdx >0x02a8 <+136>: lea0x1(%rax),%esi >0x02ab <+139>: mov%si,0x22f24(%rdi) >0x02b2 <+146>: mov%ecx,(%rdx,%rax,4) >0x02b5 <+149>: movzwl 0x22f24(%rdi),%eax >0x02bc <+156>: mov0x2480c(%rdi),%ecx >0x02c2 <+162>: mov0x22f08(%rdi),%rdx >0x02c9 <+169>: lea0x1(%rax),%esi >0x02cc <+172>: mov%si,0x22f24(%rdi) >0x02d3 <+179>: mov%ecx,(%rdx,%rax,4) >0x02d6 <+182>: pop%rbp >0x02d7 <+183>: retq > > and for comparison the pointer version: > >0x0280 <+96>:lea0x4(%rax),%rcx >0x0284 <+100>: mov%rcx,0x22f10(%rdi) >0x028b <+107>: mov%edx,(%rax) >0x028d <+109>: mov0x22f10(%rdi),%rax >0x0294 <+116>: mov0x24338(%rdi),%edx >0x029a <+122>: lea0x4(%rax),%rcx >0x029e <+126>: mov%rcx,0x22f10(%rdi) >0x02a5 <+133>: mov%edx,(%rax) >0x02a7 <+135>: mov0x22f10(%rdi),%rax >0x02ae <+142>: mov0x24814(%rdi),%edx >0x02b4 <+148>: lea0x4(%rax),%rcx >0x02b8 <+152>: mov%rcx,0x22f10(%rdi) >0x02bf <+159>: mov%edx,(%rax) >0x02c1 <+161>: pop%rbp >0x02c2 <+162>: retq > > So in neither case does gcc avoid incrementing either the index or the > pointer in the struct after emit_dword, and then reloads it for the > next. > > This is what I expected to see >0x025e <+62>:movl $0x780d1c02,(%rcx) >0x0264 <+68>:mov0x22f08(%rdi),%rax >0x026b <+75>:mov0x24320(%rdi),%edx >0x0271 <+81>:mov%edx,0x4(%rax) >0x0274 <+84>:mov0x22f08(%rdi),%rax >0x027b <+91>:mov0x24338(%rdi),%edx >0x0281 <+97>:mov%edx,0x8(%rax) >0x0284 <+100>: mov0x22f08(%rdi),%rax >0x028b <+107>: mov0x24814(%rdi),%edx >0x0291 <+113>: mov%edx,0xc(%rax) >0x0294 <+116>: addq $0x10,0x22f08(%rdi) >0x029c <+124>: pop%rbp >0x029d <+125>: retq > with the pointer increments coalesced to the end. Generated by > opencoding emit_dwords as > > static void upload_viewport_state_pointers(struct brw_context *brw) > { >BEGIN_BATCH(4); >brw->batch.map[0] = (_3DSTATE_VIEWPORT_STATE_POINTERS << 16 | (4 - 2) | > GEN6_CC_VIEWPORT_MODIFY | > GEN6_SF_VIEWPORT_MODIFY | > GEN6_CLIP_VIEWPORT_MODIFY); >brw->batch.map[1] = (brw->clip.vp_offset); >brw->batch.map[2] = (brw->sf.vp_offset); >brw->batch.map[3] = (brw->cc.vp_offset); >brw->
[Mesa-dev] [PATCH 3/4] ilo: unref old fence
From: Rob Clark Some, but not all, state trackers will explicitly unref (and set to NULL) the previous *fence before calling pipe->flush(). So driver should use fence_ref() which will unref the old fence if not NULL. Signed-off-by: Rob Clark --- src/gallium/drivers/ilo/ilo_context.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/ilo/ilo_context.c b/src/gallium/drivers/ilo/ilo_context.c index 3d5c7b6..b9a16aa 100644 --- a/src/gallium/drivers/ilo/ilo_context.c +++ b/src/gallium/drivers/ilo/ilo_context.c @@ -62,6 +62,8 @@ ilo_flush(struct pipe_context *pipe, (flags & PIPE_FLUSH_END_OF_FRAME) ? "frame end" : "user request"); if (f) { + struct pipe_screen *screen = pipe->screen; + screen->fence_reference(screen, f, NULL); *f = ilo_screen_fence_create(pipe->screen, ilo->cp->last_submitted_bo); } } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] freedreno: unref old fence
From: Rob Clark Some, but not all, state trackers will explicitly unref (and set to NULL) the previous *fence before calling pipe->flush(). So driver should use fence_ref() which will unref the old fence if not NULL. Signed-off-by: Rob Clark --- src/gallium/drivers/freedreno/freedreno_context.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/freedreno/freedreno_context.c b/src/gallium/drivers/freedreno/freedreno_context.c index 668ef36..127fb5f 100644 --- a/src/gallium/drivers/freedreno/freedreno_context.c +++ b/src/gallium/drivers/freedreno/freedreno_context.c @@ -144,8 +144,10 @@ fd_context_flush(struct pipe_context *pctx, struct pipe_fence_handle **fence, { fd_context_render(pctx); - if (fence) + if (fence) { + fd_screen_fence_ref(pctx->screen, fence, NULL); *fence = fd_fence_create(pctx); + } } void -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] vc4: unref old fence
From: Rob Clark Some, but not all, state trackers will explicitly unref (and set to NULL) the previous *fence before calling pipe->flush(). So driver should use fence_ref() which will unref the old fence if not NULL. Signed-off-by: Rob Clark --- src/gallium/drivers/vc4/vc4_context.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/vc4/vc4_context.c b/src/gallium/drivers/vc4/vc4_context.c index 630f8e6..316598f 100644 --- a/src/gallium/drivers/vc4/vc4_context.c +++ b/src/gallium/drivers/vc4/vc4_context.c @@ -103,8 +103,10 @@ vc4_pipe_flush(struct pipe_context *pctx, struct pipe_fence_handle **fence, vc4_flush(pctx); if (fence) { +struct pipe_screen *screen = pctx->screen; struct vc4_fence *f = vc4_fence_create(vc4->screen, vc4->last_emit_seqno); +screen->fence_reference(screen, fence, NULL); *fence = (struct pipe_fence_handle *)f; } } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] gallium: clarify reference counting for fence
On 09.07.2015 05:34, Rob Clark wrote: > From: Rob Clark > > No where was it spelled out that the state tracker may expect the pipe "Nowhere"? > driver to unref the old fence. > > Signed-off-by: Rob Clark Either way, the series is Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/11] radeon: remove dri_mirror state
On 09.07.2015 02:07, Emil Velikov wrote: > Most of the data stored(duplicated) was unused, and for the one that is > follow the approach set by other drivers. > This eliminates the use of legacy (dri1) types. The commentary below should have been after the --- separator, not in the actual Git commit log. With that fixed, this patch is Reviewed-by: Michel Dänzer > XXX: The radeon code is the only user of __DRIscreen::drm_version (the > only __DRIversion outside of dri1 land). Should we move it into radeon > and/or bump the min. required drm module version ? Moving this into radeon sounds good to me, but I'm not sure what exactly you mean by bumping the minimum required version, or what it's supposed to be good for. FWIW though, any code which is specific to radeon DRM major version 1 can be removed, because that's the UMS major version. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91259] FS compile failed: Register spilling not supported with m14 used
https://bugs.freedesktop.org/show_bug.cgi?id=91259 Kenneth Graunke changed: What|Removed |Added Component|Other |Drivers/DRI/i965 Assignee|mesa-dev@lists.freedesktop. |i...@freedesktop.org |org | QA Contact|mesa-dev@lists.freedesktop. |intel-3d-bugs@lists.freedes |org |ktop.org -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] vc4: unref old fence
Rob Clark writes: > From: Rob Clark > > Some, but not all, state trackers will explicitly unref (and set to > NULL) the previous *fence before calling pipe->flush(). So driver > should use fence_ref() which will unref the old fence if not NULL. > > Signed-off-by: Rob Clark Acked-by: Eric Anholt signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: fix sampler/ubo indexing on cayman
From: Dave Airlie Cayman needs a different method to upload the CF IDX0/1 This fixes 31 piglits when ARB_gpu_shader5 is forced on with cayman. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/eg_asm.c | 17 +++-- src/gallium/drivers/r600/eg_sq.h | 11 +++ 2 files changed, 22 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c index d04921e..c32d317 100644 --- a/src/gallium/drivers/r600/eg_asm.c +++ b/src/gallium/drivers/r600/eg_asm.c @@ -161,6 +161,9 @@ int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool inside_alu_c alu.op = ALU_OP1_MOVA_INT; alu.src[0].sel = bc->index_reg[id]; alu.src[0].chan = 0; + if (bc->chip_class == CAYMAN) + alu.dst.sel = id == 0 ? CM_V_SQ_MOVA_DST_CF_IDX0 : CM_V_SQ_MOVA_DST_CF_IDX1; + alu.last = 1; r = r600_bytecode_add_alu(bc, &alu); if (r) @@ -168,12 +171,14 @@ int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool inside_alu_c bc->ar_loaded = 0; /* clobbered */ - memset(&alu, 0, sizeof(alu)); - alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1; - alu.last = 1; - r = r600_bytecode_add_alu(bc, &alu); - if (r) - return r; + if (bc->chip_class == EVERGREEN) { + memset(&alu, 0, sizeof(alu)); + alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1; + alu.last = 1; + r = r600_bytecode_add_alu(bc, &alu); + if (r) + return r; + } /* Must split ALU group as index only applies to following group */ if (inside_alu_clause) { diff --git a/src/gallium/drivers/r600/eg_sq.h b/src/gallium/drivers/r600/eg_sq.h index b534872..10caa07 100644 --- a/src/gallium/drivers/r600/eg_sq.h +++ b/src/gallium/drivers/r600/eg_sq.h @@ -521,4 +521,15 @@ #define V_SQ_REL_ABSOLUTE 0 #define V_SQ_REL_RELATIVE 1 + +/* CAYMAN has special encoding for MOVA_INT destination */ +#define CM_V_SQ_MOVA_DST_AR_X 0 +#define CM_V_SQ_MOVA_DST_CF_PC 1 +#define CM_V_SQ_MOVA_DST_CF_IDX0 2 +#define CM_V_SQ_MOVA_DST_CF_IDX1 3 +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_7_0 4 +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_15_8 5 +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_23_16 6 +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_31_24 7 + #endif -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91222] lp_test_format regression on CentOS 7
https://bugs.freedesktop.org/show_bug.cgi?id=91222 Luke changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #1 from Luke --- http://cgit.freedesktop.org/mesa/mesa/commit/?id=7b06af9d3ca7310197d39d55fc52c265da4bc59e Fixed this -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Replace illegal compacted NOP with valid compact instruction
NOP actually has no compact version, but we use it for instruction alignment for compact kernel. Although it seems working on HW, it is illegal and might not be valid for any future one. This trys to get a temporary compact instruction with no effect for alignment to replace compacted NOP. G45 spec has note that HW compact logic could determine NENOP and drop it right away, so we can still keep with that. v2: rebase to master, we still need this to work with internal tool. Signed-off-by: Zhenyu Wang --- src/mesa/drivers/dri/i965/brw_eu_compact.c | 41 +- 1 file changed, 35 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c b/src/mesa/drivers/dri/i965/brw_eu_compact.c index 67f0b45..719667a 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_compact.c +++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c @@ -1367,6 +1367,39 @@ brw_init_compaction_tables(const struct brw_device_info *devinfo) } } +static void +brw_get_noop_compact(struct brw_codegen *p, brw_compact_inst *dst) +{ + const struct brw_device_info *devinfo = p->devinfo; + brw_inst *inst, i; + struct brw_reg g0 = brw_vec8_grf(0, 0); + + memset(dst, 0, sizeof(*dst)); + + /* G45 compact logic could recognize NENOP and drop right away. */ + if (devinfo->is_g4x) { + brw_compact_inst_set_opcode(dst, BRW_OPCODE_NENOP); + brw_compact_inst_set_cmpt_control(dst, true); + return; + } + + /* +* As NOP has no legal compact version, try to use a legal compact +* instruction for compact instruction alignment. +*/ + brw_push_insn_state(p); + brw_set_default_predicate_control(p, BRW_PREDICATE_NONE); + brw_set_default_access_mode(p, BRW_ALIGN_1); + inst = brw_MOV(p, g0, g0); + memcpy(&i, inst, sizeof(brw_inst)); + brw_pop_insn_state(p); + + if (!brw_try_compact_instruction(devinfo, dst, &i)) { + fprintf(stderr, "Failed to generate compact inst for alignment!\n"); + exit(1); + } +} + void brw_compact_instructions(struct brw_codegen *p, int start_offset, int num_annotations, struct annotation *annotation) @@ -1414,9 +1447,7 @@ brw_compact_instructions(struct brw_codegen *p, int start_offset, /* All uncompacted instructions need to be aligned on G45. */ if ((offset & sizeof(brw_compact_inst)) != 0 && devinfo->is_g4x){ brw_compact_inst *align = store + offset; -memset(align, 0, sizeof(*align)); -brw_compact_inst_set_opcode(align, BRW_OPCODE_NENOP); -brw_compact_inst_set_cmpt_control(align, true); +brw_get_noop_compact(p, align); offset += sizeof(brw_compact_inst); compacted_count--; compacted_counts[src_offset / sizeof(brw_inst)] = compacted_count; @@ -1523,9 +1554,7 @@ brw_compact_instructions(struct brw_codegen *p, int start_offset, */ if (p->next_insn_offset & sizeof(brw_compact_inst)) { brw_compact_inst *align = store + offset; - memset(align, 0, sizeof(*align)); - brw_compact_inst_set_opcode(align, BRW_OPCODE_NOP); - brw_compact_inst_set_cmpt_control(align, true); + brw_get_noop_compact(p, align); p->next_insn_offset += sizeof(brw_compact_inst); } p->nr_insn = p->next_insn_offset / sizeof(brw_inst); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: move sampler/ubo index registers before temp reg
From: Dave Airlie temp_reg needs to be last, as we increment things away from it, otherwise on cayman some tests were overwriting the index regs. Fixes 2 piglit with ARB_gpu_shader5 forced on cayman. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/r600_shader.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index af7622e..1a72bf6 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -1931,15 +1931,14 @@ static int r600_shader_from_tgsi(struct r600_context *rctx, ctx.file_offset[TGSI_FILE_IMMEDIATE] = V_SQ_ALU_SRC_LITERAL; ctx.bc->ar_reg = ctx.file_offset[TGSI_FILE_TEMPORARY] + ctx.info.file_max[TGSI_FILE_TEMPORARY] + 1; + ctx.bc->index_reg[0] = ctx.bc->ar_reg + 1; + ctx.bc->index_reg[1] = ctx.bc->ar_reg + 2; + if (ctx.type == TGSI_PROCESSOR_GEOMETRY) { - ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 1; - ctx.temp_reg = ctx.bc->ar_reg + 2; - ctx.bc->index_reg[0] = ctx.bc->ar_reg + 3; - ctx.bc->index_reg[1] = ctx.bc->ar_reg + 4; + ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 3; + ctx.temp_reg = ctx.bc->ar_reg + 4; } else { - ctx.temp_reg = ctx.bc->ar_reg + 1; - ctx.bc->index_reg[0] = ctx.bc->ar_reg + 2; - ctx.bc->index_reg[1] = ctx.bc->ar_reg + 3; + ctx.temp_reg = ctx.bc->ar_reg + 3; } shader->max_arrays = 0; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] r600g/sb: add support for multiple streams to SB backend
From: Glenn Kennard This adds a peephole and removes an assert that isn't actually valid with some of the stream emit instructions. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 -- src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 18 +++--- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp index 8c2cd14..48d56ac 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp @@ -761,8 +761,6 @@ void bc_finalizer::finalize_cf(cf_node* c) { mask |= (1 << chan); } - assert(reg >= 0 && mask); - if (reg >= 0) update_ngpr(reg); diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp index 4879c03..25b0f55 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp @@ -757,10 +757,22 @@ int bc_parser::prepare_ir() { c->bc.end_of_program = eop; } else if (flags & CF_EMIT) { - c->flags |= NF_DONT_KILL | NF_DONT_HOIST | NF_DONT_MOVE; + /* quick peephole */ + cf_node *prev = static_cast(c->prev); + if (c->bc.op == CF_OP_CUT_VERTEX && + prev && prev->is_valid() && + prev->bc.op == CF_OP_EMIT_VERTEX && + c->bc.count == prev->bc.count) { + prev->bc.set_op(CF_OP_EMIT_CUT_VERTEX); + prev->bc.end_of_program = c->bc.end_of_program; + c->remove(); + } + else { + c->flags |= NF_DONT_KILL | NF_DONT_HOIST | NF_DONT_MOVE; - c->src.push_back(sh->get_special_value(SV_GEOMETRY_EMIT)); - c->dst.push_back(sh->get_special_value(SV_GEOMETRY_EMIT)); + c->src.push_back(sh->get_special_value(SV_GEOMETRY_EMIT)); + c->dst.push_back(sh->get_special_value(SV_GEOMETRY_EMIT)); + } } } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] r600g: enable ARB_gpu_shader5 on evergreen and up
From: Dave Airlie Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/r600_pipe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 67caa69..0db1c1c 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -299,7 +299,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_GLSL_FEATURE_LEVEL: if (family >= CHIP_CEDAR) - return 330; + return 400; /* pre-evergreen geom shaders need newer kernel */ if (rscreen->b.info.drm_minor >= 37) return 330; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] r600 geometry streams (and ARB_gpu_shader5 support)
This applies on top of the two patches I've sent already, and enables the geometry streams, which is the final piece missing for ARB_gpu_shader5 on evergreen and cayman. (I'll do doc update patches later) Glenn wrote most of this, I just spent some time making it work and cleaning up the code. Though I suspect it could do with more cleaning. The main thing I don't really like is we try and emit all outputs into each ring so we waste a bit of space, however it does work, though the special casing to avoid POSITION going into any other streams kinda makes me wonder if anything else needs that special casing. Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] r600g: add support for streams to the assembler.
From: Glenn Kennard This just adds support to the assembler dumper and allows stream instructions to be generated. Also fix up the stream debugging to add stream info. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/eg_asm.c | 1 + src/gallium/drivers/r600/r600_asm.c| 2 ++ src/gallium/drivers/r600/r600_asm.h| 1 + src/gallium/drivers/r600/r600_shader.c | 6 -- src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 3 +++ 5 files changed, 11 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c index 42e8b0b..c32d317 100644 --- a/src/gallium/drivers/r600/eg_asm.c +++ b/src/gallium/drivers/r600/eg_asm.c @@ -115,6 +115,7 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf) S_SQ_CF_WORD1_BARRIER(1) | S_SQ_CF_WORD1_COND(cf->cond) | S_SQ_CF_WORD1_POP_COUNT(cf->pop_count) | + S_SQ_CF_WORD1_COUNT(cf->count) | S_SQ_CF_WORD1_END_OF_PROGRAM(cf->end_of_program); } } diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c index 762cc7f..40639d0 100644 --- a/src/gallium/drivers/r600/r600_asm.c +++ b/src/gallium/drivers/r600/r600_asm.c @@ -2029,6 +2029,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc) fprintf(stderr, "CND:%X ", cf->cond); if (cf->pop_count) fprintf(stderr, "POP:%X ", cf->pop_count); + if (cf->count && (cfop->flags & CF_EMIT)) + fprintf(stderr, "STREAM%d ", cf->count); fprintf(stderr, "\n"); } } diff --git a/src/gallium/drivers/r600/r600_asm.h b/src/gallium/drivers/r600/r600_asm.h index e37d926..b282907 100644 --- a/src/gallium/drivers/r600/r600_asm.h +++ b/src/gallium/drivers/r600/r600_asm.h @@ -149,6 +149,7 @@ struct r600_bytecode_cf { unsignedid; unsignedcond; unsignedpop_count; + unsignedcount; unsignedcf_addr; /* control flow addr */ struct r600_bytecode_kcache kcache[4]; unsignedr6xx_uses_waterfall; diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 1a72bf6..dbff313 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -93,8 +93,10 @@ static void r600_dump_streamout(struct pipe_stream_output_info *so) for (i = 0; i < so->num_outputs; i++) { unsigned mask = ((1 << so->output[i].num_components) - 1) << so->output[i].start_component; - fprintf(stderr, " %i: MEM_STREAM0_BUF%i[%i..%i] <- OUT[%i].%s%s%s%s%s\n", - i, so->output[i].output_buffer, + fprintf(stderr, " %i: MEM_STREAM%d_BUF%i[%i..%i] <- OUT[%i].%s%s%s%s%s\n", + i, + so->output[i].stream, + so->output[i].output_buffer, so->output[i].dst_offset, so->output[i].dst_offset + so->output[i].num_components - 1, so->output[i].register_index, mask & 1 ? "x" : "", diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp index 5232782..631fac2 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp @@ -182,6 +182,9 @@ void bc_dump::dump(cf_node& n) { if (n.bc.pop_count) s << " POP:" << n.bc.pop_count; + + if (n.bc.count && (n.bc.op_ptr->flags & CF_EMIT)) + s << " STREAM" << n.bc.count; } if (!n.bc.barrier) -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] radeon: add streamout status 1-3 queries.
From: Glenn Kennard This adds support for queries against the non-0 vertex streams. Signed-off-by: Dave Airlie --- src/gallium/drivers/radeon/r600_query.c | 18 -- src/gallium/drivers/radeon/r600d_common.h | 3 +++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c index 71f4a15..9ad2452 100644 --- a/src/gallium/drivers/radeon/r600_query.c +++ b/src/gallium/drivers/radeon/r600_query.c @@ -54,6 +54,8 @@ struct r600_query { uint64_t end_result; /* Fence for GPU_FINISHED. */ struct pipe_fence_handle *fence; + /* For transform feedback: which stream the query is for */ + unsigned stream; }; @@ -157,6 +159,17 @@ static void r600_update_occlusion_query_state(struct r600_common_context *rctx, } } +static unsigned event_type_for_stream(struct r600_query *query) +{ + switch (query->stream) { + default: + case 0: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS; + case 1: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS1; /* enum values snarfed from SI kernel sid.h */ + case 2: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS2; + case 3: return EVENT_TYPE_SAMPLE_STREAMOUTSTATS3; + } +} + static void r600_emit_query_begin(struct r600_common_context *ctx, struct r600_query *query) { struct radeon_winsys_cs *cs = ctx->rings.gfx.cs; @@ -191,7 +204,7 @@ static void r600_emit_query_begin(struct r600_common_context *ctx, struct r600_q case PIPE_QUERY_SO_STATISTICS: case PIPE_QUERY_SO_OVERFLOW_PREDICATE: radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0)); - radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3)); + radeon_emit(cs, EVENT_TYPE(event_type_for_stream(query)) | EVENT_INDEX(3)); radeon_emit(cs, va); radeon_emit(cs, (va >> 32UL) & 0xFF); break; @@ -248,7 +261,7 @@ static void r600_emit_query_end(struct r600_common_context *ctx, struct r600_que case PIPE_QUERY_SO_OVERFLOW_PREDICATE: va += query->buffer.results_end + query->result_size/2; radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0)); - radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3)); + radeon_emit(cs, EVENT_TYPE(event_type_for_stream(query)) | EVENT_INDEX(3)); radeon_emit(cs, va); radeon_emit(cs, (va >> 32UL) & 0xFF); break; @@ -369,6 +382,7 @@ static struct pipe_query *r600_create_query(struct pipe_context *ctx, unsigned q /* NumPrimitivesWritten, PrimitiveStorageNeeded. */ query->result_size = 32; query->num_cs_dw = 6; + query->stream = index; break; case PIPE_QUERY_PIPELINE_STATISTICS: /* 11 values on EG, 8 on R600. */ diff --git a/src/gallium/drivers/radeon/r600d_common.h b/src/gallium/drivers/radeon/r600d_common.h index 74c8d87..5a56a54 100644 --- a/src/gallium/drivers/radeon/r600d_common.h +++ b/src/gallium/drivers/radeon/r600d_common.h @@ -66,6 +66,9 @@ #define PKT3_SET_SH_REG0x76 /* SI and later */ #define PKT3_SET_UCONFIG_REG 0x79 /* CIK and later */ +#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS1 0x1 /* EG and later */ +#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS2 0x2 /* EG and later */ +#define EVENT_TYPE_SAMPLE_STREAMOUTSTATS3 0x3 /* EG and later */ #define EVENT_TYPE_PS_PARTIAL_FLUSH0x10 #define EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT 0x14 #define EVENT_TYPE_ZPASS_DONE 0x15 -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] r600g: add streamout support
From: Glenn Kennard This adds the main chunk of the geometry shader multiple stream support to the r600 driver. Glenn wrote the original pass, and I took his code and hacked it into a working state. Signed-off-by: Dave Airlie --- src/gallium/drivers/r600/evergreen_state.c | 29 ++-- src/gallium/drivers/r600/r600_pipe.c | 2 +- src/gallium/drivers/r600/r600_shader.c | 200 --- src/gallium/drivers/r600/r600_shader.h | 6 +- src/gallium/drivers/r600/r600_state.c| 6 +- src/gallium/drivers/r600/r600_state_common.c | 7 + 6 files changed, 180 insertions(+), 70 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 4ddbc0b..788bf54 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2988,8 +2988,12 @@ void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader struct r600_command_buffer *cb = &shader->command_buffer; struct r600_shader *rshader = &shader->shader; struct r600_shader *cp_shader = &shader->gs_copy_shader->shader; - unsigned gsvs_itemsize = - (cp_shader->ring_item_size * rshader->gs_max_out_vertices) >> 2; + unsigned gsvs_itemsizes[4] = { + (cp_shader->ring_item_sizes[0] * rshader->gs_max_out_vertices) >> 2, + (cp_shader->ring_item_sizes[1] * rshader->gs_max_out_vertices) >> 2, + (cp_shader->ring_item_sizes[2] * rshader->gs_max_out_vertices) >> 2, + (cp_shader->ring_item_sizes[3] * rshader->gs_max_out_vertices) >> 2 + }; r600_init_command_buffer(cb, 64); @@ -3008,21 +3012,24 @@ void evergreen_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader S_028B90_ENABLE(rshader->gs_num_invocations > 0)); } r600_store_context_reg_seq(cb, R_02891C_SQ_GS_VERT_ITEMSIZE, 4); - r600_store_value(cb, cp_shader->ring_item_size >> 2); - r600_store_value(cb, 0); - r600_store_value(cb, 0); - r600_store_value(cb, 0); + r600_store_value(cb, cp_shader->ring_item_sizes[0] >> 2); + r600_store_value(cb, cp_shader->ring_item_sizes[1] >> 2); + r600_store_value(cb, cp_shader->ring_item_sizes[2] >> 2); + r600_store_value(cb, cp_shader->ring_item_sizes[3] >> 2); r600_store_context_reg(cb, R_028900_SQ_ESGS_RING_ITEMSIZE, - (rshader->ring_item_size) >> 2); + (rshader->ring_item_sizes[0]) >> 2); r600_store_context_reg(cb, R_028904_SQ_GSVS_RING_ITEMSIZE, - gsvs_itemsize); + gsvs_itemsizes[0] + + gsvs_itemsizes[1] + + gsvs_itemsizes[2] + + gsvs_itemsizes[3]); r600_store_context_reg_seq(cb, R_02892C_SQ_GSVS_RING_OFFSET_1, 3); - r600_store_value(cb, gsvs_itemsize); - r600_store_value(cb, gsvs_itemsize); - r600_store_value(cb, gsvs_itemsize); + r600_store_value(cb, gsvs_itemsizes[0]); + r600_store_value(cb, gsvs_itemsizes[0] + gsvs_itemsizes[1]); + r600_store_value(cb, gsvs_itemsizes[0] + gsvs_itemsizes[1] + gsvs_itemsizes[2]); /* FIXME calculate these values somehow ??? */ r600_store_context_reg_seq(cb, R_028A54_GS_PER_ES, 3); diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 143e98e..67caa69 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -352,7 +352,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS: return 16384; case PIPE_CAP_MAX_VERTEX_STREAMS: - return 1; + return family >= CHIP_CEDAR ? 4 : 1; case PIPE_CAP_MAX_VERTEX_ATTRIB_STRIDE: return 2047; diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index dbff313..ef19706 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -311,7 +311,9 @@ struct r600_shader_ctx { int gs_out_ring_offset; int gs_next_vertex; struct r600_shader *gs_for_vs; - int gs_export_gpr_treg; + int gs_export_gpr_tregs[4]; + const struct pipe_stream_output_info*gs_stream_output_info; + unsignedenabled_stream_buffers_mask; /* See r600_streamout.enabled_stream_buffers_mask */ }; struct r600_shader_tgsi_instruction { @@ -319,7 +321,7 @@ struct r600_shader_tgsi_instruction
[Mesa-dev] [PATCH 3/6] radeon: add support for streams to the common streamout code.
From: Glenn Kennard This just adds to the common radeon streamout code, support for multiple streams. Signed-off-by: Dave Airlie --- src/gallium/drivers/radeon/r600_pipe_common.h | 1 + src/gallium/drivers/radeon/r600_streamout.c | 23 +-- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index a471426..22d940e 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -327,6 +327,7 @@ struct r600_streamout { /* External state which comes from the vertex shader, * it must be set explicitly when binding a shader. */ unsigned*stride_in_dw; + unsignedenabled_stream_buffers_mask; /* stream0 buffers0-3 in 4 LSB */ /* The state of VGT_STRMOUT_(CONFIG|EN). */ struct r600_atomenable_atom; diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c index bc8bf97..a602dac 100644 --- a/src/gallium/drivers/radeon/r600_streamout.c +++ b/src/gallium/drivers/radeon/r600_streamout.c @@ -195,7 +195,11 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r r600_write_context_reg(cs, rctx->chip_class >= EVERGREEN ? R_028B98_VGT_STRMOUT_BUFFER_CONFIG : R_028B20_VGT_STRMOUT_BUFFER_EN, - rctx->streamout.enabled_mask); + (rctx->streamout.enabled_mask | + (rctx->streamout.enabled_mask << 4) | + (rctx->streamout.enabled_mask << 8) | + (rctx->streamout.enabled_mask << 12)) & + rctx->streamout.enabled_stream_buffers_mask); for (i = 0; i < rctx->streamout.num_targets; i++) { if (!t[i]) @@ -326,11 +330,18 @@ static bool r600_get_strmout_en(struct r600_common_context *rctx) static void r600_emit_streamout_enable(struct r600_common_context *rctx, struct r600_atom *atom) { - r600_write_context_reg(rctx->rings.gfx.cs, - rctx->chip_class >= EVERGREEN ? - R_028B94_VGT_STRMOUT_CONFIG : - R_028AB0_VGT_STRMOUT_EN, - S_028B94_STREAMOUT_0_EN(r600_get_strmout_en(rctx))); + unsigned reg = R_028AB0_VGT_STRMOUT_EN; + unsigned val = S_028B94_STREAMOUT_0_EN(r600_get_strmout_en(rctx)); + if (rctx->chip_class >= EVERGREEN) { + reg = R_028B94_VGT_STRMOUT_CONFIG; + val |= + S_028B94_RAST_STREAM(0) | + S_028B94_STREAMOUT_1_EN(r600_get_strmout_en(rctx)) | + S_028B94_STREAMOUT_2_EN(r600_get_strmout_en(rctx)) | + S_028B94_STREAMOUT_3_EN(r600_get_strmout_en(rctx)); + } + + r600_write_context_reg(rctx->rings.gfx.cs, reg, val); } static void r600_set_streamout_enable(struct r600_common_context *rctx, bool enable) -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 16/78] i965/nir/vec4: Implement store_output intrinsic
On 06/30/2015 06:51 PM, Jason Ekstrand wrote: > On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev wrote: >> The index into the output_reg array where to store the destination register >> is >> fetched from the nir_outputs map built during nir_setup_outputs stage. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >> --- >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 17 +++-- >> 1 file changed, 15 insertions(+), 2 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> index 8a2d335..55d4490 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >> @@ -520,10 +520,23 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr >> *instr) >> } >> >> case nir_intrinsic_store_output_indirect: >> + has_indirect = true; >>/* fallthrough */ >> - case nir_intrinsic_store_output: >> - /* @TODO: Not yet implemented */ >> + case nir_intrinsic_store_output: { >> + int offset = instr->const_index[0]; >> + int output = nir_outputs[offset]; >> + >> + src = get_nir_src(instr->src[0], nir_output_types[offset]); >> + dest = dst_reg(src); >> + >> + dest.writemask = brw_writemask_for_size(instr->num_components); >> + >> + if (has_indirect) >> + dest.reladdr = new(mem_ctx) src_reg(get_nir_src(instr->src[1])); >> + >> + output_reg[output] = dest; > > I'm very confused about the amount of indirection going on here. It > seems to me that we should be setting these outputs up in > setup_outputs() rather than storring off a map from ints to other ints > and setting it up here. I didn't make this comment on the patch for > setup_outputs() because I wanted to wait to see it used before I > commented on it. > > I'm guessing you did it this way because the nir_assign_var_locations > is giving you bogus values. If so, then it might be better to just > assign variable locations in setup_outputs() rather than having a > remap table. The whole point of nir_lower_io is to make IO easy for > the back-end. If you need a re-map table, then it's no longer making > it easy and we need to think more about what's going on. > --Jason > That double indirection felt bad since the beginning, but it was needed to store the original variable's location (var->data.location). Let me explain: We are (re)using the plumbering in vec4_visitor to setup URB, so the only thing we need to do is to store the out register in "output_reg" map at the correct location. And that location is given by the original location in the shader (var->data.location). So, in this case, "nir_assign_var_locations" pass, which constructs var->data.driver_location, is not useful to us, except to give us consecutive indexes to construct the other map we have, the type map, which is needed to carry the correct type from the original variable to the output register. So, before knowing that I could modify nir_lower_io, my best shot at transferring the original variable location was to create this nir_outputs map. Now, what I have done is to put that value in const_index[1] of the intrinsic instruction, which was previously unused. What do you think? That removes the offset to offset map, but we still need the type map. About your comment on initializing the register during setup stage, I'm a bit confused: the register that we need to store is not available during setup stage, because we still don't have local registers allocated. >>break; >> + } >> >> case nir_intrinsic_load_vertex_id: >>unreachable("should be lowered by lower_vertex_id()"); >> -- ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev