from:"Ben Widawsky"

Re: [Mesa-dev] [PATCH 2/5] mesa/st: Use global function _mesa_regions_overlap()

2015-06-10 Thread Ben Widawsky

On Wed, Jun 10, 2015 at 03:34:50PM -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 

1 and 2 are:
Reviewed-by: Ben Widawsky 

[snip]

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965: correct alignment units for 2D compressed textures on Skylake

2015-06-11 Thread Ben Widawsky

On Wed, Jun 10, 2015 at 05:01:44PM -0700, Nanley Chery wrote:
> From: Nanley Chery 
> 
> On Gen9+, vertical and horizontal alignment values for compressed textures are
> equal to the pre-Gen9 value squared. Each miplevel must be aligned to this
> value.
> 
> Signed-off-by: Nanley Chery 

While not a requirement, for future reference you should Cc people on patches
that touch something they recently changed and/or were working on. In this case
Cc'ing Neil would have been great.

> ---
> 
> This fixes an FXT1 Piglit test regression and shows no failures on Jenkins. 

You ran full piglit on SKL? Jenkins won't and it's pretty important for this
patch.

> 
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 20 
>  1 file changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index 312a887..dffc699 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -149,15 +149,8 @@ intel_horizontal_texture_alignment_unit(struct 
> brw_context *brw,
>unsigned int i, j;
>_mesa_get_format_block_size(mt->format, &i, &j);
>  
> -  /* On Gen9+ we can pick our own alignment for compressed textures but 
> it
> -   * has to be a multiple of the block size. The minimum alignment we can
> -   * pick is 4 so we effectively have to align to 4 times the block
> -   * size
> -   */
> -  if (brw->gen >= 9)
> - return i * 4;
> -  else
> - return i;
> +  /* On Gen9+ the alignment value is squared. */
> +  return brw->gen >= 9 ? i * i : i;

I don't think this is right. Isn't this going to push non-compressed textures to
an invalid HALIGN when we divide later, i.e. don't you get 1?

>  }
>  
> if (mt->format == MESA_FORMAT_S_UINT8)
> @@ -269,9 +262,12 @@ intel_vertical_texture_alignment_unit(struct brw_context 
> *brw,
>  * Where "*" means either VALIGN_2 or VALIGN_4 depending on the setting of
>  * the SURFACE_STATE "Surface Vertical Alignment" field.
>  */
> -   if (_mesa_is_format_compressed(mt->format))
> +   if (_mesa_is_format_compressed(mt->format)) {
> +  unsigned int i, j;
> +  _mesa_get_format_block_size(mt->format, &i, &j);
>/* See comment above for the horizontal alignment */

I think you need to kill this comment now. Doesn't it refer to what you killed
above?

> -  return brw->gen >= 9 ? 16 : 4;
> +  return brw->gen >= 9 ? j * j : j;
> +   }

It kind of looks like this is just working around the way Neil implemented the
divide later on. There's probably a nicer way to do the right thing, but I
haven't tried it myself. Also, I believe this doesn't actually fix anything; it
just uses slightly more optimal alignments, so maybe do a separate patch. No big
deal though.

>  
> if (mt->format == MESA_FORMAT_S_UINT8)
>return brw->gen >= 7 ? 8 : 4;
> @@ -379,7 +375,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
>  
> if (mt->compressed) {
>mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
> - ALIGN(minify(mt->physical_width0, 2), bw);
> + ALIGN(minify(mt->physical_width0, 2), mt->align_w);
> } else {
>mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
>   minify(mt->physical_width0, 2);

Re: [Mesa-dev] What branch to get patch 47790

2015-06-16 Thread Ben Widawsky

On Tue, Jun 16, 2015 at 03:46:26PM -0700, Kenneth Graunke wrote:
> On Tuesday, June 16, 2015 10:08:38 PM Meng, David wrote:
> > Hi:
> > I am new to this email list.  I would like to get a help from you.
> > 
> > I found a patch with number of 47790 which supports Intel Broadwell(BDW) 
> > system gen8 GPU.  The author is Topi Pohjolainen.  The description is in 
> > below.
> > I need this patch to launch a virtual machine on BDW system in which we are 
> > using Mesa library in user space.  But I could not find this patch in mesa 
> > master or any branches.  Would you please point me where I can find a branch 
> > including this patch? 
> > 
> > I highly appreciate any help.
> > 
> > Regards,
> > 
> > David
> > patch title and 
> > description-
> > [Mesa-dev] i965: Don't use gl-context for fbo-blits
> > This series introduces new blorp parameter type for blit programs
> > compiled from glsl-sources. For most parts the launch logic just
> > calls core i965 batch emission logic.
> > Vertex batches are handcrafted containing full vertex header
> > information. This is needed because the pipeline is programmed to
> > skip vertex shader, clip and viewport transformation in strips&fans
> > (SF) but to provide the vertices directly from vertex fetcher (VF)
> > to the windower (WM).
> > 
> > Topi Pohjolainen (14):
> >   i965/blorp/gen7: Support for loading glsl-based fragment shaders
> >   i965/blorp/gen6: Support for loading glsl-based fragment shaders
> >   meta: Provide read access to blit shaders
> >   i965/meta: Add helper for looking up blit programs
> >   i965/blorp: Add plumbing for glsl-based color blits
> >   i965/blorp: Add support for loading vertices for glsl-based blits
> >   i965/blorp: Add support for setting up surfaces for glsl-based blits
> >   i965/blorp: Add support for setting samplers for glsl-based blits
> >   i965/gen6: Add support for setting minimum layer for tex surfaces
> >   i965/blorp: Enable glsl-based fbo blits
> >   i965/blorp/gen7: Prepare re-using for gen8
> >   i965/blorp/gen7: Expose state setup applicable to gen8
> >   i965/blorp/gen6: Prepare vertex buffer setup logic for gen8
> >   i965/blorp/gen8: Execution support
> 
> Hi David,
> 
> I'm not sure what you mean by "patch with a number of 47790".  We don't
> number patches in the Mesa community.  That must be some Intel internal
> number.

It's the patchwork id of the last patch in the series:
http://patchwork.freedesktop.org/patch/47790/

[snip]

Re: [Mesa-dev] [PATCH] i965/skl: Fix aligning mt->total_width to the block size

2015-06-16 Thread Ben Widawsky

On Tue, Jun 16, 2015 at 01:53:40PM +0100, Neil Roberts wrote:
> brw_miptree_layout_2d tries to ensure that mt->total_width is a
> multiple of the compressed block size, presumably because it wouldn't
> be possible to make an image that has a fraction of a block. However
> it was doing this by aligning mt->total_width to align_w. Previously
> align_w has been used as a shortcut for getting the block width
> because before Gen9 the block width was always equal to the alignment.
> Commit 4ab8d59a2 tried to fix these cases to use the block width
> instead of the alignment but it missed this case.
> 
> I think in practice this probably won't make any difference because
> the buffer for the texture will be allocated to be large enough to
> contain the entire pitch and libdrm aligns the pitch to the tile width
> anyway. However I think the patch is worth having to make the
> intention clearer.

I think this is beginning to infringe upon the definition of align_w. The total
width is a function of its miptree properties and not the compressed block
properties, right?

In other words, if there is a case where align_w != bw, I think total_width
should be aligned to align_w, NOT bw.

(I'm not opposed to the patch, just making sure I understand.)
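
To make the question concrete, here is a minimal sketch of how the two choices
differ. align_pot() follows the usual power-of-two round-up idiom and is my own
stand-in, not the driver's ALIGN() macro; the width of 20, the block width of 4
and the alignment of 16 are made-up values, not taken from a real surface.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment.
 * Assumption: alignment is a non-zero power of two. */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: the total width when rounding the level-0 width
 * up to a given unit (either the block width or align_w). */
static unsigned
total_width_for(unsigned physical_width0, unsigned unit)
{
   return align_pot(physical_width0, unit);
}
```

With those values, rounding to the block width leaves the width at 20, while
rounding to the alignment grows it to 32. The two only agree when
align_w == bw, which is the pre-Gen9 case described in the commit message.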

> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index 1e7d8a1..dbb6cef 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -366,9 +366,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
>  
> mt->total_width = mt->physical_width0;
>  
> -   if (mt->compressed) {
> -   mt->total_width = ALIGN(mt->physical_width0, mt->align_w);
> -   }
> +   if (mt->compressed)
> +   mt->total_width = ALIGN(mt->total_width, bw);
>  
> /* May need to adjust width to accommodate the placement of
>  * the 2nd mipmap.  This occurs when the alignment
> -- 
> 1.9.3
> 

Re: [Mesa-dev] [PATCH 2/9] i965: Fix textureGrad with cube samplers

2015-06-16 Thread Ben Widawsky

ses for now.
> +*
> +* For cube maps the result of these formulas is giving us a value of rho
> +* that is twice the value we should use, so divide it by 2 or,
> +* alternatively, remove one unit from the result of the log2 computation.
> +*/
> ir->op = ir_txl;
> -   ir->lod_info.lod = expr(ir_unop_log2, rho);
> +   if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE) {
> +  ir->lod_info.lod = expr(ir_binop_add,
> +  expr(ir_unop_log2, rho),
> +  new(mem_ctx) ir_constant(-1.0f));
> +   } else {
> +  ir->lod_info.lod = expr(ir_unop_log2, rho);
> +   }
>  
> progress = true;
> return visit_continue;

Patch seems to do what it's advertising. I am not really an expert here, but
fwiw:
Reviewed-by: Ben Widawsky 
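
For reference, the equivalence the quoted comment relies on (halving rho versus
subtracting one from the log2 result) is just the logarithm of a quotient:

```latex
\log_2\!\left(\frac{\rho}{2}\right) = \log_2 \rho - \log_2 2 = \log_2 \rho - 1
```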

[Mesa-dev] [PATCH 3/4] i965/gen9: Don't use encrypted MOCS

2015-06-17 Thread Ben Widawsky

On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is for
doing encrypted reads.

I don't recall how I decided to do this for BXT. I don't know whether this patch
was ever needed, since it seems nothing is broken today on SKL. Furthermore, this
patch may no longer be needed because of the ongoing changes with MOCS setup. It
is what is being used/tested, so it's included in the series.

The chosen values are the old values left shifted. That was also an arbitrary
choice.

Cc:  Francisco Jerez 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index bfcc442..5358edc 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2495,8 +2495,8 @@ enum brw_wm_barycentric_interp_mode {
  * cache settings.  We still use only either write-back or write-through; and
  * rely on the documented default values.
  */
-#define SKL_MOCS_WB 9
-#define SKL_MOCS_WT 5
+#define SKL_MOCS_WB 0x12
+#define SKL_MOCS_WT 0xa
 
 #define MEDIA_VFE_STATE 0x7000
 /* GEN7 DW2, GEN8+ DW3 */
-- 
2.4.3
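
As a quick check of the "old values left shifted" remark, here is a small
sketch. The OLD_* names and mocs_skip_bit0() are hypothetical and exist only
for this illustration; the meaning of bit 0 is the commit message's "AFAICT",
not something this sketch confirms.

```c
#include <assert.h>

/* Old values, copied from the removed lines of the patch. */
#define OLD_SKL_MOCS_WB 9
#define OLD_SKL_MOCS_WT 5

/* Hypothetical helper: shift a value up by one bit so that bit 0
 * (believed above to select encrypted reads) is left clear. */
static unsigned
mocs_skip_bit0(unsigned index)
{
   unsigned value = index << 1;

   assert((value & 1u) == 0);
   return value;
}
```

Shifting 9 and 5 this way gives 0x12 and 0xa, the two new defines.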

[Mesa-dev] [PATCH 4/4] i965/bxt: Add known PCI IDs

2015-06-17 Thread Ben Widawsky

These match the ones defined in the kernel. The only one tested by us is 0x0a84.

Signed-off-by: Ben Widawsky 
---
 include/pci_ids/i965_pci_ids.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 8d757aa..4d8b419 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -128,3 +128,6 @@ CHIPSET(0x22B0, chv, "Intel(R) HD Graphics 
(Cherryview)")
 CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
+CHIPSET(0x0a84, bxt, "Intel(R) Broxton")
+CHIPSET(0x1a84, bxt, "Intel(R) Broxton")
+CHIPSET(0x5a84, bxt, "Intel(R) Broxton")
-- 
2.4.3

[Mesa-dev] [PATCH 1/4] i965/bxt: Add basic Broxton infrastructure

2015-06-17 Thread Ben Widawsky

The thread counts and URB information are all speculative numbers that were
based on some CHV numbers at the time.

v2:
Originally this patch had PCI IDs. I've moved that to a new patch at the end of
the series.
Remove is_cherryview hack.

Cc: Neil Roberts 
Cc: "Lecluse, Philippe" 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_context.c |  1 +
 src/mesa/drivers/dri/i965/brw_context.h |  1 +
 src/mesa/drivers/dri/i965/brw_device_info.c | 16 
 src/mesa/drivers/dri/i965/brw_device_info.h |  1 +
 4 files changed, 19 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index c629f39..0286577 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -752,6 +752,7 @@ brwCreateContext(gl_api api,
brw->is_baytrail = devinfo->is_baytrail;
brw->is_haswell = devinfo->is_haswell;
brw->is_cherryview = devinfo->is_cherryview;
+   brw->is_broxton = devinfo->is_broxton;
brw->has_llc = devinfo->has_llc;
brw->has_hiz = devinfo->has_hiz_and_separate_stencil;
brw->has_separate_stencil = devinfo->has_hiz_and_separate_stencil;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 58119ee..c60053b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1125,6 +1125,7 @@ struct brw_context
bool is_baytrail;
bool is_haswell;
bool is_cherryview;
+   bool is_broxton;
 
bool has_hiz;
bool has_separate_stencil;
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
b/src/mesa/drivers/dri/i965/brw_device_info.c
index 97243a4..342e566 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.c
+++ b/src/mesa/drivers/dri/i965/brw_device_info.c
@@ -334,6 +334,22 @@ static const struct brw_device_info 
brw_device_info_skl_gt3 = {
.supports_simd16_3src = true,
 };
 
+static const struct brw_device_info brw_device_info_bxt = {
+   GEN9_FEATURES,
+   .is_broxton = 1,
+   .gt = 1,
+   .has_llc = false,
+   .max_vs_threads = 112,
+   .max_gs_threads = 112,
+   .max_wm_threads = 32,
+   .urb = {
+  .size = 64,
+  .min_vs_entries = 34,
+  .max_vs_entries = 640,
+  .max_gs_entries = 256,
+   }
+};
+
 const struct brw_device_info *
 brw_get_device_info(int devid, int revision)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.h 
b/src/mesa/drivers/dri/i965/brw_device_info.h
index 65c024c..7b7a1fc 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.h
+++ b/src/mesa/drivers/dri/i965/brw_device_info.h
@@ -35,6 +35,7 @@ struct brw_device_info
bool is_baytrail;
bool is_haswell;
bool is_cherryview;
+   bool is_broxton;
 
bool has_hiz_and_separate_stencil;
bool must_use_separate_stencil;
-- 
2.4.3

[Mesa-dev] [PATCH 2/4] i965/bxt: Don't allow 16B pitch for blits

2015-06-17 Thread Ben Widawsky

NOTE: I can no longer find where this workaround is documented. In my notes it
is required for BXT A*, and B*. I'm happy to drop the patch, but I figured I'd
put it here for completeness.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_blit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
b/src/mesa/drivers/dri/i965/intel_blit.c
index d3ab769..bd1a03a 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -380,6 +380,9 @@ intelEmitCopyBlit(struct brw_context *brw,
dst_pitch % 4 != 0 || dst_offset % cpp != 0)
   return false;
 
+   if (brw->is_broxton && (src_pitch % 16 != 0 || dst_pitch % 16 != 0))
+  return false;
+
/* For big formats (such as floating point), do the copy using 16 or 32bpp
 * and multiply the coordinates.
 */
-- 
2.4.3
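
A standalone sketch of the resulting pitch rules, for anyone who wants to poke
at them outside the driver. blit_pitches_ok() and is_multiple_of() are
hypothetical names, and the 4-byte check on src_pitch is an assumption based on
the dst_pitch check visible in the diff context.

```c
#include <assert.h>
#include <stdbool.h>

/* True when value is an exact multiple of n. */
static bool
is_multiple_of(int value, int n)
{
   assert(n > 0);
   return value % n == 0;
}

/* Hypothetical helper: returns false when the blitter path should be
 * rejected because of the pitches alone. */
static bool
blit_pitches_ok(bool is_broxton, int src_pitch, int dst_pitch)
{
   /* Assumed baseline rule: both pitches are 4-byte aligned. */
   if (!is_multiple_of(src_pitch, 4) || !is_multiple_of(dst_pitch, 4))
      return false;

   /* Extra restriction added by this patch for Broxton. */
   if (is_broxton &&
       (!is_multiple_of(src_pitch, 16) || !is_multiple_of(dst_pitch, 16)))
      return false;

   return true;
}
```

A pitch of 24 passes on other hardware but is rejected on Broxton, so the
caller would fall back to another copy path there.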

Re: [Mesa-dev] [PATCH] i965: Add missing braces around if-statement.

2015-06-18 Thread Ben Widawsky


On Thu, Jun 18, 2015 at 04:19:36PM -0700, Matt Turner wrote:
> Fixes a performance problem caused by commit b639ed2f.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90895

Ken spotted this in review.
/me hides

Reviewed-by: Ben Widawsky 

> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index c0c8dfa..49f2e3e 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -339,12 +339,13 @@ is_color_fast_clear_compatible(struct brw_context *brw,
> mesa_format format,
> const union gl_color_union *color)
>  {
> -   if (_mesa_is_format_integer_color(format))
> +   if (_mesa_is_format_integer_color(format)) {
>if (brw->gen >= 8) {
>   perf_debug("Integer fast clear not enabled for (%s)",
>  _mesa_get_format_name(format));
>}
>return false;
> +   }
>  
> for (int i = 0; i < 4; i++) {
>if (color->f[i] != 0.0 && color->f[i] != 1.0 &&
> -- 
> 2.3.6
> 

[Mesa-dev] [PATCH] mesa_tags: dri/common no longer exists

2015-06-18 Thread Ben Widawsky

---
 scripts/tags/mesa_tags.sh | 4 
 1 file changed, 4 deletions(-)

diff --git a/scripts/tags/mesa_tags.sh b/scripts/tags/mesa_tags.sh
index 4404b92..c8e2098 100755
--- a/scripts/tags/mesa_tags.sh
+++ b/scripts/tags/mesa_tags.sh
@@ -3,13 +3,9 @@
 rm cscope.*
 rm tags
 git ls-files src/mesa/drivers/dri/i965 >| cscope.files
-git ls-files src/mesa/drivers/dri/common >> cscope.files
 git ls-files src/mesa/main >> cscope.files
 git ls-files include/GL >> cscope.files
 git ls-files src/util >> cscope.files
 ctags -L cscope.files --langmap=C++:.C.h.c.cpp.hpp --languages=C++ 
--c++-kinds=+p --fields=+iaS --extra=+q
 cscope -bkqu
 #rm cscope.files
-
-
-
-- 
2.4.4

[Mesa-dev] [PATCH] i965/gen8: Use HALIGN_16 for single sample mcs buffers

2015-06-18 Thread Ben Widawsky

The original code meant to do this, but was only checking num_samples == 1 to
figure out if a surface was fast clear capable. However, we can allocate single
sample miptrees with num_samples == 0 (when it's an internally created buffer).

This fixes a bunch of the piglit tests on gen8. Other gens should have been
fine.

Here is the order of events that allowed this to slip through:
t0: I wrote halign patches and tested them. These alignment assertions are for
   gen8 fast clear surfaces, basically.
t1: I pushed bogus perf patch which made fast clears never happen
t2: Reworked halign patches based on Chad's feedback and introduced the bug this
   patch fixes.
t2.5: I tested reworked patches, but assertion wasn't hit because of t1.
t3: Matt fixed issue in t1 which made fast clears happen here:
commit 22af95af8316f2888a3935cdf774ff0997b3dd42
Author: Matt Turner 
Date:   Thu Jun 18 16:14:50 2015 -0700

i965: Add missing braces around if-statement.

This logic should match that of the v1 of my halign patch series.

Cc: Kenneth Graunke 
Cc: Matt Turner 
Reported-by: Kenneth Graunke 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 80c52f2..6aa969a 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -501,7 +501,7 @@ intel_miptree_create_layout(struct brw_context *brw,
 *  6   |  ? |?
 */
if (intel_miptree_is_fast_clear_capable(brw, mt)) {
-  if (brw->gen >= 9 || (brw->gen == 8 && num_samples == 1))
+  if (brw->gen >= 9 || (brw->gen == 8 && num_samples <= 1))
  layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
} else if (brw->gen >= 9 && num_samples > 1) {
   layout_flags |= MIPTREE_LAYOUT_FORCE_HALIGN16;
-- 
2.4.4
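
The fixed condition, pulled out into a standalone sketch. The function name is
hypothetical, and it assumes the caller has already established that the
surface is fast clear capable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: should a fast-clear-capable surface force HALIGN_16?
 * num_samples of 0 and 1 both describe a single-sampled surface. */
static bool
fast_clear_wants_halign16(int gen, unsigned num_samples)
{
   assert(gen >= 4); /* i965 covers Gen4 and later */

   if (gen >= 9)
      return true;

   return gen == 8 && num_samples <= 1;
}
```

The old "== 1" comparison returned false for the num_samples == 0 case on gen8,
which is the case this patch fixes.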

Re: [Mesa-dev] [PATCH] mesa_tags: dri/common no longer exists

2015-06-18 Thread Ben Widawsky

Sorry, ignore this. I had script fail. Real patch coming up.

On Thu, Jun 18, 2015 at 06:44:35PM -0700, Ben Widawsky wrote:
> ---
>  scripts/tags/mesa_tags.sh | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/scripts/tags/mesa_tags.sh b/scripts/tags/mesa_tags.sh
> index 4404b92..c8e2098 100755
> --- a/scripts/tags/mesa_tags.sh
> +++ b/scripts/tags/mesa_tags.sh
> @@ -3,13 +3,9 @@
>  rm cscope.*
>  rm tags
>  git ls-files src/mesa/drivers/dri/i965 >| cscope.files
> -git ls-files src/mesa/drivers/dri/common >> cscope.files
>  git ls-files src/mesa/main >> cscope.files
>  git ls-files include/GL >> cscope.files
>  git ls-files src/util >> cscope.files
>  ctags -L cscope.files --langmap=C++:.C.h.c.cpp.hpp --languages=C++ 
> --c++-kinds=+p --fields=+iaS --extra=+q
>  cscope -bkqu
>  #rm cscope.files
> -
> -
> -
> -- 
> 2.4.4
> 

[Mesa-dev] [PATCH 1/2] [RFC] i965/vec4: Reward spills in if/else/endif blocks

2015-06-19 Thread Ben Widawsky

If we have a register that needs spilling in an if/else block, there is a chance
that we may not need to spill if we do[n't] take the branch.

The downside of this patch is the case where the register being spilled ends up
in both if/else blocks. For that case, preferring this path will increase code
size with no possible performance benefit.

Same patch for FS coming up.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index b9db908..b345f27 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -309,10 +309,12 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, 
bool *no_spill)
 
   switch (inst->opcode) {
 
+  case BRW_OPCODE_ENDIF:
   case BRW_OPCODE_DO:
 loop_scale *= 10;
 break;
 
+  case BRW_OPCODE_IF:
   case BRW_OPCODE_WHILE:
 loop_scale /= 10;
 break;
-- 
2.4.4
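
To show what the scale factor does over a stream of instructions, here is a
minimal standalone sketch. The enum and spill_scales() are hypothetical
stand-ins rather than the allocator's real types, and the sketch records the
scale in effect after each opcode is processed, which is a simplification of
how the real loop accumulates costs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the opcodes named in the patch. */
enum op { OP_OTHER, OP_DO, OP_WHILE, OP_IF, OP_ENDIF };

/* Walk the instruction stream and record, per instruction, the scale that
 * applies once that instruction's opcode has been handled. */
static void
spill_scales(const enum op *ops, size_t n, float *scale_out)
{
   float loop_scale = 1.0f;

   assert(ops != NULL && scale_out != NULL);

   for (size_t i = 0; i < n; i++) {
      switch (ops[i]) {
      case OP_ENDIF:
      case OP_DO:
         loop_scale *= 10;
         break;
      case OP_IF:
      case OP_WHILE:
         loop_scale /= 10;
         break;
      default:
         break;
      }
      scale_out[i] = loop_scale;
   }
}
```

For the sequence OTHER, IF, OTHER, ENDIF, DO, OTHER, WHILE the scale goes
1, 0.1, 0.1, 1, 10, 10, 1 (up to float rounding): cheaper inside the if block,
more expensive inside the loop.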

[Mesa-dev] [PATCH 2/2] [RFC] i965/fs: Reward spills in if/else/endif blocks

2015-06-19 Thread Ben Widawsky

Just like the previous patch but for the FS.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index cd78816..d53449a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -805,10 +805,12 @@ fs_visitor::choose_spill_reg(struct ra_graph *g)
 
   switch (inst->opcode) {
 
+  case BRW_OPCODE_ENDIF:
   case BRW_OPCODE_DO:
 loop_scale *= 10;
 break;
 
+  case BRW_OPCODE_IF:
   case BRW_OPCODE_WHILE:
 loop_scale /= 10;
 break;
-- 
2.4.4

Re: [Mesa-dev] [PATCH 1/2] [RFC] i965/vec4: Reward spills in if/else/endif blocks

2015-06-19 Thread Ben Widawsky

I'm not seeing where it does anything other than what I say. Beforehand the cost
is increased (*=) from DO->WHILE. Now it should be decreased (/= 10) from
IF->ENDIF. The factor of 10 probably needs to be modified since I suspect

Can you help me see what I'm not seeing?

On Fri, Jun 19, 2015 at 06:53:28PM -0700, Connor Abbott wrote:
> I don't think this is doing what you think it's doing. This code is
> for calculating the *cost* of spills, so a higher cost means a lower
> priority for choosing the register. We increase the cost for things
> inside loops because we don't want to spill inside loops, and by doing
> the same thing for if's you're actually discouraging spills inside an
> if block.
> 
> On Fri, Jun 19, 2015 at 5:21 PM, Ben Widawsky
>  wrote:
> > If we have a register that needs spilling in an if/else block, there is a 
> > chance
> > that we may not need to spill if we do[n't] take the branch.
> >
> > The downside of this patch is the case where the register being spilled 
> > ends up
> > in both if/else blocks. For that case, preferring this path will increase 
> > code
> > size with no possible performance benefit.
> >
> > Same patch for FS coming up.
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > index b9db908..b345f27 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > @@ -309,10 +309,12 @@ vec4_visitor::evaluate_spill_costs(float 
> > *spill_costs, bool *no_spill)
> >
> >switch (inst->opcode) {
> >
> > +  case BRW_OPCODE_ENDIF:
> >case BRW_OPCODE_DO:
> >  loop_scale *= 10;
> >  break;
> >
> > +  case BRW_OPCODE_IF:
> >case BRW_OPCODE_WHILE:
> >  loop_scale /= 10;
> >  break;
> > --
> > 2.4.4
> >

Re: [Mesa-dev] [PATCH 1/2] [RFC] i965/vec4: Reward spills in if/else/endif blocks

2015-06-19 Thread Ben Widawsky

On Fri, Jun 19, 2015 at 08:04:51PM -0700, Matt Turner wrote:
> On Fri, Jun 19, 2015 at 6:53 PM, Connor Abbott  wrote:
> > I don't think this is doing what you think it's doing. This code is
> > for calculating the *cost* of spills, so a higher cost means a lower
> > priority for choosing the register. We increase the cost for things
> > inside loops because we don't want to spill inside loops, and by doing
> > the same thing for if's you're actually discouraging spills inside an
> > if block.
> 
> Top quoting is bad, m'kay.
> 
> But, I think it is doing what he thinks since he increases costs for
> ENDIF and decreases costs for IF. That is, it's backwards from
> DO/WHILE.
> 
> Why this is a good thing to do... I don't know. I'd expect some data
> along with this patch in order to evaluate it properly.

Well, I think the theory was described in the patch, so I'm not sure if you're
disagreeing with the theory, or you missed the theory (you spill less of the
time because you don't always take both branches).

As for data... I made the patch RFC for a reason :-). I noticed a lot of the
previous spilling related patches used shader-db as a measure, however, I don't
think that's a good measure for spills in many cases (do/while is exactly such
an example). As I mentioned in the commit as well, there are certainly cases
where I could see shader size increasing, but not actual execution time. So if
there are real benchmarks I can run, which spill, I am happy to do that - but I
don't see any value in me spending time doing anything else. I see shader-db as
a good thing to run to make sure it doesn't blow up every test, and that's about
all. I'm content to leave this as an RFC indefinitely. I'm under the impression
that optimizing the spill cases isn't super critical anyway.

Re: [Mesa-dev] [PATCH 2/5] i965/gen9: Plugin the code for selecting YF/YS tiling on skl+

2015-06-22 Thread Ben Widawsky

On Wed, Jun 10, 2015 at 03:30:47PM -0700, Anuj Phogat wrote:
> Buffers with Yf/Ys tiling end up using meta upload / download
> paths or the blitter for cases where they used tiled_memcpy paths
> in case of Y tiling. This has exposed some bugs in meta path. To
> avoid any piglit regressions on SKL this patch keeps the Yf/Ys
> tiling disabled at the moment.
> 
> V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben)
> Few cosmetic changes.
> V4: Get rid of brw_miptree_choose_tr_mode().
> Take care of all tile resource modes {Yf, Ys, none} for all
> generations at one place.
> 
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 97 
> --
>  1 file changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index b9ac4cf..c0ef5cc 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -807,27 +807,88 @@ brw_miptree_layout(struct brw_context *brw,
> enum intel_miptree_tiling_mode requested,
> struct intel_mipmap_tree *mt)
>  {
> -   mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE;
> +   const unsigned bpp = mt->cpp * 8;
> +   const bool is_tr_mode_yf_ys_allowed =
> +  brw->gen >= 9 &&
> +  !for_bo &&
> +  !mt->compressed &&
> +  /* Enable YF/YS tiling only for color surfaces because depth and
> +   * stencil surfaces are not supported in blitter using fast copy
> +   * blit and meta PBO upload, download paths. No other paths
> +   * currently support Yf/Ys tiled surfaces.
> +   * FIXME:  Remove this restriction once we have a tiled_memcpy()
> +   * path to do depth/stencil data upload/download to Yf/Ys tiled
> +   * surfaces.
> +   */

I think it's more readable to move this comment above the variable declaration.
Up to you though. Also I think "FINISHME" is the more appropriate classification
for this type of thing.

> +  _mesa_is_format_color_format(mt->format) &&
> +  (requested == INTEL_MIPTREE_TILING_Y ||
> +   requested == INTEL_MIPTREE_TILING_ANY) &&

This is where my tiling flags would have helped a bit since you should be able
to do flags & Y_TILED :P
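
Roughly what I mean, as a sketch only. Every name below is hypothetical and
made up for this illustration; none of it is the existing
intel_miptree_tiling_mode enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags describing which tilings a caller will accept. */
enum tiling_flags {
   TILING_FLAG_NONE = 1u << 0,
   TILING_FLAG_X    = 1u << 1,
   TILING_FLAG_Y    = 1u << 2,
   TILING_FLAG_ANY  = TILING_FLAG_NONE | TILING_FLAG_X | TILING_FLAG_Y,
};

/* True when Y tiling is among the accepted tilings. */
static bool
allows_y_tiling(unsigned flags)
{
   /* Reject bits outside the known set. */
   assert((flags & ~(unsigned)TILING_FLAG_ANY) == 0);
   return (flags & TILING_FLAG_Y) != 0;
}
```

With flags like these, the "Y or ANY" pair of comparisons collapses into a
single mask test.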

> +  (bpp && is_power_of_two(bpp)) &&
> +  /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling
> +   * disabled at the moment.
> +   */
> +  false;

Also, "FINISHME"

>  
> -   intel_miptree_set_alignment(brw, mt);
> -   intel_miptree_set_total_width_height(brw, mt);
> +   /* Lower index (Yf) is the higher priority mode */
> +   const uint32_t tr_mode[3] = {INTEL_MIPTREE_TRMODE_YF,
> +INTEL_MIPTREE_TRMODE_YS,
> +INTEL_MIPTREE_TRMODE_NONE};
> +   int i = is_tr_mode_yf_ys_allowed ? 0 : ARRAY_SIZE(tr_mode) - 1;
>  
> -   if (!mt->total_width || !mt->total_height) {
> -  intel_miptree_release(&mt);
> -  return;
> -   }
> +   while (i < ARRAY_SIZE(tr_mode)) {
> +  if (brw->gen < 9)
> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
> +  else
> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_YF ||
> +tr_mode[i] == INTEL_MIPTREE_TRMODE_YS ||
> +tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
>  
> -   /* On Gen9+ the alignment values are expressed in multiples of the block
> -* size
> -*/
> -   if (brw->gen >= 9) {
> -  unsigned int i, j;
> -  _mesa_get_format_block_size(mt->format, &i, &j);
> -  mt->align_w /= i;
> -  mt->align_h /= j;
> -   }
> +  mt->tr_mode = tr_mode[i];
> +  intel_miptree_set_alignment(brw, mt);
> +  intel_miptree_set_total_width_height(brw, mt);
>  
> -   if (!for_bo)
> -  mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);
> +  if (!mt->total_width || !mt->total_height) {
> + intel_miptree_release(&mt);
> + return;
> +  }
> +
> +  /* On Gen9+ the alignment values are expressed in multiples of the
> +   * block size.
> +   */
> +  if (brw->gen >= 9) {
> + unsigned int i, j;
> + _mesa_get_format_block_size(mt->format, &i, &j);
> + mt->align_w /= i;
> + mt->align_h /= j;
> +  }

Can we just combine this alignment calculation into
intel_miptree_set_alignment()?

> +
> +  if (!for_bo)
> + mt->tiling = brw_miptree_

Re: [Mesa-dev] [PATCH 3/5] i965: Make a helper function intel_miptree_release_levels()

2015-06-22 Thread Ben Widawsky

I am shocked this is the only place we do this...

On Wed, Jun 10, 2015 at 03:30:48PM -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index c0ef5cc..c185e41 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -801,6 +801,17 @@ intel_miptree_set_alignment(struct brw_context *brw,
> }
>  }
>  
> +static void
> +intel_miptree_release_levels(struct intel_mipmap_tree *mt)
> +{
> +   unsigned int level = 0;
> +
> +   for (level = mt->first_level; level <= mt->last_level; level++) {
> +  free(mt->level[level].slice);
> +  mt->level[level].slice = NULL;
> +   }
> +}
> +
>  void
>  brw_miptree_layout(struct brw_context *brw,
> bool for_bo,
> @@ -866,7 +877,6 @@ brw_miptree_layout(struct brw_context *brw,
>   mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);
>  
>if (is_tr_mode_yf_ys_allowed) {
> - unsigned int level = 0;
>   assert(brw->gen >= 9);
>  
>   if (mt->tiling == I915_TILING_Y ||
> @@ -883,10 +893,7 @@ brw_miptree_layout(struct brw_context *brw,
>   /* Failed to use selected tr_mode. Free up the memory allocated
>* for miptree levels in intel_miptree_total_width_height().
>*/
> - for (level = mt->first_level; level <= mt->last_level; level++) {
> -free(mt->level[level].slice);
> -mt->level[level].slice = NULL;
> - }
> + intel_miptree_release_levels(mt);
>}
>i++;
> }
> -- 
> 1.9.3
> 

Re: [Mesa-dev] [PATCH 4/5] i965: Make a helper function intel_miptree_can_use_tr_mode()

2015-06-22 Thread Ben Widawsky

1-4 (with/without changes) are:
Reviewed-by: Ben Widawsky 

On Wed, Jun 10, 2015 at 03:30:49PM -0700, Anuj Phogat wrote:
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 30 
> +++---
>  1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index c185e41..39c6a39 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -812,6 +812,23 @@ intel_miptree_release_levels(struct intel_mipmap_tree 
> *mt)
> }
>  }
>  
> +static bool
> +intel_miptree_can_use_tr_mode(const struct intel_mipmap_tree *mt)
> +{
> +   if (mt->tiling == I915_TILING_Y ||
> +   mt->tiling == (I915_TILING_Y | I915_TILING_X) ||
> +   mt->tr_mode == INTEL_MIPTREE_TRMODE_NONE) {
> +  /* FIXME: Don't allow YS tiling at the moment. Using 64KB tiling
> +   * for small textures might result in to memory wastage. Revisit
> +   * this condition when we have more information about the specific
> +   * cases where using YS over YF will be useful.
> +   */
> +  if (mt->tr_mode != INTEL_MIPTREE_TRMODE_YS)
> + return true;
> +   }
> +   return false;
> +}
> +
>  void
>  brw_miptree_layout(struct brw_context *brw,
> bool for_bo,
> @@ -879,17 +896,8 @@ brw_miptree_layout(struct brw_context *brw,
>if (is_tr_mode_yf_ys_allowed) {
>   assert(brw->gen >= 9);
>  
> - if (mt->tiling == I915_TILING_Y ||
> - mt->tiling == (I915_TILING_Y | I915_TILING_X) ||
> - mt->tr_mode == INTEL_MIPTREE_TRMODE_NONE) {
> -/* FIXME: Don't allow YS tiling at the moment. Using 64KB tiling
> - * for small textures might result in to memory wastage. Revisit
> - * this condition when we have more information about the 
> specific
> - * cases where using YS over YF will be useful.
> - */
> -if (mt->tr_mode != INTEL_MIPTREE_TRMODE_YS)
> -   return;
> - }
> + if (intel_miptree_can_use_tr_mode(mt))
> +return;
>   /* Failed to use selected tr_mode. Free up the memory allocated
>* for miptree levels in intel_miptree_total_width_height().
>*/
> -- 
> 1.9.3
> 

Re: [Mesa-dev] [PATCH 5/5] i965/gen9: Allocate YF/YS tiled buffer objects

2015-06-22 Thread Ben Widawsky

On Wed, Jun 10, 2015 at 03:30:50PM -0700, Anuj Phogat wrote:
> In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm
> using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers
> libdrm need not know about the tiling format because these buffers
> don't have hardware support to be tiled or detiled through a fenced
> region. libdrm still need to know buffer alignment value for its use
> in kernel when resolving the relocation.
> 
> Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
> satisfy both the above conditions.
> 
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 86 
> +--
>  1 file changed, 80 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 615cbfb..d4d9e76 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -522,6 +522,65 @@ intel_lower_compressed_format(struct brw_context *brw, 
> mesa_format format)
> }
>  }
>  
> +/* This function computes Yf/Ys tiled bo size and alignment. */

It also computes pitch for the yf/ys case

> +static uint64_t
> +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment)
> +{
> +   const uint32_t bpp = mt->cpp * 8;
> +   const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1;
> +   uint32_t tile_width, tile_height;
> +   const uint64_t min_size = 512 * 1024;
> +   const uint64_t max_size = 64 * 1024 * 1024;

Where do min/max come from? Add a comment?

> +   uint64_t i, stride, size, aligned_y;
> +
> +   assert(mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE);
> +
> +   switch (bpp) {
> +   case 8:
> +  tile_height = 64;
> +  break;
> +   case 16:
> +   case 32:
> +  tile_height = 32;
> +  break;
> +   case 64:
> +   case 128:
> +  tile_height = 16;
> +  break;
> +   default:
> +  tile_height = 0;

make this unreachable()
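
Roughly what I have in mind, as a standalone sketch. UNREACHABLE here is a local stand-in I made up so the snippet compiles on its own; in the driver you would use the existing unreachable() macro instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for Mesa's unreachable(); an assumption for this sketch. */
#define UNREACHABLE(msg) do { assert(!msg); abort(); } while (0)

/* Yf tile height in rows for a given bits-per-pixel, per the quoted switch. */
static uint32_t
yf_tile_height(uint32_t bpp)
{
   switch (bpp) {
   case 8:
      return 64;
   case 16:
   case 32:
      return 32;
   case 64:
   case 128:
      return 16;
   default:
      /* An unsupported bpp is a driver bug, not a runtime condition. */
      UNREACHABLE("invalid bits per pixel");
   }
}
```

That drops the printf() and the bogus tile_height = 0, which would otherwise flow into the ALIGN() calls below.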

> +  printf("Invalid bits per pixel in %s: bpp = %d\n",
> + __FUNCTION__, bpp);
> +   }

I think ideally you should roll this logic into intel_miptree_get_tile_masks().

> +
> +   if (mt->tr_mode == INTEL_MIPTREE_TRMODE_YS)
> +  tile_height *= 4;
> +
> +   aligned_y = ALIGN(mt->total_height, tile_height);
> +
> +   stride = mt->total_width * mt->cpp;
> +   tile_width = tile_height * mt->cpp * aspect_ratio;
> +   stride = ALIGN(stride, tile_width);
> +   size = stride * aligned_y;
> +
> +   if (mt->tr_mode == INTEL_MIPTREE_TRMODE_YF) {
> +  *alignment = 4096;
> +  size = ALIGN(size, 4096);
> +   } else {
> +  *alignment = 64 * 1024;
> +  size = ALIGN(size, 64 * 1024);
> +   }

Hmm. I think the above calculation for size is redundant since you already
aligned to tile_width and height, above. Right? assert((size % 64K) == 0);
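
To spell out why I think it is redundant, here is a standalone sketch that mirrors the quoted arithmetic. The function name and ALIGN_UP are mine, and it assumes cpp is one of 1, 2, 4, 8, 16. The stride is a multiple of tile_width and the height a multiple of tile_height, so the product is a whole number of tiles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALIGN_UP(v, a) ((((v) + (a) - 1) / (a)) * (a))

/* Mirrors the quoted size computation; is_ys selects the 64KB (Ys) tile. */
static uint64_t
yf_ys_size(uint32_t total_width, uint32_t total_height, uint32_t cpp, bool is_ys)
{
   const uint32_t bpp = cpp * 8;
   const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1;
   uint32_t tile_height = bpp == 8 ? 64 : (bpp <= 32 ? 32 : 16);

   if (is_ys)
      tile_height *= 4;

   const uint32_t tile_width = tile_height * cpp * aspect_ratio; /* bytes */
   const uint64_t stride = ALIGN_UP((uint64_t)total_width * cpp, tile_width);
   const uint64_t aligned_y = ALIGN_UP((uint64_t)total_height, tile_height);
   const uint64_t size = stride * aligned_y;

   /* tile_width * tile_height is one whole tile: 4KB for Yf, 64KB for Ys. */
   assert((uint64_t)tile_width * tile_height == (is_ys ? 65536u : 4096u));
   /* So the size is already a multiple of the tile size. */
   assert(size % ((uint64_t)tile_width * tile_height) == 0);
   return size;
}
```

For example, a 100x100 surface at 4 bytes per pixel comes out at 512 * 128 = 65536 bytes for Yf without any extra ALIGN().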

> +
> +   if (size > max_size) {
> +  mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE;
> +  return 0;
> +   } else {
> +  mt->pitch = stride;
> +  for (i = min_size; i < size; i <<= 1)
> + ;
> +  return i;

I don't understand this. Why don't you just return size? It seems incredibly
wasteful both to start at 512K and to increment by powers of 2. Did I miss
something?

Also, I don't understand max_size. I must be missing something in the spec with
the min/max values, can you point me to them?
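
To put numbers on the waste, here is the same loop pulled out into a standalone helper (the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as the quoted patch: starting at min_size, keep doubling
 * until the value is >= size. */
static uint64_t
round_up_pot_from(uint64_t min_size, uint64_t size)
{
   uint64_t i;

   for (i = min_size; i < size; i <<= 1)
      ;
   return i;
}
```

With min_size = 512K, a single 4KB Yf tile is allocated as 512KB, and anything just over 1MB is allocated as 2MB.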

> +   }
> +}
>  
>  struct intel_mipmap_tree *
>  intel_miptree_create(struct brw_context *brw,
> @@ -575,12 +634,27 @@ intel_miptree_create(struct brw_context *brw,
>  
> unsigned long pitch;
> mt->etc_format = etc_format;
> -   mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
> - total_width, total_height, mt->cpp,
> - &mt->tiling, &pitch,
> - (expect_accelerated_upload ?
> -  BO_ALLOC_FOR_RENDER : 0));
> -   mt->pitch = pitch;
> +
> +   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> +  unsigned alignment = 0;
> +  unsigned long size;
> +  size = intel_get_yf_ys_bo_size(mt, &alignment);
> +
> +  /* intel_get_yf_ys_bo_size() might change the tr_mode. */
> +  if (size > 0 && mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> + mt->bo = drm_intel_bo_alloc_for_render(brw->bufmgr, "miptree",
> +size, alignment);
> +  }
> +   }
> +
> +   if

Re: [Mesa-dev] [PATCH 3/4] i965/gen9: Don't use encrypted MOCS

2015-06-22 Thread Ben Widawsky

On Thu, Jun 18, 2015 at 03:41:50PM -0700, Kenneth Graunke wrote:
> On Wednesday, June 17, 2015 03:50:13 PM Ben Widawsky wrote:
> > On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is 
> > for
> > doing encrypted reads.
> > 
> > I don't recall how I decided to do this for BXT. I don't know this patch was
> > ever needed, since it seems nothing is broken today on SKL. Furthermore, 
> > this
> > patch may no longer be needed because of the ongoing changes with MOCS 
> > setup. It
> > is what is being used/tested, so it's included in the series.
> > 
> > The chosen values are the old values left shifted. That was also an 
> > arbitrary
> > choice.
> > 
> > Cc:  Francisco Jerez 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> > b/src/mesa/drivers/dri/i965/brw_defines.h
> > index bfcc442..5358edc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -2495,8 +2495,8 @@ enum brw_wm_barycentric_interp_mode {
> >   * cache settings.  We still use only either write-back or write-through; 
> > and
> >   * rely on the documented default values.
> >   */
> > -#define SKL_MOCS_WB 9
> > -#define SKL_MOCS_WT 5
> > +#define SKL_MOCS_WB 0x12
> > +#define SKL_MOCS_WT 0xa
> 
> 
> Yeah, it looks like Kristian made these defines the indices into the
> table, but may have missed that the MOCS field puts that table index in
> [6:1] and bit 0 is something else.
> 
> So shifting left by 1 seems like a good plan.  Perhaps write it as
> 
> #define SKL_MOCS_WB (0b000101 << 1)
> #define SKL_MOCS_WT (0b001001 << 1)
> 

You meant this, right (you reversed it, I think)?
#define SKL_MOCS_WB (0b001001 << 1)
#define SKL_MOCS_WT (0b000101 << 1)
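
As a quick sanity check of the arithmetic, a standalone sketch (the macro and the _SKETCH names are made up for this mail; the indices written as decimal 9 and 5 are 0b001001 and 0b000101):

```c
#include <assert.h>
#include <stdint.h>

/* Table index goes in bits [6:1] of the MOCS field; bit 0 is left clear. */
#define MOCS_FROM_INDEX(idx) ((uint32_t)(idx) << 1)

enum {
   SKL_MOCS_WB_SKETCH = MOCS_FROM_INDEX(9), /* 0b001001 << 1 == 0x12 */
   SKL_MOCS_WT_SKETCH = MOCS_FROM_INDEX(5), /* 0b000101 << 1 == 0x0a */
};
```

Those match the 0x12 and 0xa in the patch, so only the spelling changes, not the values.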


> so the index value is written like it is in the documentation, and the
> shift 1 indicates moving it into the right place for MOCS?
> 
> Either way,
> Reviewed-by: Kenneth Graunke 
> 
> Incidentally...the WT value (index 5) appears to skip eLLC - the target
> cache is 01b = "LLC only".  That doesn't seem desirable.  We probably
> want index 6 instead (0b000110 << 1) which uses both LLC and eLLC.
> 
> That said, we shouldn't ever be using WT in the driver - we want to use
> the PTE value.  (krh even added a FINISHME comment to that effect.)
> 
> I think a proper value for that would be:
> #define SKL_MOCS_PTE (0b10 << 1)
> (Default: 0b10,
>  LeCC = 0x00 - use cacheability controls from page table / ...
>  TC = LLC/eLLC allowed)
> 
> We could either fix the _WT define or just delete it.
> 
> >  
> >  #define MEDIA_VFE_STATE 0x7000
> >  /* GEN7 DW2, GEN8+ DW3 */
> > 

I'll get on this too. Thanks.

Re: [Mesa-dev] [PATCH 4/5] i965/gen9: Add XY_FAST_COPY_BLT support to intelEmitCopyBlit()

2015-06-22 Thread Ben Widawsky

On Fri, Jun 19, 2015 at 02:41:50PM -0700, Anuj Phogat wrote:
> On Wed, Jun 10, 2015 at 3:34 PM, Anuj Phogat  wrote:
> > This patch enables using XY_FAST_COPY_BLT only for Yf/Ys tiled buffers.
> > It can be later turned on for other tiling patterns (X,Y) too.
> >
> > V3: Flush in between sequential fast copy blits.
> > Fix src/dst alignment requirements.
> > Make can_fast_copy_blit() helper.
> > Use ffs(), is_power_of_two()
> > Move overlap computation inside intel_miptree_blit().
> >
> > V4: Use _mesa_regions_overlap() function.
> > Simplify horizontal and vertical alignment computations.
> >
> > Signed-off-by: Anuj Phogat 
> > Cc: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/intel_blit.c   | 295 
> > ++-
> >  src/mesa/drivers/dri/i965/intel_blit.h   |   2 +
> >  src/mesa/drivers/dri/i965/intel_copy_image.c |   2 +
> >  src/mesa/drivers/dri/i965/intel_reg.h|  16 ++
> >  4 files changed, 268 insertions(+), 47 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
> > b/src/mesa/drivers/dri/i965/intel_blit.c
> > index 5afc771..800ed7e 100644
> > --- a/src/mesa/drivers/dri/i965/intel_blit.c
> > +++ b/src/mesa/drivers/dri/i965/intel_blit.c
> > @@ -27,6 +27,7 @@
> >
> >
> >  #include "main/mtypes.h"
> > +#include "main/blit.h"
> >  #include "main/context.h"
> >  #include "main/enums.h"
> >  #include "main/colormac.h"
> > @@ -43,6 +44,23 @@
> >
> >  #define FILE_DEBUG_FLAG DEBUG_BLIT
> >
> > +#define SET_TILING_XY_FAST_COPY_BLT(tiling, tr_mode, type)   \
> > +({   \
> > +   switch (tiling) { \
> > +   case I915_TILING_X:   \
> > +  CMD |= type ## _TILED_X;   \
> > +  break; \
> > +   case I915_TILING_Y:   \
> > +  if (tr_mode == INTEL_MIPTREE_TRMODE_YS)\
> > + CMD |= type ## _TILED_64K;  \
> > +  else   \
> > + CMD |= type ## _TILED_Y;\
> > +  break; \
> > +   default:  \
> > +  unreachable("not reached");\
> > +   } \
> > +})
> > +
> >  static void
> >  intel_miptree_set_alpha_to_one(struct brw_context *brw,
> > struct intel_mipmap_tree *mt,
> > @@ -75,6 +93,10 @@ static uint32_t
> >  br13_for_cpp(int cpp)
> >  {
> > switch (cpp) {
> > +   case 16:
> > +  return BR13_32323232;
> > +   case 8:
> > +  return BR13_16161616;
> > case 4:
> >return BR13_;
> >break;
> > @@ -89,6 +111,66 @@ br13_for_cpp(int cpp)
> > }
> >  }
> >
> > +static uint32_t
> > +get_tr_horizontal_align(uint32_t tr_mode, uint32_t cpp, bool is_src) {
> > +   /* Alignment tables for YF/YS tiled surfaces. */
> > +   const uint32_t align_2d_yf[] = {64, 64, 32, 32, 16};
> > +   const uint32_t align_2d_ys[] = {256, 256, 128, 128, 64};

If you move the alignment stuff from the other patch series to a more generic
place, you could reuse it here. Also, as you pointed out in that other patch,
ys = 4 * yf
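
Something along these lines, as a standalone sketch. The function names are mine, and lowest_set_bit() just stands in for ffs() so the snippet has no platform dependencies. Only the Yf table from the quoted patch is kept; the Ys value is derived from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Yf horizontal alignment indexed by log2(cpp), from the quoted table. */
static const uint32_t align_2d_yf[] = {64, 64, 32, 32, 16};

/* 1-based index of the lowest set bit, 0 if none; stands in for ffs(). */
static int
lowest_set_bit(uint32_t v)
{
   int n = 1;

   if (v == 0)
      return 0;
   while (!(v & 1)) {
      v >>= 1;
      n++;
   }
   return n;
}

/* Horizontal alignment for a Yf/Ys surface; Ys is 4x the Yf value. */
static uint32_t
tr_horizontal_align(bool is_ys, uint32_t cpp)
{
   assert(cpp >= 1 && cpp <= 16 && (cpp & (cpp - 1)) == 0);
   const uint32_t yf = align_2d_yf[lowest_set_bit(cpp) - 1];
   return is_ys ? yf * 4 : yf;
}
```

This reproduces every entry of the quoted align_2d_ys[] table, so the second array could go away.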

> > +   const uint32_t bpp = cpp * 8;
> > +   const uint32_t shift = is_src ? 17 : 10;
> > +   uint32_t align;
> > +   int i = 0;
> > +
> > +   if (tr_mode == INTEL_MIPTREE_TRMODE_NONE)
> > +  return 0;
> > +
> > +   /* Compute array index. */
> > +   assert (bpp >= 8 && bpp <= 128 && is_power_of_two(bpp));
> > +   i = ffs(bpp / 8) - 1;
> > +
> > +   align = tr_mode == INTEL_MIPTREE_TRMODE_YF ?
> > +   align_2d_yf[i] :
> > +   align_2d_ys[i];
> > +
> > +   assert(is_power_of_two(align));
> > +
> > +   /* XY_FAST_COPY_BLT doesn't support horizontal alignment of 16. */
> > +   if (align == 16)
> > +  align = 32;
> > +
> > +   return (ffs(align) - 6) << shift;
> > +}
> >

Re: [Mesa-dev] [PATCH v2 5/5] i965/gen9: Allocate YF/YS tiled buffer objects

2015-06-23 Thread Ben Widawsky

On Tue, Jun 23, 2015 at 01:23:05PM -0700, Anuj Phogat wrote:
> In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm
> using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers
> libdrm need not know about the tiling format because these buffers
> don't have hardware support to be tiled or detiled through a fenced
> region. libdrm still need to know buffer alignment value for its use
> in kernel when resolving the relocation.
> 
> Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
> satisfy both the above conditions.
> 
> V2: Delete min/max buffer size restrictions not valid for i965+.
> Remove redundant align to tile size statements.
> Remove some redundant code now when there are no min/max buffer size.
> 
> Signed-off-by: Anuj Phogat 
> Cc: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 
> +--
>  1 file changed, 58 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 80c52f2..5bcb094 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -558,6 +558,48 @@ intel_lower_compressed_format(struct brw_context *brw, 
> mesa_format format)
> }
>  }
>  
> +/* This function computes Yf/Ys tiled bo size, alignment and pitch. */
> +static uint64_t
> +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment,
> +uint64_t *pitch)
> +{
> +   const uint32_t bpp = mt->cpp * 8;
> +   const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1;
> +   uint32_t tile_width, tile_height;
> +   uint64_t stride, size, aligned_y;
> +
> +   assert(mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE);
> +
> +   *alignment = mt->tr_mode == INTEL_MIPTREE_TRMODE_YF ? 4096 : 64 * 1024;
> +
> +   switch (bpp) {
> +   case 8:
> +  tile_height = 64;
> +  break;
> +   case 16:
> +   case 32:
> +  tile_height = 32;
> +  break;
> +   case 64:
> +   case 128:
> +  tile_height = 16;
> +  break;
> +   default:
> +  unreachable("not reached");
> +   }
> +
> +   if (mt->tr_mode == INTEL_MIPTREE_TRMODE_YS)
> +  tile_height *= 4;
> +
> +   aligned_y = ALIGN(mt->total_height, tile_height);
> +   stride = mt->total_width * mt->cpp;
> +   tile_width = tile_height * mt->cpp * aspect_ratio;
> +   stride = ALIGN(stride, tile_width);
> +   size = stride * aligned_y;
> +
> +   *pitch = stride;
> +   return size;
> +}
>  
>  struct intel_mipmap_tree *
>  intel_miptree_create(struct brw_context *brw,
> @@ -616,11 +658,23 @@ intel_miptree_create(struct brw_context *brw,
>alloc_flags |= BO_ALLOC_FOR_RENDER;
>  
> unsigned long pitch;
> -   mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree", total_width,
> - total_height, mt->cpp, &mt->tiling,
> - &pitch, alloc_flags);
> mt->etc_format = etc_format;
> -   mt->pitch = pitch;
> +
> +   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> +  unsigned alignment = 0;
> +  unsigned long size;
> +  size = intel_get_yf_ys_bo_size(mt, &alignment, &pitch);
> +  assert(size);
> +  mt->bo = drm_intel_bo_alloc_for_render(brw->bufmgr, "miptree",
> + size, alignment);
> +  mt->pitch = pitch;
> +   } else {
> +  mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
> +total_width, total_height, mt->cpp,
> +        &mt->tiling, &pitch,
> +alloc_flags);
> +  mt->pitch = pitch;
> +   }
>  
> /* If the BO is too large to fit in the aperture, we need to use the
>  * BLT engine to support it.  Prior to Sandybridge, the BLT paths can't
> 

You could move mt->pitch = pitch outside of the if/else (or get rid of the local
variable entirely):
Reviewed-by: Ben Widawsky 

Re: [Mesa-dev] [PATCH mesa] i965/gen8+: bo in state base address must be in 32-bit address range

2015-06-23 Thread Ben Widawsky

Hi. Feel free to Cc me on patches of this nature. I am far behind on mesa-dev,
and no longer read intel-gfx. I'm probably one of the sensible people to look at
this...

On Tue, Jun 23, 2015 at 01:21:27PM +0100, Michel Thierry wrote:
> Gen8+ supports 48-bit virtual addresses, but some objects must always be
> allocated inside the 32-bit address range.
> 
> In specific, any resource used with flat/heapless (0x-0xf000)
> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State Offset
> are limited to 32-bits.

I don't think GSH or ISH are well-known terms that have ever appeared
anywhere. I'd just keep the bit after the final comma (...because ...)

> 
> Set provided bo flag when the 4GB limit is not necessary, to be able to use
> the full address space.

I'm glad you got around to this. We'd been putting it off for a long time.

> 
> Cc: mesa-dev@lists.freedesktop.org
> Signed-off-by: Michel Thierry 
> ---
>  src/mesa/drivers/dri/i965/gen8_misc_state.c   | 6 +++---
>  src/mesa/drivers/dri/i965/intel_batchbuffer.h | 7 +++
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen8_misc_state.c 
> b/src/mesa/drivers/dri/i965/gen8_misc_state.c
> index b20038e..26531d0 100644
> --- a/src/mesa/drivers/dri/i965/gen8_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_misc_state.c
> @@ -41,17 +41,17 @@ void gen8_upload_state_base_address(struct brw_context 
> *brw)
> OUT_BATCH(0);
> OUT_BATCH(mocs_wb << 16);
> /* Surface state base address: */
> -   OUT_RELOC64(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
> +   OUT_RELOC64_32BWA(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
> mocs_wb << 4 | 1);
> /* Dynamic state base address: */
> -   OUT_RELOC64(brw->batch.bo,
> +   OUT_RELOC64_32BWA(brw->batch.bo,
> I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
> mocs_wb << 4 | 1);
> /* Indirect object base address: MEDIA_OBJECT data */
> OUT_BATCH(mocs_wb << 4 | 1);
> OUT_BATCH(0);
> /* Instruction base address: shader kernels (incl. SIP) */
> -   OUT_RELOC64(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
> +   OUT_RELOC64_32BWA(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
> mocs_wb << 4 | 1);
>  
> /* General state buffer size */
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
> b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
> index 7bdd836..5aa741e 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
> @@ -177,6 +177,13 @@ intel_batchbuffer_advance(struct brw_context *brw)
>  
>  /* Handle 48-bit address relocations for Gen8+ */
>  #define OUT_RELOC64(buf, read_domains, write_domain, delta) do { \
> +   drm_intel_bo_set_supports_48baddress(buf); \
> +   intel_batchbuffer_emit_reloc64(brw, buf, read_domains, write_domain, 
> delta);  \
> +} while (0)
> +
> +/* Handle 48-bit address relocations for Gen8+, ask for 32-bit address */
> +#define OUT_RELOC64_32BWA(buf, read_domains, write_domain, delta) do { \
> +   drm_intel_bo_clear_supports_48baddress(buf); \
> intel_batchbuffer_emit_reloc64(brw, buf, read_domains, write_domain, 
> delta);  \
>  } while (0)
>  

First and least bikesheddy, you need to bump the required libdrm in the
configure.ac to support this new libdrm function (maybe you did, but I don't see
it on mesa-dev).

More bikesheddy, and forgive me here because I haven't looked at any of the
kernel interfaces or libdrm patches (you can Cc those to mesa-dev if they're
relevant fwiw).

Presumably at the end of the day it's drm_intel_bo_emit_reloc which needs to
know about these limitations. Unfortunately we don't have a flags field there.
The implementation here seems like a somewhat cumbersome workaround for that (it
looks like the context execbuf which is pretty crappy - yes, I know who the
author was). Have you already discussed adding a new emit_reloc? I suppose if
people are opposed to a new emit_reloc, the only thing I'd like to see done
differently is to have the functions which need the workaround just call
OUT_RELOC instead of OUT_RELOC64 (put a comment at the call sites), and make
OUT_RELOC call drm_intel_bo_clear_supports_48baddress() (which is obviously a
nop on pre-gen8 platforms). The OUT_RELOC64 case should be left alone - we
shouldn't need to tell libdrm that we want a 64bit relocation, and that it can
actually be 64b...
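
A rough sketch of the shape I mean, using mock types so it stands alone. Nothing here is real libdrm API: struct mock_bo and the mock_* functions are made up, and I am assuming a bo is allowed the full 48-bit range until some relocation says otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock buffer object; real code would use libdrm's drm_intel_bo. */
struct mock_bo {
   bool may_use_48b; /* true until a relocation demands a 32-bit address */
};

/* Stand-in for the proposed drm_intel_bo_clear_supports_48baddress(). */
static void
mock_bo_clear_supports_48b(struct mock_bo *bo)
{
   bo->may_use_48b = false;
}

/* OUT_RELOC equivalent: the 32-bit path pins the bo below 4GB. */
static void
mock_out_reloc(struct mock_bo *bo)
{
   mock_bo_clear_supports_48b(bo);
   /* ... emit the relocation here ... */
}

/* OUT_RELOC64 equivalent: left alone, nothing extra is told to libdrm. */
static void
mock_out_reloc64(struct mock_bo *bo)
{
   (void)bo;
   /* ... emit the 64-bit relocation here ... */
}
```

The restriction then lives in exactly one macro, and the state base address call sites only need a comment explaining why they use the 32-bit variant.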

I suspect not many other mesa devs will have an opinion here, but I'm flexible
if they disagree.

Re: [Mesa-dev] [PATCH] i965/skl: Fix aligning mt->total_width to the block size

2015-06-24 Thread Ben Widawsky

On Wed, Jun 24, 2015 at 02:29:09PM +0100, Neil Roberts wrote:
> Ben Widawsky  writes:
> 
> > I think this is beginning to infringe upon the definition of align_w.
> > The total width is a function of it's miptree properties and not the
> > compressed block properties, right?
> >
> > In other words, if there is a case where align_w != bw, I think
> > total_width should be aligned to align_w, NOT bw.
> 
> I don't think it's so clear cut. In practice the mt->total_width doesn't
> really need to be aligned to anything because as far as I can tell it is
> only used to calculate the row stride. The row stride is separately
> aligned to whatever constraints necessary by libdrm so it doesn't really
> matter what we pick here.
> 
> The reason I think that the intention was to align it to the block width
> rather than the horizontal alignment is that in the non-compressed case
> the total width isn't aligned to anything at all.
> 
> It's probably not worth making too much of a fuss over this patch seeing
> as it doesn't make any practical difference. I'm happy to forget about
> it and pretend I never noticed the inconsistency.
> 
> Regards,
> - Neil
> 

I'm not opposed, I guess I just wanted a clear understanding of how it should
work. If you think this is the right thing to do, I trust you.

Reviewed-by: Ben Widawsky 

> >
> > (I'm not opposed to the patch, just making sure I understand.)
> >
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 5 ++---
> >>  1 file changed, 2 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> >> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> >> index 1e7d8a1..dbb6cef 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> >> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> >> @@ -366,9 +366,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
> >>  
> >> mt->total_width = mt->physical_width0;
> >>  
> >> -   if (mt->compressed) {
> >> -   mt->total_width = ALIGN(mt->physical_width0, mt->align_w);
> >> -   }
> >> +   if (mt->compressed)
> >> +   mt->total_width = ALIGN(mt->total_width, bw);
> >>  
> >> /* May need to adjust width to accommodate the placement of
> >>  * the 2nd mipmap.  This occurs when the alignment
> >> -- 
> >> 1.9.3
> >> 
> >> ___
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

-- 
Ben Widawsky, Intel Open Source Technology Center

Re: [Mesa-dev] [PATCH 1/4] i965/bxt: Add basic Broxton infrastructure

2015-06-24 Thread Ben Widawsky

On Wed, Jun 24, 2015 at 08:12:36PM +, Lecluse, Philippe wrote:
> I Have successfully tested and validate patch 1,3,4 on BXT
> Regards,
> Philippe
> 

Thanks. Patch 3 was already pushed. I've squashed 1 and 4, and pushed that with
Mark's review.


[Mesa-dev] [PATCH] i965/skl: Use more compact hiz dimensions

2015-06-24 Thread Ben Widawsky

gen8 had some special restrictions which don't seem to carry over to gen9.
Quoting the spec for SKL:
"The Z_Height and Z_Width values must equal those present in
3DSTATE_DEPTH_BUFFER incremented by one."

This fixes nothing in piglit (and regresses nothing).

Cc: Jordan Justen 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 32 ++-
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 6aa969a..432a47c 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1550,21 +1550,23 @@ intel_gen8_hiz_buf_create(struct brw_context *brw,
/* Gen7 PRM Volume 2, Part 1, 11.5.3 "Hierarchical Depth Buffer" documents
 * adjustments required for Z_Height and Z_Width based on multisampling.
 */
-   switch (mt->num_samples) {
-   case 0:
-   case 1:
-  break;
-   case 2:
-   case 4:
-  z_width *= 2;
-  z_height *= 2;
-  break;
-   case 8:
-  z_width *= 4;
-  z_height *= 2;
-  break;
-   default:
-  unreachable("unsupported sample count");
+   if (brw->gen < 9) {
+  switch (mt->num_samples) {
+  case 0:
+  case 1:
+ break;
+  case 2:
+  case 4:
+ z_width *= 2;
+ z_height *= 2;
+ break;
+  case 8:
+ z_width *= 4;
+ z_height *= 2;
+ break;
+  default:
+ unreachable("unsupported sample count");
+  }
}
 
const unsigned vertical_align = 8; /* 'j' in the docs */
-- 
2.4.4


Re: [Mesa-dev] [PATCH] i965/cs: Initialize GPGPU Thread Count

2015-06-25 Thread Ben Widawsky

On Thu, Jun 11, 2015 at 09:04:45PM -0700, Jordan Justen wrote:
> This field should always be set for gen8. In the bdw PRM, Volume 2d:
> Command Reference: Structures under INTERFACE_DESCRIPTOR_DATA, DWORD
> 6, Bits 9:0, Number of Threads in GPGPU Thread Group:
> 
> "This field should not be set to 0 even if the barrier is disabled,
> since an accurate value is needed for proper pre-emption."

I am pretty skeptical that we actually need this. It's pretty clear this is a
requirement for preemption, and do we ever plan to support gpgpu preemption with
compute shaders?

Since I did some research, here's why I think it's a requirement for
preemption. BDW added support for mid-thread-group preemption. Doing this
requires that when the workload is restored, it has a concept of how many more
threads need to complete.

So I don't see this patch as being a requirement, but it shouldn't hurt and
probably makes debugging slightly easier. Also, if we do ever support
preemption, it should work. One comment inline, and then it's
Reviewed-by: Ben Widawsky 

> 
> In the HSW PRM, it doesn't mention that it must always be set, but
> it should not hurt.
> 
> Reported-by: Kristian Høgsberg 
> Signed-off-by: Jordan Justen 
> Cc: Kristian Høgsberg 
> ---
>  src/mesa/drivers/dri/i965/brw_cs.cpp| 19 +++
>  src/mesa/drivers/dri/i965/brw_defines.h |  5 +
>  2 files changed, 24 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
> b/src/mesa/drivers/dri/i965/brw_cs.cpp
> index 1f2a9d2..44c76ba 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
> @@ -284,6 +284,17 @@ brw_cs_precompile(struct gl_context *ctx,
>  }
>  
>  
> +static unsigned
> +get_cs_thread_count(const struct brw_cs_prog_data *cs_prog_data)
> +{
> +   const unsigned simd_size = cs_prog_data->simd_size;
> +   unsigned group_size = cs_prog_data->local_size[0] *
> +  cs_prog_data->local_size[1] * cs_prog_data->local_size[2];
> +
> +   return (group_size + simd_size - 1) / simd_size;
> +}
> +
> +
>  static void
>  brw_upload_cs_state(struct brw_context *brw)
>  {
> @@ -309,6 +320,8 @@ brw_upload_cs_state(struct brw_context *brw)
>  
> prog_data->binding_table.size_bytes,
>  32, 
> &stage_state->bind_bo_offset);
>  
> +   unsigned threads = get_cs_thread_count(cs_prog_data);
> +
> uint32_t dwords = brw->gen < 8 ? 8 : 9;
> BEGIN_BATCH(dwords);
> OUT_BATCH(MEDIA_VFE_STATE << 16 | (dwords - 2));
> @@ -358,6 +371,12 @@ brw_upload_cs_state(struct brw_context *brw)
> desc[dw++] = 0;
> desc[dw++] = 0;
> desc[dw++] = stage_state->bind_bo_offset;
> +   desc[dw++] = 0;
> +   const uint32_t media_threads =
> +  brw->gen >= 8 ?
> +  SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
> +  SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT);
> +   desc[dw++] = media_threads;

What's the deal with "The maximum value for global barriers is limited by the
number of threads in the system, or by 511"? Can we add an assert?
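
A minimal sketch of what such a check could look like, kept outside the driver
so it builds on its own. The round-up division mirrors get_cs_thread_count()
from the quoted patch, and the two limits are simply the widths of the masks it
adds (INTEL_MASK(7, 0) and INTEL_MASK(9, 0)). The sketch_* names are
hypothetical, and whether the 511 figure for global barriers should be the
bound instead is exactly the open question above, so it is not encoded here.

```c
#include <assert.h>
#include <stdint.h>

/* Largest values the thread count fields can hold, derived from the mask
 * widths in the quoted patch (assumption: the field width is the only limit).
 */
#define SKETCH_GEN7_THREAD_COUNT_MAX 0xffu   /* bits 7:0 */
#define SKETCH_GEN8_THREAD_COUNT_MAX 0x3ffu  /* bits 9:0 */

/* Number of SIMD threads needed to cover one thread group, rounded up. */
static unsigned
sketch_cs_thread_count(unsigned group_size, unsigned simd_size)
{
   assert(simd_size != 0);
   return (group_size + simd_size - 1) / simd_size;
}

/* Returns non-zero if the thread count fits in the descriptor field. */
static int
sketch_thread_count_fits(unsigned threads, int gen)
{
   const unsigned max = gen >= 8 ? SKETCH_GEN8_THREAD_COUNT_MAX
                                 : SKETCH_GEN7_THREAD_COUNT_MAX;
   return threads <= max;
}
```

In the driver this would presumably become a single assert next to the
SET_FIELD() calls rather than a separate helper.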

>  
> BEGIN_BATCH(4);
> OUT_BATCH(MEDIA_INTERFACE_DESCRIPTOR_LOAD << 16 | (4 - 2));
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index f6da305..2a8f500 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2495,6 +2495,11 @@ enum brw_wm_barycentric_interp_mode {
>  # define MEDIA_VFE_STATE_CURBE_ALLOC_MASK   INTEL_MASK(15, 0)
>  
>  #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
> +/* GEN7 DW5, GEN8+ DW6 */
> +# define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0
> +# define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
> +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
> +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(9, 0)
>  #define MEDIA_STATE_FLUSH   0x7004
>  #define GPGPU_WALKER0x7105
>  /* GEN8+ DW2 */
> -- 
> 2.1.4
> 

Re: [Mesa-dev] [PATCH v2] mesa : NULL check InfoLog

2015-06-25 Thread Ben Widawsky

On Thu, Jun 25, 2015 at 02:52:47PM +0200, Marta Lofstedt wrote:
> From: Marta Lofstedt 
> 
> When a program is compiled but linking fails,
> sh->InfoLog could be NULL. This is exploited
> by OpenGL ES 3.1 conformance tests.
> 
> V2: ralloc_strdup shProg->InfoLog
> 
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/shaderapi.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index a4296ad..bc6625a 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -1921,7 +1921,10 @@ _mesa_create_shader_program(struct gl_context* ctx, 
> GLboolean separate,
>  #endif
>}
>  
> -  ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
> + if (sh->InfoLog)
> +ralloc_strcat(&shProg->InfoLog, sh->InfoLog);
> + else
> +ralloc_strdup(ctx, shProg->InfoLog);

I don't understand what the strdup part is meant to do. Without the else, this
is:
Reviewed-by: Ben Widawsky 

Feel free to explain why you need to dup the log in the else case, and I'll look
again.
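
For reference, a minimal sketch of the variant without the else branch. It uses
plain realloc() so that it builds without Mesa; sketch_strcat() is a
hypothetical stand-in for ralloc_strcat(), not the real ralloc API. The point
is only that a NULL shader log is skipped and the program log is left as it
was.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ralloc_strcat(): appends src to *dst, growing
 * *dst with realloc(). Returns 1 on success and 0 on allocation failure.
 */
static int
sketch_strcat(char **dst, const char *src)
{
   assert(dst != NULL && src != NULL);

   size_t old_len = *dst ? strlen(*dst) : 0;
   size_t add_len = strlen(src);
   char *grown = realloc(*dst, old_len + add_len + 1);

   if (grown == NULL)
      return 0;

   /* Copy src including its terminating NUL. */
   memcpy(grown + old_len, src, add_len + 1);
   *dst = grown;
   return 1;
}

/* Append the shader log to the program log only if there is one. */
static int
sketch_append_info_log(char **program_log, const char *shader_log)
{
   if (shader_log == NULL)
      return 1; /* nothing to append, program log untouched */

   return sketch_strcat(program_log, shader_log);
}
```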

Re: [Mesa-dev] [PATCH] i965: Don't try to print the GLSL IR if it has been freed

2015-06-26 Thread Ben Widawsky

On Fri, Jun 26, 2015 at 05:54:15PM +0100, Neil Roberts wrote:
> Since commit 104c8fc2c2aa5621261f8 the GLSL IR will be freed if NIR is
> being used. This was causing it to segfault if INTEL_DEBUG=wm is set.
> This patch just makes it avoid dumping the GLSL IR in that case.
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> b/src/mesa/drivers/dri/i965/brw_program.c
> index 2327af7..85e271d 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -574,10 +574,13 @@ brw_dump_ir(const char *stage, struct gl_shader_program 
> *shader_prog,
>  struct gl_shader *shader, struct gl_program *prog)
>  {
> if (shader_prog) {
> -  fprintf(stderr,
> -  "GLSL IR for native %s shader %d:\n", stage, 
> shader_prog->Name);
> -  _mesa_print_ir(stderr, shader->ir, NULL);
> -  fprintf(stderr, "\n\n");
> +  if (shader->ir) {
> + fprintf(stderr,
> + "GLSL IR for native %s shader %d:\n",
> + stage, shader_prog->Name);
> + _mesa_print_ir(stderr, shader->ir, NULL);
> + fprintf(stderr, "\n\n");
> +  }
>     } else {
>fprintf(stderr, "ARB_%s_program %d ir for native %s shader\n",
>stage, prog->Id, stage);


Reviewed-by: Ben Widawsky 

(Also, I think it's good practice to Cc the author of the commit you referenced,
though maybe you did, just without the "Cc:" tag.)

Re: [Mesa-dev] [PATCH 2/5] i965/gen9: Plugin the code for selecting YF/YS tiling on skl+

2015-06-29 Thread Ben Widawsky

On Fri, Jun 26, 2015 at 01:23:41PM -0700, Anuj Phogat wrote:
> On Mon, Jun 22, 2015 at 5:23 PM, Anuj Phogat  wrote:
> > On Mon, Jun 22, 2015 at 2:53 PM, Ben Widawsky  wrote:
> >> On Wed, Jun 10, 2015 at 03:30:47PM -0700, Anuj Phogat wrote:
> >>> Buffers with Yf/Ys tiling end up using meta upload / download
> >>> paths or the blitter for cases where they used tiled_memcpy paths
> >>> in case of Y tiling. This has exposed some bugs in meta path. To
> >>> avoid any piglit regressions on SKL this patch keeps the Yf/Ys
> >>> tiling disabled at the moment.
> >>>
> >>> V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben)
> >>> Few cosmetic changes.
> >>> V4: Get rid of brw_miptree_choose_tr_mode().
> >>> Take care of all tile resource modes {Yf, Ys, none} for all
> >>> generations at one place.
> >>>
> >>> Signed-off-by: Anuj Phogat 
> >>> Cc: Ben Widawsky 
> >>> ---
> >>>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 97 
> >>> --
> >>>  1 file changed, 79 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> >>> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> >>> index b9ac4cf..c0ef5cc 100644
> >>> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> >>> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> >>> @@ -807,27 +807,88 @@ brw_miptree_layout(struct brw_context *brw,
> >>> enum intel_miptree_tiling_mode requested,
> >>> struct intel_mipmap_tree *mt)
> >>>  {
> >>> -   mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE;
> >>> +   const unsigned bpp = mt->cpp * 8;
> >>> +   const bool is_tr_mode_yf_ys_allowed =
> >>> +  brw->gen >= 9 &&
> >>> +  !for_bo &&
> >>> +  !mt->compressed &&
> >>> +  /* Enable YF/YS tiling only for color surfaces because depth and
> >>> +   * stencil surfaces are not supported in blitter using fast copy
> >>> +   * blit and meta PBO upload, download paths. No other paths
> >>> +   * currently support Yf/Ys tiled surfaces.
> >>> +   * FIXME:  Remove this restriction once we have a tiled_memcpy()
> >>> +   * path to do depth/stencil data upload/download to Yf/Ys tiled
> >>> +   * surfaces.
> >>> +   */
> >>
> >> I think it's more readable to move this comment above the variable 
> >> declaration.
> >> Up to you though. Also I think "FINISHME" is the more appropriate 
> >> classification
> >> for this type of thing.
> >>
> > Sure.
> >>> +  _mesa_is_format_color_format(mt->format) &&
> >>> +  (requested == INTEL_MIPTREE_TILING_Y ||
> >>> +   requested == INTEL_MIPTREE_TILING_ANY) &&
> >>
> >> This is where my tiling flags would have helped a bit since you should be 
> >> able
> >> to do flags & Y_TILED :P
> >>
> > Yes, I will do a follow up patch to make use of that.
> >>> +  (bpp && is_power_of_two(bpp)) &&
> >>> +  /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling
> >>> +   * disabled at the moment.
> >>> +   */
> >>> +  false;
> >>
> >> Also, "FINISHME"
> >>
> >>>
> >>> -   intel_miptree_set_alignment(brw, mt);
> >>> -   intel_miptree_set_total_width_height(brw, mt);
> >>> +   /* Lower index (Yf) is the higher priority mode */
> >>> +   const uint32_t tr_mode[3] = {INTEL_MIPTREE_TRMODE_YF,
> >>> +INTEL_MIPTREE_TRMODE_YS,
> >>> +INTEL_MIPTREE_TRMODE_NONE};
> >>> +   int i = is_tr_mode_yf_ys_allowed ? 0 : ARRAY_SIZE(tr_mode) - 1;
> >>>
> >>> -   if (!mt->total_width || !mt->total_height) {
> >>> -  intel_miptree_release(&mt);
> >>> -  return;
> >>> -   }
> >>> +   while (i < ARRAY_SIZE(tr_mode)) {
> >>> +  if (brw->gen < 9)
> >>> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
> >>> +  else
> >>> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_YF ||
> >>> +tr_mode[i]

Re: [Mesa-dev] [PATCH] i965/gen9: Use custom MOCS entries set up by the kernel.

2015-06-30 Thread Ben Widawsky

On Tue, Jun 30, 2015 at 11:25:42PM +0300, Francisco Jerez wrote:
> Instead of relying on hardware defaults the i915 kernel driver is
> going to program custom MOCS tables system-wide on Gen9 hardware.  The
> "WT" entry previously used for renderbuffers had a number of problems:
> It disabled caching on eLLC, it used a reserved L3 cacheability
> setting, and it used to override the PTE controls making renderbuffers
> always WT on LLC regardless of the kernel's setting.  Instead use an
> entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
> L3CC=WB.
> 
> Even though the corresponding kernel change is in a way an ABI break
> it doesn't seem necessary to check that the kernel is recent enough
> because the change should only affect Gen9 which is still unreleased
> hardware.

I think the commit message is a bit confusing. You correctly mention the WT->PTE
fix, but then the reasoning for the WB change isn't clear [to me].

In any case, I think it makes a lot more sense to fix the PTE setting as one
patch for the old table, then a patch to update both WB and WT to the new table
settings. Also, we do have customers (Canonical) that want to make this work on
mesa 10.5, and with an older kernel. Therefore I think the two separate patches,
and doing it without the dependency on Ville's patch (which I like FWIW) make
the lives of everyone easiest. Then Ville can rebase his patch on top of this
for mesa 10.7 time.

I did think about it, but never broached the subject of whether we want to send
both my MOCS patch and the PTE version of this patch to stable.

Anyway, the concept here is definitely
Acked-by: Ben Widawsky 

> ---
> Note that this change is based on Ville's "[PATCH 1/2] i965: House
> MOCS settings in brw_context/brw_device_info":
> 
> http://lists.freedesktop.org/archives/mesa-dev/2015-June/086665.html

Could you include a reference to the kernel patch too if you end up resending?

> 
>  src/mesa/drivers/dri/i965/brw_defines.h | 16 +++-
>  src/mesa/drivers/dri/i965/brw_device_info.c |  5 +++--
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 497da9c..2889118 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2499,12 +2499,18 @@ enum brw_wm_barycentric_interp_mode {
>   */
>  #define CHV_MOCS_L3  0x78
>  
> -/* Skylake: MOCS is now an index into an array of 64 different configurable
> - * cache settings.  We still use only either write-back or write-through; and
> - * rely on the documented default values.
> +/* Skylake: MOCS is now an index into an array of 64 different caching
> + * configurations programmed by the kernel.
>   */
> -#define SKL_MOCS_WB (0b001001 << 1)
> -#define SKL_MOCS_WT (0b000101 << 1)
> +/* TC=LLC/eLLC, LeCC=WB, LRUM=3, L3CC=WB */
> +#define SKL_MOCS_WB  (1 << 1)
> +/* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */
> +#define SKL_MOCS_PTE (9 << 1)
> +
> +/* Broxton: As for Skylake this should match the tables set up by the kernel.
> + */
> +/* L3CC=WB */
> +#define BXT_MOCS_L3  (9 << 1)
>  
>  #define MEDIA_VFE_STATE 0x7000
>  /* GEN7 DW2, GEN8+ DW3 */
> diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
> b/src/mesa/drivers/dri/i965/brw_device_info.c
> index 167ecb5..d5133e0 100644
> --- a/src/mesa/drivers/dri/i965/brw_device_info.c
> +++ b/src/mesa/drivers/dri/i965/brw_device_info.c
> @@ -305,7 +305,6 @@ static const struct brw_device_info brw_device_info_chv = 
> {
>  };
>  
>  /* Thread counts and URB limits are placeholders, and may not be accurate. */
> -/* FINISHME: Use PTE MOCS on Skylake. */
>  #define GEN9_FEATURES   \
> .gen = 9,\
> .has_hiz_and_separate_stencil = true,\
> @@ -315,7 +314,7 @@ static const struct brw_device_info brw_device_info_chv = 
> {
> .max_vs_threads = 280,   \
> .max_gs_threads = 256,   \
> .max_wm_threads = 408,   \
> -   .mocs_pte = SKL_MOCS_WT, \
> +   .mocs_pte = SKL_MOCS_PTE,\
> .mocs_wb = SKL_MOCS_WB,  \
> .urb = { \
>.size = 128,  \
> @@ -352,6 +351,8 @@ static const struct brw_device_info brw_device_info_bxt = 
> {
> .max_vs_threads = 112,
> .max_gs_threads = 112,
> .max_wm_threads = 32,
> +   .mocs_pte = BXT_MOCS_L3,
> +

Re: [Mesa-dev] [PATCH] i965/fs: Don't use the pixel interpolater for centroid interpolation

2015-06-30 Thread Ben Widawsky

y << 4)));
> +setup_pixel_interpolater_instruction(instr, inst);
>   } else {
>  src = vgrf(glsl_type::ivec2_type);
>  fs_reg offset_src = retype(get_nir_src(instr->src[0]),
> @@ -1531,9 +1550,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
> bld.SEL(offset(src, i), itemp, fs_reg(7)));
>  }
>  
> -mlen = 2;
>  inst = bld.emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, 
> dst_xy, src,
>  fs_reg(0u));
> +setup_pixel_interpolater_instruction(instr, inst, 2);
>   }
>   break;
>}
> @@ -1542,11 +1561,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
>   unreachable("Invalid intrinsic");
>}
>  
> -  inst->mlen = mlen;
> -  inst->regs_written = 2; /* 2 floats per slot returned */
> -  inst->pi_noperspective = instr->variables[0]->var->data.interpolation 
> ==
> -   INTERP_QUALIFIER_NOPERSPECTIVE;
> -
>for (unsigned j = 0; j < instr->num_components; j++) {
>   fs_reg src = interp_reg(instr->variables[0]->var->data.location, j);
>   src.type = dest.type;
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
> b/src/mesa/drivers/dri/i965/brw_wm.c
> index 592a729..f7fe1e0 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> @@ -40,9 +40,62 @@
>  #include "program/prog_parameter.h"
>  #include "program/program.h"
>  #include "intel_mipmap_tree.h"
> +#include "brw_nir.h"
>  
>  #include "util/ralloc.h"
>  
> +static bool
> +compute_modes_in_block(nir_block *block,
> +   void *state)
> +{
> +   unsigned *interp_modes = state;
> +   nir_intrinsic_instr *intrin;
> +   enum brw_wm_barycentric_interp_mode interp_mode;
> +
> +   nir_foreach_instr(block, instr) {
> +  if (instr->type != nir_instr_type_intrinsic)
> + continue;
> +
> +  intrin = nir_instr_as_intrinsic(instr);
> +
> +  if (intrin->intrinsic != nir_intrinsic_interp_var_at_centroid)
> + continue;
> +
> +  if (intrin->variables[0]->var->data.interpolation ==
> +  INTERP_QUALIFIER_NOPERSPECTIVE)
> + interp_mode = BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
> +  else
> + interp_mode = BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
> +
> +  *interp_modes |= 1 << interp_mode;
> +   }
> +
> +   return true;
> +}
> +
> +/**
> + * Looks for calls to interpolateAtCentroid within the program and returns a
> + * mask of the additional interpolation modes that they require.
> + */
> +static unsigned
> +compute_interpolate_at_centroid_modes(const struct gl_fragment_program 
> *fprog)
> +{
> +   unsigned interp_modes = 0;
> +   struct nir_shader *shader = fprog->Base.nir;
> +
> +   if (shader == NULL)
> +  return 0;
> +
> +   nir_foreach_overload(shader, overload) {
> +  if (overload->impl == NULL)
> + continue;
> +
> +  nir_foreach_block(overload->impl, compute_modes_in_block, 
> &interp_modes);
> +   }
> +
> +   return interp_modes;
> +}
> +
>  /**
>   * Return a bitfield where bit n is set if barycentric interpolation mode n
>   * (see enum brw_wm_barycentric_interp_mode) is needed by the fragment 
> shader.
> @@ -114,6 +167,8 @@ brw_compute_barycentric_interp_modes(struct brw_context 
> *brw,
>}
> }
>  
> +   barycentric_interp_modes |= compute_interpolate_at_centroid_modes(fprog);
> +
> return barycentric_interp_modes;
>  }
>  
> -- 
> 1.9.3
> 

-- 
Ben Widawsky, Intel Open Source Technology Center

Re: [Mesa-dev] [PATCH] i965/gen9: Use custom MOCS entries set up by the kernel.

2015-06-30 Thread Ben Widawsky

On Wed, Jul 01, 2015 at 12:33:54AM +0300, Francisco Jerez wrote:
> Ben Widawsky  writes:
> 
> > On Tue, Jun 30, 2015 at 11:25:42PM +0300, Francisco Jerez wrote:
> >> Instead of relying on hardware defaults the i915 kernel driver is
> >> going to program custom MOCS tables system-wide on Gen9 hardware.  The
> >> "WT" entry previously used for renderbuffers had a number of problems:
> >> It disabled caching on eLLC, it used a reserved L3 cacheability
> >> setting, and it used to override the PTE controls making renderbuffers
> >> always WT on LLC regardless of the kernel's setting.  Instead use an
> >> entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
> >> L3CC=WB.
> >> 
> >> Even though the corresponding kernel change is in a way an ABI break
> >> it doesn't seem necessary to check that the kernel is recent enough
> >> because the change should only affect Gen9 which is still unreleased
> >> hardware.
> >
> > I think the commit message is a bit confusing. You correctly mention the 
> > WT->PTE
> > fix, but then the reasoning for the WB change isn't clear [to me].
> >
> Right, I probably didn't mention it because the meaning of the WB define
> didn't change at all, the index into the new MOCS table is different but
> it should have the same semantics.
> 

I figured, just add it to the commit message :-)

> > In any case, I think it makes a lot more sense to fix the PTE setting as one
> > patch for the old table, then a patch to update both WB and WT to the new 
> > table
> > settings.
> 
> I tried to split up the patch that way originally, but unfortunately
> there's no entry in the default MOCS table equivalent to the new PTE
> setting, and there is also no equivalent to the old WT setting in the
> custom MOCS table (and it probably doesn't make sense to add one just
> for the sake of having a nice git history), so it doesn't seem easily
> possible to do it backwards either (first update to the new table, then
> switch to the PTE MOCS setting).

Hmm. I must not be following something because it sure looks like the HW
defaults have indices for the PTE setting. The index you're using from the new
table, 9, is just the hardware index 2, isn't it?

10  00  10  11  0   0   00  000

Can you explain what I'm missing?
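
To keep the index discussion straight, here is a small sketch of how the
defines in the quoted patch turn a table index into the value that gets
programmed: each of them is just the index shifted left by one. The helper name
is hypothetical and the 64-entry bound comes from the comment in the patch.
With it, the old SKL_MOCS_WB (0b001001 << 1) and the new SKL_MOCS_PTE (9 << 1)
come out as the same number, since both are index 9, only into different
tables.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a MOCS table index the way the SKL_MOCS_* and BXT_MOCS_* defines in
 * the quoted patch do: index << 1. The table has 64 entries per the comment
 * in brw_defines.h.
 */
static uint32_t
sketch_mocs_from_index(unsigned index)
{
   assert(index < 64);
   return (uint32_t)index << 1;
}
```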

> 
> > Also, we do have customers (Canonical) that want to make this work on
> > mesa 10.5, and with an older kernel. Therefore I think the two separate 
> > patches,
> > and doing it without the dependency on Ville's patch (which I like FWIW) 
> > make
> > the lives of everyone easiest. Then Ville can rebase his patch on top of 
> > this
> > for mesa 10.7 time.
> >
> The problem is that an equivalent patch not based on Ville's refactor
> would involve a considerable amount of churn because the BXT and SKL WB
> entries (which are used in many different places) don't match (sigh).
> It may not be suitable for stable either way, unless we drop BXT support
> or are OK with adding a bunch of ternary operators, basically anywhere
> SKL_MOCS_WB is used.

Yeah, we don't need BXT support in stable since BXT won't have PCI IDs until
10.7. So I'd be in favor of doing the easy SKL specific thing first if it's
possible.

Please tell me there is a good reason that they didn't make BXT and SKL the
same...

> 
> > I did think of it, but never broached the subject if we want to send both my
> > MOCS patch, and the PTE version of this patch to stable.
> >
> > Anyway, the concept here is definitely
> > Acked-by: Ben Widawsky 
> >
> >> ---
> >> Note that this change is based on Ville's "[PATCH 1/2] i965: House
> >> MOCS settings in brw_context/brw_device_info":
> >> 
> >> http://lists.freedesktop.org/archives/mesa-dev/2015-June/086665.html
> >
> > Could you include a reference to the kernel patch too if you end up 
> > resending?
> 
> Ah, sure, here it is FTR, I didn't notice I hadn't included the link
> until it was too late:
>  
> http://lists.freedesktop.org/archives/intel-gfx/2015-June/070244.html
> 

Thanks.

> >
> >> 
> >>  src/mesa/drivers/dri/i965/brw_defines.h | 16 +++-
> >>  src/mesa/drivers/dri/i965/brw_device_info.c |  5 +++--
> >>  2 files changed, 14 insertions(+), 7 deletions(-)
> >> 
> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> >> b/src/mesa/drivers/dri/i965/brw_defines.h
> >> index 497da9c..2889118 100644

[Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a

2015-07-01 Thread Ben Widawsky

Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This fixes 0 piglit failures with a GBM gpu piglit run.

Signed-off-by: Ben Widawsky 
---

I had this one sitting around for almost 2 months. I'm not sure why I didn't
send it out sooner. It seems like it's needed.
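
The same format check appears in both hunks below, so here is a minimal sketch
of it as one helper, assuming nothing beyond what the patch shows. The enum and
its values are hypothetical placeholders, not the real BRW_SURFACEFORMAT_*
numbers. Note that the quoted docs text lists BC5_UNORM while the switch in the
patch has no case for it; the sketch includes it on the assumption that the
omission is unintended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placeholder formats; the real driver uses the
 * BRW_SURFACEFORMAT_* values from brw_defines.h.
 */
enum sketch_format {
   SKETCH_FMT_BC2_UNORM,
   SKETCH_FMT_BC3_UNORM,
   SKETCH_FMT_BC5_UNORM,
   SKETCH_FMT_BC5_SNORM,
   SKETCH_FMT_BC7_UNORM,
   SKETCH_FMT_OTHER,
};

/* Returns true if the sampler L2 bypass disable bit should be set. */
static bool
sketch_needs_l2_bypass_disable(bool is_cherryview, int gen,
                               enum sketch_format format)
{
   assert(gen >= 8); /* this state is only emitted on Gen8+ */

   if (!is_cherryview && gen < 9)
      return false;

   switch (format) {
   case SKETCH_FMT_BC2_UNORM:
   case SKETCH_FMT_BC3_UNORM:
   case SKETCH_FMT_BC5_UNORM:
   case SKETCH_FMT_BC5_SNORM:
   case SKETCH_FMT_BC7_UNORM:
      return true;
   default:
      return false;
   }
}
```

Both call sites could then reduce to a single conditional that ORs
GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE into surf[0].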

---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 26 ++
 2 files changed, 27 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 66b9abc..f55fd49 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -276,6 +276,7 @@
 #define GEN8_SURFACE_TILING_W   (1 << 12)
 #define GEN8_SURFACE_TILING_X   (2 << 12)
 #define GEN8_SURFACE_TILING_Y   (3 << 12)
+#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
 #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
 #define BRW_SURFACE_MIPLAYOUT_SHIFT10
 #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index ca5ed17..a245379 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -264,6 +264,19 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
   surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
}
 
+   if (brw->is_cherryview || brw->gen >= 9) {
+  /* "This bit must be set for the following surface types: BC2_UNORM
+   * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM"
+   */
+  switch (format) {
+  case BRW_SURFACEFORMAT_BC2_UNORM:
+  case BRW_SURFACEFORMAT_BC3_UNORM:
+  case BRW_SURFACEFORMAT_BC5_SNORM:
+  case BRW_SURFACEFORMAT_BC7_UNORM:
+ surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
+  }
+   }
+
if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
   surf[0] |= GEN8_SURFACE_IS_ARRAY;
 
@@ -491,6 +504,19 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
  horizontal_alignment(brw, mt, surf_type) |
  surface_tiling_mode(tiling);
 
+   if (brw->is_cherryview || brw->gen >= 9) {
+  /* "This bit must be set for the following surface types: BC2_UNORM
+   * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM"
+   */
+  switch (format) {
+  case BRW_SURFACEFORMAT_BC2_UNORM:
+  case BRW_SURFACEFORMAT_BC3_UNORM:
+  case BRW_SURFACEFORMAT_BC5_SNORM:
+  case BRW_SURFACEFORMAT_BC7_UNORM:
+ surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
+  }
+   }
+
surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
 
surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) |
-- 
2.4.5


[Mesa-dev] [PATCH] i965/skl: Emit new 3DSTATE_VF_COMPONENT_PACKING

2015-07-01 Thread Ben Widawsky

We don't yet have a use for this state, but initializing it to known values is
always considered wise. In general NULL state can probably go in the misc state
upload; I only put it here because I assume it might be useful at some point.

Signed-off-by: Ben Widawsky 
---

I've had this patch sitting around for almost 3 months now. I believe we like to
initialize new fields as a general rule of thumb.

---
 src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
 src/mesa/drivers/dri/i965/gen8_draw_upload.c | 29 
 2 files changed, 31 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 66b9abc..444d974 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1713,6 +1713,8 @@ enum brw_message_target {
 
 #define _3DSTATE_VF_TOPOLOGY0x784b /* GEN8+ */
 
+#define _3DSTATE_VF_COMPONENT_PACKING   0x7855 /* GEN9+ */
+
 #define _3DSTATE_WM_CHROMAKEY  0x784c /* GEN8+ */
 
 #define _3DSTATE_URB_VS 0x7830 /* GEN7+ */
diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c 
b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
index 1af90ec..f6e7fc8 100644
--- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c
@@ -35,6 +35,33 @@
 #include "intel_batchbuffer.h"
 #include "intel_buffer_objects.h"
 
+/**
+ * Emits a null component packing state.
+ *
+ * Paraphrasing the docs: "This command is used to specify which 32-bit
+ * components are "enabled" to be stored in the URB, and which are "disabled".
+ * Disabling all four components for a given Vertex Element will result in no
+ * data stored for that element. Note that any insertion of SGVs
+ * (3DSTATE_VF_SGVS) is performed before the packing operation."
+ *
+ * FINISHME: "Component packing is probably only useful for SIMD8 VS thread
+ * execution." When enabled, the correct bit must be set in 3DSTATE_VF.
+ */
+static void
+gen9_emit_component_packing(struct brw_context *brw)
+{
+   if (brw->gen < 9)
+  return;
+
+   BEGIN_BATCH(5);
+   OUT_BATCH(_3DSTATE_VF_COMPONENT_PACKING << 16 | (5 - 2));
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+   OUT_BATCH(0);
+   ADVANCE_BATCH();
+}
+
 static void
 gen8_emit_vertices(struct brw_context *brw)
 {
@@ -44,6 +71,8 @@ gen8_emit_vertices(struct brw_context *brw)
brw_prepare_vertices(brw);
brw_prepare_shader_draw_parameters(brw);
 
+   gen9_emit_component_packing(brw);
+
if (brw->vs.prog_data->uses_vertexid || brw->vs.prog_data->uses_instanceid) 
{
   unsigned vue = brw->vb.nr_enabled;
 
-- 
2.4.4


Re: [Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a

2015-07-01 Thread Ben Widawsky

On Wed, Jul 01, 2015 at 04:03:53PM -0700, Ben Widawsky wrote:
> Certain compressed formats require this setting. The docs don't go into much
> detail as to why it's needed exactly.
> 
> This fixes 0 piglit failures with a GBM gpu piglit run.

I just ran this again in piglit since Chris asked me to clarify. There are also
no regressions (http://otc-mesa-ci.jf.intel.com/job/Leeroy/124112/), i.e. no
changes.

(It looks like BSW was removed from jenkins).

[snip]


Re: [Mesa-dev] [PATCH] i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRA

2015-07-03 Thread Ben Widawsky

On Fri, Jul 03, 2015 at 01:15:21PM +0100, Neil Roberts wrote:
> On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
> the shader sends a message to the pixel interpolator. This fixes the
> interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
> but that is not implemented anywhere so it's not a regression.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   | 1 +
>  src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp  | 4 
>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +++
>  4 files changed, 9 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 3553f6e..7596139 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -415,6 +415,7 @@ struct brw_wm_prog_data {
> bool uses_pos_offset;
> bool uses_omask;
> bool uses_kill;
> +   bool pulls_bary;
> uint32_t prog_offset_16;
>  
> /**
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 66b9abc..19489ab 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2145,6 +2145,7 @@ enum brw_pixel_shader_computed_depth_mode {
>  # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1 << 7)
>  # define GEN8_PSX_SHADER_IS_PER_SAMPLE  (1 << 6)
>  # define GEN8_PSX_SHADER_COMPUTES_STENCIL   (1 << 5)
> +# define GEN9_PSX_SHADER_PULLS_BARY (1 << 3)
>  # define GEN8_PSX_SHADER_HAS_UAV(1 << 2)
>  # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK   (1 << 1)
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index bd71404..3ebc3a2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1481,6 +1481,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
> nir_intrinsic_instr *instr
> case nir_intrinsic_interp_var_at_centroid:
> case nir_intrinsic_interp_var_at_sample:
> case nir_intrinsic_interp_var_at_offset: {
> +  assert(stage == MESA_SHADER_FRAGMENT);
> +
> +  ((struct brw_wm_prog_data *) prog_data)->pulls_bary = true;
> +
>fs_reg dst_xy = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
>  
>/* For most messages, we need one reg of ignored data; the hardware
> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
> b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> index a88f109..d544509 100644
> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> @@ -58,6 +58,9 @@ gen8_upload_ps_extra(struct brw_context *brw,
> if (prog_data->uses_omask)
>dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET;
>  
> +   if (brw->gen >= 9 && prog_data->pulls_bary)
> +  dw1 |= GEN9_PSX_SHADER_PULLS_BARY;
> +
> if (_mesa_active_fragment_shader_has_atomic_ops(&brw->ctx))
>dw1 |= GEN8_PSX_SHADER_HAS_UAV;
>  

It's unclear to me what the downside to always setting this bit would be (I
assume that's the behavior of previous gens).
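
A minimal sketch of the conditional in question, pulled out of the driver so
the two options are easy to compare. The bit positions are the ones shown in
the quoted brw_defines.h hunk; the function name and the reduced set of flags
are hypothetical. Always setting the bit on Gen9+ would amount to dropping the
pulls_bary test from the first condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as shown in the quoted brw_defines.h hunk. */
#define SKETCH_PSX_SHADER_PULLS_BARY (1u << 3) /* GEN9+ */
#define SKETCH_PSX_SHADER_HAS_UAV    (1u << 2)

/* Compute the subset of the 3DSTATE_PS_EXTRA dword covered by this sketch. */
static uint32_t
sketch_ps_extra_bits(int gen, bool pulls_bary, bool has_uav)
{
   uint32_t dw1 = 0;

   assert(gen >= 8); /* 3DSTATE_PS_EXTRA is emitted from the Gen8+ path */

   /* The bit only exists on Gen9+, so older gens never set it. */
   if (gen >= 9 && pulls_bary)
      dw1 |= SKETCH_PSX_SHADER_PULLS_BARY;

   if (has_uav)
      dw1 |= SKETCH_PSX_SHADER_HAS_UAV;

   return dw1;
}
```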

I also assume this means you're abandoning the other patch, or doing it on top
of this, else you don't want to do it for the centroid case.

Reviewed-by: Ben Widawsky 

[Mesa-dev] [PATCH] mesa: Implement faster streaming memcpy

2015-07-08 Thread Ben Widawsky

(WARNING: No perf data, please keep reading though)

This implements the suggestion provided by the paper, "Fast USWC to WB Memory
Copy"
(https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers).
This is described throughout the paper, but the sample code lives in Figure 3-3.
That paper claims a roughly 40% performance gain in Mbyte/second over the
original implementation done by Matt.

Section 3.1.2 is the summary of why an intermediate cache buffer is used. It
claims that if you use the naive implementation, fill buffers are contended for.
To be honest, I can't quite fathom the underlying explanation, but I'll think
about it some more. Most important would be to get the perf data... This patch
does need performance data. I don't currently have a platform that this would
benefit (BYT or BSW), so I can't get anything useful. As soon as I get a
platform to test it on, I will. Meanwhile, maybe whoever tested the original
patch the first time around could run this through?
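
To make the structure easier to follow, here is a minimal, portable sketch of
the two-phase copy through an intermediate buffer. Plain memcpy() stands in for
the streaming loads and the stores, so this shows only the chunking, not the
performance behaviour the paper describes. The 4096-byte buffer size matches
rsvd_space in the patch; the function name is hypothetical, and unlike the
patch the sketch keeps its buffer on the stack and handles tails shorter than
64 bytes in the same loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BOUNCE_SIZE 4096

/* Copy len bytes from src to dst in chunks of at most SKETCH_BOUNCE_SIZE,
 * staging every chunk in a cached bounce buffer first.
 */
static void
sketch_bounce_memcpy(void *dst, const void *src, size_t len)
{
   uint8_t bounce[SKETCH_BOUNCE_SIZE];
   uint8_t *d = dst;
   const uint8_t *s = src;

   assert(len == 0 || (dst != NULL && src != NULL));

   while (len > 0) {
      size_t chunk = len > SKETCH_BOUNCE_SIZE ? SKETCH_BOUNCE_SIZE : len;

      /* Phase 1: source to bounce buffer (streaming loads in the patch). */
      memcpy(bounce, s, chunk);

      /* Phase 2: bounce buffer to destination. */
      memcpy(d, bounce, chunk);

      s += chunk;
      d += chunk;
      len -= chunk;
   }
}
```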

Cc: Matt Turner 
Cc: Chad Versace 
Cc: Kristian Høgsberg 
Signed-off-by: Ben Widawsky 
---
 src/mesa/main/streaming-load-memcpy.c | 61 +++
 1 file changed, 47 insertions(+), 14 deletions(-)

diff --git a/src/mesa/main/streaming-load-memcpy.c 
b/src/mesa/main/streaming-load-memcpy.c
index d7147af..3cd310a 100644
--- a/src/mesa/main/streaming-load-memcpy.c
+++ b/src/mesa/main/streaming-load-memcpy.c
@@ -30,6 +30,8 @@
 #include "main/streaming-load-memcpy.h"
 #include <smmintrin.h>
 
+static uint8_t rsvd_space[4096];
+
 /* Copies memory from src to dst, using SSE 4.1's MOVNTDQA to get streaming
  * read performance from uncached memory.
  */
@@ -59,23 +61,54 @@ _mesa_streaming_load_memcpy(void *restrict dst, void 
*restrict src, size_t len)
   len -= MIN2(bytes_before_alignment_boundary, len);
}
 
-   while (len >= 64) {
-  __m128i *dst_cacheline = (__m128i *)d;
-  __m128i *src_cacheline = (__m128i *)s;
+   while (len > 64) {
+  __m128i *cached_buffer = (__m128i *)rsvd_space;
+  size_t streaming_len = len > 4096 ? 4096 : len;
+
+  __asm__ volatile("mfence" ::: "memory");
+
+  while (streaming_len >= 64) {
+ __m128i *src_cacheline = (__m128i *)s;
+
+ __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
+ __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
+ __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
+ __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
+
+ _mm_store_si128(cached_buffer + 0, temp1);
+ _mm_store_si128(cached_buffer + 1, temp2);
+ _mm_store_si128(cached_buffer + 2, temp3);
+ _mm_store_si128(cached_buffer + 3, temp4);
+
+ s += 64;
+ streaming_len -= 64;
+ cached_buffer += 4;
+  }
+
+  cached_buffer = (__m128i *)rsvd_space;
+  streaming_len = len > 4096 ? 4096 : len;
+
+  __asm__ volatile("mfence" ::: "memory");
+
+  while (streaming_len >= 64) {
+ __m128i *dst_cacheline = (__m128i *)d;
+
+ __m128i temp1 = _mm_stream_load_si128(cached_buffer + 0);
+ __m128i temp2 = _mm_stream_load_si128(cached_buffer + 1);
+ __m128i temp3 = _mm_stream_load_si128(cached_buffer + 2);
+ __m128i temp4 = _mm_stream_load_si128(cached_buffer + 3);
 
-  __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
-  __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
-  __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
-  __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
+ _mm_store_si128(dst_cacheline + 0, temp1);
+ _mm_store_si128(dst_cacheline + 1, temp2);
+ _mm_store_si128(dst_cacheline + 2, temp3);
+ _mm_store_si128(dst_cacheline + 3, temp4);
 
-  _mm_store_si128(dst_cacheline + 0, temp1);
-  _mm_store_si128(dst_cacheline + 1, temp2);
-  _mm_store_si128(dst_cacheline + 2, temp3);
-  _mm_store_si128(dst_cacheline + 3, temp4);
+ d += 64;
+ streaming_len -= 64;
+ cached_buffer += 4;
 
-  d += 64;
-  s += 64;
-  len -= 64;
+ len -= 64;
+  }
}
 
/* memcpy() the tail. */
-- 
2.4.5
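
The two-phase structure of the patch above can be sketched in portable C. This is only an illustration under stated assumptions: plain memcpy() stands in for the MOVNTDQA loads and aligned stores, and the names staged_copy, stage, STAGE_SIZE and LINE_SIZE are made up for the example. It says nothing about performance; it only shows the chunking and the pointer bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE_SIZE 4096  /* same size as rsvd_space in the patch */
#define LINE_SIZE  64    /* one cacheline per inner-loop step */

/* Hypothetical staging buffer. It is explicitly aligned because aligned
 * 16-byte SSE stores into it would otherwise be invalid.
 */
static _Alignas(64) uint8_t stage[STAGE_SIZE];

/* Copies whole cachelines from src to dst through the staging buffer and
 * returns the number of bytes left over for the caller's tail memcpy().
 */
static size_t
staged_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
   assert(dst != NULL && src != NULL);

   while (len >= LINE_SIZE) {
      /* At most one staging buffer per pass, rounded down to cachelines. */
      size_t chunk = len > STAGE_SIZE ? STAGE_SIZE : len;
      chunk -= chunk % LINE_SIZE;

      /* Phase 1: source -> staging buffer (streaming loads in the patch). */
      for (size_t off = 0; off < chunk; off += LINE_SIZE)
         memcpy(stage + off, src + off, LINE_SIZE);

      /* Phase 2: staging buffer -> destination. */
      for (size_t off = 0; off < chunk; off += LINE_SIZE)
         memcpy(dst + off, stage + off, LINE_SIZE);

      src += chunk;
      dst += chunk;
      len -= chunk;
   }

   return len;
}
```

Computing the chunk size once and subtracting it after both phases keeps len, src and dst in step, instead of decrementing len inside the second inner loop. Like the static buffer in the patch, the shared staging buffer here is not safe for concurrent callers.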


Re: [Mesa-dev] [PATCHv2] i965/gen9: Use custom MOCS entries set up by the kernel.

2015-07-08 Thread Ben Widawsky

On Tue, Jul 07, 2015 at 10:21:28PM +0300, Francisco Jerez wrote:
> Instead of relying on hardware defaults the i915 kernel driver is
> going to program custom MOCS tables system-wide on Gen9 hardware.  The
> "WT" entry previously used for renderbuffers had a number of problems:
> It disabled caching on eLLC, it used a reserved L3 cacheability
> setting, and it used to override the PTE controls making renderbuffers
> always WT on LLC regardless of the kernel's setting.  Instead use an
> entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
> L3CC=WB.
> 
> The "WB" entry previously used for anything other than renderbuffers
> has moved to a different index in the new MOCS tables but it should
> have the same caching semantics as the old entry.
> 
> Even though the corresponding kernel change ("drm/i915: Added
> Programming of the MOCS") is in a way an ABI break it doesn't seem
> necessary to check that the kernel is recent enough because the change
> should only affect Gen9 which is still unreleased hardware.
> 
> v2: Update MOCS values for the new Android-incompatible tables
> introduced in v7 of the kernel patch.
> 
> Cc: 10.6 

It'd be cool to get perf data, but certainly not a requirement here since the
requirement to change is pretty obvious, IMO (mostly, I'm just curious). I do
like having the References: in the commit for the kernel patch, but that's just
me, and I can live with whatever.

> ---
>  src/mesa/drivers/dri/i965/brw_defines.h| 11 ++-
>  src/mesa/drivers/dri/i965/gen8_surface_state.c |  3 +--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 66b9abc..8ab8d62 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2491,12 +2491,13 @@ enum brw_wm_barycentric_interp_mode {
>  #define BDW_MOCS_WT  0x58
>  #define BDW_MOCS_PTE 0x18
>  
> -/* Skylake: MOCS is now an index into an array of 64 different configurable
> - * cache settings.  We still use only either write-back or write-through; and
> - * rely on the documented default values.
> +/* Skylake: MOCS is now an index into an array of 62 different caching
> + * configurations programmed by the kernel.

I'd keep the '64' instead of '62'; the latter is a software construct, but
whatever you like.

>   */
> -#define SKL_MOCS_WB (0b001001 << 1)
> -#define SKL_MOCS_WT (0b000101 << 1)
> +/* TC=LLC/eLLC, LeCC=WB, LRUM=3, L3CC=WB */
> +#define SKL_MOCS_WB  (2 << 1)
> +/* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */
> +#define SKL_MOCS_PTE (1 << 1)
>  
>  #define MEDIA_VFE_STATE 0x7000
>  /* GEN7 DW2, GEN8+ DW3 */
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index bd3eb00..dfaf762 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -401,8 +401,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
>irb->mt_layer : (irb->mt_layer / MAX2(mt->num_samples, 1));
> GLenum gl_target =
>rb->TexImage ? rb->TexImage->TexObject->Target : GL_TEXTURE_2D;
> -   /* FINISHME: Use PTE MOCS on Skylake. */
> -   uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_WT : BDW_MOCS_PTE;
> +   uint32_t mocs = brw->gen >= 9 ? SKL_MOCS_PTE : BDW_MOCS_PTE;

I don't know the policy on const really, but this is a good opportunity to
make mocs const.
>  
> intel_miptree_used_for_rendering(mt);
>  

Reviewed-by: Ben Widawsky 
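
As a rough sketch of the two review points (the index-based defines and the const suggestion): the new defines in the quoted patch shift a MOCS table index left by one bit, and the selected value never changes after initialization. The names MOCS_INDEX_PTE, MOCS_INDEX_WB, mocs_from_index and renderbuffer_mocs are hypothetical; only the index values, the shift and the 64-entry bound come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; the index values mirror the defines in the patch. */
#define MOCS_INDEX_PTE 1u
#define MOCS_INDEX_WB  2u

/* Encodes a table index the way the patch's (index << 1) defines do. */
static inline uint32_t
mocs_from_index(uint32_t index)
{
   assert(index < 64);  /* hardware array size mentioned in the review */
   return index << 1;
}

/* Picks the renderbuffer MOCS value; pre_gen9_mocs is passed in so the
 * sketch does not depend on any driver header.
 */
static uint32_t
renderbuffer_mocs(int gen, uint32_t pre_gen9_mocs)
{
   const uint32_t mocs =
      gen >= 9 ? mocs_from_index(MOCS_INDEX_PTE) : pre_gen9_mocs;

   return mocs;
}
```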


Re: [Mesa-dev] [PATCH] i965/chv|skl: Apply sampler bypass w/a

2015-07-08 Thread Ben Widawsky

On Thu, Jul 02, 2015 at 12:58:33PM -0700, Matt Turner wrote:
> On Thu, Jul 2, 2015 at 12:57 PM, Matt Turner  wrote:
> > On Wed, Jul 1, 2015 at 4:03 PM, Ben Widawsky
> >  wrote:
> >> Certain compressed formats require this setting. The docs don't go into 
> >> much
> >> detail as to why it's needed exactly.
> >>
> >> This fixes 0 piglit failures with a GBM gpu piglit run.
> >
> > That's a really weird way of saying that.
> >
> >>
> >> Signed-off-by: Ben Widawsky 
> >> ---
> >>
> >> I had this one sitting around for almost 2 months. I'm not sure why I 
> >> didn't
> >> send it out sooner. It seems like it's needed.
> >>
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_defines.h|  1 +
> >>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 26 
> >> ++
> >>  2 files changed, 27 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> >> b/src/mesa/drivers/dri/i965/brw_defines.h
> >> index 66b9abc..f55fd49 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> @@ -276,6 +276,7 @@
> >>  #define GEN8_SURFACE_TILING_W   (1 << 12)
> >>  #define GEN8_SURFACE_TILING_X   (2 << 12)
> >>  #define GEN8_SURFACE_TILING_Y   (3 << 12)
> >> +#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
> >>  #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
> >>  #define BRW_SURFACE_MIPLAYOUT_SHIFT10
> >>  #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
> >> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> >> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> index ca5ed17..a245379 100644
> >> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> >> @@ -264,6 +264,19 @@ gen8_emit_texture_surface_state(struct brw_context 
> >> *brw,
> >>surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
> >> }
> >>
> >> +   if (brw->is_cherryview || brw->gen >= 9) {
> >> +  /* "This bit must be set for the following surface types: BC2_UNORM
> >> +   * BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM"
> >
> > Don't do naked quotes -- Use the normal style.
> >
> >> +   */
> >> +  switch (format) {
> >> +  case BRW_SURFACEFORMAT_BC2_UNORM:
> >> +  case BRW_SURFACEFORMAT_BC3_UNORM:
> >> +  case BRW_SURFACEFORMAT_BC5_SNORM:
> >
> > Missing BRW_SURFACEFORMAT_BC5_UNORM.
> >
> >> +  case BRW_SURFACEFORMAT_BC7_UNORM:
> >> + surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
> >
> > It wouldn't surprise me if static analysis tools complain about the
> > missing break.
> >
> > Add a break or make it an if statement (which would be two lines
> > shorter). All together, how about
> >
> >/* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
> > * bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
> > *
> > *This bit must be set for the following surface types: BC2_UNORM
> > *BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
> > */
> >if (format == BRW_SURFACEFORMAT_BC2_UNORM ||
> >format == BRW_SURFACEFORMAT_BC3_UNORM ||
> >format == BRW_SURFACEFORMAT_BC5_SNORM ||
> >format == BRW_SURFACEFORMAT_BC5_UNORM ||
> 
> Bah, would be nice to do BC5_UNORM and then BC5_SNORM to better match
> the comment.
> 

Thanks for catching the missing case.

> >format == BRW_SURFACEFORMAT_BC7_UNORM)
> >   surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
> >}

[Mesa-dev] [PATCH] [v2] i965/chv|skl: Apply sampler bypass w/a

2015-07-08 Thread Ben Widawsky

Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/

v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
  surface state open coded because it seems to go better with the existing code.

Cc: Matt Turner 
Cc: Nanley Chery 
Cc: Jordan Justen  (aux-hiz needs this too)
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 29 ++
 2 files changed, 30 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 19489ab..3668967 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -276,6 +276,7 @@
 #define GEN8_SURFACE_TILING_W   (1 << 12)
 #define GEN8_SURFACE_TILING_X   (2 << 12)
 #define GEN8_SURFACE_TILING_Y   (3 << 12)
+#define GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE  (1 << 9)
 #define BRW_SURFACE_RC_READ_WRITE  (1 << 8)
 #define BRW_SURFACE_MIPLAYOUT_SHIFT10
 #define BRW_SURFACE_MIPMAPLAYOUT_BELOW   0
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index bd3eb00..9bbe8ae 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -132,6 +132,27 @@ horizontal_alignment(const struct brw_context *brw,
}
 }
 
+static bool
+sampler_l2_bypass_disable(struct brw_context *brw, unsigned format)
+{
+   /* From the CHV PRM, Volume 2d, page 321 (RENDER_SURFACE_STATE dword 0
+* bit 9 "Sampler L2 Bypass Mode Disable" Programming Notes):
+*
+*This bit must be set for the following surface types: BC2_UNORM
+*BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
+*/
+   if ((brw->gen >= 9 || brw->is_cherryview) &&
+   (format == BRW_SURFACEFORMAT_BC2_UNORM ||
+format == BRW_SURFACEFORMAT_BC3_UNORM ||
+format == BRW_SURFACEFORMAT_BC5_UNORM ||
+format == BRW_SURFACEFORMAT_BC5_SNORM ||
+format == BRW_SURFACEFORMAT_BC7_UNORM)) {
+  return true;
+   }
+
+   return false;
+}
+
 static uint32_t *
 allocate_surface_state(struct brw_context *brw, uint32_t *out_offset, int 
index)
 {
@@ -238,6 +259,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
   surf[0] |= BRW_SURFACE_CUBEFACE_ENABLES;
}
 
+   if (sampler_l2_bypass_disable(brw, format)) {
+  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
+   }
+
if (_mesa_is_array_texture(target) || target == GL_TEXTURE_CUBE_MAP)
   surf[0] |= GEN8_SURFACE_IS_ARRAY;
 
@@ -465,6 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
  horizontal_alignment(brw, mt, surf_type) |
  surface_tiling_mode(tiling);
 
+   if (sampler_l2_bypass_disable(brw, format)) {
+  surf[0] |= GEN8_SURFACE_SAMPLER_L2_BYPASS_DISABLE;
+   }
+
surf[1] = SET_FIELD(mocs, GEN8_SURFACE_MOCS) | mt->qpitch >> 2;
 
surf[2] = SET_FIELD(width - 1, GEN7_SURFACE_WIDTH) |
-- 
2.4.4
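
For comparison, the same predicate can be written as a switch in which every case returns, which sidesteps the missing-break concern raised in the v1 review. This is a standalone sketch: the enum fake_format and its values are placeholders rather than the real BRW_SURFACEFORMAT_* constants, and the gen/is_cherryview parameters stand in for the brw_context fields.

```c
#include <stdbool.h>

/* Placeholder formats; the values are not the hardware encodings. */
enum fake_format {
   FAKE_BC1_UNORM,
   FAKE_BC2_UNORM,
   FAKE_BC3_UNORM,
   FAKE_BC5_UNORM,
   FAKE_BC5_SNORM,
   FAKE_BC7_UNORM,
};

/* Returns true when the "Sampler L2 Bypass Mode Disable" bit should be set,
 * following the format list quoted from the PRM in the patch.
 */
static bool
needs_l2_bypass_disable(int gen, bool is_cherryview, enum fake_format format)
{
   if (gen < 9 && !is_cherryview)
      return false;

   switch (format) {
   case FAKE_BC2_UNORM:
   case FAKE_BC3_UNORM:
   case FAKE_BC5_UNORM:
   case FAKE_BC5_SNORM:
   case FAKE_BC7_UNORM:
      return true;
   default:
      return false;
   }
}
```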


[Mesa-dev] [PATCH] [v2] i965: Split out gen8 push constant state upload

2015-07-09 Thread Ben Widawsky

While implementing the workaround in the previous patch I noticed things were
starting to get a bit messy. Since gen8 works differently enough from gen7, I
thought splitting it out would be good.

While here, get rid of gen8 MOCS which does nothing and was in the wrong place
anyway.

This patch is totally optional. I'd be willing to just always use buffer #2 on
gen8+. Pre-HSW this wasn't allowed, but it looks like it's okay for gen8 too.

v2: Move inactive batch generation to the top of the function in order to make
the rest of the code easier to read.

Jenkins results (still a bunch of spurious failures, I miss Mark):
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/169/

Signed-off-by: Ben Widawsky 
Reviewed-by: Anuj Phogat  (v1)
---

I had a minor bug in v1 which prevented me from pushing this sooner. I'd like to
merge this patch unless anyone has complaints?

---
 src/mesa/drivers/dri/i965/brw_state.h |  6 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 93 ---
 5 files changed, 68 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 987672f..f45459d 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -368,9 +368,9 @@ brw_upload_pull_constants(struct brw_context *brw,
 
 /* gen7_vs_state.c */
 void
-gen7_upload_constant_state(struct brw_context *brw,
-   const struct brw_stage_state *stage_state,
-   bool active, unsigned opcode);
+brw_upload_constant_state(struct brw_context *brw,
+  const struct brw_stage_state *stage_state,
+  bool active, unsigned opcode);
 
 #ifdef __cplusplus
 }
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index eb4c586..19568b0 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -48,7 +48,7 @@ gen6_upload_gs_push_constants(struct brw_context *brw)
}
 
if (brw->gen >= 7)
-  gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
+  brw_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
 }
 
 const struct brw_tracked_state gen6_gs_push_constants = {
diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c 
b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index 35d10ef..c33607d 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -140,8 +140,7 @@ gen6_upload_vs_push_constants(struct brw_context *brw)
   if (brw->gen == 7 && !brw->is_haswell && !brw->is_baytrail)
  gen7_emit_vs_workaround_flush(brw);
 
-  gen7_upload_constant_state(brw, stage_state, true /* active */,
- _3DSTATE_CONSTANT_VS);
+  brw_upload_constant_state(brw, stage_state, true, _3DSTATE_CONSTANT_VS);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index d1748ba..ced4ad6 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -50,8 +50,7 @@ gen6_upload_wm_push_constants(struct brw_context *brw)
   stage_state, AUB_TRACE_WM_CONSTANTS);
 
if (brw->gen >= 7) {
-  gen7_upload_constant_state(brw, &brw->wm.base, true,
- _3DSTATE_CONSTANT_PS);
+  brw_upload_constant_state(brw, &brw->wm.base, true, 
_3DSTATE_CONSTANT_PS);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 4b17d06..6a51934 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -29,20 +29,23 @@
 #include "program/prog_statevars.h"
 #include "intel_batchbuffer.h"
 
-
-void
-gen7_upload_constant_state(struct brw_context *brw,
+static void
+gen8_upload_constant_state(struct brw_context *brw,
const struct brw_stage_state *stage_state,
bool active, unsigned opcode)
 {
-   uint32_t mocs = brw->gen < 8 ? GEN7_MOCS_L3 : 0;
 
-   /* Disable if the shader stage is inactive or there are no push constants. 
*/
-   active = active && stage_state->push_const_size != 0;
+   /* FINISHME: determine if we should use mocs on gen9 */
 
-   int dwords = brw->gen >= 8 ? 11 : 7;
-   BEGIN_BATCH(dwords);
-   OUT_BATCH(opcode << 16 | (dwords - 2));
+   BEGIN_BATCH(11);
+   OUT_BATCH(opcode << 16 | (11 - 2));
+
+   if (!active) {
+  for (int i = 0; i < 11; i++)
+ OUT_BATCH(0);
+
+  return;
+   }
 
/* Workaround for SKL+ (we use option #2 u

Re: [Mesa-dev] [PATCH] [v2] i965: Split out gen8 push constant state upload

2015-07-09 Thread Ben Widawsky

On Thu, Jul 09, 2015 at 09:44:52AM -0700, Ben Widawsky wrote:
> While implementing the workaround in the previous patch I noticed things were
> starting to get a bit messy. Since gen8 works differently enough from gen7, I
> thought splitting it out would be good.
> 
> While here, get rid of gen8 MOCS which does nothing and was in the wrong place
> anyway.
> 
> This patch is totally optional. I'd be willing to just always use buffer #2 on
> gen8+. Pre-HSW this wasn't allowed, but it looks like it's okay for gen8 too.
> 
> v2: Move inactive batch generation to the top of the function in order to make
> the rest of the code easier to read.
> 
> Jenkins results (still a bunch of spurious failures, I miss Mark):
> http://otc-mesa-ci.jf.intel.com/job/bwidawsk/169/
> 
> Signed-off-by: Ben Widawsky 
> Reviewed-by: Anuj Phogat  (v1)
> ---
> 
> I had a minor bug in v1 which prevented me from pushing this sooner. I'd like 
> to
> merge this patch unless anyone has complaints?
> 
> ---
>  src/mesa/drivers/dri/i965/brw_state.h |  6 +-
>  src/mesa/drivers/dri/i965/gen6_gs_state.c |  2 +-
>  src/mesa/drivers/dri/i965/gen6_vs_state.c |  3 +-
>  src/mesa/drivers/dri/i965/gen6_wm_state.c |  3 +-
>  src/mesa/drivers/dri/i965/gen7_vs_state.c | 93 
> ---
>  5 files changed, 68 insertions(+), 39 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
> b/src/mesa/drivers/dri/i965/brw_state.h
> index 987672f..f45459d 100644
> --- a/src/mesa/drivers/dri/i965/brw_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_state.h
> @@ -368,9 +368,9 @@ brw_upload_pull_constants(struct brw_context *brw,
>  
>  /* gen7_vs_state.c */
>  void
> -gen7_upload_constant_state(struct brw_context *brw,
> -   const struct brw_stage_state *stage_state,
> -   bool active, unsigned opcode);
> +brw_upload_constant_state(struct brw_context *brw,
> +  const struct brw_stage_state *stage_state,
> +  bool active, unsigned opcode);
>  
>  #ifdef __cplusplus
>  }
> diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
> b/src/mesa/drivers/dri/i965/gen6_gs_state.c
> index eb4c586..19568b0 100644
> --- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
> @@ -48,7 +48,7 @@ gen6_upload_gs_push_constants(struct brw_context *brw)
> }
>  
> if (brw->gen >= 7)
> -  gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
> +  brw_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
>  }
>  
>  const struct brw_tracked_state gen6_gs_push_constants = {
> diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c 
> b/src/mesa/drivers/dri/i965/gen6_vs_state.c
> index 35d10ef..c33607d 100644
> --- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
> @@ -140,8 +140,7 @@ gen6_upload_vs_push_constants(struct brw_context *brw)
>if (brw->gen == 7 && !brw->is_haswell && !brw->is_baytrail)
>   gen7_emit_vs_workaround_flush(brw);
>  
> -  gen7_upload_constant_state(brw, stage_state, true /* active */,
> - _3DSTATE_CONSTANT_VS);
> +  brw_upload_constant_state(brw, stage_state, true, 
> _3DSTATE_CONSTANT_VS);
> }
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
> b/src/mesa/drivers/dri/i965/gen6_wm_state.c
> index d1748ba..ced4ad6 100644
> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
> @@ -50,8 +50,7 @@ gen6_upload_wm_push_constants(struct brw_context *brw)
>stage_state, AUB_TRACE_WM_CONSTANTS);
>  
> if (brw->gen >= 7) {
> -  gen7_upload_constant_state(brw, &brw->wm.base, true,
> - _3DSTATE_CONSTANT_PS);
> +  brw_upload_constant_state(brw, &brw->wm.base, true, 
> _3DSTATE_CONSTANT_PS);
> }
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
> b/src/mesa/drivers/dri/i965/gen7_vs_state.c
> index 4b17d06..6a51934 100644
> --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
> @@ -29,20 +29,23 @@
>  #include "program/prog_statevars.h"
>  #include "intel_batchbuffer.h"
>  
> -
> -void
> -gen7_upload_constant_state(struct brw_context *brw,
> +static void
> +gen8_upload_constant_state(struct brw_context *brw,
> const struct brw_stage_state *stage_state,
> bool active,

[Mesa-dev] [PATCH] [v3] i965: Split out gen8 push constant state upload

2015-07-09 Thread Ben Widawsky

While implementing the workaround in the previous patch I noticed things were
starting to get a bit messy. Since gen8 works differently enough from gen7, I
thought splitting it out would be good.

While here, get rid of gen8 MOCS which does nothing and was in the wrong place
anyway.

This patch is totally optional. I'd be willing to just always use buffer #2 on
gen8+. Pre-HSW this wasn't allowed, but it looks like it's okay for gen8 too.

v2: Move inactive batch generation to the top of the function in order to make
the rest of the code easier to read.

Jenkins results (still a bunch of spurious failures, I miss Mark):
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/169/

v3: v2 had two bugs: it didn't emit the right number of dwords, and it didn't
call ADVANCE_BATCH(). I'm moderately worried that there were no failures as
a result.
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/170/
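
The dword accounting behind the v3 fix can be illustrated with a small mock. The mock_* helpers below are invented for this sketch and are not the driver's BEGIN_BATCH/OUT_BATCH/ADVANCE_BATCH macros; they only model the bookkeeping: an 11-dword packet is one header dword plus ten payload dwords, and the packet has to be closed once all of them are written.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for a batch buffer that tracks one packet at a time. */
struct mock_batch {
   uint32_t dw[16];
   unsigned reserved;  /* dwords promised by mock_begin() */
   unsigned emitted;   /* dwords written so far */
};

static void
mock_begin(struct mock_batch *b, unsigned n)
{
   assert(n <= 16);
   b->reserved = n;
   b->emitted = 0;
}

static void
mock_out(struct mock_batch *b, uint32_t value)
{
   assert(b->emitted < b->reserved);
   b->dw[b->emitted++] = value;
}

static void
mock_advance(struct mock_batch *b)
{
   /* Catches both too few and too many dwords. */
   assert(b->emitted == b->reserved);
}

/* Inactive stage: header followed by ten zero dwords, then close. */
static void
emit_inactive_constants(struct mock_batch *b, uint32_t opcode)
{
   mock_begin(b, 11);
   mock_out(b, opcode << 16 | (11 - 2));

   for (int i = 0; i < 10; i++)
      mock_out(b, 0);

   mock_advance(b);
}
```

With a loop bound of 11 instead of 10, mock_out() would trip its assertion on the twelfth dword, which is the kind of mismatch described for v2.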

Signed-off-by: Ben Widawsky 
Reviewed-by: Anuj Phogat  (v1)
---
 src/mesa/drivers/dri/i965/brw_state.h |  6 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c |  3 +-
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 94 +--
 5 files changed, 69 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 987672f..f45459d 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -368,9 +368,9 @@ brw_upload_pull_constants(struct brw_context *brw,
 
 /* gen7_vs_state.c */
 void
-gen7_upload_constant_state(struct brw_context *brw,
-   const struct brw_stage_state *stage_state,
-   bool active, unsigned opcode);
+brw_upload_constant_state(struct brw_context *brw,
+  const struct brw_stage_state *stage_state,
+  bool active, unsigned opcode);
 
 #ifdef __cplusplus
 }
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index eb4c586..19568b0 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -48,7 +48,7 @@ gen6_upload_gs_push_constants(struct brw_context *brw)
}
 
if (brw->gen >= 7)
-  gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
+  brw_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
 }
 
 const struct brw_tracked_state gen6_gs_push_constants = {
diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c 
b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index 35d10ef..c33607d 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -140,8 +140,7 @@ gen6_upload_vs_push_constants(struct brw_context *brw)
   if (brw->gen == 7 && !brw->is_haswell && !brw->is_baytrail)
  gen7_emit_vs_workaround_flush(brw);
 
-  gen7_upload_constant_state(brw, stage_state, true /* active */,
- _3DSTATE_CONSTANT_VS);
+  brw_upload_constant_state(brw, stage_state, true, _3DSTATE_CONSTANT_VS);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index d1748ba..ced4ad6 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -50,8 +50,7 @@ gen6_upload_wm_push_constants(struct brw_context *brw)
   stage_state, AUB_TRACE_WM_CONSTANTS);
 
if (brw->gen >= 7) {
-  gen7_upload_constant_state(brw, &brw->wm.base, true,
- _3DSTATE_CONSTANT_PS);
+  brw_upload_constant_state(brw, &brw->wm.base, true, 
_3DSTATE_CONSTANT_PS);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 4b17d06..f8f0ad2 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -29,20 +29,24 @@
 #include "program/prog_statevars.h"
 #include "intel_batchbuffer.h"
 
-
-void
-gen7_upload_constant_state(struct brw_context *brw,
+static void
+gen8_upload_constant_state(struct brw_context *brw,
const struct brw_stage_state *stage_state,
bool active, unsigned opcode)
 {
-   uint32_t mocs = brw->gen < 8 ? GEN7_MOCS_L3 : 0;
 
-   /* Disable if the shader stage is inactive or there are no push constants. 
*/
-   active = active && stage_state->push_const_size != 0;
+   /* FINISHME: determine if we should use mocs on gen9 */
 
-   int dwords = brw->gen >= 8 ? 11 : 7;
-   BEGIN_BATCH(dwords);
-   OUT_BATCH(opcode << 16 | (dwords - 2));
+   BEGIN_BATCH(11);
+   OUT_BATCH(opcode << 16 | (11 - 2));
+
+   if (!active) {
+  for (int i = 0; i < 10; i++)
+

Re: [Mesa-dev] [PATCH] i965/cs: Initialize GPGPU Thread Count

2015-07-09 Thread Ben Widawsky

On Thu, Jul 02, 2015 at 11:32:03PM -0700, Jordan Justen wrote:
> On 2015-06-25 11:34:59, Ben Widawsky wrote:
> > On Thu, Jun 11, 2015 at 09:04:45PM -0700, Jordan Justen wrote:
> > > +   desc[dw++] = 0;
> > > +   const uint32_t media_threads =
> > > +  brw->gen >= 8 ?
> > > +  SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
> > > +  SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT);
> > > +   desc[dw++] = media_threads;
> > 
> > What's the deal with, "The maximum value for global barriers is limited by 
> > the
> > number of threads in the system, or by 511," Can we add an assert?
> 
> I guess we are using a local barrier, so "the maximum value is the
> number of threads in a subslice for local barriers".
> 
> How about I add assert(threads <= brw->max_cs_threads)? Although, in
> brw_compute:brw_emit_gpgpu_walker we have a similar assertion.
> 
> -Jordan


Seems fine either way. The original mail had my r-b
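
A sketch of what the proposed assert would guard, assuming the field layout from the quoted defines (bits 7:0 before gen8, bits 9:0 on gen8+). FIELD_MASK and pack_thread_count are local stand-ins written for this example, not the driver's INTEL_MASK/SET_FIELD macros, and max_cs_threads is passed in rather than read from brw_context.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a mask covering bits high..low (for fields narrower than 32 bits). */
#define FIELD_MASK(high, low) \
   ((((uint32_t)1 << ((high) - (low) + 1)) - 1) << (low))

#define THREAD_COUNT_SHIFT      0
#define GEN7_THREAD_COUNT_MASK  FIELD_MASK(7, 0)
#define GEN8_THREAD_COUNT_MASK  FIELD_MASK(9, 0)

/* Packs the thread count into its field, asserting that it both respects
 * the platform limit and fits in the field without truncation.
 */
static uint32_t
pack_thread_count(int gen, uint32_t threads, uint32_t max_cs_threads)
{
   const uint32_t mask =
      gen >= 8 ? GEN8_THREAD_COUNT_MASK : GEN7_THREAD_COUNT_MASK;

   assert(threads <= max_cs_threads);
   assert(((threads << THREAD_COUNT_SHIFT) & ~mask) == 0);

   return (threads << THREAD_COUNT_SHIFT) & mask;
}
```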

> 
> > >  
> > > BEGIN_BATCH(4);
> > > OUT_BATCH(MEDIA_INTERFACE_DESCRIPTOR_LOAD << 16 | (4 - 2));
> > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> > > b/src/mesa/drivers/dri/i965/brw_defines.h
> > > index f6da305..2a8f500 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > > @@ -2495,6 +2495,11 @@ enum brw_wm_barycentric_interp_mode {
> > >  # define MEDIA_VFE_STATE_CURBE_ALLOC_MASK   INTEL_MASK(15, 0)
> > >  
> > >  #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002
> > > +/* GEN7 DW5, GEN8+ DW6 */
> > > +# define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0
> > > +# define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
> > > +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
> > > +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(9, 0)
> > >  #define MEDIA_STATE_FLUSH   0x7004
> > >  #define GPGPU_WALKER0x7105
> > >  /* GEN8+ DW2 */
> > > -- 
> > > 2.1.4
> > > 
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] i965/cs: Setup push constant data for uniforms

2015-07-09 Thread Ben Widawsky

On Tue, Jun 16, 2015 at 02:21:39PM -0700, Jordan Justen wrote:
> brw_upload_cs_push_constants was based on gen6_upload_push_constants.

This review is based off of 2838833bfd5eb0a87fdacfa1cd6391b50f9c0b8b in your
repository. This patch doesn't apply cleanly in its current form.

> 
> Signed-off-by: Jordan Justen 
> ---
>  These 2 patches allow this piglit to pass:
>  
> tests/spec/arb_compute_shader/execution/basic-uniform-access-atomic.shader_test
>  (Also requires overriding the GL version and some extensions...)
> 
>  src/mesa/drivers/dri/i965/brw_context.h  |   2 +-
>  src/mesa/drivers/dri/i965/brw_cs.cpp | 119 
> ++-
>  src/mesa/drivers/dri/i965/brw_defines.h  |   6 ++
>  src/mesa/drivers/dri/i965/brw_state.h|   1 +
>  src/mesa/drivers/dri/i965/brw_state_upload.c |   2 +
>  5 files changed, 125 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 01c4283..9ea0dfd 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1457,7 +1457,7 @@ struct brw_context
>  
> int num_atoms[BRW_NUM_PIPELINES];
> const struct brw_tracked_state render_atoms[57];
> -   const struct brw_tracked_state compute_atoms[3];
> +   const struct brw_tracked_state compute_atoms[4];
>  
> /* If (INTEL_DEBUG & DEBUG_BATCH) */
> struct {
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp 
> b/src/mesa/drivers/dri/i965/brw_cs.cpp
> index 44c76ba..e26d576 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp
> @@ -320,6 +320,9 @@ brw_upload_cs_state(struct brw_context *brw)
>  
> prog_data->binding_table.size_bytes,
>  32, 
> &stage_state->bind_bo_offset);
>  
> +   unsigned push_constant_size =
> +  prog_data->nr_params * sizeof(gl_constant_value);
> +   unsigned reg_aligned_constant_size = ALIGN(push_constant_size, 32);

It seems 64-byte alignment is a requirement on BDW for the MEDIA_CURBE_LOAD.

> unsigned threads = get_cs_thread_count(cs_prog_data);
>  

Also, you either need to apply the workaround required (MI_ATOMIC), or stick in
an assert and deal with the workaround later.
/* FINISHME: WaSomeNameFillMeIn */
if (brw->gen >= 8)
assert (reg_aligned_constant_size * threads <= 4032);
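
The two comments above can be combined into a small standalone calculation. This is a sketch under assumptions: each parameter is taken to be 4 bytes, the 64-byte rounding is applied to the total length (one possible reading of the alignment requirement), and 4032 is simply the bound from the assert above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rounds value up to the next multiple of a power-of-two alignment. */
static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Total bytes loaded for all threads: per-thread data is padded to a
 * 32-byte register, then the total is padded to 64 bytes (assumption).
 */
static uint32_t
curbe_load_bytes(uint32_t nr_params, uint32_t threads)
{
   const uint32_t per_thread = align_up(nr_params * 4, 32);

   return align_up(per_thread * threads, 64);
}

/* Bound taken from the assert suggested in the review. */
static bool
curbe_fits_without_workaround(uint32_t total_bytes)
{
   return total_bytes <= 4032;
}
```

For example, 5 parameters and 3 threads give 32 bytes per thread, 96 bytes in total, and 128 bytes after rounding to 64.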

> uint32_t dwords = brw->gen < 8 ? 8 : 9;
> @@ -352,12 +355,24 @@ brw_upload_cs_state(struct brw_context *brw)
>  
> OUT_BATCH(0);
> const uint32_t vfe_urb_allocation = brw->gen >= 8 ? 2 : 0;
> -   OUT_BATCH(SET_FIELD(vfe_urb_allocation, MEDIA_VFE_STATE_URB_ALLOC));
> +   const uint32_t vfe_curbe_allocation =
> +  (reg_aligned_constant_size / 32) * threads + 32;

I can't make sense out of the additional 32 at the end. What is that for? If
it's the descriptor entries, this looks wrong for HSW (and beyond).

It seems like from the Indirect Payload Storage section of the docs (which is
specific to GEN8+, but has the legacy way), there is a way to specify cross
thread data. Maybe you can look into this some more so I don't have to.

> +   OUT_BATCH(SET_FIELD(vfe_urb_allocation, MEDIA_VFE_STATE_URB_ALLOC) |
> + SET_FIELD(vfe_curbe_allocation, MEDIA_VFE_STATE_CURBE_ALLOC));
> OUT_BATCH(0);
> OUT_BATCH(0);
> OUT_BATCH(0);
> ADVANCE_BATCH();
>  
> +   if (reg_aligned_constant_size > 0) {
> +  BEGIN_BATCH(4);
> +  OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
> +  OUT_BATCH(0);
> +  OUT_BATCH(reg_aligned_constant_size * threads);
> +  OUT_BATCH(stage_state->push_const_offset);
> +  ADVANCE_BATCH();
> +   }
> +
> /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
> memcpy(bind, stage_state->surf_offset,
>prog_data->binding_table.size_bytes);
> @@ -371,7 +386,8 @@ brw_upload_cs_state(struct brw_context *brw)
> desc[dw++] = 0;
> desc[dw++] = 0;
> desc[dw++] = stage_state->bind_bo_offset;
> -   desc[dw++] = 0;
> +   desc[dw++] = SET_FIELD((reg_aligned_constant_size / 32) + 0,
> +  MEDIA_CURBE_READ_LENGTH);
> const uint32_t media_threads =
>brw->gen >= 8 ?
>SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
> @@ -392,8 +408,103 @@ const struct brw_tracked_state brw_cs_state = {
> /* explicit initialisers aren't valid C++, comment
>  * them for documentation purposes */
> /* .dirty = */{
> -  /* .mesa = */ 0,
> -  /* .brw = */  BRW_NEW_CS_PROG_DATA,
> +  /* .mesa = */ _NEW_PROGRAM_CONSTANTS,
> +  /* .brw = */  BRW_NEW_CS_PROG_DATA |
> +BRW_NEW_PUSH_CONSTANT_ALLOCATION,
> },
> /* .emit = */ brw_upload_cs_state
>  };
> +
> +
> +/**
> + * Creates a region containing the push constants for the CS on gen7+.
> + *
> + * Push constants are constant values (such as GLSL uniforms) that are
> + * pre-loaded into a shader stage's register spac

Re: [Mesa-dev] [PATCH] [v3] i965: Split out gen8 push constant state upload

2015-07-10 Thread Ben Widawsky

On Fri, Jul 10, 2015 at 12:03:54PM -0700, Matt Turner wrote:
> On Thu, Jul 9, 2015 at 11:00 AM, Ben Widawsky
>  wrote:
> > While implementing the workaround in the previous patch I noticed things 
> > were
> > starting to get a bit messy. Since gen8 works differently enough from gen7, 
> > I
> > thought splitting it out would be good.
> >
> > While here, get rid of gen8 MOCS which does nothing and was in the wrong 
> > place
> > anyway.
> >
> > This patch is totally optional. I'd be willing to just always use buffer #2 
> > on
> > gen8+. Pre-HSW this wasn't allowed, but it looks like it's okay for gen8 
> > too.
> >
> > v2: Move inactive batch generation to the top of the function in order to 
> > make
> > the rest of the code easier to read.
> >
> > Jenkins results (still a bunch of spurious failures, I miss Mark):
> > http://otc-mesa-ci.jf.intel.com/job/bwidawsk/169/
> >
> > v3: v2 had a bug in that it both didn't emit the right number of dwords, 
> > and it
> > didn't do ADVANCE_BATCH(). I'm moderately worried that there were no 
> > failures as
> > a result.
> > http://otc-mesa-ci.jf.intel.com/job/bwidawsk/170/
> 
> I don't think putting Intel-internal links in the commit message is a good 
> idea.
> 
> Ken's made similar comments to me.
> 
> Also, so much off the wall commentary...

Maybe my definition of "off the wall" is different than yours. The only thing
off the wall to me, was the bit about missing Mark. It was *some* off the wall
commentary.

That aside though, I think the internal links are a good point and a thing to
discuss... I've had a couple of cases already where I or Neil benefited from
the Jenkins links being there to try to figure out some later regression. I can
sympathize with not having internal links in the history since it isn't
accessible to anyone. Earlier, I would have fought somewhat strongly for the
links, except that when Mark moved servers he didn't preserve the old links, so
that made me feel like it's a lot more transient than I initially felt.

However, I think it's really valuable for us to have them in the patches,
especially for review by some of the internal folks - like isn't it great to see
for yourself that I ran it? I suppose I can discard the URLs before pushing. The
cases I mentioned above would have benefited just as well having the links on
the list and not in the commit history (albeit a bit harder to find). Any
opposition to that?

*I do certainly think posting JIRA tasks is irrelevant and wrong unless the
entire contents of the JIRA entry are also pasted. Why I feel Jenkins results
are different is we at least know approximately what is contained at that link.

[Mesa-dev] [PATCH 3/5] i965/miptree: Separate special miptree mappings

2015-07-14 Thread Ben Widawsky

Several mappings require special handling (stencil, ETC textures, and depth).
Since I am attempting to clean up the logic which chooses the way in which we
map things, relegating this inflexible part to another part of the function
reduces complexity.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 1330c2f..b5cd6a0 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2693,12 +2693,17 @@ intel_miptree_map(struct brw_context *brw,
 
if (mt->format == MESA_FORMAT_S_UINT8) {
   intel_miptree_map_s8(brw, mt, map, level, slice);
+  goto done;
} else if (mt->etc_format != MESA_FORMAT_NONE &&
   !(mode & BRW_MAP_DIRECT_BIT)) {
   intel_miptree_map_etc(brw, mt, map, level, slice);
+  goto done;
} else if (mt->stencil_mt && !(mode & BRW_MAP_DIRECT_BIT)) {
   intel_miptree_map_depthstencil(brw, mt, map, level, slice);
-   } else if (use_intel_mipree_map_blit(brw, mt, mode, level, slice)) {
+  goto done;
+   }
+
+   if (use_intel_mipree_map_blit(brw, mt, mode, level, slice)) {
   intel_miptree_map_blit(brw, mt, map, level, slice);
 #if defined(USE_SSE41)
} else if (!(mode & GL_MAP_WRITE_BIT) &&
@@ -2710,6 +2715,7 @@ intel_miptree_map(struct brw_context *brw,
   intel_miptree_map_gtt(brw, mt, map, level, slice);
}
 
+done:
*out_ptr = map->ptr;
*out_stride = map->stride;
 
-- 
2.4.5


[Mesa-dev] [PATCH 2/5] i965/miptree: Cleanup some of the miptree map logic

2015-07-14 Thread Ben Widawsky

At the crux of this change is moving whether or not we can even use the hardware
blitter into the can_blit_slice check. Fundamentally this makes sense as
blitting a slice is a subset in functionality of being able to use the blitter
at all.

NOTE: I think it's bad practice to have the assert in a function that is
determining whether or not we should use the blitter, but I tried the
alternatives, and they look worse IMO.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_blit.c| 13 +
 src/mesa/drivers/dri/i965/intel_blit.h|  3 +++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 27 +--
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_blit.c 
b/src/mesa/drivers/dri/i965/intel_blit.c
index bc39053..c4701e3 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.c
+++ b/src/mesa/drivers/dri/i965/intel_blit.c
@@ -241,6 +241,19 @@ intel_miptree_blit_compatible_formats(mesa_format src, 
mesa_format dst)
return false;
 }
 
+bool
+intel_miptree_can_hw_blit(struct brw_context *brw, struct intel_mipmap_tree 
*mt)
+{
+   if (mt->compressed)
+  return false;
+
+   /* Prior to Sandybridge, the blitter can't handle Y tiling */
+   if (brw->gen < 6 && mt->tiling == I915_TILING_Y)
+  return false;
+
+   return true;
+}
+
 /**
  * Implements a rectangular block transfer (blit) of pixels between two
  * miptrees.
diff --git a/src/mesa/drivers/dri/i965/intel_blit.h 
b/src/mesa/drivers/dri/i965/intel_blit.h
index c3d19a5..e60dd9b 100644
--- a/src/mesa/drivers/dri/i965/intel_blit.h
+++ b/src/mesa/drivers/dri/i965/intel_blit.h
@@ -50,6 +50,9 @@ intelEmitCopyBlit(struct brw_context *brw,
 
 bool intel_miptree_blit_compatible_formats(mesa_format src, mesa_format dst);
 
+bool intel_miptree_can_hw_blit(struct brw_context *brw,
+   struct intel_mipmap_tree *mt);
+
 bool intel_miptree_blit(struct brw_context *brw,
 struct intel_mipmap_tree *src_mt,
 int src_level, int src_slice,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 72fba49..1330c2f 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2600,9 +2600,14 @@ intel_miptree_release_map(struct intel_mipmap_tree *mt,
 }
 
 static bool
-can_blit_slice(struct intel_mipmap_tree *mt,
+can_blit_slice(struct brw_context *brw,
+   struct intel_mipmap_tree *mt,
unsigned int level, unsigned int slice)
 {
+
+   if (!intel_miptree_can_hw_blit(brw, mt))
+  return false;
+
uint32_t image_x;
uint32_t image_y;
intel_miptree_get_image_offset(mt, level, slice, &image_x, &image_y);
@@ -2624,20 +2629,22 @@ use_intel_mipree_map_blit(struct brw_context *brw,
   unsigned int slice)
 {
if (brw->has_llc &&
-  /* It's probably not worth swapping to the blit ring because of
-   * all the overhead involved.
-   */
!(mode & GL_MAP_WRITE_BIT) &&
-   !mt->compressed &&
-   (mt->tiling == I915_TILING_X ||
-/* Prior to Sandybridge, the blitter can't handle Y tiling */
-(brw->gen >= 6 && mt->tiling == I915_TILING_Y)) &&
-   can_blit_slice(mt, level, slice))
+   can_blit_slice(brw, mt, level, slice))
   return true;
 
if (mt->tiling != I915_TILING_NONE &&
mt->bo->size >= brw->max_gtt_map_object_size) {
-  assert(can_blit_slice(mt, level, slice));
+  /* XXX: This assertion is actually the final condition for platforms
+   * without SSE4.1.  Returning false is not the right thing to do with
+   * the current code. On those platforms, the goal of this function is to 
give
+   * preference to the GTT, and at this point we've determined we cannot 
use
+   * the GTT, and we cannot blit, so we are out of options.
+   *
+   * NOTE: It should be possible to actually handle the case, but AFAIK, we
+   * never get this assertion.
+   */
+  assert(can_blit_slice(brw, mt, level, slice));
   return true;
}
 
-- 
2.4.5
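
The layering this patch describes (a per-slice check that first asks whether the blitter can be used on the miptree at all) can be sketched in isolation. The struct types below are simplified stand-ins, and the 32768 offset limit is a placeholder assumption used only for illustration; neither is taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types; the real brw_context and intel_mipmap_tree carry far more. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

struct ctx {
   int gen;
};

struct miptree {
   bool compressed;
   enum tiling tiling;
   uint32_t image_x, image_y;   /* slice offset within the miptree */
};

/* Can the blitter be used on this miptree at all? */
static inline bool can_hw_blit(const struct ctx *c, const struct miptree *mt)
{
   if (mt->compressed)
      return false;

   /* Prior to Sandybridge, the blitter can't handle Y tiling */
   if (c->gen < 6 && mt->tiling == TILING_Y)
      return false;

   return true;
}

/* Blitting one slice is only possible if blitting is possible at all. */
static inline bool can_blit_slice(const struct ctx *c,
                                  const struct miptree *mt)
{
   if (!can_hw_blit(c, mt))
      return false;

   /* Placeholder offset limit, assumed for this sketch. */
   return mt->image_x < 32768 && mt->image_y < 32768;
}
```

One detail visible in the diff itself: the removed inline condition accepted only X tiling (or Y tiling on gen6+), while the new helper as written also returns true for untiled miptrees. Whether that is intended is not stated in the patch.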


[Mesa-dev] [PATCH 1/5] i965: Push miptree tiling request into flags

2015-07-14 Thread Ben Widawsky

With the last few patches a way was provided to influence lower layer miptree
layout and allocation decisions via flags (replacing bools). For simplicity, I
chose not to touch the tiling requests because the change was slightly less
mechanical than replacing the bools.

The goal is to organize the code so we can continue to add new parameters and
tiling types while minimizing risk to the existing code, and not having to
constantly add new function parameters.

v2: Rebased on Anuj's recent Yf/Ys changes
Fix non-msrt MCS allocation (was only happening in gen8 case before)

Cc: Anuj Phogat 
Cc: Chad Versace 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 21 ++--
 src/mesa/drivers/dri/i965/intel_fbo.c  |  6 ++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 45 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h  | 15 -
 src/mesa/drivers/dri/i965/intel_tex.c  |  2 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c|  3 +-
 src/mesa/drivers/dri/i965/intel_tex_validate.c |  5 +--
 7 files changed, 50 insertions(+), 47 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 389834f..a12b4af 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -614,8 +614,8 @@ brw_miptree_layout_texture_3d(struct brw_context *brw,
  */
 static uint32_t
 brw_miptree_choose_tiling(struct brw_context *brw,
-  enum intel_miptree_tiling_mode requested,
-  const struct intel_mipmap_tree *mt)
+  const struct intel_mipmap_tree *mt,
+  uint32_t layout_flags)
 {
if (mt->format == MESA_FORMAT_S_UINT8) {
   /* The stencil buffer is W tiled. However, we request from the kernel a
@@ -624,15 +624,18 @@ brw_miptree_choose_tiling(struct brw_context *brw,
   return I915_TILING_NONE;
}
 
+   /* Do not support changing the tiling for miptrees with pre-allocated BOs. 
*/
+   assert((layout_flags & MIPTREE_LAYOUT_FOR_BO) == 0);
+
/* Some usages may want only one type of tiling, like depth miptrees (Y
 * tiled), or temporary BOs for uploading data once (linear).
 */
-   switch (requested) {
-   case INTEL_MIPTREE_TILING_ANY:
+   switch (layout_flags & MIPTREE_LAYOUT_ALLOC_ANY_TILED) {
+   case MIPTREE_LAYOUT_ALLOC_ANY_TILED:
   break;
-   case INTEL_MIPTREE_TILING_Y:
+   case MIPTREE_LAYOUT_ALLOC_YTILED:
   return I915_TILING_Y;
-   case INTEL_MIPTREE_TILING_NONE:
+   case MIPTREE_LAYOUT_ALLOC_LINEAR:
   return I915_TILING_NONE;
}
 
@@ -835,7 +838,6 @@ intel_miptree_can_use_tr_mode(const struct 
intel_mipmap_tree *mt)
 void
 brw_miptree_layout(struct brw_context *brw,
struct intel_mipmap_tree *mt,
-   enum intel_miptree_tiling_mode requested,
uint32_t layout_flags)
 {
const unsigned bpp = mt->cpp * 8;
@@ -852,8 +854,7 @@ brw_miptree_layout(struct brw_context *brw,
   !(layout_flags & MIPTREE_LAYOUT_FOR_BO) &&
   !mt->compressed &&
   _mesa_is_format_color_format(mt->format) &&
-  (requested == INTEL_MIPTREE_TILING_Y ||
-   requested == INTEL_MIPTREE_TILING_ANY) &&
+  (layout_flags & MIPTREE_LAYOUT_ALLOC_YTILED) &&
   (bpp && is_power_of_two(bpp)) &&
   /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling
* disabled at the moment.
@@ -897,7 +898,7 @@ brw_miptree_layout(struct brw_context *brw,
   if (layout_flags & MIPTREE_LAYOUT_FOR_BO)
  break;
 
-  mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);
+  mt->tiling = brw_miptree_choose_tiling(brw, mt, layout_flags);
   if (is_tr_mode_yf_ys_allowed) {
  if (intel_miptree_can_use_tr_mode(mt))
 break;
diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c 
b/src/mesa/drivers/dri/i965/intel_fbo.c
index 05e3f8b..26f895b 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -1022,6 +1022,9 @@ intel_renderbuffer_move_to_temp(struct brw_context *brw,
struct intel_mipmap_tree *new_mt;
int width, height, depth;
 
+   uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD |
+   MIPTREE_LAYOUT_ALLOC_ANY_TILED;
+
intel_miptree_get_dimensions_for_image(rb->TexImage, &width, &height, 
&depth);
 
new_mt = intel_miptree_create(brw, rb->TexImage->TexObject->Target,
@@ -1030,8 +1033,7 @@ intel_renderbuffer_move_to_temp(struct brw_context *brw,
  intel_image->base.Base.Level,
  width, height, depth,
  irb->mt->num_samples,
- INTEL_MIPTREE_TILING_ANY,
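
The switch in brw_miptree_choose_tiling() above only works if the "any tiled" flag is the union of the Y-tiled and linear bits, so that masking with it isolates the tiling request. A minimal sketch of that encoding follows. The bit positions and the shortened names are hypothetical; only the relationship between the three values is inferred from the diff.

```c
#include <stdint.h>

/* Hypothetical bit values; only the relationship between them matters:
 * "any tiled" is the union of the Y-tiled and linear bits. */
enum {
   LAYOUT_ALLOC_YTILED    = 1u << 3,
   LAYOUT_ALLOC_LINEAR    = 1u << 4,
   LAYOUT_ALLOC_ANY_TILED = LAYOUT_ALLOC_YTILED | LAYOUT_ALLOC_LINEAR,
};

enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Returns the forced tiling if the flags request one, otherwise the
 * caller-supplied heuristic choice. */
static inline enum tiling choose_tiling(uint32_t layout_flags,
                                        enum tiling heuristic_choice)
{
   switch (layout_flags & LAYOUT_ALLOC_ANY_TILED) {
   case LAYOUT_ALLOC_YTILED:
      return TILING_Y;
   case LAYOUT_ALLOC_LINEAR:
      return TILING_NONE;
   case LAYOUT_ALLOC_ANY_TILED:
   default:
      return heuristic_choice;
   }
}
```

Under this encoding, a test such as (layout_flags & LAYOUT_ALLOC_YTILED) is true for both the "Y" and the "any" request, which matches the condition that replaced the two enum comparisons in brw_miptree_layout().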
-

[Mesa-dev] [PATCH 4/5] i965/miptree: Shortcircuit writable & compressed miptrees

2015-07-14 Thread Ben Widawsky

If I am reading the code correctly, writable mappings and mappings for
compressed miptrees will always end up calling map_gtt.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index b5cd6a0..2788270 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2701,6 +2701,9 @@ intel_miptree_map(struct brw_context *brw,
} else if (mt->stencil_mt && !(mode & BRW_MAP_DIRECT_BIT)) {
   intel_miptree_map_depthstencil(brw, mt, map, level, slice);
   goto done;
+   } else if (mode & GL_MAP_WRITE_BIT || mt->compressed) {
+  intel_miptree_map_gtt(brw, mt, map, level, slice);
+  goto done;
}
 
if (use_intel_mipree_map_blit(brw, mt, mode, level, slice)) {
@@ -2712,6 +2715,8 @@ intel_miptree_map(struct brw_context *brw,
   intel_miptree_map_movntdqa(brw, mt, map, level, slice);
 #endif
} else {
+  assert(mode & GL_MAP_WRITE_BIT == 0);
+  assert(!mt->compressed);
   intel_miptree_map_gtt(brw, mt, map, level, slice);
}
 
-- 
2.4.5


[Mesa-dev] [PATCH 0/5] [RFCish] Rework the miptree mapping logic

2015-07-14 Thread Ben Widawsky

A few patches which cleanup the miptree mapping logic. I haven't heavily tested
them yet, which is why I added the RFC label (admittedly I totally ignored the
unmap path, I hope it just works, but I'll find out). In particular, the last
patch will have a much fuller list of tests and platforms before I'd push. I
wanted to check if people were generally okay with this before I spent more time
since the one time I posted a smaller cleanup in this area, I met with some
opposition.

The first patch is a resend with a non-trivial rebase after Anuj landed the
Yf/Ys initial support. It's somewhat unrelated, but I really like the patch, and
it goes with the general cleanup mantra.

The motivation of the patch series is actually trying to test out my new-blits
patches (the series that enables Y-tiling and blit capability for large texture
arrays). I was trying to force different mapping types based on various
attributes of the miptree, but found it almost impossible to predict what the
existing logic would actually choose. The goal was both to improve the
clarity of the code and to make it easier to experiment with other paths.

I've written a similar patch series about 3 times now and I figured I should
finally send it out, since most people I asked think it's a good idea (and none
of them has said it's a bad idea).


Ccing all the people that I've discussed this with...
Cc: Jason Ekstrand 
Cc: Chad Versace 
Cc: Kenneth Graunke 
Cc: Anuj Phogat 

Ben Widawsky (5):
  i965: Push miptree tiling request into flags
  i965/miptree: Cleanup some of the miptree map logic
  i965/miptree: Separate special miptree mappings
  i965/miptree: Shortcircuit writable & compressed miptrees
  i965/miptree: Rewrite the miptree map logic

 src/mesa/drivers/dri/i965/brw_tex_layout.c |  21 ++--
 src/mesa/drivers/dri/i965/intel_blit.c |  13 +++
 src/mesa/drivers/dri/i965/intel_blit.h |   3 +
 src/mesa/drivers/dri/i965/intel_fbo.c  |   6 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 133 ++---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h  |  15 ++-
 src/mesa/drivers/dri/i965/intel_tex.c  |   2 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c|   3 +-
 src/mesa/drivers/dri/i965/intel_tex_validate.c |   5 +-
 9 files changed, 118 insertions(+), 83 deletions(-)

-- 
2.4.5


[Mesa-dev] [PATCH 5/5] i965/miptree: Rewrite the miptree map logic

2015-07-14 Thread Ben Widawsky

This patch rewrites the logic for determining which method we use for mapping
a miptree. It is my intention that this patch, and the required patches before
it, do not change functionality, or if they do, it's in very obscure and
unobservable cases.

I have two reasons why I decided to write this patch. The existing logic was way
too tricky. In particular, the way in which it evaluated which operation to use
was out of order - specifically when it checked to use the blitter in
use_intel_mipree_map_blit(), part of the check is to determine if it will later
be unable to use the GTT. The other reason is to make playing with the various
operations much easier. For example, there are some theories being thrown around
that we might actually want to use the blitter where we use the GTT today, and
vice versa. After this patch, benchmarking those changes is much more
straightforward.

It's pretty difficult for me to prove there is no real change going on. I ran a
subset of my benchmarks on this though. The following benchmarks show no perf
difference on BDW with ministat with n=5 and CI=.95:
OglBatch7
OglDeferred
OglFillPixel
OglGeomPoint
OglGeomTriList
OglHdrBloom
OglPSBump2
OglPSPhong
OglPSPom
OglShMapPcf
OglTerrainFlyInst
OglTexMem512
OglVSDiffuse8
OglVSInstancing
OglZBuffer
plot3d
trex

It's important to point out that many of the changes affect non-LLC platforms,
and I do not yet have data for those. I'll be collecting it over the next few
days, but I figure this patch can get some comments meanwhile.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 76 +--
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 2788270..545fbf3 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2283,6 +2283,8 @@ intel_miptree_unmap_movntdqa(struct brw_context *brw,
map->buffer = NULL;
map->ptr = NULL;
 }
+#else
+#define intel_miptree_map_movntdqa(x,y,z,w,a) abort()
 #endif
 
 static void
@@ -2621,36 +2623,6 @@ can_blit_slice(struct brw_context *brw,
return true;
 }
 
-static bool
-use_intel_mipree_map_blit(struct brw_context *brw,
-  struct intel_mipmap_tree *mt,
-  GLbitfield mode,
-  unsigned int level,
-  unsigned int slice)
-{
-   if (brw->has_llc &&
-   !(mode & GL_MAP_WRITE_BIT) &&
-   can_blit_slice(brw, mt, level, slice))
-  return true;
-
-   if (mt->tiling != I915_TILING_NONE &&
-   mt->bo->size >= brw->max_gtt_map_object_size) {
-  /* XXX: This assertion is actually the final condition for platforms
-   * without SSE4.1.  Returning false is not the right thing to do with
-   * the current code. On those platforms, the goal of this function is to 
give
-   * preference to the GTT, and at this point we've determined we cannot 
use
-   * the GTT, and we cannot blit, so we are out of options.
-   *
-   * NOTE: It should be possible to actually handle the case, but AFAIK, we
-   * never get this assertion.
-   */
-  assert(can_blit_slice(brw, mt, level, slice));
-  return true;
-   }
-
-   return false;
-}
-
 /**
  * Parameter \a out_stride has type ptrdiff_t not because the buffer stride may
  * exceed 32 bits but to diminish the likelihood subtle bugs in pointer
@@ -2706,18 +2678,44 @@ intel_miptree_map(struct brw_context *brw,
   goto done;
}
 
-   if (use_intel_mipree_map_blit(brw, mt, mode, level, slice)) {
-  intel_miptree_map_blit(brw, mt, map, level, slice);
+   /* First determine what the available option are, then pick from the best
+* option based on the platform.
+*/
+   bool can_hw_blit = can_blit_slice(brw, mt, level, slice);
+   bool can_use_gtt = mt->bo->size < brw->max_gtt_map_object_size;
 #if defined(USE_SSE41)
-   } else if (!(mode & GL_MAP_WRITE_BIT) &&
-  !mt->compressed && cpu_has_sse4_1 &&
-  (mt->pitch % 16 == 0)) {
-  intel_miptree_map_movntdqa(brw, mt, map, level, slice);
+   bool can_stream_map = cpu_has_sse4_1 && mt->pitch % 16 == 0;
+#else
+   bool can_stream_map = false;
 #endif
-   } else {
-  assert(mode & GL_MAP_WRITE_BIT == 0);
-  assert(!mt->compressed);
+
+   if (can_stream_map) {
+  /* BENCHMARK_ME: GTT maps for non-llc */
+  intel_miptree_map_movntdqa(brw, mt, map, level, slice);
+  goto done;
+   }
+
+   /*
+* Hopefully we've been able to use the streaming copy, but if we really
+* can't, make a decision based on the two things we know to matter: tiling,
+* and LLC.
+*
+* The general thinking is that with shared cache, doing software detiling
+* is

Re: [Mesa-dev] [PATCH 02/10] i965: Reduce the scope of input in buffer tex setup

2015-07-14 Thread Ben Widawsky

On Wed, Jul 01, 2015 at 02:46:32PM +0300, Topi Pohjolainen wrote:
> Signed-off-by: Topi Pohjolainen 

I swear I am not trying to nitpick, but I don't actually understand what the
goal of the patch is. Could you maybe elaborate a bit on what "reduce the scope
of input in..." means?

> ---
>  src/mesa/drivers/dri/i965/brw_context.h   | 4 ++--
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 8 +++-
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 2 +-
>  src/mesa/drivers/dri/i965/gen8_surface_state.c| 2 +-
>  4 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index ae29798..da018bf 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1710,8 +1710,8 @@ void brw_create_constant_surface(struct brw_context 
> *brw,
>   uint32_t size,
>   uint32_t *out_offset,
>   bool dword_pitch);
> -void brw_update_buffer_texture_surface(struct gl_context *ctx,
> -   unsigned unit,
> +void brw_update_buffer_texture_surface(struct brw_context *brw,
> +   struct gl_texture_object *tObj,
> uint32_t *surf_offset);
>  void
>  brw_update_sol_surface(struct brw_context *brw,
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 72aad96..73aa719 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -276,12 +276,10 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
>  }
>  
>  void
> -brw_update_buffer_texture_surface(struct gl_context *ctx,
> -  unsigned unit,
> +brw_update_buffer_texture_surface(struct brw_context *brw,
> +  struct gl_texture_object *tObj,
>uint32_t *surf_offset)
>  {
> -   struct brw_context *brw = brw_context(ctx);
> -   struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current;
> struct intel_buffer_object *intel_obj =
>intel_buffer_object(tObj->BufferObject);
> uint32_t size = tObj->BufferSize;
> @@ -323,7 +321,7 @@ brw_update_texture_surface(struct gl_context *ctx,
>  
> /* BRW_NEW_TEXTURE_BUFFER */
> if (tObj->Target == GL_TEXTURE_BUFFER) {
> -  brw_update_buffer_texture_surface(ctx, unit, surf_offset);
> +  brw_update_buffer_texture_surface(brw, tObj, surf_offset);
>return;
> }
>  
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 494bc22..6aa8299 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -357,7 +357,7 @@ gen7_update_texture_surface(struct gl_context *ctx,
> struct gl_texture_object *obj = ctx->Texture.Unit[unit]._Current;
>  
> if (obj->Target == GL_TEXTURE_BUFFER) {
> -  brw_update_buffer_texture_surface(ctx, unit, surf_offset);
> +  brw_update_buffer_texture_surface(brw, obj, surf_offset);
>  
> } else {
>struct intel_texture_object *intel_obj = intel_texture_object(obj);
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index c595ec3..11defd1 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -308,7 +308,7 @@ gen8_update_texture_surface(struct gl_context *ctx,
> struct gl_texture_object *obj = ctx->Texture.Unit[unit]._Current;
>  
> if (obj->Target == GL_TEXTURE_BUFFER) {
> -  brw_update_buffer_texture_surface(ctx, unit, surf_offset);
> +  brw_update_buffer_texture_surface(brw, obj, surf_offset);
>  
> } else {
>struct gl_texture_image *firstImage = obj->Image[0][obj->BaseLevel];
> -- 
> 1.9.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/5] [RFCish] Rework the miptree mapping logic

2015-07-14 Thread Ben Widawsky

On Tue, Jul 14, 2015 at 09:56:08AM -0700, Ben Widawsky wrote:
> A few patches which cleanup the miptree mapping logic. I haven't heavily 
> tested
> them yet, which is why I added the RFC label (admittedly I totally ignored the
> unmap path, I hope it just works, but I'll find out). In particular, the last
> patch will have a much full list of tests and platforms before I'd push. I
> wanted to check if people were generally okay with this before I spent more 
> time
> since the one time I posted a smaller cleanup in this area, I met with some
> opposition.
> 

FWIW, it did well on the piglit run, which for me does still exclude the
different (non-LLC) platforms

http://otc-mesa-ci.jf.intel.com/job/bwidawsk/185/

[snip]


Re: [Mesa-dev] Register spilling issues in the NIR->vec4 backend

2015-07-15 Thread Ben Widawsky

On Wed, Jul 15, 2015 at 11:02:03AM -0700, Connor Abbott wrote:
> On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral  wrote:
> > Hi,
> >
> > when we sent the patches for the new nir->vec4 backend we mentioned that
> > we had a few dEQP tests that would fail to link because of register
> > spilling. Now that we have added GS support we see a few instances of
> > this problem popping up in a few GS piglit tests too, for example this
> > one:
> >
> > tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test
> >
> > I have been looking into what is going on with these tests and I came to
> > the conclusion that the problem is a consequence of various factors, but
> > probably the main thing contributing to it is the way our SSA pass
> > works. That said, I am not that experienced with NIR, so it could also
> > be that my analysis is missing something and I am just arriving to wrong
> > conclusions, so I'll explain my thoughts below and hopefully someone
> > else with more NIR experience can jump in and confirm or reject my
> > analysis.
> >
> > The GS code in that test looks like this:
> >
> > for (int p = 0; p < 3; p++) {
> >color = ((index >= ins[p].m1.length() ?
> > ins[p].m2[index-ins[p].m1.length()] :
> > ins[p].m1[index]) == expect) ?
> >vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
> >gl_Position = gl_in[p].gl_Position;
> >EmitVertex();
> > }
> >
> > One thing that is immediately contributing to the register pressure is
> > some really awful code generated because of the indirect array indexing
> > on the inputs inside the loop. This is because of the
> > lower_variable_index_to_cond_assign lowering pass called from
> > brw_shader.cpp. This pass will convert that color assignment into a
> > bunch of nested if/else statements which makes the generated GLSL IR
> > code rather large, involving plenty of temporaries too. This is only
> > made worse by the fact that loop unrolling will replicate that 3 times.
> > The result is a huge pile of GLSL IR with a few dozens of nested if/else
> > statements and temporaries that looks like [1] (that is only a fragment
> > of the GLSL IR).
> >
> > One thing that is particularly relevant in that code is that it has
> > multiple conditional assignments to the same variable
> > (dereference_array_value) as a consequence of this lowering pass.
> >
> > That much, however, is common to the NIR and non-NIR paths. The problem
> > in the NIR case is that all these assignments generate new SSA values,
> > which then become new registers in the final NIR form. This leads to NIR
> > code like [2].  In contrast, the old vec4 visitor path, is able to have
> > writes to the same variable write to the same register.
> >
> > As a result, if I print the code right before register allocation in the
> > NIR path [3] and I compare that to what we get with the old vec4 visitor
> > path at that same point [4], it is clearly visible that this difference
> > is allowing the vec4 visitor path to reduce register pressure (see how
> > in [4] we have multiple writes to vgrf5, while in [3] we always write to
> > a new vgrf every time).
> >
> > So, am I missing something or is this kind of result expected with NIR
> > programs? Is there anything in the nir->vec4 pass that we can do to fix
> > this or does this need to be fixed when going out of SSA mode inside NIR?
> >
> > Iago
> >
> > [1] http://pastebin.com/5uA8ex2S
> > [2] http://pastebin.com/pqLfvAVN
> > [3] http://pastebin.com/64nSuUH8
> > [4] http://pastebin.com/WCrdYxzt
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> Hi Iago,
> 
> Indeed, NIR does convert conditional writes to conditional selects --
> it's a required part of the conversion to SSA, and since our HW has a
> conditional select instruction that's just as fast as doing a
> conditional move, we haven't bothered much to try and change it back
> during out-of-SSA. However, doing this shouldn't make things worse. In
> your example, vgrf9, vgrf15, and vgrf17 all have very short live
> intervals and don't interfere with vgrf11 (unless there's another use
> of them somewhere after the snippet you pasted), which means that the
> register allocator is free to allocate the destinations of all the
> selects to the same register.
> 
> What's happening, though, is that you're running into our terrible
> liveness analysis. After doing the proper liveness analysis, we figure
> out the place each register first becomes live and last becomes dead,
> and then we consider registers that have overlapping ranges to
> interfere. So we consider vgrf11 to interfere with vgrf15 and vgrf17,
> even though it really doesn't. The trouble with making it do the right
> thing is that we may actually need to extend the live ranges of
> registers when the exec masks don't match up, either because one

Re: [Mesa-dev] [PATCH 5/5] i965/miptree: Rewrite the miptree map logic

2015-07-17 Thread Ben Widawsky

On Thu, Jul 16, 2015 at 01:45:56PM -0700, Chad Versace wrote:
> On Tue 14 Jul 2015, Ben Widawsky wrote:
> > > This patch rewrites the logic for determining which method we use for
> > > mapping a miptree. It is my intention that this patch, and the required
> > > patches before this, do not change functionality, or if they do, it's in
> > > very obscure and unobservable cases.
> > > 
> > > I have two reasons why I decided to write this patch. The existing logic
> > > was way too tricky. In particular, the way in which it evaluated which
> > > operation to use was out of order - specifically when it checked to use
> > > the blitter in use_intel_mipree_map_blit(), part of the check is to
> > > determine if it will later be unable to use the GTT. The other reason is
> > > to make playing with the various operations much easier. For example,
> > > there are some theories being thrown around that we might actually want
> > > to use the blitter where we use the GTT today, and vice versa. After this
> > > patch, benchmarking those changes is much more straightforward.
> > > 
> > > It's pretty difficult for me to prove there is no real change going on. I
> > > ran a subset of my benchmarks on this though. The following benchmarks
> > > show no perf difference on BDW with ministat with n=5 and CI=.95:
> > OglBatch7
> > OglDeferred
> > OglFillPixel
> > OglGeomPoint
> > OglGeomTriList
> > OglHdrBloom
> > OglPSBump2
> > OglPSPhong
> > OglPSPom
> > OglShMapPcf
> > OglTerrainFlyInst
> > OglTexMem512
> > OglVSDiffuse8
> > OglVSInstancing
> > OglZBuffer
> > plot3d
> > trex
> > 
> > > It's important to point out that much of the changes effect non-LLC
> > > platform, and I do not yet have data for that. I'll be collecting it over
> > > the next few days, but I figure this patch can get some comments meanwhile.
> > 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 76 
> > +--
> >  1 file changed, 37 insertions(+), 39 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > index 2788270..545fbf3 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > @@ -2283,6 +2283,8 @@ intel_miptree_unmap_movntdqa(struct brw_context *brw,
> > map->buffer = NULL;
> > map->ptr = NULL;
> >  }
> > +#else
> > +#define intel_miptree_map_movntdqa(x,y,z,w,a) abort()
> >  #endif
> >  
> >  static void
> > @@ -2621,36 +2623,6 @@ can_blit_slice(struct brw_context *brw,
> > return true;
> >  }
> >  
> > -static bool
> > -use_intel_mipree_map_blit(struct brw_context *brw,
> > -  struct intel_mipmap_tree *mt,
> > -  GLbitfield mode,
> > -  unsigned int level,
> > -  unsigned int slice)
> > -{
> > -   if (brw->has_llc &&
> > -   !(mode & GL_MAP_WRITE_BIT) &&
> > -   can_blit_slice(brw, mt, level, slice))
> > -  return true;
> > -
> > -   if (mt->tiling != I915_TILING_NONE &&
> > -   mt->bo->size >= brw->max_gtt_map_object_size) {
> > -  /* XXX: This assertion is actually the final condition for platforms
> > -   * without SSE4.1.  Returning false is not the right thing to do with
> > -   * the current code. On those platforms, the goal of this function 
> > is to give
> > -   * preference to the GTT, and at this point we've determined we 
> > cannot use
> > -   * the GTT, and we cannot blit, so we are out of options.
> > -   *
> > -   * NOTE: It should be possible to actually handle the case, but 
> > AFAIK, we
> > -   * never get this assertion.
> > -   */
> > -  assert(can_blit_slice(brw, mt, level, slice));
> > -  return true;
> > -   }
> > -
> > -   return false;
> > -}
> > -
> >  /**
> >   * Parameter \a out_stride has type ptrdiff_t not because the buffer 
> > stride may
> >   * exceed 32 bits but to diminish the likelihood subtle bugs in pointer
> > @@ -2706,18 +2678,44 @@ intel_miptree_map(struct brw_context *brw,
> >goto done;
> > }
> >  
> > -   if (use_intel_mipree_map_b

Re: [Mesa-dev] [PATCH 5/5] i965/miptree: Rewrite the miptree map logic

2015-07-17 Thread Ben Widawsky

On Thu, Jul 16, 2015 at 03:06:48PM -0700, Matt Turner wrote:
> On Tue, Jul 14, 2015 at 9:56 AM, Ben Widawsky
>  wrote:
> > This patch rewrites the logic for determining which method we use for
> > mapping a miptree. It is my intention that this patch, and the required
> > patches before this, do not change functionality, or if they do, it's in
> > very obscure and unobservable cases.
> >
> > I have two reasons why I decided to write this patch. The existing logic
> > was way too tricky. In particular, the way in which it evaluated which
> > operation to use was out of order - specifically when it checked to use
> > the blitter in use_intel_mipree_map_blit(), part of the check is to
> > determine if it will later be unable to use the GTT. The other reason is
> > to make playing with the various operations much easier. For example,
> > there are some theories being thrown around that we might actually want
> > to use the blitter where we use the GTT today, and vice versa. After this
> > patch, benchmarking those changes is much more straightforward.
> >
> > It's pretty difficult for me to prove there is no real change going on. I
> > ran a subset of my benchmarks on this though. The following benchmarks
> > show no perf difference on BDW with ministat with n=5 and CI=.95:
> > OglBatch7
> > OglDeferred
> > OglFillPixel
> > OglGeomPoint
> > OglGeomTriList
> > OglHdrBloom
> > OglPSBump2
> > OglPSPhong
> > OglPSPom
> > OglShMapPcf
> > OglTerrainFlyInst
> > OglTexMem512
> > OglVSDiffuse8
> > OglVSInstancing
> > OglZBuffer
> > plot3d
> > trex
> >
> > It's important to point out that much of the changes effect non-LLC 
> > platform,
> 
> s/effect/affect/
> 
> > and I do not yet have data for that. I'll be collecting it over the next few
> > days, but I figure this patch can get some comments meanwhile.
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 76 
> > +--
> >  1 file changed, 37 insertions(+), 39 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > index 2788270..545fbf3 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > @@ -2283,6 +2283,8 @@ intel_miptree_unmap_movntdqa(struct brw_context *brw,
> > map->buffer = NULL;
> > map->ptr = NULL;
> >  }
> > +#else
> > +#define intel_miptree_map_movntdqa(x,y,z,w,a) abort()
> 
> Yuck.
> 
> >  #endif
> >
> >  static void
> > @@ -2621,36 +2623,6 @@ can_blit_slice(struct brw_context *brw,
> > return true;
> >  }
> >
> > -static bool
> > -use_intel_mipree_map_blit(struct brw_context *brw,
> > -  struct intel_mipmap_tree *mt,
> > -  GLbitfield mode,
> > -  unsigned int level,
> > -  unsigned int slice)
> > -{
> > -   if (brw->has_llc &&
> > -   !(mode & GL_MAP_WRITE_BIT) &&
> > -   can_blit_slice(brw, mt, level, slice))
> > -  return true;
> > -
> > -   if (mt->tiling != I915_TILING_NONE &&
> > -   mt->bo->size >= brw->max_gtt_map_object_size) {
> > -  /* XXX: This assertion is actually the final condition for platforms
> > -   * without SSE4.1.  Returning false is not the right thing to do with
> > -   * the current code. On those platforms, the goal of this function 
> > is to give
> > -   * preference to the GTT, and at this point we've determined we 
> > cannot use
> > -   * the GTT, and we cannot blit, so we are out of options.
> > -   *
> > -   * NOTE: It should be possible to actually handle the case, but 
> > AFAIK, we
> > -   * never get this assertion.
> > -   */
> > -  assert(can_blit_slice(brw, mt, level, slice));
> > -  return true;
> > -   }
> > -
> > -   return false;
> > -}
> > -
> >  /**
> >   * Parameter \a out_stride has type ptrdiff_t not because the buffer 
> > stride may
> >   * exceed 32 bits but to diminish the likelihood subtle bugs in pointer
> > @@ -2706,18 +2678,44 @@ intel_miptree_map(struct brw_context *brw,
> >goto done;

Re: [Mesa-dev] Register spilling issues in the NIR->vec4 backend

2015-07-20 Thread Ben Widawsky

On Mon, Jul 20, 2015 at 03:35:26PM +0200, Iago Toral wrote:
> Hi,
> On Thu, 2015-07-16 at 08:15 -0700, Jason Ekstrand wrote:
> > 
> > On Jul 15, 2015 11:20 PM, "Iago Toral"  wrote:
> > >
> > > On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote:
> > > > On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral 
> > wrote:
> > > > > Hi,
> > > > >
> > > > > when we sent the patches for the new nir->vec4 backend we
> > mentioned that
> > > > > we had a few dEQP tests that would fail to link because of
> > register
> > > > > spilling. Now that we have added GS support we see a few
> > instances of
> > > > > this problem popping up in a few GS piglit tests too, for
> > example this
> > > > > one:
> > > > >
> > > > >
> > tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test
> > > > >
> > > > > I have been looking into what is going on with these tests and I
> > came to
> > > > > the conclusion that the problem is a consequence of various
> > factors, but
> > > > > probably the main thing contributing to it is the way our SSA
> > pass
> > > > > works. That said, I am not that experienced with NIR, so it
> > could also
> > > > > be that my analysis is missing something and I am just arriving
> > to wrong
> > > > > conclusions, so I'll explain my thoughts below and hopefully
> > someone
> > > > > else with more NIR experience can jump in and confirm or reject
> > my
> > > > > analysis.
> > > > >
> > > > > The GS code in that test looks like this:
> > > > >
> > > > > for (int p = 0; p < 3; p++) {
> > > > >color = ((index >= ins[p].m1.length() ?
> > > > > ins[p].m2[index-ins[p].m1.length()] :
> > > > > ins[p].m1[index]) == expect) ?
> > > > >vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0,
> > 1.0);
> > > > >gl_Position = gl_in[p].gl_Position;
> > > > >EmitVertex();
> > > > > }
> > > > >
> > > > > One thing that is immediately contributing to the register
> > pressure is
> > > > > some really awful code generated because of the indirect array
> > indexing
> > > > > on the inputs inside the loop. This is because of the
> > > > > lower_variable_index_to_cond_assign lowering pass called from
> > > > > brw_shader.cpp. This pass will convert that color assignment
> > into a
> > > > > bunch of nested if/else statements which makes the generated
> > GLSL IR
> > > > > code rather large, involving plenty of temporaries too. This is
> > only
> > > > > made worse by the fact that loop unrolling will replicate that 3
> > times.
> > > > > The result is a huge pile of GLSL IR with a few dozens of nested
> > if/else
> > > > > statements and temporaries that looks like [1] (that is only a
> > fragment
> > > > > of the GLSL IR).
> > > > >
> > > > > One thing that is particularly relevant in that code is that it
> > has
> > > > > multiple conditional assignments to the same variable
> > > > > (dereference_array_value) as a consequence of this lowering
> > pass.
> > > > >
> > > > > That much, however, is common to the NIR and non-NIR paths. The
> > problem
> > > > > in the NIR case is that all these assignments generate new SSA
> > values,
> > > > > which then become new registers in the final NIR form. This
> > leads to NIR
> > > > > code like [2].  In contrast, the old vec4 visitor path is able
> > to have
> > > > > writes to the same variable write to the same register.
> > > > >
> > > > > As a result, if I print the code right before register
> > allocation in the
> > > > > NIR path [3] and I compare that to what we get with the old vec4
> > visitor
> > > > > path at that same point [4], it is clearly visible that this
> > difference
> > > > > is allowing the vec4 visitor path to reduce register pressure
> > (see how
> > > > > in [4] we have multiple writes to vgrf5, while in [3] we always
> > write to
> > > > > a new vgrf every time).
> > > > >
> > > > > So, am I missing something or is this kind of result expected
> > with NIR
> > > > > programs? Is there anything in the nir->vec4 pass that we can do
> > to fix
> > > > > this or does this need to be fixed when going out of SSA mode
> > inside NIR?
> > > > >
> > > > > Iago
> > > > >
> > > > > [1] http://pastebin.com/5uA8ex2S
> > > > > [2] http://pastebin.com/pqLfvAVN
> > > > > [3] http://pastebin.com/64nSuUH8
> > > > > [4] http://pastebin.com/WCrdYxzt
> > > > >
> > > > > ___
> > > > > mesa-dev mailing list
> > > > > mesa-dev@lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > >
> > > > Hi Iago,
> > > >
> > > > Indeed, NIR does convert conditional writes to conditional
> > > > selects --
> > > > it's a required part of the conversion to SSA, and since our HW
> > has a
> > > > conditional select instruction that's just as fast as doing a
> > > > conditional move, we haven't bothered much to try and change it
> > back
> > > > during out-of-SSA. However, doing this shouldn't make things
> > worse. In
> > > > your example, vgrf9, v

[Mesa-dev] [PATCH 10/10] i965/gen9: Support fast clears for 32b float

2015-10-13 Thread Ben Widawsky

SKL supports fast clears and resolves of 32b RGBA as both integer and float.
This patch only enables float color clears because we haven't yet enabled
integer color clears (HW support for that was added in BDW).

This is enabled separately because it is a new feature on SKL and so it might
have some issues.

NOTE: This patch has 2 regressions, with 16F_LUMINANCE and 16F_INTENSITY, which
need to be resolved before merging. The rest of the test suites are happy.
./bin/ext_framebuffer_multisample-formats [2468]
...
Testing GL_LUMINANCE16F_ARB
Probe at (0,0)
  Expected: 0.00
  Observed: 0.50
Probe at (0,0)
  Expected: 0.00
  Observed: 0.50

Not-Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 ++--
 src/mesa/drivers/dri/i965/gen8_surface_state.c  | 8 
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 9c51ffb..aa36794 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -360,8 +360,12 @@ is_color_fast_clear_compatible(struct brw_context *brw,
}
 
for (int i = 0; i < 4; i++) {
-  if (color->f[i] != 0.0f && color->f[i] != 1.0f &&
-  _mesa_format_has_color_component(format, i)) {
+  if (!_mesa_format_has_color_component(format, i)) {
+ continue;
+  }
+
+  if (brw->gen < 9 &&
+  color->f[i] != 0.0f && color->f[i] != 1.0f) {
  return false;
   }
}
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index b19b492..ca0cedc 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -188,14 +188,6 @@ gen8_emit_fast_clear_color(struct brw_context *brw,
uint32_t *surf)
 {
if (brw->gen >= 9) {
-#define check_fast_clear_val(x) \
-  assert(mt->gen9_fast_clear_color.f[x] == 0.0 || \
- mt->gen9_fast_clear_color.f[x] == 1.0)
-  check_fast_clear_val(0);
-  check_fast_clear_val(1);
-  check_fast_clear_val(2);
-  check_fast_clear_val(3);
-#undef check_fast_clear_val
   surf[12] = mt->gen9_fast_clear_color.ui[0];
   surf[13] = mt->gen9_fast_clear_color.ui[1];
   surf[14] = mt->gen9_fast_clear_color.ui[2];
-- 
2.6.1
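
For readers following the is_color_fast_clear_compatible() hunk above, here is
a minimal standalone sketch of just the per-channel loop after the change. It
is not the driver code: format_has_channel() and its channel_mask argument are
hypothetical stand-ins for _mesa_format_has_color_component(), and any checks
the real function performs before the loop are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for _mesa_format_has_color_component(): bit i of
 * channel_mask is set when the format stores channel i (R, G, B, A).
 */
static bool
format_has_channel(unsigned channel_mask, int i)
{
   return (channel_mask >> i) & 1u;
}

/* Sketch of the per-channel test. Channels the format does not store are
 * ignored. Before gen9, every stored channel must be exactly 0.0 or 1.0;
 * on gen9+ any float value passes this loop.
 */
static bool
clear_color_is_fast_clear_compatible(int gen, unsigned channel_mask,
                                     const float color[4])
{
   assert(gen > 0);

   for (int i = 0; i < 4; i++) {
      if (!format_has_channel(channel_mask, i))
         continue;

      if (gen < 9 && color[i] != 0.0f && color[i] != 1.0f)
         return false;
   }

   return true;
}
```

With a clear color of (0.5, 0.0, 0.0, 1.0) and all four channels present, the
sketch rejects the clear for gen 8 and accepts it for gen 9. If the format does
not store the first channel, gen 8 accepts it too.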

[Mesa-dev] [PATCH 01/10] i965/gen8+: Remove redundant zeroing of surface state

2015-10-13 Thread Ben Widawsky

The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).

NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if required.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 18b8665..eaaecd3 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -284,8 +284,6 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
 SET_FIELD((aux_mt->pitch / tile_w) - 1,
   GEN8_SURFACE_AUX_PITCH) |
 aux_mode;
-   } else {
-  surf[6] = 0;
}
 
surf[7] = mt->fast_clear_color_value |
@@ -302,11 +300,7 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
   aux_mt->bo, 0,
   I915_GEM_DOMAIN_SAMPLER,
   (rw ? I915_GEM_DOMAIN_SAMPLER : 0));
-   } else {
-  surf[10] = 0;
-  surf[11] = 0;
}
-   surf[12] = 0;
 
/* Emit relocation to surface contents */
drm_intel_bo_emit_reloc(brw->batch.bo,
@@ -514,8 +508,6 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
 SET_FIELD((aux_mt->pitch / tile_w) - 1,
   GEN8_SURFACE_AUX_PITCH) |
 aux_mode;
-   } else {
-  surf[6] = 0;
}
 
surf[7] = mt->fast_clear_color_value |
@@ -533,11 +525,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
   offset + 10 * 4,
   aux_mt->bo, 0,
   I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);
-   } else {
-  surf[10] = 0;
-  surf[11] = 0;
}
-   surf[12] = 0;
 
drm_intel_bo_emit_reloc(brw->batch.bo,
offset + 8 * 4,
-- 
2.6.1
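
To illustrate why the removed zeroing was both redundant and potentially
destructive, here is a small sketch under stated assumptions. All names are
hypothetical (sketch_alloc_zeroed, sketch_emit, SKETCH_SURF_DWORDS); the only
facts taken from the series are that the allocator returns zeroed state and
that dword 12 will later carry a clear value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative size only: the series refers to dwords 12-15 on SKL+. */
#define SKETCH_SURF_DWORDS 16

/* Hypothetical stand-in for a surface state allocator that returns zeroed
 * memory, as the commit message says allocate_surface_state does.
 */
static void
sketch_alloc_zeroed(uint32_t *surf)
{
   memset(surf, 0, SKETCH_SURF_DWORDS * sizeof(uint32_t));
}

/* Hypothetical emit function. Dword 12 is written early, and the aux dword
 * is written only when an aux buffer exists.
 */
static void
sketch_emit(uint32_t *surf, int has_aux, uint32_t aux_bits,
            uint32_t clear_dword)
{
   assert(surf != NULL);
   sketch_alloc_zeroed(surf);

   surf[12] = clear_dword;

   if (has_aux)
      surf[6] = aux_bits;

   /* No "else surf[6] = 0;" is needed: the allocator already zeroed it.
    * A trailing "surf[12] = 0;" here would wipe the value stored above.
    */
}
```

The point of the sketch is ordering: once some helper fills dword 12 before the
end of the function, a later unconditional store of zero is no longer harmless.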

[Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-13 Thread Ben Widawsky

The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch). For me, this patch was at some point critical
for getting stable piglit results (though this did not seem to be the case on a
branch Chad was working on).

It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
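
The one-pass-per-render-target idea can be sketched independently of the GL
state handling. This is only an outline under assumptions: the callback type,
sketch_record_pass() and sketch_clear_each_attachment() are hypothetical names,
and the callback stands in for "bind a single attachment and draw the clear
rectangle". Skipping out-of-range indexes is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: bind only this attachment and run one clear pass. */
typedef void (*sketch_clear_pass_fn)(int attachment_index, void *data);

/* Example callback that records which attachments received a pass. */
static void
sketch_record_pass(int attachment_index, void *data)
{
   uint32_t *seen = data;
   *seen |= 1u << attachment_index;
}

/* Walk the draw buffers and issue one pass for every attachment whose bit
 * is set in fast_clear_buffers. Returns the number of passes issued.
 */
static unsigned
sketch_clear_each_attachment(const int *draw_buffer_indexes,
                             unsigned num_draw_buffers,
                             uint32_t fast_clear_buffers,
                             sketch_clear_pass_fn pass, void *data)
{
   unsigned passes = 0;

   assert(pass != NULL);

   for (unsigned buf = 0; buf < num_draw_buffers; buf++) {
      int index = draw_buffer_indexes[buf];

      /* Assumption of this sketch: ignore indexes that cannot be bit
       * positions in a 32-bit mask.
       */
      if (index < 0 || index >= 32)
         continue;

      if (!((1u << index) & fast_clear_buffers))
         continue;

      pass(index, data);
      passes++;
   }

   return passes;
}
```

For draw buffer indexes {0, 2, 5} and a mask selecting bits 0 and 5, the sketch
issues two passes, one per selected attachment, instead of a single clear
covering both.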

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 +
 1 file changed, 84 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 7bf52f0..9e6711e 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
 }
 
+/**
+ * Individually fast clear each color buffer attachment. On previous gens this
+ * isn't required. The motivation for this comes from one line (which seems to
+ * be specific to SKL+). The list item is in section titled _MCS Buffer for
+ * Render Target(s)_
+ *
+ *   "Since only one RT is bound with a clear pass, only one RT can be cleared
+ *   at a time. To clear multiple RTs, multiple clear passes are required."
+ *
+ * The code follows the same idea as the resolve code which creates a fake FBO
+ * to avoid interfering with too much of the GL state.
+ */
+static void
+fast_clear_attachments(struct brw_context *brw,
+   struct gl_framebuffer *fb,
+   uint32_t fast_clear_buffers,
+   struct rect fast_clear_rect)
+{
+   assert(brw->gen >= 9);
+   struct gl_context *ctx = &brw->ctx;
+
+   GLuint old_fb = ctx->DrawBuffer->Name;
+
+   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
+  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
+  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
+  GLuint fbo, rbo;
+  int index = fb->_ColorDrawBufferIndexes[buf];
+
+  if (!((1 << index) & fast_clear_buffers))
+ continue;
+
+  _mesa_GenFramebuffers(1, &fbo);
+  rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false);
+
+  _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
+  _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER,
+GL_COLOR_ATTACHMENT0,
+GL_RENDERBUFFER, rbo);
+  _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);
+
+  brw_fast_clear_init(brw);
+
+  use_rectlist(brw, true);
+
+  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+
+  /* SKL+ also has a resolve mode for compressed render targets and thus 
more
+   * bits to let us select the type of resolve.  For fast clear resolves, 
it
+   * turns out we can use the same value as pre-SKL though.
+   */
+  set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
+  brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
+  set_fast_clear_op(brw, 0);
+  use_rectlist(brw, false);
+
+  _mesa_DeleteRenderbuffers(1, &rbo);
+  _mesa_DeleteFramebuffers(1, &fbo);
+
+  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
+   * resolve them eventually.
+   */
+  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
+   }
+
+   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb);
+}
+
 bool
 brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
 GLbitfield buffers, bool partial_clear)
@@ -600,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
use_rectlist(brw, true);
 
layers = MAX2(1, fb->MaxNumLayers);
-   if (fast_clear_buffers) {
+
+   if (brw->gen >= 9 && fast_clear_buffers) {
+  fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
+   } else if (fast_clear_buffers) {
   _mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
   brw_draw_rectlist(ctx, &fast_clear_rect, layers);
   set_fast_clear_op(brw, 0);
+
+  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
+   * resolve them eventually.
+   */
+  for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
+ struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
+ struct intel_renderbuffer *irb = intel_renderbuffer(rb);
+ int index = fb->_ColorDrawBufferIndexes[buf];
+
+ if ((1 << index) & fast_cl

[Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-13 Thread Ben Widawsky

This patch series adds support for fast color clears on SKL as it exists on
previous generations of hardware minus the new hardware restriction on surface
formats. Additionally, it adds support for utilizing clear values with up to 32b
per color channel (see note at the bottom). It is based on work originally done
by Kristian, so thanks to him for that initial work as well as helping me debug
some of the issues.

Additionally, thanks to Chad for helping track down the last bug in the
rectangle scaling code which was (for me) being masked by another bug (#3
below). I imagine it would have been several more weeks at least before I
uncovered it.

We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order to
support the 32b per channel. As it turned out though, Skylake made other changes
to support this which caused weird failures which seemed to interfere with
each other.

1. Not all surface formats support lossless compression.
2. Clearing multiple color buffer attachments must happen in n passes
3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
was the bug which Chad helped uncover, I had it correct in my patch from March
http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
had other problems which prevented merge, including #1 and #2 above).

I have no piglit, dEQP or CTS regressions (except for the last patch). I haven't
collected perf data on this yet, but will ASAP. Historically we've come to expect
this to provide large gains in tests which are memory bandwidth limited and
doing many clears.

Ben Widawsky (10):
  i965/gen8+: Remove redundant zeroing of surface state
  i965/gen8+: Extract color clear surface state
  i965/skl: Enable fast color clears on SKL
  i965/skl: skip fast clears for certain surface formats
  i965/meta/gen9: Individually fast clear color attachments
  Revert "i965/gen9: Disable MCS for 1x color surfaces"
  Revert "i965/gen9: Enable rep clears on gen9"
  i965/meta: Assert fast clears and rep clears never overlap
  i965/meta: Remove fast_clear_color variable
  i965/gen9: Support fast clears for 32b float

 src/mesa/drivers/dri/i965/brw_context.h |   1 +
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 ++--
 src/mesa/drivers/dri/i965/brw_surface_formats.c |  27 
 src/mesa/drivers/dri/i965/gen8_surface_state.c  |  48 ---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  20 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   7 +-
 6 files changed, 205 insertions(+), 70 deletions(-)

-- 
2.6.1

[Mesa-dev] [PATCH 06/10] Revert "i965/gen9: Disable MCS for 1x color surfaces"

2015-10-13 Thread Ben Widawsky

This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index f108b75..c723f79 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -207,14 +207,6 @@ intel_miptree_supports_non_msrt_fast_clear(struct 
brw_context *brw,
if (brw->gen < 7)
   return false;
 
-   if (brw->gen >= 9) {
-  /* FINISHME: Enable singlesample fast MCS clears on SKL after all GPU
-   * FINISHME: hangs are resolved.
-   */
-  perf_debug("singlesample fast MCS clears disabled on gen9");
-  return false;
-   }
-
if (mt->disable_aux_buffers)
   return false;
 
-- 
2.6.1

[Mesa-dev] [PATCH 07/10] Revert "i965/gen9: Enable rep clears on gen9"

2015-10-13 Thread Ben Widawsky

This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b.

It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 9e6711e..97094ae 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -537,11 +537,6 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
  clear_type = REP_CLEAR;
 
-  if (brw->gen >= 9 && clear_type == FAST_CLEAR) {
- perf_debug("fast MCS clears are disabled on gen9");
- clear_type = REP_CLEAR;
-  }
-
   /* We can't do scissored fast clears because of the restrictions on the
* fast clear rectangle size.
*/
-- 
2.6.1

[Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL

2015-10-13 Thread Ben Widawsky

Based on a patch originally from Kristian. Skylake has extended capabilities
with regard to fast clears, but that is saved for another patch.

The same effect could be achieved with the following; however, I think the way
I've done it is more in line with how the docs explain it.
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
   /* In release builds, fall through */
case I915_TILING_Y:
   *width_px = 32 / mt->cpp;
-  *height = 4;
+  if (brw->gen >= 9)
+ *height = 2;
+  else
+ *height = 4;

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c  | 34 
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  9 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |  7 +++-
 4 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index fbde3f0..7bf52f0 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect 
*rect, int num_instances)
 }
 
 static void
-get_fast_clear_rect(struct gl_framebuffer *fb,
+get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb,
 struct intel_renderbuffer *irb, struct rect *rect)
 {
unsigned int x_align, y_align;
@@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
*/
   intel_get_non_msrt_mcs_alignment(irb->mt, &x_align, &y_align);
   x_align *= 16;
-  y_align *= 32;
+
+  /* SKL+ line alignment requirement for Y-tiled are half those of the 
prior
+   * generations.
+   */
+  if (brw->gen >= 9)
+ y_align *= 16;
+  else
+ y_align *= 32;
 
   /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
* Target(s)", beneath the "Fast Color Clear" bullet (p327):
@@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
* terms of (width,height) of the RT.
*
* MSAA  Width of Clear Rect  Height of Clear Rect
+   *  2X Ceil(1/8*width)  Ceil(1/2*height)
*  4X Ceil(1/8*width)  Ceil(1/2*height)
*  8X Ceil(1/2*width)  Ceil(1/2*height)
+   * 16X widthCeil(1/2*height)
*
* The text "with upper left co-ordinate to coincide with actual
* rectangle being cleared" is a little confusing--it seems to imply
@@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
   case 8:
  x_scaledown = 2;
  break;
+  case 16:
+ x_scaledown = 1;
+ break;
   default:
  unreachable("Unexpected sample count for fast clear");
   }
@@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw,
 
 /**
  * Convert the given color to a bitfield suitable for ORing into DWORD 7 of
- * SURFACE_STATE.
+ * SURFACE_STATE (DWORD 12-15 on SKL+).
  */
-static uint32_t
-compute_fast_clear_color_bits(const union gl_color_union *color)
+static void
+set_fast_clear_color(struct brw_context *brw,
+ struct intel_mipmap_tree *mt,
+ const union gl_color_union *color)
 {
-   uint32_t bits = 0;
-   for (int i = 0; i < 4; i++) {
-  /* Testing for non-0 works for integer and float colors */
-  if (color->f[i] != 0.0f)
- bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+   if (brw->gen >= 9) {
+  mt->gen9_fast_clear_color = *color;
+   } else {
+  mt->fast_clear_color_value = 0;
+  for (int i = 0; i < 4; i++) {
+ /* Testing for non-0 works for integer and float colors */
+ if (color->f[i] != 0.0f)
+ mt->fast_clear_color_value |=
+1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+  }
}
-   return bits;
 }
 
 static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
@@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
 
   switch (clear_type) {
   case FAST_CLEAR:
- irb->mt->fast_clear_color_value =
-compute_fast_clear_color_bits(&ctx->Color.ClearColor);
+ set_fast_clear_color(brw, irb->mt, &ctx->Color.ClearColor);
  irb->need_downsample = true;
 
  /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the
@@ -521,7 +538,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
  irb->need_downsample =
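
For readers following the diff above, the clear-rectangle table and the new
line-alignment rule can be sketched in standalone C. This is only an
illustration under stated assumptions: the helper names are made up here and are
not driver functions, and only the MSAA rows listed in the comment table are
covered.

```c
#include <assert.h>

/* Round-up integer division: Ceil(n / d) for d > 0. */
static unsigned
div_round_up(unsigned n, unsigned d)
{
   return (n + d - 1) / d;
}

/* Horizontal scale-down factor taken from the comment table:
 * 2X and 4X use 1/8 of the width, 8X uses 1/2, 16X uses the full width.
 * Returns 0 for a sample count the table does not list.
 */
static unsigned
msaa_x_scaledown(unsigned num_samples)
{
   switch (num_samples) {
   case 2:
   case 4:
      return 8;
   case 8:
      return 2;
   case 16:
      return 1;
   default:
      return 0;
   }
}

/* Every row of the table halves the height. */
static unsigned
msaa_y_scaledown(void)
{
   return 2;
}

/* Line-alignment multiplier from the patch: 16 on gen9+, 32 before. */
static unsigned
y_align_multiplier(int gen)
{
   return gen >= 9 ? 16 : 32;
}
```

With these helpers, a 100x101 render target at 4X would get a clear rectangle
of div_round_up(100, 8) by div_round_up(101, 2), before any alignment is
applied.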

[Mesa-dev] [PATCH 02/10] i965/gen8+: Extract color clear surface state

2015-10-13 Thread Ben Widawsky

On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.

Should have no functional impact.

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index eaaecd3..e70c15b 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -384,6 +384,12 @@ gen8_emit_null_surface_state(struct brw_context *brw,
  SET_FIELD(height - 1, GEN7_SURFACE_HEIGHT);
 }
 
+static void
+gen8_emit_fast_clear_color(struct intel_mipmap_tree *mt, uint32_t *surf)
+{
+   surf[7] |= mt->fast_clear_color_value;
+}
+
 /**
  * Sets up a surface state structure to point at the given region.
  * While it is only used for the front/back buffer currently, it should be
@@ -510,11 +516,11 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
 aux_mode;
}
 
-   surf[7] = mt->fast_clear_color_value |
- SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
- SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
- SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
- SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A);
+   gen8_emit_fast_clear_color(mt, surf);
+   surf[7] |= SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
+  SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
+  SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
+  SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A);
 
assert(mt->offset % mt->cpp == 0);
*((uint64_t *) &surf[8]) = mt->bo->offset64 + mt->offset; /* reloc */
-- 
2.6.1
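
As a rough illustration of what the extracted helper ORs into DWORD 7 on
pre-SKL hardware, here is a standalone sketch of the one-bit-per-channel
encoding used by the older compute_fast_clear_color_bits() loop. The shift
value of 28 and the names below are assumptions made for this example only; the
real definitions live in the driver headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; stands in for GEN7_SURFACE_CLEAR_COLOR_SHIFT. */
#define EXAMPLE_CLEAR_COLOR_SHIFT 28

/* One bit per channel: the bit is set when the channel is non-zero.
 * Channel 0 (red) lands in the highest of the four bits.
 */
static uint32_t
example_clear_color_bits(const float color[4])
{
   uint32_t bits = 0;
   for (int i = 0; i < 4; i++) {
      /* Testing for non-0 works for integer and float colors */
      if (color[i] != 0.0f)
         bits |= UINT32_C(1) << (EXAMPLE_CLEAR_COLOR_SHIFT + (3 - i));
   }
   return bits;
}

/* Mirrors the shape of the extracted helper: OR the bits into DWORD 7
 * without disturbing whatever else is already there.
 */
static void
example_emit_fast_clear_color(uint32_t *surf, uint32_t clear_bits)
{
   surf[7] |= clear_bits;
}
```

Because the helper only ORs into surf[7], the caller has to initialize that
DWORD first, which is why the patch switches the following statement to "|=" as
well.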

[Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable

2015-10-13 Thread Ben Widawsky

It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what it's
meant to do).

Cc: Kristian Høgsberg 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 41afc9a..9c51ffb 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw,
}
 }
 
-static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
-
 static void
 set_fast_clear_op(struct brw_context *brw, uint32_t op)
 {
@@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw,
 
   use_rectlist(brw, true);
 
-  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
 
   /* SKL+ also has a resolve mode for compressed render targets and thus more
* bits to let us select the type of resolve.  For fast clear resolves, it
@@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
   fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
} else if (fast_clear_buffers) {
   _mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
-  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
   set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
   brw_draw_rectlist(ctx, &fast_clear_rect, layers);
   set_fast_clear_op(brw, 0);
@@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw,
 
use_rectlist(brw, true);
 
-   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
+   brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
 
/* SKL+ also has a resolve mode for compressed render targets and thus more
 * bits to let us select the type of resolve.  For fast clear resolves, it
-- 
2.6.1

[Mesa-dev] [PATCH 08/10] i965/meta: Assert fast clears and rep clears never overlap

2015-10-13 Thread Ben Widawsky

There is nothing wrong with the code today, but it is easy to break when
modifying it, and this simple assertion should catch such driver implementation
failures quickly.

Cc: Kristian Høgsberg 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 97094ae..41afc9a 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -616,6 +616,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
   }
}
 
+   assert((fast_clear_buffers & rep_clear_buffers) == 0);
+
if (!(fast_clear_buffers | rep_clear_buffers)) {
   if (plain_clear_buffers)
  /* If we only have plain clears, skip the meta save/restore. */
-- 
2.6.1
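
The invariant being asserted can be shown in isolation. The sketch below is
hypothetical: the struct, the classifier and its capability masks are invented
for illustration and do not mirror how brw_meta_fast_clear() actually decides
which buffers get which kind of clear. It only shows the pattern of sorting
each buffer bit into exactly one bucket and then asserting that the fast and
rep masks do not overlap.

```c
#include <assert.h>
#include <stdint.h>

struct example_clear_masks {
   uint32_t fast;
   uint32_t rep;
   uint32_t plain;
};

/* Hypothetical classifier: each requested buffer bit goes into exactly one
 * bucket, preferring fast clears, then rep clears, then plain clears.
 */
static struct example_clear_masks
example_classify_buffers(uint32_t buffers, uint32_t fast_capable,
                         uint32_t rep_capable)
{
   struct example_clear_masks m = { 0, 0, 0 };

   for (unsigned bit = 0; bit < 32; bit++) {
      const uint32_t b = UINT32_C(1) << bit;

      if (!(buffers & b))
         continue;

      if (fast_capable & b)
         m.fast |= b;
      else if (rep_capable & b)
         m.rep |= b;
      else
         m.plain |= b;
   }

   /* Same check as the patch: a buffer is never both fast and rep cleared. */
   assert((m.fast & m.rep) == 0);
   return m;
}
```

If a later change dropped one of the "else" keywords, a buffer could land in
both masks and the assertion would fire immediately in a debug build.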

[Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-13 Thread Ben Widawsky

Initially I had planned to squash this into the enabling patch, because there is
no point enabling fast clears without it. However, Chad merged a patch which
explicitly disables fast clears on gen9, so I can hide this behind the revert of
that patch. This is nice, as I really wanted this to be a distinct patch for
review. This is a new, weird, and poorly documented restriction for SKL. (In
fact, I am still not 100% certain the restriction is entirely necessary, but
there are around 30 piglit regressions without it.)

SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface is
a CCS (Color Control Surface) with compression disabled or an MCS with
compression enabled, depending on number of multisamples. MCS (Multisample
Control Surface) is a special type of CCS."

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_context.h |  1 +
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 +
 src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index e59478a..32b8250 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1546,6 +1546,7 @@ struct brw_context
 
uint32_t render_target_format[MESA_FORMAT_COUNT];
bool format_supported_as_render_target[MESA_FORMAT_COUNT];
+   bool losslessly_compressable[MESA_FORMAT_COUNT];
 
/* Interpolation modes, one byte per vue slot.
 * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 97fff60..d706ecc 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
   }
}
 
+   if (brw->gen >= 9) {
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true;
+  brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true;
+   }
+
/* We will check this table for FBO completeness, but the surface format
 * table above only covered color rendering.
 */
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 995b4dd..b19b492 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
* "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN
*  16 must be used."
*/
-  if (brw->gen >= 9 || mt->num_samples == 1)
+  if (brw->gen >= 9 || mt->num_samples == 1) {
  assert(mt->halign == 16);
+ assert(mt->num_samples || brw->losslessly_compressable[mt->format] == true);
+  }
}
 
const uint32_t surf_type = translate_tex_target(target);
@@ -488,8 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
* "When A

Re: [Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable

2015-10-14 Thread Ben Widawsky

On Wed, Oct 14, 2015 at 11:52:03AM +0200, Neil Roberts wrote:
> This patch doesn't look right. See this sentence in “Render Target Fast
> Clear”:
> 
> “The pixel shader kernel requires no attributes, and delivers a value of
>  0x in all channels of the render target write message”
> 
> Presumably the fast_clear_color is trying to implement this restriction.
> 
> Regards,
> - Neil
> 

You're right. Originally the patch only touched the color in the resolve pass,
which I believe doesn't matter (nor does the actual shader we bind). However,
I'm sort of baffled now why I'd see no piglit regressions since the clear color
will *never* be all F.

Either way, I'll drop this patch - but my confusion level has increased.
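
To make the difference concrete: the removed array holds ~0 in every channel,
and the cast to (float *) passes those raw bits through unchanged, whereas an
ordinary float clear color has a different bit pattern. The sketch below is a
standalone illustration with made-up helper names; it only inspects bit
patterns and says nothing about what the hardware does with them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float expected");

/* Same initializer as the array the patch removed. */
static const uint32_t example_all_ones_color[4] = { ~0u, ~0u, ~0u, ~0u };

/* Return the raw bits of a float without violating aliasing rules. */
static uint32_t
example_float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

/* True when every channel of the payload is the all-ones bit pattern. */
static int
example_payload_is_all_ones(const float color[4])
{
   for (int i = 0; i < 4; i++) {
      uint32_t u;
      memcpy(&u, &color[i], sizeof u);
      if (u != UINT32_MAX)
         return 0;
   }
   return 1;
}
```

For example, 1.0f is 0x3f800000 as raw bits, so a white clear color fed through
the same path would not produce the all-ones payload.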

> Ben Widawsky  writes:
> 
> > It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what 
> > it's
> > meant to do).
> >
> > Cc: Kristian Høgsberg 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++-
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > index 41afc9a..9c51ffb 100644
> > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > @@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw,
> > }
> >  }
> >  
> > -static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
> > -
> >  static void
> >  set_fast_clear_op(struct brw_context *brw, uint32_t op)
> >  {
> > @@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw,
> >  
> >use_rectlist(brw, true);
> >  
> > -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> > +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
> >  
> >/* SKL+ also has a resolve mode for compressed render targets and 
> > thus more
> > * bits to let us select the type of resolve.  For fast clear 
> > resolves, it
> > @@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> > gl_framebuffer *fb,
> >fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect);
> > } else if (fast_clear_buffers) {
> >_mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers);
> > -  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> > +  brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
> >set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
> >brw_draw_rectlist(ctx, &fast_clear_rect, layers);
> >set_fast_clear_op(brw, 0);
> > @@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw,
> >  
> > use_rectlist(brw, true);
> >  
> > -   brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> > +   brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f);
> >  
> > /* SKL+ also has a resolve mode for compressed render targets and thus 
> > more
> >  * bits to let us select the type of resolve.  For fast clear resolves, 
> > it
> > -- 
> > 2.6.1
> >

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Ben Widawsky

On Wed, Oct 14, 2015 at 02:43:24PM +0300, Pohjolainen, Topi wrote:
> On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> > Ben Widawsky  writes:
> > 
> > > The impetus for this patch comes from a seemingly benign statement within 
> > > the
> > > spec (quoted within the patch). For me, this patch was at some point 
> > > critical
> > > for getting stable piglit results (though this did not seem to be the 
> > > case on a
> > > branch Chad was working on).
> > >
> > > It is very important for clearing multiple color buffer attachments and 
> > > can be
> > > observed in the following piglit tests:
> > > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > >
> > > Signed-off-by: Ben Widawsky 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > > +
> > >  1 file changed, 84 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > index 7bf52f0..9e6711e 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> > > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> > >  }
> > >  
> > > +/**
> > > + * Individually fast clear each color buffer attachment. On previous 
> > > gens this
> > > + * isn't required. The motivation for this comes from one line (which 
> > > seems to
> > > + * be specific to SKL+). The list item is in section titled _MCS Buffer 
> > > for
> > > + * Render Target(s)_
> > > + *
> > > + *   "Since only one RT is bound with a clear pass, only one RT can be 
> > > cleared
> > > + *   at a time. To clear multiple RTs, multiple clear passes are 
> > > required."
> > 
> > This sentence also appears in the HSW PRM so it seems a bit odd if it's
> > only causing problems on SKL. I guess if we get Piglit regressions
> > without it then it makes sense to have the patch. It might be worth just
> > double checking whether this patch is completely necessary. The wording
> > in the commit message seems a little unsure.
> 
> The spec seems to be missing something as the section discussing "Render
> Target Fast Clear" seems to suggest the opposite:
> 
> "The render target(s) is/are bound as they normally would be, with the MCS
>  surface defined in SURFACE_STATE."

I am aware of all this. Neil, yes it is completely necessary for piglit (I don't
know if anything in the real world does this or not).

You are both asking me to provide something which may be impossible, an
explanation of why the docs and/or hardware are behaving this way. Let me
respond in kind, please provide an alternate patch which fixes:
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear (all subtests)

FWIW Topi, it's also contradicted in 3DSTATE_PS definition.
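
For anyone skimming the thread, the "one clear pass per RT" approach amounts to
walking the set bits of the buffer mask and issuing a separate pass for each.
The sketch below is a simplified stand-in, not the patch itself: the callback
type and function names are invented for illustration, and the real code also
has to rebind draw buffers and emit state for every pass.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-attachment clear hook. */
typedef void (*example_clear_one_fn)(unsigned buffer_index, void *data);

/* Issue one clear pass per set bit in 'buffers'; returns the pass count. */
static unsigned
example_clear_each_attachment(uint32_t buffers, example_clear_one_fn clear_one,
                              void *data)
{
   unsigned passes = 0;

   for (unsigned i = 0; buffers != 0; i++, buffers >>= 1) {
      if (buffers & 1) {
         clear_one(i, data);
         passes++;
      }
   }
   return passes;
}

/* Example hook that just records which attachments were visited. */
static void
example_record_clear(unsigned buffer_index, void *data)
{
   uint32_t *seen = data;
   *seen |= UINT32_C(1) << buffer_index;
}
```

A mask of 0x15 (attachments 0, 2 and 4) therefore results in three passes
rather than one.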

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-14 Thread Ben Widawsky

On Wed, Oct 14, 2015 at 08:04:48PM +0300, Pohjolainen, Topi wrote:
> On Wed, Oct 14, 2015 at 09:54:43AM -0700, Ben Widawsky wrote:
> > On Wed, Oct 14, 2015 at 02:43:24PM +0300, Pohjolainen, Topi wrote:
> > > On Wed, Oct 14, 2015 at 11:39:03AM +0200, Neil Roberts wrote:
> > > > Ben Widawsky  writes:
> > > > 
> > > > > The impetus for this patch comes from a seemingly benign statement 
> > > > > within the
> > > > > spec (quoted within the patch). For me, this patch was at some point 
> > > > > critical
> > > > > for getting stable piglit results (though this did not seem to be the 
> > > > > case on a
> > > > > branch Chad was working on).
> > > > >
> > > > > It is very important for clearing multiple color buffer attachments 
> > > > > and can be
> > > > > observed in the following piglit tests:
> > > > > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> > > > > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > > > >
> > > > > Signed-off-by: Ben Widawsky 
> > > > > ---
> > > > >  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> > > > > +
> > > > >  1 file changed, 84 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> > > > > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > > index 7bf52f0..9e6711e 100644
> > > > > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> > > > > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool 
> > > > > enable)
> > > > > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
> > > > >  }
> > > > >  
> > > > > +/**
> > > > > + * Individually fast clear each color buffer attachment. On previous 
> > > > > gens this
> > > > > + * isn't required. The motivation for this comes from one line 
> > > > > (which seems to
> > > > > + * be specific to SKL+). The list item is in section titled _MCS 
> > > > > Buffer for
> > > > > + * Render Target(s)_
> > > > > + *
> > > > > + *   "Since only one RT is bound with a clear pass, only one RT can 
> > > > > be cleared
> > > > > + *   at a time. To clear multiple RTs, multiple clear passes are 
> > > > > required."
> > > > 
> > > > This sentence also appears in the HSW PRM so it seems a bit odd if it's
> > > > only causing problems on SKL. I guess if we get Piglit regressions
> > > > without it then it makes sense to have the patch. It might be worth just
> > > > double checking whether this patch is completely necessary. The wording
> > > > in the commit message seems a little unsure.
> > > 
> > > The spec seems to be missing something as the section discussing "Render
> > > Target Fast Clear" seems to suggest the opposite:
> > > 
> > > "The render target(s) is/are bound as they normally would be, with the MCS
> > >  surface defined in SURFACE_STATE."
> > 
> > I am aware of all this. Neil, yes it is completely necessary for piglit (I 
> > don't
> > know if anything in the real world does this or not).
> > 
> > > You are both asking me to provide something which may be impossible, an
> > explanation of why the docs and/or hardware are behaving this way. Let me
> > respond in kind, please provide an alternate patch which fixes:
> > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear (all subtests)
> > 
> > FWIW Topi, it's also contradicted in 3DSTATE_PS definition.
> 
> You misunderstood me I think, I'm not questioning your patch or your
> interpretation, or asking you to provide some information that just isn't
> there in the spec. We talked about this quite a bit. I'm just saying that I
> feel that something is missing in the spec.

I've been unable to get any issues addressed in this part of the spec. I have 4
open bugs filed against the doc on this section alone.

Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-14 Thread Ben Widawsky

On Wed, Oct 14, 2015 at 01:10:23PM +0200, Neil Roberts wrote:
> It would be nice if you could give some indication of where this list of
> formats came from.
> 
> Unless we expect the list to change with future generations, maybe it
> would be better to make it a static const table? It's a shame to grow
> the context size unnecessarily.
> 
> Regards,
> - Neil
> 

You are correct, I should have referenced it. It is in the section titled
"Render Target Surface Types [SKL+] - Surface Formats for Render Target Messages
[SKL+]"

The supported formats do change; for now we could certainly assume that SKL is
the base set and future GENs add things.
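
Neil's static const table suggestion could look roughly like the sketch below.
It is only an illustration: the enum is a tiny hypothetical stand-in for the
real format enum, the three "true" entries are a subset picked from the list in
the patch, and the gen check assumes SKL is the base set as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's format enum. */
enum example_format {
   EXAMPLE_FORMAT_RGBA_FLOAT32,
   EXAMPLE_FORMAT_B8G8R8A8_UNORM,
   EXAMPLE_FORMAT_R_FLOAT32,
   EXAMPLE_FORMAT_OTHER,      /* any format not present in the table */
   EXAMPLE_FORMAT_COUNT
};

/* Read-only table shared by all contexts; unlisted entries default to false. */
static const bool example_losslessly_compressible[EXAMPLE_FORMAT_COUNT] = {
   [EXAMPLE_FORMAT_RGBA_FLOAT32]   = true,
   [EXAMPLE_FORMAT_B8G8R8A8_UNORM] = true,
   [EXAMPLE_FORMAT_R_FLOAT32]      = true,
};

/* The generation check moves from table setup to the lookup. */
static bool
example_format_is_losslessly_compressible(int gen, enum example_format format)
{
   return gen >= 9 && example_losslessly_compressible[format];
}
```

The trade-off is that a later generation with a larger set would need either a
second table or a per-entry minimum generation instead of a plain bool.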

> Ben Widawsky  writes:
> 
> > Initially I had this planned as a patch to be squashed in to the enabling 
> > patch
> > because there is no point enabling fast clears without this. However, Chad
> > merged a patch which disables fast clears on gen9 explicitly, and so I can 
> > hide
> > this behind the revert of that patch. This is nice, as I really wanted this
> > patch
> > as a distinct patch for review. This is a new, weird, and poorly documented
> > restriction for SKL. (In fact, I am still not 100% certain the restriction 
> > is
> > entirely necessary, but there are around 30 piglit regressions without 
> > this).
> >
> > SKL adds compressible render targets and as a result mutates some of the
> > programming for fast clears and resolves. There is a new internal surface 
> > type
> > called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary 
> > surface is
> > a CCS (Color Control Surface) with compression disabled or an MCS with
> > compression enabled, depending on number of multisamples. MCS (Multisample
> > Control Surface) is a special type of CCS."
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h |  1 +
> >  src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 
> > +
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
> >  4 files changed, 37 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index e59478a..32b8250 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1546,6 +1546,7 @@ struct brw_context
> >  
> > uint32_t render_target_format[MESA_FORMAT_COUNT];
> > bool format_supported_as_render_target[MESA_FORMAT_COUNT];
> > +   bool losslessly_compressable[MESA_FORMAT_COUNT];
> >  
> > /* Interpolation modes, one byte per vue slot.
> >  * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
> > diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> > b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> > index 97fff60..d706ecc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> > +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> > @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
> >}
> > }
> >  
> > +   if (brw->gen >= 9) {
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
> > +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;

Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-14 Thread Ben Widawsky

On Tue, Oct 13, 2015 at 08:50:17PM -0700, Ben Widawsky wrote:
> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to 
> 32b
> per color channel (see note at the bottom). It is based on work originally 
> done
> by Kristian, so thanks to him for that initial work as well as helping me 
> debug
> some of the issues.
> 
> Additionally, thanks to Chad for helping track down the last bug in the 
> rectangle
> scaling code which was (for me) being masked by another bug (#3 below). I
> imagine it would have been several more weeks at least before I uncovered it.
> 
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order 
> to
> support the 32b per channel. As it turned out though, Skylake made other 
> changes
> to support this which caused weird failures which seemed to interfere with
> each other.
> 
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover, I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
> 
> I have no piglit, dEQP or CTS regressions (except for the last patch). I 
> haven't
> yet, but will collect perf data on this ASAP. Historically we've come to 
> expect
> this to provide large gains in tests which are memory bandwidth limited and
> doing many clears.

I left out the note here about 32b having two small regressions.

I did some very basic performance data collection. As expected, the rep_clears
which were already enabled by Chad seem to actually provide most of the gains. I
didn't actually run long enough to do much except prove to myself that there
aren't any performance regressions over the gen9 rep clears. These are the
results which shouldn't be taken too seriously (5 runs only).

Benchmark           % diff (master->full 32b fast clears)
OglBatch0           1.87
OglBatch1           0.54
OglBatch2          -0.44
OglBatch3           0.11
OglBatch4          -0.94
OglBatch5          -2.11
OglBatch6           1.18
OglBatch7           7.02
OglDeferred         3.05
OglDeferredAA       3.6
OglFillPixel        0.07
OglFillTexMulti    -0.01
OglFillTexSingle    0.03
OglGeomPoint        0.07
OglGeomTriList      0.74
OglGeomTriStrip    -0.13
OglHdrBloom        -1.93
OglMultithread     -0.96
OglPSBump2          0.33
OglPSBump8          0.31
OglPSPhong          0.18
OglPSPom           -0.08
OglShMapPcf         0.03
OglShMapVsm        -0.3
OglTerrainFlyInst   0.46
OglTerrainPanInst   0.4
OglTexFilterAniso  -0.08
OglTexFilterTri     0.13
OglTexMem128        0.2
OglTexMem512       -0.03
OglVSDiffuse1       0.23
OglVSDiffuse8      -0.23
OglVSInstancing    -0.15
OglVSTangent       -0.06
OglZBuffer          0.07
fill                0.17
filloff            -0.01
fur                -0.19
heaven              0.56
plot3d             -0.18
trex                4.51
trexoff             3.69
triangle            0.04
valley              1.86
warsow              0.18
xonotic             0.4


BTW: the patches are here as well (with 32b support reverted):
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=skl-fast-clear

[Mesa-dev] [PATCH 0/6] Add ARB_shader_stencil_export for SKL+

2015-10-20 Thread Ben Widawsky

This patch series implements ARB_shader_stencil_export. The fragment shader is
able to natively understand the W-tile format of stencil, and write results to
the stencil buffer. The patch series has no piglit regressions, and it is
passing both my new test (mentioned below) as well as the original test
`fbo-depth-array fs-writes-stencil`.

There are still 2 things ongoing here, but I don't think either prevents merging
at least the first 5 patches. I am still working on improving the basic test I
created: http://lists.freedesktop.org/archives/piglit/2015-October/017584.html

I am also still working on implementing meta blits for stencil using this
extension. I think the patches for that are mostly done, but they need more
testing. I'll be posting those separately.

Ben Widawsky (6):
  i965: Correct the comment about fb write payload
  i965/fs: Enumerate logical fb writes arguments
  i965: (trivial) rename computes stencil to gen9
  i965: Implement ARB_shader_stencil_export (SKL+)
  Implement the proper packing for the stencil payload
  i965: Advertise ARB_shader_stencil_export (gen9+)

 docs/relnotes/11.1.0.html  |  1 +
 src/mesa/drivers/dri/i965/brw_compiler.h   |  1 +
 src/mesa/drivers/dri/i965/brw_defines.h| 22 ++-
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 39 ++-
 src/mesa/drivers/dri/i965/brw_fs.h |  4 ++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 52 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  2 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 24 ++--
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 +
 src/mesa/drivers/dri/i965/gen8_ps_state.c  |  5 +++
 src/mesa/drivers/dri/i965/intel_extensions.c   |  1 +
 11 files changed, 131 insertions(+), 22 deletions(-)

-- 
2.6.1

[Mesa-dev] [PATCH 2/6] i965/fs: Enumerate logical fb writes arguments

2015-10-20 Thread Ben Widawsky

Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.
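
A minimal sketch of the idea, for readers who want to see the before/after
without reading the whole diff: the enum values match the ones in the patch
below, while the operand struct and the function are simplified stand-ins
invented for this example (the real code is C++ and reads the immediate from
the instruction's source registers).

```c
#include <assert.h>

enum fb_write_logical_args {
   FB_WRITE_COLOR0 = 0,      /* REQUIRED */
   FB_WRITE_COLOR1 = 1,      /* for dual source blend messages */
   FB_WRITE_SRC0_ALPHA = 2,
   FB_WRITE_SRC_DEPTH = 3,   /* gl_FragDepth */
   FB_WRITE_DST_DEPTH = 4,   /* GEN4-5: passthrough from thread */
   FB_WRITE_OMASK = 5,       /* Sample Mask (gl_SampleMask) */
   FB_WRITE_COMPONENTS = 6,  /* REQUIRED */
};

/* Simplified stand-in for a source operand carrying an immediate. */
struct example_src {
   unsigned imm_value;
};

/* Named index instead of the magic number 6: the two color sources read as
 * many components as the immediate says, every other source reads one.
 */
static unsigned
example_components_read(const struct example_src *src, unsigned i)
{
   if (i < 2)
      return src[FB_WRITE_COMPONENTS].imm_value;
   else
      return 1;
}
```

Adding a new source later only requires inserting an enumerator, as long as no
caller still indexes the sources by a hardcoded number.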

Cc: Francisco Jerez 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 18 ++
 src/mesa/drivers/dri/i965/brw_fs.cpp| 21 +++--
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 7a5ee1b..e06c9d6 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -912,14 +912,6 @@ enum opcode {
/**
 * Same as FS_OPCODE_FB_WRITE but expects its arguments separately as
 * individual sources instead of as a single payload blob:
-*
-* Source 0: [required] Color 0.
-* Source 1: [optional] Color 1 (for dual source blend messages).
-* Source 2: [optional] Src0 Alpha.
-* Source 3: [optional] Source Depth (gl_FragDepth)
-* Source 4: [optional (gen4-5)] Destination Depth passthrough from thread
-* Source 5: [optional] Sample Mask (gl_SampleMask).
-* Source 6: [required] Number of color components (as a UD immediate).
 */
FS_OPCODE_FB_WRITE_LOGICAL,
 
@@ -1318,6 +1310,16 @@ enum brw_urb_write_flags {
   BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
 };
 
+enum fb_write_logical_args {
+   FB_WRITE_COLOR0 = 0,  /* REQUIRED */
+   FB_WRITE_COLOR1 = 1,  /* for dual source blend messages */
+   FB_WRITE_SRC0_ALPHA = 2,
+   FB_WRITE_SRC_DEPTH = 3,   /* gl_FragDepth */
+   FB_WRITE_DST_DEPTH = 4,   /* GEN4-5: passthrough from thread */
+   FB_WRITE_OMASK = 5,   /* Sample Mask (gl_SampleMask) */
+   FB_WRITE_COMPONENTS = 6,  /* REQUIRED */
+};
+
 #ifdef __cplusplus
 /**
  * Allow brw_urb_write_flags enums to be ORed together.
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..e2e3761 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -695,10 +695,10 @@ fs_inst::components_read(unsigned i) const
   return 2;
 
case FS_OPCODE_FB_WRITE_LOGICAL:
-  assert(src[6].file == IMM);
+  assert(src[FB_WRITE_COMPONENTS].file == IMM);
   /* First/second FB write color. */
   if (i < 2)
- return src[6].fixed_hw_reg.dw1.ud;
+ return src[FB_WRITE_COMPONENTS].fixed_hw_reg.dw1.ud;
   else
  return 1;
 
@@ -3337,15 +3337,16 @@ lower_fb_write_logical_send(const fs_builder &bld, 
fs_inst *inst,
 const brw_wm_prog_key *key,
 const fs_visitor::thread_payload &payload)
 {
-   assert(inst->src[6].file == IMM);
+   assert(inst->src[FB_WRITE_COMPONENTS].file == IMM);
const brw_device_info *devinfo = bld.shader->devinfo;
-   const fs_reg &color0 = inst->src[0];
-   const fs_reg &color1 = inst->src[1];
-   const fs_reg &src0_alpha = inst->src[2];
-   const fs_reg &src_depth = inst->src[3];
-   const fs_reg &dst_depth = inst->src[4];
-   fs_reg sample_mask = inst->src[5];
-   const unsigned components = inst->src[6].fixed_hw_reg.dw1.ud;
+   const fs_reg &color0 = inst->src[FB_WRITE_COLOR0];
+   const fs_reg &color1 = inst->src[FB_WRITE_COLOR1];
+   const fs_reg &src0_alpha = inst->src[FB_WRITE_SRC0_ALPHA];
+   const fs_reg &src_depth = inst->src[FB_WRITE_SRC_DEPTH];
+   const fs_reg &dst_depth = inst->src[FB_WRITE_DST_DEPTH];
+   fs_reg sample_mask = inst->src[FB_WRITE_OMASK];
+   const unsigned components =
+  inst->src[FB_WRITE_COMPONENTS].fixed_hw_reg.dw1.ud;
 
/* We can potentially have a message length of up to 15, so we have to set
 * base_mrf to either 0 or 1 in order to fit in m0..m15.
-- 
2.6.1


[Mesa-dev] [PATCH 3/6] i965: (trivial) rename computes stencil to gen9

2015-10-20 Thread Ben Widawsky

All the documentation I can find says that this bit (and functionality) only
exists on SKL+. Since the bit isn't yet used, there is no real impact here.

The original code was added by Ken here (a surprisingly long time ago):
commit f3c6d6f1e151f6a44a76038dccebe4434038dcb1
Author: Kenneth Graunke 
Date:   Thu Nov 29 21:00:27 2012 -0800

i965: Update 3DSTATE_PS, 3DSTATE_WM, and add 3DSTATE_PS_EXTRA.

Cc: Kenneth Graunke 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index e06c9d6..215f454 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2385,7 +2385,7 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define GEN8_PSX_ATTRIBUTE_ENABLE  (1 << 8)
 # define GEN8_PSX_SHADER_DISABLES_ALPHA_TO_COVERAGE (1 << 7)
 # define GEN8_PSX_SHADER_IS_PER_SAMPLE  (1 << 6)
-# define GEN8_PSX_SHADER_COMPUTES_STENCIL   (1 << 5)
+# define GEN9_PSX_SHADER_COMPUTES_STENCIL   (1 << 5)
 # define GEN9_PSX_SHADER_PULLS_BARY (1 << 3)
 # define GEN8_PSX_SHADER_HAS_UAV(1 << 2)
 # define GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK   (1 << 1)
-- 
2.6.1


[Mesa-dev] [PATCH 1/6] i965: Correct the comment about fb write payload

2015-10-20 Thread Ben Widawsky

Cc: Francisco Jerez 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 393f17a..7a5ee1b 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -916,8 +916,8 @@ enum opcode {
 * Source 0: [required] Color 0.
 * Source 1: [optional] Color 1 (for dual source blend messages).
 * Source 2: [optional] Src0 Alpha.
-* Source 3: [optional] Source Depth (passthrough from the thread payload).
-* Source 4: [optional] Destination Depth (gl_FragDepth).
+* Source 3: [optional] Source Depth (gl_FragDepth)
+* Source 4: [optional (gen4-5)] Destination Depth passthrough from thread
 * Source 5: [optional] Sample Mask (gl_SampleMask).
 * Source 6: [required] Number of color components (as a UD immediate).
 */
-- 
2.6.1


[Mesa-dev] [PATCH 4/6] i965: Implement ARB_shader_stencil_export (SKL+)

2015-10-20 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_compiler.h   |  1 +
 src/mesa/drivers/dri/i965/brw_defines.h|  5 +++--
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 14 ++
 src/mesa/drivers/dri/i965/brw_fs.h |  2 ++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  8 
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 24 +---
 src/mesa/drivers/dri/i965/gen8_ps_state.c  |  5 +
 8 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 11c485d..4a02ce4 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -334,6 +334,7 @@ struct brw_wm_prog_data {
} binding_table;
 
uint8_t computed_depth_mode;
+   bool computed_stencil;
 
bool early_fragment_tests;
bool no_8;
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 215f454..c67728b 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1316,8 +1316,9 @@ enum fb_write_logical_args {
FB_WRITE_SRC0_ALPHA = 2,
FB_WRITE_SRC_DEPTH = 3,   /* gl_FragDepth */
FB_WRITE_DST_DEPTH = 4,   /* GEN4-5: passthrough from thread */
-   FB_WRITE_OMASK = 5,   /* Sample Mask (gl_SampleMask) */
-   FB_WRITE_COMPONENTS = 6,  /* REQUIRED */
+   FB_WRITE_SRC_STENCIL = 5, /* gl_FragStencilRefARB */
+   FB_WRITE_OMASK = 6,   /* Sample Mask (gl_SampleMask) */
+   FB_WRITE_COMPONENTS = 7,  /* REQUIRED */
 };
 
 #ifdef __cplusplus
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e2e3761..560eb91 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3344,6 +3344,7 @@ lower_fb_write_logical_send(const fs_builder &bld, 
fs_inst *inst,
const fs_reg &src0_alpha = inst->src[FB_WRITE_SRC0_ALPHA];
const fs_reg &src_depth = inst->src[FB_WRITE_SRC_DEPTH];
const fs_reg &dst_depth = inst->src[FB_WRITE_DST_DEPTH];
+   const fs_reg &src_stencil = inst->src[FB_WRITE_SRC_STENCIL];
fs_reg sample_mask = inst->src[FB_WRITE_OMASK];
const unsigned components =
   inst->src[FB_WRITE_COMPONENTS].fixed_hw_reg.dw1.ud;
@@ -3436,6 +3437,13 @@ lower_fb_write_logical_send(const fs_builder &bld, 
fs_inst *inst,
   length++;
}
 
+   if (src_stencil.file != BAD_FILE) {
+  assert(devinfo->gen >= 9);
+  assert(bld.dispatch_width() != 16);
+  sources[length] = src_stencil;
+  length++;
+   }
+
fs_inst *load;
if (devinfo->gen >= 7) {
   /* Send from the GRF */
@@ -4700,6 +4708,10 @@ fs_visitor::setup_payload_gen6()
if (nir->info.outputs_written & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) {
   source_depth_to_render_target = true;
}
+
+   if (nir->info.outputs_written & BITFIELD64_BIT(FRAG_RESULT_STENCIL)) {
+  source_stencil_to_render_target = true;
+   }
 }
 
 void
@@ -5208,6 +5220,8 @@ brw_compile_fs(const struct brw_compiler *compiler, void 
*log_data,
prog_data->uses_omask =
   shader->info.outputs_written & BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK);
prog_data->computed_depth_mode = computed_depth_mode(shader);
+   prog_data->computed_stencil =
+  shader->info.outputs_written & BITFIELD64_BIT(FRAG_RESULT_STENCIL);
 
prog_data->early_fragment_tests = shader->info.fs.early_fragment_tests;
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 171338d..4f59d4b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -328,6 +328,7 @@ public:
int *push_constant_loc;
 
fs_reg frag_depth;
+   fs_reg frag_stencil;
fs_reg sample_mask;
fs_reg outputs[VARYING_SLOT_MAX];
unsigned output_components[VARYING_SLOT_MAX];
@@ -367,6 +368,7 @@ public:
} payload;
 
bool source_depth_to_render_target;
+   bool source_stencil_to_render_target;
bool runtime_check_aads_emit;
 
fs_reg pixel_x;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 13c495c..1a893c9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -317,6 +317,14 @@ fs_generator::generate_fb_write(fs_inst *inst, struct 
brw_reg payload)
brw_imm_ud(inst->target));
 }
 
+ /* Set computes stencil to render target */
+ if (prog_data->computed_stencil) {
+brw_OR(p,
+   vec1(retype(payload, BRW_REGISTER_TYPE_UD)),
+   vec1(retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UD)),
+   brw_imm_ud(0x1 << 14));
+ }
+
 implied_header = brw_null_reg();

[Mesa-dev] [PATCH 6/6] i965: Advertise ARB_shader_stencil_export (gen9+)

2015-10-20 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 docs/relnotes/11.1.0.html| 1 +
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index d3dbe9d..9abc6df 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -47,6 +47,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_blend_func_extended on freedreno (a3xx)
 GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips
 GL_ARB_gpu_shader5 on r600 for Evergreen and later chips
+GL_ARB_shader_stencil_export on i965
 GL_ARB_shader_storage_buffer_object on i965
 GL_ARB_shader_texture_image_samples on i965, nv50, nvc0, r600, 
radeonsi
 GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 3f9afd1..c6826d6 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -358,6 +358,7 @@ intelInitExtensions(struct gl_context *ctx)
if (brw->gen >= 9) {
   ctx->Extensions.KHR_texture_compression_astc_ldr = true;
   ctx->Extensions.KHR_texture_compression_astc_hdr = true;
+  ctx->Extensions.ARB_shader_stencil_export = true;
}
 
if (ctx->API == API_OPENGL_CORE)
-- 
2.6.1


[Mesa-dev] [PATCH 5/6] Implement the proper packing for the stencil payload

2015-10-20 Thread Ben Widawsky

This patch is split out for review. It will be squashed before pushing.
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  6 +++-
 src/mesa/drivers/dri/i965/brw_fs.h |  2 ++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 44 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 ++
 5 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index c67728b..2c5cd7a 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -915,6 +915,7 @@ enum opcode {
 */
FS_OPCODE_FB_WRITE_LOGICAL,
 
+   FS_OPCODE_PACK_STENCIL_REF,
FS_OPCODE_BLORP_FB_WRITE,
FS_OPCODE_REP_FB_WRITE,
SHADER_OPCODE_RCP,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 560eb91..c962043 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3440,7 +3440,11 @@ lower_fb_write_logical_send(const fs_builder &bld, 
fs_inst *inst,
if (src_stencil.file != BAD_FILE) {
   assert(devinfo->gen >= 9);
   assert(bld.dispatch_width() != 16);
-  sources[length] = src_stencil;
+
+  sources[length] = bld.vgrf(BRW_REGISTER_TYPE_UD);
+  bld.exec_all().annotate("FB write OS")
+ .emit(FS_OPCODE_PACK_STENCIL_REF, sources[length],
+   retype(src_stencil, BRW_REGISTER_TYPE_UB));
   length++;
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 4f59d4b..e2bc469 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -419,6 +419,8 @@ private:
void generate_fb_write(fs_inst *inst, struct brw_reg payload);
void generate_urb_write(fs_inst *inst, struct brw_reg payload);
void generate_cs_terminate(fs_inst *inst, struct brw_reg payload);
+   void generate_stencil_ref_packing(fs_inst *inst, struct brw_reg dst,
+ struct brw_reg src);
void generate_barrier(fs_inst *inst, struct brw_reg src);
void generate_blorp_fb_write(fs_inst *inst);
void generate_linterp(fs_inst *inst, struct brw_reg dst,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 1a893c9..80cc6ad 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -415,6 +415,46 @@ fs_generator::generate_cs_terminate(fs_inst *inst, struct 
brw_reg payload)
 }
 
 void
+fs_generator::generate_stencil_ref_packing(fs_inst *inst,
+   struct brw_reg dst,
+   struct brw_reg src)
+{
+   assert(dispatch_width == 8);
+   assert(devinfo->gen >= 9);
+
+   /* Stencil value updates are provided in 8 slots of 1 byte per slot.
+* Presumably, in order to save memory bandwidth, the stencil reference
+* values written from the FS need to be packed into 2 dwords (this makes
+* sense because the stencil values are limited to 1 byte each and a SIMD8
+* send, so stencil slots 0-3 in dw0, and 4-7 in dw1.)
+*
+* The spec is confusing here because in the payload definition of 
MDP_RTW_S8
+* (Message Data Payload for Render Target Writes with Stencil 8b) the
+* stencil value seems to be dw4.0-dw4.7. However, if you look at the type 
of
+* dw4 it is type MDPR_STENCIL (Message Data Payload Register) which is the
+* packed values specified above and diagrammed below:
+*
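+*     31                              0
+*     +-------------------------------+
+* DW  |                               |
+* 2-7 |            IGNORED            |
+*     |                               |
+*     +-------+-------+-------+-------+
+* DW1 | STC   | STC   | STC   | STC   |
+*     | slot7 | slot6 | slot5 | slot4 |
+*     +-------+-------+-------+-------+
+* DW0 | STC   | STC   | STC   | STC   |
+*     | slot3 | slot2 | slot1 | slot0 |
+*     +-------+-------+-------+-------+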
+*/
+
+   src.width = BRW_WIDTH_1;
+   src.hstride = BRW_HORIZONTAL_STRIDE_0;
+   src.vstride = BRW_VERTICAL_STRIDE_4;
+   brw_MOV(p, retype(dst, BRW_REGISTER_TYPE_UB), retype(src, 
BRW_REGISTER_TYPE_UB));
+}
+
+void
 fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src)
 {
brw_barrier(p, src);
@@ -2153,6 +2193,10 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
 generate_barrier(inst, src[0]);
 break;
 
+  case FS_OPCODE_PACK_STENCIL_REF:
+ generate_stencil_ref_packing(inst, dst, src[0]);
+ break;
+
   default:
  unreachable("Unsupported opcode");
 
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 2324b56..c38e34e 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++
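
The packing described in the comment of generate_stencil_ref_packing() can be illustrated on the CPU side. The following is only a sketch under stated assumptions, not Mesa code: pack_stencil_refs() is a hypothetical helper, and it assumes each of the eight SIMD8 channels holds its stencil value in the low byte of a dword, with slot 0 landing in bits 7:0 of DW0 as in the diagram.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Mesa): take the low byte of each of the
// eight per-channel stencil values and pack them into two dwords.
// Slots 0-3 go to dw0 and slots 4-7 to dw1; the lowest slot of each dword
// occupies its least significant byte.
std::array<uint32_t, 2>
pack_stencil_refs(const std::array<uint32_t, 8> &per_channel)
{
   std::array<uint32_t, 2> packed = {0, 0};

   for (unsigned slot = 0; slot < 8; slot++) {
      // Stencil values are limited to one byte, so drop the upper bits.
      const uint32_t value = per_channel[slot] & 0xffu;
      packed[slot / 4] |= value << (8 * (slot % 4));
   }

   return packed;
}
```

In the patch itself the same effect appears to come from a single byte-typed MOV whose source region steps four bytes per channel, so no shifting is needed on the GPU.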

Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> message types because I found by experimentation that it doesn't work.
> I wrote in the comment that I couldn't find any documentation for this
> problem. However I've now found the documentation and it has
> additional restrictions on further message types so this patch updates
> the comment and adds the others.
> ---
> 
> That paragraph in the spec also mentions further restrictions that we
> should probably worry about like that the shader shouldn't combine
> this optimisation with any other render target data port read/writes.
> 
> It also has a fairly pessimistic note saying the optimisation is only
> really good for large polygons in a GUI-like workload. I wonder
> whether we should be doing some more benchmarking to decide whether
> it's really a good idea to enable this as a general optimisation even
> for games.

I remember seeing this before, but I cannot find it now. All I am seeing
regarding performance implications are the bits about requiring a header, and
writing to the same pixel from multiple threads. The latter one I assume is only
going to happen with MSAA?


> 
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 49323eb..bf9ff84 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
> if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
>return false;
>  
> -   /* This optimisation doesn't seem to work for textureGather for some
> -* reason. I can't find any documentation or known workarounds to indicate
> -* that this is expected, but considering that it is probably pretty
> -* unlikely that a shader would directly write out the results from
> -* textureGather we might as well just disable it.
> +   /* 3D Sampler » Messages » Message Format
> +*
> +* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
> +*  messages except sample+killpix, resinfo, sampleinfo, LOD, and 
> gather4*”
>  */
> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
>return false;
>  
> -- 
> 1.9.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
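
The check in the quoted hunk amounts to a small predicate over the texture opcode. The sketch below is an illustration only: the sketch_opcode enum is a hypothetical stand-in for the real opcode list in brw_defines.h, and it covers just the four opcodes named in the hunk, not the other message types the spec sentence mentions.

```cpp
#include <cassert>

// Stand-in opcode list for illustration only. The real enum in
// brw_defines.h has many more entries and different values.
enum sketch_opcode {
   SKETCH_OPCODE_TEX,        // ordinary sample
   SKETCH_OPCODE_TXS,        // texture size query
   SKETCH_OPCODE_LOD,        // LOD query
   SKETCH_OPCODE_TG4,        // textureGather
   SKETCH_OPCODE_TG4_OFFSET, // textureGather with offsets
};

// Returns false for the opcodes the hunk excludes, i.e. the ones for which
// the quoted spec sentence does not allow a response length of zero.
bool
allows_zero_response_length(sketch_opcode op)
{
   switch (op) {
   case SKETCH_OPCODE_TXS:
   case SKETCH_OPCODE_LOD:
   case SKETCH_OPCODE_TG4:
   case SKETCH_OPCODE_TG4_OFFSET:
      return false;
   default:
      return true;
   }
}
```

A switch is used here instead of the chained comparisons in the patch so that adding another excluded opcode is a one-line change. That is a design preference of this sketch, not something the thread proposes.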

Re: [Mesa-dev] [PATCH 2/6] i965/fs: Enumerate logical fb writes arguments

2015-10-20 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 02:52:29PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:29 PM, Ben Widawsky
>  wrote:
> > Gen9 adds the ability to write out a stencil value, so we need to expand the
> > virtual payload by one. Abstracting this now makes that change easier to 
> > read.
> >
> > I was admittedly confused early on about some of the hardcoding. If people
> > believe the resulting code is inferior, I am not super attached to the 
> > patch.
> >
> > Cc: Francisco Jerez 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  src/mesa/drivers/dri/i965/brw_defines.h | 18 ++
> >  src/mesa/drivers/dri/i965/brw_fs.cpp| 21 +++--
> >  2 files changed, 21 insertions(+), 18 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> > b/src/mesa/drivers/dri/i965/brw_defines.h
> > index 7a5ee1b..e06c9d6 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -912,14 +912,6 @@ enum opcode {
> > /**
> >  * Same as FS_OPCODE_FB_WRITE but expects its arguments separately as
> >  * individual sources instead of as a single payload blob:
> > -*
> > -* Source 0: [required] Color 0.
> > -* Source 1: [optional] Color 1 (for dual source blend messages).
> > -* Source 2: [optional] Src0 Alpha.
> > -* Source 3: [optional] Source Depth (gl_FragDepth)
> > -* Source 4: [optional (gen4-5)] Destination Depth passthrough from 
> > thread
> > -* Source 5: [optional] Sample Mask (gl_SampleMask).
> > -* Source 6: [required] Number of color components (as a UD immediate).
> >  */
> > FS_OPCODE_FB_WRITE_LOGICAL,
> >
> > @@ -1318,6 +1310,16 @@ enum brw_urb_write_flags {
> >BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
> >  };
> >
> > +enum fb_write_logical_args {
> > +   FB_WRITE_COLOR0 = 0,  /* REQUIRED */
> > +   FB_WRITE_COLOR1 = 1,  /* for dual source blend messages */
> > +   FB_WRITE_SRC0_ALPHA = 2,
> > +   FB_WRITE_SRC_DEPTH = 3,   /* gl_FragDepth */
> > +   FB_WRITE_DST_DEPTH = 4,   /* GEN4-5: passthrough from thread */
> > +   FB_WRITE_OMASK = 5,   /* Sample Mask (gl_SampleMask) */
> > +   FB_WRITE_COMPONENTS = 6,  /* REQUIRED */
> 
> Do we gain anything by assigning values explicitly?

Just code readability. As a noob coming into the code, seeing a random "6" or
"4" in places was strange and it took a bit to figure out where to get the
sensible value from.

Is there any specific opposition toward doing this, or some reason it wasn't
done in the first place? I honestly don't care too much...
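
The readability point can be shown with a reduced version of the components_read() logic from the patch. This is a sketch, not the driver code: sketch_inst is a hypothetical stand-in for fs_inst that stores plain integers instead of registers, and the enum is copied from the patch as posted.

```cpp
#include <cassert>

// Source indices for the logical FB write, as posted in patch 2/6.
enum fb_write_logical_args {
   FB_WRITE_COLOR0 = 0,
   FB_WRITE_COLOR1 = 1,
   FB_WRITE_SRC0_ALPHA = 2,
   FB_WRITE_SRC_DEPTH = 3,
   FB_WRITE_DST_DEPTH = 4,
   FB_WRITE_OMASK = 5,
   FB_WRITE_COMPONENTS = 6,
};

// Hypothetical, heavily simplified instruction: each source is just a number.
struct sketch_inst {
   unsigned src[7];
};

// The two color sources read as many components as the immediate in
// FB_WRITE_COMPONENTS says; every other source reads exactly one.
unsigned
components_read(const sketch_inst &inst, unsigned i)
{
   if (i == FB_WRITE_COLOR0 || i == FB_WRITE_COLOR1)
      return inst.src[FB_WRITE_COMPONENTS];
   else
      return 1;
}
```

Compared with src[6] and i < 2, the named version states which sources are meant without the reader having to look up the comment above the opcode.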

Re: [Mesa-dev] [PATCH 2/6] i965/fs: Enumerate logical fb writes arguments

2015-10-20 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 02:57:24PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:54 PM, Ben Widawsky  wrote:
> > On Tue, Oct 20, 2015 at 02:52:29PM -0700, Matt Turner wrote:
> >> On Tue, Oct 20, 2015 at 2:29 PM, Ben Widawsky
> >>  wrote:
> >> > Gen9 adds the ability to write out a stencil value, so we need to expand 
> >> > the
> >> > virtual payload by one. Abstracting this now makes that change easier to 
> >> > read.
> >> >
> >> > I was admittedly confused early on about some of the hardcoding. If 
> >> > people
> >> > believe the resulting code is inferior, I am not super attached to the 
> >> > patch.
> >> >
> >> > Cc: Francisco Jerez 
> >> > Signed-off-by: Ben Widawsky 
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_defines.h | 18 ++
> >> >  src/mesa/drivers/dri/i965/brw_fs.cpp| 21 +++--
> >> >  2 files changed, 21 insertions(+), 18 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> >> > b/src/mesa/drivers/dri/i965/brw_defines.h
> >> > index 7a5ee1b..e06c9d6 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> > @@ -912,14 +912,6 @@ enum opcode {
> >> > /**
> >> >  * Same as FS_OPCODE_FB_WRITE but expects its arguments separately as
> >> >  * individual sources instead of as a single payload blob:
> >> > -*
> >> > -* Source 0: [required] Color 0.
> >> > -* Source 1: [optional] Color 1 (for dual source blend messages).
> >> > -* Source 2: [optional] Src0 Alpha.
> >> > -* Source 3: [optional] Source Depth (gl_FragDepth)
> >> > -* Source 4: [optional (gen4-5)] Destination Depth passthrough from 
> >> > thread
> >> > -* Source 5: [optional] Sample Mask (gl_SampleMask).
> >> > -* Source 6: [required] Number of color components (as a UD 
> >> > immediate).
> >> >  */
> >> > FS_OPCODE_FB_WRITE_LOGICAL,
> >> >
> >> > @@ -1318,6 +1310,16 @@ enum brw_urb_write_flags {
> >> >BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
> >> >  };
> >> >
> >> > +enum fb_write_logical_args {
> >> > +   FB_WRITE_COLOR0 = 0,  /* REQUIRED */
> >> > +   FB_WRITE_COLOR1 = 1,  /* for dual source blend messages */
> >> > +   FB_WRITE_SRC0_ALPHA = 2,
> >> > +   FB_WRITE_SRC_DEPTH = 3,   /* gl_FragDepth */
> >> > +   FB_WRITE_DST_DEPTH = 4,   /* GEN4-5: passthrough from thread */
> >> > +   FB_WRITE_OMASK = 5,   /* Sample Mask (gl_SampleMask) */
> >> > +   FB_WRITE_COMPONENTS = 6,  /* REQUIRED */
> >>
> >> Do we gain anything by assigning values explicitly?
> >
> > Just code readability. As a noob coming into the code, seeing a random "6" 
> > or
> > "4" in places was strange and it took a bit to figure out where to get the
> > sensible value from.
> >
> > Is there any specific opposition toward doing this, or some reason it wasn't
> > done in the first place? I honestly don't care too much...
> 
> If everything just uses the new enum values (and their values don't
> matter per se), we shouldn't assign them specifically. Patch 4/6 would
> be simpler if you didn't have to renumber some of the enums, for
> instance.

Yes, I suppose patch 4/6 does end up without the first hunk in the patch if I
did away with this, but I still think the readability gained outweighs that.
However, I admit my knowledge in this part of the codebase is likely the
minority (in between 0 and expert).

Re: [Mesa-dev] [PATCH 5/6] Implement the proper packing for the stencil payload

2015-10-20 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 03:17:38PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:29 PM, Ben Widawsky
>  wrote:
> > This patch is split out for review. It will be squashed before pushing.
> > ---
> >  src/mesa/drivers/dri/i965/brw_defines.h|  1 +
> >  src/mesa/drivers/dri/i965/brw_fs.cpp   |  6 +++-
> >  src/mesa/drivers/dri/i965/brw_fs.h |  2 ++
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 44 
> > ++
> >  src/mesa/drivers/dri/i965/brw_shader.cpp   |  2 ++
> >  5 files changed, 54 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> > b/src/mesa/drivers/dri/i965/brw_defines.h
> > index c67728b..2c5cd7a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -915,6 +915,7 @@ enum opcode {
> >  */
> > FS_OPCODE_FB_WRITE_LOGICAL,
> >
> > +   FS_OPCODE_PACK_STENCIL_REF,
> 
> Listing this in the middle of four fbwrite opcodes seems wrong.
> 

Can you please find me a better place? I searched all over and really wasn't
sure where it belongs. It is FB write related (unless there is a more generic
use of such packing), but it is indeed not an FB write.

> > FS_OPCODE_BLORP_FB_WRITE,
> > FS_OPCODE_REP_FB_WRITE,
> > SHADER_OPCODE_RCP,
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 560eb91..c962043 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -3440,7 +3440,11 @@ lower_fb_write_logical_send(const fs_builder &bld, 
> > fs_inst *inst,
> > if (src_stencil.file != BAD_FILE) {
> >assert(devinfo->gen >= 9);
> >assert(bld.dispatch_width() != 16);
> > -  sources[length] = src_stencil;
> > +
> > +  sources[length] = bld.vgrf(BRW_REGISTER_TYPE_UD);
> > +  bld.exec_all().annotate("FB write OS")
> 
> OS?
> 

"Output Stencil" It's the common form in all the payload definitions in the
docs. I kind of liked keeping the doc term so it's searchable, even though we
hadn't done that for other things like oMask (should be OM). I can rename it to
whatever you like instead - just tell me what you want.

> > + .emit(FS_OPCODE_PACK_STENCIL_REF, sources[length],
> > +   retype(src_stencil, BRW_REGISTER_TYPE_UB));
> >length++;
> > }
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> > b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 4f59d4b..e2bc469 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -419,6 +419,8 @@ private:
> > void generate_fb_write(fs_inst *inst, struct brw_reg payload);
> > void generate_urb_write(fs_inst *inst, struct brw_reg payload);
> > void generate_cs_terminate(fs_inst *inst, struct brw_reg payload);
> > +   void generate_stencil_ref_packing(fs_inst *inst, struct brw_reg dst,
> > + struct brw_reg src);
> > void generate_barrier(fs_inst *inst, struct brw_reg src);
> > void generate_blorp_fb_write(fs_inst *inst);
> > void generate_linterp(fs_inst *inst, struct brw_reg dst,
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index 1a893c9..80cc6ad 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -415,6 +415,46 @@ fs_generator::generate_cs_terminate(fs_inst *inst, 
> > struct brw_reg payload)
> >  }
> >
> >  void
> > +fs_generator::generate_stencil_ref_packing(fs_inst *inst,
> > +   struct brw_reg dst,
> > +   struct brw_reg src)
> > +{
> > +   assert(dispatch_width == 8);
> > +   assert(devinfo->gen >= 9);
> > +
> > +   /* Stencil value updates are provided in 8 slots of 1 byte per slot.
> > +* Presumably, in order to save memory bandwidth, the stencil reference
> > +* values written from the FS need to be packed into 2 dwords (this 
> > makes
> > +* sense because the stencil values are limited to 1 byte each and a 
> > SIMD8
> > +* send, so stencil slots 0-3 in dw0, and 4-7 in dw1.)
> > +*
> > +* The spec is confusin

Re: [Mesa-dev] [PATCH 2/6] i965/fs: Enumerate logical fb writes arguments

2015-10-20 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 03:19:48PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 3:11 PM, Ben Widawsky  wrote:
> > On Tue, Oct 20, 2015 at 02:57:24PM -0700, Matt Turner wrote:
> >> On Tue, Oct 20, 2015 at 2:54 PM, Ben Widawsky  wrote:
> >> > On Tue, Oct 20, 2015 at 02:52:29PM -0700, Matt Turner wrote:
> >> >> On Tue, Oct 20, 2015 at 2:29 PM, Ben Widawsky
> >> >>  wrote:
> >> >> > Gen9 adds the ability to write out a stencil value, so we need to 
> >> >> > expand the
> >> >> > virtual payload by one. Abstracting this now makes that change easier 
> >> >> > to read.
> >> >> >
> >> >> > I was admittedly confused early on about some of the hardcoding. If 
> >> >> > people
> >> >> > believe the resulting code is inferior, I am not super attached to 
> >> >> > the patch.
> >> >> >
> >> >> > Cc: Francisco Jerez 
> >> >> > Signed-off-by: Ben Widawsky 
> >> >> > ---
> >> >> >  src/mesa/drivers/dri/i965/brw_defines.h | 18 ++
> >> >> >  src/mesa/drivers/dri/i965/brw_fs.cpp| 21 +++--
> >> >> >  2 files changed, 21 insertions(+), 18 deletions(-)
> >> >> >
> >> >> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> >> >> > b/src/mesa/drivers/dri/i965/brw_defines.h
> >> >> > index 7a5ee1b..e06c9d6 100644
> >> >> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> >> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> >> > @@ -912,14 +912,6 @@ enum opcode {
> >> >> > /**
> >> >> >  * Same as FS_OPCODE_FB_WRITE but expects its arguments 
> >> >> > separately as
> >> >> >  * individual sources instead of as a single payload blob:
> >> >> > -*
> >> >> > -* Source 0: [required] Color 0.
> >> >> > -* Source 1: [optional] Color 1 (for dual source blend messages).
> >> >> > -* Source 2: [optional] Src0 Alpha.
> >> >> > -* Source 3: [optional] Source Depth (gl_FragDepth)
> >> >> > -* Source 4: [optional (gen4-5)] Destination Depth passthrough 
> >> >> > from thread
> >> >> > -* Source 5: [optional] Sample Mask (gl_SampleMask).
> >> >> > -* Source 6: [required] Number of color components (as a UD 
> >> >> > immediate).
> >> >> >  */
> >> >> > FS_OPCODE_FB_WRITE_LOGICAL,
> >> >> >
> >> >> > @@ -1318,6 +1310,16 @@ enum brw_urb_write_flags {
> >> >> >BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
> >> >> >  };
> >> >> >
> >> >> > +enum fb_write_logical_args {
> >> >> > +   FB_WRITE_COLOR0 = 0,  /* REQUIRED */
> >> >> > +   FB_WRITE_COLOR1 = 1,  /* for dual source blend messages */
> >> >> > +   FB_WRITE_SRC0_ALPHA = 2,
> >> >> > +   FB_WRITE_SRC_DEPTH = 3,   /* gl_FragDepth */
> >> >> > +   FB_WRITE_DST_DEPTH = 4,   /* GEN4-5: passthrough from thread */
> >> >> > +   FB_WRITE_OMASK = 5,   /* Sample Mask (gl_SampleMask) */
> >> >> > +   FB_WRITE_COMPONENTS = 6,  /* REQUIRED */
> >> >>
> >> >> Do we gain anything by assigning values explicitly?
> >> >
> >> > Just code readability. As a noob coming into the code, seeing a random "6" or
> >> > "4" in places was strange and it took a bit to figure out where to get the
> >> > sensible value from.
> >> >
> >> > Is there any specific opposition toward doing this, or some reason it wasn't
> >> > done in the first place? I honestly don't care too much...
> >>
> >> If everything just uses the new enum values (and their values don't
> >> matter per se), we shouldn't assign them specifically. Patch 4/6 would
> >> be simpler if you didn't have to renumber some of the enums, for
> >> instance.
> >
> > Yes, I suppose patch 4/6 does end up without the first hunk in the patch if I
> > did away with this, but I still think the readability gained outweighs that.
> > However, I admit my knowledge in this part of the codebase is likely the
> > minority (in between 0 and expert).
> 
> What I should have said in the previous email is that assigning
> arbitrary numbers to enums in brw_defines.h is confusing because one
> might be led to believe that these are hardware values (like almost
> everything else is in brw_defines.h).

Oh. I see what you mean. Yeah, I can drop the explicit numbering. You're cool
with the enumeration though?
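
A minimal sketch of the point being agreed on here, under assumptions: the names
below (demo_fb_write_src, DEMO_SRC_*, struct demo_inst, demo_components) are
hypothetical and not taken from the patch, and the trailing DEMO_SRC_COUNT
sentinel is an addition for illustration only. With no explicit values, C
numbers the enumerators 0, 1, 2, ... in declaration order, so adding an entry
in the middle renumbers the rest without touching call sites that index by name.

```c
#include <assert.h>

/* Hypothetical stand-in for the logical source enum discussed above.
 * No explicit values: C numbers the enumerators 0, 1, 2, ... in order. */
enum demo_fb_write_src {
   DEMO_SRC_COLOR0,
   DEMO_SRC_COLOR1,
   DEMO_SRC_SRC0_ALPHA,
   DEMO_SRC_SRC_DEPTH,
   DEMO_SRC_DST_DEPTH,
   DEMO_SRC_OMASK,
   DEMO_SRC_COMPONENTS,
   DEMO_SRC_COUNT          /* assumed sentinel: number of sources */
};

/* Hypothetical instruction with one slot per logical source. */
struct demo_inst {
   unsigned src[DEMO_SRC_COUNT];
};

/* Read the component count by name rather than by a bare "6". */
static unsigned
demo_components(const struct demo_inst *inst)
{
   assert(inst != 0);
   return inst->src[DEMO_SRC_COMPONENTS];
}
```

Because nothing here depends on the numeric values, the enum also cannot be
mistaken for a set of hardware encodings, which was Matt's concern.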
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-21 Thread Ben Widawsky

On Tue, Oct 20, 2015 at 02:48:41PM -0700, Matt Turner wrote:
> On Tue, Oct 20, 2015 at 2:41 PM, Ben Widawsky  wrote:
> > On Tue, Oct 20, 2015 at 11:56:15AM +0200, Neil Roberts wrote:
> >> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> >> message types because I found by experimentation that it doesn't work.
> >> I wrote in the comment that I couldn't find any documentation for this
> >> problem. However I've now found the documentation and it has
> >> additional restrictions on further message types so this patch updates
> >> the comment and adds the others.
> >> ---
> >>
> >> That paragraph in the spec also mentions further restrictions that we
> >> should probably worry about like that the shader shouldn't combine
> >> this optimisation with any other render target data port read/writes.
> >>
> >> It also has a fairly pessimistic note saying the optimisation is only
> >> really good for large polygons in a GUI-like workload. I wonder
> >> whether we should be doing some more benchmarking to decide whether
> >> it's really a good idea to enable this as a general optimisation even
> >> for games.
> >
> > I remember seeing this before, but I cannot find it now. All I am seeing
> > regarding performance implications are the bits about requiring a header, and
> > writing to the same pixel from multiple threads. The latter one I assume is only
> > going to happen with MSAA?
> 
> No, I don't think so. As I understand it, the EUs can be executing
> fragment shaders for multiple primitives at the same time, and those
> primitives might overlap. The c in sendc means that it does some extra
> tracking to ensure that the render target writes land in the correct
> order.
> 
> Presumably by using sendc to texture directly to the render target, it
> adds some extra synchronization (before the texturing is done... or
> something?) that especially hurts when there's a lot of overlapping
> primitives (as in the case of lots of small primitives).

Ah, Neil pointed me to the blurb. Putting this here to remind myself... I think
a cheap way to measure things is to turn the sendc into a send. Things will
probably render wrong, but it should eliminate the bottleneck. If we can see a
measurable perf difference with send, it would certainly indicate we need to
spend time optimizing the optimization.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] [v2] i965/fs: Enumerate logical fb writes arguments

2015-10-21 Thread Ben Widawsky

Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.

I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.

v2:
Remove explicit numbering from the enumeration (Matt).
Use a real naming scheme, and reference it in the opcode definition (Curro)
  - LOGICAL_SRC_SRC_DEPTH kinda sucks... but it's consistent
Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
Add an assertion to make sure the component numbering is correct (Ben)

Cc: Matt Turner 
Cc: Francisco Jerez 
Signed-off-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/brw_defines.h  | 22 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp | 24 +---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  1 +
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index e61ad54..a2f59ea 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -911,15 +911,9 @@ enum opcode {
 
/**
 * Same as FS_OPCODE_FB_WRITE but expects its arguments separately as
-* individual sources instead of as a single payload blob:
-*
-* Source 0: [required] Color 0.
-* Source 1: [optional] Color 1 (for dual source blend messages).
-* Source 2: [optional] Src0 Alpha.
-* Source 3: [optional] Source Depth (gl_FragDepth)
-* Source 4: [optional (gen4-5)] Destination Depth passthrough from thread
-* Source 5: [optional] Sample Mask (gl_SampleMask).
-* Source 6: [required] Number of color components (as a UD immediate).
+* individual sources instead of as a single payload blob. The
+* position/ordering of the arguments are defined by the enum
+* fb_write_logical_srcs.
 */
FS_OPCODE_FB_WRITE_LOGICAL,
 
@@ -1318,6 +1312,16 @@ enum brw_urb_write_flags {
   BRW_URB_WRITE_ALLOCATE | BRW_URB_WRITE_COMPLETE,
 };
 
+enum fb_write_logical_srcs {
+   FB_WRITE_LOGICAL_SRC_COLOR0,  /* REQUIRED */
+   FB_WRITE_LOGICAL_SRC_COLOR1,  /* for dual source blend messages */
+   FB_WRITE_LOGICAL_SRC_SRC0_ALPHA,
+   FB_WRITE_LOGICAL_SRC_SRC_DEPTH,   /* gl_FragDepth */
+   FB_WRITE_LOGICAL_SRC_DST_DEPTH,   /* GEN4-5: passthrough from thread */
+   FB_WRITE_LOGICAL_SRC_OMASK,   /* Sample Mask (gl_SampleMask) */
+   FB_WRITE_LOGICAL_SRC_COMPONENTS,  /* REQUIRED */
+};
+
 #ifdef __cplusplus
 /**
  * Allow brw_urb_write_flags enums to be ORed together.
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index da90467..ef06a70 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -695,10 +695,10 @@ fs_inst::components_read(unsigned i) const
   return 2;
 
case FS_OPCODE_FB_WRITE_LOGICAL:
-  assert(src[6].file == IMM);
+  assert(src[FB_WRITE_LOGICAL_SRC_COMPONENTS].file == IMM);
   /* First/second FB write color. */
   if (i < 2)
- return src[6].fixed_hw_reg.dw1.ud;
+ return src[FB_WRITE_LOGICAL_SRC_COMPONENTS].fixed_hw_reg.dw1.ud;
   else
  return 1;
 
@@ -3339,15 +3339,16 @@ lower_fb_write_logical_send(const fs_builder &bld, 
fs_inst *inst,
 const brw_wm_prog_key *key,
 const fs_visitor::thread_payload &payload)
 {
-   assert(inst->src[6].file == IMM);
+   assert(inst->src[FB_WRITE_LOGICAL_SRC_COMPONENTS].file == IMM);
const brw_device_info *devinfo = bld.shader->devinfo;
-   const fs_reg &color0 = inst->src[0];
-   const fs_reg &color1 = inst->src[1];
-   const fs_reg &src0_alpha = inst->src[2];
-   const fs_reg &src_depth = inst->src[3];
-   const fs_reg &dst_depth = inst->src[4];
-   fs_reg sample_mask = inst->src[5];
-   const unsigned components = inst->src[6].fixed_hw_reg.dw1.ud;
+   const fs_reg &color0 = inst->src[FB_WRITE_LOGICAL_SRC_COLOR0];
+   const fs_reg &color1 = inst->src[FB_WRITE_LOGICAL_SRC_COLOR1];
+   const fs_reg &src0_alpha = inst->src[FB_WRITE_LOGICAL_SRC_SRC0_ALPHA];
+   const fs_reg &src_depth = inst->src[FB_WRITE_LOGICAL_SRC_SRC_DEPTH];
+   const fs_reg &dst_depth = inst->src[FB_WRITE_LOGICAL_SRC_DST_DEPTH];
+   fs_reg sample_mask = inst->src[FB_WRITE_LOGICAL_SRC_OMASK];
+   const unsigned components =
+  inst->src[FB_WRITE_LOGICAL_SRC_COMPONENTS].fixed_hw_reg.dw1.ud;
 
/* We can potentially have a message length of up to 15, so we have to set
 * base_mrf to either 0 or 1 in order to fit in m0..m15.
@@ -4175,7 +4176,8 @@ get_lowered_simd_width(const struct brw_device_info 
*devinfo,
   /* Gen6 doesn't support SIMD16 depth writes but we cannot handle them
* here.
*/
-  assert(
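
The v2 notes above mention an assertion that the component numbering is
correct, but the hunk that adds it is not shown in this message. The following
is therefore only a guess at the general idea, with hypothetical names
(demo_src, DEMO_ARRAY_SIZE, demo_build_sources) and placeholder source values:
when a source list is still built positionally, an assert can tie its length to
the enum so the two cannot drift apart silently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum mirroring the ordering used in the patch. */
enum demo_src {
   DEMO_COLOR0,
   DEMO_COLOR1,
   DEMO_SRC0_ALPHA,
   DEMO_SRC_DEPTH,
   DEMO_DST_DEPTH,
   DEMO_OMASK,
   DEMO_COMPONENTS
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Builds a positional source list and checks it against the enum.
 * Returns the number of sources. */
static size_t
demo_build_sources(unsigned components)
{
   /* Placeholder values 10..15 stand in for real registers. */
   const unsigned srcs[] = { 10, 11, 12, 13, 14, 15, components };

   /* The last slot must be the one the enum calls COMPONENTS. */
   assert(DEMO_ARRAY_SIZE(srcs) - 1 == DEMO_COMPONENTS);
   assert(srcs[DEMO_COMPONENTS] == components);

   return DEMO_ARRAY_SIZE(srcs);
}
```

If a later patch inserts a new enumerator before DEMO_COMPONENTS without
extending the list, the first assert fires in debug builds.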

Re: [Mesa-dev] [PATCH 2/2] i965: Set Y-tiling for qualified rotated scanout buffers

2015-10-22 Thread Ben Widawsky

On Thu, Oct 22, 2015 at 06:44:53PM -0700, Vivek Kasireddy wrote:
> On newer hardware platforms that support rotation, if the gbm
> interface requests to create a rotated scanout buffer via the
> flag __DRI_IMAGE_USE_SCANOUT_ROTATED_90_270, set Y-tiling
> while creating the buffer.
> 
> Cc: Kristian Hogsberg 
> Signed-off-by: Vivek Kasireddy 

FYI, I'd been hoping to land a superset of this for quite a while. My
understanding was the DDX doesn't support it.

http://patchwork.freedesktop.org/patch/46984/

> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index 590c45d..1079676 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -525,6 +525,12 @@ intel_create_image(__DRIscreen *screen,
>  
> if (use & __DRI_IMAGE_USE_LINEAR)
>tiling = I915_TILING_NONE;
> +   else if (use & __DRI_IMAGE_USE_SCANOUT_ROTATED_90_270) {
> +  if (intelScreen->devinfo->gen >= 9)
> + tiling = I915_TILING_Y;
> +  else
> + return NULL;
> +   }
>  
> image = intel_allocate_image(format, loaderPrivate);
> if (image == NULL)
> -- 
> 2.4.3
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
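
The decision the quoted hunk adds can be read as a small pure function. The
sketch below restates it under assumptions: DEMO_USE_*, enum demo_tiling and
demo_pick_tiling are illustrative names with placeholder bit values, not the
real __DRI_IMAGE_USE_* or I915_TILING_* definitions, and the X-tiled default is
assumed because the initial value of tiling is not visible in the hunk.

```c
#include <stdbool.h>

#define DEMO_USE_LINEAR                 0x1u   /* placeholder bit values */
#define DEMO_USE_SCANOUT_ROTATED_90_270 0x2u

enum demo_tiling { DEMO_TILING_NONE, DEMO_TILING_X, DEMO_TILING_Y };

/* Returns false when the request cannot be satisfied, which corresponds
 * to the "return NULL" branch in the quoted hunk. */
static bool
demo_pick_tiling(unsigned use, int gen, enum demo_tiling *tiling)
{
   *tiling = DEMO_TILING_X;             /* assumed default */

   if (use & DEMO_USE_LINEAR) {
      *tiling = DEMO_TILING_NONE;
   } else if (use & DEMO_USE_SCANOUT_ROTATED_90_270) {
      if (gen < 9)
         return false;                  /* rotated scanout needs Gen9+ */
      *tiling = DEMO_TILING_Y;
   }
   return true;
}
```

Note that because of the else-if, a linear request wins over a rotated-scanout
request when both flags are set, on any generation.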

[Mesa-dev] [PATCH 1/2] i965/skl: Add GT4 PCI IDs

2015-10-23 Thread Ben Widawsky

Like other gen8+ hardware, the hardware automatically scales up thread counts
and URB sizes, so there is no need to do anything but add the PCI IDs.

FINISHME: This patch still needs testing before merge.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Ben Widawsky 
---
 include/pci_ids/i965_pci_ids.h  | 5 -
 src/mesa/drivers/dri/i965/brw_device_info.c | 4 
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 8a42599..7d23547 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -122,8 +122,11 @@ CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
 CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
 CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
 CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
-CHIPSET(0x192A, skl_gt3, "Intel(R) Skylake SRV GT3")
 CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
+CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
 CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
b/src/mesa/drivers/dri/i965/brw_device_info.c
index a6a3bb6..e7a016c 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.c
+++ b/src/mesa/drivers/dri/i965/brw_device_info.c
@@ -335,6 +335,10 @@ static const struct brw_device_info 
brw_device_info_skl_gt3 = {
GEN9_FEATURES, .gt = 3,
 };
 
+static const struct brw_device_info brw_device_info_skl_gt4 = {
+   GEN9_FEATURES, .gt = 4,
+};
+
 static const struct brw_device_info brw_device_info_bxt = {
GEN9_FEATURES,
.is_broxton = 1,
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
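
For readers unfamiliar with the CHIPSET(...) lines: a list like this is
typically consumed as an X-macro, where the consumer defines what CHIPSET means
before expanding the list. The sketch below shows the pattern on an inline copy
of the four GT4 entries from the patch. DEMO_SKL_GT4_IDS, struct demo_chipset,
DEMO_ROW and demo_lookup_name are hypothetical; the driver's actual consumer
code is not shown in this thread.

```c
#include <stddef.h>

/* Inline stand-in for the ID list; real code would include the header. */
#define DEMO_SKL_GT4_IDS(CHIPSET) \
   CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4") \
   CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4") \
   CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4") \
   CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")

struct demo_chipset {
   int id;
   const char *family;
   const char *name;
};

/* Expand each CHIPSET(...) entry into one table row. */
#define DEMO_ROW(id, family, name) { id, #family, name },
static const struct demo_chipset demo_table[] = {
   DEMO_SKL_GT4_IDS(DEMO_ROW)
};
#undef DEMO_ROW

/* Linear search by PCI device ID; returns NULL for unknown devices. */
static const char *
demo_lookup_name(int devid)
{
   for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
      if (demo_table[i].id == devid)
         return demo_table[i].name;
   }
   return NULL;
}
```

With this pattern, adding a device really is just one new CHIPSET line, which
matches the commit message's claim that nothing else needs to change.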

[Mesa-dev] [PATCH 2/2] i965/skl: PCI ID cleanup and brand strings

2015-10-23 Thread Ben Widawsky

A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.

Signed-off-by: Ben Widawsky 
---
 include/pci_ids/i965_pci_ids.h | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 7d23547..a561f70 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -109,24 +109,28 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 
(Broadwell GT3e)")
 CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
 CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
 CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
-CHIPSET(0x1902, skl_gt1, "Intel(R) Skylake DT  GT1")
-CHIPSET(0x1906, skl_gt1, "Intel(R) Skylake ULT GT1")
-CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake SRV GT1")
-CHIPSET(0x190B, skl_gt1, "Intel(R) Skylake Halo GT1")
-CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake ULX GT1")
-CHIPSET(0x1912, skl_gt2, "Intel(R) Skylake DT  GT2")
-CHIPSET(0x1916, skl_gt2, "Intel(R) Skylake ULT GT2")
-CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake SRV GT2")
-CHIPSET(0x191B, skl_gt2, "Intel(R) Skylake Halo GT2")
-CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
-CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
-CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
-CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
-CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
-CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
-CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
-CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
-CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x1902, skl_gt1, "Intel® HD Graphics 510 (Skylake GT1)")
+CHIPSET(0x1906, skl_gt1, "Intel® HD Graphics 510 (Skylake GT1)")
+CHIPSET(0x190A, skl_gt1, "Intel® Skylake GT1")
+CHIPSET(0x190E, skl_gt1, "Intel® Skylake GT1")
+CHIPSET(0x1912, skl_gt2, "Intel® HD Graphics 530 (Skylake GT2)")
+CHIPSET(0x1913, skl_gt2, "Intel® Skylake GT2f")
+CHIPSET(0x1915, skl_gt2, "Intel® Skylake GT2f")
+CHIPSET(0x1916, skl_gt2, "Intel® HD Graphics 520 (Skylake GT2)")
+CHIPSET(0x1917, skl_gt2, "Intel® Skylake GT2f")
+CHIPSET(0x191A, skl_gt2, "Intel® Skylake GT2")
+CHIPSET(0x191B, skl_gt2, "Intel® HD Graphics 530 (Skylake GT2)")
+CHIPSET(0x191D, skl_gt2, "Intel® HD Graphics P530 (Skylake GT2)")
+CHIPSET(0x191E, skl_gt2, "Intel® HD Graphics 515 (Skylake GT2)")
+CHIPSET(0x1921, skl_gt2, "Intel® Skylake GT2f")
+CHIPSET(0x1923, skl_gt3, "Intel® Iris™ Graphics 540 (Skylake GT3e)")
+CHIPSET(0x1926, skl_gt3, "Intel® HD Graphics 535 (Skylake GT3)")
+CHIPSET(0x1927, skl_gt3, "Intel® Iris™ Graphics 550 (Skylake GT3e)")
+CHIPSET(0x192B, skl_gt3, "Iris™ Graphics Iris™ Graphics (Skylake GT3fe)")
+CHIPSET(0x1932, skl_gt4, "Intel® Iris™ Pro Graphics 570/580 (Skylake GT4)")
+CHIPSET(0x193A, skl_gt4, "Intel® Iris™ Pro Graphics P580 (Skylake GT4)")
+CHIPSET(0x193B, skl_gt4, "Intel® Iris™ Pro Graphics 580 (Skylake GT4)")
+CHIPSET(0x193D, skl_gt4, "Intel® Iris™ Pro Graphics P580 (Skylake GT4)")
 CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] i965/skl: Add GT4 PCI IDs

2015-10-23 Thread Ben Widawsky

On Fri, Oct 23, 2015 at 10:37:29AM -0700, Ben Widawsky wrote:
> Like other gen8+ hardware, the hardware automatically scales up thread counts
> and URB sizes, so there is no need to do anything but add the PCI IDs.
> 
> FINISHME: This patch still needs testing before merge.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Ben Widawsky 
> ---
>  include/pci_ids/i965_pci_ids.h  | 5 -
>  src/mesa/drivers/dri/i965/brw_device_info.c | 4 
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> index 8a42599..7d23547 100644
> --- a/include/pci_ids/i965_pci_ids.h
> +++ b/include/pci_ids/i965_pci_ids.h
> @@ -122,8 +122,11 @@ CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
>  CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
>  CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
>  CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
> -CHIPSET(0x192A, skl_gt3, "Intel(R) Skylake SRV GT3")
>  CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")

This should be removed in the next patch actually. I screwed up the rebase.

> +CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
> +CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
> +CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
> +CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
>  CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
>  CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
>  CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
> diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
> b/src/mesa/drivers/dri/i965/brw_device_info.c
> index a6a3bb6..e7a016c 100644
> --- a/src/mesa/drivers/dri/i965/brw_device_info.c
> +++ b/src/mesa/drivers/dri/i965/brw_device_info.c
> @@ -335,6 +335,10 @@ static const struct brw_device_info 
> brw_device_info_skl_gt3 = {
> GEN9_FEATURES, .gt = 3,
>  };
>  
> +static const struct brw_device_info brw_device_info_skl_gt4 = {
> +   GEN9_FEATURES, .gt = 4,
> +};
> +
>  static const struct brw_device_info brw_device_info_bxt = {
> GEN9_FEATURES,
> .is_broxton = 1,
> -- 
> 2.6.1
> 

-- 
Ben Widawsky, Intel Open Source Technology Center
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965/skl: PCI ID cleanup and brand strings

2015-10-23 Thread Ben Widawsky

On Fri, Oct 23, 2015 at 01:44:38PM -0400, Ilia Mirkin wrote:
> On Fri, Oct 23, 2015 at 1:37 PM, Ben Widawsky
>  wrote:
> > A few new PCI ids are added here, and one is removed (0x190B) because it no
> > longer seems to exist anywhere.
> >
> > Signed-off-by: Ben Widawsky 
> > ---
> >  include/pci_ids/i965_pci_ids.h | 40 
> > ++--
> >  1 file changed, 22 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> > index 7d23547..a561f70 100644
> > --- a/include/pci_ids/i965_pci_ids.h
> > +++ b/include/pci_ids/i965_pci_ids.h
> > @@ -109,24 +109,28 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 
> > (Broadwell GT3e)")
> >  CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
> >  CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
> >  CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
> > -CHIPSET(0x1902, skl_gt1, "Intel(R) Skylake DT  GT1")
> > -CHIPSET(0x1906, skl_gt1, "Intel(R) Skylake ULT GT1")
> > -CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake SRV GT1")
> > -CHIPSET(0x190B, skl_gt1, "Intel(R) Skylake Halo GT1")
> > -CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake ULX GT1")
> > -CHIPSET(0x1912, skl_gt2, "Intel(R) Skylake DT  GT2")
> > -CHIPSET(0x1916, skl_gt2, "Intel(R) Skylake ULT GT2")
> > -CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake SRV GT2")
> > -CHIPSET(0x191B, skl_gt2, "Intel(R) Skylake Halo GT2")
> > -CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
> > -CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
> > -CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
> > -CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
> > -CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
> > -CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
> > -CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
> > -CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
> > -CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
> > +CHIPSET(0x1902, skl_gt1, "Intel® HD Graphics 510 (Skylake GT1)")
> 
> Are you sure you want to include non-ascii characters here? For
> example your patch encodes this as UTF-8 and displays like this on my
> term:
> 
> +CHIPSET(0x1902, skl_gt1, "IntelÂ® HD Graphics 510 (Skylake GT1)")
> 
> I think (R) and (tm) are the accepted strings to use in these
> scenarios... [you could also have some huge complex thing to localize
> to the current LC setting, but... seems like overkill.]
> 
>   -ilia

I thought I would give it a shot, but I figured someone would complain (and I'm
not surprised it was you)... I copied all of these directly from libreoffice
opening a .xlsx, so it could very well just be wrong as opposed to a
misinterpretation.

I would prefer to switch away from ASCII, but the fact that anyone said anything
means I will just switch it back.

> 
> > +CHIPSET(0x1906, skl_gt1, "Intel® HD Graphics 510 (Skylake GT1)")
> > +CHIPSET(0x190A, skl_gt1, "Intel® Skylake GT1")
> > +CHIPSET(0x190E, skl_gt1, "Intel® Skylake GT1")
> > +CHIPSET(0x1912, skl_gt2, "Intel® HD Graphics 530 (Skylake GT2)")
> > +CHIPSET(0x1913, skl_gt2, "Intel® Skylake GT2f")
> > +CHIPSET(0x1915, skl_gt2, "Intel® Skylake GT2f")
> > +CHIPSET(0x1916, skl_gt2, "Intel® HD Graphics 520 (Skylake GT2)")
> > +CHIPSET(0x1917, skl_gt2, "Intel® Skylake GT2f")
> > +CHIPSET(0x191A, skl_gt2, "Intel® Skylake GT2")
> > +CHIPSET(0x191B, skl_gt2, "Intel® HD Graphics 530 (Skylake GT2)")
> > +CHIPSET(0x191D, skl_gt2, "Intel® HD Graphics P530 (Skylake GT2)")
> > +CHIPSET(0x191E, skl_gt2, "Intel® HD Graphics 515 (Skylake GT2)")
> > +CHIPSET(0x1921, skl_gt2, "Intel® Skylake GT2f")
> > +CHIPSET(0x1923, skl_gt3, "Intel® Iris™ Graphics 540 (Skylake GT3e)")
> > +CHIPSET(0x1926, skl_gt3, "Intel® HD Graphics 535 (Skylake GT3)")
> > +CHIPSET(0x1927, skl_gt3, "Intel® Iris™ Graphics 550 (Skylake GT3e)")
> > +CHIPSET(0x192B, skl_gt3, "Iris™ Graphics Iris™ Graphics (Skylake GT3fe)")

I just noticed this bug, which I copied blindly. I'll fix this too.

> > +CHIPSET(0x1932, skl_gt4, "Intel® Iris™ Pro Graphics 570/580 (Skylake GT4)")
> > +CHIPSET(0x193A, skl_gt4, "Intel® Iris™ Pro Graphics P580 (Skylake GT4)")
> > +CHIPSET(0x193B, skl_gt4, "Intel® Iris™ Pro Graphics 580 (Skylake GT4)")
> > +CHIPSET(0x193D, skl_gt4, "Intel® Iris™ Pro Graphics P580 (Skylake GT4)")
> >  CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
> >  CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
> >  CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
> > --
> > 2.6.1
> >

-- 
Ben Widawsky, Intel Open Source Technology Center
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
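
Ilia's concern is easy to check mechanically. As a sketch (demo_is_ascii is a
hypothetical helper, not something in Mesa), a brand string is pure ASCII if
none of its bytes has the high bit set. The UTF-8 encoding of the registered
sign is the two bytes 0xC2 0xAE, which is why a terminal assuming a single-byte
encoding shows an extra character in front of it.

```c
#include <stdbool.h>

/* Returns true if every byte of s is in the 7-bit ASCII range. */
static bool
demo_is_ascii(const char *s)
{
   for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
      if (*p > 0x7F)
         return false;
   }
   return true;
}
```

A check like this could run over every name in the ID list at build or test
time to keep non-ASCII strings from creeping back in.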

[Mesa-dev] [PATCH 2/2] intel: Cleanup SKL PCI ID definitions.

2015-10-23 Thread Ben Widawsky

This removes ones which aren't used (0x190b, 0x192a), and adds some new ones. I
kept the original names where possible.

Cc: Kristian Høgsberg 
Cc: Damien Lespiau 
Signed-off-by: Ben Widawsky 
---
 intel/intel_chipset.h | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/intel/intel_chipset.h b/intel/intel_chipset.h
index 6c8dc73..a0f17c6 100644
--- a/intel/intel_chipset.h
+++ b/intel/intel_chipset.h
@@ -165,21 +165,24 @@
 #define PCI_CHIP_CHERRYVIEW_2  0x22b2
 #define PCI_CHIP_CHERRYVIEW_3  0x22b3
 
-#define PCI_CHIP_SKYLAKE_ULT_GT2   0x1916
+#define PCI_CHIP_SKYLAKE_DT_GT1    0x1902
 #define PCI_CHIP_SKYLAKE_ULT_GT1   0x1906
-#define PCI_CHIP_SKYLAKE_ULT_GT3   0x1926
-#define PCI_CHIP_SKYLAKE_ULT_GT2F  0x1921
-#define PCI_CHIP_SKYLAKE_ULX_GT1   0x190E
-#define PCI_CHIP_SKYLAKE_ULX_GT2   0x191E
+#define PCI_CHIP_SKYLAKE_SRV_GT1   0x190A /* Reserved */
+#define PCI_CHIP_SKYLAKE_ULX_GT1   0x190E /* Reserved */
 #define PCI_CHIP_SKYLAKE_DT_GT2    0x1912
-#define PCI_CHIP_SKYLAKE_DT_GT1    0x1902
+#define PCI_CHIP_SKYLAKE_FUSED0_GT2 0x1913 /* Reserved */
+#define PCI_CHIP_SKYLAKE_FUSED1_GT2 0x1915 /* Reserved */
+#define PCI_CHIP_SKYLAKE_ULT_GT2   0x1916
+#define PCI_CHIP_SKYLAKE_FUSED2_GT2 0x1917 /* Reserved */
+#define PCI_CHIP_SKYLAKE_SRV_GT2   0x191A /* Reserved */
 #define PCI_CHIP_SKYLAKE_HALO_GT2  0x191B
-#define PCI_CHIP_SKYLAKE_HALO_GT3  0x192B
-#define PCI_CHIP_SKYLAKE_HALO_GT1  0x190B
-#define PCI_CHIP_SKYLAKE_SRV_GT2   0x191A
-#define PCI_CHIP_SKYLAKE_SRV_GT3   0x192A
-#define PCI_CHIP_SKYLAKE_SRV_GT1   0x190A
 #define PCI_CHIP_SKYLAKE_WKS_GT2   0x191D
+#define PCI_CHIP_SKYLAKE_ULX_GT2   0x191E
+#define PCI_CHIP_SKYLAKE_MOBILE_GT2 0x1921 /* Reserved */
+#define PCI_CHIP_SKYLAKE_GT3E_540  0x1923
+#define PCI_CHIP_SKYLAKE_GT3   0x1926
+#define PCI_CHIP_SKYLAKE_GT3E_550  0x1927
+#define PCI_CHIP_SKYLAKE_HALO_GT3  0x192B /* Reserved */
 #define PCI_CHIP_SKYLAKE_DT_GT4    0x1932
 #define PCI_CHIP_SKYLAKE_SRV_GT4   0x193A
 #define PCI_CHIP_SKYLAKE_H_GT4 0x193B
@@ -351,20 +354,23 @@
 #define IS_SKL_GT1(devid)  ((devid) == PCI_CHIP_SKYLAKE_ULT_GT1|| \
 (devid) == PCI_CHIP_SKYLAKE_ULX_GT1|| \
 (devid) == PCI_CHIP_SKYLAKE_DT_GT1 || \
-(devid) == PCI_CHIP_SKYLAKE_HALO_GT1   || \
 (devid) == PCI_CHIP_SKYLAKE_SRV_GT1)
 
-#define IS_SKL_GT2(devid)  ((devid) == PCI_CHIP_SKYLAKE_ULT_GT2|| \
-(devid) == PCI_CHIP_SKYLAKE_ULT_GT2F   || \
-(devid) == PCI_CHIP_SKYLAKE_ULX_GT2|| \
-(devid) == PCI_CHIP_SKYLAKE_DT_GT2 || \
-(devid) == PCI_CHIP_SKYLAKE_HALO_GT2   || \
+#define IS_SKL_GT2(devid)  ((devid) == PCI_CHIP_SKYLAKE_DT_GT2 || \
+(devid) == PCI_CHIP_SKYLAKE_FUSED0_GT2 || \
+(devid) == PCI_CHIP_SKYLAKE_FUSED1_GT2 || \
+(devid) == PCI_CHIP_SKYLAKE_ULT_GT2|| \
+(devid) == PCI_CHIP_SKYLAKE_FUSED2_GT2 || \
 (devid) == PCI_CHIP_SKYLAKE_SRV_GT2|| \
-(devid) == PCI_CHIP_SKYLAKE_WKS_GT2)
+(devid) == PCI_CHIP_SKYLAKE_HALO_GT2   || \
+(devid) == PCI_CHIP_SKYLAKE_WKS_GT2|| \
+(devid) == PCI_CHIP_SKYLAKE_ULX_GT2|| \
+(devid) == PCI_CHIP_SKYLAKE_MOBILE_GT2)
 
-#define IS_SKL_GT3(devid)  ((devid) == PCI_CHIP_SKYLAKE_ULT_GT3|| \
+#define IS_SKL_GT3(devid)  ((devid) == PCI_CHIP_SKYLAKE_GT3|| \
 (devid) == PCI_CHIP_SKYLAKE_HALO_GT3   || \
-(devid) == PCI_CHIP_SKYLAKE_SRV_GT3)
+(devid) == PCI_CHIP_SKYLAKE_GT3E_540   || \
+(devid) == PCI_CHIP_SKYLAKE_GT3E_550)
 
 #define IS_SKL_GT4(devid)  ((devid) == PCI_CHIP_SKYLAKE_DT_GT4 || \
 (devid) == PCI_CHIP_SKYLAKE_SRV_GT4|| \
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] intel: Add SKL GT4 PCI IDs

2015-10-23 Thread Ben Widawsky

Cc: Kristian Høgsberg 
Cc: Damien Lespiau 
Signed-off-by: Ben Widawsky 
---
 intel/intel_chipset.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/intel/intel_chipset.h b/intel/intel_chipset.h
index 253ea71..6c8dc73 100644
--- a/intel/intel_chipset.h
+++ b/intel/intel_chipset.h
@@ -180,6 +180,10 @@
 #define PCI_CHIP_SKYLAKE_SRV_GT3   0x192A
 #define PCI_CHIP_SKYLAKE_SRV_GT1   0x190A
 #define PCI_CHIP_SKYLAKE_WKS_GT2   0x191D
+#define PCI_CHIP_SKYLAKE_DT_GT4    0x1932
+#define PCI_CHIP_SKYLAKE_SRV_GT4   0x193A
+#define PCI_CHIP_SKYLAKE_H_GT4 0x193B
+#define PCI_CHIP_SKYLAKE_WKS_GT4   0x193D
 
 #define PCI_CHIP_BROXTON_0 0x0A84
 #define PCI_CHIP_BROXTON_1 0x1A84
@@ -362,9 +366,15 @@
 (devid) == PCI_CHIP_SKYLAKE_HALO_GT3   || \
 (devid) == PCI_CHIP_SKYLAKE_SRV_GT3)
 
+#define IS_SKL_GT4(devid)  ((devid) == PCI_CHIP_SKYLAKE_DT_GT4 || \
+(devid) == PCI_CHIP_SKYLAKE_SRV_GT4|| \
+(devid) == PCI_CHIP_SKYLAKE_H_GT4  || \
+(devid) == PCI_CHIP_SKYLAKE_WKS_GT4)
+
 #define IS_SKYLAKE(devid)  (IS_SKL_GT1(devid) || \
 IS_SKL_GT2(devid) || \
-IS_SKL_GT3(devid))
+IS_SKL_GT3(devid) || \
+IS_SKL_GT4(devid))
 
 #define IS_BROXTON(devid)  ((devid) == PCI_CHIP_BROXTON_0  || \
 (devid) == PCI_CHIP_BROXTON_1  || \
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
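
A small usage sketch of the device-ID check this patch adds, using only the
four GT4 IDs listed in the hunk. The DEMO_ prefix marks these as local copies
for illustration, and demo_is_skl_gt4 is a hypothetical wrapper; the real
IS_SKYLAKE() also ORs in the GT1 to GT3 checks, which are omitted here.

```c
#include <stdbool.h>

#define DEMO_SKYLAKE_DT_GT4   0x1932
#define DEMO_SKYLAKE_SRV_GT4  0x193A
#define DEMO_SKYLAKE_H_GT4    0x193B
#define DEMO_SKYLAKE_WKS_GT4  0x193D

/* Same shape as the macro in the patch: a chain of equality tests. */
#define DEMO_IS_SKL_GT4(devid) ((devid) == DEMO_SKYLAKE_DT_GT4  || \
                                (devid) == DEMO_SKYLAKE_SRV_GT4 || \
                                (devid) == DEMO_SKYLAKE_H_GT4   || \
                                (devid) == DEMO_SKYLAKE_WKS_GT4)

/* Function wrapper so the argument is evaluated exactly once. */
static bool
demo_is_skl_gt4(int devid)
{
   return DEMO_IS_SKL_GT4(devid);
}
```

The macro form evaluates its argument up to four times, so callers should pass
a plain variable rather than an expression with side effects; the wrapper
function sidesteps that.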

[Mesa-dev] [PATCH 1/2] [v2] i965/skl: Add GT4 PCI IDs

2015-10-23 Thread Ben Widawsky

Like other gen8+ hardware, the hardware automatically scales up thread counts
and URB sizes, so there is no need to do anything but add the PCI IDs.

FINISHME: This patch still needs testing before merge.

v2: Remove the PCI ID removal. That should be done as part of the next patch.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Ben Widawsky 
---
 include/pci_ids/i965_pci_ids.h  | 4 
 src/mesa/drivers/dri/i965/brw_device_info.c | 4 
 2 files changed, 8 insertions(+)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 8a42599..626064a 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -124,6 +124,10 @@ CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
 CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
 CHIPSET(0x192A, skl_gt3, "Intel(R) Skylake SRV GT3")
 CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
+CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
 CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c 
b/src/mesa/drivers/dri/i965/brw_device_info.c
index a6a3bb6..e7a016c 100644
--- a/src/mesa/drivers/dri/i965/brw_device_info.c
+++ b/src/mesa/drivers/dri/i965/brw_device_info.c
@@ -335,6 +335,10 @@ static const struct brw_device_info 
brw_device_info_skl_gt3 = {
GEN9_FEATURES, .gt = 3,
 };
 
+static const struct brw_device_info brw_device_info_skl_gt4 = {
+   GEN9_FEATURES, .gt = 4,
+};
+
 static const struct brw_device_info brw_device_info_bxt = {
GEN9_FEATURES,
.is_broxton = 1,
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] [v2] i965/skl: PCI ID cleanup and brand strings

2015-10-23 Thread Ben Widawsky

A few new PCI ids are added here, and two are removed (0x190B, 0x192A) because
they no longer seem to exist anywhere.

v2: Update commit message to reflect the removal of 0x192a as well.
Only use ascii characters (Ilia)

Signed-off-by: Ben Widawsky 
---
 include/pci_ids/i965_pci_ids.h | 41 ++---
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 626064a..bb51e5c 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -109,25 +109,28 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 
(Broadwell GT3e)")
 CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
 CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
 CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
-CHIPSET(0x1902, skl_gt1, "Intel(R) Skylake DT  GT1")
-CHIPSET(0x1906, skl_gt1, "Intel(R) Skylake ULT GT1")
-CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake SRV GT1")
-CHIPSET(0x190B, skl_gt1, "Intel(R) Skylake Halo GT1")
-CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake ULX GT1")
-CHIPSET(0x1912, skl_gt2, "Intel(R) Skylake DT  GT2")
-CHIPSET(0x1916, skl_gt2, "Intel(R) Skylake ULT GT2")
-CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake SRV GT2")
-CHIPSET(0x191B, skl_gt2, "Intel(R) Skylake Halo GT2")
-CHIPSET(0x191D, skl_gt2, "Intel(R) Skylake WKS GT2")
-CHIPSET(0x191E, skl_gt2, "Intel(R) Skylake ULX GT2")
-CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake ULT GT2F")
-CHIPSET(0x1926, skl_gt3, "Intel(R) Skylake ULT GT3")
-CHIPSET(0x192A, skl_gt3, "Intel(R) Skylake SRV GT3")
-CHIPSET(0x192B, skl_gt3, "Intel(R) Skylake Halo GT3")
-CHIPSET(0x1932, skl_gt4, "Intel(R) Skylake GT4")
-CHIPSET(0x193A, skl_gt4, "Intel(R) Skylake GT4")
-CHIPSET(0x193B, skl_gt4, "Intel(R) Skylake GT4")
-CHIPSET(0x193D, skl_gt4, "Intel(R) Skylake GT4")
+CHIPSET(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
+CHIPSET(0x1906, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
+CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake GT1")
+CHIPSET(0x190E, skl_gt1, "Intel(R) Skylake GT1")
+CHIPSET(0x1912, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
+CHIPSET(0x1913, skl_gt2, "Intel(R) Skylake GT2f")
+CHIPSET(0x1915, skl_gt2, "Intel(R) Skylake GT2f")
+CHIPSET(0x1916, skl_gt2, "Intel(R) HD Graphics 520 (Skylake GT2)")
+CHIPSET(0x1917, skl_gt2, "Intel(R) Skylake GT2f")
+CHIPSET(0x191A, skl_gt2, "Intel(R) Skylake GT2")
+CHIPSET(0x191B, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)")
+CHIPSET(0x191D, skl_gt2, "Intel(R) HD Graphics P530 (Skylake GT2)")
+CHIPSET(0x191E, skl_gt2, "Intel(R) HD Graphics 515 (Skylake GT2)")
+CHIPSET(0x1921, skl_gt2, "Intel(R) Skylake GT2f")
+CHIPSET(0x1923, skl_gt3, "Intel(R) Iris Graphics 540 (Skylake GT3e)")
+CHIPSET(0x1926, skl_gt3, "Intel(R) HD Graphics 535 (Skylake GT3)")
+CHIPSET(0x1927, skl_gt3, "Intel(R) Iris Graphics 550 (Skylake GT3e)")
+CHIPSET(0x192B, skl_gt3, "Intel(R) Iris Graphics (Skylake GT3fe)")
+CHIPSET(0x1932, skl_gt4, "Intel(R) Iris Pro Graphics 570/580 (Skylake GT4)")
+CHIPSET(0x193A, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4)")
+CHIPSET(0x193B, skl_gt4, "Intel(R) Iris Pro Graphics 580 (Skylake GT4)")
+CHIPSET(0x193D, skl_gt4, "Intel(R) Iris Pro Graphics P580 (Skylake GT4)")
 CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B1, chv, "Intel(R) HD Graphics (Cherryview)")
 CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
-- 
2.6.1
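
For readers unfamiliar with the CHIPSET() list touched above: it is an X-macro
table, where each consumer defines CHIPSET() to suit itself before pulling in
the list. The sketch below illustrates that pattern only. It is not the driver
code: the EXAMPLE_PCI_IDS wrapper macro, struct example_chipset, and
example_lookup() are hypothetical names made up for this illustration, and only
three entries are copied from the patch. As far as I understand it, the real
driver includes i965_pci_ids.h directly and pastes the family token onto
brw_device_info_, which would be why skl_gt4 entries need the
brw_device_info_skl_gt4 struct from patch 1/2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ID header: three entries copied from the
 * patch above, wrapped in a macro so the example fits in one file. */
#define EXAMPLE_PCI_IDS(CHIPSET) \
   CHIPSET(0x1912, skl_gt2, "Intel(R) HD Graphics 530 (Skylake GT2)") \
   CHIPSET(0x1926, skl_gt3, "Intel(R) HD Graphics 535 (Skylake GT3)") \
   CHIPSET(0x193B, skl_gt4, "Intel(R) Iris Pro Graphics 580 (Skylake GT4)")

/* Illustrative record type; not a Mesa structure. */
struct example_chipset {
   uint16_t pci_id;
   const char *family;   /* family token, stringified */
   const char *name;     /* brand string */
};

/* Expand every CHIPSET(...) entry into one initializer row. */
#define AS_TABLE_ROW(id, family, str) { id, #family, str },
static const struct example_chipset example_chipsets[] = {
   EXAMPLE_PCI_IDS(AS_TABLE_ROW)
};
#undef AS_TABLE_ROW

/* Linear search by PCI device ID; returns NULL for unknown IDs. */
static const struct example_chipset *
example_lookup(uint16_t pci_id)
{
   size_t n = sizeof(example_chipsets) / sizeof(example_chipsets[0]);

   for (size_t i = 0; i < n; i++) {
      if (example_chipsets[i].pci_id == pci_id)
         return &example_chipsets[i];
   }
   return NULL;
}
```

With this table, example_lookup(0x193B) yields the skl_gt4 row, while a removed
ID such as 0x190B is simply not found and returns NULL.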
