So basically,

"mov dst.xz  src.yw" is impossible with mesa code?

Ben

________________________________________
From: mesa-dev-bounces+benjamin.segovia=intel....@lists.freedesktop.org 
[mesa-dev-bounces+benjamin.segovia=intel....@lists.freedesktop.org] On Behalf 
Of Segovia, Benjamin [benjamin.sego...@intel.com]
Sent: Monday, July 26, 2010 6:13 PM
To: Eric Anholt; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in Mesa front 
end

- Hmm, first bug is clearly something I misunderstood. You say:

in "mov dst.xz src" src will use .xz and not .xy? But the swizzle is still. 
xyzw or do I miss a point?

- I am not sure but the other test were skipped I think with my version of 
piglit + mesa. I have to check that again.

- For bit_scan, bit_cound, what would you prefer?

Cheers,
Ben

________________________________________
From: Eric Anholt [e...@anholt.net]
Sent: Monday, July 26, 2010 6:03 PM
To: Segovia, Benjamin; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in Mesa front 
end

On Mon, 19 Jul 2010 16:01:02 -0700, Benjamin Segovia 
<benjamin.sego...@intel.com> wrote:
> - Improved optimization of GPU programs. Now, swizzling is taken into account
>   and swizzles are properly transformed while removing mov instructions.
>   Removals of mov instructions are now much more effective
>
> - Analysis of control flows is still very primitive and far more too
>   conservative. Shaders using a lot of branches will be less optimized than
>   straightforward ones
>
> - Main things to do next is:
>    * instruction merging like for example merging:
>        mul a.x b.x c.x
>        mul a.y b.y c.y
>        mul a.z b.z c.z
>      into
>        mul a.xyz b.xyz c.xyz
>    * register renaming to avoid some still unecessary movs
>
> - Tested with piglit. I run all the shaders and compare output from the new
>   version with the old one. Also, run openarena, nexuiz and warsow. All games
>   perfectly run and GPU code is clearly improved. Note that I only use my 
> Intel
>   Gen GPU for the backend. So everything was tested using classic Mesa with
>   the Intel i965 driver.

Added two new testcases to piglit, glsl-fs-add-masked and
glsl-fs-mov-masked, to catch bugs in this that I found while reviewing
the code after my GLSL demo started failing.  Patch included for
glsl-fs-add-masked (MOV dst.xz, src uses the X and Z channels of src,
not X and Y), feel free to squash it into a fixed patch.

glsl-vs-arrays, glsl-vs-arrays-2, and glsl-vs-mov-after-deref also
started failing for me as well after applying your change.  Lots of
important code seems to have been optimized out.  I'm guessing RelAddr
handling is broken.  Everywhere else that bit_count is used looks bogus
to me, but I haven't constructed tests for them yet.

I'm not a fan of bit_clear[], bit_scan[], or expand_one[] either.  They
obfuscate the code to me.

>From 9c7fb7b5b6f299b19d27245f2b68853521499afa Mon Sep 17 00:00:00 2001
From: Eric Anholt <e...@anholt.net>
Date: Mon, 26 Jul 2010 16:20:11 -0700
Subject: [PATCH] mesa: Fix bug in handling of read channel masks 
get_src_arg_mask.

Fixes piglit glsl-fs-add-masked, and my GLSL demo.
---
 src/mesa/program/prog_optimize.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/mesa/program/prog_optimize.c b/src/mesa/program/prog_optimize.c
index 9119b10..c853330 100644
--- a/src/mesa/program/prog_optimize.c
+++ b/src/mesa/program/prog_optimize.c
@@ -51,12 +51,11 @@ static const int expand_one[5] = {0,1,3,7,15};
 static GLuint
 get_src_arg_mask(const struct prog_instruction *inst, int arg, int dst_mask)
 {
-   int read_mask, channel_mask, read_count;
+   int read_mask, channel_mask;
    int comp;

    /* Form the dst register, find the written channels */
    if (inst->CondUpdate) {
-      read_count = 4;
       channel_mask = WRITEMASK_XYZW;
    }
    else {
@@ -67,8 +66,7 @@ get_src_arg_mask(const struct prog_instruction *inst, int 
arg, int dst_mask)
       case OPCODE_MAD:
       case OPCODE_MUL:
       case OPCODE_SUB:
-         read_count = bit_count[inst->DstReg.WriteMask & dst_mask];
-         channel_mask = expand_one[read_count];
+         channel_mask = inst->DstReg.WriteMask & dst_mask;
          break;
       case OPCODE_RCP:
       case OPCODE_SIN:
@@ -76,20 +74,16 @@ get_src_arg_mask(const struct prog_instruction *inst, int 
arg, int dst_mask)
       case OPCODE_RSQ:
       case OPCODE_POW:
       case OPCODE_EX2:
-         read_count = 1;
          channel_mask = WRITEMASK_X;
          break;
       case OPCODE_DP2:
-         read_count = 2;
          channel_mask = WRITEMASK_XY;
          break;
       case OPCODE_DP3:
       case OPCODE_XPD:
-         read_count = 3;
          channel_mask = WRITEMASK_XYZ;
          break;
       default:
-         read_count = 4;
          channel_mask = WRITEMASK_XYZW;
          break;
       }
@@ -99,9 +93,9 @@ get_src_arg_mask(const struct prog_instruction *inst, int 
arg, int dst_mask)
     * components are actually read
     */
    read_mask = 0;
-   for (comp = 0; comp < read_count; ++comp) {
+   for (comp = 0; comp < 4; ++comp) {
       const int coord = GET_SWZ(inst->SrcReg[arg].Swizzle, comp);
-      if (coord <= SWIZZLE_W)
+      if (channel_mask & (1 << comp) && coord <= SWIZZLE_W)
          read_mask |= 1 << coord;
    }

--
1.7.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to