So basically,
"mov dst.xz src.yw" is impossible with mesa code?
Ben
________________________________________
From: mesa-dev-bounces+benjamin.segovia=intel....@lists.freedesktop.org
[mesa-dev-bounces+benjamin.segovia=intel....@lists.freedesktop.org]
On Behalf Of Segovia, Benjamin [benjamin.sego...@intel.com]
Sent: Monday, July 26, 2010 6:13 PM
To: Eric Anholt; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in
Mesa front end
- Hmm, first bug is clearly something I misunderstood. You say:
in "mov dst.xz src" src will use .xz and not .xy? But the swizzle is
still. xyzw or do I miss a point?
- I am not sure but the other test were skipped I think with my
version of piglit + mesa. I have to check that again.
- For bit_scan, bit_cound, what would you prefer?
Cheers,
Ben
________________________________________
From: Eric Anholt [e...@anholt.net]
Sent: Monday, July 26, 2010 6:03 PM
To: Segovia, Benjamin; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Better GPU program optimization in
Mesa front end
On Mon, 19 Jul 2010 16:01:02 -0700, Benjamin Segovia <benjamin.sego...@intel.com
> wrote:
- Improved optimization of GPU programs. Now, swizzling is taken
into account
and swizzles are properly transformed while removing mov
instructions.
Removals of mov instructions are now much more effective
- Analysis of control flows is still very primitive and far more too
conservative. Shaders using a lot of branches will be less
optimized than
straightforward ones
- Main things to do next is:
* instruction merging like for example merging:
mul a.x b.x c.x
mul a.y b.y c.y
mul a.z b.z c.z
into
mul a.xyz b.xyz c.xyz
* register renaming to avoid some still unecessary movs
- Tested with piglit. I run all the shaders and compare output from
the new
version with the old one. Also, run openarena, nexuiz and warsow.
All games
perfectly run and GPU code is clearly improved. Note that I only
use my Intel
Gen GPU for the backend. So everything was tested using classic
Mesa with
the Intel i965 driver.
Added two new testcases to piglit, glsl-fs-add-masked and
glsl-fs-mov-masked, to catch bugs in this that I found while reviewing
the code after my GLSL demo started failing. Patch included for
glsl-fs-add-masked (MOV dst.xz, src uses the X and Z channels of src,
not X and Y), feel free to squash it into a fixed patch.
glsl-vs-arrays, glsl-vs-arrays-2, and glsl-vs-mov-after-deref also
started failing for me as well after applying your change. Lots of
important code seems to have been optimized out. I'm guessing RelAddr
handling is broken. Everywhere else that bit_count is used looks
bogus
to me, but I haven't constructed tests for them yet.
I'm not a fan of bit_clear[], bit_scan[], or expand_one[] either.
They
obfuscate the code to me.
From 9c7fb7b5b6f299b19d27245f2b68853521499afa Mon Sep 17 00:00:00 2001
From: Eric Anholt <e...@anholt.net>
Date: Mon, 26 Jul 2010 16:20:11 -0700
Subject: [PATCH] mesa: Fix bug in handling of read channel masks
get_src_arg_mask.
Fixes piglit glsl-fs-add-masked, and my GLSL demo.
---
src/mesa/program/prog_optimize.c | 14 ++++----------
1 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/src/mesa/program/prog_optimize.c b/src/mesa/program/
prog_optimize.c
index 9119b10..c853330 100644
--- a/src/mesa/program/prog_optimize.c
+++ b/src/mesa/program/prog_optimize.c
@@ -51,12 +51,11 @@ static const int expand_one[5] = {0,1,3,7,15};
static GLuint
get_src_arg_mask(const struct prog_instruction *inst, int arg, int
dst_mask)
{
- int read_mask, channel_mask, read_count;
+ int read_mask, channel_mask;
int comp;
/* Form the dst register, find the written channels */
if (inst->CondUpdate) {
- read_count = 4;
channel_mask = WRITEMASK_XYZW;
}
else {
@@ -67,8 +66,7 @@ get_src_arg_mask(const struct prog_instruction
*inst, int arg, int dst_mask)
case OPCODE_MAD:
case OPCODE_MUL:
case OPCODE_SUB:
- read_count = bit_count[inst->DstReg.WriteMask & dst_mask];
- channel_mask = expand_one[read_count];
+ channel_mask = inst->DstReg.WriteMask & dst_mask;
break;
case OPCODE_RCP:
case OPCODE_SIN:
@@ -76,20 +74,16 @@ get_src_arg_mask(const struct prog_instruction
*inst, int arg, int dst_mask)
case OPCODE_RSQ:
case OPCODE_POW:
case OPCODE_EX2:
- read_count = 1;
channel_mask = WRITEMASK_X;
break;
case OPCODE_DP2:
- read_count = 2;
channel_mask = WRITEMASK_XY;
break;
case OPCODE_DP3:
case OPCODE_XPD:
- read_count = 3;
channel_mask = WRITEMASK_XYZ;
break;
default:
- read_count = 4;
channel_mask = WRITEMASK_XYZW;
break;
}
@@ -99,9 +93,9 @@ get_src_arg_mask(const struct prog_instruction
*inst, int arg, int dst_mask)
* components are actually read
*/
read_mask = 0;
- for (comp = 0; comp < read_count; ++comp) {
+ for (comp = 0; comp < 4; ++comp) {
const int coord = GET_SWZ(inst->SrcReg[arg].Swizzle, comp);
- if (coord <= SWIZZLE_W)
+ if (channel_mask & (1 << comp) && coord <= SWIZZLE_W)
read_mask |= 1 << coord;
}
--
1.7.1
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev