Iago Toral <ito...@igalia.com> writes:

> On Tue, 2016-05-03 at 16:21 -0700, Jordan Justen wrote:
>> On 2016-05-03 05:21:54, Samuel Iglesias Gonsálvez wrote:
>> > From: Iago Toral Quiroga <ito...@igalia.com>
>> > 
>> > The current code ignores the suboffet in the instruction's source
>> > and just uses the one from the constant. This is not correct
>> > when the instruction's source is accessing the constant with a
>> > different type and using the suboffset to select a specific
>> > chunk of the constant. We generate this kind of code in fp64
>> > when we want to select only the high 32-bit of a particular
>> > double constant.
>> > 
>> > Instead, we should add any existing suboffset in the
>> > instruction's source (modulo the size of the entry's type)
>> > to the suboffset in the constant so we can preserve the orinal
>> > semantics.
>> > 
>> > Prevents that we turn this:
>> > 
>> > mov(8) vgrf5:DF, u2<0>:DF
>> > mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
>> > 
>> > Into:
>> > 
>> > mov(8) vgrf7:UD, u2<0>:UD
>> > 
>> > And instead, with this patch, we produce:
>> > 
>> > mov(8) vgrf7:UD, u2+0.4<0>:UD
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 23 
>> > ++++++++++++++++++++--
>> >  1 file changed, 21 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index aa4c9c9..5fae10f 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -445,8 +445,27 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int 
>> > arg, acp_entry *entry)
>> >     case BAD_FILE:
>> >     case ARF:
>> >     case FIXED_GRF:
>> > -      inst->src[arg].reg_offset = entry->src.reg_offset;
>> > -      inst->src[arg].subreg_offset = entry->src.subreg_offset;
>> > +      {
>> > +         inst->src[arg].reg_offset = entry->src.reg_offset;
>> > +         inst->src[arg].subreg_offset = entry->src.subreg_offset;
>> > +
>> > +         /* If we copy propagate from a larger type we have to be aware 
>> > that
>> > +          * the instruction might be using subreg_offset to select a 
>> > particular
>> > +          * chunk of the data in the entry. For example:
>> > +          *
>> > +          * mov(8) vgrf5:DF, u2<0>:DF
>> > +          * mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
>> > +          *
>> > +          * vgrf5+0.4<2>:UD is actually reading the high 32-bit of u2.0, 
>> > so if
>> > +          * we want to copy propagate here we have to do it from u2+0.4.
>> > +          */
>> > +         int type_sz_src = type_sz(inst->src[arg].type);
>> > +         int type_sz_entry = type_sz(entry->src.type);
>> > +         if (type_sz_entry > type_sz_src) {
>> > +            inst->src[arg].subreg_offset +=
>> > +               inst->src[arg].subreg_offset % type_sz_entry;
>> 
>> Seeing 'inst->src[arg].subreg_offset' on both sides of this += seems
>> strange. Is this correct?
>
> Yes, this looks wrong. I'll fix it, thanks!
>

I just had a look at the version of this patch you have in your tree
(not sure I took the right one), which does:

| @@ -445,8 +445,24 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
|     case BAD_FILE:
|     case ARF:
|     case FIXED_GRF:
| -      inst->src[arg].reg_offset = entry->src.reg_offset;
| -      inst->src[arg].subreg_offset = entry->src.subreg_offset;
| +      {
| +         inst->src[arg].reg_offset = entry->src.reg_offset;
| +
| +         /* If we copy propagate from a larger type we have to be aware that
| +          * the instruction might be using subreg_offset to select a 
particular
| +          * chunk of the data in the entry. For example:
| +          *
| +          * mov(8) vgrf5:DF, u2<0>:DF
| +          * mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
| +          *
| +          * vgrf5+0.4<2>:UD is actually reading the high 32-bit of u2.0, so 
if
| +          * we want to copy propagate here we have to do it from u2+0.4.
| +          */
| +         int type_sz_entry = type_sz(entry->src.type);
| +         inst->src[arg].subreg_offset =
| +            (inst->src[arg].subreg_offset +
| +             entry->src.subreg_offset) % type_sz_entry;
| +      }
|        break;
|     case ATTR:
|     case VGRF:

I'm afraid the calculation is still wrong, overflow of subreg_offset is
not handled properly by taking the original value modulo 4 and then
updating reg_offset accordingly (oh man there should *really* just be a
single reg_offset field expressed in bytes), and you're applying the
entry->src.subreg_offset offset *before* taking the remainder as if it
applied to the source region of the instruction instead of the source
region of the copy, which doesn't look correct to me.

Really, fixing this properly amounts to duplicating half of the logic
From the VGRF case below (but replacing REG_SIZE with 4 for uniforms
because of the inconsistent units of reg_offset).  I wonder why the
switch-case statement is even here -- How about the fix below? (applies
on top of the patch I proposed as replacement for PATCH 2)

>> -Jordan
>> 
>> > +         }
>> > +      }
>> >        break;
>> >     case ATTR:
>> >     case VGRF:
>> > -- 
>> > 2.5.0
>> > 
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

From b84d3bffaf07b0a7c1922575355748ca428d36f4 Mon Sep 17 00:00:00 2001
From: Francisco Jerez <curroje...@riseup.net>
Date: Tue, 10 May 2016 16:01:56 -0700
Subject: [PATCH 1/2] i965/fs: Simplify and fix register offset calculation of
 try_copy_propagate().

try_copy_propagate() was special-casing UNIFORM registers (the
BAD_FILE, ARF and FIXED_GRF cases are dead, see the assertion at the
top of the function) and then failing to take into account the
possibility of the instruction reading from a non-zero offset of the
destination of the copy.  The VGRF/ATTR handling takes it into account
correctly, and there is no reason we couldn't use the exact same logic
for the UNIFORM file aside from the fact that uniforms represent
reg_offset in different units.  We can work around that easily by
defining an additional constant with the right unit reg_offset is
expressed in.
---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 40 +++++-----------------
 1 file changed, 9 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index b4573c4..5e17fe4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -439,27 +439,6 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    inst->src[arg].stride *= entry->src.stride;
    inst->saturate = inst->saturate || entry->saturate;
 
-   switch (entry->src.file) {
-   case UNIFORM:
-   case BAD_FILE:
-   case ARF:
-   case FIXED_GRF:
-      inst->src[arg].reg_offset = entry->src.reg_offset;
-      inst->src[arg].subreg_offset = entry->src.subreg_offset;
-      break;
-   case ATTR:
-   case VGRF:
-      {
-         /* In this case, we'll just leave the width alone.  The source
-          * register could have different widths depending on how it is
-          * being used.  For instance, if only half of the register was
-          * used then we want to preserve that and continue to only use
-          * half.
-          *
-          * Also, we have to deal with mapping parts of vgrfs to other
-          * parts of vgrfs so we have to do some reg_offset magic.
-          */
-
          /* Compute the offset of inst->src[arg] relative to entry->dst */
          const unsigned rel_offset = (inst->src[arg].reg_offset
                                       - entry->dst.reg_offset) * REG_SIZE +
@@ -472,21 +451,20 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
          const unsigned component = rel_offset / type_sz(entry->dst.type);
          const unsigned suboffset = rel_offset % type_sz(entry->dst.type);
 
+         /* Account for the inconsistent units reg_offset is expressed in.
+          * FINISHME -- Make the units of reg_offset consistent (e.g. bytes?)
+          *             for all register files.
+          */
+         const unsigned reg_size = (entry->src.file == UNIFORM ? 4 : REG_SIZE);
+
          /* Calculate the byte offset at the origin of the copy of the given
           * component and suboffset.
           */
          const unsigned offset = suboffset +
             component * entry->src.stride * type_sz(entry->src.type) +
-            entry->src.reg_offset * REG_SIZE + entry->src.subreg_offset;
-         inst->src[arg].reg_offset = offset / REG_SIZE;
-         inst->src[arg].subreg_offset = offset % REG_SIZE;
-      }
-      break;
-
-   case MRF:
-   case IMM:
-      unreachable("not reached");
-   }
+            entry->src.reg_offset * reg_size + entry->src.subreg_offset;
+         inst->src[arg].reg_offset = offset / reg_size;
+         inst->src[arg].subreg_offset = offset % reg_size;
 
    if (has_source_modifiers) {
       if (entry->dst.type != inst->src[arg].type) {
-- 
2.7.3

From 60ec356e0e275287583e9b3217e7b50be9e392b6 Mon Sep 17 00:00:00 2001
From: Francisco Jerez <curroje...@riseup.net>
Date: Tue, 10 May 2016 16:03:36 -0700
Subject: [PATCH 2/2] i965/fs: Reindent register offset calculation of
 try_copy_propagate().

---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 46 +++++++++++-----------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 5e17fe4..6d6d8d0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -439,32 +439,32 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    inst->src[arg].stride *= entry->src.stride;
    inst->saturate = inst->saturate || entry->saturate;
 
-         /* Compute the offset of inst->src[arg] relative to entry->dst */
-         const unsigned rel_offset = (inst->src[arg].reg_offset
-                                      - entry->dst.reg_offset) * REG_SIZE +
-                                     inst->src[arg].subreg_offset;
+   /* Compute the offset of inst->src[arg] relative to entry->dst */
+   const unsigned rel_offset = (inst->src[arg].reg_offset
+                                - entry->dst.reg_offset) * REG_SIZE +
+                               inst->src[arg].subreg_offset;
 
-         /* Compute the first component of the copy that the instruction is
-          * reading, and the base byte offset within that component.
-          */
-         assert(entry->dst.subreg_offset == 0 && entry->dst.stride == 1);
-         const unsigned component = rel_offset / type_sz(entry->dst.type);
-         const unsigned suboffset = rel_offset % type_sz(entry->dst.type);
+   /* Compute the first component of the copy that the instruction is
+    * reading, and the base byte offset within that component.
+    */
+   assert(entry->dst.subreg_offset == 0 && entry->dst.stride == 1);
+   const unsigned component = rel_offset / type_sz(entry->dst.type);
+   const unsigned suboffset = rel_offset % type_sz(entry->dst.type);
 
-         /* Account for the inconsistent units reg_offset is expressed in.
-          * FINISHME -- Make the units of reg_offset consistent (e.g. bytes?)
-          *             for all register files.
-          */
-         const unsigned reg_size = (entry->src.file == UNIFORM ? 4 : REG_SIZE);
+   /* Account for the inconsistent units reg_offset is expressed in.
+    * FINISHME -- Make the units of reg_offset consistent (e.g. bytes?) for
+    *             all register files.
+    */
+   const unsigned reg_size = (entry->src.file == UNIFORM ? 4 : REG_SIZE);
 
-         /* Calculate the byte offset at the origin of the copy of the given
-          * component and suboffset.
-          */
-         const unsigned offset = suboffset +
-            component * entry->src.stride * type_sz(entry->src.type) +
-            entry->src.reg_offset * reg_size + entry->src.subreg_offset;
-         inst->src[arg].reg_offset = offset / reg_size;
-         inst->src[arg].subreg_offset = offset % reg_size;
+   /* Calculate the byte offset at the origin of the copy of the given
+    * component and suboffset.
+    */
+   const unsigned offset = suboffset +
+      component * entry->src.stride * type_sz(entry->src.type) +
+      entry->src.reg_offset * reg_size + entry->src.subreg_offset;
+   inst->src[arg].reg_offset = offset / reg_size;
+   inst->src[arg].subreg_offset = offset % reg_size;
 
    if (has_source_modifiers) {
       if (entry->dst.type != inst->src[arg].type) {
-- 
2.7.3

Attachment: signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to