fs: Implement image load, store and atomic.

Francisco Jerez Fri, 31 Jul 2015 10:00:18 -0700

Jason Ekstrand <ja...@jlekstrand.net> writes:

> On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez <curroje...@riseup.net> 
> wrote:
>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>
>>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez <curroje...@riseup.net> 
>>> wrote:
>>>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>>>
>>>>> This all looks correct as far as I can tell.  However, I'm very
>>>>> concerned about the number of checks such as
>>>>> has_matching_typed_format() that are built-in to the compiler (via
>>>>> surface_builder) where we then go on to do something that is highly
>>>>> dependant on state setup doing the exact same check (not via
>>>>> surface_builder) and doing the right thing.  Another example would be
>>>>> the interplay with has_split_bit_layout.  How do we know these won't
>>>>> get out of sync?
>>>>>
>>>> They cannot get out of sync because the policy which surface format to
>>>> use for a given GLSL format is shared between the compiler and the
>>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>>>> surface state set-up for shader images.".
>>>
>>> I came back to this today and looked at it again.  I think my main
>>> objection is that all of these metadata functions work in terms of a
>>> devinfo and a format and then immediately call
>>> brw_lower_mesa_image_format.  Why don't they work instead in terms
>>> comparing the lowered format to the nominal format?  Then it would be
>>> a lot more obvious that you're doing a transformation from one format
>>> to another.  As is, it's a bunch of magic "what do I have to do next"
>>> queries
>>
>> I don't think any of these functions implies what to do next, they all
>> encode clearly defined device-specific properties of the formats:
>>
>>  - Does the device support any format with the same bit layout?
>>    (has_supported_bit_layout())
>>
>>  - Does the device support any format with the same bit layout *and*
>>    encoding? (is_conversion_trivial())
>>
>>  - Does the device support a typed format of the same size?
>>    (has_matching_typed_format())
>>
>>  - Does the device split the pixel data into a number of discontiguous
>>    segments? (has_split_bit_layout())
>>
>>  - Does the device return garbage in the unused bits when reading from
>>    the given format? (has_undefined_high_bits())
>>
>>  - Do all fields of the format have the same size? (This is not even
>>    device-specific)
>>
>>  - Is the format represented as a signed integer in memory? (Ditto)
>>
>> You may have noticed that except for the last two the natural subject
>> and object of all questions are "device" and "format" respectively --
>> I'm not sure I see how rephrasing them to involve the lowered format
>> (that in fact a number of them don't care about) would make any of these
>> questions easier to understand.
>
> I *understand* the questions being asked.  My claim is this is the
> *wrong* set of questions.  By the time you get to this stage, someone
> higher up has made the decision that they're not going to give you
> format A but will rather give you format B which they know will also
> work.  (This decision is made by code that understands the hardware
> limitations and you're guaranteed that B is a format you can read and
> transform into A.)  The fact that that decision is passed down through
> brw_lower_mesa_image_format is secondary.  At this stage, you have two
> formats, A and B and need to figure out how to get from one to the
> other.
>
> For example, instead of "Does the device support a typed format of the
> same size?" the better question would be "Are these two formats the
> same size?".


Those two questions are not equivalent, lowered formats are always the
same size as the original, whether a format is supported for typed
surface access or not is inherently devinfo-specific, the devinfo
argument couldn't be replaced with a lower_format argument.  The same
goes for "Does the device return garbage in the unused bits when reading
From the given format?".

> Instead of asking "Does the device support any format with the same
> bit layout *and* encoding?" you should ask "Do these two formats have
> the same bit layout *and* encoding?"
>
*Shrug*, I think they're only wrong questions if you start from the
assumption that the policy that computes the lowered format from a given
image format is broken -- E.g. that it gives you a suboptimal format of
different encoding even though the hardware supports an identically
encoded one.  You can quickly rule out that assumption by looking up the
definition of is_conversion_trivial() and seeing that it does little
more than call brw_lower_mesa_image_format().

I'm afraid that I was slightly misleading earlier:
is_conversion_trivial() is really about the encoding and only has
indirect bit layout implications.  The current implementation is based
on the assumption that the lowered format we're converting from or to is
just the canonical one -- I.e. it cannot be readily extended to check
whether the conversion between any two arbitrary formats is trivial, so
even though replacing the devinfo argument with a lower_format would be
technically possible it would lead to an IMO a more disturbing problem.
is_conversion_trivial(mesa_format, mesa_format) would be a partial
function with ill-defined results for certain combinations of its
arguments, and it wouldn't have any way to assert() that the pair is
sensible without an additional devinfo pointer.  To make the matter
worse the C++ type system cannot distinguish between a lowered and an
unlowered format while it can distinguish between a devinfo and a
format.  The current image_format_info queries are total functions of
their arguments and they are all consistent about taking the original
format *only*.  I'd like to preserve those properties.

> Does the difference I'm making make sense?  In the current code, it
> looks like you're doing device queries and making decisions about how
> to work around device limitations.

emit_image_load and store don't make any decisions themselves, and I'm
not sure I fully understand what made you think they were -- The
code-paths taken through them are driven by the image_format_info
queries so I understand that you may have thought that *those* were the
ones making decisions, or not, either way it's not an obstacle to
convince oneself of the correctness of emit_image_load/store if you
temporarily assume that the results returned by image_format_info are
sensible and consistent with the canonical format lowering policy --
Assumption you can quickly verify afterwards because their semantics are
clearly defined and their result is usually a simple function of
brw_lower_mesa_image_format().

There's one case however in which I think it would be fine to replace
the devinfo argument with a mesa_format one (although it would somewhat
break the rule of only passing unlowered formats to the queries).  It
may also be the one that contributed the most to your impression that
emit_image_load and store were making decisions: We could change
"has_supported_bit_layout(devinfo, format)" to
"has_same_bit_layout(format, lower_format)".  has_same_bit_layout()
would be total and commutative, so my objections above don't apply.  For
the other ones I could mention more explicitly in the doxygen comments
that the functions don't make any decisions independently and the result
is based on the usual lowering policy for the given device.

> Hence my original comment about things getting out of sync.  In
> reality, all of those decisions have already been made by
> brw_lower_mesa_image_format, this code is just doing what it needs to
> do to go from format A to format B.
>
>>> which, while in the same namespace, are in a different file.
>>
>> I happen to have cleaned that up to be in the same file a short while
>> ago -- They're no longer used by anything else outside of
>> brw_fs_surface_builder.cpp:
>> http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200
>>
>>> It would also be more efficient because you would only call
>>> brw_lower_mesa_image_format once.
>>
>> A more elegant way to "fix" that would be to mark
>> brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it
>> would make any measurable difference in practice though.
>>
>>>
>>> Does that make more sense?
>>> --Jason
>>>
>>>> Thanks.
>>>>
>>>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <curroje...@riseup.net> 
>>>>> wrote:
>>>>>> v2: Drop VEC4 suport.
>>>>>> v3: Rebase.
>>>>>> ---
>>>>>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp    | 216 
>>>>>> +++++++++++++++++++++
>>>>>>  src/mesa/drivers/dri/i965/brw_fs_surface_builder.h |  17 ++
>>>>>>  2 files changed, 233 insertions(+)
>>>>>>
>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp 
>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>> index ea1c4aa..46b449f 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>> @@ -587,3 +587,219 @@ namespace {
>>>>>>        }
>>>>>>     }
>>>>>>  }
>>>>>> +
>>>>>> +namespace brw {
>>>>>> +   namespace image_access {
>>>>>> +      /**
>>>>>> +       * Load a vector from a surface of the given format and 
>>>>>> dimensionality
>>>>>> +       * at the given coordinates.
>>>>>> +       */
>>>>>> +      fs_reg
>>>>>> +      emit_image_load(const fs_builder &bld,
>>>>>> +                      const fs_reg &image, const fs_reg &addr,
>>>>>> +                      mesa_format format, unsigned dims)
>>>>>> +      {
>>>>>> +         using namespace image_format_info;
>>>>>> +         using namespace image_format_conversion;
>>>>>> +         using namespace image_validity;
>>>>>> +         using namespace image_coordinates;
>>>>>> +         using namespace surface_access;
>>>>>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>>>>>> +         const mesa_format lower_format =
>>>>>> +            brw_lower_mesa_image_format(devinfo, format);
>>>>>> +         fs_reg tmp;
>>>>>> +
>>>>>> +         if (has_matching_typed_format(devinfo, format)) {
>>>>>> +            /* Hopefully we get here most of the time... */
>>>>>> +            tmp = emit_typed_read(bld, image, addr, dims,
>>>>>> +                                  
>>>>>> _mesa_format_num_components(lower_format));
>>>>>> +         } else {
>>>>>> +            /* Untyped surface reads return 32 bits of the surface per
>>>>>> +             * component, without any sort of unpacking or type 
>>>>>> conversion,
>>>>>> +             */
>>>>>> +            const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>>> +
>>>>>> +            /* they don't properly handle out of bounds access, so we 
>>>>>> have to
>>>>>> +             * check manually if the coordinates are valid and 
>>>>>> predicate the
>>>>>> +             * surface read on the result,
>>>>>> +             */
>>>>>> +            const brw_predicate pred =
>>>>>> +               emit_bounds_check(bld, image, addr, dims);
>>>>>> +
>>>>>> +            /* and they don't know about surface coordinates, we need to
>>>>>> +             * convert them to a raw memory offset.
>>>>>> +             */
>>>>>> +            const fs_reg laddr = emit_address_calculation(bld, image, 
>>>>>> addr, dims);
>>>>>> +
>>>>>> +            tmp = emit_untyped_read(bld, image, laddr, 1, size, pred);
>>>>>> +
>>>>>> +            /* An out of bounds surface access should give zero as 
>>>>>> result. */
>>>>>> +            for (unsigned c = 0; c < 4; ++c)
>>>>>> +               set_predicate(pred, bld.SEL(offset(tmp, bld, c),
>>>>>> +                                           offset(tmp, bld, c), 
>>>>>> fs_reg(0)));
>>>>>> +         }
>>>>>> +
>>>>>> +         /* Set the register type to D instead of UD if the data type is
>>>>>> +          * represented as a signed integer in memory so that sign 
>>>>>> extension
>>>>>> +          * is handled correctly by unpack.
>>>>>> +          */
>>>>>> +         if (needs_sign_extension(format))
>>>>>> +            tmp = retype(tmp, BRW_REGISTER_TYPE_D);
>>>>>> +
>>>>>> +         if (!has_supported_bit_layout(devinfo, format)) {
>>>>>> +            /* Unpack individual vector components from the bitfield if 
>>>>>> the
>>>>>> +             * hardware is unable to do it for us.
>>>>>> +             */
>>>>>> +            if (has_split_bit_layout(devinfo, format))
>>>>>> +               tmp = emit_pack(bld, tmp, get_bit_shifts(lower_format),
>>>>>> +                               get_bit_widths(lower_format));
>>>>>> +            else
>>>>>> +               tmp = emit_unpack(bld, tmp, get_bit_shifts(format),
>>>>>> +                                 get_bit_widths(format));
>>>>>> +
>>>>>> +         } else if ((needs_sign_extension(format) &&
>>>>>> +                     !is_conversion_trivial(devinfo, format)) ||
>>>>>> +                    has_undefined_high_bits(devinfo, format)) {
>>>>>> +            /* Perform a trivial unpack even though the bit layout 
>>>>>> matches in
>>>>>> +             * order to get the most significant bits of each component
>>>>>> +             * initialized properly.
>>>>>> +             */
>>>>>> +            tmp = emit_unpack(bld, tmp, color_u(0, 32, 64, 96),
>>>>>> +                              get_bit_widths(format));
>>>>>> +         }
>>>>>> +
>>>>>> +         if (!_mesa_is_format_integer(format)) {
>>>>>> +            if (is_conversion_trivial(devinfo, format)) {
>>>>>> +               /* Just need to cast the vector to the target type. */
>>>>>> +               tmp = retype(tmp, BRW_REGISTER_TYPE_F);
>>>>>> +            } else {
>>>>>> +               /* Do the right sort of type conversion to float. */
>>>>>> +               if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>>> +                  tmp = emit_convert_from_float(
>>>>>> +                     bld, tmp, get_bit_widths(format));
>>>>>> +               else
>>>>>> +                  tmp = emit_convert_from_scaled(
>>>>>> +                     bld, tmp, get_bit_widths(format),
>>>>>> +                     _mesa_is_format_signed(format));
>>>>>> +            }
>>>>>> +         }
>>>>>> +
>>>>>> +         /* Initialize missing components of the result. */
>>>>>> +         return emit_pad(bld, tmp, get_bit_widths(format));
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Store a vector in a surface of the given format and 
>>>>>> dimensionality at
>>>>>> +       * the given coordinates.
>>>>>> +       */
>>>>>> +      void
>>>>>> +      emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>>> +                       const fs_reg &addr, const fs_reg &src,
>>>>>> +                       mesa_format format, unsigned dims)
>>>>>> +      {
>>>>>> +         using namespace image_format_info;
>>>>>> +         using namespace image_format_conversion;
>>>>>> +         using namespace image_validity;
>>>>>> +         using namespace image_coordinates;
>>>>>> +         using namespace surface_access;
>>>>>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>>>>>> +
>>>>>> +         if (format == MESA_FORMAT_NONE) {
>>>>>> +            /* We don't know what the format is, but that's fine 
>>>>>> because it
>>>>>> +             * implies write-only access, and typed surface writes are 
>>>>>> always
>>>>>> +             * able to take care of type conversion and packing for us.
>>>>>> +             */
>>>>>> +            emit_typed_write(bld, image, addr, src, dims, 4);
>>>>>> +
>>>>>> +         } else {
>>>>>> +            const mesa_format lower_format =
>>>>>> +               brw_lower_mesa_image_format(devinfo, format);
>>>>>> +            fs_reg tmp = src;
>>>>>> +
>>>>>> +            if (!is_conversion_trivial(devinfo, format)) {
>>>>>> +               /* Do the right sort of type conversion. */
>>>>>> +               if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>>> +                  tmp = emit_convert_to_float(bld, tmp, 
>>>>>> get_bit_widths(format));
>>>>>> +
>>>>>> +               else if (_mesa_is_format_integer(format))
>>>>>> +                  tmp = emit_convert_to_integer(bld, tmp, 
>>>>>> get_bit_widths(format),
>>>>>> +                                                
>>>>>> _mesa_is_format_signed(format));
>>>>>> +
>>>>>> +               else
>>>>>> +                  tmp = emit_convert_to_scaled(bld, tmp, 
>>>>>> get_bit_widths(format),
>>>>>> +                                               
>>>>>> _mesa_is_format_signed(format));
>>>>>> +            }
>>>>>> +
>>>>>> +            /* We're down to bit manipulation at this point. */
>>>>>> +            tmp = retype(tmp, BRW_REGISTER_TYPE_UD);
>>>>>> +
>>>>>> +            if (!has_supported_bit_layout(devinfo, format)) {
>>>>>> +               /* Pack the vector components into a bitfield if the 
>>>>>> hardware
>>>>>> +                * is unable to do it for us.
>>>>>> +                */
>>>>>> +               if (has_split_bit_layout(devinfo, format))
>>>>>> +                  tmp = emit_unpack(bld, tmp, 
>>>>>> get_bit_shifts(lower_format),
>>>>>> +                                    get_bit_widths(lower_format));
>>>>>> +
>>>>>> +               else
>>>>>> +                  tmp = emit_pack(bld, tmp, get_bit_shifts(format),
>>>>>> +                                  get_bit_widths(format));
>>>>>> +            }
>>>>>> +
>>>>>> +            if (has_matching_typed_format(devinfo, format)) {
>>>>>> +               /* Hopefully we get here most of the time... */
>>>>>> +               emit_typed_write(bld, image, addr, tmp, dims,
>>>>>> +                                
>>>>>> _mesa_format_num_components(lower_format));
>>>>>> +
>>>>>> +            } else {
>>>>>> +               /* Untyped surface writes store 32 bits of the surface 
>>>>>> per
>>>>>> +                * component, without any sort of packing or type 
>>>>>> conversion,
>>>>>> +                */
>>>>>> +               const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>>> +
>>>>>> +               /* they don't properly handle out of bounds access, so 
>>>>>> we have
>>>>>> +                * to check manually if the coordinates are valid and 
>>>>>> predicate
>>>>>> +                * the surface write on the result,
>>>>>> +                */
>>>>>> +               const brw_predicate pred =
>>>>>> +                  emit_bounds_check(bld, image, addr, dims);
>>>>>> +
>>>>>> +               /* and, phew, they don't know about surface coordinates, 
>>>>>> we
>>>>>> +                * need to convert them to a raw memory offset.
>>>>>> +                */
>>>>>> +               const fs_reg laddr = emit_address_calculation(bld, 
>>>>>> image, addr, dims);
>>>>>> +
>>>>>> +               emit_untyped_write(bld, image, laddr, tmp, 1, size, 
>>>>>> pred);
>>>>>> +            }
>>>>>> +         }
>>>>>> +      }
>>>>>> +
>>>>>> +      /**
>>>>>> +       * Perform an atomic read-modify-write operation in a surface of 
>>>>>> the
>>>>>> +       * given dimensionality at the given coordinates.  Main building 
>>>>>> block
>>>>>> +       * of the imageAtomic GLSL built-ins.
>>>>>> +       */
>>>>>> +      fs_reg
>>>>>> +      emit_image_atomic(const fs_builder &bld,
>>>>>> +                        const fs_reg &image, const fs_reg &addr,
>>>>>> +                        const fs_reg &src0, const fs_reg &src1,
>>>>>> +                        unsigned dims, unsigned rsize, unsigned op)
>>>>>> +      {
>>>>>> +         using namespace image_validity;
>>>>>> +         using namespace image_coordinates;
>>>>>> +         using namespace surface_access;
>>>>>> +         /* Avoid performing an atomic operation on an unbound surface. 
>>>>>> */
>>>>>> +         const brw_predicate pred = emit_surface_check(bld, image);
>>>>>> +
>>>>>> +         /* Thankfully we can do without untyped atomics here. */
>>>>>> +         const fs_reg tmp = emit_typed_atomic(bld, image, addr, src0, 
>>>>>> src1,
>>>>>> +                                              dims, rsize, op, pred);
>>>>>> +
>>>>>> +         /* An unbound surface access should give zero as result. */
>>>>>> +         if (rsize)
>>>>>> +            set_predicate(pred, bld.SEL(tmp, tmp, fs_reg(0)));
>>>>>> +
>>>>>> +         return tmp;
>>>>>> +      }
>>>>>> +   }
>>>>>> +}
>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h 
>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>> index 5424be6..cad8c27 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>> @@ -213,5 +213,22 @@ namespace brw {
>>>>>>                   _mesa_get_format_datatype(format) == GL_INT);
>>>>>>        }
>>>>>>     }
>>>>>> +
>>>>>> +   namespace image_access {
>>>>>> +      fs_reg
>>>>>> +      emit_image_load(const fs_builder &bld,
>>>>>> +                      const fs_reg &image, const fs_reg &addr,
>>>>>> +                      mesa_format format, unsigned dims);
>>>>>> +
>>>>>> +      void
>>>>>> +      emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>>> +                       const fs_reg &addr, const fs_reg &src,
>>>>>> +                       mesa_format format, unsigned dims);
>>>>>> +      fs_reg
>>>>>> +      emit_image_atomic(const fs_builder &bld,
>>>>>> +                        const fs_reg &image, const fs_reg &addr,
>>>>>> +                        const fs_reg &src0, const fs_reg &src1,
>>>>>> +                        unsigned dims, unsigned rsize, unsigned op);
>>>>>> +   }
>>>>>>  }
>>>>>>  #endif
>>>>>> --
>>>>>> 2.4.3
>>>>>>
>>>>>> _______________________________________________
>>>>>> mesa-dev mailing list
>>>>>> mesa-dev@lists.freedesktop.org
>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.

Reply via email to