fs: Implement image load, store and atomic.

Jason Ekstrand Fri, 07 Aug 2015 15:12:07 -0700

On Fri, Jul 31, 2015 at 9:58 AM, Francisco Jerez <curroje...@riseup.net> wrote:
> Jason Ekstrand <ja...@jlekstrand.net> writes:
>
>> On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez <curroje...@riseup.net> 
>> wrote:
>>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>>
>>>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez <curroje...@riseup.net> 
>>>> wrote:
>>>>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>>>>
>>>>>> This all looks correct as far as I can tell.  However, I'm very
>>>>>> concerned about the number of checks such as
>>>>>> has_matching_typed_format() that are built-in to the compiler (via
>>>>>> surface_builder) where we then go on to do something that is highly
>>>>>> dependant on state setup doing the exact same check (not via
>>>>>> surface_builder) and doing the right thing.  Another example would be
>>>>>> the interplay with has_split_bit_layout.  How do we know these won't
>>>>>> get out of sync?
>>>>>>
>>>>> They cannot get out of sync because the policy which surface format to
>>>>> use for a given GLSL format is shared between the compiler and the
>>>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>>>>> surface state set-up for shader images.".
>>>>
>>>> I came back to this today and looked at it again.  I think my main
>>>> objection is that all of these metadata functions work in terms of a
>>>> devinfo and a format and then immediately call
>>>> brw_lower_mesa_image_format.  Why don't they work instead in terms
>>>> comparing the lowered format to the nominal format?  Then it would be
>>>> a lot more obvious that you're doing a transformation from one format
>>>> to another.  As is, it's a bunch of magic "what do I have to do next"
>>>> queries
>>>
>>> I don't think any of these functions implies what to do next, they all
>>> encode clearly defined device-specific properties of the formats:
>>>
>>>  - Does the device support any format with the same bit layout?
>>>    (has_supported_bit_layout())
>>>
>>>  - Does the device support any format with the same bit layout *and*
>>>    encoding? (is_conversion_trivial())
>>>
>>>  - Does the device support a typed format of the same size?
>>>    (has_matching_typed_format())
>>>
>>>  - Does the device split the pixel data into a number of discontiguous
>>>    segments? (has_split_bit_layout())
>>>
>>>  - Does the device return garbage in the unused bits when reading from
>>>    the given format? (has_undefined_high_bits())
>>>
>>>  - Do all fields of the format have the same size? (This is not even
>>>    device-specific)
>>>
>>>  - Is the format represented as a signed integer in memory? (Ditto)
>>>
>>> You may have noticed that except for the last two the natural subject
>>> and object of all questions are "device" and "format" respectively --
>>> I'm not sure I see how rephrasing them to involve the lowered format
>>> (that in fact a number of them don't care about) would make any of these
>>> questions easier to understand.
>>
>> I *understand* the questions being asked.  My claim is this is the
>> *wrong* set of questions.  By the time you get to this stage, someone
>> higher up has made the decision that they're not going to give you
>> format A but will rather give you format B which they know will also
>> work.  (This decision is made by code that understands the hardware
>> limitations and you're guaranteed that B is a format you can read and
>> transform into A.)  The fact that that decision is passed down through
>> brw_lower_mesa_image_format is secondary.  At this stage, you have two
>> formats, A and B and need to figure out how to get from one to the
>> other.
>>
>> For example, instead of "Does the device support a typed format of the
>> same size?" the better question would be "Are these two formats the
>> same size?".
>
> Those two questions are not equivalent, lowered formats are always the
> same size as the original, whether a format is supported for typed
> surface access or not is inherently devinfo-specific, the devinfo
> argument couldn't be replaced with a lower_format argument.  The same
> goes for "Does the device return garbage in the unused bits when reading
> From the given format?".
>
>> Instead of asking "Does the device support any format with the same
>> bit layout *and* encoding?" you should ask "Do these two formats have
>> the same bit layout *and* encoding?"
>>
> *Shrug*, I think they're only wrong questions if you start from the
> assumption that the policy that computes the lowered format from a given
> image format is broken -- E.g. that it gives you a suboptimal format of
> different encoding even though the hardware supports an identically
> encoded one.  You can quickly rule out that assumption by looking up the
> definition of is_conversion_trivial() and seeing that it does little
> more than call brw_lower_mesa_image_format().
>
> I'm afraid that I was slightly misleading earlier:
> is_conversion_trivial() is really about the encoding and only has
> indirect bit layout implications.  The current implementation is based
> on the assumption that the lowered format we're converting from or to is
> just the canonical one -- I.e. it cannot be readily extended to check
> whether the conversion between any two arbitrary formats is trivial, so
> even though replacing the devinfo argument with a lower_format would be
> technically possible it would lead to an IMO a more disturbing problem.
> is_conversion_trivial(mesa_format, mesa_format) would be a partial
> function with ill-defined results for certain combinations of its
> arguments, and it wouldn't have any way to assert() that the pair is
> sensible without an additional devinfo pointer.  To make the matter
> worse the C++ type system cannot distinguish between a lowered and an
> unlowered format while it can distinguish between a devinfo and a
> format.  The current image_format_info queries are total functions of
> their arguments and they are all consistent about taking the original
> format *only*.  I'd like to preserve those properties.
>
>> Does the difference I'm making make sense?  In the current code, it
>> looks like you're doing device queries and making decisions about how
>> to work around device limitations.
>
> emit_image_load and store don't make any decisions themselves, and I'm
> not sure I fully understand what made you think they were -- The
> code-paths taken through them are driven by the image_format_info
> queries so I understand that you may have thought that *those* were the
> ones making decisions, or not, either way it's not an obstacle to
> convince oneself of the correctness of emit_image_load/store if you
> temporarily assume that the results returned by image_format_info are
> sensible and consistent with the canonical format lowering policy --
> Assumption you can quickly verify afterwards because their semantics are
> clearly defined and their result is usually a simple function of
> brw_lower_mesa_image_format().
>
> There's one case however in which I think it would be fine to replace
> the devinfo argument with a mesa_format one (although it would somewhat
> break the rule of only passing unlowered formats to the queries).  It
> may also be the one that contributed the most to your impression that
> emit_image_load and store were making decisions: We could change
> "has_supported_bit_layout(devinfo, format)" to
> "has_same_bit_layout(format, lower_format)".  has_same_bit_layout()
> would be total and commutative, so my objections above don't apply.  For
> the other ones I could mention more explicitly in the doxygen comments
> that the functions don't make any decisions independently and the result
> is based on the usual lowering policy for the given device.


I can see that this is another spot where we're having trouble seeing
eye-to-eye.  I am convinced that it is correct (and have was several
e-mails ago), but I still don't like the way it's structured.  I may
send a "clean-up" patch or two after things have landed and we can
argue about the structure then.  However, for the sake of getting
things merged,

Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com>

>> Hence my original comment about things getting out of sync.  In
>> reality, all of those decisions have already been made by
>> brw_lower_mesa_image_format, this code is just doing what it needs to
>> do to go from format A to format B.
>>
>>>> which, while in the same namespace, are in a different file.
>>>
>>> I happen to have cleaned that up to be in the same file a short while
>>> ago -- They're no longer used by anything else outside of
>>> brw_fs_surface_builder.cpp:
>>> http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200
>>>
>>>> It would also be more efficient because you would only call
>>>> brw_lower_mesa_image_format once.
>>>
>>> A more elegant way to "fix" that would be to mark
>>> brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it
>>> would make any measurable difference in practice though.
>>>
>>>>
>>>> Does that make more sense?
>>>> --Jason
>>>>
>>>>> Thanks.
>>>>>
>>>>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <curroje...@riseup.net> 
>>>>>> wrote:
>>>>>>> v2: Drop VEC4 suport.
>>>>>>> v3: Rebase.
>>>>>>> ---
>>>>>>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp    | 216 
>>>>>>> +++++++++++++++++++++
>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_surface_builder.h |  17 ++
>>>>>>>  2 files changed, 233 insertions(+)
>>>>>>>
>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp 
>>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>>> index ea1c4aa..46b449f 100644
>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>>>> @@ -587,3 +587,219 @@ namespace {
>>>>>>>        }
>>>>>>>     }
>>>>>>>  }
>>>>>>> +
>>>>>>> +namespace brw {
>>>>>>> +   namespace image_access {
>>>>>>> +      /**
>>>>>>> +       * Load a vector from a surface of the given format and 
>>>>>>> dimensionality
>>>>>>> +       * at the given coordinates.
>>>>>>> +       */
>>>>>>> +      fs_reg
>>>>>>> +      emit_image_load(const fs_builder &bld,
>>>>>>> +                      const fs_reg &image, const fs_reg &addr,
>>>>>>> +                      mesa_format format, unsigned dims)
>>>>>>> +      {
>>>>>>> +         using namespace image_format_info;
>>>>>>> +         using namespace image_format_conversion;
>>>>>>> +         using namespace image_validity;
>>>>>>> +         using namespace image_coordinates;
>>>>>>> +         using namespace surface_access;
>>>>>>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>>>>>>> +         const mesa_format lower_format =
>>>>>>> +            brw_lower_mesa_image_format(devinfo, format);
>>>>>>> +         fs_reg tmp;
>>>>>>> +
>>>>>>> +         if (has_matching_typed_format(devinfo, format)) {
>>>>>>> +            /* Hopefully we get here most of the time... */
>>>>>>> +            tmp = emit_typed_read(bld, image, addr, dims,
>>>>>>> +                                  
>>>>>>> _mesa_format_num_components(lower_format));
>>>>>>> +         } else {
>>>>>>> +            /* Untyped surface reads return 32 bits of the surface per
>>>>>>> +             * component, without any sort of unpacking or type 
>>>>>>> conversion,
>>>>>>> +             */
>>>>>>> +            const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>>>> +
>>>>>>> +            /* they don't properly handle out of bounds access, so we 
>>>>>>> have to
>>>>>>> +             * check manually if the coordinates are valid and 
>>>>>>> predicate the
>>>>>>> +             * surface read on the result,
>>>>>>> +             */
>>>>>>> +            const brw_predicate pred =
>>>>>>> +               emit_bounds_check(bld, image, addr, dims);
>>>>>>> +
>>>>>>> +            /* and they don't know about surface coordinates, we need 
>>>>>>> to
>>>>>>> +             * convert them to a raw memory offset.
>>>>>>> +             */
>>>>>>> +            const fs_reg laddr = emit_address_calculation(bld, image, 
>>>>>>> addr, dims);
>>>>>>> +
>>>>>>> +            tmp = emit_untyped_read(bld, image, laddr, 1, size, pred);
>>>>>>> +
>>>>>>> +            /* An out of bounds surface access should give zero as 
>>>>>>> result. */
>>>>>>> +            for (unsigned c = 0; c < 4; ++c)
>>>>>>> +               set_predicate(pred, bld.SEL(offset(tmp, bld, c),
>>>>>>> +                                           offset(tmp, bld, c), 
>>>>>>> fs_reg(0)));
>>>>>>> +         }
>>>>>>> +
>>>>>>> +         /* Set the register type to D instead of UD if the data type 
>>>>>>> is
>>>>>>> +          * represented as a signed integer in memory so that sign 
>>>>>>> extension
>>>>>>> +          * is handled correctly by unpack.
>>>>>>> +          */
>>>>>>> +         if (needs_sign_extension(format))
>>>>>>> +            tmp = retype(tmp, BRW_REGISTER_TYPE_D);
>>>>>>> +
>>>>>>> +         if (!has_supported_bit_layout(devinfo, format)) {
>>>>>>> +            /* Unpack individual vector components from the bitfield 
>>>>>>> if the
>>>>>>> +             * hardware is unable to do it for us.
>>>>>>> +             */
>>>>>>> +            if (has_split_bit_layout(devinfo, format))
>>>>>>> +               tmp = emit_pack(bld, tmp, get_bit_shifts(lower_format),
>>>>>>> +                               get_bit_widths(lower_format));
>>>>>>> +            else
>>>>>>> +               tmp = emit_unpack(bld, tmp, get_bit_shifts(format),
>>>>>>> +                                 get_bit_widths(format));
>>>>>>> +
>>>>>>> +         } else if ((needs_sign_extension(format) &&
>>>>>>> +                     !is_conversion_trivial(devinfo, format)) ||
>>>>>>> +                    has_undefined_high_bits(devinfo, format)) {
>>>>>>> +            /* Perform a trivial unpack even though the bit layout 
>>>>>>> matches in
>>>>>>> +             * order to get the most significant bits of each component
>>>>>>> +             * initialized properly.
>>>>>>> +             */
>>>>>>> +            tmp = emit_unpack(bld, tmp, color_u(0, 32, 64, 96),
>>>>>>> +                              get_bit_widths(format));
>>>>>>> +         }
>>>>>>> +
>>>>>>> +         if (!_mesa_is_format_integer(format)) {
>>>>>>> +            if (is_conversion_trivial(devinfo, format)) {
>>>>>>> +               /* Just need to cast the vector to the target type. */
>>>>>>> +               tmp = retype(tmp, BRW_REGISTER_TYPE_F);
>>>>>>> +            } else {
>>>>>>> +               /* Do the right sort of type conversion to float. */
>>>>>>> +               if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>>>> +                  tmp = emit_convert_from_float(
>>>>>>> +                     bld, tmp, get_bit_widths(format));
>>>>>>> +               else
>>>>>>> +                  tmp = emit_convert_from_scaled(
>>>>>>> +                     bld, tmp, get_bit_widths(format),
>>>>>>> +                     _mesa_is_format_signed(format));
>>>>>>> +            }
>>>>>>> +         }
>>>>>>> +
>>>>>>> +         /* Initialize missing components of the result. */
>>>>>>> +         return emit_pad(bld, tmp, get_bit_widths(format));
>>>>>>> +      }
>>>>>>> +
>>>>>>> +      /**
>>>>>>> +       * Store a vector in a surface of the given format and 
>>>>>>> dimensionality at
>>>>>>> +       * the given coordinates.
>>>>>>> +       */
>>>>>>> +      void
>>>>>>> +      emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>>>> +                       const fs_reg &addr, const fs_reg &src,
>>>>>>> +                       mesa_format format, unsigned dims)
>>>>>>> +      {
>>>>>>> +         using namespace image_format_info;
>>>>>>> +         using namespace image_format_conversion;
>>>>>>> +         using namespace image_validity;
>>>>>>> +         using namespace image_coordinates;
>>>>>>> +         using namespace surface_access;
>>>>>>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>>>>>>> +
>>>>>>> +         if (format == MESA_FORMAT_NONE) {
>>>>>>> +            /* We don't know what the format is, but that's fine 
>>>>>>> because it
>>>>>>> +             * implies write-only access, and typed surface writes are 
>>>>>>> always
>>>>>>> +             * able to take care of type conversion and packing for us.
>>>>>>> +             */
>>>>>>> +            emit_typed_write(bld, image, addr, src, dims, 4);
>>>>>>> +
>>>>>>> +         } else {
>>>>>>> +            const mesa_format lower_format =
>>>>>>> +               brw_lower_mesa_image_format(devinfo, format);
>>>>>>> +            fs_reg tmp = src;
>>>>>>> +
>>>>>>> +            if (!is_conversion_trivial(devinfo, format)) {
>>>>>>> +               /* Do the right sort of type conversion. */
>>>>>>> +               if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>>>> +                  tmp = emit_convert_to_float(bld, tmp, 
>>>>>>> get_bit_widths(format));
>>>>>>> +
>>>>>>> +               else if (_mesa_is_format_integer(format))
>>>>>>> +                  tmp = emit_convert_to_integer(bld, tmp, 
>>>>>>> get_bit_widths(format),
>>>>>>> +                                                
>>>>>>> _mesa_is_format_signed(format));
>>>>>>> +
>>>>>>> +               else
>>>>>>> +                  tmp = emit_convert_to_scaled(bld, tmp, 
>>>>>>> get_bit_widths(format),
>>>>>>> +                                               
>>>>>>> _mesa_is_format_signed(format));
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            /* We're down to bit manipulation at this point. */
>>>>>>> +            tmp = retype(tmp, BRW_REGISTER_TYPE_UD);
>>>>>>> +
>>>>>>> +            if (!has_supported_bit_layout(devinfo, format)) {
>>>>>>> +               /* Pack the vector components into a bitfield if the 
>>>>>>> hardware
>>>>>>> +                * is unable to do it for us.
>>>>>>> +                */
>>>>>>> +               if (has_split_bit_layout(devinfo, format))
>>>>>>> +                  tmp = emit_unpack(bld, tmp, 
>>>>>>> get_bit_shifts(lower_format),
>>>>>>> +                                    get_bit_widths(lower_format));
>>>>>>> +
>>>>>>> +               else
>>>>>>> +                  tmp = emit_pack(bld, tmp, get_bit_shifts(format),
>>>>>>> +                                  get_bit_widths(format));
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            if (has_matching_typed_format(devinfo, format)) {
>>>>>>> +               /* Hopefully we get here most of the time... */
>>>>>>> +               emit_typed_write(bld, image, addr, tmp, dims,
>>>>>>> +                                
>>>>>>> _mesa_format_num_components(lower_format));
>>>>>>> +
>>>>>>> +            } else {
>>>>>>> +               /* Untyped surface writes store 32 bits of the surface 
>>>>>>> per
>>>>>>> +                * component, without any sort of packing or type 
>>>>>>> conversion,
>>>>>>> +                */
>>>>>>> +               const unsigned size = _mesa_get_format_bytes(format) / 
>>>>>>> 4;
>>>>>>> +
>>>>>>> +               /* they don't properly handle out of bounds access, so 
>>>>>>> we have
>>>>>>> +                * to check manually if the coordinates are valid and 
>>>>>>> predicate
>>>>>>> +                * the surface write on the result,
>>>>>>> +                */
>>>>>>> +               const brw_predicate pred =
>>>>>>> +                  emit_bounds_check(bld, image, addr, dims);
>>>>>>> +
>>>>>>> +               /* and, phew, they don't know about surface 
>>>>>>> coordinates, we
>>>>>>> +                * need to convert them to a raw memory offset.
>>>>>>> +                */
>>>>>>> +               const fs_reg laddr = emit_address_calculation(bld, 
>>>>>>> image, addr, dims);
>>>>>>> +
>>>>>>> +               emit_untyped_write(bld, image, laddr, tmp, 1, size, 
>>>>>>> pred);
>>>>>>> +            }
>>>>>>> +         }
>>>>>>> +      }
>>>>>>> +
>>>>>>> +      /**
>>>>>>> +       * Perform an atomic read-modify-write operation in a surface of 
>>>>>>> the
>>>>>>> +       * given dimensionality at the given coordinates.  Main building 
>>>>>>> block
>>>>>>> +       * of the imageAtomic GLSL built-ins.
>>>>>>> +       */
>>>>>>> +      fs_reg
>>>>>>> +      emit_image_atomic(const fs_builder &bld,
>>>>>>> +                        const fs_reg &image, const fs_reg &addr,
>>>>>>> +                        const fs_reg &src0, const fs_reg &src1,
>>>>>>> +                        unsigned dims, unsigned rsize, unsigned op)
>>>>>>> +      {
>>>>>>> +         using namespace image_validity;
>>>>>>> +         using namespace image_coordinates;
>>>>>>> +         using namespace surface_access;
>>>>>>> +         /* Avoid performing an atomic operation on an unbound 
>>>>>>> surface. */
>>>>>>> +         const brw_predicate pred = emit_surface_check(bld, image);
>>>>>>> +
>>>>>>> +         /* Thankfully we can do without untyped atomics here. */
>>>>>>> +         const fs_reg tmp = emit_typed_atomic(bld, image, addr, src0, 
>>>>>>> src1,
>>>>>>> +                                              dims, rsize, op, pred);
>>>>>>> +
>>>>>>> +         /* An unbound surface access should give zero as result. */
>>>>>>> +         if (rsize)
>>>>>>> +            set_predicate(pred, bld.SEL(tmp, tmp, fs_reg(0)));
>>>>>>> +
>>>>>>> +         return tmp;
>>>>>>> +      }
>>>>>>> +   }
>>>>>>> +}
>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h 
>>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>>> index 5424be6..cad8c27 100644
>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>>>> @@ -213,5 +213,22 @@ namespace brw {
>>>>>>>                   _mesa_get_format_datatype(format) == GL_INT);
>>>>>>>        }
>>>>>>>     }
>>>>>>> +
>>>>>>> +   namespace image_access {
>>>>>>> +      fs_reg
>>>>>>> +      emit_image_load(const fs_builder &bld,
>>>>>>> +                      const fs_reg &image, const fs_reg &addr,
>>>>>>> +                      mesa_format format, unsigned dims);
>>>>>>> +
>>>>>>> +      void
>>>>>>> +      emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>>>> +                       const fs_reg &addr, const fs_reg &src,
>>>>>>> +                       mesa_format format, unsigned dims);
>>>>>>> +      fs_reg
>>>>>>> +      emit_image_atomic(const fs_builder &bld,
>>>>>>> +                        const fs_reg &image, const fs_reg &addr,
>>>>>>> +                        const fs_reg &src0, const fs_reg &src1,
>>>>>>> +                        unsigned dims, unsigned rsize, unsigned op);
>>>>>>> +   }
>>>>>>>  }
>>>>>>>  #endif
>>>>>>> --
>>>>>>> 2.4.3
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> mesa-dev mailing list
>>>>>>> mesa-dev@lists.freedesktop.org
>>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.

Reply via email to