fs: Implement image load, store and atomic.

Jason Ekstrand Fri, 31 Jul 2015 07:26:01 -0700

On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez <curroje...@riseup.net> wrote:
> Jason Ekstrand <ja...@jlekstrand.net> writes:
>
>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez <curroje...@riseup.net> 
>> wrote:
>>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>>
>>>> This all looks correct as far as I can tell.  However, I'm very
>>>> concerned about the number of checks such as
>>>> has_matching_typed_format() that are built-in to the compiler (via
>>>> surface_builder) where we then go on to do something that is highly
>>>> dependant on state setup doing the exact same check (not via
>>>> surface_builder) and doing the right thing.  Another example would be
>>>> the interplay with has_split_bit_layout.  How do we know these won't
>>>> get out of sync?
>>>>
>>> They cannot get out of sync because the policy which surface format to
>>> use for a given GLSL format is shared between the compiler and the
>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement
>>> surface state set-up for shader images.".
>>
>> I came back to this today and looked at it again.  I think my main
>> objection is that all of these metadata functions work in terms of a
>> devinfo and a format and then immediately call
>> brw_lower_mesa_image_format.  Why don't they work instead in terms
>> comparing the lowered format to the nominal format?  Then it would be
>> a lot more obvious that you're doing a transformation from one format
>> to another.  As is, it's a bunch of magic "what do I have to do next"
>> queries
>
> I don't think any of these functions implies what to do next, they all
> encode clearly defined device-specific properties of the formats:
>
>  - Does the device support any format with the same bit layout?
>    (has_supported_bit_layout())
>
>  - Does the device support any format with the same bit layout *and*
>    encoding? (is_conversion_trivial())
>
>  - Does the device support a typed format of the same size?
>    (has_matching_typed_format())
>
>  - Does the device split the pixel data into a number of discontiguous
>    segments? (has_split_bit_layout())
>
>  - Does the device return garbage in the unused bits when reading from
>    the given format? (has_undefined_high_bits())
>
>  - Do all fields of the format have the same size? (This is not even
>    device-specific)
>
>  - Is the format represented as a signed integer in memory? (Ditto)
>
> You may have noticed that except for the last two the natural subject
> and object of all questions are "device" and "format" respectively --
> I'm not sure I see how rephrasing them to involve the lowered format
> (that in fact a number of them don't care about) would make any of these
> questions easier to understand.


I *understand* the questions being asked.  My claim is this is the
*wrong* set of questions.  By the time you get to this stage, someone
higher up has made the decision that they're not going to give you
format A but will rather give you format B which they know will also
work.  (This decision is made by code that understands the hardware
limitations and you're guaranteed that B is a format you can read and
transform into A.)  The fact that that decision is passed down through
brw_lower_mesa_image_format is secondary.  At this stage, you have two
formats, A and B and need to figure out how to get from one to the
other.

For example, instead of "Does the device support a typed format of the
same size?" the better question would be "Are these two formats the
same size?".  Instead of asking "Does the device support any format
with the same bit layout *and* encoding?" you should ask "Do these two
formats have the same bit layout *and* encoding?"

Does the difference I'm making make sense?  In the current code, it
looks like you're doing device queries and making decisions about how
to work around device limitations.  Hence my original comment about
things getting out of sync.  In reality, all of those decisions have
already been made by brw_lower_mesa_image_format, this code is just
doing what it needs to do to go from format A to format B.

>> which, while in the same namespace, are in a different file.
>
> I happen to have cleaned that up to be in the same file a short while
> ago -- They're no longer used by anything else outside of
> brw_fs_surface_builder.cpp:
> http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200
>
>> It would also be more efficient because you would only call
>> brw_lower_mesa_image_format once.
>
> A more elegant way to "fix" that would be to mark
> brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it
> would make any measurable difference in practice though.
>
>>
>> Does that make more sense?
>> --Jason
>>
>>> Thanks.
>>>
>>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <curroje...@riseup.net> 
>>>> wrote:
>>>>> v2: Drop VEC4 suport.
>>>>> v3: Rebase.
>>>>> ---
>>>>>  .../drivers/dri/i965/brw_fs_surface_builder.cpp    | 216 
>>>>> +++++++++++++++++++++
>>>>>  src/mesa/drivers/dri/i965/brw_fs_surface_builder.h |  17 ++
>>>>>  2 files changed, 233 insertions(+)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp 
>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>> index ea1c4aa..46b449f 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp
>>>>> @@ -587,3 +587,219 @@ namespace {
>>>>>        }
>>>>>     }
>>>>>  }
>>>>> +
>>>>> +namespace brw {
>>>>> +   namespace image_access {
>>>>> +      /**
>>>>> +       * Load a vector from a surface of the given format and 
>>>>> dimensionality
>>>>> +       * at the given coordinates.
>>>>> +       */
>>>>> +      fs_reg
>>>>> +      emit_image_load(const fs_builder &bld,
>>>>> +                      const fs_reg &image, const fs_reg &addr,
>>>>> +                      mesa_format format, unsigned dims)
>>>>> +      {
>>>>> +         using namespace image_format_info;
>>>>> +         using namespace image_format_conversion;
>>>>> +         using namespace image_validity;
>>>>> +         using namespace image_coordinates;
>>>>> +         using namespace surface_access;
>>>>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>>>>> +         const mesa_format lower_format =
>>>>> +            brw_lower_mesa_image_format(devinfo, format);
>>>>> +         fs_reg tmp;
>>>>> +
>>>>> +         if (has_matching_typed_format(devinfo, format)) {
>>>>> +            /* Hopefully we get here most of the time... */
>>>>> +            tmp = emit_typed_read(bld, image, addr, dims,
>>>>> +                                  
>>>>> _mesa_format_num_components(lower_format));
>>>>> +         } else {
>>>>> +            /* Untyped surface reads return 32 bits of the surface per
>>>>> +             * component, without any sort of unpacking or type 
>>>>> conversion,
>>>>> +             */
>>>>> +            const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>> +
>>>>> +            /* they don't properly handle out of bounds access, so we 
>>>>> have to
>>>>> +             * check manually if the coordinates are valid and predicate 
>>>>> the
>>>>> +             * surface read on the result,
>>>>> +             */
>>>>> +            const brw_predicate pred =
>>>>> +               emit_bounds_check(bld, image, addr, dims);
>>>>> +
>>>>> +            /* and they don't know about surface coordinates, we need to
>>>>> +             * convert them to a raw memory offset.
>>>>> +             */
>>>>> +            const fs_reg laddr = emit_address_calculation(bld, image, 
>>>>> addr, dims);
>>>>> +
>>>>> +            tmp = emit_untyped_read(bld, image, laddr, 1, size, pred);
>>>>> +
>>>>> +            /* An out of bounds surface access should give zero as 
>>>>> result. */
>>>>> +            for (unsigned c = 0; c < 4; ++c)
>>>>> +               set_predicate(pred, bld.SEL(offset(tmp, bld, c),
>>>>> +                                           offset(tmp, bld, c), 
>>>>> fs_reg(0)));
>>>>> +         }
>>>>> +
>>>>> +         /* Set the register type to D instead of UD if the data type is
>>>>> +          * represented as a signed integer in memory so that sign 
>>>>> extension
>>>>> +          * is handled correctly by unpack.
>>>>> +          */
>>>>> +         if (needs_sign_extension(format))
>>>>> +            tmp = retype(tmp, BRW_REGISTER_TYPE_D);
>>>>> +
>>>>> +         if (!has_supported_bit_layout(devinfo, format)) {
>>>>> +            /* Unpack individual vector components from the bitfield if 
>>>>> the
>>>>> +             * hardware is unable to do it for us.
>>>>> +             */
>>>>> +            if (has_split_bit_layout(devinfo, format))
>>>>> +               tmp = emit_pack(bld, tmp, get_bit_shifts(lower_format),
>>>>> +                               get_bit_widths(lower_format));
>>>>> +            else
>>>>> +               tmp = emit_unpack(bld, tmp, get_bit_shifts(format),
>>>>> +                                 get_bit_widths(format));
>>>>> +
>>>>> +         } else if ((needs_sign_extension(format) &&
>>>>> +                     !is_conversion_trivial(devinfo, format)) ||
>>>>> +                    has_undefined_high_bits(devinfo, format)) {
>>>>> +            /* Perform a trivial unpack even though the bit layout 
>>>>> matches in
>>>>> +             * order to get the most significant bits of each component
>>>>> +             * initialized properly.
>>>>> +             */
>>>>> +            tmp = emit_unpack(bld, tmp, color_u(0, 32, 64, 96),
>>>>> +                              get_bit_widths(format));
>>>>> +         }
>>>>> +
>>>>> +         if (!_mesa_is_format_integer(format)) {
>>>>> +            if (is_conversion_trivial(devinfo, format)) {
>>>>> +               /* Just need to cast the vector to the target type. */
>>>>> +               tmp = retype(tmp, BRW_REGISTER_TYPE_F);
>>>>> +            } else {
>>>>> +               /* Do the right sort of type conversion to float. */
>>>>> +               if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>> +                  tmp = emit_convert_from_float(
>>>>> +                     bld, tmp, get_bit_widths(format));
>>>>> +               else
>>>>> +                  tmp = emit_convert_from_scaled(
>>>>> +                     bld, tmp, get_bit_widths(format),
>>>>> +                     _mesa_is_format_signed(format));
>>>>> +            }
>>>>> +         }
>>>>> +
>>>>> +         /* Initialize missing components of the result. */
>>>>> +         return emit_pad(bld, tmp, get_bit_widths(format));
>>>>> +      }
>>>>> +
>>>>> +      /**
>>>>> +       * Store a vector in a surface of the given format and 
>>>>> dimensionality at
>>>>> +       * the given coordinates.
>>>>> +       */
>>>>> +      void
>>>>> +      emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>> +                       const fs_reg &addr, const fs_reg &src,
>>>>> +                       mesa_format format, unsigned dims)
>>>>> +      {
>>>>> +         using namespace image_format_info;
>>>>> +         using namespace image_format_conversion;
>>>>> +         using namespace image_validity;
>>>>> +         using namespace image_coordinates;
>>>>> +         using namespace surface_access;
>>>>> +         const brw_device_info *devinfo = bld.shader->devinfo;
>>>>> +
>>>>> +         if (format == MESA_FORMAT_NONE) {
>>>>> +            /* We don't know what the format is, but that's fine because 
>>>>> it
>>>>> +             * implies write-only access, and typed surface writes are 
>>>>> always
>>>>> +             * able to take care of type conversion and packing for us.
>>>>> +             */
>>>>> +            emit_typed_write(bld, image, addr, src, dims, 4);
>>>>> +
>>>>> +         } else {
>>>>> +            const mesa_format lower_format =
>>>>> +               brw_lower_mesa_image_format(devinfo, format);
>>>>> +            fs_reg tmp = src;
>>>>> +
>>>>> +            if (!is_conversion_trivial(devinfo, format)) {
>>>>> +               /* Do the right sort of type conversion. */
>>>>> +               if (_mesa_get_format_datatype(format) == GL_FLOAT)
>>>>> +                  tmp = emit_convert_to_float(bld, tmp, 
>>>>> get_bit_widths(format));
>>>>> +
>>>>> +               else if (_mesa_is_format_integer(format))
>>>>> +                  tmp = emit_convert_to_integer(bld, tmp, 
>>>>> get_bit_widths(format),
>>>>> +                                                
>>>>> _mesa_is_format_signed(format));
>>>>> +
>>>>> +               else
>>>>> +                  tmp = emit_convert_to_scaled(bld, tmp, 
>>>>> get_bit_widths(format),
>>>>> +                                               
>>>>> _mesa_is_format_signed(format));
>>>>> +            }
>>>>> +
>>>>> +            /* We're down to bit manipulation at this point. */
>>>>> +            tmp = retype(tmp, BRW_REGISTER_TYPE_UD);
>>>>> +
>>>>> +            if (!has_supported_bit_layout(devinfo, format)) {
>>>>> +               /* Pack the vector components into a bitfield if the 
>>>>> hardware
>>>>> +                * is unable to do it for us.
>>>>> +                */
>>>>> +               if (has_split_bit_layout(devinfo, format))
>>>>> +                  tmp = emit_unpack(bld, tmp, 
>>>>> get_bit_shifts(lower_format),
>>>>> +                                    get_bit_widths(lower_format));
>>>>> +
>>>>> +               else
>>>>> +                  tmp = emit_pack(bld, tmp, get_bit_shifts(format),
>>>>> +                                  get_bit_widths(format));
>>>>> +            }
>>>>> +
>>>>> +            if (has_matching_typed_format(devinfo, format)) {
>>>>> +               /* Hopefully we get here most of the time... */
>>>>> +               emit_typed_write(bld, image, addr, tmp, dims,
>>>>> +                                
>>>>> _mesa_format_num_components(lower_format));
>>>>> +
>>>>> +            } else {
>>>>> +               /* Untyped surface writes store 32 bits of the surface per
>>>>> +                * component, without any sort of packing or type 
>>>>> conversion,
>>>>> +                */
>>>>> +               const unsigned size = _mesa_get_format_bytes(format) / 4;
>>>>> +
>>>>> +               /* they don't properly handle out of bounds access, so we 
>>>>> have
>>>>> +                * to check manually if the coordinates are valid and 
>>>>> predicate
>>>>> +                * the surface write on the result,
>>>>> +                */
>>>>> +               const brw_predicate pred =
>>>>> +                  emit_bounds_check(bld, image, addr, dims);
>>>>> +
>>>>> +               /* and, phew, they don't know about surface coordinates, 
>>>>> we
>>>>> +                * need to convert them to a raw memory offset.
>>>>> +                */
>>>>> +               const fs_reg laddr = emit_address_calculation(bld, image, 
>>>>> addr, dims);
>>>>> +
>>>>> +               emit_untyped_write(bld, image, laddr, tmp, 1, size, pred);
>>>>> +            }
>>>>> +         }
>>>>> +      }
>>>>> +
>>>>> +      /**
>>>>> +       * Perform an atomic read-modify-write operation in a surface of 
>>>>> the
>>>>> +       * given dimensionality at the given coordinates.  Main building 
>>>>> block
>>>>> +       * of the imageAtomic GLSL built-ins.
>>>>> +       */
>>>>> +      fs_reg
>>>>> +      emit_image_atomic(const fs_builder &bld,
>>>>> +                        const fs_reg &image, const fs_reg &addr,
>>>>> +                        const fs_reg &src0, const fs_reg &src1,
>>>>> +                        unsigned dims, unsigned rsize, unsigned op)
>>>>> +      {
>>>>> +         using namespace image_validity;
>>>>> +         using namespace image_coordinates;
>>>>> +         using namespace surface_access;
>>>>> +         /* Avoid performing an atomic operation on an unbound surface. 
>>>>> */
>>>>> +         const brw_predicate pred = emit_surface_check(bld, image);
>>>>> +
>>>>> +         /* Thankfully we can do without untyped atomics here. */
>>>>> +         const fs_reg tmp = emit_typed_atomic(bld, image, addr, src0, 
>>>>> src1,
>>>>> +                                              dims, rsize, op, pred);
>>>>> +
>>>>> +         /* An unbound surface access should give zero as result. */
>>>>> +         if (rsize)
>>>>> +            set_predicate(pred, bld.SEL(tmp, tmp, fs_reg(0)));
>>>>> +
>>>>> +         return tmp;
>>>>> +      }
>>>>> +   }
>>>>> +}
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h 
>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>> index 5424be6..cad8c27 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h
>>>>> @@ -213,5 +213,22 @@ namespace brw {
>>>>>                   _mesa_get_format_datatype(format) == GL_INT);
>>>>>        }
>>>>>     }
>>>>> +
>>>>> +   namespace image_access {
>>>>> +      fs_reg
>>>>> +      emit_image_load(const fs_builder &bld,
>>>>> +                      const fs_reg &image, const fs_reg &addr,
>>>>> +                      mesa_format format, unsigned dims);
>>>>> +
>>>>> +      void
>>>>> +      emit_image_store(const fs_builder &bld, const fs_reg &image,
>>>>> +                       const fs_reg &addr, const fs_reg &src,
>>>>> +                       mesa_format format, unsigned dims);
>>>>> +      fs_reg
>>>>> +      emit_image_atomic(const fs_builder &bld,
>>>>> +                        const fs_reg &image, const fs_reg &addr,
>>>>> +                        const fs_reg &src0, const fs_reg &src1,
>>>>> +                        unsigned dims, unsigned rsize, unsigned op);
>>>>> +   }
>>>>>  }
>>>>>  #endif
>>>>> --
>>>>> 2.4.3
>>>>>
>>>>> _______________________________________________
>>>>> mesa-dev mailing list
>>>>> mesa-dev@lists.freedesktop.org
>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 10/20] i965/fs: Implement image load, store and atomic.

Reply via email to