On Fri, Jul 31, 2015 at 9:58 AM, Francisco Jerez <curroje...@riseup.net> wrote: > Jason Ekstrand <ja...@jlekstrand.net> writes: > >> On Fri, Jul 31, 2015 at 6:15 AM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> Jason Ekstrand <ja...@jlekstrand.net> writes: >>> >>>> On Thu, Jul 23, 2015 at 4:38 AM, Francisco Jerez <curroje...@riseup.net> >>>> wrote: >>>>> Jason Ekstrand <ja...@jlekstrand.net> writes: >>>>> >>>>>> This all looks correct as far as I can tell. However, I'm very >>>>>> concerned about the number of checks such as >>>>>> has_matching_typed_format() that are built-in to the compiler (via >>>>>> surface_builder) where we then go on to do something that is highly >>>>>> dependant on state setup doing the exact same check (not via >>>>>> surface_builder) and doing the right thing. Another example would be >>>>>> the interplay with has_split_bit_layout. How do we know these won't >>>>>> get out of sync? >>>>>> >>>>> They cannot get out of sync because the policy which surface format to >>>>> use for a given GLSL format is shared between the compiler and the >>>>> state-setup code, see brw_lower_mesa_image_format() in "i965: Implement >>>>> surface state set-up for shader images.". >>>> >>>> I came back to this today and looked at it again. I think my main >>>> objection is that all of these metadata functions work in terms of a >>>> devinfo and a format and then immediately call >>>> brw_lower_mesa_image_format. Why don't they work instead in terms >>>> comparing the lowered format to the nominal format? Then it would be >>>> a lot more obvious that you're doing a transformation from one format >>>> to another. As is, it's a bunch of magic "what do I have to do next" >>>> queries >>> >>> I don't think any of these functions implies what to do next, they all >>> encode clearly defined device-specific properties of the formats: >>> >>> - Does the device support any format with the same bit layout? >>> (has_supported_bit_layout()) >>> >>> - Does the device support any format with the same bit layout *and* >>> encoding? (is_conversion_trivial()) >>> >>> - Does the device support a typed format of the same size? >>> (has_matching_typed_format()) >>> >>> - Does the device split the pixel data into a number of discontiguous >>> segments? (has_split_bit_layout()) >>> >>> - Does the device return garbage in the unused bits when reading from >>> the given format? (has_undefined_high_bits()) >>> >>> - Do all fields of the format have the same size? (This is not even >>> device-specific) >>> >>> - Is the format represented as a signed integer in memory? (Ditto) >>> >>> You may have noticed that except for the last two the natural subject >>> and object of all questions are "device" and "format" respectively -- >>> I'm not sure I see how rephrasing them to involve the lowered format >>> (that in fact a number of them don't care about) would make any of these >>> questions easier to understand. >> >> I *understand* the questions being asked. My claim is this is the >> *wrong* set of questions. By the time you get to this stage, someone >> higher up has made the decision that they're not going to give you >> format A but will rather give you format B which they know will also >> work. (This decision is made by code that understands the hardware >> limitations and you're guaranteed that B is a format you can read and >> transform into A.) The fact that that decision is passed down through >> brw_lower_mesa_image_format is secondary. At this stage, you have two >> formats, A and B and need to figure out how to get from one to the >> other. >> >> For example, instead of "Does the device support a typed format of the >> same size?" the better question would be "Are these two formats the >> same size?". > > Those two questions are not equivalent, lowered formats are always the > same size as the original, whether a format is supported for typed > surface access or not is inherently devinfo-specific, the devinfo > argument couldn't be replaced with a lower_format argument. The same > goes for "Does the device return garbage in the unused bits when reading > From the given format?". > >> Instead of asking "Does the device support any format with the same >> bit layout *and* encoding?" you should ask "Do these two formats have >> the same bit layout *and* encoding?" >> > *Shrug*, I think they're only wrong questions if you start from the > assumption that the policy that computes the lowered format from a given > image format is broken -- E.g. that it gives you a suboptimal format of > different encoding even though the hardware supports an identically > encoded one. You can quickly rule out that assumption by looking up the > definition of is_conversion_trivial() and seeing that it does little > more than call brw_lower_mesa_image_format(). > > I'm afraid that I was slightly misleading earlier: > is_conversion_trivial() is really about the encoding and only has > indirect bit layout implications. The current implementation is based > on the assumption that the lowered format we're converting from or to is > just the canonical one -- I.e. it cannot be readily extended to check > whether the conversion between any two arbitrary formats is trivial, so > even though replacing the devinfo argument with a lower_format would be > technically possible it would lead to an IMO a more disturbing problem. > is_conversion_trivial(mesa_format, mesa_format) would be a partial > function with ill-defined results for certain combinations of its > arguments, and it wouldn't have any way to assert() that the pair is > sensible without an additional devinfo pointer. To make the matter > worse the C++ type system cannot distinguish between a lowered and an > unlowered format while it can distinguish between a devinfo and a > format. The current image_format_info queries are total functions of > their arguments and they are all consistent about taking the original > format *only*. I'd like to preserve those properties. > >> Does the difference I'm making make sense? In the current code, it >> looks like you're doing device queries and making decisions about how >> to work around device limitations. > > emit_image_load and store don't make any decisions themselves, and I'm > not sure I fully understand what made you think they were -- The > code-paths taken through them are driven by the image_format_info > queries so I understand that you may have thought that *those* were the > ones making decisions, or not, either way it's not an obstacle to > convince oneself of the correctness of emit_image_load/store if you > temporarily assume that the results returned by image_format_info are > sensible and consistent with the canonical format lowering policy -- > Assumption you can quickly verify afterwards because their semantics are > clearly defined and their result is usually a simple function of > brw_lower_mesa_image_format(). > > There's one case however in which I think it would be fine to replace > the devinfo argument with a mesa_format one (although it would somewhat > break the rule of only passing unlowered formats to the queries). It > may also be the one that contributed the most to your impression that > emit_image_load and store were making decisions: We could change > "has_supported_bit_layout(devinfo, format)" to > "has_same_bit_layout(format, lower_format)". has_same_bit_layout() > would be total and commutative, so my objections above don't apply. For > the other ones I could mention more explicitly in the doxygen comments > that the functions don't make any decisions independently and the result > is based on the usual lowering policy for the given device.
I can see that this is another spot where we're having trouble seeing eye-to-eye. I am convinced that it is correct (and have was several e-mails ago), but I still don't like the way it's structured. I may send a "clean-up" patch or two after things have landed and we can argue about the structure then. However, for the sake of getting things merged, Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com> >> Hence my original comment about things getting out of sync. In >> reality, all of those decisions have already been made by >> brw_lower_mesa_image_format, this code is just doing what it needs to >> do to go from format A to format B. >> >>>> which, while in the same namespace, are in a different file. >>> >>> I happen to have cleaned that up to be in the same file a short while >>> ago -- They're no longer used by anything else outside of >>> brw_fs_surface_builder.cpp: >>> http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=6b83876a343cabb89083212d20d5b3721b10c200 >>> >>>> It would also be more efficient because you would only call >>>> brw_lower_mesa_image_format once. >>> >>> A more elegant way to "fix" that would be to mark >>> brw_lower_mesa_image_format() as pure, or enable LTO -- It's not like it >>> would make any measurable difference in practice though. >>> >>>> >>>> Does that make more sense? >>>> --Jason >>>> >>>>> Thanks. >>>>> >>>>>> On Tue, Jul 21, 2015 at 9:38 AM, Francisco Jerez <curroje...@riseup.net> >>>>>> wrote: >>>>>>> v2: Drop VEC4 suport. >>>>>>> v3: Rebase. >>>>>>> --- >>>>>>> .../drivers/dri/i965/brw_fs_surface_builder.cpp | 216 >>>>>>> +++++++++++++++++++++ >>>>>>> src/mesa/drivers/dri/i965/brw_fs_surface_builder.h | 17 ++ >>>>>>> 2 files changed, 233 insertions(+) >>>>>>> >>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >>>>>>> index ea1c4aa..46b449f 100644 >>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.cpp >>>>>>> @@ -587,3 +587,219 @@ namespace { >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> + >>>>>>> +namespace brw { >>>>>>> + namespace image_access { >>>>>>> + /** >>>>>>> + * Load a vector from a surface of the given format and >>>>>>> dimensionality >>>>>>> + * at the given coordinates. >>>>>>> + */ >>>>>>> + fs_reg >>>>>>> + emit_image_load(const fs_builder &bld, >>>>>>> + const fs_reg &image, const fs_reg &addr, >>>>>>> + mesa_format format, unsigned dims) >>>>>>> + { >>>>>>> + using namespace image_format_info; >>>>>>> + using namespace image_format_conversion; >>>>>>> + using namespace image_validity; >>>>>>> + using namespace image_coordinates; >>>>>>> + using namespace surface_access; >>>>>>> + const brw_device_info *devinfo = bld.shader->devinfo; >>>>>>> + const mesa_format lower_format = >>>>>>> + brw_lower_mesa_image_format(devinfo, format); >>>>>>> + fs_reg tmp; >>>>>>> + >>>>>>> + if (has_matching_typed_format(devinfo, format)) { >>>>>>> + /* Hopefully we get here most of the time... */ >>>>>>> + tmp = emit_typed_read(bld, image, addr, dims, >>>>>>> + >>>>>>> _mesa_format_num_components(lower_format)); >>>>>>> + } else { >>>>>>> + /* Untyped surface reads return 32 bits of the surface per >>>>>>> + * component, without any sort of unpacking or type >>>>>>> conversion, >>>>>>> + */ >>>>>>> + const unsigned size = _mesa_get_format_bytes(format) / 4; >>>>>>> + >>>>>>> + /* they don't properly handle out of bounds access, so we >>>>>>> have to >>>>>>> + * check manually if the coordinates are valid and >>>>>>> predicate the >>>>>>> + * surface read on the result, >>>>>>> + */ >>>>>>> + const brw_predicate pred = >>>>>>> + emit_bounds_check(bld, image, addr, dims); >>>>>>> + >>>>>>> + /* and they don't know about surface coordinates, we need >>>>>>> to >>>>>>> + * convert them to a raw memory offset. >>>>>>> + */ >>>>>>> + const fs_reg laddr = emit_address_calculation(bld, image, >>>>>>> addr, dims); >>>>>>> + >>>>>>> + tmp = emit_untyped_read(bld, image, laddr, 1, size, pred); >>>>>>> + >>>>>>> + /* An out of bounds surface access should give zero as >>>>>>> result. */ >>>>>>> + for (unsigned c = 0; c < 4; ++c) >>>>>>> + set_predicate(pred, bld.SEL(offset(tmp, bld, c), >>>>>>> + offset(tmp, bld, c), >>>>>>> fs_reg(0))); >>>>>>> + } >>>>>>> + >>>>>>> + /* Set the register type to D instead of UD if the data type >>>>>>> is >>>>>>> + * represented as a signed integer in memory so that sign >>>>>>> extension >>>>>>> + * is handled correctly by unpack. >>>>>>> + */ >>>>>>> + if (needs_sign_extension(format)) >>>>>>> + tmp = retype(tmp, BRW_REGISTER_TYPE_D); >>>>>>> + >>>>>>> + if (!has_supported_bit_layout(devinfo, format)) { >>>>>>> + /* Unpack individual vector components from the bitfield >>>>>>> if the >>>>>>> + * hardware is unable to do it for us. >>>>>>> + */ >>>>>>> + if (has_split_bit_layout(devinfo, format)) >>>>>>> + tmp = emit_pack(bld, tmp, get_bit_shifts(lower_format), >>>>>>> + get_bit_widths(lower_format)); >>>>>>> + else >>>>>>> + tmp = emit_unpack(bld, tmp, get_bit_shifts(format), >>>>>>> + get_bit_widths(format)); >>>>>>> + >>>>>>> + } else if ((needs_sign_extension(format) && >>>>>>> + !is_conversion_trivial(devinfo, format)) || >>>>>>> + has_undefined_high_bits(devinfo, format)) { >>>>>>> + /* Perform a trivial unpack even though the bit layout >>>>>>> matches in >>>>>>> + * order to get the most significant bits of each component >>>>>>> + * initialized properly. >>>>>>> + */ >>>>>>> + tmp = emit_unpack(bld, tmp, color_u(0, 32, 64, 96), >>>>>>> + get_bit_widths(format)); >>>>>>> + } >>>>>>> + >>>>>>> + if (!_mesa_is_format_integer(format)) { >>>>>>> + if (is_conversion_trivial(devinfo, format)) { >>>>>>> + /* Just need to cast the vector to the target type. */ >>>>>>> + tmp = retype(tmp, BRW_REGISTER_TYPE_F); >>>>>>> + } else { >>>>>>> + /* Do the right sort of type conversion to float. */ >>>>>>> + if (_mesa_get_format_datatype(format) == GL_FLOAT) >>>>>>> + tmp = emit_convert_from_float( >>>>>>> + bld, tmp, get_bit_widths(format)); >>>>>>> + else >>>>>>> + tmp = emit_convert_from_scaled( >>>>>>> + bld, tmp, get_bit_widths(format), >>>>>>> + _mesa_is_format_signed(format)); >>>>>>> + } >>>>>>> + } >>>>>>> + >>>>>>> + /* Initialize missing components of the result. */ >>>>>>> + return emit_pad(bld, tmp, get_bit_widths(format)); >>>>>>> + } >>>>>>> + >>>>>>> + /** >>>>>>> + * Store a vector in a surface of the given format and >>>>>>> dimensionality at >>>>>>> + * the given coordinates. >>>>>>> + */ >>>>>>> + void >>>>>>> + emit_image_store(const fs_builder &bld, const fs_reg &image, >>>>>>> + const fs_reg &addr, const fs_reg &src, >>>>>>> + mesa_format format, unsigned dims) >>>>>>> + { >>>>>>> + using namespace image_format_info; >>>>>>> + using namespace image_format_conversion; >>>>>>> + using namespace image_validity; >>>>>>> + using namespace image_coordinates; >>>>>>> + using namespace surface_access; >>>>>>> + const brw_device_info *devinfo = bld.shader->devinfo; >>>>>>> + >>>>>>> + if (format == MESA_FORMAT_NONE) { >>>>>>> + /* We don't know what the format is, but that's fine >>>>>>> because it >>>>>>> + * implies write-only access, and typed surface writes are >>>>>>> always >>>>>>> + * able to take care of type conversion and packing for us. >>>>>>> + */ >>>>>>> + emit_typed_write(bld, image, addr, src, dims, 4); >>>>>>> + >>>>>>> + } else { >>>>>>> + const mesa_format lower_format = >>>>>>> + brw_lower_mesa_image_format(devinfo, format); >>>>>>> + fs_reg tmp = src; >>>>>>> + >>>>>>> + if (!is_conversion_trivial(devinfo, format)) { >>>>>>> + /* Do the right sort of type conversion. */ >>>>>>> + if (_mesa_get_format_datatype(format) == GL_FLOAT) >>>>>>> + tmp = emit_convert_to_float(bld, tmp, >>>>>>> get_bit_widths(format)); >>>>>>> + >>>>>>> + else if (_mesa_is_format_integer(format)) >>>>>>> + tmp = emit_convert_to_integer(bld, tmp, >>>>>>> get_bit_widths(format), >>>>>>> + >>>>>>> _mesa_is_format_signed(format)); >>>>>>> + >>>>>>> + else >>>>>>> + tmp = emit_convert_to_scaled(bld, tmp, >>>>>>> get_bit_widths(format), >>>>>>> + >>>>>>> _mesa_is_format_signed(format)); >>>>>>> + } >>>>>>> + >>>>>>> + /* We're down to bit manipulation at this point. */ >>>>>>> + tmp = retype(tmp, BRW_REGISTER_TYPE_UD); >>>>>>> + >>>>>>> + if (!has_supported_bit_layout(devinfo, format)) { >>>>>>> + /* Pack the vector components into a bitfield if the >>>>>>> hardware >>>>>>> + * is unable to do it for us. >>>>>>> + */ >>>>>>> + if (has_split_bit_layout(devinfo, format)) >>>>>>> + tmp = emit_unpack(bld, tmp, >>>>>>> get_bit_shifts(lower_format), >>>>>>> + get_bit_widths(lower_format)); >>>>>>> + >>>>>>> + else >>>>>>> + tmp = emit_pack(bld, tmp, get_bit_shifts(format), >>>>>>> + get_bit_widths(format)); >>>>>>> + } >>>>>>> + >>>>>>> + if (has_matching_typed_format(devinfo, format)) { >>>>>>> + /* Hopefully we get here most of the time... */ >>>>>>> + emit_typed_write(bld, image, addr, tmp, dims, >>>>>>> + >>>>>>> _mesa_format_num_components(lower_format)); >>>>>>> + >>>>>>> + } else { >>>>>>> + /* Untyped surface writes store 32 bits of the surface >>>>>>> per >>>>>>> + * component, without any sort of packing or type >>>>>>> conversion, >>>>>>> + */ >>>>>>> + const unsigned size = _mesa_get_format_bytes(format) / >>>>>>> 4; >>>>>>> + >>>>>>> + /* they don't properly handle out of bounds access, so >>>>>>> we have >>>>>>> + * to check manually if the coordinates are valid and >>>>>>> predicate >>>>>>> + * the surface write on the result, >>>>>>> + */ >>>>>>> + const brw_predicate pred = >>>>>>> + emit_bounds_check(bld, image, addr, dims); >>>>>>> + >>>>>>> + /* and, phew, they don't know about surface >>>>>>> coordinates, we >>>>>>> + * need to convert them to a raw memory offset. >>>>>>> + */ >>>>>>> + const fs_reg laddr = emit_address_calculation(bld, >>>>>>> image, addr, dims); >>>>>>> + >>>>>>> + emit_untyped_write(bld, image, laddr, tmp, 1, size, >>>>>>> pred); >>>>>>> + } >>>>>>> + } >>>>>>> + } >>>>>>> + >>>>>>> + /** >>>>>>> + * Perform an atomic read-modify-write operation in a surface of >>>>>>> the >>>>>>> + * given dimensionality at the given coordinates. Main building >>>>>>> block >>>>>>> + * of the imageAtomic GLSL built-ins. >>>>>>> + */ >>>>>>> + fs_reg >>>>>>> + emit_image_atomic(const fs_builder &bld, >>>>>>> + const fs_reg &image, const fs_reg &addr, >>>>>>> + const fs_reg &src0, const fs_reg &src1, >>>>>>> + unsigned dims, unsigned rsize, unsigned op) >>>>>>> + { >>>>>>> + using namespace image_validity; >>>>>>> + using namespace image_coordinates; >>>>>>> + using namespace surface_access; >>>>>>> + /* Avoid performing an atomic operation on an unbound >>>>>>> surface. */ >>>>>>> + const brw_predicate pred = emit_surface_check(bld, image); >>>>>>> + >>>>>>> + /* Thankfully we can do without untyped atomics here. */ >>>>>>> + const fs_reg tmp = emit_typed_atomic(bld, image, addr, src0, >>>>>>> src1, >>>>>>> + dims, rsize, op, pred); >>>>>>> + >>>>>>> + /* An unbound surface access should give zero as result. */ >>>>>>> + if (rsize) >>>>>>> + set_predicate(pred, bld.SEL(tmp, tmp, fs_reg(0))); >>>>>>> + >>>>>>> + return tmp; >>>>>>> + } >>>>>>> + } >>>>>>> +} >>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h >>>>>>> b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h >>>>>>> index 5424be6..cad8c27 100644 >>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h >>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_surface_builder.h >>>>>>> @@ -213,5 +213,22 @@ namespace brw { >>>>>>> _mesa_get_format_datatype(format) == GL_INT); >>>>>>> } >>>>>>> } >>>>>>> + >>>>>>> + namespace image_access { >>>>>>> + fs_reg >>>>>>> + emit_image_load(const fs_builder &bld, >>>>>>> + const fs_reg &image, const fs_reg &addr, >>>>>>> + mesa_format format, unsigned dims); >>>>>>> + >>>>>>> + void >>>>>>> + emit_image_store(const fs_builder &bld, const fs_reg &image, >>>>>>> + const fs_reg &addr, const fs_reg &src, >>>>>>> + mesa_format format, unsigned dims); >>>>>>> + fs_reg >>>>>>> + emit_image_atomic(const fs_builder &bld, >>>>>>> + const fs_reg &image, const fs_reg &addr, >>>>>>> + const fs_reg &src0, const fs_reg &src1, >>>>>>> + unsigned dims, unsigned rsize, unsigned op); >>>>>>> + } >>>>>>> } >>>>>>> #endif >>>>>>> -- >>>>>>> 2.4.3 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> mesa-dev mailing list >>>>>>> mesa-dev@lists.freedesktop.org >>>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev