UBO buffers need size padding if not multiple of 32-bit (v2)

Jason Ekstrand Mon, 26 Feb 2018 09:20:39 -0800

I've lost track of what's reviewed and what's not. Could you either justsend a status list or do a resend once all the current comments are handled?


--Jason



On February 26, 2018 09:08:01 Chema Casanova <jmcasan...@igalia.com> wrote:

On 26/02/18 16:54, Jason Ekstrand wrote:

On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo
<jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>> wrote:

    The surfaces that backup the GPU buffers have a boundary check that
    considers that access to partial dwords are considered out-of-bounds.
    For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the
    last two bytes would always be read as 0 or its writting ignored.

    The introduction of 16-bit types implies that we need to align the size
    to 4-bytes multiples so that partial dwords could be read/written.
    Adding an inconditional +2 size to buffers not being multiple of 2
    solves this issue for the general cases of UBO or SSBO.

    But, when unsized arrays of 16-bit elements are used it is not possible
    to know if the size was padded or not. To solve this issue the
    implementation calculates the needed size of the buffer surfaces,
    as suggested by Jason:

    surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size



Changed also the commit log according the the comments below.


    So when we calculate backwards the buffer_size in the backend we
    update the resinfo return value with:

    buffer_size = (surface_size & ~3) - (surface_size & 3)

    It is also exposed this buffer requirements when robust buffer access
    is enabled so these buffer sizes recommend being multiple of 4.

    v2: (Jason Ekstrand)
        Move padding logic fron anv to isl_surface_state
        Move calculus of original size from spirv to driver backend
    ---
     src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++-
     src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++-
     src/intel/vulkan/anv_device.c     | 11 +++++++++++
     3 files changed, 57 insertions(+), 2 deletions(-)

    diff --git a/src/intel/compiler/brw_fs_nir.cpp
    b/src/intel/compiler/brw_fs_nir.cpp
    index 8efec34cc9d..d017af040b4 100644
    --- a/src/intel/compiler/brw_fs_nir.cpp
    +++ b/src/intel/compiler/brw_fs_nir.cpp
    @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const
    fs_builder &bld, nir_intrinsic_instr *instr
           inst->mlen = 1;
           inst->size_written = 4 * REG_SIZE;

    -      bld.MOV(retype(dest, ret_payload.type),
    component(ret_payload, 0));
    +      /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and
    Faulting:
    +       *
    +       * "Out-of-bounds checking is always performed at a DWord
    granularity. If
    +       * any part of the DWord is out-of-bounds then the whole DWord is
    +       * considered out-of-bounds."
    +       *
    +       * This implies that types with size smaller than 4-bytes
    (16-bits) need


Yeah, 4-bytes (16-bits) is weird.  I'd just drop the "(16-bits)".


Completely agree.

    +       * to be padded if they don't complete the last dword of the
    buffer. But
    +       * as we need to maintain the original size we need to
    reverse the padding
    +       * calculation to return the correct size to know the  number
    of elements
    +       * of an unsized array. As we stored in the last two bits of
    the size
    +       * of the buffer the needed padding we calculate here:
    +       *
    +       * buffer_size = resinfo_size & ~3 - resinfo_size & 3
    +       */
    +
    +      fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);


I'd call this aligned4 because it's in units of bytes.


Locally changed.

    +      fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
    +      fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
    +
    +      ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
    +      ubld.AND(size_aligned32, component(ret_payload, 0),
    brw_imm_ud(~3));


You don't really need the component() here.


Removed.

    +      ubld.ADD(buffer_size, size_aligned32, negate (size_padding));


Removed space after negate

    +
    +      bld.MOV(retype(dest, ret_payload.type),
    component(buffer_size, 0));
    +
           brw_mark_surface_used(prog_data, index);
           break;
        }
    diff --git a/src/intel/isl/isl_surface_state.c
    b/src/intel/isl/isl_surface_state.c
    index bfb27fa4a44..ddc9eb53c96 100644
    --- a/src/intel/isl/isl_surface_state.c
    +++ b/src/intel/isl/isl_surface_state.c
    @@ -673,7 +673,26 @@ void
     isl_genX(buffer_fill_state_s)(void *state,
                                   const struct
    isl_buffer_fill_state_info *restrict info)
     {
    -   uint32_t num_elements = info->size / info->stride;
    +   uint64_t buffer_size = info->size;
    +
    +   /* Uniform and Storage buffers need to have surface size not
    less that the
    +    * aligned 32-bit size of the buffer. To calculate the array
    lenght on
    +    * unsized arrays in StorageBuffer the last 2 bits store the
    padding size
    +    * added to the surface, so we can calculate latter the original
    buffer
    +    * size to know the number of elements.
    +    *
    +    *  surface_size = 2 * aling_u64(buffer_size, 4)  - buffer_size


isl_align


Changed also the formula to match the final math:

    *  surface_size = isl_align(buffer_size, 4) +
    *                 (isl_align(buffer_size, 4) - buffer_size)

    +    *
    +    *  array_size = (surface_size & ~3) - (surface_size & 3)


Renamed array_size for buffer_size.

    +    */
    +   if (buffer_size % 4 &&


You don't need this bit as the below calculation will be a no-op in that
case.


Removed

    +       (info->format == ISL_FORMAT_RAW  ||
    +        info->stride < isl_format_get_layout(info->format)->bpb / 8)) {
    +      assert(info->stride == 1);
    +      buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;


I think this would better match the FS calculation if it were:

uint64_t aligned_size = isl_align(buffer_size, 4);
buffer_size = aligned_size + (aligned_size - buffer_size);

The current calculation is probably fine though.



Changed, this one has also less problem with overflow of the uint64 :)

Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>


Thanks for the review.

Chema

 

    +   }
    +
    +   uint32_t num_elements = buffer_size / info->stride;

        if (GEN_GEN >= 7) {
           /* From the IVB PRM, SURFACE_STATE::Height,
    diff --git a/src/intel/vulkan/anv_device.c
    b/src/intel/vulkan/anv_device.c
    index a83b7a39f6a..cedeed56219 100644
    --- a/src/intel/vulkan/anv_device.c
    +++ b/src/intel/vulkan/anv_device.c
    @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(

        pMemoryRequirements->size = buffer->size;
        pMemoryRequirements->alignment = alignment;
    +
    +   /* Storage and Uniform buffers should have their size aligned to
    +    * 32-bits to avoid boundary checks when last DWord is not complete.
    +    * This would ensure that not internal padding would be needed for
    +    * 16-bit types.
    +    */
    +   if (device->robust_buffer_access &&
    +       (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
    +        buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
    +      pMemoryRequirements->size = align_u64(buffer->size, 4);
    +
        pMemoryRequirements->memoryTypeBits = memory_types;
     }

    --
    2.14.3




_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

Reply via email to