On 26/02/18 16:54, Jason Ekstrand wrote: > On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo > <jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>> wrote: > > The surfaces that backup the GPU buffers have a boundary check that > considers that access to partial dwords are considered out-of-bounds. > For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the > last two bytes would always be read as 0 or its writting ignored. > > The introduction of 16-bit types implies that we need to align the size > to 4-bytes multiples so that partial dwords could be read/written. > Adding an inconditional +2 size to buffers not being multiple of 2 > solves this issue for the general cases of UBO or SSBO. > > But, when unsized arrays of 16-bit elements are used it is not possible > to know if the size was padded or not. To solve this issue the > implementation calculates the needed size of the buffer surfaces, > as suggested by Jason: > > surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size
Changed also the commit log according the the comments below. > > So when we calculate backwards the buffer_size in the backend we > update the resinfo return value with: > > buffer_size = (surface_size & ~3) - (surface_size & 3) > > It is also exposed this buffer requirements when robust buffer access > is enabled so these buffer sizes recommend being multiple of 4. > > v2: (Jason Ekstrand) > Move padding logic fron anv to isl_surface_state > Move calculus of original size from spirv to driver backend > --- > src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++- > src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++- > src/intel/vulkan/anv_device.c | 11 +++++++++++ > 3 files changed, 57 insertions(+), 2 deletions(-) > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > b/src/intel/compiler/brw_fs_nir.cpp > index 8efec34cc9d..d017af040b4 100644 > --- a/src/intel/compiler/brw_fs_nir.cpp > +++ b/src/intel/compiler/brw_fs_nir.cpp > @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const > fs_builder &bld, nir_intrinsic_instr *instr > inst->mlen = 1; > inst->size_written = 4 * REG_SIZE; > > - bld.MOV(retype(dest, ret_payload.type), > component(ret_payload, 0)); > + /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and > Faulting: > + * > + * "Out-of-bounds checking is always performed at a DWord > granularity. If > + * any part of the DWord is out-of-bounds then the whole DWord is > + * considered out-of-bounds." > + * > + * This implies that types with size smaller than 4-bytes > (16-bits) need > > > Yeah, 4-bytes (16-bits) is weird. I'd just drop the "(16-bits)". Completely agree. > + * to be padded if they don't complete the last dword of the > buffer. But > + * as we need to maintain the original size we need to > reverse the padding > + * calculation to return the correct size to know the number > of elements > + * of an unsized array. As we stored in the last two bits of > the size > + * of the buffer the needed padding we calculate here: > + * > + * buffer_size = resinfo_size & ~3 - resinfo_size & 3 > + */ > + > + fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD); > > > I'd call this aligned4 because it's in units of bytes. Locally changed. > + fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD); > + fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD); > + > + ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3)); > + ubld.AND(size_aligned32, component(ret_payload, 0), > brw_imm_ud(~3)); > > > You don't really need the component() here. Removed. > + ubld.ADD(buffer_size, size_aligned32, negate (size_padding)); Removed space after negate > + > + bld.MOV(retype(dest, ret_payload.type), > component(buffer_size, 0)); > + > brw_mark_surface_used(prog_data, index); > break; > } > diff --git a/src/intel/isl/isl_surface_state.c > b/src/intel/isl/isl_surface_state.c > index bfb27fa4a44..ddc9eb53c96 100644 > --- a/src/intel/isl/isl_surface_state.c > +++ b/src/intel/isl/isl_surface_state.c > @@ -673,7 +673,26 @@ void > isl_genX(buffer_fill_state_s)(void *state, > const struct > isl_buffer_fill_state_info *restrict info) > { > - uint32_t num_elements = info->size / info->stride; > + uint64_t buffer_size = info->size; > + > + /* Uniform and Storage buffers need to have surface size not > less that the > + * aligned 32-bit size of the buffer. To calculate the array > lenght on > + * unsized arrays in StorageBuffer the last 2 bits store the > padding size > + * added to the surface, so we can calculate latter the original > buffer > + * size to know the number of elements. > + * > + * surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size > > > isl_align Changed also the formula to match the final math: * surface_size = isl_align(buffer_size, 4) + * (isl_align(buffer_size, 4) - buffer_size) > + * > + * array_size = (surface_size & ~3) - (surface_size & 3) Renamed array_size for buffer_size. > + */ > + if (buffer_size % 4 && > > > You don't need this bit as the below calculation will be a no-op in that > case. Removed > + (info->format == ISL_FORMAT_RAW || > + info->stride < isl_format_get_layout(info->format)->bpb / 8)) { > + assert(info->stride == 1); > + buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size; > > > I think this would better match the FS calculation if it were: > > uint64_t aligned_size = isl_align(buffer_size, 4); > buffer_size = aligned_size + (aligned_size - buffer_size); > > The current calculation is probably fine though. Changed, this one has also less problem with overflow of the uint64 :) > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> Thanks for the review. Chema > > > + } > + > + uint32_t num_elements = buffer_size / info->stride; > > if (GEN_GEN >= 7) { > /* From the IVB PRM, SURFACE_STATE::Height, > diff --git a/src/intel/vulkan/anv_device.c > b/src/intel/vulkan/anv_device.c > index a83b7a39f6a..cedeed56219 100644 > --- a/src/intel/vulkan/anv_device.c > +++ b/src/intel/vulkan/anv_device.c > @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements( > > pMemoryRequirements->size = buffer->size; > pMemoryRequirements->alignment = alignment; > + > + /* Storage and Uniform buffers should have their size aligned to > + * 32-bits to avoid boundary checks when last DWord is not complete. > + * This would ensure that not internal padding would be needed for > + * 16-bit types. > + */ > + if (device->robust_buffer_access && > + (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT || > + buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT)) > + pMemoryRequirements->size = align_u64(buffer->size, 4); > + > pMemoryRequirements->memoryTypeBits = memory_types; > } > > -- > 2.14.3 > > > > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev