On 26/02/18 18:20, Jason Ekstrand wrote: > I've lost track of what's reviewed and what's not. Could you either > just send a status list or do a resend once all the current comments are > handled?
Reviewed-by and all feedback addressed -------------------------------------- [1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of 32-bits. [3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout [5/7] spirv: Calculate properly 16-bit vector sizes [6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned Feedback received but pending to finish v2 ------------------------------------------ [2/7] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout To review when everything is ready ---------------------------------- [4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO [7/7] anv: Enable VK_KHR_16bit_storage for PushConstant I'll resend this series when [2/7] is ready. Chema > --Jason > > > On February 26, 2018 09:08:01 Chema Casanova <jmcasan...@igalia.com> wrote: > >> On 26/02/18 16:54, Jason Ekstrand wrote: >>> On Mon, Feb 26, 2018 at 6:14 AM, Jose Maria Casanova Crespo >>> <jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>> wrote: >>> >>> The surfaces that backup the GPU buffers have a boundary check that >>> considers that access to partial dwords are considered >>> out-of-bounds. >>> For example, buffers with 1/3 16-bit elemnts has size 2 or 6 and the >>> last two bytes would always be read as 0 or its writting ignored. >>> >>> The introduction of 16-bit types implies that we need to align >>> the size >>> to 4-bytes multiples so that partial dwords could be read/written. >>> Adding an inconditional +2 size to buffers not being multiple of 2 >>> solves this issue for the general cases of UBO or SSBO. >>> >>> But, when unsized arrays of 16-bit elements are used it is not >>> possible >>> to know if the size was padded or not. To solve this issue the >>> implementation calculates the needed size of the buffer surfaces, >>> as suggested by Jason: >>> >>> surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size >> >> >> Changed also the commit log according the the comments below. >> >>> >>> So when we calculate backwards the buffer_size in the backend we >>> update the resinfo return value with: >>> >>> buffer_size = (surface_size & ~3) - (surface_size & 3) >>> >>> It is also exposed this buffer requirements when robust buffer >>> access >>> is enabled so these buffer sizes recommend being multiple of 4. >>> >>> v2: (Jason Ekstrand) >>> Move padding logic fron anv to isl_surface_state >>> Move calculus of original size from spirv to driver backend >>> --- >>> src/intel/compiler/brw_fs_nir.cpp | 27 ++++++++++++++++++++++++++- >>> src/intel/isl/isl_surface_state.c | 21 ++++++++++++++++++++- >>> src/intel/vulkan/anv_device.c | 11 +++++++++++ >>> 3 files changed, 57 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/intel/compiler/brw_fs_nir.cpp >>> b/src/intel/compiler/brw_fs_nir.cpp >>> index 8efec34cc9d..d017af040b4 100644 >>> --- a/src/intel/compiler/brw_fs_nir.cpp >>> +++ b/src/intel/compiler/brw_fs_nir.cpp >>> @@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const >>> fs_builder &bld, nir_intrinsic_instr *instr >>> inst->mlen = 1; >>> inst->size_written = 4 * REG_SIZE; >>> >>> - bld.MOV(retype(dest, ret_payload.type), >>> component(ret_payload, 0)); >>> + /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and >>> Faulting: >>> + * >>> + * "Out-of-bounds checking is always performed at a DWord >>> granularity. If >>> + * any part of the DWord is out-of-bounds then the whole >>> DWord is >>> + * considered out-of-bounds." >>> + * >>> + * This implies that types with size smaller than 4-bytes >>> (16-bits) need >>> >>> >>> Yeah, 4-bytes (16-bits) is weird. I'd just drop the "(16-bits)". >> >> Completely agree. >> >>> + * to be padded if they don't complete the last dword of the >>> buffer. But >>> + * as we need to maintain the original size we need to >>> reverse the padding >>> + * calculation to return the correct size to know the number >>> of elements >>> + * of an unsized array. As we stored in the last two bits of >>> the size >>> + * of the buffer the needed padding we calculate here: >>> + * >>> + * buffer_size = resinfo_size & ~3 - resinfo_size & 3 >>> + */ >>> + >>> + fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD); >>> >>> >>> I'd call this aligned4 because it's in units of bytes. >> >> Locally changed. >> >>> + fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD); >>> + fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD); >>> + >>> + ubld.AND(size_padding, component(ret_payload, 0), >>> brw_imm_ud(3)); >>> + ubld.AND(size_aligned32, component(ret_payload, 0), >>> brw_imm_ud(~3)); >>> >>> >>> You don't really need the component() here. >> >> Removed. >> >>> + ubld.ADD(buffer_size, size_aligned32, negate (size_padding)); >> >> Removed space after negate >> >>> + >>> + bld.MOV(retype(dest, ret_payload.type), >>> component(buffer_size, 0)); >>> + >>> brw_mark_surface_used(prog_data, index); >>> break; >>> } >>> diff --git a/src/intel/isl/isl_surface_state.c >>> b/src/intel/isl/isl_surface_state.c >>> index bfb27fa4a44..ddc9eb53c96 100644 >>> --- a/src/intel/isl/isl_surface_state.c >>> +++ b/src/intel/isl/isl_surface_state.c >>> @@ -673,7 +673,26 @@ void >>> isl_genX(buffer_fill_state_s)(void *state, >>> const struct >>> isl_buffer_fill_state_info *restrict info) >>> { >>> - uint32_t num_elements = info->size / info->stride; >>> + uint64_t buffer_size = info->size; >>> + >>> + /* Uniform and Storage buffers need to have surface size not >>> less that the >>> + * aligned 32-bit size of the buffer. To calculate the array >>> lenght on >>> + * unsized arrays in StorageBuffer the last 2 bits store the >>> padding size >>> + * added to the surface, so we can calculate latter the original >>> buffer >>> + * size to know the number of elements. >>> + * >>> + * surface_size = 2 * aling_u64(buffer_size, 4) - buffer_size >>> >>> >>> isl_align >> >> Changed also the formula to match the final math: >> >> * surface_size = isl_align(buffer_size, 4) + >> * (isl_align(buffer_size, 4) - buffer_size) >> >> >>> + * >>> + * array_size = (surface_size & ~3) - (surface_size & 3) >> >> Renamed array_size for buffer_size. >> >> >>> + */ >>> + if (buffer_size % 4 && >>> >>> >>> You don't need this bit as the below calculation will be a no-op in that >>> case. >> >> Removed >> >> >>> + (info->format == ISL_FORMAT_RAW || >>> + info->stride < isl_format_get_layout(info->format)->bpb >>> / 8)) { >>> + assert(info->stride == 1); >>> + buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size; >>> >>> >>> I think this would better match the FS calculation if it were: >>> >>> uint64_t aligned_size = isl_align(buffer_size, 4); >>> buffer_size = aligned_size + (aligned_size - buffer_size); >>> >>> The current calculation is probably fine though. >> >> >> Changed, this one has also less problem with overflow of the uint64 :) >> >> >>> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> >> >> Thanks for the review. >> >> Chema >> >> >>> >>> >>> + } >>> + >>> + uint32_t num_elements = buffer_size / info->stride; >>> >>> if (GEN_GEN >= 7) { >>> /* From the IVB PRM, SURFACE_STATE::Height, >>> diff --git a/src/intel/vulkan/anv_device.c >>> b/src/intel/vulkan/anv_device.c >>> index a83b7a39f6a..cedeed56219 100644 >>> --- a/src/intel/vulkan/anv_device.c >>> +++ b/src/intel/vulkan/anv_device.c >>> @@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements( >>> >>> pMemoryRequirements->size = buffer->size; >>> pMemoryRequirements->alignment = alignment; >>> + >>> + /* Storage and Uniform buffers should have their size aligned to >>> + * 32-bits to avoid boundary checks when last DWord is not >>> complete. >>> + * This would ensure that not internal padding would be >>> needed for >>> + * 16-bit types. >>> + */ >>> + if (device->robust_buffer_access && >>> + (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT || >>> + buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT)) >>> + pMemoryRequirements->size = align_u64(buffer->size, 4); >>> + >>> pMemoryRequirements->memoryTypeBits = memory_types; >>> } >>> >>> -- >>> 2.14.3 >>> >>> >>> >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>> > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev