On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:
> This adds initial format modifiers for AMD GFX9 and newer GPUs. > > This is particularly useful to determine if we can use DCC, and whether > we need an extra display compatible DCC metadata plane. > > Design decisions: > - Always expose a single plane > This way everything works correctly with images with multiple > planes. > > - Do not add an extra memory region in DCC for putting a bit on whether > we are in compressed state. > A decompress on import is cheap enough if already decompressed, and > I do think in most cases we can avoid it in advance during modifier > negotiation. The remainder is probably not common enough to worry > about. > > - Explicitly define the sizes as part of the modifier description instead > of using whatever the current version of radeonsi does. > This way we can avoid dedicated buffers and we can make sure we keep > compatibility across mesa versions. I'd like to put some tests on > this on ac_surface.c so we can learn early in the process if things > need to be changed. Furthermore, the lack of configurable strides on > GFX10 means things already go wrong if we do not agree, making a > custom stride somewhat less useful. > The custom stride will be back for 2D images (not for 3D/Array), so Navi10-14 will be the only hw not supporting the custom stride for 2D. It might not be worth adding the width and height into the modifier just because of Navi10-14, though I don't feel strongly about it. This patch doesn't add the sizes into the description anyway. The rest looks good. Marek > > - No usage of BO metadata at all for modifier usecases. > To avoid the requirement of dedicated dma bufs per image. For > non-modifier based interop we still use the BO metadata, since we > need to keep compatibility with old mesa and this is used for > depth/msaa/3d/CL etc. API interop. > > - A single FD for all planes. > Easier in Vulkan / bindless and radeonsi is already transitioning. > > - Make a single modifier for DCN1 > It defines things uniquely given bpp, which we can assume, so adding > more modifier values do not add clarity. > > - Not exposing the 4K and 256B tiling modes. > These are largely only better for something like a cursor or very > long > and/or tall images. Are they worth the added complexity to save > memory? > For context, at 32bpp, tiles are 128x128 pixels. > > - For multiplane images, every plane uses the same tiling. > On GFX9/GFX10 we can, so no need to make it complicated. > > - We use family_id + external_rev to distinguish between incompatible > GPUs. > PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device > id, > but different tiling. We might be able to find bigger equivalence > groups for _X, but especially for DCC I would be uncomfortable > making it > shared between GPUs. > > - For DCN1 DCC, radeonsi currently uses another texelbuffer with indices > to reorder. This is not shared. > Specific to current implementation and does not need to be shared. To > pave the way to shader-based solution, lets keep this internal to > each > driver. This should reduce the modifier churn if any of the driver > implementations change. (Especially as you'd want to support the old > implementation for a while to stay compatible with old kernels not > supporting a new modifier yet). > > - No support for rotated swizzling. > Can be added easily later and nothing in the stack would generate it > currently. > > - Add extra enum values in the definitions. > This way we can easily switch on modifier without having to pass > around > the current GPU everywhere, assuming the modifier has been validated. > --- > > Since my previous attempt for modifiers got bogged down on details for > the GFX6-GFX8 modifiers in previous discussions, this only attempts to > define modifiers for GFX9+, which is significantly simpler. > > For a final version I'd like to wait until I have written most of the > userspace + kernelspace so we can actually test it. However, I'd > appreciate any early feedback people are willing to give. > > Initial Mesa amd/common support + tests are available at > https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers > > I tested the HW to actually behave as described in the descriptions > on Raven and plan to test on a subset of the others. > > include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++ > 1 file changed, 118 insertions(+) > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > index 3feeaa3f987a..9bd286ab2bee 100644 > --- a/include/uapi/drm/drm_fourcc.h > +++ b/include/uapi/drm/drm_fourcc.h > @@ -756,6 +756,124 @@ extern "C" { > */ > #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1) > > +/* > + * AMD GFX9+ format modifiers > + */ > + > +/* > + * enum-like values for easy switches. > + * > + * No fixed field-size but implementations are supposed to enforce > all-zeros of > + * unused bits during validation. > + */ > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id 0 > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id 1 > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id 2 > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id 3 > +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id 4 > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5 > +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id 6 > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id 7 > + > +/* > + * tiling modes that are compatible between all GPUs that support the > tiling > + * mode. > + * > + * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a > 256 byte > + * micro-block. > + * > + * The macro-block is 64 KiB and the micro-block in macro-block > addressing is > + * y0-x0-y1-x1-... up till the dimensions of the macro-block. > + * > + * The image is then a plain row-major image of macro-blocks. > + */ > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id) > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id) > + > +/* > + * Same as above, but applies a transformation on the micro-block in > macro-block > + * indexing that depends on the GPU pipes, shader engines and banks. > + * > + * RENDER is a new micro-block tiling for GFX10+. > + */ > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev) \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \ > + ((uint64_t)family_id << 40) | \ > + ((uint64_t)external_rev << 48)) > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev) \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id | \ > + ((uint64_t)family_id << 40) | \ > + ((uint64_t)external_rev << 48)) > +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev) \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id | \ > + ((uint64_t)family_id << 40) | \ > + ((uint64_t)external_rev << 48)) > + > +/* > + * Same as above, but with DCC enabled. > + * > + * We add the PCI ID of the device to make sure the transformation above > is > + * applied the same way, as well as make sure the implementation of DCC > supports > + * the same patterns. > + * > + * The DCC is pipe-aligned (and on GFX9 rb-aligned). > + * > + * This includes 2 memory regions per plane: > + * - main image > + * - DCC metadata > + * > + * These are tightly packed according to platform specific DCC alignment > + * requirements. > + * > + * pipe+rb aligned DCC alignment: > + * - GFX9: MAX(65536, > + * MIN2(32, pipes * shader_engines) * > + * num_backends * interleave_bytes) > + * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096) > + * > + * aligned DCC size: > + * - GFX9: > + * tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer > + * width if odd log2) at ratio 1/256 > + * - GFX10 (without rbplus): > + * tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data > + * (prefer width if odd log2) at ratio 1/256 > + */ > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id, > external_rev) \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id > | \ > + ((uint64_t)family_id << 40) | > \ > + ((uint64_t)external_rev << 48)) > +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id, > external_rev) \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id > | \ > + ((uint64_t)family_id << 40) | > \ > + ((uint64_t)external_rev << 48)) > + > +/* > + * DCC that is displayable with DCN1 hardware. > + * > + * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64 > bits, the > + * micro-tiling is DISPLAY. > + * > + * This includes 3 memory regions per plane: > + * - main image > + * - DCC (non aligned) > + * - DCC (pipe-aligned & rb-aligned) > + * > + * non-aligned DCC alignment: > + * - GFX9: MAX(65536, interleave_bytes) > + * - GFX10 (without rbplus): 4096 > + * > + * non-aligned DCC size: > + * - GFX9 & GFX10 (without rbplus): > + * tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio > 1/256 > + */ > +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev) \ > + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \ > + ((uint64_t)family_id << 40) | \ > + ((uint64_t)external_rev << 48)) > + > #if defined(__cplusplus) > } > #endif > -- > 2.23.0 > >
_______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel