On Thu, Feb 21, 2019 at 3:01 PM Rob Clark <robdcl...@gmail.com> wrote:
> On Thu, Feb 21, 2019 at 1:19 AM Alyssa Rosenzweig <aly...@rosenzweig.io> > wrote: > > > > For reasons that are still unclear (speculation included in the comment > > added in this patch), the tiler? metadata has a fast path that we were > > not enabling; there looks to be a possible time/memory tradeoff, but the > > details remain unclear. > > > > Regardless, this patch improves performance dramatically. Particular > > wins are for geometry-heavy scenes. For instance, glmark2-es2's > > Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped > > from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious > > when vsync is disabled, as in glmark2-es2-wayland. > > > > With this patch, on GLES 2.0 samples not involving FBOs, it appears > > performance is converging with (and sometimes surpassing) the blob. > > > > Signed-off-by: Alyssa Rosenzweig <aly...@rosenzweig.io> > > --- > > src/gallium/drivers/panfrost/pan_context.c | 42 +++++++++++++++++++--- > > 1 file changed, 38 insertions(+), 4 deletions(-) > > > > diff --git a/src/gallium/drivers/panfrost/pan_context.c > b/src/gallium/drivers/panfrost/pan_context.c > > index 822b5a0dfef..2996a9c1e09 100644 > > --- a/src/gallium/drivers/panfrost/pan_context.c > > +++ b/src/gallium/drivers/panfrost/pan_context.c > > @@ -256,7 +256,28 @@ static struct bifrost_framebuffer > > panfrost_emit_mfbd(struct panfrost_context *ctx) > > { > > struct bifrost_framebuffer framebuffer = { > > - .tiler_meta = 0xf00000c600, > > + /* It is not yet clear what tiler_meta means or how it's > > + * calculated, but we can tell the lower 32-bits are a > > + * (monotonically increasing?) function of tile count > and > > + * geometry complexity; I suspect it defines a memory > size of > > + * some kind? for the tiler. It's really unclear at the > > + * moment... but to add to the confusion, the hardware > is happy > > + * enough to accept a zero in this field, so we don't > even have > > + * to worry about it right now. > > *somewhere* the result of VS (or binning shader if VS is split in two) > needs to be saved for use during the per-tile rendering. Presumably > you have to give the hw a buffer of appropriate/matching size > somewhere, or with enough geometry (and too small a buffer) things > will overflow and go badly. > > Yes, there is a buffer for holding the results of the tiler. The way it works is that the userspace driver allocates a very large buffer with a "grow on page fault" bit, so the kernel will allocate more memory as the tiler asks for it. > I guess if you exceed the size given in .tiler_meta, then hw falls > back to running VS per tile for all geometry (not just geom visible in > the tile), hence big diff in perf for large tile counts and lotsa > geometry. > No, this isn't it. The vertex shaders aren't split in half, they all run entirely before the tiler even starts. The only thing I could imagine would be some optional part of the tiler buffer where the aforementioned grow on page fault thing doesn't quite work out. > > It does sound a bit strange that it would depend on tile count. I'd > expect it would be a function strictly of amount of geometry (and > possibly effected by whether or not gl_PointSize is written?).. and > possibly amount of varyings if VS isn't split into two parts. > > BR, > -R > > > + * The byte (just after the 32-bit mark) is much more > > + * interesting. The higher nibble I've only ever seen > as 0xF, > > + * but the lower one I've seen as 0x0 or 0xF, and it's > not > > + * obvious what the difference is. But what -is- > obvious is > > + * that when the lower nibble is zero, performance is > severely > > + * degraded compared to when the lower nibble is set. > > + * Evidently, that nibble enables some sort of fast > path, > > + * perhaps relating to caching or tile flush? > Regardless, at > > + * this point there's no clear reason not to set it, > aside from > > + * substantially increased memory requirements (of the > misc_0 > > + * buffer) */ > > + > > + .tiler_meta = ((uint64_t) 0xff << 32) | 0x0, > > > > .width1 = MALI_POSITIVE(ctx->pipe_framebuffer.width), > > .height1 = MALI_POSITIVE(ctx->pipe_framebuffer.height), > > @@ -271,10 +292,23 @@ panfrost_emit_mfbd(struct panfrost_context *ctx) > > > > .unknown2 = 0x1f, > > > > - /* Presumably corresponds to unknown_address_X of SFBD > */ > > + /* Corresponds to unknown_address_X of SFBD */ > > .scratchpad = ctx->scratchpad.gpu, > > .tiler_scratch_start = ctx->misc_0.gpu, > > - .tiler_scratch_middle = ctx->misc_0.gpu + > /*ctx->misc_0.size*/40960, /* Size depends on the size of the framebuffer > and the number of vertices */ > > + > > + /* The constant added here is, like the lower word of > > + * tiler_meta, (loosely) another product of framebuffer > size > > + * and geometry complexity. It must be sufficiently > large for > > + * the tiler_meta fast path to work; if it's too small, > there > > + * will be DATA_INVALID_FAULTs. Conversely, it must be > less > > + * than the total size of misc_0, or else there's no > room. It's > > + * possible this constant configures a partition > between two > > + * parts of misc_0? We haven't investigated the > functionality, > > + * as these buffers are internally used by the hardware > > + * (presumably by the tiler) but not seemingly touched > by the driver > > + */ > > + > > + .tiler_scratch_middle = ctx->misc_0.gpu + 0xf0000, > > > > .tiler_heap_start = ctx->tiler_heap.gpu, > > .tiler_heap_end = ctx->tiler_heap.gpu + > ctx->tiler_heap.size, > > @@ -2696,7 +2730,7 @@ panfrost_setup_hardware(struct panfrost_context > *ctx) > > screen->driver->allocate_slab(screen, &ctx->varying_mem, 16384, > false, 0, 0, 0); > > screen->driver->allocate_slab(screen, &ctx->shaders, 4096, > true, PAN_ALLOCATE_EXECUTE, 0, 0); > > screen->driver->allocate_slab(screen, &ctx->tiler_heap, 32768, > false, PAN_ALLOCATE_GROWABLE, 1, 128); > > - screen->driver->allocate_slab(screen, &ctx->misc_0, 128, false, > PAN_ALLOCATE_GROWABLE, 1, 128); > > + screen->driver->allocate_slab(screen, &ctx->misc_0, 128*128, > false, PAN_ALLOCATE_GROWABLE, 1, 128); > > > > } > > > > -- > > 2.20.1 > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev