On Mon, 5 Aug 2019 at 19:01, Alyssa Rosenzweig <aly...@rosenzweig.io> wrote: > > > + for (unsigned c = 0; c < 4; ++c) { > > + struct panfrost_blend_rt *rt = &blend->rt[c]; > > + _mesa_hash_table_u64_clear(rt->shaders, > > panfrost_delete_blend_shader); > > + } > > What's the implication of the clear call? Does that recursively free > the blend CSOs due to ralloc infrastructure?
struct panfrost_blend_shader is allocated here with memdup, so I was indeed missing a free(). Thanks! > > + memcpy(final.shader.bo->cpu, shader->bo->cpu, > > shader->size); > > I'm not totally comfortable with reading off CPU-writeable GPU-mapped > memory. We do it for scoreboarding too (which is probably a bug). But as > discussed, this may have performance and/or security implications that > we don't really understand. Maybe we might be better off having a purely > CPU copy of shaders we'll need to patch? Hmm, guess would be better to play safe in this case. > > + entry->flags == flags) { > > This is definitely necessary, +1 > > I'm wondering if we should be optimizing this somehow but we can sail > that boat when we get there, I guess. Yep. > > @@ -2035,6 +2041,7 @@ panfrost_bind_shader_state( > > enum pipe_shader_type type) > > { > > struct panfrost_context *ctx = pan_context(pctx); > > + struct panfrost_job *job = panfrost_get_job_for_fbo(ctx); > > > > ctx->shader[type] = hwcso; > > > > @@ -2098,6 +2105,8 @@ panfrost_bind_shader_state( > > > > shader_state->compiled = true; > > } > > + > > + panfrost_job_add_bo(job, shader_state->bo); > > This doesn't strike me as quite right. Conceptually, a shader state > could be bound on one frame and then just reused for 2000 frames in a > row, no? I guess that's not something you'll hit in CTS / normal apps, > but it still strikes me as something we need to pay attention to. > > I think we want the add_bo() call to be part of emit_for_draw(). Makes sense. > > + /* To maximize BO cache usage, don't allocate tiny BOs */ > > + size = MAX2(size, 4096); > > FWIW I think we round up to 4k anyway elsewhere? I think that happens only when going though panfrost_drm_allocate_slab. > Also, how does this > interact with shaders -- if you have a bunch of tiny shaders of 128 > bytes each, you'll be wasting 3968 bytes per shader, no? At least the > old strategy (with the 16MB slab) allowed subdivision with byte > granularity. Yeah, there's the wastage you describe, but we are currently wasting several megs so it's a vast improvement in this regard. For more fine-grained performance and memory usage improvements, I think we should track these kind of regressions in CI, or we'll struggle to consistently improve. Thanks, Tomeu _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev