> +        for (unsigned c = 0; c < 4; ++c) {
> +                struct panfrost_blend_rt *rt = &blend->rt[c];
> +                _mesa_hash_table_u64_clear(rt->shaders, 
> panfrost_delete_blend_shader);
> +        }

What's the implication  of the clear call? Does that recursively free
the blend CSOs due to ralloc infrastructure?

> +                memcpy(final.shader.bo->cpu, shader->bo->cpu, shader->size);

I'm not totally comfortable with reading off CPU-writeable GPU-mapped
memory. We do it for scoreboarding too (which is probably a bug). But as
discussed, this may have performance and/or security implications that
we don't really understand. Maybe we might be better off having a purely
CPU copy of shaders we'll need to patch?

> +                    entry->flags == flags) {

This is definitely necessary, +1

I'm wondering if we should be optimizing this somehow but we can sail
that boat when we get there, I guess.

> @@ -2035,6 +2041,7 @@ panfrost_bind_shader_state(
>          enum pipe_shader_type type)
>  {
>          struct panfrost_context *ctx = pan_context(pctx);
> +        struct panfrost_job *job = panfrost_get_job_for_fbo(ctx);
>  
>          ctx->shader[type] = hwcso;
>  
> @@ -2098,6 +2105,8 @@ panfrost_bind_shader_state(
>  
>                  shader_state->compiled = true;
>          }
> +
> +        panfrost_job_add_bo(job, shader_state->bo);

This doesn't strike me as quite right. Conceptually, a shader state
could be bound on one frame and then just reused for 2000 frames in a
row, no? I guess that's not something you'll hit in CTS / normal apps,
but it still strikes me as something we need to pay attention to.

I think we want the add_bo() call to be part of emit_for_draw().

> +        /* To maximize BO cache usage, don't allocate tiny BOs */
> +        size = MAX2(size, 4096);

FWIW I think we round up to 4k anyway elsewhere? Also, how does this
interact with shaders -- if you have a bunch of tiny shaders of 128
bytes each, you'll be wasting 3968 bytes per shader, no? At least the
old strategy  (with the 16MB slab) allowed subdivision with byte
granularity.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to