On Wed, Jan 25, 2017 at 11:14:32PM -0800, Kenneth Graunke wrote: > Applications may delete a shader program, create a new one, and bind it > before the next draw. With terrible luck, malloc may randomly return a > chunk of memory for the new gl_program that happened to be the exact > same pointer as our previously bound gl_program. In this case, our > logic to detect new programs in brw_upload_pipeline_state() would break: > > if (brw->vertex_program != ctx->VertexProgram._Current) { > brw->vertex_program = ctx->VertexProgram._Current; > brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM; > } > > Because the pointer is the same, we'd think it was the same program. > But it could be wildly different - a different stage altogether, > different sets of resources, and so on. This causes utter chaos. > > As unlikely as this seems, I believe I hit this when running a subset > of the CTS in a loop, in a group of tests that churns through simple > programs, deleting and rebuilding them. Presumably malloc uses a > bucketing cache of sorts, and so freeing up a gl_program and allocating > a new one fairly quickly causes it to reuse that memory. > > The result was that brw->vertex_program->info.num_ssbos claimed the > program had SSBOs, while brw->vs.base.prog_data.binding_table claimed > that there were none. This was crazy, because the binding table is > calculated from info.num_ssbos - the shader info appeared to change > between shader compile time and draw time. Careful use of watchpoints > revealed that it was being clobbered by rzalloc's memset when building > an entirely different program... > > Fortunately, our 0xd0d0d0d0 canary for unused binding table entries > caused us to crash out of bounds when trying to upload SSBOs, or we > may have never discovered this heisenbug. > > Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked > cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5 > at 64x64 in a loop with 100 iterations. > > Cc: "17.0 13.0 12.0" <mesa-sta...@lists.freedesktop.org> > Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Really good work, must have taken quite a bit of time!!! Reviewed-by: Topi Pohjolainen <topi.pohjolai...@intel.com> I've noticed the same malloc pattern with miptree's. Gave me a headache when just dumping miptrees and not paying attention to deallocations in between. > --- > src/mesa/drivers/dri/i965/brw_program.c | 43 > +++++++++++++++++++++++++++++++++ > 1 file changed, 43 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c > b/src/mesa/drivers/dri/i965/brw_program.c > index e81f6b15c0a..673dc502ad4 100644 > --- a/src/mesa/drivers/dri/i965/brw_program.c > +++ b/src/mesa/drivers/dri/i965/brw_program.c > @@ -177,6 +177,49 @@ static struct gl_program *brwNewProgram(struct > gl_context *ctx, GLenum target, > static void brwDeleteProgram( struct gl_context *ctx, > struct gl_program *prog ) > { > + struct brw_context *brw = brw_context(ctx); > + > + /* Beware! prog's refcount has reached zero, and it's about to be freed. > + * > + * In brw_upload_pipeline_state(), we compare brw->foo_program to > + * ctx->FooProgram._Current, and flag BRW_NEW_FOO_PROGRAM if the > + * pointer has changed. > + * > + * We cannot leave brw->foo_program as a dangling pointer to the dead > + * program. malloc() may allocate the same memory for a new gl_program, > + * causing us to see matching pointers...but totally different programs. > + * > + * We cannot set brw->foo_program to NULL, either. If we've deleted the > + * active program, Mesa may set ctx->FooProgram._Current to NULL. That > + * would cause us to see matching pointers (NULL == NULL), and fail to > + * detect that a program has changed since our last draw. > + * > + * So, set it to a bogus gl_program pointer that will never match, > + * causing us to properly reevaluate the state on our next draw. > + * > + * Getting this wrong causes heisenbugs which are very hard to catch, > + * as you need a very specific allocation pattern to hit the problem. > + */ > + static const struct gl_program deleted_program; > + > + if (brw->vertex_program == prog) > + brw->vertex_program = &deleted_program; > + > + if (brw->tess_ctrl_program == prog) > + brw->tess_ctrl_program = &deleted_program; > + > + if (brw->tess_eval_program == prog) > + brw->tess_eval_program = &deleted_program; > + > + if (brw->geometry_program == prog) > + brw->geometry_program = &deleted_program; > + > + if (brw->fragment_program == prog) > + brw->fragment_program = &deleted_program; > + > + if (brw->compute_program == prog) > + brw->compute_program = &deleted_program; > + > _mesa_delete_program( ctx, prog ); > } > > -- > 2.11.0 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev