Early in the GEM work, we introduced this ugly split in the 3D driver state setup to support the check_aperture code. We would go through the GL state and make a list of every BO that would be part of state setup, and save that, then do check_aperture on them, then go through state setup again actually emitting the state. The overhead wasn't huge (after some small optimizations), but the complexity cost on the code was pretty bad.
While I was getting started on vertex texturing, I was having an awful time wrapping my mind around the state setup sequence for VS/WM surface state. I realized that if I couldn't understand it, having rewritten most of it myself, everyone else was screwed. So, I spent a while trying to work out how to avoid the split. I realized two things:

1) We now emit only a single level of relocations. Back in the pre-state-streaming days, we'd have this big tree of relocations. The batch pointed to the binding table, which pointed to the surface state, which pointed to texture data. Now, that chain is batch pointing to binding-table-in-batch pointing to surface-state-in-batch, pointing to texture data. So, to do a checkpoint and restore, we only need to do it on the batchbuffer. That's way easier than previous plans. (Tested with an assertion on gen6)

2) We don't care about check_aperture after rolling back the last primitive emitted.

The plan is to replace this inner loop of state setup:

    for each prim {
        compute bo list
        if (check_aperture_space(bo list)) {
            batch flush
            compute bo list
            if (check_aperture_space(bo list)) {
                whine_about_batch_size()
                fall back;
            }
        }
        upload state to BOs
    }

with this inner loop:

    for each prim {
    retry:
        upload state to BOs
        if (check_aperture_space(batch)) {
            if (!retried) {
                reset_to_last_prim()
                batch flush
                retried = true
                goto retry;
            } else {
                if (batch_flush())
                    whine_about_batch_size()
            }
        }
    }

Note how if we do a reset, we immediately flush. We don't need to check aperture space -- the kernel will tell us if we actually ran out of aperture or not. And if we did run out of aperture, it's because either the single prim was too big, or because check_aperture was wrong at the point of setting up the last primitive.

This means that all I need to do for reset_to_last_prim() is to roll back where the next batch emits go (for the final MI_BATCHBUFFER_END emit), and the set of relocations emitted since the last primitive. This change gives us the relocations reset required.
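To make the checkpoint/restore idea concrete, here is a toy model in C. It is only a sketch under my own assumptions, not the driver code: struct toy_batch, APERTURE, batch_save(), batch_reset_to_saved(), batch_over_aperture(), emit_prim() and draw_prim() are all made-up names, BO sizes are arbitrary units, and batch_flush() stands in for the kernel's verdict on whether the batch fit. The point it illustrates is that the checkpoint is just two counters: where the next emit goes, and how many relocations have been emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: every name below is hypothetical, not the i965 API. */
#define MAX_RELOCS 16
#define APERTURE   100              /* pretend aperture budget, arbitrary units */

struct toy_batch {
    size_t used;                    /* where the next batch emit goes */
    size_t reloc_count;             /* relocations emitted so far */
    size_t reloc_size[MAX_RELOCS];  /* size of the BO behind each relocation */
    size_t saved_used;              /* checkpoint taken before each prim */
    size_t saved_reloc_count;
    int flushes;                    /* counted for demonstration only */
};

/* Checkpoint: with a single level of relocations, this is all the state. */
static void batch_save(struct toy_batch *b)
{
    b->saved_used = b->used;
    b->saved_reloc_count = b->reloc_count;
}

/* reset_to_last_prim(): forget emits and relocations since the checkpoint. */
static void batch_reset_to_saved(struct toy_batch *b)
{
    b->used = b->saved_used;
    b->reloc_count = b->saved_reloc_count;
}

/* Stand-in for check_aperture_space(batch): true means "might not fit". */
static bool batch_over_aperture(const struct toy_batch *b)
{
    size_t total = 0;
    for (size_t i = 0; i < b->reloc_count; i++)
        total += b->reloc_size[i];
    return total > APERTURE;
}

/* Stand-in for the kernel: nonzero if the submitted batch did not fit. */
static int batch_flush(struct toy_batch *b)
{
    int ret = batch_over_aperture(b) ? -1 : 0;
    b->used = 0;
    b->reloc_count = 0;
    b->flushes++;
    return ret;
}

/* "Upload state": a few dwords plus one relocation to a BO of bo_size. */
static void emit_prim(struct toy_batch *b, size_t bo_size)
{
    assert(b->reloc_count < MAX_RELOCS);
    b->used += 4;
    b->reloc_size[b->reloc_count++] = bo_size;
}

/* The new inner loop body for one prim. Returns false if the prim is dropped. */
static bool draw_prim(struct toy_batch *b, size_t bo_size)
{
    bool retried = false;

retry:
    batch_save(b);
    emit_prim(b, bo_size);

    if (batch_over_aperture(b)) {
        if (!retried) {
            /* Drop this prim, submit what came before it, and try again
             * in an empty batch. No aperture check before this flush. */
            batch_reset_to_saved(b);
            batch_flush(b);
            retried = true;
            goto retry;
        }
        /* Alone in a fresh batch and still over: let the "kernel" decide. */
        if (batch_flush(b) != 0) {
            fprintf(stderr, "prim too large for aperture\n");
            return false;
        }
    }
    return true;
}
```

With a zeroed struct toy_batch, three prims of size 40 each get through with one flush in between (the third is rolled back, the first two are submitted, and the third is re-emitted into the empty batch), while a single prim of size 150 fails even on its own and gets whined about.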
The simplification in i965 comes out to:

    41 files changed, 228 insertions(+), 418 deletions(-)

but I don't think that accurately reflects the benefit.