On Tuesday, June 09, 2015 03:01:00 PM Abdiel Janulgue wrote: > > Thanks a lot Ken! Your help in debugging this pointed me in the right > direction! I had some further observations below: > > On 06/09/2015 11:08 AM, Kenneth Graunke wrote: > > > > > (To bring the mailing list up to speed: Abdiel mentioned on IRC tonight > > that this is actually still necessary---some Piglit tests worked with > > the offset removed, but real applications didn't work. I then noticed > > that even "shader_runner glsl-fs-texture2d.shader_test" breaks when > > hw_bt_start_offset = 0 - even with the workaround Abdiel mentions). > > > > Abdiel, > > > > I think I figured out why this is necessary. In gen[78]_disable_stages, > > we issue 3DSTATE_BINDING_TABLE_POINTERS_HS/DS packets with a "pointer" > > value of 0. > > > > In the software binding table case, this points to the start of the > > batch buffer, which is harmless because the disabled HS/DS won't read > > any surfaces. > > > > However, the hardware binding table case is different: upon receiving > > a 3DSTATE_BINDING_TABLE_POINTERS_XS packet, the hardware *writes* the > > current on-die binding table to the given offset. This is a maximum > > of 256 16-bit surface state pointers. > > > > My theory is that if we program legitimate binding tables at offset 0, > > they get clobbered when gen7_disable_stages says that the HS/DS binding > > tables should be written to offset 0. > > > > By starting at an offset of 256 * sizeof(uint16_t), we are essentially > > allocating a "dummy" binding table of maximum size. > > > > Three things I tried fixed the problem: > > 1. Remove 3DSTATE_BINDING_TABLE_POINTERS_HS/DS from gen7_disable_stages. > > > > We never tell the HW to write out HS/DS tables, so the PS table at > > offset 0 doesn't get clobbered. > > I vaguely remember somehow sometime ago that removing the "nullify > binding table pointer" resulted in some random hangs. But I can't > remember which exact test or game I used. So don't take my word for it. > > > > > 2. Change those packets to use offset 16000 (something large). > > > > We write out useless HS/DS tables, but to an unused spot in the > > buffer, so they don't trash anything. > > > > 3. Move the gen7_disable_stages atom immediately after the > > gen7_hw_binding_tables atom in the list. > > > > Instead of writing VS/PS tables then clobbering them with HS/DS, > > we reverse the order: write garbage HS/DS tables, then clobber them > > with the (actually useful) PS table. > > > > > This brought up a question: how does the hardware know how large of a > > table to write? Does it always write out all 256 entries? > > > > It certainly seems to, as far as I can tell. But that would mean that > > when increasing brw->hw_bt_pool.next_offset, we always need to add > > 256 * sizeof(uint16_t), even if the table only has a few useful entries. > > I'm a bit confused, because we're not doing that today...so shouldn't > > something have broken? > > I did further digging and found out that the answer is both yes and no. > Yes because the hardware is actually able to write to the buffer even > without specifying the full 256 entry size. I confirmed this by mapping > out the contents of the valid binding table pool after every flush and > the entries are correct as long as the offsets used are aligned to > 64-bytes. It also doesn't overwrite the entries written starting from > the next 3DSTATE_BINDING_TABLE_POINTERS_XS offset. > > And no, because for some reason, garbage hardware binding table pointers > (written by gen7_disable in this case) seem to require a full 256-entry > allocation when stating from offset zero. If another > 3DSTATE_BINDING_TABLE_POINTERS_XS offset touches an offset within that > "garbage" range then the hardware gets angry. This is the reason why > anything less than 256 * sizeof(uint16_t) didn't work earlier. > > To summarize, this is what currently happened when we used offset zero > with the current code: > > btp: valid -> garbage -> valid > offset: 0 -> 0 -> 64 > > Notice that the hangs here are caused because the valid binding table > pointer is overwritten with the garbage binding table pointer. This is > same conclusion you mentioned earlier. Also, the other "valid" binding > table offset is less than 256 * sizeof(uint16_t). > > Because of the same reason, using a start offset for the valid binding > table pointer which is less than 256 * sizeof(uint16_t) hangs too: > > btp: valid -> garbage -> valid > offset: 128 -> 0 -> 128 + 64 > > > However doing this instead works: > > btp: valid -> garbage -> valid > offset: 0 -> 64 -> 64 > > It's kinda like step 3 you mentioned above except we don't reverse the > order of the atoms but specify brw->hw_bt_pool.next_offset to the > 3DSTATE_BINDING_TABLE_POINTERS_HS/DS offsets in gen7_disable instead of > zero. We don't have to allocate space for the garbage hardware binding > table because it gets overwritten with the valid pointer anyway. This > approach worked perfectly as well. > > -Abdiel
It sounds like perhaps the hardware is always writing out 256-byte entries. However, once they're written, we're free to *overwrite* the unused portions when writing the next table. Your suggestion sounds like a reasonable solution. gen7_disable_stages is going away soon anyway - Chris is going to wire up tessellation for real. Don't forget to fix the possible buffer overflow - I think every time we emit a binding table, we should make sure there's 256 * sizeof(uint16_t) bytes left in the buffer. If not, flush (and reset the offset). --Ken
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev