On Thursday, June 04, 2015 02:03:21 AM Kenneth Graunke wrote:
> On Thursday, June 04, 2015 11:38:52 AM Abdiel Janulgue wrote:
> >
> > On 06/02/2015 10:54 AM, Kenneth Graunke wrote:
> > > On Monday, June 01, 2015 03:14:26 PM Abdiel Janulgue wrote:
> [snip]
> > I'm the one who is being funny here. After looking harder and then
> > doing some archaeological digs into my previous RS enabling efforts,
> > I came to the conclusion that you are right. The reason the hardware
> > *seems* to enforce this arbitrary offset is that I skipped the
> > "disable RS on state base address update" workaround. Now that I've
> > reintroduced it, the hardware works completely fine even from offset
> > zero. I'll update the code in v2.
>
> Great! Scratch one mystery :)
>
> Thanks, Abdiel.
(To bring the mailing list up to speed: Abdiel mentioned on IRC tonight
that this is actually still necessary. Some Piglit tests worked with the
offset removed, but real applications didn't. I then noticed that even
"shader_runner glsl-fs-texture2d.shader_test" breaks when
hw_bt_start_offset = 0, even with the workaround Abdiel mentions.)

Abdiel, I think I figured out why this is necessary.

In gen[78]_disable_stages, we issue 3DSTATE_BINDING_TABLE_POINTERS_HS/DS
packets with a "pointer" value of 0. In the software binding table case,
this points to the start of the batch buffer, which is harmless because
the disabled HS/DS won't read any surfaces.

However, the hardware binding table case is different: upon receiving a
3DSTATE_BINDING_TABLE_POINTERS_XS packet, the hardware *writes* the
current on-die binding table to the given offset. This is a maximum of
256 16-bit surface state pointers.

My theory is that if we program legitimate binding tables at offset 0,
they get clobbered when gen7_disable_stages says that the HS/DS binding
tables should be written to offset 0. By starting at an offset of
256 * sizeof(uint16_t), we are essentially allocating a "dummy" binding
table of maximum size.

Three things I tried fixed the problem:

1. Remove 3DSTATE_BINDING_TABLE_POINTERS_HS/DS from gen7_disable_stages.
   We never tell the HW to write out HS/DS tables, so the PS table at
   offset 0 doesn't get clobbered.

2. Change those packets to use offset 16000 (something large). We write
   out useless HS/DS tables, but to an unused spot in the buffer, so
   they don't trash anything.

3. Move the gen7_disable_stages atom immediately after the
   gen7_hw_binding_tables atom in the list. Instead of writing VS/PS
   tables and then clobbering them with HS/DS, we reverse the order:
   write garbage HS/DS tables, then clobber them with the (actually
   useful) PS table.

This brought up a question: how does the hardware know how large a table
to write? Does it always write out all 256 entries?
It certainly seems to, as far as I can tell. But that would mean that
when increasing brw->hw_bt_pool.next_offset, we always need to add
256 * sizeof(uint16_t), even if the table only has a few useful entries.

I'm a bit confused, because we're not doing that today...so shouldn't
something have broken?

--Ken
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev