Another fun example to try to express properly in TGSI:

buffer foo {
  struct bar {
    coherent int a;
    int b;
  } asdf[10];
}
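To make the layout concrete, here is a minimal sketch of where each member of asdf[i] would land in vec4 terms. It assumes std430-style packing (the block above does not name a layout, so this is an assumption): 4-byte ints and an 8-byte array stride. The helper name and constants are hypothetical and not part of Mesa or TGSI.

```python
# Sketch: map asdf[i].a / asdf[i].b onto TGSI-style vec4 slots.
# Assumes std430-like packing: 4-byte ints, struct size 8, array stride 8.
# All names here are illustrative only.

INT_SIZE = 4          # bytes per int
STRUCT_STRIDE = 8     # assumed stride of struct bar { int a; int b; }
VEC4_SIZE = 16        # bytes per vec4 slot
COMPONENTS = "xyzw"

MEMBER_OFFSET = {"a": 0, "b": 4}  # byte offsets inside struct bar


def vec4_location(index, member):
    """Return (vec4 slot, component letter) for asdf[index].member."""
    byte_offset = index * STRUCT_STRIDE + MEMBER_OFFSET[member]
    slot = byte_offset // VEC4_SIZE
    component = COMPONENTS[(byte_offset % VEC4_SIZE) // INT_SIZE]
    return slot, component


if __name__ == "__main__":
    for i in range(4):
        for member in ("a", "b"):
            slot, comp = vec4_location(i, member)
            tag = "coherent" if member == "a" else "plain"
            print(f"asdf[{i}].{member} -> BUFFER[0][{slot}].{comp} ({tag})")
```

Under these assumptions the coherent member alternates between .x and .z of each slot, so a per-slot declaration cannot name a single component once; it would have to repeat at the struct's stride.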
Now all of a sudden you have to worry about stride for the declarations.

  -ilia

On Mon, Nov 2, 2015 at 2:07 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> I haven't the faintest idea about efficiency, but these things are flags
> on the ld/st instructions in the nvidia ISA for SM20+ (and I just
> plain don't know about SM10). I'm moderately sure that's the case for
> GCN as well.
>
> The difficulty with TGSI is that you might have something like
>
> layout (std430) buffer foo {
>   coherent int a;
>   int b;
> }
>
> Now I don't remember if they get baked into the same vec4, but I think
> they do. If they don't, then ARB_enhanced_layouts will fix that right
> up. Since TGSI is vec4-oriented, it's really awkward to specify that
> sort of thing... how would you do it?
>
> DECL BUFFER[0][0].x COHERENT
> DECL BUFFER[0][0].y
>
> And then totally unrelated to the separate bits, you can end up with
>
> layout (std430) buffer foo {
>   int foo[5];
> }
>
> and I have no idea how to even express that in TGSI -- it'd want
> things to be aligned to 16 bytes, but it'll be packed tightly here.
> This worked OK for layout (std140), but won't work with more advanced
> layouts. This will be a problem for UBOs too -- perhaps we need to
> allow something like
>
> LOAD dst, CONST[1][0], offset
>
> to account for that. And lastly, ssbo allows for something like
>
> layout (std430) buffer foo {
>   int foo[];
> }
>
> And you can access foo[anything-you-want] -- difficult to declare that
> in TGSI. I could invent stuff for all of these situations, but it
> seems to be a lot easier to just feed the data to load and forget
> about it. That's how it's all encoded in the GLSL IR as well.
>
> -ilia
>
>
> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>> I don't know much about ssbo, but since it looks like in glsl the
>> coherent etc. bits are on the variables, not the ops, it seems unnatural
>> to mark the op bits instead.
>> So I'd guess it would be better if the
>> variables could be marked instead. If this isn't expressible in tgsi
>> maybe this needs to be fixed. Albeit I have to say it sounds odd to me
>> from a hw perspective if variables with different bits can be
>> stuffed together and then the hw is expected to handle that efficiently...
>>
>> Roland
>>
>> On 01.11.2015 at 23:45, Ilia Mirkin wrote:
>>> Just wanted to note down some thoughts and get some feedback before
>>> going forward. I've already sent out a series which covered a lot of
>>> this, but in the end I realized it came up a bit short (available at
>>> https://github.com/imirkin/mesa/commits/fd2
>>> ).
>>>
>>> There are two separate buffer-related features --
>>> ARB_shader_atomic_counters(_ops) and
>>> ARB_shader_storage_buffer_object. The former are implementable more
>>> efficiently on EG/NI hardware by performing the atomic ops on
>>> not-main-memory (GDS? LDS?). However I think that the gallium-side
>>> interface can be mostly identical for both cases; perhaps we can mark
>>> the buffer as atomic-only in the TGSI.
>>>
>>> Just like there is a CONST tgsi file, I want to add a BUFFER file,
>>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>>> in from the fact that individual variables inside of a buffer may have
>>> different access/store properties. I see two ways to resolve this:
>>>
>>> 1. Declare each variable explicitly, much like UBOs still get
>>> individual decls per slot. These decls could contain the relevant
>>> caching property.
>>>
>>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>>
>>> The first option would work well for images, but for ssbo, it feels
>>> problematic, as with all the various packing options that exist, you
>>> could still specify odd per-variable cache rules, which would be
>>> difficult to express in the TGSI DECL. However I'm not sure how to
>>> implement the second option.
>>>
>>> There is a precedent of a saturate flag, but looking at
>>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>>> different caching values (none, coherent, volatile, restrict; I'm not
>>> counting readonly/writeonly), this fits. However that would leave no
>>> more bits in tgsi_instruction. I could add a texture-style bit, saying
>>> to expect an additional tgsi_instruction_buffer packet with more info
>>> but that seems wasteful.
>>>
>>> Another option is to just pass an immediate directly to the LOAD/STORE
>>> ops which would specify this caching spec as an extra source. This
>>> seems much simpler, but a little dirtier. Opinions much appreciated.
>>>
>>> I think that once this is worked out, I'll be able to resend my series
>>> adding SSBO/atomic support to freedreno, and partial SSBO (without
>>> atomic*) support for nvc0.
>>>
>>> Cheers,
>>>
>>> -ilia
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
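The "2 free bits" idea from the original message can be sketched roughly as follows. It treats the four caching values as mutually exclusive, as the message does, and stores one of them in two spare bits of a 32-bit instruction token. The bit position and the function names are hypothetical; they do not reflect the real tgsi_instruction layout.

```python
# Sketch: keep one of four caching modes in two spare bits of a 32-bit
# instruction token. The shift below is an assumption for illustration,
# not the actual position of any free bits in tgsi_instruction.

CACHING_MODES = ("none", "coherent", "volatile", "restrict")
CACHE_SHIFT = 30                  # assumed position of the two free bits
CACHE_MASK = 0b11 << CACHE_SHIFT
TOKEN_MASK = 0xFFFFFFFF           # tokens are 32 bits wide


def pack_caching(token, mode):
    """Return token with its two caching bits set to the given mode."""
    value = CACHING_MODES.index(mode)
    cleared = token & ~CACHE_MASK & TOKEN_MASK
    return cleared | (value << CACHE_SHIFT)


def unpack_caching(token):
    """Return the caching mode name stored in token."""
    return CACHING_MODES[(token & CACHE_MASK) >> CACHE_SHIFT]


if __name__ == "__main__":
    token = pack_caching(0x12345678, "volatile")
    print(hex(token), unpack_caching(token))
```

Four modes fill the two bits exactly, which is the drawback the message points out: nothing is left over for a later flag. The alternative it mentions, passing the mode as an immediate source operand, would carry the same small integer without touching the instruction token.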