FTR, these are the various cache operators on NVIDIA hw: http://docs.nvidia.com/cuda/parallel-thread-execution/#cache-operators
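For reference, here is a small C sketch of the load-side operators from that page, plus one possible mapping from the GLSL memory qualifiers discussed further down. The enum names and the helper `ptx_ld_cache_op_for()` are hypothetical. The qualifier-to-operator mapping (coherent to `.cg`, volatile to `.cv`, everything else to `.ca`) is an assumption for illustration. It is not taken from nvc0 or any other driver.

```c
#include <assert.h>

/* Load-side PTX cache operators, as listed in the PTX ISA docs. */
enum ptx_ld_cache_op {
   PTX_LD_CA, /* .ca: cache at all levels (the default) */
   PTX_LD_CG, /* .cg: cache in L2 and below, not in L1 */
   PTX_LD_CS, /* .cs: streaming, likely accessed once (evict-first) */
   PTX_LD_LU, /* .lu: last use */
   PTX_LD_CV, /* .cv: don't cache, fetch again */
};

static const char *const ptx_ld_cache_op_name[] = {
   ".ca", ".cg", ".cs", ".lu", ".cv",
};

/* Hypothetical qualifier set: the four caching values from this thread
 * (readonly/writeonly deliberately left out). */
enum mem_qualifier {
   MEM_NONE,
   MEM_COHERENT,
   MEM_VOLATILE,
   MEM_RESTRICT,
};

/* Assumed mapping for illustration only, not taken from any driver. */
static enum ptx_ld_cache_op ptx_ld_cache_op_for(enum mem_qualifier q)
{
   assert(q <= MEM_RESTRICT);
   switch (q) {
   case MEM_COHERENT:
      return PTX_LD_CG; /* L2 is shared across SMs; the per-SM L1 is not */
   case MEM_VOLATILE:
      return PTX_LD_CV; /* never serve the load from a cached copy */
   default:
      return PTX_LD_CA; /* none/restrict: ordinary caching */
   }
}
```

The name table makes it easy to print the operator when emitting an `ld` instruction, e.g. `ptx_ld_cache_op_name[ptx_ld_cache_op_for(q)]`.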
Most of these map directly to instruction flags (ca/cg/cs/cv sound familiar; dunno about lu, it could just be an assembler helper). How backwards-compatible is TGSI supposed to be? Can we change the encoding willy-nilly, or are there separate systems that talk to each other using TGSI that would need coordination?

-ilia

On Mon, Nov 2, 2015 at 2:49 PM, Roland Scheidegger <srol...@vmware.com> wrote:
> Ok, I guess if it's really flagged on the instructions in hw, it seems
> reasonable to do it on the instructions in TGSI as well.
> Using the last two bits there doesn't sound nice indeed (in particular
> if you'd maybe want to encode the read/write bits as well at some
> point), but it's not THAT bad I think. We can scrap some bits from it
> later if needed (token type is 4 bits but never larger than 3, NumSrcs
> could easily do with 3 instead of 4 bits, and at some point the
> predicate bit can go too). Then again, an extra token might be a good
> option too (if you decide to add those r/w bits...).
>
> Though I still don't quite understand how GPUs can do that efficiently
> if you can use different flags for data which might be in the same cache
> line. But maybe it's less of a problem than I thought...
>
> Roland
>
>
> On 02.11.2015 at 20:07, Ilia Mirkin wrote:
>> I haven't the faintest idea about efficiency, but these things are flags
>> on the ld/st instructions in the NVIDIA ISA for SM20+ (and I just
>> plain don't know about SM10). I'm moderately sure that's the case for
>> GCN as well.
>>
>> The difficulty with TGSI is that you might have something like
>>
>>   layout (std430) buffer foo {
>>     coherent int a;
>>     int b;
>>   };
>>
>> Now I don't remember if they get baked into the same vec4, but I think
>> they do. If they don't, then ARB_enhanced_layouts will fix that right
>> up. Since TGSI is vec4-oriented, it's really awkward to specify that
>> sort of thing... how would you do it?
>>
>>   DECL BUFFER[0][0].x COHERENT
>>   DECL BUFFER[0][0].y
>>
>> And then, totally unrelated to the separate bits, you can end up with
>>
>>   layout (std430) buffer foo {
>>     int foo[5];
>>   };
>>
>> and I have no idea how to even express that in TGSI -- it'd want
>> things to be aligned to 16 bytes, but it'll be packed tightly here.
>> This worked OK for layout (std140), but won't work with more advanced
>> layouts. This will be a problem for UBOs too -- perhaps we need to
>> allow something like
>>
>>   LOAD dst, CONST[1][0], offset
>>
>> to account for that. And lastly, SSBOs allow for something like
>>
>>   layout (std430) buffer foo {
>>     int foo[];
>>   };
>>
>> And you can access foo[anything-you-want] -- difficult to declare that
>> in TGSI. I could invent stuff for all of these situations, but it
>> seems a lot easier to just feed the data to LOAD and forget
>> about it. That's how it's all encoded in the GLSL IR as well.
>>
>> -ilia
>>
>>
>> On Mon, Nov 2, 2015 at 1:56 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>>> I don't know much about SSBOs, but since it looks like in GLSL the
>>> coherent etc. bits are on the variables, not the ops, it seems unnatural
>>> to put the bits on the ops. So I'd guess it would be better if the
>>> variables could be marked instead. If this isn't expressible in TGSI,
>>> maybe this needs to be fixed. That said, it sounds odd to me from a
>>> hw perspective if variables with different bits can be stuffed
>>> together and the hw is then expected to handle that efficiently...
>>>
>>> Roland
>>>
>>> On 01.11.2015 at 23:45, Ilia Mirkin wrote:
>>>> Just wanted to note down some thoughts and get some feedback before
>>>> going forward.
>>>> I've already sent out a series which covered a lot of
>>>> this, but in the end I realized it came up a bit short (available at
>>>> https://github.com/imirkin/mesa/commits/fd2).
>>>>
>>>> There are two separate buffer-related features --
>>>> ARB_shader_atomic_counters(_ops) and
>>>> ARB_shader_storage_buffer_object. The former are implementable more
>>>> efficiently on EG/NI hardware by performing the atomic ops on
>>>> not-main-memory (GDS? LDS?). However, I think that the gallium-side
>>>> interface can be mostly identical for both cases; perhaps we can mark
>>>> the buffer as atomic-only in the TGSI.
>>>>
>>>> Just like there is a CONST TGSI file, I want to add a BUFFER file,
>>>> which will map to ->set_shader_buffers() indices. The tricky bit comes
>>>> from the fact that individual variables inside of a buffer may have
>>>> different access/store properties. I see two ways to resolve this:
>>>>
>>>> 1. Declare each variable explicitly, much like UBOs still get
>>>>    individual decls per slot. These decls could contain the relevant
>>>>    caching property.
>>>>
>>>> 2. Make each LOAD/STORE op declare what caching it wants explicitly.
>>>>
>>>> The first option would work well for images, but for SSBOs it feels
>>>> problematic: with all the various packing options that exist, you
>>>> could still specify odd per-variable cache rules, which would be
>>>> difficult to express in the TGSI DECL. However, I'm not sure how to
>>>> implement the second option.
>>>>
>>>> There is a precedent in the saturate flag, but looking at
>>>> tgsi_instruction, there are only 2 free bits. Since there are only 4
>>>> different caching values (none, coherent, volatile, restrict; I'm not
>>>> counting readonly/writeonly), this fits.
>>>> However, that would leave no
>>>> more bits in tgsi_instruction. I could add a texture-style bit, saying
>>>> to expect an additional tgsi_instruction_buffer packet with more info,
>>>> but that seems wasteful.
>>>>
>>>> Another option is to just pass an immediate directly to the LOAD/STORE
>>>> ops, which would specify this caching spec as an extra source. This
>>>> seems much simpler, but a little dirtier. Opinions much appreciated.
>>>>
>>>> I think that once this is worked out, I'll be able to resend my series
>>>> adding SSBO/atomic support to freedreno, and partial SSBO (without
>>>> atomic*) support for nvc0.
>>>>
>>>> Cheers,
>>>>
>>>> -ilia
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
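The "only 2 free bits" arithmetic from the original proposal can be sketched in C. This assumes a 32-bit instruction token whose fields are 4 (type) + 8 (token count) + 8 (opcode) + 1 (saturate) + 2 (dst count) + 4 (src count) + 1 (predicate) + 1 (label) + 1 (texture) = 30 bits, which leaves 2 bits of padding. Those widths are an assumption loosely modeled on tgsi_instruction. The macro, enum and function names are hypothetical. Explicit shifts and masks are used instead of C bitfields so the bit positions are unambiguous.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths of a 32-bit instruction token (hypothetical). */
enum {
   W_TYPE = 4, W_NRTOKENS = 8, W_OPCODE = 8, W_SATURATE = 1,
   W_NUMDST = 2, W_NUMSRC = 4, W_PREDICATE = 1, W_LABEL = 1,
   W_TEXTURE = 1,
};

/* Bits already in use; everything above them is free padding. */
#define TOK_USED_BITS (W_TYPE + W_NRTOKENS + W_OPCODE + W_SATURATE + \
                       W_NUMDST + W_NUMSRC + W_PREDICATE + W_LABEL + \
                       W_TEXTURE)

_Static_assert(TOK_USED_BITS == 30, "exactly two bits left over");

/* The four caching values from the thread; 0..3 needs two bits. */
enum cache_mode { CACHE_NONE, CACHE_COHERENT, CACHE_VOLATILE, CACHE_RESTRICT };

#define TOK_CACHE_SHIFT TOK_USED_BITS
#define TOK_CACHE_MASK  (UINT32_C(0x3) << TOK_CACHE_SHIFT)

/* Store a cache mode in the two spare bits, leaving other fields intact. */
static uint32_t tok_set_cache(uint32_t tok, enum cache_mode m)
{
   assert((unsigned)m <= 3u);
   return (tok & ~TOK_CACHE_MASK) | ((uint32_t)m << TOK_CACHE_SHIFT);
}

/* Read the cache mode back out of the spare bits. */
static enum cache_mode tok_get_cache(uint32_t tok)
{
   return (enum cache_mode)((tok & TOK_CACHE_MASK) >> TOK_CACHE_SHIFT);
}
```

The static assertion shows why Roland's concern matters. Once the two spare bits hold the cache mode, adding read/write bits means one of two things. Bits must be reclaimed elsewhere (the thread notes the token type and NumSrcs fields are wider than needed), or an extra token must be spent, as the texture bit already does.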
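The std140 vs std430 point about `int foo[5]` can be made concrete with a little offset arithmetic. Under the GLSL layout rules, an array of `int` has a 16-byte stride in std140, because array elements are rounded up to vec4 alignment. In std430 the stride is 4 bytes. The sketch below uses hypothetical helper names and maps `foo[i]` to a vec4 slot and component, the way TGSI's vec4-oriented addressing would have to see it. With std430, four elements share each slot. A dynamically indexed `foo[n]` would therefore need a runtime-selected component, which a fixed TGSI swizzle does not express directly. A byte offset passed to LOAD, as floated in the thread, avoids the problem.

```c
#include <assert.h>
#include <stddef.h>

enum block_layout { LAYOUT_STD140, LAYOUT_STD430 };

/* Array stride, in bytes, of a GLSL `int foo[N]` member. std140 rounds the
 * element stride up to 16 bytes (vec4); std430 packs 32-bit ints tightly. */
static size_t int_array_stride(enum block_layout l)
{
   assert(l == LAYOUT_STD140 || l == LAYOUT_STD430);
   return l == LAYOUT_STD140 ? 16 : 4;
}

/* A vec4 "slot" (16 bytes) and a component index 0..3 (x, y, z, w). */
struct vec4_addr {
   size_t slot;
   unsigned comp;
};

/* Locate foo[i], assuming foo starts at byte 0 of the block. */
static struct vec4_addr int_elem_addr(enum block_layout l, size_t i)
{
   size_t byte = i * int_array_stride(l);
   struct vec4_addr a = { byte / 16, (unsigned)((byte % 16) / 4) };
   return a;
}
```

For `int foo[5]`, std140 occupies five slots and uses only .x in each (80 bytes). std430 puts foo[0..3] in slot 0 as .xyzw and foo[4] in slot 1 as .x (20 bytes). That is the tight packing described above.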