On Mon, Sep 1, 2014 at 12:47 PM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 01.09.2014 18:19, Ilia Mirkin wrote:
>> On Mon, Sep 1, 2014 at 12:00 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>>> On 29.08.2014 22:44, Ilia Mirkin wrote:
>>>> Hello,
>>>>
>>>> I've been thinking a bit about how to properly implement TCS outputs
>>>> in TGSI. As a quick reminder, there are per-vertex (i.e. invocation)
>>>> and per-patch outputs in TCS. And while you can only write to the
>>>> current invocation's per-vertex outputs, you can read from any of
>>>> them. (With barrier() used to synchronize invocations.)
>>>>
>>>> Per-patch outputs map quite nicely onto the existing infrastructure,
>>>> so the rest of the questions will be about per-vertex outputs.
>>>>
>>>> One can represent per-vertex outputs as 2D output arrays. That means
>>>> support for them needs to be added all over (which I've actually done,
>>>> so I'm not complaining about the extra work but rather asking if it's
>>>> a good idea). And then you might have
>>>>
>>>>   DCL OUT[][0], GENERIC
>>>>   MOV ADDR[1].x, SV[0] /* invocation id */
>>>>   MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
>>>>   BARRIER
>>>>   MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */
>>>>
>>>> The advantage here is that it's all nice and consistent. However the
>>>> disadvantage is that we have to add a totally useless read of the
>>>> invocation id and use it as a relative index for the store. At least
>>>> the nvidia shaders don't even have a way of writing other invocations'
>>>> data even if they wanted to (without resorting to global memory
>>>> accesses). So it's complicating all sorts of logic for apparently no
>>>> real benefit.
>>>>
>>>> Another approach might be to bypass the invocation id on storing the
>>>> output, but using it on reads. For example code like
>>>>
>>>>   DCL OUT[0], GENERIC
>>>>   MOV OUT[0], TEMP[0]
>>>>   BARRIER
>>>>   MOV TEMP[0], OUT[3][0]
>>>>
>>>> This avoids having to teach tgsi about 2d outputs (esp reladdr ones).
>>>> This seems a lot simpler, but it ignores the gl_InvocationID indexing
>>>> that happens when writing the output. However I don't think that's so
>>>> bad. It also means that reads and writes are interpreted a little
>>>> differently for OUTs, but that doesn't seem so bad either.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> I think in the second case though it should be required to declare the
>>> inputs separately. It sounds to me like at least on nv50 the access
>>> works differently in any case (even if the actual data accessed is the
>>> same). Though I have no idea how other hw handles this, but in any case
>>
>> On nvc0 there are load and store instructions (nv50 is a little
>> different, but it also doesn't support tess). When storing, there's no
>> way to provide it the invocation offset. When loading, there is.
>>
>>> hull shader from d3d11 uses 2d addressed inputs but 1d addressed outputs
>>> too -
>>> https://urldefense.proofpoint.com/v1/url?u=http://msdn.microsoft.com/en-us/library/windows/desktop/hh447211%28v%3Dvs.85%29.aspx&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=F4msKE2WxRzA%2BwN%2B25muztFm5TSPwE8HKJfWfR2NgfY%3D%0A&m=nYcD1FcBz0UnqCOOj%2B2wurf%2F3rjQNi1sQmGxNT2xfPQ%3D%0A&s=f81f9c26e90f61f613539e68b7a0cfe070451d77be957c6dc28b2107b03fe497
>>> (though I don't know what that looks like at the ddi level). Probably GL
>>
>> Hmmm... well from a quick read of it, they've bypassed this problem by
>> creating substages with inputs consuming previous stages' outputs.
> Doesn't exactly look like this to me. They still have this both as input
> and output in multiple stages.
>
>>
>>> used 2d outputs because it indeed looks more consistent (or perhaps some
>>> extension could lift the restriction that only the current invocation be
>>> written, though I'm not sure if that would ever make sense).
>>> So I think if it doesn't actually make sense to try writing to other
>>> outputs, option 2) makes more sense. I think though in this case the
>>> outputs should probably be strictly write-only, I'd guess it would get
>>> messy otherwise if you try to read some other invocation's data vs.
>>> reading back the current one.
>>
>> If they were write-only, how would you read another invocation's
>> outputs? Or are you suggesting that some new input type be used which
>> maps onto the invocations' outputs?
>
> Yes that's what d3d11 seems to do (as far as I can tell they just have
> input control points and output control points). That's why you'd
> declare it both as inputs and outputs, even though it is sort of the
> same. Can't really tell though if this makes more sense as the gl model,
> but this looks cleaner to me than accessing the same var differently (1d
> output, 2d input).
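
To check that I'm reading the split model correctly, here is a toy sketch of it. It is plain Python, not Mesa or TGSI code, and every name in it (Patch, out_store, in_load, run_patch) is made up for illustration. The assumption is that outputs are write-only and implicitly indexed by the storing invocation, while a separate input-style view can read any invocation's slot once all stores are done.

```python
# Toy model only: none of these names exist in Mesa or TGSI.


class Patch:
    """Per-vertex TCS output storage for one patch."""

    def __init__(self, num_invocations):
        self._slots = [None] * num_invocations
        self.loads = 0  # number of reads made through the input view

    def out_store(self, invocation_id, value):
        # 1D "OUT" access: the slot is implied by the invocation doing the
        # store, so a shader has no way to name another invocation's slot.
        self._slots[invocation_id] = value

    def in_load(self, vertex):
        # 2D input-style access: any slot may be read after the barrier.
        self.loads += 1
        return self._slots[vertex]


def run_patch(num_invocations, read_from):
    patch = Patch(num_invocations)
    # Phase 1: each invocation stores to its own slot.
    for invocation_id in range(num_invocations):
        patch.out_store(invocation_id, invocation_id * 10)
    # BARRIER: modeled by finishing every store before any load starts.
    # Phase 2: every invocation reads the slot written by `read_from`.
    results = [patch.in_load(read_from) for _ in range(num_invocations)]
    return patch, results
```

Calling run_patch(4, read_from=3) corresponds to the `MOV TEMP[0], OUT[3][0]` read in the quoted example. The loads counter is only there to make each trip through the input view visible.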

One thing that occurred to me, and it's a problem with any approach that
hides any aspect of what's going on, is that you might have something
like

  out int foo[];
  ...
  foo[gl_InvocationID] = ...
  if (...)
    foo[gl_InvocationID] += 1;

Now, it would be nice if the += 1 step could be done without the
(presumably expensive) shader input load, instead reusing whatever TEMP
was used above. Not sure whether that's too important though.

  -ilia
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev