I have been thinking about this more and I actually like the way OpenGL does it. The indexing with InvocationID can be lowered with a copy propagation pass for drivers that cannot do it - or they can just ignore the innermost index and assume it's always equal to InvocationID. I also prefer having readable shader outputs.
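To make the lowering idea concrete, here is a rough sketch in Python on a toy, string-based form of the instructions. This is not Mesa code: the function name lower_invocation_index, the string format, and the assumption that SV[0] holds the invocation id (as in the quoted example) are all made up for illustration. It only handles a plain copy of the invocation id into an address register, and it expects instructions without trailing comments.

```python
import re

# Matches a store destination of the form OUT[ADDR[n].x][m].
STORE_DST = re.compile(r"^OUT\[(ADDR\[\d+\]\.x)\]\[(\d+)\]$")


def lower_invocation_index(instructions, invocation_id="SV[0]"):
    """Rewrite OUT[ADDR[n].x][m] stores to OUT[m] when ADDR[n].x is a
    known copy of the invocation id. Reads are left untouched."""
    copies = set()  # address registers currently holding the invocation id
    lowered = []
    for inst in instructions:
        opcode, _, operands = inst.partition(" ")
        if not operands:  # instructions without operands, e.g. BARRIER
            lowered.append(inst)
            continue
        dst, _, srcs = operands.partition(", ")
        match = STORE_DST.match(dst)
        if match and match.group(1) in copies:
            # The outer index is provably the invocation id: drop it.
            inst = f"{opcode} OUT[{match.group(2)}], {srcs}"
        elif dst.startswith("ADDR"):
            if opcode == "MOV" and srcs == invocation_id:
                copies.add(dst)
            else:
                copies.discard(dst)  # overwritten with something else
        lowered.append(inst)
    return lowered


program = [
    "MOV ADDR[1].x, SV[0]",
    "MOV OUT[ADDR[1].x][0], TEMP[0]",
    "BARRIER",
    "MOV TEMP[0], OUT[3][0]",
]
print("\n".join(lower_invocation_index(program)))
```

The copy into ADDR[1].x is deliberately left in place; once no store uses it, an ordinary dead-code elimination pass could remove it.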
One slightly ugly thing right now is that patch outputs are one-dimensional and vertex outputs are two-dimensional. So you normally get:

OUT[][0], POSITION
OUT[1], PATCH
OUT[2], PATCH1
OUT[][3], GENERIC
OUT[4], TESSINNER
OUT[5], TESSOUTER

We can either leave it this way and assume that if an output access is two-dimensional, it's per-vertex, otherwise it's per-patch. Or we can add another file for per-vertex data.

The same applies to shader inputs, and I think we have had this since geometry shaders:

IN[][0], POSITION
IN[1], PRIMITIVEID

Not to mention that indirect addressing into outputs is a mess. For that, it would be better to have a strict mapping from outputs to semantics, e.g. OUT0[i] == PATCHi and OUT1[][j] == GENERICj.

Alternatively, we can just explicitly use semantic names in the shader code, e.g.:

MOV OUT.GENERIC[ADDR.x], TEMP[0]

But that would be a lot of work and I'd rather not delay upstreaming tessellation because of this.

Thoughts?

Marek

On Fri, Aug 29, 2014 at 10:44 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> Hello,
>
> I've been thinking a bit about how to properly implement TCS outputs
> in TGSI. As a quick reminder, there are per-vertex (i.e. invocation)
> and per-patch outputs in TCS. And while you can only write to the
> current invocation's per-vertex outputs, you can read from any of
> them. (With barrier() used to synchronize invocations.)
>
> Per-patch outputs map quite nicely onto the existing infrastructure,
> so the rest of the questions will be about per-vertex outputs.
>
> One can represent per-vertex outputs as 2D output arrays. That means
> support for them needs to be added all over (which I've actually done,
> so I'm not complaining about the extra work but rather asking if it's
> a good idea).
> And then you might have
>
> DCL OUT[][0], GENERIC
> MOV ADDR[1].x, SV[0] /* invocation id */
> MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
> BARRIER
> MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */
>
> The advantage here is that it's all nice and consistent. However the
> disadvantage is that we have to add a totally useless read of the
> invocation id and use it as a relative index for the store. At least
> the nvidia shaders don't even have a way of writing other invocations'
> data even if they wanted to (without resorting to global memory
> accesses). So it's complicating all sorts of logic for apparently no
> real benefit.
>
> Another approach might be to bypass the invocation id on storing the
> output, but using it on reads. For example code like
>
> DCL OUT[0], GENERIC
> MOV OUT[0], TEMP[0]
> BARRIER
> MOV TEMP[0], OUT[3][0]
>
> This avoids having to teach tgsi about 2d outputs (esp reladdr ones).
> This seems a lot simpler, but it ignores the gl_InvocationID indexing
> that happens when writing the output. However I don't think that's so
> bad. It also means that reads and writes are interpreted a little
> differently for OUT's, but that doesn't seem so bad either.
>
> Thoughts?
>
> -ilia
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev