On 29.08.2014 22:44, Ilia Mirkin wrote:
> Hello,
>
> I've been thinking a bit about how to properly implement TCS outputs
> in TGSI. As a quick reminder, there are per-vertex (i.e. invocation)
> and per-patch outputs in TCS. And while you can only write to the
> current invocation's per-vertex outputs, you can read from any of
> them. (With barrier() used to synchronize invocations.)
>
> Per-patch outputs map quite nicely onto the existing infrastructure,
> so the rest of the questions will be about per-vertex outputs.
>
> One can represent per-vertex outputs as 2D output arrays. That means
> support for them needs to be added all over (which I've actually done,
> so I'm not complaining about the extra work but rather asking if it's
> a good idea). And then you might have
>
> DCL OUT[][0], GENERIC
> MOV ADDR[1].x, SV[0] /* invocation id */
> MOV OUT[ADDR[1].x][0], TEMP[0] /* store value */
> BARRIER
> MOV TEMP[0], OUT[3][0] /* read output from invocation == 3 */
>
> The advantage here is that it's all nice and consistent. However the
> disadvantage is that we have to add a totally useless read of the
> invocation id and use it as a relative index for the store. At least
> the nvidia shaders don't even have a way of writing other invocations'
> data even if they wanted to (without resorting to global memory
> accesses). So it's complicating all sorts of logic for apparently no
> real benefit.
>
> Another approach might be to bypass the invocation id on storing the
> output, but using it on reads. For example code like
>
> DCL OUT[0], GENERIC
> MOV OUT[0], TEMP[0]
> BARRIER
> MOV TEMP[0], OUT[3][0]
>
> This avoids having to teach tgsi about 2d outputs (esp reladdr ones).
> This seems a lot simpler, but it ignores the gl_InvocationID indexing
> that happens when writing the output. However I don't think that's so
> bad. It also means that reads and writes are interpreted a little
> differently for OUT's, but that doesn't seem so bad either.
>
> Thoughts?
>
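To make the two options in the quoted mail concrete, here is a rough toy model in Python. It is only a sketch of the addressing semantics, not Mesa or TGSI code; the class `PatchOutputs`, the function `run_patch`, and the stand-in values are all made up for illustration. The barrier is modelled simply by finishing all stores before any read happens.

```python
# Toy model of TCS per-vertex outputs. Not Mesa/TGSI code; all names
# here are hypothetical and exist only for illustration.

class PatchOutputs:
    """Per-vertex outputs for one patch: storage[invocation][slot]."""

    def __init__(self, num_invocations, num_slots):
        self.storage = [[None] * num_slots for _ in range(num_invocations)]

    def write_2d(self, vertex, slot, value):
        # Option 1: the store names the vertex explicitly,
        # like MOV OUT[ADDR[1].x][0], TEMP[0].
        self.storage[vertex][slot] = value

    def write_own(self, invocation_id, slot, value):
        # Option 2: the store is 1D, like MOV OUT[0], TEMP[0];
        # the invocation id is implied rather than read by the shader.
        self.storage[invocation_id][slot] = value

    def read_2d(self, vertex, slot):
        # Both options read with a 2D index, like MOV TEMP[0], OUT[3][0].
        return self.storage[vertex][slot]


def run_patch(num_invocations, use_2d_stores):
    outs = PatchOutputs(num_invocations, num_slots=1)

    # Before BARRIER: every invocation stores to its own vertex.
    for inv in range(num_invocations):
        value = inv * 10  # stand-in for the contents of TEMP[0]
        if use_2d_stores:
            outs.write_2d(inv, 0, value)
        else:
            outs.write_own(inv, 0, value)

    # After BARRIER: every invocation reads slot 0 of vertex 3.
    return [outs.read_2d(3, 0) for _ in range(num_invocations)]
```

In this model both variants leave the same data in storage, so the difference is only in how the store is spelled, which is the point the quoted mail raises.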
I think in the second case, though, it should be required to declare the
inputs separately. It sounds to me like the access works differently in
any case, at least on nv50 (even if the actual data accessed is the
same). I have no idea how other hw handles this, but the hull shader in
d3d11 also uses 2d-addressed inputs and 1d-addressed outputs -
http://msdn.microsoft.com/en-us/library/windows/desktop/hh447211%28v=vs.85%29.aspx
(though I don't know what that looks like at the ddi level).

GL probably used 2d outputs because it indeed looks more consistent (or
perhaps some extension could lift the restriction that only the current
invocation's outputs can be written, though I'm not sure that would ever
make sense). So if it doesn't actually make sense to try writing to
other invocations' outputs, I think option 2) makes more sense.

In that case, though, the outputs should probably be strictly
write-only. I'd guess it would get messy otherwise if you try to read
some other invocation's data vs. reading back the current one.

But I don't really have much of an idea about tessellation.

Roland