Iago Toral <ito...@igalia.com> writes: > On Wed, 2015-06-17 at 17:20 -0700, Jordan Justen wrote: >> I wanted to question whether this was required, based on this text >> from the extension spec: >> >> "The ability to write to buffer objects creates the potential for >> multiple independent shader invocations to read and write the same >> underlying memory. The same issue exists with the >> ARB_shader_image_load_store extension provided in OpenGL 4.2, which >> can write to texture objects and buffers. In both cases, the >> specification makes few guarantees related to the relative order of >> memory reads and writes performed by the shader invocations." >> >> But I'm not sure if we can reconcile CSE with 'memoryBarrier' and >> 'barrier'. curro, any thoughts from image load/store? > > I think the problem is within the same thread, that text above talks > about multiple invocations reading from and writing to the same > location, but within the same invocation, the order of reads and writes > must be preserved: > > "Buffer variable memory reads and writes within a single shader > invocation are processed in order. However, the order of reads and > writes performed in one invocation relative to those performed by > another invocation is largely undefined." > > For example, if X is a shader storage buffer variable and we have code > like this with just one invocation: > > ssbo_store(X, 1); > a = ssbo_load(X) + 1 // a = 2 > ssbo_store(X, 2); > b = ssbo_load(X) + 1; // b = 3 > > CSE could mess it up like this: > > ssbo_store(X, 1); > tmp = ssbo_load(X) + 1 // tmp = 2 > a = tmp; > ssbo_store(X, 2); > b = tmp; > > which would be incorrect. I think I wrote this patch after seeing > something like this happening. The CSE pass clearly states that it does > not support write variables after all. > > Also, notice the same would apply if there are multiple invocations but > the shader code used something like gl_VertexID or gl_FragCoord to make > each invocation read from/write to a different address within the SSBO > buffer (I imagine this is the usual way to operate with SSBOs). In these > cases, even if we have multiple invocations, keeping the relative order > of reads and writes within each one is necessary. >
AFAICT the reason why this (and many of the other changes in GLSL optimization passes) is needed is because SSBO loads have been implemented as ir_expression nodes instead of being lowered into intrinsics (as other side-effectful operations do like ARB_shader_image_load_store and ARB_shader_atomic_counters). This surely broke the assumption of a number of optimization passes that ir_expression nodes behave as pure functions. I guess the reason why you've done it this way is because UBO loads were already being represented as expressions, so I see why you may have wanted to use the same approach for SSBOs even though there is a fundamental difference between the two: UBO loads have no side effects and are constant for a given set of arguments and a given shader execution, SSBO loads and stores are not. SSBO stores couldn't be accommodated into the same framework so easily, and you decided to create a separate ir node for them, what seems inconsistent with loads. Intrinsics would probably have been a good fit for both loads and stores, and would have made all these optimization changes unnecessary... P.S.: Sorry for the late reply, I was on vacation when I was CC'ed. > Iago > >> -Jordan >> >> On 2015-06-03 00:01:13, Iago Toral Quiroga wrote: >> > SSBOs are read/write and this CSE pass only handles read-only variables. >> > --- >> > src/glsl/opt_cse.cpp | 33 ++++++++++++++++++++++++++++++++- >> > 1 file changed, 32 insertions(+), 1 deletion(-) >> > >> > diff --git a/src/glsl/opt_cse.cpp b/src/glsl/opt_cse.cpp >> > index 4b8e9a0..a05ab46 100644 >> > --- a/src/glsl/opt_cse.cpp >> > +++ b/src/glsl/opt_cse.cpp >> > @@ -245,6 +245,28 @@ contains_rvalue(ir_rvalue *haystack, ir_rvalue >> > *needle) >> > } >> > >> > static bool >> > +expression_contains_ssbo_load(ir_expression *expr) >> > +{ >> > + if (expr->operation == ir_binop_ssbo_load) >> > + return true; >> > + >> > + for (unsigned i = 0; i < expr->get_num_operands(); i++) { >> > + ir_rvalue *op = expr->operands[i]; >> > + if (op->ir_type == ir_type_expression && >> > + expression_contains_ssbo_load(op->as_expression())) { >> > + return true; >> > + } else if (op->ir_type == ir_type_swizzle) { >> > + ir_swizzle *swizzle = op->as_swizzle(); >> > + ir_expression *val = swizzle->val->as_expression(); >> > + if (val && expression_contains_ssbo_load(val)) >> > + return true; >> > + } >> > + } >> > + >> > + return false; >> > +} >> > + >> > +static bool >> > is_cse_candidate(ir_rvalue *ir) >> > { >> > /* Our temporary variable assignment generation isn't ready to handle >> > @@ -260,7 +282,16 @@ is_cse_candidate(ir_rvalue *ir) >> > * to variable-index array dereferences at some point. >> > */ >> > switch (ir->ir_type) { >> > - case ir_type_expression: >> > + case ir_type_expression: { >> > + /* Skip expressions involving SSBO loads, since these operate on >> > + * read-write variables, meaning that the same ssbo_load >> > expression >> > + * may return a different value if the underlying buffer storage >> > + * is written in between. >> > + */ >> > + if (expression_contains_ssbo_load(ir->as_expression())) >> > + return false; >> > + } >> > + break; >> > case ir_type_texture: >> > break; >> > default: >> > -- >> > 1.9.1 >> > >> > _______________________________________________ >> > mesa-dev mailing list >> > mesa-dev@lists.freedesktop.org >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev