Paul Berry <stereotype...@gmail.com> writes: > On 27 October 2013 12:58, Marek Olšák <mar...@gmail.com> wrote: > >> On Sun, Oct 27, 2013 at 4:19 PM, Paul Berry <stereotype...@gmail.com> >> wrote: >> > On 20 October 2013 09:20, Marek Olšák <mar...@gmail.com> wrote: >> >> >> >> From: Marek Olšák <marek.ol...@amd.com> >> >> >> >> This avoids a defect in lower_output_reads. >> >> >> >> The problem is lower_output_reads treats the gl_FragData array as a >> single >> >> variable. It first redirects all output writes to a temporary variable >> >> (array) >> >> and then writes the whole temporary variable to the output, generating >> >> assignments to all elements of gl_FragData. >> > >> > >> > Can you go into more detail about why this is a problem? At first >> glance it >> > seems like it should be fine, because failing to assign to an element of >> > gl_FragData is supposed to result in undefined data going to that >> fragment >> > output. So lowering to a shader that does assign to that element of >> > gl_FragData seems like it should be harmless. What am I missing here? >> >> Thanks for the review. The problem is drivers cannot eliminate useless >> writes to gl_FragData, and each enabled gl_FragData output decreases >> performance. The GLSL compiler cannot eliminate the writes either, >> because gl_FragData is an array. >> >> Maybe the GLSL compiler should resize arrays based on which elements >> are written, so that unused elements are not declared, but this is not >> enough for gl_FragData, where the i-th output can be written, but >> (i-1)-th output doesn't have to be. >> > > Ah, ok. When I saw the word "defect", I misunderstood you to be fixing a > correctness-of-rendering bug. As a performance optimization, I get it now. > > For driver back-ends that don't need lower_output_reads (such as i915 and > i965), this optimization isn't needed. Would you mind adding a flag to > ShaderCompilerOptions so that tgsi-based drivers can opt in to this new > optimization? I want to make sure that the code generation of i965 and > i915 isn't affected.
Actually, if Marek has identified that applications not setting all their gl_FragData[] is a performance issue, I want to see this applied to i965, too. This optimization would remove entire FB_WRITE opcodes for us, which could be pretty significant if applications hit this case.
pgpEyWX41KscK.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev