On Fri, Jul 10, 2015 at 9:53 AM, Francisco Jerez <curroje...@riseup.net> wrote: > Jason Ekstrand <ja...@jlekstrand.net> writes: > >> On Fri, Jul 10, 2015 at 5:25 AM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> Jason Ekstrand <ja...@jlekstrand.net> writes: >>> >>>> On Jul 9, 2015 7:57 AM, "Francisco Jerez" <curroje...@riseup.net> wrote: >>>>> >>>>> We were passing src0 alpha and oMask in reverse order. There seems to >>>>> be no good way to pass them in the correct order to the new-style >>>>> LOAD_PAYLOAD (how surprising) because src0 alpha is per-channel while >>>>> oMask is not. Just split src0 alpha in fixed-width registers and pass >>>>> them to LOAD_PAYLOAD as if they were part of the header as work-around >>>>> for now. >>>> >>>> Bah... I came across this when I did the LOAD_PAYLOAD rework but thought it >>>> was only theoretical. I wasn't very familiar with what omask actually did >>>> and, since piglit didn't hit it, I wasn't sure if it was a real problem or >>>> not. I probably should have done more digging and written a piglit test at >>>> the time. My bad. >>>> >>>> One solution that I proposed at the time was to turn header_size into >>>> header_mask in the obvious way. We can still use 8 bits because we should >>>> never have a header source higher than 8. >>>> >>> >>> So your idea is to have one bit per source indicating whether it's >>> header-like or per-channel? I don't think that extends to instructions >>> other than LOAD_PAYLOAD (e.g. FB_WRITE) where the same source is at the >>> same time header and payload. >> >> You're right, it doesn't. We really shouldn't be conflating them. We >> should have header_mask and header_present be different fields. Maybe >> use a union to save space, but they should have different semantic >> meaning and different names. We should probably also have a >> compr4_mask and get rid of the hackery there. >> >>> One bit per 32B register would extend >>> easily but it would be rather ugly to deal with if you want to keep your >>> code SIMD width-invariant. >>> >>> I think if you go with the per-source flag you'll want it to be in its >>> own subclass of fs_inst. With its own subclass you could even have an >>> array of per-source sizes determining the number of registers read for >>> each source, which would be rather nice for the visitor (no need to >>> split vectors into components while passing them to LOAD_PAYLOAD). >>> >>> Still I think the most elegant solution would be to simply get rid of >>> the header/payload distinction by using force_writemask_all and, if it >>> proves to be necessary, fix the optimizer to get rid of redundant >>> force_writemask_all flags where it doesn't do it already. >> >> I really don't think that's a good long-term or short-term solution. >> >> How badly are you blocking on this? I don't really have a lot of >> extra time to work on this at the moment but can carve some out if >> needed. > > I'm not blocking on this at all, feel free to fix it however you like, > or just go with this hack for the moment if you have higher priority > stuff to work on right now, I honestly don't care.
That's good to hear. I'll try and take a look at this in a couple of weeks. Thanks for bringing it up and writing the piglit test! --Jason >> --jason >> >>>> Thoughts? >>>> --Jason >>>> >>>>> I've written a piglit test that demonstrates the problem by using >>>>> gl_SampleMask from a fragment shader with multiple color outputs [1]. >>>>> >>>>> [1] http://lists.freedesktop.org/archives/piglit/2015-July/016499.html >>>>> --- >>>>> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 >>>> +++++++++++++++++--------- >>>>> 1 file changed, 17 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>>> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>>>> index 94d6a58..304ae74 100644 >>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp >>>>> @@ -1535,6 +1535,19 @@ fs_visitor::emit_single_fb_write(const fs_builder >>>> &bld, >>>>> length++; >>>>> } >>>>> >>>>> + if (src0_alpha.file != BAD_FILE && color0.file != BAD_FILE) { >>>>> + /* Neat, we need to chop the src0 alpha component and pass it as >>>> part of >>>>> + * the header even though it has per-channel semantics, because >>>> the next >>>>> + * optional field is header-like and LOAD_PAYLOAD requires all such >>>>> + * fields to form a contiguous segment at the beginning of the >>>> message. >>>>> + */ >>>>> + for (unsigned i = 0; i < exec_size / 8; i++) { >>>>> + setup_color_payload(&sources[length], src0_alpha, 1, 8, >>>>> + use_2nd_half || i == 1); >>>>> + length++; >>>>> + } >>>>> + } >>>>> + >>>>> prog_data->uses_omask = >>>>> prog->OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); >>>>> if (prog_data->uses_omask) { >>>>> @@ -1561,19 +1574,14 @@ fs_visitor::emit_single_fb_write(const fs_builder >>>> &bld, >>>>> offset(this->outputs[0], bld, 3), >>>>> 1, exec_size, false); >>>>> length += 4; >>>>> - } else if (color1.file == BAD_FILE) { >>>>> - if (src0_alpha.file != BAD_FILE) { >>>>> - setup_color_payload(&sources[length], src0_alpha, 1, exec_size, >>>> false); >>>>> - length++; >>>>> - } >>>>> - >>>>> - setup_color_payload(&sources[length], color0, components, >>>>> - exec_size, use_2nd_half); >>>>> - length += 4; >>>>> } else { >>>>> setup_color_payload(&sources[length], color0, components, >>>>> exec_size, use_2nd_half); >>>>> length += 4; >>>>> + >>>>> + } >>>>> + >>>>> + if (color1.file != BAD_FILE) { >>>>> setup_color_payload(&sources[length], color1, components, >>>>> exec_size, use_2nd_half); >>>>> length += 4; >>>>> -- >>>>> 2.4.3 >>>>> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev