On 07/23/2015 05:39 PM, Jason Ekstrand wrote: > On Thu, Jul 23, 2015 at 1:01 AM, Eduardo Lima Mitev <el...@igalia.com> wrote: >> On 07/23/2015 05:20 AM, Jason Ekstrand wrote: >>> On Wed, Jul 22, 2015 at 4:37 AM, Eduardo Lima Mitev <el...@igalia.com> >>> wrote: >>>> On 07/13/2015 01:57 PM, Jason Ekstrand wrote: >>>>> On Wed, Jul 8, 2015 at 11:54 PM, Eduardo Lima Mitev <el...@igalia.com> >>>>> wrote: >>>>>> On 06/30/2015 06:51 PM, Jason Ekstrand wrote: >>>>>>> On Fri, Jun 26, 2015 at 1:06 AM, Eduardo Lima Mitev <el...@igalia.com> >>>>>>> wrote: >>>>>>>> The index into the output_reg array where to store the destination >>>>>>>> register is >>>>>>>> fetched from the nir_outputs map built during nir_setup_outputs stage. >>>>>>>> >>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89580 >>>>>>>> --- >>>>>>>> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 17 +++++++++++++++-- >>>>>>>> 1 file changed, 15 insertions(+), 2 deletions(-) >>>>>>>> >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>>>>>> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>>>>>> index 8a2d335..55d4490 100644 >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp >>>>>>>> @@ -520,10 +520,23 @@ >>>>>>>> vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) >>>>>>>> } >>>>>>>> >>>>>>>> case nir_intrinsic_store_output_indirect: >>>>>>>> + has_indirect = true; >>>>>>>> /* fallthrough */ >>>>>>>> - case nir_intrinsic_store_output: >>>>>>>> - /* @TODO: Not yet implemented */ >>>>>>>> + case nir_intrinsic_store_output: { >>>>>>>> + int offset = instr->const_index[0]; >>>>>>>> + int output = nir_outputs[offset]; >>>>>>>> + >>>>>>>> + src = get_nir_src(instr->src[0], nir_output_types[offset]); >>>>>>>> + dest = dst_reg(src); >>>>>>>> + >>>>>>>> + dest.writemask = brw_writemask_for_size(instr->num_components); >>>>>>>> + >>>>>>>> + if (has_indirect) >>>>>>>> + dest.reladdr = new(mem_ctx) >>>>>>>> src_reg(get_nir_src(instr->src[1])); >>>>>>>> + >>>>>>>> + output_reg[output] = dest; >>>>>>> >>>>>>> I'm very confused about the amount of indirection going on here. It >>>>>>> seems to me that we should be setting these outputs up in >>>>>>> setup_outputs() rather than storring off a map from ints to other ints >>>>>>> and setting it up here. I didn't make this comment on the patch for >>>>>>> setup_outputs() because I wanted to wait to see it used before I >>>>>>> commented on it. >>>>>>> >>>>>>> I'm guessing you did it this way because the nir_assign_var_locations >>>>>>> is giving you bogus values. If so, then it might be better to just >>>>>>> assign variable locations in setup_outputs() rather than having a >>>>>>> remap table. The whole point of nir_lower_io is to make IO easy for >>>>>>> the back-end. If you need a re-map table, then it's no longer making >>>>>>> it easy and we need to think more about what's going on. >>>>>>> --Jason >>>>>>> >>>>>> >>>>>> That double indirection felt bad since the beginning, but it was needed >>>>>> to store the original variable's location (var->data.location). Let me >>>>>> explain: >>>>>> >>>>>> We are (re)using the plumbering in vec4_visitor to setup URB, so the >>>>>> only thing we need to do is to store the out register in "output_reg" >>>>>> map at the correct location. And that location is given by the original >>>>>> location in the shader (var->data.location). >>>>>> >>>>>> So, in this case, "nir_assign_var_locations" pass, which constructs >>>>>> var->data.driver_location, is not useful to us, except to give us >>>>>> consecutive indexes to construct the other map we have, the type map, >>>>>> which is needed to carry the correct type from the original variable to >>>>>> the output register. >>>>> >>>>> If nir_assign_var_locations isn't doing anything for you, don't call >>>>> it. You'll need to do something with var->data.driver_location. If >>>>> what you really want is var->data.location, then just copy that to >>>>> var->data.driver_location when you do nir_setup_outputs. Or >>>>> (depending on how the URB setup works, I don't actually know), put the >>>>> actual URB location in var->data.driver_location when you walk the >>>>> outputs. >>>>> >>>>> From there, you have two options. One would be to setup output_reg at >>>>> the same time with the correct types right away and emit a MOV when >>>>> you get a store_output. (Copy propagation should clean up the MOV.) >>>>> For what it's worth, I don't think the type matters; a URB write just >>>>> writes data to something so as long as you don't have a type mismatch >>>>> in a MOV, the hardware won't care. >>>>> >>>>> The other option, would be to directly emit the URB write in >>>>> store_output. At the moment, it may be better to take the first >>>>> option since that better matches what the FS does right now. But both >>>>> should work fine. >>>>> >>>> >>>> Thanks for these hints, they were very useful. >>>> >>>> I rewrote the implementation of store_output intrinsic to avoid the >>>> setup phase completely. The type, as you suggested, was not important as >>>> long as they match while MOVing the contents of output_reg. To guarantee >>>> that, I had to patch the emit_urb_slot() to guarantee the types always >>>> match. This code is shared with vec4_visitor, so it makes sense to move >>>> the safeguards there instead of having both backends provide the correct >>>> register type in output_reg entries. >>>> For reference, this is the patch that implements it: >>>> https://github.com/Igalia/mesa/commit/8c703937f285c0b3a1e7bf6681c7ed7fe09815aa >>> >>> Seems reasonable. >>> >>>> I also put var->data.location in const_index[1] of the intrinsic op, and >>>> disabled nir_assign_var_locations() for output variables, since I don't >>>> need var->data.driver_location. I could have used const_index[0], but I >>>> prefer to leave driver_location there, and use const_index[1], to avoid >>>> breaking any driver that rely on current layout of const_index (like >>>> FS-nir). I think it is a safer approach. >>> >>> You're not going to break anything by going through the output >>> variables and setting driver_location equal to location. The whole >>> point of driver_location is to store some backend-specific index for >>> the variable. In other words, to do exactly what you're doing. The >>> assign_var_locations calls are simply convenience functions for >>> setting the driver_location field. In other words, using >>> driver_location and const_index[0] is *exactly* what you should do. >>> >> >> Well, FS-nir relies on const_index[0] being data.driver_location. So at >> the very least I have to put a condition like: >> >> if (scalar) >> const_index[0] = var->data.driver_location >> else >> const_index[0] = var->data.location >> >> Otherwise we directly break our own FS-nir pass. >> >> My first implementation did that, but since this is common NIR code >> (theoretically) shared with other backends, putting var->data.location >> in const_index[0] for all non-scalar backends seemed like a bad idea. >> Specially considering that this is very dependent on the implementation >> of URB file in vec4_visitor, with the output_reg intermediate map and >> all. That's why I decided to play safe on pure-NIR side, having both >> driver_location and location available to backends. >> >> But if you think I can ignore this then I'm all for it too. > > What I meant was more like the following in brw_nir.c: > > if (is_scalar) { > assign_var_locations(&nir->outputs, &nir->num_outputs, true); > } else { > foreach_list_typed(nir_variable, var, node, &nir->outputs) > var->data.driver_location = var->data.location; > } > > and then just let nir_lower_io use the driver location all the time. > > Does that make more sense? >
It does, indeed. I will update the patch and send it again to the list as part of the v2 series that you started reviewing already. Thanks! >>> --Jason >>> >>>> All in all, the store_output implementation got much simpler. >>>> >>>>>> So, before knowing that I could modify nir_lower_io, my best shot at >>>>>> transferring the original variable location was to create this >>>>>> nir_outputs map. Now, what I have done is to put that value in >>>>>> const_index[1] of the intrinsic instruction, which was previously >>>>>> unused. What do you think? >>>>>> >>>>>> That removes the offset to offset map, but we still need the type map. >>>>>> >>>>>> About your comment on initializing the register during setup stage, I'm >>>>>> a bit confused: the register that we need to store is not available >>>>>> during setup stage, because we still don't have local registers >>>>>> allocated. >>>>> >>>>> What do you mean? Because you don't have the destination of the >>>>> output_write intrinsic allocated? Even if the register has a file of >>>>> BAD_FILE, you could still store the type there. Also, as I said >>>>> above, the hardware shouldn't care about the types of data. As long >>>>> as the URB write code doesn't accidentally do a float -> int >>>>> conversion or something, we should be fine. >>>>> --Jason >>>>> >>>>>>>> break; >>>>>>>> + } >>>>>>>> >>>>>>>> case nir_intrinsic_load_vertex_id: >>>>>>>> unreachable("should be lowered by lower_vertex_id()"); >>>>>>>> -- >>>>>> >>>>> >>>> >>> >> > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev