On Mon, Dec 28, 2015 at 4:23 PM, Connor Abbott <cwabbo...@gmail.com> wrote: > On Mon, Dec 28, 2015 at 3:25 PM, Rob Clark <robdcl...@gmail.com> wrote: >> On Mon, Dec 28, 2015 at 2:05 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: >>> >>> >>> On Mon, Dec 28, 2015 at 10:33 AM, Rob Clark <robdcl...@gmail.com> wrote: >>>> >>>> On Mon, Dec 28, 2015 at 1:20 PM, Jason Ekstrand <ja...@jlekstrand.net> >>>> wrote: >>>> > >>>> > >>>> > On Mon, Dec 28, 2015 at 9:37 AM, Connor Abbott <cwabbo...@gmail.com> >>>> > wrote: >>>> >> >>>> >> On Mon, Dec 28, 2015 at 10:13 AM, Rob Clark <robdcl...@gmail.com> >>>> >> wrote: >>>> >> > On Tue, Dec 22, 2015 at 10:11 PM, Connor Abbott <cwabbo...@gmail.com> >>>> >> > wrote: >>>> >> >> On Tue, Dec 22, 2015 at 9:55 PM, Rob Clark <robdcl...@gmail.com> >>>> >> >> wrote: >>>> >> >>> On Tue, Dec 22, 2015 at 9:47 PM, Connor Abbott >>>> >> >>> <cwabbo...@gmail.com> >>>> >> >>> wrote: >>>> >> >>>> On Tue, Dec 22, 2015 at 9:02 PM, Rob Clark <robdcl...@gmail.com> >>>> >> >>>> wrote: >>>> >> >>>>> On Mon, Dec 21, 2015 at 1:48 PM, Jason Ekstrand >>>> >> >>>>> <ja...@jlekstrand.net> wrote: >>>> >> >>>>>> >>>> >> >>>>>> I think two different concepts of ownership are getting >>>> >> >>>>>> conflated >>>> >> >>>>>> here: >>>> >> >>>>>> Right/responsibility to delete and right to modify. >>>> >> >>>>>> >>>> >> >>>>>> The way I understand it, gallium, as it stands, gives neither to >>>> >> >>>>>> the driver. >>>> >> >>>>>> A back-end using NIR requires the right to modify but who >>>> >> >>>>>> deletes >>>> >> >>>>>> it doesn't >>>> >> >>>>>> ultimately matter. I think it's dangerous to pass one of these >>>> >> >>>>>> rights to >>>> >> >>>>>> the driver and not the other but we need to think about both. >>>> >> >>>>> >>>> >> >>>>> yeah, uneasy about driver modifying the IR if the state tracker >>>> >> >>>>> is >>>> >> >>>>> still going to potentially spin off variants of the IR.. that >>>> >> >>>>> sounds >>>> >> >>>>> like madness. >>>> >> >>>>> >>>> >> >>>>> The refcnt'ing I proposed does deal w/ right to modify vs delete >>>> >> >>>>> via >>>> >> >>>>> nir_shader(_is)_mutable() which returns something that is >>>> >> >>>>> guaranteed >>>> >> >>>>> to be safe to modify (ie. has only a single reference) >>>> >> >>>>> >>>> >> >>>>>> What I'm trying to say is that we have two options here: >>>> >> >>>>>> >>>> >> >>>>>> 1) gallium passes IR to the back-end that it is free to modify >>>> >> >>>>>> and >>>> >> >>>>>> is >>>> >> >>>>>> required to delete when it's done. >>>> >> >>>>> >>>> >> >>>>> with refcnt'ing, s/delete/unref/ >>>> >> >>>>> >>>> >> >>>>> The idea is, the st transfers ownership of the reference it >>>> >> >>>>> passes >>>> >> >>>>> to >>>> >> >>>>> the driver. If the st wants to hang on to a reference itself, it >>>> >> >>>>> must >>>> >> >>>>> increment the refcnt before passing to the driver (backend). >>>> >> >>>>> >>>> >> >>>>> Without refcnt'ing, I suppose we could (since we don't have to >>>> >> >>>>> follow >>>> >> >>>>> TGSI semantics), just decree that the driver always takes >>>> >> >>>>> ownership >>>> >> >>>>> of >>>> >> >>>>> the copy passed in, and if the st wants to hang on to a copy too, >>>> >> >>>>> then >>>> >> >>>>> it must clone. I suppose this would work well enough for >>>> >> >>>>> freedreno/vc4, which both end up generating variants later. It >>>> >> >>>>> does >>>> >> >>>>> force an extra clone for drivers that immediately translate into >>>> >> >>>>> their >>>> >> >>>>> own backend IR and don't need to keep the NIR around, for >>>> >> >>>>> example. >>>> >> >>>>> Maybe that is not worth caring about (since at this point it is >>>> >> >>>>> hypothetical). >>>> >> >>>> >>>> >> >>>> While always cloning does have this disadvantage, I don't think >>>> >> >>>> it's >>>> >> >>>> really relevant here. Even if the driver throws away the NIR >>>> >> >>>> immediately after consuming it, almost invariably it's going to >>>> >> >>>> want >>>> >> >>>> to modify it. The generic NIR passed in by the state tracker >>>> >> >>>> (other >>>> >> >>>> IR -> NIR + some optimizations) is almost never going to be the >>>> >> >>>> same >>>> >> >>>> as the NIR after going through driver-specific lowering passes, >>>> >> >>>> which >>>> >> >>>> means that drivers are never going to want a read-only version of >>>> >> >>>> the >>>> >> >>>> IR. In light of that, I think making the driver own the IR passed >>>> >> >>>> in >>>> >> >>>> seems like the most sensible thing. >>>> >> >>> >>>> >> >>> well, unless the driver is already doing it's own lowering in it's >>>> >> >>> own >>>> >> >>> native IR.. >>>> >> >> >>>> >> >> Well, if you're not doing any lowering in NIR, then you aren't >>>> >> >> really >>>> >> >> taking any advantage of it. I can't see a plausible scenario where >>>> >> >> all >>>> >> >> the lowering is done in the driver's own IR -- and as soon as you do >>>> >> >> anything in NIR, you need the driver-owns-IR semantics. >>>> >> > >>>> >> > When it comes to shader variants, I have a mix, with some things >>>> >> > lowered in nir and others just handled in backend.. >>>> >> > >>>> >> > The re-work / cleanup that I have had on a branch for a while now >>>> >> > (since it is currently blocked on refcnt'ing) does a first round of >>>> >> > variant-key independent NIR lowering/opt passes. And then at draw >>>> >> > time, if the variant key has anything that is lowered in nir, I do a >>>> >> > second round. >>>> > >>>> > >>>> > Just to be clear, your key-dependent lowering happens after all of your >>>> > other lowering? If this is the case, then I guarantee you that you're >>>> > unique in this since i965 and vc4 need to at least run out-of-SSA >>>> > afterwards. To be honest, I completely forgot that a driver could use >>>> > fully >>>> > ssa NIR. >>>> >>>> It is a mix.. I do texcoord saturate, clip-plane, and 2-sided color >>>> lowering in NIR. But flat-shading, binning-pass, and half vs full >>>> precision color output in ir3. >>>> >>>> I do as much lowering in NIR as I can, in an effort to do as much as >>>> possible at compile time, vs draw time. I do the first round of >>>> lowering/opt w/ null shader key, which is enough for the common cases. >>>> >>>> Pretty much independent, I suppose, of whether I came out of SSA or >>>> not first. Although binning-pass variant and the instruction >>>> scheduling I do are easier in SSA. >>>> >>>> Somewhat unrelated, but I may end up converting array access to >>>> registers, but leave everything else in SSA, so I can benefit from >>>> converting multi-dimensional offsets into a single offset.. this is >>>> still one open issue w/ gallium glsl_to_nir.. right now I have a >>>> hacked up version of nir_lower_io that converts global/local >>>> load/store_var's into load/store_var2 which take an offset as a src >>>> (like load_input/store_output) instead of deref chain.. not sure yet >>>> whether this will be the permanent solution, but at least it fixes a >>>> huge heap of variable-indexing piglits and lets me continue w/ >>>> implementing lowering passes for everything else that was previously >>>> done in glsl->tgsi or tgsi->tgsi passes. >>> >>> >>> If you do this, you'll be back to always needing a mutable copy. Most >>> lowering and optimization passes die the moment they see a register. You'll >>> either have to go fix a bunch of stuff up to no-op properly or run >>> vars_to_regs after doing your NIR lowering but before going into your >>> backend IR. This means that your "gold copy" still has variables and you >>> always need to lower them to registers before you go into the backend. >> >> ugg.. but good point, thanks for pointing that out before I wasted >> another afternoon on yet another dead-end for handling deref's.. >> >> Ok, I guess I need to think of a better name than load/store_var2 for >> the new intrinsics ;-) > > I don't think that "you should throw away registers and use your own > thing" is what Jason wanted you to get out of that.
perhaps.. I was considering switching to registers for arrays. Although it would end up forcing an extra clone in the common case where there would otherwise not be one... a bit of a tough pill to swallow.. > Most of the > existing optimization passes barf on registers for a reason: registers > imply that you've gone from "consumer-agnostic NIR," i.e. what's > produced by gtn and operated on by generic optimizations, to your own > driver-specific thing, and any optimizations you're going to run are > only to clean up the result of the lowering passes, so you won't need > to run most of them. In the few cases where we do need an optimization > after lowering to registers, we've gone and fixed it up to no-op > things properly, but in general it's a lot easier and less confusing > to say "new optimization passes don't have to deal with registers" > than to make everyone go and add support for registers to their > passes. I'm not saying that adding a "here's my driver-specific > offset" thing to load/store_var would necessarily be a bad idea, but > don't just dismiss registers out-of-hand. Yeah, I'm not a big fan of making lowering/etc passes deal w/ registers unnecessarily. Seems like coming up w/ some way to lower load/store_var deref chains would be easier. BR, -R >> >> PS. I would really like to keep the nir_variable ptr in the intrinsic >> instruction, rather than matching things up w/ >> var->data.driver_location, which is rather clunky.. >> >> BR, >> -R >> >>>> >>>> >>>> BR, >>>> -R >>>> >>>> >> >>>> >> >>> >>>> >> >>> Maybe it is too much of a hypothetical.. I still think refcnt'ing >>>> >> >>> gives some nice flexibility to deal with various scenarios, and >>>> >> >>> having >>>> >> >>> to call nir_shader_unref() isn't so much of a burden. >>>> >> >> >>>> >> >> Still, I can't see how this flexibility is at all useful, and it >>>> >> >> seems >>>> >> >> like overkill since the driver will always want a mutable version of >>>> >> >> the IR anyways. >>>> >> > >>>> >> > Well, due to the structure I mentioned above, at draw time when I >>>> >> > need >>>> >> > to generate a variant with nothing lowered in NIR, I simply incr the >>>> >> > refcnt on the IR which has already gone through first round of NIR >>>> >> > passes, and pass that in to my back end compiler. At the end, once >>>> >> > the shader binary is generated, I can unconditionally unref the >>>> >> > nir_shader without having to care. >>>> >> > >>>> >> > Without refcnt'ing I'd either have to generate a pointless clone or >>>> >> > keep track that the nir_shader should not actually be free'd. Not >>>> >> > impossible, just a bit more ugly. >>>> >> >>>> >> Assuming you do all your variant management in your driver's IR, then >>>> >> you don't need to do anything. If you do some variant management in >>>> >> NIR, then in the function where you do the NIR-based lowering you can >>>> >> check if you need to do any lowering based on the shader key, clone >>>> >> first, and give the NIR->ir3 function the cloned IR and then free it. >>>> >> It might be a "bit more ugly," but it's really not that much different >>>> >> from the refcounting, and when the extra shader gets created/freed is >>>> >> made explicit. >>>> >> >>>> >> > >>>> >> > (The gallium glsl_to_nir stuff is also currently using refcnt'ing, >>>> >> > although at least for freedreno/ir3 it isn't strictly needed.. I >>>> >> > could >>>> >> > just unconditionally clone in the state tracker. That said, I'm >>>> >> > still >>>> >> > of the opinion that refcnt'ing could be useful to some other driver >>>> >> > someday) >>>> >> >>>> >> "It could be useful to some driver someday" isn't a good argument for >>>> >> adding stuff today. We've already had enough examples of things in NIR >>>> >> that we added because we thought it was useful, but turned out not to >>>> >> be. >>>> >> >>>> >> > >>>> >> > BR, >>>> >> > -R >>>> >> > >>>> >> >>> >>>> >> >>> BR, >>>> >> >>> -R >>>> >> >>> >>>> >> >>>>> >>>> >> >>>>> (I guess nouveau is the one driver, that if it ever consumed NIR, >>>> >> >>>>> would translate immediately into it's own backend IR?) >>>> >> >>>>> >>>> >> >>>>>> 2) gallium passes read-only IR to the back-end and it always >>>> >> >>>>>> makes >>>> >> >>>>>> a copy. >>>> >> >>>>> >>>> >> >>>>> This is how it is w/ TGSI, but I think with NIR we are free to >>>> >> >>>>> make >>>> >> >>>>> a >>>> >> >>>>> clean break. And we *definitely* want to avoid >>>> >> >>>>> the-driver-always-copies semantics.. >>>> >> >>>>> >>>> >> >>>>>> It sounds like, from what Marek is saying, that gallium is >>>> >> >>>>>> currently doing >>>> >> >>>>>> (2) and changing it to (1) would be painful. I think reference >>>> >> >>>>>> counting is >>>> >> >>>>>> more like an awkward option 1.5 than option 3. Reference >>>> >> >>>>>> counting >>>> >> >>>>>> would >>>> >> >>>>>> mean that gallium passes a reference to the driver which it is >>>> >> >>>>>> expected to >>>> >> >>>>>> unref but may keep a second reference if it wants to keep the >>>> >> >>>>>> driver from >>>> >> >>>>>> modifying it. Then the driver may or may not make a copy based >>>> >> >>>>>> on >>>> >> >>>>>> the >>>> >> >>>>>> number of references. Why don't we just make it explicit and >>>> >> >>>>>> add a >>>> >> >>>>>> read-only bit and call it a day. >>>> >> >>>>>> >>>> >> >>>>>> One of the reasons I don't like passing a reference is that it >>>> >> >>>>>> effectively >>>> >> >>>>>> puts allocation and freeing in different components of the >>>> >> >>>>>> driver. >>>> >> >>>>> >>>> >> >>>>> With refcnt'ing you should talk in terms of ref/unref rather than >>>> >> >>>>> allocate/free.. imho. Although maybe that is what you meant. >>>> >> >>>>> (In >>>> >> >>>>> which case, yes, that was my idea, that passing in to driver >>>> >> >>>>> transfers >>>> >> >>>>> ownership of the passed reference.) >>>> >> >>>>> >>>> >> >>>>>> This >>>> >> >>>>>> means that if and driver doesn't care at all about the shader >>>> >> >>>>>> that >>>> >> >>>>>> gets >>>> >> >>>>>> passed in, it still has to under it to avoid a memory leak. You >>>> >> >>>>>> can't have >>>> >> >>>>>> the driver take the reference because then, either it comes in >>>> >> >>>>>> with >>>> >> >>>>>> a >>>> >> >>>>>> recount of 0 and should have been deleted, or the "can I modify >>>> >> >>>>>> this" check >>>> >> >>>>>> becomes "recount <= 2" which makes no sense. >>>> >> >>>>> >>>> >> >>>>> hmm, no, if ownership of the reference is transferred to the >>>> >> >>>>> driver, >>>> >> >>>>> then it becomes "refcount == 1" (and refcount == 0 should be an >>>> >> >>>>> assert, because something has gone horribly wrong) >>>> >> >>>>> >>>> >> >>>>> BR, >>>> >> >>>>> -R >>>> > >>>> > >>> >>> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev