On Mon, Dec 21, 2015 at 11:45 AM, Marek Olšák <mar...@gmail.com> wrote: > On Mon, Dec 21, 2015 at 4:38 PM, Connor Abbott <cwabbo...@gmail.com> wrote: >> On Mon, Dec 21, 2015 at 6:39 AM, Marek Olšák <mar...@gmail.com> wrote: >>> On Mon, Dec 21, 2015 at 6:48 AM, Jason Ekstrand <ja...@jlekstrand.net> >>> wrote: >>>> >>>> On Dec 20, 2015 7:43 PM, "Rob Clark" <robdcl...@gmail.com> wrote: >>>>> >>>>> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott <cwabbo...@gmail.com> >>>>> wrote: >>>>> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdcl...@gmail.com> wrote: >>>>> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand <ja...@jlekstrand.net> >>>>> >> wrote: >>>>> >>> >>>>> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdcl...@gmail.com> wrote: >>>>> >>>> >>>>> >>>> From: Rob Clark <robcl...@freedesktop.org> >>>>> >>>> >>>>> >>>> Jason, >>>>> >>>> >>>>> >>>> How much do you hate this idea? Seems like an easy alternative to >>>>> >>>> using ralloc ctx's to clean up nir variants/clones, which would let >>>>> >>>> us drop the parent memctx for nir_shader_create()/clone(), making >>>>> >>>> it easier to introduce reference counting. >>>>> >>> >>>>> >>> I think "hate" is a but strong. I don't like it but it works. If we >>>>> >>> really >>>>> >>> want nir_shader refcounted, we'll have to do something. >>>>> >> >>>>> >> I suppose the alternate idea of moving the nir_shader_clone() out of >>>>> >> brw_compile_xyz(), and always passing in the clone would be a cleaner >>>>> >> way. It looks like each of the brw_compile_xyz() has exactly one >>>>> >> call-site, so doing the nir_shader_clone() inside doesn't really buy >>>>> >> anything. >>>> >>>> Your forgetting that there may be *cough* other users of this API... We can >>>> change those too but I would like the needs of the compiler users to drive >>>> the API, not the cloning. I still have some details to work out there. In >>>> any case, it doesn't really matter; we can figure something out. >>>> >>>>> >>> About refcounting... The more I think about it the more I'm not >>>>> >>> convinced >>>>> >>> it's useful. As it stands, we have no use for it an I'm not convinced >>>>> >>> you >>>>> >>> do either. We'll see if I can convince you. :-) >>>>> >>> >>>>> >>> In the history of i965 using NIR, we've had about three different ways >>>>> >>> of >>>>> >>> doing things: >>>>> >>> >>>>> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every >>>>> >>> shader/variant >>>>> >>> compile. This is what we did when we first stated using NIR because >>>>> >>> it was >>>>> >>> easy and didn't involve reworking any plumbing. >>>>> >>> >>>>> >>> 2) Lowered NIR is the gold copy; variants are done entirely in the >>>>> >>> back-end >>>>> >>> IR. This is what we did up until about a month ago. Because variants >>>>> >>> are >>>>> >>> done in the back-end, we can run gksl_to_nir and do all of our >>>>> >>> optimizing >>>>> >>> and lowering at link time. Going from NIR to the final shader binary >>>>> >>> is >>>>> >>> then a read-only operation as far as NIR is concerned. >>>>> >>> >>>>> >>> 3) Optimized but not lowered NIR is the gold copy; variants are >>>>> >>> sometimes >>>>> >>> done in NIR. This is the scheme we use now. We call glsl_to_nir and >>>>> >>> do >>>>> >>> some of the optimization and lowering at link time but leave it in SSA >>>>> >>> form. >>>>> >>> When we go to compile the final shader, we make a copy, apply >>>>> >>> variants, do >>>>> >>> the final lowering and then go into the back-end IR. >>>>> >>> >>>>> >>> In each of these cases, we know exactly where we need to make a copy >>>>> >>> without >>>>> >>> the help of reference counting. In the first, we get a fresh copy >>>>> >>> each time >>>>> >>> so we are free to destroy the copy. In the second, we never have to >>>>> >>> modify >>>>> >>> the NIR so no copy. In the third scheme, we always have to make a >>>>> >>> copy >>>>> >>> because, even if variants are a no-op, we still have to go out of SSA >>>>> >>> form >>>>> >>> and do final lowering. You could say that we could avoid making that >>>>> >>> copy. >>>>> >>> However, the work to determine when we don't need variants and can do >>>>> >>> all >>>>> >>> our lowering up-front is far more than the work saved by reference >>>>> >>> counting. >>>>> >>> >>>>> >>> How about gallium? Here's how I imagine it would work (please correct >>>>> >>> me of >>>>> >>> I'm wrong): >>>>> >>> >>>>> >>> 1) In the TGSI case, tgsi_to_nir gets called for each compile so you >>>>> >>> get a >>>>> >>> fresh mutable shader each time. In this case, you are free to do >>>>> >>> whatever >>>>> >>> you want with the shader without making a copy. >>>>> >>> >>>>> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic >>>>> >>> optimizations at >>>>> >>> link time and hold onto the NIR shader. (Hold a reference of you'd >>>>> >>> like.) >>>>> >>> When you go to compile it in the back-end, it needs to do it's own >>>>> >>> lowering >>>>> >>> so it takes a reference and ends up making a copy. >>>>> >>> >>>>> >>> If this description is anywhere close to correct, then I don't think >>>>> >>> you >>>>> >>> really need it either. Determining whether or not you need to copy is >>>>> >>> simply "if (comes_from_tgsi)”. Maybe there's something subtle about >>>>> >>> the >>>>> >>> gallium layer that I don't know that makes refcounting the best >>>>> >>> solution. >>>>> >>> Please enlighten me of there is. >>>>> >> >>>>> >> This issue is that we *potentially* have both the state tracker and >>>>> >> the driver both doing some of there own variant management. (Which >>>>> >> tbh, isn't awesome, it would have been nice if someone realized >>>>> >> earlier on that nearly every driver is going to have to do some sort >>>>> >> of variant mgmt and figured out a way just to push it all down to the >>>>> >> driver.. but I can't see a good way to get there from here.) >>>>> >> >>>>> >> With TGSI as the IR, driver just unconditionally does >>>>> >> tgsi_dup_tokens().. because of the whole thing where st does variants >>>>> >> in some cases, things are defined that driver doesn't own the copy of >>>>> >> the TGSI IR passed in after the fxn call to driver returns. >>>>> >> >>>>> >> With NIR I was hoping to fix this, esp. since nir_shader_clone() is >>>>> >> more heavyweight than tgsi_dup_tokens() (memcpy()). >>>>> >> >>>>> >> Refcnt'ing is a nice solution so that we can pass the driver a >>>>> >> reference that it owns. In cases where state tracker isn't doing >>>>> >> variant mgmt, we pass it the one-and-only ref (avoiding clone). >>>>> >> >>>>> >> I'd suggested that in cases where st does variant mgmt, that st should >>>>> >> do the clone/dup. But that was shot down: >>>>> >> >>>>> >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html >>>> >>>> It sounds like Marek's argument there is more about lifetime management >>>> than >>>> anything. He wants gallium modules to be able to create IR, call into the >>>> driver, and then throw it away. In particular, he doesn't want them to >>>> have >>>> to think about cloning. In a lot of ways it sounds a lot like what i965 >>>> wants too. I really like having brw_compile_foo take a const nir_shader. >>>> The difference is that i965 basically always wants to clone whereas a >>>> gallium driver may not have to if gallium doesn't care what happens to the >>>> shader when it's done. How common is this case? How important is it to >>>> optimize for? I don't know. >>>> >>>> One other thing that bothers me a bit: From Marek's comment, it sounds >>>> like >>>> the components want to just pass in IR and be agnostic about whether the >>>> driver wants its own copy or wants to change it or whatever. This seems >>>> like an argument for always cloning to me. From the perspective of a >>>> gallium module, "I want to hang in to this, I'll keep a reference" seems >>>> exactly the same as "I want to hang onto this, I'll give the driver a >>>> copy". >>>> How are they actually different given that the driver basically has to >>>> modify what you give it in order to do lowering? >>>> >>>>> > Ugh... I didn't read this at the time, but I don't like Marek's >>>>> > response. My understanding of the situation, based on this thread, is >>>>> > that there are some cases where the st knows that there's only going >>>>> > to be one variant and can throw away the (NIR or TGSI) shader after it >>>>> > hands it to the driver, while at other times it has to hold onto all >>>>> > the variants and only give the driver a read-only copy (or duplicate >>>> >>>> As per above, my interpretation of Marek's comment is that he doesn't want >>>> the st to have to think about cloning ever. He wants it to assume that >>>> compilation never modifies the IR so the driver should always clone. You >>>> have to keep in mind that Marek is most likely thinking about caching the >>>> TGSI rather than doing in-place lowering in it. >>>> >>>> If I'm understanding Marek correctly, then it sounds like shader >>>> compilation >>>> should never touch the IR that's passed in. If this is the case, it sounds >>>> like always cloning is the way to go. At least its not *that* expensive. >>> >>> Note that st/mesa needs to keep the FS IR because of glDrawPixels and >>> glBitmap, and the VS IR because of edge flags, glRasterPos evaluation, >>> selection and feedback modes. The last three are done with Draw/LLVM >>> and only support TGSI. >>> >>> Therefore, st/mesa always hangs onto the IR and drivers can't modify >>> it. It also needs VS in TGSI to be able to do everything correctly. >>> >>> What other Gallium modules want or not want is not that important, but >>> changing the current semantics will require fixing a lot of places. >>> (state trackers - mesa, nine, xa; modules - blitter, draw, hud, >>> postprocess, tests, vl) >>> >>> You really better think about whether changing all those and the risk >>> of breaking them is worth it. >>> >>> Marek >> >> Well, we're talking about passing NIR here, not TGSI, so none of those >> places will need to be updated. NIR is a much more heavyweight IR, and >> copying is much more expensive, so it makes more sense there for the >> st to duplicate it and let the driver own the IR it passes in, to >> reduce copying when there's no variant management necessary. > > My main concern was about TGSI, not so much about NIR.
Yes, exactly -- we don't plan on changing the semantics of passing in TGSI, so this only matters when the user passes in a NIR shader. > > If gallium drivers want to take ownership of the IR in > create_xx_state, so be it, but they must be allowed to release it > immediately if they don't want to keep it. > > Marek _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev