Re: [Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Marek Olšák Mon, 21 Dec 2015 13:38:19 -0800

On Mon, Dec 21, 2015 at 7:48 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
> On Dec 21, 2015 9:09 AM, "Connor Abbott" <cwabbo...@gmail.com> wrote:
>>
>> On Mon, Dec 21, 2015 at 11:45 AM, Marek Olšák <mar...@gmail.com> wrote:
>> > On Mon, Dec 21, 2015 at 4:38 PM, Connor Abbott <cwabbo...@gmail.com>
>> > wrote:
>> >> On Mon, Dec 21, 2015 at 6:39 AM, Marek Olšák <mar...@gmail.com> wrote:
>> >>> On Mon, Dec 21, 2015 at 6:48 AM, Jason Ekstrand <ja...@jlekstrand.net>
>> >>> wrote:
>> >>>>
>> >>>> On Dec 20, 2015 7:43 PM, "Rob Clark" <robdcl...@gmail.com> wrote:
>> >>>>>
>> >>>>> On Sun, Dec 20, 2015 at 10:29 PM, Connor Abbott
>> >>>>> <cwabbo...@gmail.com>
>> >>>>> wrote:
>> >>>>> > On Sun, Dec 20, 2015 at 10:04 PM, Rob Clark <robdcl...@gmail.com>
>> >>>>> > wrote:
>> >>>>> >> On Sun, Dec 20, 2015 at 9:12 PM, Jason Ekstrand
>> >>>>> >> <ja...@jlekstrand.net>
>> >>>>> >> wrote:
>> >>>>> >>>
>> >>>>> >>> On Dec 19, 2015 5:55 PM, "Rob Clark" <robdcl...@gmail.com>
>> >>>>> >>> wrote:
>> >>>>> >>>>
>> >>>>> >>>> From: Rob Clark <robcl...@freedesktop.org>
>> >>>>> >>>>
>> >>>>> >>>> Jason,
>> >>>>> >>>>
>> >>>>> >>>> How much do you hate this idea?  Seems like an easy alternative
>> >>>>> >>>> to
>> >>>>> >>>> using ralloc ctx's to clean up nir variants/clones, which would
>> >>>>> >>>> let
>> >>>>> >>>> us drop the parent memctx for nir_shader_create()/clone(),
>> >>>>> >>>> making
>> >>>>> >>>> it easier to introduce reference counting.
>> >>>>> >>>
>> >>>>> >>> I think "hate" is a but strong. I don't like it but it works.
>> >>>>> >>> If we
>> >>>>> >>> really
>> >>>>> >>> want nir_shader refcounted, we'll have to do something.
>> >>>>> >>
>> >>>>> >> I suppose the alternate idea of moving the nir_shader_clone() out
>> >>>>> >> of
>> >>>>> >> brw_compile_xyz(), and always passing in the clone would be a
>> >>>>> >> cleaner
>> >>>>> >> way.  It looks like each of the brw_compile_xyz() has exactly one
>> >>>>> >> call-site, so doing the nir_shader_clone() inside doesn't really
>> >>>>> >> buy
>> >>>>> >> anything.
>> >>>>
>> >>>> Your forgetting that there may be *cough* other users of this API...
>> >>>> We can
>> >>>> change those too but I would like the needs of the compiler users to
>> >>>> drive
>> >>>> the API, not the cloning.  I still have some details to work out
>> >>>> there. In
>> >>>> any case, it doesn't really matter; we can figure something out.
>> >>>>
>> >>>>> >>> About refcounting... The more I think about it the more I'm not
>> >>>>> >>> convinced
>> >>>>> >>> it's useful.  As it stands, we have no use for it an I'm not
>> >>>>> >>> convinced
>> >>>>> >>> you
>> >>>>> >>> do either.  We'll see if I can convince you. :-)
>> >>>>> >>>
>> >>>>> >>> In the history of i965 using NIR, we've had about three
>> >>>>> >>> different ways
>> >>>>> >>> of
>> >>>>> >>> doing things:
>> >>>>> >>>
>> >>>>> >>> 1) GLSL is the gold copy and we run glsl_to_nir for every
>> >>>>> >>> shader/variant
>> >>>>> >>> compile.  This is what we did when we first stated using NIR
>> >>>>> >>> because
>> >>>>> >>> it was
>> >>>>> >>> easy and didn't involve reworking any plumbing.
>> >>>>> >>>
>> >>>>> >>> 2) Lowered NIR is the gold copy; variants are done entirely in
>> >>>>> >>> the
>> >>>>> >>> back-end
>> >>>>> >>> IR.  This is what we did up until about a month ago.  Because
>> >>>>> >>> variants
>> >>>>> >>> are
>> >>>>> >>> done in the back-end, we can run gksl_to_nir and do all of our
>> >>>>> >>> optimizing
>> >>>>> >>> and lowering at link time.  Going from NIR to the final shader
>> >>>>> >>> binary
>> >>>>> >>> is
>> >>>>> >>> then a read-only operation as far as NIR is concerned.
>> >>>>> >>>
>> >>>>> >>> 3) Optimized but not lowered NIR is the gold copy; variants are
>> >>>>> >>> sometimes
>> >>>>> >>> done in NIR.  This is the scheme we use now.  We call
>> >>>>> >>> glsl_to_nir and
>> >>>>> >>> do
>> >>>>> >>> some of the optimization and lowering at link time but leave it
>> >>>>> >>> in SSA
>> >>>>> >>> form.
>> >>>>> >>> When we go to compile the final shader, we make a copy, apply
>> >>>>> >>> variants, do
>> >>>>> >>> the final lowering and then go into the back-end IR.
>> >>>>> >>>
>> >>>>> >>> In each of these cases, we know exactly where we need to make a
>> >>>>> >>> copy
>> >>>>> >>> without
>> >>>>> >>> the help of reference counting.  In the first, we get a fresh
>> >>>>> >>> copy
>> >>>>> >>> each time
>> >>>>> >>> so we are free to destroy the copy.  In the second, we never
>> >>>>> >>> have to
>> >>>>> >>> modify
>> >>>>> >>> the NIR so no copy.  In the third scheme, we always have to make
>> >>>>> >>> a
>> >>>>> >>> copy
>> >>>>> >>> because, even if variants are a no-op, we still have to go out
>> >>>>> >>> of SSA
>> >>>>> >>> form
>> >>>>> >>> and do final lowering.  You could say that we could avoid making
>> >>>>> >>> that
>> >>>>> >>> copy.
>> >>>>> >>> However, the work to determine when we don't need variants and
>> >>>>> >>> can do
>> >>>>> >>> all
>> >>>>> >>> our lowering up-front is far more than the work saved by
>> >>>>> >>> reference
>> >>>>> >>> counting.
>> >>>>> >>>
>> >>>>> >>> How about gallium?  Here's how I imagine it would work (please
>> >>>>> >>> correct
>> >>>>> >>> me of
>> >>>>> >>> I'm wrong):
>> >>>>> >>>
>> >>>>> >>> 1) In the TGSI case, tgsi_to_nir gets called for each compile so
>> >>>>> >>> you
>> >>>>> >>> get a
>> >>>>> >>> fresh mutable shader each time.  In this case, you are free to
>> >>>>> >>> do
>> >>>>> >>> whatever
>> >>>>> >>> you want with the shader without making a copy.
>> >>>>> >>>
>> >>>>> >>> 2) In the GLSL case, you run glsl_to_nir and do some basic
>> >>>>> >>> optimizations at
>> >>>>> >>> link time and hold onto the NIR shader. (Hold a reference of
>> >>>>> >>> you'd
>> >>>>> >>> like.)
>> >>>>> >>> When you go to compile it in the back-end, it needs to do it's
>> >>>>> >>> own
>> >>>>> >>> lowering
>> >>>>> >>> so it takes a reference and ends up making a copy.
>> >>>>> >>>
>> >>>>> >>> If this description is anywhere close to correct, then I don't
>> >>>>> >>> think
>> >>>>> >>> you
>> >>>>> >>> really need it either.  Determining whether or not you need to
>> >>>>> >>> copy is
>> >>>>> >>> simply "if (comes_from_tgsi)”.  Maybe there's something subtle
>> >>>>> >>> about
>> >>>>> >>> the
>> >>>>> >>> gallium layer that I don't know that makes refcounting the best
>> >>>>> >>> solution.
>> >>>>> >>> Please enlighten me of there is.
>> >>>>> >>
>> >>>>> >> This issue is that we *potentially* have both the state tracker
>> >>>>> >> and
>> >>>>> >> the driver both doing some of there own variant management.
>> >>>>> >> (Which
>> >>>>> >> tbh, isn't awesome, it would have been nice if someone realized
>> >>>>> >> earlier on that nearly every driver is going to have to do some
>> >>>>> >> sort
>> >>>>> >> of variant mgmt and figured out a way just to push it all down to
>> >>>>> >> the
>> >>>>> >> driver.. but I can't see a good way to get there from here.)
>> >>>>> >>
>> >>>>> >> With TGSI as the IR, driver just unconditionally does
>> >>>>> >> tgsi_dup_tokens().. because of the whole thing where st does
>> >>>>> >> variants
>> >>>>> >> in some cases, things are defined that driver doesn't own the
>> >>>>> >> copy of
>> >>>>> >> the TGSI IR passed in after the fxn call to driver returns.
>> >>>>> >>
>> >>>>> >> With NIR I was hoping to fix this, esp. since nir_shader_clone()
>> >>>>> >> is
>> >>>>> >> more heavyweight than tgsi_dup_tokens() (memcpy()).
>> >>>>> >>
>> >>>>> >> Refcnt'ing is a nice solution so that we can pass the driver a
>> >>>>> >> reference that it owns.  In cases where state tracker isn't doing
>> >>>>> >> variant mgmt, we pass it the one-and-only ref (avoiding clone).
>> >>>>> >>
>> >>>>> >> I'd suggested that in cases where st does variant mgmt, that st
>> >>>>> >> should
>> >>>>> >> do the clone/dup.  But that was shot down:
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097748.html
>> >>>>
>> >>>> It sounds like Marek's argument there is more about lifetime
>> >>>> management than
>> >>>> anything.  He wants gallium modules to be able to create IR, call
>> >>>> into the
>> >>>> driver, and then throw it away.  In particular, he doesn't want them
>> >>>> to have
>> >>>> to think about cloning. In a lot of ways it sounds a lot like what
>> >>>> i965
>> >>>> wants too.  I really like having brw_compile_foo take a const
>> >>>> nir_shader.
>> >>>> The difference is that i965 basically always wants to clone whereas a
>> >>>> gallium driver may not have to if gallium doesn't care what happens
>> >>>> to the
>> >>>> shader when it's done.  How common is this case? How important is it
>> >>>> to
>> >>>> optimize for?  I don't know.
>> >>>>
>> >>>> One other thing that bothers me a bit:  From Marek's comment, it
>> >>>> sounds like
>> >>>> the components want to just pass in IR and be agnostic about whether
>> >>>> the
>> >>>> driver wants its own copy or wants to change it or whatever.  This
>> >>>> seems
>> >>>> like an argument for always cloning to me.  From the perspective of a
>> >>>> gallium module, "I want to hang in to this, I'll keep a reference"
>> >>>> seems
>> >>>> exactly the same as "I want to hang onto this, I'll give the driver a
>> >>>> copy".
>> >>>> How are they actually different given that the driver basically has
>> >>>> to
>> >>>> modify what you give it in order to do lowering?
>> >>>>
>> >>>>> > Ugh... I didn't read this at the time, but I don't like Marek's
>> >>>>> > response. My understanding of the situation, based on this thread,
>> >>>>> > is
>> >>>>> > that there are some cases where the st knows that there's only
>> >>>>> > going
>> >>>>> > to be one variant and can throw away the (NIR or TGSI) shader
>> >>>>> > after it
>> >>>>> > hands it to the driver, while at other times it has to hold onto
>> >>>>> > all
>> >>>>> > the variants and only give the driver a read-only copy (or
>> >>>>> > duplicate
>> >>>>
>> >>>> As per above, my interpretation of Marek's comment is that he doesn't
>> >>>> want
>> >>>> the st to have to think about cloning ever.  He wants it to assume
>> >>>> that
>> >>>> compilation never modifies the IR so the driver should always clone.
>> >>>> You
>> >>>> have to keep in mind that Marek is most likely thinking about caching
>> >>>> the
>> >>>> TGSI rather than doing in-place lowering in it.
>> >>>>
>> >>>> If I'm understanding Marek correctly, then it sounds like shader
>> >>>> compilation
>> >>>> should never touch the IR that's passed in.  If this is the case, it
>> >>>> sounds
>> >>>> like always cloning is the way to go.  At least its not *that*
>> >>>> expensive.
>> >>>
>> >>> Note that st/mesa needs to keep the FS IR because of glDrawPixels and
>> >>> glBitmap, and the VS IR because of edge flags, glRasterPos evaluation,
>> >>> selection and feedback modes. The last three are done with Draw/LLVM
>> >>> and only support TGSI.
>> >>>
>> >>> Therefore, st/mesa always hangs onto the IR and drivers can't modify
>> >>> it. It also needs VS in TGSI to be able to do everything correctly.
>> >>>
>> >>> What other Gallium modules want or not want is not that important, but
>> >>> changing the current semantics will require fixing a lot of places.
>> >>> (state trackers - mesa, nine, xa; modules - blitter, draw, hud,
>> >>> postprocess, tests, vl)
>> >>>
>> >>> You really better think about whether changing all those and the risk
>> >>> of breaking them is worth it.
>> >>>
>> >>> Marek
>> >>
>> >> Well, we're talking about passing NIR here, not TGSI, so none of those
>> >> places will need to be updated. NIR is a much more heavyweight IR, and
>> >> copying is much more expensive, so it makes more sense there for the
>> >> st to duplicate it and let the driver own the IR it passes in, to
>> >> reduce copying when there's no variant management necessary.
>> >
>> > My main concern was about TGSI, not so much about NIR.
>>
>> Yes, exactly -- we don't plan on changing the semantics of passing in
>> TGSI, so this only matters when the user passes in a NIR shader.
>
> I think two different concepts of ownership are getting conflated here:
> Right/responsibility to delete and right to modify.
>
> The way I understand it, gallium, as it stands, gives neither to the driver.
> A back-end using NIR requires the right to modify but who deletes it doesn't
> ultimately matter.  I think it's dangerous to pass one of these rights to
> the driver and not the other but we need to think about both.
>
> What I'm trying to say is that we have two options here:
>
> 1) gallium passes IR to the back-end that it is free to modify and is
> required to delete when it's done.
>
> 2) gallium passes read-only IR to the back-end and it always makes a copy.


Not always. The copy is optional. Drivers are encouraged not to make a
copy if they don't need it. Or they can keep a copy in a different IR,
which is the same scenario for the original IR.

Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] i965: alternative to memctx for cleaning up nir variants

Reply via email to