On Tue, Dec 29, 2015 at 12:36 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>
>
> On Tue, Dec 29, 2015 at 7:32 AM, Rob Clark <robdcl...@gmail.com> wrote:
>>
>> On Mon, Dec 28, 2015 at 4:23 PM, Connor Abbott <cwabbo...@gmail.com>
>> wrote:
>> > On Mon, Dec 28, 2015 at 3:25 PM, Rob Clark <robdcl...@gmail.com> wrote:
>> >>>
>> >>>> It is a mix.. I do texcoord saturate, clip-plane, and 2-sided color
>> >>>> lowering in NIR. But flat-shading, binning-pass, and half vs full
>> >>>> precision color output in ir3.
>> >>>>
>> >>>> I do as much lowering in NIR as I can, in an effort to do as much as
>> >>>> possible at compile time, vs draw time. I do the first round of
>> >>>> lowering/opt w/ null shader key, which is enough for the common
>> >>>> cases.
>> >>>>
>> >>>> Pretty much independent, I suppose, of whether I came out of SSA or
>> >>>> not first. Although binning-pass variant and the instruction
>> >>>> scheduling I do are easier in SSA.
>> >>>>
>> >>>> Somewhat unrelated, but I may end up converting array access to
>> >>>> registers, but leave everything else in SSA, so I can benefit from
>> >>>> converting multi-dimensional offsets into a single offset.. this is
>> >>>> still one open issue w/ gallium glsl_to_nir.. right now I have a
>> >>>> hacked up version of nir_lower_io that converts global/local
>> >>>> load/store_var's into load/store_var2 which take an offset as a src
>> >>>> (like load_input/store_output) instead of deref chain.. not sure yet
>> >>>> whether this will be the permanent solution, but at least it fixes a
>> >>>> huge heap of variable-indexing piglits and lets me continue w/
>> >>>> implementing lowering passes for everything else that was previously
>> >>>> done in glsl->tgsi or tgsi->tgsi passes.
>> >>>
>> >>>
>> >>> If you do this, you'll be back to always needing a mutable copy. Most
>> >>> lowering and optimization passes die the moment they see a register.
>> >>> You'll either have to go fix a bunch of stuff up to no-op properly or
>> >>> run vars_to_regs after doing your NIR lowering but before going into
>> >>> your backend IR. This means that your "gold copy" still has variables
>> >>> and you always need to lower them to registers before you go into the
>> >>> backend.
>> >>
>> >> ugg.. but good point, thanks for pointing that out before I wasted
>> >> another afternoon on yet another dead-end for handling deref's..
>> >>
>> >> Ok, I guess I need to think of a better name than load/store_var2 for
>> >> the new intrinsics ;-)
>> >
>> > I don't think that "you should throw away registers and use your own
>> > thing" is what Jason wanted you to get out of that.
>
>
> Correct. Registers are designed explicitly to do exactly what you want:
> Provide an easy-to-work-with linear view of complex variables. I still
> don't understand why you're trying so hard not to use them. The code is
> already written for you, you just have to turn it on. The only thing you
> might have to do is make it take a type_size function like nir_lower_io does
> so that you can configure offset units to your backend's liking.
>
> What I was trying to get across is that the situation in which you can avoid
> cloning by clever use of reference counting is specific to your driver and
> the exact way that you have your lowering passes set up. If you perturb it
> even a little, you have to do a copy all the time and reference counting
> isn't helping anymore. You're free to design your entire compiler stack
> around avoiding that one copy if you wish, but I wouldn't recommend it.
>
>> perhaps.. I was considering switching to registers for arrays.
>> Although it would end up forcing an extra clone in the common case
>> where there would otherwise not be one... a bit of a tough pill to
>> swallow..
>
>
> I don't see why that is such a tough pill. Copying is cheap. When we were
> writing the cloning code, we both ran shader-db runs where we were cloning
> after *every* optimization or lowering pass and it still only hurt runtime
> by something like 10 or 20%. A single clone won't even get noticed.
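Jason's suggestion that the lowering take a type_size function, as nir_lower_io does, is what turns a multi-dimensional deref chain into the single offset Rob wants, in whatever unit the backend prefers. A minimal sketch of that idea in C follows; toy_type, toy_deref, flatten_deref and the two type_size callbacks are hypothetical stand-ins, not NIR's deref or glsl_type API, and only constant indices are handled.

```c
#include <assert.h>

/* Hypothetical toy type model: NOT the real glsl_type or nir_deref API. */
enum toy_kind { TOY_VECTOR, TOY_ARRAY, TOY_STRUCT };

struct toy_type {
   enum toy_kind kind;
   unsigned components;                  /* TOY_VECTOR: 1..4 */
   const struct toy_type *elem;          /* TOY_ARRAY: element type */
   unsigned length;                      /* TOY_ARRAY: elements; TOY_STRUCT: members */
   const struct toy_type *const *fields; /* TOY_STRUCT: member types */
};

/* The backend chooses its offset unit by choosing this callback. */
typedef unsigned (*type_size_fn)(const struct toy_type *type);

/* Shared recursion over aggregates; `leaf` sizes a single vector. */
static unsigned aggregate_size(const struct toy_type *t,
                               unsigned (*leaf)(const struct toy_type *))
{
   unsigned size = 0;
   switch (t->kind) {
   case TOY_VECTOR:
      return leaf(t);
   case TOY_ARRAY:
      return t->length * aggregate_size(t->elem, leaf);
   case TOY_STRUCT:
      for (unsigned i = 0; i < t->length; i++)
         size += aggregate_size(t->fields[i], leaf);
      return size;
   }
   return 0;
}

static unsigned scalar_leaf(const struct toy_type *v) { return v->components; }
static unsigned vec4_leaf(const struct toy_type *v) { (void)v; return 1; }

/* Offsets counted in scalar components (tightly packed, no padding). */
static unsigned type_size_scalar(const struct toy_type *t)
{
   return aggregate_size(t, scalar_leaf);
}

/* Offsets counted in vec4 slots (each vector padded to one slot). */
static unsigned type_size_vec4(const struct toy_type *t)
{
   return aggregate_size(t, vec4_leaf);
}

/* One constant-index step of a deref chain: an array index or a struct
 * member index. An indirect index would instead need index * stride
 * emitted as an SSA value; that case is omitted here. */
struct toy_deref {
   unsigned index;
};

/* Walk the chain from the variable's type, summing each step's offset in
 * whatever unit `type_size` uses. */
static unsigned flatten_deref(const struct toy_type *type,
                              const struct toy_deref *chain, unsigned n,
                              type_size_fn type_size)
{
   unsigned offset = 0;
   for (unsigned i = 0; i < n; i++) {
      if (type->kind == TOY_ARRAY) {
         assert(chain[i].index < type->length);
         offset += chain[i].index * type_size(type->elem);
         type = type->elem;
      } else {
         assert(type->kind == TOY_STRUCT && chain[i].index < type->length);
         for (unsigned f = 0; f < chain[i].index; f++)
            offset += type_size(type->fields[f]);
         type = type->fields[chain[i].index];
      }
   }
   return offset;
}
```

The same chain yields different offsets depending only on the callback passed in: for `vec3 a[3][4]`, `a[2][1]` lands at offset 27 in scalar components and 9 in vec4 slots.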
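The reference-counting shortcut Jason describes amounts to copy-on-write: a lowering pass asks for a mutable shader and gets a clone only when someone else also holds a reference. The sketch below uses a hypothetical toy_shader type (not nir_shader, and not the patch under discussion) with an int array standing in for the IR.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy shader object: NOT nir_shader, and NOT the refcounting
 * patch being debated; the int array stands in for the IR. */
struct toy_shader {
   unsigned refcount;
   size_t num_instrs;
   int *instrs;
};

static struct toy_shader *toy_shader_create(size_t num_instrs)
{
   struct toy_shader *sh = calloc(1, sizeof(*sh));
   if (!sh)
      abort();
   sh->instrs = calloc(num_instrs ? num_instrs : 1, sizeof(int));
   if (!sh->instrs)
      abort();
   sh->refcount = 1;
   sh->num_instrs = num_instrs;
   return sh;
}

static struct toy_shader *toy_shader_ref(struct toy_shader *sh)
{
   sh->refcount++;
   return sh;
}

static void toy_shader_unref(struct toy_shader *sh)
{
   assert(sh->refcount > 0);
   if (--sh->refcount == 0) {
      free(sh->instrs);
      free(sh);
   }
}

/* Deep copy; the clone starts out with its own single reference. */
static struct toy_shader *toy_shader_clone(const struct toy_shader *sh)
{
   struct toy_shader *copy = toy_shader_create(sh->num_instrs);
   memcpy(copy->instrs, sh->instrs, sh->num_instrs * sizeof(int));
   return copy;
}

/* Copy-on-write: takes over the caller's reference and returns a shader
 * the caller may mutate. It clones only if someone else holds a ref too. */
static struct toy_shader *toy_shader_make_mutable(struct toy_shader *sh)
{
   if (sh->refcount == 1)
      return sh;                 /* sole owner: lower in place, no copy */
   struct toy_shader *copy = toy_shader_clone(sh);
   toy_shader_unref(sh);         /* release the caller's ref on the shared one */
   return copy;
}
```

The catch Jason points out shows up as soon as the driver keeps its own reference to a gold copy: `toy_shader_make_mutable(toy_shader_ref(gold))` then always takes the clone path, so the refcount no longer saves the copy.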

well, that is encouraging.. although I probably tend a bit more toward the
CPU-limited side of things..

> Also, you've already said that you pre-compile for the "common case" of a
> zero shader key so that extra clone gets eaten at compile time where you're
> already doing piles of optimization and lowering. The case you really care
> about is when that key is non-zero and you have to stop everything and
> recompile in the middle of a draw. In that case, you have a non-zero key so
> you have to do a clone anyway.

I guess I could do a clone before the to_regs pass.. but that seems a bit
sub-optimal, and w/ a lower_deref pass (which would take a type_size fxn
ptr[1]) I could get basically the one part of registers that I want..

[1] Note that part of my gallium glsl_to_nir branch has started to
de-duplicate all the common type_size implementations. Mesa st uses one
that is basically the same as what is in i965 (vec4), with the addition of
double support..

> At the end of the day, I think we're getting nowhere here. We have two
> different memory management models that are in conflict. The ralloc model
> saves us typing and provides some nice safety and refcounting saves you some
> typing and provides you some nice safety. It's becoming fairly obvious that
> neither side is going to convince the other that their model is better any
> time soon. I'm open to suggestions on how to proceed. One option would be
> to have Anholt come in and break the tie. I'd be ok with that. In any
> case, we need to solve this one way or another and either commit the patch
> or not.

Well, as it stands, on the refcnt'ing side of things, so far I am
outnumbered. I was planning to re-work my ir3 changes without refcnt'ing,
and then the rest of the gallium glsl_to_nir support, hopefully sometime
in the next few days..

BR,
-R

> --Jason
>
>> > Most of the
>> > existing optimization passes barf on registers for a reason: registers
>> > imply that you've gone from "consumer-agnostic NIR," i.e. what's
>> > produced by gtn and operated on by generic optimizations, to your own
>> > driver-specific thing, and any optimizations you're going to run are
>> > only to clean up the result of the lowering passes, so you won't need
>> > to run most of them. In the few cases where we do need an optimization
>> > after lowering to registers, we've gone and fixed it up to no-op
>> > things properly, but in general it's a lot easier and less confusing
>> > to say "new optimization passes don't have to deal with registers"
>> > than to make everyone go and add support for registers to their
>> > passes. I'm not saying that adding a "here's my driver-specific
>> > offset" thing to load/store_var would necessarily be a bad idea, but
>> > don't just dismiss registers out-of-hand.
>>
>> Yeah, I'm not a big fan of making lowering/etc passes deal w/
>> registers unnecessarily. Seems like coming up w/ some way to lower
>> load/store_var deref chains would be easier.
>>
>> BR,
>> -R
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
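The shader-key argument can be sketched as a small variant cache: the zero-key variant is compiled when the shader is created, so its clone is paid at compile time, and only a non-zero key reaches the draw-time path, which clones the gold copy under either memory-management model. toy_key, toy_program, the integer stand-in for IR and the cache size are all hypothetical; this is not ir3's variant code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shader key; real drivers (ir3 included) define their own. */
struct toy_key {
   bool flat_shade;
   bool two_sided_color;
   unsigned ucp_enables;        /* bitmask of enabled user clip planes */
};

static bool key_is_zero(const struct toy_key *k)
{
   return !k->flat_shade && !k->two_sided_color && k->ucp_enables == 0;
}

static bool key_equal(const struct toy_key *a, const struct toy_key *b)
{
   return a->flat_shade == b->flat_shade &&
          a->two_sided_color == b->two_sided_color &&
          a->ucp_enables == b->ucp_enables;
}

#define MAX_VARIANTS 8

struct toy_variant {
   struct toy_key key;
   unsigned binary;
};

struct toy_program {
   unsigned gold_ir;            /* stand-in for the unlowered "gold copy" */
   unsigned zero_binary;        /* zero-key variant, built at create time */
   struct toy_variant variants[MAX_VARIANTS];
   unsigned num_variants;
   unsigned clones;             /* how many times the gold copy was cloned */
};

/* Stand-in for key-dependent lowering plus backend compile. Lowering
 * mutates the IR it is given, so it runs on a private copy. */
static unsigned compile_with_key(struct toy_program *p, const struct toy_key *k)
{
   assert(k->ucp_enables <= 0xff);   /* at most 8 user clip planes */
   unsigned ir = p->gold_ir;         /* the "clone" */
   p->clones++;
   return (ir << 10) | (k->flat_shade ? 1u << 9 : 0) |
          (k->two_sided_color ? 1u << 8 : 0) | k->ucp_enables;
}

static void program_create(struct toy_program *p, unsigned ir)
{
   static const struct toy_key zero_key; /* all fields zero */
   p->gold_ir = ir;
   p->num_variants = 0;
   p->clones = 0;
   /* Common case compiled up front: its clone is paid at compile time. */
   p->zero_binary = compile_with_key(p, &zero_key);
}

/* Draw-time lookup: only a non-zero key can trigger a compile (and clone). */
static unsigned program_get_binary(struct toy_program *p,
                                   const struct toy_key *k)
{
   if (key_is_zero(k))
      return p->zero_binary;
   for (unsigned i = 0; i < p->num_variants; i++)
      if (key_equal(&p->variants[i].key, k))
         return p->variants[i].binary;
   unsigned binary = compile_with_key(p, k);
   if (p->num_variants < MAX_VARIANTS) {   /* cache it if there is room */
      p->variants[p->num_variants].key = *k;
      p->variants[p->num_variants].binary = binary;
      p->num_variants++;
   }
   return binary;
}
```

A draw with the zero key never touches the gold copy; the first draw with, say, flat shading enabled costs one clone plus a compile, and later draws with the same key hit the cache.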
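Rob's footnote refers to de-duplicating the common type_size implementations, the Mesa state tracker's being the i965 vec4 one plus double support. A minimal sketch of a vec4-slot type_size with that double rule is below; the toy_type layout and the function name are hypothetical (not struct glsl_type or the actual st/i965 code), and samplers and other opaque types are left out.

```c
#include <assert.h>

/* Hypothetical, simplified type description: NOT struct glsl_type. */
enum toy_base { TOY_FLOAT, TOY_INT, TOY_DOUBLE, TOY_ARRAY, TOY_STRUCT };

struct toy_type {
   enum toy_base base;
   unsigned vector_elements;             /* rows: 1 (scalar) .. 4 */
   unsigned matrix_columns;              /* 1 for scalars and vectors */
   const struct toy_type *elem;          /* TOY_ARRAY: element type */
   unsigned length;                      /* TOY_ARRAY: elements; TOY_STRUCT: members */
   const struct toy_type *const *fields; /* TOY_STRUCT: member types */
};

/* Size in vec4 slots: one slot per vector or matrix column, except that a
 * column of three or four doubles (up to 256 bits) needs two 128-bit slots. */
static unsigned type_size_vec4_dbl(const struct toy_type *t)
{
   unsigned size = 0;
   switch (t->base) {
   case TOY_FLOAT:
   case TOY_INT:
   case TOY_DOUBLE:
      assert(t->vector_elements >= 1 && t->matrix_columns >= 1);
      if (t->base == TOY_DOUBLE && t->vector_elements > 2)
         return 2 * t->matrix_columns;
      return t->matrix_columns;
   case TOY_ARRAY:
      return t->length * type_size_vec4_dbl(t->elem);
   case TOY_STRUCT:
      for (unsigned i = 0; i < t->length; i++)
         size += type_size_vec4_dbl(t->fields[i]);
      return size;
   }
   return 0;
}
```

Under these rules a `dvec4` takes two slots, a `dmat3` six, and a `struct { dvec3 a; float b; }` three.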