On Fri, Feb 20, 2015 at 2:43 PM, Francisco Jerez <curroje...@riseup.net> wrote:
> Jason Ekstrand <ja...@jlekstrand.net> writes: > > > One more comment here... This particularly regards your plan of > separating > > it into "things that match the destination" and "other things" and not > copy > > prop uniforms or immediates into the "other things". There is another > case > > we need to handle. On older gens (SNB maybe?) the SIMD16 FB write > messages > > are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual > (r0, > > r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect > SIMD16 > > load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On > > those platforms, there is a special type of MOV-to-MRF that can move from > > (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right > > now. However, it doesn't follow the standard rules. > > --Jason > > > I don't see why it couldn't be handled in roughly the same way you do it > now? Recognize when src[i + 4] is the same 8-wide register as src[i] > shifted by 8 and emit a COMPR4 copy in that case? > Sure. I haven't thought about it hard enough to prove it can't be done. Just wanted to make sure you were aware of that particular monkey-wrench. --Jason > > On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand <ja...@jlekstrand.net> > > wrote: > > > >> > >> > >> On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez <curroje...@riseup.net > > > >> wrote: > >> > >>> Jason Ekstrand <ja...@jlekstrand.net> writes: > >>> > >>> > On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez < > curroje...@riseup.net > >>> > > >>> > wrote: > >>> > > >>> >> Jason Ekstrand <ja...@jlekstrand.net> writes: > >>> >> > >>> >> > I'm still a little pensive. But > >>> >> > > >>> >> > Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com> > >>> >> > > >>> >> Thanks. > >>> >> > >>> >> > Now for a little aside. I have come to the conclusion that I > made a > >>> >> grave > >>> >> > mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I > should > >>> have > >>> >> > just subclassed fs_inst for load_payload. The problem is that we > >>> need to > >>> >> > snag a bunch of information for the sources when we create the > >>> >> > load_payload. In particular, we need to know the width of the > >>> source so > >>> >> > that we know how much space it consumes in the payload and we > need to > >>> >> know > >>> >> > the information required to properly re-create the mov such as > >>> >> > force_sechalf and force_writemask_all. Really, in order to do > things > >>> >> > properly, we need to gather this information *before* we do any > >>> >> > optimizations. The nasty pile of code that you're editing > together > >>> with > >>> >> > the "effective_width" parameter is a lame attempt to > >>> capture/reconstruct > >>> >> > this information. Really, we should just subclass, capture the > >>> >> information > >>> >> > up-front, and do it properly. > >>> >> > > >>> >> Yes, absolutely, this lowering pass is a real mess. There are four > >>> more > >>> >> unreviewed patches in the mailing list fixing bugs of the metadata > >>> >> guessing of lower_load_payload() [1][2][3][4], you may want to take > a > >>> >> look at them. There are more bugs I'm aware of but it didn't seem > >>> >> practical to fix them. > >>> >> > >>> > > >>> > Yeah, Matt pointed me at those. I'll give them a look later today. > >>> > > >>> > > >>> >> That said, I don't think it would be worth subclassing fs_inst. > >>> >> > >>> >> My suggestion would have been to keep it simple and lower > LOAD_PAYLOAD > >>> >> into a series of MOVs with force_writemask_all enabled in all cases, > >>> >> simply rely on the optimizer to eliminate those where possible. > Then > >>> >> get rid of the metadata and effective_width guessing. Instead > agree on > >>> >> immediates and uniforms being exec_size-wide by convention > >>> >> (i.e. LOAD_PAYLOAD's exec_size rather than the original > instruction's), > >>> >> then prevent constant propagation from propagating an immediate load > >>> >> into a LOAD_PAYLOAD if it would lead to a change in the semantics of > >>> the > >>> >> program, and maybe just run constant propagation after lowering so > we > >>> >> can be sure those cases are cleaned up properly where register > coalesce > >>> >> isn't enough. > >>> >> > >>> > > >>> > First off, simply setting force_writemask_all isn't an option. > Consider > >>> > the following scenario: > >>> > > >>> > a = foo; > >>> > if (bar) { > >>> > b = 7; > >>> > } else { > >>> > use(a); > >>> > } > >>> > use(b); > >>> > > >>> > If "use(a)" is the last use of the variable a, then the live ranges > of a > >>> > and b don't actually over-lap and we can assign a and b to the same > >>> > register. However, if force_writemask_all is set on b, it will > destroy > >>> the > >>> > value in a before its last use. Right now, we don't actually do this > >>> > because our liveness analysis pass flattens everything so it will > think > >>> > that b and a over-lap even though they don't. However, if we ever > >>> decide > >>> > to make the liveness information more accurate, having a pile of > >>> > force_writemask_all instructions will be a major problem. So, while > it > >>> is > >>> > *technically* safe for now, it's a really bad idea in the long-term. > >>> > > >>> The thing is we *will* have to deal with that scenario. Building a > >>> message payload inherently involves crossing channel boundaries > (because > >>> of headers, unsupported SIMD modes by some shared functions, etc.). > I'd > >>> say it's a bug in the liveness analysis pass if it wouldn't consider > the > >>> liveness interval of a and b to overlap in your example if the > >>> assignment of b had force_writemask_all set. > >>> > >> > >> Yes, I'm aware of that. However, the part of that register that has to > >> squash everything is usually only a couple of registers while the entire > >> payload may be up to 13 (if I remmeber correctly). We'd rather not > have to > >> squash everything if we can. All that being said, our liveness/register > >> allocation can't handle this and making register allocation handle > parts of > >> registers interfering but not other bits is going to be a real pain. > Maybe > >> not even worth it. If you'd rather force_writemask_all everything, I'll > >> sign off on that for now. I just wanted to point out that it may not > be a > >> good permanent solution. > >> > >> > >>> A reasonable compromise might be to leave it up to the caller whether > to > >>> set the force_writemask_all and force_sechalf flags or not. I.e. just > >>> copy the same flags set on the LOAD_PAYLOAD instruction to the MOV > >>> instructions. That would still allow reducing the liveness intervals > in > >>> cases where we know that the payload respects channel boundaries. > >>> > >>> Tracking the flag information per-register in cases where the same > >>> payload has well- and ill-behaved values seems rather premature and > >>> rather useless to me because the optimizer is likely to be able to get > >>> rid of redundant copies with force_writemask_all -- register coalesce > >>> is doing this already AFAIK, maybe by accident. > >>> > >> > >> Sure. I'm not worried about our ability to optimize. I'm primarily > >> worried about register pressure. Like I said, it's an OK solution in > the > >> temporary. I think we'll want to give it more thought in the long-run > but > >> that's going to interact a lot with how we do register allocation etc. > >> --Jason > >> > >> > >>> > Regarding the other suggestion of just requiring width == exec_size > for > >>> > immediates and uniforms, that seems pretty reasonable to me. I'd > like > >>> to > >>> > know what it will do to optimizations, but it seems ok initially. > I'm > >>> > still a bigger fan of just subclassing and stashing everything we > need > >>> to > >>> > know up-front. If we do it right, the only things that will actually > >>> need > >>> > to know about the subclass are the function for creating a > LOAD_PAYLOAD > >>> and > >>> > lower_load_payloads. > >>> > > >>> > --Jason > >>> > > >>> > > >>> >> > >>> >> [1] > >>> >> > >>> > http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html > >>> >> [2] > >>> >> > >>> > http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html > >>> >> [3] > >>> >> > >>> > http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html > >>> >> [4] > >>> >> > >>> > http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html > >>> >> > >>> >> > >>> >> > --Jason > >>> >> > > >>> >> > On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand < > >>> ja...@jlekstrand.net> > >>> >> > wrote: > >>> >> > > >>> >> >> > >>> >> >> > >>> >> >> On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez < > >>> curroje...@riseup.net > >>> >> > > >>> >> >> wrote: > >>> >> >> > >>> >> >>> Jason Ekstrand <ja...@jlekstrand.net> writes: > >>> >> >>> > >>> >> >>> > On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez < > >>> >> >>> curroje...@riseup.net> > >>> >> >>> > wrote: > >>> >> >>> > > >>> >> >>> >> Jason Ekstrand <ja...@jlekstrand.net> writes: > >>> >> >>> >> > >>> >> >>> >> > On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez < > >>> >> >>> curroje...@riseup.net> > >>> >> >>> >> > wrote: > >>> >> >>> >> > > >>> >> >>> >> >> Hey Matt, > >>> >> >>> >> >> > >>> >> >>> >> >> Matt Turner <matts...@gmail.com> writes: > >>> >> >>> >> >> > >>> >> >>> >> >> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez < > >>> >> >>> >> curroje...@riseup.net> > >>> >> >>> >> >> wrote: > >>> >> >>> >> >> >> MRFs cannot be read from anyway so they cannot possibly > >>> be a > >>> >> >>> valid > >>> >> >>> >> >> >> source of LOAD_PAYLOAD. > >>> >> >>> >> >> >> --- > >>> >> >>> >> >> > > >>> >> >>> >> >> > The function only seems to test inst->dst.file == MRF. I > >>> don't > >>> >> >>> see any > >>> >> >>> >> >> > code for handling MRF sources. What am I missing? > >>> >> >>> >> >> > >>> >> >>> >> >> That test is for "handling" MRF sources -- More precisely, > >>> it's > >>> >> >>> >> >> collecting the writemask and half flags for MRF writes, > >>> which can > >>> >> >>> only > >>> >> >>> >> >> possibly be useful if we're going to use them later on to > >>> read > >>> >> >>> something > >>> >> >>> >> >> out of an MRF into a payload, which we shouldn't be doing > in > >>> the > >>> >> >>> first > >>> >> >>> >> >> place. > >>> >> >>> >> >> > >>> >> >>> >> >> Aside from simplifying the function somewhat, that allows > us > >>> to > >>> >> >>> drop the > >>> >> >>> >> >> 16 register gap reserved for MRFs at register offset zero, > >>> what > >>> >> will > >>> >> >>> >> >> allow us to drop the vgrf_to_reg[] offset calculation > >>> completely > >>> >> >>> (also > >>> >> >>> >> >> in split_virtual_grfs()) in a future patch (not sent for > >>> review > >>> >> >>> yet). > >>> >> >>> >> >> > >>> >> >>> >> > > >>> >> >>> >> > No, we do read from MRF's sort-of... Send messages have an > >>> >> implicit > >>> >> >>> >> "read" > >>> >> >>> >> > from an MRF. > >>> >> >>> >> > >>> >> >>> >> Heh, and that's pretty much the only way you "read" from it. > >>> >> >>> >> > >>> >> >>> >> > This was written precicely so that we could use > LOAD_PAYLOAD > >>> >> >>> >> > to build MRF payloads. We do on pre-GEN6. > >>> >> >>> >> > > >>> >> >>> >> I'm aware, but you don't need any of this meta-data to > >>> LOAD_PAYLOAD > >>> >> >>> >> *into* an MRF, and LOAD_PAYLOAD with an MRF as source should > be > >>> >> illegal > >>> >> >>> >> anyway. > >>> >> >>> >> > >>> >> >>> > > >>> >> >>> > And no one is using it that way. All of the metadata checks > you > >>> are > >>> >> >>> > deleting are checks on the *destination*. > >>> >> >>> > > >>> >> >>> > >>> >> >>> Didn't you write this code yourself? The only use for the > >>> collected > >>> >> >>> metadata is initializing the instruction flags of the MOVs > >>> subsequent > >>> >> >>> LOAD_PAYLOAD instructions are lowered to, based on the metadata > >>> already > >>> >> >>> collected for its source registers, which can never be MRFs, so > the > >>> >> >>> metadata you collect from MRF writes is never actually used. > >>> >> >>> > >>> >> >> > >>> >> >> Right... I misred something initially. Yes, we should never be > >>> tracking > >>> >> >> MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look > a > >>> bit > >>> >> >> later, but it looks better. > >>> >> >> > >>> >> > >>> > >> > >> >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev