Jason Ekstrand <ja...@jlekstrand.net> writes: > On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez <curroje...@riseup.net> > wrote: > >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> > On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez <curroje...@riseup.net> >> > wrote: >> > >> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >> >> > I'm still a little pensive. But >> >> > >> >> > Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com> >> >> > >> >> Thanks. >> >> >> >> > Now for a little aside. I have come to the conclusion that I made a >> >> grave >> >> > mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should >> have >> >> > just subclassed fs_inst for load_payload. The problem is that we >> need to >> >> > snag a bunch of information for the sources when we create the >> >> > load_payload. In particular, we need to know the width of the source >> so >> >> > that we know how much space it consumes in the payload and we need to >> >> know >> >> > the information required to properly re-create the mov such as >> >> > force_sechalf and force_writemask_all. Really, in order to do things >> >> > properly, we need to gather this information *before* we do any >> >> > optimizations. The nasty pile of code that you're editing together >> with >> >> > the "effective_width" parameter is a lame attempt to >> capture/reconstruct >> >> > this information. Really, we should just subclass, capture the >> >> information >> >> > up-front, and do it properly. >> >> > >> >> Yes, absolutely, this lowering pass is a real mess. There are four more >> >> unreviewed patches in the mailing list fixing bugs of the metadata >> >> guessing of lower_load_payload() [1][2][3][4], you may want to take a >> >> look at them. There are more bugs I'm aware of but it didn't seem >> >> practical to fix them. >> >> >> > >> > Yeah, Matt pointed me at those. I'll give them a look later today. >> > >> > >> >> That said, I don't think it would be worth subclassing fs_inst. >> >> >> >> My suggestion would have been to keep it simple and lower LOAD_PAYLOAD >> >> into a series of MOVs with force_writemask_all enabled in all cases, >> >> simply rely on the optimizer to eliminate those where possible. Then >> >> get rid of the metadata and effective_width guessing. Instead agree on >> >> immediates and uniforms being exec_size-wide by convention >> >> (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), >> >> then prevent constant propagation from propagating an immediate load >> >> into a LOAD_PAYLOAD if it would lead to a change in the semantics of the >> >> program, and maybe just run constant propagation after lowering so we >> >> can be sure those cases are cleaned up properly where register coalesce >> >> isn't enough. >> >> >> > >> > First off, simply setting force_writemask_all isn't an option. Consider >> > the following scenario: >> > >> > a = foo; >> > if (bar) { >> > b = 7; >> > } else { >> > use(a); >> > } >> > use(b); >> > >> > If "use(a)" is the last use of the variable a, then the live ranges of a >> > and b don't actually over-lap and we can assign a and b to the same >> > register. However, if force_writemask_all is set on b, it will destroy >> the >> > value in a before its last use. Right now, we don't actually do this >> > because our liveness analysis pass flattens everything so it will think >> > that b and a over-lap even though they don't. However, if we ever decide >> > to make the liveness information more accurate, having a pile of >> > force_writemask_all instructions will be a major problem. So, while it >> is >> > *technically* safe for now, it's a really bad idea in the long-term. >> > >> The thing is we *will* have to deal with that scenario. Building a >> message payload inherently involves crossing channel boundaries (because >> of headers, unsupported SIMD modes by some shared functions, etc.). I'd >> say it's a bug in the liveness analysis pass if it wouldn't consider the >> liveness interval of a and b to overlap in your example if the >> assignment of b had force_writemask_all set. >> > > Yes, I'm aware of that. However, the part of that register that has to > squash everything is usually only a couple of registers while the entire > payload may be up to 13 (if I remmeber correctly). We'd rather not have to > squash everything if we can. All that being said, our liveness/register > allocation can't handle this and making register allocation handle parts of > registers interfering but not other bits is going to be a real pain. Maybe > not even worth it. If you'd rather force_writemask_all everything, I'll > sign off on that for now. I just wanted to point out that it may not be a > good permanent solution. > > >> A reasonable compromise might be to leave it up to the caller whether to >> set the force_writemask_all and force_sechalf flags or not. I.e. just >> copy the same flags set on the LOAD_PAYLOAD instruction to the MOV >> instructions. That would still allow reducing the liveness intervals in >> cases where we know that the payload respects channel boundaries. >> >> Tracking the flag information per-register in cases where the same >> payload has well- and ill-behaved values seems rather premature and >> rather useless to me because the optimizer is likely to be able to get >> rid of redundant copies with force_writemask_all -- register coalesce >> is doing this already AFAIK, maybe by accident. >> > > Sure. I'm not worried about our ability to optimize. I'm primarily > worried about register pressure. Like I said, it's an OK solution in the > temporary. I think we'll want to give it more thought in the long-run but > that's going to interact a lot with how we do register allocation etc.
Oh, what I also meant is that you're likely to get the best from both worlds if we teach the optimizer to get rid of redundant force_writemask_all flags, as register coalesce already does to some extent mostly as a side effect. E.g. a move with force_writemask_all can be simplified into a normal move if we can prove that the values of the inactive components of the source are undefined -- The case where the source value is defined within the same basic block is trivial and probably the most common. > --Jason > > >> > Regarding the other suggestion of just requiring width == exec_size for >> > immediates and uniforms, that seems pretty reasonable to me. I'd like to >> > know what it will do to optimizations, but it seems ok initially. I'm >> > still a bigger fan of just subclassing and stashing everything we need to >> > know up-front. If we do it right, the only things that will actually >> need >> > to know about the subclass are the function for creating a LOAD_PAYLOAD >> and >> > lower_load_payloads. >> > >> > --Jason >> > >> > >> >> >> >> [1] >> >> http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html >> >> [2] >> >> >> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html >> >> [3] >> >> >> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html >> >> [4] >> >> >> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html >> >> >> >> >> >> > --Jason >> >> > >> >> > On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand <ja...@jlekstrand.net >> > >> >> > wrote: >> >> > >> >> >> >> >> >> >> >> >> On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez < >> curroje...@riseup.net >> >> > >> >> >> wrote: >> >> >> >> >> >>> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >>> >> >> >>> > On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez < >> >> >>> curroje...@riseup.net> >> >> >>> > wrote: >> >> >>> > >> >> >>> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >>> >> >> >> >>> >> > On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez < >> >> >>> curroje...@riseup.net> >> >> >>> >> > wrote: >> >> >>> >> > >> >> >>> >> >> Hey Matt, >> >> >>> >> >> >> >> >>> >> >> Matt Turner <matts...@gmail.com> writes: >> >> >>> >> >> >> >> >>> >> >> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez < >> >> >>> >> curroje...@riseup.net> >> >> >>> >> >> wrote: >> >> >>> >> >> >> MRFs cannot be read from anyway so they cannot possibly be >> a >> >> >>> valid >> >> >>> >> >> >> source of LOAD_PAYLOAD. >> >> >>> >> >> >> --- >> >> >>> >> >> > >> >> >>> >> >> > The function only seems to test inst->dst.file == MRF. I >> don't >> >> >>> see any >> >> >>> >> >> > code for handling MRF sources. What am I missing? >> >> >>> >> >> >> >> >>> >> >> That test is for "handling" MRF sources -- More precisely, >> it's >> >> >>> >> >> collecting the writemask and half flags for MRF writes, which >> can >> >> >>> only >> >> >>> >> >> possibly be useful if we're going to use them later on to read >> >> >>> something >> >> >>> >> >> out of an MRF into a payload, which we shouldn't be doing in >> the >> >> >>> first >> >> >>> >> >> place. >> >> >>> >> >> >> >> >>> >> >> Aside from simplifying the function somewhat, that allows us >> to >> >> >>> drop the >> >> >>> >> >> 16 register gap reserved for MRFs at register offset zero, >> what >> >> will >> >> >>> >> >> allow us to drop the vgrf_to_reg[] offset calculation >> completely >> >> >>> (also >> >> >>> >> >> in split_virtual_grfs()) in a future patch (not sent for >> review >> >> >>> yet). >> >> >>> >> >> >> >> >>> >> > >> >> >>> >> > No, we do read from MRF's sort-of... Send messages have an >> >> implicit >> >> >>> >> "read" >> >> >>> >> > from an MRF. >> >> >>> >> >> >> >>> >> Heh, and that's pretty much the only way you "read" from it. >> >> >>> >> >> >> >>> >> > This was written precicely so that we could use LOAD_PAYLOAD >> >> >>> >> > to build MRF payloads. We do on pre-GEN6. >> >> >>> >> > >> >> >>> >> I'm aware, but you don't need any of this meta-data to >> LOAD_PAYLOAD >> >> >>> >> *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be >> >> illegal >> >> >>> >> anyway. >> >> >>> >> >> >> >>> > >> >> >>> > And no one is using it that way. All of the metadata checks you >> are >> >> >>> > deleting are checks on the *destination*. >> >> >>> > >> >> >>> >> >> >>> Didn't you write this code yourself? The only use for the collected >> >> >>> metadata is initializing the instruction flags of the MOVs >> subsequent >> >> >>> LOAD_PAYLOAD instructions are lowered to, based on the metadata >> already >> >> >>> collected for its source registers, which can never be MRFs, so the >> >> >>> metadata you collect from MRF writes is never actually used. >> >> >>> >> >> >> >> >> >> Right... I misred something initially. Yes, we should never be >> tracking >> >> >> MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look a >> bit >> >> >> later, but it looks better. >> >> >> >> >> >>
pgpU0aJynQrgk.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev