On Wed, Aug 20, 2014 at 12:26:15PM -0700, Connor Abbott wrote:
> On Wed, Aug 20, 2014 at 12:17 PM, Tom Stellard <t...@stellard.net> wrote:
> > On Tue, Aug 19, 2014 at 05:19:15PM -0700, Connor Abbott wrote:
> >> On Tue, Aug 19, 2014 at 3:57 PM, Tom Stellard <t...@stellard.net> wrote:
> >> > On Tue, Aug 19, 2014 at 01:37:56PM -0700, Connor Abbott wrote:
> >> >> On Tue, Aug 19, 2014 at 11:40 AM, Francisco Jerez <curroje...@riseup.net> wrote:
> >> >> > Tom Stellard <t...@stellard.net> writes:
> >> >> >> On Tue, Aug 19, 2014 at 11:04:59AM -0400, Connor Abbott wrote:
> >> >> >>> On Mon, Aug 18, 2014 at 8:52 PM, Michel Dänzer <mic...@daenzer.net> wrote:
> >> >> >>> > On 19.08.2014 01:28, Connor Abbott wrote:
> >> >> >>> >> On Mon, Aug 18, 2014 at 4:32 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> >> >> >>> >>> On 16.08.2014 09:12, Connor Abbott wrote:
> >> >> >>> >>>> I know what you might be thinking right now. "Wait, *another* IR? Don't we already have like 5 of those, not counting all the driver-specific ones? Isn't this stuff complicated enough already?" Well, there are some pretty good reasons to start afresh (again...). In the years we've been using GLSL IR, we've come to realize that, in fact, it's not what we want *at all* to do optimizations on.
> >> >> >>> >>>
> >> >> >>> >>> Did you evaluate using LLVM IR instead of inventing yet another one?
> >> >> >>> >>>
> >> >> >>> >>> --
> >> >> >>> >>> Earthling Michel Dänzer            |           http://www.amd.com
> >> >> >>> >>> Libre software enthusiast          |         Mesa and X developer
> >> >> >>> >>
> >> >> >>> >> Yes.
> >> >> >>> >> See
> >> >> >>> >> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053502.html
> >> >> >>> >> and
> >> >> >>> >> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053522.html
> >> >> >>> >
> >> >> >>> > I know Ian can't deal with LLVM for some reason. I was wondering if *you* evaluated it, and if so, why you rejected it.
> >> >> >>>
> >> >> >>> Well, first of all, the fact that Ian and Ken don't want to use it means that any plan to use LLVM for the Intel driver is dead in the water anyway - you can translate NIR into LLVM if you want, but for i965 we want to share optimizations between our 2 backends (FS and vec4) that we can't do today in GLSL IR, so this is what we want to use for that. And since nobody else does anything with the core GLSL compiler except when they have to, when we start moving things out of GLSL IR this will probably replace GLSL IR as the infrastructure that all Mesa drivers use. But with that in mind, here are a few reasons why we wouldn't want to use LLVM:
> >> >> >>>
> >> >> >>> * LLVM wasn't built to understand structured CFGs, meaning that you need to re-structurize it using a pass that's fragile and prone to break if some other pass "optimizes" the shader in a way that makes it non-structured (i.e. not expressible in terms of loops and if statements).
> >> >> >>> This loss of information also means that passes that need to know things like, for example, the loop nesting depth need to do an analysis pass, whereas with NIR you can just walk up the control flow tree and count the number of loops we hit.
> >> >> >>
> >> >> >> LLVM has a pass to structurize the CFG. We use it in the radeon drivers, and it is run after all of the other LLVM optimizations, which have no concept of structured CFG. It's not bug-free, but it works really well even with all of the complex OpenCL kernels we throw at it.
> >> >> >>
> >> >> >> Your point about losing information when the CFG is de-structurized is valid, but for things like loop depth, I'm not sure why we couldn't write an LLVM analysis pass for this (if one doesn't already exist).
> >> >> >
> >> >> > I don't think this is such a big deal either. At least the structurization pass used on newer AMD hardware isn't "fragile" in the way you seem to imply -- AFAIK (unlike the old AMDIL heuristic algorithm) it's guaranteed to give you a valid structurized output no matter what the previous optimization passes have done to the CFG, modulo bugs. I admit that the situation is nevertheless suboptimal. Ideally this information wouldn't get lost along the way. For the long term we may want to represent structured control flow directly in the IR as you say; I just don't see how reinventing the IR saves us any work if we could just fix the existing one.
> >> >>
> >> >> It seems to me that something like how we represent control flow is a pretty fundamental part of the IR - it affects any optimization pass that needs to do anything beyond adding and removing instructions.
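[The control-flow-tree walk Connor describes above is easy to picture. Here is a minimal sketch in C of a parent-linked control-flow tree and a loop-depth query; the `cf_node` structure and all of its names are hypothetical illustrations, not the real NIR API:]

```c
#include <stddef.h>

/* Hypothetical structured control-flow tree, loosely modeled on the
 * description in this thread -- not NIR's actual data structures.
 * Every node knows its parent, so loop depth is just a walk toward
 * the root; no separate CFG analysis pass is needed, because the
 * structure is never thrown away. */
enum cf_node_type { CF_NODE_BLOCK, CF_NODE_IF, CF_NODE_LOOP };

struct cf_node {
    enum cf_node_type type;
    struct cf_node *parent; /* NULL at the root of the shader */
};

/* Count enclosing loops by walking up the tree. */
static unsigned loop_depth(const struct cf_node *node)
{
    unsigned depth = 0;
    for (const struct cf_node *n = node->parent; n != NULL; n = n->parent) {
        if (n->type == CF_NODE_LOOP)
            depth++;
    }
    return depth;
}
```

[With an LLVM-style flat list of basic blocks, the same query requires recomputing loop information after any pass that touched the CFG.]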
> >> >> How would you fix that, especially given that LLVM is primarily designed for CPUs where you don't want to be restricted to structured control flow at all? It seems like our goals (preserve the structure) conflict with the way LLVM has been designed.
> >> >
> >> > I think it's important to distinguish between LLVM IR and the tools available to manipulate it. LLVM IR is meant to be a platform-independent program representation. There is nothing about the IR that would prevent someone from using it for hardware that required structured control flow.
> >>
> >> Right - when I said that structured control flow was a fundamental part of the IR, I meant that in the sense that it's a constraint that all optimization passes have to follow. I was also thinking of NIR, where it actually is a fundamental part of the IR data structures - all control flow consists of a tree of loops, if statements, and basic blocks, and there are no jump statements in the IR except for break, continue, and return. There are helpers to mutate the control flow tree (adding an if after an instruction, deleting a loop, etc.) so that you can more or less pretend you're operating on something like GLSL IR, while the CFG is being updated for you, basic blocks are being created and deleted, etc.
> >> >
> >> > The tools (mainly the optimization passes) are where decisions about things like preserving structured control flow are made. There are currently two strategies available for using the tools to produce programs with structured control flow:
> >> >
> >> > 1. Use the CFG structurizer pass.
> >> >
> >> > 2. Only use transforms that maintain the structure of the control flow.
> >>
> >> I'm a little confused about how this strategy would work. I'm assuming that the control flow structure (i.e. the tree of loops and ifs) is stored in some kind of metadata or fake instruction on top of the IR - I haven't looked into this much, so correct me if I'm wrong. If so, wouldn't you still have to make every optimization pass that touches the CFG properly update that metadata to avoid it going stale, since the optimizations themselves are operating on a list of basic blocks, which is a little lower-level?
> >
> > There is no CFG metadata. If you want to collect some information about the CFG, you would use an analysis pass to do this. For example, LLVM has an analysis pass for computing the dominator tree. If an optimization wants to use this analysis, it would add the analysis as a pass dependency, and then LLVM would run the dominator tree analysis before the optimization pass.
> >
> > Once the analysis has been run, the result is cached for other passes to use. However, the base assumption is that optimization passes invalidate all analysis information, so passes are required to report which analysis passes or which features of the program are preserved. So, if a pass reports that it preserves the CFG, then the dominator tree analysis is still considered valid.
> >
> > This is a high-level overview of how it works, but to get back to your question: if you wanted to use strategy number 2, you could just choose to only run optimizations that preserved the CFG.
> >
> > -Tom
>
> Ah, I see, that makes sense. That does seem like a rather terrible solution though, since not being able to change the CFG seems rather harsh.
>
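[The caching-and-invalidation scheme Tom describes can be sketched in a few lines of C. This toy `analysis_cache` is purely illustrative -- it is not LLVM's actual pass-manager API, just the core idea: analyses stay cached across a transform only if the transform declares that it preserves the CFG.]

```c
#include <stdbool.h>

/* Toy model of analysis caching -- not LLVM's real pass manager. */
struct analysis_cache {
    bool dom_tree_valid; /* is the cached dominator tree still usable? */
};

struct transform_pass {
    const char *name;
    bool preserves_cfg; /* reported by the pass itself */
};

/* Run a transform: the base assumption is that it invalidates all
 * analysis results unless it explicitly preserves the CFG. */
static void run_transform(struct analysis_cache *cache,
                          const struct transform_pass *pass)
{
    if (!pass->preserves_cfg)
        cache->dom_tree_valid = false;
}

/* Request the dominator tree: recompute only when the cache is stale.
 * Returns true when a recomputation was needed. */
static bool dom_tree_was_recomputed(struct analysis_cache *cache)
{
    if (cache->dom_tree_valid)
        return false;          /* cached result reused */
    cache->dom_tree_valid = true; /* (re)run the analysis */
    return true;
}
```

[Strategy 2 from the thread amounts to only ever running transforms whose `preserves_cfg` is true, so structure-dependent information never goes stale.]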
Yeah, that's why I listed it second ;)

-Tom

> >> >
> >> > -Tom
> >> >
> >> >> >
> >> >> >>> * LLVM doesn't do modifiers, meaning that we can't do optimizations like "clamp(x, 0.0, 1.0) => mov.sat x" and "clamp(x, 0.25, 1.0) => max.sat(x, 0.25)" in a generic fashion.
> >> >> >>
> >> >> >> The way to handle this with LLVM would be to add intrinsics to represent the various modifiers and then fold them into instructions during instruction selection.
> >> >> >
> >> >> > IMHO this is a feature. One of the things I don't like about NIR is that it's still vec4-centric. Most drivers are going to want something else, and different from each other; we cannot please all of them with one single vector addressing model built into the core instruction set. So I'd rather have modifiers, writemasks and swizzles represented as the composition of separate instructions/intrinsics with simple and well-defined semantics, which can be coalesced back into the real instruction as Tom says (easy even if you don't use LLVM's instruction selector, as long as it's in SSA form).
> >> >>
> >> >> While NIR is vec4-centric, nothing's stopping you from splitting up instructions and doing optimizations at the scalar level for scalar ISAs - in fact, that's what I expect to happen. And for backends that really do need to have swizzles and writemasks, coalescing these things back into the original instruction is not at all trivial - in fact, going into and out of SSA without introducing extra copies even in situations like:
> >> >>
> >> >> foo.xyz = ...
> >> >> ... = foo
> >> >> foo.x = ...
> >> >>
> >> >> is a problem that hasn't been solved yet publicly (it seems doable, but difficult).
> >> >> So while we might not need swizzles and writemasks for most backends, for the few that do need them (like, for example, the i965 vec4 backend) it will be very nice to have one common lowering pass that solves this hard problem, which would be impossible to do without having swizzles and writemasks in the IR. And it's very likely that these backends, which probably aren't using SSA due to the aforementioned difficulties, will also benefit from having modifiers already folded for them - this is something that's already a problem for the i965 vec4 backend and that NIR will help with a lot.
> >> >> >
> >> >> >>> * LLVM is hard to embed into other projects, especially if it's used as anything but a command-line tool that only runs once. See, for example, http://blog.llvm.org/2014/07/ftl-webkits-llvm-based-jit.html under "Linking WebKit with LLVM" - most of those problems would also apply to us.
> >> >> >>
> >> >> >> You have to keep in mind that the way webkit uses LLVM is totally different from how Mesa would use LLVM if LLVM IR were adopted as a common IR.
> >> >> >>
> >> >> >> webkit is using LLVM as a full JIT compiler, which means it depends on almost all of the pieces of the LLVM stack: the IR manipulation, the optimization passes, one or more of the codegen backends, as well as the entire JIT layer. The JIT layer in particular is missing a lot of functionality in the C API, which makes it more difficult to work with.
> >> >> >>
> >> >> >> If Mesa were to adopt LLVM IR as a common IR, the only LLVM library functionality it would need would be the IR manipulation and the optimization passes.
> >> >> >>
> >> >> >>> * LLVM is on a different release schedule (6 months vs. 3 months), has a different review process, etc., which means that to add support for new functionality that involves shaders, we now have to submit patches to two separate projects, and then 2 months later when we ship Mesa it turns out that nobody can actually use the new feature because it depends upon an unreleased version of LLVM that won't be released for another 3 months and then packaged by distros even later... We've already had problems where distros refused to ship newer Mesa releases because radeon depended on a version of LLVM newer than the one they were shipping, and if we started using LLVM in core Mesa it would get even worse. Proprietary drivers solve this problem by just forking LLVM, building it with the rest of their driver, and linking it in as a static library, but distro packagers would hate us if we did that.
> >> >> >>
> >> >> >> If Mesa were using LLVM IR as a common IR, I'm not sure what features in Mesa would be tied to new additions in LLVM. As I said before, all Mesa would be using would be the IR manipulations and the optimization passes. The IR manipulations only require new features when something new is added to the LLVM IR specification, which is rare. It's possible there could be some lag in new features that go into the optimization passes, but if there was some optimization that was deemed really critical, it could be implemented in Mesa using the IR manipulators.
> >> >> >>
> >> >> >> -Tom
> >> >> >>
> >> >> >>> I wouldn't completely rule out LLVM, and I do think they do a lot of things right, but for now it seems like it's not the path that the Intel team wants to take.
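[For reference, the modifier folding mentioned earlier in the thread ("clamp(x, 0.0, 1.0) => mov.sat x" and "clamp(x, 0.25, 1.0) => max.sat(x, 0.25)") is a simple peephole once the IR can express a saturate modifier. Below is a hedged C sketch on a made-up mini-IR instruction struct; none of these names are NIR's actual representation:]

```c
#include <stdbool.h>

/* Made-up mini-IR, just to illustrate the peephole -- not NIR. */
enum opcode { OP_MOV, OP_MAX, OP_CLAMP };

struct instr {
    enum opcode op;
    float lo, hi;  /* clamp bounds, meaningful when op == OP_CLAMP */
    bool saturate; /* the .sat destination modifier */
};

/* clamp(x, 0.0, 1.0) => mov.sat x
 * clamp(x, lo,  1.0) => max.sat(x, lo), for lo strictly inside (0, 1).
 * Anything else is left alone. */
static void fold_saturate(struct instr *in)
{
    if (in->op != OP_CLAMP || in->hi != 1.0f)
        return;
    if (in->lo == 0.0f) {
        in->op = OP_MOV;                 /* whole clamp becomes the modifier */
    } else if (in->lo > 0.0f && in->lo < 1.0f) {
        in->op = OP_MAX;                 /* lo stays as the max() operand */
    } else {
        return;                          /* bounds don't match: no fold */
    }
    in->saturate = true;
}
```

[The point Connor makes is that with modifiers as first-class IR state this fold is generic, whereas in LLVM it would have to be expressed via backend intrinsics during instruction selection.]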
> >> >> >>> > >> >> >>> Connor > >> >> >>> _______________________________________________ > >> >> >>> mesa-dev mailing list > >> >> >>> mesa-dev@lists.freedesktop.org > >> >> >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > >> >> >> _______________________________________________ > >> >> >> mesa-dev mailing list > >> >> >> mesa-dev@lists.freedesktop.org > >> >> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev