Connor Abbott <cwabbo...@gmail.com> writes:

> On Tue, Aug 19, 2014 at 11:33 PM, Francisco Jerez <curroje...@riseup.net> wrote:
>> Connor Abbott <cwabbo...@gmail.com> writes:
>>
>>> On Tue, Aug 19, 2014 at 11:40 AM, Francisco Jerez <curroje...@riseup.net> wrote:
>>>> Tom Stellard <t...@stellard.net> writes:
>>>>
>>>>> On Tue, Aug 19, 2014 at 11:04:59AM -0400, Connor Abbott wrote:
>>>>>> On Mon, Aug 18, 2014 at 8:52 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>> > On 19.08.2014 01:28, Connor Abbott wrote:
>>>>>> >> On Mon, Aug 18, 2014 at 4:32 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>> >>> On 16.08.2014 09:12, Connor Abbott wrote:
>>>>>> >>>> I know what you might be thinking right now. "Wait, *another* IR? Don't we already have like 5 of those, not counting all the driver-specific ones? Isn't this stuff complicated enough already?" Well, there are some pretty good reasons to start afresh (again...). In the years we've been using GLSL IR, we've come to realize that, in fact, it's not what we want *at all* to do optimizations on.
>>>>>> >>>
>>>>>> >>> Did you evaluate using LLVM IR instead of inventing yet another one?
>>>>>> >>>
>>>>>> >>> --
>>>>>> >>> Earthling Michel Dänzer | http://www.amd.com
>>>>>> >>> Libre software enthusiast | Mesa and X developer
>>>>>> >>
>>>>>> >> Yes. See
>>>>>> >>
>>>>>> >> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053502.html
>>>>>> >>
>>>>>> >> and
>>>>>> >>
>>>>>> >> http://lists.freedesktop.org/archives/mesa-dev/2014-February/053522.html
>>>>>> >
>>>>>> > I know Ian can't deal with LLVM for some reason. I was wondering if *you* evaluated it, and if so, why you rejected it.
>>>>>> >
>>>>>> > --
>>>>>> > Earthling Michel Dänzer | http://www.amd.com
>>>>>> > Libre software enthusiast | Mesa and X developer
>>>>>>
>>>>>> Well, first of all, the fact that Ian and Ken don't want to use it means that any plan to use LLVM for the Intel driver is dead in the water anyways - you can translate NIR into LLVM if you want, but for i965 we want to share optimizations between our 2 backends (FS and vec4) that we can't do today in GLSL IR, so this is what we want to use for that, and since nobody else does anything with the core GLSL compiler except when they have to, when we start moving things out of GLSL IR this will probably replace GLSL IR as the infrastructure that all Mesa drivers use. But with that in mind, here are a few reasons why we wouldn't want to use LLVM:
>>>>>>
>>>>>> * LLVM wasn't built to understand structured CFGs, meaning that you need to re-structurize it using a pass that's fragile and prone to break if some other pass "optimizes" the shader in a way that makes it non-structured (i.e. not expressible in terms of loops and if statements). This loss of information also means that passes that need to know things like, for example, the loop nesting depth have to run an analysis pass, whereas with NIR you can just walk up the control flow tree and count the number of loops we hit.
>>>>>
>>>>> LLVM has a pass to structurize the CFG. We use it in the radeon drivers, and it is run after all of the other LLVM optimizations, which have no concept of structured CFG. It's not bug free, but it works really well even with all of the complex OpenCL kernels we throw at it.
>>>>>
>>>>> Your point about losing information when the CFG is de-structurized is valid, but for things like loop depth, I'm not sure why we couldn't write an LLVM analysis pass for this (if one doesn't already exist).
>>>>>
>>>> I don't think this is such a big deal either. At least the structurization pass used on newer AMD hardware isn't "fragile" in the way you seem to imply -- AFAIK (unlike the old AMDIL heuristic algorithm) it's guaranteed to give you a valid structurized output no matter what the previous optimization passes have done to the CFG, modulo bugs. I admit that the situation is nevertheless suboptimal. Ideally this information wouldn't get lost along the way. For the long term we may want to represent structured control flow directly in the IR as you say, I just don't see how reinventing the IR saves us any work if we could just fix the existing one.
>>>
>>> It seems to me that something like how we represent control flow is a pretty fundamental part of the IR - it affects any optimization pass that needs to do anything beyond adding and removing instructions. How would you fix that, especially given that LLVM is primarily designed for CPUs, where you don't want to be restricted to structured control flow at all? It seems like our goals (preserve the structure) conflict with the way LLVM has been designed.
>>>
>> I think we can fix this by introducing new structured variants of the branch instruction in a way that doesn't alter the fundamental structure of the IR. E.g. an if branch could look like:
>>
>> ifbr i1 <cond>, label <iftrue>, label <iffalse>, label <join>
>>
>> where both branches are guaranteed to converge at <join>. Sure, this will require fixing many assumptions, but on the one hand it's not immediately required (as we can address this problem for the time being using the same solution AMD uses), and on the other hand it's still less work than starting from scratch.
>
> I disagree with the "less work than starting from scratch" part, especially since it involves modifying LLVM in a pretty invasive way, when we won't even need half of the things that it does for us. LLVM just isn't a solution to everything - there is no one-size-fits-all compiler.
>
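To make the earlier point about loop nesting depth a bit more concrete: in an IR that keeps control flow as a tree of blocks, ifs and loops, the query really is just a walk towards the root. The sketch below uses made-up types and names purely for illustration; it is not the actual NIR (or LLVM) API.

/* Hypothetical structured-IR types, for illustration only. */
#include <stddef.h>

enum cf_node_type {
   CF_NODE_BLOCK,    /* straight-line run of instructions */
   CF_NODE_IF,       /* if/then/else whose branches rejoin at a known point */
   CF_NODE_LOOP,     /* loop construct */
   CF_NODE_FUNCTION  /* root of the control flow tree */
};

struct cf_node {
   enum cf_node_type type;
   struct cf_node *parent;   /* NULL only for the function root */
};

/* Loop nesting depth = number of enclosing loop nodes; no separate
 * analysis pass, and nothing to invalidate when other passes run. */
static unsigned
loop_nesting_depth(const struct cf_node *node)
{
   unsigned depth = 0;
   for (const struct cf_node *n = node->parent; n != NULL; n = n->parent) {
      if (n->type == CF_NODE_LOOP)
         depth++;
   }
   return depth;
}

With an unstructured CFG the same information has to come from a loop analysis (which, as Tom says, could be written for LLVM if it doesn't exist already), and it has to be recomputed or kept valid as other passes rewrite the CFG.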
*Shrug* That's quite a strong statement. Honestly I haven't ruled out the possibility of coming up with a decent IR by ourselves yet, but at this point I feel like improving the LLVM framework to make it more suitable for GPUs would be a much more promising use of my time than working on NIR -- even if starting from scratch sounds like a lot more fun.

>>>>>> * LLVM doesn't do modifiers, meaning that we can't do optimizations like "clamp(x, 0.0, 1.0) => mov.sat x" and "clamp(x, 0.25, 1.0) => max.sat(x, .25)" in a generic fashion.
>>>>>
>>>>> The way to handle this with LLVM would be to add intrinsics to represent the various modifiers and then fold them into instructions during instruction selection.
>>>>>
>>>> IMHO this is a feature. One of the things I don't like about NIR is that it's still vec4-centric. Most drivers are going to want something else, and different from each other; we cannot please all of them with one single vector addressing model built into the core instruction set, so I'd rather have modifiers, writemasks and swizzles represented as the composition of separate instructions/intrinsics with simple and well-defined semantics, which can be coalesced back into the real instruction as Tom says (easy even if you don't use LLVM's instruction selector, as long as it's in SSA form).
>>>
>>> While NIR is vec4-centric, nothing's stopping you from splitting up instructions and doing optimizations at the scalar level for scalar ISAs - in fact, that's what I expect to happen. And for backends that really do need to have swizzles and writemasks, coalescing these things back into the original instruction is not at all trivial.
>>
>> It's a simple peephole optimization AFAICT:
>>
>> val2 = alu-op(modifier(val1)) -> hardware-specific-extended-alu-op(val)
>> val2 = shuffle(val2, alu-op(val1)) -> hardware-specific-alu-op-with-writemask(val2, val1)
>
> No, it's not. Imagine something like:
>
> vec4 foo = ...
> vec4 bar = ...
> vec4 baz = vec4(foo.xy, bar.zw)
> ... = foo
> ... = bar
> ... = baz
>
> where the vec4() is the shuffle instruction. In this case, you can't eliminate the shuffle - you need to insert writemasked moves when you come out of SSA:
>
> vec4 foo = ...
> vec4 bar = ...
> baz.xy = foo.xy
> baz.zw = bar.zw
>
> This basically comes down to something analogous to a register allocation problem, where in this case the scalar components that we want to put into a single vec4 (foo, bar, and baz) can't fit - we need to "spill" by inserting copies. Then, once we've done this, we have to convert it into a non-SSA form with registers, writemasks, and swizzles - something that would be easy to do in the IR -> backend translation if it really were just a simple peephole, but in this case it's not, and so you either have to consult the result of your analysis during the translation or have an IR that can represent swizzles, writemasks, and non-SSA registers for you like NIR does. Of course, LLVM will help with none of this because its vectorization model is built around CPU vector processors like SSE, NEON, etc., and so AFAIK it has no concept of per-component liveness, and even if it did, this stuff is intimately tied to the out-of-SSA process itself so we would basically have to write it from scratch anyways.
>
I think you keep mixing two unrelated problems:

1/ How we represent vector addressing, writemasks and modifiers in the core IR.

2/ How we bring vector operations back into non-SSA form.

Re 1, you propose making the vec4 model a central part of the IR rather than using composition of simpler operations. Whatever we do, going from one representation to the other is a simple peephole, which I never meant would be a solution for 2.

Re 2, I agree with you that it would ideally be taken care of by a shared transformation pass because of its complexity, but I disagree that a vec4-centric IR is required for this purpose, or even especially useful, because different hardware has wildly different vector models with different constraints, each requiring a different representation. So I think ideally we would have some mechanism for back-ends to provide their own representation in the form of machine-specific instructions, accompanied by some machine-specific logic.
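For what it's worth, the kind of peephole meant in 1/ is roughly the following: fold a modifier such as a clamp to [0.0, 1.0] back into the instruction that produces its source, which is easy while the code is in SSA form because every value has exactly one definition. The IR types and helper below are hypothetical and only meant to show the shape of the rewrite; they are not NIR's or LLVM's API.

/* Hypothetical SSA ALU instruction, for illustration only. */
#include <stdbool.h>
#include <stddef.h>

enum alu_op { OP_FADD, OP_FMUL, OP_FSAT /* clamp(x, 0.0, 1.0) */ };

struct alu_instr {
   enum alu_op op;
   struct alu_instr *src[2];  /* SSA sources: each points at its single definition */
   bool saturate;             /* hardware result modifier, e.g. mul.sat */
   unsigned num_uses;         /* number of instructions reading this result */
};

/* If 'instr' is a saturate whose source has no other users, fold it into the
 * defining instruction ("fmul + fsat" becomes "fmul.sat") and return the
 * folded instruction; the caller rewrites uses of 'instr' to point at it. */
static struct alu_instr *
fold_saturate(struct alu_instr *instr)
{
   if (instr->op != OP_FSAT)
      return instr;

   struct alu_instr *def = instr->src[0];
   if (def == NULL || def->num_uses != 1)
      return instr;   /* someone else still needs the unclamped value */

   def->saturate = true;
   return def;
}

Whether the folding happens in a pass like this or during LLVM's instruction selection, it stays a local rewrite either way; the genuinely hard part is 2/, bringing vectors back out of SSA, and that is a separate problem.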