On Fri, Feb 07, 2014 at 02:52:15PM -0500, Rob Clark wrote:
> On Fri, Feb 7, 2014 at 11:20 AM, Christian König
> <deathsim...@vodafone.de> wrote:
> > On 07.02.2014 16:49, Alex Deucher wrote:
> >
> >> On Fri, Feb 7, 2014 at 12:34 AM, Connor Abbott <cwabbo...@gmail.com>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> So I believe that we can all agree that the tree-based representation
> >>> that GLSL IR currently uses for shaders needs to go. For the benefit
> >>> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
> >>> reiterate some of the problems with it as of now:
> >>>
> >>> - All the ir_dereference chains blow up the memory usage, and the
> >>> constant pointer chasing in the recursive algorithms needed to handle
> >>> them is not just cache-unfriendly but "cache-mean."
> >>>
> >>> - The ir_hierarchical_visitor pattern that we currently use for
> >>> optimization/analysis passes has to examine every piece of IR, even
> >>> the irrelevant stuff, making the above problems even worse.
> >>>
> >>> - Nobody else does it this way, meaning that the existing well-known
> >>> optimizations don't apply as much here, and oftentimes we have to
> >>> write some pretty nasty code in order to make necessary optimizations
> >>> (like tree grafting).
> >>>
> >>> - It turns out that the original advantage of a tree-based IR, to be
> >>> able to automatically generate pattern-matching code for optimizing
> >>> certain code patterns, only really matters for CPU's with weird
> >>> instruction sets with lots of exotic instructions; GPU's tend to be
> >>> pretty regular and consistent in their ISA's, so being able to
> >>> pattern-match with trees doesn't help us much here.
> >>>
> >>> Finally, it seems like a lot of important SSA-based passes assume that
> >>> we have a flat IR, and so moving to SSA won't be nearly as beneficial
> >>> as we would like it to be; we could flatten the IR before doing these
> >>> passes, but that would make the first problem even worse.
> >>> So we can't really take advantage of SSA too much either until we
> >>> have a flat IR.
> >>>
> >>> The real issue is, how do we let this transition occur gradually, in
> >>> pieces, without breaking existing code? Ian proposed one solution at
> >>> FOSDEM, but here's my idea of another.
> >>>
> >>> So, my idea is that rather than slowly introducing changes across the
> >>> board, we create the IR in its final form in the beginning, write
> >>> passes to flatten and unflatten the IR, and then piece-by-piece
> >>> rewrite the rest of the compiler. We're going to have to rewrite a lot
> >>> of the passes to support SSA in the first place, so why not convert
> >>> them to a flat IR while we're at it? The benefit of this is that it's
> >>> much easier to do asynchronously and in parallel; rather than
> >>> introducing changes to the entire thing at once, several people can
> >>> convert this and that pass, the frontend, the linker, etc.
> >>> independently. It would entail some extra overhead during the
> >>> transition in the form of the flattening and unflattening passes, but
> >>> I think it would be worth it for the immediate benefits (optimizations
> >>> like GVN-GCM and CSE made possible, etc.).
> >>>
> >>> The first part to be converted would be my passes to convert to and
> >>> from SSA, so that the compiler optimization part would look like this:
> >>>
> >>> flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
> >>> unflatten -> (the old stuff)
> >>>
> >>> Then we gradually convert ast_to_hir, various passes, the linker,
> >>> backends, etc. to this form while now actually having the
> >>> infrastructure to implement any advanced compiler optimization
> >>> designed in the last ~15 years or so by more-or-less copying down the
> >>> pseudocode. Hopefully, then, we can reach a point where we can rip out
> >>> the old IR and the converters.
> >>>
> >>> So what would this new IR look like?
> >>> Well, here's my 2 cents (in the form of some abridged class
> >>> definitions, you should get the point...)
> >>>
> >>> struct ir_calc_source
> >>> {
> >>>    mode; /**< SSA or non-SSA */
> >>>    union {
> >>>       ir_calculation *def; /**< for SSA sources */
> >>>       unsigned int reg; /**< for non-SSA sources */
> >>>    } src;
> >>>    unsigned swizzle : 8;
> >>> };
> >>>
> >>> struct ir_calc_dest
> >>> {
> >>>    mode; /**< SSA or non-SSA */
> >>>    union {
> >>>       unsigned int reg; /**< for non-SSA destinations */
> >>>
> >>>       /**
> >>>        * For SSA destinations. Types are needed here because normally
> >>>        * they're part of the register, but SSA doesn't have registers.
> >>>        */
> >>>       glsl_type *type;
> >>>    } reg_or_type; /* this name is kinda ugly but couldn't think of
> >>>                      anything better. */
> >>> };
> >>>
> >>> /*
> >>>  * This is Ian's name for it, personally I would vote for
> >>>  * s/ir_instruction/ir_node/ and call this ir_instruction
> >>>  */
> >>>
> >>> class ir_calculation
> >>> {
> >>>    ir_calc_dest dest;
> >>>    ir_expression_operation op;
> >>>    unsigned write_mask : 4;
> >>>    ir_calc_source srcs[4];
> >>> };
> >>>
> >>> class ir_load_var
> >>> {
> >>>    ir_calc_dest dest;
> >>>    ir_variable *src;
> >>>
> >>>    /**
> >>>     * For array and record loads, whether we're loading a specific
> >>>     * member or the whole thing.
> >>>     */
> >>>    bool deref_member;
> >>>    ir_calc_source array_index; /**< for array loads if
> >>>                                     deref_array_index is true */
> >>>    char *record_index; /**< for structure loads */
> >>> };
> >>>
> >>> class ir_store_var
> >>> {
> >>>    ir_variable *dest;
> >>>    ir_calc_source src;
> >>>    bool deref_member;
> >>>    ir_calc_source array_index; /**< for array loads */
> >>>    char *record_index; /**< for structure loads */
> >>>    unsigned write_mask : 4;
> >>> };
> >>>
> >>> So ir_variable still exists, but it will only be used for function
> >>> parameters, shader in/outs and uniforms, and arrays and structures.
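
In case it helps anyone who wants to experiment with the idea, here is a
minimal compilable sketch of the abridged definitions quoted above. It is
not Connor's code and not anything in Mesa. The value_mode enum (for the
elided "mode" field), the num_srcs field, the type_info and opcode
stand-ins (in place of glsl_type and ir_expression_operation), the 2-bits-
per-component swizzle packing, and the ssa_src/reg_src/make_ssa_calc/
count_uses helpers are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for glsl_type and ir_expression_operation.
enum class base_type { float_t, int_t, bool_t };
struct type_info { base_type base; unsigned components; };
enum class opcode { mov, add, mul };

// The elided "mode" field, spelled out (an assumption).
enum class value_mode { ssa, reg };

struct calculation;

struct calc_source {
   value_mode mode;
   union {
      const calculation *def; // SSA: the instruction defining the value
      unsigned reg;           // non-SSA: index into a register table
   } src;
   uint8_t swizzle;           // assumed packing: 2 bits per component
};

struct calc_dest {
   value_mode mode;
   union {
      unsigned reg;           // non-SSA destination register
      const type_info *type;  // SSA: the type lives on the destination
   } reg_or_type;
};

struct calculation {
   calc_dest dest;
   opcode op;
   unsigned write_mask : 4;
   unsigned num_srcs;         // assumption: not in the quoted definition
   calc_source srcs[4];
};

// x,y,z,w = 0,1,2,3 packed low-to-high, two bits each.
constexpr uint8_t identity_swizzle = 0xE4;

inline calc_source ssa_src(const calculation *def)
{
   calc_source s{};
   s.mode = value_mode::ssa;
   s.src.def = def;
   s.swizzle = identity_swizzle;
   return s;
}

inline calc_source reg_src(unsigned reg)
{
   calc_source s{};
   s.mode = value_mode::reg;
   s.src.reg = reg;
   s.swizzle = identity_swizzle;
   return s;
}

// Build an instruction whose destination is an SSA value of the given type.
inline calculation make_ssa_calc(opcode op, const type_info *type,
                                 const std::vector<calc_source> &srcs)
{
   assert(srcs.size() <= 4);
   calculation c{};
   c.dest.mode = value_mode::ssa;
   c.dest.reg_or_type.type = type;
   c.op = op;
   c.write_mask = 0xF;
   c.num_srcs = static_cast<unsigned>(srcs.size());
   for (unsigned i = 0; i < c.num_srcs; i++)
      c.srcs[i] = srcs[i];
   return c;
}

// Count SSA uses of `def` with one linear scan over a flat instruction
// list: no dereference chains and no recursive visitor.
inline unsigned count_uses(const std::vector<const calculation *> &block,
                           const calculation *def)
{
   unsigned uses = 0;
   for (const calculation *instr : block)
      for (unsigned i = 0; i < instr->num_srcs; i++)
         if (instr->srcs[i].mode == value_mode::ssa &&
             instr->srcs[i].src.def == def)
            uses++;
   return uses;
}
```

With a flat list holding a = mov r0, b = a * a and c = a + b, count_uses
finds three uses of a and one use of b. The point is only that use
information falls out of a plain loop over the instruction list once
sources refer to their definitions directly.
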
> >>> Registers will be much more lightweight, only requiring a table with
> >>> each register's type and perhaps uses and definitions. The flattening
> >>> pass, and later ast_to_hir, will emit loads and stores wherever there
> >>> is an ir_dereference now, but there will be an ir_variable -> register
> >>> pass that converts these to moves that will later be eliminated by
> >>> copy propagation (in SSA form, after converting the registers to SSA
> >>> writes). This is similar to how LLVM works, with everything starting
> >>> out allocated on the stack using alloca (equivalent to ir_variables
> >>> here) and accessed explicitly using loads and stores, but then some of
> >>> these loads/stores are optimized out.
> >>>
> >> What about just moving to llvm directly? We already use it for
> >> compute/OpenCL on gallium and as the shader compiler for radeon
> >> hardware and llvmpipe.
> >
> >
> > That was discussed in the talk as well. LLVM would be a good choice for
> > this, the only problem is that they have no stable API.
> >
> > I'm currently thinking about if it isn't possible to make llvm-c stable and
> > reliable enough to be used for this, but this is rather something we would
> > need to discuss with the LLVM folks as well.
>
> Would the C API be sufficient for a driver that had it's own special
> scheduling or register assignment constraints? Or would it just be
> something we continue to turn into our own driver private IR like we
> currently do with tgsi?
>

You can't really use the C API or even the C++ API for scheduling or
register assignment. You would need to write a full-blown LLVM backend
to do that; otherwise you would still have to lower it into your own
driver-specific IR. The C API gives you the ability to manipulate the
IR and also run generic transforms and optimization passes on your code.
If LLVM IR were used as a target-independent shader IR in Mesa, I think
this would be most of the functionality that was needed.

> Just curious, something more suitable than tgsi would be nice but
> dealing with unstable c++ abi seems like a real pain. Especially on
> slower arm devices if I end up having to recompile llvm all the time.
>

It may be possible to build only a subset of the LLVM libraries to use
with Mesa. If you didn't have to build any of the CodeGen libraries,
then recompiling wouldn't be so bad.

-Tom

> BR,
> -R
>
> > Christian.
> >
> >>
> >> Alex
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev