On Fri, Feb 7, 2014 at 12:34 AM, Connor Abbott <cwabbo...@gmail.com> wrote:
> Hi,
>
> So I believe that we can all agree that the tree-based representation
> that GLSL IR currently uses for shaders needs to go. For the benefit
> of those that didn't watch Ian Romanick's talk at FOSDEM, I'll
> reiterate some of the problems with it as of now:
>
> - All the ir_dereference chains blow up the memory usage, and the
> constant pointer chasing in the recursive algorithms needed to handle
> them is not just cache-unfriendly but "cache-mean."
>
> - The ir_hierarchical_visitor pattern that we currently use for
> optimization/analysis passes has to examine every piece of IR, even
> the irrelevant stuff, making the above problems even worse.
>
> - Nobody else does it this way, meaning that the existing well-known
> optimizations don't apply as much here, and oftentimes we have to
> write some pretty nasty code in order to make necessary optimizations
> (like tree grafting).
>
> - It turns out that the original advantage of a tree-based IR, to be
> able to automatically generate pattern-matching code for optimizing
> certain code patterns, only really matters for CPUs with weird
> instruction sets with lots of exotic instructions; GPUs tend to be
> pretty regular and consistent in their ISAs, so being able to
> pattern-match with trees doesn't help us much here.
>
> Finally, it seems like a lot of important SSA-based passes assume that
> we have a flat IR, and so moving to SSA won't be nearly as beneficial
> as we would like it to be; we could flatten the IR before doing these
> passes, but that would make the first problem even worse. So we can't
> really take advantage of SSA too much either until we have a flat IR.
>
> The real issue is, how do we let this transition occur gradually, in
> pieces, without breaking existing code? Ian proposed one solution at
> FOSDEM, but here's my idea of another.
>
> So, my idea is that rather than slowly introducing changes across the
> board, we create the IR in its final form in the beginning, write
> passes to flatten and unflatten the IR, and then piece-by-piece
> rewrite the rest of the compiler. We're going to have to rewrite a lot
> of the passes to support SSA in the first place, so why not convert
> them to a flat IR while we're at it? The benefit of this is that it's
> much easier to do asynchronously and in parallel; rather than
> introducing changes to the entire thing at once, several people can
> convert this and that pass, the frontend, the linker, etc.
> independently. It would entail some extra overhead during the
> transition in the form of the flattening and unflattening passes, but
> I think it would be worth it for the immediate benefits (optimizations
> like GVN-GCM and CSE made possible, etc.).
>
> The first part to be converted would be my passes to convert to and
> from SSA, so that the compiler optimization part would look like this:
>
>     flatten -> convert to SSA -> (the new hotness) -> out of SSA ->
>     unflatten -> (the old stuff)
>
> Then we gradually convert ast_to_hir, various passes, the linker,
> backends, etc. to this form while now actually having the
> infrastructure to implement any advanced compiler optimization
> designed in the last ~15 years or so by more-or-less copying down the
> pseudocode. Hopefully, then, we can reach a point where we can rip out
> the old IR and the converters.
>
> So what would this new IR look like? Well, here's my 2 cents (in the
> form of some abridged class definitions, you should get the point...)
>
> struct ir_calc_source
> {
>     mode; /**< SSA or non-SSA */
>     union {
>         ir_calculation *def; /**< for SSA sources */
>         unsigned int reg; /**< for non-SSA sources */
>     } src;
>     unsigned swizzle : 8;
> };
>
> struct ir_calc_dest
> {
>     mode; /**< SSA or non-SSA */
>     union {
>         unsigned int reg; /**< for non-SSA destinations */
>
>         /**
>          * For SSA destinations. Types are needed here because normally
>          * they're part of the register, but SSA doesn't have registers.
>          */
>         glsl_type *type;
>     } reg_or_type; /* this name is kinda ugly but couldn't think of anything better. */
> };
>
> /*
>  * This is Ian's name for it, personally I would vote for
>  * s/ir_instruction/ir_node/ and call this ir_instruction
>  */
>
> class ir_calculation
> {
>     ir_calc_dest dest;
>     ir_expression_operation op;
>     unsigned write_mask : 4;
>     ir_calc_source srcs[4];
> };
>
> class ir_load_var
> {
>     ir_calc_dest dest;
>     ir_variable *src;
>
>     /**
>      * For array and record loads, whether we're loading a specific
>      * member or the whole thing.
>      */
>     bool deref_member;
>     ir_calc_source array_index; /**< for array loads if deref_member is true */
>     char *record_index; /**< for structure loads */
> };
>
> class ir_store_var
> {
>     ir_variable *dest;
>     ir_calc_source src;
>     bool deref_member;
>     ir_calc_source array_index; /**< for array stores */
>     char *record_index; /**< for structure stores */
>     unsigned write_mask : 4;
> };
>
> So ir_variable still exists, but it will only be used for function
> parameters, shader in/outs and uniforms, and arrays and structures.
> Registers will be much more lightweight, only requiring a table with
> each register's type and perhaps uses and definitions. The flattening
> pass, and later ast_to_hir, will emit loads and stores wherever there
> is an ir_dereference now, but there will be an ir_variable -> register
> pass that converts these to moves that will later be eliminated by
> copy propagation (in SSA form, after converting the registers to SSA
> writes). This is similar to how LLVM works, with everything starting
> out allocated on the stack using alloca (equivalent to ir_variables
> here) and accessed explicitly using loads and stores, but then some of
> these loads/stores are optimized out.
>
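
For anyone trying to picture the flattening step described above, here is a minimal sketch of turning a small expression tree such as (a + b) * c into a flat instruction list, emitting a load wherever the tree has a variable dereference. Everything in it (TreeNode, FlatInstr, deref, binop, flatten) is a hypothetical stand-in made up for illustration; none of it is Mesa's actual code or the classes proposed in the quoted mail.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified tree IR: a variable dereference or a binary op.
struct TreeNode {
    enum Kind { VarDeref, Add, Mul } kind;
    std::string var;                     // used when kind == VarDeref
    std::unique_ptr<TreeNode> lhs, rhs;  // used for Add / Mul
};

// Hypothetical flat instruction: writes one non-SSA register.
struct FlatInstr {
    enum Op { LoadVar, Add, Mul };
    Op op = LoadVar;
    unsigned dest = 0;             // register written by this instruction
    std::string var;               // source variable for LoadVar
    unsigned src0 = 0, src1 = 0;   // source registers for Add / Mul
};

// Helper: build a variable-dereference leaf.
std::unique_ptr<TreeNode> deref(const std::string &name) {
    auto n = std::make_unique<TreeNode>();
    n->kind = TreeNode::VarDeref;
    n->var = name;
    return n;
}

// Helper: build a binary-operation node.
std::unique_ptr<TreeNode> binop(TreeNode::Kind kind,
                                std::unique_ptr<TreeNode> lhs,
                                std::unique_ptr<TreeNode> rhs) {
    assert(kind != TreeNode::VarDeref);
    auto n = std::make_unique<TreeNode>();
    n->kind = kind;
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// Post-order walk: operands are emitted before the instruction that uses
// them. Returns the register that holds the value of `node`.
unsigned flatten(const TreeNode &node, std::vector<FlatInstr> &out,
                 unsigned &next_reg) {
    FlatInstr instr;
    if (node.kind == TreeNode::VarDeref) {
        // A dereference in the tree becomes an explicit load.
        instr.op = FlatInstr::LoadVar;
        instr.var = node.var;
    } else {
        instr.src0 = flatten(*node.lhs, out, next_reg);
        instr.src1 = flatten(*node.rhs, out, next_reg);
        instr.op = (node.kind == TreeNode::Add) ? FlatInstr::Add
                                                : FlatInstr::Mul;
    }
    instr.dest = next_reg++;
    out.push_back(instr);
    return instr.dest;
}
```

With this sketch, (a + b) * c comes out as three loads, one add and one multiply, each writing a fresh register. A later pass of the kind the quoted mail describes could then turn the loads into moves and let copy propagation remove them.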

What about just moving to LLVM directly? We already use it for
compute/OpenCL on gallium and as the shader compiler for radeon
hardware and llvmpipe.

Alex
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev