NIR (pronounced "ner") is a new IR (intermediate representation) for the Mesa shader compiler that will sit between the old IR (GLSL IR) and the back-end compilers. The primary purpose of NIR is to make optimizations easier to write and to generate better code for the back-ends. We have a lot of optimizations implemented in GLSL IR right now. However, we still generate fairly bad code, primarily because GLSL IR's tree-based structure makes writing good optimizations difficult. For this reason, we have implemented a lot of optimizations in the i965 back-end compilers just to fix up the code we get from GLSL IR. The "proper fix" for this is to implement a better high-level IR; enter NIR.
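To make the point about tree-based versus flat IRs a bit more concrete, here is a toy sketch of dead code elimination over a straight-line SSA program. Everything in it (`struct toy_instr`, `toy_dce()`, the opcode names) is made up for illustration; it is not NIR's data structures or its actual DCE pass, and it ignores control flow entirely. The only thing it is meant to show is that, once every value has exactly one definition and uses refer to that definition directly, a pass like this is a single backward walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy SSA instruction (hypothetical, not NIR's nir_instr): every value is
 * defined exactly once and sources name earlier instructions by index. */
enum toy_op { TOY_CONST, TOY_ADD, TOY_MUL, TOY_STORE_OUTPUT };

struct toy_instr {
   enum toy_op op;
   int src[2];   /* index of the defining instruction, or -1 if unused */
   bool live;    /* filled in by toy_dce() */
};

/* Mark live instructions in a straight-line SSA program.  Only output
 * stores are roots; anything they (transitively) read is live as well.
 * Returns the number of live instructions. */
static size_t
toy_dce(struct toy_instr *instrs, size_t count)
{
   size_t num_live = 0;

   for (size_t i = 0; i < count; i++)
      instrs[i].live = (instrs[i].op == TOY_STORE_OUTPUT);

   /* Defs come before uses, so one backward walk sees every use of a
    * value before it reaches that value's definition. */
   for (size_t i = count; i-- > 0; ) {
      if (!instrs[i].live)
         continue;
      num_live++;
      for (int s = 0; s < 2; s++) {
         int src = instrs[i].src[s];
         if (src < 0)
            continue;
         assert((size_t)src < i);
         instrs[src].live = true;
      }
   }
   return num_live;
}

/* %0 = 2; %1 = 3; %2 = %0 + %1; %3 = %0 * %1; store_output %2
 * Nothing reads %3, so it is the only dead instruction. */
static size_t
toy_example(void)
{
   struct toy_instr prog[] = {
      { TOY_CONST,        { -1, -1 }, false },
      { TOY_CONST,        { -1, -1 }, false },
      { TOY_ADD,          {  0,  1 }, false },
      { TOY_MUL,          {  0,  1 }, false },
      { TOY_STORE_OUTPUT, {  2, -1 }, false },
   };
   return toy_dce(prog, sizeof(prog) / sizeof(prog[0]));
}
```

Calling `toy_example()` marks four of the five instructions live and leaves the unused multiply dead. A real pass also has to deal with phis, loops, and instructions with side effects, which this sketch leaves out on purpose.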
Most of the initial work on NIR, including setting up common data structures, helper methods, and a few basic passes, was done by Connor Abbott, who interned with us over the summer. Connor did a fantastic job, but there is still a lot left to be done. I've spent the last two months trying to fill in the pieces that we need in order to get NIR off the ground. At this point, we have competent into-SSA and out-of-SSA passes, are at zero piglit regressions for i965 SIMD8 fragment shaders, and the shader-db numbers aren't terrible.

This is still a bit experimental. I have been testing only on HSW, but it should work OK on SNB and later. Eventually, once we get booleans fixed up, it should work fine on older chips as well. It also doesn't yet support SIMD16, so performance won't be that great. That said, I think we are at the point now where we should try to land this so that I can stop developing in my massive private branch. Since this isn't quite ready for prime-time yet, using it requires setting the INTEL_USE_NIR environment variable.

A few key points about NIR:

 1. It is primarily an SSA-based IR.
 2. It supports source/destination-modifiers and swizzles/*write-masks.
 3. Standard GPU operations such as sin() and fmad() are first-class ALU operations, not intrinsics.
 4. GLSL concepts like inputs, outputs, uniforms, etc. are built into the IR so we can do proper analysis on them.
 5. Even though it's SSA, it still has a concept of registers and write-masks in the core IR data structures. This means we can generate code that is much closer to what backends want.
 6. Control flow is structured explicitly in the IR.

(*write-masks are not available for SSA values)

While source/destination modifiers and write-masks/swizzles are not particularly useful for optimizations, having them represented in the IR gives us the ability to generate more useful code for backends.

A few notes about review:

 1. For those of you who aren't interested in the general compiler, I'm sorry for the patch-bomb. However, several people have requested that we maintain the history of the NIR development since Connor's original drop at the end of the summer. Therefore, while I've squashed several things, I've tried to leave the diff of what I've done more-or-less preserved.
 2. No, this is not LLVM. There was a long-winded discussion about that when Connor dropped his patches that went a whole lot of nowhere as usual. I would really prefer if we left that debate alone. If there must be bikeshedding on the topic, please do so on the cover-letter e-mail.
 3. Please keep all bikeshedding about C++, typedefs, etc. on the core datastructures e-mail. If we need, we can split that off into its own thread.
 4. While I welcome review, I don't plan to make non-trivial changes to specific patches or squash anything beyond what has already been squashed. I've tried thus far to more-or-less keep the history and I'd like to continue this if we can.
 5. Eric Anholt has also written NIR -> TGSI -> NIR passes which will hopefully get landed soon after NIR initially lands. Exactly how that all gets hooked up for other gallium drivers beyond vc4 is outside the scope of this series.

I have pushed a branch to my personal freedesktop.org account. For certain types of review, it may be easier to look at the end result rather than the patches. The branch can be found via freedesktop cgit here:

http://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/nir-v1

Last week, I did a presentation for some of the other Intel people to try and help bring them up to speed on NIR concepts quickly. As part of this, I typed up a bunch of notes that provide a decent overview of a lot of NIR concepts. Those notes can be found here:

http://www.jlekstrand.net/jason/projects/mesa/nir-notes/

Happy reviewing!

P.S. Connor, Don't do too much reviewing before your finals are done.
:-P

Connor Abbott (22):
  exec_list: add a list_foreach_typed_reverse() macro
  nir: add initial README
  nir: add a simple C wrapper around glsl_types.h
  nir: add the core datastructures
  nir: add core helper functions
  nir: add a printer
  nir: add a validation pass
  nir: add a glsl-to-nir pass
  nir: add a pass to lower variables for scalar backends
  nir: keep track of the number of input, output, and uniform slots
  nir: add a pass to remove unused variables
  nir: add a pass to lower sampler instructions
  nir: add a pass to lower system value reads
  nir: add a pass to lower atomics
  nir: add an optimization to turn global registers into local registers
  nir: calculate dominance information
  nir: add a pass to convert to SSA
  nir: add an SSA-based copy propagation pass
  nir: add an SSA-based dead code elimination pass
  i965/fs: make emit_fragcoord_interpolation() more general
  i965/fs: Don't pass through the coordinate type
  i965/fs: add a NIR frontend

Jason Ekstrand (101):
  i965/fs: Only use nir for 8-wide non-fast-clear shaders.
  i965/fs_nir: Make the sampler register always unsigned
  i965/fs_nir: Use the correct types for texture inputs
  i965/fs_nir: Use the correct texture offset immediate
  Fix what I think are a few NIR typos
  Fix up varying pull constants
  i965/fs_nir: Add support for sample_pos and sample_id
  nir/glsl: Add support for saturate
  nir: Add fine and coarse derivative opcodes
  nir/glsl: Add support for coarse and fine derivatives
  i965/fs_nir: Handle coarse/fine derivatives
  nir/lower_atomics: Multiply array offsets by ATOMIC_COUNTER_SIZE
  i965/fs_nir: Add atomic counters support
  i965/fs: Allow reinterpretation in constant propagation
  nir: Add NIR_TRUE and NIR_FALSE constants and use them for boolean immediates
  nir: Add intrinsics to do alternate interpolation on inputs
  i965/fs: Don't take an ir_variable for emit_general_interpolation
  i965/fs_nir: Don't duplicate emit_general_interpolation
  nir: Add a naieve from-SSA pass
  nir: Add a lower_vec_to_movs pass
  i965/fs_nir: Convert the shader to/from SSA
  nir/lower_variables_scalar: Silence a compiler warning
  nir: Add a basic metadata management system
  nir: Add an assert
  nir/foreach_block: Return false if the callback on the last block fails
  nir: Add a foreach_block_reverse function
  nir: Add a function to detect if a block is immediately followed by an if
  nir: Make the nir_index_* functions return the nuber of items
  nir: Add an SSA-based liveness analysis pass.
  nir: Add an initialization function for SSA definitions
  nir: Automatically handle SSA uses when an instruction is inserted
  nir: Add a function for rewriting all the uses of a SSA def
  nir: Add a parallel copy instruction type
  nir: Add a function for comparing two sources
  nir: Add a better out-of-SSA pass
  i965/fs_nir: Do retyping for ALU srouces in get_nir_alu_src
  glsl/list: Fix the exec_list_validate function
  nir: Validate all lists in the validator
  nir/print: Don't reindex things
  nir: Differentiate between signed and unsigned versions of find_msb
  i965/fs_nir: Validate optimization passes
  nir/nir: Fix a bug in move_successors
  glsl/list: Add a foreach_list_typed_safe_reverse macro
  nir/nir: Use safe iterators when iterating over the CFG
  nir/nir: Patch up phi predecessors in move_successors
  nir: Add a peephole select optimization
  i965/fs_nir: Turn on the peephole select optimization
  nir: Validate that the SSA def and register indices are unique
  nir: Add a fused multiply-add peephole
  nir: Add a basic CSE pass
  i965/fs_nir: Add the CSE pass and actually run in a loop
  i965/fs_nir: Use an array rather than a hash table for register lookup
  i965/fs_nir: Handle SSA constants
  i965/fs_nir: Properly saturate multiplies
  nir: Add a helper for rewriting an instruction source
  nir/lower_samplers: Use the nir_instr_rewrite_src function
  nir: Clean up nir_deref helper functions
  nir: Make array deref direct vs. indirect an enum
  nir: Add a concept of a wildcard array dereference
  nir: Use an integer index for specifying structure fields
  nir: Don't require a function in ssa_def_init
  nir/copy_propagate: Don't cause size mismatches on phi node sources
  nir: Validate that the sources of a phi have the same size as the destination
  nir/glsl: Don't allocate a state_slots array for 0 state slots
  i965/fs_nir: Don't dump the shader.
  nir: Use the enum for the variable mode
  nir: Automatically update SSA if uses
  nir: Add a copy splitting pass
  nir: Add a pass to lower local variable accesses to SSA values
  nir: Add a pass to lower local variables to registers
  nir: Add a pass for lowering input/output loads/stores
  nir: Add a pass to lower global variables to local variables
  nir/glsl: Generate SSA NIR
  i965/fs_nir: Use the new variable lowering code
  nir/validate: Ensure that outputs are write-only and inputs are read-only
  nir: Remove the old variable lowering code
  nir: Vectorize intrinsics
  nir/validate: Validate intrinsic source/destination sizes
  nir: Add gpu_shader5 interpolation intrinsics
  nir/glsl: Add support for gpu_shader5 interpolation instrinsics
  nir: Add a helper for getting a constant value from an SSA source
  i965/fs_nir: Add a has_indirect flag and clean up some of the input/output code
  i965/fs_nir: Implement the ARB_gpu_shader5 interpolation intrinsics
  nir: Add neg, abs, and sat opcodes
  nir: Add a lowering pass for adding source modifiers where possible
  nir: Make the type casting operations static inline functions
  nir/glsl: Emit abs, neg, and sat operations instead of source modifiers
  nir: Add an expression matching framework
  nir: Add infastructure for generating algebraic transformation passes
  nir: Add an algebraic optimization pass
  nir: Add a basic constant folding pass
  nir: Remove the ffma peephole
  nir: Make texture instruction names more consistent
  nir: Constant fold array indirects
  nir: Use a source for uniform buffer indices instead of an index
  nir: Add a sampler index indirect to nir_tex_instr
  nir: Rework the way samplers are lowered
  i965/fs_nir: Add support for indirect texture arrays
  nir/metadata: Rename metadata_dirty to metadata_preserve
  nir: Call nir_metadata_preserve more places
  nir: Make bcsel a fully vector operation

 src/glsl/Makefile.am                             |   10 +-
 src/glsl/Makefile.sources                        |   39 +-
 src/glsl/list.h                                  |   19 +-
 src/glsl/nir/README                              |  118 ++
 src/glsl/nir/glsl_to_nir.cpp                     | 1825 +++++++++++++++++
 src/glsl/nir/glsl_to_nir.h                       |   40 +
 src/glsl/nir/nir.c                               | 2042 ++++++++++++++++++++
 src/glsl/nir/nir.h                               | 1433 ++++++++++++++
 src/glsl/nir/nir_algebraic.py                    |  249 +++
 src/glsl/nir/nir_dominance.c                     |  298 +++
 src/glsl/nir/nir_from_ssa.c                      |  859 ++++++++
 src/glsl/nir/nir_intrinsics.c                    |   49 +
 src/glsl/nir/nir_intrinsics.h                    |  140 ++
 src/glsl/nir/nir_live_variables.c                |  282 +++
 src/glsl/nir/nir_lower_atomics.c                 |  146 ++
 src/glsl/nir/nir_lower_global_vars_to_local.c    |  107 +
 src/glsl/nir/nir_lower_io.c                      |  324 ++++
 src/glsl/nir/nir_lower_locals_to_regs.c          |  308 +++
 src/glsl/nir/nir_lower_samplers.cpp              |  181 ++
 src/glsl/nir/nir_lower_system_values.c           |  107 +
 src/glsl/nir/nir_lower_to_source_mods.c          |  181 ++
 src/glsl/nir/nir_lower_variables.c               | 1046 ++++++++++
 src/glsl/nir/nir_lower_vec_to_movs.c             |   96 +
 src/glsl/nir/nir_metadata.c                      |   54 +
 src/glsl/nir/nir_opcodes.c                       |   46 +
 src/glsl/nir/nir_opcodes.h                       |  356 ++++
 src/glsl/nir/nir_opt_algebraic.py                |   67 +
 src/glsl/nir/nir_opt_constant_folding.c          |  355 ++++
 src/glsl/nir/nir_opt_copy_propagate.c            |  325 ++++
 src/glsl/nir/nir_opt_cse.c                       |  269 +++
 src/glsl/nir/nir_opt_dce.c                       |  186 ++
 src/glsl/nir/nir_opt_global_to_local.c           |  103 +
 src/glsl/nir/nir_opt_peephole_select.c           |  214 ++
 src/glsl/nir/nir_print.c                         |  948 +++++++++
 src/glsl/nir/nir_remove_dead_variables.c         |  138 ++
 src/glsl/nir/nir_search.c                        |  337 ++++
 src/glsl/nir/nir_search.h                        |   80 +
 src/glsl/nir/nir_split_var_copies.c              |  225 +++
 src/glsl/nir/nir_to_ssa.c                        |  660 +++++++
 src/glsl/nir/nir_types.cpp                       |  143 ++
 src/glsl/nir/nir_types.h                         |   75 +
 src/glsl/nir/nir_validate.c                      |  912 +++++++++
 src/mesa/drivers/dri/i965/Makefile.sources       |    1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp             |   74 +-
 src/mesa/drivers/dri/i965/brw_fs.h               |   57 +-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp |    4 +-
 src/mesa/drivers/dri/i965/brw_fs_fp.cpp          |   32 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp         | 1778 +++++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp     |   39 +-
 src/mesa/main/bitset.h                           |    1 +
 50 files changed, 17301 insertions(+), 77 deletions(-)
 create mode 100644 src/glsl/nir/README
 create mode 100644 src/glsl/nir/glsl_to_nir.cpp
 create mode 100644 src/glsl/nir/glsl_to_nir.h
 create mode 100644 src/glsl/nir/nir.c
 create mode 100644 src/glsl/nir/nir.h
 create mode 100644 src/glsl/nir/nir_algebraic.py
 create mode 100644 src/glsl/nir/nir_dominance.c
 create mode 100644 src/glsl/nir/nir_from_ssa.c
 create mode 100644 src/glsl/nir/nir_intrinsics.c
 create mode 100644 src/glsl/nir/nir_intrinsics.h
 create mode 100644 src/glsl/nir/nir_live_variables.c
 create mode 100644 src/glsl/nir/nir_lower_atomics.c
 create mode 100644 src/glsl/nir/nir_lower_global_vars_to_local.c
 create mode 100644 src/glsl/nir/nir_lower_io.c
 create mode 100644 src/glsl/nir/nir_lower_locals_to_regs.c
 create mode 100644 src/glsl/nir/nir_lower_samplers.cpp
 create mode 100644 src/glsl/nir/nir_lower_system_values.c
 create mode 100644 src/glsl/nir/nir_lower_to_source_mods.c
 create mode 100644 src/glsl/nir/nir_lower_variables.c
 create mode 100644 src/glsl/nir/nir_lower_vec_to_movs.c
 create mode 100644 src/glsl/nir/nir_metadata.c
 create mode 100644 src/glsl/nir/nir_opcodes.c
 create mode 100644 src/glsl/nir/nir_opcodes.h
 create mode 100644 src/glsl/nir/nir_opt_algebraic.py
 create mode 100644 src/glsl/nir/nir_opt_constant_folding.c
 create mode 100644 src/glsl/nir/nir_opt_copy_propagate.c
 create mode 100644 src/glsl/nir/nir_opt_cse.c
 create mode 100644 src/glsl/nir/nir_opt_dce.c
 create mode 100644 src/glsl/nir/nir_opt_global_to_local.c
 create mode 100644 src/glsl/nir/nir_opt_peephole_select.c
 create mode 100644 src/glsl/nir/nir_print.c
 create mode 100644 src/glsl/nir/nir_remove_dead_variables.c
 create mode 100644 src/glsl/nir/nir_search.c
 create mode 100644 src/glsl/nir/nir_search.h
 create mode 100644 src/glsl/nir/nir_split_var_copies.c
 create mode 100644 src/glsl/nir/nir_to_ssa.c
 create mode 100644 src/glsl/nir/nir_types.cpp
 create mode 100644 src/glsl/nir/nir_types.h
 create mode 100644 src/glsl/nir/nir_validate.c
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_nir.cpp

--
2.2.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev