One other comment. I'm not sure if you've seen it but, if you haven't, you should check out what Connor and the Igalia guys already did for NIR:
https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir_lower_double_ops.c

It's not full soft-float, but there are some very nice algorithms in there for things such as rcp.

On Fri, Mar 3, 2017 at 11:16 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> Hey Elie!
>
> On Fri, Mar 3, 2017 at 8:22 AM, Elie Tournier <tournier.e...@gmail.com>
> wrote:
>
>> From: Elie Tournier <elie.tourn...@collabora.com>
>>
>> This series is based on Ian's work about GL_ARB_gpu_shader_int64 [1].
>> The goal is to expose GL_ARB_shader_fp64 to OpenGL 3.0 GPUs.
>>
>> Each function can be independently tested using shader_runner from piglit.
>> The piglit files are stored on github [2].
>>
>> [1] https://lists.freedesktop.org/archives/mesa-dev/2016-November/136718.html
>> [2] https://github.com/Hopetech/libSoftFloat
>>
>
> Glad to see this finally turning into code.
>
> Before we get too far into things, I'd like to talk about the approach a
> bit. First off, if we (Intel) are going to use this on any hardware, we
> would really like it to be in NIR. The reason for this is that NIR has a
> much more powerful algebraic optimizer than GLSL IR and we would like to
> have as few fp64 instructions as possible before we start lowering them to
> piles of integer math. I believe Ian's plan for this was that someone
> would write a nir_builder back-end for the stand-alone compiler.
> Unfortunately, he sort-of left that as "an exercise to the reader" and no
> code exists to my knowledge. If we're going to write things in GLSL, we
> really need that NIR back-end.
>
> When implementing int64 (which needs similar lowering) for the Vulkan
> driver, I took the opportunity to try doing it directly in nir_builder
> instead of writing back-end code for the stand-alone compiler. All in all,
> I'm fairly happy with the result. You can find my (almost finished) branch
> here:
>
> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/nir-int64
>
> This approach had several advantages:
>
> 1. The compiler does less work. Loops can be automatically unrolled, you
> can choose to use select instead of control-flow, it doesn't generate
> functions that have to be inlined, etc. Now, in GLSL IR, using functions
> may actually be a requirement because it's a tree-based IR and adding stuff
> to the middle of the tree can be tricky. Also, I'm pretty sure they're a
> requirement for control-flow. NIR is flat so it's a bit nicer in that
> regard.
>
> 2. It doesn't require additional compiler infrastructure for converting
> GLSL to compiler code. We've gone back-and-forth over the years about how
> much is too much codegen. At one point, the build process built the GLSL
> compiler and used it to compile GLSL to compiler code for the built-ins and
> then built that into the compiler. The build system for doing this was a
> mess. The result was that Eric wrote ir_builder and all the code was moved
> over to that. A quick look at either GLSL IR or NIR will show you that we
> haven't completely rejected codegen but one always has to ask if it's
> really the best solution. Running the stand-alone compiler to generate
> code and then checking it in isn't a terrible solution, but it does seem
> like it could be at least one too many levels of abstraction.
>
> 3. It's actually less code. The nir_builder code is approximately 50%
> larger than the GLSL code but, because you don't have to add built-in
> functions and do all of the other plumbing per-opcode, it actually ends up
> being smaller. Due to the way vectorization is handled (see next point),
> it also involves a lot less infrastructure in the lowering pass. Also, it
> doesn't need 750 lines of standalone compiler code.
>
> 4. Because I used the "split" pack/unpack opcodes and bcsel instead of
> "if", everything vectorizes automatically. It turns an i64vec4 iadd, for
> instance, into a bunch of ivec4 operations and kicks out an i32vec4 result
> in the end without ever splitting into 4 int64's. (The one exception to
> this is the if statement in the division lowering which required a little
> special care). This means that we don't have to carry extra code to split
> all "dvec4" values into 4 "double" values because it gets handled by the
> normal nir_alu_to_scalar pass that we already have. Also, because it uses
> entirely vector instructions, it can work on an entire dvec4 at a time on
> vec4 hardware (all geometry stages on Intel Haswell and earlier). This
> should make it about 4x as fast on vec4 hardware.
>
> The downside, of course, to writing it in nir_builder was that I duplicated
> Ian's GLSL IR pass. I'm not a fan of duplicating code but, if int64 on
> gen8+ was all I cared about, I think the end result is nice enough that I
> don't really care about the code duplication. If, on the other hand, we're
> going to have full int64 and fp64 lowering and want to provide both in both
> IRs, then maybe we should reconsider. :-) It's worth noting that, without
> adding more GLSL built-ins for the split pack/unpack opcodes, point 4 above
> will always be a problem if we use GLSL as the base language.
>
> One solution is to just do it in NIR and tell people that, if they want
> the lowering, they need to support NIR. Surprisingly, I'm not the one who
> is going to push too hard for this approach. If we can come up with a
> reasonable way to do it in both, I'm moderately ok with doing so if it
> isn't too much pain.
>
> Another solution that has come to mind would be to come up with some
> way to use a carefully chosen set of C/C++ macros that let you write one
> blob of code and compile it as either NIR or GLSL IR builder code. Doing
> this without creating a mess is going to be difficult. I've thought about
> a few possible ways to do it but none of them have been extraordinarily
> pretty. It could look something like
>
>     #if BUILD_NIR
>     #define BLD(type, op, ...) nir_##type##op(b, __VA_ARGS__)
>     #else
>     #define BLD(type, op, ...) op(__VA_ARGS__)
>     #endif
>
> Of course, there are a *lot* of problems with this approach. One being
> that NIR is typeless while GLSL IR is a typed IR. Also, NIR is SSA but
> GLSL IR is tree-based with lots of variables. Between those two, I haven't
> come up with a good idea for how to do a "generic builder" without lots of
> pain.
>
> Sorry if I haven't provided a lot of answers. :-/ However, I think we do
> want to have this discussion for real before we start landing piles more
> GLSL and codegen'd builder code.
>
> --Jason Ekstrand
>
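
For anyone following along, the idea behind point 4 in the quoted message can be illustrated outside of NIR. What follows is only a minimal sketch in plain C, not the code from the wip/nir-int64 branch: a 4-wide 64-bit add is carried out entirely on 32-bit halves, with the carry produced by a compare-and-select rather than an "if". The names split_u64x4, select_u32, split_unpack, split_iadd and split_pack are hypothetical and chosen for illustration; they are not NIR or Mesa APIs.

```c
#include <stdint.h>

/* Hypothetical 4-wide value kept as separate low/high 32-bit halves,
 * loosely mirroring what a "split" unpack would yield for an i64vec4. */
typedef struct {
    uint32_t lo[4];
    uint32_t hi[4];
} split_u64x4;

/* Select between two values; a stand-in for a bcsel-style operation. */
static uint32_t select_u32(int cond, uint32_t a, uint32_t b)
{
    return cond ? a : b;
}

/* Split four 64-bit values into their 32-bit halves. */
static split_u64x4 split_unpack(const uint64_t v[4])
{
    split_u64x4 s;
    for (int i = 0; i < 4; i++) {
        s.lo[i] = (uint32_t)v[i];
        s.hi[i] = (uint32_t)(v[i] >> 32);
    }
    return s;
}

/* Recombine the 32-bit halves into four 64-bit values. */
static void split_pack(split_u64x4 s, uint64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = ((uint64_t)s.hi[i] << 32) | s.lo[i];
}

/* 64-bit add using only 32-bit operations. Every lane runs the same
 * straight-line sequence, so each statement in the loop body corresponds
 * to one 4-wide 32-bit operation. */
static split_u64x4 split_iadd(split_u64x4 a, split_u64x4 b)
{
    split_u64x4 r;
    for (int i = 0; i < 4; i++) {
        uint32_t lo = a.lo[i] + b.lo[i];
        /* Unsigned wrap-around means the low add overflowed iff lo < a.lo. */
        uint32_t carry = select_u32(lo < a.lo[i], 1u, 0u);
        r.lo[i] = lo;
        r.hi[i] = a.hi[i] + b.hi[i] + carry;
    }
    return r;
}
```

Calling split_unpack on two arrays of four uint64_t values, passing the results to split_iadd, and then calling split_pack should give the same values as a native 64-bit add per lane. Since there is no per-lane control flow, nothing in the sequence forces the vector to be broken into four scalars first.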