On Tue, Sep 22, 2015 at 4:20 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Tue, Sep 22, 2015 at 4:04 PM, Matt Turner <matts...@gmail.com> wrote:
>> On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev <el...@igalia.com> wrote:
>>> When both fadd and fmul instructions have at least one operand that is a
>>> constant and it is only used once, the total number of instructions can
>>> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
>>> the constants will be propagated as immediate operands of fmul and fadd.
>>>
>>> This patch modifies opt_peephole_ffma pass to detect this situation and
>>> bails-out fusing fmul+fadd into ffma.
>>>
>>> As shown in shader-db results below, it seems to help a good bunch. However,
>>> there are some caveats:
>>>
>>> * It seems i965 specific, so I'm not sure if modifying the NIR pass
>>>   directly is desired, as opposed to moving this to the backend.
>>>
>>> * There are still a high number of HURTs, but these could be reduced by
>>>   being more specific in the conditions to bailout.
>>>
>>> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
>>> instructions in affected programs: 604918 -> 598406 (-1.08%)
>>> helped: 4633
>>> HURT: 804
>>> GAINED: 0
>>> LOST: 0
>>> ---
>>
>> Interesting -- yeah, I've thought about doing this as well. It was
>> more difficult before because with GLSL IR (where I was trying to do
>> it) it wasn't possible to determine if the constant was used by
>> multiple 3-src instructions. Actually, your check might be able to be
>> more refined to consider only uses of 3-src instructions.
>>
>> But that's getting kind of hardware-specific.
>
> If we want to move this into the i965 driver we can.  I think we're
> the only users.  That would completely get rid of
> hardware-specificness issues.
> --Jason
Thinking about this a bit more, my inclination is to just push both
patches and then add another that moves nir_opt_peephole_ffma to the
i965 driver and call it a day.  We're the only ones using it, and it
has proven tricky enough that we should just free ourselves to use
backend-specific heuristics.

Matt, thoughts?
--Jason

>> Perhaps another approach would be to modify the
>> opt_combine_constants() pass to split MADs under some circumstances --
>> e.g., it accounts for the only use of a constant we would otherwise
>> have to promote. But of course we don't have that pass for the vec4
>> backend.
>>
>> In the meantime, I've sent a related patch that may be of interest:
>> "[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds."
>>
>> This patch, applied on top of mine gives these results on Haswell:
>>
>> Total:
>> total instructions in shared programs: 6595563 -> 6584885 (-0.16%)
>> instructions in affected programs: 1183608 -> 1172930 (-0.90%)
>> helped: 8074
>> HURT: 842
>> GAINED: 4
>>
>> FS:
>> total instructions in shared programs: 4863484 -> 4859884 (-0.07%)
>> instructions in affected programs: 554042 -> 550442 (-0.65%)
>> helped: 3072
>> HURT: 38
>> GAINED: 4
>>
>> VS:
>> total instructions in shared programs: 1729224 -> 1722146 (-0.41%)
>> instructions in affected programs: 629566 -> 622488 (-1.12%)
>> total loops in shared programs: 221 -> 221 (0.00%)
>> helped: 5002
>> HURT: 804
>>
>> Another thing to consider for the vec4 backend is that vec4 uniforms
>> have to be unpacked for use by 3-src instructions (see the
>> VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations,
>> but they often do account for increases in instruction counts.
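
For anyone following the thread, the counting argument behind the bail-out
in Eduardo's patch can be written down as a toy model. This is only a
sketch in Python, not NIR or i965 code: the function names
(instruction_counts, should_fuse) are made up for illustration, and it
assumes that a 3-src instruction cannot take an immediate, that a
constant with other users is already loaded elsewhere, and that the
2-src fmul and fadd can each fold their constant as an immediate. It
ignores register pressure, the vec4 uniform unpacking mentioned above,
and the use-count limit from Matt's related patch.

```python
from typing import Optional, Tuple


def instruction_counts(mul_const_uses: Optional[int],
                       add_const_uses: Optional[int]) -> Tuple[int, int]:
    """Return (fused, unfused) instruction counts under the toy model.

    Each argument is the total number of uses of the constant operand of
    the fmul / fadd, or None when that instruction has no constant operand.
    """
    # Fused form: a single ffma. Assumption: a 3-src instruction cannot take
    # an immediate, so a constant used only here costs one extra load_const.
    # A constant with other users is assumed to be loaded already.
    fused = 1
    for uses in (mul_const_uses, add_const_uses):
        if uses == 1:
            fused += 1

    # Unfused form: fmul + fadd, with constants folded in as immediates.
    unfused = 2
    return fused, unfused


def should_fuse(mul_const_uses: Optional[int],
                add_const_uses: Optional[int]) -> bool:
    """Fuse unless the toy model says the fused form needs more instructions."""
    fused, unfused = instruction_counts(mul_const_uses, add_const_uses)
    # Ties go to fusing, which keeps the pass's default behaviour.
    return fused <= unfused
```

Under these assumptions, should_fuse(1, 1) is the only case that declines
to fuse, which is the 3-versus-2 situation described in the commit
message. Everything else (one single-use constant, shared constants, no
constants) still fuses.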
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev