On Sun, 2012-06-10 at 10:27 +0200, Christian König wrote:
> On 10.06.2012 04:07, Vadim Girlin wrote:
> > Shader variants are stored in the list, the key for lookup is based on the
> > states that require different hw shaders - currently it's rctx->two_side
> > (all gpus) and rctx->nr_cbufs (evergreen/cayman, when writes_all property is
> > set).
> >
> > v2:
> > - use simple list instead of keymap as suggested by Marek on irc
> > - call r600_adjust_gprs from r600_bind_vs_shader for r6xx/r7xx
> >   (r600_shader_select isn't used for vertex shaders currently)
> >
> > Improves performance for some apps, e.g. FlightGear -
> > see https://bugs.freedesktop.org/show_bug.cgi?id=50360
> >
> > Signed-off-by: Vadim Girlin<vadimgir...@gmail.com>
> Mhm, I really start wondering if it might not be easier to avoid having
> different shader variants by using CF_COND_BOOL/CF_COND_NOT_BOOL for
> those two special cases, e.g. build the shader in a way that it can
> handle both variants and then select the one we currently want with the
> CF bool constants.
>
> If the shader overhead for it is to much we might also try using this
> implementation only if the application really starts using those
> features in question.
>
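For context, the variant lookup described in the patch above boils down to something like the sketch below. This is only an illustration, not the actual r600g code: the struct and function names (variant_key, shader_variant, shader_selector, select_variant) are hypothetical, and build_id is just a stand-in for the compiled hw shader. The caller is assumed to zero the key fields that don't affect the generated code (e.g. nr_cbufs when writes_all isn't set), so that equivalent states map to the same variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical key: only the states that require a different hw shader. */
struct variant_key {
    bool two_side;      /* two-sided color selection enabled */
    unsigned nr_cbufs;  /* number of color buffers (writes_all case) */
};

struct shader_variant {
    struct variant_key key;
    struct shader_variant *next;
    unsigned build_id;  /* stand-in for the compiled hw shader */
};

struct shader_selector {
    struct shader_variant *variants; /* simple singly linked list */
    struct shader_variant *current;  /* variant bound last time */
    unsigned num_builds;             /* how many variants were built */
};

static bool key_equal(const struct variant_key *a,
                      const struct variant_key *b)
{
    return a->two_side == b->two_side && a->nr_cbufs == b->nr_cbufs;
}

/* Return the variant matching 'key', building a new one only on a miss.
 * Returns NULL if the allocation fails. */
static struct shader_variant *
select_variant(struct shader_selector *sel, struct variant_key key)
{
    struct shader_variant *v;

    assert(sel != NULL);

    /* Fast path: state didn't change since the last bind. */
    if (sel->current && key_equal(&sel->current->key, &key))
        return sel->current;

    /* Walk the list looking for a previously built variant. */
    for (v = sel->variants; v; v = v->next) {
        if (key_equal(&v->key, &key)) {
            sel->current = v;
            return v;
        }
    }

    /* Miss: "build" a new variant and prepend it to the list. */
    v = calloc(1, sizeof(*v));
    if (!v)
        return NULL;
    v->key = key;
    v->build_id = sel->num_builds++;
    v->next = sel->variants;
    sel->variants = v;
    sel->current = v;
    return v;
}
```

The point is just that toggling between two states (as FlightGear apparently does) costs a short list walk instead of a shader rebuild on every switch.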
I agree that we might want to use common shader code for those cases. I
just don't want to use control flow for that. According to the docs, a
single CF instruction costs about 40x as much as an ALU instruction, and
it seems we'd need to add 3 CF instructions to guard color selection for
the two_side case. I'm not sure how we could use it for the writes_all
case, where we need a varying number of exports.

There are other possible solutions. For the first case, I think we can
pass a bool value (0.0/1.0) to the PS through the SPI by using
SPI_PS_INPUT_CNTL_x:DEFAULT_VAL and a non-existent semantic index, or put
it into the constant buffer - we're already using a special const buffer
to pass clip planes for clipvertex, so we can just add a constant for
that. Then we can MUL that value with front_face to get the selector
value for the colors. The additional MUL instruction per shader could be
merged into some ALU group, so I guess it might have lower overhead than
using control flow.

Regarding the writes_all case, I guess we simply need to try playing
with CB_SHADER_MASK, CB_TARGET_MASK, and some other bits to avoid a
performance regression when the shader exports to all possible CBs, as
Alex implemented it initially. IIRC there were some changes related to
those masks after that, so maybe the problem is solved already.

Anyway, those solutions will require additional time for implementation
and testing, and I'm not sure they will result in better performance
than caching. After all, it's not a high priority for me, I just wanted
to provide a quick fix for the performance problem with FlightGear - I
don't know of any other apps that are affected by rebuilding. I think we
can improve it later if we need to.

Vadim

> Cheers,
> Christian.
>

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev