On Sun, 2012-06-10 at 21:45 +0400, Vadim Girlin wrote:
> On Sun, 2012-06-10 at 10:27 +0200, Christian König wrote:
> > On 10.06.2012 04:07, Vadim Girlin wrote:
> > > Shader variants are stored in a list; the lookup key is based on the
> > > states that require different hw shaders - currently rctx->two_side
> > > (all gpus) and rctx->nr_cbufs (evergreen/cayman, when the writes_all
> > > property is set).
> > >
> > > v2:
> > > - use simple list instead of keymap as suggested by Marek on irc
> > > - call r600_adjust_gprs from r600_bind_vs_shader for r6xx/r7xx
> > >   (r600_shader_select isn't used for vertex shaders currently)
> > >
> > > Improves performance for some apps, e.g. FlightGear -
> > > see https://bugs.freedesktop.org/show_bug.cgi?id=50360
> > >
> > > Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
> > Mhm, I'm really starting to wonder whether it might be easier to avoid
> > having different shader variants by using CF_COND_BOOL/CF_COND_NOT_BOOL
> > for those two special cases, e.g. build the shader so that it can
> > handle both variants and then select the one we currently want with the
> > CF bool constants.
> >
> > If the shader overhead for that is too much, we might also try using
> > this implementation only if the application really starts using the
> > features in question.
> >
>
> I agree that we might want to use common shader code for those cases. I
> just don't want to use control flow for that. According to the docs, the
> cost of a single CF instruction is ~40x the cost of an ALU
> instruction. And it seems we'd need to add 3 CF instructions to
> guard color selection for the two_side case. I'm not sure how we could
> use it for the writes_all case, where we need a varying number of
> exports.
>
> There are other possible solutions, e.g.
> for the first case I think we
> can pass a bool value (0.0/1.0) to the PS through the SPI by using
> SPI_PS_INPUT_CNTL_x:DEFAULT_VAL and a non-existent semantic index, or put
> it into the constant buffer - we're already using a special const buffer
> to pass clip planes for clipvertex, so we can just add a constant for
> that. Then we can MUL that value with front_face to get the selector
> value for the colors. The additional MUL instruction per shader could be
> merged into some alu group, so I guess it might have lower overhead than
> using control flow.
>
> Regarding the writes_all case, I guess we simply need to try playing
> with CB_SHADER_MASK, CB_TARGET_MASK, and some other bits to avoid a
> performance regression when the shader exports to all possible CBs,
> as Alex implemented it initially. IIRC there were some changes related
> to those masks after that, so maybe the problem is solved already.

Though it seems there are no magic bits - catalyst also uses different
shaders in that case.

Vadim

>
> Anyway, those solutions will require additional time for implementation
> and testing, and I'm not sure they will result in better
> performance than caching. After all, it's not a high priority for me; I
> just wanted to provide a quick fix for the performance problem with
> FlightGear - I don't know of any other apps that are affected by
> rebuilding. I think we can improve it later if we need to.
>
> Vadim
>
> > Cheers,
> > Christian.
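
For readers following the thread without the patch at hand, the variant
list described in the quoted commit message could be sketched roughly as
below. This is only an illustration of the idea (a simple list searched
by a key built from two_side and nr_cbufs), not the code from the patch:
struct variant_key, struct variant, struct shader_selector, make_key,
select_variant and free_variants are all hypothetical names, and build_id
merely stands in for a compiled hw shader.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key: only the states that require a different hw shader. */
struct variant_key {
    unsigned two_side;   /* mirrors rctx->two_side */
    unsigned nr_cbufs;   /* mirrors rctx->nr_cbufs, used only with writes_all */
};

/* Hypothetical list node; build_id stands in for the compiled shader. */
struct variant {
    struct variant_key key;
    int build_id;
    struct variant *next;
};

/* Hypothetical per-shader state holding the list of built variants. */
struct shader_selector {
    struct variant *variants;  /* simple singly linked list */
    int writes_all;            /* shader property, fixed per shader */
    int builds;                /* how many variants were "compiled" */
};

/* Build the key; nr_cbufs is ignored unless the shader has writes_all. */
static struct variant_key make_key(const struct shader_selector *sel,
                                   unsigned two_side, unsigned nr_cbufs)
{
    struct variant_key key;
    memset(&key, 0, sizeof(key));
    key.two_side = two_side ? 1 : 0;
    key.nr_cbufs = sel->writes_all ? nr_cbufs : 0;
    return key;
}

/* Return the cached variant for the current state, building it on a miss. */
static struct variant *select_variant(struct shader_selector *sel,
                                      unsigned two_side, unsigned nr_cbufs)
{
    struct variant_key key;
    struct variant *v;

    assert(sel != NULL);
    key = make_key(sel, two_side, nr_cbufs);

    for (v = sel->variants; v; v = v->next)
        if (v->key.two_side == key.two_side &&
            v->key.nr_cbufs == key.nr_cbufs)
            return v;                   /* hit: no rebuild needed */

    v = calloc(1, sizeof(*v));
    if (!v)
        return NULL;
    v->key = key;
    v->build_id = ++sel->builds;        /* placeholder for the real compile */
    v->next = sel->variants;
    sel->variants = v;
    return v;
}

/* Release all cached variants of a shader. */
static void free_variants(struct shader_selector *sel)
{
    while (sel->variants) {
        struct variant *next = sel->variants->next;
        free(sel->variants);
        sel->variants = next;
    }
}
```

With a linear list the lookup cost grows with the number of variants, but
with a key this small at most a handful of variants can exist per shader,
which is presumably why a plain list was preferred over a keymap.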