Hi,

This mail is mostly a reminder to myself, and also an attempt to get someone to look into this before I have more time for it myself :)

A common GLSL pattern is:

    glUseProgram(Program1);
    glActiveTexture(GL_TEXTURE0 + 0);
    glBindTexture(...);
    glActiveTexture(GL_TEXTURE0 + 1);
    glBindTexture(...);
    ...
    glUniform4fARB(...);
    ...
    glDraw()
    glUseProgram(Program2);
    glActiveTexture(GL_TEXTURE0 + 0);
    glBindTexture(...);
    glActiveTexture(GL_TEXTURE0 + 1);
    glBindTexture(...);
    ...
    glUniform4fARB(...);
    ...
    glDraw()

Such a usage pattern shouldn't trigger much computation inside mesa, as we are not modifying texture object state or doing anything fancy; it's just switching between GLSL programs and textures. That sounds like a very common pattern for any GL program that does something other than spinning gears :)

I added glslstateschange to perf in mesa demos to test performance under this usage pattern. Here are some results (core-i5 3GHz; the difference with the closed drivers is much worse with a slower CPU):

    noop           105.0 thousand change/sec
    nouveau         13.0 thousand change/sec
    fglrx-nosmp     57.5 thousand change/sec
    fglrx           73.4 thousand change/sec
    nvidia-nosmp   158.8 thousand change/sec
    nvidia         277.8 thousand change/sec

All profiles/data can be downloaded at http://people.freedesktop.org/~glisse/results/

Obviously the noop driver shows that we are severely underperforming, as we are slower than the closed nvidia driver while performing no real rendering, and we don't outperform fglrx by that much.

Profiling of the noop driver and also of nexuiz (which has a similar pattern) shows a couple of guilty points.

The biggest offender is the recreation of the sampler each time a texture is bound. The memset in update_samplers (st_atom_sampler.c) is likely useless, but the true optimization is to build the sampler state along with the sampler_view_state, as the only variable that doesn't come from the texture object is the lod bias value (unless I missed something).
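
To make the sampler point a bit more concrete, here is a rough sketch of what caching the sampler on the texture object could look like. It is only an illustration under my own assumptions: the structs and the names build_sampler, finalize_texture, get_sampler and sampler_builds are made up for this mail and are not the real mesa/gallium types or functions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the driver-side sampler state; this is NOT
 * the real gallium struct, just enough fields to show the idea. */
struct sampler_state {
    int   wrap_s, wrap_t;
    int   min_filter, mag_filter;
    float lod_bias;
};

/* Hypothetical texture object that keeps its sampler cached. */
struct texture_object {
    int   wrap_s, wrap_t;
    int   min_filter, mag_filter;
    struct sampler_state cached;   /* built at finalize time */
    bool  cached_valid;
};

/* Counts full sampler rebuilds, only so the saving can be observed. */
unsigned sampler_builds = 0;

/* The expensive path: fill in every field from the texture object. */
void build_sampler(struct texture_object *tex, float lod_bias)
{
    memset(&tex->cached, 0, sizeof tex->cached);
    tex->cached.wrap_s     = tex->wrap_s;
    tex->cached.wrap_t     = tex->wrap_t;
    tex->cached.min_filter = tex->min_filter;
    tex->cached.mag_filter = tex->mag_filter;
    tex->cached.lod_bias   = lod_bias;
    tex->cached_valid      = true;
    sampler_builds++;
}

/* Called once when the texture object is finalized. */
void finalize_texture(struct texture_object *tex, float lod_bias)
{
    build_sampler(tex, lod_bias);
}

/* Called on every bind: rebuild only when the lod bias changed. */
const struct sampler_state *
get_sampler(struct texture_object *tex, float lod_bias)
{
    if (!tex->cached_valid || tex->cached.lod_bias != lod_bias)
        build_sampler(tex, lod_bias);
    return &tex->cached;
}
```

With something like this, rebinding the same texture with an unchanged lod bias never goes through the full build path. Anything that modifies the texture object's own parameters would have to clear cached_valid, which the sketch leaves out.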

So the idea is to build the sampler alongside the sampler_view when a texture object is finalized, and to update this sampler state in update_samplers only if the lod bias value changes. This should avoid a lot of cso creation overhead and speed up the driver.

I am not sure why update_textures shows up so much in the profile; my guess is that pipe_sampler_view_reference is burning CPU cycles, as we likely have a lot of texture units.

Texture state is also a big offender: mesa revalidates texture state and texture coordinate generation each time a new shader program is bound. The plan here is to compute, for each program, a mask covering _EnabledUnits, _GenFlags and _TexGenEnabled (see update_texture_state in main/texstate.c). The mask is computed from program information, as it is constant for a given program, and state is only recomputed if we see state changes in the masked units (i.e. units that affect the bound program).

I will get back to this optimization later (in a couple of weeks, hopefully), but if you have more ideas, or if someone remembers an easy improvement that can help in this situation, that would be welcome :)

Cheers,
Jerome
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev