Ohh that's interesting, you'd think the comparison shouldn't be that expensive (though I guess in ipers case the comparison is never true). memcmp is quite extensively used everywhere. Maybe we could replace that with something faster (since we only ever care if the blocks are the same but not care about the lexographic ordering and always compare whole structs, should compare dwords instead of bytes for a 4 time speedup)? Or isn't that the reason cmpsb instead of cmpsd is used? Also I guess it would help if the values which are more likely to be unequal are first in the struct (if we can tell that). Of course though if it's unlikely to be the same as the compared value anyway not comparing at all still might be a win (here).
Roland Am 29.06.2011 19:19, schrieb Adam Jackson: > Perversely, do this by eliminating the comparison between stored and > current fs state. On ipers, a perf trace showed try_update_scene_state > using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp. > Taking that out takes try_update_scene_state down to 6.5% of the > profile; more importantly, ipers goes from 10 to 14fps and gears goes > from 790 to 830fps. > > Signed-off-by: Adam Jackson <a...@redhat.com> > --- > src/gallium/drivers/llvmpipe/lp_setup.c | 61 > ++++++++++++++----------------- > 1 files changed, 27 insertions(+), 34 deletions(-) > > diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c > b/src/gallium/drivers/llvmpipe/lp_setup.c > index cbe06e5..9118db5 100644 > --- a/src/gallium/drivers/llvmpipe/lp_setup.c > +++ b/src/gallium/drivers/llvmpipe/lp_setup.c > @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup ) > setup->dirty |= LP_SETUP_NEW_FS; > } > > - > if (setup->dirty & LP_SETUP_NEW_FS) { > - if (!setup->fs.stored || > - memcmp(setup->fs.stored, > - &setup->fs.current, > - sizeof setup->fs.current) != 0) > - { > - struct lp_rast_state *stored; > - uint i; > - > - /* The fs state that's been stored in the scene is different from > - * the new, current state. So allocate a new lp_rast_state object > - * and append it to the bin's setup data buffer. > - */ > - stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof > *stored); > - if (!stored) { > - assert(!new_scene); > - return FALSE; > - } > + struct lp_rast_state *stored; > + uint i; > + > + /* The fs state that's been stored in the scene is different from > + * the new, current state. So allocate a new lp_rast_state object > + * and append it to the bin's setup data buffer. > + */ > + stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof > *stored); > + if (!stored) { > + assert(!new_scene); > + return FALSE; > + } > > - memcpy(stored, > - &setup->fs.current, > - sizeof setup->fs.current); > - setup->fs.stored = stored; > - > - /* The scene now references the textures in the rasterization > - * state record. Note that now. > - */ > - for (i = 0; i < Elements(setup->fs.current_tex); i++) { > - if (setup->fs.current_tex[i]) { > - if (!lp_scene_add_resource_reference(scene, > - setup->fs.current_tex[i], > - new_scene)) { > - assert(!new_scene); > - return FALSE; > - } > + memcpy(stored, > + &setup->fs.current, > + sizeof setup->fs.current); > + setup->fs.stored = stored; > + > + /* The scene now references the textures in the rasterization > + * state record. Note that now. > + */ > + for (i = 0; i < Elements(setup->fs.current_tex); i++) { > + if (setup->fs.current_tex[i]) { > + if (!lp_scene_add_resource_reference(scene, > + setup->fs.current_tex[i], > + new_scene)) { > + assert(!new_scene); > + return FALSE; > } > } > } _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev