I didn't even look at that; I was just curious why the memcmp (which is used a lot in other places) is slow. However, none of the other memcmp calls seem to show up prominently (the cso functions are quite low in profiles, and _mesa_search_program_cache uses memcmp too but it's not that high either). So I guess those functions either aren't called that often, or the sizes they compare are small. So we should maybe just file a gcc bug for memcmp and look at that particular llvmpipe issue again :-).

Roland
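For anyone who wants to sanity-check the memcmp slowness outside Mesa, here is a minimal, hypothetical stand-alone micro-benchmark (not from this thread); the 320-byte size is an arbitrary stand-in for a struct like lp_rast_state, and the iteration count is made up:

/* Hypothetical micro-benchmark sketch: times memcmp() against a plain
 * dword loop on equal buffers, so the compare always scans the full
 * length (the worst case seen in the ipers profile).
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIZE  320          /* assumed struct size, adjust to taste */
#define ITERS 10000000L

static int cmp_dwords(const void *a, const void *b, unsigned count)
{
   const uint32_t *pa = (const uint32_t *)a;
   const uint32_t *pb = (const uint32_t *)b;
   unsigned i;
   for (i = 0; i < count / 4; i++) {
      if (pa[i] != pb[i])
         return 1;
   }
   return 0;
}

int main(void)
{
   static uint32_t buf1[SIZE / 4], buf2[SIZE / 4];
   volatile int sink = 0;   /* keep the compiler from dropping the loops */
   clock_t t;
   long i;

   /* runtime-dependent contents so the compares can't be constant-folded */
   memset(buf1, (int)time(NULL) | 1, SIZE);
   memcpy(buf2, buf1, SIZE);

   t = clock();
   for (i = 0; i < ITERS; i++)
      sink |= memcmp(buf1, buf2, SIZE);
   printf("memcmp:     %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

   t = clock();
   for (i = 0; i < ITERS; i++)
      sink |= cmp_dwords(buf1, buf2, SIZE);
   printf("dword loop: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

   return sink & 0;
}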
On 30.06.2011 01:16, Corbin Simpson wrote:
> Okay, so maybe I'm failing to recognize the exact situation here, but
> wouldn't it be possible to mark the FS state with a serial number and
> just compare those? Or are these FS states not CSO-cached?
>
> ~ C.
>
> On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>> Actually I ran some numbers here and tried out an optimized struct compare:
>> original ipers: 12.1 fps
>> ajax patch: 15.5 fps
>> optimized struct compare: 16.8 fps
>>
>> This is the function I used for that (just enabled in that lp_setup
>> function):
>>
>> static INLINE int util_cmp_struct(const void *src1, const void *src2,
>>                                   unsigned count)
>> {
>>    /* hmm pointer casting is evil */
>>    const uint32_t *src1_ptr = (const uint32_t *)src1;
>>    const uint32_t *src2_ptr = (const uint32_t *)src2;
>>    unsigned i;
>>    assert(count % 4 == 0);
>>    for (i = 0; i < count/4; i++) {
>>       if (*src1_ptr != *src2_ptr) {
>>          return 1;
>>       }
>>       src1_ptr++;
>>       src2_ptr++;
>>    }
>>    return 0;
>> }
>>
>> (And no, this doesn't use repz cmpsd here.)
>>
>> So, unless I made some mistake, memcmp is just dead slow (*), with most
>> of the slowness probably coming from the bytewise comparison (and
>> apparently I was wrong in assuming the comparison there might never be
>> the same for ipers).
>> Of course, the optimized struct compare relies on the structs really
>> being dword aligned (I think this is always the case), and additionally
>> it relies on the struct size being a whole multiple of dwords - likely
>> the struct needs padding to ensure that (at least I don't think this is
>> always guaranteed for all structs).
>> But since memcmp is used extensively (cso for one), maybe some
>> optimization along these lines might be worth it (though of course for
>> small structs the win isn't going to be as big - and it can't beat repz
>> cmps in code size...).
>>
>> Roland
>>
>> (*) I actually found some references suggesting some implementations do
>> better: they don't just use repz cmpsb, but split the comparison into
>> parts, doing dword (or even qword - for really large structs they could
>> use sse2) comparisons where possible and byte comparisons only for the
>> remaining bytes (and if the compiler did that inline, it would probably
>> know the size at compile time here, hence could leave out much of the
>> code). Of course memcmp requires that the return value isn't just a
>> true or false value, hence more code is needed once an unequal dword is
>> found, though the compiler could optimize that away too where it's not
>> needed. Much the same way memcpy is usually optimized, really, so blame
>> gcc :-).
>>
>> On 29.06.2011 20:33, Roland Scheidegger wrote:
>>> Ohh, that's interesting; you'd think the comparison shouldn't be that
>>> expensive (though I guess in the ipers case the comparison is never
>>> true). memcmp is used quite extensively everywhere. Maybe we could
>>> replace it with something faster: since we only ever care whether the
>>> blocks are the same, never about their lexicographic ordering, and we
>>> always compare whole structs, we could compare dwords instead of bytes
>>> for a 4x speedup. Or isn't that the reason cmpsb instead of cmpsd is
>>> used?
>>> Also, I guess it would help if the values which are more likely to be
>>> unequal came first in the struct (if we can tell that).
>>> Of course, if it's unlikely to be the same as the compared value
>>> anyway, not comparing at all still might be a win (here).
>>>
>>> Roland
>>>
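As an illustration of the split compare described in the footnote (*) above, here is a minimal sketch: dword chunks first, byte compares only for the trailing count % 4 bytes, returning only equal/unequal rather than an ordering. The name util_cmp_struct_any is made up, and the dword part still assumes both pointers are at least 4-byte aligned:

/* Sketch only: like util_cmp_struct above, but without requiring the
 * size to be a whole multiple of dwords.  Leftover bytes (at most 3)
 * are compared individually.
 */
#include <stdint.h>

static inline int
util_cmp_struct_any(const void *src1, const void *src2, unsigned count)
{
   const uint32_t *p1 = (const uint32_t *)src1;
   const uint32_t *p2 = (const uint32_t *)src2;
   const uint8_t *b1, *b2;
   unsigned i;

   /* bulk of the comparison, one dword at a time */
   for (i = 0; i < count / 4; i++) {
      if (p1[i] != p2[i])
         return 1;
   }

   /* byte compares for the (count % 4) trailing bytes */
   b1 = (const uint8_t *)src1 + (count & ~3u);
   b2 = (const uint8_t *)src2 + (count & ~3u);
   for (i = 0; i < (count & 3u); i++) {
      if (b1[i] != b2[i])
         return 1;
   }
   return 0;
}

Since it only answers "same or different", it can stop at the first mismatching dword; a real memcmp has to narrow down which byte differs to produce an ordering.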
>>> On 29.06.2011 19:19, Adam Jackson wrote:
>>>> Perversely, do this by eliminating the comparison between stored and
>>>> current fs state. On ipers, a perf trace showed try_update_scene_state
>>>> using 31% of a CPU, and 98% of that was in 'repz cmpsb', i.e. the
>>>> memcmp. Taking that out takes try_update_scene_state down to 6.5% of
>>>> the profile; more importantly, ipers goes from 10 to 14 fps and gears
>>>> goes from 790 to 830 fps.
>>>>
>>>> Signed-off-by: Adam Jackson <a...@redhat.com>
>>>> ---
>>>>  src/gallium/drivers/llvmpipe/lp_setup.c |   61 ++++++++++++++-----------------
>>>>  1 files changed, 27 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
>>>> index cbe06e5..9118db5 100644
>>>> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
>>>> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
>>>> @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup )
>>>>        setup->dirty |= LP_SETUP_NEW_FS;
>>>>     }
>>>>
>>>> -
>>>>     if (setup->dirty & LP_SETUP_NEW_FS) {
>>>> -      if (!setup->fs.stored ||
>>>> -          memcmp(setup->fs.stored,
>>>> -                 &setup->fs.current,
>>>> -                 sizeof setup->fs.current) != 0)
>>>> -      {
>>>> -         struct lp_rast_state *stored;
>>>> -         uint i;
>>>> -
>>>> -         /* The fs state that's been stored in the scene is different from
>>>> -          * the new, current state.  So allocate a new lp_rast_state object
>>>> -          * and append it to the bin's setup data buffer.
>>>> -          */
>>>> -         stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>>>> -         if (!stored) {
>>>> -            assert(!new_scene);
>>>> -            return FALSE;
>>>> -         }
>>>> +      struct lp_rast_state *stored;
>>>> +      uint i;
>>>> +
>>>> +      /* The fs state that's been stored in the scene is different from
>>>> +       * the new, current state.  So allocate a new lp_rast_state object
>>>> +       * and append it to the bin's setup data buffer.
>>>> +       */
>>>> +      stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
>>>> +      if (!stored) {
>>>> +         assert(!new_scene);
>>>> +         return FALSE;
>>>> +      }
>>>>
>>>> -         memcpy(stored,
>>>> -                &setup->fs.current,
>>>> -                sizeof setup->fs.current);
>>>> -         setup->fs.stored = stored;
>>>> -
>>>> -         /* The scene now references the textures in the rasterization
>>>> -          * state record.  Note that now.
>>>> -          */
>>>> -         for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>>>> -            if (setup->fs.current_tex[i]) {
>>>> -               if (!lp_scene_add_resource_reference(scene,
>>>> -                                                    setup->fs.current_tex[i],
>>>> -                                                    new_scene)) {
>>>> -                  assert(!new_scene);
>>>> -                  return FALSE;
>>>> -               }
>>>> +      memcpy(stored,
>>>> +             &setup->fs.current,
>>>> +             sizeof setup->fs.current);
>>>> +      setup->fs.stored = stored;
>>>> +
>>>> +      /* The scene now references the textures in the rasterization
>>>> +       * state record.  Note that now.
>>>> +       */
>>>> +      for (i = 0; i < Elements(setup->fs.current_tex); i++) {
>>>> +         if (setup->fs.current_tex[i]) {
>>>> +            if (!lp_scene_add_resource_reference(scene,
>>>> +                                                 setup->fs.current_tex[i],
>>>> +                                                 new_scene)) {
>>>> +               assert(!new_scene);
>>>> +               return FALSE;
>>>> +            }
>>>>          }
>>>>       }
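On Corbin's serial-number question upthread, a minimal sketch of what that could look like, assuming a hypothetical generation counter that gets bumped wherever the fs state is rebuilt. None of these names (fs_state_id, bump_fs_serial, fs_state_changed) exist in lp_setup.c today:

/* Hypothetical sketch only: stamp each fs state with a serial number
 * when it is (re)built, then compare the stamps instead of memcmp'ing
 * the whole struct.
 */
#include <stdint.h>

struct fs_state_id {
   uint64_t serial;   /* bumped every time the state it tags is rebuilt */
};

static uint64_t fs_serial_counter;

static void
bump_fs_serial(struct fs_state_id *id)
{
   /* 64 bits, so wraparound is not a practical concern */
   id->serial = ++fs_serial_counter;
}

static int
fs_state_changed(const struct fs_state_id *stored,
                 const struct fs_state_id *current)
{
   /* O(1) instead of a bytewise compare over the whole struct */
   return stored->serial != current->serial;
}

The catch with any scheme like this is that the serial has to be bumped at every point the state can actually change; miss one and you render with stale state, which is presumably why the question of whether these FS states are CSO-cached matters.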