On 20.09.2011 16:35, Roland Scheidegger wrote:
> On 20.09.2011 16:15, Keith Whitwell wrote:
>> On Tue, 2011-09-20 at 16:02 +0200, Roland Scheidegger wrote:
>>> On 20.09.2011 12:35, Keith Whitwell wrote:
>>>> On Tue, 2011-09-20 at 10:59 +0200, Fabio wrote:
>>>>> There was a discussion some months ago about using
>>>>> -fno-builtin-memcmp for improving memcmp performance:
>>>>> http://lists.freedesktop.org/archives/mesa-dev/2011-June/009078.html
>>>>>
>>>>> Since then, was it properly addressed in mesa or the flag is still
>>>>> recommended? If so, what about adding it in configure.ac?
>>>>
>>>> I've been meaning to follow up on this too. I don't know the answer,
>>>> but pinging Roland in case he does.
>>>
>>> I guess it is still recommended.
>>> Ideally this is really something which should be fixed in gcc - the
>>> compiler has all the knowledge about fixed alignment and size (if any)
>>> (and more importantly knows if only a binary answer is needed which
>>> makes this much easier) and doesn't need to do any function call.
>>> If you enable that flag and some platform just has the same primitive
>>> repz cmpsb sequence in the system library it will just get even slower,
>>> though I guess chances of that happening are slim (with the possible
>>> exception of windows).
>>> I think in most cases it won't make much difference, so nobody cared to
>>> implement that change. It is most likely still a good idea unless gcc
>>> addressed that in the meantime...
>>
>> Hmm, it seemed like it made a big difference in the earlier
>> discussion...
> Yes for llvmpipe and one app at least.
> But that struct being compared there is most likely the biggest (by far)
> anywhere (at least which is compared in a regular fashion).
>
>> I should take a look at reducing the size of the struct (as mentioned
>> before), but surely there's some way to pull in a better memcmp??
>
> Well, apart from using -fno-builtin-memcmp we could build our own
> memcmpxx, though the version I did there (returning binary only result
> and assuming 32bit alignment/size allowing gcc to optimize it) was still
> slower for large sizes than -fno-builtin-memcmp. Of course we could
> optimize it more (e.g. for 64bit aligned/sized things, or using
> hand-coded sse2 versions using 128bit at-a-time comparisons) but then it
> gets more complicated, so I wasn't sure it was worth it.
>
> For reference here are the earlier numbers (ipers with llvmpipe):
> original ipers: 12.1 fps
> optimized struct compare: 16.8 fps
> -fno-builtin-memcmp: 18.1 fps
>
> And this was the function I used for getting the numbers:
>
> static INLINE int util_cmp_struct(const void *src1, const void *src2,
>                                   unsigned count)
> {
>    /* hmm pointer casting is evil */
>    const uint32_t *src1_ptr = (uint32_t *)src1;
>    const uint32_t *src2_ptr = (uint32_t *)src2;
>    unsigned i;
>    assert(count % 4 == 0);
>    for (i = 0; i < count/4; i++) {
>       if (*src1_ptr != *src2_ptr) {
>          return 1;
>       }
>       src1_ptr++;
>       src2_ptr++;
>    }
>    return 0;
> }

Oh, I also forgot to mention that it would probably get a bit more
complicated; this was just a quick hack.
We might only use this to compare structs, but IIRC structs are not
guaranteed to always have a size or alignment which is a multiple of
32 bits (I think it is legal for a struct which contains only, say,
shorts to be only 16-bit aligned and to have a size that is a multiple
of 16 bits). Of course we could just handle this case explicitly in the
code, and the compiler should (hopefully) be smart enough to optimize it
away where it isn't needed (well, unless optimization is disabled).

Roland
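
A minimal sketch of what handling this explicitly might look like. This is
an illustration under stated assumptions, not code from Mesa: the name
util_cmp_struct_any is hypothetical, and doing the 32-bit loads through
memcpy (so that no alignment is assumed) is one possible approach rather
than what the quick hack above did. Whether the compiler folds the 4-byte
memcpy into a plain load depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 0 if the two regions are equal, 1 otherwise
 * (binary result only, no ordering as with memcmp).
 * Makes no assumption about alignment or about count being a multiple of 4.
 */
static inline int util_cmp_struct_any(const void *src1, const void *src2,
                                      size_t count)
{
   const unsigned char *p1 = (const unsigned char *)src1;
   const unsigned char *p2 = (const unsigned char *)src2;
   size_t i;

   /* Bulk part: compare 32 bits at a time. The fixed-size memcpy avoids
    * misaligned and type-punned pointer accesses. */
   for (i = 0; i + 4 <= count; i += 4) {
      uint32_t a, b;
      memcpy(&a, p1 + i, sizeof a);
      memcpy(&b, p2 + i, sizeof b);
      if (a != b)
         return 1;
   }

   /* Tail: the remaining 0..3 bytes, e.g. for a struct of three shorts. */
   for (; i < count; i++) {
      if (p1[i] != p2[i])
         return 1;
   }
   return 0;
}
```

When count is a compile-time constant that is a multiple of 4 (the usual
sizeof case), an optimizing compiler can drop the tail loop entirely. Like
memcmp, this compares any padding bytes as well, so callers would still
need to zero-initialize their structs.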