On Tue, 2015-05-12 at 23:09 -0700, Ian Romanick wrote:
> On 05/12/2015 03:12 PM, Timothy Arceri wrote:
> > On Sat, 2015-04-18 at 12:26 +0200, Marek Olšák wrote:
> >> On Fri, Apr 17, 2015 at 1:21 PM, Timothy Arceri <t_arc...@yahoo.com.au> wrote:
> >>> Hi all,
> >>>
> >>> Last year I spent a whole bunch of time profiling Mesa looking for
> >>> areas where improvements could be made. Anyway, I thought I'd point
> >>> out a couple of things and see if anyone thinks they are worth
> >>> following up.
> >>>
> >>> 1. While the hash table has been getting a lot of attention lately,
> >>> after running the TF2 benchmark one place that showed up as using
> >>> more CPU than the hash table was the GLSL parser. I guess this can
> >>> be mostly solved once Mesa has a disk cache for shaders.
> >>>
> >>> Something I came across at the time was a paper describing how to
> >>> modify bison (with apparently little effort) to generate a hardcoded
> >>> parser that is 2.5-6.5 times faster, while generating a slightly
> >>> bigger binary [1].
> >>>
> >>> Unfortunately the resulting project has been lost in the sands of
> >>> time, so I couldn't try it out.
> >>>
> >>> 2. On most of the old Quake engine benchmarks the Intel driver
> >>> spends between 3-4.5% of its time, or 400 million calls, in glibc,
> >>> since memcpy() can't be inlined in this bit of code from
> >>> copy_array_to_vbo_array():
> >>>
> >>>    while (count--) {
> >>>       memcpy(dst, src, dst_stride);
> >>>       src += src_stride;
> >>>       dst += dst_stride;
> >>>    }
> >>>
> >>> I looked in other drivers but I couldn't see them doing this kind of
> >>> thing. I'd imagine that because of its nature this code could be a
> >>> bottleneck. Are there any easy ways to avoid doing this type of
> >>> copy, or would the only option be to write a complex optimisation?
> >>
> >> Yeah, other drivers don't do this. In Gallium, we don't change the
> >> stride when uploading buffers, so in our case src_stride ==
> >> dst_stride.
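[For readers following along: a minimal, self-contained sketch of the two cases discussed above. The function name is mine, not the actual Mesa source; the point is that when src_stride == dst_stride, as in Gallium, the per-element loop collapses into a single bulk memcpy() instead of `count` small, non-inlinable memcpy() calls.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration (not the real Mesa function): copy 'count'
 * elements of 'dst_stride' bytes each from a source with 'src_stride'
 * bytes between elements. Equal strides take the single-call fast path. */
static void copy_vertex_data(uint8_t *dst, const uint8_t *src, size_t count,
                             size_t src_stride, size_t dst_stride)
{
   if (src_stride == dst_stride) {
      /* Gallium case: one bulk copy for the whole array. */
      memcpy(dst, src, count * dst_stride);
   } else {
      /* Mismatched strides: per-element copy, as in the Intel driver. */
      while (count--) {
         memcpy(dst, src, dst_stride);
         src += src_stride;
         dst += dst_stride;
      }
   }
}
```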
> >>
> >
> > Thanks Marek. Looking at the history of the Intel code in git, it
> > seems that when the code was first written memcpy() wasn't used and
> > the data was just copied 8 bits at a time. In that case you can see
> > the advantage of doing the copy this way, but with the use of
> > memcpy() there doesn't seem to be much difference between the code
> > paths.
> >
> > Out of interest I implemented my own version of memcpy() that can do
> > copies with mismatched strides. I did this by aligning the memory to
> > 8 bytes, doing some shifts in temporaries if needed, and then doing
> > 64-bit copies. It was made simpler for my test case because the
> > strides were always 12 bytes for dst and 16 bytes for src.
> >
> > In the end my memcpy() used slightly less CPU and could give a
> > measurable boost in frame rate in the UrbanTerror benchmark,
> > although the boost isn't always measurable and results are mostly
> > about the same. I suspect the boost only happens when memory isn't
> > aligned to 8 bytes.
> >
> > On average around 150 to 200 of these copies are done each time this
> > loop is hit in UrbanTerror, so in theory my memcpy() may be made
> > even faster with SSE using load/store and some shuffling. I did
> > attempt this but haven't got it to work yet.
>
> What kind of system were you measuring on? You might measure a bigger
> delta on a Bay Trail system, for example. You might also try locking
> the CPU clock low.
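[A sketch of the stride-specialised copy described above, for the fixed 16-byte src / 12-byte dst case. This is my own illustration, not Tim's actual code: it relies on fixed-size memcpy() calls, which compilers lower to plain load/store pairs, rather than explicit alignment handling.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: source elements are 16 bytes apart, destination
 * elements are packed at 12 bytes, and only the first 12 bytes of each
 * source element are kept (e.g. a vec3 stored with vec4 padding).
 * Each element is moved as one 64-bit word plus one 32-bit word,
 * avoiding the overhead of a variable-size memcpy() call per element. */
static void copy_12_from_16(uint8_t *dst, const uint8_t *src, size_t count)
{
   while (count--) {
      uint64_t lo;
      uint32_t hi;
      memcpy(&lo, src, 8);     /* fixed-size copies: compiled to      */
      memcpy(&hi, src + 8, 4); /* single loads/stores, no libc call   */
      memcpy(dst, &lo, 8);
      memcpy(dst + 8, &hi, 4);
      src += 16;
      dst += 12;
   }
}
```

Using memcpy() for the word-sized moves (rather than pointer casts) keeps the sketch free of strict-aliasing and unaligned-access problems while still optimising down to plain register moves.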
I'm using an Ivy Bridge laptop, to be exact:

Processor: Intel Core i5-3317U @ 2.60GHz (4 Cores)
Graphics: Intel HD 4000 (1050MHz)

I'll try locking the CPU to a lower clock and see what happens.

> I know Eero has some tips for measuring small changes in CPU usage. It
> can be... annoying. :)
>
> > In the end I'm not sure if implementing a custom memcpy() is worth
> > all the effort, but I thought I'd post my findings. My memcpy() code
> > is a bit of a mess at the moment, but if anyone is interested I can
> > clean it up and push it to my github repo, just let me know.
> >
> > Tim
> >
> >> Marek

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev