On Mon, Jun 18, 2018 at 2:14 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 2018-06-16 08:23 AM, Jason Ekstrand wrote:
> > On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <e...@anholt.net> wrote:
> >
> >> Michel Dänzer <mic...@daenzer.net> writes:
> >>
> >>> On 2018-06-15 05:25 PM, Jason Ekstrand wrote:
> >>>> On June 15, 2018 01:14:24 Michel Dänzer <mic...@daenzer.net> wrote:
> >>>>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
> >>>>>>
> >>>>>> I did some testing and x11perf -copywinwin500 is... exactly the
> >>>>>> same with or without my patches. If anything they might improve
> >>>>>> it by just a hair.
> >>>>>
> >>>>> Possible explanations I can think of:
> >>>>>
> >>>>> 1. Your glamor still has its own FBO cache. Which version of xserver
> >>>>> are you testing with?
> >>>>>
> >>>> 1.19 I think
> >>>
> >>> Okay, that doesn't have the glamor FBO cache anymore.
> >>>
> >>>
> >>>>> 2. The i965 driver cache isn't hit even before these changes.
> >>>>
> >>>> It's definitely getting hit in both cases, it just may require a
> >>>> slightly larger cache of we aren't recycling BOs until they're idle.
> >>>
> >>> It might be more than just slightly, -copywinwin500 can queue many
> >>> overlapping copies between flushes. Can you compare the maximum total
> >>> cache size with and without this series?
> >>
> >> I suspect it'll be only about a factor of
> >> how-many-batchbuffers-before-throttling difference -- while the
> >> batchbuffer still references the BO, the bufmgr wouldn't see the buffer
> >> to reuse it anyway. I suspect we hit the aperture limit and flush in
> >> the copywinwin500 case.
> >>
> >
> > At Ken's suggestion, I ran some statistics for hits/misses.
> > I did three runs each with master and with my branch:
> >
> > Master:
> >
> > hits = 455868,
> > misses = 388,
> > max_bucket_size = 160
> >
> > hits = 404358,
> > misses = 113,
> > max_bucket_size = 34
> >
> > hits = 497731,
> > misses = 363,
> > max_bucket_size = 148
> >
> > With patches:
> >
> > hits = 493634
> > misses = 253,
> > max_bucket_size = 85
> >
> > hits = 495667,
> > misses = 237,
> > max_bucket_size = 83
> >
> > hits = 454738,
> > misses = 358,
> > max_bucket_size = 132
> >
> > Some of the numbers, as you can see, are rather noisy but the end result
> > is about the same: we get at least 1000x as many cache hits as misses
> > when running that test. I don't think the choice to recycle busy BOs is
> > really gaining us anything whatsoever. It is worth noting that I did
> > both of those runs in debug builds because I had to use gdb to get the
> > data back out of the driver (prints inside the GL driver used by glamor
> > don't work too well). That probably affected things a bit but I doubt
> > the end result would have been that much different.
> >
> > Which begs the question, why does Michel see such a big difference on
> > radeon?
>
> The glamor FBO cache could reuse the temporary FBO even before flushing,
> so only one such FBO was ever needed. From what Eric wrote above, it
> sounds like the i965 cache can only reuse BOs after a flush, so there's
> relatively little difference between reusing busy BOs or not.
>

It occurred to me today while talking to Jordan about this stuff that X may
not be getting busy BO re-use. We generally only allocate busy BOs for
renderbuffers. For textures we expect that there's a decent chance we'll
map it so we allocate an idle BO. Guess which one modesetting uses! Yup,
textures. It wasn't getting the busy BO optimization at all. I hacked up
mesa to use the busy BO path for textures as well and x11perf
-copywinwin500 improved by 25%.
It's no 3x but it's enough to make me think that this patch series may not be such a good idea. :-( On the upside, I now know how to improve an X11 microbenchmark by 25%. :-)
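
To make the busy-versus-idle distinction above concrete, here is a toy model of a single cache bucket in C. It is only a sketch and not the i965 bufmgr code: every name in it (toy_bo, toy_bucket, toy_bucket_put, toy_bucket_get, BUCKET_CAPACITY) is made up for illustration, and it assumes that buffers are returned to the cache in the order the GPU will finish with them, so that a busy oldest entry implies no idle entry exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one cache bucket. All names are hypothetical. */
#define BUCKET_CAPACITY 8

struct toy_bo {
    int id;
    bool busy; /* true while the GPU may still be using the buffer */
};

struct toy_bucket {
    struct toy_bo *slots[BUCKET_CAPACITY]; /* slots[0] is the oldest entry */
    size_t count;
};

/* Return a freed buffer to the cache; the newest entry goes at the end. */
static void toy_bucket_put(struct toy_bucket *b, struct toy_bo *bo)
{
    assert(b->count < BUCKET_CAPACITY);
    b->slots[b->count++] = bo;
}

/*
 * Take a buffer out of the cache, or return NULL on a miss.
 *
 * accept_busy == true:  the caller only renders to the buffer, so the
 *                       newest entry is handed out even if it is busy.
 * accept_busy == false: the caller may map the buffer, so only an idle
 *                       buffer will do; the oldest entry is checked.
 */
static struct toy_bo *toy_bucket_get(struct toy_bucket *b, bool accept_busy)
{
    if (b->count == 0)
        return NULL;

    if (accept_busy)
        return b->slots[--b->count];

    /* Assumption of this model: if the oldest entry is still busy,
     * every newer entry is busy as well, so this is a miss. */
    if (b->slots[0]->busy)
        return NULL;

    struct toy_bo *bo = b->slots[0];
    for (size_t i = 1; i < b->count; i++)
        b->slots[i - 1] = b->slots[i];
    b->count--;
    return bo;
}
```

In this model, a request with accept_busy set to false misses whenever everything in the bucket is still in flight, while a request with accept_busy set to true always hits a non-empty bucket. That is the difference between the renderbuffer and texture allocations described above.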