On Wed, 2 May 2018 14:18:21 +0300 Eero Tamminen <eero.t.tammi...@intel.com> wrote:
> Hi,
>
> On 02.05.2018 02:25, James Xiong wrote:
> > From: "Xiong, James" <james.xi...@intel.com>
> >
> > With the current implementation, brw_bufmgr may round up a request
> > size to the next bucket size, result in 25% more memory allocated in
> > the worst scenario. For example:
> > Request size    Actual size
> > 32KB+1Byte      40KB
> > .
> > 8MB+1Byte       10MB
> > .
> > 96MB+1Byte      112MB
> > This series align the buffer size up to page instead of a bucket
> > size to improve memory allocation efficiency. Performances are
> > almost the same with Basemark ES3, GfxBench4 and 5:
> >
> > Basemark ES3
> > score                           peak memory allocation
> > before      after       diff    before     after      diff
> > 21.537462   21.888784   1.61%   419766272  408809472  -10956800
> > 19.566198   19.763429   1.00%
>
> What memory you're measuring:
>
> * VmSize (not that relevant unless you're running out of address
> space)?
>
> * PrivateDirty (listed in /proc/PID/smaps and e.g. by "smem" tool
> [1])?
>
> * total of allocation sizes used by Mesa?
>
> Or something else?
>
> In general, unused memory isn't much of a problem, only dirty
> (written) memory. Kernel maps all unused memory to a single zero
> page, so unused memory takes only few bytes of RAM for the page table
> entries (required for tracking the allocation pages).

I did the measurements in brw_bufmgr, in user space. I kept track of
the allocated size for each brw_bufmgr context and printed the peak
allocated size when the test completed and the context was destroyed.
Basically, I increased the size when I915_GEM_CREATE was called and
decreased it when GEM_CLOSE was called, so cached, imported and
user_ptr buffers were excluded. The brw_bufmgr context is created when
the test starts and destroyed after it completes, so the size is per
test case, in bytes. This method measures the exact size allocated for
a given test case, and the result is precise too.
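
To make that accounting concrete, here is a minimal sketch. It is written in Python rather than the driver's C and is not the code from the series. The names round_to_page, round_to_bucket, PeakTracker and peak_for are hypothetical, the 4 KiB page size is an assumption, and the bucket layout (four buckets per power of two) is an assumption chosen only because it reproduces the sizes quoted in the cover letter.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def round_to_page(size):
    """Round a positive request size up to the next multiple of PAGE_SIZE."""
    return (size + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE


def round_to_bucket(size):
    """Round a positive request size up to the next bucket.

    Assumed layout: buckets at p, 1.25p, 1.5p and 1.75p for every power
    of two p. This is a hypothetical model, not the driver's bucket table.
    """
    size = round_to_page(size)
    p = 1 << (size.bit_length() - 1)       # largest power of two <= size
    if size == p:
        return size
    step = p // 4
    steps = (size - p + step - 1) // step  # quarter steps needed, rounded up
    return p + steps * step


class PeakTracker:
    """Running total of live buffer bytes plus the highest total seen."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def on_create(self, size):
        # Would sit next to the buffer-create call in a real driver.
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_close(self, size):
        # Would sit next to the buffer-close call in a real driver.
        self.current -= size


def peak_for(requests, rounder):
    """Allocate every request, free them all, and return the peak in bytes."""
    tracker = PeakTracker()
    sizes = [rounder(r) for r in requests]
    for s in sizes:
        tracker.on_create(s)
    for s in sizes:
        tracker.on_close(s)
    return tracker.peak


KB, MB = 1024, 1024 * 1024
requests = [32 * KB + 1, 8 * MB + 1, 96 * MB + 1]
print("bucket rounding peak:", peak_for(requests, round_to_bucket))
print("page rounding peak:  ", peak_for(requests, round_to_page))
```

Because the counter only changes on create and close, a buffer that is reused from the cache does not move it, which is why the peak reflects the sizes actually requested from the kernel.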
> >
> > GfxBench 4.0
> >                   score                                     peak memory
> >                   before           after            diff    before     after      diff
> > gl_4              564.6052246094   565.2348632813   0.11%   578490368  550199296  -28291072
> > gl_4_off          727.0440063477   703.5833129883   -3.33%  629501952  598216704  -31285248
> > gl_manhattan      1053.4223632813  1057.3690185547  0.37%   449568768  421134336  -28434432
> > gl_trex           2708.0656738281  2699.2646484375  -0.33%  130076672  125042688  -5033984
> > gl_alu2           1207.1490478516  1212.2220458984  0.42%   55496704   55029760   -466944
> > gl_driver2        103.0383071899   103.5478439331   0.49%   13107200   12980224   -126976
> > gl_manhattan_off  1703.4780273438  1736.9074707031  1.92%   490016768  456548352  -33468416
> > gl_trex_off       2951.6809082031  3058.5422363281  3.49%   157511680  152260608  -5251072
> > gl_alu2_off       2604.0903320313  2626.2524414063  0.84%   86130688   85483520   -647168
> > gl_driver2_off    204.0173187256   207.0510101318   1.47%   40869888   40615936   -253952
>
> You're missing information on:
> * On which platform you did the testing (affects variance)
> * how many test rounds you ran, and
> * what is your variance

I ran these tests on a gen9 platform with Ubuntu 17.10 LTS. Most of the
tests are consistent, especially the memory usage. The only exception
is GfxBench 4.0 gl_manhattan: I had to run it 3 times and pick the
highest result. I will apply this method to all tests and re-send with
updated results.

>
> -> I don't know whether your numbers are just random noise.
>
>
> Memory is allocated in pages from kernel, so there's no point in
> showing its usage as bytes. Please use KBs, that's more readable.
>
> (Because of randomness e.g. interactions with the windowing system,
> there can be some variance also in process memory usage, which may
> also be useful to report.)
>
> Because of variance, you don't need that decimals for the scores.
> Removing the extra ones makes that data a bit more readable too.
>
>
> - Eero
>
> [1] "smem" is python based tool available at least in Debian.
> If you want something simpler, e.g.
> shell script working with minimal shells like Busybox, you can use this:
> https://github.com/maemo-tools-old/sp-memusage/blob/master/scripts/mem-smaps-private
>
> >
> > GfxBench 5.0
> >           score           peak memory
> >           before  after   before      after       diff
> > gl_5      259     259     1137549312  1038286848  -99262464
> > gl_5_off  297     297     1170853888  1071357952  -99495936
> >
> > Xiong, James (4):
> >   i965/drm: Reorganize code for the next patch
> >   i965/drm: Round down buffer size and calculate the bucket index
> >   i965/drm: Searching for a cached buffer for reuse
> >   i965/drm: Purge the bucket when its cached buffer is evicted
> >
> >  src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++++++++++++++++++---------------
> >  src/util/list.h                        |   5 ++
> >  2 files changed, 79 insertions(+), 65 deletions(-)
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev