You can run:

    AMD_DEBUG=testdmaperf glxgears

It tests transfer sizes of up to 128 MB, and it tests ~60 slightly
different methods of transferring data.
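If it helps, a minimal (untested) way to keep the results around for
comparison; the numbers are printed to the terminal while glxgears
starts, and the log file name is just an arbitrary choice:

    # Run the radeonsi DMA perf test and save the output for later comparison.
    AMD_DEBUG=testdmaperf glxgears 2>&1 | tee testdmaperf.log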
Marek

On Wed, Jul 3, 2019 at 4:07 AM Michel Dänzer <mic...@daenzer.net> wrote:
> On 2019-07-02 11:49 a.m., Timur Kristóf wrote:
> > On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote:
> >> On 2019-07-01 6:01 p.m., Timur Kristóf wrote:
> >>> On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote:
> >>>> On 2019-06-28 2:21 p.m., Timur Kristóf wrote:
> >>>>> I haven't found a good way to measure the maximum PCIe
> >>>>> throughput between the CPU and GPU,
> >>>>
> >>>> amdgpu.benchmark=3
> >>>>
> >>>> on the kernel command line will measure throughput for various
> >>>> transfer sizes during driver initialization.
> >>>
> >>> Thanks, I will definitely try that.
> >>> Is this the only way to do this, or is there a way to benchmark
> >>> it after it has already booted?
> >>
> >> The former. At least in theory, it's possible to unload the amdgpu
> >> module while nothing is using it, then load it again.
> >
> > Okay, so I booted my system with amdgpu.benchmark=3
> > You can find the full dmesg log here: https://pastebin.com/zN9FYGw4
> >
> > The result is between 1-5 Gbit/s depending on the transfer size
> > (the higher the better), which corresponds to neither the 8 Gbit/s
> > that the kernel thinks it is limited to, nor the 20 Gbit/s which I
> > measured earlier with pcie_bw.
>
> 5 Gbit/s throughput could be consistent with 8 Gbit/s theoretical
> bandwidth, due to various overhead.
>
> > Since pcie_bw only shows the maximum PCIe packet size (and not the
> > actual size), could it be that it's so inaccurate that the
> > 20 Gbit/s is a fluke?
>
> Seems likely or at least plausible.
>
> >>>>> but I did take a look at AMD's sysfs interface at
> >>>>> /sys/class/drm/card1/device/pcie_bw while running the
> >>>>> bottlenecked game. The highest throughput I saw there was only
> >>>>> 2.43 Gbit/s.
> >>>>
> >>>> PCIe bandwidth generally isn't a bottleneck for games, since
> >>>> they don't constantly transfer large data volumes across PCIe,
> >>>> but store them in the GPU's local VRAM, which is connected at
> >>>> much higher bandwidth.
> >>>
> >>> There are reasons why I think the problem is the bandwidth:
> >>> 1. The same issues don't happen when the GPU is not used with a
> >>> TB3 enclosure.
> >>> 2. In the case of radeonsi, the problem was mitigated once
> >>> Marek's SDMA patch was merged, which hugely reduces the PCIe
> >>> bandwidth use.
> >>> 3. In less optimized cases (for example D9VK), the problem is
> >>> still very noticeable.
> >>
> >> However, since you saw as much as ~20 Gbit/s under different
> >> circumstances, the 2.43 Gbit/s used by this game clearly isn't a
> >> hard limit; there must be other limiting factors.
> >
> > There may be other factors, yes. I can't offer a good explanation
> > of what exactly is happening, but it's pretty clear that amdgpu
> > can't take full advantage of the TB3 link, so it seemed like a good
> > idea to start investigating this first.
>
> Yeah, actually it would be consistent with ~16-32 KB granularity
> transfers based on your measurements above, which is plausible. So
> making sure that the driver doesn't artificially limit the PCIe
> bandwidth might indeed help.
>
> OTOH this also indicates a similar potential for improvement by
> using larger transfers in Mesa and/or the kernel.
>
> --
> Earthling Michel Dänzer | https://www.amd.com
> Libre software enthusiast | Mesa and X developer
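PS: if you want to sanity-check the pcie_bw numbers, here is a rough,
untested sketch. It assumes the file's three fields are
<received packets> <sent packets> <max payload size in bytes>, measured
over roughly a one-second window, and that every packet carried a full
mps payload, so the result is only an upper bound:

    # Read pcie_bw once and turn the packet counts into an upper-bound
    # estimate of PCIe traffic in Mbit/s.
    read rx tx mps < /sys/class/drm/card1/device/pcie_bw
    echo "~$(( (rx + tx) * mps * 8 / 1000000 )) Mbit/s upper bound"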