You could allocate the staging buffer with pipe_buffer_create instead of u_upload_mgr. You could also use u_suballocator, which is like a stripped-down version of u_upload_mgr. You would need another instance of u_upload_mgr anyway, because we'd like to continue using PIPE_USAGE_STREAM for uploads.
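
A minimal sketch of the "separate allocator instance per usage" idea, for illustration only. It does not use the Mesa headers: every name below (sketch_usage, sketch_suballocator, sketch_suballoc_init, sketch_suballoc) is a hypothetical stand-in, and the point where a real driver would create a backing buffer (for example with pipe_buffer_create) is only marked by a comment.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the usage hints discussed in this thread.
 * These are NOT the Mesa definitions. */
enum sketch_usage {
    SKETCH_USAGE_STREAM,  /* CPU -> GPU uploads */
    SKETCH_USAGE_STAGING  /* GPU -> CPU downloads */
};

/* A tiny linear suballocator. Each instance carries its own usage hint,
 * so uploads and downloads never share a backing buffer. */
struct sketch_suballocator {
    enum sketch_usage usage;  /* hint applied to every backing buffer */
    size_t buffer_size;       /* size of one backing buffer in bytes */
    size_t alignment;         /* must be a power of two */
    size_t offset;            /* next free byte in the current buffer */
    unsigned buffers_created; /* backing buffers needed so far */
};

static void sketch_suballoc_init(struct sketch_suballocator *sa,
                                 enum sketch_usage usage,
                                 size_t buffer_size, size_t alignment)
{
    sa->usage = usage;
    sa->buffer_size = buffer_size;
    sa->alignment = alignment;
    sa->offset = buffer_size; /* "full", so the first request gets a buffer */
    sa->buffers_created = 0;
}

/* Reserves `size` bytes and stores the offset inside the current backing
 * buffer in *out_offset. Returns 0 on success, -1 if the request is empty
 * or can never fit into one backing buffer. */
static int sketch_suballoc(struct sketch_suballocator *sa, size_t size,
                           size_t *out_offset)
{
    size_t start;

    if (size == 0 || size > sa->buffer_size)
        return -1;

    /* Round the current offset up to the requested alignment. */
    start = (sa->offset + sa->alignment - 1) & ~(sa->alignment - 1);

    if (start > sa->buffer_size || size > sa->buffer_size - start) {
        /* Current buffer exhausted. A real driver would create a new
         * backing buffer here, passing sa->usage as the usage hint. */
        sa->buffers_created++;
        start = 0;
    }

    sa->offset = start + size;
    *out_offset = start;
    return 0;
}
```

With this shape, a driver would keep one instance initialised with the streaming hint for uploads and a second one initialised with the staging hint for downloads, which leaves the upload path unchanged.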
Marek

On Sat, Aug 9, 2014 at 2:35 PM, Niels Ole Salscheider
<niels_...@salscheider-online.de> wrote:
> On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
>> Could you please do this without changing u_upload_mgr? You can still
>> use u_upload_alloc to allocate buffer memory in the driver and the map
>> buffer read/write flags are not important with persistent coherent
>> buffer mappings anyway.
>
> Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU -> GPU
> streaming buffers (i.e. those with PIPE_USAGE_STREAM) in VRAM.
> We should therefore set buffer.usage to PIPE_USAGE_STAGING in
> u_upload_alloc_buffer when we use u_upload_mgr for downloads - otherwise we
> won't get any performance improvements.
> Would it now be OK to change u_upload_mgr or do you have a better proposal?
>
> Ole
>
>> Marek
>>
>> On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider
>> <niels_...@salscheider-online.de> wrote:
>> > Using the DMA engine for buffer downloads vastly improves performance.
>> > This is because reads from VRAM by the CPU are slow because of the high
>> > latency of the PCIe bus.
>> >
>> > The first patch allows u_upload_mgr to be used for downloads, too. The
>> > second patch then uses u_upload_mgr in the radeon driver for downloads.
>> > I considered renaming u_upload_mgr to u_transfer_mgr since it might be
>> > confusing that an "upload manager" can be used for downloads. But then
>> > again we also have "transfers" so that u_transfer_mgr might also be
>> > confusing. Thus, I decided not to rename it for now.
>> >
>> > Without these patches, the buffer_bandwidth benchmark from uCLbench gives
>> > me:
>> >
>> > ./buffer_bandwidth --size=20000000 --iterations=100
>> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant
>> > memory, 32 KB local memory)
>> >
>> > 1/1 direct 20000000 Bytes 759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)
>> >
>> > With these patches, the read performance is much better:
>> >
>> > ./buffer_bandwidth --size=20000000 --iterations=100
>> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant
>> > memory, 32 KB local memory)
>> >
>> > 1/1 direct 20000000 Bytes 759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)
>> >
>> > Judging by these numbers, it might even make sense to use the DMA engine
>> > for larger buffer downloads...
>> >
>> > Niels Ole Salscheider (2):
>> >   util/u_upload_mgr: Allow to also use it for downloads
>> >   radeon: Use transfer manager for buffer downloads
>> >
>> >  src/gallium/auxiliary/hud/hud_context.c         |  3 +-
>> >  src/gallium/auxiliary/util/u_blitter.c          |  3 +-
>> >  src/gallium/auxiliary/util/u_upload_mgr.c       | 49 +++++++++++-----
>> >  src/gallium/auxiliary/util/u_upload_mgr.h       | 13 ++++-
>> >  src/gallium/auxiliary/util/u_vbuf.c             |  3 +-
>> >  src/gallium/auxiliary/vl/vl_compositor.c        |  3 +-
>> >  src/gallium/drivers/ilo/ilo_context.c           |  3 +-
>> >  src/gallium/drivers/r300/r300_context.c         |  3 +-
>> >  src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++++++++++++++++++------
>> >  src/gallium/drivers/radeon/r600_pipe_common.c   | 14 ++++-
>> >  src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
>> >  src/mesa/state_tracker/st_context.c             |  9 ++-
>> >  12 files changed, 136 insertions(+), 46 deletions(-)
>> >
>> > --
>> > 1.9.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev