On Thu, 2010-11-11 at 14:59 -0800, Jerome Glisse wrote:
> 2010/11/11 Keith Whitwell <kei...@vmware.com>:
> > There is still more to do there. Currently r600g treats buffer and texture
> > uploads separately, and I've only attempted to improve texture uploads.
> > Buffer is just as important however.
> >
> > The change needed is likely to be one of two:
> > a) Allow newly created vertex buffers to be in the GTT domain, where they
> > can be mapped cached.
> > b) Provide a staging resource upload path (with the staging buffer in GTT
> > domain).
> >
> > The latter will catch more cases and doesn't suffer from waits for the
> > engine to go idle when accessing an in-use buffer. The former is probably
> > fastest for the cases where it works.
> >
> > Right now staged texture uploads use a 3d blit to copy from the staging
> > resource to the final destination. That probably won't work (directly at
> > least) for buffer uploads as buffer dimensions (eg 64k by 1) mean they
> > usually can't be bound as render targets. So we need to jump through some
> > hoops to get a hardware upload path in the absence of a DMA engine or
> > 1d-blit.
> >
> > Keith
>
> I am not sure on how gallium texture upload was ever supposed to be or
> done, but from memory management point of view the idea i had was to
> create all bo in GTT and let migrate them to VRAM once they are use,
> eliminating any need for staging buffer. So it would be allocate bo,
> memcpy to bo the content of the texture, use bo and set it as vram bo
> so kernel migrate it to vram, that way you take advantage of kernel bo
> move which should be faster than any blit helped move.

That works great for normal/static textures that are written at most
once by the CPU and from then on always used by the GPU, and is
basically the (a) path, above.

The purpose of an intermediate/staging/dma-based upload path is to cope
with textures/buffers/etc which receive incremental updates from the CPU
concurrently with being rendered from by the GPU. This is actually
pretty common for VBOs, where a lot of applications come up with schemes
for incrementally updating a small number of large VBOs (I think ETQW
did this for instance), but any application using TexSubImage, etc, is
effectively doing this as well.

Doing these updates with DMAs means we don't have to wait for buffer
idle before the update, which seems to be the most obvious current
bottleneck in r600g for a lot of apps.

> Anyway this was my initial thinking when doing the code.

It's definitely the most efficient path for static textures, but for
dynamically-updated resources, and for readbacks, having a GPU-mediated
copy seems to be a win.

Keith

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
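
To make the difference between the two update paths discussed in this thread concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not r600g or gallium code: `Resource`, `Context`, `update_direct` and `update_staged` are hypothetical names invented for illustration, the "GPU" is just a list of queued commands, and the staged copy is modelled as a plain byte copy with no claim about how the hardware would perform it (blit, DMA, or otherwise).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy stand-in for a GPU buffer or texture (hypothetical, not a Mesa type)."""
    data: bytearray
    busy: bool = False  # True while queued GPU work still references the storage


def _write(res, offset, payload):
    # Bounds-checked byte copy into the resource's storage.
    if offset < 0 or offset + len(payload) > len(res.data):
        raise ValueError("update does not fit inside the resource")
    res.data[offset:offset + len(payload)] = payload


@dataclass
class Context:
    """Toy command stream; counts how often the CPU had to wait for the GPU."""
    queue: list = field(default_factory=list)
    stalls: int = 0

    def draw_from(self, res):
        # Queue a draw that reads from res; res stays busy until the queue drains.
        res.busy = True
        self.queue.append(("draw", res))

    def flush_and_wait(self):
        # Execute every queued command in order, then mark the resources idle.
        for cmd in self.queue:
            if cmd[0] == "copy":
                _, dst, offset, staging = cmd
                _write(dst, offset, staging)
            cmd[1].busy = False
        self.queue.clear()


def update_direct(ctx, res, offset, payload):
    """Direct path: the CPU writes straight into the resource's storage."""
    if res.busy:
        # The GPU may still read the old contents, so the CPU has to wait.
        ctx.stalls += 1
        ctx.flush_and_wait()
    _write(res, offset, payload)


def update_staged(ctx, res, offset, payload):
    """Staged path: the CPU fills a fresh staging buffer; the copy is queued."""
    staging = bytes(payload)  # private snapshot, never touched by earlier draws
    res.busy = True
    ctx.queue.append(("copy", res, offset, staging))
```

In this model, calling `update_staged` on a resource with a pending draw returns immediately and leaves `ctx.stalls` at 0, because the copy is ordered after the draw in the same queue. Calling `update_direct` in the same situation increments `ctx.stalls`, which is the wait-for-idle cost described above. For a resource that is written once and never updated while busy, `update_direct` never stalls and skips the extra copy.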