On Sat, Apr 18, 2015 at 12:37 AM, Terje Bergstrom <[email protected]> wrote:
>
> On 04/17/2015 02:11 AM, Alexandre Courbot wrote:
>>
>> Tracking the PDE and PTE of each memory chunk can probably be avoided
>> if you change your unmapping strategy. Currently you are going through
>> the list of nvkm_vm_bp_list, but you know your PDE and PTE are always
>> going to be adjacent, since a nvkm_vma represents a contiguous block
>> in the GPU VA. So when unmapping, you can simply check for each PTE
>> entry whether the IOMMU bit is set, and unmap from the IOMMU space
>> after unmapping from the GPU VA space, in a loop similar to that of
>> nvkm_vm_unmap_at().
>>
>> Then we only need priv. You are keeping the nvkm_mm_node of the IOMMU
>> space in it, and you need it to free the IOMMU VA space. If only we
>> could find another way to store it, we could get rid of the whole
>> structure and the associated list_head in nvkm_vma...
>>
>> I need to give it some more thought, and we will probably need to
>> change a few things in base.c to make the hooks more flexible, so
>> please give me some more time to think about it. :) I just wanted to
>> share my thoughts so far in case this puts you on track.
>
> The way you described it would make GPU MMU and IOMMU mappings 1:1. So
> when we map a buffer to the GPU MMU, we always also map the buffer page
> by page to the IOMMU. There are disadvantages here.
>
> IOMMU addresses are global and used in the GPU caches. When a buffer is
> mapped multiple times into different graphics contexts, we want to avoid
> cache aliasing by mapping the buffer only once to the IOMMU. We also want
> to unmap the buffer from the IOMMU only once, after all the instances of
> the buffer have been unmapped, or even only when the buffer is actually
> freed, in order to cache IOMMU mappings.
>
> Doing the IOMMU mapping for the whole buffer with dma_map_sg is also
> faster than mapping page by page with dma_map_single, because you can do
> a single TLB invalidate at the end of the loop instead of one after
> every page.
>
> All of this argues for keeping the IOMMU and GMMU mapping loops separate.
> This patch set does not implement both of the advantages above, but your
> suggestion would take us further away from them than Vince's version does.
Aha, looks like both Vince and I overlooked this point. So IIUC we would
need to make sure a GPU buffer is only ever mapped once by the IOMMU. This
means we either need to preemptively map it entirely at some point and
just keep a reference count, or keep track of which 128k ranges are
already mapped (again, with a reference count to know when to unmap them).
The first solution is tempting because it is simpler, but surely there is
something wrong with it?

_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau
