On 28.02.24 20:51, Zeng, Oak wrote:
The mail wasn't indented/prefixed correctly, so I have manually re-formatted the quoting.
*From:* Christian König <christian.koe...@amd.com>
*Sent:* Tuesday, February 27, 2024 1:54 AM
*To:* Zeng, Oak <oak.z...@intel.com>; Danilo Krummrich
<d...@redhat.com>; Dave Airlie <airl...@redhat.com>; Daniel Vetter
<dan...@ffwll.ch>; Felix Kuehling <felix.kuehl...@amd.com>;
jgli...@redhat.com
*Cc:* Welty, Brian <brian.we...@intel.com>;
dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org;
Bommu, Krishnaiah <krishnaiah.bo...@intel.com>; Ghimiray, Himal Prasad
<himal.prasad.ghimi...@intel.com>; thomas.hellst...@linux.intel.com;
Vishwanathapura, Niranjana <niranjana.vishwanathap...@intel.com>;
Brost, Matthew <matthew.br...@intel.com>; Gupta, saurabhg
<saurabhg.gu...@intel.com>
*Subject:* Re: Making drm_gpuvm work across gpu devices
Hi Oak,
On 23.02.24 21:12, Zeng, Oak wrote:
Hi Christian,
I'm going back to this old email to ask a question.
Sorry, I totally missed that one.
Quote from your email:
“Those ranges can then be used to implement the SVM feature
required for higher level APIs and not something you need at the
UAPI or even inside the low level kernel memory management.”
“SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This
should not have any influence on the design of the kernel UAPI.”
There are two categories of SVM:
1. Driver SVM allocator: this is implemented in user space, e.g.,
cudaMallocManaged (CUDA), zeMemAllocShared (Level Zero) or
clSVMAlloc (OpenCL). Intel already has gem_create/vm_bind in xekmd,
and our UMD implements clSVMAlloc and zeMemAllocShared on top of
gem_create/vm_bind. Range A..B of the process address space is
mapped into a range C..D of the GPU address space, exactly as you
said.
2. System SVM allocator: this doesn't introduce an extra driver API
for memory allocation. Any valid CPU virtual address can be used
directly and transparently in a GPU program without any extra driver
API call. Quoting the kernel's Documentation/vm/hmm.rst: "Any
application memory region (private anonymous, shared memory, or
regular file backed memory) can be used by a device transparently"
and "to share the address space by duplicating the CPU page table
in the device page table so the same address points to the same
physical memory for any valid main memory address in the process
address space". With the system SVM allocator, we don't need that
A..B to C..D mapping. (A rough userspace sketch contrasting the two
categories follows this list.)
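Roughly, using OpenCL as an example (this assumes an already created
context and kernel and a device with fine-grained system SVM support;
error handling is omitted and it is a sketch, not a complete program):

#include <stdlib.h>
#include <CL/cl.h>

void two_svm_categories(cl_context ctx, cl_kernel kernel)
{
	/* 1) Driver SVM allocator: memory comes from an explicit runtime
	 *    API call (clSVMAlloc here; cudaMallocManaged/zeMemAllocShared
	 *    are the CUDA/Level Zero equivalents).  The UMD backs this with
	 *    gem_create + vm_bind, so a CPU range A..B ends up mapped to a
	 *    GPU range C..D. */
	void *drv_ptr = clSVMAlloc(ctx, CL_MEM_READ_WRITE, 1 << 20, 0);
	clSetKernelArgSVMPointer(kernel, 0, drv_ptr);

	/* 2) System SVM allocator: no extra allocation API at all.  Any
	 *    valid CPU virtual address (malloc, mmap, file-backed, ...) is
	 *    usable by the GPU transparently, as described in
	 *    Documentation/vm/hmm.rst. */
	void *sys_ptr = malloc(1 << 20);
	clSetKernelArgSVMPointer(kernel, 1, sys_ptr);
}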
It looks like you were talking about 1). Were you?
No, even when you fully mirror the whole address space from a process
into the GPU you still need to enable this somehow with an IOCTL.
And while enabling this you absolutely should specify to which part of
the address space this mirroring applies and where it maps to.
*[Zeng, Oak]*
Let's say we have a hardware platform where both CPU and GPU support a
57-bit virtual address range (used just as an example; the statement
applies to any address range). How do you decide "which part of the
address space this mirroring applies" to? You have to mirror the whole
address space [0~2^57-1], don't you? As you designed it, the gigantic
vm_bind/mirroring happens at process initialization time, and at that
time you don't know which part of the address space will be used for
the GPU program. Remember that for the system allocator, *any* valid
CPU address can be used in a GPU program. If you add an offset to
[0~2^57-1], you get an address out of the 57-bit address range. Is this
a valid concern?
Well you can perfectly mirror on demand. You just need something similar
to userfaultfd() for the GPU. This way you don't need to mirror the full
address space, but can rather work with large chunks created on demand,
let's say 1GiB or something like that.
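To make that concrete, below is a rough sketch of what such an
on-demand handler could look like. The chunk size, gpu_bind_range() and
the page-table lock are made up for illustration; the hmm_range_fault()
/ mmu_interval_notifier flow follows Documentation/vm/hmm.rst:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>
#include <linux/sizes.h>

#define MIRROR_CHUNK	SZ_2M	/* could just as well be 1GiB */

/* Hypothetical driver hook that writes the PFNs into the GPU page tables. */
int gpu_bind_range(unsigned long start, unsigned long end, unsigned long *pfns);

/* Service a GPU fault by mirroring one chunk around the faulting CPU
 * address on demand, similar in spirit to userfaultfd(). */
static int mirror_chunk_on_fault(struct mmu_interval_notifier *notifier,
				 struct mutex *gpu_pt_lock,	/* hypothetical driver lock */
				 unsigned long fault_addr,
				 unsigned long *pfns)	/* MIRROR_CHUNK >> PAGE_SHIFT entries */
{
	unsigned long start = ALIGN_DOWN(fault_addr, MIRROR_CHUNK);
	struct hmm_range range = {
		.notifier	= notifier,
		.start		= start,
		.end		= start + MIRROR_CHUNK,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(notifier);

	mmap_read_lock(notifier->mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(notifier->mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;
		return ret;
	}

	mutex_lock(gpu_pt_lock);
	if (mmu_interval_read_retry(notifier, range.notifier_seq)) {
		/* The CPU page tables changed underneath us, collect again. */
		mutex_unlock(gpu_pt_lock);
		goto again;
	}
	ret = gpu_bind_range(start, start + MIRROR_CHUNK, pfns);
	mutex_unlock(gpu_pt_lock);

	return ret;
}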
The virtual address space is basically just a hardware functionality to
route memory accesses. While the mirroring approach is a very common use
case for data centers and high performance computing, there are quite a
number of different use cases which make use of the virtual address
space in a non-"standard" fashion. The native context approach for VMs
is just one example; databases and emulators are others.
I see the system svm allocator as just a special case of the driver
allocator where no fully backed buffer objects are allocated, but
rather sparse ones which are filled and migrated on demand.
*[Zeng, Oak]*
The above statement is true to me. We don't have a BO for the system
SVM allocator. It is a sparse one, as we can sparsely map a VMA to the
GPU. Our migration policy decides which pages / how much of the VMA is
migrated/mapped to the GPU page table.
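Just to illustrate what "migration policy" means here, a purely
hypothetical policy sketch (all names made up) that decides how much of
the VMA to migrate around a faulting address, growing the window as the
GPU keeps faulting inside the same VMA:

#include <stdint.h>

#define SVM_PAGE_SHIFT	12
#define SVM_PAGE_SIZE	(1ULL << SVM_PAGE_SHIFT)

struct svm_vma_stats {			/* hypothetical per-VMA bookkeeping */
	uint64_t start, end;		/* VMA bounds */
	unsigned int gpu_faults;	/* GPU faults seen so far in this VMA */
};

/* Pick the [*mig_start, *mig_end) range to migrate/map for this fault. */
static void pick_migration_window(const struct svm_vma_stats *vma,
				  uint64_t fault_addr,
				  uint64_t *mig_start, uint64_t *mig_end)
{
	/* 64KiB base window, doubled per previous fault, capped at 2MiB. */
	uint64_t win = 16 * SVM_PAGE_SIZE;
	unsigned int i;

	for (i = 0; i < vma->gpu_faults && win < 512 * SVM_PAGE_SIZE; i++)
		win <<= 1;

	*mig_start = fault_addr & ~(win - 1);
	*mig_end = *mig_start + win;

	/* Never migrate past the VMA bounds. */
	if (*mig_start < vma->start)
		*mig_start = vma->start;
	if (*mig_end > vma->end)
		*mig_end = vma->end;
}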
The difference between your view and mine is: you want a gigantic VMA
(created during the gigantic vm_bind) to be sparsely populated on the
GPU, while I thought of a VMA (xe_vma in the xekmd code) as a place to
store memory attributes (such as caching, user-preferred placement,
etc.). All those memory attributes are range based, i.e., the user can
specify that range1 is cached while range2 is uncached. So I don't see
how you can manage that with the gigantic VMA. Do you split your
gigantic VMA later to store range-based memory attributes?
Yes, exactly that. I mean the splitting and eventual merging of ranges
is standard functionality of the GPUVM code.
So when you need to store additional attributes per range, I would
strongly suggest making use of this splitting and merging functionality
as well.
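For reference, a rough sketch of how that could be driven through the
drm_gpuvm split/merge helpers (the attribute handling and
set_range_attrs() are made up, and the helper signatures are quoted
from memory, so treat this as an assumption rather than the actual
implementation):

#include <linux/err.h>
#include <drm/drm_gpuvm.h>

struct my_attrs;	/* hypothetical per-range attributes (caching, placement, ...) */
void set_range_attrs(struct drm_gpuvm *gpuvm, struct drm_gpuva_op *op,
		     const struct my_attrs *attrs);	/* hypothetical */

/* Apply new attributes to addr..addr+range by letting drm_gpuvm compute
 * the split/merge ("sm") operations for that range. */
static int apply_range_attrs(struct drm_gpuvm *gpuvm, u64 addr, u64 range,
			     const struct my_attrs *attrs)
{
	struct drm_gpuva_ops *ops;
	struct drm_gpuva_op *op;

	/* No GEM object here: the backing store is the CPU address space. */
	ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range, NULL, 0);
	if (IS_ERR(ops))
		return PTR_ERR(ops);

	drm_gpuva_for_each_op(op, ops) {
		switch (op->op) {
		case DRM_GPUVA_OP_MAP:		/* the new piece carrying the new attrs */
		case DRM_GPUVA_OP_REMAP:	/* an existing range split around addr..addr+range */
		case DRM_GPUVA_OP_UNMAP:	/* an existing range fully replaced */
			set_range_attrs(gpuvm, op, attrs);
			break;
		default:
			break;
		}
	}

	drm_gpuva_ops_free(gpuvm, ops);
	return 0;
}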
So basically an IOCTL which says: range A..B of the GPU address space is
mapped to offset X of the CPU address space with parameters Y (caching,
migration behavior, etc.). That is essentially the same as what we have
for mapping GEM objects; the provider of the backing store is just
something different.
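Something like the following (purely hypothetical UAPI, names and
fields made up) would express exactly that; structurally it is the same
as a GEM vm_bind, only the provider of the backing store is the CPU
address space:

#include <linux/types.h>

/* Hypothetical ioctl payload: GPU VA range gpu_addr..gpu_addr+range
 * mirrors CPU VA starting at cpu_offset, with per-range parameters. */
struct drm_xyz_svm_bind {
	__u64 gpu_addr;		/* A: start of the GPU VA range */
	__u64 range;		/* B - A: size of the range */
	__u64 cpu_offset;	/* X: CPU VA this range mirrors */
	__u32 caching;		/* Y: e.g. cached/write-combined/uncached */
	__u32 migration;	/* Y: e.g. migrate-on-fault policy */
};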
Regards,
Christian.
Regards,
Oak
Regards,
Christian.