From: Thomas Hellström <thomas.hellst...@linux.intel.com>
Sent: Friday, February 14, 2025 2:17:13 PM
To: Demi Marie Obenour <d...@invisiblethingslab.com>; Brost, Matthew <matthew.br...@intel.com>; intel...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Cc: Ghimiray, Himal Prasad <himal.prasad.ghimi...@intel.com>; apop...@nvidia.com; airl...@gmail.com; simona.vet...@ffwll.ch; felix.kuehl...@amd.com; d...@kernel.org
Subject: Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation
Hi

On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:
> On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:
> > Version 5 of GPU SVM. Thanks to everyone (especially Sima, Thomas,
> > Alistair, Himal) for their numerous reviews on revision 1, 2, 3 and
> > for helping to address many design issues.
> >
> > This version has been tested with IGT [1] on PVC, BMG, and LNL.
> > Also tested with level0 (UMD) PR [2].
>
> What is the plan to deal with not being able to preempt while a page
> fault is pending? This seems like an easy DoS vector. My
> understanding is that SVM is mostly used by compute workloads on
> headless systems. Recent AMD client GPUs don't support SVM, so
> programs that want to run on client systems should not require SVM
> if they wish to be portable.
>
> Given the potential for abuse, I think it would be best to require
> explicit administrator opt-in to enable SVM, along with possibly
> having a timeout to resolve a page fault (after which the context is
> killed). Since I expect most uses of SVM to be in the datacenter
> space (for the reasons mentioned above), I don't believe this will
> be a major limitation in practice. Programs that wish to run on
> client systems already need to use explicit memory transfer or
> pinned userptr, and administrators of compute clusters should be
> willing to enable this feature because only one workload will be
> using a GPU at a time.

While this doesn't directly address the potential DoS issue you
mention, there is an associated deadlock possibility that can arise
from not being able to preempt a pending pagefault: a dma-fence job
requires the same resources that are held up by the pending
page-fault, while the page-fault servicing in turn depends, in one way
or another, on that dma-fence being signaled.

That deadlock is handled by allowing only one job type at a time,
either page-faulting jobs or dma-fence jobs, on a resource (hw engine
or hw engine group) that can be used by both, blocking synchronously
in the exec IOCTL until the resource is available for the job type (a
rough illustrative sketch of the idea follows at the end of this
mail). That means LR jobs wait for all dma-fence jobs to complete, and
dma-fence jobs wait for all LR jobs to preempt. So a dma-fence job
wait could easily mean "wait for all outstanding pagefaults to be
serviced".

Whether, on the other hand, that is a real DoS we need to care about
is probably a topic for debate. The direction we've had so far is that
it's not: nothing is held up indefinitely, what's held up can be
Ctrl-C'd by the user, and core mm memory management is not blocked,
since mmu_notifiers can execute to completion and shrinkers / eviction
can run while a page-fault is pending.

Thanks,
Thomas
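
PS. Since the exclusivity rule above is a bit terse in prose, here is
a minimal, purely illustrative sketch of the idea. It is not the
actual Xe code; all identifiers (hw_engine_group, group_acquire_mode()
and so on) are made up for the example. It only shows an exec path
blocking synchronously until the hw engine group has drained jobs of
the other mode, with the wait being interruptible so the user can
Ctrl-C out of it:

#include <linux/compiler.h>
#include <linux/mutex.h>
#include <linux/wait.h>

/* Job types that may share a hw engine group (hypothetical names). */
enum group_exec_mode {
	EXEC_MODE_NONE,		/* group is idle */
	EXEC_MODE_DMA_FENCE,	/* jobs that signal dma-fences */
	EXEC_MODE_LR,		/* long-running jobs that may page-fault */
};

struct hw_engine_group {
	struct mutex lock;		/* protects mode and active_jobs */
	enum group_exec_mode mode;
	unsigned int active_jobs;	/* jobs of the current mode */
	wait_queue_head_t idle_wq;	/* woken when active_jobs drops to 0 */
};

static void hw_engine_group_init(struct hw_engine_group *grp)
{
	mutex_init(&grp->lock);
	init_waitqueue_head(&grp->idle_wq);
	grp->mode = EXEC_MODE_NONE;
	grp->active_jobs = 0;
}

/* Called from the exec IOCTL before submitting a job of mode @want. */
static int group_acquire_mode(struct hw_engine_group *grp,
			      enum group_exec_mode want)
{
	int ret;

	mutex_lock(&grp->lock);
	/*
	 * Block until jobs of the other mode are gone: dma-fence jobs
	 * must complete, LR jobs must preempt. This is why a dma-fence
	 * submission may end up waiting for outstanding pagefaults to
	 * be serviced.
	 */
	while (grp->mode != EXEC_MODE_NONE && grp->mode != want) {
		mutex_unlock(&grp->lock);
		ret = wait_event_interruptible(grp->idle_wq,
					       READ_ONCE(grp->active_jobs) == 0);
		if (ret)
			return ret;	/* -ERESTARTSYS, user hit Ctrl-C */
		mutex_lock(&grp->lock);
	}
	grp->mode = want;
	grp->active_jobs++;
	mutex_unlock(&grp->lock);
	return 0;
}

/* Called when a job retires (dma-fence signaled) or an LR job preempts. */
static void group_release_mode(struct hw_engine_group *grp)
{
	mutex_lock(&grp->lock);
	if (--grp->active_jobs == 0) {
		grp->mode = EXEC_MODE_NONE;
		wake_up_all(&grp->idle_wq);
	}
	mutex_unlock(&grp->lock);
}

The real driver of course has to do more than this (actually
triggering preemption of the LR jobs, choosing hw engine vs. engine
group granularity, and so on), but it hopefully makes the "a dma-fence
job wait can mean waiting for all outstanding pagefaults to be
serviced" point easier to follow.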