[PATCH v4 2/2] nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman The `nouveau_dmem_copy_one` function ensures that the copy push command is sent to the device firmware but does not track whether it was executed successfully. In the case of a copy error (e.g., firmware or hardware failure), the copy push command will be sent via the

Re: [PATCH 2/2] nouveau/dmem: Fix memory leak in `migrate_to_ram` upon copy error

2024-10-08 Thread Yonatan Maman
On 30/09/2024 14:20, Danilo Krummrich wrote: On Mon, Sep 23, 2024 at 01:54:58PM +, Yonatan Maman wrote: A copy push command might fail, causing `migrate_to_ram` to return a dirty HIGH_USER page to the user. This exposes a

[PATCH v2 1/2] nouveau/dmem: Fix privileged error in copy engine channel

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman When `nouveau_dmem_copy_one` is called, the following error occurs: [272146.675156] nouveau :06:00.0: fifo: PBDMA9: 0004 [HCE_PRIV] ch 1 0300 3386 This indicates that a copy push command triggered a Host Copy Engine Privileged error on channel 1 (Copy Engine

[no subject]

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman Date: Mon, 7 Oct 2024 14:48:26 +0300 Subject: [PATCH v2 0/2] drm/nouveau/dmem: Fix Memory Leaking and Device Channels configuration This patch series addresses two critical issues in the Nouveau driver related to device channels, error handling, and memory leaks. - Memory

[PATCH v4 0/2] drm/nouveau/dmem: Fix Vulnerability and Device Channels configuration

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman This patch series addresses two critical issues in the Nouveau driver related to device channels, error handling, and sensitive data leaks. - Vulnerability in migrate_to_ram: The migrate_to_ram function might return a dirty HIGH_USER page when a copy push command (FW

[PATCH v3 2/2] nouveau/dmem: Fix vulnerability in migrate_to_ram upon copy error

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman The `nouveau_dmem_copy_one` function ensures that the copy push command is sent to the device firmware but does not track whether it was executed successfully. In the case of a copy error (e.g., firmware or hardware failure), the copy push command will be sent via the

[PATCH v3 1/2] nouveau/dmem: Fix privileged error in copy engine channel

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman When `nouveau_dmem_copy_one` is called, the following error occurs: [272146.675156] nouveau :06:00.0: fifo: PBDMA9: 0004 [HCE_PRIV] ch 1 0300 3386 This indicates that a copy push command triggered a Host Copy Engine Privileged error on channel 1 (Copy Engine

[PATCH v4 1/2] nouveau/dmem: Fix privileged error in copy engine channel

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman When `nouveau_dmem_copy_one` is called, the following error occurs: [272146.675156] nouveau :06:00.0: fifo: PBDMA9: 0004 [HCE_PRIV] ch 1 0300 3386 This indicates that a copy push command triggered a Host Copy Engine Privileged error on channel 1 (Copy Engine

[PATCH v2 2/2] nouveau/dmem: Fix memory leak in `migrate_to_ram` upon copy error

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman A copy push command might fail, causing `migrate_to_ram` to return a dirty HIGH_USER page to the user. This exposes a security vulnerability in the nouveau driver. To prevent memory leaks in `migrate_to_ram` upon a copy error, allocate a zero page for the destination page
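
The idea in the snippet above (give the destination a zeroed page so that a failed copy cannot hand stale data back to the user) can be illustrated with a minimal user-space sketch. This is not the nouveau code: `migrate_page_sketch`, `copy_ok`, `copy_fail`, `page_is_zero`, and `PAGE_SIZE_BYTES` are hypothetical names introduced only for illustration, and the "device copy" is simulated with a function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096 /* assumed page size, for illustration only */

/* Hypothetical stand-in for a device copy that may fail; not a nouveau API. */
typedef int (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t len);

/* Simulated successful copy. */
int copy_ok(unsigned char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Simulated failed copy: reports an error and leaves dst untouched. */
int copy_fail(unsigned char *dst, const unsigned char *src, size_t len)
{
    (void)dst;
    (void)src;
    (void)len;
    return -1;
}

/*
 * Zero the destination first, then attempt the copy. Whatever the copy
 * returns, dst holds either the source data or zeros, never old contents.
 */
int migrate_page_sketch(unsigned char *dst, const unsigned char *src, copy_fn copy)
{
    assert(dst != NULL && src != NULL && copy != NULL);
    memset(dst, 0, PAGE_SIZE_BYTES);
    return copy(dst, src, PAGE_SIZE_BYTES);
}

/* Helper: true if every byte of the page is zero. */
bool page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < PAGE_SIZE_BYTES; i++) {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```

With `copy_fail`, a destination buffer that previously held arbitrary bytes ends up all zeros; with `copy_ok`, it ends up holding the source contents.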

[PATCH v3 0/2] drm/nouveau/dmem: Fix Vulnerability and Device Channels configuration

2024-10-08 Thread Yonatan Maman
From: Yonatan Maman This patch series addresses two critical issues in the Nouveau driver related to device channels, error handling, and sensitive data leaks. - Vulnerability in migrate_to_ram: The migrate_to_ram function might return a dirty HIGH_USER page when a copy push command (FW

Re: [PATCH 1/2] nouveau/dmem: Fix privileged error in copy engine channel

2024-10-08 Thread Yonatan Maman
On 30/09/2024 14:09, Danilo Krummrich wrote: Hi Yonatan, On Mon, Sep 23, 2024 at 01:54:56PM +, Yonatan Maman wrote: When `nouveau_dmem_copy_one` is called, the following error occurs: [272146.675156] nouveau :06:00.0: fifo

Re: [PATCH v1 1/4] mm/hmm: HMM API for P2P DMA to device zone pages

2024-10-16 Thread Yonatan Maman
, 2024 at 06:23:45PM +0300, Yonatan Maman wrote: From: Yonatan Maman hmm_range_fault() natively triggers a page fault on device private pages, migrating them to RAM. That "natively" above doesn't make sense to me. What I meant to convey is that hmm_range_fault() by default triggered a

Re: [PATCH v1 0/4] GPU Direct RDMA (P2P DMA) for Device Private Pages

2024-10-16 Thread Yonatan Maman
On 16/10/2024 7:23, Christoph Hellwig wrote: On Tue, Oct 15, 2024 at 06:23:44PM +0300, Yonatan Maman wrote: From: Yonatan Maman This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize RDMA and private device pages. This enhancement is crucial

Re: [PATCH v1 2/4] nouveau/dmem: HMM P2P DMA for private dev pages

2024-10-16 Thread Yonatan Maman
On 16/10/2024 8:12, Alistair Popple wrote: Yonatan Maman writes: From: Yonatan Maman Enabling Peer-to-Peer DMA (P2P DMA) access in GPU-centric applications is crucial for minimizing data transfer overhead (e.g., for RDMA use case). This change aims to enable that capability for

Re: [PATCH v1 0/4] GPU Direct RDMA (P2P DMA) for Device Private Pages

2024-10-20 Thread Yonatan Maman
On 18/10/2024 10:26, Zhu Yanjun wrote: On 2024/10/16 17:16, Yonatan Maman wrote: On 16/10/2024 7:23, Christoph Hellwig wrote: On Tue, Oct 15, 2024 at 06:23:44PM +0300, Yonatan Maman wrote: From: Yonatan Maman This patch series aims

[PATCH v1 2/4] nouveau/dmem: HMM P2P DMA for private dev pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman Enabling Peer-to-Peer DMA (P2P DMA) access in GPU-centric applications is crucial for minimizing data transfer overhead (e.g., for RDMA use case). This change aims to enable that capability for Nouveau over HMM device private pages. P2P DMA for private device pages allows

[PATCH v1 3/4] IB/core: P2P DMA for device private pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman Add Peer-to-Peer (P2P) DMA request for hmm_range_fault calling, utilizing capabilities introduced in mm/hmm. By setting range.default_flags to HMM_PFN_REQ_FAULT | HMM_PFN_REQ_TRY_P2P, HMM attempts to initiate P2P DMA connections for device private pages (instead of page fault

[PATCH v1 4/4] RDMA/mlx5: Enabling ATS for ODP memory

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman ATS (Address Translation Services) is mainly used to optimize PCI Peer-to-Peer transfers and prevent bus failures. This change enables ATS for ODP memory, to optimize P2P DMA for ODP memory (e.g., P2P DMA for private device pages - ODP memory). Signed-off-by: Yonatan

[PATCH v1 1/4] mm/hmm: HMM API for P2P DMA to device zone pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman hmm_range_fault() natively triggers a page fault on device private pages, migrating them to RAM. In some cases, such as with RDMA devices, the migration overhead between the device (e.g., GPU) and the CPU, and vice-versa, significantly damages performance. Thus, enabling Peer

[PATCH v1 0/4] GPU Direct RDMA (P2P DMA) for Device Private Pages

2024-10-15 Thread Yonatan Maman
From: Yonatan Maman This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize RDMA and private device pages. This enhancement is crucial for minimizing data transfer overhead by allowing the GPU to directly expose device private page data to devices

[RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages

2024-12-01 Thread Yonatan Maman
From: Yonatan Maman hmm_range_fault() by default triggers a page fault on device private pages when the HMM_PFN_REQ_FAULT flag is set, migrating them to RAM. In some cases, such as with RDMA devices, the migration overhead between the device (e.g., GPU) and the CPU, and vice-versa, significantly

[RFC 0/5] GPU Direct RDMA (P2P DMA) for Device Private Pages

2024-12-01 Thread Yonatan Maman
From: Yonatan Maman Based on: Provide a new two step DMA mapping API patchset https://lore.kernel.org/kvm/20241114170247.ga5...@lst.de/T/#t This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize RDMA and private device pages. This enhancement

[RFC 2/5] nouveau/dmem: HMM P2P DMA for private dev pages

2024-12-01 Thread Yonatan Maman
From: Yonatan Maman Enabling Peer-to-Peer DMA (P2P DMA) access in GPU-centric applications is crucial for minimizing data transfer overhead (e.g., for RDMA use case). This change aims to enable that capability for Nouveau over HMM device private pages. P2P DMA for private device pages allows

[RFC 3/5] IB/core: P2P DMA for device private pages

2024-12-01 Thread Yonatan Maman
From: Yonatan Maman Add Peer-to-Peer (P2P) DMA request for hmm_range_fault calling, utilizing capabilities introduced in mm/hmm. By setting range.default_flags to HMM_PFN_REQ_FAULT | HMM_PFN_REQ_TRY_P2P, HMM attempts to initiate P2P DMA connections for device private pages (instead of page fault

[RFC 4/5] RDMA/mlx5: Add fallback for P2P DMA errors

2024-12-01 Thread Yonatan Maman
From: Yonatan Maman Handle P2P DMA mapping errors when the transaction requires traversing an inaccessible host bridge that is not in the allowlist: - In `populate_mtt`, if a P2P mapping fails, the `HMM_PFN_ALLOW_P2P` flag is cleared only for the PFNs that returned a mapping error. - In

[RFC 5/5] RDMA/mlx5: Enabling ATS for ODP memory

2024-12-01 Thread Yonatan Maman
From: Yonatan Maman ATS (Address Translation Services) is mainly used to optimize PCI Peer-to-Peer transfers and prevent bus failures. This change enables ATS for ODP memory, to optimize P2P DMA for ODP memory (e.g., P2P DMA for private device pages - ODP memory). Signed-off-by: Yonatan

[PATCH v2 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages

2025-07-18 Thread Yonatan Maman
From: Yonatan Maman hmm_range_fault() by default triggers a page fault on device private pages when the HMM_PFN_REQ_FAULT flag is set, migrating them to RAM. In some cases, such as with RDMA devices, the migration overhead between the device (e.g., GPU) and the CPU, and vice-versa, significantly

[PATCH v2 2/5] nouveau/dmem: HMM P2P DMA for private dev pages

2025-07-18 Thread Yonatan Maman
From: Yonatan Maman Enabling Peer-to-Peer DMA (P2P DMA) access in GPU-centric applications is crucial for minimizing data transfer overhead (e.g., for RDMA use case). This change aims to enable that capability for Nouveau over HMM device private pages. P2P DMA for private device pages allows

[PATCH v2 3/5] IB/core: P2P DMA for device private pages

2025-07-18 Thread Yonatan Maman
From: Yonatan Maman Add Peer-to-Peer (P2P) DMA request for hmm_range_fault calling, utilizing capabilities introduced in mm/hmm. By setting range.default_flags to HMM_PFN_REQ_FAULT | HMM_PFN_REQ_TRY_P2P, HMM attempts to initiate P2P DMA connections for device private pages (instead of page fault

[PATCH v2 0/5] *** GPU Direct RDMA (P2P DMA) for Device Private Pages ***

2025-07-18 Thread Yonatan Maman
From: Yonatan Maman This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize RDMA and private device pages. This enhancement reduces data transfer overhead by allowing the GPU to directly expose device private page data to devices such as NICs

[PATCH v2 4/5] RDMA/mlx5: Enable P2P DMA with fallback mechanism

2025-07-18 Thread Yonatan Maman
From: Yonatan Maman Add support for P2P for MLX5 NIC devices with automatic fallback to standard DMA when P2P mapping fails. The change introduces P2P DMA requests by default using the HMM_PFN_ALLOW_P2P flag. If P2P mapping fails with -EFAULT error, the operation is retried without the P2P flag
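
The retry behaviour described in this snippet (request P2P first and, on `-EFAULT`, retry without the P2P flag) can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the mlx5 implementation: `SKETCH_REQ_FAULT`, `SKETCH_ALLOW_P2P`, `map_with_p2p_fallback`, and `mapper_no_p2p` are hypothetical names, the flag values are not the kernel's, and the mapping step is simulated by a callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not the kernel's HMM_PFN_* values. */
#define SKETCH_REQ_FAULT (1u << 0)
#define SKETCH_ALLOW_P2P (1u << 1)

/* Hypothetical mapping callback: returns 0 on success or a negative errno. */
typedef int (*map_fn)(unsigned int flags, void *ctx);

/*
 * Try the mapping with P2P allowed. If that fails with -EFAULT, clear the
 * P2P flag and retry once. Any other error is returned unchanged.
 * On return, *used_flags holds the flags of the last attempt.
 */
int map_with_p2p_fallback(map_fn map, void *ctx, unsigned int *used_flags)
{
    unsigned int flags = SKETCH_REQ_FAULT | SKETCH_ALLOW_P2P;
    int ret;

    assert(map != NULL && used_flags != NULL);

    ret = map(flags, ctx);
    if (ret == -EFAULT) {
        flags &= ~SKETCH_ALLOW_P2P; /* fall back to the non-P2P path */
        ret = map(flags, ctx);
    }

    *used_flags = flags;
    return ret;
}

/* Simulated mapper that rejects every P2P request and counts its calls. */
int mapper_no_p2p(unsigned int flags, void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return (flags & SKETCH_ALLOW_P2P) ? -EFAULT : 0;
}
```

Calling `map_with_p2p_fallback(mapper_no_p2p, &calls, &used)` with `calls` initialised to 0 makes two attempts and succeeds on the second, with only `SKETCH_REQ_FAULT` left in `used`. Retrying only on `-EFAULT` keeps other errors visible to the caller instead of masking them behind the fallback.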

[PATCH v2 5/5] RDMA/mlx5: Enabling ATS for ODP memory

2025-07-18 Thread Yonatan Maman
From: Yonatan Maman ATS (Address Translation Services) is mainly used to optimize PCI Peer-to-Peer transfers and prevent bus failures. This change enables ATS for ODP memory, to optimize P2P DMA for ODP memory (e.g., P2P DMA for private device pages - ODP memory). Signed-off-by: Yonatan

Re: [PATCH v2 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages

2025-07-21 Thread Yonatan Maman
On 21/07/2025 9:59, Christoph Hellwig wrote: On Fri, Jul 18, 2025 at 02:51:08PM +0300, Yonatan Maman wrote: From: Yonatan Maman hmm_range_fault() by default triggers a page fault on device private pages when the HMM_PFN_REQ_FAULT flag is set, migrating them to RAM. In some cases, such as

Re: [PATCH v2 2/5] nouveau/dmem: HMM P2P DMA for private dev pages

2025-07-21 Thread Yonatan Maman
On 21/07/2025 10:00, Christoph Hellwig wrote: On Fri, Jul 18, 2025 at 02:51:09PM +0300, Yonatan Maman wrote: + .get_dma_pfn_for_device = nouveau_dmem_get_dma_pfn, Please don't shorten the method name prefix in the implementation symbol name, as that makes reading / refactorin

Re: [PATCH v2 0/5] *** GPU Direct RDMA (P2P DMA) for Device Private Pages ***

2025-07-20 Thread Yonatan Maman
On 20/07/2025 13:30, Leon Romanovsky wrote: On Fri, Jul 18, 2025 at 02:51:07PM +0300, Yonatan Maman wrote: From: Yonatan Maman This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize