17/04/2020 01:46, Dmitry Kozlyuk:
> >   *   [AI Dmitry K, Harini] Dmitry K to send summary of conversation for 
> > feedback, Harini to follow-up for resolution.
> 
> On Windows community calls we've been discussing memory management
> implementation approaches and plans. This summary aims to bring everyone
> interested to the same page and to record information in one public place.
> 
> [Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
> Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.
> 
> 
> Current State
> -------------
> 
> Patches are sent for basic memory management that should be suitable for most
> simple cases. Relevant implementation traits are as follows:
> 
> * IOVA as PA only, PA is obtained via a kernel-mode driver.
> * Hugepages are allocated dynamically in user-mode (2MB only),
>   IOVA-contiguity is provided by allocator to the extent possible.
> * No multi-process support.
> 
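
To make "IOVA-contiguity to the extent possible" concrete, here is a
minimal sketch (not code from the patches) that counts how many
IOVA-contiguous runs a set of hugepages forms. The function name, the
HUGEPAGE_2M macro and the input array are hypothetical; the array stands
in for the PA values a kernel-mode driver would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HUGEPAGE_2M ((uint64_t)2 << 20)

/*
 * Count maximal runs of IOVA-contiguous pages.
 * pa[i] is the physical address of the i-th page in VA order
 * (hypothetical input, assumed to be already obtained elsewhere).
 */
static size_t
count_iova_runs(const uint64_t *pa, size_t n_pages, uint64_t page_size)
{
	size_t runs, i;

	assert(pa != NULL || n_pages == 0);
	if (n_pages == 0)
		return 0;

	runs = 1;
	for (i = 1; i < n_pages; i++) {
		/* A new run starts when the next page does not follow directly. */
		if (pa[i] != pa[i - 1] + page_size)
			runs++;
	}
	return runs;
}
```

A single run means the whole range is IOVA-contiguous; more runs mean
the allocator could only provide contiguity within each run.
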
> 
> Background and Findings
> -----------------------
> 
> Physical addresses are fundamentally limited and insecure because of the
> following (this list is not specific to Windows, but provides context):
> 
> 1. A user-mode application with access to DMA and PA can convince the
>    device to overwrite arbitrary RAM content, bypassing OS security.
> 
> 2. IOMMU might be engaged, rendering PA invalid for a particular device.
>    This mode is mandatory for PCI passthrough into VM.
> 
> 3. IOMMU may be used even on a bare-metal system to protect against #1 by
>    limiting DMA for a device to IOMMU mappings. Zero-copy forwarding using
>    DMA from different RX and TX devices must take care of this. On Windows,
>    this mechanism is called Kernel DMA Protection [1].
> 
> 4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).

Mellanox NICs also work with PA memory.

> 5. In complex PCI topologies logical bus addresses may differ from PA,
>    although a concrete example is missing for modern systems (IoT SoC?).
> 
> 
> Within Windows kernel there are two facilities to deal with the above:
> 
> 1. DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
>    "DMA adapter" is an abstraction of bus-master mode or an allocated channel
>    of a DMA controller. Also, each device belongs to a DMA domain, initially
>    its so-called default domain. Only devices of the same domain can have a
>    buffer suitable for DMA by all devices. In that, DMA domains are similar
>    to IOMMU groups in Linux.
> 
>    Besides domain management, this interface allows allocation of such a
>    common buffer, that is, a contiguous range of IOVA (logical addresses) and
>    kernel VA (which can be mapped to user-space). Advantages of this
>    interface: 1) it is universal w.r.t. PCI topology, IOMMU, etc; 2) it
>    supports hugepages. One disadvantage is that kernel controls IOVA and VA.
> 
> 2. DMA_IOMMU interface which is functionally similar to Linux VFIO driver,
>    that is, it allows management of IOMMU mappings within a domain [3].
> 
> [Dmitry M] Microsoft considers creating a generic memory-management driver
> exposing (some of) these interfaces which will be shipped with Windows. This
> is an idea at an early stage, not a commitment.

Are DMA_ADAPTER and DMA_IOMMU kernel-only interfaces, without any userspace API?


> Notable DPDK memory management traits:
> 
> 1. When memory is requested from EAL, it is unknown whether it will be used
> for DMA and with which device. The hint is when rte_virt2iova() is called,
> but this is not the case for VA-only devices.
> 
> 2. Memory is reserved and then committed in segments (basically, hugepages).
> 
> 3. There is a callback for segment list allocation and deallocation. For
> example, Linux EAL uses it to create IOMMU mappings when VFIO is engaged.
> 
> 4. There are drivers that explicitly request PA via rte_virt2phys().
> 
> 
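
Traits 2 and 3 above can be pictured with a small model: segments are
committed one by one inside a reserved range, and a callback is notified
so that something like an IOMMU mapping can follow. This is only a
sketch under my own assumptions. All names (seg_pool, seg_commit,
seg_decommit, track_mapped_bytes) are hypothetical, it is not the EAL
API, and no real memory is reserved or mapped.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 64

/* Hypothetical events, loosely modeled on memory event callbacks. */
enum seg_event { SEG_ALLOC, SEG_FREE };

typedef void (*seg_event_cb)(enum seg_event ev, uintptr_t va, size_t len,
		void *arg);

struct seg_pool {
	uintptr_t base;                   /* start of the reserved VA range */
	size_t seg_size;                  /* segment size, e.g. 2 MB */
	size_t n_segs;                    /* number of reserved segments */
	unsigned char committed[MAX_SEGS]; /* 1 if the segment is committed */
	seg_event_cb cb;                  /* notified on commit and decommit */
	void *cb_arg;
};

/* Commit one reserved segment and notify the callback. */
static int
seg_commit(struct seg_pool *p, size_t idx)
{
	if (idx >= p->n_segs || idx >= MAX_SEGS || p->committed[idx])
		return -1;
	p->committed[idx] = 1;
	if (p->cb != NULL)
		p->cb(SEG_ALLOC, p->base + idx * p->seg_size, p->seg_size,
			p->cb_arg);
	return 0;
}

/* Decommit one segment and notify the callback. */
static int
seg_decommit(struct seg_pool *p, size_t idx)
{
	if (idx >= p->n_segs || idx >= MAX_SEGS || !p->committed[idx])
		return -1;
	p->committed[idx] = 0;
	if (p->cb != NULL)
		p->cb(SEG_FREE, p->base + idx * p->seg_size, p->seg_size,
			p->cb_arg);
	return 0;
}

/* Stand-in for IOMMU map/unmap: only tracks how many bytes are mapped. */
static void
track_mapped_bytes(enum seg_event ev, uintptr_t va, size_t len, void *arg)
{
	size_t *mapped = arg;

	(void)va;
	if (ev == SEG_ALLOC)
		*mapped += len;
	else
		*mapped -= len;
}
```

In a real implementation the callback body would create or remove the
mapping for [va, va + len) instead of counting bytes.
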
> Last but not least, user-mode memory management notes:
> 
> 1. Windows doesn't report limits on the number of hugepages.
> 
> 2. By official documentation, only 2MB hugepages are supported.
> 
>    [Dmitry M] There are new, still undocumented Win32 API flags for 1GB [5].
>    [Dmitry K] Found a novel allocator library using these new features [6].
>    Failed to make use of [5] with AWE, unclear how to integrate into MM.
> 
> 3. Address Windowing Extensions [4] allow allocating physical page
>    frames (PFN) and then mapping them to VA, all in user-mode.
> 
>    [Dmitry K] Experiments show AWE cannot allocate hugepages (in a documented
>    way at least) and cannot reliably provide contiguous ranges (and does not
>    guarantee it). IMO, this interface is useless for common MM. Some drivers
>    that do not need hugepages but require PA may benefit from it.
> 
> 
> Opens
> -----
> 
> IMO, "Advanced memory management" milestone from roadmap should be split.

Yes for splitting. Feel free to send a patch for the roadmap.
And we should plan these tasks later in the year.
Basic memory management should be enough for first steps with PMDs.

> There are three major points of MM improvement, each requiring research and a
> complex patch:
> 
> 1. Proper DMA buffers via AllocateDomainCommonBuffer (DPDK part is unclear).
> 2. VFIO-like code in Windows EAL using DMA_IOMMU.
> 3. Support for 1GB hugepages and related changes.
> 
> Windows kernel interfaces described above have poor documentation. On Windows
> community call 2020-04-01 Dmitry Malloy agreed to help with this (concrete
> questions were raised and noted).
> 
> Hugepages of 1GB are desirable, but allocating them relies on undocumented
> features. Also, because Windows does not provide hugepage limits, it may
> require more work to manage multiple sizes in DPDK.
> 
> 
> References
> ----------
> 
> [1]: Kernel DMA Protection for Thunderbolt™ 3
> <https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt>
> [2]: DMA_ADAPTER.AllocateDomainCommonBuffer -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_domain_common_buffer>
> [3]: DMA_IOMMU interface -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-iommu_map_identity_range>
> [4]: Address Windowing Extensions (AWE)
> <https://docs.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions>
> [5]: GitHub issue <https://github.com/dotnet/runtime/issues/12779>
> [6]: mimalloc <https://github.com/microsoft/mimalloc>


Thanks for the great summary.

