On Thu, Oct 27, 2022 at 7:49 PM Rahul Singh <rahul.si...@arm.com> wrote:

> Hi Oleksandr,
>

Hello Rahul

[sorry for the possible format issues]


>
> > On 26 Oct 2022, at 7:23 pm, Oleksandr Tyshchenko <olekst...@gmail.com>
> wrote:
> >
> >
> >
> > On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.or...@amd.com>
> wrote:
> > Hi Rahul,
> >
> >
> > Hello all
> >
> > [sorry for the possible format issues]
> >
> >
> > On 26/10/2022 16:33, Rahul Singh wrote:
> > >
> > >
> > > Hi Julien,
> > >
> > >> On 26 Oct 2022, at 2:36 pm, Julien Grall <jul...@xen.org> wrote:
> > >>
> > >>
> > >>
> > >> On 26/10/2022 14:17, Rahul Singh wrote:
> > >>> Hi All,
> > >>
> > >> Hi Rahul,
> > >>
> > >>> At Arm, we started to implement the POC to support 2 levels of page
> tables/nested translation in SMMUv3.
> > >>> To support nested translation for guest OS Xen needs to expose the
> virtual IOMMU. If we passthrough the
> > >>> device to the guest that is behind an IOMMU and virtual IOMMU is
> enabled for the guest there is a need to
> > >>> add IOMMU binding for the device in the passthrough node as per [1].
> This email is to get an agreement on
> > >>> how to add the IOMMU binding for guest OS.
> > >>> Before I will explain how to add the IOMMU binding let me give a
> brief overview of how we will add support for virtual
> > >>> IOMMU on Arm. In order to implement virtual IOMMU Xen needs SMMUv3
> Nested translation support. SMMUv3 hardware
> > >>> supports two stages of translation. Each stage of translation can be
> independently enabled. An incoming address is logically
> > >>> translated from VA to IPA in stage 1, then the IPA is input to stage
> 2 which translates the IPA to the output PA. Stage 1 is
> > >>> intended to be used by a software entity( Guest OS) to provide
> isolation or translation to buffers within the entity, for example,
> > >>> DMA isolation within an OS. Stage 2 is intended to be available in
> systems supporting the Virtualization Extensions and is
> > >>> intended to virtualize device DMA to guest VM address spaces. When
> both stage 1 and stage 2 are enabled, the translation
> > >>> configuration is called nesting.
> > >>> Stage 1 translation support is required to provide isolation between
> different devices within the guest OS. XEN already supports
> > >>> Stage 2 translation but there is no support for Stage 1 translation
> for guests. We will add support for guests to configure
> > >>> the Stage 1 translation via virtual IOMMU. XEN will emulate the SMMU
> hardware and expose the virtual SMMU to the guest.
> > >>> Guest can use the native SMMU driver to configure the stage 1
> translation. When the guest configures the SMMU for Stage 1,
> > >>> XEN will trap the access and configure the hardware accordingly.
> > >>> Now back to the question of how we can add the IOMMU binding between
> the virtual IOMMU and the master devices so that
> > >>> guests can configure the IOMMU correctly. The solution that I am
> suggesting is as below:
> > >>> For dom0, while handling the DT node(handle_node()) Xen will replace
> the phandle in the "iommus" property with the virtual
> > >>> IOMMU node phandle.
> > >> Below, you said that each IOMMUs may have a different ID space. So
> shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the
> user to specify the mapping?
> > >
> > > Yes you are right we need to create one vIOMMU per pIOMMU for dom0.
> This also helps in the ACPI case
> > > where we don’t need to modify the tables to delete the pIOMMU entries
> and create one vIOMMU.
> > > In this case, no need to replace the phandle as Xen create the vIOMMU
> with the same pIOMMU
> > > phandle and same base address.
> > >
> > > For domU guests one vIOMMU per guest will be created.
> > >
> > >>
> > >>> For domU guests, when passing the device through to the guest as per
> [2],  add the below property in the partial device tree
> > >>> node that is required to describe the generic device tree binding
> for IOMMUs and their master(s)
> > >>> "iommus = < &magic_phandle 0xvMasterID>"
> > >>>      • magic_phandle will be the phandle ( vIOMMU phandle in xl)
> that will be documented so that the user can set that in partial DT node
> (0xfdea).
> > >>
> > >> Does this mean only one IOMMU will be supported in the guest?
> > >
> > > Yes.
> > >
> > >>
> > >>>      • vMasterID will be the virtual master ID that the user will
> provide.
> > >>> The partial device tree will look like this:
> > >>> /dts-v1/;
> > >>>  / {
> > >>>     /* #*cells are here to keep DTC happy */
> > >>>     #address-cells = <2>;
> > >>>     #size-cells = <2>;
> > >>>       aliases {
> > >>>         net = &mac0;
> > >>>     };
> > >>>       passthrough {
> > >>>         compatible = "simple-bus";
> > >>>         ranges;
> > >>>         #address-cells = <2>;
> > >>>         #size-cells = <2>;
> > >>>         mac0: ethernet@10000000 {
> > >>>             compatible = "calxeda,hb-xgmac";
> > >>>             reg = <0 0x10000000 0 0x1000>;
> > >>>             interrupts = <0 80 4  0 81 4  0 82 4>;
> > >>>            iommus = <0xfdea 0x01>;
> > >>>         };
> > >>>     };
> > >>> };
> > >>> In xl.cfg we need to define a new option to inform Xen about the
> vMasterID to pMasterID mapping and to which IOMMU device
> > >>> the master device is connected so that Xen can configure the right
> IOMMU. This is required if the system has devices that have
> > >>> the same master ID but behind a different IOMMU.
> > >>
> > >> In xl.cfg, we already pass the device-tree node path to passthrough.
> So Xen should already have all the information about the IOMMU and
> Master-ID. So it doesn't seem necessary for Device-Tree.
> > >>
> > >> For ACPI, I would have expected the information to be found in the
> IOREQ.
> > >>
> > >> So can you add more context why this is necessary for everyone?
> > >
> > > We have information for IOMMU and Master-ID but we don’t have
> information for linking vMaster-ID to pMaster-ID.
> > > The device tree node will be used to assign the device to the guest
> and configure the Stage-2 translation. Guest will use the
> > > vMaster-ID to configure the vIOMMU during boot. Xen needs information
> to link vMaster-ID to pMaster-ID to configure
> > > the corresponding pIOMMU. As I mentioned, we need vMaster-ID in case a
> system could have 2 identical Master-ID but
> > > each one connected to a different SMMU and assigned to the guest.
> >
> > I think the proposed solution would work and I would just like to clear
> some issues.
> >
> > Please correct me if I'm wrong:
> >
> > In the xl config file we already need to specify dtdev to point to the
> device path in host dtb.
> > In the partial device tree we specify the vMasterId as well as magic
> phandle.
> > Isn't it that we already have all the information necessary without the
> need for iommu_devid_map?
> > For me it looks like the partial dtb provides vMasterID and dtdev
> provides pMasterID as well as physical phandle to SMMU.
> >
> > Having said that, I can also understand that specifying everything in
> one place using iommu_devid_map can be easier
> > and reduces the need for device tree parsing.
> >
> > Apart from that, what is the reason of exposing only one vSMMU to guest
> instead of one vSMMU per pSMMU?
> > In the latter solution, the whole issue with handling devices with the
> same stream ID but belonging to different SMMUs
> > would be gone. It would also result in a more natural way of the device
> tree look. Normally a guest would see
> > e.g. both SMMUs and exposing only one can be misleading.
> >
> > I also have the same question. From earlier answers as I understand it
> is going to be identity vSMMU <-> pSMMU mappings for Dom0, so why diverge
> for DomU?
> >
> > Also I am thinking how this solution would work for IPMMU-VMSA
> Gen3(Gen4), which also supports two stages of translation, so the nested
> translation could be possible in general, although there might be some
> pitfalls
> > (yes, I understand that code to emulate access to control registers
> would be different in comparison with SMMUv3, but some other code could be
> common).
>
> Yes we will try to make code common so that other vIOMMU can be
> implemented easily.
> >
> >
> >
> >
> >
> > >>
> > >>>  iommu_devid_map = [ "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" ,
> "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS"]
> > >>>      • PMASTER_ID is the physical master ID of the device from the
> physical DT.
> > >>>      • VMASTER_ID is the virtual master Id that the user will
> configure in the partial device tree.
> > >>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU
> device to which this device is connected.
> >
> >
> > If iommu_devid_map is a way to go, I have a question, would this
> configuration cover the following cases?
> > 1. Device has several stream IDs
>
> Yes, in that case the user needs to create a mapping for each stream ID. For
> example, if a device has stream IDs 0x10, 0x20 and 0x30,
> iommu_devid_map will be:
>
> iommu_devid_map = ["0x10@0x01,0x40000000", "0x20@0x02,0x40000000",
> "0x30@0x03,0x40000000"]
>
> Here 0x40000000 is physical IOMMU base address.
>
> > 2. Several devices share the stream ID (or several stream IDs)
>
> Let's take an example of two devices:
>
> Device 1: 0x10
> Device 2: 0x10
>
> iommu_devid_map = ["0x10@0x1,0x40000000", "0x10@0x2,0x40000000"]
>
> Xen will create a data structure that includes the vStreamID, pMasterID and
> IOMMU base address.
> With the help of these three values we will be able to find the right physical
> IOMMU.



Thanks for the clarification. I see that iommu_devid_map is able to
describe various combinations, which is good. But the user should be very
careful when filling in iommu_devid_map, especially when dealing with a
system that has many IOMMUs and devices with many stream IDs, as it would
be easy to make a mistake in that case.
As a real example, if I want to describe 5 DMA controllers assigned to the
guest, where each has 16 uTLBs (the equivalent of stream IDs), I would
need to add 80 entries (quite a lot) to iommu_devid_map, specifying
VMASTER_ID for each entry (as uTLBs are not unique across the system).

https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1042
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1084
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1126
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2450
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2492
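
To illustrate the kind of mistake I mean, here is a minimal Python sketch
that parses entries in the proposed
"PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" form and reports any VMASTER_ID
that is used twice. The helper names are hypothetical, and the rule that
virtual IDs must be unique within a guest is my assumption, not something
the proposal states explicitly.

```python
def parse_entry(entry):
    """Split 'PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS' into a tuple.

    Returns (pmaster_id, vmaster_id, iommu_base); vmaster_id is None when
    the optional '@VMASTER_ID' part is omitted.
    """
    ids, base = entry.split(",")
    pmaster, sep, vmaster = ids.partition("@")
    vmaster_id = int(vmaster.strip(), 0) if sep else None
    return int(pmaster.strip(), 0), vmaster_id, int(base.strip(), 0)


def find_duplicate_vmaster_ids(devid_map):
    """Return (vmaster_id, first_entry, clashing_entry) for every reuse.

    Assumption: a guest sees a single vIOMMU, so each virtual master ID
    may appear only once in the map.
    """
    seen = {}
    duplicates = []
    for entry in devid_map:
        _, vmaster_id, _ = parse_entry(entry)
        if vmaster_id is None:
            continue
        if vmaster_id in seen:
            duplicates.append((vmaster_id, seen[vmaster_id], entry))
        else:
            seen[vmaster_id] = entry
    return duplicates
```

With the two-device example quoted above the check passes, while reusing
0x1 for a device behind a second IOMMU would be flagged.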


So I agree in general with what has been said earlier in this thread: it
would be *better* to avoid user interaction and teach the toolstack to do
this automatically. At the same time, I understand this might be quite
difficult to implement.
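
Just to show the idea, a rough Python sketch of the ID assignment part
could look like the one below. It assumes the toolstack has already
extracted, for each passed-through device, the physical IOMMU base address
and the list of physical master IDs (that extraction from the host DT is
the difficult part and is not shown). The function name and the base
addresses are made up for illustration.

```python
def build_devid_map(devices):
    """Build iommu_devid_map entries with automatically assigned virtual IDs.

    devices: list of (iommu_base, [pmaster_id, ...]) tuples, one per
    passed-through device. Virtual master IDs are handed out sequentially
    so that they are unique within the guest.
    """
    entries = []
    next_vmaster = 0
    for iommu_base, pmaster_ids in devices:
        for pmaster in pmaster_ids:
            entries.append(f"{pmaster:#x}@{next_vmaster:#x},{iommu_base:#x}")
            next_vmaster += 1
    return entries


# Hypothetical layout: 5 DMA controllers, each with 16 uTLBs numbered 0..15,
# each behind its own IOMMU at a made-up base address.
devices = [(0x40000000 + n * 0x1000, list(range(16))) for n in range(5)]
entries = build_devid_map(devices)
print(len(entries))
```

For the 5 x 16 case above this produces the 80 entries without the user
having to write any of them by hand.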



>
>
>
> Regards,
> Rahul



-- 
Regards,

Oleksandr Tyshchenko
