Hi Michal,

> On 26 Oct 2022, at 6:17 pm, Michal Orzel <michal.or...@amd.com> wrote:
>
> Hi Rahul,
>
> On 26/10/2022 16:33, Rahul Singh wrote:
>>
>> Hi Julien,
>>
>>> On 26 Oct 2022, at 2:36 pm, Julien Grall <jul...@xen.org> wrote:
>>>
>>> On 26/10/2022 14:17, Rahul Singh wrote:
>>>> Hi All,
>>>
>>> Hi Rahul,
>>>
>>>> At Arm, we started to implement the POC to support 2 levels of page
>>>> tables/nested translation in SMMUv3.
>>>> To support nested translation for a guest OS, Xen needs to expose a
>>>> virtual IOMMU. If we pass through a device that is behind an IOMMU to
>>>> the guest and the virtual IOMMU is enabled for the guest, there is a
>>>> need to add the IOMMU binding for the device in the passthrough node
>>>> as per [1]. This email is to get agreement on how to add the IOMMU
>>>> binding for the guest OS.
>>>>
>>>> Before I explain how to add the IOMMU binding, let me give a brief
>>>> overview of how we will add support for virtual IOMMU on Arm. In order
>>>> to implement virtual IOMMU, Xen needs SMMUv3 nested translation
>>>> support. SMMUv3 hardware supports two stages of translation. Each
>>>> stage of translation can be independently enabled. An incoming address
>>>> is logically translated from VA to IPA in stage 1, then the IPA is
>>>> input to stage 2, which translates the IPA to the output PA. Stage 1
>>>> is intended to be used by a software entity (guest OS) to provide
>>>> isolation or translation to buffers within the entity, for example,
>>>> DMA isolation within an OS. Stage 2 is intended to be available in
>>>> systems supporting the Virtualization Extensions and is intended to
>>>> virtualize device DMA to guest VM address spaces. When both stage 1
>>>> and stage 2 are enabled, the translation configuration is called
>>>> nesting.
>>>>
>>>> Stage 1 translation support is required to provide isolation between
>>>> different devices within the guest OS.
>>>> Xen already supports Stage 2 translation, but there is no support for
>>>> Stage 1 translation for guests. We will add support for guests to
>>>> configure the Stage 1 translation via virtual IOMMU. Xen will emulate
>>>> the SMMU hardware and expose the virtual SMMU to the guest. The guest
>>>> can use the native SMMU driver to configure the stage 1 translation.
>>>> When the guest configures the SMMU for Stage 1, Xen will trap the
>>>> access and configure the hardware accordingly.
>>>>
>>>> Now back to the question of how we can add the IOMMU binding between
>>>> the virtual IOMMU and the master devices so that guests can configure
>>>> the IOMMU correctly. The solution that I am suggesting is as below:
>>>>
>>>> For dom0, while handling the DT node (handle_node()), Xen will replace
>>>> the phandle in the "iommus" property with the virtual IOMMU node
>>>> phandle.
>>>
>>> Below, you said that each IOMMU may have a different ID space. So
>>> shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect
>>> the user to specify the mapping?
>>
>> Yes, you are right, we need to create one vIOMMU per pIOMMU for dom0.
>> This also helps in the ACPI case, where we don’t need to modify the
>> tables to delete the pIOMMU entries and create one vIOMMU.
>> In this case, there is no need to replace the phandle, as Xen creates
>> the vIOMMU with the same pIOMMU phandle and the same base address.
>>
>> For domU guests, one vIOMMU per guest will be created.
>>
>>>
>>>> For domU guests, when passing through the device to the guest as per
>>>> [2], add the below property in the partial device tree node. It is
>>>> required to describe the generic device tree binding for IOMMUs and
>>>> their master(s):
>>>>
>>>> "iommus = <&magic_phandle 0xvMasterID>"
>>>>
>>>> • magic_phandle will be the phandle (vIOMMU phandle in xl) that will
>>>>   be documented so that the user can set it in the partial DT node
>>>>   (0xfdea).
>>>
>>> Does this mean only one IOMMU will be supported in the guest?
>>
>> Yes.
>>
>>>
>>>> • vMasterID will be the virtual master ID that the user will provide.
>>>>
>>>> The partial device tree will look like this:
>>>>
>>>> /dts-v1/;
>>>>
>>>> / {
>>>>     /* #*cells are here to keep DTC happy */
>>>>     #address-cells = <2>;
>>>>     #size-cells = <2>;
>>>>
>>>>     aliases {
>>>>         net = &mac0;
>>>>     };
>>>>
>>>>     passthrough {
>>>>         compatible = "simple-bus";
>>>>         ranges;
>>>>         #address-cells = <2>;
>>>>         #size-cells = <2>;
>>>>
>>>>         mac0: ethernet@10000000 {
>>>>             compatible = "calxeda,hb-xgmac";
>>>>             reg = <0 0x10000000 0 0x1000>;
>>>>             interrupts = <0 80 4 0 81 4 0 82 4>;
>>>>             iommus = <0xfdea 0x01>;
>>>>         };
>>>>     };
>>>> };
>>>>
>>>> In xl.cfg we need to define a new option to inform Xen about the
>>>> vMasterID to pMasterID mapping and to which IOMMU device the master
>>>> device is connected, so that Xen can configure the right IOMMU.
>>>> This is required if the system has devices that have the same master
>>>> ID but are behind different IOMMUs.
>>>
>>> In xl.cfg, we already pass the device-tree node path to passthrough.
>>> So Xen should already have all the information about the IOMMU and
>>> Master-ID. So it doesn't seem necessary for Device-Tree.
>>>
>>> For ACPI, I would have expected the information to be found in the
>>> IOREQ.
>>>
>>> So can you add more context why this is necessary for everyone?
>>
>> We have information for IOMMU and Master-ID, but we don’t have
>> information for linking vMaster-ID to pMaster-ID.
>> The device tree node will be used to assign the device to the guest and
>> configure the Stage-2 translation. The guest will use the vMaster-ID to
>> configure the vIOMMU during boot. Xen needs information to link
>> vMaster-ID to pMaster-ID to configure the corresponding pIOMMU. As I
>> mentioned, we need vMaster-ID in case a system has 2 identical
>> Master-IDs, each one connected to a different SMMU and assigned to the
>> guest.
>
> I think the proposed solution would work and I would just like to clear
> up some issues.
>
> Please correct me if I'm wrong:
>
> In the xl config file we already need to specify dtdev to point to the
> device path in the host dtb.
> In the partial device tree we specify the vMasterID as well as the magic
> phandle.
> Don't we already have all the necessary information without the need
> for iommu_devid_map?
> To me it looks like the partial dtb provides the vMasterID and dtdev
> provides the pMasterID as well as the physical phandle to the SMMU.
>
> Having said that, I can also understand that specifying everything in
> one place using iommu_devid_map can be easier and reduces the need for
> device tree parsing.
>
> Apart from that, what is the reason for exposing only one vSMMU to the
> guest instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the
> same stream ID but belonging to different SMMUs would be gone. It would
> also make the device tree look more natural. Normally a guest would see
> e.g. both SMMUs, and exposing only one can be misleading.

Please see my other reply to Julien for the answer to the above question.

Regards,
Rahul
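
To illustrate the vMaster-ID to pMaster-ID linking discussed in this thread, here is a minimal Python sketch. It is not Xen or xl code, and it does not show the syntax of any xl.cfg option: the class names, the SMMU paths, and the ID values are all hypothetical. It only models the lookup Xen would need when the guest programs stage 1 for a vMaster-ID on its single vIOMMU, in the case where two passed-through devices share the same physical master ID behind different SMMUs.

```python
from typing import Dict, NamedTuple


class PhysicalStream(NamedTuple):
    """Physical side of one mapping entry (hypothetical structure)."""
    piommu_path: str  # host DT path of the physical SMMU (made-up value)
    pmaster_id: int   # master ID as seen by that physical SMMU


class VIommuMap:
    """Per-guest table: vMasterID -> (pIOMMU, pMasterID)."""

    def __init__(self) -> None:
        self._entries: Dict[int, PhysicalStream] = {}

    def add(self, vmaster_id: int, piommu_path: str, pmaster_id: int) -> None:
        # The guest sees one vIOMMU, so every vMasterID must be unique.
        if vmaster_id in self._entries:
            raise ValueError(f"vMasterID {vmaster_id:#x} already mapped")
        self._entries[vmaster_id] = PhysicalStream(piommu_path, pmaster_id)

    def resolve(self, vmaster_id: int) -> PhysicalStream:
        # Lookup performed when the guest configures stage 1 for vmaster_id.
        return self._entries[vmaster_id]


guest_map = VIommuMap()
# Two devices with the same physical master ID behind different SMMUs
# get distinct virtual master IDs on the guest's single vIOMMU.
guest_map.add(0x01, "/smmu@a", 0x10)
guest_map.add(0x02, "/smmu@b", 0x10)
```

With such a table, `guest_map.resolve(0x02)` identifies both the physical SMMU and the physical master ID, which the physical master ID alone could not do here.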