On Thu, Oct 27, 2022 at 7:49 PM Rahul Singh <rahul.si...@arm.com> wrote:
> Hi Oleksandr,
>

Hello Rahul

[sorry for the possible format issues]

>
> > On 26 Oct 2022, at 7:23 pm, Oleksandr Tyshchenko <olekst...@gmail.com> wrote:
> >
> > On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.or...@amd.com> wrote:
> > > Hi Rahul,
> > >
> >
> > Hello all
> >
> > [sorry for the possible format issues]
> >
> > > On 26/10/2022 16:33, Rahul Singh wrote:
> > > >
> > > > Hi Julien,
> > > >
> > > >> On 26 Oct 2022, at 2:36 pm, Julien Grall <jul...@xen.org> wrote:
> > > >>
> > > >> On 26/10/2022 14:17, Rahul Singh wrote:
> > > >>> Hi All,
> > > >>
> > > >> Hi Rahul,
> > > >>
> > > >>> At Arm, we have started implementing a POC to support two levels of page tables (nested translation) in SMMUv3.
> > > >>> To support nested translation for a guest OS, Xen needs to expose a virtual IOMMU. If we pass through to the
> > > >>> guest a device that is behind an IOMMU, and the virtual IOMMU is enabled for that guest, we need to add an
> > > >>> IOMMU binding for the device in the passthrough node as per [1]. This email is to reach agreement on how to
> > > >>> add the IOMMU binding for the guest OS.
> > > >>>
> > > >>> Before I explain how to add the IOMMU binding, let me give a brief overview of how we will add support for a
> > > >>> virtual IOMMU on Arm. In order to implement a virtual IOMMU, Xen needs SMMUv3 nested translation support.
> > > >>> SMMUv3 hardware supports two stages of translation, and each stage can be enabled independently. An incoming
> > > >>> address is logically translated from VA to IPA in stage 1; the IPA is then input to stage 2, which translates
> > > >>> it to the output PA. Stage 1 is intended to be used by a software entity (the guest OS) to provide isolation
> > > >>> or translation of buffers within the entity, for example DMA isolation within an OS. Stage 2 is intended to
> > > >>> be available in systems supporting the Virtualization Extensions and to virtualize device DMA to guest VM
> > > >>> address spaces.
> > > >>> When both stage 1 and stage 2 are enabled, the translation configuration is called nesting.
> > > >>>
> > > >>> Stage 1 translation support is required to provide isolation between different devices within the guest OS.
> > > >>> Xen already supports stage 2 translation, but there is no support for stage 1 translation for guests. We will
> > > >>> add support for guests to configure stage 1 translation via a virtual IOMMU. Xen will emulate the SMMU hardware
> > > >>> and expose a virtual SMMU to the guest, so the guest can use the native SMMU driver to configure stage 1
> > > >>> translation. When the guest configures the SMMU for stage 1, Xen will trap the access and configure the
> > > >>> hardware accordingly.
> > > >>>
> > > >>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices
> > > >>> so that guests can configure the IOMMU correctly. The solution I am suggesting is as follows.
> > > >>>
> > > >>> For dom0, while handling the DT node (handle_node()), Xen will replace the phandle in the "iommus" property
> > > >>> with the virtual IOMMU node phandle.
> > > >>
> > > >> Below, you said that each IOMMU may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU?
> > > >> If not, how do you expect the user to specify the mapping?
> > > >
> > > > Yes, you are right: we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case, where
> > > > we don’t need to modify the tables to delete the pIOMMU entries and create one vIOMMU. In this case there is
> > > > no need to replace the phandle, as Xen creates each vIOMMU with the same phandle and the same base address as
> > > > the corresponding pIOMMU.
> > > >
> > > > For domU guests, one vIOMMU per guest will be created.
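The two-stage (nested) flow described above can be illustrated with a toy model. This is a minimal sketch, not Xen or SMMUv3 code: it assumes one flat, single-level table per stage with 4 KiB pages, and every name (`stage_table`, `stage_translate`, `nested_translate`, the `TOY_` constants) is hypothetical. It also omits one real-hardware detail: SMMUv3 stage 1 tables themselves live in IPA space, so their walks also go through stage 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: 4 KiB pages, 16 input pages per stage (hypothetical sizes). */
#define TOY_PAGE_SHIFT 12
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NR_PAGES 16

/* One flat, single-level translation table for a single stage. */
struct stage_table {
    bool valid[TOY_NR_PAGES];
    uint64_t out_pfn[TOY_NR_PAGES]; /* output frame number per input page */
};

/* Translate one address through a single stage; false means a fault. */
static bool stage_translate(const struct stage_table *t, uint64_t in,
                            uint64_t *out)
{
    uint64_t pfn = in >> TOY_PAGE_SHIFT;

    if (pfn >= TOY_NR_PAGES || !t->valid[pfn])
        return false;
    *out = (t->out_pfn[pfn] << TOY_PAGE_SHIFT) | (in & TOY_OFFSET_MASK);
    return true;
}

/* Nesting: VA -> IPA through the guest-owned stage 1 (programmed via the
 * vSMMU), then IPA -> PA through the Xen-owned stage 2. */
static bool nested_translate(const struct stage_table *s1,
                             const struct stage_table *s2,
                             uint64_t va, uint64_t *pa)
{
    uint64_t ipa;

    assert(s1 != NULL && s2 != NULL && pa != NULL);
    if (!stage_translate(s1, va, &ipa))
        return false;                     /* stage 1 fault */
    return stage_translate(s2, ipa, pa);  /* may still fault in stage 2 */
}
```

With `s1` mapping VA page 1 to IPA page 3 and `s2` mapping IPA page 3 to PA page 7, `nested_translate` turns VA 0x1234 into PA 0x7234; clearing either mapping makes the translation fault.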
> > > >
> > > >>
> > > >>> For domU guests, when passing a device through to the guest as per [2], add the property below to the
> > > >>> partial device tree node; it is required to describe the generic device tree binding for IOMMUs and their
> > > >>> master(s):
> > > >>>
> > > >>>     iommus = <&magic_phandle 0xvMasterID>;
> > > >>>
> > > >>> • magic_phandle will be the phandle (the vIOMMU phandle in xl) that will be documented so that the user
> > > >>>   can set it in the partial DT node (0xfdea).
> > > >>
> > > >> Does this mean only one IOMMU will be supported in the guest?
> > > >
> > > > Yes.
> > > >
> > > >>
> > > >>> • vMasterID will be the virtual master ID that the user will provide.
> > > >>>
> > > >>> The partial device tree will look like this:
> > > >>>
> > > >>> /dts-v1/;
> > > >>>
> > > >>> / {
> > > >>>     /* #*cells are here to keep DTC happy */
> > > >>>     #address-cells = <2>;
> > > >>>     #size-cells = <2>;
> > > >>>
> > > >>>     aliases {
> > > >>>         net = &mac0;
> > > >>>     };
> > > >>>
> > > >>>     passthrough {
> > > >>>         compatible = "simple-bus";
> > > >>>         ranges;
> > > >>>         #address-cells = <2>;
> > > >>>         #size-cells = <2>;
> > > >>>
> > > >>>         mac0: ethernet@10000000 {
> > > >>>             compatible = "calxeda,hb-xgmac";
> > > >>>             reg = <0 0x10000000 0 0x1000>;
> > > >>>             interrupts = <0 80 4 0 81 4 0 82 4>;
> > > >>>             iommus = <0xfdea 0x01>;
> > > >>>         };
> > > >>>     };
> > > >>> };
> > > >>>
> > > >>> In xl.cfg we need to define a new option to inform Xen about the vMasterID-to-pMasterID mapping and about
> > > >>> which IOMMU device the master device is connected to, so that Xen can configure the right IOMMU. This is
> > > >>> required if the system has devices that have the same master ID but sit behind different IOMMUs.
> > > >>
> > > >> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the
> > > >> information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
> > > >>
> > > >> For ACPI, I would have expected the information to be found in the IORT.
> > > >>
> > > >> So can you add more context on why this is necessary for everyone?
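For reference, a property such as `iommus = <0xfdea 0x01>;` is stored in a flattened device tree blob as a sequence of 32-bit big-endian cells. The sketch below shows the bytes a toolstack would need to produce for the passthrough node. It assumes `#iommu-cells = <1>` on the vIOMMU node (one master ID per specifier), and the names `build_iommus_prop` and `GUEST_VIOMMU_PHANDLE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value of the magic vIOMMU phandle proposed in the thread. */
#define GUEST_VIOMMU_PHANDLE 0xfdeaU

/* Store one 32-bit cell in the big-endian order used by FDT blobs. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Encode "iommus = <0xfdea vid0 0xfdea vid1 ...>", assuming
 * #iommu-cells = <1>. Returns the number of bytes written, or 0 if
 * n == 0 or buf is too small. */
static size_t build_iommus_prop(const uint32_t *vmaster_ids, size_t n,
                                uint8_t *buf, size_t buflen)
{
    assert(vmaster_ids != NULL || n == 0);
    assert(buf != NULL || buflen == 0);
    if (n > buflen / 8)
        return 0;
    for (size_t i = 0; i < n; i++) {
        put_be32(buf + 8 * i, GUEST_VIOMMU_PHANDLE);  /* phandle cell */
        put_be32(buf + 8 * i + 4, vmaster_ids[i]);    /* vMasterID cell */
    }
    return 8 * n;
}
```

For the single vMasterID 0x01 this yields the eight bytes `00 00 fd ea 00 00 00 01`, which is what dtc emits for `iommus = <0xfdea 0x01>;`.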
> > > >
> > > > We have the information for the IOMMU and Master-ID, but we don't have the information linking vMaster-ID
> > > > to pMaster-ID. The device tree node will be used to assign the device to the guest and configure the
> > > > stage 2 translation. The guest will use the vMaster-ID to configure the vIOMMU during boot, and Xen needs
> > > > information linking the vMaster-ID to the pMaster-ID to configure the corresponding pIOMMU. As I mentioned,
> > > > we need the vMaster-ID in case a system has two identical Master-IDs, each connected to a different SMMU
> > > > and both assigned to the guest.
> > >
> > > I think the proposed solution would work; I would just like to clarify some issues.
> > >
> > > Please correct me if I'm wrong:
> > >
> > > In the xl config file we already need to specify dtdev to point to the device path in the host DTB.
> > > In the partial device tree we specify the vMasterID as well as the magic phandle.
> > > Don't we already have all the necessary information without the need for iommu_devid_map?
> > > To me it looks like the partial DTB provides the vMasterID, and dtdev provides the pMasterID as well as
> > > the physical phandle of the SMMU.
> > >
> > > Having said that, I can also understand that specifying everything in one place using iommu_devid_map
> > > can be easier and reduces the need for device tree parsing.
> > >
> > > Apart from that, what is the reason for exposing only one vSMMU to the guest instead of one vSMMU per
> > > pSMMU? With the latter, the whole issue of handling devices with the same stream ID but belonging to
> > > different SMMUs would be gone. It would also result in a more natural-looking device tree. Normally a
> > > guest would see, e.g., both SMMUs, and exposing only one can be misleading.
> >
> > I also have the same question. From the earlier answers, as I understand it, there are going to be
> > identity vSMMU <-> pSMMU mappings for Dom0, so why diverge for DomU?
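To illustrate Michal's point that dtdev plus the partial DT may already carry everything needed, here is a sketch that derives the vMasterID to (pMasterID, pSMMU base) links without iommu_devid_map. It assumes the `iommus` properties of the host node and of the partial-DT node have already been parsed, that both use `#iommu-cells = <1>`, and that the i-th guest specifier corresponds to the i-th host specifier. That positional pairing rule is an assumption, not something the thread specifies, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic vIOMMU phandle from the proposal. */
#define GUEST_VIOMMU_PHANDLE 0xfdeaU

/* One <phandle master-id> specifier from an already-parsed "iommus"
 * property, assuming #iommu-cells = <1>. */
struct iommus_spec {
    uint32_t phandle;
    uint32_t master_id;
};

/* Physical SMMU known to the toolstack: host phandle -> MMIO base. */
struct psmmu {
    uint32_t phandle;
    uint64_t base;
};

/* Resulting vMasterID -> (pMasterID, pSMMU base) link. */
struct devid_link {
    uint32_t vmaster_id;
    uint32_t pmaster_id;
    uint64_t iommu_base;
};

static const struct psmmu *find_psmmu(const struct psmmu *s, size_t n,
                                      uint32_t phandle)
{
    for (size_t i = 0; i < n; i++)
        if (s[i].phandle == phandle)
            return &s[i];
    return NULL;
}

/* Pair the i-th specifier of the host node (dtdev) with the i-th specifier
 * of the guest's partial-DT node. Returns the number of links written, or
 * -1 if the lists differ in length, the guest does not reference the magic
 * phandle, or the host references an unknown SMMU. */
static int derive_links(const struct iommus_spec *host, size_t n_host,
                        const struct iommus_spec *guest, size_t n_guest,
                        const struct psmmu *smmus, size_t n_smmus,
                        struct devid_link *out)
{
    assert(out != NULL);
    if (n_host != n_guest)
        return -1;
    for (size_t i = 0; i < n_host; i++) {
        const struct psmmu *p = find_psmmu(smmus, n_smmus, host[i].phandle);

        if (guest[i].phandle != GUEST_VIOMMU_PHANDLE || p == NULL)
            return -1;
        out[i].vmaster_id = guest[i].master_id;
        out[i].pmaster_id = host[i].master_id;
        out[i].iommu_base = p->base;
    }
    return (int)n_host;
}
```

Two host specifiers with the same master ID 0x10 but different SMMU phandles end up as two distinct links, distinguished by vMasterID and pSMMU base, which is the case the single-vSMMU design has to handle.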
> >
> > I am also thinking about how this solution would work for IPMMU-VMSA Gen3 (Gen4), which also supports two
> > stages of translation, so nested translation could be possible in general, although there might be some
> > pitfalls (yes, I understand that the code to emulate access to the control registers would differ from
> > SMMUv3, but some other code could be common).
>
> Yes, we will try to make the code common so that other vIOMMUs can be implemented easily.
>
> > > >>
> > > >>> iommu_devid_map = [ "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS", "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" ]
> > > >>> • PMASTER_ID is the physical master ID of the device from the physical DT.
> > > >>> • VMASTER_ID is the virtual master ID that the user will configure in the partial device tree.
> > > >>> • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.
> >
> > If iommu_devid_map is the way to go, I have a question: would this configuration cover the following cases?
> > 1. A device has several stream IDs
>
> Yes. In that case the user needs to create a mapping for each stream ID. For example, if a device has
> stream IDs 0x10, 0x20 and 0x30, iommu_devid_map will be:
>
> iommu_devid_map = ["0x10@0x01,0x40000000", "0x20@0x02,0x40000000", "0x30@0x03,0x40000000"]
>
> Here 0x40000000 is the physical IOMMU base address.
>
> > 2. Several devices share the stream ID (or several stream IDs)
>
> Let's take an example of two devices:
>
> Device 1: 0x10
> Device 2: 0x10
>
> iommu_devid_map = ["0x10@0x1,0x40000000", "0x10@0x2,0x40000000"]
>
> Xen will create a data structure that includes the vStreamID, pMasterID and IOMMU base address.
> With the help of these three-tuples we will be able to find the right physical IOMMU.

Thanks for the clarification, I see that iommu_devid_map is able to describe various combinations, which is good.
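Here is a minimal sketch of the proposed iommu_devid_map handling. It parses each `PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS` string into a (vStreamID, pMasterID, IOMMU base) tuple, rejects duplicate vStreamIDs (the guest sees a single vSMMU), and resolves the vStreamID the guest programs into its vSMMU. It assumes an omitted `@VMASTER_ID` means identity, and it parses numbers with `strtoull` base 0, so a leading `0` means octal. The names are hypothetical and are not libxl or Xen APIs.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One (vStreamID, pMasterID, pIOMMU base) tuple; hypothetical type. */
struct devid_map_entry {
    uint32_t vmaster_id;
    uint32_t pmaster_id;
    uint64_t iommu_base;
};

/* Parse a decimal or 0x-prefixed hex number and advance *s past it. */
static bool parse_num(const char **s, uint64_t max, uint64_t *out)
{
    char *end;

    if (!isdigit((unsigned char)**s))
        return false;             /* reject signs and leading spaces */
    errno = 0;
    *out = strtoull(*s, &end, 0);
    if (errno == ERANGE || *out > max)
        return false;
    *s = end;
    return true;
}

/* Parse "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS". Assumption: an
 * omitted VMASTER_ID defaults to PMASTER_ID (identity mapping). */
static bool parse_entry(const char *s, struct devid_map_entry *e)
{
    uint64_t pid, vid, base;

    if (!parse_num(&s, UINT32_MAX, &pid))
        return false;
    vid = pid;
    if (*s == '@') {
        s++;
        if (!parse_num(&s, UINT32_MAX, &vid))
            return false;
    }
    if (*s != ',')
        return false;
    s++;
    if (!parse_num(&s, UINT64_MAX, &base) || *s != '\0')
        return false;
    e->pmaster_id = (uint32_t)pid;
    e->vmaster_id = (uint32_t)vid;
    e->iommu_base = base;
    return true;
}

/* Build the table. Returns the entry count, or -1 on a malformed entry or
 * a duplicate vMasterID. */
static int build_devid_map(const char *const *specs, size_t n,
                           struct devid_map_entry *out)
{
    assert(specs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!parse_entry(specs[i], &out[i]))
            return -1;
        for (size_t j = 0; j < i; j++)
            if (out[j].vmaster_id == out[i].vmaster_id)
                return -1;
    }
    return (int)n;
}

/* Resolve the vStreamID the guest programs into its vSMMU. */
static const struct devid_map_entry *
lookup_vsid(const struct devid_map_entry *map, size_t n, uint32_t vsid)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].vmaster_id == vsid)
            return &map[i];
    return NULL;
}
```

A linear scan is enough for a handful of entries. With many stream IDs (for example 80 entries), a table sorted or hashed by vStreamID would be the obvious refinement.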
But the user should be very careful when filling in iommu_devid_map, especially on a system that has many
IOMMUs and devices with many stream IDs, as it would be easy to make a mistake. As a real example, if I want
to describe 5 DMA controllers assigned to the guest, each with 16 uTLBs (the equivalent of stream IDs),
I would need to add 80 entries (quite a lot) to iommu_devid_map, specifying a VMASTER_ID for each entry
(as uTLBs are not unique across the system).

https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1042
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1084
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L1126
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2450
https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/renesas/r8a77951.dtsi#L2492

So in general I agree with what has been said earlier in this thread: it is *better* to avoid user
interaction and teach the toolstack to do this automatically. At the same time, I understand this might be
quite difficult to implement.

>
> Regards,
> Rahul

--
Regards,
Oleksandr Tyshchenko