RE: [PATCH v2 4/9] iommufd: Convert to alloc_domain_paging()

2023-10-09 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, September 28, 2023 7:48 AM
> 
> Move the global static blocked domain to the ops and convert the
> unmanaged domain to domain_alloc_paging.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 
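
As context for this conversion: with domain_alloc_paging() the driver no
longer switches on a requested domain type the way the old domain_alloc()
callback did. A minimal sketch of the pattern (illustrative names, not the
actual iommufd code):

struct foo_domain {
    struct iommu_domain domain;
    /* driver-private page-table state would live here */
};

static struct iommu_domain *foo_domain_alloc_paging(struct device *dev)
{
    struct foo_domain *fd;

    fd = kzalloc(sizeof(*fd), GFP_KERNEL);
    if (!fd)
        return NULL;
    /* the core only calls this for paging domains, so no type switch */
    return &fd->domain;
}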


RE: [PATCH v2 1/9] iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain

2023-10-09 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, September 28, 2023 7:48 AM
> 
> Following the pattern of identity domains, just assign the BLOCKED domain
> global statics to a value in ops. Update the core code to use the global
> static directly.
> 
> Update powerpc to use the new scheme and remove its empty domain_alloc
> callback.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 
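
For reference, the pattern this patch introduces boils down to the following
sketch (names are illustrative, not the actual patch):

static int foo_attach_blocked(struct iommu_domain *domain, struct device *dev)
{
    /* point the device at an empty translation so all DMA is rejected */
    foo_disable_translation(dev);    /* hypothetical driver helper */
    return 0;
}

static const struct iommu_domain_ops foo_blocked_domain_ops = {
    .attach_dev = foo_attach_blocked,
};

static struct iommu_domain foo_blocked_domain = {
    .type = IOMMU_DOMAIN_BLOCKED,
    .ops = &foo_blocked_domain_ops,
};

static const struct iommu_ops foo_ops = {
    .blocked_domain = &foo_blocked_domain,
    /* other callbacks elided */
};

The core can then use ops->blocked_domain directly instead of asking the
driver to allocate a blocked domain on demand.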


RE: [PATCH v2 2/9] iommu/vt-d: Update the definition of the blocking domain

2023-10-09 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, September 28, 2023 7:48 AM
> 
> The global static should pre-define the type, and the NOP free function can
> now be left as NULL.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH v2 3/9] iommu/vt-d: Use ops->blocked_domain

2023-10-09 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, September 28, 2023 7:48 AM
> 
> Trivially migrate to the ops->blocked_domain for the existing global
> static.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH 0/4] Invalidate secondary IOMMU TLB on permission upgrade

2023-07-18 Thread Tian, Kevin
> From: Anshuman Khandual 
> Sent: Wednesday, July 19, 2023 11:04 AM
> 
> On 7/18/23 13:26, Alistair Popple wrote:
> > The main change is to move secondary TLB invalidation mmu notifier
> > callbacks into the architecture specific TLB flushing functions. This
> > makes secondary TLB invalidation mostly match CPU invalidation while
> > still allowing efficient range based invalidations based on the
> > existing TLB batching code.
> >
> > ==
> > Background
> > ==
> >
> > The arm64 architecture specifies TLB permission bits may be cached and
> > therefore the TLB must be invalidated during permission upgrades. For
> > the CPU this currently occurs in the architecture specific
> > ptep_set_access_flags() routine.
> >
> > Secondary TLBs such as implemented by the SMMU IOMMU match the CPU
> > architecture specification and may also cache permission bits and
> > require the same TLB invalidations. This may be achieved in one of two
> > ways.
> >
> > Some SMMU implementations implement broadcast TLB maintenance
> > (BTM). This snoops CPU TLB invalidates and will invalidate any
> > secondary TLB at the same time as the CPU. However implementations are
> > not required to implement BTM.
> 
> So, the implementations with BTM do not even need an MMU notifier callback
> for secondary TLB invalidation purposes? Perhaps mmu_notifier_register()
> could also be skipped for such cases, i.e. with ARM_SMMU_FEAT_BTM enabled?
> 

Out of curiosity: how does BTM work with the device TLB? Can the SMMU
translate a broadcast TLB invalidation (based on ASID) into a set of PCI ATS
invalidation requests (based on PCI requester ID and PASID) in hardware?

If software intervention is required, that might be the reason why the mmu
notifier cannot be skipped. With BTM enabled it would just mean the notifier
callback can skip the IOTLB invalidation...
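
For the SMMU case discussed above, the notifier callback would conceptually
look like this sketch (assuming ATS/device-TLB invalidations are never
covered by BTM broadcasts; all names are illustrative):

static void foo_invalidate_secondary_tlbs(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start,
                                          unsigned long end)
{
    struct foo_smmu_domain *sd = mn_to_foo_domain(mn);  /* hypothetical */

    /* with BTM the CPU's broadcast TLBI already reached the SMMU TLB */
    if (!(sd->smmu->features & FOO_FEAT_BTM))
        foo_tlb_inval_va_range(sd, start, end);

    /* PCI ATS invalidations are not broadcast, so always issue them */
    foo_atc_inval_domain(sd, start, end);
}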


RE: [PATCH 01/11] iommu: Have __iommu_probe_device() check for already probed devices

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> This is a step toward making __iommu_probe_device() self contained.
> 
> It should, under proper locking, check if the device is already associated
> with an iommu driver and resolve parallel probes. All but one of the
> callers open code this test using two different means, but they all
> rely on dev->iommu_group.
> 
> Currently bus_iommu_probe()/probe_iommu_group() and
> probe_acpi_namespace_devices() reject already probed devices with an
> unlocked read of dev->iommu_group. The OF and ACPI "replay" functions use
> device_iommu_mapped(), which is the same read without the pointless
> refcount.
> 
> Move this test into __iommu_probe_device() and put it under the
> iommu_probe_device_lock. The store to dev->iommu_group is in
> iommu_group_add_device() which is also called under this lock for iommu
> driver devices, making it properly locked.
> 
> The only path that didn't have this check is the hotplug path triggered by
> BUS_NOTIFY_ADD_DEVICE. The only way to get dev->iommu_group assigned
> outside the probe path is via iommu_group_add_device(). Today there are
> only three callers, VFIO no-iommu, powernv and power pseries - none of
> these cases probe iommu drivers. Thus adding this additional check is
> safe.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 
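
The consolidated check described above amounts to roughly this sketch of the
locking pattern (the probe body is condensed into a hypothetical helper):

static int __iommu_probe_device(struct device *dev,
                                struct list_head *group_list)
{
    int ret;

    mutex_lock(&iommu_probe_device_lock);
    if (dev->iommu_group) {
        /* already probed, possibly by a racing caller */
        mutex_unlock(&iommu_probe_device_lock);
        return 0;
    }
    /* iommu_group_add_device() stores dev->iommu_group under this lock */
    ret = foo_probe_one(dev, group_list);    /* hypothetical helper */
    mutex_unlock(&iommu_probe_device_lock);
    return ret;
}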


RE: [PATCH 02/11] iommu: Use iommu_group_ref_get/put() for dev->iommu_group

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> No reason to open code this, use the proper helper functions.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH 03/11] iommu: Inline iommu_group_get_for_dev() into __iommu_probe_device()

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> This is the only caller, and it doesn't need the generality of the
> function. We already know there is no iommu_group, so it is simply two
> function calls.
> 
> Moving it here allows the following patches to split the logic in these
> functions.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH 04/11] iommu: Simplify the __iommu_group_remove_device() flow

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> Instead of returning the struct group_device and then later freeing it, do
> the entire free under the group->mutex and defer only putting the
> iommu_group.
> 
> It is safe to remove the sysfs_links and free memory while holding that
> mutex.
> 
> Move the sanity assert of the group status into
> __iommu_group_free_device().
> 
> The next patch will improve upon this and consolidate the group put and
> the mutex into __iommu_group_remove_device().
> 
> __iommu_group_free_device() is close to being the paired undo of
> iommu_group_add_device(), following patches will improve on that.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 
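
The reworked flow reads approximately like the sketch below (the lookup
helper is hypothetical; the rest follows the patch description):

static void __iommu_group_remove_device(struct device *dev)
{
    struct iommu_group *group = dev->iommu_group;
    struct group_device *device;

    mutex_lock(&group->mutex);
    device = foo_find_group_device(group, dev);    /* hypothetical lookup */
    if (device)
        __iommu_group_free_device(group, device);  /* sysfs links + kfree */
    mutex_unlock(&group->mutex);

    /* only the refcount put is deferred past the mutex */
    iommu_group_put(group);
}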


RE: [PATCH 05/11] iommu: Add iommu_init/deinit_driver() paired functions

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> +static int iommu_init_driver(struct device *dev, const struct iommu_ops *ops)

Would iommu_init_device() better fit the purpose?

Otherwise,

Reviewed-by: Kevin Tian 


RE: [PATCH 06/11] iommu: Move the iommu driver sysfs setup into iommu_init/deinit_driver()

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> It makes logical sense that once the driver is attached to the device the
> sysfs links appear, even if we haven't fully created the group_device or
> attached the device to a domain.
> 
> Fix the missing error handling on sysfs creation since
> iommu_init_driver() can trivially handle this.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH 07/11] iommu: Do not export iommu_device_link/unlink()

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> These are not used outside iommu.c, they should not be available to
> modular code.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [PATCH 08/11] iommu: Always destroy the iommu_group during iommu_release_device()

2023-04-26 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, April 20, 2023 12:12 AM
> 
> Have release fully clean up the iommu related parts of the struct device,
> no matter what state they are in.
> 
> POWER creates iommu_groups without drivers attached, and the next patch
> removes the open-coding of this same cleanup from POWER.
> 
> Split the logic so that the three things owned by the iommu core are
> always cleaned up:
>  - Any attached iommu_group
>  - Any allocated dev->iommu, e.g. for fwspec
>  - Any attached driver via a struct group_device
> 
> This fixes a bug where a fwspec created without an iommu_group being
> probed would not be freed.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 


RE: [patch 01/39] PCI/MSI: Check for MSI enabled in __pci_msix_enable()

2022-11-17 Thread Tian, Kevin
> From: Thomas Gleixner 
> Sent: Friday, November 11, 2022 9:54 PM
> 
> PCI/MSI and PCI/MSI-X are mutually exclusive, but the MSI-X enable code
> lacks a check for already enabled MSI.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |5 +
>  1 file changed, 5 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -935,6 +935,11 @@ static int __pci_enable_msix_range(struc
>   if (maxvec < minvec)
>   return -ERANGE;
> 
> +    if (dev->msi_enabled) {
> +        pci_info(dev, "can't enable MSI-X (MSI already enabled)\n");
> +        return -EINVAL;
> +    }
> +
>      if (WARN_ON_ONCE(dev->msix_enabled))
>          return -EINVAL;
> 

The same check remains in __pci_enable_msix():

    /* Check whether driver already requested for MSI IRQ */
    if (dev->msi_enabled) {
        pci_info(dev, "can't enable MSI-X (MSI IRQ already assigned)\n");
        return -EINVAL;
    }
    return msix_capability_init(dev, entries, nvec, affd);

It's removed later in patch 33 when sanitizing the MSI-X checks, but
logically the removal could come with this patch.


RE: [patch 33/39] PCI/MSI: Sanitize MSI-X checks

2022-11-17 Thread Tian, Kevin
> From: Thomas Gleixner 
> Sent: Friday, November 11, 2022 9:55 PM
> 
> @@ -785,7 +786,7 @@ int __pci_enable_msix_range(struct pci_d
>          return -ENOSPC;
>      }
> 
> -    rc = __pci_enable_msix(dev, entries, nvec, affd, flags);
> +    rc = msix_capability_init(dev, entries, nvec, affd);
>      if (rc == 0)
>          return nvec;
> 

The check in the following lines doesn't make sense now, since
msix_capability_init() returns either 0 or a negative errno, never a
positive vector count:

    if (rc < minvec)
        return -ENOSPC;


RE: [patch 00/39] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 1 cleanups

2022-11-17 Thread Tian, Kevin
> From: Thomas Gleixner 
> Sent: Friday, November 11, 2022 9:54 PM
> 
> Enough of history and theory. Here comes part 1:
> 
> This is just a cleanup and a reorganisation of the PCI/MSI code which
> became quite an unreadable mess over time. There is no intentional
> functional change in this series.
> 
> It's just a separate step to make the subsequent changes in the
> infrastructure easier both to implement and to review.
> 
> Thanks,
> 
>   tglx

The entire series looks good to me except for a couple of nits replied to in
individual patches:

Reviewed-by: Kevin Tian 


RE: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-24 Thread Tian, Kevin
> From: Auger Eric
> Sent: Monday, March 15, 2021 3:52 PM
> 
> Hi Christoph,
> 
> On 3/14/21 4:58 PM, Christoph Hellwig wrote:
> > On Sun, Mar 14, 2021 at 11:44:52AM +0100, Auger Eric wrote:
> >> As mentioned by Robin, there are series planning to use
> >> DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu
> >> (ARM and Intel):
> >>
> >> [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
> >> patches 1, 2, 3
> >>
> >> Is the plan to introduce a new domain_get_nesting_info ops then?
> >
> > The plan as usual would be to add it to the series adding that support.
> > Not sure what the merge plans are - if the series is ready to be
> > merged I could rebase on top of it, otherwise that series will need
> > to add the method.
> OK I think your series may be upstreamed first.
> 

Agreed. The vSVA series is still undergoing a refactor according to Jason's
comments and thus won't be ready in the short term. It's better to let this
one go in first.

Thanks
Kevin


RE: [PATCH] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-05-02 Thread Tian, Kevin
> From: Yongji Xie
> Sent: Wednesday, April 27, 2016 8:22 PM
> 
> The current vfio-pci implementation disallows mmapping
> sub-page (size < PAGE_SIZE) MMIO BARs because these BARs' mmio
> page may be shared with other BARs. This causes some
> performance issues when we pass through a PCI device with
> this kind of BARs: the guest will not be able to handle the mmio
> accesses to the BARs, which leads to mmio emulation in the host.
> 
> However, not all sub-page BARs share a page with other BARs.
> We should allow mmapping those sub-page MMIO BARs which we can
> make sure will not share a page with other BARs.
> 
> This patch adds support for this case. We also try to use a
> shadow resource to reserve the remainder of the page, into which a
> hot-added device's BAR might be assigned.

'shadow' usually means you have a corresponding part being shadowed, while
here it looks like you mostly want some 'dummy' resource for reservation
purposes?

> +
> +    if (!(res->start & ~PAGE_MASK)) {
> +        /*
> +         * Add shadow resource for sub-page bar whose mmio
> +         * page is exclusive in case that hot-add device's
> +         * bar is assigned into the mem hole.
> +         */
> +        shadow_res = kzalloc(sizeof(*shadow_res), GFP_KERNEL);
> +        shadow_res->resource.start = res->end + 1;
> +        shadow_res->resource.end = res->start + PAGE_SIZE - 1;

What about res->start not being page aligned, so that you still end up with
a portion of the page before res->start that is not exclusively reserved?

> +        shadow_res->resource.flags = res->flags;
> +        if (request_resource(res->parent,
> +                &shadow_res->resource)) {
> +            kfree(shadow_res);
> +            return false;
> +        }
> +        shadow_res->index = index;
> +        list_add(&shadow_res->res_next,
> +                &vdev->shadow_resources_list);
> +        return true;

Thanks
Kevin
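
To the alignment question above: handling an arbitrary sub-page BAR would
mean reserving both sides of its page, along the lines of this sketch
(foo_reserve_range() is a hypothetical wrapper around the request_resource()
bookkeeping in the patch):

static bool foo_reserve_page_around_bar(struct vfio_pci_device *vdev,
                                        struct resource *res)
{
    resource_size_t page_start = res->start & PAGE_MASK;
    resource_size_t page_end = page_start + PAGE_SIZE - 1;

    /* head: the gap before a non-page-aligned BAR start */
    if (res->start > page_start &&
        !foo_reserve_range(vdev, page_start, res->start - 1))
        return false;

    /* tail: the gap after the BAR end, which the patch already covers */
    if (res->end < page_end &&
        !foo_reserve_range(vdev, res->end + 1, page_end))
        return false;

    return true;
}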

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-02 Thread Tian, Kevin
> From: Yongji Xie
> Sent: Wednesday, April 27, 2016 8:43 PM
> 
> This patch enables mmapping MSI-X tables if hardware supports
> interrupt remapping which can ensure that a given pci device
> can only shoot the MSIs assigned for it.
> 
> With MSI-X table mmapped, we also need to expose the
> read/write interface which will be used to access MSI-X table.
> 
> Signed-off-by: Yongji Xie 

A curious question here. Does "allow to mmap MSI-X" essentially mean that a
KVM guest can directly read/write the physical MSI-X structure then?

Thanks
Kevin

RE: [PATCH] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-05-02 Thread Tian, Kevin
> From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> Sent: Tuesday, May 03, 2016 1:52 PM
> 
> >> +
> >> +  if (!(res->start & ~PAGE_MASK)) {
> >> +  /*
> >> +   * Add shadow resource for sub-page bar whose mmio
> >> +   * page is exclusive in case that hot-add device's
> >> +   * bar is assigned into the mem hole.
> >> +   */
> >> +  shadow_res = kzalloc(sizeof(*shadow_res), GFP_KERNEL);
> >> +  shadow_res->resource.start = res->end + 1;
> >> +  shadow_res->resource.end = res->start + PAGE_SIZE - 1;
> > What about res->start not page aligned so you end up still having
> > a portion before res->start not exclusively reserved?
> 
> Do you mean adding a 'dummy' resource to reserve the portion
> before res->start if res->start is not page aligned?
> 
> But could it happen that there is a mem hole in the portion
> before res->start? The resource should have been assigned
> into the hole at the beginning.
> 

Just a quick thought: another device might occupy that range before this
device is initialized, and then that other device is hot-removed later...

Thanks
Kevin

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-02 Thread Tian, Kevin
> From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> Sent: Tuesday, May 03, 2016 2:08 PM
> 
> On 2016/5/3 13:34, Tian, Kevin wrote:
> 
> >> From: Yongji Xie
> >> Sent: Wednesday, April 27, 2016 8:43 PM
> >>
> >> This patch enables mmapping MSI-X tables if hardware supports
> >> interrupt remapping which can ensure that a given pci device
> >> can only shoot the MSIs assigned for it.
> >>
> >> With MSI-X table mmapped, we also need to expose the
> >> read/write interface which will be used to access MSI-X table.
> >>
> >> Signed-off-by: Yongji Xie 
> > A curious question here. Does "allow to mmap MSI-X" essentially
> > mean that KVM guest can directly read/write physical MSI-X
> > structure then?
> >
> > Thanks
> > Kevin
> >
> 
> Here we just allow mmapping the MSI-X table in the kernel. It doesn't
> mean every KVM guest can directly read/write the physical MSI-X
> structure; that should be decided by QEMU. For the PPC64
> platform, we would allow passing through the MSI-X table
> because we know the guest kernel would not write the physical
> MSI-X structure when enabling MSI.
> 

A bit confused here. If the guest kernel doesn't need to write the
physical MSI-X structure, what's the point of passing through
the table then?

I think the key to whether the MSI-X table can be passed through
is where hypervisor control is deployed. At least
for x86:

- When irq remapping is not enabled, the host/hypervisor needs
to control the physical interrupt message (vector/dest/etc.)
directly in the MSI-X structure, so we cannot allow a guest to
access it;

- When irq remapping is enabled, the host/hypervisor can control
interrupt routing through the irq remapping table. However the MSI-X
entry also needs to be configured in the remappable format. In this
manner we also cannot allow direct access from the guest.

The only sane case for passing through the MSI-X structure is a
mechanism similar to irq remapping but without the need to change
the original MSI-X format, so that direct access from the guest side
is safe. Is that the case on PPC64?

Thanks
Kevin
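
The "remappable format" referred to above replaces the raw vector/destination
bits in the MSI address with an index (handle) into the host-owned IRQ
remapping table. Per the Intel VT-d specification the encoding is roughly as
below (a sketch; the helper name is illustrative):

#define MSI_ADDR_BASE           0xfee00000u
#define MSI_ADDR_IRTE_FORMAT    (1u << 3)   /* 1 = remappable format */
#define MSI_ADDR_SHV            (1u << 4)   /* subhandle in data valid */

static u32 foo_remappable_msi_addr(u16 handle)
{
    /* handle[14:0] goes in address bits 19:5, handle[15] in bit 2 */
    return MSI_ADDR_BASE |
           ((u32)(handle & 0x7fff) << 5) |
           ((u32)(handle >> 15) << 2) |
           MSI_ADDR_IRTE_FORMAT;
}

Since the handle selects a host-controlled remapping entry, letting a guest
write arbitrary handles is exactly the hole described above.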

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-05 Thread Tian, Kevin
> From: Yongji Xie
> Sent: Tuesday, May 03, 2016 3:34 PM
> 
> On 2016/5/3 14:22, Tian, Kevin wrote:
> 
> >> From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> >> Sent: Tuesday, May 03, 2016 2:08 PM
> >>
> >> On 2016/5/3 13:34, Tian, Kevin wrote:
> >>
> >>>> From: Yongji Xie
> >>>> Sent: Wednesday, April 27, 2016 8:43 PM
> >>>>
> >>>> This patch enables mmapping MSI-X tables if hardware supports
> >>>> interrupt remapping which can ensure that a given pci device
> >>>> can only shoot the MSIs assigned for it.
> >>>>
> >>>> With MSI-X table mmapped, we also need to expose the
> >>>> read/write interface which will be used to access MSI-X table.
> >>>>
> >>>> Signed-off-by: Yongji Xie 
> >>> A curious question here. Does "allow to mmap MSI-X" essentially
> >>> mean that KVM guest can directly read/write physical MSI-X
> >>> structure then?
> >>>
> >>> Thanks
> >>> Kevin
> >>>
> >> Here we just allow to mmap MSI-X table in kernel. It doesn't
> >> mean all KVM guest can directly read/write physical MSI-X
> >> structure. This should be decided by QEMU. For PPC64
> >> platform, we would allow to passthrough the MSI-X table
> >> because we know guest kernel would not write physical
> >> MSI-X structure when enabling MSI.
> >>
> > A bit confused here. If guest kernel doesn't need to write
> > physical MSI-X structure, what's the point of passing through
> > the table then?
> 
> We want to allow mmapping the MSI-X table because there may be
> some critical registers in the same page as the MSI-X table.
> We would have to handle the mmio accesses to these registers in QEMU
> rather than in the guest if mmapping the MSI-X table is disallowed.

So you mean critical registers in the same MMIO BAR as the MSI-X
table, instead of two MMIO BARs in the same page (the latter, I
suppose, won't happen with your whole patchset applied)?

> 
> > I think the key whether MSI-X table can be passed through
> > is related to where hypervisor control is deployed. At least
> > for x86:
> >
> > - When irq remapping is not enabled, host/hypervisor needs
> > to control physical interrupt message including vector/dest/etc.
> > directly in MSI-X structure, so we cannot allow a guest to
> > access it;
> >
> > - when irq remapping is enabled, host/hypervisor can control
> > interrupt routing in irq remapping table. However MSI-X
> > also needs to be configured as remappable format. In this
> > manner we also cannot allow direct access from guest.
> >
> > The only sane case to pass through MSI-X structure, is a
> > mechanism similar to irq remapping but w/o need to change
> > original MSI-X format so direct access from guest side is
> > safe. Is it the case in PPC64?
> >
> > Thanks
> > Kevin
> 
> Actually, we are not aiming at accessing the MSI-X table from the
> guest. So I think it's safe to pass through the MSI-X table if we
> can make sure the guest kernel would not touch the MSI-X table in
> the normal code path, as with a para-virtualized guest kernel on PPC64.
> 

Then how do you prevent a malicious guest kernel from accessing it?

Thanks
Kevin

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-05 Thread Tian, Kevin
> From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> Sent: Thursday, May 05, 2016 7:43 PM
> 
> Hi David and Kevin,
> 
> On 2016/5/5 17:54, David Laight wrote:
> 
> > From: Tian, Kevin
> >> Sent: 05 May 2016 10:37
> > ...
> >>> Acutually, we are not aimed at accessing MSI-X table from
> >>> guest. So I think it's safe to passthrough MSI-X table if we
> >>> can make sure guest kernel would not touch MSI-X table in
> >>> normal code path such as para-virtualized guest kernel on PPC64.
> >>>
> >> Then how do you prevent malicious guest kernel accessing it?
> > Or a malicious guest driver for an ethernet card setting up
> > the receive buffer ring to contain a single word entry that
> > contains the address associated with an MSI-X interrupt and
> > then using a loopback mode to cause a specific packet be
> > received that writes the required word through that address.
> >
> > Remember the PCIe cycle for an interrupt is a normal memory write
> > cycle.
> >
> > David
> >
> 
> If we have enough permission to load a malicious driver or
> kernel, we can easily break the guest without an exposed
> MSI-X table.
> 
> I think it should be safe to expose the MSI-X table if we can
> make sure that a malicious guest driver/kernel can't use
> the MSI-X table to break other guests or the host. The
> capability of IRQ remapping could provide this
> kind of protection.
> 

IRQ remapping alone doesn't mean you can pass the MSI-X structure
through to the guest. I know the actual IRQ remapping implementation
might be platform specific, but at least for the Intel VT-d
specification, the MSI-X entry must be configured by the host kernel
in a remappable format which contains an index into the IRQ remapping
table. The index finds an IRQ remapping entry which controls interrupt
routing for a specific device. If you allow a malicious program to
write a random index into an MSI-X entry of the assigned device, the
hole is obvious...

Passthrough might make sense only for an IRQ remapping implementation
which doesn't rely on an extended MSI-X format (e.g. one simply based
on BDF). If that's the case for PPC, then you should build MSI-X
passthrough based on that fact instead of on general IRQ remapping
being enabled or not.

Thanks
Kevin

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-10 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, May 05, 2016 11:05 PM
> 
> On Thu, 5 May 2016 12:15:46 +0000
> "Tian, Kevin"  wrote:
> 
> > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > Sent: Thursday, May 05, 2016 7:43 PM
> > >
> > > Hi David and Kevin,
> > >
> > > On 2016/5/5 17:54, David Laight wrote:
> > >
> > > > From: Tian, Kevin
> > > >> Sent: 05 May 2016 10:37
> > > > ...
> > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > >>> can make sure guest kernel would not touch MSI-X table in
> > > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > > >>>
> > > >> Then how do you prevent malicious guest kernel accessing it?
> > > > Or a malicious guest driver for an ethernet card setting up
> > > > the receive buffer ring to contain a single word entry that
> > > > contains the address associated with an MSI-X interrupt and
> > > > then using a loopback mode to cause a specific packet be
> > > > received that writes the required word through that address.
> > > >
> > > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > > cycle.
> > > >
> > > > David
> > > >
> > >
> > > If we have enough permission to load a malicious driver or
> > > kernel, we can easily break the guest without exposed
> > > MSI-X table.
> > >
> > > I think it should be safe to expose MSI-X table if we can
> > > make sure that malicious guest driver/kernel can't use
> > > the MSI-X table to break other guest or host. The
> > > capability of IRQ remapping could provide this
> > > kind of protection.
> > >
> >
> > With IRQ remapping it doesn't mean you can pass through MSI-X
> > structure to guest. I know actual IRQ remapping might be platform
> > specific, but at least for Intel VT-d specification, MSI-X entry must
> > be configured with a remappable format by host kernel which
> > contains an index into IRQ remapping table. The index will find a
> > IRQ remapping entry which controls interrupt routing for a specific
> > device. If you allow a malicious program random index into MSI-X
> > entry of assigned device, the hole is obvious...
> >
> > Above might make sense only for a IRQ remapping implementation
> > which doesn't rely on extended MSI-X format (e.g. simply based on
> > BDF). If that's the case for PPC, then you should build MSI-X
> > passthrough based on this fact instead of general IRQ remapping
> > enabled or not.
> 
> I don't think anyone is expecting that we can expose the MSI-X vector
> table to the guest and the guest can make direct use of it.  The end
> goal here is that the guest on a power system is already
> paravirtualized to not program the device MSI-X by directly writing to
> the MSI-X vector table.  They have hypercalls for this since they
> always run virtualized.  Therefore a) they never intend to touch the
> MSI-X vector table and b) they have sufficient isolation that a guest
> can only hurt itself by doing so.
> 
> On x86 we don't have a), our method of programming the MSI-X vector
> table is to directly write to it. Therefore we will always require QEMU
> to place a MemoryRegion over the vector table to intercept those
> accesses.  However with interrupt remapping, we do have b) on x86, which
> means that we don't need to be so strict in disallowing user accesses
> to the MSI-X vector table.  It's not useful for configuring MSI-X on
> the device, but the user should only be able to hurt themselves by
> writing it directly.  x86 doesn't really get anything out of this
> change, but it helps this special case on power pretty significantly
> aiui.  Thanks,
> 

Allowing the guest direct write access to the MSI-X table has system-wide
impact. As I explained earlier, the hypervisor needs to control the
"interrupt_index" programmed in the MSI-X entry, which is used to associate
a specific IRQ remapping entry. Now if you expose the whole MSI-X table to
the guest, it can program a random index into an MSI-X entry to associate
with any IRQ remapping entry, and then there won't be any isolation per se.

You can check "5.5.2 MSI and MSI-X Register Programming" in the VT-d
spec.

Thanks
Kevin


RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-12 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Friday, May 13, 2016 10:33 AM
> 
> > means.  The MSI-X vector table of a device is always considered
> > untrusted which is why we require user opt-ins to subvert that
> > protection.  Thanks,
> >
> 
> I only partially agree with this statement, since there are different
> trust levels for kernel space drivers and user space drivers.
> 
> An OS may choose to trust kernel space drivers and then enable IRQ
> remapping only for load balancing purposes, w/o source-id verification.
> In such a case the MSI-X vector table is trusted and fully under kernel
> control. Allowing the MSI-X table to be mmapped in user space definitely
> breaks that boundary.
> 
> Anyway my point is simple. Let's ignore how the Linux kernel implements
> IRQ remapping on x86 (which may change from time to time), and just
> focus on architectural possibility. A non-x86 platform may implement
> IRQ remapping completely separately from the device side; then checking
> the availability of IRQ remapping is enough to decide whether mmapping
> the MSI-X table is safe. x86 with VT-d can be configured to a mode
> requiring host control of both the MSI-X entry and the IRQ remapping
> hardware (without source-id checking). In such a case it's insufficient
> to make the decision simply based on IRQ remapping availability. We need
> a way to query from the IRQ remapping module whether it's actually safe
> to mmap MSI-X.
> 

Or another way is to have VFIO explicitly request that intel-iommu
enforce source-id checking when IRQ remapping is enabled...

Thanks
Kevin

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-12 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, May 13, 2016 1:33 PM
> > >
> > > As argued previously in this thread, there's nothing special about a
> > > DMA write to memory versus a DMA write to a special address that
> > > triggers an MSI vector.  If the device is DMA capable, which we assume
> > > all are, it can be fooled into generating those DMA writes regardless
> > > of whether we actively block access to the MSI-X vector table itself.
> >
> > But with DMA remapping above can be blocked.
> 
> How?  VT-d explicitly ignores DMA writes to 0xFEEx_xxxx, section 3.13:
> 
>   Write requests without PASID of DWORD length are treated as interrupt
>   requests. Interrupt requests are not subjected to DMA remapping[...]
>   Instead, remapping hardware can be enabled to subject such interrupt
>   requests to interrupt remapping.

Thanks for catching this!

> 
> > > MSI-X vector table access w/o interrupt remapping is to avoid obvious
> > > collisions if it were to be programmed directly, it doesn't actually
> > > prevent an identical DMA transaction from being generated by other
> >
> > Kernel can enable DMA remapping but disable IRQ remapping. In such
> > case identical DMA transaction can be prevented.
> 
> Not according to the VT-d spec as quoted above.  If so, how?

So my argument on this is wrong, sorry.

> 
> > Anyway my point is simple. Let's ignore how Linux kernel implements
> > IRQ remapping on x86 (which may change time to time), and just
> > focus on architectural possibility. Non-x86 platform may implement
> > IRQ remapping completely separate from device side, then checking
> > availability of IRQ remapping is enough to decide whether mmap
> > MSI-X table is safe. x86 with VT-d can be configured to a mode
> > requiring host control of both MSI-X entry and IRQ remapping hardware
> > (without source id check). In such case it's insufficient to make
> > decision simply based on IRQ remapping availability. We need a way
> > to query from IRQ remapping module whether it's actually safe to
> > mmap MSI-X.
> 
> We're going in circles here.  This patch is attempting to remove
> protection from the MSI-X vector table that is really nothing more than
> security theater already.  That "protection" only actually prevents
> casual misuse of the API which is really only a problem when the
> platform offers no form of interrupt isolation, such as VT-d with DMA
> remapping but not interrupt remapping.  Disabling source-id checking in
> VT-d should be handled as the equivalent of disabling interrupt
> remapping altogether as far as the IOMMU API is concerned.  That's
> a trivial gap that should be fixed.  There is no such thing as a secure

That is the main change I'm asking for against the original patch, which has:

+static void pci_check_msi_remapping(struct pci_dev *pdev,
+                                    const struct iommu_ops *ops)
+{
+    struct pci_bus *bus = pdev->bus;
+
+    if (ops->capable(IOMMU_CAP_INTR_REMAP) &&
+        !(bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP))
+        bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
+}
+

The above flag should be cleared when source-id checking is disabled on x86.
Yes, VFIO is part of the OS, but any assumption we make about other parts
needs to be reflected accurately in the code.

> MSI-X vector table when untrusted userspace drivers are involved, we
> must always assume that a device can generate DMA writes that are
> indistinguishable from actual interrupt requests and if the platform
> does not provide interrupt isolation we should require the user to
> opt-in to an unsafe mode.
> 
> Simply denying direct writes to the vector table or preventing mapping
> of the vector table into the user address space does not provide any
> tangible form of protection.  Many devices make use of window registers
> that allow backdoors to arbitrary device registers.  Some drivers even
> use this as the primary means for configuring MSI-X, which makes them
> incompatible with device assignment without device specific quirks to
> enable virtualization of these paths.
> 
> If you have an objection to this patch, please show me how preventing
> direct CPU access to the MSI-X vector table provides any kind of
> security guarantee of the contents of the vector table and also prove
> to me that a device cannot spoof a DMA write that is indistinguishable
> from one associated with an actual interrupt, regardless of the
> contents of the MSI-X vector table.  Thanks,
> 

I'm not objecting to the whole patch series. As replied above, my point
is just that the current condition for allowing MSI-X mmap in this patch
is not accurate; my argument on the security front was not correct. Thanks
for your elaboration to make it clear.

Thanks
Kevin
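
Concretely, the requested change amounts to also gating the flag on source-id
checking, something like this sketch (the sid-check query helper is
hypothetical):

static void pci_check_msi_remapping(struct pci_dev *pdev,
                                    const struct iommu_ops *ops)
{
    struct pci_bus *bus = pdev->bus;

    /* only claim isolation when source-id checking is also enforced */
    if (ops->capable(IOMMU_CAP_INTR_REMAP) &&
        foo_irq_remap_sid_check_enabled() &&    /* hypothetical */
        !(bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP))
        bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP;
}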

RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

2022-07-08 Thread Tian, Kevin
> From: Alexey Kardashevskiy 
> Sent: Friday, July 8, 2022 2:35 PM
> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> >
> >
> > On 7/8/22 01:10, Jason Gunthorpe wrote:
> >> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> >>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> >>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> >>> the Type1 VFIO driver. Recent development though has added a
> coherency
> >>> capability check to the generic part of VFIO and essentially disabled
> >>> VFIO on PPC64; the similar story about
> iommu_group_dma_owner_claimed().
> >>>
> >>> This adds an iommu_ops stub which reports support for cache
> >>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
> >>> devices, this provides minimum code for the probing to not crash.

A stale comment, since this patch doesn't use bus_set_iommu() now.

> >>> +
> >>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> >>> +  struct device *dev)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>
> >> It is important when this returns that the iommu translation is all
> >> emptied. There should be no left over translations from the DMA API at
> >> this point. I have no idea how power works in this regard, but it
> >> should be explained why this is safe in a comment at a minimum.
> >>
> >  > It will turn into a security problem to allow kernel mappings to leak
> >  > past this point.
> >  >
> >
> > I've added for v2 checking for no valid mappings for a device (or, more
> > precisely, in the associated iommu_group), this domain does not need
> > checking, right?
> 
> 
> Uff, not that simple. Looks like once a device is in a group, its
> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess then
> there is a way to set those to NULL or do something similar to let
> dma_map_direct() from kernel/dma/mapping.c return "true", isn't there?

dev->dma_ops stays NULL as long as you don't handle the DMA domain type
here and don't call iommu_setup_dma_ops().

Given this only supports the blocking domain, the above should be irrelevant.

> 
> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
> fine to do nothing as tce_iommu_take_ownership() and
> tce_iommu_take_ownership_ddw() take care of not having active DMA
> mappings. Thanks,
> 
> 
> >
> > In general, is "domain" something from hardware or is it a software
> > concept? Thanks,
> >

'domain' is a software concept to represent the hardware I/O page
table. A blocking domain means all DMAs from a device attached to
this domain are blocked/rejected (equivalent to an empty I/O page
table), usually enforced in the .attach_dev() callback. 

Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.

Thanks
Kevin
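
In code terms, the discussion suggests a blocking attach on this platform
could look like the following sketch (the window-teardown helper is
hypothetical):

static int spapr_tce_blocked_attach_dev(struct iommu_domain *dom,
                                        struct device *dev)
{
    /*
     * Equivalent to installing an empty I/O page table: tear down any
     * DMA windows so no kernel mappings survive the attach.
     */
    spapr_tce_release_windows(dev);    /* hypothetical helper */
    return 0;
}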


RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

2022-07-08 Thread Tian, Kevin
> From: Alexey Kardashevskiy 
> Sent: Friday, July 8, 2022 5:46 PM
> 
> >>>
> >>> In general, is "domain" something from hardware or it is a software
> >>> concept? Thanks,
> >>>
> >
> > 'domain' is a software concept to represent the hardware I/O page
> > table. A blocking domain means all DMAs from a device attached to
> > this domain are blocked/rejected (equivalent to an empty I/O page
> > table), usually enforced in the .attach_dev() callback.
> >
> > Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.
> 
> 
> Making it NULL makes __iommu_attach_device() fail; .attach_dev() needs
> to return 0 in this crippled environment. Thanks for explaining the
> rest, good food for thought.
> 

Yeah, that's a typo. 😊


RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

2022-07-28 Thread Tian, Kevin
> From: Oliver O'Halloran 
> Sent: Friday, July 29, 2022 10:53 AM
> 
> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy  wrote:
> >
> > *snip*
> >
> > About this. If a platform has a concept of explicit DMA windows (2 or
> > more), is it one domain with 2 windows or 2 domains with one window
> > each?
> >
> > If it is 2 windows, iommu_domain_ops misses windows manipulation
> > callbacks (I vaguely remember it being there for embedded PPC64 but
> > cannot find it quickly).
> >
> > If it is 1 window per a domain, then can a device be attached to 2
> > domains at least in theory (I suspect not)?
> >
> > On server POWER CPUs, each DMA window is backed by an independent IOMMU
> > page table. (reminder) A window is a bus address range where devices are
> > allowed to DMA to/from ;)
> 
> I've always thought of windows as being entries to a top-level "iommu
> page table" for the device / domain. The fact each window is backed by
> a separate IOMMU page table shouldn't really be relevant outside the
> arch/platform.

Yes. This is what was agreed when discussing how to integrate iommufd
with POWER [1].

One domain represents one address space.

Windows are just constraints on the address space for what ranges can
be mapped.

Having two underlying page tables is just a POWER-specific format.

Thanks
Kevin

[1] https://lore.kernel.org/all/Yns+TCSa6hWbU7wZ@yekko/
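
Against the generic API, that agreement maps onto the domain geometry plus
reserved regions; a sketch with illustrative values:

static void foo_init_domain_geometry(struct iommu_domain *domain,
                                     dma_addr_t win_lo_start,
                                     dma_addr_t win_hi_end)
{
    /* one domain = one address space spanning both windows */
    domain->geometry.aperture_start = win_lo_start;
    domain->geometry.aperture_end = win_hi_end;
    domain->geometry.force_aperture = true;
    /*
     * The gap between the two windows would then be reported as a
     * reserved region so callers such as VFIO/iommufd never map there.
     */
}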


RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

2022-07-28 Thread Tian, Kevin
> From: Alexey Kardashevskiy 
> Sent: Friday, July 29, 2022 11:50 AM
> 
> 
> On 29/07/2022 13:10, Tian, Kevin wrote:
> >> From: Oliver O'Halloran 
> >> Sent: Friday, July 29, 2022 10:53 AM
> >>
> >> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy 
> wrote:
> >>>
> >>> *snip*
> >>>
> >>> About this. If a platform has a concept of explicit DMA windows (2 or
> >>> more), is it one domain with 2 windows or 2 domains with one window
> >> each?
> >>>
> >>> If it is 2 windows, iommu_domain_ops misses windows manipulation
> >>> callbacks (I vaguely remember it being there for embedded PPC64 but
> >>> cannot find it quickly).
> >>>
> >>> If it is 1 window per a domain, then can a device be attached to 2
> >>> domains at least in theory (I suspect not)?
> >>>
> >>> On server POWER CPUs, each DMA window is backed by an independent
> >> IOMMU
> >>> page table. (reminder) A window is a bus address range where devices
> are
> >>> allowed to DMA to/from ;)
> >>
> >> I've always thought of windows as being entries to a top-level "iommu
> >> page table" for the device / domain. The fact each window is backed by
> >> a separate IOMMU page table shouldn't really be relevant outside the
> >> arch/platform.
> >
> > Yes. This is what was agreed when discussing how to integrate iommufd
> > with POWER [1].
> >
> > One domain represents one address space.
> >
> > Windows are just constraints on the address space for what ranges can
> > be mapped.
> >
> > having two page tables underlying is just kind of POWER specific format.
> 
> 
> It is a POWER-specific thing with one not-so-obvious consequence: each
> window has an independent page size (fixed at the moment of creation)
> and (most likely) a different page size, like 4K vs. 2M.
> 
> 

The page size is decided by the iommu driver anyway; the same is true for
other vendors. The caller (e.g. vfio) just tries to map as many contiguous
pages as possible, but in the end it's the iommu driver that decides the
actual size.
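
In the generic API that driver-side decision is expressed through the
domain's page-size bitmap; a sketch with illustrative sizes:

static void foo_set_domain_pgsizes(struct iommu_domain *domain)
{
    /*
     * The driver, not the caller, decides the usable page sizes; on
     * POWER these could differ per window/page table.
     */
    domain->pgsize_bitmap = SZ_4K | SZ_2M;
}

iommu_map() then splits each request into chunks drawn from this bitmap, so
the caller only needs to find contiguity rather than pick page sizes itself.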