possible dmar_init_reserved_ranges() error

2016-12-19 Thread Bjorn Helgaas
Hi guys,

I have some questions about dmar_init_reserved_ranges().  On systems
where CPU physical address space is not identity-mapped to PCI bus
address space, e.g., where the PCI host bridge windows have _TRA
offsets, I'm not sure we're doing the right thing.

Assume we have a PCI host bridge with _TRA that maps CPU addresses
0x8000-0x9fff to PCI bus addresses 0x-0x1fff, with
two PCI devices below it:

  PCI host bridge domain  [bus 00-3f]
  PCI host bridge window [mem 0x8000-0x9fff] (bus 0x-0x1fff]
  00:00.0: BAR 0 [mem 0x8000-0x8] (0x-0x0fff on bus)
  00:01.0: BAR 0 [mem 0x9000-0x9] (0x1000-0x1fff on bus)

The IOMMU init code in dmar_init_reserved_ranges() reserves the PCI
MMIO space for all devices:

  pci_iommu_init()
    intel_iommu_init()
      dmar_init_reserved_ranges()
        reserve_iova(0x8000-0x8)
        reserve_iova(0x9000-0x9)

This looks odd because we're reserving CPU physical addresses, but
the IOVA space contains *PCI bus* addresses.  On most x86 systems they
would be the same, but not on all.
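
The distinction between the two address spaces can be sketched in a small userspace model. This is only an illustration: `struct bridge_window` and `cpu_to_bus()` are hypothetical names, not kernel interfaces, and the sketch defines the offset as "CPU address minus bus address" purely for its own convenience.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one host bridge window; not a kernel structure. */
struct bridge_window {
    uint64_t cpu_start;  /* first CPU physical address of the window */
    uint64_t cpu_end;    /* last CPU physical address (inclusive) */
    uint64_t offset;     /* CPU address minus bus address (sketch convention) */
};

/* Translate a CPU physical address inside the window to a PCI bus address. */
static uint64_t cpu_to_bus(const struct bridge_window *w, uint64_t cpu_addr)
{
    assert(cpu_addr >= w->cpu_start && cpu_addr <= w->cpu_end);
    return cpu_addr - w->offset;
}
```

With a nonzero offset, the value returned by `cpu_to_bus()` differs from the CPU address, and it is the returned value that lives in the same space as the IOVAs. With a zero offset the two are equal, which is why the problem stays hidden on most systems.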

Assume the driver for 00:00.0 maps a page of main memory for DMA.  It
may receive a dma_addr_t of 0x1000:

  00:00.0: intel_map_page() returns dma_addr_t 0x1000
  00:00.0: issues DMA to 0x1000

What happens here?  The DMA access should go to main memory.  In
conventional PCI it would be a peer-to-peer access to device 00:01.0.
Is there enough PCIe smarts (ACS or something?) to do otherwise?

The dmar_init_reserved_ranges() comment says "Reserve all PCI MMIO to
avoid peer-to-peer access."  Without _TRA, CPU addresses and PCI bus
addresses would be identical, and I think these reserve_iova() calls
*would* prevent this situation.  So maybe we're just missing a
pcibios_resource_to_bus() here?
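
A minimal userspace sketch of why the reservation misses its target when it is made with CPU addresses. `struct range` and `is_reserved()` are hypothetical stand-ins for the reserved IOVA tree, and the addresses used in the note below are illustrative assumptions, because the figures in the archived message are truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range; stands in for a reserved IOVA node. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Return true if addr falls inside any of the n reserved ranges. */
static bool is_reserved(const struct range *reserved, size_t n, uint64_t addr)
{
    assert(reserved != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= reserved[i].start && addr <= reserved[i].end)
            return true;
    }
    return false;
}
```

Suppose a BAR occupies CPU addresses 0x90000000-0x90000fff and bus addresses 0x1000-0x1fff. Reserving the CPU range leaves `is_reserved(..., 0x1000)` false, so an allocator could still hand out 0x1000 as a DMA address. Reserving the bus range makes the same query true.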

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: possible dmar_init_reserved_ranges() error

2016-12-22 Thread Bjorn Helgaas
On Thu, Dec 22, 2016 at 05:27:14PM +0100, Joerg Roedel wrote:
> Hi Bjorn,
> 
> On Mon, Dec 19, 2016 at 03:20:44PM -0600, Bjorn Helgaas wrote:
> > I have some questions about dmar_init_reserved_ranges().  On systems
> > where CPU physical address space is not identity-mapped to PCI bus
> > address space, e.g., where the PCI host bridge windows have _TRA
> > offsets, I'm not sure we're doing the right thing.
> > 
> > Assume we have a PCI host bridge with _TRA that maps CPU addresses
> > 0x8000-0x9fff to PCI bus addresses 0x-0x1fff, with
> > two PCI devices below it:
> > 
> >   PCI host bridge domain  [bus 00-3f]
> >   PCI host bridge window [mem 0x8000-0x9fff] (bus 0x-0x1fff]
> >   00:00.0: BAR 0 [mem 0x8000-0x8] (0x-0x0fff on bus)
> >   00:01.0: BAR 0 [mem 0x9000-0x9] (0x1000-0x1fff on bus)
> > 
> > The IOMMU init code in dmar_init_reserved_ranges() reserves the PCI
> > MMIO space for all devices:
> > 
> >   pci_iommu_init()
> >     intel_iommu_init()
> >       dmar_init_reserved_ranges()
> >         reserve_iova(0x8000-0x8)
> >         reserve_iova(0x9000-0x9)
> > 
> > This looks odd because we're reserving CPU physical addresses, but
> > the IOVA space contains *PCI bus* addresses.  On most x86 systems they
> > would be the same, but not on all.
> 
> Interesting, I wasn't aware of that. Looks like we are not doing the
> right thing in dmar_init_reserved_ranges(). How is that handled without
> an IOMMU, when the bus-addresses overlap with ram addresses?

I don't know enough about these systems to answer that.  One way would
be to avoid overlaps, e.g., by using bus addresses
0x8000-0x and not putting RAM at those addresses.  Or
maybe the host bridge could apply a constant offset to bus addresses
before forwarding transactions up to the system bus.

> > Assume the driver for 00:00.0 maps a page of main memory for DMA.  It
> > may receive a dma_addr_t of 0x1000:
> > 
> >   00:00.0: intel_map_page() returns dma_addr_t 0x1000
> >   00:00.0: issues DMA to 0x1000
> > 
> > What happens here?  The DMA access should go to main memory.  In
> > conventional PCI it would be a peer-to-peer access to device 00:01.0.
> > Is there enough PCIe smarts (ACS or something?) to do otherwise?
> 
> If there is a bridge doing ACS between the devices, the IOMMU will see
> the request and re-map it to its RAM address.
> 
> > The dmar_init_reserved_ranges() comment says "Reserve all PCI MMIO to
> > avoid peer-to-peer access."  Without _TRA, CPU addresses and PCI bus
> > addresses would be identical, and I think these reserve_iova() calls
> > *would* prevent this situation.  So maybe we're just missing a
> > pcibios_resource_to_bus() here?
> 
> I'll have a look, the AMD IOMMU driver implements this too, so it needs
> also be fixed there. Do you know which x86 systems are configured like
> this?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4873931cc8c
added this support, and I'm pretty sure it was tested, but I don't
know what machines it was for.  I know many large ia64 systems use
this _TRA support, but I don't have first-hand knowledge of x86
systems that do.

The untested patch below is what I was thinking for the Intel IOMMU
driver.

Bjorn


commit 529a6db0b0b2ff37a0cdb49d11eee4eb6f960a48
Author: Bjorn Helgaas 
Date:   Tue Dec 20 11:08:09 2016 -0600

    iommu/vt-d: Reserve IOVA space for bus address, not CPU address

    IOVA space contains bus addresses, not CPU addresses.  On many systems they
    are identical, but PCI host bridges in some systems do apply an address
    offset when forwarding CPU MMIO transactions to PCI.  In ACPI, this is
    expressed as a _TRA offset in the window descriptor.

    Convert the PCI resource CPU addresses to PCI bus addresses before
    reserving them in the IOVA space.

    Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c66c273..be78ab7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1865,6 +1865,7 @@ static struct lock_class_key reserved_rbtree_key;
 static int dmar_init_reserved_ranges(void)
 {
struct pci_dev *pdev = NULL;
+   struct pci_bus_region region;
struct iova *iova;
int i;
 
@@ -1890,9 +1891,11 @@ static int dmar_init_reserved_ranges(void)
r = &pdev->resource[i];
if (!r->flags || !(r->flags & IORESOURCE_MEM))
continue;
+
+

Re: possible dmar_init_reserved_ranges() error

2016-12-22 Thread Bjorn Helgaas
Hi Ashok,

On Thu, Dec 22, 2016 at 03:45:08PM -0800, Raj, Ashok wrote:
> Hi Bjorn
> 
> None in the platform group say they know about this. So i'm fairly sure
> we don't do that on Intel hardware (x86). 

I'm pretty sure there was once an x86 prototype for which PCI bus
addresses were not identical to CPU physical addresses, but I have no
idea whether it shipped that way.

Even if such a system never shipped, the x86 arch code supports _TRA,
and there's no reason to make the unnecessary assumption in this code
that _TRA is always zero.

If we didn't want to use pcibios_resource_to_bus() here for some
reason, we should at least add a comment about why we think it's OK to
use a CPU physical address as an IOVA.

Bjorn


Re: possible dmar_init_reserved_ranges() error

2016-12-27 Thread Bjorn Helgaas
On Mon, Dec 19, 2016 at 03:20:44PM -0600, Bjorn Helgaas wrote:
> Hi guys,
> 
> I have some questions about dmar_init_reserved_ranges().  On systems
> where CPU physical address space is not identity-mapped to PCI bus
> address space, e.g., where the PCI host bridge windows have _TRA
> offsets, I'm not sure we're doing the right thing.
> 
> Assume we have a PCI host bridge with _TRA that maps CPU addresses
> 0x8000-0x9fff to PCI bus addresses 0x-0x1fff, with
> two PCI devices below it:
> 
>   PCI host bridge domain  [bus 00-3f]
>   PCI host bridge window [mem 0x8000-0x9fff] (bus 0x-0x1fff]
>   00:00.0: BAR 0 [mem 0x8000-0x8] (0x-0x0fff on bus)
>   00:01.0: BAR 0 [mem 0x9000-0x9] (0x1000-0x1fff on bus)
> 
> The IOMMU init code in dmar_init_reserved_ranges() reserves the PCI
> MMIO space for all devices:
> 
>   pci_iommu_init()
>     intel_iommu_init()
>       dmar_init_reserved_ranges()
>         reserve_iova(0x8000-0x8)
>         reserve_iova(0x9000-0x9)
> 
> This looks odd because we're reserving CPU physical addresses, but
> the IOVA space contains *PCI bus* addresses.  On most x86 systems they
> would be the same, but not on all.

While we're looking at this, here's another question.  We do basically
this:

  dmar_init_reserved_ranges()
  {
    ...
    for_each_pci_dev(pdev) {
      for (i = 0; i < PCI_NUM_RESOURCES; i++) {
        r = &pdev->resource[i];
        reserve_iova(r)

But I assume it's possible to have more than one IOTLB in a system,
so you could have some PCI devices under one IOTLB and others under a
different IOTLB.  So it seems like we should reserve only the IOVA
space used by the devices under *this* IOTLB.

Also, we may hot-add a device under the IOTLB, and I don't see where
we reserve the IOVA space it uses.

I think the best thing to do would be to reserve the host bridge
apertures related to each IOTLB.  That would resolve both questions.
It looks like iova_reserve_pci_windows() does this in the
iommu_dma_init_domain() path.
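
The per-aperture idea can be sketched as a userspace model. Everything here is hypothetical (`struct aperture`, `collect_apertures()`, the integer `iommu_id`). It only shows selecting the host bridge windows that sit under one translation unit, so that the caller reserves those and nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a host bridge aperture in bus-address terms. */
struct aperture {
    int iommu_id;        /* which translation unit the bridge sits under */
    uint64_t bus_start;  /* first bus address of the window */
    uint64_t bus_end;    /* last bus address (inclusive) */
};

/*
 * Copy into out[] the apertures that belong to iommu_id and return how
 * many were copied (at most out_len).
 */
static size_t collect_apertures(const struct aperture *all, size_t n,
                                int iommu_id,
                                struct aperture *out, size_t out_len)
{
    size_t count = 0;

    assert(all != NULL && out != NULL);
    for (size_t i = 0; i < n && count < out_len; i++) {
        if (all[i].iommu_id == iommu_id)
            out[count++] = all[i];
    }
    return count;
}
```

Because a hot-added device can only be assigned BARs inside an existing aperture, reserving whole apertures up front would cover it without a separate hot-plug hook. That holds as long as the apertures themselves do not change.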

Bjorn


Re: [PATCH V7 07/11] iommu: of: Handle IOMMU lookup failure with deferred probing or error

2017-01-28 Thread Bjorn Helgaas
On Mon, Jan 23, 2017 at 09:48:09PM +0530, Sricharan R wrote:
> From: Laurent Pinchart 
> 
> Failures to look up an IOMMU when parsing the DT iommus property need to
> be handled separately from the .of_xlate() failures to support deferred
> probing.
> 
> The lack of a registered IOMMU can be caused by the lack of a driver for
> the IOMMU, the IOMMU device probe not having been performed yet, having
> been deferred, or having failed.
> 
> The first case occurs when the device tree describes the bus master and
> IOMMU topology correctly but no device driver exists for the IOMMU yet
> or the device driver has not been compiled in. Return NULL, the caller
> will configure the device without an IOMMU.
> 
> The second and third cases are handled by deferring the probe of the bus
> master device which will eventually get reprobed after the IOMMU.
> 
> The last case is currently handled by deferring the probe of the bus
> master device as well. A mechanism to either configure the bus master
> device without an IOMMU or to fail the bus master device probe depending
> on whether the IOMMU is optional or mandatory would be a good
> enhancement.
> 
> Signed-off-by: Laurent Pichart 
> Signed-off-by: Sricharan R 
> ...

> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 349bd1d..9529d6c 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Why do we need this?

>  #include 
>  
>  static const struct of_device_id __iommu_of_table_sentinel
> @@ -223,7 +224,7 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   ops = ERR_PTR(err);
>   }
>  
> - return IS_ERR(ops) ? NULL : ops;
> + return ops;
>  }
>  
>  static int __init of_iommu_init(void)
> @@ -234,7 +235,7 @@ static int __init of_iommu_init(void)
>   for_each_matching_node_and_match(np, matches, &match) {
>   const of_iommu_init_fn init_fn = match->data;
>  
> - if (init_fn(np))
> + if (init_fn && init_fn(np))
>   pr_err("Failed to initialise IOMMU %s\n",
>   of_node_full_name(np));
>   }
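
Once of_iommu_configure() returns `ops` unfiltered, a caller has three kinds of result to tell apart: NULL, a valid pointer, and an encoded error. The following userspace sketch imitates the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention to show that split. The helper names, the `classify()` function and the `probe_action` enum are hypothetical. The PROBE_FAIL branch shows what a caller could do, not what this patch does, since the commit log says failed IOMMU probes are currently deferred as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517  /* kernel-internal errno; absent from userspace errno.h */
#endif

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. */
static void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

enum probe_action {
    PROBE_WITHOUT_IOMMU,  /* NULL: no IOMMU driver, carry on without one */
    PROBE_WITH_IOMMU,     /* valid ops pointer */
    PROBE_LATER,          /* -EPROBE_DEFER: retry after the IOMMU has probed */
    PROBE_FAIL            /* any other encoded error */
};

/* Decide what to do with the value a lookup returned. */
static enum probe_action classify(const void *ops)
{
    if (ops == NULL)
        return PROBE_WITHOUT_IOMMU;
    if (!is_err(ops))
        return PROBE_WITH_IOMMU;
    return ptr_err(ops) == -EPROBE_DEFER ? PROBE_LATER : PROBE_FAIL;
}
```

The pointer encoding relies on the top 4095 addresses never being valid objects. That is the kernel's assumption too, and it holds on common platforms, though the C standard does not guarantee it.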


Re: [PATCH V7 09/11] arm64: dma-mapping: Remove the notifier trick to handle early setting of dma_ops

2017-01-28 Thread Bjorn Helgaas
On Mon, Jan 23, 2017 at 09:48:11PM +0530, Sricharan R wrote:
> With arch_setup_dma_ops now being called late during device's probe after
> the device's iommu is probed, the notifier trick required to handle the
> early setup of dma_ops before the iommu group gets created is not
> required. So removing the notifier's here.

s/notifier's/notifiers/

Personally I'd capitalize "IOMMU" in the English text above, too.

> Acked-by: Will Deacon 
> Signed-off-by: Sricharan R 
> [rm: clean up even more]
> Signed-off-by: Robin Murphy 
> ---
>  arch/arm64/mm/dma-mapping.c | 132 
> 
>  1 file changed, 12 insertions(+), 120 deletions(-)

Nice :)


Re: [PATCH V7 10/11] iommu/arm-smmu: Clean up early-probing workarounds

2017-01-28 Thread Bjorn Helgaas
On Mon, Jan 23, 2017 at 09:48:12PM +0530, Sricharan R wrote:
> From: Robin Murphy 
> 
> Now that the appropriate ordering is enforced via profe-deferral of

s/profe-deferral/probe-deferral/

> masters in core code, rip it all out and bask in the simplicity.
> 
> Acked-by: Will Deacon 
> Signed-off-by: Robin Murphy 
> [Sricharan: Rebased on top of ACPI IORT SMMU series]
> Signed-off-by: Sricharan R 
> ---
>  drivers/iommu/arm-smmu-v3.c | 46 ++-
>  drivers/iommu/arm-smmu.c| 58 
> +++--
>  2 files changed, 10 insertions(+), 94 deletions(-)

Yay!


Re: [PATCH V7 06/11] of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices

2017-01-28 Thread Bjorn Helgaas
On Mon, Jan 23, 2017 at 09:48:08PM +0530, Sricharan R wrote:
> Configuring DMA ops at probe time will allow deferring device probe when
> the IOMMU isn't available yet. The dma_configure for the device is
> now called from the generic device_attach callback just before the
> bus/driver probe is called. This way, configuring the DMA ops for the
> device would be called at the same place for all bus_types, hence the
> deferred probing mechanism should work for all buses as well.
> 
> pci_bus_add_devices    (platform/amba)(_device_create/driver_register)
>        |                         |
> pci_bus_add_device     (device_add/driver_register)
>        |                         |
> device_attach           device_initial_probe
>        |                         |
> __device_attach_driver    __device_attach_driver
>        |
> driver_probe_device
>        |
> really_probe
>        |
> dma_configure
> 
> Similarly on the device/driver_unregister path __device_release_driver is
> called which in turn calls dma_deconfigure.
> 
> This patch changes the dma ops configuration to probe time for
> both OF and ACPI based platform/amba/pci bus devices.
> 
> Signed-off-by: Sricharan R 

Acked-by: Bjorn Helgaas  (drivers/pci part)
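
The probe-time ordering described in the commit log above (configure DMA, then probe, and undo the DMA setup if the probe fails) can be sketched as a userspace model. All names are hypothetical (`struct model_dev`, `model_really_probe()` and so on). Only the goto-based unwinding mirrors the shape of the quoted dd.c hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_PROBE (-517)  /* mirrors the kernel's -EPROBE_DEFER value */

/* Hypothetical device model: just enough state to show the ordering. */
struct model_dev {
    bool iommu_ready;     /* has the IOMMU this device depends on probed? */
    bool dma_configured;  /* set by the configure step, cleared on unwind */
    bool bound;           /* set once the driver's probe has succeeded */
    int driver_probe_rc;  /* result the driver's probe callback will return */
};

static int model_dma_configure(struct model_dev *dev)
{
    if (!dev->iommu_ready)
        return DEFER_PROBE;
    dev->dma_configured = true;
    return 0;
}

static void model_dma_deconfigure(struct model_dev *dev)
{
    dev->dma_configured = false;
}

/* Configure DMA first, then probe; undo the DMA setup if the probe fails. */
static int model_really_probe(struct model_dev *dev)
{
    int ret;

    assert(dev != NULL);
    ret = model_dma_configure(dev);
    if (ret)
        goto dma_failed;

    ret = dev->driver_probe_rc;  /* stands in for calling the driver's probe */
    if (ret)
        goto probe_failed;

    dev->bound = true;
    return 0;

probe_failed:
    model_dma_deconfigure(dev);
dma_failed:
    return ret;
}
```

A device whose IOMMU is not ready gets the deferral code back with no DMA state left behind, so the same function can simply be called again later.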

> ---
>  [V6 .. V7]
>   * Updated the subject and commit log as per comments
> 
>  [V5 .. V6]
> * Squashed in patch 10 for configuring the dma ops of
>   ACPI device at probe time from previous post.
> * Fixed a bug in dma_configure pointed out by Robin.
> 
>  drivers/acpi/glue.c |  5 -
>  drivers/base/dd.c   |  9 +
>  drivers/base/dma-mapping.c  | 40 
>  drivers/of/platform.c   |  5 +
>  drivers/pci/probe.c | 28 
>  include/linux/dma-mapping.h |  3 +++
>  6 files changed, 53 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
> index fb19e1c..c05f241 100644
> --- a/drivers/acpi/glue.c
> +++ b/drivers/acpi/glue.c
> @@ -176,7 +176,6 @@ int acpi_bind_one(struct device *dev, struct acpi_device *acpi_dev)
>   struct list_head *physnode_list;
>   unsigned int node_id;
>   int retval = -EINVAL;
> - enum dev_dma_attr attr;
>  
>   if (has_acpi_companion(dev)) {
>   if (acpi_dev) {
> @@ -233,10 +232,6 @@ int acpi_bind_one(struct device *dev, struct acpi_device *acpi_dev)
>   if (!has_acpi_companion(dev))
>   ACPI_COMPANION_SET(dev, acpi_dev);
>  
> - attr = acpi_get_dma_attr(acpi_dev);
> - if (attr != DEV_DMA_NOT_SUPPORTED)
> - acpi_dma_configure(dev, attr);
> -
>   acpi_physnode_link_name(physical_node_name, node_id);
>   retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj,
>  physical_node_name);
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index a1fbf55..4882f06 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -19,6 +19,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -356,6 +357,10 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>   if (ret)
>   goto pinctrl_bind_failed;
>  
> + ret = dma_configure(dev);
> + if (ret)
> + goto dma_failed;
> +
>   if (driver_sysfs_add(dev)) {
>   printk(KERN_ERR "%s: driver_sysfs_add(%s) failed\n",
>   __func__, dev_name(dev));
> @@ -417,6 +422,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>   goto done;
>  
>  probe_failed:
> + dma_deconfigure(dev);
> +dma_failed:
>   if (dev->bus)
>   blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
> @@ -826,6 +833,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>   drv->remove(dev);
>  
>   device_links_driver_cleanup(dev);
> + dma_deconfigure(dev);
> +
>   devres_release_all(dev);
>   dev->driver = NULL;
>   dev_set_drvdata(dev, NULL);
> diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
> index efd71cf..449b948 100644
> --- a/drivers/base/dma-mapping.c
> +++ b/drivers/base/dma-mapping.c
> @@ -7,9 +7,11 @@
>   * This file is released under the GPLv2.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include

Re: [PATCH V7 00/11] IOMMU probe deferral support

2017-01-28 Thread Bjorn Helgaas
On Mon, Jan 23, 2017 at 09:48:02PM +0530, Sricharan R wrote:
> This series calls the dma ops configuration for the devices
> at a generic place so that it works for all busses.
> The dma_configure_ops for a device is now called during
> the device_attach callback just before the probe of the
> bus/driver is called. Similarly dma_deconfigure is called during
> device/driver_detach path.
> ...

>  arch/arm64/mm/dma-mapping.c   | 132 
> --
>  drivers/acpi/arm64/iort.c |  40 +++-
>  drivers/acpi/glue.c   |   5 --
>  drivers/acpi/scan.c   |   7 +-
>  drivers/base/dd.c |   9 +++
>  drivers/base/dma-mapping.c|  41 
>  drivers/iommu/arm-smmu-v3.c   |  46 +
>  drivers/iommu/arm-smmu.c  |  58 +++--
>  drivers/iommu/of_iommu.c  | 114 +++-
>  drivers/of/address.c  |  20 +-
>  drivers/of/device.c   |  36 ++-
>  drivers/of/platform.c |  10 +--
>  drivers/pci/probe.c   |  28 
>  include/acpi/acpi_bus.h   |   2 +-
>  include/asm-generic/vmlinux.lds.h |   1 -
>  include/linux/acpi.h  |   7 +-
>  include/linux/acpi_iort.h |   3 -
>  include/linux/dma-mapping.h   |   3 +
>  include/linux/of_device.h |  10 ++-
>  19 files changed, 252 insertions(+), 320 deletions(-)

I'm assuming this will go via some other tree, maybe the IOMMU tree?
I acked the PCI parts, so let me know if you need anything more from me.


Re: Partial BAR Address Allocation

2017-02-22 Thread Bjorn Helgaas
[+cc Joerg, iommu list]

On Wed, Feb 22, 2017 at 03:44:53PM -0500, Sinan Kaya wrote:
> On 2/22/2017 1:44 PM, Bjorn Helgaas wrote:
> > There is no way for a driver to say "I only need this memory BAR and
> > not the other ones."  The reason is because the PCI_COMMAND_MEMORY bit
> > enables *all* the memory BARs; there's no way to enable memory BARs
> > selectively.  If we enable memory BARs and one of them is unassigned,
> > that unassigned BAR is enabled, and the device will respond at
> > whatever address the register happens to contain, and that may cause
> > conflicts.
> > 
> > I'm not sure this answers your question.  Do you want to get rid of
> > 32-bit BAR addresses because your host bridge doesn't have a window to
> > 32-bit PCI addresses?  It's typical for a bridge to support a window
> > to the 32-bit PCI space as well as one to the 64-bit PCI space.  Often
> > it performs address translation for the 32-bit window so it doesn't
> > have to be in the 32-bit area on the CPU side, e.g., you could have
> > something like this where we have three host bridges and the 2-4GB
> > space on each PCI root bus is addressable:
> > 
> >   pci_bus :00: root bus resource [mem 0x108000-0x10] (bus address [0x8000-0x])
> >   pci_bus 0001:00: root bus resource [mem 0x118000-0x11] (bus address [0x8000-0x])
> >   pci_bus 0002:00: root bus resource [mem 0x128000-0x12] (bus address [0x8000-0x])
> 
> The problem is that according to PCI specification BAR addresses and
> DMA addresses cannot overlap.
> 
> From PCI-to-PCI Bridge Arch. spec.: "A bridge forwards PCI memory
> transactions from its primary interface to its secondary interface
> (downstream) if a memory address is in the range defined by the
> Memory Base and Memory Limit registers (when the base is less than
> or equal to the limit) as illustrated in Figure 4-3. Conversely, a
> memory transaction on the secondary interface that is within this
> address range will not be forwarded upstream to the primary
> interface."
> 
> To be specific, if your DMA address happens to be in
> [0x8000-0x] and the root port's aperture includes this
> range, the DMA will never make it to system memory.
> 
> Lorenzo and Robin took some steps to carve out PCI addresses out of
> DMA addresses in IOMMU drivers by using iova_reserve_pci_windows()
> function.
> 
> However, I see that we are still exposed when the operating system
> doesn't have any IOMMU driver and is using the SWIOTLB for instance. 

Hmmm.  I guess SWIOTLB assumes there's no address translation in the
DMA direction, right?  If there's no address translation in the PIO
direction, PCI bus BAR addresses are identical to the CPU-side
addresses.  In that case, there's no conflict because we already have
to assign BARs so they never look like a system memory address.

But if there *is* address translation in the PIO direction, we can
have conflicts because the bridge can translate CPU-side PIO accesses
to arbitrary PCI bus addresses.

> The FW solution I'm looking at requires carving out some part of the
> DDR from before OS boot so that OS doesn't reclaim that area for
> DMA.

If you want to reach system RAM, I guess you need to make sure you
only DMA to bus addresses outside the host bridge windows, as you said
above.  DMA inside the windows would be handled as peer-to-peer DMA.

> I'm not very happy with this solution. I'm also surprised that there
> is no generic solution in the kernel takes care of this for all root
> ports regardless of IOMMU driver presence.

The PCI core isn't really involved in allocating DMA addresses,
although there definitely is the connection with PCI-to-PCI bridge
windows that you mentioned.  I added IOMMU guys, who would know a lot
more than I do.

Bjorn


Re: [RFC PATCH 03/30] PCI: Move ATS declarations outside of CONFIG_PCI

2017-03-03 Thread Bjorn Helgaas
On Mon, Feb 27, 2017 at 07:54:14PM +, Jean-Philippe Brucker wrote:
> Currently ATS helpers like pci_enable_ats are only defined when CONFIG_PCI
> is enabled. The ARM SMMU driver might get built with CONFIG_PCI disabled.
> It would thus have to wrap any use of ATS helpers around #ifdef
> CONFIG_PCI, which isn't ideal.
> 
> A nicer solution is to always define these helpers. Since CONFIG_PCI_ATS
> is only enabled in association with CONFIG_PCI, move defines outside of
> CONFIG_PCI to prevent build failure when PCI is disabled.
> 
> Signed-off-by: Jean-Philippe Brucker 

I don't think there's any reason to make a pci_ats_init() stub when
CONFIG_PCI is not enabled, because it's only called from the PCI core.
But it does make some sense to keep them all together in one place.

I think you could also remove the #ifdef CONFIG_PCI_ATS in
arm_smmu_enable_ats() ("[RFC PATCH 04/30] iommu/arm-smmu-v3: Add
support for PCI ATS"), right?

If you remove the #ifdef, we'll call pci_enable_ats(), and it will
fail if !pdev->ats_cap.
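
The stub pattern under discussion, reduced to a userspace sketch. `MODEL_CONFIG_ATS`, `struct model_pdev` and `model_enable_ats()` are hypothetical stand-ins for the kernel's config symbol, `struct pci_dev` and `pci_enable_ats()`. The point is only that callers compile unchanged whether or not the feature is built in.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the field needed here. */
struct model_pdev {
    int ats_cap;  /* offset of the ATS capability, 0 if absent */
};

#ifdef MODEL_CONFIG_ATS
/* Built-in variant: fails when the device has no ATS capability. */
static int model_enable_ats(struct model_pdev *pdev)
{
    assert(pdev != NULL);
    if (!pdev->ats_cap)
        return -EINVAL;
    return 0;
}
#else
/* Stub used when the feature is compiled out: callers need no #ifdef. */
static inline int model_enable_ats(struct model_pdev *pdev)
{
    assert(pdev != NULL);
    return -ENODEV;
}
#endif
```

A caller just checks the return value. With the feature compiled out it sees -ENODEV, and with it compiled in but no capability present it sees an error as well, so the caller needs no `#ifdef` of its own.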

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci.h | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 282ed32244ce..e606f289bf5f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1418,19 +1418,6 @@ int  ht_create_irq(struct pci_dev *dev, int idx);
>  void ht_destroy_irq(unsigned int irq);
>  #endif /* CONFIG_HT_IRQ */
>  
> -#ifdef CONFIG_PCI_ATS
> -/* Address Translation Service */
> -void pci_ats_init(struct pci_dev *dev);
> -int pci_enable_ats(struct pci_dev *dev, int ps);
> -void pci_disable_ats(struct pci_dev *dev);
> -int pci_ats_queue_depth(struct pci_dev *dev);
> -#else
> -static inline void pci_ats_init(struct pci_dev *d) { }
> -static inline int pci_enable_ats(struct pci_dev *d, int ps) { return -ENODEV; }
> -static inline void pci_disable_ats(struct pci_dev *d) { }
> -static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
> -#endif
> -
>  #ifdef CONFIG_PCIE_PTM
>  int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
>  #else
> @@ -1616,6 +1603,19 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
>  #define dev_is_pf(d) (false)
>  #endif /* CONFIG_PCI */
>  
> +#ifdef CONFIG_PCI_ATS
> +/* Address Translation Service */
> +void pci_ats_init(struct pci_dev *dev);
> +int pci_enable_ats(struct pci_dev *dev, int ps);
> +void pci_disable_ats(struct pci_dev *dev);
> +int pci_ats_queue_depth(struct pci_dev *dev);
> +#else
> +static inline void pci_ats_init(struct pci_dev *d) { }
> +static inline int pci_enable_ats(struct pci_dev *d, int ps) { return -ENODEV; }
> +static inline void pci_disable_ats(struct pci_dev *d) { }
> +static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
> +#endif
> +
>  /* Include architecture-dependent settings and functions */
>  
>  #include 
> -- 
> 2.11.0
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [RFC PATCH 18/30] PCI: Make "PRG Response PASID Required" handling common

2017-03-03 Thread Bjorn Helgaas
On Mon, Feb 27, 2017 at 07:54:29PM +, Jean-Philippe Brucker wrote:
> The PASID ECN to the PCIe spec added a bit in the PRI status register that
> allows a Function to declare whether a PRG Response should contain the
> PASID prefix or not.
> 
> Move the helper that accesses it from amd_iommu into the PCI subsystem,
> renaming it to something more consistent with the spec, and introducing
> another obscure acronym to make it all fit.

Maybe mention the acronym itself and spell it out here?

> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 
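
What the moved helper boils down to is a test of bit 15 in the 16-bit PRI Status register. The patch names that bit PCI_PRI_STATUS_PRPR, presumably for "PRG Response PASID Required". The userspace sketch below reads a little-endian word out of a byte array standing in for config space and tests the bit. `cfg_read_word()` and the array layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 15 of the PRI Status register; same value as in the quoted patch. */
#define PRI_STATUS_PRPR 0x8000u

/* Read a 16-bit little-endian word from a config-space image. */
static uint16_t cfg_read_word(const uint8_t *cfg, size_t len, size_t off)
{
    assert(cfg != NULL && off + 2 <= len);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* True if the status word says a PRG Response needs the PASID prefix. */
static bool prg_resp_requires_prefix(uint16_t status)
{
    return (status & PRI_STATUS_PRPR) != 0;
}
```

Comparing against zero yields the same normalized boolean as the `!!(...)` idiom used in the patch.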

> ---
>  drivers/iommu/amd_iommu.c | 19 +--
>  drivers/pci/ats.c | 17 +
>  include/linux/pci-ats.h   |  8 
>  include/uapi/linux/pci_regs.h |  1 +
>  4 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 98940d1392cb..c5c598bf4ba3 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2024,23 +2024,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev)
>   return ret;
>  }
>  
> -/* FIXME: Move this to PCI code */
> -#define PCI_PRI_TLP_OFF  (1 << 15)
> -
> -static bool pci_pri_tlp_required(struct pci_dev *pdev)
> -{
> - u16 status;
> - int pos;
> -
> - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> - if (!pos)
> - return false;
> -
> - pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> -
> - return (status & PCI_PRI_TLP_OFF) ? true : false;
> -}
> -
>  /*
>   * If a device is not yet associated with a domain, this function
>   * assigns it visible for the hardware
> @@ -2069,7 +2052,7 @@ static int attach_device(struct device *dev,
>  
>   dev_data->ats.enabled = true;
>   dev_data->ats.qdep= pci_ats_queue_depth(pdev);
> - dev_data->pri_tlp = pci_pri_tlp_required(pdev);
> + dev_data->pri_tlp = pci_prg_resp_requires_prefix(pdev);
>   }
>   } else if (amd_iommu_iotlb_sup &&
>  pci_enable_ats(pdev, PAGE_SHIFT) == 0) {
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index eeb9fb2b47aa..331376e9bb8b 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -334,3 +334,20 @@ int pci_max_pasids(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_max_pasids);
>  #endif /* CONFIG_PCI_PASID */
> +
> +#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
> +bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
> +{
> + u16 status;
> + int pos;
> +
> + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> + if (!pos)
> + return false;
> +
> + pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> +
> + return !!(status & PCI_PRI_STATUS_PRPR);
> +}
> +EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix);
> +#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 57e0b8250947..e21bcacbe80c 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -57,5 +57,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
>  
>  #endif /* CONFIG_PCI_PASID */
>  
> +#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI)
> +bool pci_prg_resp_requires_prefix(struct pci_dev *pdev);
> +#else
> +static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev)
> +{
> + return false;
> +}
> +#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */
>  
>  #endif /* LINUX_PCI_ATS_H*/
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 634c9c44ed6c..bae815876be6 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -864,6 +864,7 @@
>  #define  PCI_PRI_STATUS_RF   0x001   /* Response Failure */
>  #define  PCI_PRI_STATUS_UPRGI0x002   /* Unexpected PRG index */
>  #define  PCI_PRI_STATUS_STOPPED  0x100   /* PRI Stopped */
> +#define  PCI_PRI_STATUS_PRPR 0x8000  /* PRG Response requires PASID prefix */
>  #define PCI_PRI_MAX_REQ  0x08/* PRI max reqs supported */
>  #define PCI_PRI_ALLOC_REQ0x0c/* PRI max reqs allowed */
>  #define PCI_EXT_CAP_PRI_SIZEOF   16
> -- 
> 2.11.0
> 
> 


Re: [RFC PATCH 19/30] PCI: Cache PRI and PASID bits in pci_dev

2017-03-03 Thread Bjorn Helgaas
On Mon, Feb 27, 2017 at 07:54:30PM +, Jean-Philippe Brucker wrote:
> Device drivers need to check if an IOMMU enabled ATS, PRI and PASID in
> order to know when they can use the SVM API. Cache PRI and PASID bits in
> the pci_dev structure, similarly to what is currently done for ATS.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 
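
The caching scheme can be sketched in userspace: a one-bit flag records whether the capability is on, enabling twice is refused, and disabling when already off is a no-op. `struct model_pri_dev` and both functions are hypothetical. The real helpers also program the capability's control register and use WARN_ON, neither of which is modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev with a cached 1-bit flag. */
struct model_pri_dev {
    unsigned int pri_cap:1;      /* device advertises the capability */
    unsigned int pri_enabled:1;  /* cached: capability currently enabled */
};

static int model_enable_pri(struct model_pri_dev *pdev)
{
    assert(pdev != NULL);
    if (pdev->pri_enabled)
        return -EBUSY;   /* already on: refuse a second enable */
    if (!pdev->pri_cap)
        return -EINVAL;  /* capability not present */
    pdev->pri_enabled = 1;
    return 0;
}

static void model_disable_pri(struct model_pri_dev *pdev)
{
    assert(pdev != NULL);
    if (!pdev->pri_enabled)
        return;          /* nothing to undo */
    pdev->pri_enabled = 0;
}
```

A driver that wants to know whether the feature is usable can then read the cached bit instead of walking config space again.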

> ---
>  drivers/pci/ats.c   | 23 +++
>  include/linux/pci.h |  2 ++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 331376e9bb8b..486dc2208119 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -153,6 +153,9 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  	u32 max_requests;
>  	int pos;
>  
> +	if (WARN_ON(pdev->pri_enabled))
> +		return -EBUSY;
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return -EINVAL;
> @@ -170,6 +173,8 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  	control |= PCI_PRI_CTRL_ENABLE;
>  	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
> +	pdev->pri_enabled = 1;
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pri);
> @@ -185,6 +190,9 @@ void pci_disable_pri(struct pci_dev *pdev)
>  	u16 control;
>  	int pos;
>  
> +	if (WARN_ON(!pdev->pri_enabled))
> +		return;
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return;
> @@ -192,6 +200,8 @@ void pci_disable_pri(struct pci_dev *pdev)
>  	pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>  	control &= ~PCI_PRI_CTRL_ENABLE;
>  	pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +
> +	pdev->pri_enabled = 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
> @@ -207,6 +217,9 @@ int pci_reset_pri(struct pci_dev *pdev)
>  	u16 control;
>  	int pos;
>  
> +	if (WARN_ON(pdev->pri_enabled))
> +		return -EBUSY;
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
>  	if (!pos)
>  		return -EINVAL;
> @@ -239,6 +252,9 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>  	u16 control, supported;
>  	int pos;
>  
> +	if (WARN_ON(pdev->pasid_enabled))
> +		return -EBUSY;
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
>  	if (!pos)
>  		return -EINVAL;
> @@ -259,6 +275,8 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>  
>  	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
>  
> +	pdev->pasid_enabled = 1;
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_enable_pasid);
> @@ -273,11 +291,16 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  	u16 control = 0;
>  	int pos;
>  
> +	if (WARN_ON(!pdev->pasid_enabled))
> +		return;
> +
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
>  	if (!pos)
>  		return;
>  
>  	pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +
> +	pdev->pasid_enabled = 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e606f289bf5f..47c353ca9957 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -351,6 +351,8 @@ struct pci_dev {
>  	unsigned int	msix_enabled:1;
>  	unsigned int	ari_enabled:1;	/* ARI forwarding */
>  	unsigned int	ats_enabled:1;	/* Address Translation Service */
> +	unsigned int	pasid_enabled:1;	/* Process Address Space ID */
> +	unsigned int	pri_enabled:1;	/* Page Request Interface */
>  	unsigned int	is_managed:1;
>  	unsigned int	needs_freset:1;	/* Dev requires fundamental reset */
>  	unsigned int	state_saved:1;
> -- 
> 2.11.0
> 
> 


Re: Partial BAR Address Allocation

2017-03-10 Thread Bjorn Helgaas
On Mon, Mar 06, 2017 at 12:04:39PM +0100, Joerg Roedel wrote:
> On Wed, Feb 22, 2017 at 05:39:44PM -0600, Bjorn Helgaas wrote:
> > [+cc Joerg, iommu list]
> > 
> > On Wed, Feb 22, 2017 at 03:44:53PM -0500, Sinan Kaya wrote:
> > > On 2/22/2017 1:44 PM, Bjorn Helgaas wrote:
> > > > There is no way for a driver to say "I only need this memory BAR and
> > > > not the other ones."  The reason is because the PCI_COMMAND_MEMORY bit
> > > > enables *all* the memory BARs; there's no way to enable memory BARs
> > > > selectively.  If we enable memory BARs and one of them is unassigned,
> > > > that unassigned BAR is enabled, and the device will respond at
> > > > whatever address the register happens to contain, and that may cause
> > > > conflicts.
> 
> Hmm, maybe I am missing something, but isn't this only a problem if the
> 'unassigned' BAR has an address configured that also falls into the
> Bridge-Window of the parent bridge? Otherwise no requests should be
> routed to the BAR anyway, right?

I guess it's true that we could safely enable a memory BAR if the
upstream bridge would never route anything to it.

But it would depend on the size of the BAR and the upstream bridge's
configuration, so it doesn't feel like it would really be reliable in
general.
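
To make the dependency concrete, here is a minimal sketch of the check that
would be needed.  It assumes inclusive address ranges; struct addr_range and
bar_reachable() are hypothetical names for illustration, not kernel
interfaces:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive address range, e.g. one bridge memory window. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return 1 if a BAR at bar_addr of bar_size bytes overlaps the upstream
 * bridge window, i.e. the bridge could route a request to it.
 */
static int bar_reachable(uint64_t bar_addr, uint64_t bar_size,
			 const struct addr_range *window)
{
	uint64_t bar_end;

	assert(bar_size > 0);
	bar_end = bar_addr + bar_size - 1;

	/* Two inclusive ranges overlap iff each starts before the other ends. */
	return bar_addr <= window->end && window->start <= bar_end;
}
```

The same BAR address can be safe or unsafe depending only on the BAR size:
a BAR just below the window is unreachable at 4K but overlaps the window
at 8K.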

> > But if there *is* address translation in the PIO direction, we can
> > have conflicts because the bridge can translate CPU-side PIO accesses
> > to arbitrary PCI bus addresses.
> 
> I am not aware of any hardware that does translation on the PIO space.
> The IOMMUs I know of don't care about PIO at all.

Right, address translation in the PIO direction would be done by the
host bridge, not the IOMMU.  There are a fair number of bridges that
do this -- basically all the callers of pci_add_resource_offset().
They just apply a constant offset, often by chopping off some
high-order bits of the CPU address.
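
As a rough sketch of that constant-offset translation (the struct and
function names here are made up for illustration; the only assumption taken
from the kernel is that the offset is the CPU address minus the bus
address):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one host bridge window; not a kernel structure. */
struct bridge_window {
	uint64_t cpu_start;	/* first CPU physical address in the window */
	uint64_t cpu_end;	/* last CPU physical address (inclusive) */
	uint64_t offset;	/* CPU address minus PCI bus address */
};

/*
 * Translate a CPU address to a PCI bus address.  Returns 1 and stores the
 * bus address if cpu_addr falls inside the window, 0 otherwise.
 */
static int cpu_to_bus(const struct bridge_window *w, uint64_t cpu_addr,
		      uint64_t *bus_addr)
{
	assert(w && bus_addr);

	if (cpu_addr < w->cpu_start || cpu_addr > w->cpu_end)
		return 0;

	*bus_addr = cpu_addr - w->offset;
	return 1;
}
```

With an illustrative window at CPU 0x80000000-0x9fffffff and an offset of
0x80000000, CPU address 0x90000000 appears on the bus as 0x10000000.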

Bjorn


Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-10 Thread Bjorn Helgaas
Hi Jayachandran,

On Mon, Apr 03, 2017 at 01:15:04PM +, Jayachandran C wrote:
> The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> topology is slightly unusual. For a multi-node system, it looks like:
> 
> [node level PCI bridges - one per node]
> [SoC PCI devices with MSI-X but no IOMMU]
> [PCI-PCIe "glue" bridges - upto 14, one per real port below]
> [PCIe real root ports associated with IOMMU and GICv3 ITS]
> [External PCI devices connected to PCIe links]
> 
> The top two levels of bridges should have introduced aliases since they
> are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> In the case of external PCIe devices, the "real" root ports are connected
> to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> node level bridges do not introduce an alias either.
> 
> To handle this quirk, we mark the real PCIe root ports and node level
> PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> pci_for_each_dma_alias() works correctly for external PCIe devices and
> SoC PCI devices.
> 
> For the current revision of Cavium ThunderX2, the VendorID and Device ID
> are from Broadcom Vulcan (14e4:90XX).

Can you supply some text here about why we want to apply this patch?
E.g., does it avoid making unnecessary IOMMU mappings, improve
performance, avoid a crash, etc?

> Signed-off-by: Jayachandran C 
> ---
>  drivers/pci/quirks.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..564a84a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3958,6 +3958,20 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2260, quirk_mic_x200_dma_alias);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2264, quirk_mic_x200_dma_alias);
>  
>  /*
> + * The IOMMU and interrupt controller on Broadcom Vulcan/Cavium ThunderX2 are
> + * associated not at the root bus, but at a bridge below. This quirk flag
> + * will ensure that the aliases are identified correctly.
> + */
> +static void quirk_bridge_cavm_thrx2_pcie_root(struct pci_dev *pdev)
> +{
> +	pdev->dev_flags |= PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT;
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9000,
> +				quirk_bridge_cavm_thrx2_pcie_root);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_BROADCOM, 0x9084,
> +				quirk_bridge_cavm_thrx2_pcie_root);
> +
> +/*
>   * Intersil/Techwell TW686[4589]-based video capture cards have an empty (zero)
>   * class code.  Fix it.
>   */
> -- 
> 2.7.4
> 
> 


Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-11 Thread Bjorn Helgaas
[+cc Joerg]

On Tue, Apr 11, 2017 at 07:10:48AM +, Jayachandran C wrote:
> On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > Hi Jayachandran,
> > 
> > On Mon, Apr 03, 2017 at 01:15:04PM +, Jayachandran C wrote:
> > > The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the PCI
> > > topology is slightly unusual. For a multi-node system, it looks like:
> > > 
> > > [node level PCI bridges - one per node]
> > > [SoC PCI devices with MSI-X but no IOMMU]
> > > [PCI-PCIe "glue" bridges - upto 14, one per real port below]
> > > [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > [External PCI devices connected to PCIe links]
> > > 
> > > The top two levels of bridges should have introduced aliases since they
> > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do not.
> > > In the case of external PCIe devices, the "real" root ports are connected
> > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > alias. The SoC PCI devices are directly connected to the GIC ITS, so the
> > > node level bridges do not introduce an alias either.
> > > 
> > > To handle this quirk, we mark the real PCIe root ports and node level
> > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > SoC PCI devices.
> > > 
> > > For the current revision of Cavium ThunderX2, the VendorID and Device ID
> > > are from Broadcom Vulcan (14e4:90XX).
> > 
> > Can you supply some text here about why we want to apply this patch?
> > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > performance, avoid a crash, etc?
> 
> If this is for the commit message, I hope the following is ok:
> 
> "With this change, both MSI-X and IO virtualization work correctly on
> Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> devices, and the IOMMU groups are setup correctly."

This doesn't get at what the actual problem is.  I'm hoping for
something like "without this change, we set up an IOMMU mapping for
requester ID X, but device DMA uses requester ID Y because [reason],
which results in an IOMMU fault."

I've been puzzling over the fact that most of the callers of
pci_for_each_dma_alias() don't seem to use it correctly.  For Intel
IOMMUs, domain_context_mapping() uses it to add a mapping for every
possible alias.  But most of the other callers only look at the last
alias and ignore all the others.  That might work most of the time,
but:

  - There's no guarantee that pci_for_each_dma_alias() iterates in any
particular order, so relying on the current order is fragile,

  - The pci_add_dma_alias() interface allows an arbitrary number of
aliases (as long as they're all on the same bus), and some devices
do use more than one, e.g., quirk_dma_func0_alias(),
quirk_mic_x200_dma_alias(),

  - pci_for_each_dma_alias() translates the rules in the PCIe to
PCI/PCI-X Bridge spec, r1.0, sec 2.3, about taking ownership into
aliases.  I think it's important to pay attention to *every*
possible alias, not just the last one.
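
The difference between the two caller styles can be sketched in plain C.
This is a userspace model, not kernel code: for_each_alias() is a
simplified stand-in for pci_for_each_dma_alias() (the real function takes a
struct pci_dev and walks the bus hierarchy), and the structs and callbacks
are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALIASES 8

typedef int (*alias_fn)(uint16_t rid, void *data);

/* Stand-in iterator: call fn for each alias, stop early on nonzero. */
static int for_each_alias(const uint16_t *aliases, size_t n,
			  alias_fn fn, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(aliases[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Fragile style: each call overwrites the result, so only the last alias survives. */
struct last_only {
	uint16_t rid;
};

static int record_last(uint16_t rid, void *data)
{
	struct last_only *l = data;

	l->rid = rid;
	return 0;
}

/* Robust style: remember (i.e. "map") every alias the iterator reports. */
struct mapped_set {
	uint16_t rids[MAX_ALIASES];
	size_t count;
};

static int map_every(uint16_t rid, void *data)
{
	struct mapped_set *s = data;

	assert(s->count < MAX_ALIASES);
	s->rids[s->count++] = rid;
	return 0;
}

static int is_mapped(const struct mapped_set *s, uint16_t rid)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->rids[i] == rid)
			return 1;
	return 0;
}
```

With the first style, whether DMA works depends entirely on which alias
happens to be reported last; with the second, any requester ID the
iterator reports has a mapping.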

I suspect the reason this patch makes a difference is because the
current pci_for_each_dma_alias() believes one of those top-level
bridges is an alias, and the iterator produces it last, so that's the
one you map.  The IOMMU is attached lower down, so that top-level
bridge is not in fact an alias, but since you only look at the *last*
one, you don't map the correct aliases from lower down in the tree.

Stopping the iterator earlier happens to make the last alias be one of
the correct ones, but it doesn't solve the problems of quirked devices
that can use multiple requester IDs, and it doesn't solve the problem
of PCIe-to-PCI bridges that optionally take ownership of transactions.

> I can send out a new patch if needed.
> 
> The on chip SATA and USB use MSI-X, so this is needed for basic
> functionality of the platform.

No need for a new patch; I can integrate something into the changelog.

> > > Signed-off-by: Jayachandran C 
> > > ---
> > >  drivers/pci/quirks.c | 14 ++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6736836..564a84a 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -3958,6 +3958,20 @

Re: [PATCH v4 0/2] Handle Cavium ThunderX2 PCI topology quirk

2017-04-11 Thread Bjorn Helgaas
[+cc David]

I forgot to mention that I'm also hoping for an ack from David, since
he's listed as the maintainer of the ThunderX drivers.

On Mon, Apr 03, 2017 at 01:15:02PM +, Jayachandran C wrote:
> Hi Bjorn, Alex,
> 
> Sending this again (with a trivial fix to author name), please review.
> Updated summary below:
> 
> Here is v4 of the patchset to handle the PCIe topology quirk of Cavium
> ThunderX2 systems (previously known as Broadcom Vulcan).
> 
> The earlier discussions on this can be seen at:
> http://www.spinics.net/lists/linux-pci/msg51001.html
> https://patchwork.ozlabs.org/patch/582633/ and
> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html
> 
> The earlier discussion on this patchset ended with a suggestion that it
> may be possible to fix up this quirk by handling the issue in the
> function argument of pci_for_each_dma_alias(). But at that point we did
> not have the codebase to make the changes since the full ACPI and OF code
> for SMMU and GIC ITS was not upstream.
> 
> Now that the changes are upstream, I tried to fix it in both the SMMU
> and the GIC ITS code based on this suggestion, the changes needed are at:
>  https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
> 
> The problems with this approach are:
>  - of the 14 uses of pci_for_each_dma_alias() in the kernel tree, I
>    have to fix up 6 callers (which is all but one of the callers
>    outside x86)
>  - 4 of these can be reasonably handled (please see the github repo above),
>    but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
>    drivers/iommu/iommu.c cannot be reasonably fixed up.
>  - Even without the above two changes I can get it to work for now.
>    But pci_for_each_dma_alias() does not work as expected on this platform
>    and we have to be aware of that for all future uses of the function.
>   
> For now, I have ruled out the approach, and I have rebased the earlier
> patch on to 4.11-rc and submitting again for review. The changes are:
> 
> v3->v4:
>  - new address of author
> 
> v2->v3:
>  - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
>PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>  - updated commit message to make the quirk clearer.
> 
> Let me know your comments and suggestions.
> 
> Thanks,
> JC.
> 
> 
> Jayachandran C (2):
>   PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>   PCI: quirks: Fix ThunderX2 dma alias handling
> 
>  drivers/pci/quirks.c | 14 ++
>  drivers/pci/search.c |  4 
>  include/linux/pci.h  |  2 ++
>  3 files changed, 20 insertions(+)
> 
> -- 
> 2.7.4
> 
> 


Re: [PATCH v4 0/2] Handle Cavium ThunderX2 PCI topology quirk

2017-04-11 Thread Bjorn Helgaas
On Apr 11, 2017 8:48 AM, "Bjorn Helgaas"  wrote:

[+cc David]

I forgot to mention that I'm also hoping for an ack from David, since
he's listed as the maintainer of the ThunderX drivers.


Never mind this, Jon pointed out that ThunderX2 is different than
ThunderX.  Sorry for the noise, David.

On Mon, Apr 03, 2017 at 01:15:02PM +, Jayachandran C wrote:
> Hi Bjorn, Alex,
>
> Sending this again (with a trivial fix to author name), please review.
> Updated summary below:
>
> Here is v4 of the patchset to handle the PCIe topology quirk of Cavium
> ThunderX2 systems (previously known as Broadcom Vulcan).
>
> The earlier discussions on this can be seen at:
> http://www.spinics.net/lists/linux-pci/msg51001.html
> https://patchwork.ozlabs.org/patch/582633/ and
> https://lists.linuxfoundation.org/pipermail/iommu/2016-June/017681.html
>
> The earlier discussion on this patchset ended with a suggestion that it
> may be possible to fix up this quirk by handling the issue in the
> function argument of pci_for_each_dma_alias(). But at that point we did
> not have the codebase to make the changes since the full ACPI and OF code
> for SMMU and GIC ITS was not upstream.
>
> Now that the changes are upstream, I tried to fix it in both the SMMU
> and the GIC ITS code based on this suggestion, the changes needed are at:
>  https://github.com/jchandra-cavm/linux/commits/rid-xlate-fixup
>
> The problems with this approach are:
>  - of the 14 uses of pci_for_each_dma_alias() in the kernel tree, I
>    have to fix up 6 callers (which is all but one of the callers
>    outside x86)
>  - 4 of these can be reasonably handled (please see the github repo above),
>    but the calls in drivers/irqchip/irq-gic-v3-its-pci-msi.c and
>    drivers/iommu/iommu.c cannot be reasonably fixed up.
>  - Even without the above two changes I can get it to work for now.
>    But pci_for_each_dma_alias() does not work as expected on this platform
>    and we have to be aware of that for all future uses of the function.
>
> For now, I have ruled out the approach, and I have rebased the earlier
> patch on to 4.11-rc and submitting again for review. The changes are:
>
> v3->v4:
>  - new address of author
>
> v2->v3:
>  - changed device flag name from PCI_DEV_FLAGS_DMA_ALIAS_ROOT to
>PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>  - updated commit message to make the quirk clearer.
>
> Let me know your comments and suggestions.
>
> Thanks,
> JC.
>
>
> Jayachandran C (2):
>   PCI: Add device flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT
>   PCI: quirks: Fix ThunderX2 dma alias handling
>
>  drivers/pci/quirks.c | 14 ++
>  drivers/pci/search.c |  4 
>  include/linux/pci.h  |  2 ++
>  3 files changed, 20 insertions(+)
>
> --
> 2.7.4
>
>

Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-12 Thread Bjorn Helgaas
On Tue, Apr 11, 2017 at 03:27:02PM +, Jayachandran C wrote:
> On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > [+cc Joerg]
> > 
> > On Tue, Apr 11, 2017 at 07:10:48AM +, Jayachandran C wrote:
> > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > Hi Jayachandran,
> > > > 
> > > > On Mon, Apr 03, 2017 at 01:15:04PM +, Jayachandran C wrote:
> > > > > The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the 
> > > > > PCI
> > > > > topology is slightly unusual. For a multi-node system, it looks like:
> > > > > 
> > > > > [node level PCI bridges - one per node]
> > > > > [SoC PCI devices with MSI-X but no IOMMU]
> > > > > [PCI-PCIe "glue" bridges - upto 14, one per real port below]
> > > > > [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > [External PCI devices connected to PCIe links]
> > > > > 
> > > > > The top two levels of bridges should have introduced aliases since 
> > > > > they
> > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they do 
> > > > > not.
> > > > > In the case of external PCIe devices, the "real" root ports are 
> > > > > connected
> > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not introduce an
> > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, so 
> > > > > the
> > > > > node level bridges do not introduce an alias either.
> > > > > 
> > > > > To handle this quirk, we mark the real PCIe root ports and node level
> > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With this,
> > > > > pci_for_each_dma_alias() works correctly for external PCIe devices and
> > > > > SoC PCI devices.
> > > > > 
> > > > > For the current revision of Cavium ThunderX2, the VendorID and Device 
> > > > > ID
> > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > 
> > > > Can you supply some text here about why we want to apply this patch?
> > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > performance, avoid a crash, etc?
> > > 
> > > If this is for the commit message, I hope the following is ok:
> > > 
> > > "With this change, both MSI-X and IO virtualization work correctly on
> > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > devices, and the IOMMU groups are setup correctly."
> > 
> > This doesn't get at what the actual problem is.  I'm hoping for
> > something like "without this change, we set up an IOMMU mapping for
> > requestor ID X, but device DMA uses requestor ID Y because , which
> > results in an IOMMU fault"
> 
> Ok. I hope this would be better:
> 
> "Without this change, the last alias seen while traversing the PCI
> hierarchy will be used as the RID to generate the device ID for ITS
> and stream ID for SMMU. This in turn causes the MSI-X generated by the
> device to fail since the ITS expects to have translation tables based
> on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> device DMA also fails when SMMU is enabled due to incorrect value in
> SMMU translation tables"

This description is true, but I don't think it addresses the real
problem.  I think the real problem is that your IOMMU code doesn't
handle aliases correctly, and by ignoring these invalid aliases, we
happen to map an alias that works for the builtin devices.  But that's
only because we got lucky (those devices use a single RID and they're
not behind bridges that optionally take ownership).

It would make sense to me if we fixed the IOMMU code to map *all* the
aliases, which should be enough to make your devices work.  If we then
wanted to apply a patch like this on top, it would be simply an
optimization that avoids unnecessary IOMMU mappings.

> > I suspect the reason this patch makes a difference is because the
> > current pci_for_each_dma_alias() believes one of those top-level
> > bridges is an alias, and the iterator produces it last, so that's the
> > one you map.  The IOMMU is attached lower down, so that top-level
> > bridge is not in fact an alias, but since you only look at the *last*
> > one, you 

Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-12 Thread Bjorn Helgaas
On Wed, Apr 12, 2017 at 06:10:34PM +, Jayachandran C wrote:
> On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2017 at 03:27:02PM +, Jayachandran C wrote:
> > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > [+cc Joerg]
> > > > 
> > > > On Tue, Apr 11, 2017 at 07:10:48AM +, Jayachandran C wrote:
> > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > Hi Jayachandran,
> > > > > > 
> > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +, Jayachandran C wrote:
> > > > > > > The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), 
> > > > > > > the PCI
> > > > > > > topology is slightly unusual. For a multi-node system, it looks 
> > > > > > > like:
> > > > > > > 
> > > > > > > [node level PCI bridges - one per node]
> > > > > > > [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > > [PCI-PCIe "glue" bridges - upto 14, one per real port below]
> > > > > > > [PCIe real root ports associated with IOMMU and GICv3 ITS]
> > > > > > > [External PCI devices connected to PCIe links]
> > > > > > > 
> > > > > > > The top two levels of bridges should have introduced aliases 
> > > > > > > since they
> > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 they 
> > > > > > > do not.
> > > > > > > In the case of external PCIe devices, the "real" root ports are 
> > > > > > > connected
> > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not 
> > > > > > > introduce an
> > > > > > > alias. The SoC PCI devices are directly connected to the GIC ITS, 
> > > > > > > so the
> > > > > > > node level bridges do not introduce an alias either.
> > > > > > > 
> > > > > > > To handle this quirk, we mark the real PCIe root ports and node 
> > > > > > > level
> > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  With 
> > > > > > > this,
> > > > > > > pci_for_each_dma_alias() works correctly for external PCIe 
> > > > > > > devices and
> > > > > > > SoC PCI devices.
> > > > > > > 
> > > > > > > For the current revision of Cavium ThunderX2, the VendorID and 
> > > > > > > Device ID
> > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > 
> > > > > > Can you supply some text here about why we want to apply this patch?
> > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > performance, avoid a crash, etc?
> > > > > 
> > > > > If this is for the commit message, I hope the following is ok:
> > > > > 
> > > > > "With this change, both MSI-X and IO virtualization work correctly on
> > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs for PCI
> > > > > devices, and the IOMMU groups are setup correctly."
> > > > 
> > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > something like "without this change, we set up an IOMMU mapping for
> > > > requestor ID X, but device DMA uses requestor ID Y because , which
> > > > results in an IOMMU fault"
> > > 
> > > Ok. I hope this would be better:
> > > 
> > > "Without this change, the last alias seen while traversing the PCI
> > > hierarchy will be used as the RID to generate the device ID for ITS
> > > and stream ID for SMMU. This in turn causes the MSI-X generated by the
> > > device to fail since the ITS expects to have translation tables based
> > > on the actual PCIe RID and not the (irrelevant) alias. Similarly, the
> > > device DMA also fails when SMMU is enabled due to incorrect value in
> > > SMMU translation tables"
> > 
> > This description is true, but I don't think it addresses the real
> > problem.  I think the real p

Re: [PATCH v4 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-12 Thread Bjorn Helgaas
On Wed, Apr 12, 2017 at 08:41:20PM +, Jayachandran C wrote:
> On Wed, Apr 12, 2017 at 02:11:38PM -0500, Bjorn Helgaas wrote:
> > On Wed, Apr 12, 2017 at 06:10:34PM +, Jayachandran C wrote:
> > > On Wed, Apr 12, 2017 at 11:21:18AM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Apr 11, 2017 at 03:27:02PM +, Jayachandran C wrote:
> > > > > On Tue, Apr 11, 2017 at 08:41:25AM -0500, Bjorn Helgaas wrote:
> > > > > > [+cc Joerg]
> > > > > > 
> > > > > > On Tue, Apr 11, 2017 at 07:10:48AM +, Jayachandran C wrote:
> > > > > > > On Mon, Apr 10, 2017 at 08:28:47PM -0500, Bjorn Helgaas wrote:
> > > > > > > > Hi Jayachandran,
> > > > > > > > 
> > > > > > > > On Mon, Apr 03, 2017 at 01:15:04PM +, Jayachandran C wrote:
> > > > > > > > > The Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan 
> > > > > > > > > earlier), the PCI
> > > > > > > > > topology is slightly unusual. For a multi-node system, it 
> > > > > > > > > looks like:
> > > > > > > > > 
> > > > > > > > > [node level PCI bridges - one per node]
> > > > > > > > > [SoC PCI devices with MSI-X but no IOMMU]
> > > > > > > > > [PCI-PCIe "glue" bridges - upto 14, one per real port 
> > > > > > > > > below]
> > > > > > > > > [PCIe real root ports associated with IOMMU and GICv3 
> > > > > > > > > ITS]
> > > > > > > > > [External PCI devices connected to PCIe links]
> > > > > > > > > 
> > > > > > > > > The top two levels of bridges should have introduced aliases 
> > > > > > > > > since they
> > > > > > > > > are PCI and PCI/PCIe bridges, but in the case of ThunderX2 
> > > > > > > > > they do not.
> > > > > > > > > In the case of external PCIe devices, the "real" root ports 
> > > > > > > > > are connected
> > > > > > > > > to the SMMU and the GIC ITS, so PCI-PCIe bridge does not 
> > > > > > > > > introduce an
> > > > > > > > > alias. The SoC PCI devices are directly connected to the GIC 
> > > > > > > > > ITS, so the
> > > > > > > > > node level bridges do not introduce an alias either.
> > > > > > > > > 
> > > > > > > > > To handle this quirk, we mark the real PCIe root ports and 
> > > > > > > > > node level
> > > > > > > > > PCI bridges with the flag PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT.  
> > > > > > > > > With this,
> > > > > > > > > pci_for_each_dma_alias() works correctly for external PCIe 
> > > > > > > > > devices and
> > > > > > > > > SoC PCI devices.
> > > > > > > > > 
> > > > > > > > > For the current revision of Cavium ThunderX2, the VendorID 
> > > > > > > > > and Device ID
> > > > > > > > > are from Broadcom Vulcan (14e4:90XX).
> > > > > > > > 
> > > > > > > > Can you supply some text here about why we want to apply this 
> > > > > > > > patch?
> > > > > > > > E.g., does it avoid making unnecessary IOMMU mappings, improve
> > > > > > > > performance, avoid a crash, etc?
> > > > > > > 
> > > > > > > If this is for the commit message, I hope the following is ok:
> > > > > > > 
> > > > > > > "With this change, both MSI-X and IO virtualization work 
> > > > > > > correctly on
> > > > > > > Cavium ThunderX2. The GIC ITS driver gets the correct device ID to
> > > > > > > configure MSI-X, the SMMUv3 driver gets the correct Stream IDs 
> > > > > > > for PCI
> > > > > > > devices, and the IOMMU groups are setup correctly."
> > > > > > 
> > > > > > This doesn't get at what the actual problem is.  I'm hoping for
> > > > > > something like "without this change, we set up an 

Re: [PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-13 Thread Bjorn Helgaas
I tentatively applied both patches to pci/host-thunder for v4.12.

However, I am concerned about the topology here:

On Thu, Apr 13, 2017 at 08:30:45PM +, Jayachandran C wrote:
> On Cavium ThunderX2 arm64 SoCs (called Broadcom Vulcan earlier), the
> PCI topology is slightly unusual.  For a multi-node system, it looks
> like:
> 
> 00:00.0 [PCI] bridge to [bus 01-1e]
> 01:0a.0 [PCI-PCIe bridge, type 8] bridge to [bus 02-04]
> 02:00.0 [PCIe root port, type 4] bridge to [bus 03-04] (XLATE_ROOT)
> 03:00.0 PCIe Endpoint

A root port normally has a single PCIe link leading downstream.
According to this, 02:00.0 is a root port that has the usual
downstream link leading to 03:00.0, but it also has an upstream link
to 01:0a.0.

Maybe this example is omitting details that are not relevant to DMA
aliases?  The PCIe capability only contains one set of link-related
registers, so I don't know how we could manage a single device that
has two links.

A device with two links would break things like ASPM.  In
set_pcie_port_type(), for example, we have this comment:

   * A Root Port or a PCI-to-PCIe bridge is always the upstream end
   * of a Link.  No PCIe component has two Links.  Two Links are
   * connected by a Switch that has a Port on each Link and internal
   * logic to connect the two Ports.

The topology above breaks these assumptions, which will make
pdev->has_secondary_link incorrect, which means ASPM won't work
correctly.

What am I missing?

Bjorn


Re: [PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-21 Thread Bjorn Helgaas
On Fri, Apr 21, 2017 at 05:05:41PM +, Jayachandran C wrote:
> On Fri, Apr 21, 2017 at 10:48:15AM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 17, 2017 at 12:47 PM, Jayachandran C
> >  wrote:
> > > On Fri, Apr 14, 2017 at 09:00:06PM -0500, Bjorn Helgaas wrote:

> > >> Could you collect "lspci -vv" output from this system?  I'd like to
> > >> archive that as background for this IOMMU issue and the ASPM tweaks I
> > >> suspect we'll have to do.  I *wish* we had more information about that
> > >> VIA thing, because I suspect we could get rid of it if we had more
> > >> details.
> > >
> > > The full logs are slightly large, so I have kept them at:
> > > https://github.com/jchandra-cavm/thunderx2/blob/master/logs/
> > > The lspci -vv output is lspci-vv.txt and lspci -tvn output is 
> > > lspci-tvn.txt
> > >
> > > The output is from 2 socket system, the cards are not on the first slot
> > > like the example above, so the bus and device numbers are different.
> > 
> > Can somebody with this system collect the "lspci -" output as well?
> > 
> > I'm making some lspci changes to handle the PCI-to-PCIe bridge
> > correctly, and I can use the "lspci -" output to create an lspci
> > test case.
> 
> [Sorry was AFK for a few days]
> 
> I have updated the above directory with the log. Also tested your next branch
> and it works fine on ThunderX2.

Thanks!

With regard to my lspci changes, they add "Slot-" here:

   01:0a.0 PCI bridge: Broadcom Corporation Device 9039
   ...
  -   Capabilities: [40] Express (v2) PCI/PCI-X to PCI-Express Bridge, MSI 00
  +   Capabilities: [40] Express (v2) PCI/PCI-X to PCI-Express Bridge (Slot-), MSI 00

for all your PCI-to-PCIe bridges.  I assume the "Slot-" is correct, i.e.,
the link is not connected to a slot, right?  This comes from the "Slot
Implemented" bit in the PCIe Capabilities Register.
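
For anyone who wants to check the bit by hand, a minimal sketch of decoding the 16-bit PCIe Capabilities Register follows. The field masks match the layout in the PCIe spec (version in bits 3:0, Device/Port Type in bits 7:4, Slot Implemented in bit 8); the struct and function names are made up for this example, and the register values in the note below are illustrative, not read from the ThunderX2 logs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field masks of the 16-bit PCIe Capabilities Register. */
#define EXP_FLAGS_VERS 0x000f	/* capability version */
#define EXP_FLAGS_TYPE 0x00f0	/* Device/Port Type */
#define EXP_FLAGS_SLOT 0x0100	/* Slot Implemented */

/* Hypothetical decoded view of the register. */
struct exp_flags {
	unsigned int version;
	unsigned int type;	/* 0x4 = Root Port, 0x8 = PCI/PCI-X to PCIe bridge */
	bool slot;		/* lspci prints "Slot+" or "Slot-" from this */
};

static struct exp_flags decode_exp_flags(uint16_t flags)
{
	struct exp_flags f = {
		.version = flags & EXP_FLAGS_VERS,
		.type    = (flags & EXP_FLAGS_TYPE) >> 4,
		.slot    = (flags & EXP_FLAGS_SLOT) != 0,
	};

	return f;
}
```

A value of 0x0082 decodes as a v2 PCI/PCI-X to PCIe bridge with "Slot-", and 0x0142 as a v2 Root Port with "Slot+".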

I did notice that all the Root Port devices claim to *not* be connected to
slots, which doesn't seem right.  For example,

  12:00.0 PCI bridge: Broadcom Corporation Device 9084
  Bus: primary=12, secondary=13, subordinate=14, sec-latency=0
  Capabilities: [ac] Express (v2) Root Port (Slot-), MSI 00

  13:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection

It seems strange because the 12:00.0 Root Port looks like it probably
*does* lead to a slot where the NIC is plugged in.  Or is that NIC really
soldered down?

But I assume there are *some* PCIe slots, so at least some of those Root
Ports should advertise "Slot+" (which by itself does not imply hotplug
support, if that's the concern).

Bjorn


Re: [PATCH 1/3] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-17 Thread Bjorn Helgaas
On Wed, May 17, 2017 at 05:00:07PM +0530, Sricharan R wrote:
> Now with iommu probe deferral, we return -EPROBE_DEFER
> for master's that are connected to an iommu which is not

s/master's/masters/

s/iommu/IOMMU/ in your English text (changelogs and comments).  That seems
to be the convention, based on "git log drivers/iommu/of_iommu.c"

> probed yet, but going to get probed, so that we can attach
> the correct dma_ops. So while trying to defer the probe of
> the master, check if the of_iommu node that it is connected
> to is marked in DT as 'status=disabled', then the iommu is never
> is going to get probed. So simply return NULL and let the master
> work without an iommu.
> 
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred probing or error")
> Signed-off-by: Sricharan R 
> Reported-by: Geert Uytterhoeven 
> Tested-by: Will Deacon 
> Tested-by: Magnus Damn 
> Acked-by: Will Deacon 
> ---
>  drivers/iommu/of_iommu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 9f44ee8..e6e9bec 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)
>  
>   ops = iommu_ops_from_fwnode(fwnode);
>   if ((ops && !ops->of_xlate) ||
> + !of_device_is_available(iommu_spec->np) ||
>   (!ops && !of_iommu_driver_present(iommu_spec->np)))
>   return NULL;
>  
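
The condition in the hunk above can be read as a small truth function. The sketch below models it in userspace; the struct, its field names, and the function name are hypothetical stand-ins for the kernel state the real code queries, and only the boolean expression is taken from the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what the quoted condition inspects. */
struct iommu_state {
	bool ops_registered;	/* iommu_ops_from_fwnode() returned ops */
	bool has_of_xlate;	/* ops->of_xlate is set (only used with ops) */
	bool node_available;	/* DT node is not marked status = "disabled" */
	bool driver_present;	/* an IOMMU driver exists for the node */
};

/*
 * Mirrors the boolean expression in the hunk: true means "return NULL and
 * let the master work without an IOMMU"; false means the lookup goes on
 * (and, per the changelog, may end in probe deferral).
 */
static bool master_runs_without_iommu(const struct iommu_state *s)
{
	return (s->ops_registered && !s->has_of_xlate) ||
	       !s->node_available ||
	       (!s->ops_registered && !s->driver_present);
}
```

The case the patch adds is the middle term: a disabled node with a driver present but no ops registered used to fall through to deferral and now returns NULL.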
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
> Code Aurora Forum, hosted by The Linux Foundation
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH v6 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-17 Thread Bjorn Helgaas
On Tue, May 16, 2017 at 10:52:05AM +0530, Oza Pawandeep wrote:
> current device framework and OF framework integration assumes

s/current/The current/

> dma-ranges in a way where memory-mapped devices define their
> dma-ranges. (child-bus-address, parent-bus-address, length).
> 
> of_dma_configure is specifically written to take care of memory
> mapped devices. but no implementation exists for pci to take
> care of pcie based memory ranges.

s/pci/PCI/  (also other occurrences below)
s/pcie/PCIe/

I don't see how PCIe is relevant here.  The bridge might support PCIe,
but I don't think anything here is actually specific to PCIe.  If
that's the case, I think it's confusing to mention PCIe.

> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
> world dma-ranges.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
> 
> this patch serves following:
> 
> 1) exposes interface to the pci host driver for their
> inbound memory ranges
> 
> 2) provide an interface to callers such as of_dma_get_ranges.
> so then the returned size get best possible (largest) dma_mask.
> because PCI RC drivers do not call APIs such as
> dma_set_coherent_mask() and hence rather it shows its addressing
> capabilities based on dma-ranges.
> for e.g.
> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
> we should get dev->coherent_dma_mask=0x7f.
> 
> 3) this patch handles multiple inbound windows and dma-ranges.
> it is left to the caller, how it wants to use them.
> the new function returns the resources in a standard and unform way
> 
> 4) this way the callers of for e.g. of_dma_get_ranges
> does not need to change.

Please start sentences with a capital letter.
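
As an aside on item 2), here is a rough sketch of how an inbound window size could map to a DMA mask. It assumes the mask is the smallest all-ones value that covers offsets 0..size-1; the helper name is made up, the kernel's actual computation may differ, and the 512 GiB figure in the note below is an assumption (the cell values in the quoted dma-ranges are truncated in this archive and are left as they are).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest mask of the form 2^n - 1 that covers every
 * offset in [0, size).  Saturates at all-ones for very large sizes.
 */
static uint64_t dma_mask_for_size(uint64_t size)
{
	uint64_t mask = 0;

	assert(size != 0);
	while (mask < size - 1)
		mask = (mask << 1) | 1;
	return mask;
}
```

With a 512 GiB window (2^39 bytes) this yields a 39-bit mask.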


Re: [PATCH v6 2/3] iommu/pci: reserve IOVA for PCI masters

2017-05-17 Thread Bjorn Helgaas
On Tue, May 16, 2017 at 10:52:06AM +0530, Oza Pawandeep wrote:
> this patch reserves the IOVA for PCI masters.
> ARM64 based SOCs may have scattered memory banks.
> such as iproc based SOC has
> 
> <0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
> <0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
> <0x0090 0x 0x4 0x>, /* 16G @ 576G */
> <0x00a0 0x 0x4 0x>; /* 16G @ 640G */
> 
> but incoming PCI transcation addressing capability is limited

s/transcation/transaction/

> by host bridge, for example if max incoming window capability
> is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.
> 
> to address this problem, iommu has to avoid allocating IOVA which

s/iommu/IOMMU/

> are reserved. which inturn does not allocate IOVA if it falls into hole.

s/inturn/in turn/
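
The arithmetic behind the changelog can be checked with a few lines of C. The bank bases and sizes below come from the comments in the quoted changelog ("2G @ 2G" and so on) and the 512 GB limit from its example; the struct and function names are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

struct mem_bank {
	uint64_t base;
	uint64_t size;
};

/* Banks as described by the comments in the quoted changelog. */
static const struct mem_bank banks[] = {
	{   2 * GiB,  2 * GiB },	/*  2G @   2G */
	{  34 * GiB, 14 * GiB },	/* 14G @  34G */
	{ 576 * GiB, 16 * GiB },	/* 16G @ 576G */
	{ 640 * GiB, 16 * GiB },	/* 16G @ 640G */
};

/* True if any byte of the bank lies above the inbound window limit. */
static bool bank_beyond_window(const struct mem_bank *b, uint64_t limit)
{
	return b->base + b->size > limit;
}

/* Number of banks an inbound window of the given size cannot reach. */
static size_t count_banks_beyond(uint64_t limit)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(banks) / sizeof(banks[0]); i++)
		if (bank_beyond_window(&banks[i], limit))
			n++;
	return n;
}
```

With a 512 GiB limit, the two banks at 576G and 640G are out of reach, which matches the changelog's claim.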


Re: [PATCH 1/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-23 Thread Bjorn Helgaas
On Wed, May 10, 2017 at 11:39:02AM -0700, Ashok Raj wrote:
> From: CQ Tang 
> 
> Requires: https://patchwork.kernel.org/patch/9593891

I'm not sure what the status of the patch above is.  I acked it, but it's
part of a 30-patch IOMMU series, so I expect it to be merged via an IOMMU
tree.

In any case, it's not in v4.12-rc1, so I can't apply *this* patch yet.

> After a FLR, pci-states need to be restored. This patch saves PASID features
> and PRI reqs cached.
> 
> Cc: Jean-Phillipe Brucker 
> Cc: David Woodhouse 
> Cc: iommu@lists.linux-foundation.org
> 
> Signed-off-by: CQ Tang 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/pci/ats.c   | 65 
> +
>  drivers/pci/pci.c   |  3 +++
>  include/linux/pci-ats.h | 10 
>  include/linux/pci.h |  6 +
>  4 files changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 2126497..a769955 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> - if ((control & PCI_PRI_CTRL_ENABLE) ||
> - !(status & PCI_PRI_STATUS_STOPPED))
> + if (!(status & PCI_PRI_STATUS_STOPPED))
>   return -EBUSY;
>  
>   pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>   reqs = min(max_requests, reqs);
> + pdev->pri_reqs_alloc = reqs;
>   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>  
> - control |= PCI_PRI_CTRL_ENABLE;
> + control = PCI_PRI_CTRL_ENABLE;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   pdev->pri_enabled = 1;
> @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
>  /**
> + * pci_restore_pri_state - Restore PRI
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pri_state(struct pci_dev *pdev)
> +{
> +   u16 control = PCI_PRI_CTRL_ENABLE;
> +   u32 reqs = pdev->pri_reqs_alloc;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pri_enabled)
> +   return;
> +
> +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> +
> +/**
>   * pci_reset_pri - Resets device's PRI state
>   * @pdev: PCI device structure
>   *
> @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> - if (control & PCI_PRI_CTRL_ENABLE)
> - return -EBUSY;
> -
> - control |= PCI_PRI_CTRL_RESET;
> -
> + control = PCI_PRI_CTRL_RESET;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   return 0;
> @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> -
> - if (control & PCI_PASID_CTRL_ENABLE)
> - return -EINVAL;
> -
>   supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>   /* User wants to enable anything unsupported? */
> @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   return -EINVAL;
>  
>   control = PCI_PASID_CTRL_ENABLE | features;
> + pdev->pasid_features = features;
>  
>   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
>  
> @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
> + * pci_restore_pasid_state - Restore PASID capabilities.
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pasid_state(struct pci_dev *pdev)
> +{
> +   u16 control;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pasid_enabled)
> +   return;
> +
> +   control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> +   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> +
> +/**
>   * pci_pasid_features - Check which PASID features are supported
>   * @pdev: PCI device structure
>   *
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02..c9a6510 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1171,6 +1172,8 @@ void pci_restore_state(struct pci_dev *dev)
>  
>   /* PCI Express register must be restored fir

Re: [PATCH 1/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-24 Thread Bjorn Helgaas
On Tue, May 23, 2017 at 03:33:22PM -0500, Bjorn Helgaas wrote:
> On Wed, May 10, 2017 at 11:39:02AM -0700, Ashok Raj wrote:
> > From: CQ Tang 
> > 
> > Requires: https://patchwork.kernel.org/patch/9593891
> 
> I'm not sure what the status of the patch above is.  I acked it, but it's
> part of a 30-patch IOMMU series, so I expect it to be merged via an IOMMU
> tree.
> 
> In any case, it's not in v4.12-rc1, so I can't apply *this* patch yet.

Ashok or CQ, would you mind reposting this when the patch it depends
on has been merged?  I'm going to drop it from patchwork for now since
I can't do anything with it, and that means it will completely
disappear from my to-do list.

Bjorn


Re: [PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-30 Thread Bjorn Helgaas
On Tue, May 30, 2017 at 09:25:49AM -0700, Ashok Raj wrote:
> From: CQ Tang 
> 
> Requires: https://patchwork.kernel.org/patch/9593891

The above patch (9593891) is not in my tree or Linus' tree, so I can't
do anything with this yet.

> After a FLR, pci-states need to be restored. This patch saves PASID features
> and PRI reqs cached.
> 
> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: linux-...@vger.kernel.org
> To: linux-ker...@vger.kernel.org
> Cc: Jean-Phillipe Brucker 
> Cc: David Woodhouse 
> Cc: iommu@lists.linux-foundation.org
> 
> Signed-off-by: CQ Tang 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/pci/ats.c   | 65 
> +
>  drivers/pci/pci.c   |  3 +++
>  include/linux/pci-ats.h | 10 
>  include/linux/pci.h |  6 +
>  4 files changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 2126497..a769955 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> - if ((control & PCI_PRI_CTRL_ENABLE) ||
> - !(status & PCI_PRI_STATUS_STOPPED))
> + if (!(status & PCI_PRI_STATUS_STOPPED))
>   return -EBUSY;
>  
>   pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>   reqs = min(max_requests, reqs);
> + pdev->pri_reqs_alloc = reqs;
>   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>  
> - control |= PCI_PRI_CTRL_ENABLE;
> + control = PCI_PRI_CTRL_ENABLE;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   pdev->pri_enabled = 1;
> @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
>  /**
> + * pci_restore_pri_state - Restore PRI
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pri_state(struct pci_dev *pdev)
> +{
> +   u16 control = PCI_PRI_CTRL_ENABLE;
> +   u32 reqs = pdev->pri_reqs_alloc;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pri_enabled)
> +   return;
> +
> +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> +
> +/**
>   * pci_reset_pri - Resets device's PRI state
>   * @pdev: PCI device structure
>   *
> @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> - if (control & PCI_PRI_CTRL_ENABLE)
> - return -EBUSY;
> -
> - control |= PCI_PRI_CTRL_RESET;
> -
> + control = PCI_PRI_CTRL_RESET;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   return 0;
> @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> -
> - if (control & PCI_PASID_CTRL_ENABLE)
> - return -EINVAL;
> -
>   supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>   /* User wants to enable anything unsupported? */
> @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   return -EINVAL;
>  
>   control = PCI_PASID_CTRL_ENABLE | features;
> + pdev->pasid_features = features;
>  
>   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
>  
> @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
> + * pci_restore_pasid_state - Restore PASID capabilities.
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pasid_state(struct pci_dev *pdev)
> +{
> +   u16 control;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pasid_enabled)
> +   return;
> +
> +   control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> +   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pasid_sta

Re: [PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-30 Thread Bjorn Helgaas
On Tue, May 30, 2017 at 09:25:49AM -0700, Ashok Raj wrote:
> From: CQ Tang 
> 
> Requires: https://patchwork.kernel.org/patch/9593891
> 
> 
> After a FLR, pci-states need to be restored. This patch saves PASID features
> and PRI reqs cached.
> 
> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: linux-...@vger.kernel.org
> To: linux-ker...@vger.kernel.org
> Cc: Jean-Phillipe Brucker 
> Cc: David Woodhouse 
> Cc: iommu@lists.linux-foundation.org
> 
> Signed-off-by: CQ Tang 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/pci/ats.c   | 65 
> +
>  drivers/pci/pci.c   |  3 +++
>  include/linux/pci-ats.h | 10 
>  include/linux/pci.h |  6 +
>  4 files changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 2126497..a769955 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> - if ((control & PCI_PRI_CTRL_ENABLE) ||
> - !(status & PCI_PRI_STATUS_STOPPED))
> + if (!(status & PCI_PRI_STATUS_STOPPED))
>   return -EBUSY;
>  
>   pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>   reqs = min(max_requests, reqs);
> + pdev->pri_reqs_alloc = reqs;
>   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>  
> - control |= PCI_PRI_CTRL_ENABLE;
> + control = PCI_PRI_CTRL_ENABLE;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   pdev->pri_enabled = 1;
> @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
>  /**
> + * pci_restore_pri_state - Restore PRI
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pri_state(struct pci_dev *pdev)
> +{
> +   u16 control = PCI_PRI_CTRL_ENABLE;
> +   u32 reqs = pdev->pri_reqs_alloc;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pri_enabled)
> +   return;

I propose swapping the order of these tests, so that if PRI is not
enabled, we don't have to search for the capability.  Similarly for
PASID below.

I made these changes and re-indented these functions on my branch.  No
action required unless you object to these changes.
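
A userspace mock of the reordering, to show the effect. Everything here (struct mock_dev, the counters, the mock_ functions) is a hypothetical stand-in for struct pci_dev and the config-space accessors, and it is not the code on my branch; it only demonstrates that testing the cached flag first skips the capability search.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few pci_dev fields involved. */
struct mock_dev {
	bool pri_enabled;		/* cached "PRI was enabled" flag */
	int pri_cap_pos;		/* 0 means the capability is absent */
	unsigned int cap_searches;	/* capability-list walks performed */
	unsigned int config_writes;	/* config-space writes performed */
};

/* Stand-in for pci_find_ext_capability(): counts each search. */
static int mock_find_pri_cap(struct mock_dev *dev)
{
	dev->cap_searches++;
	return dev->pri_cap_pos;
}

static void mock_restore_pri_state(struct mock_dev *dev)
{
	int pos;

	/* Cheap flag test first: nothing to restore if PRI was never on. */
	if (!dev->pri_enabled)
		return;

	pos = mock_find_pri_cap(dev);
	if (!pos)
		return;

	/* Models the two writes: PCI_PRI_ALLOC_REQ, then PCI_PRI_CTRL. */
	dev->config_writes += 2;
}
```

The behavior for an enabled device is unchanged; only the disabled case gets cheaper.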

> +
> +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> +
> +/**
>   * pci_reset_pri - Resets device's PRI state
>   * @pdev: PCI device structure
>   *
> @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> - if (control & PCI_PRI_CTRL_ENABLE)
> - return -EBUSY;
> -
> - control |= PCI_PRI_CTRL_RESET;
> -
> + control = PCI_PRI_CTRL_RESET;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   return 0;
> @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> -
> - if (control & PCI_PASID_CTRL_ENABLE)
> - return -EINVAL;
> -
>   supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>   /* User wants to enable anything unsupported? */
> @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   return -EINVAL;
>  
>   control = PCI_PASID_CTRL_ENABLE | features;
> + pdev->pasid_features = features;
>  
>   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
>  
> @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
> + * pci_restore_pasid_state - Restore PASID capabilities.
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pasid_state(struct pci_dev *pdev)
> +{
> +   u16 control;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pasid_enabled)
> +   return;
> +
> +   control = PCI

Re: [PATCH 0/2] Save and restore pci properties to support FLR

2017-05-30 Thread Bjorn Helgaas
On Tue, May 30, 2017 at 09:25:47AM -0700, Ashok Raj wrote:
> Resending Jean's patch so it can be included earlier than his large
> SVM commits. Original patch https://patchwork.kernel.org/patch/9593891
> was ack'ed by Bjorn. Let's commit these separately since we need
> functionality earlier.
> 
> Resending this series as requested by Jean.
> 
> CQ Tang (1):
>   PCI: Save properties required to handle FLR for replay purposes.
> 
> Jean-Philippe Brucker (1):
>   PCI: Cache PRI and PASID bits in pci_dev
> 
>  drivers/pci/ats.c   | 88 
> -
>  drivers/pci/pci.c   |  3 ++
>  include/linux/pci-ats.h | 10 ++
>  include/linux/pci.h |  8 +
>  4 files changed, 94 insertions(+), 15 deletions(-)

Applied to pci/virtualization for v4.13.  See response to 2/2 for minor
changes I made there.


Re: PCI warning on boot 3.8.0-rc1

2013-01-30 Thread Bjorn Helgaas
On Wed, Jan 16, 2013 at 3:38 PM, Stephen Hemminger
 wrote:
> I see this on boot in dmesg
>
> [0.574494] DMAR: No ATSR found
> [0.574549] IOMMU 0 0xfed9: using Queued invalidation
> [0.574550] IOMMU 1 0xfed91000: using Queued invalidation
> [0.574554] IOMMU: Setting RMRR:
> [0.574583] IOMMU: Setting identity map for device :00:02.0 
> [0xcf00 -
>  0xdf1f]
> [0.575748] IOMMU: Setting identity map for device :00:1d.0 
> [0xcd551000 -
>  0xcd56dfff]
> [0.575767] IOMMU: Setting identity map for device :00:1a.0 
> [0xcd551000 -
>  0xcd56dfff]
> [0.575786] IOMMU: Setting identity map for device :00:14.0 
> [0xcd551000 -
>  0xcd56dfff]
> [0.575797] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [0.575806] IOMMU: Setting identity map for device :00:1f.0 [0x0 - 
> 0x
> ff]
> [0.576186] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> [0.576337] [ cut here ]
> [0.576342] WARNING: at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0x59/0x71()
> [0.576343] Hardware name: System Product Name
> [0.576344] Modules linked in:
> [0.576347] Pid: 1, comm: swapper/0 Not tainted 3.8.0-rc1-net-next+ #4
> [0.576348] Call Trace:
> [0.576352]  [] warn_slowpath_common+0x83/0x9c
> [0.576355]  [] ? bus_set_iommu+0x49/0x49
> [0.576357]  [] warn_slowpath_null+0x1a/0x1c
> [0.576360]  [] pci_find_upstream_pcie_bridge+0x59/0x71
> [0.576362]  [] intel_iommu_add_device+0x4d/0x17a
> [0.576364]  [] add_iommu_group+0x3a/0x48
> [0.576368]  [] bus_for_each_dev+0x57/0x89
> [0.576370]  [] bus_set_iommu+0x42/0x49
> [0.576374]  [] intel_iommu_init+0xa27/0xb44
> [0.576377]  [] ? free_init_pages+0xf5/0x10d
> [0.576379]  [] ? maybe_link.part.2+0x10b/0x10b
> [0.576382]  [] ? memblock_find_dma_reserve+0x133/0x133
> [0.576384]  [] pci_iommu_init+0x13/0x3e
> [0.576387]  [] do_one_initcall+0x7f/0x133
> [0.576390]  [] kernel_init+0x146/0x29b
> [0.576393]  [] ? do_early_param+0x8c/0x8c
> [0.576395]  [] ? rest_init+0xda/0xda
> [0.576398]  [] ret_from_fork+0x7c/0xb0
> [0.576400]  [] ? rest_init+0xda/0xda
> [0.576406] ---[ end trace f709d1eb2b66cbf5 ]---
>
> $ sudo lspci -t -vv
> -[:00]-+-00.0  Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor 
> DRAM Controller
>+-01.0-[01-02]00.0  Intel Corporation Ethernet Controller 
> 10-Gigabit X540-AT2
>+-02.0  Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor 
> Graphics Controller
>+-14.0  Intel Corporation 7 Series/C210 Series Chipset Family USB 
> xHCI Host Controller
>+-16.0  Intel Corporation 7 Series/C210 Series Chipset Family MEI 
> Controller #1
>+-1a.0  Intel Corporation 7 Series/C210 Series Chipset Family USB 
> Enhanced Host Controller #2
>+-1b.0  Intel Corporation 7 Series/C210 Series Chipset Family High 
> Definition Audio Controller
>+-1c.0-[03]--
>+-1c.4-[04]00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B 
> PCI Express Gigabit Ethernet controller
>+-1c.5-[05-06]00.0-[06]--
>+-1d.0  Intel Corporation 7 Series/C210 Series Chipset Family USB 
> Enhanced Host Controller #1
>+-1f.0  Intel Corporation Z77 Express Chipset LPC Controller
>+-1f.2  Intel Corporation 7 Series/C210 Series Chipset Family 
> 6-port SATA Controller [AHCI mode]
>\-1f.3  Intel Corporation 7 Series/C210 Series Chipset Family 
> SMBus Controller

I think drivers/pci/search.c is identical between 3.7 and 3.8-rc1.  Is
this the first time you've turned on the IOMMU on that box?

It's the same warning as in this bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=44881, and there's a patch
there at https://bugzilla.kernel.org/show_bug.cgi?id=44881#c11, but
it's just a quirk that turns off VT-d if we find certain broken
bridges.  It doesn't look like you have any of those (although I don't
know what you have at 05:00.0).

Bjorn


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-20 Thread Bjorn Helgaas
On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
> and AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
> id respectively to avoid the need for duplicate definitions in other modules.
> AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
> an interface to calculate device id from bus number, and devfn pair.
>
> Signed-off-by: Shuah Khan 
> ---
>  include/uapi/linux/pci.h |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index 3c292bc0..6b2c8b3 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -30,6 +30,10 @@
>  #define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)(((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)((devfn) & 0x07)
> +#define PCI_DEVID(bus, devfn)  ((((u16)bus) << 8) | devfn)
> +
> +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> +#define PCI_BUS(x) (((x) >> 8) & 0xff)
>
>  /* Ioctls for /proc/bus/pci/X/Y nodes. */
>  #define PCIIOC_BASE('P' << 24 | 'C' << 16 | 'I' << 8)

David, can you point me at a description of include/uapi ... what is
there and why, and how we should decide what new things go in
include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
there should be something in Documentation/?

I'm guessing it's something to do with being exported to userland, but
I'm not sure the things in this patch (PCI_DEVID, PCI_BUS) are really
exportable in the sense of being used for syscalls, etc.
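
For reference, a small userspace sketch of what these macros compute. The macro names follow the patch; uint16_t stands in for the kernel's u16, the extra parentheses around the PCI_DEVID arguments are my addition, and make_devid() is a hypothetical helper that exists only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)
/* 16-bit device id: bus number in bits 15:8, devfn in bits 7:0. */
#define PCI_DEVID(bus, devfn)	((((uint16_t)(bus)) << 8) | (devfn))
/* Recover the bus number from a 16-bit device id. */
#define PCI_BUS(x)		(((x) >> 8) & 0xff)

/* Hypothetical helper: build a device id from bus/slot/function. */
static uint16_t make_devid(uint8_t bus, uint8_t slot, uint8_t func)
{
	assert(slot < 32 && func < 8);
	return (uint16_t)PCI_DEVID(bus, PCI_DEVFN(slot, func));
}
```

For a device at 01:0a.0 this gives devid 0x0150, and PCI_BUS(), PCI_SLOT(), and PCI_FUNC() recover 0x01, 0x0a, and 0 from it.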

Bjorn


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-25 Thread Bjorn Helgaas
On Mon, Feb 25, 2013 at 9:37 AM, Shuah Khan  wrote:
> On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote:
>> On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
>> > pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
>> > it doesn't have interfaces to return PCI bus and PCI device id. Drivers
>> > (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
>> > and AMD_IOMMU driver also has a module specific interface to calculate PCI
>> > device id from bus number and devfn.
>> >
>> > Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
>> > id respectively to avoid the need for duplicate definitions in other modules.
>> > AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
>> > an interface to calculate device id from bus number, and devfn pair.
>> >
>> > Signed-off-by: Shuah Khan 
>> > ---
>> >  include/uapi/linux/pci.h |4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
>> > index 3c292bc0..6b2c8b3 100644
>> > --- a/include/uapi/linux/pci.h
>> > +++ b/include/uapi/linux/pci.h
>> > @@ -30,6 +30,10 @@
>> >  #define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>> >  #define PCI_SLOT(devfn)(((devfn) >> 3) & 0x1f)
>> >  #define PCI_FUNC(devfn)((devfn) & 0x07)
>> > +#define PCI_DEVID(bus, devfn)  ((((u16)bus) << 8) | devfn)
>> > +
>> > +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
>> > +#define PCI_BUS(x) (((x) >> 8) & 0xff)
>> >
>> >  /* Ioctls for /proc/bus/pci/X/Y nodes. */
>> >  #define PCIIOC_BASE('P' << 24 | 'C' << 16 | 'I' << 8)
>>
>> David, can you point me at a description of include/uapi ... what is
>> there and why, and how we should decide what new things go in
>> include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
>> there should be something in Documentation/?
>>
>> I'm guessing it's something to do with being exported to userland, but
>> I'm not sure the things in this patch (PCI_DEVID, PCI_BUS) are really
>> exportable in the sense of being used for syscalls, etc.
>>
>
> Bjorn,David,
>
> Looks like the following thread answers some of the questions about when
> this uapi export was done on the existing defines.
>
> https://lkml.org/lkml/2011/7/28/198
>
> Sounds like the concern is that the older defines PCI_DEVFN, PCI_SLOT,
> PCI_FUNC,  and PCI_DEVID could be exported, but not the new ones I
> added. I could find any discussion on whether these four older defines
> are exportable or the reasons for the export in the above thread.

I think David's disintegration script took include/linux/pci.h, left
the #ifdef __KERNEL__ parts there, and moved everything else (which
wasn't much) to include/uapi/linux/pci.h.

It's obvious that the PCIIOC_ #defines need to be exported to
user-space for ioctls.  It's not obvious to me why PCI_DEVFN,
PCI_SLOT, and PCI_FUNC need to be exported to user-space.  But I can
imagine user-space using functionality like that, even if it's not
connected to a kernel interface.  I assume the intent of the
disintegration is that only include/uapi would be exposed to
user-space, so keeping those definitions in include/linux/pci.h would
break any user programs that used them.

> So the question is if uapi/linux.pci.h isn't the right place, do you
> have a recommendation on where they belong. The only alternative I can
> think of is include/linux/pci.h. It makes functional and logical sense
> to add the new defines to where the existing ones are defines. At least,
> not knowing the details of the change that moved PCI_DEVFN etc. to
> uapi/pci.h, that is my conclusion.

Using the linux-fullhist tree, I found these:

059d367 Import 2.1.82 -- moved PCI_DEVFN outside #ifdef __KERNEL__
b039547 Import 2.1.76 -- PCI_DEVFN was inside #ifdef __KERNEL__
f6d9739 Import 2.1.68pre1 -- added #ifdef __KERNEL__ (enclosing PCI_DEVFN)
940649f Import 1.3.0 -- added PCI_DEVFN

There's no indication of *why* PCI_DEVFN was exported, of course.

Bottom line, I think it's reasonable to keep PCI_DEVFN, et al., in
uapi/linux/pci.h to keep from breaking user-programs, even though if
we were adding them today we would probably put them in the
kernel-only linux/pci.h.  For the new ones you're adding, I'd propose
putting them in the kernel-only linux/pci.h because we know no user
programs use them.

It's not nice and consistent, but it does follow the simple rule of
"don't expose things to user-space unnecessarily."  We might want to
add a comment to keep somebody from cleaning it up later.

Bjorn


Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-25 Thread Bjorn Helgaas
On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
> and AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
> id respectively to avoid the need for duplicate definitions in other modules.
> AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
> an interface to calculate device id from bus number, and devfn pair.
>
> Signed-off-by: Shuah Khan 
> ---
>  include/uapi/linux/pci.h |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index 3c292bc0..6b2c8b3 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -30,6 +30,10 @@
>  #define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)        ((devfn) & 0x07)
> +#define PCI_DEVID(bus, devfn)  ((((u16)bus) << 8) | devfn)
> +
> +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> +#define PCI_BUS(x) (((x) >> 8) & 0xff)

BTW, in the next round, maybe we should call this PCI_BUS_NR() or
similar to avoid confusion with "struct pci_bus"?
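
For illustration, here is a minimal user-space sketch of how the existing
helpers and the proposed ones fit together.  It assumes the macro bodies
from the quoted patch, with uint16_t standing in for the kernel's u16;
PCI_BUS_NR is only the name floated above, not an existing interface, and
pci_devid_selftest() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Existing helpers, as quoted from include/uapi/linux/pci.h. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Proposed helpers; PCI_BUS_NR is an assumed name, see the reply above. */
#define PCI_DEVID(bus, devfn)	((((uint16_t)(bus)) << 8) | (devfn))
#define PCI_BUS_NR(devid)	(((devid) >> 8) & 0xff)

/* Round-trip check for device 05:1f.3 (bus 5, slot 0x1f, function 3). */
static void pci_devid_selftest(void)
{
	uint16_t devid = PCI_DEVID(0x05, PCI_DEVFN(0x1f, 3));

	assert(devid == 0x05fb);			/* bus in bits 15:8 */
	assert(PCI_BUS_NR(devid) == 0x05);
	assert(PCI_SLOT(devid & 0xff) == 0x1f);		/* devfn in bits 7:0 */
	assert(PCI_FUNC(devid & 0xff) == 3);
}
```

The 16-bit value is just bus:devfn packed together, so a macro that
extracts the bus number has nothing to do with "struct pci_bus", which is
why a distinct name seems less confusing.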

Bjorn


Re: [PATCH v2 0/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-03-26 Thread Bjorn Helgaas
On Wed, Feb 27, 2013 at 5:06 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) have module specific definitions for PCI_BUS() and
> AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> This patch set adds PCI_BUS_NUM(), and PCI_DEVID() to pci.h, changes AER
> to use PCI_BUS_NUM() from pci and remove local PCI_BUS() define. Changes
> AMD_IOMMU driver to use PCI_BUS_NUM() and PCI_DEVID() from pci and remove
> local PCI_BUS() define and local PCI_DEVID() implementation.
>
> Files changed:
>
> [PATCH v2 1/4] pci: Add PCI_BUS_NUM() and PCI_DEVID() interfaces to return bus
>number and device id
>
>  include/linux/pci.h |   15 +++
>  1 file changed, 15 insertions(+)
>
> [PATCH v2 2/4] pci/aer: Remove local PCI_BUS() define and use PCI_BUS_NUM()
>from pci
>
>  drivers/pci/pcie/aer/aerdrv_core.c |4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> [PATCH v2 3/4] iommu/amd: Remove local PCI_BUS() define and use PCI_BUS_NUM()
>from pci
>
>  drivers/iommu/amd_iommu.c   |   12 ++--
>  drivers/iommu/amd_iommu_init.c  |   34 +-
>  drivers/iommu/amd_iommu_types.h |4 +---
>  3 files changed, 24 insertions(+), 26 deletions(-)
>
> [PATCH v2 4/4] iommu/amd: Remove calc_devid() and use PCI_DEVID() from pci
> (no change to this patch - but tagging it v2 for clarity)
>
>  drivers/iommu/amd_iommu.c   |2 +-
>  drivers/iommu/amd_iommu_init.c  |6 +++---
>  drivers/iommu/amd_iommu_types.h |7 ---
>  3 files changed, 4 insertions(+), 11 deletions(-)

Thanks, Shuah, I applied these to a pci/shuah-defines branch and pushed it.

Since some of these touch drivers/iommu, it'd be good if you acked
them again, Joerg.  I know you acked them before, but there have been
minor changes since then, so I didn't add your ack to these.  But if
you're still OK with them, I'll refresh the branch to add it now.

Bjorn


Re: [PATCH v2 0/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-03-26 Thread Bjorn Helgaas
On Tue, Mar 26, 2013 at 3:59 PM, Joerg Roedel  wrote:
> Hi Bjorn,
>
> On Tue, Mar 26, 2013 at 03:41:07PM -0600, Bjorn Helgaas wrote:
>> Since some of these touch drivers/iommu, it'd be good if you acked
>> them again, Joerg.  I know you acked them before, but there have been
>> minor changes since then, so I didn't add your ack to these.  But if
>> you're still OK with them, I'll refresh the branch to add it now.
>
> Looks still good to me. You can add my
>
> Acked-by: Joerg Roedel 
>
> if you want.

Added and branch updated, thanks!


Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-03 Thread Bjorn Helgaas
[+cc David and iommu list, Yinghai, Jiang]

On Mon, Mar 4, 2013 at 12:04 PM, Neil Horman  wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> For the 5520 and 5500 chipsets which contained an erratum (specifically erratum
> 53), which noted that these chipsets can't properly do interrupt remapping, and
> as a result they recommend that interrupt remapping be disabled in BIOS.  While
> many vendors have a BIOS update to do exactly that, not all do, and of course
> not all users update their BIOS to a level that corrects the problem.  As a
> result, occasionally interrupts can arrive at a CPU even after affinity for that
> interrupt has been moved, leading to lost or spurious interrupts (usually
> characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have systems for which their BIOS level is such
> that this feature was not properly turned off.  As such, it would be good to
> give them a reminder that their systems are vulnerable to this problem.
>
> Signed-off-by: Neil Horman 
> CC: Prarit Bhargava 
> CC: Don Zickus 
> CC: Don Dutile 
> CC: Bjorn Helgaas 
> CC: Asit Mallick 
> CC: linux-...@vger.kernel.org
>
> ---
>
> Change notes:
>
> v2)
>
> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
> chipset series is x86 only.  I decided however to keep the quirk as a regular
> quirk, not an early_quirk.  Early quirks have no way currently to determine if
> BIOS has properly disabled the feature in the iommu, at least not without
> significant hacking, and since it's quite possible this will be a short lived
> quirk, should Don Z's workaround code prove successful (and it looks like it may
> well), I don't think that necessary.
>
> * Removed the WARNING banner from the quirk, and added the HW_ERR token to the
> string.  I opted to leave the newlines in place however, as I really couldn't
> find a way to keep the text on a single line that is still legible from a code
> perspective.  I think there's enough language in there that using cscope on just
> about any substring however will turn it up, and again, this may be a short
> lived quirk.
> ---
>  arch/x86/kernel/quirks.c | 18 ++
>  include/linux/pci_ids.h  |  2 ++
>  2 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
> index 26ee48a..a718ea2 100644
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -5,6 +5,7 @@
>  #include 
>
>  #include 
> +#include "../../../drivers/iommu/irq_remapping.h"
>
>  #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
>
> @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5,
> quirk_amd_nb_node);
>
>  #endif
> +
> +static void intel_remapping_check(struct pci_dev *dev)
> +{
> +   u8 revision;
> +
> +   pci_read_config_byte(dev, PCI_REVISION_ID, &revision);
> +
> +   if ((revision == 0x13) && irq_remapping_enabled) {
> +pr_warn(HW_ERR "This system BIOS has enabled interrupt remapping\n"
> +"on a chipset that contains an errata making that\n"
> +"feature unstable.  Please reboot with nointremap\n"
> +"added to the kernel command line and contact\n"
> +"your BIOS vendor for an update");
> +   }
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check);

This started as an IOMMU change, and I'm not an expert in that area,
so I added David and the IOMMU list.  I'd rather have him deal with
this than me.

Is this something we can just *fix* in the kernel, e.g., by turning
off interrupt remapping ourselves, or does it have to be done before
the OS boots?

> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 31717bd..54027a6 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2732,6 +2732,8 @@
>  #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2  0x2db2
>  #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV2    0x2db3
>  #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340
> +#define PCI_DEVICE_ID_INTEL_5500_IOHUB 0x3403

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas
On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman  wrote:
> On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
>> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
>> > );
>> > > +
>> > > +   if ((revision == 0x13) && irq_remapping_enabled) {
>> > > +pr_warn(HW_ERR "This system BIOS has enabled interrupt remapping\n"
>> > > +"on a chipset that contains an errata making that\n"
>> > > +"feature unstable.  Please reboot with nointremap\n"
>> > > +"added to the kernel command line and contact\n"
>> > > +"your BIOS vendor for an update");
>>
>> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
>>
> Ok, copy that. I'll repost shortly

When you do, please include URLs for any problem reports or bugzillas you have.

I assume Windows "just works" in this situation?


Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas
On Thu, Apr 4, 2013 at 9:39 AM, Neil Horman  wrote:
> On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote:
>> On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman  wrote:
>> > On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
>> >> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
>> >> > );
>> >> > > +
>> >> > > +   if ((revision == 0x13) && irq_remapping_enabled) {
>> >> > > +pr_warn(HW_ERR "This system BIOS has enabled interrupt remapping\n"
>> >> > > +"on a chipset that contains an errata making that\n"
>> >> > > +"feature unstable.  Please reboot with nointremap\n"
>> >> > > +"added to the kernel command line and contact\n"
>> >> > > +"your BIOS vendor for an update");
>> >>
>> >> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
>> >>
>> > Ok, copy that. I'll repost shortly
>>
>> When you do, please include URLs for any problem reports or bugzillas you have.
>>
> Well, those are going to be vendor specific, so I'm not sure we can really do
> that, at least not in any meaningful way.

Sorry, I don't understand your point.  It's useful to know who
reported it (e.g., for future testers) and what happened and what
bugzillas it solved.  Of course it applies only to machines with this
chipset and certain BIOS revisions.

>> I assume Windows "just works" in this situation?
> No more or less than linux does in this case.  The Intel provided errata
> indicates that the only acceptable workaround is to disable remapping in the
> BIOS, so I would presume that if a windows system has a BIOS that doesn't
> implement this fix, its just as exposed as we are.

It sounds like the effect of this bug is that on Linux, devices may
not work at all because of lost interrupts.  Either Windows must never
enable remapping (so it never sees the bug), or it must be designed in
a way that tolerates the problem.  I can't believe these machines
shipped with Windows certification if devices didn't work correctly.

Either way, I don't understand why we can't make the quirk just fix
this.  Booting with "nointremap" only sets disable_irq_remap, which is
only used by irq_remapping_supported().  Early quirks are run before
irq_remapping_supported() is ever called, so an early quirk ought to
be just as effective as the command line option.  Here's the relevant
call tree I see:

  start_kernel
    setup_arch
      parse_early_param
      early_quirks
    rest_init
      ...


The x86 setup_arch() does call generic_apic_probe(), but as far as I
can tell, none of the APIC .probe() methods reference
disable_irq_remap, so that doesn't look like a problem.
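
To make the ordering argument concrete, here's a toy user-space model of
it.  This is only a sketch under stated assumptions: the flag name mirrors
the kernel's disable_irq_remap, but early_quirk_broken_intremap(),
irq_remapping_supported_model() and boot_model() are hypothetical
stand-ins, not kernel functions.

```c
#include <stdbool.h>

/* Stand-in for the kernel's disable_irq_remap flag. */
static bool disable_irq_remap;

/* Hypothetical early quirk: would run from setup_arch(), before any reader. */
static void early_quirk_broken_intremap(bool chipset_has_erratum)
{
	if (chipset_has_erratum)
		disable_irq_remap = true;	/* same effect as "nointremap" */
}

/* Simplified stand-in for irq_remapping_supported(): only reads the flag. */
static bool irq_remapping_supported_model(void)
{
	return !disable_irq_remap;
}

/* Mirrors the call order above: early quirks first, the check much later. */
static bool boot_model(bool chipset_has_erratum)
{
	disable_irq_remap = false;
	early_quirk_broken_intremap(chipset_has_erratum);	/* setup_arch() */
	return irq_remapping_supported_model();			/* later in boot */
}
```

As long as nothing reads the flag between parse_early_param() and
early_quirks(), setting it from the quirk is indistinguishable from
setting it on the command line.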

Bjorn


Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-04 Thread Bjorn Helgaas
On Thu, Mar 7, 2013 at 7:35 PM, Andrew Cooks  wrote:
> This patch creates a quirk to allow the Intel IOMMU to be enabled for devices
> that use incorrect tags during DMA. It is similar to the quirk for Ricoh
> devices, but allows mapping multiple functions and mapping of 'ghost'
> functions that do not correspond to real devices. Devices that need this
> include a variety of Marvell 88SE91xx based SATA controllers. [1][2]
>
> Changelog:
> v4: Process feedback received from Alex Williamson.
>  * don't assume function 0 is a real device.
>  * exit early if no ghost functions are known, or all known functions have
>been mapped.
>  * cleanup failure case so mapping succeeds or fails for all ghost functions
>per device.
>  * improve comments.
>
> v3:
>  * Adopt David Woodhouse's terminology by referring to the quirky functions as
>  'ghost' functions.
>  * Unmap ghost functions when device is detached from IOMMU.
>  * Stub function for when CONFIG_PCI_QUIRKS is not enabled.
>
>
>  This patch was generated against 3.9-rc1, but will also apply to 3.7.10.
>
>  Bug reports:
>  1. https://bugzilla.redhat.com/show_bug.cgi?id=757166
>  2. https://bugzilla.kernel.org/show_bug.cgi?id=42679
>
> Signed-off-by: Andrew Cooks 
> ---
>  drivers/iommu/intel-iommu.c |   69 +++
>  drivers/pci/quirks.c        |   67 +-
>  include/linux/pci.h         |    5 +++
>  include/linux/pci_ids.h     |    1 +
>  4 files changed, 141 insertions(+), 1 deletions(-)

I'm OK with the pci/quirks.c part of this, but the bulk of the
interesting code is in intel-iommu.c, so I assume the IOMMU folks will
take care of this.
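
For readers following along, the core idea in the description (a per-device
bitmap of functions whose tags may show up in DMA) can be sketched in user
space as below.  This is an illustration only: ghost_devfns() is a
hypothetical helper, the bitmap layout (bit N set means function N) is
assumed from the quoted code, and the loop simply walks all eight
functions rather than reproducing the patch's loop condition.

```c
#include <stdint.h>

#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/*
 * Collect the devfn of every ghost function set in fn_map, skipping the
 * device's own function.  Returns the number of entries written to out[].
 */
static int ghost_devfns(uint8_t devfn, uint8_t fn_map, uint8_t out[8])
{
	int n = 0;

	for (unsigned int fn = 0; fn < 8; fn++) {
		if (fn == (unsigned int)PCI_FUNC(devfn))
			continue;	/* the real function is mapped elsewhere */
		if (fn_map & (1u << fn))
			out[n++] = (uint8_t)PCI_DEVFN(PCI_SLOT(devfn), fn);
	}
	return n;
}
```

Each devfn returned would then need its own context entry in the same
domain as the real function, which is what the intel-iommu.c part of the
patch does.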

Bjorn

> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0099667..f53f3e3 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1674,6 +1674,69 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
> return 0;
>  }
>
> +static void iommu_detach_dev(struct intel_iommu *iommu, u8 bus, u8 devfn);
> +
> +static void unmap_ghost_dma_fn(struct pci_dev *pdev, u8 fn_map)
> +{
> +   u8 fn;
> +   struct intel_iommu *iommu;
> +
> +   iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
> +   pdev->devfn);
> +
> +   /* something must be seriously fubar if we can't lookup the iommu. */
> +   BUG_ON(!iommu);
> +
> +   for (fn = 0; fn <= 7 && fn_map << fn; fn++) {
> +   if (fn == PCI_FUNC(pdev->devfn))
> +   continue;
> +   if (fn_map & (1<<fn)) {
> +   iommu_detach_dev(iommu,
> +   pdev->bus->number,
> +   PCI_DEVFN(PCI_SLOT(pdev->devfn), fn));
> +   dev_dbg(&pdev->dev, "quirk; ghost func %d unmapped",
> +   fn);
> +   }
> +   }
> +}
> +
> +/* For quirky devices like Marvell 88SE91xx chips that use ghost functions. */
> +static int map_ghost_dma_fn(struct dmar_domain *domain,
> +   struct pci_dev *pdev,
> +   int translation)
> +{
> +   u8 fn, fn_map;
> +   u8 fn_mapped = 0;
> +   int err = 0;
> +
> +   fn_map = pci_get_dma_source_map(pdev);
> +
> +   /* this is the common, non-quirky case. */
> +   if (!fn_map)
> +   return 0;
> +
> +   for (fn = 0; fn <= 7 && fn_map << fn; fn++) {
> +   if (fn == PCI_FUNC(pdev->devfn))
> +   continue;
> +   if (fn_map & (1<<fn)) {
> +   err = domain_context_mapping_one(domain,
> +   pci_domain_nr(pdev->bus),
> +   pdev->bus->number,
> +   PCI_DEVFN(PCI_SLOT(pdev->devfn), fn),
> +   translation);
> +   if (err) {
> +   dev_err(&pdev->dev,
> +   "mapping ghost func %d failed", fn);
> +   unmap_ghost_dma_fn(pdev, fn_mapped);
> +   return err;
> +   }
> +   dev_dbg(&pdev->dev, "quirk; ghost func %d mapped", fn);
> +   fn_mapped |= (1<<fn);
> +   }
> +   }
> +   return 0;
> +}
> +
>  static int
>  domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
> int translation)
> @@ -1687,6 +1750,11 @@ domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
> if (ret)
> return ret;
>
> +   /* quirk for undeclared/ghost pci functions */
> +   ret = map_ghost_dma_fn(domain, pdev, translation);
> +   if (ret)
> +   return ret;
> +
> /* dependent device mapping */
> tmp = pci_find_upstr

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas
On Thu, Apr 4, 2013 at 11:51 AM, Neil Horman  wrote:

> Oh, you want the bug report that I'm fixing this against?  Sure, I can do that.
> I thought you wanted me to include a url in the WARN_TAINT, with which users
> could report occurrences of this bug.  Yeah, the bug that this is reported in is:
> https://bugzilla.redhat.com/show_bug.cgi?id=887006
>
> It's standing in for about a dozen or so variants of this issue we've seen

Exactly -- I'm just hoping for something in the changelog.  BTW, this
particular bugzilla is not public.

> Regardless, there's also the security issue to consider here - namely that
> disabling irq remapping opens up users of virt to a possible security bug
> (potential irq injection).  Some users may wish to live with the remapping
> error, given that error typically leads to devices that need to be
> restarted/reset to start working again, rather than live with the security hole.
> I rather like the warning, that gives users a choice, but I'll spin up a version
> that just disables it if you would rather.

I don't believe users will want to make a choice like that or even be
sophisticated enough to do it, at least not based on something in
dmesg.  I'm pretty sure I'm not  :)

The only supportable thing I can imagine doing would be:

  - Disable interrupt remapping if this chipset defect is present, so
devices work reliably (they don't need whatever restart/reset you
referred to above).
  - Disable virt functionality when interrupt remapping is disabled to
avoid the security problem (I don't know the details of this.)
  - Add a command-line option to enable interrupt remapping (I think
"intremap=on" is currently parsed too early, but maybe this could be
reworked so the option could override the quirk disable).
  - Add release notes saying "boot with 'intremap=on' if you want the
virt functionality and can accept unreliable devices."

That way the default behavior is safe and reliable (though perhaps
lacking some functionality), and you have told the user a way to get
safe and unreliable operation if he's willing to accept that.  At
least, that's what I think I would want if I were in RH's shoes.
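
Roughly, the policy I have in mind looks like the sketch below.  It is a
user-space model, not kernel code: enum intremap_opt, struct remap_policy
and decide_policy() are made-up names, and tying "virt" directly to the
remapping state is a simplifying assumption.

```c
#include <stdbool.h>

/* What the user asked for on the command line (hypothetical encoding). */
enum intremap_opt { INTREMAP_DEFAULT, INTREMAP_ON, INTREMAP_OFF };

struct remap_policy {
	bool irq_remap;	/* interrupt remapping enabled */
	bool virt;	/* virt features that rely on remapping for isolation */
};

static struct remap_policy decide_policy(bool chipset_defect,
					 enum intremap_opt opt)
{
	struct remap_policy p;

	if (opt == INTREMAP_OFF)
		p.irq_remap = false;
	else if (opt == INTREMAP_ON)
		p.irq_remap = true;		/* explicit override beats the quirk */
	else
		p.irq_remap = !chipset_defect;	/* quirk disables it by default */

	/* Without remapping, turn off virt to avoid interrupt injection. */
	p.virt = p.irq_remap;
	return p;
}
```

With this, a defective chipset and no option gives reliable devices and no
virt; adding "intremap=on" gives virt back at the cost of the lost or
spurious interrupts described in the erratum.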

Bjorn


Re: PCI warning on boot 3.8.0-rc1

2013-04-10 Thread Bjorn Helgaas
On Wed, Feb 06, 2013 at 08:58:41AM -0700, Alex Williamson wrote:
> On Wed, 2013-02-06 at 07:49 -0800, Stephen Hemminger wrote:
> > On Mon, 04 Feb 2013 15:41:24 -0700
> > Alex Williamson  wrote:
> > 
> > > On Mon, 2013-02-04 at 13:28 -0700, Alex Williamson wrote:
> > > > On Mon, 2013-02-04 at 10:36 -0800, Stephen Hemminger wrote:
> > > > > > > I think drivers/pci/search.c is identical between 3.7 and 3.8-rc1.  Is
> > > > > > > this the first time you've turned on the IOMMU on that box?
> > > > > > 
> > > > > > It exists in 3.7 and earlier kernels, just haven't turned on same config.
> > > > > > 
> > > > > > > It's the same warning as in this bugzilla:
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=44881, and there's a patch
> > > > > > > there at https://bugzilla.kernel.org/show_bug.cgi?id=44881#c11, but
> > > > > > > it's just a quirk that turns off VT-d if we find certain broken
> > > > > > > bridges.  It doesn't look like you have any of those (although I don't
> > > > > > > know what you have at 05:00.0).
> > > > > > 
> > > > > > Bjorn
> > > > > 
> > > > > This is a standard ASUS motherboard, and don't want to disable VT-d.
> > > > 
> > > > Stephen,
> > > > 
> > > > Can you give the lspci -vvv of device 5:00.0 to see if it's one we've
> > > > seen before?  Does the patch below help?
> > > > 
> > > > Bjorn, I think we need to quirk it somehow.  So far they've all been
> > > > PCI-to-PCI bridges attached to root ports where we expect it's actually
> > > > a PCIe-to-PCI bridge.  Seems like maybe we could have the same attached
> > > > to a downstream port.  The patch below avoids the WARN and gives us a
> > > > device, but of course pci_is_pcie reports wrong for this device and may
> > > > cause some trickle down breakage.  A more complete option might be to
> > > > add a is_pcie flag to the device that can be set independent of
> > > > pcie_cap.  We'd need to check all the callers for assumptions, but then
> > > > we could put the quirk in one place and hopefully fix everything.
> > > > Thoughts?  Thanks,
> > > 
> > > This latter approach seems like it might be easier than I expected since
> > > all the users are so well filtered through the access functions.  A
> > > quick look through who uses pci_is_pcie seems like this might be
> > > complete, but more eyes are required.  I'll upload this to the bz for
> > > those reporters to test as well.  Thoughts?  Thanks,
> > > 
> > > Alex
> > 
> > On my hardware this gives:
> 
> > [0.254621] pci_bus :05: busn_res: can not insert [bus 05-ff] under [bus 00-3e] (conflicts with (null) [bus 00-3e])
> > [0.254647] WARNING: Your hardware is broken, device (null) appears to be a
> > [0.254647]  Legacy PCI device attached directly to a PCIe device which is not a
> > [0.254647]  PCIe-to-PCI bridge.  Per section 7.8 of the PCI Express 3.0 spec, the
> > [0.254647]  PCI express capability structure is required for PCI express device
> > [0.254647] functions.
> > [0.254653] pci :05:00.0: [1b21:1080] type 01 class 0x060401
> 
> I guess I must be calling pci_name() before it's set.  The warning
> message needs some work too, it's mainly meant for hardware vendors with
> the hope that they might test Linux and see it before shipping these
> broken devices.  Bjorn, does this approach seem worth pursuing?  Thanks,

Sorry I dropped this for so long.  I'm looking at the patch
here: https://bugzilla.kernel.org/attachment.cgi?id=92521,
appended for convenience.

In case anybody else needs the context, I think we have
this scenario (from John Wehin's original report at
https://bugzilla.kernel.org/show_bug.cgi?id=44881):

pci :00:1c.4: PCI bridge to [bus 03-04] # PCIe root port
pci :03:00.0: PCI bridge to [bus 04]# no PCIe cap
...
pci :03:00.0: expected upstream PCIe bridge; :00:1c.4 is type 0x4

We called pci_find_upstream_pcie_bridge(03:00.0), which generated
the warning because:

- 03:00.0 is not a PCIe device, and
- 00:1c.4 (its upstream bridge) *is* a PCIe device, and
- 00:1c.4 is a Root Port (PCI_EXP_TYPE_ROOT_PORT == 0x4),
  not a PCIe-to-PCI bridge (PCI_EXP_TYPE_PCI_BRIDGE == 0x7)
  as we expected
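
The three conditions above can be written as a small predicate.  This is
a simplified user-space model for illustration, not the actual
pci_find_upstream_pcie_bridge() code: struct model_dev and
upstream_bridge_unexpected() are hypothetical, and only the two
PCI_EXP_TYPE_* values quoted above are taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define PCI_EXP_TYPE_ROOT_PORT	0x4
#define PCI_EXP_TYPE_PCI_BRIDGE	0x7

/* Minimal model of a device: is it PCIe, what port type, who is upstream. */
struct model_dev {
	bool is_pcie;			/* exposes a PCIe capability */
	int pcie_type;			/* PCI_EXP_TYPE_*, valid if is_pcie */
	const struct model_dev *upstream;	/* NULL on the root bus */
};

/*
 * True when a non-PCIe device sits directly below a PCIe device that is
 * not a PCIe-to-PCI bridge, i.e. the case that triggers the warning.
 */
static bool upstream_bridge_unexpected(const struct model_dev *dev)
{
	const struct model_dev *up = dev->upstream;

	if (dev->is_pcie || up == NULL)
		return false;
	if (!up->is_pcie)
		return false;	/* conventional PCI all the way up */
	return up->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE;
}
```

In the scenario above, 03:00.0 has is_pcie == false and its upstream
00:1c.4 has type PCI_EXP_TYPE_ROOT_PORT, so the predicate is true.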

> commit 60d668a3cdeeb0e29570cf0043736436c146bde8
> Author: Alex Williamson 
> Date:   Mon Feb 4 15:34:34 2013 -0700
> 
> pci: Handle unadvertised PCIe bridges
> 
> There seem to be several PCIe-to-PCI bridges out in the wild that
> blatantly ignore the PCIe specification and do not expose a PCIe
> capability.  We can attempt to deduce their existence by looking
> for PCI bridges directly connected to root ports or downstream
> ports.  What this means is that pci_is_pcie() does not imply PCIe
> capability and we un-deprecate is_pcie to denote the difference.
> All the accesses seem to go through pcie_capability_reg_implemented,
> so we can significantly limit the footprint of this change by
>

Re: PCI warning on boot 3.8.0-rc1

2013-04-11 Thread Bjorn Helgaas
On Wed, Apr 10, 2013 at 6:01 PM, Alex Williamson
 wrote:
> On Wed, 2013-04-10 at 16:36 -0600, Bjorn Helgaas wrote:
>> On Wed, Feb 06, 2013 at 08:58:41AM -0700, Alex Williamson wrote:
>> > On Wed, 2013-02-06 at 07:49 -0800, Stephen Hemminger wrote:
>> > > On Mon, 04 Feb 2013 15:41:24 -0700
>> > > Alex Williamson  wrote:
>> > >
>> > > > On Mon, 2013-02-04 at 13:28 -0700, Alex Williamson wrote:
>> > > > > On Mon, 2013-02-04 at 10:36 -0800, Stephen Hemminger wrote:
>> > > > > > > I think drivers/pci/search.c is identical between 3.7 and 
>> > > > > > > 3.8-rc1.  Is
>> > > > > > > this the first time you've turned on the IOMMU on that box?
>> > > > > >
>> > > > > > It exists in 3.7 and earlier kernels, just haven't turned on same 
>> > > > > > config.
>> > > > > >
>> > > > > > > It's the same warning as in this bugzilla:
>> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=44881, and there's a 
>> > > > > > > patch
>> > > > > > > there at https://bugzilla.kernel.org/show_bug.cgi?id=44881#c11, 
>> > > > > > > but
>> > > > > > > it's just a quirk that turns off VT-d if we find certain broken
>> > > > > > > bridges.  It doesn't look like you have any of those (although I 
>> > > > > > > don't
>> > > > > > > know what you have at 05:00.0).
>> > > > > > >
>> > > > > > > Bjorn
>> > > > > >
>> > > > > > This is a standard ASUS motherboard, and don't want to disable 
>> > > > > > VT-d.
>> > > > >
>> > > > > Stephen,
>> > > > >
>> > > > > Can you give the lspci -vvv of device 5:00.0 to see if it's one we've
>> > > > > seen before?  Does the patch below help?
>> > > > >
>> > > > > Bjorn, I think we need to quirk it somehow.  So far they've all been
>> > > > > PCI-to-PCI bridges attached to root ports where we expect it's 
>> > > > > actually
>> > > > > a PCIe-to-PCI bridge.  Seems like maybe we could have the same 
>> > > > > attached
>> > > > > to a downstream port.  The patch below avoids the WARN and gives us a
>> > > > > device, but of course pci_is_pcie reports wrong for this device and 
>> > > > > may
>> > > > > cause some trickle down breakage.  A more complete option might be to
>> > > > > add a is_pcie flag to the device that can be set independent of
>> > > > > pcie_cap.  We'd need to check all the callers for assumptions, but 
>> > > > > then
>> > > > > we could put the quirk in one place and hopefully fix everything.
>> > > > > Thoughts?  Thanks,
>> > > >
>> > > > This latter approach seems like it might be easier than I expected 
>> > > > since
>> > > > all the users are so well filtered through the access functions.  A
>> > > > quick look through who uses pci_is_pcie seems like this might be
>> > > > complete, but more eyes are required.  I'll upload this to the bz for
>> > > > those reporters to test as well.  Thoughts?  Thanks,
>> > > >
>> > > > Alex
>> > >
>> > > On my hardware this gives:
>> >
>> > > [0.254621] pci_bus :05: busn_res: can not insert [bus 05-ff] 
>> > > under [bus 00-3e] (conflicts with (null) [bus 00-3e])
>> > > [0.254647] WARNING: Your hardware is broken, device (null) appears 
>> > > to be a
>> > > [0.254647]  Legacy PCI device attached directly to a PCIe device 
>> > > which is not a
>> > > [0.254647]  PCIe-to-PCI bridge.  Per section 7.8 of the PCI Express 
>> > > 3.0 spec, the
>> > > [0.254647]  PCI express capability structure is required for PCI 
>> > > express device
>> > > [0.254647] functions.
>> > > [0.254653] pci :05:00.0: [1b21:1080] type 01 class 0x060401
>> >
>> > I guess I must be calling pci_name() before it's set.  The warning
>> > message needs some work too, it's mainly meant for hardwar

Re: [PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.

2013-04-15 Thread Bjorn Helgaas
On Mon, Apr 15, 2013 at 8:58 AM, Joerg Roedel  wrote:
> On Mon, Apr 15, 2013 at 12:42:00AM +0530, Varun Sethi wrote:
>> swap_pci_ref function is used by the IOMMU API code for swapping pci device
>> pointers, while determining the iommu group for the device.
>> Currently this function was being implemented for different IOMMU drivers.
>> This patch moves the function to pci.h so that the implementation can be
>> shared across various IOMMU drivers.
>
> The function is only used in IOMMU code, so I think its fine to keep it
> there (unless Bjorn disagrees and wants it in PCI code).

I agree; I don't think there's much benefit in putting something under
#ifdef CONFIG_IOMMU_API into pci.h.  Maybe there is or could be a
shared iommu header file?

Bjorn


Re: PCI warning on boot 3.8.0-rc1

2013-04-15 Thread Bjorn Helgaas
On Mon, Apr 15, 2013 at 1:12 PM, Alex Williamson
 wrote:
> On Thu, 2013-04-11 at 11:23 -0600, Bjorn Helgaas wrote:
>> On Wed, Apr 10, 2013 at 6:01 PM, Alex Williamson
>>  wrote:
>> > On Wed, 2013-04-10 at 16:36 -0600, Bjorn Helgaas wrote:
>> >> On Wed, Feb 06, 2013 at 08:58:41AM -0700, Alex Williamson wrote:
>> >> > On Wed, 2013-02-06 at 07:49 -0800, Stephen Hemminger wrote:
>> >> > > On Mon, 04 Feb 2013 15:41:24 -0700
>> >> > > Alex Williamson  wrote:
>> >> > >
>> >> > > > On Mon, 2013-02-04 at 13:28 -0700, Alex Williamson wrote:
>> >> > > > > On Mon, 2013-02-04 at 10:36 -0800, Stephen Hemminger wrote:
>> >> > > > > > > I think drivers/pci/search.c is identical between 3.7 and 3.8-rc1.  Is
>> >> > > > > > > this the first time you've turned on the IOMMU on that box?
>> >> > > > > >
>> >> > > > > > It exists in 3.7 and earlier kernels, just haven't turned on same config.
>> >> > > > > >
>> >> > > > > > > It's the same warning as in this bugzilla:
>> >> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=44881, and there's a patch
>> >> > > > > > > there at https://bugzilla.kernel.org/show_bug.cgi?id=44881#c11, but
>> >> > > > > > > it's just a quirk that turns off VT-d if we find certain broken
>> >> > > > > > > bridges.  It doesn't look like you have any of those (although I don't
>> >> > > > > > > know what you have at 05:00.0).
>> >> > > > > > >
>> >> > > > > > > Bjorn
>> >> > > > > >
>> >> > > > > > This is a standard ASUS motherboard, and don't want to disable VT-d.
>> >> > > > >
>> >> > > > > Stephen,
>> >> > > > >
>> >> > > > > Can you give the lspci -vvv of device 5:00.0 to see if it's one we've
>> >> > > > > seen before?  Does the patch below help?
>> >> > > > >
>> >> > > > > Bjorn, I think we need to quirk it somehow.  So far they've all been
>> >> > > > > PCI-to-PCI bridges attached to root ports where we expect it's actually
>> >> > > > > a PCIe-to-PCI bridge.  Seems like maybe we could have the same attached
>> >> > > > > to a downstream port.  The patch below avoids the WARN and gives us a
>> >> > > > > device, but of course pci_is_pcie reports wrong for this device and may
>> >> > > > > cause some trickle down breakage.  A more complete option might be to
>> >> > > > > add a is_pcie flag to the device that can be set independent of
>> >> > > > > pcie_cap.  We'd need to check all the callers for assumptions, but then
>> >> > > > > we could put the quirk in one place and hopefully fix everything.
>> >> > > > > Thoughts?  Thanks,
>> >> > > >
>> >> > > > This latter approach seems like it might be easier than I expected since
>> >> > > > all the users are so well filtered through the access functions.  A
>> >> > > > quick look through who uses pci_is_pcie seems like this might be
>> >> > > > complete, but more eyes are required.  I'll upload this to the bz for
>> >> > > > those reporters to test as well.  Thoughts?  Thanks,
>> >> > > >
>> >> > > > Alex
>> >> > >
>> >> > > On my hardware this gives:
>> >> >
>> >> > > [0.254621] pci_bus 0

Re: [PATCH 1/3] pci: Add PCI walk function and PCIe bridge test

2013-05-23 Thread Bjorn Helgaas
On Fri, May 10, 2013 at 3:18 PM, Alex Williamson wrote:
> These will replace pci_find_upstream_pcie_bridge, which is difficult
> to use and rather specific to intel-iommu usage.  A quirked
> pci_is_pcie_bridge function is provided to work around non-compliant
> PCIe-to-PCI bridges such as those found in
> https://bugzilla.kernel.org/show_bug.cgi?id=44881
>
> Signed-off-by: Alex Williamson 
> ---
>  drivers/pci/search.c |   57 
> ++
>  include/linux/pci.h  |   23 
>  2 files changed, 80 insertions(+)
>
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index d0627fa..0357f74 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -17,6 +17,63 @@
>  DECLARE_RWSEM(pci_bus_sem);
>  EXPORT_SYMBOL_GPL(pci_bus_sem);
>
> +/* Test for PCIe bridges. */
> +bool pci_is_pcie_bridge(struct pci_dev *pdev)
> +{
> +   if (!pci_is_bridge(pdev))
> +   return false;
> +
> +   if (pci_is_pcie(pdev))
> +   return true;
> +
> +#ifdef CONFIG_PCI_QUIRKS
> +   /*
> +* If we're not on the root bus, look one device upstream of the
> +* current device.  If that device is PCIe and is not a PCIe-to-PCI
> +* bridge, then the current device is effectively PCIe as it must
> +* be the PCIe-to-PCI bridge.  This handles several bridges that
> +* violate the PCIe spec by not exposing a PCIe capability:
> +* https://bugzilla.kernel.org/show_bug.cgi?id=44881
> +*/
> +   if (!pci_is_root_bus(pdev->bus)) {
> +   struct pci_dev *parent = pdev->bus->self;
> +
> +   if (pci_is_pcie(parent) &&
> +   pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> +
> +   return true;
> +   }
> +#endif
> +   return false;
> +}

I like this strategy.  But I'd rather it not be a general-purpose PCI
interface, because if pci_is_pcie_bridge() is true, people will assume
they can perform PCIe operations on the device, and they can't.  The
only use for this is to figure out the source ID the IOMMU will see,
so I think this should just go in the IOMMU code.
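
For illustration only, here is a minimal standalone sketch of the quoted
heuristic, written the way it might look as an IOMMU-local helper.  The
names struct toy_dev and toy_iommu_sees_pcie_bridge() are hypothetical
and are not kernel interfaces; the toy struct only models the few facts
the heuristic reads (bridge or not, PCIe capability or not, port type,
upstream device).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the few struct pci_dev facts the heuristic reads. */
enum toy_pcie_type {
	TOY_ROOT_PORT,
	TOY_DOWNSTREAM_PORT,
	TOY_PCIE_TO_PCI_BRIDGE,
};

struct toy_dev {
	bool is_bridge;               /* has a secondary bus */
	bool has_pcie_cap;            /* exposes a PCIe capability */
	enum toy_pcie_type type;      /* only meaningful if has_pcie_cap */
	const struct toy_dev *parent; /* upstream bridge, NULL on the root bus */
};

/*
 * True if the IOMMU code should treat @dev as a PCIe bridge when it
 * derives a source ID.  A true result does NOT promise that PCIe
 * capability registers exist on @dev.
 */
static bool toy_iommu_sees_pcie_bridge(const struct toy_dev *dev)
{
	if (!dev->is_bridge)
		return false;

	if (dev->has_pcie_cap)
		return true;

	/*
	 * Quirk: the upstream device is PCIe and is not itself a
	 * PCIe-to-PCI bridge, so @dev must be the PCIe-to-PCI bridge
	 * even though it hides its PCIe capability.
	 */
	if (dev->parent && dev->parent->has_pcie_cap &&
	    dev->parent->type != TOY_PCIE_TO_PCI_BRIDGE)
		return true;

	return false;
}

/* Usage: a root port with a bridge below it that hides its PCIe capability. */
static void toy_demo(void)
{
	struct toy_dev root_port = { true, true, TOY_ROOT_PORT, NULL };
	struct toy_dev quirky = { true, false, TOY_ROOT_PORT, &root_port };
	struct toy_dev endpoint = { false, false, TOY_ROOT_PORT, &quirky };

	assert(toy_iommu_sees_pcie_bridge(&root_port));
	assert(toy_iommu_sees_pcie_bridge(&quirky));    /* caught by the quirk */
	assert(!toy_iommu_sees_pcie_bridge(&endpoint)); /* not a bridge */
}
```

Keeping the helper private to the IOMMU code and naming it after what
the IOMMU sees is one way to avoid suggesting that PCIe operations are
valid on the device.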

> +/*
> + * Walk upstream from the given pdev for the first device returning
> + * true for the provided match function.  If no match is found, return
> + * NULL.  *last records the previous step in the walk.
> + */
> +struct pci_dev *pci_walk_up_to_first_match(struct pci_dev *pdev,
> +  bool (*match)(struct pci_dev *),
> +  struct pci_dev **last)
> +{
> +   *last = NULL;
> +
> +   if (match(pdev))
> +   return pdev;
> +
> +   *last = pdev;
> +
> +   while (!pci_is_root_bus(pdev->bus)) {
> +   *last = pdev;
> +   pdev = pdev->bus->self;
> +
> +   if (match(pdev))
> +   return pdev;
> +   }
> +
> +   return NULL;
> +}

Same here.  I don't really see much potential for other uses of this,
so it seems like you might as well just put this in the IOMMU code and
make it call pci_is_pcie_bridge() directly.

The "source ID == upstream PCIe bridge" mapping is deeply ingrained in
your skull, but I think it would make the intent of the code clearer
if the function names mentioned the source ID somehow.  Otherwise new
readers like me have to come up with that association on our own.  But
since I'm proposing putting all this in the IOMMU code, it's totally
up to you :)

Bjorn

> +
>  /*
>   * find the upstream PCIe-to-PCI bridge of a PCI device
>   * if the device is PCIE, return NULL
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index bd8ec30..e87423a 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1855,6 +1855,29 @@ static inline struct eeh_dev *pci_dev_to_eeh_dev(struct pci_dev *pdev)
>  #endif
>
>  /**
> + * pci_walk_up_to_first_match - Generic upstream search function
> + * @pdev: starting PCI device to search
> + * @match: match function to call on each device (true = match)
> + * @last: last device examined prior to returned device
> + *
> + * Walk upstream from the given device, calling match() at each device.
> + * Returns the first device matching match().  If the root bus is reached
> + * without finding a match, return NULL.  last returns the N-1 step in
> + * the search path.
> + */
> +struct pci_dev *pci_walk_up_to_first_match(struct pci_dev *pdev,
> +  bool (*match)(struct pci_dev *),
> +  struct pci_dev **last);
> +
> +/**
> + * pci_is_pcie_bridge - Match a PCIe bridge device
> + * @pdev: device to test
> + *
> + * Return true if the given device is a PCIe bridge, false otherwise.
> + */
> +bool pci_is_pcie_bridge(struct pci_dev *pdev);
> +
> +/**
>   * pci_find_upstream_pcie_bridge - find upstream PCIe-to-PCI bridge of a device
>   * @pdev: the PCI device
>   *
>

Re: [PATCH v3, part2 17/20] PCI, iommu: use hotplug-safe iterators to walk PCI buses

2013-06-17 Thread Bjorn Helgaas
On Sun, May 26, 2013 at 11:53:14PM +0800, Jiang Liu wrote:
> Enhance iommu drivers to use hotplug-safe iterators to walk
> PCI buses.
> 
> Signed-off-by: Jiang Liu 
> Cc: Joerg Roedel 
> Cc: Ingo Molnar 
> Cc: Donald Dutile 
> Cc: Hannes Reinecke 
> Cc: "Li, Zhen-Hua" 
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  drivers/iommu/amd_iommu.c | 4 +++-
>  drivers/iommu/dmar.c  | 6 --

The AMD and Intel IOMMU drivers are very different, and I would
split this into a patch for each.

>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 21d02b0..eef7a7e 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -352,6 +352,7 @@ static int init_iommu_group(struct device *dev)
>   struct iommu_dev_data *dev_data;
>   struct iommu_group *group;
>   struct pci_dev *dma_pdev;
> + struct pci_bus *b = NULL;
>   int ret;
>  
>   group = iommu_group_get(dev);
> @@ -388,7 +389,7 @@ static int init_iommu_group(struct device *dev)
>* the alias.  Be careful to also test the parent device if
>* we think the alias is the root of the group.
>*/
> - bus = pci_find_bus(0, alias >> 8);
> + b = bus = pci_get_bus(0, alias >> 8);
>   if (!bus)
>   goto use_group;
>  
> @@ -408,6 +409,7 @@ static int init_iommu_group(struct device *dev)
>   dma_pdev = get_isolation_root(pci_dev_get(to_pci_dev(dev)));
>  use_pdev:
>   ret = use_pdev_iommu_group(dma_pdev, dev);
> + pci_bus_put(b);
>   pci_dev_put(dma_pdev);
>   return ret;
>  use_group:
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index a7967ce..7162787 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -67,12 +67,12 @@ static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
>  static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
>  struct pci_dev **dev, u16 segment)
>  {
> - struct pci_bus *bus;
> + struct pci_bus *b, *bus;
>   struct pci_dev *pdev = NULL;
>   struct acpi_dmar_pci_path *path;
>   int count;
>  
> - bus = pci_find_bus(segment, scope->bus);
> + b = bus = pci_get_bus(segment, scope->bus);
>   path = (struct acpi_dmar_pci_path *)(scope + 1);
>   count = (scope->length - sizeof(struct acpi_dmar_device_scope))
>   / sizeof(struct acpi_dmar_pci_path);
> @@ -97,6 +97,8 @@ static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
>   count --;
>   bus = pdev->subordinate;
>   }
> + pci_bus_put(b);
> +
>   if (!pdev) {
>   pr_warn("Device scope device [%04x:%02x:%02x.%02x] not found\n",
>   segment, scope->bus, path->dev, path->fn);
> -- 
> 1.8.1.2
> 


Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function

2013-06-25 Thread Bjorn Helgaas
On Thu, Jun 20, 2013 at 10:15 AM, Joerg Roedel  wrote:
> On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
>> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
>> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
>> > > + if (!pci_is_root_bus(pdev->bus)) {
>> > > + struct pci_dev *parent = pdev->bus->self;
>> > > +
>> > > + if (pci_is_pcie(parent) &&
>> > > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>> > > + return true;
>> > > + }
>> >
>> > Hmm, that looks a bit dangerous.
>>
>> How so?  The algorithm seems pretty simple and logical.
>
> It is simple, but it is still a heuristic that may fail at some point,
> no?
>
>> Actually, I believe Bjorn rejected the idea of a fixed list because this
>> problem is detectable.  He also doesn't want me messing with quirks to
>> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
>> having a PCIe capability.  So, I'm stuck and this is where it's ended
>> up.  Thanks,
>
> I think implementing such a list is much safer.
>
> Bjorn, why didn't you like that idea?

Sorry, I can't remember, and I haven't been able to find the
discussion where I said that.  I think the current patches are all in
drivers/iommu, and if a list makes sense there, it's fine with me.

Bjorn


Re: [PATCH v2 1/2] iommu: Quirked PCIe bridge test and search function

2013-06-26 Thread Bjorn Helgaas
On Wed, Jun 26, 2013 at 12:45 PM, Alex Williamson wrote:
> On Tue, 2013-06-25 at 22:20 -0600, Bjorn Helgaas wrote:
>> On Thu, Jun 20, 2013 at 10:15 AM, Joerg Roedel  wrote:
>> > On Thu, Jun 20, 2013 at 09:44:51AM -0600, Alex Williamson wrote:
>> >> On Thu, 2013-06-20 at 15:59 +0200, Joerg Roedel wrote:
>> >> > On Tue, May 28, 2013 at 12:40:20PM -0600, Alex Williamson wrote:
>> >> > > + if (!pci_is_root_bus(pdev->bus)) {
>> >> > > + struct pci_dev *parent = pdev->bus->self;
>> >> > > +
>> >> > > + if (pci_is_pcie(parent) &&
>> >> > > + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
>> >> > > + return true;
>> >> > > + }
>> >> >
>> >> > Hmm, that looks a bit dangerous.
>> >>
>> >> How so?  The algorithm seems pretty simple and logical.
>> >
>> > It is simple, but it is still a heuristic that may fail at some point,
>> > no?
>> >
>> >> Actually, I believe Bjorn rejected the idea of a fixed list because this
>> >> problem is detectable.  He also doesn't want me messing with quirks to
>> >> pci_is_pcie() in PCI because he wants a 1:1 relation between that and
>> >> having a PCIe capability.  So, I'm stuck and this is where it's ended
>> >> up.  Thanks,
>> >
>> > I think implementing such a list is much safer.
>> >
>> > Bjorn, why didn't you like that idea?
>>
>> Sorry, I can't remember, and I haven't been able to find the
>> discussion where I said that.  I think the current patches are all in
>> drivers/iommu, and if a list makes sense there, it's fine with me.
>
> Here's the comment I remember
>
> https://bugzilla.kernel.org/show_bug.cgi?id=44881#c7
>
> Comment #7 From Bjorn Helgaas 2012-08-23 15:58:39
> [snip]
> I doubt the upstream device is at fault.  More likely the
> downstream device is really a PCIe device (a PCIe-to-PCI bridge)
> but just fails to report a PCIe capability.  I think this
> situation is likely too common to deal with via quirks, so we'll
> have to figure out a way to just make this work.

OK, I remember that now.  So the question is whether you want a list
or a set of quirks that may be an ongoing maintenance burden, or
whether you want an algorithm that may be risky but possibly less
maintenance.  I preferred the latter.  I think a failure in the
algorithm will most likely result in a device that just doesn't work
(because we derived a DMA source ID that doesn't match what the IOMMU
sees), so at least the impact is relatively minor, and no worse than a
missing entry in the list of exception devices.

Bjorn


Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges

2013-07-08 Thread Bjorn Helgaas
On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> Joerg,
> 
> Where do we stand on this series?  You had a concern that the heuristic
> used in patch 1/2 could be dangerous.  The suggestion for detecting the
> issue was actually from Bjorn who replied with his rationale.  Do you
> want to go in the direction of a fixed whitelist or do you agree that
> even if the heuristic breaks it provides better behavior than what we
> have now?  Thanks,

I'm trying to take a step back and look at the overall design, not
these specific patches.

IOMMUs translate addresses based on their source, i.e., a PCIe
requester ID.  This is made more complicated by the fact that some
bridges "take ownership" (change the requester ID as they forward
transactions upstream), as well as the fact that conventional PCI has
no requester ID at all.  And some broken devices apparently generate
DMA requests using the requester ID of another device.
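
As a small worked example of what "requester ID" means here: it is a
16-bit value built from bus, device, and function numbers.  The sketch
below is standalone C with hypothetical helper names (toy_rid() and
toy_bridge_rid() are not kernel functions).  The bridge form, secondary
bus with devfn 0, is an assumption taken from the
PCI_BRIDGE_REQUESTER_ID() macro in the patch in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits), function (3 bits) into a requester ID. */
static uint16_t toy_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}

/*
 * ID presented by a bridge that takes ownership of a transaction:
 * its secondary bus number with devfn 0.
 */
static uint16_t toy_bridge_rid(uint8_t secondary_bus)
{
	return (uint16_t)(secondary_bus << 8);
}

/* Usage: device 06:01.0 sits behind a bridge whose secondary bus is 06. */
static void toy_rid_demo(void)
{
	assert(toy_rid(0x06, 0x01, 0) == 0x0608); /* what the device would send */
	assert(toy_bridge_rid(0x06) == 0x0600);   /* what the IOMMU may see instead */
}
```

When the bridge takes ownership, the IOMMU sees the second value rather
than the first, which is why mappings keyed only on the device's own ID
can miss.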

We currently deal with this using the pci_find_upstream_pcie_bridge()
and pci_get_dma_source() interfaces, but I think there's too much
assembly required by their users.  pci_find_upstream_pcie_bridge()
callers normally loop through all the bridges between the "upstream
PCIe bridge" and the device, checking for bridges that might take
ownership.  They probably also ought to use pci_get_dma_source() to
account for the broken devices, but most callers don't.

Most of this is PCI-specific stuff that should be of interest to all
IOMMU drivers, and the overall structure of calls and looping should
be the same for all of them, so it would be nice to factor it out
somehow.

The attached patch is guaranteed not to even compile; it's just to
make this idea more concrete.  The basic idea is that since the IOMMU
driver wants to perform some action for each possible requester ID the
IOMMU might see, PCI could provide an iterator
("pci_for_each_requester_id()") to do that.
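
To show the shape of such an iterator without depending on kernel types,
here is a standalone sketch.  Everything in it is an assumption made for
illustration: struct sk_dev, sk_for_each_requester_id(), and the callback
signature are hypothetical and differ from the patch below, and the toy
walks all the way to the root instead of stopping at a given bridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sk_dev {
	uint8_t bus;            /* bus the device sits on */
	uint8_t devfn;          /* device/function on that bus */
	uint8_t secondary_bus;  /* bus behind the device, if it is a bridge */
	bool takes_ownership;   /* bridge may replace the requester ID */
	struct sk_dev *parent;  /* upstream bridge, NULL on the root bus */
};

typedef int (*sk_rid_fn)(uint16_t rid, void *data);

/*
 * Call fn() for the device's own requester ID and for the ID of every
 * ownership-taking bridge between it and the root.  Stop early and
 * return fn()'s value if it returns nonzero.
 */
static int sk_for_each_requester_id(const struct sk_dev *dev,
				    sk_rid_fn fn, void *data)
{
	const struct sk_dev *br;
	int ret;

	ret = fn((uint16_t)((dev->bus << 8) | dev->devfn), data);
	if (ret)
		return ret;

	for (br = dev->parent; br; br = br->parent) {
		if (!br->takes_ownership)
			continue;
		ret = fn((uint16_t)(br->secondary_bus << 8), data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: record every ID seen, up to a small fixed limit. */
struct sk_rid_list {
	uint16_t rid[8];
	size_t n;
};

static int sk_collect(uint16_t rid, void *data)
{
	struct sk_rid_list *list = data;

	if (list->n == 8)
		return -1;
	list->rid[list->n++] = rid;
	return 0;
}

/* Usage: root port 00:1c.0 -> PCIe-to-PCI bridge 05:00.0 -> device 06:01.0 */
static void sk_demo(void)
{
	struct sk_dev root_port = { 0x00, 0xe0, 0x05, false, NULL };
	struct sk_dev bridge = { 0x05, 0x00, 0x06, true, &root_port };
	struct sk_dev nic = { 0x06, 0x08, 0x00, false, &bridge };
	struct sk_rid_list seen = { { 0 }, 0 };

	assert(sk_for_each_requester_id(&nic, sk_collect, &seen) == 0);
	assert(seen.n == 2);
	assert(seen.rid[0] == 0x0608); /* the device itself */
	assert(seen.rid[1] == 0x0600); /* the bridge that may take ownership */
}
```

An IOMMU driver using this pattern would put its per-ID setup in the
callback, so the topology walk lives in one place.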

Bjorn


commit afad51492c6672b96c2b0735600d5695e30f7180
Author: Bjorn Helgaas 
Date:   Wed Jul 3 16:04:26 2013 -0600

pci-add-for-each-requester-id

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index d0627fa..380eb03 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -17,6 +17,89 @@
 DECLARE_RWSEM(pci_bus_sem);
 EXPORT_SYMBOL_GPL(pci_bus_sem);
 
+#define PCI_REQUESTER_ID(dev)  (((dev)->bus->number << 8) | (dev)->devfn)
+#define PCI_BRIDGE_REQUESTER_ID(bridge)        ((bridge)->subordinate->number << 8)
+
+static inline bool pci_is_pcix(struct pci_dev *dev)
+{
+   return !!pci_pcix_cap(dev); /* XXX not implemented */
+}
+
+static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
+{
+   /*
+* A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
+* Spec v1.0, sec 2.3.
+*/
+   if (pci_is_pcie(bridge) &&
+   pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
+   return true;
+
+   /*
+* A PCI-X to PCI-X bridge need not take ownership because there
+* are requester IDs on the secondary PCI-X bus.  However, if a PCI
+* device is added on the secondary bus, the bridge must revert to
+* being a PCI-X to PCI bridge, and it then *would* take ownership.
+* Assuming a PCI-X to PCI-X bridge takes ownership means we can
+* tolerate a future hot-add without having to change existing
+* IOMMU mappings.
+*/
+   if (pci_is_pcix(bridge))
+   return true;
+
+   return false;
+}
+
+static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
+struct pci_dev *dev)
+{
+   struct pci_dev *bridge;
+   struct pci_bus *child_bus;
+   u8 secondary, subordinate, busn = dev->bus->number;
+
+   if (dev->bus == bus)
+   return dev;
+
+   /*
+* There may be several devices on "bus".  Find the one that is a
+* bridge leading to "dev".
+*/
+   list_for_each_entry(bridge, &bus->devices, bus_list) {
+   child_bus = bridge->subordinate;
+   if (child_bus) {
+   secondary = child_bus->busn_res.start;
+   subordinate = child_bus->busn_res.end;
+   if (secondary <= busn && busn <= subordinate)
+   return bridge;
+   }
+   }
+   return NULL;
+}
+
+int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev *dev,
+ int (*fn)(struct pci_dev *, void *),
+ void *data)
+{
+   int ret;
+
+   dev = pci_get_dma_source(dev);  /* XXX ref count screwup */
+
+   while (bridge != dev) {
+   if (pci_bridge_may_take_ownership(bridge)) {
+   ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridg

Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges

2013-07-08 Thread Bjorn Helgaas
On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
> On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
> > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
> > > Joerg,
> > > 
> > > Where do we stand on this series?  You had a concern that the heuristic
> > > > used in patch 1/2 could be dangerous.  The suggestion for detecting the
> > > issue was actually from Bjorn who replied with his rationale.  Do you
> > > want to go in the direction of a fixed whitelist or do you agree that
> > > even if the heuristic breaks it provides better behavior than what we
> > > have now?  Thanks,
> > 
> > I'm trying to take a step back and look at the overall design, not
> > these specific patches.
> > 
> > IOMMUs translate addresses based on their source, i.e., a PCIe
> > requester ID.  This is made more complicated by the fact that some
> > bridges "take ownership" (change the requester ID as they forward
> > transactions upstream), as well as the fact that conventional PCI has
> > no requester ID at all.  And some broken devices apparently generate
> > DMA requests using the requester ID of another device.
> > 
> > We currently deal with this using the pci_find_upstream_pcie_bridge()
> > and pci_get_dma_source() interfaces, but I think there's too much
> > assembly required by their users.  pci_find_upstream_pcie_bridge()
> > callers normally loop through all the bridges between the "upstream
> > PCIe bridge" and the device, checking for bridges that might take
> > ownership.  They probably also ought to use pci_get_dma_source() to
> > account for the broken devices, but most callers don't.
> > 
> > Most of this is PCI-specific stuff that should be of interest to all
> > IOMMU drivers, and the overall structure of calls and looping should
> > be the same for all of them, so it would be nice to factor it out
> > somehow.
> > 
> > The attached patch is guaranteed not to even compile; it's just to
> > make this idea more concrete.  The basic idea is that since the IOMMU
> > driver wants to perform some action for each possible requester ID the
> > IOMMU might see, PCI could provide an iterator
> > ("pci_for_each_requester_id()") to do that.
> > 
> > Bjorn
> > 
> > 
> > commit afad51492c6672b96c2b0735600d5695e30f7180
> > Author: Bjorn Helgaas 
> > Date:   Wed Jul 3 16:04:26 2013 -0600
> > 
> > pci-add-for-each-requester-id
> > 
> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > index d0627fa..380eb03 100644
> > --- a/drivers/pci/search.c
> > +++ b/drivers/pci/search.c
> > @@ -17,6 +17,89 @@
> >  DECLARE_RWSEM(pci_bus_sem);
> >  EXPORT_SYMBOL_GPL(pci_bus_sem);
> >  
> > +#define PCI_REQUESTER_ID(dev)  (((dev)->bus->number << 8) | (dev)->devfn)
> > +#define PCI_BRIDGE_REQUESTER_ID(bridge)        ((bridge)->subordinate->number << 8)
> > +
> > +static inline bool pci_is_pcix(struct pci_dev *dev)
> > +{
> > +   return !!pci_pcix_cap(dev); /* XXX not implemented */
> > +}
> > +
> > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
> > +{
> > +   /*
> > +* A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
> > +* Spec v1.0, sec 2.3.
> > +*/
> > +   if (pci_is_pcie(bridge) &&
> > +   pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> > +   return true;
> 
> Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
> capability are exactly what causes the current bug.

Likely so.  We might be able to identify that as "secondary bus is not
PCIe" or something.

> > +
> > +   /*
> > +* A PCI-X to PCI-X bridge need not take ownership because there
> > +* are requester IDs on the secondary PCI-X bus.  However, if a PCI
> > +* device is added on the secondary bus, the bridge must revert to
> > +* being a PCI-X to PCI bridge, and it then *would* take ownership.
> > +* Assuming a PCI-X to PCI-X bridge takes ownership means we can
> > +* tolerate a future hot-add without having to change existing
> > +* IOMMU mappings.
> > +*/
> > +   if (pci_is_pcix(bridge))
> > +   return true;
> > +
> > +   return false;
> > +}
> > +
> > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
> > +struct pci_dev *dev)
> > +{
> > +   struct

Re: [PATCH v2 0/2] iommu/intel: Quirk non-compliant PCIe-to-PCI bridges

2013-07-09 Thread Bjorn Helgaas
On Tue, Jul 9, 2013 at 12:27 PM, Alex Williamson wrote:
> On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote:
>> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
>> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
>> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
>> > > > Joerg,
>> > > >
>> > > > Where do we stand on this series?  You had a concern that the heuristic
>> > > > used in patch 1/2 could be dangerous.  The suggestion for detecting the
>> > > > issue was actually from Bjorn who replied with his rationale.  Do you
>> > > > want to go in the direction of a fixed whitelist or do you agree that
>> > > > even if the heuristic breaks it provides better behavior than what we
>> > > > have now?  Thanks,
>> > >
>> > > I'm trying to take a step back and look at the overall design, not
>> > > these specific patches.
>> > >
>> > > IOMMUs translate addresses based on their source, i.e., a PCIe
>> > > requester ID.  This is made more complicated by the fact that some
>> > > bridges "take ownership" (change the requester ID as they forward
>> > > transactions upstream), as well as the fact that conventional PCI has
>> > > no requester ID at all.  And some broken devices apparently generate
>> > > DMA requests using the requester ID of another device.
>> > >
>> > > We currently deal with this using the pci_find_upstream_pcie_bridge()
>> > > and pci_get_dma_source() interfaces, but I think there's too much
>> > > assembly required by their users.  pci_find_upstream_pcie_bridge()
>> > > callers normally loop through all the bridges between the "upstream
>> > > PCIe bridge" and the device, checking for bridges that might take
>> > > ownership.  They probably also ought to use pci_get_dma_source() to
>> > > account for the broken devices, but most callers don't.
>> > >
>> > > Most of this is PCI-specific stuff that should be of interest to all
>> > > IOMMU drivers, and the overall structure of calls and looping should
>> > > be the same for all of them, so it would be nice to factor it out
>> > > somehow.
>> > >
>> > > The attached patch is guaranteed not to even compile; it's just to
>> > > make this idea more concrete.  The basic idea is that since the IOMMU
>> > > driver wants to perform some action for each possible requester ID the
>> > > IOMMU might see, PCI could provide an iterator
>> > > ("pci_for_each_requester_id()") to do that.
>> > >
>> > > Bjorn
>> > >
>> > >
>> > > commit afad51492c6672b96c2b0735600d5695e30f7180
>> > > Author: Bjorn Helgaas 
>> > > Date:   Wed Jul 3 16:04:26 2013 -0600
>> > >
>> > > pci-add-for-each-requester-id
>> > >
>> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> > > index d0627fa..380eb03 100644
>> > > --- a/drivers/pci/search.c
>> > > +++ b/drivers/pci/search.c
>> > > @@ -17,6 +17,89 @@
>> > >  DECLARE_RWSEM(pci_bus_sem);
>> > >  EXPORT_SYMBOL_GPL(pci_bus_sem);
>> > >
>> > > +#define PCI_REQUESTER_ID(dev)  (((dev)->bus->number << 8) | (dev)->devfn)
>> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge)  ((bridge)->subordinate->number << 8)
>> > > +
>> > > +static inline bool pci_is_pcix(struct pci_dev *dev)
>> > > +{
>> > > + return !!pci_pcix_cap(dev); /* XXX not implemented */
>> > > +}
>> > > +
>> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
>> > > +{
>> > > + /*
>> > > +  * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
>> > > +  * Spec v1.0, sec 2.3.
>> > > +  */
>> > > + if (pci_is_pcie(bridge) &&
>> > > + pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
>> > > + return true;
>> >
>> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
>> > capability are exactly what causes the current bug.
>>
>> Likely so.  We might be able to identify that as "secondary bus is not
>> PCIe" or something.
>>
>> > > +
>> > > + /*
>> 

Re: WARNING: at drivers/iommu/dmar.c:484 warn_invalid_dmar with Intel Motherboard

2013-07-09 Thread Bjorn Helgaas
[+cc Joerg, David, iommu list]

On Tue, Jul 9, 2013 at 2:24 PM, Guenter Roeck  wrote:
> I started seeing this problem after updating the BIOS trying to fix another issue,
> though I may have missed it earlier.
>
> I understand this is a BIOS bug. Would be great if someone can pass this on
> to Intel BIOS engineers.

Maybe.  It'd be nice if Linux handled it better, though.

> CPU is i7-4770K.
>
> Guenter
>
> ---
>
> [0.00] WARNING: at drivers/iommu/dmar.c:484 warn_invalid_dmar+0x86/0xa0()
> [0.00] Your BIOS is broken; DMAR reported at address 0!
> [0.00] BIOS vendor: Intel Corp.; Ver: RLH8710H.86A.0320.2013.0606.1802; Product Version:
> [0.00] Modules linked in:
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0+ #1
> [0.00] Hardware name:  /DH87RL, BIOS RLH8710H.86A.0320.2013.0606.1802 06/06/2013
> [0.00]  000b 81c01e20 81671cfc 81c01e68
> [0.00]  81c01e58 81043370 81f6800c 81cbb520
> [0.00]   88061fdaad40 c73cc018 81c01eb8
> [0.00] Call Trace:
> [0.00]  [] dump_stack+0x45/0x56
> [0.00]  [] warn_slowpath_common+0x70/0xa0
> [0.00]  [] warn_slowpath_fmt_taint+0x44/0x50
> [0.00]  [] ? early_ioremap+0x13/0x15
> [0.00]  [] ? __acpi_map_table+0x13/0x1a
> [0.00]  [] warn_invalid_dmar+0x86/0xa0
> [0.00]  [] check_zero_address+0x57/0xf7
> [0.00]  [] detect_intel_iommu+0x15/0xb6
> [0.00]  [] pci_iommu_alloc+0x49/0x70
> [0.00]  [] mem_init+0x17/0x9c
> [0.00]  [] start_kernel+0x1c5/0x3e2
> [0.00]  [] ? repair_env_string+0x5e/0x5e
> [0.00]  [] x86_64_start_reservations+0x2a/0x2c
> [0.00]  [] x86_64_start_kernel+0xf6/0xf9
> [0.00] ---[ end trace a7e3512e2fa85eaf ]---
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux Plumbers ACPI/PM, PCI Microconference

2013-07-23 Thread Bjorn Helgaas
On Tue, Jul 16, 2013 at 8:21 PM, Myron Stowe  wrote:
> Linux Plumbers has approved an ACPI/PM, PCI microconference. The
> overview page is here:
>
> http://wiki.linuxplumbersconf.org/2013:pci_subsystem
>
> We would like to start receiving volunteers for presenting topics of
> interest.  There is a lot of activity in these subsystems so please
> respond by submitting presentation or discussion proposals that you
> would be willing to cover for consideration.  You should also feel
> free to submit ideas as proposals that others could cover.

Somebody else privately suggested these ARM/PCI topics:

> -> how it's been cleaned up to be more common across arm-32 machines
> -> dependencies on dtb
> -> future use/dependency(s) on ACPI
> -> hotplug support issues (if any; acpi (host-bridge) vs  
> ?)

and I'm interested in those as well.  There seems to be a lot of
activity in that area.  In addition, I'm interested in ARM IOMMU
support.  I see drivers/iommu/arm-smmu.c, but it's brand-new and I
haven't looked at it at all.

Bjorn


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-23 Thread Bjorn Helgaas
On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> This provides interfaces for drivers to discover the visible PCIe
> requester ID for a device, for things like IOMMU setup, and iterate

IDs (plural)

> over the device chain from requestee to requester, including DMA
> quirks at each step.

"requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
normally makes a noun that refers to the object of the action.  So
"requestee" sounds like it means something like "target" or "responder,"
but that's not what you mean here.

> Suggested-by: Bjorn Helgaas 
> Signed-off-by: Alex Williamson 
> ---
>  drivers/pci/search.c |  198 
> ++
>  include/linux/pci.h  |7 ++
>  2 files changed, 205 insertions(+)
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index d0627fa..4759c02 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -18,6 +18,204 @@ DECLARE_RWSEM(pci_bus_sem);
>  EXPORT_SYMBOL_GPL(pci_bus_sem);
>  
>  /*
> + * pci_has_pcie_requester_id - Does @dev have a PCIe requester ID
> + * @dev: device to test
> + */
> +static bool pci_has_pcie_requester_id(struct pci_dev *dev)
> +{
> + /*
> +  * XXX There's no indicator of the bus type, conventional PCI vs
> +  * PCI-X vs PCI-e, but we assume that a caller looking for a PCIe
> +  * requester ID is a native PCIe based system (such as VT-d or
> +  * AMD-Vi).  It's common that PCIe root complex devices do not
> +  * include a PCIe capability, but we can assume they are PCIe
> +  * devices based on their topology.
> +  */
> + if (pci_is_pcie(dev) || pci_is_root_bus(dev->bus))
> + return true;
> +
> + /*
> +  * PCI-X devices have a requester ID, but the bridge may still take
> +  * ownership of transactions and create a requester ID.  We therefore
> +  * assume that the PCI-X requester ID is not the same one used on PCIe.
> +  */
> +
> +#ifdef CONFIG_PCI_QUIRKS
> + /*
> +  * Quirk for PCIe-to-PCI bridges which do not expose a PCIe capability.
> +  * If the device is a bridge, look to the next device upstream of it.
> +  * If that device is PCIe and not a PCIe-to-PCI bridge, then by
> +  * deduction, the device must be PCIe and therefore has a requester ID.
> +  */
> + if (dev->subordinate) {
> + struct pci_dev *parent = dev->bus->self;
> +
> + if (pci_is_pcie(parent) &&
> + pci_pcie_type(parent) != PCI_EXP_TYPE_PCI_BRIDGE)
> + return true;
> + }
> +#endif
> +
> + return false;
> +}
> +
> +/*
> + * pci_has_visible_pcie_requester_id - Can @bridge see @dev's requester ID?
> + * @dev: requester device
> + * @bridge: upstream bridge (or NULL for root bus)
> + */
> +static bool pci_has_visible_pcie_requester_id(struct pci_dev *dev,
> +					      struct pci_dev *bridge)
> +{
> +	/*
> +	 * The entire path must be tested, if any step does not have a
> +	 * requester ID, the chain is broken.  This allows us to support
> +	 * topologies with PCIe requester ID gaps, ex: PCIe-PCI-PCIe
> +	 */
> +	while (dev != bridge) {
> +		if (!pci_has_pcie_requester_id(dev))
> +			return false;
> +
> +		if (pci_is_root_bus(dev->bus))
> +			return !bridge; /* false if we don't hit @bridge */
> +
> +		dev = dev->bus->self;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Legacy PCI bridges within a root complex (ex. Intel 82801) report
> + * a different requester ID than a standard PCIe-to-PCI bridge.  Instead
> + * of using (subordinate << 8 | 0) the use (bus << 8 | devfn), like a

s/the/they/

Did you learn about this empirically?  Intel spec?  I wonder if there's
some way to derive this from the PCIe specs.

> + * standard PCIe endpoint.  This function detects them.
> + *
> + * XXX Is this Intel vendor ID specific?
> + */
> +static bool pci_bridge_uses_endpoint_requester(struct pci_dev *bridge)
> +{
> +	if (!pci_is_pcie(bridge) && pci_is_root_bus(bridge->bus))
> +		return true;
> +
> +	return false;
> +}
> +
> +#define PCI_REQUESTER_ID(dev)	(((dev)->bus->number << 8) | (dev)->devfn)
> +#define PCI_BRIDGE_REQUESTER_ID(dev) ((dev)->subordinate->number << 8)
> +
> +/*
> + * pci_get_visible_pcie_requester - Get requester and requester ID for
> + * 

Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-24 Thread Bjorn Helgaas
On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson wrote:
> On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
>> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
>> > This provides interfaces for drivers to discover the visible PCIe
>> > requester ID for a device, for things like IOMMU setup, and iterate
>>
>> IDs (plural)
>
> How can a device have multiple requester IDs?  Reading below, I'm
> not sure we're on the same page for the purpose of this patch.

>> "requestee" doesn't make sense to me.  The "-ee" suffix added to a verb
>> normally makes a noun that refers to the object of the action.  So
>> "requestee" sounds like it means something like "target" or "responder,"
>> but that's not what you mean here.
>
> Hmm, ok.  I figured a request-er makes a request on behalf of a
> request-ee.  Suggestions?

I would expect a request-er to make a request *to* a request-ee, just
like a grant-or makes a grant to a grant-ee.  My suggestion is to use
"requester" consistently for only the originator of a DMA transaction.
Any devices above it are by definition "bridges".  As the DMA
transaction propagates through the fabric, it may be tagged by bridges
with different requester IDs.

The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
not sure the intermediate pci_devs are.

>> > + * pci_get_visible_pcie_requester - Get requester and requester ID for
>> > + *  @requestee below @bridge
>> > + * @requestee: requester device
>> > + * @bridge: upstream bridge (or NULL for root bus)
>> > + * @requester_id: location to store requester ID or NULL
>> > + */
>> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> > +  struct pci_dev *bridge,
>> > +  u16 *requester_id)
>>
>> I'm not sure it makes sense to return a struct pci_dev here because
>> there's no requirement that a requester ID correspond to an actual
>> pci_dev.
>
> That's why this function is named get_.._requester instead of requester
> ID.  I believe there still has to be a struct pci_dev that does the
> request, but the requester ID for that device may not actually match.
> So I return both.  In a PCIe-to-PCI bridge case, the pci_dev is the
> bridge, but the requester ID is either the bridge bus|devfn or
> subordinate|0 depending on the topology.  If we want to support "ghost
> functions", we can return the real pci_dev and a ghost requester ID.
>
> I think if we used just a requester ID, it ends up being extremely
> difficult to pass that into anything else since we then have to search
> again for where that requester ID is rooted.

Returning both a pci_dev and a requester ID makes it more complicated.
At the hardware level, transactions contain only a requester ID, and
bridges can either drop it, pass it unchanged, or assign a new one.  I
think the code will be simpler if we just model that.

>> > + * pcie_for_each_requester - Call callback @fn on each devices and DMA source
>> > + *                           from @requestee to the PCIe requester ID visible
>> > + *                           to @bridge.
>>
>> Transactions from a device may appear with one of several requester IDs,
>> but there's not necessarily an actual pci_dev for each ID, so I think the
>> caller reads better if it's "...for_each_requester_id()"
>
> Wouldn't you expect to pass a requester ID into a function with that
> name?  I'm pretty sure I had it named that at one point but thought the
> parameters made more sense this way.  I'll see if I can think of a
> better name.

My thought was to pass a pci_dev (the originator of the DMA, which I
would call the "requester") and a callback.  The callback would accept
the requester pci_dev (always the same requester device) and a
requester ID.

This would call @fn for each possible requester ID for transactions
from the device.  IOMMU drivers should only need the requester ID to
manage mappings; they shouldn't need a pci_dev corresponding to any
intermediate bridges.
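
As a rough illustration of the callback shape proposed above, here is a
minimal user-space sketch.  Everything in it is an assumption made for
illustration: struct model_dev, enum model_kind, requester_id_fn,
model_for_each_requester_id() and collect_requester_id() are hypothetical
names, not kernel interfaces, and the alias rule is simplified to "the
device's own bus/devfn, plus (secondary bus, devfn 0) for every
PCIe-to-PCI bridge above it".  The callback receives only a requester ID
and the caller's private data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; not the kernel's struct pci_dev. */
enum model_kind {
	MODEL_PCIE_FUNCTION,		/* PCIe endpoint, port, or switch */
	MODEL_PCIE_TO_PCI_BRIDGE,	/* may take ownership of transactions */
	MODEL_CONVENTIONAL_PCI,		/* conventional PCI device or bridge */
};

struct model_dev {
	struct model_dev *parent;	/* upstream bridge, NULL on the root bus */
	enum model_kind kind;
	uint8_t bus;			/* bus the device sits on */
	uint8_t devfn;			/* (device << 3) | function */
	uint8_t secondary_bus;		/* bridges only: bus directly below */
};

/* The callback sees only a requester ID plus the caller's private data. */
typedef int (*requester_id_fn)(uint16_t requester_id, void *data);

/*
 * Call @fn once for each requester ID that may appear on DMA from
 * @requester.  A nonzero return from @fn stops the walk and is returned.
 */
static int model_for_each_requester_id(struct model_dev *requester,
				       requester_id_fn fn, void *data)
{
	struct model_dev *bridge;
	int ret;

	/* The ID the device itself would use: its own bus and devfn. */
	ret = fn((uint16_t)((requester->bus << 8) | requester->devfn), data);
	if (ret)
		return ret;

	/* Each PCIe-to-PCI bridge above it may substitute (secondary bus, 0). */
	for (bridge = requester->parent; bridge; bridge = bridge->parent) {
		if (bridge->kind != MODEL_PCIE_TO_PCI_BRIDGE)
			continue;
		ret = fn((uint16_t)(bridge->secondary_bus << 8), data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: collect the IDs; the caller's state travels in @data. */
struct requester_id_list {
	uint16_t ids[8];
	size_t count;
};

static int collect_requester_id(uint16_t requester_id, void *data)
{
	struct requester_id_list *list = data;

	if (list->count >= sizeof(list->ids) / sizeof(list->ids[0]))
		return -1;	/* out of room: stop the walk */
	list->ids[list->count++] = requester_id;
	return 0;
}
```

In this model a caller that needs the originating device can simply put
it in the structure it passes as @data, so the callback itself never has
to take a device pointer for an intermediate bridge.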

>> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
>> > +  struct pci_dev *bridge,
>> > +  u16 *requester_id);
>>
>> The structure of this interface implies that there is only one visible
>> requester ID, but the whole point of this patch is that a transaction from

Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-24 Thread Bjorn Helgaas
On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson wrote:
> > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >   As the DMA
> > transaction propagates through the fabric, it may be tagged by bridges
> > with different requester IDs.
> > 
> > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> > not sure the intermediate pci_devs are.
> 
> A u16 requester ID doesn't mean much on its own though, it's not
> necessarily even unique.  A requester ID associated with the context of
> a pci_dev is unique and gives us a reference point if we need to perform
> another operation on that requester ID.

A u16 requester ID better mean something to an IOMMU -- it's all the
IOMMU can use to look up the correct mapping.  That's why we have to
give the iterator something to define the scope to iterate over.  The
same requester ID could mean something totally unrelated in a
different scope, e.g., below a different IOMMU.

> > Returning both a pci_dev and a requester ID makes it more complicated.
> > At the hardware level, transactions contain only a requester ID, and
> > bridges can either drop it, pass it unchanged, or assign a new one.  I
> > think the code will be simpler if we just model that.
> 
> I'm not convinced.  Patch 2/2 makes use of both the returned pci_dev and
> the returned requester ID and it's a huge simplification overall.

The IOMMU driver makes mappings so DMA from device A can reach a
buffer.  It needs to know about device A, and it needs to know what
source IDs might appear on those DMA transactions.  Why would it need
to know anything about any bridges between A and the IOMMU?

> > My thought was to pass a pci_dev (the originator of the DMA, which I
> > would call the "requester") and a callback.  The callback would accept
> > the requester pci_dev (always the same requester device) and a
> > requester ID.
> > 
> > This would call @fn for each possible requester ID for transactions
> > from the device.  IOMMU drivers should only need the requester ID to
> > manage mappings; they shouldn't need a pci_dev corresponding to any
> > intermediate bridges.
> 
> This implementation is almost the same with the only change being that
> the pci_dev passed to the callback is the one most closely associated
> with the requester ID.  For the IOMMU driver, it doesn't matter since
> it's only using the requester ID, but why would the callback care about
> the original requester?  If it needed to do something device specific,
> it's going to care about the closest device to the requester ID.

If this can be done without passing a pci_dev at all to the callback,
that would be even better.  If a caller needed the original pci_dev,
it could always pass it via the "void *data".

I don't see the point of passing a "device closest to the requester
ID."  What would the IOMMU do with that?  As far as the IOMMU is
concerned, the requester ID could be an arbitrary number completely
unrelated to a pci_dev.

> > >> > +struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> > >> > +  struct pci_dev *bridge,
> > >> > +  u16 *requester_id);
> > >>
> > >> The structure of this interface implies that there is only one visible
> > >> requester ID, but the whole point of this patch is that a transaction from
> > >> @requestee may appear with one of several requester IDs.  So which one will
> > >> this return?
> > >
> > > I thought the point of this patch was to have an integrated interface
> > > for finding the requester ID and doing something across all devices with
> > > that requester ID
> > 
> > Oh, here's the difference in our understanding.  "Doing something
> > across all devices with that requester ID" sounds like identifying the
> > set of devices that have to be handled as a group by the IOMMU.
> > That's certainly an issue, but I wasn't considering it at all.
> > 
> > I was only concerned with the question of a single device that
> > requires multiple IOMMU mappings because DMA requests might use any of
> > several source IDs.  This is mentioned in sec 3.6.1.1 of the VT-d
> > spec, i.e., "requests arriving with the so

Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA

2013-07-25 Thread Bjorn Helgaas
On Wed, Jul 24, 2013 at 12:29 AM, Takao Indoh wrote:
> Sorry for letting this discussion slide, I was busy on other works:-(
> Anyway, the summary of previous discussion is:
> - My patch adds new initcall(fs_initcall) to reset all PCIe endpoints on
>   boot. This expects PCI enumeration is done before IOMMU
>   initialization as follows.
> (1) PCI enumeration
> (2) fs_initcall ---> device reset
> (3) IOMMU initialization
> - This works on x86, but does not work on other architecture because
>   IOMMU is initialized before PCI enumeration on some architectures. So,
>   device reset should be done where IOMMU is initialized instead of
>   initcall.
> - Or, as another idea, we can reset devices in first kernel(panic kernel)
>
> Resetting devices in panic kernel is against kdump policy and seems not to
> be good idea. So I think adding reset code into iommu initialization is
> better. I'll post patches for that.

Of course nobody *wants* to do anything in the panic kernel.  But
simply saying "it's against kdump policy and seems not to be a good
idea" is not a technical argument.  There are things that are
impractical to do in the kdump kernel, so they have to be done in the
panic kernel even though we know the kernel is unreliable and the
attempt may fail.

My point about IOMMU and PCI initialization order doesn't go away just
because it doesn't fit "kdump policy."  Having system initialization
occur in a logical order is far more important than making kdump work.

Bjorn


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-25 Thread Bjorn Helgaas
On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote:
> On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
> >On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >>This provides interfaces for drivers to discover the visible PCIe
> >>requester ID for a device, for things like IOMMU setup, and iterate
> >
> >IDs (plural)
> >
> a single device does not have multiple requester id's;
> can have multiple tag-id's (that are ignored in this situation, but
> can be used by switches for ordering purposes), but there's only 1/fcn
> (except for those quirker pdevs!).

Generally a device does not have multiple requester IDs, but the
IOMMU may see one of several requester IDs for DMAs from a given
device because bridges may take ownership of those transactions (sec
3.6.1.1 of the VT-d spec).

Just to be clear, I envision this whole interface as being
specifically for use by IOMMU drivers, so I'm only trying to provide
what's necessary to build IOMMU mappings.

> >>+ * pci_get_visible_pcie_requester - Get requester and requester ID for
> >>+ *  @requestee below @bridge
> >>+ * @requestee: requester device
> >>+ * @bridge: upstream bridge (or NULL for root bus)
> >>+ * @requester_id: location to store requester ID or NULL
> >>+ */
> >>+struct pci_dev *pci_get_visible_pcie_requester(struct pci_dev *requestee,
> >>+  struct pci_dev *bridge,
> >>+  u16 *requester_id)
> >
> >I'm not sure it makes sense to return a struct pci_dev here because
> >there's no requirement that a requester ID correspond to an actual
> >pci_dev.
> >
> well, I would expect the only callers would be for subsys (iommu's)
> searching to find requester-id for a pdev, b/c if a pdev doesn't exist,
> then the device (and requester-id) doesn't exist... :-/

> >>+ * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >>+ *                           from @requestee to the PCIe requester ID visible
> >>+ *                           to @bridge.
> >
> >Transactions from a device may appear with one of several requester IDs,
> >but there's not necessarily an actual pci_dev for each ID, so I think the
> ditto above; have to have a pdev for each id

This *might* be true, but I don't think we should rely on it.  For
example:

  00:1c.0 PCIe to PCI bridge to [bus 01]
  01:01.0 PCI endpoint

The bridge will take ownership of DMA transactions from the 01:01.0
endpoint.  An IOMMU on bus 00 will see a bridge-assigned requester
ID of 01:00.0 (subordinate bus number, devfn zero), but there is no
01:00.0 device.

Maybe the rules of conventional PCI require a device zero (I don't
remember), but even if they do, it's ugly to rely on that here
because I don't think device 01:00.0 is relevant to mappings for
device 01:01.0.

Obviously we also have to be aware that 01:00.0 and 01:01.0 can't be
isolated from each other, but I think that issue is separate from
the question of what requester IDs have to be mapped to make 01:01.0
work.
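
To spell out the arithmetic in the example above: a requester ID packs
the bus number into bits 15:8, the device number into bits 7:3, and the
function number into bits 2:0.  The sketch below is illustration only;
model_requester_id() and model_bridge_assigned_id() are hypothetical
helper names, not kernel interfaces, and the bridge rule assumed is the
"bus below the bridge, devfn zero" form described above.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not kernel interfaces. */

/* Requester ID layout: bus[15:8], device[7:3], function[2:0]. */
static uint16_t model_requester_id(uint8_t bus, uint8_t device,
				   uint8_t function)
{
	return (uint16_t)((bus << 8) | ((device & 0x1f) << 3) |
			  (function & 0x7));
}

/* ID a PCIe-to-PCI bridge assigns: the bus below it, device 0, function 0. */
static uint16_t model_bridge_assigned_id(uint8_t secondary_bus)
{
	return model_requester_id(secondary_bus, 0, 0);
}
```

With these helpers, endpoint 01:01.0 would use 0x0108 on its own, while
DMA forwarded by the bridge at 00:1c.0 carries 0x0100, which is the
encoding of 01:00.0 whether or not any such device exists.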

Bjorn


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-26 Thread Bjorn Helgaas
On Thu, Jul 25, 2013 at 02:25:04PM -0400, Don Dutile wrote:
> On 07/25/2013 01:19 PM, Bjorn Helgaas wrote:
> >On Wed, Jul 24, 2013 at 04:42:03PM -0400, Don Dutile wrote:
> >>On 07/23/2013 06:35 PM, Bjorn Helgaas wrote:
> >>>On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> >>>>+ * pcie_for_each_requester - Call callback @fn on each devices and DMA source
> >>>>+ *                           from @requestee to the PCIe requester ID visible
> >>>>+ *                           to @bridge.
> >>>
> >>>Transactions from a device may appear with one of several requester IDs,
> >>>but there's not necessarily an actual pci_dev for each ID, so I think the
> >>ditto above; have to have a pdev for each id
> >
> >This *might* be true, but I don't think we should rely on it.  For
> >example:
> >
> >   00:1c.0 PCIe to PCI bridge to [bus 01]
> >   01:01.0 PCI endpoint
> >
> >The bridge will take ownership of DMA transactions from the 01:01.0
> >endpoint.  An IOMMU on bus 00 will see a bridge-assigned requester
> >ID of 01:00.0 (subordinate bus number, devfn zero), but there is no
> >01:00.0 device.
> >
> Clarification:
> I meant that each requester-id must have at least 1 PCI device associated
> with it.

I don't think that's true, as in the example above.  Requester ID
0x0100 has no pci_dev associated with it.  What am I missing?

Maybe you mean that requester ID 0x0100 is associated with pci_dev
01:01.0 in the sense that DMAs from 01:01.0 appear with that ID?
That's true, but I can't think of a reason why we would start with
ID 0x0100 and try to look up 01:01.0 from it.  And of course, if you
*did* try to look up the device, there could be several of them.

Bjorn


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-26 Thread Bjorn Helgaas
On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
> On Wed, 2013-07-24 at 17:24 -0600, Bjorn Helgaas wrote:
> > On Wed, Jul 24, 2013 at 12:12:28PM -0600, Alex Williamson wrote:
> > > On Wed, 2013-07-24 at 10:47 -0600, Bjorn Helgaas wrote:
> > > > On Tue, Jul 23, 2013 at 5:21 PM, Alex Williamson wrote:
> > > > > On Tue, 2013-07-23 at 16:35 -0600, Bjorn Helgaas wrote:
> > > > >> On Thu, Jul 11, 2013 at 03:03:27PM -0600, Alex Williamson wrote:
> > > >   As the DMA
> > > > transaction propagates through the fabric, it may be tagged by bridges
> > > > with different requester IDs.
> > > > 
> > > > The requester IDs are needed outside PCI (by IOMMU drivers), but I'm
> > > > not sure the intermediate pci_devs are.
> > > 
> > > A u16 requester ID doesn't mean much on its own though, it's not
> > > necessarily even unique.  A requester ID associated with the context of
> > > a pci_dev is unique and gives us a reference point if we need to perform
> > > another operation on that requester ID.
> > 
> > A u16 requester ID better mean something to an IOMMU -- it's all the
> > IOMMU can use to look up the correct mapping.  That's why we have to
> > give the iterator something to define the scope to iterate over.  The
> > same requester ID could mean something totally unrelated in a
> > different scope, e.g., below a different IOMMU.
> 
> The point I'm trying to make is that a requester ID depends on its
> context (minimally, the PCI segment).  The caller can assume the context
> based on the calling parameters or we can provide context in the form of
> an associated pci_dev.  I chose the latter path because I prefer
> explicit interfaces and it has some usefulness in the intel-iommu
> implementation.
> 
> For instance, get_domain_for_dev() first looks to see if a pci_dev
> already has a domain.  If it doesn't, we want to look to see if there's
> an upstream device that would use the same requester ID that already has
> a domain.  If our get-requester-ID-for-device function returns only the
> requester ID, we don't know if that requester ID is the device we're
> searching from or some upstream device.  Therefore we potentially do an
> unnecessary search for the domain.
> 
> The other user is intel_iommu_add_device() where we're trying to build
> IOMMU groups.  Visibility is the first requirement of an IOMMU group.
> If the IOMMU cannot distinguish between devices, they must be part of
> the same IOMMU group.  Here we want to find the pci_dev that hosts the
> requester ID.  I don't even know how we'd implement this with a function
> that only returned the requester ID.  Perhaps we'd have to walk upstream
> from the device calling the get-requester-ID-for-device function at each
> step and noticing when it changed.  That moves significant logic back
> into the caller code.
> ...

> > I don't see the point of passing a "device closest to the requester
> > ID."  What would the IOMMU do with that?  As far as the IOMMU is
> > concerned, the requester ID could be an arbitrary number completely
> > unrelated to a pci_dev.
> 
> Do you have an example of a case where a requester ID doesn't have some
> association to a pci_dev?

I think our confusion here is the same as what Don & I have been
hashing out -- I'm saying a requester ID fabricated by a bridge
need not correspond to a specific pci_dev, and you probably mean
that every requester ID is by definition the result of *some* PCI
device making a DMA request.

> ...
> Furthermore, if we have:
> 
>  -- A
> /
> X--Y
> \
>  -- B
> ... 

> Let me go back to the X-Y-A|B example above to see if I can explain why
> pcie_for_each_requester_id() doesn't make sense to me.  Generally a
> for_each_foo function should iterate across all things with the same
> foo.  So a for_each_requester_id should iterate across all things with
> the same requester ID.  

Hm, that's not the way I think of for_each_FOO() interfaces.  I
think of it as "execute the body (or callback) for every possible
FOO", where FOO is different each time.  for_each_pci_dev(),
pci_bus_for_each_resource(), for_each_zone(), for_each_cpu(), etc.,
work like that.

But the more important question is what arguments we give to the
callback.  My proposal was to map

  {pci_dev -> {requester-ID-A, requester-ID-B, ...}}

Yours is to map

  {pci_dev -> {{pci_dev-A, requester-ID-A}, {pci_dev-B, requester-ID-B}, ...}}

i.e., your callback gets both a pci_dev and a requester-ID.  I'm

Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA

2013-07-29 Thread Bjorn Helgaas
On Sun, Jul 28, 2013 at 6:37 PM, Takao Indoh wrote:
> (2013/07/26 2:00), Bjorn Helgaas wrote:
>> On Wed, Jul 24, 2013 at 12:29 AM, Takao Indoh wrote:
>>> Sorry for letting this discussion slide, I was busy on other works:-(
>>> Anyway, the summary of previous discussion is:
>>> - My patch adds new initcall(fs_initcall) to reset all PCIe endpoints on
>>>boot. This expects PCI enumeration is done before IOMMU
>>>initialization as follows.
>>>  (1) PCI enumeration
>>>  (2) fs_initcall ---> device reset
>>>  (3) IOMMU initialization
>>> - This works on x86, but does not work on other architecture because
>>>IOMMU is initialized before PCI enumeration on some architectures. So,
>>>device reset should be done where IOMMU is initialized instead of
>>>initcall.
>>> - Or, as another idea, we can reset devices in first kernel(panic kernel)
>>>
>>> Resetting devices in panic kernel is against kdump policy and seems not to
>>> be good idea. So I think adding reset code into iommu initialization is
>>> better. I'll post patches for that.
>>
>> Of course nobody *wants* to do anything in the panic kernel.  But
>> simply saying "it's against kdump policy and seems not to be a good
>> idea" is not a technical argument.  There are things that are
>> impractical to do in the kdump kernel, so they have to be done in the
>> panic kernel even though we know the kernel is unreliable and the
>> attempt may fail.
>
> Accessing kernel data in panic kernel causes panic again, so
> - Don't touch kernel data in panic situation
> - Jump to kdump kernel as quickly as possible, and do things in safe
>   kernel
> These are basic "kdump policy". Of course if there are any works which
> we cannot do in kdump kernel and can do only in panic kernel, for
> example saving registers or stopping cpus, we should do them in panic
> kernel.
>
> Resetting devices in panic kernel is worth considering if we can safely
> find pci_dev and reset it, but I have no idea how to do that because
> for example struct pci_dev may be broken.

Nobody can guarantee that the panic kernel can do *anything* safely
because any arbitrary kernel data or text may be corrupted.  But if
you consider any specific data structure, e.g., CPU or PCI device
lists, it's not very likely that it will be corrupted.

>> My point about IOMMU and PCI initialization order doesn't go away just
>> because it doesn't fit "kdump policy."  Having system initialization
>> occur in a logical order is far more important than making kdump work.
>
> My next plan is as follows. I think this is matched to logical order
> on boot.
>
> drivers/pci/pci.c
> - Add function to reset bus, for example, pci_reset_bus(struct pci_bus *bus)
>
> drivers/iommu/intel-iommu.c
> - On initialization, if IOMMU is already enabled, call this bus reset
>   function before disabling and re-enabling IOMMU.

I raised this issue because of arches like sparc that enumerate the
IOMMU before the PCI devices that use it.  In that situation, I think
you're proposing this:

  panic kernel
    enable IOMMU
    panic
  kdump kernel
    initialize IOMMU (already enabled)
      pci_reset_bus
      disable IOMMU
      enable IOMMU
    enumerate PCI devices

But the problem is that when you call pci_reset_bus(), you haven't
enumerated the PCI devices, so you don't know what to reset.

Bjorn


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-07-29 Thread Bjorn Helgaas
On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson wrote:
> On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
>> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:

>> The "most closely associated device" idea seems confusing to
>> describe and think about.  I think you want to use it as part of
>> grouping devices into domains.  But I think we should attack that
>> problem separately.  For grouping or isolation questions, we have
>> to pay attention to things like ACS, which are not relevant for
>> mapping.
>
> We are only touching isolation insofar as providing an interface for a
> driver to determine the point in the PCI topology where the requester ID
> is rooted.  Yes, grouping can make use of that, but I object to the idea
> that we're approaching some slippery slope of rolling multiple concepts
> into this patch.  What I'm proposing here is strictly a requester ID
> interface.  I believe that providing a way for a caller to determine
> that two devices have a requester ID rooted at the same point in the PCI
> topology is directly relevant to that interface.
>
> pci_find_upstream_pcie_bridge() attempts to do this same thing.
> Unfortunately, the interface is convoluted, it doesn't handle quirks,
> and it requires a great deal of work by the caller to then walk the
> topology and create requester IDs at each step.  This also indicates
> that at each step, the requester ID is associated with some pci_dev.
> Providing a pci_dev is simply showing our work and providing context to
> the requester ID (ie. here's the requester ID and the step along the
> path from which it was derived.  Here's your top level requester ID and
> the point in the topology where it's based).

What does the driver *do* with the pci_dev?  If it's not necessary,
then showing our work and providing context just complicates the
interface and creates opportunities for mistakes.  If we're creating
IOMMU mappings, only the requester ID is needed.

I think you used get_domain_for_dev() and intel_iommu_add_device() as
examples, but as far as I can tell, they use the pci_dev as a way to
learn about isolation.  For that purpose, I don't think you want an
iterator -- you only want to know about the single pci_dev that's the
root of the isolation domain, and requester IDs really aren't
relevant.

>> > If we look at an endpoint device like A, only A
>> > has A's requester ID.  Therefore, why would for_each_requester_id(A)
>> > traverse up to Y?
>>
>> Even if A is a PCIe device, we have to traverse upwards to find any
>> bridges that might drop A's requester ID or take ownership, e.g., if
>> we have this:
>>
>>   00:1c.0 PCIe-to-PCI bridge to [bus 01-02]
>>   01:00.0 PCI-to-PCIe bridge to [bus 02]
>>   02:00.0 PCIe endpoint A
>>
>> the IOMMU has to look for requester-ID 0100.
>
> And I believe this patch handles this case correctly; I mentioned this
> exact example in the v2 RFC cover letter.  This is another example where
> pci_find_upstream_pcie_bridge() will currently fail.

OK, I was just trying to answer your question "why we would need to
traverse up to Y."  But apparently we agree about that.
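
For the PCIe-PCI-PCIe topology quoted above, the upward traversal can be
sketched in user space as follows.  This is only a model under stated
assumptions: struct model_dev, its takes_ownership flag, and
model_visible_requester_id() are hypothetical names, and the rule assumed
is that each ownership-taking bridge on the way up replaces the ID with
(secondary bus, devfn 0), so the topmost such bridge determines what is
seen at the root.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; not the kernel's struct pci_dev. */
struct model_dev {
	const struct model_dev *parent;	/* upstream bridge, NULL on root bus */
	int takes_ownership;		/* nonzero for a PCIe-to-PCI bridge */
	uint8_t bus;			/* bus the device sits on */
	uint8_t devfn;			/* (device << 3) | function */
	uint8_t secondary_bus;		/* bridges only: bus directly below */
};

/*
 * Requester ID seen at the root for DMA that @dev originates, under the
 * simplified rule described in the text above this block.
 */
static uint16_t model_visible_requester_id(const struct model_dev *dev)
{
	uint16_t rid = (uint16_t)((dev->bus << 8) | dev->devfn);
	const struct model_dev *bridge;

	for (bridge = dev->parent; bridge; bridge = bridge->parent) {
		/* A bridge that takes ownership replaces whatever ID arrived. */
		if (bridge->takes_ownership)
			rid = (uint16_t)(bridge->secondary_bus << 8);
	}
	return rid;
}
```

Modelling 00:1c.0 as an ownership-taking bridge to bus 01, 01:00.0 as a
pass-through bridge to bus 02, and A at 02:00.0, the walk starts from A's
own ID 0x0200 and ends with 0x0100, which is the ID the IOMMU has to look
for.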

>> > Y can take ownership and become the requester for A,
>> > but if we were to call for_each_requester_id(Y), wouldn't you expect the
>> > callback to happen on {Y, A, B} since all of those can use that
>> > requester ID?
>>
>> No.  If I call for_each_requester_id(Y), I'm expecting the callback
>> to happen for each requester ID that could be used for transactions
>> originated by *Y*.  I'm trying to make an IOMMU mapping for use by
>> Y, so any devices downstream of Y, e.g., A and B, are irrelevant.
>
> Ok, you think of for_each_requester_id() the same as I think of
> for_each_requester().  Can we split the difference and call it
> pci_dev_for_each_requester_id()?

Sure.

>> I think a lot of the confusion here is because we're trying to solve
>> both two questions at once: (1) what requester-IDs need to be mapped
>> to handle DMA from a device, and (2) what devices can't be isolated
>> from each other and must be in the same domain. ...
>
> Don't we already have this split in the code?
>
> (1) pcie_for_each_requester
> (or pci_dev_for_each_requester_id)
>
> (2) pci_get_visible_pcie_requester
> (or pci_get_visible_pcie_requester_id)
>
> Note however that (2) does not impose anything about domains or
> isolation, it is strictly based on PCI topology.  It's left to the
> caller to determine how that translates to IOMMU domains, but the
> typical case is trivial.


Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA

2013-07-30 Thread Bjorn Helgaas
On Tue, Jul 30, 2013 at 12:09 AM, Takao Indoh wrote:
> (2013/07/29 23:17), Bjorn Helgaas wrote:
>> On Sun, Jul 28, 2013 at 6:37 PM, Takao Indoh wrote:
>>> (2013/07/26 2:00), Bjorn Helgaas wrote:

>>>> My point about IOMMU and PCI initialization order doesn't go away just
>>>> because it doesn't fit "kdump policy."  Having system initialization
>>>> occur in a logical order is far more important than making kdump work.
>>>
>>> My next plan is as follows. I think this is matched to logical order
>>> on boot.
>>>
>>> drivers/pci/pci.c
>>> - Add function to reset bus, for example, pci_reset_bus(struct pci_bus *bus)
>>>
>>> drivers/iommu/intel-iommu.c
>>> - On initialization, if IOMMU is already enabled, call this bus reset
>>>function before disabling and re-enabling IOMMU.
>>
>> I raised this issue because of arches like sparc that enumerate the
>> IOMMU before the PCI devices that use it.  In that situation, I think
>> you're proposing this:
>>
>>   panic kernel
>>     enable IOMMU
>>     panic
>>   kdump kernel
>>     initialize IOMMU (already enabled)
>>       pci_reset_bus
>>       disable IOMMU
>>       enable IOMMU
>>     enumerate PCI devices
>>
>> But the problem is that when you call pci_reset_bus(), you haven't
>> enumerated the PCI devices, so you don't know what to reset.
>
> Right, so my idea is adding reset code into "intel-iommu.c". intel-iommu
> initialization is based on the assumption that enumeration of PCI devices
> is already done. We can find target device from IOMMU page table instead
> of scanning all devices in pci tree.
>
> Therefore, this idea is only for intel-iommu. Other architectures need
> to implement their own reset code.

That's my point.  I'm opposed to adding code to PCI when it only
benefits x86 and we know other arches will need a fundamentally
different design.  I would rather have a design that can work for all
arches.

If your implementation is totally implemented under arch/x86 (or in
intel-iommu.c, I guess), I can't object as much.  However, I think
that eventually even x86 should enumerate the IOMMUs via ACPI before
we enumerate PCI devices.

It's pretty clear that's how BIOS designers expect the OS to work.
For example, sec 8.7.3 of the Intel Virtualization Technology for
Directed I/O spec, rev 1.3, shows the expectation that remapping
hardware (IOMMU) is initialized before discovering the I/O hierarchy
below a hot-added host bridge.  Obviously you're not talking about a
hot-add scenario, but I think the same sequence should apply at
boot-time as well.

Bjorn


Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA

2013-07-31 Thread Bjorn Helgaas
[+cc Rafael, linux-acpi]

On Tue, Jul 30, 2013 at 6:35 PM, Takao Indoh wrote:

> On x86, currently IOMMU initialization run *after* PCI enumeration, but
> what you are talking about is that it should be changed so that x86
> IOMMU initialization is done *before* PCI enumeration like sparc, right?

Yes.  I don't know whether or when that initialization order will ever
be changed, but I do think we should avoid building more
infrastructure that depends on the current order.

Changing the order is a pretty big deal because it's a lot more than
just the IOMMU.  Basically I think we should be enumerating ACPI
devices, including the IOMMU, before PCI devices, but there's a lot of
legacy involved in that area.  Added Rafael in case he has any
thoughts.

> Hmm, ok, I think I need to post attached patch to iommu list and
> discuss it including current order of x86 IOMMU initialization.


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2013-08-01 Thread Bjorn Helgaas
On Mon, Jul 29, 2013 at 4:32 PM, Alex Williamson wrote:
> On Mon, 2013-07-29 at 15:02 -0600, Bjorn Helgaas wrote:
>> On Mon, Jul 29, 2013 at 10:06 AM, Alex Williamson wrote:
>> > On Fri, 2013-07-26 at 15:54 -0600, Bjorn Helgaas wrote:
>> >> On Thu, Jul 25, 2013 at 11:56:56AM -0600, Alex Williamson wrote:
>>
>> >> The "most closely associated device" idea seems confusing to
>> >> describe and think about.  I think you want to use it as part of
>> >> grouping devices into domains.  But I think we should attack that
>> >> problem separately.  For grouping or isolation questions, we have
>> >> to pay attention to things like ACS, which are not relevant for
>> >> mapping.
>> >
>> > We are only touching isolation insofar as providing an interface for a
>> > driver to determine the point in the PCI topology where the requester ID
>> > is rooted.  Yes, grouping can make use of that, but I object to the idea
>> > that we're approaching some slippery slope of rolling multiple concepts
>> > into this patch.  What I'm proposing here is strictly a requester ID
>> > interface.  I believe that providing a way for a caller to determine
>> > that two devices have a requester ID rooted at the same point in the PCI
>> > topology is directly relevant to that interface.
>> >
>> > pci_find_upstream_pcie_bridge() attempts to do this same thing.
>> > Unfortunately, the interface is convoluted, it doesn't handle quirks,
>> > and it requires a great deal of work by the caller to then walk the
>> > topology and create requester IDs at each step.  This also indicates
>> > that at each step, the requester ID is associated with some pci_dev.
>> > Providing a pci_dev is simply showing our work and providing context to
>> > the requester ID (ie. here's the requester ID and the step along the
>> > path from which it was derived.  Here's your top level requester ID and
>> > the point in the topology where it's based).
>>
>> What does the driver *do* with the pci_dev?  If it's not necessary,
>> then showing our work and providing context just complicates the
>> interface and creates opportunities for mistakes.  If we're creating
>> IOMMU mappings, only the requester ID is needed.
>
> It's true, I don't have a use for the pci_dev in
> pci_dev_for_each_requester_id() today.  But I think providing the
> context for a requester ID is valuable information.  Otherwise we have
> to make assumptions about the requester ID.  For example, if I have
> devices in different PCI segments with the same requester ID the
> callback function only knows that those are actually different requester
> IDs from information the caller provides itself in the opaque pointer.
> This is the case with intel-iommu, but why would we design an API that
> requires the caller to provide that kind of context?

The caller already has to provide context, e.g., the domain in which
to create a mapping, anyway via the opaque pointer.  So I would argue
that it's pointless to supply context twice in different ways.

We only have one caller of pci_dev_for_each_requester_id() anyway
(intel-iommu.c).  That seems really strange to me.  All I can assume
is that other IOMMU drivers *should* be doing something like this, but
aren't yet.  Anyway, since we only have one user, why not just provide
the minimum (only the requester ID), and add the pci_dev later if a
requirement for it turns up?
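
A minimal userspace sketch of the narrower interface argued for above, where
the callback receives only the requester ID and whatever context the caller
chose to pass. Every name here (requester_id_fn, map_ctx,
for_each_requester_id, record_requester_id) is hypothetical and is not the
signature from the RFC patch; the ID list stands in for the topology walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: sees only the requester ID and the caller's context. */
typedef int (*requester_id_fn)(uint16_t requester_id, void *data);

/* Hypothetical caller context; an IOMMU driver would carry its domain here. */
struct map_ctx {
	int segment;		/* PCI segment, already known to the caller */
	int visited;		/* how many requester IDs were seen */
	uint16_t last_id;	/* most recent requester ID */
};

/*
 * Hypothetical iterator over a precomputed list of requester IDs.
 * Stops and returns the callback's value on the first nonzero return.
 */
int for_each_requester_id(const uint16_t *ids, size_t n,
			  requester_id_fn fn, void *data)
{
	assert(fn != NULL);

	for (size_t i = 0; i < n; i++) {
		int ret = fn(ids[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: record each ID in the caller-supplied context. */
int record_requester_id(uint16_t requester_id, void *data)
{
	struct map_ctx *ctx = data;

	ctx->visited++;
	ctx->last_id = requester_id;
	return 0;
}
```

The segment lives in the caller's context, so the callback needs no pci_dev
to tell apart equal requester IDs from different segments.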

>  I also provide the
> pci_dev because I think both the pci_dev_for_each_requester_id() and
> pci_get_visible_pcie_requester() should provide similar APIs.  There's
> an association of a requester ID to a pci_dev.  Why hide that?

Information hiding is a basic idea of abstraction and encapsulation.
If we don't hide unnecessary information, we end up with unnecessary
dependencies.

>> I think you used get_domain_for_dev() and intel_iommu_add_device() as
>> examples, but as far as I can tell, they use the pci_dev as a way to
>> learn about isolation.  For that purpose, I don't think you want an
>> iterator -- you only want to know about the single pci_dev that's the
>> root of the isolation domain, and requester IDs really aren't
>> relevant.
>
> See get_domain_for_dev() in patch 2/2.  It uses the returned pci_dev to
> know whether the requester ID is rooted upstream or at the device
> itself.  If upstream, it then uses the requester ID to search for an
> existing domain.  The requester ID is relevant here.  If the returned
> pci_dev is the device itself, it proceeds to allocate a domain because

Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-09-24 Thread Bjorn Helgaas
On Thu, Sep 19, 2013 at 12:59:17PM +0530, Bharat Bhushan wrote:
> This patch adds interface to get following information
>   - Number of MSI regions (which is number of MSI banks for powerpc).
>   - Get the region address range: Physical page which have the
>  address/addresses used for generating MSI interrupt
>  and size of the page.
> 
> These are required to create IOMMU (Freescale PAMU) mapping for
> devices which are directly assigned using VFIO.
> 
> Signed-off-by: Bharat Bhushan 
> ---
>  arch/powerpc/include/asm/machdep.h |8 +++
>  arch/powerpc/include/asm/pci.h |2 +
>  arch/powerpc/kernel/msi.c  |   18 
>  arch/powerpc/sysdev/fsl_msi.c  |   39 +--
>  arch/powerpc/sysdev/fsl_msi.h  |   11 -
>  drivers/pci/msi.c  |   26 
>  include/linux/msi.h|8 +++
>  include/linux/pci.h|   13 
>  8 files changed, 120 insertions(+), 5 deletions(-)
> 
> ...

> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index aca7578..6d85c15 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -30,6 +30,20 @@ static int pci_msi_enable = 1;
>  
>  /* Arch hooks */
>  
> +#ifndef arch_msi_get_region_count
> +int arch_msi_get_region_count(void)
> +{
> + return 0;
> +}
> +#endif
> +
> +#ifndef arch_msi_get_region
> +int arch_msi_get_region(int region_num, struct msi_region *region)
> +{
> + return 0;
> +}
> +#endif

This #define strategy is gone; see 4287d824 ("PCI: use weak functions for
MSI arch-specific functions").  Please use the weak function strategy
for your new MSI region functions.
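
Roughly, the weak-function shape would look like the sketch below. This is
a userspace approximation, not the kernel code: WEAK_DEFAULT stands in for
the kernel's __weak annotation, and struct msi_region is a simplified copy
of the one in the patch (uint64_t in place of dma_addr_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's __weak, so the sketch builds with GCC or Clang. */
#define WEAK_DEFAULT __attribute__((weak))

/* Simplified stand-in for the struct msi_region proposed in the patch. */
struct msi_region {
	int region_num;
	uint64_t addr;
	size_t size;
};

/* Generic defaults: report no MSI regions unless an arch overrides them. */
WEAK_DEFAULT int arch_msi_get_region_count(void)
{
	return 0;
}

WEAK_DEFAULT int arch_msi_get_region(int region_num, struct msi_region *region)
{
	(void)region_num;
	(void)region;
	return 0;
}

/* Generic wrappers, shaped like the ones in the patch. */
int msi_get_region_count(void)
{
	return arch_msi_get_region_count();
}

int msi_get_region(int region_num, struct msi_region *region)
{
	assert(region != NULL);
	return arch_msi_get_region(region_num, region);
}
```

An arch that has MSI regions would supply ordinary (strong) definitions of
the two arch_* functions in its own file, and the linker would pick those
over the weak defaults without any #ifndef in the generic code.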

> +
>  #ifndef arch_msi_check_device
>  int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
>  {
> @@ -903,6 +917,18 @@ void pci_disable_msi(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL(pci_disable_msi);
>  
> +int msi_get_region_count(void)
> +{
> + return arch_msi_get_region_count();
> +}
> +EXPORT_SYMBOL(msi_get_region_count);
> +
> +int msi_get_region(int region_num, struct msi_region *region)
> +{
> + return arch_msi_get_region(region_num, region);
> +}
> +EXPORT_SYMBOL(msi_get_region);

Please split these interface additions, i.e., the drivers/pci/msi.c,
include/linux/msi.h, and include/linux/pci.h changes, into a separate
patch.

I don't know enough about VFIO to understand why these new interfaces
are needed.  Is this the first VFIO IOMMU driver?  I see
vfio_iommu_spapr_tce.c and vfio_iommu_type1.c but I don't know if
they're comparable to the Freescale PAMU.  Do other VFIO IOMMU
implementations support MSI?  If so, do they handle the problem of
mapping the MSI regions in a different way?

>  /**
>   * pci_msix_table_size - return the number of device's MSI-X table entries
>   * @dev: pointer to the pci_dev data structure of MSI-X device function
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index ee66f3a..ae32601 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -50,6 +50,12 @@ struct msi_desc {
>   struct kobject kobj;
>  };
>  
> +struct msi_region {
> + int region_num;
> + dma_addr_t addr;
> + size_t size;
> +};

This needs some sort of explanatory comment.

>  /*
>   * The arch hook for setup up msi irqs
>   */
> @@ -58,5 +64,7 @@ void arch_teardown_msi_irq(unsigned int irq);
>  int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void arch_teardown_msi_irqs(struct pci_dev *dev);
>  int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
> +int arch_msi_get_region_count(void);
> +int arch_msi_get_region(int region_num, struct msi_region *region);
>  
>  #endif /* LINUX_MSI_H */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 186540d..2b26a59 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1126,6 +1126,7 @@ struct msix_entry {
>   u16 entry;  /* driver uses to specify entry, OS writes */
>  };
>  
> +struct msi_region;
>  
>  #ifndef CONFIG_PCI_MSI
>  static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int 
> nvec)
> @@ -1168,6 +1169,16 @@ static inline int pci_msi_enabled(void)
>  {
>   return 0;
>  }
> +
> +static inline int msi_get_region_count(void)
> +{
> + return 0;
> +}
> +
> +static inline int msi_get_region(int region_num, struct msi_region *region)
> +{
> + return 0;
> +}
>  #else
>  int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
>  int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
> @@ -1180,6 +1191,8 @@ void pci_disable_msix(struct pci_dev *dev);
>  void msi_remove_pci_irq_vectors(struct pci_dev *dev);
>  void pci_restore_msi_state(struct pci_dev *dev);
>  int pci_msi_enabled(void);
> +int msi_get_region_count(void);
> +int msi_get_region(int region_num, struct msi_region *region);
>  #endif
>  
>  #ifdef CONFIG_PCIEPORTBUS
> -- 
> 1.7.0.4
> 
> 

Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Bjorn Helgaas
On Thu, Oct 3, 2013 at 11:19 PM, Bhushan Bharat-R65777 wrote:

>> I don't know enough about VFIO to understand why these new interfaces are
>> needed.  Is this the first VFIO IOMMU driver?  I see vfio_iommu_spapr_tce.c
>> and vfio_iommu_type1.c but I don't know if they're comparable to the
>> Freescale PAMU.  Do other VFIO IOMMU implementations support MSI?  If so,
>> do they handle the problem of mapping the MSI regions in a different way?
>
> PAMU is an aperture type of IOMMU while the others are paging type, so they
> are completely different from PAMU and handle this differently.

This is not an explanation or a justification for adding new
interfaces.  I still have no idea what an "aperture type IOMMU" is,
other than that it is "different."  But I see that Alex is working on
this issue with you in a different thread, so I'm sure you guys will
sort it out.

Bjorn


Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Bjorn Helgaas
>> - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
>> + dma_addr_t msiir; /* MSIIR Address in CCSR */
>
> Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies
> that it's the output of the DMA API, but I don't think the DMA API is
> used in the MSI driver.  Perhaps it should be, but we still want the raw
> physical address to pass on to VFIO.

I don't know what "msiir" is used for, but if it's an address you
program into a PCI device, then it's a dma_addr_t even if you didn't
get it from the DMA API.  Maybe "bus_addr_t" would have been a more
suggestive name than "dma_addr_t".  That said, I have no idea how this
relates to VFIO.

Bjorn


Re: [PATCH 1/3] iommu/fsl: Factor out PCI specific code.

2013-10-14 Thread Bjorn Helgaas
On Sun, Oct 13, 2013 at 02:02:32AM +0530, Varun Sethi wrote:
> Factor out PCI specific code in the PAMU driver.
> 
> Signed-off-by: Varun Sethi 
> ---
>  drivers/iommu/fsl_pamu_domain.c |   81 
> +++
>  1 file changed, 40 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
> index c857c30..e02e1de 100644
> --- a/drivers/iommu/fsl_pamu_domain.c
> +++ b/drivers/iommu/fsl_pamu_domain.c
> @@ -677,13 +677,9 @@ static int handle_attach_device(struct fsl_dma_domain *dma_domain,
>   return ret;
>  }
>  
> -static int fsl_pamu_attach_device(struct iommu_domain *domain,
> -   struct device *dev)
> +static void check_for_pci_dma_device(struct device **dev)

"check_for_pci_dma_device()" doesn't give a good clue about what the
function returns.  And why return something via a reference parameter
when you could return it directly?

>  {
> - struct fsl_dma_domain *dma_domain = domain->priv;
> - const u32 *liodn;
> - u32 liodn_cnt;
> - int len, ret = 0;
> +#ifdef CONFIG_PCI
>   struct pci_dev *pdev = NULL;
>   struct pci_controller *pci_ctl;

This is sort of a goofy looking function.  It would read much better as
something like this:

  struct device *dma_dev = dev;

  #ifdef CONFIG_PCI
  if (...) {
  dma_dev = ...;
  }
  #endif

  return dma_dev;

Does this need to care about reference counting when you return a pointer
to a different device?
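
Spelled out a little further, the same shape as a compilable userspace
sketch. struct device here is a two-field stand-in, not the kernel
structure, get_dma_device() is a hypothetical name, and reference counting
is deliberately left out because the question above is still open.

```c
#include <assert.h>
#include <stddef.h>

#define CONFIG_PCI 1	/* assumed enabled for this sketch */

/* Minimal stand-in for struct device; not the kernel definition. */
struct device {
	int on_pci_bus;			/* models dev->bus == &pci_bus_type */
	struct device *controller;	/* models pci_ctl->parent */
};

/* Hypothetical name: return the device whose LIODN should be used. */
struct device *get_dma_device(struct device *dev)
{
	struct device *dma_dev = dev;

	assert(dev != NULL);
#ifdef CONFIG_PCI
	/* For a PCI device, use the PCI controller's device instead. */
	if (dev->on_pci_bus && dev->controller)
		dma_dev = dev->controller;
#endif
	return dma_dev;
}
```

The caller then writes "dma_dev = get_dma_device(dev);" and keeps using
"dev" for everything that is not the LIODN lookup.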

Bjorn

>  
> @@ -691,25 +687,38 @@ static int fsl_pamu_attach_device(struct iommu_domain *domain,
>* Use LIODN of the PCI controller while attaching a
>* PCI device.
>*/
> - if (dev->bus == &pci_bus_type) {
> - pdev = to_pci_dev(dev);
> + if ((*dev)->bus == &pci_bus_type) {
> + pdev = to_pci_dev(*dev);
>   pci_ctl = pci_bus_to_host(pdev->bus);
>   /*
>* make dev point to pci controller device
>* so we can get the LIODN programmed by
>* u-boot.
>*/
> - dev = pci_ctl->parent;
> + *dev = pci_ctl->parent;
>   }
> +#endif
> +}
>  
> - liodn = of_get_property(dev->of_node, "fsl,liodn", &len);
> +static int fsl_pamu_attach_device(struct iommu_domain *domain,
> +   struct device *dev)
> +{
> + struct fsl_dma_domain *dma_domain = domain->priv;
> + struct device *dma_dev = dev;
> + const u32 *liodn;
> + u32 liodn_cnt;
> + int len, ret = 0;
> +
> + check_for_pci_dma_device(&dma_dev);
> +
> + liodn = of_get_property(dma_dev->of_node, "fsl,liodn", &len);
>   if (liodn) {
>   liodn_cnt = len / sizeof(u32);
>   ret = handle_attach_device(dma_domain, dev,
>liodn, liodn_cnt);
>   } else {
>   pr_debug("missing fsl,liodn property at %s\n",
> -   dev->of_node->full_name);
> +   dma_dev->of_node->full_name);
>   ret = -EINVAL;
>   }
>  
> @@ -720,32 +729,18 @@ static void fsl_pamu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
>   struct fsl_dma_domain *dma_domain = domain->priv;
> + struct device *dma_dev = dev;
>   const u32 *prop;
>   int len;
> - struct pci_dev *pdev = NULL;
> - struct pci_controller *pci_ctl;
>  
> - /*
> -  * Use LIODN of the PCI controller while detaching a
> -  * PCI device.
> -  */
> - if (dev->bus == &pci_bus_type) {
> - pdev = to_pci_dev(dev);
> - pci_ctl = pci_bus_to_host(pdev->bus);
> - /*
> -  * make dev point to pci controller device
> -  * so we can get the LIODN programmed by
> -  * u-boot.
> -  */
> - dev = pci_ctl->parent;
> - }
> + check_for_pci_dma_device(&dma_dev);
>  
> - prop = of_get_property(dev->of_node, "fsl,liodn", &len);
> + prop = of_get_property(dma_dev->of_node, "fsl,liodn", &len);
>   if (prop)
>   detach_device(dev, dma_domain);
>   else
>   pr_debug("missing fsl,liodn property at %s\n",
> -   dev->of_node->full_name);
> +   dma_dev->of_node->full_name);
>  }
>  
>  static  int configure_domain_geometry(struct iommu_domain *domain, void 
> *data)
> @@ -905,6 +900,7 @@ static struct iommu_group *get_device_iommu_group(struct 
> device *dev)
>   return group;
>  }
>  
> +#ifdef CONFIG_PCI
>  static  bool check_pci_ctl_endpt_part(struct pci_controller *pci_ctl)
>  {
>   u32 version;
> @@ -945,13 +941,18 @@ static struct iommu_group 
> *get_shared_pci_device_group(struct pci_dev *pdev)
>   return NULL;
>  }
>  
> -static struct iommu_group *get_pci_device_group(struct pci_dev *pdev)
> +static struct iomm

Re: [PATCH 1/1] IOMMU: Save pci device id instead of pci_dev* pointer for DMAR devices

2013-11-07 Thread Bjorn Helgaas
On Tue, Nov 05, 2013 at 04:24:58PM +0800, Yijing Wang wrote:
> Currently, the DMAR driver saves pointers to the target PCI devices for
> drhd/rmrr/atsr in a (pci_dev *) array. This is not safe, because PCI
> devices may be hot-added or removed while the system is running, and they
> will then have a new pci_dev pointer. So if there are two or more IOMMUs
> in the system, these devices will find the wrong drhd during DMA mapping,
> and DMAR faults will occur. This patch saves the PCI device ID instead of
> the (pci_dev *) to fix this issue. Because the DMAR table only provides
> the PCI device IDs under a specific IOMMU, there is no reason to bind the
> IOMMU to the (pci_dev *). In addition, the device IDs for each IOMMU are
> kept in a list, so the list helpers can be used to manage them.
> 
> After removing and rescanning a PCI device:
> [  611.857095] dmar: DRHD: handling fault status reg 2
> [  611.857109] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr 7000
> [  611.857109] DMAR:[fault reason 02] Present bit in context entry is clear
> [  611.857524] dmar: DRHD: handling fault status reg 102
> [  611.857534] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr 6000
> [  611.857534] DMAR:[fault reason 02] Present bit in context entry is clear
> [  611.857936] dmar: DRHD: handling fault status reg 202
> [  611.857947] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr 5000
> [  611.857947] DMAR:[fault reason 02] Present bit in context entry is clear
> [  611.858351] dmar: DRHD: handling fault status reg 302
> [  611.858362] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr 4000
> [  611.858362] DMAR:[fault reason 02] Present bit in context entry is clear
> [  611.860819] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
> [  611.860983] dmar: DRHD: handling fault status reg 402
> [  611.860995] dmar: INTR-REMAP: Request device [[86:00.3] fault index a4
> [  611.860995] INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
> 
> Signed-off-by: Yijing Wang 
> ---
>  drivers/iommu/dmar.c|   93 +-
>  drivers/iommu/intel-iommu.c |  155 
> ---
>  include/linux/dmar.h|   20 --
>  3 files changed, 159 insertions(+), 109 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 785675a..9aa65a3 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -65,12 +65,13 @@ static void __init dmar_register_drhd_unit(struct 
> dmar_drhd_unit *drhd)
>  }
>  
>  static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope 
> *scope,
> -struct pci_dev **dev, u16 segment)
> + u16 segment, struct list_head *head)
>  {
>   struct pci_bus *bus;
>   struct pci_dev *pdev = NULL;
>   struct acpi_dmar_pci_path *path;
>   int count;
> + struct dmar_device *dmar_dev;
>  
>   bus = pci_find_bus(segment, scope->bus);
>   path = (struct acpi_dmar_pci_path *)(scope + 1);
> @@ -100,7 +101,6 @@ static int __init dmar_parse_one_dev_scope(struct 
> acpi_dmar_device_scope *scope,
>   if (!pdev) {
>   pr_warn("Device scope device [%04x:%02x:%02x.%02x] not found\n",
>   segment, scope->bus, path->dev, path->fn);
> - *dev = NULL;
>   return 0;
>   }
>   if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT && \
> @@ -111,54 +111,39 @@ static int __init dmar_parse_one_dev_scope(struct 
> acpi_dmar_device_scope *scope,
>   pci_name(pdev));
>   return -EINVAL;
>   }
> - *dev = pdev;
> +
> + dmar_dev = kzalloc(sizeof(struct dmar_device), GFP_KERNEL);
> + if (!dmar_dev) {
> + pci_dev_put(pdev);
> + return -ENOMEM;
> + }
> +
> + dmar_dev->segment = segment;
> + dmar_dev->bus = pdev->bus->number;
> + dmar_dev->devfn = pdev->devfn;
> + list_add_tail(&dmar_dev->list, head);
> +
> + pci_dev_put(pdev);
>   return 0;
>  }
>  
> -int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
> - struct pci_dev ***devices, u16 segment)
> +int __init dmar_parse_dev_scope(void *start, void *end, u16 segment, 
> + struct list_head *head)
>  {
>   struct acpi_dmar_device_scope *scope;
> - void * tmp = start;
> - int index;
>   int ret;
>  
> - *cnt = 0;
> - while (start < end) {
> - scope = start;
> - if (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT ||
> - scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE)
> - (*cnt)++;
> - else if (scope->entry_type != ACPI_DMAR_SCOPE_TYPE_IOAPIC &&
> - scope->entry_type != ACPI_DMAR_SCOPE_TYPE_HPET) {
> - pr_warn("Unsupported device scope\n");
> - }
> - start += scope->length;
> - }
> - if (*cnt == 0)
> - 

Re: [PATCH 1/1] IOMMU: Save pci device id instead of pci_dev* pointer for DMAR devices

2013-11-08 Thread Bjorn Helgaas
On Thu, Nov 7, 2013 at 8:40 PM, Yijing Wang wrote:
> HI Bjorn,
>Thanks for your review and comments very much!
>
>>> +list_for_each_entry(dmar_dev, head, list)
>>> +if (dmar_dev->segment == pci_domain_nr(dev->bus)
>>> +&& dmar_dev->bus == dev->bus->number
>>> +&& dmar_dev->devfn == dev->devfn)
>>> +return 1;
>>> +
>>>  /* Check our parent */
>>>  dev = dev->bus->self;
>>
>> You didn't change this, but it looks like this may have the same problem
>> we've been talking about here:
>>
>> http://lkml.kernel.org/r/20131105232903.3790.8738.st...@bhelgaas-glaptop.roam.corp.google.com
>>
>> Namely, if "dev" is a VF on a virtual bus, "dev->bus->self == NULL", so
>> we won't search for any of the bridges leading to the VF.  I proposed a
>> pci_upstream_bridge() interface that could be used like this:
>>
>>   /* Check our parent */
>>   dev = pci_upstream_bridge(dev);
>>
>
> It looks good to me. Because pci_upstream_bridge() is still in your next
> branch, I think I can split this change into a separate patch after
> 3.13-rc1.

Yep, that would be a fix for a separate issue and should be a separate patch.
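
For reference, a userspace model of the walk I have in mind. The mock_*
structures are simplified stand-ins for struct pci_bus and struct pci_dev,
and mock_upstream_bridge() only approximates what pci_upstream_bridge() is
meant to do: for a VF, start from its PF, then return the bridge leading to
that device's bus, or NULL on a root bus.

```c
#include <assert.h>
#include <stddef.h>

struct mock_pci_dev;

/* Simplified stand-in for struct pci_bus. */
struct mock_pci_bus {
	struct mock_pci_bus *parent;	/* NULL for a root bus */
	struct mock_pci_dev *self;	/* bridge leading here; NULL for a virtual bus */
};

/* Simplified stand-in for struct pci_dev. */
struct mock_pci_dev {
	struct mock_pci_bus *bus;
	struct mock_pci_dev *physfn;	/* for a VF: its PF; otherwise NULL */
};

/* Return the bridge upstream of dev, or NULL if dev is on a root bus. */
struct mock_pci_dev *mock_upstream_bridge(struct mock_pci_dev *dev)
{
	assert(dev != NULL);

	/* A VF may sit on a virtual bus with no bridge, so start from its PF. */
	if (dev->physfn)
		dev = dev->physfn;
	if (dev->bus->parent == NULL)
		return NULL;
	return dev->bus->self;
}
```

A plain "dev = dev->bus->self" on a VF that sits on a virtual bus yields
NULL immediately, which is how the bridges above the VF get skipped.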

>>>  static struct intel_iommu *device_to_iommu(int segment, u8 bus, u8 devfn)
>>>  {
>>>  struct dmar_drhd_unit *drhd = NULL;
>>> -int i;
>>> +struct dmar_device *dmar_dev;
>>> +struct pci_dev *pdev;
>>>
>>>  for_each_drhd_unit(drhd) {
>>>  if (drhd->ignored)
>>> @@ -658,16 +659,22 @@ static struct intel_iommu *device_to_iommu(int 
>>> segment, u8 bus, u8 devfn)
>>>  if (segment != drhd->segment)
>>>  continue;
>>>
>>> -for (i = 0; i < drhd->devices_cnt; i++) {
>>> -if (drhd->devices[i] &&
>>> -drhd->devices[i]->bus->number == bus &&
>>> -drhd->devices[i]->devfn == devfn)
>>> -return drhd->iommu;
>>> -if (drhd->devices[i] &&
>>> -drhd->devices[i]->subordinate &&
>>> -drhd->devices[i]->subordinate->number <= bus &&
>>> -drhd->devices[i]->subordinate->busn_res.end >= bus)
>>> -return drhd->iommu;
>>> +list_for_each_entry(dmar_dev, &drhd->head, list) {
>>> +if (dmar_dev->bus == bus &&
>>> +dmar_dev->devfn == devfn)
>>> +return drhd->iommu;
>>> +
>>> +pdev = pci_get_domain_bus_and_slot(dmar_dev->segment,
>>> +dmar_dev->bus, dmar_dev->devfn);
>>> +if (pdev->subordinate &&
>>> +pdev->subordinate->number <= bus &&
>>> +pdev->subordinate->busn_res.end >= bus) {
>>> +pci_dev_put(pdev);
>>> +return drhd->iommu;
>>
>> I don't know the details of how device_to_iommu() is used, but this
>> style (acquire ref to pci_dev, match it to some other object, drop
>> pci_dev ref, return object) makes me nervous.  How do we know the
>> caller isn't depending on pci_dev to remain attached to the object?
>> What happens if the pci_dev disappears when we do the pci_dev_put()
>> here?
>
> Hmmm, this is the thing I am most worried about. If the (pci_dev *)
> pointers in the drhd->devices array are only used for identification,
> then replacing them with the PCI device ID (segment:bus:devfn) is safe.
> Otherwise, this is the wrong way to fix the issue. I don't know the IOMMU
> driver well yet, so comments from the IOMMU folks are welcome.
>
> If this is not safe, what about saving both the PCI device ID and the
> (pci_dev *) pointer in the drhd? Then we can drop the pci_dev reference
> and set the pointer to NULL when the bus notifier reports a device
> removal, and update the (pci_dev *) pointer when a device is added.

I don't know the IOMMU drivers well either, but it seems like they
rely on notifications of device addition and removal (see
iommu_bus_notifier()).  It doesn't seem right for them to also use the
generic PCI interfaces like pci_get_domain_bus_and_slot() because the
IOMMU driver should already know what devices exist and their
lifetimes.  It seems like confusion to mix the two.  But I don't have
a concrete suggestion.
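
Only to illustrate the lifetime concern, not as a proposed design, here is
a userspace model of the "look up, test, drop the reference" pattern. The
mock_dev structure and helpers are invented for this sketch. It makes two
points: a lookup can come back empty once a device has been removed (the
visible part of the quoted hunk dereferences pdev without checking), and
the reference has to be dropped on every path, not just the matching one.

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for a reference-counted device with a bus range below it. */
struct mock_dev {
	int refcount;
	int has_subordinate;		/* nonzero if this is a bridge */
	int sub_start, sub_end;		/* bus numbers behind the bridge */
};

/* Take a reference; tolerates NULL the way a failed lookup would return it. */
struct mock_dev *mock_dev_get(struct mock_dev *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Drop a reference taken with mock_dev_get(). */
void mock_dev_put(struct mock_dev *dev)
{
	if (dev) {
		assert(dev->refcount > 0);
		dev->refcount--;
	}
}

/* Return 1 if "bus" lies behind the looked-up bridge, 0 otherwise. */
int bus_is_behind(struct mock_dev *lookup_result, int bus)
{
	struct mock_dev *pdev = mock_dev_get(lookup_result);
	int match = 0;

	if (!pdev)
		return 0;	/* device is gone; nothing to test, nothing to put */

	if (pdev->has_subordinate &&
	    pdev->sub_start <= bus && bus <= pdev->sub_end)
		match = 1;

	mock_dev_put(pdev);	/* balanced on both the match and no-match paths */
	return match;
}
```

Even with the counting balanced, the result is only known to be true for
the moment the reference was held, which is the part that makes me nervous.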

Bjorn


Re: [PATCH 1/9 v2] pci:msi: add weak function for returning msi region info

2013-11-25 Thread Bjorn Helgaas
On Tue, Nov 19, 2013 at 10:47:05AM +0530, Bharat Bhushan wrote:
> For an aperture type of IOMMU (like FSL PAMU), the VFIO IOMMU code needs
> to know the MSI region in order to map its window in hardware. This patch
> only defines the required weak functions; followup patches will use them.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> v1->v2
>  - Added description on "struct msi_region" 
> 
>  drivers/pci/msi.c   |   22 ++
>  include/linux/msi.h |   14 ++
>  2 files changed, 36 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index d5f90d6..2643a29 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -67,6 +67,28 @@ int __weak arch_msi_check_device(struct pci_dev *dev, int 
> nvec, int type)
>   return chip->check_device(chip, dev, nvec, type);
>  }
>  
> +int __weak arch_msi_get_region_count(void)
> +{
> + return 0;
> +}
> +
> +int __weak arch_msi_get_region(int region_num, struct msi_region *region)
> +{
> + return 0;
> +}
> +
> +int msi_get_region_count(void)
> +{
> + return arch_msi_get_region_count();
> +}
> +EXPORT_SYMBOL(msi_get_region_count);
> +
> +int msi_get_region(int region_num, struct msi_region *region)
> +{
> + return arch_msi_get_region(region_num, region);
> +}
> +EXPORT_SYMBOL(msi_get_region);
> +
>  int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  {
>   struct msi_desc *entry;
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index b17ead8..ade1480 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -51,6 +51,18 @@ struct msi_desc {
>  };
>  
>  /*
> + * This structure is used to get
> + * - physical address
> + * - size
> + * of a msi region
> + */
> +struct msi_region {
> + int region_num; /* MSI region number */
> + dma_addr_t addr; /* Address of MSI region */
> + size_t size; /* Size of MSI region */
> +};
> +
> +/*
>   * The arch hooks to setup up msi irqs. Those functions are
>   * implemented as weak symbols so that they /can/ be overriden by
>   * architecture specific code if needed.
> @@ -64,6 +76,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev, int irq);
>  
>  void default_teardown_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev, int irq);
> +int arch_msi_get_region_count(void);
> +int arch_msi_get_region(int region_num, struct msi_region *region);

It doesn't look like any of this (struct msi_region, msi_get_region(),
msi_get_region_count()) is actually used by drivers/pci/msi.c, so I don't
think it needs to be declared in generic code.  It looks like it's only
used in drivers/vfio/vfio_iommu_fsl_pamu.c, where you already know you have
an FSL IOMMU, and you can just call FSL-specific interfaces directly.

Bjorn

>  
>  struct msi_chip {
>   struct module *owner;
> -- 
> 1.7.0.4
> 
> 


Re: warn_slowpath_common in drivers/pci/search.c:44 on linux-3.4.0

2013-12-09 Thread Bjorn Helgaas
[+cc Alex]

On Sun, Dec 8, 2013 at 7:26 AM, Antonio Quartulli wrote:
> On 29/05/12 10:56, Antonio Quartulli wrote:
>> On Tue, May 29, 2012 at 10:35:53 +0200, Joerg Roedel wrote:
>>> Hi Antonio,
>>>
>>> On Sat, May 26, 2012 at 09:09:52AM -0600, Bjorn Helgaas wrote:
>>>> On Sat, May 26, 2012 at 2:25 AM, Antonio Quartulli wrote:
>>>>> [1.054279] WARNING: at drivers/pci/search.c:44 pci_find_upstream_pcie_bridge+0x5e/0x70()
>>>>> [1.054385] Hardware name: Latitude E5420
>>>>> [1.054457] Modules linked in:
>>>>> [1.054568] Pid: 1, comm: swapper/0 Not tainted 3.4.0ordex+ #3
>>>>> [1.054643] Call Trace:
>>>>> [1.054716]  [] warn_slowpath_common+0x7a/0xb0
>>>>> [1.054793]  [] warn_slowpath_null+0x15/0x20
>>>>> [1.054869]  [] pci_find_upstream_pcie_bridge+0x5e/0x70
>>>>> [1.054949]  [] intel_iommu_device_group+0x77/0x100
>>>>> [1.055027]  [] add_iommu_group+0x35/0x60
>>>>> [1.055113]  [] ? bus_set_iommu+0x50/0x50
>>>>> [1.055191]  [] bus_for_each_dev+0x56/0x90
>>>>> [1.055267]  [] bus_set_iommu+0x3b/0x50
>>>>> [1.055344]  [] intel_iommu_init+0xab0/0xb3f
>>>>> [1.055421]  [] ? sys_mkdirat+0x76/0xd0
>>>>> [1.055499]  [] ? memblock_find_dma_reserve+0x13d/0x13d
>>>>> [1.055578]  [] pci_iommu_init+0x13/0x3e
>>>>> [1.055655]  [] do_one_initcall+0x3a/0x170
>>>>> [1.055732]  [] kernel_init+0x148/0x1cc
>>>>> [1.055807]  [] ? do_early_param+0x86/0x86
>>>>> [1.055884]  [] kernel_thread_helper+0x4/0x10
>>>>> [1.055963]  [] ? finish_task_switch+0x80/0x110
>>>>> [1.056040]  [] ? retint_restore_args+0xe/0xe
>>>>> [1.056126]  [] ? start_kernel+0x30b/0x30b
>>>>> [1.056203]  [] ? gs_change+0xb/0xb
>>>
>>> Hmm, this looks like pci_find_upstream_pcie_bridge() found a PCIe device
>>> in the chain which is not a bridge. Can you please post the output of
>>> 'lspci -vvv' and 'lspci -t'?
>>>
>>
>> attached!
>> Some days ago I tried to put some printk in the code and, as far as I can
>> understand, you are right.
>>
>> I hope my lspci output will help!
>>
>> Cheers,
>>
>
> Hi guys,
>
> was there any progress about this?
> Right now I am using linux-3.10.22 and the WARNING is still present.
> The system seems to be working fine, but I don't know if this issue can
> trigger side effects or not.

Alex and I prototyped some IOMMU and PCI restructuring [1] to avoid
this case, but I haven't heard anything lately.

Bjorn

[1] http://lkml.kernel.org/r/20130711210326.1701.56478.st...@bling.home


[PATCH v2 0/4] Remove dead code

2014-02-18 Thread Bjorn Helgaas
This is v2 of my rework of part of Stephen's patch [1].  My v1 posting,
with a little discussion, is here [2].

This removes SR-IOV migration support, which seems to be unused.

Changes since v1:
  - Drop the removal of MMIO exclusivity.
  - Add a few includes of <linux/irqreturn.h>.  The SR-IOV migration
    support included irqreturn.h via linux/pci.h, and a few drivers relied
    on that.  So this v2 series updates those drivers to include
    irqreturn.h directly.

Unless there's objection, I'd like to merge all these through my PCI tree
for v3.15.

Bjorn

[1] http://lkml.kernel.org/r/20131227132710.71906...@nehalam.linuxnetplumber.net
[2] https://lkml.kernel.org/r/20140130192011.25426.45702.st...@bhelgaas-glaptop.roam.corp.google.com

---

Bjorn Helgaas (4):
  misc: mic: Add include of <linux/irqreturn.h>
  mei: Add include of <linux/irqreturn.h>
  iommu/amd: Add include of <linux/irqreturn.h>
  PCI: Remove unused SR-IOV VF Migration support


 Documentation/PCI/pci-iov-howto.txt |    4 -
 drivers/iommu/amd_iommu_types.h     |    1
 drivers/misc/mei/hw-me.h            |    1
 drivers/misc/mic/card/mic_device.h  |    1
 drivers/misc/mic/host/mic_device.h  |    1
 drivers/pci/iov.c                   |  119 ---
 drivers/pci/pci.h                   |    4 -
 include/linux/pci.h                 |    4 -
 8 files changed, 4 insertions(+), 131 deletions(-)


[PATCH v2 1/4] misc: mic: Add include of <linux/irqreturn.h>

2014-02-18 Thread Bjorn Helgaas
We currently include <linux/irqreturn.h> in <linux/pci.h>, but I'm about to
remove that from linux/pci.h, so add explicit includes where needed.

Signed-off-by: Bjorn Helgaas 
---
 drivers/misc/mic/card/mic_device.h |1 +
 drivers/misc/mic/host/mic_device.h |1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/misc/mic/card/mic_device.h b/drivers/misc/mic/card/mic_device.h
index 347b9b3b7916..306f502be95e 100644
--- a/drivers/misc/mic/card/mic_device.h
+++ b/drivers/misc/mic/card/mic_device.h
@@ -29,6 +29,7 @@
 
 #include 
 #include 
+#include <linux/irqreturn.h>
 
 /**
  * struct mic_intr_info - Contains h/w specific interrupt sources info
diff --git a/drivers/misc/mic/host/mic_device.h b/drivers/misc/mic/host/mic_device.h
index 1a6edce2ecde..0398c696d257 100644
--- a/drivers/misc/mic/host/mic_device.h
+++ b/drivers/misc/mic/host/mic_device.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include <linux/irqreturn.h>
 
 #include "mic_intr.h"
 



[PATCH v2 2/4] mei: Add include of <linux/irqreturn.h>

2014-02-18 Thread Bjorn Helgaas
We currently include <linux/irqreturn.h> in <linux/pci.h>, but I'm about to
remove that from linux/pci.h, so add explicit includes where needed.

Signed-off-by: Bjorn Helgaas 
---
 drivers/misc/mei/hw-me.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/misc/mei/hw-me.h b/drivers/misc/mei/hw-me.h
index 80bd829fbd9a..893d5119fa9b 100644
--- a/drivers/misc/mei/hw-me.h
+++ b/drivers/misc/mei/hw-me.h
@@ -20,6 +20,7 @@
 #define _MEI_INTERFACE_H_
 
 #include 
+#include <linux/irqreturn.h>
 #include "mei_dev.h"
 #include "client.h"
 



[PATCH v2 3/4] iommu/amd: Add include of <linux/irqreturn.h>

2014-02-18 Thread Bjorn Helgaas
We currently include <linux/irqreturn.h> in <linux/pci.h>, but I'm about to
remove that from linux/pci.h, so add explicit includes where needed.

Signed-off-by: Bjorn Helgaas 
---
 drivers/iommu/amd_iommu_types.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index e400fbe411de..cff039df056e 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/irqreturn.h>
 
 /*
  * Maximum number of IOMMUs supported



[PATCH v2 4/4] PCI: Remove unused SR-IOV VF Migration support

2014-02-18 Thread Bjorn Helgaas
This reverts commit 74bb1bcc7dbb ("PCI: handle SR-IOV Virtual Function
Migration"), removing this exported interface:

  pci_sriov_migration()

Since pci_sriov_migration() is unused, it is impossible to schedule
sriov_migration_task() or use any of the other migration infrastructure.

This is based on Stephen Hemminger's patch (see link below), but goes a bit
further.

Link: http://lkml.kernel.org/r/20131227132710.71906...@nehalam.linuxnetplumber.net
Signed-off-by: Bjorn Helgaas 
CC: Stephen Hemminger 
---
 Documentation/PCI/pci-iov-howto.txt |    4 -
 drivers/pci/iov.c                   |  119 ---
 drivers/pci/pci.h                   |    4 -
 include/linux/pci.h                 |    4 -
 4 files changed, 131 deletions(-)

diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
index 86551cc72e03..2d91ae251982 100644
--- a/Documentation/PCI/pci-iov-howto.txt
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -68,10 +68,6 @@ To disable SR-IOV capability:
echo  0 > \
 /sys/bus/pci/devices//sriov_numvfs
 
-To notify SR-IOV core of Virtual Function Migration:
-(a) In the driver:
-   irqreturn_t pci_sriov_migration(struct pci_dev *dev);
-
 3.2 Usage example
 
 Following piece of code illustrates the usage of the SR-IOV API.
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 9dce7c5e2a77..de7a74782f92 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -170,97 +170,6 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
pci_dev_put(dev);
 }
 
-static int sriov_migration(struct pci_dev *dev)
-{
-   u16 status;
-   struct pci_sriov *iov = dev->sriov;
-
-   if (!iov->num_VFs)
-   return 0;
-
-   if (!(iov->cap & PCI_SRIOV_CAP_VFM))
-   return 0;
-
-   pci_read_config_word(dev, iov->pos + PCI_SRIOV_STATUS, &status);
-   if (!(status & PCI_SRIOV_STATUS_VFM))
-   return 0;
-
-   schedule_work(&iov->mtask);
-
-   return 1;
-}
-
-static void sriov_migration_task(struct work_struct *work)
-{
-   int i;
-   u8 state;
-   u16 status;
-   struct pci_sriov *iov = container_of(work, struct pci_sriov, mtask);
-
-   for (i = iov->initial_VFs; i < iov->num_VFs; i++) {
-   state = readb(iov->mstate + i);
-   if (state == PCI_SRIOV_VFM_MI) {
-   writeb(PCI_SRIOV_VFM_AV, iov->mstate + i);
-   state = readb(iov->mstate + i);
-   if (state == PCI_SRIOV_VFM_AV)
-   virtfn_add(iov->self, i, 1);
-   } else if (state == PCI_SRIOV_VFM_MO) {
-   virtfn_remove(iov->self, i, 1);
-   writeb(PCI_SRIOV_VFM_UA, iov->mstate + i);
-   state = readb(iov->mstate + i);
-   if (state == PCI_SRIOV_VFM_AV)
-   virtfn_add(iov->self, i, 0);
-   }
-   }
-
-   pci_read_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, &status);
-   status &= ~PCI_SRIOV_STATUS_VFM;
-   pci_write_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, status);
-}
-
-static int sriov_enable_migration(struct pci_dev *dev, int nr_virtfn)
-{
-   int bir;
-   u32 table;
-   resource_size_t pa;
-   struct pci_sriov *iov = dev->sriov;
-
-   if (nr_virtfn <= iov->initial_VFs)
-   return 0;
-
-   pci_read_config_dword(dev, iov->pos + PCI_SRIOV_VFM, &table);
-   bir = PCI_SRIOV_VFM_BIR(table);
-   if (bir > PCI_STD_RESOURCE_END)
-   return -EIO;
-
-   table = PCI_SRIOV_VFM_OFFSET(table);
-   if (table + nr_virtfn > pci_resource_len(dev, bir))
-   return -EIO;
-
-   pa = pci_resource_start(dev, bir) + table;
-   iov->mstate = ioremap(pa, nr_virtfn);
-   if (!iov->mstate)
-   return -ENOMEM;
-
-   INIT_WORK(&iov->mtask, sriov_migration_task);
-
-   iov->ctrl |= PCI_SRIOV_CTRL_VFM | PCI_SRIOV_CTRL_INTR;
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-
-   return 0;
-}
-
-static void sriov_disable_migration(struct pci_dev *dev)
-{
-   struct pci_sriov *iov = dev->sriov;
-
-   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFM | PCI_SRIOV_CTRL_INTR);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-
-   cancel_work_sync(&iov->mtask);
-   iounmap(iov->mstate);
-}
-
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
int rc;
@@ -351,12 +260,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
goto failed;
}
 
-   if (iov->cap & PCI_SRIOV_CAP_VFM) {
-   rc = sriov_enable_migration(dev, nr_virtfn);
- 

Re: [PATCH v2 2/4] mei: Add include of <linux/irqreturn.h>

2014-02-19 Thread Bjorn Helgaas
On Wed, Feb 19, 2014 at 1:29 AM, Winkler, Tomas  wrote:
>
>
>>
>> We currently include <linux/irqreturn.h> in <linux/pci.h>, but I'm about to
>> remove that from linux/pci.h, so add explicit includes where needed.
>>
>> Signed-off-by: Bjorn Helgaas 
>> ---
>>  drivers/misc/mei/hw-me.h |1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/misc/mei/hw-me.h b/drivers/misc/mei/hw-me.h
>> index 80bd829fbd9a..893d5119fa9b 100644
>> --- a/drivers/misc/mei/hw-me.h
>> +++ b/drivers/misc/mei/hw-me.h
>> @@ -20,6 +20,7 @@
>>  #define _MEI_INTERFACE_H_
>>
>>  #include 
>> +#include <linux/irqreturn.h>
>>  #include "mei_dev.h"
>>  #include "client.h"
>
>
> Okay.

Thanks, I added this ack:

Acked-by: Tomas Winkler 


Re: [PATCH v2 0/4] Remove dead code

2014-02-20 Thread Bjorn Helgaas
On Tue, Feb 18, 2014 at 1:59 PM, Bjorn Helgaas  wrote:
> This is v2 of my rework of part of Stephen's patch [1].  My v1 posting,
> with a little discussion, is here [2].
>
> This removes SR-IOV migration support, which seems to be unused.
>
> Changes since v1:
>   - Drop the removal of MMIO exclusivity.
>   - Add a few includes of <linux/irqreturn.h>.  The SR-IOV migration
>     support included irqreturn.h via linux/pci.h, and a few drivers relied
>     on that.  So this v2 series updates those drivers to include
>     irqreturn.h directly.
>
> Unless there's objection, I'd like to merge all these through my PCI tree
> for v3.15.

I applied these to my pci/dead-code branch, which is now in -next, for v3.15.

> Bjorn
>
> [1] http://lkml.kernel.org/r/20131227132710.71906...@nehalam.linuxnetplumber.net
> [2] https://lkml.kernel.org/r/20140130192011.25426.45702.st...@bhelgaas-glaptop.roam.corp.google.com
>
> ---
>
> Bjorn Helgaas (4):
>   misc: mic: Add include of <linux/irqreturn.h>
>   mei: Add include of <linux/irqreturn.h>
>   iommu/amd: Add include of <linux/irqreturn.h>
>   PCI: Remove unused SR-IOV VF Migration support
>
>
>  Documentation/PCI/pci-iov-howto.txt |4 -
>  drivers/iommu/amd_iommu_types.h |1
>  drivers/misc/mei/hw-me.h|1
>  drivers/misc/mic/card/mic_device.h  |1
>  drivers/misc/mic/host/mic_device.h  |1
>  drivers/pci/iov.c   |  119 ---
>  drivers/pci/pci.h   |4 -
>  include/linux/pci.h |4 -
>  8 files changed, 4 insertions(+), 131 deletions(-)


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2014-04-03 Thread Bjorn Helgaas
[+cc George]

On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson
 wrote:
> ...
> Great!  I'm still trying to figure out how to handle the quirk around
> Intel PCI-to-PCI bridge on the root complex as just another quirk.  I'll
> respin another version once I have that worked out.  Thanks,

Is anything happening here?  These buggy IOMMU/DMA source problems,
e.g., [1], have been lingering a long time, and I don't know if we're
stuck because I haven't been giving them enough attention, or if we
don't really have a good solution yet.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=42679


Re: [RFC PATCH v2 1/2] pci: Create PCIe requester ID interface

2014-04-04 Thread Bjorn Helgaas
On Thu, Apr 3, 2014 at 8:51 PM, Alex Williamson
 wrote:
> On Thu, 2014-04-03 at 15:48 -0600, Bjorn Helgaas wrote:
>> [+cc George]
>>
>> On Fri, Aug 2, 2013 at 10:59 AM, Alex Williamson
>>  wrote:
>> > ...
>> > Great!  I'm still trying to figure out how to handle the quirk around
>> > Intel PCI-to-PCI bridge on the root complex as just another quirk.  I'll
>> > respin another version once I have that worked out.  Thanks,
>>
>> Is anything happening here?  These buggy IOMMU/DMA source problems,
>> e.g., [1], have been lingering a long time, and I don't know if we're
>> stuck because I haven't been giving them enough attention, or if we
>> don't really have a good solution yet.
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=42679
>
> Sorry, no.  This has completely dropped off my radar.  I'll try to
> resurrect it and figure out how to move forward.  Thanks,

Not a problem; I'm just way behind on processing patches, so I'm
trying to clean up my backlog and take care of things that are waiting
on me.

Bjorn


Re: hpsa driver bug crack kernel down!

2014-04-09 Thread Bjorn Helgaas
[+cc Joerg, iommu list]

On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso  wrote:
> On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
>> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
>> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
>> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
>> > > > [+linux-scsi]
>> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
>> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
>> > > > > > Hi,
>> > > > > >
>> > > > > > The kernel is 3.14.0+ which is pulled just now.
>> > > > >
>> > > > > Cc'ing more people.
>> > > > >
>> > > > > While the hpsa driver appears to be involved in some way, I'm not sure if
>> > > > > this is a related issue, but as of today's pull I'm getting another
>> > > > > problem that causes my DL980 not to come up.
>> > > > >
>> > > > > *Massive* amounts of:
>> > > > >
>> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
>> > > > > dmar: DRHD: handling fault status reg 602
>> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>> > > > >
>> > > > > Then:
>> > > > >
>> > > > > hpsa :03:00.0: Controller lockup detected: 0x
>> > > > > ...
>> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
>> > > > > ...
>> > > > >
>> > > > > Screenshot of the actual LOCKUP:
>> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png
>> > > > >
>> > > > > While I haven't bisected, things worked fine until at least commit
>> > > > > 39de65aa2c3e (April 2nd).
>> > > > >
>> > > > > Any ideas?
>> > > >
>> > > > Well, it's either a DMA remapping issue or a hpsa one.  Your assertion
>> > > > that everything worked fine until 39de65aa2c3e would tend to vindicate
>> > > > hpsa,
>> >
>> > Hmm here you mean DMA, right?
>>
>> No, it vindicates the hpsa changes ... they don't seem to be causing
>> problems until something goes wrong with dma remapping.
>>
>> > > because all the hpsa changes went in before that under
>> > > Missing crucial info:
>> > >
>> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
>> > >
>> > > > Merge: 3e75c6d b2bff6c
>> > > > Author: Linus Torvalds 
>> > > > Date:   Tue Apr 1 18:49:04 2014 -0700
>> > > >
>> > > > Merge tag 'scsi-misc' of
>> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> > > >
>> > > > can you revalidate that this commit works OK just to make sure?
>> >
>> > Ok so I don't see those DMA messages and system starts just fine. I'm
>> > thinking perhaps something broke after the IO mmu stuff in commit
>> > 3f583bc21977a608908b83d03ee2250426a5695c... could this be indirectly
>> > causing the CPU stalls and just blame hpsa in the path as a side effect?
>> >
>> > /me goes out to try the commit.
>>
>> That's my guess.  The DMAR messages are DMA remapping issues caused in
>> the IOMMU.  If I had to guess, I'd say the DMAR fault message is
>> indicating the IOMMU is calling for a mapping address before it can
>> satisfy the driver read request, which is causing the hang apparently in
>> the hpsa driver.
>>
>> I've added linux-pci to the cc; I think they deal with iommu issues on
>> x86.
>
> So that merge commit appears to be the culprit, I see both the DMA
> messages and the lockup blaming hpsa...

My understanding so far (please correct me if I'm wrong):

39de65aa2c3e OK ("Merge branch 'i2c/for-next'")
1a0b6abaea78 OK ("Merge tag 'scsi-misc'")
3f583bc21977 BAD ("Merge tag 'iommu-updates-v3.15'")


Re: hpsa driver bug crack kernel down!

2014-04-10 Thread Bjorn Helgaas
On Thu, Apr 10, 2014 at 2:46 AM, Woodhouse, David
 wrote:

>> > > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
>> > > >> > > > > dmar: DRHD: handling fault status reg 602
>> > > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>
> That "Present bit in context entry is clear" fault means that we have
> not set up *any* mappings for this PCI device… on this IOMMU.
>
>> > Yes, specifically (finally done bisecting):
>> >
>> > commit 2e45528930388658603ea24d49cf52867b928d3e
>> > Author: Jiang Liu 
>> > Date:   Wed Feb 19 14:07:36 2014 +0800
>> >
>> > iommu/vt-d: Unify the way to process DMAR device scope array
>
> This commit is about how we decide which IOMMU a given PCI device is
> attached to.
>
> Thus, my first guess would be that we are quite happily setting up the
> requested DMA maps on the *wrong* IOMMU, and then taking faults when the
> device actually tries to do DMA.
>
> However, I'm not 100% convinced of that. The fault address looks
> suspiciously like a true physical address, not a virtual bus address of
> the type that we'd normally allocate for a dma_map_* operation. Those
> would start at 0xf000 and work downwards, typically.

I like the "wrong IOMMU (or no IOMMU at all)" theory.  If we didn't
connect the device with an IOMMU at all, that would explain the device
DMAing directly to a physical address, wouldn't it?

> Do you have 'iommu=pt' on the kernel command line? Can I see the full
> dmesg as this system boots, and also a copy of the DMAR table?
>
> We should also rate-limit DMA faults, which would avoid the lockup
> failure mode. Bjorn, what should an IOMMU driver *do* when it detects
> that a device is creating an endless stream of DMA faults and isn't
> aborting the transaction?

You mentioned that POWER with EEH does something intelligent in this
case, but I'm not familiar with that code.  We have AER support, which
can result in resetting a device, but I think DMA faults are reported
differently, and I don't think there's any nice existing way for PCI
to deal with them.  Maybe there should be, though.
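
To make the rate-limiting idea a bit more concrete, here is a rough
userspace sketch in C.  It is not code from intel-iommu or from any patch
in this thread: struct fault_ratelimit, fault_ratelimit_init() and
fault_report_allowed() are made-up names, and the interval/burst scheme is
only an assumption about what a limiter for fault reports could look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: allow at most `burst` reports per `interval` ticks. */
struct fault_ratelimit {
	uint64_t interval;	/* window length, in arbitrary time ticks */
	unsigned int burst;	/* reports allowed per window */
	uint64_t begin;		/* start of the current window */
	unsigned int printed;	/* reports emitted in the current window */
	unsigned int missed;	/* reports suppressed in the current window */
};

static void fault_ratelimit_init(struct fault_ratelimit *rl,
				 uint64_t interval, unsigned int burst)
{
	assert(interval > 0 && burst > 0);
	rl->interval = interval;
	rl->burst = burst;
	rl->begin = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Return true if a fault seen at time `now` should be reported. */
static bool fault_report_allowed(struct fault_ratelimit *rl, uint64_t now)
{
	assert(now >= rl->begin);	/* time must not run backwards */

	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the counters. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}
```

This only throttles the reporting.  It does nothing about the device that
keeps faulting, which is the harder question.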

Bjorn

hpsa NULL pointer in hpsa_enter_performant_mode()

2014-04-10 Thread Bjorn Helgaas
[subject changed]

On Thu, Apr 10, 2014 at 2:45 PM,   wrote:
> On Wed, Apr 09, 2014 at 11:32:37PM -0700, Davidlohr Bueso wrote:
>> On Wed, 2014-04-09 at 22:03 -0600, Bjorn Helgaas wrote:
>> > [+cc Joerg, iommu list]
>> >
>> > On Wed, Apr 9, 2014 at 6:19 PM, Davidlohr Bueso  wrote:
>> > > On Wed, 2014-04-09 at 16:50 -0700, James Bottomley wrote:
>> > >> On Wed, 2014-04-09 at 16:40 -0700, Davidlohr Bueso wrote:
>> > >> > On Wed, 2014-04-09 at 16:10 -0700, James Bottomley wrote:
>> > >> > > On Wed, 2014-04-09 at 16:08 -0700, James Bottomley wrote:
>> > >> > > > [+linux-scsi]
>> > >> > > > On Wed, 2014-04-09 at 15:49 -0700, Davidlohr Bueso wrote:
>> > >> > > > > On Wed, 2014-04-09 at 10:39 +0800, Baoquan He wrote:
>> > >> > > > > > Hi,
>> > >> > > > > >
>> > >> > > > > > The kernel is 3.14.0+ which is pulled just now.
>> > >> > > > >
>> > >> > > > > Cc'ing more people.
>> > >> > > > >
>> > >> > > > > While the hpsa driver appears to be involved in some way, I'm
>> > >> > > > > not sure if this is a related issue, but as of today's pull I'm
>> > >> > > > > getting another problem that causes my DL980 not to come up.
>> > >> > > > >
>> > >> > > > > *Massive* amounts of:
>> > >> > > > >
>> > >> > > > > DMAR:[fault reason 02] Present bit in context entry is clear
>> > >> > > > > dmar: DRHD: handling fault status reg 602
>> > >> > > > > dmar: DMAR:[DMA Read] Request device [02:00.0] fault addr 7f61e000
>> > >> > > > >
>> > >> > > > > Then:
>> > >> > > > >
>> > >> > > > > hpsa :03:00.0: Controller lockup detected: 0x
>> > >> > > > > ...
>> > >> > > > > Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
>> > >> > > > > ...
>> > >> > > > >
>> > >> > > > > Screenshot of the actual LOCKUP:
>> > >> > > > > http://stgolabs.net/hpsa-hard-lockup-3.14+.png
>> > >> > > > >
>> > >> > > > > While I haven't bisected, things worked fine until at least
>> > >> > > > > commit 39de65aa2c3e (April 2nd).
>> > >> > > > >
>> > >> > > > > Any ideas?
>> > >> > > >
>> > >> > > > Well, it's either a DMA remapping issue or a hpsa one.  Your
>> > >> > > > assertion that everything worked fine until 39de65aa2c3e would
>> > >> > > > tend to vindicate hpsa,
>> > >> >
>> > >> > Hmm here you mean DMA, right?
>> > >>
>> > >> No, it vindicates the hpsa changes ... they don't seem to be causing
>> > >> problems until something goes wrong with dma remapping.
>> > >>
>> > >> > > because all the hpsa changes went in before that under
>> > >> > > Missing crucial info:
>> > >> > >
>> > >> > > commit 1a0b6abaea78f73d9bc0a2f6df2d9e4c917cade1
>> > >> > >
>> > >> > > > Merge: 3e75c6d b2bff6c
>> > >> > > > Author: Linus Torvalds 
>> > >> > > > Date:   Tue Apr 1 18:49:04 2014 -0700
>> > >> > > >
>> > >> > > > Merge tag 'scsi-misc' of
>> > >> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> > >> > > >
>> > >> > > > can you revalidate that this commit works OK just to make sure?
>> > >> >
>> > >> > Ok so I don't see those DMA messages and system starts just fine. I'm
>> > >> > thinking perhaps something broke after the IO mmu stuff in commit
>> > >> > 3f583bc21977a608908b83d03ee2250426a5695c... co

Re: [Patch Part3 V1 17/22] pci, ACPI, iommu: enhance pci_root to support DMAR device hotplug

2014-04-24 Thread Bjorn Helgaas
On Tue, Apr 22, 2014 at 03:07:28PM +0800, Jiang Liu wrote:
> Finally enhance pci_root driver to support DMAR device hotplug when
> hot-plugging PCI host bridges.
> 
> Signed-off-by: Jiang Liu 
> ---
>  drivers/acpi/pci_root.c |   16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index d388f13d48b4..aa8f549869f3 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include/* for acpi_hest_init() */
> @@ -511,6 +512,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
>   struct acpi_pci_root *root;
>   acpi_handle handle = device->handle;
>   int no_aspm = 0, clear_aspm = 0;
> + bool hotadd = (system_state != SYSTEM_BOOTING);
>  
>   root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
>   if (!root)
> @@ -557,6 +559,11 @@ static int acpi_pci_root_add(struct acpi_device *device,
>   strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
>   device->driver_data = root;
>  
> + if (hotadd && dmar_device_hotplug(handle, true)) {

Apparently "dmar_device_hotplug(handle, true)" means "add a DMAR device,"
and "dmar_device_hotplug(device->handle, false)" means "remove a DMAR
device."  I'm not really a fan of interfaces where one of the arguments
selects between two completely different actions, because it's harder for a
casual reader to see what's going on.

I see how it simplifies your implementation a little bit, but I think it's
more important to simplify for the reader.
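
Roughly what I have in mind, as a userspace sketch rather than a proposed
patch.  Everything prefixed sketch_ is hypothetical, the handle type is a
stand-in for acpi_handle, and the counter only exists so the example does
something observable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the ACPI handle type; hypothetical, for illustration only. */
typedef void *sketch_handle;

static int dmar_count;	/* number of "DMAR devices" currently added */

/* The bool-selects-action shape, mirroring the call signature in the patch. */
static int sketch_dmar_device_hotplug(sketch_handle handle, bool insert)
{
	assert(handle != NULL);
	if (insert) {
		dmar_count++;
		return 0;
	}
	if (dmar_count == 0)
		return -1;	/* nothing to remove */
	dmar_count--;
	return 0;
}

/* Thin wrappers whose names say what the call does. */
static int sketch_dmar_device_add(sketch_handle handle)
{
	return sketch_dmar_device_hotplug(handle, true);
}

static int sketch_dmar_device_remove(sketch_handle handle)
{
	return sketch_dmar_device_hotplug(handle, false);
}
```

The shared implementation can stay as it is; callers would only ever see
the add/remove names, so nobody has to look up what "true" means.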

Bjorn


Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-27 Thread Bjorn Helgaas
On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote:
> The driver_override field allows us to specify the driver for a device
> rather than relying on the driver to provide a positive match of the
> device.  This shortcuts the existing process of looking up the vendor
> and device ID, adding them to the driver new_id, binding the device,
> then removing the ID, but it also provides a couple advantages.
> 
> First, the above existing process allows the driver to bind to any
> device matching the new_id for the window where it's enabled.  This is
> often not desired, such as the case of trying to bind a single device
> to a meta driver like pci-stub or vfio-pci.  Using driver_override we
> can do this deterministically using:
> 
> echo pci-stub > /sys/bus/pci/devices/:03:00.0/driver_override
> echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> echo :03:00.0 > /sys/bus/pci/drivers_probe
> 
> Previously we could not invoke drivers_probe after adding a device
> to new_id for a driver as we get non-deterministic behavior whether
> the driver we intend or the standard driver will claim the device.
> Now it becomes a deterministic process, only the driver matching
> driver_override will probe the device.
> 
> To return the device to the standard driver, we simply clear the
> driver_override and reprobe the device:
> 
> echo > /sys/bus/pci/devices/:03:00.0/driver_override
> echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> echo :03:00.0 > /sys/bus/pci/drivers_probe
> 
> Another advantage to this approach is that we can specify a driver
> override to force a specific binding or prevent any binding.  For
> instance when an IOMMU group is exposed to userspace through VFIO
> we require that all devices within that group are owned by VFIO.
> However, devices can be hot-added into an IOMMU group, in which case
> we want to prevent the device from binding to any driver (override
> driver = "none") or perhaps have it automatically bind to vfio-pci.
> With driver_override it's a simple matter for this field to be set
> internally when the device is first discovered to prevent driver
> matches.
> 
> Signed-off-by: Alex Williamson 
> Cc: Greg Kroah-Hartman 

Greg, are you going to weigh in on this?  It does seem to solve some real
problems.  ISTR you had an opinion once, but I don't know your current
thoughts.
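
For anyone skimming, my reading of the matching rule in the changelog is
roughly the following.  This is a userspace sketch under that reading, not
the code from the patch; the sketch_ structs and sketch_match() are made up
and pared down to one ID per driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down stand-ins for the PCI device and driver. */
struct sketch_dev {
	unsigned short vendor, device;
	const char *driver_override;	/* NULL means no override is set */
};

struct sketch_drv {
	const char *name;
	unsigned short vendor, device;	/* the single ID this driver claims */
};

/* Return true if `drv` may bind to `dev`. */
static bool sketch_match(const struct sketch_dev *dev,
			 const struct sketch_drv *drv)
{
	assert(dev != NULL && drv != NULL);

	/*
	 * With an override set, only the named driver gets a chance to
	 * bind, whatever its ID table says.
	 */
	if (dev->driver_override)
		return strcmp(dev->driver_override, drv->name) == 0;

	/* Otherwise fall back to ordinary ID matching. */
	return dev->vendor == drv->vendor && dev->device == drv->device;
}
```

An override such as "none" falls out naturally: no driver carries that
name, so nothing binds.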

Bjorn

> ---
> 
> v3: kfree() override buffer on device release, noted by Alex Graf
> 
> v2: Use strchr() as suggested by Guenter Roeck and adopted by the
> platform driver version of this same interface.
> 
>  Documentation/ABI/testing/sysfs-bus-pci |   21 
>  drivers/pci/pci-driver.c                |   25 +--
>  drivers/pci/pci-sysfs.c                 |   40 +++
>  drivers/pci/probe.c                     |    1 +
>  include/linux/pci.h                     |    1 +
>  5 files changed, 85 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index a3c5a66..898ddc4 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -250,3 +250,24 @@ Description:
>   valid.  For example, writing a 2 to this file when sriov_numvfs
>   is not 0 and not 2 already will return an error. Writing a 10
>   when the value of sriov_totalvfs is 8 will return an error.
> +
> +What:		/sys/bus/pci/devices/.../driver_override
> +Date:		April 2014
> +Contact:	Alex Williamson 
> +Description:
> + This file allows the driver for a device to be specified which
> + will override standard static and dynamic ID matching.  When
> + specified, only a driver with a name matching the value written
> + to driver_override will have an opportunity to bind to the
> + device.  The override is specified by writing a string to the
> + driver_override file (echo pci-stub > driver_override) and
> + may be cleared with an empty string (echo > driver_override).
> + This returns the device to standard matching rules binding.
> + Writing to driver_override does not automatically unbind the
> + device from its current driver or make any attempt to
> + automatically load the specified driver.  If no driver with a
> + matching name is currently loaded in the kernel, the device
> + will not bind to any driver.  This also allows devices to
> + opt-out of driver binding using a driver_override name such as
> + "none".  Only a single driver may be specified in the override,
> + there is no support for parsing delimiters.
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d911e0c..4393c12 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/dr

Re: [PATCH v4 05/16] PCI: quirk dma_alias_devfn for Marvell devices

2014-05-28 Thread Bjorn Helgaas
On Thu, May 22, 2014 at 05:07:55PM -0600, Alex Williamson wrote:
> Several Marvell devices and a JMicron device have a similar DMA
> requester ID problem to Ricoh, except they use function 1 as the
> PCIe requester ID.  Add a quirk for these to populate the DMA
> function alias bitmap.

What's the DMA function alias bitmap?

> Signed-off-by: Alex Williamson 
> ---
>  drivers/pci/quirks.c |   36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index bc8ebd9..923689f 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3349,6 +3349,42 @@ static void quirk_dma_func0_alias(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_RICOH, 0xe832, quirk_dma_func0_alias);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_RICOH, 0xe476, quirk_dma_func0_alias);
>  
> +static void quirk_dma_func1_alias(struct pci_dev *dev)
> +{
> + if (PCI_FUNC(dev->devfn) != 1) {
> + dev->dma_alias_devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 1);
> + dev->dev_flags |= PCI_DEV_FLAGS_DMA_ALIAS_DEVFN;
> + }
> +}
> +
> +/*
> + * Marvell 88SE9123 uses function 1 as the requester ID for DMA.  In some
> + * SKUs function 1 is present and is a legacy IDE controller, in other
> + * SKUs this function is not present, making this a ghost requester.
> + * https://bugzilla.kernel.org/show_bug.cgi?id=42679
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9123,
> +  quirk_dma_func1_alias);
> +/* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c14 */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9130,
> +  quirk_dma_func1_alias);
> +/* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c47 + c57 */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9172,
> +  quirk_dma_func1_alias);
> +/* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c59 */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x917a,
> +  quirk_dma_func1_alias);
> +/* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c46 */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x91a0,
> +  quirk_dma_func1_alias);
> +/* https://bugzilla.kernel.org/show_bug.cgi?id=42679#c49 */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9230,
> +  quirk_dma_func1_alias);
> +/* https://bugs.gentoo.org/show_bug.cgi?id=497630 */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_JMICRON,
> +  PCI_DEVICE_ID_JMICRON_JMB388_ESD,
> +  quirk_dma_func1_alias);
> +
>  static struct pci_dev *pci_func_0_dma_source(struct pci_dev *dev)
>  {
>   if (!PCI_FUNC(dev->devfn))
> 
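
For reference, the arithmetic the quirk above performs can be tried out in
userspace with the sketch below.  The SK_ macros use the same bit layout as
the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC (5-bit slot, 3-bit function), but
the SK_ names and sketch_func1_alias() are mine, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* devfn packs a 5-bit slot and a 3-bit function into one byte. */
#define SK_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SK_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define SK_FUNC(devfn)		((devfn) & 0x07)

/*
 * Return the devfn the quirk would record as the DMA alias, or the
 * device's own devfn when it already is function 1 (no alias needed).
 */
static uint8_t sketch_func1_alias(uint8_t devfn)
{
	uint8_t alias = devfn;

	if (SK_FUNC(devfn) != 1)
		alias = (uint8_t)SK_DEVFN(SK_SLOT(devfn), 1);

	/* The alias always stays in the same slot as the device. */
	assert(SK_SLOT(alias) == SK_SLOT(devfn));
	return alias;
}
```

So a device at slot 3, function 0 (devfn 0x18) is aliased to slot 3,
function 1 (devfn 0x19), whether or not function 1 actually exists.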


Re: [PATCH v4 06/16] PCI: Quirk pci_for_each_dma_alias() for bridges

2014-05-28 Thread Bjorn Helgaas
On Thu, May 22, 2014 at 05:08:01PM -0600, Alex Williamson wrote:
> Several PCIe-to-PCI bridges fail to provide a PCIe capability, causing
> us to handle them as conventional PCI devices.  In some cases, this
> may be correct, in others it's not.  Add a dev_flag bit to identify
> devices to be handled as standard PCIe-to-PCI bridges.

Can you expand on the "in some cases, this may be correct, in others it's
not"?  Do you mean that for some *devices* it's correct to handle them as
conventional PCI, or in some *situations* it's correct?  Something else?

I'd like to either remove that sentence or add a little more information to
make it useful.

I guess this probably goes along with the tests in
quirk_use_pcie_bridge_dma_alias().

> Signed-off-by: Alex Williamson 
> ---
>  drivers/pci/search.c |   10 --
>  include/linux/pci.h  |2 ++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 2c19f3f..df38f73 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -88,8 +88,14 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
>   continue;
>   }
>   } else {
> - ret = fn(tmp, PCI_DEVID(tmp->bus->number, tmp->devfn),
> -  data);
> + if (tmp->dev_flags & PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS)
> + ret = fn(tmp,
> +  PCI_DEVID(tmp->subordinate->number,
> +PCI_DEVFN(0, 0)), data);
> + else
> + ret = fn(tmp,
> +  PCI_DEVID(tmp->bus->number,
> +tmp->devfn), data);
>   if (ret)
>   return ret;
>   }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 9d4035c..85ab35e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -173,6 +173,8 @@ enum pci_dev_flags {
>   PCI_DEV_FLAGS_ACS_ENABLED_QUIRK = (__force pci_dev_flags_t) (1 << 3),
>   /* Flag to indicate the device uses dma_alias_devfn */
>   PCI_DEV_FLAGS_DMA_ALIAS_DEVFN = (__force pci_dev_flags_t) (1 << 4),
> + /* Use a PCIe-to-PCI bridge alias even if !pci_is_pcie */
> + PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS = (__force pci_dev_flags_t) (1 << 5),
>  };
>  
>  enum pci_irq_reroute_variant {
> 
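
For illustration, the alias choice made by the search.c hunk above can be
modelled in a few lines of userspace C.  This is only a sketch: struct
bridge and bridge_dma_alias() are hypothetical names, the pcie_alias field
stands in for PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS, and the two macros merely
copy the bit layout of the kernel's PCI_DEVFN() and PCI_DEVID().

```c
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN() and PCI_DEVID() macros. */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEVID(bus, devfn)  ((uint16_t)(((bus) << 8) | (devfn)))

/* Hypothetical stand-in for the few struct pci_dev fields used here. */
struct bridge {
	uint8_t bus;        /* bus the bridge itself sits on */
	uint8_t devfn;      /* the bridge's own device/function */
	uint8_t secondary;  /* bus number directly behind the bridge */
	int pcie_alias;     /* models PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS */
};

/* Requester ID reported for DMA that passes through the bridge. */
uint16_t bridge_dma_alias(const struct bridge *b)
{
	/* Quirked bridge: alias is (secondary bus, devfn 00.0). */
	if (b->pcie_alias)
		return DEVID(b->secondary, DEVFN(0, 0));

	/* Conventional bridge: alias is the bridge's own bus/devfn. */
	return DEVID(b->bus, b->devfn);
}
```

With a bridge at 02:1c.3 whose secondary bus is 05, the function returns
0x02e3 without the flag and 0x0500 with it.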


Re: [PATCH v4 00/16] PCI/iommu: Fix DMA alias problems

2014-05-28 Thread Bjorn Helgaas
On Thu, May 22, 2014 at 05:07:23PM -0600, Alex Williamson wrote:
> For testing, this version can be found in my git tree:
> 
> git://github.com/awilliam/linux-vfio.git dma-alias-v4
> 
> Please report any issues.
> 
> v4:
>  - Change dma_func_alias to dma_alias_devfn, holding a single
>devfn to alias, thereby supporting aliases to the wrong slot.
>The DMA alias iterator is easily changed, but IOMMU grouping
>requires significant rework.  This is now done in IOMMU code
>rather than PCI code.
> 
>  - AMD-Vi - try to incorporate IVRS aliases dynamically into
>PCI alias quirks to make sure that our grouping remains the
>same.  Potentially this could end up reporting BIOS aliases
>that we can add to our list of quirks.
> 
> v3:
>  - Found several instances where I had PCI_SLOT when I meant
>PCI_FUNC.  Thanks to Andrew for spotting this.  This should
>fix the problem he was having with Ricoh quirks.  We also
>pruned down the func0 quirks to only those that we know are
>needed.  We can always add them back later.
> 
>  - Found a case in intel-iommu of using dev_is_pci() where I
>really wanted !dev_is_pci().  Fixed.
> 
> v2:
>  - Several new Marvell controllers added to quirks.  There's been
>a lot of success reported with this series in
>https://bugzilla.kernel.org/show_bug.cgi?id=42679
> 
>  - Add quirk for ASMedia and Tundra PCIe-to-PCI bridges that do
>not expose a PCIe capability.  These have been shown to use
>the standard PCIe-to-PCI bridge requester ID.
> 
>  - Fix copy/paste duplicate Ricoh quirk ID
> 
>  - Fixed AMD IOMMU for the "ghost" function case where the DMA
>alias is for an absent device.  The iommu rlookup table and
>data fields need to be initialized.
> 
>  - Fixed Intel interrupt remapping, I wasn't passing the target
>bus number, only the alias bus number.
> 
> These patches are split across PCI and IOMMU, but I've front-loaded
> all of the PCI infrastructure so that the first 7 patches can be
> applied to PCI-core, the IOMMU maintainers can pickup their patches,
> then we can finish with dead code removal.  Bjorn might also be
> willing to carry the IOMMU changes if the maintainers want to ack
> them.

I put 1-7 on a pci/iommu branch for v3.16.  I'm happy to include the rest,
too, given acks from Joerg and David.  Or if they prefer to take them all,
which might be easier than coordinating two trees, especially since there's
PCI stuff at the beginning and end, here's my ack for the PCI bits (patches
1-7 and 15-16):

Acked-by: Bjorn Helgaas 

If you want to send me updated changelogs for patches 5 & 6, I'll drop
those in.

Didn't you have more testing reports?  I see George's, but I thought there
were some others, too.

> Original description:
> 
> This series attempts to fix a couple issues we've had outstanding in
> the PCI/IOMMU code for a while.  The first issue is with devices that
> use the wrong requester ID for DMA transactions.  We already have a
> sort of half-baked attempt to fix this for several Ricoh devices, but
> the fix only helps them be useful through IOMMU groups, not the
> general DMA case.  There are also several Marvell devices which use
> a different wrong requester ID and don't even fit into the DMA
> source idea.  This series creates a DMA alias iterator that will
> step through each possible alias of a device, allowing IOMMUs to
> insert mappings for both the device and its aliases.
> 
> Hand-in-hand with this is our broken pci_find_upstream_pcie_bridge()
> function, which is known to blowup when it finds itself suddenly at
> a PCIe device without crossing a PCIe-to-PCI bridge (as identified by
> the PCIe capability).  It also likes to make the invalid assumption
> that a PCIe device never has its requester ID masked by any upstream
> bus.  We can fix this using the above new DMA alias iterator, since
> that's effectively what this function was meant to do.
> 
> Finally, with all these helpers, it makes sense to consolidate code
> for determining IOMMU groups.  The first step in finding the root
> of a group is finding the final upstream DMA alias for the device,
> then applying additional ACS rules and incorporating device specific
> aliases.  As this is all common to PCI, create a single implementation
> and remove piles of code from the individual IOMMU drivers.
> 
> This series allows devices like the Marvell 88SE9123 to finally work
> on Linux with either AMD-Vi or VT-d enabled on the box.  I've
> collected device IDs from various bugs to support as many SKUs of
> these devices as possible, but I'm sure there are others that I've
> missed

Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-28 Thread Bjorn Helgaas
On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote:
> The driver_override field allows us to specify the driver for a device
> rather than relying on the driver to provide a positive match of the
> device.  This shortcuts the existing process of looking up the vendor
> and device ID, adding them to the driver new_id, binding the device,
> then removing the ID, but it also provides a couple advantages.
> 
> First, the above existing process allows the driver to bind to any
> device matching the new_id for the window where it's enabled.  This is
> often not desired, such as the case of trying to bind a single device
> to a meta driver like pci-stub or vfio-pci.  Using driver_override we
> can do this deterministically using:
> 
> echo pci-stub > /sys/bus/pci/devices/:03:00.0/driver_override
> echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> echo :03:00.0 > /sys/bus/pci/drivers_probe
> 
> Previously we could not invoke drivers_probe after adding a device
> to new_id for a driver as we get non-deterministic behavior whether
> the driver we intend or the standard driver will claim the device.
> Now it becomes a deterministic process, only the driver matching
> driver_override will probe the device.
> 
> To return the device to the standard driver, we simply clear the
> driver_override and reprobe the device:
> 
> echo > /sys/bus/pci/devices/:03:00.0/driver_override
> echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> echo :03:00.0 > /sys/bus/pci/drivers_probe
> 
> Another advantage to this approach is that we can specify a driver
> override to force a specific binding or prevent any binding.  For
> instance when an IOMMU group is exposed to userspace through VFIO
> we require that all devices within that group are owned by VFIO.
> However, devices can be hot-added into an IOMMU group, in which case
> we want to prevent the device from binding to any driver (override
> driver = "none") or perhaps have it automatically bind to vfio-pci.
> With driver_override it's a simple matter for this field to be set
> internally when the device is first discovered to prevent driver
> matches.
> 
> Signed-off-by: Alex Williamson 
> Cc: Greg Kroah-Hartman 

I applied this with Reviewed-bys/Acks from Konrad, Alexander, and Greg to
pci/virtualization for v3.16, thanks!
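
A rough userspace model of the matching rule described in the changelog
may help.  It assumes that a name match against driver_override is
sufficient on its own and that ordinary matching is a plain vendor/device
comparison; struct dev, struct drv and driver_may_bind() are hypothetical
names, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct pci_dev / struct pci_driver. */
struct dev {
	unsigned short vendor, device;
	const char *driver_override;    /* NULL when no override is set */
};

struct drv {
	const char *name;
	unsigned short vendor, device;  /* the single ID this driver claims */
};

bool driver_may_bind(const struct dev *d, const struct drv *drv)
{
	/* When an override is set, only the driver name decides the match. */
	if (d->driver_override)
		return strcmp(d->driver_override, drv->name) == 0;

	/* Otherwise fall back to ordinary ID matching. */
	return d->vendor == drv->vendor && d->device == drv->device;
}
```

In this model an override of "none" matches no driver as long as no
driver is actually named "none", which is the opt-out case from the
changelog.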

> ---
> 
> v3: kfree() override buffer on device release, noted by Alex Graf
> 
> v2: Use strchr() as suggested by Guenter Roeck and adopted by the
> platform driver version of this same interface.
> 
>  Documentation/ABI/testing/sysfs-bus-pci |   21 
>  drivers/pci/pci-driver.c|   25 +--
>  drivers/pci/pci-sysfs.c |   40 +++
>  drivers/pci/probe.c |1 +
>  include/linux/pci.h |1 +
>  5 files changed, 85 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index a3c5a66..898ddc4 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -250,3 +250,24 @@ Description:
>   valid.  For example, writing a 2 to this file when sriov_numvfs
>   is not 0 and not 2 already will return an error. Writing a 10
>   when the value of sriov_totalvfs is 8 will return an error.
> +
> +What:/sys/bus/pci/devices/.../driver_override
> +Date:April 2014
> +Contact: Alex Williamson 
> +Description:
> + This file allows the driver for a device to be specified which
> + will override standard static and dynamic ID matching.  When
> + specified, only a driver with a name matching the value written
> + to driver_override will have an opportunity to bind to the
> + device.  The override is specified by writing a string to the
> + driver_override file (echo pci-stub > driver_override) and
> + may be cleared with an empty string (echo > driver_override).
> + This returns the device to standard matching rules binding.
> + Writing to driver_override does not automatically unbind the
> + device from its current driver or make any attempt to
> + automatically load the specified driver.  If no driver with a
> + matching name is currently loaded in the kernel, the device
> + will not bind to any driver.  This also allows devices to
> + opt-out of driver binding using a driver_override name such as
> + "none".  Only a single driver may be specified in the override,
> + there is no support for parsing delimiters.
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d911e0c..4393c12 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -216,6 +216,13 @@ cons

Re: [Patch Part3 V3 00/21] Enable support of Intel DMAR device hotplug

2014-06-25 Thread Bjorn Helgaas
>   iommu/vt-d: match segment number when searching for dev_iotlb capable
> devices
>   iommu/vt-d: use correct domain id to flush virtual machine domains
>   iommu/vt-d: introduce helper functions to improve code readability
>   iommu/vt-d: introduce helper functions to make code symmetric for
> readability
>   iommu/vt-d: only dynamically allocate domain id for virtual domains
>   iommu/vt-d: fix possible invalid memory access caused by
> free_dmar_iommu()
>   iommu/vt-d: avoid freeing virtual machine domain in free_dmar_iommu()
>   iommu/VT-d: simplify include/linux/dmar.h
>   iommu/vt-d: change iommu_enable/disable_translation to return void
>   iommu/vt-d: simplify intel_unmap_sg() and kill duplicated code
>   iommu/vt-d: introduce helper domain_pfn_within_range() to simplify
> code
>   iommu/vt-d: introduce helper function iova_size() to improve code
> readability
>   iommu/vt-d: fix bug in computing domain's iommu_snooping flag
>   IOMMU/vt-d: introduce helper function dmar_walk_resources()
>   iommu/vt-d: dynamically allocate and free seq_id for DMAR units
>   iommu/vt-d: implement DMAR unit hotplug framework
>   iommu/vt-d: search _DSM method for DMAR hotplug
>   iommu/vt-d: enhance intel_irq_remapping driver to support DMAR unit
> hotplug
>   iommu/vt-d: enhance error recovery in function
> intel_enable_irq_remapping()
>   iommu/vt-d: enhance intel-iommu driver to support DMAR unit hotplug
>   pci, ACPI, iommu: enhance pci_root to support DMAR device hotplug

This looks a little sloppy; you have three different ways of
capitalizing the area:

  iommu/vt-d:
  iommu/VT-d:
  IOMMU/vt-d:

Also, "git log --oneline drivers/iommu" says that the convention for
drivers/iommu is to capitalize the first word of the rest of the
subject line.


Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-02 Thread Bjorn Helgaas
On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote:
> There are PCI devices that require a particular value written
> to the Multiple Message Enable (MME) register while aligned on
> power of 2 boundary value of actually used MSI vectors 'nvec'
> is a lesser of that MME value:
> 
>   roundup_pow_of_two(nvec) < 'Multiple Message Enable'
> 
> However the existing pci_enable_msi_block() interface is not
> able to configure such devices, since the value written to the
> MME register is calculated from the number of requested MSIs
> 'nvec':
> 
>   'Multiple Message Enable' = roundup_pow_of_two(nvec)

For MSI, software learns how many vectors a device requests by reading
the Multiple Message Capable (MMC) field.  This field is encoded, so a
device can only request 1, 2, 4, 8, etc., vectors.  It's impossible
for a device to request 3 vectors; it would have to round that up
to a power of two and request 4 vectors.

Software writes similarly encoded values to MME to tell the device how
many vectors have been allocated for its use.  For example, it's
impossible to tell the device that it can use 3 vectors; the OS has to
round that up and tell the device it can use 4 vectors.
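
As a concrete illustration of that encoding, here is a minimal sketch in
userspace C.  The helper names are made up for this example; the only
thing taken from the spec is that the 3-bit MMC/MME field holds log2 of
the vector count, with 0..5 meaning 1..32 vectors and 6 and 7 reserved.

```c
#include <assert.h>

/* Decode a 3-bit MMC/MME field: 0 -> 1 vector, 1 -> 2, ..., 5 -> 32. */
unsigned int msi_field_to_vectors(unsigned int field)
{
	assert(field <= 5);     /* encodings 6 and 7 are reserved */
	return 1u << field;
}

/* Smallest field value whose vector count covers nvec (1..32). */
unsigned int msi_vectors_to_field(unsigned int nvec)
{
	unsigned int field = 0;

	assert(nvec >= 1 && nvec <= 32);
	while ((1u << field) < nvec)
		field++;
	return field;
}
```

Asking for 3 vectors gives field value 2, which decodes back to 4; there
is no field value that decodes to 3.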

So if I understand correctly, the point of this series is to take
advantage of device-specific knowledge, e.g., the device requests 4
vectors via MMC, but we "know" the device is only capable of using 3.
Moreover, we tell the device via MME that 4 vectors are available, but
we've only actually set up 3 of them.

This makes me uneasy because we're lying to the device, and the device
is perfectly within spec to use all 4 of those vectors.  If anything
changes the number of vectors the device uses (new device revision,
firmware upgrade, etc.), this is liable to break.

Can you quantify the benefit of this?  Can't a device already use
MSI-X to request exactly the number of vectors it can use?  (I know
not all devices support MSI-X, but maybe we should just accept MSI for
what it is and encourage the HW guys to use MSI-X if MSI isn't good
enough.)

> In this case the result written to the MME register may not
> satisfy the aforementioned PCI devices requirement and therefore
> the PCI functions will not operate in a desired mode.

I'm not sure what you mean by "will not operate in a desired mode."
I thought this was an optimization to save vectors and that these
changes would be completely invisible to the hardware.

Bjorn

> This update introduces pci_enable_msi_partial() extension to
> pci_enable_msi_block() interface that accepts extra 'nvec_mme'
> argument which is then written to MME register while the value
> of 'nvec' is still used to setup as many interrupts as requested.
> 
> As result of this change, architecture-specific callbacks
> arch_msi_check_device() and arch_setup_msi_irqs() get an extra
> 'nvec_mme' parameter as well, but it is ignored for now.
> Therefore, this update is a placeholder for architectures that
> wish to support pci_enable_msi_partial() function in the future.
> 
> Cc: linux-...@vger.kernel.org
> Cc: linux-m...@linux-mips.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-s...@vger.kernel.org
> Cc: x...@kernel.org
> Cc: xen-de...@lists.xenproject.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Alexander Gordeev 
> ---
>  Documentation/PCI/MSI-HOWTO.txt |   36 ++--
>  arch/mips/pci/msi-octeon.c  |2 +-
>  arch/powerpc/kernel/msi.c   |4 +-
>  arch/s390/pci/pci.c |2 +-
>  arch/x86/kernel/x86_init.c  |2 +-
>  drivers/pci/msi.c   |   83 ++-
>  include/linux/msi.h |5 +-
>  include/linux/pci.h |3 +
>  8 files changed, 115 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
> index 10a9369..c8a8503 100644
> --- a/Documentation/PCI/MSI-HOWTO.txt
> +++ b/Documentation/PCI/MSI-HOWTO.txt
> @@ -195,14 +195,40 @@ By contrast with pci_enable_msi_range() function, pci_enable_msi_exact()
>  returns zero in case of success, which indicates MSI interrupts have been
>  successfully allocated.
>  
> -4.2.4 pci_disable_msi
> +4.2.4 pci_enable_msi_partial
> +
> +int pci_enable_msi_partial(struct pci_dev *dev, int nvec, int nvec_mme)
> +
> +This variation on pci_enable_msi_exact() call allows a device driver to
> +setup 'nvec_mme' number of multiple MSIs with the PCI function, while
> +setup only 'nvec' (which could be a lesser of 'nvec_mme') number of MSIs
> +in operating system. The MSI specification only allows 'nvec_mme' to be
> +allocated in powers of two, up to a maximum of 2^5 (32).
> +
> +This function could be used when a PCI function is known to send 'nvec'
> +MSIs, but still requires a particular number of MSIs 'nvec_mme' to be
> +initialized with. As result, 'nvec_mme' - 'nvec' number of unused MSIs
> +do not waste system resources

Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-07 Thread Bjorn Helgaas
On Fri, Jul 4, 2014 at 2:58 AM, Alexander Gordeev  wrote:
> On Thu, Jul 03, 2014 at 09:20:52AM +, David Laight wrote:
>> From: Bjorn Helgaas
>> > On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote:
>> > > There are PCI devices that require a particular value written
>> > > to the Multiple Message Enable (MME) register while aligned on
>> > > power of 2 boundary value of actually used MSI vectors 'nvec'
>> > > is a lesser of that MME value:
>> > >
>> > >   roundup_pow_of_two(nvec) < 'Multiple Message Enable'
>> > >
>> > > However the existing pci_enable_msi_block() interface is not
>> > > able to configure such devices, since the value written to the
>> > > MME register is calculated from the number of requested MSIs
>> > > 'nvec':
>> > >
>> > >   'Multiple Message Enable' = roundup_pow_of_two(nvec)
>> >
>> > For MSI, software learns how many vectors a device requests by reading
>> > the Multiple Message Capable (MMC) field.  This field is encoded, so a
>> > device can only request 1, 2, 4, 8, etc., vectors.  It's impossible
>> > for a device to request 3 vectors; it would have to round up that up
>> > to a power of two and request 4 vectors.
>> >
>> > Software writes similarly encoded values to MME to tell the device how
>> > many vectors have been allocated for its use.  For example, it's
>> > impossible to tell the device that it can use 3 vectors; the OS has to
>> > round that up and tell the device it can use 4 vectors.
>> >
>> > So if I understand correctly, the point of this series is to take
>> > advantage of device-specific knowledge, e.g., the device requests 4
>> > vectors via MMC, but we "know" the device is only capable of using 3.
>> > Moreover, we tell the device via MME that 4 vectors are available, but
>> > we've only actually set up 3 of them.
>> ...
>>
>> Even if you do that, you ought to write valid interrupt information
>> into the 4th slot (maybe replicating one of the earlier interrupts).
>> Then, if the device does raise the 'unexpected' interrupt you don't
>> get a write to a random kernel location.
>
> I might be missing something, but we are talking of MSI address space
> here, aren't we? I am not getting how we could end up with a 'write'
> to a random kernel location when a unclaimed MSI vector sent. We could
> only expect a spurious interrupt at worst, which is handled and reported.

Yes, that's how I understand it.  With MSI, the OS specifies a
single Message Address, e.g., a LAPIC address, and a single Message
Data value, e.g., a vector number that will be written to the LAPIC.
The device is permitted to modify some low-order bits of the Message
Data to send one of several vector numbers (the MME value tells the
device how many bits it can modify).
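
To make the "low-order bits" part concrete, here is a small sketch in
userspace C of the Message Data values a function could send.  The
function name and the -1 error convention are invented for this example;
the base value is whatever the OS programmed, and mme is the encoded MME
field (log2 of the number of vectors granted).

```c
#include <stdint.h>

/*
 * Message Data the function may send for its n-th vector, given the base
 * value programmed by the OS and the MME field.  Returns -1 if mme is a
 * reserved encoding or n is outside the granted range.
 */
int msi_message_data(uint16_t base, unsigned int mme, unsigned int n)
{
	unsigned int mask;

	if (mme > 5)                    /* 6 and 7 are reserved */
		return -1;

	mask = (1u << mme) - 1;         /* the bits the device may modify */
	if (n > mask)
		return -1;

	/* Upper bits come from the OS, low 'mme' bits from the device. */
	return (int)((base & ~mask) | n);
}
```

With a base of 0x40 and MME = 2, the device can send 0x40 through 0x43.
If the OS only set up handlers for the first three, a write of 0x43
would simply arrive as a vector nobody claimed.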

Bottom line, I think a spurious interrupt is the failure we'd expect
if a device used more vectors than the OS expects it to.

Bjorn


Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-07 Thread Bjorn Helgaas
On Fri, Jul 4, 2014 at 2:57 AM, Alexander Gordeev  wrote:
> On Wed, Jul 02, 2014 at 02:22:01PM -0600, Bjorn Helgaas wrote:
>> On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote:
>> > There are PCI devices that require a particular value written
>> > to the Multiple Message Enable (MME) register while aligned on
>> > power of 2 boundary value of actually used MSI vectors 'nvec'
>> > is a lesser of that MME value:
>> >
>> > roundup_pow_of_two(nvec) < 'Multiple Message Enable'
>> >
>> > However the existing pci_enable_msi_block() interface is not
>> > able to configure such devices, since the value written to the
>> > MME register is calculated from the number of requested MSIs
>> > 'nvec':
>> >
>> > 'Multiple Message Enable' = roundup_pow_of_two(nvec)
>>
>> For MSI, software learns how many vectors a device requests by reading
>> the Multiple Message Capable (MMC) field.  This field is encoded, so a
>> device can only request 1, 2, 4, 8, etc., vectors.  It's impossible
>> for a device to request 3 vectors; it would have to round up that up
>> to a power of two and request 4 vectors.
>>
>> Software writes similarly encoded values to MME to tell the device how
>> many vectors have been allocated for its use.  For example, it's
>> impossible to tell the device that it can use 3 vectors; the OS has to
>> round that up and tell the device it can use 4 vectors.
>
> Nod.
>
>> So if I understand correctly, the point of this series is to take
>> advantage of device-specific knowledge, e.g., the device requests 4
>> vectors via MMC, but we "know" the device is only capable of using 3.
>> Moreover, we tell the device via MME that 4 vectors are available, but
>> we've only actually set up 3 of them.
>
> Exactly.
>
>> This makes me uneasy because we're lying to the device, and the device
>> is perfectly within spec to use all 4 of those vectors.  If anything
>> changes the number of vectors the device uses (new device revision,
>> firmware upgrade, etc.), this is liable to break.
>
> If a device committed via non-MSI specific means to send only 3 vectors
> out of 4 available why should we expect it to send 4? The probability of
> a firmware sending 4/4 vectors in this case is equal to the probability
> of sending 5/4 or 16/4, with the very same reason - a bug in the firmware.
> Moreover, even vector 4/4 would be unexpected by the device driver, though
> it is perfectly within the spec.
>
> As of new device revision or firmware update etc. - it is just yet another
> case of device driver vs the firmware match/mismatch. Not including this
> change does not help here at all IMHO.
>
>> Can you quantify the benefit of this?  Can't a device already use
>> MSI-X to request exactly the number of vectors it can use?  (I know
>
> A Intel AHCI chipset requires 16 vectors written to MME while advertises
> (via AHCI registers) and uses only 6. Even attempt to init 8 vectors results
> in device's fallback to 1 (!).

Is the fact that it uses only 6 vectors documented in the public spec?

Is this a chipset erratum?  Are there newer versions of the chipset
that fix this, e.g., by requesting 8 vectors and using 6, or by also
supporting MSI-X?

I know this conserves vector numbers.  What does that mean in real
user-visible terms?  Are there systems that won't boot because of this
issue, and this patch fixes them?  Does it enable bigger
configurations, e.g., more I/O devices, than before?

Do you know how Windows handles this?  Does it have a similar interface?

As you can tell, I'm a little skeptical about this.  It's a fairly big
change, it affects the arch interface, it seems to be targeted for
only a single chipset (though it's widely used), and we already
support a standard solution (MSI-X, reducing the number of vectors
requested, or even operating with 1 vector).

Bjorn


Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-09 Thread Bjorn Helgaas
On Tue, Jul 8, 2014 at 6:26 AM, Alexander Gordeev  wrote:
> On Mon, Jul 07, 2014 at 01:40:48PM -0600, Bjorn Helgaas wrote:
>> >> Can you quantify the benefit of this?  Can't a device already use
>> >> MSI-X to request exactly the number of vectors it can use?  (I know
>> >
>> > A Intel AHCI chipset requires 16 vectors written to MME while advertises
>> > (via AHCI registers) and uses only 6. Even attempt to init 8 vectors results
>> > in device's fallback to 1 (!).
>>
>> Is the fact that it uses only 6 vectors documented in the public spec?
>
> Yes, it is documented in ICH specs.

Out of curiosity, do you have a pointer to this?  It looks like it
uses one vector per port, and I'm wondering if the reason it requests
16 is because there's some possibility of a part with more than 8
ports.

>> Is this a chipset erratum?  Are there newer versions of the chipset
>> that fix this, e.g., by requesting 8 vectors and using 6, or by also
>> supporting MSI-X?
>
> No, this is not an erratum. The value of 8 vectors is reserved and could
> cause undefined results if used.

As I read the spec (PCI 3.0, sec 6.8.1.3), if MMC contains 0b100
(requesting 16 vectors), the OS is allowed to allocate 1, 2, 4, 8, or
16 vectors.  If allocating 8 vectors and writing 0b011 to MME causes
undefined results, I'd say that's a chipset defect.
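
Put as code, my reading of that rule is roughly the following.  This is
a sketch of the spec's constraint as I understand it, not anything taken
from drivers/pci/msi.c, and msi_mme_is_legal() is a made-up name.

```c
#include <stdbool.h>

/* True if writing 'mme' is a legal response to a device advertising 'mmc'. */
bool msi_mme_is_legal(unsigned int mmc, unsigned int mme)
{
	if (mmc > 5 || mme > 5)         /* 110b and 111b are reserved */
		return false;

	/* The OS may grant any power of two up to what was requested. */
	return mme <= mmc;
}
```

For MMC = 0b100 this accepts MME values 0b000 through 0b100, including
the 0b011 (8 vectors) case in question.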

>> I know this conserves vector numbers.  What does that mean in real
>> user-visible terms?  Are there systems that won't boot because of this
>> issue, and this patch fixes them?  Does it enable bigger
>> configurations, e.g., more I/O devices, than before?
>
> Visibly, it ceases logging messages ('ahci :00:1f.2: irq 107 for
> MSI/MSI-X') for IRQs that are not shown in /proc/interrupts later.
>
> No, it does not enable/fix any existing hardware issue I am aware of.
> It just saves a couple of interrupt vectors, as Michael put it (10/16
> to be precise). However, interrupt vectors space is pretty much scarce
> resource on x86 and a risk of exhausting the vectors (and introducing
> quota i.e) has already been raised AFAIR.

I'm not too concerned about the logging issue.  If necessary, we could
tweak that message somehow.

Interrupt vector space is the issue I would worry about, but I think
I'm going to put this on the back burner until it actually becomes a
problem.

>> Do you know how Windows handles this?  Does it have a similar interface?
>
> Have no clue, TBH. Can try to investigate if you see it helpful.

No, don't worry about investigating.  I was just curious because if
Windows *did* support something like this, that would be an indication
that there's a significant problem here and we might need to solve it,
too.  But it sounds like we can safely ignore it for now.

Bjorn


Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-10 Thread Bjorn Helgaas
On Thu, Jul 10, 2014 at 4:11 AM, Alexander Gordeev  wrote:
> On Wed, Jul 09, 2014 at 10:06:48AM -0600, Bjorn Helgaas wrote:
>> Out of curiosity, do you have a pointer to this?  It looks like it
>
> I.e. ICH8 chapter 12.1.30 or ICH10 chapter 14.1.27
>
>> uses one vector per port, and I'm wondering if the reason it requests
>> 16 is because there's some possibility of a part with more than 8
>> ports.
>
> I doubt that is the reason. The only allowed MME values (powers of two)
> are 0b000, 0b001, 0b010 and 0b100. As you can see, only one bit is used -
> I would speculate it suits nicely to some hardware logic.
>
> BTW, apart from AHCI, it seems the reason MSI is not going to disappear
> (in a decade at least) is it is way cheaper to implement than MSI-X.
>
>> > No, this is not an erratum. The value of 8 vectors is reserved and could
>> > cause undefined results if used.
>>
>> As I read the spec (PCI 3.0, sec 6.8.1.3), if MMC contains 0b100
>> (requesting 16 vectors), the OS is allowed to allocate 1, 2, 4, 8, or
>> 16 vectors.  If allocating 8 vectors and writing 0b011 to MME causes
>> undefined results, I'd say that's a chipset defect.
>
> Well, the PCI spec does not prevent devices to have their own specs on top
> of it. Undefined results are meant on the device side here. On the MSI side
> these results are likely perfectly within the PCI spec. I feel speaking as
> a lawyer here ;)

I disagree about this part.  The reason MSI is in the PCI spec is so
the OS can have generic support for it without having to put
device-specific support in every driver.  The PCI spec is clear that
the OS can allocate any number of vectors less than or equal to the
number requested via MMC.  The SATA device requests 16, and it should
be perfectly legal for the OS to give it 8.

It's interesting that the ICH10 spec (sec 14.1.27, thanks for the
reference) says MMC 100b means "8 MSI Capable".  That smells like a
hardware bug.  The PCI spec says:

  000 => 1 vector
  001 => 2 vectors
  010 => 4 vectors
  011 => 8 vectors
  100 => 16 vectors

The ICH10 spec seems to think 100 means 8 vectors (not 16 as the PCI
spec says), and that would fit with the rest of the ICH10 MME info.
If ICH10 was built assuming this table:

  000 => 1 vector
  001 => 2 vectors
  010 => 4 vectors
  100 => 8 vectors

then everything makes sense: the device requests 8 vectors, and the
behavior is defined in all possible MME cases (1, 2, 4, or 8 vectors
assigned).  The "Values '011b' to '111b' are reserved" part is still
slightly wrong, because the 100b value is in that range but is not
reserved, but that's a tangent.

So my guess (speculation, I admit) is that the intent was for ICH SATA
to request only 8 vectors, but because of this error, it requests 16.
Maybe some early MSI proposal used a different encoding for MMC and
MME, and ICH was originally designed using that.

>> Interrupt vector space is the issue I would worry about, but I think
>> I'm going to put this on the back burner until it actually becomes a
>> problem.
>
> I plan to try get rid of arch_msi_check_device() hook. Should I repost
> this series afterwards?

Honestly, I'm still not inclined to pursue this because of the API
complication and lack of concrete benefit.

Bjorn


Re: [Patch Part3 V4 21/21] pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug

2014-07-16 Thread Bjorn Helgaas
On Fri, Jul 11, 2014 at 12:19 AM, Jiang Liu  wrote:
> Finally enhance pci_root driver to support DMAR device hotplug when
> hot-plugging PCI host bridges.
>
> Signed-off-by: Jiang Liu 

Acked-by: Bjorn Helgaas 

I assume you'll merge this along with the rest of the series via some
non-PCI tree.

> ---
>  drivers/acpi/pci_root.c |   16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index d388f13d48b4..99c2b9761c12 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include  /* for acpi_hest_init() */
> @@ -511,6 +512,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
> struct acpi_pci_root *root;
> acpi_handle handle = device->handle;
> int no_aspm = 0, clear_aspm = 0;
> +   bool hotadd = system_state != SYSTEM_BOOTING;
>
> root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL);
> if (!root)
> @@ -557,6 +559,11 @@ static int acpi_pci_root_add(struct acpi_device *device,
> strcpy(acpi_device_class(device), ACPI_PCI_ROOT_CLASS);
> device->driver_data = root;
>
> +   if (hotadd && dmar_device_add(handle)) {
> +   result = -ENXIO;
> +   goto end;
> +   }
> +
> pr_info(PREFIX "%s [%s] (domain %04x %pR)\n",
>acpi_device_name(device), acpi_device_bid(device),
>root->segment, &root->secondary);
> @@ -583,7 +590,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
> root->segment, (unsigned int)root->secondary.start);
> device->driver_data = NULL;
> result = -ENODEV;
> -   goto end;
> +   goto remove_dmar;
> }
>
> if (clear_aspm) {
> @@ -597,7 +604,7 @@ static int acpi_pci_root_add(struct acpi_device *device,
> if (device->wakeup.flags.run_wake)
> device_set_run_wake(root->bus->bridge, true);
>
> -   if (system_state != SYSTEM_BOOTING) {
> +   if (hotadd) {
> pcibios_resource_survey_bus(root->bus);
> pci_assign_unassigned_root_bus_resources(root->bus);
> }
> @@ -607,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
> pci_unlock_rescan_remove();
> return 1;
>
> +remove_dmar:
> +   if (hotadd)
> +   dmar_device_remove(handle);
>  end:
> kfree(root);
> return result;
> @@ -625,6 +635,8 @@ static void acpi_pci_root_remove(struct acpi_device *device)
>
> pci_remove_root_bus(root->bus);
>
> +   dmar_device_remove(device->handle);
> +
> pci_unlock_rescan_remove();
>
> kfree(root);
> --
> 1.7.10.4
>

