from:"Bjorn Helgaas"

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-26 Thread Bjorn Helgaas

[+cc Dan]

On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont
 wrote:
> Hi Justin,
>
> On Sat, 24 November 2012 "Justin Piszcz" wrote:
>> Is the following normal on an X9SRL-F board (bios 1.0a)?
>>
>> In the manual it states:
>>
>> Data Direct I/O
>> Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
>> significantly reduces CPU overhead by leveraging CPU architectural
>> improvements and freeing the system resource for other tasks. The options
>> are Disabled and Enabled.
>>
>> Default is Enabled.
>>
>> When enabled in the kernel, I see the following:
>>
>> [0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
>> [0.696487] ioatdma :00:04.0: channel error register unreachable
>> [0.696546] ioatdma :00:04.0: channel enumeration error
>> [0.696604] ioatdma :00:04.0: Intel(R) I/OAT DMA Engine init failed
>> [0.696721] ioatdma :00:04.1: channel error register unreachable
>> [0.696779] ioatdma :00:04.1: channel enumeration error
>> [0.697522] ioatdma :00:04.1: Intel(R) I/OAT DMA Engine init failed
>> [0.697617] ioatdma :00:04.2: channel error register unreachable
>> [0.697681] ioatdma :00:04.2: channel enumeration error
>> [0.697739] ioatdma :00:04.2: Intel(R) I/OAT DMA Engine init failed
>> [0.697831] ioatdma :00:04.3: channel error register unreachable
>> [0.697890] ioatdma :00:04.3: channel enumeration error
>> [0.697948] ioatdma :00:04.3: Intel(R) I/OAT DMA Engine init failed
>> [0.698037] ioatdma :00:04.4: channel error register unreachable
>> [0.698095] ioatdma :00:04.4: channel enumeration error
>> [0.698153] ioatdma :00:04.4: Intel(R) I/OAT DMA Engine init failed
>> [0.698245] ioatdma :00:04.5: channel error register unreachable
>> [0.698303] ioatdma :00:04.5: channel enumeration error
>> [0.698360] ioatdma :00:04.5: Intel(R) I/OAT DMA Engine init failed
>> [0.698449] ioatdma :00:04.6: channel error register unreachable
>> [0.698508] ioatdma :00:04.6: channel enumeration error
>> [0.698565] ioatdma :00:04.6: Intel(R) I/OAT DMA Engine init failed
>> [0.698676] ioatdma :00:04.7: channel error register unreachable
>> [0.698735] ioatdma :00:04.7: channel enumeration error
>> [0.698792] ioatdma :00:04.7: Intel(R) I/OAT DMA Engine init failed
>>
>> --
>>
>> Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
>> ignored, it fails to work:
>> [0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
>>
>> I assume this is something Supermicro has to fix?
>
> You are probably missing some kernel config option(s) :) - I did fight similar
> issues on a Fujitsu SandyBridge Xeon based server.
>
> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.

Changing config options is not a valid fix for error messages like
this.  We should be able to make the config smarter by adding
dependencies or something, or else make the driver smart enough to
give a more useful diagnostic.

The "channel error register unreachable" message indicates that
pci_read_config_dword() failed.  The register in question
(IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
PCI config accessors for the extended config space (0x100-0xfff).  A
complete dmesg log should show that.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-26 Thread Bjorn Helgaas

[Try Dan's current email address; sorry Dan]

On Mon, Nov 26, 2012 at 5:56 PM, Bjorn Helgaas  wrote:
> [+cc Dan]
>
> On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont
>  wrote:
>> Hi Justin,
>>
>> On Sat, 24 November 2012 "Justin Piszcz" wrote:
>>> Is the following normal on an X9SRL-F board (bios 1.0a)?
>>>
>>> In the manual it states:
>>>
>>> Data Direct I/O
>>> Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
>>> significantly reduces CPU overhead by leveraging CPU architectural
>>> improvements and freeing the system resource for other tasks. The options
>>> are Disabled and Enabled.
>>>
>>> Default is Enabled.
>>>
>>> When enabled in the kernel, I see the following:
>>>
>>> [0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
>>> [0.696487] ioatdma :00:04.0: channel error register unreachable
>>> [0.696546] ioatdma :00:04.0: channel enumeration error
>>> [0.696604] ioatdma :00:04.0: Intel(R) I/OAT DMA Engine init failed
>>> [0.696721] ioatdma :00:04.1: channel error register unreachable
>>> [0.696779] ioatdma :00:04.1: channel enumeration error
>>> [0.697522] ioatdma :00:04.1: Intel(R) I/OAT DMA Engine init failed
>>> [0.697617] ioatdma :00:04.2: channel error register unreachable
>>> [0.697681] ioatdma :00:04.2: channel enumeration error
>>> [0.697739] ioatdma :00:04.2: Intel(R) I/OAT DMA Engine init failed
>>> [0.697831] ioatdma :00:04.3: channel error register unreachable
>>> [0.697890] ioatdma :00:04.3: channel enumeration error
>>> [0.697948] ioatdma :00:04.3: Intel(R) I/OAT DMA Engine init failed
>>> [0.698037] ioatdma :00:04.4: channel error register unreachable
>>> [0.698095] ioatdma :00:04.4: channel enumeration error
>>> [0.698153] ioatdma :00:04.4: Intel(R) I/OAT DMA Engine init failed
>>> [0.698245] ioatdma :00:04.5: channel error register unreachable
>>> [0.698303] ioatdma :00:04.5: channel enumeration error
>>> [0.698360] ioatdma :00:04.5: Intel(R) I/OAT DMA Engine init failed
>>> [0.698449] ioatdma :00:04.6: channel error register unreachable
>>> [0.698508] ioatdma :00:04.6: channel enumeration error
>>> [0.698565] ioatdma :00:04.6: Intel(R) I/OAT DMA Engine init failed
>>> [0.698676] ioatdma :00:04.7: channel error register unreachable
>>> [0.698735] ioatdma :00:04.7: channel enumeration error
>>> [0.698792] ioatdma :00:04.7: Intel(R) I/OAT DMA Engine init failed
>>>
>>> --
>>>
>>> Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
>>> ignored, it fails to work:
>>> [0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
>>>
>>> I assume this is something Supermicro has to fix?
>>
>> You are probably missing some kernel config option(s) :) - I did fight 
>> similar
>> issues on a Fujitsu SandyBridge Xeon based server.
>>
>> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU 
>> options.
>
> Changing config options is not a valid fix for error messages like
> this.  We should be able to make the config smarter by adding
> dependencies or something, or else make the driver smart enough to
> give a more useful diagnostic.
>
> The "channel error register unreachable" message indicates that
> pci_read_config_dword() failed.  The register in question
> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
> PCI config accessors for the extended config space (0x100-0xfff).  A
> complete dmesg log should show that.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-26 Thread Bjorn Helgaas

On Mon, Nov 26, 2012 at 6:00 PM, Justin Piszcz  wrote:
>
>
> -Original Message-
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Monday, November 26, 2012 8:00 PM
> To: Bruno Prémont
> Cc: Justin Piszcz; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan
> Williams
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
> [Try Dan's current email address; sorry Dan]
>
> On Mon, Nov 26, 2012 at 5:56 PM, Bjorn Helgaas  wrote:
>> [+cc Dan]
>>
>> On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont
>>  wrote:
>>> Hi Justin,
>>>
>>> On Sat, 24 November 2012 "Justin Piszcz" wrote:
>>>> Is the following normal on an X9SRL-F board (bios 1.0a)?
>>>>
>>>> In the manual it states:
>>>>
>>>> Data Direct I/O
>>>> Select Enabled to enable Intel I/OAT (I/O Acceleration Technology),
> which
>>>> significantly reduces CPU overhead by leveraging CPU architectural
>>>> improvements and freeing the system resource for other tasks. The
> options
>>>> are Disabled and Enabled.
>>>>
>>>> Default is Enabled.
>>>>
>>>> When enabled in the kernel, I see the following:
>>>>
>>>> [0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
>>>> [0.696487] ioatdma :00:04.0: channel error register unreachable
>>>> [0.696546] ioatdma :00:04.0: channel enumeration error
>>>> [0.696604] ioatdma :00:04.0: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.696721] ioatdma :00:04.1: channel error register unreachable
>>>> [0.696779] ioatdma :00:04.1: channel enumeration error
>>>> [0.697522] ioatdma :00:04.1: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.697617] ioatdma :00:04.2: channel error register unreachable
>>>> [0.697681] ioatdma :00:04.2: channel enumeration error
>>>> [0.697739] ioatdma :00:04.2: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.697831] ioatdma :00:04.3: channel error register unreachable
>>>> [0.697890] ioatdma :00:04.3: channel enumeration error
>>>> [0.697948] ioatdma :00:04.3: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.698037] ioatdma :00:04.4: channel error register unreachable
>>>> [0.698095] ioatdma :00:04.4: channel enumeration error
>>>> [0.698153] ioatdma :00:04.4: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.698245] ioatdma :00:04.5: channel error register unreachable
>>>> [0.698303] ioatdma :00:04.5: channel enumeration error
>>>> [0.698360] ioatdma :00:04.5: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.698449] ioatdma :00:04.6: channel error register unreachable
>>>> [0.698508] ioatdma :00:04.6: channel enumeration error
>>>> [0.698565] ioatdma :00:04.6: Intel(R) I/OAT DMA Engine init
> failed
>>>> [0.698676] ioatdma :00:04.7: channel error register unreachable
>>>> [0.698735] ioatdma :00:04.7: channel enumeration error
>>>> [0.698792] ioatdma :00:04.7: Intel(R) I/OAT DMA Engine init
> failed
>>>>
>>>> --
>>>>
>>>> Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query
> is
>>>> ignored, it fails to work:
>>>> [0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
>>>>
>>>> I assume this is something Supermicro has to fix?
>>>
>>> You are probably missing some kernel config option(s) :) - I did fight
> similar
>>> issues on a Fujitsu SandyBridge Xeon based server.
>>>
>>> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU
> options.
>>
>> Changing config options is not a valid fix for error messages like
>> this.  We should be able to make the config smarter by adding
>> dependencies or something, or else make the driver smart enough to
>> give a more useful diagnostic.
>>
>> The "channel error register unreachable" message indicates that
>> pci_read_config_dword() failed.  The register in question
>> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
>> PCI config accessors for the extended config space (0x100-0xfff).  A
>> complete dmesg log should show that.
>
> --
>
> Here is the full dmesg: (I went back to my older kernel, let me know if you
> need a dmesg w/ those options enabled)
> http://home.comcast.net/~jpiszcz/20121126/dmesg.txt

It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-28 Thread Bjorn Helgaas

On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
 wrote:
> When an error is detected on a PCIe device which does not have an
> AER-aware driver, prevent AER infrastructure from reporting
> successful error recovery.
>
> This is because the report_error_detected() function that gets
> called in the first phase of recovery process allows forward
> progress even when the driver for the device does not have AER
> capabilities. It seems that all callbacks (in pci_error_handlers
> structure) registered by drivers that gets called during error
> recovery are not mandatory. So the intention of the infrastructure
> design seems to be to allow forward progress even when a specific
> callback has not been registered by a driver. However, if error
> handler structure itself has not been registered, it doesn't make
> sense to allow forward progress.
>
> As a result of the current design, in the case of a single device
> having an AER-unaware driver or in the case of any function in a
> multi-function card having an AER-unaware driver, a successful
> recovery is reported.
>
> Typical scenario this happens is when a PCI device is detached
> from a KVM host and the pci-stub driver on the host claims the
> device. The pci-stub driver does not have error handling capabilities
> but the AER infrastructure still reports that the device recovered
> successfully.
>
> The changes proposed here leaves the device(s)in an unrecovered state
> if the driver for the device or for any device in the subtree
> does not have error handler structure registered. This reflects
> the true state of the device and prevents any partial recovery (or no
> recovery at all) reported as successful.
>
>
> v3:
>   - Fixed a checkpatch/build issue
>
> v2:
>   - Made changes so that all devices in the subtree have the error
> state set correctly.
>
>
> Reviewed-by: Linas Vepstas  gmail.com>
> Reviewed-by: Myron Stowe  redhat.com>
> Reviewed-by: Bjorn Helgaas  google.com>
> Signed-off-by: Vijay Mohan Pandarathil 

I added this to my -next branch, which will be merged for v3.8.

Please double-check that I resolved the merge conflict correctly.  It
should show up here in the next few minutes:

http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next

>  drivers/pci/pcie/aer/aerdrv.h  |  5 -
>  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++---
>  include/linux/pci.h|  3 +++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index 94a7598..22f840f 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -87,6 +87,9 @@ struct aer_broadcast_data {
>  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
> enum pci_ers_result new)
>  {
> +   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> +   return PCI_ERS_RESULT_NO_AER_DRIVER;
> +
> if (new == PCI_ERS_RESULT_NONE)
> return orig;
>
> @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum 
> pci_ers_result orig,
> break;
> case PCI_ERS_RESULT_DISCONNECT:
> if (new == PCI_ERS_RESULT_NEED_RESET)
> -   orig = new;
> +   orig = PCI_ERS_RESULT_NEED_RESET;
> break;
> default:
> break;
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
> b/drivers/pci/pcie/aer/aerdrv_core.c
> index 9f80cb0..b67f91a 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev *dev, 
> void *data)
>dev->driver ?
>"no AER-aware driver" : "no driver");
> }
> -   return 0;
> +
> +   /*
> +* If there's any device in the subtree that does not
> +* have an error_detected callback, returning
> +* PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> +* the subsequent mmio_enabled/slot_reset/resume
> +* callbacks of "any" device in the subtree. All the
> +* devices in the subtree are left in the error state
> +* without recovery.
> +*/
> +
> +   if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> +   vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> +   else
> +   vote = PCI_ERS_RESULT_NONE;
> +   } else {
&

Re: [PATCH] PCI/PM: Keep runtime PM enabled for unbound PCI devices

2012-11-28 Thread Bjorn Helgaas

On Tue, Nov 20, 2012 at 1:08 AM, Huang Ying  wrote:
> For unbound PCI devices, what we need is:
>
>  - Always in D0 state, because some devices does not work again after
>being put into D3 by the PCI bus.
>
>  - In SUSPENDED state if allowed, so that the parent devices can still
>be put into low power state.
>
> To satisfy these requirement, the runtime PM for the unbound PCI
> devices are disabled and set to SUSPENDED state.  One issue of this
> solution is that the PCI devices will be put into SUSPENDED state even
> if the SUSPENDED state is forbidden via the sysfs interface
> (.../power/control) of the device.  This is not an issue for most
> devices, because most PCI devices are not used at all if unbounded.
> But there are exceptions.  For example, unbound VGA card can be used
> for display, but suspend its parents make it stop working.
>
> To fix the issue, we keep the runtime PM enabled when the PCI devices
> are unbound.  But the runtime PM callbacks will do nothing if the PCI
> devices are unbound.  This way, we can put the PCI devices into
> SUSPENDED state without put the PCI devices into D3 state.

Does this fix a reported problem?  Is there a bug report URL?  What
problem would a user notice?

This doesn't look critical enough to try to put in v3.7 (correct me if
I'm wrong).  It's getting pretty late for the v3.8 merge window, but
it looks like it qualifies as a bug fix that we would merge even after
the merge window, so I'll put it in my -next branch headed for v3.8.
Then it can be backported to v3.6 and v3.7.

> Signed-off-by: Huang Ying 
> Cc: Rafael J. Wysocki 
> Cc: Alan Stern 
> CC: sta...@vger.kernel.org  # v3.6+
> ---
>  drivers/pci/pci-driver.c |   69 
> +++
>  drivers/pci/pci.c|2 +
>  2 files changed, 43 insertions(+), 28 deletions(-)
>
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1900,6 +1900,8 @@ void pci_pm_init(struct pci_dev *dev)
> u16 pmc;
>
> pm_runtime_forbid(&dev->dev);
> +   pm_runtime_set_active(&dev->dev);
> +   pm_runtime_enable(&dev->dev);
> device_enable_async_suspend(&dev->dev);
> dev->wakeup_prepared = false;
>
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -256,31 +256,26 @@ struct drv_dev_and_id {
>  static long local_pci_probe(void *_ddi)
>  {
> struct drv_dev_and_id *ddi = _ddi;
> -   struct device *dev = &ddi->dev->dev;
> -   struct device *parent = dev->parent;
> +   struct pci_dev *pci_dev = ddi->dev;
> +   struct pci_driver *pci_drv = ddi->drv;
> +   struct device *dev = &pci_dev->dev;
> int rc;
>
> -   /* The parent bridge must be in active state when probing */
> -   if (parent)
> -   pm_runtime_get_sync(parent);
> -   /* Unbound PCI devices are always set to disabled and suspended.
> -* During probe, the device is set to enabled and active and the
> -* usage count is incremented.  If the driver supports runtime PM,
> -* it should call pm_runtime_put_noidle() in its probe routine and
> -* pm_runtime_get_noresume() in its remove routine.
> -*/
> -   pm_runtime_get_noresume(dev);
> -   pm_runtime_set_active(dev);
> -   pm_runtime_enable(dev);
> -
> -   rc = ddi->drv->probe(ddi->dev, ddi->id);
> +   /*
> +* Unbound PCI devices are always put in D0, regardless of
> +* runtime PM status.  During probe, the device is set to
> +* active and the usage count is incremented.  If the driver
> +* supports runtime PM, it should call pm_runtime_put_noidle()
> +* in its probe routine and pm_runtime_get_noresume() in its
> +* remove routine.
> +*/
> +   pm_runtime_get_sync(dev);
> +   pci_dev->driver = pci_drv;
> +   rc = pci_drv->probe(pci_dev, ddi->id);
> if (rc) {
> -   pm_runtime_disable(dev);
> -   pm_runtime_set_suspended(dev);
> -   pm_runtime_put_noidle(dev);
> +   pci_dev->driver = NULL;
> +   pm_runtime_put_sync(dev);
> }
> -   if (parent)
> -   pm_runtime_put(parent);
> return rc;
>  }
>
> @@ -330,10 +325,8 @@ __pci_device_probe(struct pci_driver *dr
> id = pci_match_device(drv, pci_dev);
> if (id)
> error = pci_call_probe(drv, pci_dev, id);
> -   if (error >= 0) {
> -   pci_dev->driver = drv;
> +   if (error >= 0)
> error = 0;
> -   }
> }
> return error;
>  }
> @@ -369,9 +362,7 @@ static int pci_device_remove(struct devi
> }
>
> /* Undo the runtime PM settings in local_pci_probe() */
> -   pm_runtime_disable(dev);
> -   pm_runtime_set_suspended(dev);
> -   pm_runtime_put_noidle(dev);
> +   pm_runtime_put_sync(dev);
>
> /*
>  * If the d

Re: [PATCH v2 RESEND] Add NumaChip remote PCI support

2012-11-28 Thread Bjorn Helgaas

On Wed, Nov 21, 2012 at 1:39 AM, Daniel J Blueman
 wrote:
> Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
> preventing access to AMD Northbridges which shouldn't respond.
>
> v2: Use PCI_DEVFN in precomputed constant limit; drop unneeded includes
>
> Signed-off-by: Daniel J Blueman 
> ---
>  arch/x86/include/asm/numachip/numachip.h |   20 +
>  arch/x86/kernel/apic/apic_numachip.c |2 +
>  arch/x86/pci/Makefile|1 +
>  arch/x86/pci/numachip.c  |  134 
> ++
>  4 files changed, 157 insertions(+)
>  create mode 100644 arch/x86/include/asm/numachip/numachip.h
>  create mode 100644 arch/x86/pci/numachip.c
>
> diff --git a/arch/x86/include/asm/numachip/numachip.h 
> b/arch/x86/include/asm/numachip/numachip.h
> new file mode 100644
> index 000..d35e71a
> --- /dev/null
> +++ b/arch/x86/include/asm/numachip/numachip.h
> @@ -0,0 +1,20 @@
> +/*
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file "COPYING" in the main directory of this archive
> + * for more details.
> + *
> + * Numascale NumaConnect-specific header file
> + *
> + * Copyright (C) 2012 Numascale AS. All rights reserved.
> + *
> + * Send feedback to 
> + *
> + */
> +
> +#ifndef _ASM_X86_NUMACHIP_NUMACHIP_H
> +#define _ASM_X86_NUMACHIP_NUMACHIP_H
> +
> +extern int __init pci_numachip_init(void);
> +
> +#endif /* _ASM_X86_NUMACHIP_NUMACHIP_H */
> +
> diff --git a/arch/x86/kernel/apic/apic_numachip.c 
> b/arch/x86/kernel/apic/apic_numachip.c
> index a65829a..9c2aa89 100644
> --- a/arch/x86/kernel/apic/apic_numachip.c
> +++ b/arch/x86/kernel/apic/apic_numachip.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -179,6 +180,7 @@ static int __init numachip_system_init(void)
> return 0;
>
> x86_cpuinit.fixup_cpu_id = fixup_cpu_id;
> +   x86_init.pci.arch_init = pci_numachip_init;
>
> map_csrs();
>
> diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
> index 3af5a1e..ee0af58 100644
> --- a/arch/x86/pci/Makefile
> +++ b/arch/x86/pci/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_STA2X11)   += sta2x11-fixup.o
>  obj-$(CONFIG_X86_VISWS)+= visws.o
>
>  obj-$(CONFIG_X86_NUMAQ)+= numaq_32.o
> +obj-$(CONFIG_X86_NUMACHIP) += numachip.o

It looks like this depends on CONFIG_PCI_MMCONFIG for
pci_mmconfig_lookup().  Are there config constraints that force
CONFIG_PCI_MMCONFIG=y when CONFIG_X86_NUMACHIP=y?

>  obj-$(CONFIG_X86_INTEL_MID)+= mrst.o
>
> diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
> new file mode 100644
> index 000..3773e05
> --- /dev/null
> +++ b/arch/x86/pci/numachip.c
> @@ -0,0 +1,129 @@
> +/*
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file "COPYING" in the main directory of this archive
> + * for more details.
> + *
> + * Numascale NumaConnect-specific PCI code
> + *
> + * Copyright (C) 2012 Numascale AS. All rights reserved.
> + *
> + * Send feedback to 
> + *
> + * PCI accessor functions derived from mmconfig_64.c
> + *
> + */
> +
> +#include 
> +#include 
> +
> +static u8 limit __read_mostly;
> +
> +static inline char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, 
> unsigned int devfn)
> +{
> +   struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);
> +
> +   if (cfg && cfg->virt)
> +   return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 
> 12));
> +   return NULL;
> +}

Most of this file is copied directly from mmconfig_64.c (as you
mentioned above).  I wonder if we could avoid the code duplication by
making the pci_dev_base() implementation in mmconfig_64.c a weak
definition.  Then you could just supply a non-weak pci_dev_base() here
that would override that default version.  Your version would look
something like:

  char __iomem *pci_dev_base(unsigned int seg, unsigned int bus,
unsigned int devfn)
  {
  struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);

  if (cfg && cfg->virt && devfn < limit)
  return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
  return NULL;
  }

That would be different from what you have in this patch because reads
& writes to devices above "limit" would return -EINVAL rather than 0
as you do here.  Would that be a problem?

> +static int pci_mmcfg_read_numachip(unsigned int seg, unsigned int bus,
> + unsigned int devfn, int reg, int len, u32 *value)
> +{
> +   char __iomem *addr;
> +
> +   /* Why do we have this when nobody checks it. How about a BUG()!? -AK 
> */
> +   if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
> +err:   *value = -1;
> +   return -EINVAL;
> +   }
> +
> +   /* Ensure AMD Northbridges don't decode reads to other devices */
> +   if (unlikely(bus == 0 && d

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-28 Thread Bjorn Helgaas

On Tue, Nov 27, 2012 at 6:49 AM, Justin Piszcz  wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ its working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [0.738719] ioatdma :00:04.0: irq 75 for MSI/MSI-X
> [0.739088] ioatdma :00:04.1: irq 76 for MSI/MSI-X
> [0.739408] ioatdma :00:04.2: irq 77 for MSI/MSI-X
> [0.739739] ioatdma :00:04.3: irq 78 for MSI/MSI-X
> [0.740040] ioatdma :00:04.4: irq 79 for MSI/MSI-X
> [0.740342] ioatdma :00:04.5: irq 80 for MSI/MSI-X
> [0.740670] ioatdma :00:04.6: irq 81 for MSI/MSI-X
> [0.740971] ioatdma :00:04.7: irq 82 for MSI/MSI-X

Good.  You have two issues, and I'm going to separate them and only
address the first one here.  I opened a bug report [1] against the
IOAT driver.  It should do something more useful when
CONFIG_PCI_MMCONFIG=n so we don't have to debug this again in the
future.  But otherwise, it sounds like this issue is resolved.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=51101
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-28 Thread Bjorn Helgaas

On Tue, Nov 27, 2012 at 7:35 AM, Justin Piszcz  wrote:
>
>
> -Original Message-
> From: Justin Piszcz [mailto:jpis...@lucidpixels.com]
> Sent: Tuesday, November 27, 2012 8:56 AM
> To: 'Bjorn Helgaas'
> Cc: 'Bruno Prémont'; supp...@supermicro.com; linux-kernel@vger.kernel.org;
> 'Dan Williams'
> Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
>
>> It is _not_ working on the:
>
>> 2) Supermicro X8DTH-F (the boot drive in this system is running off a
> PCI-e
>> card, could the IRQ for the I/O controller be getting re-mapped and
> fail?)--
>> worse case I can move the SSD from the 6.0gbpa SATA card to the
> motherboard
>> and see if that works, but that kind of defeats the purpose of a 6.0gbps
>> SATA SSD.
>
> When I removed the Highpoint 2-port SATA card and plugged it into the
> motherboard, the system boots (plugged the SSD into the motherboard).
> So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
> it will fail to initialize the Highpoint 2-port SATA controller card!
> I also tried upgrading the BIOS (of the mobo, no diff)
> I also tried just leaving the SATA card in and plugging it into the
> motherboard (no diff)
> Removed the Highpoint 2-port SATA card and then success, it would be nice to
> use that card with IOMMU support though, is it just not compatible
> (marvell-problem?) or is a driver bug?  Based on the pictures/etc sent
> earlier?

I would guess this is a core bug, but it's hard to tell without more
information.

If you boot with "intel_iommu=off", I would guess the Highpoint card
would work (this should have the same effect as turning off
CONFIG_INTEL_IOMMU).  I'd like to compare the complete dmesg log for
that boot with the one that fails.

It sounds like it might be hard to collect the log for the failing
case -- you said the boot fails when the Highpoint card is in the
system even if the SSD is connected to the motherboard instead of the
Highpoint card.  The panic in the photo2 image looks like it's just a
failure to mount the root filesystem, which is what I'd expect if we
can't find the SSD.  It seems like we ought to be able to *boot* with
the SSD connected to the motherboard, even if the Highpoint card
doesn't work.  But worst-case, a video of the failing boot might be
enough, especially if you can slow it down with "boot_delay="

> $ dmesg|grep -i iommu
> [0.055134] dmar: IOMMU 0: reg_base_addr cfdfe000 ver 1:0 cap
> c90780106f0462 ecap f020f6
> [0.055396] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap
> c90780106f0462 ecap f020f6
> [0.760665] IOMMU 0 0xcfdfe000: using Queued invalidation
> [0.760803] IOMMU 1 0xfecfe000: using Queued invalidation
> [0.760937] IOMMU: Setting RMRR:
> [0.761102] IOMMU: Setting identity map for device :00:1d.0
> [0xbf7ec000 - 0xbf7f]
> [0.761329] IOMMU: Setting identity map for device :00:1d.1
> [0xbf7ec000 - 0xbf7f]
> [0.761542] IOMMU: Setting identity map for device :00:1d.2
> [0xbf7ec000 - 0xbf7f]
> [0.761758] IOMMU: Setting identity map for device :00:1d.7
> [0xbf7ec000 - 0xbf7f]
> [0.761974] IOMMU: Setting identity map for device :00:1a.0
> [0xbf7ec000 - 0xbf7f]
> [0.762190] IOMMU: Setting identity map for device :00:1a.1
> [0xbf7ec000 - 0xbf7f]
> [0.762407] IOMMU: Setting identity map for device :00:1a.2
> [0xbf7ec000 - 0xbf7f]
> [0.762620] IOMMU: Setting identity map for device :00:1a.7
> [0xbf7ec000 - 0xbf7f]
> [0.762816] IOMMU: Setting identity map for device :00:1d.0 [0xec000
> - 0xe]
> [0.763010] IOMMU: Setting identity map for device :00:1d.1 [0xec000
> - 0xe]
> [0.763197] IOMMU: Setting identity map for device :00:1d.2 [0xec000
> - 0xe]
> [0.763382] IOMMU: Setting identity map for device :00:1d.7 [0xec000
> - 0xe]
> [0.763567] IOMMU: Setting identity map for device :00:1a.0 [0xec000
> - 0xe]
> [0.763749] IOMMU: Setting identity map for device :00:1a.1 [0xec000
> - 0xe]
> [0.763934] IOMMU: Setting identity map for device :00:1a.2 [0xec000
> - 0xe]
> [0.764127] IOMMU: Setting identity map for device :00:1a.7 [0xec000
> - 0xe]
> [0.764311] IOMMU: Prepare 0-16MiB unity mapping for LPC
> [0.764465] IOMMU: Setting identity map for device :00:1f.0 [0x0 -
> 0xff]
>
> --
>
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [0.632170]  pci:ff: ACPI _OSC support notification failed, disabling
> PCIe ASPM
> [

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-29 Thread Bjorn Helgaas

On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz  wrote:
>
>
> -Original Message-
> From: Robert Hancock [mailto:hancock...@gmail.com]
> Sent: Wednesday, November 28, 2012 7:55 PM
> To: Justin Piszcz
> Cc: Bjorn Helgaas; Bruno Prémont; supp...@supermicro.com;
> linux-kernel@vger.kernel.org; Dan Williams
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
> bug question
>
> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz 
> wrote:
>>
>>
>> -Original Message-
>> From: Robert Hancock [mailto:hancock...@gmail.com]
>> Sent: Wednesday, November 28, 2012 7:35 PM
>> To: Justin Piszcz
>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; supp...@supermicro.com;
>> linux-kernel@vger.kernel.org; 'Dan Williams'
>> Subject: Re: Supermicro X9SRL-F - channel enumeration error &
> ACPI/firmware
>> bug question
>>
>>
>> What does lspci -vv show on that controller? Not sure what actual
>> chipset that controller is, but there's a known issue with some Marvell
>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>> memory read/write requests from the wrong PCI function ID and the IOMMU
>> rightly denies access as the function listed in the requests doesn't
>> have any mapping to that memory. I don't think there's presently a
>> workaround other than disabling DMAR. We could (and likely should) be
>> detecting that device and adding some kind of quirk for it.
>>
>> That sounds likely...
>> It is shown below:
>>
>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>> Adapter
>>
>> lspci -vv output:
>>
>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>> controller
>
> Yeah, that's one of those controllers I think. But I can't tell from
> the bit of the dmesg you posted exactly what's going on. Can you post
> a full boot log from having the card installed and some drive attached
> (by putting the boot drive on another controller for example)?
>
>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>> this a Linux/ASPM implementation issue?
>>> [0.632170]  pci:ff: ACPI _OSC support notification failed,
>> disabling
>>> PCIe ASPM
>>> [0.632239]  pci:ff: Unable to request _OSC control (_OSC support
>>> mask: 0x08)
>>
>> What's the full dmesg from this machine (or is it already posted
> somewhere)?
>>
>> It is now available here:
>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>
>> Is that the same boot log? It doesn't have this error in it.
>
> Yes, the error is here: (its towards the bottom)
>
>  [7.973015] ata14.00: qc timeout (cmd 0xa1)
> [8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [   51.824682] dmar: DRHD: handling fault status reg 502
> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set

You have these devices:

pci :04:00.0: [10de:01d3] type 00 class 0x03 nVidia G72
pci :84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
pci :84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE

I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
and if you get rid of that driver, they'll probably go away.

But this 84:00.1 DMAR error:

dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff0
DMAR:[fault reason 02] Present bit in context entry is clear

looks like the probable cause of the Marvell issue.  It looks similar
to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
DMAR rejects DMA that appears to be from bb:dd.1.

Another report that's even more similar is
https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
both bb:dd.0 and bb:dd.1 exist (as in

Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-29 Thread Bjorn Helgaas

[+cc Jeff, linux-ide, David, Joerg, iommu]

On Thu, Nov 29, 2012 at 7:39 PM, Robert Hancock  wrote:
> On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas  wrote:
>> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz  
>> wrote:
>>>
>>>
>>> -Original Message-
>>> From: Robert Hancock [mailto:hancock...@gmail.com]
>>> Sent: Wednesday, November 28, 2012 7:55 PM
>>> To: Justin Piszcz
>>> Cc: Bjorn Helgaas; Bruno Prémont; supp...@supermicro.com;
>>> linux-kernel@vger.kernel.org; Dan Williams
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>> bug question
>>>
>>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz 
>>> wrote:
>>>>
>>>>
>>>> -Original Message-
>>>> From: Robert Hancock [mailto:hancock...@gmail.com]
>>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>>> To: Justin Piszcz
>>>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; supp...@supermicro.com;
>>>> linux-kernel@vger.kernel.org; 'Dan Williams'
>>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error &
>>> ACPI/firmware
>>>> bug question
>>>>
>>>>
>>>> What does lspci -vv show on that controller? Not sure what actual
>>>> chipset that controller is, but there's a known issue with some Marvell
>>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>>> rightly denies access as the function listed in the requests doesn't
>>>> have any mapping to that memory. I don't think there's presently a
>>>> workaround other than disabling DMAR. We could (and likely should) be
>>>> detecting that device and adding some kind of quirk for it.
>>>>
>>>> That sounds likely...
>>>> It is shown below:
>>>>
>>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>>>> Adapter
>>>>
>>>> lspci -vv output:
>>>>
>>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>>>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>>>> controller
>>>
>>> Yeah, that's one of those controllers I think. But I can't tell from
>>> the bit of the dmesg you posted exactly what's going on. Can you post
>>> a full boot log from having the card installed and some drive attached
>>> (by putting the boot drive on another controller for example)?
>>>
>>>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>>>> this a Linux/ASPM implementation issue?
>>>>> [0.632170]  pci:ff: ACPI _OSC support notification failed,
>>>> disabling
>>>>> PCIe ASPM
>>>>> [0.632239]  pci:ff: Unable to request _OSC control (_OSC support
>>>>> mask: 0x08)
>>>>
>>>> What's the full dmesg from this machine (or is it already posted
>>> somewhere)?
>>>>
>>>> It is now available here:
>>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>>
>>>> Is that the same boot log? It doesn't have this error in it.
>>>
>>> Yes, the error is here: (its towards the bottom)
>>>
>>>  [7.973015] ata14.00: qc timeout (cmd 0xa1)
>>> [8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   51.824682] dmar: DRHD: handling fault status reg 502
>>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>>
>> You have these devices:
>>
>> pci :04:00.0: [10de:01d3] type 00 class 0x030

Re: [RFC PATCH v5 4/8] ACPI, PCI: avoid building pci_slot as module

2013-02-03 Thread Bjorn Helgaas

On Sun, Feb 3, 2013 at 1:18 PM, Rafael J. Wysocki  wrote:
> On Monday, January 28, 2013 02:09:22 PM Bjorn Helgaas wrote:
>> On Sun, Jan 20, 2013 at 5:01 PM, Rafael J. Wysocki  wrote:
>> > On Saturday, January 19, 2013 12:07:42 AM Jiang Liu wrote:
>> >> As discussed in thread at https://patchwork.kernel.org/patch/1946851/,
>> >> there's no value in supporting CONFIG_ACPI_PCI_SLOT=m any more.
>> >> So change Kconfig and code to only support building pci_slot as
>> >> built-in driver.
>> >>
>> >> Signed-off-by: Jiang Liu 
>> >
>> > Acked-by: Rafael J. Wysocki 
>>
>> Acked-by: Bjorn Helgaas 
>>
>> I think we should eventually get rid of acpi_pci_register_driver() and
>> do this initialization directly from acpi_pci_root_add().  But
>> removing the module option here is a good first step.
>>
>> Rafael, do you want to apply this (and [6/8]) via your tree?  If not,
>> I can take it.
>
> Both applied to bleeding-edge.  I've added your ACK to the [6/8] too, hope
> that's OK.

Yep, that's fine, thanks!

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 2/7] PCI: don't touch enable_cnt in pci_device_shutdown()

2013-02-04 Thread Bjorn Helgaas

On Mon, Feb 4, 2013 at 3:20 PM, Khalid Aziz  wrote:
> On Mon, 2013-02-04 at 15:55 +0400, Konstantin Khlebnikov wrote:
>> Matthew Garrett and Alan Cox said (see LKML link below) that clearing 
>> bus-master
>> for all PCI devices may lead to unpredictable consequences, some devices 
>> ignores
>> this bit and continues DMA, some of them hang after that or crash whole 
>> system.
>> Probably we should leave here only warning and disable bus-mastering for each
>> driver individually in ->shutdown() callback.
>
> Agreed that the right place for shutting down a PCI device properly and
> clearing its Bus Master bit, is the driver shutdown routine, if only all
> drivers supplied a shutdown routine. As it is today, there are too many
> drivers that do not provide a shutdown routine, ata_piix, Marvell SATA
> driver, ATI AGP driver just to name a few among a large number of them.
> Yet kexec is expected to work inspite of these drivers especially since
> kdump depends on it. So until all PCI drivers supply a shutdown routine,
> this is just a band-aid to disable interrupt and Bus Master bit in
> pci_device_shutdown(). Most drivers do seem to supply a suspend and
> resume function and it was discussed many years ago if it is feasible to
> use the suspend() routine for drivers to shut devices down cleanly.
> Maybe it is time to revisit that discussion.

This patch as posted doesn't do anything with IRQs.  It only clears
PCI_COMMAND_MASTER.

I'm open to considering something with IRQs, but I don't understand
exactly what we should do.  In your response to the previous version
(https://lkml.org/lkml/2013/1/28/720) you suggested this:

  pci_clear_master(pci_dev);
  pcibios_disable_device(pci_dev);

Did you figure out specifically why pcibios_disable_device() helps?
Using pcibios_disable_device() doesn't seem like the ideal solution
because on most architectures, it is an empty function with no obvious
connection to IRQs.  On x86 with ACPI, it cleans up some ACPI PCI IRQ
stuff, but as far as I can tell, it doesn't actually touch the PCI
device itself or even the IOAPIC to which it's connected, so I'm not
sure how this would help kexec.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 2/7] PCI: don't touch enable_cnt in pci_device_shutdown()

2013-02-05 Thread Bjorn Helgaas

On Tue, Feb 5, 2013 at 8:28 AM, Khalid Aziz  wrote:
> On Mon, 2013-02-04 at 16:13 -0700, Bjorn Helgaas wrote:
>> On Mon, Feb 4, 2013 at 3:20 PM, Khalid Aziz  wrote:
>> > On Mon, 2013-02-04 at 15:55 +0400, Konstantin Khlebnikov wrote:
>> >> Matthew Garrett and Alan Cox said (see LKML link below) that clearing 
>> >> bus-master
>> >> for all PCI devices may lead to unpredictable consequences, some devices 
>> >> ignores
>> >> this bit and continues DMA, some of them hang after that or crash whole 
>> >> system.
>> >> Probably we should leave here only warning and disable bus-mastering for 
>> >> each
>> >> driver individually in ->shutdown() callback.
>> >
>> > Agreed that the right place for shutting down a PCI device properly and
>> > clearing its Bus Master bit, is the driver shutdown routine, if only all
>> > drivers supplied a shutdown routine. As it is today, there are too many
>> > drivers that do not provide a shutdown routine, ata_piix, Marvell SATA
>> > driver, ATI AGP driver just to name a few among a large number of them.
>> > Yet kexec is expected to work inspite of these drivers especially since
>> > kdump depends on it. So until all PCI drivers supply a shutdown routine,
>> > this is just a band-aid to disable interrupt and Bus Master bit in
>> > pci_device_shutdown(). Most drivers do seem to supply a suspend and
>> > resume function and it was discussed many years ago if it is feasible to
>> > use the suspend() routine for drivers to shut devices down cleanly.
>> > Maybe it is time to revisit that discussion.
>>
>> This patch as posted doesn't do anything with IRQs.  It only clears
>> PCI_COMMAND_MASTER.
>>
>> I'm open to considering something with IRQs, but I don't understand
>> exactly what we should do.  In your response to the previous version
>> (https://lkml.org/lkml/2013/1/28/720) you suggested this:
>>
>>   pci_clear_master(pci_dev);
>>   pcibios_disable_device(pci_dev);
>>
>> Did you figure out specifically why pcibios_disable_device() helps?
>> Using pcibios_disable_device() doesn't seem like the ideal solution
>> because on most architectures, it is an empty function with no obvious
>> connection to IRQs.  On x86 with ACPI, it cleans up some ACPI PCI IRQ
>> stuff, but as far as I can tell, it doesn't actually touch the PCI
>> device itself or even the IOAPIC to which it's connected, so I'm not
>> sure how this would help kexec.
>
> My reading of the code was that pcibios_disable_device() does clear the
> interrupt on x86 and ia64. I am not deeply familiar with the ACPI code
> and I might be interpreting it incorrectly, so please do correct me if I
> am reading it incorrectly. Here is the code sequence I see:
>
> pcibios_disable_device() ->
>pcibios_disable_irq() ->
>acpi_pci_irq_disable() ->
>acpi_pci_link_free_irq() ->
>   acpi_evaluate_object(link->device->handle, "_DIS", NULL,
> NULL);
>
> My understanding is the evaluation of ACPI _DIS method will disable the
> interrupt from the device. Does that sound reasonable?

I see the code you're looking at in acpi_pci_link_free_irq(), but we
only evaluate _DIS if link->refcnt == 0, and I don't think refcnt is
ever zero at that point.

refcnt starts out at zero in acpi_pci_link_add() (called when we find
PNP0C0F devices), and it's incremented in acpi_pci_link_allocate_irq()
(called in the pci_enable_device() path), but as far as I can tell,
it's never decremented, so I doubt that _DIS is ever evaluated.

If we did evaluate _DIS, it would act on an "interrupt link" device,
not on the PCI device itself.  I guess that could help, but only for
devices connected to such a link device.  For others, I guess we might
be able to accomplish something similar by updating local APIC and/or
IOAPIC config.  I don't think we do that today, at least not in the
pci_disable_device() path, but it might be something interesting to
explore.  There is also the INTx Disable bit, though it's obviously
only on new PCI devices.

> ... The two ways I can think of are to stop
> DMA by clearing Bus Master bit and turn off the interrupt, which have
> been shown to get kexec (and thus kdump) working on machines it didn't
> work on before.

I was just curious if you had actually verified that _DIS was being
evaluated and making a difference here, or if the Bus Master bit was
really the important part.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 00/22] PCI: Iterate pci host bridge instead of pci root bus

2013-02-05 Thread Bjorn Helgaas

On Tue, Feb 5, 2013 at 4:55 PM, Yinghai Lu  wrote:
> On Sat, Feb 2, 2013 at 9:05 PM, Yinghai Lu  wrote:
>
>>> Your solution is to introduce for_each_pci_host_bridge() as an
>>> iterator through the known host bridges.  There are two scenarios
>>> where we use something like this:
>>>
>>> 1) We want to perform an operation on every known host bridge.
>>>
>>> 2) We want to initialize something for every host bridge.
>>>
>>> In my opinion, the only instance of scenario 1) is bus_rescan_store(),
>>> where we want to rescan all PCI host bridges.
>>>
>>> In every other case, we're doing some kind of initialization of all
>>> the host bridges.  For these cases, for_each_pci_host_bridge() is the
>>> wrong solution because it doesn't work for hot-added bridges.  I think
>>> these cases should be changed to use pcibios_root_bridge_prepare() or
>>> something something else called in the host bridge add path.
>>
>> Yes, will check if those for_pci_host_bridge could be converted to
>> adding calling in pcibios_root_bridge_prepare one by one.
>
> I went through those patches again, here are findings:
> 1. run-time iteration includes at least 3 occurrence are addressed.
> a. in pci-sysfs::bus_rescan_store
> b. powerpc pci_64.c::sys_pciconfig_iobase system call
> c. probe.c::pci_create_root_bus/pci_find_bus
>
> 2. edac: i7core_edac.c::i7core_pci_lastbus()
> this one is not support host bridge hotplug, and in i7core_probe it will
> bail-out if probed before overall.
> To make edac with i7 to support host-bridge hotadd, will change the structure 
> of
> that i7core_probe.
> But I'm not worried about that yet, because Nehalem-EX does not support EDAC,
> and assume we will NOT have use case with new -EX intel cpu with
> hotplug support to use EDAC.
>
> 3. pcibios_resource_survey/pcibios_assign_unassigned_resource in __init path.
> it is with x86/frv/microblaze/mn10300.
> for x86/acpi, we have per root bus calling in pci_root.c::acpi_pci_root_add().
> other three does not support pci_root_bridge hotplug yet.
>
> 4. looks like all other changes are __init path, and those arch does not 
> support
> pci_root_bridge hotplug yet.
>
> So looks like this patch set is ok now.

Maybe.  I'd rather not introduce for_each_pci_host_bridge() at all, if
we can avoid it.  Every place it's used is a place we have to audit to
make sure it's safe.  I think your audit above is correct and
complete, but it relies on way too much architecture knowledge.  It's
better if we can deduce correctness without knowing which arches
support hotplug and which CPUs support EDAC.

As soon as for_each_pci_host_bridge() is in the tree, those uses can
be copied to even more places.  It's a macro, so it's usable by any
module, even out-of-tree ones that we'll never see and can't fix.  So
we won't really have a good way to deprecate and remove it.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 00/22] PCI: Iterate pci host bridge instead of pci root bus

2013-02-06 Thread Bjorn Helgaas

On Wed, Feb 6, 2013 at 1:53 AM, Mauro Carvalho Chehab
 wrote:
> Em Tue, 5 Feb 2013 16:47:10 -0800
> Yinghai Lu  escreveu:
>
>> On Tue, Feb 5, 2013 at 4:19 PM, Bjorn Helgaas  wrote:
>> >
>> > Maybe.  I'd rather not introduce for_each_pci_host_bridge() at all, if
>> > we can avoid it.  Every place it's used is a place we have to audit to
>> > make sure it's safe.  I think your audit above is correct and
>> > complete, but it relies on way too much architecture knowledge.  It's
>> > better if we can deduce correctness without knowing which arches
>> > support hotplug and which CPUs support EDAC.
>> >
>> > As soon as for_each_pci_host_bridge() is in the tree, those uses can
>> > be copied to even more places.  It's a macro, so it's usable by any
>> > module, even out-of-tree ones that we'll never see and can't fix.  So
>> > we won't really have a good way to deprecate and remove it.
>>
>> Now we only have two references in modules.
>>
>> drivers/edac/i7core_edac.c: for_each_pci_host_bridge(host_bridge) {
>> drivers/pci/hotplug/sgi_hotplug.c:  
>> for_each_pci_host_bridge(host_bridge) {
>>
>> for the sgi_hotplug.c, it should be same problem that have for acpiphp
>> and pciehp.
>> need to make it support pci host bridge hotplug anyway.
>>
>> for edac, we need to check Mauro about their plan.
>
> The i7core_pci_lastbus() code at i7core_edac is there to make it work
> with some Nehalem/Nehalem-EP machines that hide the memory controller's
> PCI ID by using an artificially low last bus.

I don't really understand how this helps.  An example would probably
make it clearer.

i7core_edac.c has some very creative use of PCI infrastructure.
Normally a driver has a pci_device_id table that identifies the
vendor/device IDs of the devices it cares about, and the driver's
.probe() method is called for every matching device.

But i7core_edac only has two entries in its id_table.  When we find a
device that matches one of those two entries, we call i7core_probe(),
which then gropes around for all the *other* devices related to that
original one.  This is a bit messy.

I'd like it a lot better if the device IDs in
pci_dev_descr_i7core_nehalem[], pci_dev_descr_lynnfield[], etc., were
just in the pci_device_id table directly.  Then i7core_probe() would
be called directly for every device you care about, and you could sort
them out there.  That should work without any need for
pci_get_device(), i7core_pci_lastbus(), etc.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 00/22] PCI: Iterate pci host bridge instead of pci root bus

2013-02-06 Thread Bjorn Helgaas

On Tue, Feb 5, 2013 at 5:47 PM, Yinghai Lu  wrote:
> On Tue, Feb 5, 2013 at 4:19 PM, Bjorn Helgaas  wrote:
>>
>> Maybe.  I'd rather not introduce for_each_pci_host_bridge() at all, if
>> we can avoid it.  Every place it's used is a place we have to audit to
>> make sure it's safe.  I think your audit above is correct and
>> complete, but it relies on way too much architecture knowledge.  It's
>> better if we can deduce correctness without knowing which arches
>> support hotplug and which CPUs support EDAC.
>>
>> As soon as for_each_pci_host_bridge() is in the tree, those uses can
>> be copied to even more places.  It's a macro, so it's usable by any
>> module, even out-of-tree ones that we'll never see and can't fix.  So
>> we won't really have a good way to deprecate and remove it.
>
> Now we only have two references in modules.
>
> drivers/edac/i7core_edac.c: for_each_pci_host_bridge(host_bridge) {
> drivers/pci/hotplug/sgi_hotplug.c:  for_each_pci_host_bridge(host_bridge) 
> {
>
> for the sgi_hotplug.c, it should be same problem that have for acpiphp
> and pciehp.
> need to make it support pci host bridge hotplug anyway.
>
> for edac, we need to check Mauro about their plan.
>
> After those two are addressed, we can drop that EXPORT_SYMBOL_GPL for
> pci_get_next_host_bridge.
>
> We do have pci_get_domain_bus_and_slot() as export symbol.
> So we export pci_get_next_host_bridge should be ok now.
> and it would be better than export root buses list.

I think you're missing the point.

Search the tree for uses of "for_each_pci_dev()."  Almost every
occurrence is a bug because that code doesn't work correctly for
hot-added devices.  That code gets run for devices that are present at
boot, but not for devices hot-added later.

You're proposing to add "for_each_pci_host_bridge()."  That will have
the exact same problem as for_each_pci_dev() already does.  Code that
uses it will work for host bridges present at boot, but not for
bridges hot-added later.

Why would we want to add an interface when every use of that interface
will be a design bug?  I think we should fix the existing users of
pci_root_buses by changing their designs so they will work correctly
with host bridge hotplug.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 00/22] PCI: Iterate pci host bridge instead of pci root bus

2013-02-06 Thread Bjorn Helgaas

On Wed, Feb 6, 2013 at 11:59 AM, Yinghai Lu  wrote:
> On Wed, Feb 6, 2013 at 9:54 AM, Bjorn Helgaas  wrote:
>> I think you're missing the point.
>>
>> Search the tree for uses of "for_each_pci_dev()."  Almost every
>> occurrence is a bug because that code doesn't work correctly for
>> hot-added devices.  That code gets run for devices that are present at
>> boot, but not for devices hot-added later.
>>
>> You're proposing to add "for_each_pci_host_bridge()."  That will have
>> the exact same problem as for_each_pci_dev() already does.  Code that
>> uses it will work for host bridges present at boot, but not for
>> bridges hot-added later.
>>
>> Why would we want to add an interface when every use of that interface
>> will be a design bug?  I think we should fix the existing users of
>> pci_root_buses by changing their designs so they will work correctly
>> with host bridge hotplug.
>
> I'm a little confused about what you want.
>
> In boot stage using for_each_pci_host_bridge or pci_root_buses is fine.

No, it's not.  Boot-time should work exactly the same way as hot-add
time, unless there's a reason it can't.  There might be corner cases
where we can't do that yet, e.g., the pcibios_resource_survey stuff,
but in general I don't think there's anything that *forces* us to
iterate through host bridges at boot-time.

> For those cases that it should support host-bridge by nature.
> there are two solutions:
> 1. use for_each_pci_host_bridge, and it is hotplug-safe.

I'm trying to tell you that for_each_pci_host_bridge() is NOT
hotplug-safe.  Your series makes it safer than it was, in the sense
that it probably fixes stray pointer references when a host bridge
hotplug happens while somebody's traversing pci_root_buses.  But the
whole point of for_each_pci_host_bridge() is to run some code for
every bridge we know about *right now*.  If a host bridge is added
right after the for_each_pci_host_bridge() loop exits, that code
doesn't get run.  In most cases, that's a bug.  The only exception I
know about is the /sys/.../rescan interface.

> and make sgi_hotplug to use acpi_pci_driver interface.
> and acpi_pci_root_add() will call .add in the acpi_pci_driver.
>
> 2. make them all to be built-in, and  those acpi_pci_driver should be 
> registered
> much early before acpi_pci_root_add is called.
> then we don't need to call for_each_host_bridge for it.
>
> So difference between them:
> 1. still keep the module support, and register acpi_pci_driver later.
> 2. built-in support only, and need to register acpi_pci_driver early.

acpi_pci_driver is going away, so I don't want to add any more uses of
it.  Obviously it's only relevant to x86 and ia64 anyway.

What I'd like is a change where for_each_pci_host_bridge() is used
only inside the PCI core and defined somewhere like drivers/pci/pci.h
so it's not even available outside.

The other uses should be changed so they use
pcibios_root_bridge_prepare() or something similar.  That way, it will
be obvious that the code supports hot-added bridges, and when it gets
copied to other places, we'll be copying a working design instead of a
broken one.

I don't want to have to audit these places and figure out "this is
safe because this arch doesn't support host bridge hotplug" or "this
is safe because this CPU doesn't support XYZ."  That's not a
maintainable solution.  The safety should be apparent from the code
itself, without requiring extra platform-specific knowledge.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 00/22] PCI: Iterate pci host bridge instead of pci root bus

2013-02-06 Thread Bjorn Helgaas

On Wed, Feb 6, 2013 at 3:05 PM, Rafael J. Wysocki  wrote:
> On Wednesday, February 06, 2013 01:53:50 PM Yinghai Lu wrote:
>> On Wed, Feb 6, 2013 at 1:43 PM, Rafael J. Wysocki  wrote:
>> > On Wednesday, February 06, 2013 01:28:27 PM Yinghai Lu wrote:
>> >> so you still did not answer you want 1 or 2 yet:
>> >>
>> >> for sgi_hotplug,
>> >>
>> >> 1. still keep the module support, and register acpi_pci_driver later.
>> >> 2. built-in support only, and need to register acpi_pci_driver early.
>> >
>> > Please work with the assumption that acpi_pci_driver is not going to be 
>> > there
>> > any more.
>> >
>>
>> I think I could change ioapic and iommu hotplug to weak add/remove because 
>> they
>> should be built-in by nature.
>>
>> but how about others like sgi_hotplug etc?

I think that could be handled with pcibios_root_bridge_prepare() or
something similar -- something that happens in the "add host bridge"
path.  I'm not saying we necessarily have all the right hooks in place
yet.  It's just that I'd rather figure out what the right hooks *are*
and add them if necessary, than just convert pci_root_buses to
for_each_pci_host_bridge().

> I'd really prefer to wait for the patchset removing acpi_pci_driver (from
> Myron) before proceeding with more changes in that area.
>
> Myron, do you have a prototype based on the current linux-next?

I think it's really the acpiphp/pciehp issue that's holding up the
removal of acpi_pci_driver.  I'm not sure if anybody's working on
that.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 25/26] PCI: Disable mem in the ioapic removing path

2013-02-08 Thread Bjorn Helgaas

On Fri, Feb 8, 2013 at 12:28 PM, Yinghai Lu  wrote:
> For physical hot plug should be ok, but for remove/rescan path will
> need us to disable that.
>
> otherwise rescan mmio resource for pci ioapic device will not be
> sized and allocated, aka skiped.

When we scan other PCI devices, we can size memory BARs even if
PCI_COMMAND_MEMORY is already set.  So there must be something
different about IOAPICs?  Or maybe it's something different about
rescan vs. the initial scan?

> For ioapic_probe:pci_enable_device will not enable the device
> correctly, and will bail out early.

Exactly where and why do we bail out early?  The only early bail out I
see is where __pci_enable_device_flags() returns if "dev->enable_cnt >
1".  Is that what you mean?

> So we can just disable mmio for all removing case, and that will not hurt
> real hotplug path.

> Signed-off-by: 
> ---
>  drivers/pci/ioapic.c |   13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/pci/ioapic.c b/drivers/pci/ioapic.c
> index 3c6bbdd..8dacfd0 100644
> --- a/drivers/pci/ioapic.c
> +++ b/drivers/pci/ioapic.c
> @@ -88,6 +88,17 @@ exit_free:
> return -ENODEV;
>  }
>
> +static void pci_disable_device_mem(struct pci_dev *dev)
> +{
> +   u16 pci_command;
> +
> +   pci_read_config_word(dev, PCI_COMMAND, &pci_command);
> +   if (pci_command & PCI_COMMAND_MEMORY) {
> +   pci_command &= ~PCI_COMMAND_MEMORY;
> +   pci_write_config_word(dev, PCI_COMMAND, pci_command);
> +   }
> +}
> +
>  static void ioapic_remove(struct pci_dev *dev)
>  {
> struct ioapic *ioapic = pci_get_drvdata(dev);
> @@ -95,6 +106,8 @@ static void ioapic_remove(struct pci_dev *dev)
> acpi_unregister_ioapic(ioapic->handle, ioapic->gsi_base);
> pci_release_region(dev, 0);
> pci_disable_device(dev);
> +   /* need to disable it, otherwise remove/rescan will not work */
> +   pci_disable_device_mem(dev);
> kfree(ioapic);
>  }
>
> --
> 1.7.10.4
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 26/26] PCI, x86, ACPI: Add ioapic hotplug support with acpi host bridge.

2013-02-08 Thread Bjorn Helgaas

On Fri, Feb 8, 2013 at 12:28 PM, Yinghai Lu  wrote:
> We need to have ioapic setup before normal pci drivers.
> otherwise other pci driver can not setup irq.
>
> Make ioapic built-in, so can call add/remove during host-bridge add/remove
> the same as the booting path.
>
> Also need to make it depends on X86_IO_APIC.

There are a lot of different things happening in this patch.  I'd
prefer if they were split out so we can see what's happening.  For
example, the Kconfig changes could be their own patch.  The removal of
the PCI device table is worthy of mention and its own patch (if
possible), since it's a pretty major change to how the driver binds to
devices.

> Signed-off-by: 
> ---
>  arch/x86/kernel/acpi/boot.c |   10 +-
>  drivers/acpi/pci_root.c |4 +
>  drivers/pci/Kconfig |3 +-
>  drivers/pci/ioapic.c|  216 
> ++-
>  include/linux/pci-acpi.h|8 ++
>  5 files changed, 172 insertions(+), 69 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 350879f..338163b 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -697,16 +697,18 @@ EXPORT_SYMBOL(acpi_unmap_lsapic);
>
>  int acpi_register_ioapic(acpi_handle handle, u64 phys_addr, u32 gsi_base)
>  {
> -   /* TBD */
> -   return -EINVAL;
> +   unsigned long long id = 0;
> +
> +   acpi_evaluate_integer(handle, "_UID", NULL, &id);
> +
> +   return  __mp_register_ioapic(id, phys_addr, gsi_base, true);
>  }
>
>  EXPORT_SYMBOL(acpi_register_ioapic);
>
>  int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base)
>  {
> -   /* TBD */
> -   return -EINVAL;
> +   return mp_unregister_ioapic(gsi_base);
>  }
>
>  EXPORT_SYMBOL(acpi_unregister_ioapic);
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 417487a..54d61ce 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -587,6 +587,8 @@ static int acpi_pci_root_add(struct acpi_device *device)
> pci_assign_unassigned_bus_resources(root->bus);
> }
>
> +   acpi_pci_ioapic_add(root);
> +
> mutex_lock(&acpi_pci_root_lock);
> list_for_each_entry(driver, &acpi_pci_drivers, node)
> if (driver->add)
> @@ -626,6 +628,8 @@ static int acpi_pci_root_remove(struct acpi_device 
> *device, int type)
> driver->remove(root);
> mutex_unlock(&acpi_pci_root_lock);
>
> +   acpi_pci_ioapic_remove(root);
> +
> device_set_run_wake(root->bus->bridge, false);
> pci_acpi_remove_bus_pm_notifier(device);
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 6d51aa6..720989f 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -110,10 +110,11 @@ config PCI_PASID
>   If unsure, say N.
>
>  config PCI_IOAPIC
> -   tristate "PCI IO-APIC hotplug support" if X86
> +   bool "PCI IO-APIC hotplug support" if X86
> depends on PCI
> depends on ACPI
> depends on HOTPLUG
> +   depends on X86_IO_APIC
> default !X86
>
>  config PCI_LABEL
> diff --git a/drivers/pci/ioapic.c b/drivers/pci/ioapic.c
> index 8dacfd0..12e56c7 100644
> --- a/drivers/pci/ioapic.c
> +++ b/drivers/pci/ioapic.c
> @@ -22,70 +22,128 @@
>  #include 
>  #include 
>
> -struct ioapic {
> -   acpi_handle handle;
> +struct acpi_pci_ioapic {
> +   acpi_handle root_handle;
> u32 gsi_base;
> +   struct pci_dev *pdev;
> +   struct list_head list;
>  };
>
> -static int ioapic_probe(struct pci_dev *dev, const struct pci_device_id *ent)
> +static LIST_HEAD(ioapic_list);
> +static DEFINE_MUTEX(ioapic_list_lock);
> +
> +static acpi_status setup_res(struct acpi_resource *acpi_res, void *data)
> +{
> +   struct resource *res;
> +   struct acpi_resource_address64 addr;
> +   acpi_status status;
> +   unsigned long flags;
> +   u64 start, end;
> +
> +   status = acpi_resource_to_address64(acpi_res, &addr);
> +   if (!ACPI_SUCCESS(status))
> +   return AE_OK;
> +
> +   if (addr.resource_type == ACPI_MEMORY_RANGE) {
> +   if (addr.info.mem.caching == ACPI_PREFETCHABLE_MEMORY)
> +   return AE_OK;
> +   flags = IORESOURCE_MEM;
> +   } else
> +   return AE_OK;
> +
> +   start = addr.minimum + addr.translation_offset;
> +   end = addr.maximum + addr.translation_offset;
> +
> +   res = data;
> +   res->flags = flags;
> +   res->start = start;
> +   res->end = end;
> +
> +   return AE_OK;
> +}
> +
> +static void handle_ioapic_add(acpi_handle handle, struct pci_dev **pdev,
> +u32 *pgsi_base)
>  {
> -   acpi_handle handle;
> acpi_status status;
> unsigned long long gsb;
> -   struct ioapic *ioapic;
> +   struct pci_dev *dev;
> +   u32 gsi_base;
> int ret;
>

Re: [PATCH v2 25/26] PCI: Disable mem in the ioapic removing path

2013-02-08 Thread Bjorn Helgaas

On Fri, Feb 8, 2013 at 3:33 PM, Yinghai Lu  wrote:
> On Fri, Feb 8, 2013 at 1:14 PM, Bjorn Helgaas  wrote:
>> On Fri, Feb 8, 2013 at 12:28 PM, Yinghai Lu  wrote:
>>> For physical hot plug should be ok, but for remove/rescan path will
>>> need us to disable that.
>>>
>>> otherwise rescan mmio resource for pci ioapic device will not be
>>> sized and allocated, aka skiped.
>>
>> When we scan other PCI devices, we can size memory BARs even if
>> PCI_COMMAND_MEMORY is already set.  So there must be something
>> different about IOAPICs?  Or maybe it's something different about
>> rescan vs. the initial scan?
>
> it is in drivers/pci/setup-bus.c::__dev_sort_resources()
> it will skip the reallocation for ioapic controller.
>
> ...
> /* Don't touch ioapic devices already enabled by firmware */
> if (class == PCI_CLASS_SYSTEM_PIC) {
> u16 command;
> pci_read_config_word(dev, PCI_COMMAND, &command);
> if (command & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
> return;
> }
>
>>
>>> For ioapic_probe:pci_enable_device will not enable the device
>>> correctly, and will bail out early.
>>
>> Exactly where and why do we bail out early?  The only early bail out I
>> see is where __pci_enable_device_flags() returns if "dev->enable_cnt >
>> 1".  Is that what you mean?
>
> pci_enable_device==>pci_enable_device_flags
> ==>do_pci_enable_device==>pcibios_enable_device==>pci_enable_resources
> ...
> if (!r->parent) {
> dev_err(&dev->dev, "device not available "
> "(can't reserve %pR)\n", r);
> return -EINVAL;
> }
> ...
>
> only reassign one will have get it's parent.

Hmmm, OK.  So basically this patch is a hack to work around previous
hacks elsewhere.

We are like flies in a spider's web, bound tighter and tighter by
every loop of silk.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the final tree (acpi tree related)

2013-02-11 Thread Bjorn Helgaas

[+cc Rafael, since you mentioned the ACPI tree]

On Mon, Feb 11, 2013 at 12:34 AM, Stephen Rothwell  
wrote:
> Hi all,
>
> After merging the final tree, today's linux-next build (sparc64 defconfig)
> failed like this:
>
> arch/sparc/include/asm/processor.h: Assembler messages:
> arch/sparc/include/asm/processor.h:10: Error: Unknown opcode: `extern'
>
> Caused by commit 3a242f58a5f4 ("sparc idle: rename pm_idle to
> sparc_idle") from the acpi tree.
>
> I have applied this patch for today:
>
> From: Stephen Rothwell 
> Date: Mon, 11 Feb 2013 18:30:19 +1100
> Subject: [PATCH] sparc idle: protect variable declarations against the 
> assembler
>
> Signed-off-by: Stephen Rothwell 
> ---
>  arch/sparc/include/asm/processor.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/sparc/include/asm/processor.h 
> b/arch/sparc/include/asm/processor.h
> index 34baa35..622cfa5 100644
> --- a/arch/sparc/include/asm/processor.h
> +++ b/arch/sparc/include/asm/processor.h
> @@ -7,6 +7,8 @@
>  #endif
>
>  #define nop()  __asm__ __volatile__ ("nop")
> +#ifndef __ASSEMBLY__
>  extern void (*sparc_idle)(void);
> +#endif
>
>  #endif
> --
> 1.8.1
>
> --
> Cheers,
> Stephen Rothwells...@canb.auug.org.au
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [3.8-rc7] PCI hotplug wakeup oops

2013-02-11 Thread Bjorn Helgaas

[+cc Rafael]

On Mon, Feb 11, 2013 at 10:08 AM, Daniel J Blueman  wrote:
> On 11 February 2013 21:03, Daniel J Blueman  wrote:
>> With 3.8-rc7, when unplugging the Thunderbolt ethernet adapter (bus 0a
>> [1]) on a Macbook Pro 10,1, we see the PCIe port correctly released:
>>
>> pciehp :06:03.0:pcie24: Card not present on Slot(3)
>> tg3 :0a:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not
>> clear MAC_TX_MODE=
>> tg3 :0a:00.0 eth0: No firmware running
>> tg3 :0a:00.0 eth0: Link is down
>> [sched_delayed] sched: RT throttling activated
>> pcieport :00:01.1: System wakeup enabled by ACPI
>> pciehp :09:00.0:pcie24: unloading service driver pciehp
>> pci_bus :0a: busn_res: [bus 0a] is released
>> pci_bus :09: busn_res: [bus 09-0a] is released
>>
>> After some activity later (eg I can reproduce this by switching to a
>> text console and back), often we'll see an oops:
>>
>> Unable to handle kernel paging request at 1070
>> pci_pme_list_scan+0x3d/0xe0
>> Call Trace:
>> process_one_work+0x193
>> ? process_one_work+0x131
>> ? pci_pme_wakeup+0x60
>> worker_thread+0x15d
>>
>> (gdb) list *(pci_pme_list_scan+0x3d)
>> 0x8123f6dd is in pci_pme_list_scan (drivers/pci/pci.c:1556).
>> 1551/*
>> 1552 * If bridge is in low power state, 
>> the
>> 1553 * configuration space of 
>> subordinate devices
>> 1554 * may be not accessible
>> 1555 */
>> 1556if (bridge && bridge->current_state 
>> != PCI_D0)
>> 1557continue;
>> 1558pci_pme_wakeup(pme_dev->dev, NULL);
>> 1559} else {
>> 1560list_del(&pme_dev->list);
>>
>> Since a panic in vsnprintf happens after the oops (hence I can't catch
>> it with EFI pstore), it is almost certainly significant heap
>> corruption; this would explain why pme_dev became null (the load has
>> been ordered ahead).
>>
>> I'll see what I can find out with memory poisoning and list debugging.
>
> Enabling a bunch of related debugging, we see pme_dev is non-null and:
>
> BUG: Unable to handle NULL pointer dereference at
> pci_bus_read_config_word+0x6c
> PGD 26314c067 PUD 2633f9067 PMD 0
> Oops:  [#1] SMP
> pci_check_pme_status+0x4f
> pci_pme_wakeup+0x21
> pci_pme_list_scan+0xd5
> process_one_work+0x1ca
> ? process_one_work+0x160
> ? pci_pme_wakeup+0x60
> worker_thread+0x14e
>
> Anyway, it looks like the device being unplugged wasn't removed from
> pci_pme_list as pci_pme_active(dev, false) wasn't called.
>
> From a quick review, I wasn't able to find the right place in the
> call-chain which I only see releases the child busses and PCIe port
> drivers. Anyone?

It looks like drivers *add* devices to pci_pme_list when they use
pci_enable_wake() or pci_wake_from_d3().  But many drivers never
remove their devices, and I don't see any place where the core does it
either.  My guess is we need to remove it in pci_stop_dev() (we
already do pcie_aspm_exit_link_state() there) or somewhere similar.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Bisected] [-next-20130204] usb/hcd: irq 18: nobody cared

2013-02-11 Thread Bjorn Helgaas

[+cc linux-acpi, linux-pci]

On Mon, Feb 11, 2013 at 01:06:27PM -0800, Yinghai Lu wrote:
> On Mon, Feb 11, 2013 at 11:57 AM, Yinghai Lu  wrote:
> >>
> >> Bjorn, Rafael,
> >>
> >> acpi_pci_irq_add_prt need to be called after pci bridge get scanned,
> >> so we can not call it from pci_acpi_setup, after we move dev_register
> >> for pci_dev early.
> >>
> >> The attached debug patch move down that calling into
> >> pci_bus_add_devices and that will make prt works again.
> >>
> >> Can acpi provide another hook after bridge get scanned?
> 
> Rafael, Bjorn,
> 
> Can you check attached patch?

[Yinghai's patch included below for ease of review]

> Subject: [PATCH] ACPI, PCI: add acpi_platform_notify_scan()
> 
> Peter Hurley found irq nobody cared, and dmesg has
> 
> [8.983246] pci :00:1e.0: can't derive routing for PCI INT A
> [8.983600] snd_ctxfi :09:02.0: PCI INT A: no GSI - using ISA IRQ 5
> 
> bisect to
> | commit 4f535093cf8f6da8cfda7c36c2c1ecd2e9586ee4
> | PCI: Put pci_dev in device tree as early as possible
> 
> It turns out we need to call acpi_pci_irq_add_prt() after the pci bridges
> are scanned.
> 
> Add acpi_platform_notify_scan() to call acpi_pci_irq_add_prt later.
> 
> Reported-and-tested-by: Peter Hurley 
> Signed-off-by: Yinghai Lu 
> 
> ---
>  drivers/acpi/glue.c |   12 
>  drivers/base/core.c |1 +
>  drivers/pci/bus.c   |4 
>  drivers/pci/pci-acpi.c  |   27 +--
>  include/acpi/acpi_bus.h |1 +
>  include/linux/device.h  |3 +--
>  6 files changed, 36 insertions(+), 12 deletions(-)
> 
> Index: linux-2.6/drivers/acpi/glue.c
> ===
> --- linux-2.6.orig/drivers/acpi/glue.c
> +++ linux-2.6/drivers/acpi/glue.c
> @@ -312,6 +312,17 @@ static int acpi_platform_notify(struct d
>   return ret;
>  }
>  
> +static int acpi_platform_notify_scan(struct device *dev)
> +{
> + struct acpi_bus_type *type;
> +
> + type = acpi_get_bus_type(dev->bus);
> + if (type && type->setup_scan)
> + type->setup_scan(dev);
> +
> + return 0;
> +}
> +
>  static int acpi_platform_notify_remove(struct device *dev)
>  {
>   struct acpi_bus_type *type;
> @@ -331,6 +342,7 @@ int __init init_acpi_device_notify(void)
>   return 0;
>   }
>   platform_notify = acpi_platform_notify;
> + platform_notify_scan = acpi_platform_notify_scan;
>   platform_notify_remove = acpi_platform_notify_remove;
>   return 0;
>  }
> Index: linux-2.6/drivers/base/core.c
> ===
> --- linux-2.6.orig/drivers/base/core.c
> +++ linux-2.6/drivers/base/core.c
> @@ -44,6 +44,7 @@ early_param("sysfs.deprecated", sysfs_de
>  #endif
>  
>  int (*platform_notify)(struct device *dev) = NULL;
> +int (*platform_notify_scan)(struct device *dev) = NULL;

I don't like the idea of adding another global function pointer just
for this.  That seems like overkill for a small, local, problem.

>  int (*platform_notify_remove)(struct device *dev) = NULL;
>  static struct kobject *dev_kobj;
>  struct kobject *sysfs_dev_char_kobj;
> Index: linux-2.6/drivers/pci/bus.c
> ===
> --- linux-2.6.orig/drivers/pci/bus.c
> +++ linux-2.6/drivers/pci/bus.c
> @@ -170,6 +170,10 @@ int pci_bus_add_device(struct pci_dev *d
>  {
>   int retval;
>  
> + /* need to be called after bridge is scanned */
> + if (platform_notify_scan)
> + platform_notify_scan(&dev->dev);
> +
>   /*
>* Can not put in pci_device_add yet because resources
>* are not assigned yet for some devices.
> Index: linux-2.6/drivers/pci/pci-acpi.c
> ===
> --- linux-2.6.orig/drivers/pci/pci-acpi.c
> +++ linux-2.6/drivers/pci/pci-acpi.c
> @@ -307,6 +307,22 @@ static void pci_acpi_setup(struct device
>   struct pci_dev *pci_dev = to_pci_dev(dev);
>   acpi_handle handle = ACPI_HANDLE(dev);
>   struct acpi_device *adev;
> +
> + if (acpi_bus_get_device(handle, &adev) || !adev->wakeup.flags.valid)
> + return;
> +
> + device_set_wakeup_capable(dev, true);
> + acpi_pci_sleep_wake(pci_dev, false);
> +
> + pci_acpi_add_pm_notifier(adev, pci_dev);
> + if (adev->wakeup.flags.run_wake)
> + device_set_run_wake(dev, true);
> +}
> +
> +static void pci_acpi_setup_scan(struct device *dev)
> +{
> + struct pci_dev *pci_dev = to_pci_dev(dev);
> + acpi_handle handle = ACPI_HANDLE(dev);
>   acpi_status status;
>   acpi_handle dummy;
>  
> @@ -326,16 +342,6 @@ static void pci_acpi_setup(struct device
>   pci_dev->subordinate->number : pci_dev->bus->number;
>   acpi_pci_irq_add_prt(handle, pci_domain_nr(pci_dev->bus), bus);

The _PRT describes motherboard interrupt wiring, which has nothing to
do with PCI bus numbers.  O

Re: [PATCH v2 0/7] pci/e1000e: return runtime-pm back to work

2013-02-11 Thread Bjorn Helgaas

On Mon, Feb 4, 2013 at 1:23 PM, Rafael J. Wysocki  wrote:
> On Monday, February 04, 2013 03:55:47 PM Konstantin Khlebnikov wrote:
>> This patchset contains some fixes for e1000e diver (broken since v2.6.35)
>> and some related fixes and useful debug for PCI code.
>>
>> All together this fixes my regression report for v3.8-rc1:
>> https://lkml.org/lkml/2013/1/1/25
>>
>> patchset was seriously reworked since v1:
>> http://lkml.org/lkml/2013/1/18/147
>>
>> ---
>>
>> Konstantin Khlebnikov (6):
>>   e1000e: fix pci-device enable-counter balance
>>   PCI: don't touch enable_cnt in pci_device_shutdown()
>>   PCI: catch enable-counter underflows
>>   e1000e: fix runtime power management transitions
>>   PCI/PM: warn about incomplete actions in ->runtime_suspend() callback
>>   e1000e: fix accessing to suspended device
>>
>> Rafael J. Wysocki (1):
>>   PCI/PM: clear state_saved during suspend
>>
>>
>>  drivers/net/ethernet/intel/e1000e/ethtool.c |   13 
>>  drivers/net/ethernet/intel/e1000e/netdev.c  |   82 
>> +++
>>  drivers/pci/pci-driver.c|   21 +--
>>  drivers/pci/pci.c   |3 +
>>  4 files changed, 53 insertions(+), 66 deletions(-)
>
> Acked-by: Rafael J. Wysocki 
>
> for all patches except for [6/7].

I added Rafael's acks and applied patches
  [2/7] PCI: don't touch enable_cnt in pci_device_shutdown()
  [3/7] PCI: catch enable-counter underflows
to my pci/konstantin-runtime-pm branch for v3.9.

I assume the others will go through the e1000e maintainer.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 0/7] pci/e1000e: return runtime-pm back to work

2013-02-11 Thread Bjorn Helgaas

On Mon, Feb 11, 2013 at 5:34 PM, Bjorn Helgaas  wrote:
> On Mon, Feb 4, 2013 at 1:23 PM, Rafael J. Wysocki  wrote:
>> On Monday, February 04, 2013 03:55:47 PM Konstantin Khlebnikov wrote:
>>> This patchset contains some fixes for e1000e diver (broken since v2.6.35)
>>> and some related fixes and useful debug for PCI code.
>>>
>>> All together this fixes my regression report for v3.8-rc1:
>>> https://lkml.org/lkml/2013/1/1/25
>>>
>>> patchset was seriously reworked since v1:
>>> http://lkml.org/lkml/2013/1/18/147
>>>
>>> ---
>>>
>>> Konstantin Khlebnikov (6):
>>>   e1000e: fix pci-device enable-counter balance
>>>   PCI: don't touch enable_cnt in pci_device_shutdown()
>>>   PCI: catch enable-counter underflows
>>>   e1000e: fix runtime power management transitions
>>>   PCI/PM: warn about incomplete actions in ->runtime_suspend() callback
>>>   e1000e: fix accessing to suspended device
>>>
>>> Rafael J. Wysocki (1):
>>>   PCI/PM: clear state_saved during suspend
>>>
>>>
>>>  drivers/net/ethernet/intel/e1000e/ethtool.c |   13 
>>>  drivers/net/ethernet/intel/e1000e/netdev.c  |   82 
>>> +++
>>>  drivers/pci/pci-driver.c|   21 +--
>>>  drivers/pci/pci.c   |3 +
>>>  4 files changed, 53 insertions(+), 66 deletions(-)
>>
>> Acked-by: Rafael J. Wysocki 
>>
>> for all patches except for [6/7].
>
> I added Rafael's acks and applied patches
>   [2/7] PCI: don't touch enable_cnt in pci_device_shutdown()
>   [3/7] PCI: catch enable-counter underflows
> to my pci/konstantin-runtime-pm branch for v3.9.

Oops, I missed
  [4/7] PCI/PM: Clear state_saved during suspend

I applied that one, too.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: Tree for Feb 12 (drm_pci.c)

2013-02-12 Thread Bjorn Helgaas

On Tue, Feb 12, 2013 at 08:06:00AM -0800, Randy Dunlap wrote:
> On 02/11/13 21:09, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20130211:
> > 
> 
> 
> when CONFIG_PCI is not enabled (on x86_64):
> 
>   CC [M]  drivers/gpu/drm/drm_pci.o
> drivers/gpu/drm/drm_pci.c: In function 'drm_pcie_get_speed_cap_mask':
> drivers/gpu/drm/drm_pci.c:485:2: error: implicit declaration of function 
> 'pcie_capability_read_dword' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> make[4]: *** [drivers/gpu/drm/drm_pci.o] Error 1

This one is my fault.  I sent the following patch to Dave to fix it up.


commit ed0708e69f71fab656afc1c891f3c54c9b105664
Author: Bjorn Helgaas 
Date:   Fri Feb 8 15:18:35 2013 -0700

drm/pci: define drm_pcie_get_speed_cap_mask() only when CONFIG_PCI=y

Move drm_pcie_get_speed_cap_mask() under #ifdef CONFIG_PCI because it
it used only for PCI devices (evergreen, r600, r770), and it uses
PCI interfaces that only exist when CONFIG_PCI=y.

Previously, we tried to compile drm_pcie_get_speed_cap_mask() even when
CONFIG_PCI=n, which fails.

Tested-by: Fengguang Wu 
Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/gpu/drm/drm_pci.c b/drivers/gpu/drm/drm_pci.c
index 754bc96..2b818c7 100644
--- a/drivers/gpu/drm/drm_pci.c
+++ b/drivers/gpu/drm/drm_pci.c
@@ -439,33 +439,6 @@ int drm_pci_init(struct drm_driver *driver, struct 
pci_driver *pdriver)
return 0;
 }
 
-#else
-
-int drm_pci_init(struct drm_driver *driver, struct pci_driver *pdriver)
-{
-   return -1;
-}
-
-#endif
-
-EXPORT_SYMBOL(drm_pci_init);
-
-/*@}*/
-void drm_pci_exit(struct drm_driver *driver, struct pci_driver *pdriver)
-{
-   struct drm_device *dev, *tmp;
-   DRM_DEBUG("\n");
-
-   if (driver->driver_features & DRIVER_MODESET) {
-   pci_unregister_driver(pdriver);
-   } else {
-   list_for_each_entry_safe(dev, tmp, &driver->device_list, 
driver_item)
-   drm_put_dev(dev);
-   }
-   DRM_INFO("Module unloaded\n");
-}
-EXPORT_SYMBOL(drm_pci_exit);
-
 int drm_pcie_get_speed_cap_mask(struct drm_device *dev, u32 *mask)
 {
struct pci_dev *root;
@@ -514,3 +487,30 @@ int drm_pcie_get_speed_cap_mask(struct drm_device *dev, 
u32 *mask)
return 0;
 }
 EXPORT_SYMBOL(drm_pcie_get_speed_cap_mask);
+
+#else
+
+int drm_pci_init(struct drm_driver *driver, struct pci_driver *pdriver)
+{
+   return -1;
+}
+
+#endif
+
+EXPORT_SYMBOL(drm_pci_init);
+
+/*@}*/
+void drm_pci_exit(struct drm_driver *driver, struct pci_driver *pdriver)
+{
+   struct drm_device *dev, *tmp;
+   DRM_DEBUG("\n");
+
+   if (driver->driver_features & DRIVER_MODESET) {
+   pci_unregister_driver(pdriver);
+   } else {
+   list_for_each_entry_safe(dev, tmp, &driver->device_list, 
driver_item)
+   drm_put_dev(dev);
+   }
+   DRM_INFO("Module unloaded\n");
+}
+EXPORT_SYMBOL(drm_pci_exit);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] PCI: Remove not needed check in disable aspm link

2013-04-02 Thread Bjorn Helgaas

On Mon, Apr 1, 2013 at 6:03 PM, Yinghai Lu  wrote:
> On Mon, Apr 1, 2013 at 4:52 PM, Bjorn Helgaas  wrote:
>> On Fri, Mar 29, 2013 at 11:04:48AM -0700, Yinghai Lu wrote:
>>> attatched -v3 again
>>>
>>> > Please check attached -v3.
>>
>> It's getting late in the v3.9 cycle already, and while your v3 patch
>> probably fixes Roman's problem, I can't convince myself that it is
>> safe in general.
>>
>> I think the safest thing to do at this point is to revert 8c33f51df
>> ("PCI/ACPI: Request _OSC control before scanning PCI root bus") with a
>> patch like the one below.
>
> Agreed.
>
>>
>> That does mean the booting path and hotplug paths will be different (we set
>> aspm_disabled after boot but before hotplug), but it was that way for a
>> long time before 8c33f51df.  I think it's more important to fix this recent
>> ath5k regression than to fix a long-standing hotplug bug that nobody ever
>> complained about.
>>
>> Obviously, I think we should fix the hotplug bug and clean up the ASPM
>> mess, too.  But we need to do that when we have more time to do it right
>> and test it.
>
> Sure.
>
>>
>> Bjorn
>>
>>
>> commit 96e5d01cd536458435ef0678d9fa3dc542afb41f
>> Author: Bjorn Helgaas 
>> Date:   Mon Apr 1 15:47:39 2013 -0600
>>
>> Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>>
>> This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6.
>>
>> Conflicts:
>> drivers/acpi/pci_root.c
>>
>> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=55211
>> Signed-off-by: Bjorn Helgaas 
>
> Acked-by: Yinghai Lu 
>
> stable need this reverting too.

I updated the changelog and added this to my for-linus branch, headed for v3.9.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] pci: Add PCI ROM helper for platform-provided ROM images

2013-04-02 Thread Bjorn Helgaas

On Mon, Apr 1, 2013 at 11:47 AM, Matthew Garrett
 wrote:
> On Mon, 2013-04-01 at 11:39 -0600, Bjorn Helgaas wrote:
>
>> Chris still has problems (see
>> https://bugzilla.redhat.com/show_bug.cgi?id=927451), but I don't know
>> whether they are related to these patches or something else.
>
> I think they're unrelated. The log he posts using this patch gives the
> correct output - the ROM image comes from the platform method rather
> than from PCI. I think Ben probably needs to look at that.

OK, I added these three patches to my for-linus branch, headed for v3.9.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH bugfix 3.9] PCI: Don't try to disable Bus Master on disconnected PCI devices

2013-04-03 Thread Bjorn Helgaas

[+cc linux-pci]

On Mon, Apr 1, 2013 at 2:00 AM, Konstantin Khlebnikov
 wrote:
> BUMP. This is degradation from 3.8, so this patch must be in 3.9.
>
> I still don't like this forced clearing bus-master bit. But this hack
> definitely fixes problems in kexec, so there is reason to keep it here.

Applied to for-linus for v3.9, thanks!

> Konstantin Khlebnikov wrote:
>>
>> This is fix for commit 7897e6022761ace7377f0f784fca059da55f5d71 from
>> v3.9-rc1
>> ("PCI: Disable Bus Master unconditionally in pci_device_shutdown()")
>> in turn that was fix for b566a22c23327f18ce941ffad0ca907e50a53d41 from
>> v3.5-rc1
>> ("PCI: disable Bus Master on PCI device shutdown")
>>
>> Unfortunately fixing one bug uncovers another:
>> ->shutdown() callback might switch device to deep sleep state.
>> PCI config space no longer available after that.
>>
>> Link: https://lkml.org/lkml/2013/3/12/529
>> Signed-off-by: Konstantin Khlebnikov
>> Reported-and-Tested-by: Vivek Goyal
>> Cc: Bjorn Helgaas
>> Cc: Rafael J. Wysocki
>> ---
>>   drivers/pci/pci-driver.c |5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 1fa1e48..79277fb 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -390,9 +390,10 @@ static void pci_device_shutdown(struct device *dev)
>>
>> /*
>>  * Turn off Bus Master bit on the device to tell it to not
>> -* continue to do DMA
>> +* continue to do DMA. Don't touch devices in D3cold or unknown
>> states.
>>  */
>> -   pci_clear_master(pci_dev);
>> +   if (pci_dev->current_state <= PCI_D3hot)
>> +   pci_clear_master(pci_dev);
>>   }
>>
>>   #ifdef CONFIG_PM
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] PCI / PM: Disable runtime PM of PCIe ports

2013-04-03 Thread Bjorn Helgaas

On Sat, Mar 30, 2013 at 4:38 PM, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> The runtime PM of PCIe ports turns out to be quite fragile, as in
> some cases things work while in some other cases they don't and we
> don't seem to have a good way to determine whether or not they are
> going to work in advance.
>
> For this reason, avoid enabling runtime PM for PCIe ports by
> keeping their runtime PM reference counters always above 0 for the
> time being.
>
> Signed-off-by: Rafael J. Wysocki 
> ---
>
> This version also removes the no longer necessary (and empty anyway)
> port_runtime_pm_black_list[] table.

I applied this to for-linus for v3.9, and added a stable tag for v3.6+.  Thanks!

> ---
>  drivers/pci/pcie/portdrv_pci.c |   13 -
>  1 file changed, 13 deletions(-)
>
> Index: linux-pm/drivers/pci/pcie/portdrv_pci.c
> ===
> --- linux-pm.orig/drivers/pci/pcie/portdrv_pci.c
> +++ linux-pm/drivers/pci/pcie/portdrv_pci.c
> @@ -185,14 +185,6 @@ static const struct dev_pm_ops pcie_port
>  #endif /* !PM */
>
>  /*
> - * PCIe port runtime suspend is broken for some chipsets, so use a
> - * black list to disable runtime PM for these chipsets.
> - */
> -static const struct pci_device_id port_runtime_pm_black_list[] = {
> -   { /* end: all zeroes */ }
> -};
> -
> -/*
>   * pcie_portdrv_probe - Probe PCI-Express port devices
>   * @dev: PCI-Express port device being probed
>   *
> @@ -225,16 +217,11 @@ static int pcie_portdrv_probe(struct pci
>  * it by default.
>  */
> dev->d3cold_allowed = false;
> -   if (!pci_match_id(port_runtime_pm_black_list, dev))
> -   pm_runtime_put_noidle(&dev->dev);
> -
> return 0;
>  }
>
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
> -   if (!pci_match_id(port_runtime_pm_black_list, dev))
> -   pm_runtime_get_noresume(&dev->dev);
> pcie_port_device_remove(dev);
> pci_disable_device(dev);
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Update][PATCH] PCI / ACPI: Always resume devices on ACPI wakeup notifications

2013-04-03 Thread Bjorn Helgaas

On Thu, Mar 28, 2013 at 3:07 PM, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
> Subject: PCI / ACPI: Always resume devices on ACPI wakeup notifications
>
> It turns out that the _Lxx control methods provided by some BIOSes
> clear the PME Status bit of PCI devices they handle, which means that
> pci_acpi_wake_dev() cannot really use that bit to check whether or
> not the device has signalled wakeup.
>
> One symptom of the problem is, for example, that when an affected PCI
> USB controller is runtime-suspended, then plugging in a new USB device
> into one of the controller's ports will not wake up the controller,
> which should happen.
>
> For this reason, make pci_acpi_wake_dev() always attempt to resume
> the device it is called for regardless of the device's PME Status bit
> value (that bit still has to be cleared if set at this point,
> though).
>
> Reported-by: Sarah Sharp 
> Signed-off-by: Rafael J. Wysocki 
> Acked-by: Matthew Garrett 
> Cc: 3.7+ 
> ---
>
> The changelog in this version is slightly better than in the previous one, 
> IMHO.

I applied this to for-linus for v3.9.  Thanks!

> ---
>  drivers/pci/pci-acpi.c |   15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> Index: linux-pm/drivers/pci/pci-acpi.c
> ===
> --- linux-pm.orig/drivers/pci/pci-acpi.c
> +++ linux-pm/drivers/pci/pci-acpi.c
> @@ -53,14 +53,15 @@ static void pci_acpi_wake_dev(acpi_handl
> return;
> }
>
> -   if (!pci_dev->pm_cap || !pci_dev->pme_support
> -|| pci_check_pme_status(pci_dev)) {
> -   if (pci_dev->pme_poll)
> -   pci_dev->pme_poll = false;
> +   /* Clear PME Status if set. */
> +   if (pci_dev->pme_support)
> +   pci_check_pme_status(pci_dev);
>
> -   pci_wakeup_event(pci_dev);
> -   pm_runtime_resume(&pci_dev->dev);
> -   }
> +   if (pci_dev->pme_poll)
> +   pci_dev->pme_poll = false;
> +
> +   pci_wakeup_event(pci_dev);
> +   pm_runtime_resume(&pci_dev->dev);
>
> if (pci_dev->subordinate)
> pci_pme_wakeup_bus(pci_dev->subordinate);
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] PCI, ACPI: Don't query OSC support with all possible controls

2013-04-03 Thread Bjorn Helgaas

[+cc Bob for spec typo question]

On Wed, Mar 27, 2013 at 10:28 PM, Yinghai Lu  wrote:
> Found problem on system that firmware that could handle pci aer.
> Firmware get error reporting after pci injecting error, before os boots.
> But after os boots, firmware can not get report anymore, even pci=noaer
> is passed.
>
> Root cause: BIOS _OSC has problem with query bit checking.
> It turns out that BIOS vendor is copying example code from ACPI Spec.
> In ACPI Spec 5.0, page 290:
>
> If (Not(And(CDW1,1))) // Query flag clear?
> {   // Disable GPEs for features granted native control.
> If (And(CTRL,0x01)) // Hot plug control granted?
> {
> Store(0,HPCE) // clear the hot plug SCI enable bit
> Store(1,HPCS) // clear the hot plug SCI status bit
> }
> ...
> }
>
> When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffe.
> So it will get into code path that should be for control set only.
> BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"

Isn't this just a typo in the spec?  Shouldn't it be using "LNot"
instead of "Not"?

 If (LNot(And(CDW1,1))) // Query flag clear?

Of course, that doesn't change the need for your Linux change, though
a comment about the hazard might be nice for future readers.

> Current kernel code is using _OSC query to notify firmware about support
> from OS and then use _OSC to set control bits.
> During query support, current code is using all possible controls.
> So will execute code that should be only for control set stage.
>
> That will have problem when pci=noaer or aer firmware_first is used.
> As firmware have that control set for os aer already in query support stage,
> but later will not os aer handling.
>
> We should avoid passing all possible controls, just use osc_control_set
> instead.
> That should workaround BIOS bugs with affected systems on the field
> as more bios vendors are copying sample code from ACPI spec.
>
> Signed-off-by: Yinghai Lu 
> Cc: sta...@kernel.org
>
> ---
>  drivers/acpi/pci_root.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/acpi/pci_root.c
> ===
> --- linux-2.6.orig/drivers/acpi/pci_root.c
> +++ linux-2.6/drivers/acpi/pci_root.c
> @@ -201,8 +201,8 @@ static acpi_status acpi_pci_query_osc(st
> *control &= OSC_PCI_CONTROL_MASKS;
> capbuf[OSC_CONTROL_TYPE] = *control | root->osc_control_set;
> } else {
> -   /* Run _OSC query for all possible controls. */
> -   capbuf[OSC_CONTROL_TYPE] = OSC_PCI_CONTROL_MASKS;
> +   /* Run _OSC query only with existing controls. */
> +   capbuf[OSC_CONTROL_TYPE] = root->osc_control_set;
> }
>
> status = acpi_pci_run_osc(root->device->handle, capbuf, &result);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-03 Thread Bjorn Helgaas

[+cc David and iommu list, Yinghai, Jiang]

On Mon, Mar 4, 2013 at 12:04 PM, Neil Horman  wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> For the 5520 and 5500 chipsets which contained an errata (specificially errata
> 53), which noted that these chipsets can't properly do interrupt remapping, 
> and
> as a result the recommend that interrupt remapping be disabled in bios.  While
> many vendors have a bios update to do exactly that, not all do, and of course
> not all users update their bios to a level that corrects the problem.  As a
> result, occasionally interrupts can arrive at a cpu even after affinity for 
> that
> interrupt has be moved, leading to lost or spurrious interrupts (usually
> characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have system for which their BIOS level is 
> such
> that this feature was not properly turned off.  As such, it would be good to
> give them a reminder that their systems are vulnurable to this problem.
>
> Signed-off-by: Neil Horman 
> CC: Prarit Bhargava 
> CC: Don Zickus 
> CC: Don Dutile 
> CC: Bjorn Helgaas 
> CC: Asit Mallick 
> CC: linux-...@vger.kernel.org
>
> ---
>
> Change notes:
>
> v2)
>
> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
> chipset series is x86 only.  I decided however to keep the quirk as a regular
> quirk, not an early_quirk.  Early quirks have no way currently to determine if
> BIOS has properly disabled the feature in the iommu, at least not without
> significant hacking, and since its quite possible this will be a short lived
> quirk, should Don Z's workaround code prove successful (and it looks like it 
> may
> well), I don't think that necessecary.
>
> * Removed the WARNING banner from the quirk, and added the HW_ERR token to the
> string, I opted to leave the newlines in place however, as I really couldnt
> find a way to keep the text on a single line is still legible from a code
> perspective.  I think theres enough language in there that using cscope on 
> just
> about any substring however will turn it up, and again, this may be a short
> lived quirk.
> ---
>  arch/x86/kernel/quirks.c | 18 ++
>  include/linux/pci_ids.h  |  2 ++
>  2 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
> index 26ee48a..a718ea2 100644
> --- a/arch/x86/kernel/quirks.c
> +++ b/arch/x86/kernel/quirks.c
> @@ -5,6 +5,7 @@
>  #include 
>
>  #include 
> +#include "../../../drivers/iommu/irq_remapping.h"
>
>  #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
>
> @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 
> PCI_DEVICE_ID_AMD_15H_NB_F5,
> quirk_amd_nb_node);
>
>  #endif
> +
> +static void intel_remapping_check(struct pci_dev *dev)
> +{
> +   u8 revision;
> +
> +   pci_read_config_byte(dev, PCI_REVISION_ID, &revision);
> +
> +   if ((revision == 0x13) && irq_remapping_enabled) {
> +pr_warn(HW_ERR "This system BIOS has enabled interrupt 
> remapping\n"
> +"on a chipset that contains an errata making that\n"
> +"feature unstable.  Please reboot with nointremap\n"
> +"added to the kernel command line and contact\n"
> +"your BIOS vendor for an update");
> +   }
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, 
> intel_remapping_check);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, 
> intel_remapping_check);

This started as an IOMMU change, and I'm not an expert in that area,
so I added David and the IOMMU list.  I'd rather have him deal with
this than me.

Is this something we can just *fix* in the kernel, e.g., by turning
off interrupt remapping ourselves, or does it have to be done before
the OS boots?

> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 31717bd..54027a6 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2732,6 +2732,8 @@
>  #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2  0x2db2
>  #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV20x2db3
>  #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340
> +#define PCI_DEVICE_ID_INTEL_5500_IOHUB 0x3403
&g

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas

On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman  wrote:
> On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
>> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
>> > );
>> > > +
>> > > +   if ((revision == 0x13) && irq_remapping_enabled) {
>> > > +pr_warn(HW_ERR "This system BIOS has enabled interrupt 
>> > > remapping\n"
>> > > +"on a chipset that contains an errata making 
>> > > that\n"
>> > > +"feature unstable.  Please reboot with 
>> > > nointremap\n"
>> > > +"added to the kernel command line and contact\n"
>> > > +"your BIOS vendor for an update");
>>
>> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
>>
> Ok, copy that. I'll repost shortly

When you do, please include URLs for any problem reports or bugzillas you have.

I assume Windows "just works" in this situation?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas

On Thu, Apr 4, 2013 at 9:39 AM, Neil Horman  wrote:
> On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote:
>> On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman  wrote:
>> > On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote:
>> >> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote:
>> >> > );
>> >> > > +
>> >> > > +   if ((revision == 0x13) && irq_remapping_enabled) {
>> >> > > +pr_warn(HW_ERR "This system BIOS has enabled 
>> >> > > interrupt remapping\n"
>> >> > > +"on a chipset that contains an errata making 
>> >> > > that\n"
>> >> > > +"feature unstable.  Please reboot with 
>> >> > > nointremap\n"
>> >> > > +"added to the kernel command line and 
>> >> > > contact\n"
>> >> > > +"your BIOS vendor for an update");
>> >>
>> >> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'.
>> >>
>> > Ok, copy that. I'll repost shortly
>>
>> When you do, please include URLs for any problem reports or bugzillas you 
>> have.
>>
> Well, those are going to be vendor specific, so I'm not sure we can really do
> that, at least not in any meaningful way.

Sorry, I don't understand your point.  It's useful to know who
reported it (e.g., for future testers) and what happened and what
bugzillas it solved.  Of course it applies only to machines with this
chipset and certain BIOS revisions.

>> I assume Windows "just works" in this situation?
> No more or less than linux does in this case.  The Intel provided errata
> indicates that the only acceptable workaround is to disable remapping in the
> BIOS, so I would presume that if a windows system has a BIOS that doesn't
> implement this fix, its just as exposed as we are.

It sounds like the effect of this bug is that on Linux, devices may
not work at all because of lost interrupts.  Either Windows must never
enable remapping (so it never sees the bug), or it must be designed in
a way that tolerates the problem.  I can't believe these machines
shipped with Windows certification if devices didn't work correctly.

Either way, I don't understand why we can't make the quirk just fix
this.  Booting with "nointremap" only sets disable_irq_remap, which is
only used by irq_remapping_supported().  Early quirks are run before
irq_remapping_supported () is ever called, so an early quirk ought to
be just as effective as the command line option.  Here's the relevant
call tree I see:

  start_kernel
setup_arch
  parse_early_param
  early_quirks
rest_init
  ...

The x86 setup_arch() does call generic_apic_probe(), but as far as I
can tell, none of the APIC .probe() methods reference
disable_irq_remap, so that doesn't look like a problem.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] PCI: Handle device quirks when accessing sysfs resource entries

2013-04-04 Thread Bjorn Helgaas

On Thu, Mar 21, 2013 at 6:51 PM, Robert Hancock  wrote:
> On 03/20/2013 10:35 PM, Myron Stowe wrote:
>>
>> Sysfs includes entries to memory regions that back a PCI device's BARs.
>> The pci-sysfs entries backing I/O Port BARs can be accessed by userspace,
>> providing direct access to the device's registers.  File permissions
>> prevent random users from accessing the device's registers through these
>> files, but don't stop a privileged app that chooses to ignore the purpose
>> of these files from doing so.
>>
>> There are devices with abnormally strict restrictions with respect to
>> accessing their registers; aspects that are typically handled by the
>> device's driver.  When these access restrictions are not followed - as
>> when a userspace app such as "udevadm info --attribute-walk
>> --path=/sys/..." parses though reading all the device's sysfs entries - it
>> can cause such devices to fail.
>>
>> This patch introduces a quirking mechanism that can be used to detect
>> accesses that do no meet the device's restrictions, letting a device
>> specific method intervene and decide how to progress.
>>
>> Reported-by: Xiangliang Yu 
>> Signed-off-by: Myron Stowe 
>
>
> I honestly don't think there's much point in even attempting this strategy.
> This list of devices in the quirk can't possibly be complete. It would
> likely be easier to enumerate a white-list of devices that can deal with
> their IO ports being read willy-nilly than a blacklist of those that don't,
> as there's likely countless devices that fall into this category. Even if
> they don't choke as badly as these ones do, it's quite likely that bad
> behavior will result.

I agree, I'm not inclined to add a bunch of code that only fixes one
device when there are likely many others with similar problems.

> I think there's a few things that need to be done:
>
> -Fix the bug in udevadm that caused it to trawl through these files
> willy-nilly,

We can argue about whether this is actually a bug in udevadm, but
either way, my opinion is that the data that udevadm prints out from
these files is uninteresting in the same way the data from "uevent,"
"dev," "modalias," "resource," etc. is uninteresting.  So to me, a
udevadm change seems justified for that reason.

> -Fix the kernel so that access through these files complies with the
> kernel's mechanisms for claiming IO/memory regions to prevent access
> conflicts (i.e. opening these files should claim the resource region they
> refer to, and should fail with EBUSY or something if another process or a
> kernel driver is using it).
>
> -Reconsider whether supporting read/write on the resource files for IO port
> regions like these makes any sense. Obviously mmap isn't very practical for
> IO port access on x86 but you could even do something like an ioctl for this
> purpose. Not very many pieces of software would need to access these files
> so it's likely OK if the API is a bit ugly. That would prevent something
> like grepping through sysfs from generating port accesses to random devices.

This is the approach I'd like to push on for a kernel fix.  I'm not a
VM person, but if it were possible to support .mmap() in such a way
that a handler would be called for every access to the region, we
could easily support I/O port access that way.

Along that line, I'm concerned that we may have a hole in the MEM BAR
.mmap() path.  What happens if we have BARs from two different devices
in the same page, and we mmap() one of them?  Does the user also get
access to the second device's BAR on the same page?  If so, that could
also be fixed with an .mmap() trap approach.  Performance would suck
for these small BARs, of course.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4] Quirk for buggy dma source tags with Intel IOMMU.

2013-04-04 Thread Bjorn Helgaas

On Thu, Mar 7, 2013 at 7:35 PM, Andrew Cooks  wrote:
> This patch creates a quirk to allow the Intel IOMMU to be enabled for devices
> that use incorrect tags during DMA. It is similar to the quirk for Ricoh
> devices, but allows mapping multiple functions and mapping of 'ghost'
> functions that do not correspond to real devices. Devices that need this
> include a variety of Marvell 88SE91xx based SATA controllers. [1][2]
>
> Changelog:
> v4: Process feedback received from Alex Williamson.
>  * don't assume function 0 is a real device.
>  * exit early if no ghost functions are known, or all known functions have
>been mapped.
>  * cleanup failure case so mapping succeeds or fails for all ghost functions
>per device.
>  * improve comments.
>
> v3:
>  * Adopt David Woodhouse's terminology by referring to the quirky functions as
>  'ghost' functions.
>  * Unmap ghost functions when device is detached from IOMMU.
>  * Stub function for when CONFIG_PCI_QUIRKS is not enabled.
>
>
>  This patch was generated against 3.9-rc1, but will also apply to 3.7.10.
>
>  Bug reports:
>  1. https://bugzilla.redhat.com/show_bug.cgi?id=757166
>  2. https://bugzilla.kernel.org/show_bug.cgi?id=42679
>
> Signed-off-by: Andrew Cooks 
> ---
>  drivers/iommu/intel-iommu.c |   69 
> +++
>  drivers/pci/quirks.c|   67 +-
>  include/linux/pci.h |5 +++
>  include/linux/pci_ids.h |1 +
>  4 files changed, 141 insertions(+), 1 deletions(-)

I'm OK with the pci/quirks.c part of this, but the bulk of the
interesting code is in intel-iommu.c, so I assume the IOMMU folks will
take care of this.

Bjorn

> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0099667..f53f3e3 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1674,6 +1674,69 @@ static int domain_context_mapping_one(struct 
> dmar_domain *domain, int segment,
> return 0;
>  }
>
> +static void iommu_detach_dev(struct intel_iommu *iommu, u8 bus, u8 devfn);
> +
> +static void unmap_ghost_dma_fn(struct pci_dev *pdev, u8 fn_map)
> +{
> +   u8 fn;
> +   struct intel_iommu *iommu;
> +
> +   iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
> +   pdev->devfn);
> +
> +   /* something must be seriously fubar if we can't lookup the iommu. */
> +   BUG_ON(!iommu);
> +
> +   for (fn = 0; fn <= 7 && fn_map << fn; fn++) {
> +   if (fn == PCI_FUNC(pdev->devfn))
> +   continue;
> +   if (fn_map & (1< +   iommu_detach_dev(iommu,
> +   pdev->bus->number,
> +   PCI_DEVFN(PCI_SLOT(pdev->devfn), fn));
> +   dev_dbg(&pdev->dev, "quirk; ghost func %d unmapped",
> +   fn);
> +   }
> +   }
> +}
> +
> +/* For quirky devices like Marvell 88SE91xx chips that use ghost functions. 
> */
> +static int map_ghost_dma_fn(struct dmar_domain *domain,
> +   struct pci_dev *pdev,
> +   int translation)
> +{
> +   u8 fn, fn_map;
> +   u8 fn_mapped = 0;
> +   int err = 0;
> +
> +   fn_map = pci_get_dma_source_map(pdev);
> +
> +   /* this is the common, non-quirky case. */
> +   if (!fn_map)
> +   return 0;
> +
> +   for (fn = 0; fn <= 7 && fn_map << fn; fn++) {
> +   if (fn == PCI_FUNC(pdev->devfn))
> +   continue;
> +   if (fn_map & (1< +   err = domain_context_mapping_one(domain,
> +   pci_domain_nr(pdev->bus),
> +   pdev->bus->number,
> +   PCI_DEVFN(PCI_SLOT(pdev->devfn), fn),
> +   translation);
> +   if (err) {
> +   dev_err(&pdev->dev,
> +   "mapping ghost func %d failed", fn);
> +   unmap_ghost_dma_fn(pdev, fn_mapped);
> +   return err;
> +   }
> +   dev_dbg(&pdev->dev, "quirk; ghost func %d mapped", 
> fn);
> +   fn_mapped |= (1< +   }
> +   }
> +   return 0;
> +}
> +
>  static int
>  domain_context_mapping(struct dmar_domain *domain, struct pci_dev *pdev,
> int translation)
> @@ -1687,6 +1750,11 @@ domain_context_mapping(struct dmar_domain *domain, 
> struct pci_dev *pdev,
> if (ret)
> return ret;
>
> +   /* quirk for undeclared/ghost pci functions */
> +   ret = map_ghost_dma_fn(domain, pdev, translation);
> +   if (ret)
> +   return ret;
> +
> /* dependent device mapping */
> tmp = pci_find_upstr

Re: [PATCH 0/4] PCI/PM: D3cold support for system suspend

2013-04-04 Thread Bjorn Helgaas

On Mon, Jan 28, 2013 at 9:34 PM, Huang Ying  wrote:
> [PATCH 1/4] PCI/ACPI: Add target state as parameter to 
> pci_platform_pm_ops->run_wake
> [PATCH 2/4] PCI: Rename pci_dev->runtime_d3cold to pci_dev->set_d3cold
> [PATCH 3/4] PCI/PM: Set pci_dev->set_d3cold in pci_set_power_state
> [PATCH 4/4] PCI/PM: Enable D3cold support for system suspend

Do we still need these?

What problem do they solve?  Or what new functionality do they add?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas

On Thu, Apr 4, 2013 at 11:51 AM, Neil Horman  wrote:

> Oh, you want the bug report that I'm fixing this against?  Sure, I can do 
> that.
> I thought you wanted me to include a url in the WARN_TAINT, with which user
> could report occurances of this bug.  Yeah, the bug that this is reported in 
> is:
> https://bugzilla.redhat.com/show_bug.cgi?id=887006
>
> Its standing in for about a dozen or so variants of this issue we've seen

Exactly -- I'm just hoping for something in the changelog.  BTW, this
particular bugzilla is not public.

> Regardless, theres also the security issue to consider here - namely that
> disabling irq remapping opens up users of virt to a possible security bug
> (potential irq injection).  Some users may wish to live with the remapping
> error, given that error typically leads to devices that need to be
> restarted/reset to start working again, rather than live with the security 
> hole.
> I rather like the warning, that gives users a choice, but I'll spin up a 
> version
> that just disables it if you would rather.

I don't believe users will want to make a choice like that or even be
sophisticated enough to do it, at least not based on something in
dmesg.  I'm pretty sure I'm not  :)

The only supportable thing I can imagine doing would be:

  - Disable interrupt remapping if this chipset defect is present, so
devices work reliably (they don't need whatever restart/reset you
referred to above).
  - Disable virt functionality when interrupt remapping is disabled to
avoid the security problem (I don't know the details of this.)
  - Add a command-line option to enable interrupt remapping (I think
"intremap=on" is currently parsed too early, but maybe this could be
reworked so the option could override the quirk disable).
  - Add release notes saying "boot with 'intremap=on' if you want the
virt functionality and can accept unreliable devices."

That way the default behavior is safe and reliable (though perhaps
lacking some functionality), and you have told the user a way to get
safe and unreliable operation if he's willing to accept that.  At
least, that's what I think I would want if I were in RH's shoes.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-04 Thread Bjorn Helgaas

On Thu, Apr 4, 2013 at 2:04 PM, Neil Horman  wrote:
> On Thu, Apr 04, 2013 at 10:40:07AM -0700, Yinghai Lu wrote:
>> On Thu, Apr 4, 2013 at 10:27 AM, Don Dutile  wrote:
>> >> You need to move the quirk to early_quirk to append nointremap to
>> >> avoid extra rebooting.
>> >>
>> > The pci-dev's of all the (minimally, root, 5500-chipset) pci-dev's are
>> > known/scanned/created by that time?
>>
>> in arch/x86/kernel/early-quirk.c
>>
>> and on top of
>> https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/commit/?h=for-x86-early-quirk-usb&id=de38757e964cfee20e6da1977572a2191d7f4aa0
>>
>> You could add one entry in early_qrk[].
>>
>> Some one already try to use that path to disable x2apic on some thinkpad.
>>
>> So it should work on nointrremap too.
>>
> See my last email to Bjorn.  Doing this in early-quirks in such a way that we
> can detect an iommu that has interrupt remapping enabled (so we don't just
> unilaterally print this quirk all the time) requires that we be able to parse
> acpi tables very early in the boot.  If you know of how to do that, I can make
> this happen.  If not, I suppose another alternative would be to have the early
> quirk set a flag that tells us this is a bogus chip, and if we try to enable 
> irq
> remapping with that flag set, we should fail, and report the error at that 
> time,
> but I'm not sure I like that solution.

I like that solution :)  It seems very simple -- you don't have to
parse any tables or anything.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 26/27] PCI: Make quirk_io_region to use addon_res

2013-04-04 Thread Bjorn Helgaas

On Wed, Mar 13, 2013 at 5:28 PM, Yinghai Lu  wrote:
> After they are put in add-on resources, they will be safely claimed later.
>
> Signed-off-by: Yinghai Lu 
> ---
>  drivers/pci/quirks.c |  155 
> +++---
>  1 file changed, 70 insertions(+), 85 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 7e5392c..6d379d6 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -324,29 +324,61 @@ static void quirk_cs5536_vsa(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CS5536_ISA, 
> quirk_cs5536_vsa);
>
> -static void quirk_io_region(struct pci_dev *dev, unsigned region,
> -   unsigned size, int nr, const char *name)
> +static int quirk_read_io(struct pci_dev *dev, struct resource *res, int port)
>  {
> -   region &= ~(size-1);
> -   if (region) {
> -   struct pci_bus_region bus_region;
> -   struct resource *res = dev->resource + nr;
> +   struct pci_bus_region bus_region;
> +   u16 region;
> +   unsigned size = to_pci_dev_addon_resource(res)->size;
>
> -   res->name = pci_name(dev);
> -   res->start = region;
> -   res->end = region + size - 1;
> -   res->flags = IORESOURCE_IO;
> +   pci_read_config_word(dev, port, ®ion);
> +   region &= ~(size - 1);
>
> -   /* Convert from PCI bus to resource space.  */
> -   bus_region.start = res->start;
> -   bus_region.end = res->end;
> -   pcibios_bus_to_resource(dev, res, &bus_region);
> +   /* Convert from PCI bus to resource space.  */
> +   bus_region.start = region;
> +   bus_region.end = region + size - 1;
>
> -   if (pci_claim_resource(dev, nr) == 0)
> -   dev_info(&dev->dev, "quirk: %pR claimed by %s\n",
> -res, name);
> -   }
> -}
> +   pcibios_bus_to_resource(dev, res, &bus_region);
> +
> +   res->flags |= IORESOURCE_IO;
> +   dev_info(&dev->dev, "PIO at %pR\n", res);
> +
> +   return 0;
> +}
> +static int quirk_write_io(struct pci_dev *dev, struct resource *res, int 
> port)
> +{
> +   struct pci_bus_region bus_region;
> +   u16 region;
> +   unsigned size = to_pci_dev_addon_resource(res)->size;
> +
> +   pci_read_config_word(dev, port, ®ion);
> +   region &= size - 1;
> +
> +   /* Convert to PCI bus space.  */
> +   pcibios_resource_to_bus(dev, &bus_region, res);
> +   region |= bus_region.start & (~(size - 1));
> +
> +   pci_write_config_word(dev, port, region);
> +
> +   return 0;
> +}
> +static struct resource_ops quirk_io_ops = {
> +   .read = quirk_read_io,
> +   .write = quirk_write_io,
> +};
> +
> +static void quirk_io_region(struct pci_dev *dev, int port,
> +   unsigned size, char *name)
> +{
> +   u16 region;
> +
> +   pci_read_config_word(dev, port, ®ion);
> +   region &= size - 1;
> +
> +   if (!region)
> +   return;
> +
> +   pci_add_dev_addon_resource(dev, port, size, &quirk_io_ops, name);
> +}
>
>  /*
>   * ATI Northbridge setups MCE the processor if you even
> @@ -374,12 +406,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 
> PCI_DEVICE_ID_ATI_RS100,   quirk_ati_
>   */
>  static void quirk_ali7101_acpi(struct pci_dev *dev)
>  {
> -   u16 region;
> -
> -   pci_read_config_word(dev, 0xE0, ®ion);
> -   quirk_io_region(dev, region, 64, PCI_BRIDGE_RESOURCES, "ali7101 
> ACPI");
> -   pci_read_config_word(dev, 0xE2, ®ion);
> -   quirk_io_region(dev, region, 32, PCI_BRIDGE_RESOURCES+1, "ali7101 
> SMB");
> +   quirk_io_region(dev, 0xE0, 64, "ali7101 ACPI");
> +   quirk_io_region(dev, 0xE2, 32, "ali7101 SMB");

This patch has two changes that need to be separated:

  1) Refactoring quirk_io_region() so the pci_read_config_word() is
done by quirk_io_region() rather than the caller.
  2) Whatever pci_dev resource changes we end up making.

I think part 1 can be done at any time and is probably not controversial.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 02/27] PCI: Add pci_dev_resource_idx() helper

2013-04-04 Thread Bjorn Helgaas

On Wed, Mar 13, 2013 at 5:27 PM, Yinghai Lu  wrote:
> Use resource pointer to get index in pci resources array/list.
>
> -v2: export symbol for acpiphp compiling error, found by
>Steven Newbury 
>
> Signed-off-by: Yinghai Lu 
> ---
>  drivers/pci/probe.c |9 +
>  include/linux/pci.h |1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 9cb3eb3..1df75f7 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -114,6 +114,15 @@ struct resource *pci_dev_resource_n(struct pci_dev *dev, 
> int n)
>  }
>  EXPORT_SYMBOL(pci_dev_resource_n);
>
> +int pci_dev_resource_idx(struct pci_dev *dev, struct resource *res)
> +{
> +   if (res >= dev->resource &&
> +   res <= dev->resource + (PCI_NUM_RESOURCES - 1))
> +   return res - dev->resource;
> +
> +   return -1;
> +}

I'm dubious about the whole idea of a resource *index*.  I'd like to
get away from that concept completely.

Some uses of this are just for printing and are obviously unnecessary
(get_res_add_size()).

Other places we pass around the index, e.g.,
reassign_resources_sorted() passes it to pci_assign_resource() and
pci_reassign_resource(), and eventually we just pass the index to
pci_dev_resource_n() to get back what we started with.  I'd rather
just pass around a pointer instead of this half pointer/half index
strategy.

We might need a "struct pci_resource" or something that contains the
type (BAR, IOV, bridge window, etc), a BAR number, etc.  There's
already a "struct pci_resource" in cpqphp, but that's isolated to
cpqphp and could be easily changed if you wanted that name.

It's going to be very confusing to have a "struct pci_dev_resource"
and a "pci_dev_resource_n()" that have nothing to do with each other.

>  static u64 pci_size(u64 base, u64 maxbase, u64 mask)
>  {
> u64 size = mask & maxbase;  /* Find the significant bits */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 00d5367..aefff8b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -339,6 +339,7 @@ struct pci_dev {
>  };
>
>  struct resource *pci_dev_resource_n(struct pci_dev *dev, int n);
> +int pci_dev_resource_idx(struct pci_dev *dev, struct resource *res);
>
>  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
>  {
> --
> 1.7.10.4
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 03/27] PCI: pci resource iterator

2013-04-04 Thread Bjorn Helgaas

On Wed, Mar 13, 2013 at 5:27 PM, Yinghai Lu  wrote:
> From: Ram Pai 
>
> Currently pci_dev structure holds an array of 17 PCI resources; six base
> BARs, one ROM BAR, four BRIDGE BARs, six sriov BARs.  This is wasteful.
> A bridge device just needs the 4 bridge resources. A non-bridge device
> just needs the six base resources and one ROM resource. The sriov
> resources are needed only if the device has SRIOV capability.
>
> The pci_dev structure needs to be re-organized to avoid unnecessary
> bloating.  However too much code outside the pci-bus driver, assumes the
> internal details of the pci_dev structure, thus making it hard to
> re-organize the datastructure.
>
> As a first step this patch provides generic methods to access the
> resource structure of the pci_dev.
>
> Finally we can re-organize the resource structure in the pci_dev
> structure and correspondingly update the methods.
>
> -v2: Consolidated iterator interface as per Bjorn's suggestion.
> -v3: Add the idx back - Yinghai Lu
> -v7: Change to use bitmap for searching - Yinghai Lu
> -v8: Fix acpiphp module compiling error that is found by
> Steven Newbury  - Yinghai Lu
>
> Signed-off-by: Ram Pai 
> Signed-off-by: Yinghai Lu 
> ---
>  drivers/pci/probe.c |   48 
>  include/linux/pci.h |   24 
>  2 files changed, 72 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 1df75f7..ac751a6 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -123,6 +123,54 @@ int pci_dev_resource_idx(struct pci_dev *dev, struct 
> resource *res)
> return -1;
>  }
>
> +static void __init_res_idx_mask(unsigned long *mask, int flag)
> +{
> +   bitmap_zero(mask, PCI_NUM_RESOURCES);
> +   if (flag & PCI_STD_RES)
> +   bitmap_set(mask, PCI_STD_RESOURCES,
> +   PCI_STD_RESOURCE_END - PCI_STD_RESOURCES + 1);
> +   if (flag & PCI_ROM_RES)
> +   bitmap_set(mask, PCI_ROM_RESOURCE, 1);
> +#ifdef CONFIG_PCI_IOV
> +   if (flag & PCI_IOV_RES)
> +   bitmap_set(mask, PCI_IOV_RESOURCES,
> +   PCI_IOV_RESOURCE_END - PCI_IOV_RESOURCES + 1);
> +#endif
> +   if (flag & PCI_BRIDGE_RES)
> +   bitmap_set(mask, PCI_BRIDGE_RESOURCES,
> +   PCI_BRIDGE_RESOURCE_END - PCI_BRIDGE_RESOURCES + 1);
> +}
> +
> +static DECLARE_BITMAP(res_idx_mask[1 << PCI_RES_BLOCK_NUM], 
> PCI_NUM_RESOURCES);
> +static int __init pci_res_idx_mask_init(void)
> +{
> +   int i;
> +
> +   for (i = 0; i < (1 << PCI_RES_BLOCK_NUM); i++)
> +   __init_res_idx_mask(res_idx_mask[i], i);
> +
> +   return 0;
> +}
> +postcore_initcall(pci_res_idx_mask_init);
> +
> +static inline unsigned long *get_res_idx_mask(int flag)
> +{
> +   return res_idx_mask[flag & ((1 << PCI_RES_BLOCK_NUM) - 1)];
> +}
> +
> +int pci_next_resource_idx(int i, int flag)
> +{
> +   i++;
> +   if (i < PCI_NUM_RESOURCES)
> +   i = find_next_bit(get_res_idx_mask(flag), PCI_NUM_RESOURCES, 
> i);
> +
> +   if (i < PCI_NUM_RESOURCES)
> +   return i;
> +
> +   return -1;
> +}
> +EXPORT_SYMBOL(pci_next_resource_idx);
> +
>  static u64 pci_size(u64 base, u64 maxbase, u64 mask)
>  {
> u64 size = mask & maxbase;  /* Find the significant bits */
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index aefff8b..127a856 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -341,6 +341,30 @@ struct pci_dev {
>  struct resource *pci_dev_resource_n(struct pci_dev *dev, int n);
>  int pci_dev_resource_idx(struct pci_dev *dev, struct resource *res);
>
> +#define PCI_STD_RES(1<<0)
> +#define PCI_ROM_RES(1<<1)
> +#define PCI_IOV_RES(1<<2)
> +#define PCI_BRIDGE_RES (1<<3)
> +#define PCI_RES_BLOCK_NUM  4
> +
> +#define PCI_ALL_RES(PCI_STD_RES | PCI_ROM_RES | PCI_BRIDGE_RES | 
> PCI_IOV_RES)
> +#define PCI_NOSTD_RES  (PCI_ALL_RES & ~PCI_STD_RES)
> +#define PCI_NOIOV_RES  (PCI_ALL_RES & ~PCI_IOV_RES)
> +#define PCI_NOROM_RES  (PCI_ALL_RES & ~PCI_ROM_RES)
> +#define PCI_NOBRIDGE_RES   (PCI_ALL_RES & ~PCI_BRIDGE_RES)
> +#define PCI_STD_ROM_RES(PCI_STD_RES | PCI_ROM_RES)
> +#define PCI_STD_IOV_RES(PCI_STD_RES | PCI_IOV_RES)
> +#define PCI_STD_ROM_IOV_RES(PCI_STD_RES | PCI_ROM_RES | PCI_IOV_RES)
> +
> +int pci_next_resource_idx(int i, int flag);
> +
> +#define for_each_pci_resource(dev, res, i, flag)   \
> +   for (i = pci_next_resource_idx(-1, flag),   \
> +   res = pci_dev_resource_n(dev, i);   \
> +res;   \
> +i = pci_next_resource_idx(i, flag),\
> +   res = pci_dev_resource_n(dev, i))
> +
>  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
>  {
>  #ifdef CONFIG_PCI_IOV

I think this turn

Re: [PATCH v3 04/27] PCI: Add is_pci_*_resource_idx() helpers

2013-04-04 Thread Bjorn Helgaas

On Wed, Mar 13, 2013 at 5:27 PM, Yinghai Lu  wrote:
> According to resource pointer find out if the resource is some type resource
> like bridge, sriov, or std.
>
> Signed-off-by: Yinghai Lu 
> ---
>  include/linux/pci.h |   23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 127a856..efb348b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -92,6 +92,29 @@ enum {
> DEVICE_COUNT_RESOURCE = PCI_NUM_RESOURCES,
>  };
>
> +static inline bool is_pci_std_resource_idx(int i)
> +{
> +   return i >= PCI_STD_RESOURCES && i <= PCI_STD_RESOURCE_END;
> +}
> +
> +static inline bool is_pci_rom_resource_idx(int i)
> +{
> +   return i == PCI_ROM_RESOURCE;
> +}
> +
> +static inline bool is_pci_iov_resource_idx(int i)
> +{
> +#ifdef CONFIG_PCI_IOV
> +   return i >= PCI_IOV_RESOURCES && i <= PCI_IOV_RESOURCE_END;
> +#endif
> +   return false;
> +}
> +
> +static inline bool is_pci_bridge_resource_idx(int i)
> +{
> +   return i >= PCI_BRIDGE_RESOURCES && i <= PCI_BRIDGE_RESOURCE_END;
> +}
> +
>  typedef int __bitwise pci_power_t;
>
>  #define PCI_D0 ((pci_power_t __force) 0)

1) I don't like adding more "_idx()" interfaces.

2) The only one of these that's even used is "is_pci_bridge_resource_idx()"

3) I think adding a wrapper struct with a "type" or "flags" field
would make this trivial, e.g., "pres->flags & PCI_RESOURCE_WINDOW" or
something.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] pci: Add PCI ROM helper for platform-provided ROM images

2013-04-05 Thread Bjorn Helgaas

On Fri, Apr 5, 2013 at 2:31 PM, Chris Murphy  wrote:
>
> On Apr 2, 2013, at 2:10 PM, Bjorn Helgaas  wrote:
>
>> On Mon, Apr 1, 2013 at 11:47 AM, Matthew Garrett
>>  wrote:
>>> On Mon, 2013-04-01 at 11:39 -0600, Bjorn Helgaas wrote:
>>>
>>>> Chris still has problems (see
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=927451), but I don't know
>>>> whether they are related to these patches or something else.
>>>
>>> I think they're unrelated. The log he posts using this patch gives the
>>> correct output - the ROM image comes from the platform method rather
>>> than from PCI. I think Ben probably needs to look at that.
>>
>> OK, I added these three patches to my for-linus branch, headed for v3.9.
>
> Are they in 3.9.0-0.rc5.git2.1.f19? I'm seeing a regression from 3.8.5 with 
> the radeon driver not finding BIOS ROM as well.
> https://bugzilla.redhat.com/show_bug.cgi?id=949083

No.  I haven't asked Linus to pull my branch yet (was just thinking it
was time to do that, coincidentally :))

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-05 Thread Bjorn Helgaas

On Fri, Apr 5, 2013 at 1:31 PM, Neil Horman  wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> For the 5520 and 5500 chipsets which contained an errata (specificially errata
> 53), which noted that these chipsets can't properly do interrupt remapping, 
> and
> as a result the recommend that interrupt remapping be disabled in bios.  While
> many vendors have a bios update to do exactly that, not all do, and of course
> not all users update their bios to a level that corrects the problem.  As a
> result, occasionally interrupts can arrive at a cpu even after affinity for 
> that
> interrupt has be moved, leading to lost or spurrious interrupts (usually
> characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have system for which their BIOS level is 
> such
> that this feature was not properly turned off.  As such, it would be good to
> give them a reminder that their systems are vulnurable to this problem.

I'd still like to mention the bugzilla URL in the changelog
(https://bugzilla.redhat.com/show_bug.cgi?id=887006) if it can be made
public.

> ...

> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index 3755ef4..bfa3139 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int 
> func)
>  }
>  #endif
>
> +#ifdef CONFIG_IRQ_REMAP
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +   u8 revision;
> +
> +   revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID);
> +
> +   /*
> +* Revision 0x13 of this chipset supports irq remapping
> +* but has an erratum that breaks its behavior, flag it as such
> +*/
> +   if (revision == 0x13)
> +   irq_remap_broken = 1;
> +
> +}
> +#else
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +}
> +#endif
> +
>  #define QFLAG_APPLY_ONCE   0x1
>  #define QFLAG_APPLIED  0x2
>  #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
> @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = {
>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
> { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
> +   { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> +   { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> {}
>  };
>
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index d56f8c1..2b56e92 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -19,6 +19,7 @@
>  int irq_remapping_enabled;
>
>  int disable_irq_remap;
> +int irq_remap_broken;
>  int disable_sourceid_checking;
>  int no_x2apic_optout;
>
> @@ -216,6 +217,17 @@ int irq_remapping_supported(void)
> if (disable_irq_remap)
> return 0;
>
> +   if (irq_remap_broken) {
> +   WARN_TAINT(1, TAIN_FIRMWARE_WORKAROUND,

This looks like a typo (s/TAIN/TAINT/).

> +  "This system BIOS has enabled interrupt 
> remapping\n"
> +  "on a chipset that contains an erratum making 
> that\n"
> +  "feature unstable.  Please reboot with 
> nointremap\n"
> +  "added to the kernel command line and contact\n"
> +  "your BIOS vendor for an update");

I suspect your updated message won't mention "nointremap", but if it
does, Documentation/kernel-parameters.txt says that option is
deprecated and "intremap=off" should be used instead.

> +   disable_irq_remap = 1;

Tell me if I have this correct:

Before this patch, we had interrupt remapping enabled and
virtualization enabled.  This is safe, but devices might need resets
to deal with lost or spurious interrupts.

After this patch, these same machines will by default have interrupt
remapping disabled and virtualization enabled.  The lost or spurious
interrupt problem should be gone, but we now have the IRQ injection
security bug.

If that's really the change we're making, I'm not comfortable applying
this patch.  But I don't know the details of the IRQ injection
problem, so maybe my understanding of the implications is wrong.

> +   return 0;
> +   }
> +
> if (!remap_ops || !remap_ops->supported)
> return 0;
>
> diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> index ecb6376..d7537e4 100644
> --- a/drivers/iommu/irq_

Re: [GIT PULL] PCI updates for v3.9

2013-04-05 Thread Bjorn Helgaas

[+cc linux-kernel]

On Fri, Apr 5, 2013 at 7:31 PM, Bjorn Helgaas  wrote:
> Hi Linus,
>
> Here are some fixes for v3.9.  They include fixes for an ASPM problem
> that affects pre-1.1 PCIe devices, a kexec problem, the platform ROM
> image problem, a couple hotplug issues related to PM, and a fix for
> PCI-EISA bridges that have been broken for a long time.
>
> Bjorn
>
>
> The following changes since commit 8bb9660418e05bb1845ac1a2428444d78e322cc7:
>
>   Linux 3.9-rc4 (2013-03-23 16:52:44 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git
> tags/pci-v3.9-fixes-1
>
> for you to fetch changes up to de7d5f729c72638f41d7c17487bccb1c570ff144:
>
>   PCI/PM: Disable runtime PM of PCIe ports (2013-04-03 15:54:59 -0600)
>
> 
> PCI updates for v3.9:
>
>   ASPM
>   Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>   kexec
>   PCI: Don't try to disable Bus Master on disconnected PCI devices
>   Platform ROM images
>   PCI: Add PCI ROM helper for platform-provided ROM images
>   nouveau: Attempt to use platform-provided ROM image
>   radeon: Attempt to use platform-provided ROM image
>   Hotplug
>   PCI/ACPI: Always resume devices on ACPI wakeup notifications
>   PCI/PM: Disable runtime PM of PCIe ports
>   EISA
>   EISA/PCI: Fix bus res reference
>   EISA/PCI: Init EISA early, before PNP
>
> 
> Bjorn Helgaas (3):
>   Merge branch 'pci/mjg-rom' into for-linus
>   Merge branch 'pci/yinghai-eisa' into for-linus
>   Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>
> Konstantin Khlebnikov (1):
>   PCI: Don't try to disable Bus Master on disconnected PCI devices
>
> Matthew Garrett (3):
>   PCI: Add PCI ROM helper for platform-provided ROM images
>   nouveau: Attempt to use platform-provided ROM image
>   radeon: Attempt to use platform-provided ROM image
>
> Rafael J. Wysocki (2):
>   PCI/ACPI: Always resume devices on ACPI wakeup notifications
>   PCI/PM: Disable runtime PM of PCIe ports
>
> Yinghai Lu (2):
>   EISA/PCI: Fix bus res reference
>   EISA/PCI: Init EISA early, before PNP
>
>  drivers/acpi/pci_root.c | 76 
> -
>  drivers/eisa/pci_eisa.c | 67 +++---
>  drivers/gpu/drm/nouveau/core/subdev/bios/base.c | 17 ++
>  drivers/gpu/drm/radeon/radeon_bios.c| 26 +
>  drivers/pci/pci-acpi.c  | 15 ++---
>  drivers/pci/pci-driver.c|  5 +-
>  drivers/pci/pcie/portdrv_pci.c  | 13 -
>  drivers/pci/rom.c   | 67 ++
>  include/linux/pci.h |  1 +
>  9 files changed, 169 insertions(+), 118 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v5 4/8] ACPI, PCI: avoid building pci_slot as module

2013-01-28 Thread Bjorn Helgaas

On Mon, Jan 28, 2013 at 3:00 PM, Yinghai Lu  wrote:
> On Mon, Jan 28, 2013 at 1:52 PM, Bjorn Helgaas  wrote:
>> On Mon, Jan 28, 2013 at 2:29 PM, Yinghai Lu  wrote:
> ...
>>> If bios have messed up slot name or idx, we will get strange 1-1
>>> other crazy name.
>>>
>>> if you really need to put it as built-in, may need to some command
>>> line that user could switch it off.
>>
>> It would save us all a lot of time if you gave an example and worked
>> through the scenario where this is a problem.
>>
>> We already have the choice of having pci_slot built-in, so if there's
>> a bug in that config, we already have the bug.  This patch merely
>> removes a config where the bug might be covered up.
>
>
> for distribution, current it is with module, so user could blacklist
> in module.conf
>
> Now with built-in or not, distribution will have it built-in, and user
> have no chance to
> disable it.

CONFIG_ACPI_PCI_SLOT=y in RHEL6, so evidently they have this problem.

Asking users to edit module.conf by hand is not a solution, just like
asking users to boot with a command line option is not a solution.
That sort of stuff is fine for a hobbyist OS intended only for techie
geeks.  It's not fine for Linux.

If you would give a concrete example of the ACPI namespace info and
device config, hotplug sequence, etc., required to show the problem,
we could have a useful discussion about ways to fix it.  But if all
you have is FUD about "this might break and users won't have the
ability to edit modules.conf," that doesn't help me see why this patch
is a bad idea.

>> I don't know why "adding a command line switch" appeals to you as the
>> solution to every problem.  As far as I'm concerned that's not a
>> solution to ANY problem.  It might be a band-aid to enable users to
>> limp along while we figure out a correct solution, but it's certainly
>> not a resolution.
>
> Looks like you want to remove command line support, right ?
>
> Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/5] e1000e: fix resuming from runtime-suspend

2013-01-28 Thread Bjorn Helgaas

[+cc Rafael, author of patch you cited]

On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
 wrote:
> Bug was introduced in commit 23606cf5d1192c2b17912cb2ef6e62f9b11de133
> ("e1000e / PCI / PM: Add basic runtime PM support (rev. 4)") in v2.6.35
>
> Signed-off-by: Konstantin Khlebnikov 
> Cc: e1000-de...@lists.sourceforge.net
> Cc: Jeff Kirsher 
> Cc: Bruce Allan 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index fbf75fd..2853c11 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5691,14 +5691,17 @@ static int e1000_runtime_suspend(struct device *dev)
> struct pci_dev *pdev = to_pci_dev(dev);
> struct net_device *netdev = pci_get_drvdata(pdev);
> struct e1000_adapter *adapter = netdev_priv(netdev);
> +   int retval;
> +   bool wake;
>
> -   if (e1000e_pm_ready(adapter)) {
> -   bool wake;
> +   if (!e1000e_pm_ready(adapter))
> +   return 0;
>
> -   __e1000_shutdown(pdev, &wake, true);
> -   }
> +   retval = __e1000_shutdown(pdev, &wake, true);
> +   if (!retval)
> +   e1000_power_off(pdev, true, wake);
>
> -   return 0;
> +   return retval;
>  }
>
>  static int e1000_idle(struct device *dev)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/5] e1000e: fix pci device enable counter balance

2013-01-28 Thread Bjorn Helgaas

[+cc Rafael]

On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
 wrote:
> __e1000_shutdown() calls pci_disable_device() at the end, thus 
> __e1000_resume()
> should call pci_enable_device_mem() to keep enable counter in balance.
>
> Bug was introduced in commit 23606cf5d1192c2b17912cb2ef6e62f9b11de133
> ("e1000e / PCI / PM: Add basic runtime PM support (rev. 4)") in v2.6.35
>
> Signed-off-by: Konstantin Khlebnikov 
> Cc: e1000-de...@lists.sourceforge.net
> Cc: Jeff Kirsher 
> Cc: Bruce Allan 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 2853c11..6bce796 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5598,6 +5598,13 @@ static int __e1000_resume(struct pci_dev *pdev)
> pci_restore_state(pdev);
> pci_save_state(pdev);
>
> +   err = pci_enable_device_mem(pdev);
> +   if (err) {
> +   dev_err(&pdev->dev,
> +   "Cannot re-enable PCI device after suspend.\n");
> +   return err;
> +   }

Reviewed-by: Bjorn Helgaas 

Seems right to me, and the other users I looked at (igb, azx,
virtio_pci) call pci_disable_device() in .suspend() and call
pci_enable_device() in .resume() as you propose to do here.

I assume the e1000 folks will handle this patch (and the previous one).

> +
> e1000e_set_interrupt_capability(adapter);
> if (netif_running(netdev)) {
> err = e1000_request_irq(adapter);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/5] PCI: revert preparing for wakeup in runtime-suspend finalization

2013-01-28 Thread Bjorn Helgaas

[+cc Rafael]

On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
 wrote:
> This patch effectively reverts commit 42eca2302146fed51335b95128e949ee6f54478f
> ("PCI: Don't touch card regs after runtime suspend D3")
>
> | This patch checks whether the pci state is saved and doesn't attempt to hit
> | any registers after that point if it is.
>
> This seems completely wrong. Yes, PCI configuration space has been saved by
> driver, but this doesn't means that all job is done and device has been
> suspended and ready for waking up in the future.
>
> For example driver e1000e for ethernet in my thinkpad x220 saves pci-state
> but device cannot wakeup after that, because it needs some ACPI callbacks
> which usually called from pci_finish_runtime_suspend().
>
> | Optimus (dual-gpu) laptops seem to have their own form of D3cold, but
> | unfortunately enter it on normal D3 transitions via the ACPI callback.
>
> Hardware which disappears from the bus unexpectedly is exception, so let's
> handle it as an exception. Its driver should set device state to D3cold and
> the rest code will handle it properly.

Functions in D3cold don't have power, so it's completely expected that
they would disappear from the bus and not respond to config accesses.
Maybe Dave was referring to D3hot, where functions *should* respond to
config accesses.  I dunno.

Just to be clear, it sounds like 42eca230 caused a regression on your
e1000e device?  If so, I guess we should revert it unless you and Dave
can figure out a better patch that fixes both your e1000e device and
the Optimus issue.

> Signed-off-by: Konstantin Khlebnikov 
> Cc: linux-...@vger.kernel.org
> Cc: Bjorn Helgaas 
> Cc: Dave Airlie 
> ---
>  drivers/pci/pci-driver.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f79cbcd..030dbf0 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1005,10 +1005,11 @@ static int pci_pm_runtime_suspend(struct device *dev)
> return 0;
> }
>
> -   if (!pci_dev->state_saved) {
> +   if (!pci_dev->state_saved)
> pci_save_state(pci_dev);
> +
> +   if (pci_dev->current_state != PCI_D3cold)
> pci_finish_runtime_suspend(pci_dev);
> -   }
>
> return 0;
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/5] PCI: don't touch enable_cnt in pci_device_shutdown()

2013-01-28 Thread Bjorn Helgaas

On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
 wrote:
> comment in commit b566a22c23327f18ce941ffad0ca907e50a53d41
> ("PCI: disable Bus Master on PCI device shutdown") says:
>
> | Disable Bus Master bit on the device in pci_device_shutdown() to ensure PCI
> | devices do not continue to DMA data after shutdown.  This can cause memory
> | corruption in case of a kexec where the current kernel shuts down and
> | transfers control to a new kernel while a PCI device continues to DMA to
> | memory that does not belong to it any more in the new kernel.
>
> Seems like pci_clear_master() must be used here instead of 
> pci_disable_device(),
> because it disables Bus Muster unconditionally and doesn't changes enable_cnt.
>
> Signed-off-by: Konstantin Khlebnikov 
> Cc: linux-...@vger.kernel.org
> Cc: Bjorn Helgaas 
> Cc: Khalid Aziz 
> ---
>  drivers/pci/pci-driver.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 030dbf0..853d605 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -392,7 +392,7 @@ static void pci_device_shutdown(struct device *dev)
>  * Turn off Bus Master bit on the device to tell it to not
>  * continue to do DMA
>  */
> -   pci_disable_device(pci_dev);
> +   pci_clear_master(pci_dev);

We currently only call pci_enable_device() and pci_disable_device()
from drivers, and I think that's a nice division that's worth keeping.
 It keeps the core's mitts off device operation and helps preserve the
enable_cnt integrity.  So I like this change from that perspective.

Any objections to this, Khalid?

>  }
>
>  #ifdef CONFIG_PM
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/5] PCI: don't touch enable_cnt in pci_device_shutdown()

2013-01-28 Thread Bjorn Helgaas

[+cc Khalid new email addr]

On Mon, Jan 28, 2013 at 4:40 PM, Bjorn Helgaas  wrote:
> On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
>  wrote:
>> comment in commit b566a22c23327f18ce941ffad0ca907e50a53d41
>> ("PCI: disable Bus Master on PCI device shutdown") says:
>>
>> | Disable Bus Master bit on the device in pci_device_shutdown() to ensure PCI
>> | devices do not continue to DMA data after shutdown.  This can cause memory
>> | corruption in case of a kexec where the current kernel shuts down and
>> | transfers control to a new kernel while a PCI device continues to DMA to
>> | memory that does not belong to it any more in the new kernel.
>>
>> Seems like pci_clear_master() must be used here instead of 
>> pci_disable_device(),
>> because it disables Bus Muster unconditionally and doesn't changes 
>> enable_cnt.
>>
>> Signed-off-by: Konstantin Khlebnikov 
>> Cc: linux-...@vger.kernel.org
>> Cc: Bjorn Helgaas 
>> Cc: Khalid Aziz 
>> ---
>>  drivers/pci/pci-driver.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 030dbf0..853d605 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -392,7 +392,7 @@ static void pci_device_shutdown(struct device *dev)
>>  * Turn off Bus Master bit on the device to tell it to not
>>  * continue to do DMA
>>  */
>> -   pci_disable_device(pci_dev);
>> +   pci_clear_master(pci_dev);
>
> We currently only call pci_enable_device() and pci_disable_device()
> from drivers, and I think that's a nice division that's worth keeping.
>  It keeps the core's mitts off device operation and helps preserve the
> enable_cnt integrity.  So I like this change from that perspective.
>
> Any objections to this, Khalid?
>
>>  }
>>
>>  #ifdef CONFIG_PM
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/5] e1000e: fix pci device enable counter balance

2013-01-28 Thread Bjorn Helgaas

[+cc Rafael @sisk.pl]

On Mon, Jan 28, 2013 at 4:09 PM, Bjorn Helgaas  wrote:
> [+cc Rafael]
>
> On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
>  wrote:
>> __e1000_shutdown() calls pci_disable_device() at the end, thus 
>> __e1000_resume()
>> should call pci_enable_device_mem() to keep enable counter in balance.
>>
>> Bug was introduced in commit 23606cf5d1192c2b17912cb2ef6e62f9b11de133
>> ("e1000e / PCI / PM: Add basic runtime PM support (rev. 4)") in v2.6.35
>>
>> Signed-off-by: Konstantin Khlebnikov 
>> Cc: e1000-de...@lists.sourceforge.net
>> Cc: Jeff Kirsher 
>> Cc: Bruce Allan 
>> ---
>>  drivers/net/ethernet/intel/e1000e/netdev.c |7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index 2853c11..6bce796 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -5598,6 +5598,13 @@ static int __e1000_resume(struct pci_dev *pdev)
>> pci_restore_state(pdev);
>> pci_save_state(pdev);
>>
>> +   err = pci_enable_device_mem(pdev);
>> +   if (err) {
>> +   dev_err(&pdev->dev,
>> +   "Cannot re-enable PCI device after suspend.\n");
>> +   return err;
>> +   }
>
> Reviewed-by: Bjorn Helgaas 
>
> Seems right to me, and the other users I looked at (igb, azx,
> virtio_pci) call pci_disable_device() in .suspend() and call
> pci_enable_device() in .resume() as you propose to do here.
>
> I assume the e1000 folks will handle this patch (and the previous one).
>
>> +
>> e1000e_set_interrupt_capability(adapter);
>> if (netif_running(netdev)) {
>> err = e1000_request_irq(adapter);
>>

I'm still missing something.  In your original report
(https://lkml.org/lkml/2013/1/1/25), you noticed that "enable_cnt ==
0" immediately after boot, after e1000e had claimed the device:

> Right after boot it looks like this:
>
> root@zurg:/sys/bus/pci/devices# lspci
> ...
> 00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network 
> Connection (rev 04)
> ...
> root@zurg:/sys/bus/pci/devices# cat \:00\:19.0/enable
> 0
> here must be '1', not '0'

But these patches only change the e1000e suspend/resume path.  How
could they change the enable_cnt before you've even done a suspend?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v5 4/8] ACPI, PCI: avoid building pci_slot as module

2013-01-28 Thread Bjorn Helgaas

On Mon, Jan 28, 2013 at 7:45 PM, Jiang Liu  wrote:
> On 2013-1-29 10:21, Yinghai Lu wrote:
>> On Mon, Jan 28, 2013 at 6:07 PM, Jiang Liu  wrote:
>>> Could we use quirk to auto-disable PCIe native hotplug for
>>> problematic platforms?
>>
>> that is some kind of boot command line way?
> We could negotiate between acpiphp and pciehp if those problematic
> platforms/chipsets could be identified by DMI info or PCI device ID.

That's a sub-optimal solution because it requires us to identify all
the affected systems and also may require ongoing maintenance in the
future.  If we could get specifics on the issue, we might be able to
design a better solution that works for everybody.

(BTW, this thread is specifically about pci_slot, which I think is a
slightly different issue than the acpiphp/pciehp conflicts.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

2013-01-29 Thread Bjorn Helgaas

On Tue, Jan 29, 2013 at 1:28 PM, Borislav Petkov  wrote:
> Hi,
>
> this is rc5 + tip/master from 2 days ago, when resuming I get this fun
> message:
>
> ...
> [15117.684975] Restarting tasks ... done.
> [15117.687201] video LNXVIDEO:00: Restoring backlight state
> [15117.720469] ehci-pci :00:1d.0: power state changed by ACPI to D3cold
> [15117.721414] ehci-pci :00:1a.0: power state changed by ACPI to D3cold
> [15117.949185] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
> [15118.617192] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow 
> Control: Rx/Tx
> [15118.617198] e1000e :00:19.0 eth0: 10/100 speed: disabling TSO
> [15123.971346] Uhhuh. NMI received for unknown reason 2c on CPU 0.
> [15123.971353] Do you have a strange power saving mode enabled?
> [15123.971356] Dazed and confused, but trying to continue
>
> Machine is thinkpad x230. Any and all sensible suggestions are welcome.

Konstantin has some fixes for an e1000e power management issue related
to suspend/resume that he observed on an x220.  He didn't see an NMI,
and apparently his problem has been around for a long time, so no idea
whether it could be related.  I just noticed the conjunction of
thinkpad/e1000e/resume/power saving in both reports.

https://lkml.org/lkml/2013/1/18/147
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] MAINTAINERS: update SGI & ia64 Altix stuff

2013-01-29 Thread Bjorn Helgaas

Dimitri and Robin have taken over GRU maintenance.

Linux on Altix is no longer maintained except as part of ia64, and
there's already a separate IA64 maintainer entry.

Signed-off-by: Bjorn Helgaas 
---
 MAINTAINERS |   11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index fa309ab..0526e21 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6768,7 +6768,8 @@ S:Supported
 F: drivers/net/ethernet/sfc/
 
 SGI GRU DRIVER
-M: Jack Steiner 
+M: Dimitri Sivanich 
+M: Robin Holt 
 S: Maintained
 F: drivers/misc/sgi-gru/
 
@@ -6963,14 +6964,6 @@ L:   linux-fb...@vger.kernel.org
 S: Maintained
 F: drivers/video/smscufx.c
 
-SN-IA64 (Itanium) SUB-PLATFORM
-M: Jes Sorensen 
-L: linux-al...@sgi.com
-L: linux-i...@vger.kernel.org
-W: http://www.sgi.com/altix
-S: Maintained
-F: arch/ia64/sn/
-
 SOC-CAMERA V4L2 SUBSYSTEM
 M: Guennadi Liakhovetski 
 L: linux-me...@vger.kernel.org

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Driver core: treat unregistered bus_types as having no devices

2013-01-29 Thread Bjorn Helgaas

A bus_type has a list of devices (klist_devices), but the list and the
subsys_private structure that contains it are not initialized until the
bus_type is registered with bus_register().

The panic/reboot path has fixups that look up devices in pci_bus_type.  If
we panic before registering pci_bus_type, the bus_type exists but the list
does not, so mach_reboot_fixups() trips over a null pointer and panics
again:

mach_reboot_fixups
  pci_get_device
..
  bus_find_device(&pci_bus_type, ...)
bus->p is NULL

Signed-off-by: Bjorn Helgaas 
---
 drivers/base/bus.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 24eb078..6856303 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -290,7 +290,7 @@ int bus_for_each_dev(struct bus_type *bus, struct device 
*start,
struct device *dev;
int error = 0;
 
-   if (!bus)
+   if (!bus || !bus->p)
return -EINVAL;
 
klist_iter_init_node(&bus->p->klist_devices, &i,
@@ -324,7 +324,7 @@ struct device *bus_find_device(struct bus_type *bus,
struct klist_iter i;
struct device *dev;
 
-   if (!bus)
+   if (!bus || !bus->p)
return NULL;
 
klist_iter_init_node(&bus->p->klist_devices, &i,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Driver core: treat unregistered bus_types as having no devices

2013-01-29 Thread Bjorn Helgaas

On Tue, Jan 29, 2013 at 4:44 PM, Bjorn Helgaas  wrote:
> A bus_type has a list of devices (klist_devices), but the list and the
> subsys_private structure that contains it are not initialized until the
> bus_type is registered with bus_register().
>
> The panic/reboot path has fixups that look up devices in pci_bus_type.  If
> we panic before registering pci_bus_type, the bus_type exists but the list
> does not, so mach_reboot_fixups() trips over a null pointer and panics
> again:
>
> mach_reboot_fixups
>   pci_get_device
> ..
>   bus_find_device(&pci_bus_type, ...)
>     bus->p is NULL
>
> Signed-off-by: Bjorn Helgaas 
> ---
>  drivers/base/bus.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 24eb078..6856303 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -290,7 +290,7 @@ int bus_for_each_dev(struct bus_type *bus, struct device 
> *start,
> struct device *dev;
> int error = 0;
>
> -   if (!bus)
> +   if (!bus || !bus->p)
> return -EINVAL;
>
> klist_iter_init_node(&bus->p->klist_devices, &i,
> @@ -324,7 +324,7 @@ struct device *bus_find_device(struct bus_type *bus,
> struct klist_iter i;
> struct device *dev;
>
> -   if (!bus)
> +   if (!bus || !bus->p)
> return NULL;
>
> klist_iter_init_node(&bus->p->klist_devices, &i,
>

Sorry, I meant to include this in the original post:

Joonsoo reported a problem when panicking before PCI was initialized.
I think this patch should be sufficient to replace the patch he posted
here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
reboot_fixups in early boot phase")
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Driver core: treat unregistered bus_types as having no devices

2013-01-30 Thread Bjorn Helgaas

On Wed, Jan 30, 2013 at 1:09 AM, Greg Kroah-Hartman
 wrote:
> On Tue, Jan 29, 2013 at 04:47:13PM -0700, Bjorn Helgaas wrote:
>> On Tue, Jan 29, 2013 at 4:44 PM, Bjorn Helgaas  wrote:
>> > A bus_type has a list of devices (klist_devices), but the list and the
>> > subsys_private structure that contains it are not initialized until the
>> > bus_type is registered with bus_register().
>> >
>> > The panic/reboot path has fixups that look up devices in pci_bus_type.  If
>> > we panic before registering pci_bus_type, the bus_type exists but the list
>> > does not, so mach_reboot_fixups() trips over a null pointer and panics
>> > again:
>> >
>> > mach_reboot_fixups
>> >   pci_get_device
>> >     ..
>> >   bus_find_device(&pci_bus_type, ...)
>> > bus->p is NULL
>> >
>> > Signed-off-by: Bjorn Helgaas 
>> > ---
>> >  drivers/base/bus.c |4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/base/bus.c b/drivers/base/bus.c
>> > index 24eb078..6856303 100644
>> > --- a/drivers/base/bus.c
>> > +++ b/drivers/base/bus.c
>> > @@ -290,7 +290,7 @@ int bus_for_each_dev(struct bus_type *bus, struct 
>> > device *start,
>> > struct device *dev;
>> > int error = 0;
>> >
>> > -   if (!bus)
>> > +   if (!bus || !bus->p)
>> > return -EINVAL;
>> >
>> > klist_iter_init_node(&bus->p->klist_devices, &i,
>> > @@ -324,7 +324,7 @@ struct device *bus_find_device(struct bus_type *bus,
>> > struct klist_iter i;
>> > struct device *dev;
>> >
>> > -   if (!bus)
>> > +   if (!bus || !bus->p)
>> > return NULL;
>> >
>> > klist_iter_init_node(&bus->p->klist_devices, &i,
>> >
>>
>> Sorry, I meant to include this in the original post:
>>
>> Joonsoo reported a problem when panicking before PCI was initialized.
>> I think this patch should be sufficient to replace the patch he posted
>> here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
>> reboot_fixups in early boot phase")
>
> Don't you think that should be applied as well?  That makes things a bit
> more explicit in the boot process as to exactly what is going on.  The
> driver core changes is good to have if other people mess things up like
> this in the future, don't rely on it to protect you from foolish things.

mach_reboot_fixups() is not part of the boot process at all.  That's
why it seems ugly to me to clutter it with a system_state check.  I
don't really think of it as having messed up by calling
pci_get_device() without checking system_state.  Certainly we don't
expect *every* caller to check system_state, and it's not trivial to
determine at every call site whether pci_bus has been registered or
not.

But if anybody wants to apply the https://lkml.org/lkml/2012/12/28/75
too, it's OK with me.

Tangential question ... I was curious as to why bus_for_each_dev() and
bus_find_device() (and other driver core things) check for "!bus" in
the first place.  For all the callers I looked at, it's obvious at
compile-time that bus can't be NULL.  The question of whether "bus->p"
is null can't be answered at compile-time because it depends on
whether the bus has been registered.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Uhhuh. NMI received for unknown reason 2c on CPU 0.

2013-01-30 Thread Bjorn Helgaas

On Tue, Jan 29, 2013 at 8:42 PM, Borislav Petkov  wrote:
> On Tue, Jan 29, 2013 at 02:32:56PM -0700, Bjorn Helgaas wrote:
>> Konstantin has some fixes for an e1000e power management issue related
>> to suspend/resume that he observed on an x220. He didn't see an NMI,
>> and apparently his problem has been around for a long time,
>
> Yeah, this is one of those issues you don't see *every* s/r cycle and
> besides, I just got this box and haven't run 3.{6,7} on it yet (maybe
> never will :-)).
>
>> so no idea whether it could be related. I just noticed the conjunction
>> of thinkpad/e1000e/resume/power saving in both reports.
>>
>> https://lkml.org/lkml/2013/1/18/147
>
> Yes, thanks Bjorn, that was a good suggestion. Btw, from reading the
> thread, those patches still need cooking a bit more, AFAICR people's
> objections/comments. Or should I go ahead and test them?

You're right, I don't think we're quite ready to merge those patches.
But if your NMI is easy to reproduce, it might be worth removing
e1000e altogether to see if it still happens.  I noticed in your
original log that the NMI occurred 5 seconds after the e1000e message,
and I could imagine some CPU or PCI response timeout being 5 seconds.

> It's just that I'm overly cautious every time I hear e1000e is involved:
>
> www.linux-magazine.com/content/download/62169/484085/file/Security_Lessons_Ftrace.pdf

Thanks for the pointer, that was an interesting read :)

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 1/2] Implementation of pci_fixup_irqs for descendants of a specified bus

2013-01-30 Thread Bjorn Helgaas

On Fri, Jan 18, 2013 at 4:37 AM, Andrew Murray  wrote:
> Continuing from discussion with Thierry (lkml.org/lkml/2013/1/18/107) perhaps
> this will be useful to fold into your patchset.
> ---
> This patch provides pci_bus_fixup_irqs which performs the same
> function as pci_fixup_irqs but only to descendants of the specified
> bus.

I think pci_fixup_irqs() is a broken design to begin with because it
is only called once at boot-time and it iterates over all the devices
we've found so far.  Any hot-added devices never get the fixups.

Adding pci_bus_fixup_irqs() addresses part of that, by adding a way to
fixup a subset of devices, e.g., maybe it could be done for hot-added
things.

But I think it would be better to do a more extensive refactoring and
do the IRQ fixups directly somewhere in the pci_device_add() path.
That way we can do it a device at a time, for every device (present at
boot and hot-added later), and before any drivers claim the device.

> This can reduce unnecessary fixing up of device irqs when new buses
> are added.
>
> Signed-off-by: Andrew Murray 
> Signed-off-by: Liviu Dudau 
> ---
>  drivers/pci/setup-irq.c |   15 +++
>  include/linux/pci.h |3 +++
>  2 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/setup-irq.c b/drivers/pci/setup-irq.c
> index eb219a1..ea91874 100644
> --- a/drivers/pci/setup-irq.c
> +++ b/drivers/pci/setup-irq.c
> @@ -62,3 +62,18 @@ pci_fixup_irqs(u8 (*swizzle)(struct pci_dev *, u8 *),
> for_each_pci_dev(dev)
> pdev_fixup_irq(dev, swizzle, map_irq);
>  }
> +
> +void __init
> +pci_bus_fixup_irqs(struct pci_bus *bus,
> +   u8 (*swizzle)(struct pci_dev *, u8 *),
> +   int (*map_irq)(const struct pci_dev *, u8, u8))
> +{
> +   struct pci_dev *dev;
> +
> +   list_for_each_entry(dev, &bus->devices, bus_list) {
> +   pdev_fixup_irq(dev, swizzle, map_irq);
> +
> +   if (dev->subordinate)
> +   pci_bus_fixup_irqs(dev->subordinate, swizzle, 
> map_irq);
> +   }
> +}
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 5faa831..1b3c2eb 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -953,6 +953,9 @@ void pdev_enable_device(struct pci_dev *);
>  int pci_enable_resources(struct pci_dev *, int mask);
>  void pci_fixup_irqs(u8 (*)(struct pci_dev *, u8 *),
> int (*)(const struct pci_dev *, u8, u8));
> +void pci_bus_fixup_irqs(struct pci_bus *bus,
> +   u8 (*swizzle)(struct pci_dev *, u8 *),
> +   int (*map_irq)(const struct pci_dev *, u8, u8));
>  #define HAVE_PCI_REQ_REGIONS   2
>  int __must_check pci_request_regions(struct pci_dev *, const char *);
>  int __must_check pci_request_regions_exclusive(struct pci_dev *, const char 
> *);
> --
> 1.7.0.4
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] Extend interfaces to access PCIe capabilities registers

2013-01-30 Thread Bjorn Helgaas

On Fri, Jan 25, 2013 at 5:55 PM, Myron Stowe  wrote:
> This series is a minor extension to Jiang Liu's recent efforts - [PATCH v3
> 00/32] provide interfaces to access PCIe capabilities registers - which
> adds an additional PCI Express accessor for obtaining a device's
> Capabilities Register.
>
> Reference: https://lkml.org/lkml/2012/8/1/253
> ---
>
> Myron Stowe (2):
>   PCI: Use PCI Express Capability accessors
>   PCI: introduce accessor to retrieve PCIe Capabilities Register
>
>
>  drivers/pci/access.c|4 ++--
>  drivers/pci/pcie/portdrv_core.c |2 +-
>  include/linux/pci.h |   11 ++-
>  3 files changed, 13 insertions(+), 4 deletions(-)

I applied these to pci/misc for v3.9.  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] PCI: pciehp: hide ENTRY messages behind ctrl_dbg()

2013-01-31 Thread Bjorn Helgaas

[+cc Rafael]

On Mon, Jan 28, 2013 at 3:00 AM, Paul Bolle  wrote:
> In each suspend and resume cycle my laptop prints these messages at
> KERN_INFO level:
> pciehp :00:1c.1:pcie04: pciehp_suspend ENTRY
> pciehp :00:1c.0:pcie04: pciehp_suspend ENTRY
>
> and
> pciehp :00:1c.0:pcie04: pciehp_resume ENTRY
> pciehp :00:1c.1:pcie04: pciehp_resume ENTRY
>
> Messages like these are probably only useful while debugging pciehp, so
> let's hide them behind the ctrl_dbg() macro.
>
> Signed-off-by: Paul Bolle 
> ---
> 0) Compile tested only!
>
> 1) Perhaps a better solution is to drop these messages entirely.
>
>  drivers/pci/hotplug/pciehp_core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_core.c 
> b/drivers/pci/hotplug/pciehp_core.c
> index 939bd1d..ed1702d 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -293,7 +293,8 @@ static void pciehp_remove(struct pcie_device *dev)
>  #ifdef CONFIG_PM
>  static int pciehp_suspend (struct pcie_device *dev)
>  {
> -   dev_info(&dev->device, "%s ENTRY\n", __func__);
> +   struct controller *ctrl = get_service_data(dev);
> +   ctrl_dbg(ctrl, "%s ENTRY\n", __func__);
> return 0;

I'd propose to just drop these messages completely.  Rafael, would you
have any objection to that?

>  }
>
> @@ -303,8 +304,8 @@ static int pciehp_resume (struct pcie_device *dev)
> struct slot *slot;
> u8 status;
>
> -   dev_info(&dev->device, "%s ENTRY\n", __func__);
> ctrl = get_service_data(dev);
> +   ctrl_dbg(ctrl, "%s ENTRY\n", __func__);
>
> /* reinitialize the chipset's event detection logic */
> pcie_enable_notification(ctrl);
> --
> 1.7.11.7
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] PCI: Document PCI Hotplug resource reservation parameters

2013-01-31 Thread Bjorn Helgaas

On Wed, Jan 23, 2013 at 5:29 AM, Yijing Wang  wrote:
> Document pci hotplug resource reservation parameters hpiosize
> and hpmemsize into Documentation/kernel-parameters.txt.
> The two parameters can override default hotplug io size(256 bytes)
> and default mem size(2M).
>
> Signed-off-by: Yijing Wang 
> ---
>  Documentation/kernel-parameters.txt |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 1fb269b..46eff3d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2263,6 +2263,12 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
> the default.
> off: Turn ECRC off
> on: Turn ECRC on.
> +   hpiosize=nn[KMG]The fixed amount of bus space which is
> +   reserved for hotplug bridge's IO window.
> +   Default size is 256 bytes.
> +   hpmemsize=nn[KMG]   The fixed amount of bus space which is
> +   reserved for hotplug bridge's memory window.
> +   Default size is 2 megabytes.
> realloc=Enable/disable reallocating PCI bridge 
> resources
> if allocations done by BIOS are too small to
> accommodate resources required by all child
> --
> 1.7.1
>
>

Added to pci/misc for v3.9.  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4] PCI: Document PCIE BUS MPS parameters

2013-01-31 Thread Bjorn Helgaas

On Tue, Jan 29, 2013 at 6:40 PM, Yijing Wang  wrote:
> v0->v1: Update MPS parameters as non-arch and add MRRS
> description into pcie_bus_perf parameter suggested
> by Andrew Murray.
> v1->v2: Update some semantic problems and add MPS and MRRS
> explanation suggested by Joe Lawrence and Randy Dunlap.
> v2->v3: Update some semantic problems and the description
> of pcie_bus_safe and pcie_bus_peer2peer suggested
> by Bjorn Helgaas.
> v3->v4: Update pcie_bus_safe description suggested by Jon Mason
>
> Document PCIE BUS MPS parameters pcie_bus_tune_off, pcie_bus_safe,
> pcie_bus_peer2peer, pcie_bus_perf into Documentation/kernel-parameters.txt.
> These parameters were introduced by Jon Mason  at
> commit 5f39e6705 and commit b03e7495a8.
>
> Signed-off-by: Yijing Wang 
> ---
>  Documentation/kernel-parameters.txt |   14 ++
>  1 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 363e348..6ad9b95 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2227,6 +2227,20 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
> This sorting is done to get a device
> order compatible with older (<= 2.4) kernels.
> nobfsortDon't sort PCI devices into breadth-first 
> order.
> +   pcie_bus_tune_off   Disable PCIe MPS (Max Payload Size)
> +   tuning and use the BIOS-configured MPS 
> defaults.
> +   pcie_bus_safe   Set every device's MPS to the largest MPSS
> +   (Max Payload Size Support) common to all 
> devices
> +   below the root complex.
> +   pcie_bus_perf   Configure device MPS to the largest
> +   allowable MPS based on its parent bus. Also 
> set
> +   MRRS (Max Read Request Size) to the largest 
> supported
> +   value (no larger than the MPS that the device 
> or bus
> +   can support) for best performance.
> +   pcie_bus_peer2peer  Set every device's MPS to 128B, which
> +   every device is guaranteed to support. This
> +   configuration allows peer-to-peer DMA between 
> any pair
> +   of devices possibly at the cost of reduced 
> performance.
> cbiosize=nn[KMG]The fixed amount of bus space which is
> reserved for the CardBus bridge's IO window.
> The default value is 256 bytes.

Added to pci/misc for v3.9 with the changes below.  If we need to
change it more, let me know and I'll amend that branch.

I tweaked the pcie_bus_safe text slightly.

The hot-plug case was mentioned in your early patches, but not in the
v4 patch.  I think that's an important use case and we should provide
some guidance if we can.  I know you have some patches in the works
related to this, but as far as I can tell, the only want to ensure
that hot-added devices will work correctly is to use
pcie_bus_peer2peer so every MPS is set to 128.

The reason I think that is because in the pciehp_configure_device()
path, where we configure MPS for newly-added devices, there's nothing
that will change the MPS of any upstream bridges.  So we could have a
root port with MPS=256 and we never change that, even if the new
device only supports MPS=128.

Hopefully we can make this smarter in the future, but the
documentation should reflect the behavior of the current code.

So let me know if any of the below is inaccurate.

+   pcie_bus_safe   Set every device's MPS to the largest value
+   supported by all devices below the root complex.

+   pcie_bus_peer2peer  Set every device's MPS to 128B, which
+   every device is guaranteed to support. This
+   configuration allows peer-to-peer DMA between
+   any pair of devices, possibly at the cost of
+   reduced performance.  This also guarantees
+   that hot-added devices will work.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices

2013-02-01 Thread Bjorn Helgaas

On Fri, Feb 1, 2013 at 12:55 PM, Joe Lawrence  wrote:
> On Thu, 31 Jan 2013, Myron Stowe wrote:

>> PCI: ASPM exit link state code is skipping devices
>>
>> From: Myron Stowe 
>>
>> On PCI bus hotplug removal, 'pcie_aspm_exit_link_state' can potentially
>> skip parent devices that have link_state allocated.  Instead of exiting
>> early if a given device in not PCIe, check whether or not the device's
>> parent has link_state allocated.  This enables 'pcie_aspm_exit_link_state'
>> to properly clean up parent link_state when the last function in a slot
>> might not be PCIe.
>>
>> Reported-by: Joe Lawrence 
>> Signed-off-by: Myron Stowe 
>> ---
>>
>>  drivers/pci/pcie/aspm.c |5 +
>>  1 files changed, 1 insertions(+), 4 deletions(-)
>>
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 8474b6a..aa52727 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -634,10 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>   struct pci_dev *parent = pdev->bus->self;
>>   struct pcie_link_state *link, *root, *parent_link;
>>
>> - if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
>> - return;
>> - if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>> - (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
>> + if (!parent || !parent->link_state)
>>   return;
>>
>>   down_read(&pci_bus_sem);
>>
>>
>>
>
> Hi Myron,
>
> The patch tests eliminates the reported crash when hotplugging our
> problem PCI bus.
>
> The additional checks removed from pcie_aspm_exit_link_state (ie, the
> simplification) do appear to be unnecessary as pcie_aspm_init_link_state
> is already preventing such devices from allocating link state.

I added a Tested-by for you, Joe, and applied this to my pci/joe-aspm
branch for v3.9.  Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v5 0/8] introduce PCI bus notifier chain to get rid of the ACPI PCI subdriver interfaces

2013-02-01 Thread Bjorn Helgaas

On Fri, Feb 1, 2013 at 9:13 AM, Jiang Liu  wrote:
> On 01/29/2013 10:04 AM, Jiang Liu wrote:
>> On 2013-1-29 8:34, Rafael J. Wysocki wrote:
>>> On Monday, January 28, 2013 01:56:33 PM Bjorn Helgaas wrote:
>>>> On Fri, Jan 18, 2013 at 9:07 AM, Jiang Liu  wrote:
>>>>> This is an RFC patchset to address review comments in thread at:
>>>>> https://patchwork.kernel.org/patch/1946851/. The patch just pasts
>>>>> compilation. If no objection to the new implementation, I will
>>>>> go on to modify acpiphp driver and conduct tests.
>>>>>
>>>>> The main changes from V4 to V5 includes:
>>>>> 1) introduce a dedicated notifier chain for PCI buses
>>>>> 2) change pci_slot as built-in driver
>>>>> 3) unify the way to create/destroy PCI slots
>>>>> 4) introduce a kernel option to disable PCIe native hotplug
>>>>>
>>>>> TODO:
>>>>> 1) change acpiphp as built-in and unify the way to create/destroy ACPI
>>>>>based hotplug slots.
>>>>> 2) change other ACPI PCI subdriver in Yinghai's root bridge hotplug series
>>>>>to use the PCI bus notifier chain.
>>>>> 3) Remove the ACPI PCI subdriver interface eventaully.
>>>>>
>>>>> Jiang Liu (8):
>>>>>   PCI: make PCI device create/destroy logic symmetric
>>>>>   PCI: split registration of PCI bus devices into two stages
>>>>>   PCI: add a blocking notifier chain for PCI bus addition/removal
>>>>>   ACPI, PCI: avoid building pci_slot as module
>>>>>   PCI, ACPI: hook PCI bus notifications to create/destroy PCI slots
>>>>>   pci_slot: replace printk(KERN_xxx) with pr_xxx()
>>>>>   PCI/PCIe: add "pci=nopciehp" to disable PCIe native hotplug
>>>>>   PCI/PCIe: only claim PME from firmware when CONFIG_PCIE_PME is
>>>>> enabled
>>>>>
>>>>>  Documentation/kernel-parameters.txt |2 +
>>>>>  drivers/acpi/Kconfig|5 +-
>>>>>  drivers/acpi/internal.h |5 +
>>>>>  drivers/acpi/pci_root.c |8 +-
>>>>>  drivers/acpi/pci_slot.c |  217 
>>>>> ++-
>>>>>  drivers/acpi/scan.c |1 +
>>>>>  drivers/pci/bus.c   |   26 -
>>>>>  drivers/pci/pci.c   |2 +
>>>>>  drivers/pci/pci.h   |1 +
>>>>>  drivers/pci/pcie/portdrv_core.c |7 +-
>>>>>  drivers/pci/pcie/portdrv_pci.c  |3 +
>>>>>  drivers/pci/probe.c |7 +-
>>>>>  drivers/pci/remove.c|   15 +--
>>>>>  include/linux/pci.h |   21 
>>>>>  14 files changed, 142 insertions(+), 178 deletions(-)
>>>>
>>>> I think the problem we're trying to solve is that we don't initialize
>>>> hot-added devices, correctly, e.g., we don't set up AER, we don't
>>>> update acpi/pci_slot stuff, we probably don't set up PME etc.  We also
>>>> have similar issues like IOMMU init on powerpc.
>>>>
>>>> Notifier chains seem like an unnecessarily complicated way to deal
>>>> with this.  They're great for communicating between modules that stay
>>>> at arm's length from each other.  But that's not the case here --
>>>> everything is PCI and is quite closely coupled.  I think AER, PME,
>>>> slot, etc., should  be initialized directly in pci_device_add() or
>>>> somewhere nearby.
>>>
>>> I agree.
>>>
>>>> This might sound a bit radical because it implies some fairly
>>>> far-reaching changes.  It means this code can't be a module (the only
>>>> one that can be built as a module today is pciehp, and I think
>>>> everybody agrees that we should make it static as soon as we can
>>>> figure out the acpiphp/pciehp issue).  I think it also means the
>>>> pcieportdrv concept is of dubious value, since all the services should
>>>> be known at build-time and we probably don't need a registration
>>>> interface for them.
>>>
>>> It is of dubious value regardless.  It just adds complexity for no gain.
>>> Moreover, these things are in fact not mutually independent.
>>>
>>> I've had a lo

Re: [PATCH v2] PCI: pciehp: drop ENTRY messages

2013-02-01 Thread Bjorn Helgaas

On Thu, Jan 31, 2013 at 11:35 AM, Paul Bolle  wrote:
> In each suspend and resume cycle my laptop prints these messages at
> KERN_INFO level:
> pciehp :00:1c.1:pcie04: pciehp_suspend ENTRY
> pciehp :00:1c.0:pcie04: pciehp_suspend ENTRY
>
> and
> pciehp :00:1c.0:pcie04: pciehp_resume ENTRY
> pciehp :00:1c.1:pcie04: pciehp_resume ENTRY
>
> Drop these messages, that were probably used to debug the suspend and
> resume code, but now serve no purpose.
>
> Signed-off-by: Paul Bolle 

I applied this to pci/misc for v3.9.  Thanks!

Bjorn

> ---
> 0) v1 was called "PCI: pciehp: hide ENTRY messages behind ctrl_dbg()".
> But Bjorn and Rafael prefer to drop these messages completely.
>
> 1) Not even compile tested!
>
>  drivers/pci/hotplug/pciehp_core.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_core.c 
> b/drivers/pci/hotplug/pciehp_core.c
> index 939bd1d..7d72c5e 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -293,7 +293,6 @@ static void pciehp_remove(struct pcie_device *dev)
>  #ifdef CONFIG_PM
>  static int pciehp_suspend (struct pcie_device *dev)
>  {
> -   dev_info(&dev->device, "%s ENTRY\n", __func__);
> return 0;
>  }
>
> @@ -303,7 +302,6 @@ static int pciehp_resume (struct pcie_device *dev)
> struct slot *slot;
> u8 status;
>
> -   dev_info(&dev->device, "%s ENTRY\n", __func__);
> ctrl = get_service_data(dev);
>
> /* reinitialize the chipset's event detection logic */
> --
> 1.7.11.7
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] acpiphp: create companion ACPI devices before creating PCI devices

2013-02-01 Thread Bjorn Helgaas

On Wed, Jan 30, 2013 at 9:10 AM, Jiang Liu  wrote:
> From: Jiang Liu 
>
> With commit 4f535093cf8f6da8c "PCI: Put pci_dev in device tree as
> early as possible", companion ACPI devices should be created before
> creating correspoding PCI devices, otherwise it will break the ACPI
> PCI binding logic.
>
> Signed-off-by: Jiang Liu 
> ---
> Hi Bjorn,
> This patch set applies to your pci/yinghai-root-bus branch.
> There are still other potential bugs in the acpiphp driver under
> investigation. So I will send out these two first to catch up with
> the 3.9 merging window.
> Thanks!
> Gerry

I assume this fixes some sort of user-visible issue with acpiphp.
What failure does the user see?

> ---
>  drivers/pci/hotplug/acpiphp_glue.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
> b/drivers/pci/hotplug/acpiphp_glue.c
> index bd784ff..acb7af2 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -839,6 +839,9 @@ static int __ref enable_device(struct acpiphp_slot *slot)
> if (slot->flags & SLOT_ENABLED)
> goto err_exit;
>
> +   list_for_each_entry(func, &slot->funcs, sibling)
> +   acpiphp_bus_add(func);
> +
> num = pci_scan_slot(bus, PCI_DEVFN(slot->device, 0));
> if (num == 0) {
> /* Maybe only part of funcs are added. */
> @@ -862,9 +865,6 @@ static int __ref enable_device(struct acpiphp_slot *slot)
> }
> }
>
> -   list_for_each_entry(func, &slot->funcs, sibling)
> -   acpiphp_bus_add(func);
> -
> pci_bus_assign_resources(bus);
> acpiphp_sanitize_bus(bus);
> acpiphp_set_hpp_values(bus);
> --
> 1.7.9.5
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] acpiphp: create companion ACPI devices before creating PCI devices

2013-02-01 Thread Bjorn Helgaas

On Fri, Feb 1, 2013 at 4:06 PM, Bjorn Helgaas  wrote:
> On Wed, Jan 30, 2013 at 9:10 AM, Jiang Liu  wrote:
>> From: Jiang Liu 
>>
>> With commit 4f535093cf8f6da8c "PCI: Put pci_dev in device tree as
>> early as possible", companion ACPI devices should be created before
>> creating correspoding PCI devices, otherwise it will break the ACPI
>> PCI binding logic.
>>
>> Signed-off-by: Jiang Liu 

Applied to pci/yinghai-root-bus for v3.9.  Thanks!

>> ---
>> Hi Bjorn,
>> This patch set applies to your pci/yinghai-root-bus branch.
>> There are still other potential bugs in the acpiphp driver under
>> investigation. So I will send out these two first to catch up with
>> the 3.9 merging window.
>> Thanks!
>> Gerry
>
> I assume this fixes some sort of user-visible issue with acpiphp.
> What failure does the user see?
>
>> ---
>>  drivers/pci/hotplug/acpiphp_glue.c |6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
>> b/drivers/pci/hotplug/acpiphp_glue.c
>> index bd784ff..acb7af2 100644
>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>> @@ -839,6 +839,9 @@ static int __ref enable_device(struct acpiphp_slot *slot)
>> if (slot->flags & SLOT_ENABLED)
>> goto err_exit;
>>
>> +   list_for_each_entry(func, &slot->funcs, sibling)
>> +   acpiphp_bus_add(func);
>> +
>> num = pci_scan_slot(bus, PCI_DEVFN(slot->device, 0));
>> if (num == 0) {
>> /* Maybe only part of funcs are added. */
>> @@ -862,9 +865,6 @@ static int __ref enable_device(struct acpiphp_slot *slot)
>> }
>> }
>>
>> -   list_for_each_entry(func, &slot->funcs, sibling)
>> -   acpiphp_bus_add(func);
>> -
>> pci_bus_assign_resources(bus);
>> acpiphp_sanitize_bus(bus);
>> acpiphp_set_hpp_values(bus);
>> --
>> 1.7.9.5
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] acpiphp: remove dead code for PCI host bridge hotplug

2013-02-01 Thread Bjorn Helgaas

On Wed, Jan 30, 2013 at 9:10 AM, Jiang Liu  wrote:
> From: Jiang Liu 
>
> Commit 668192b678201d2fff27c "PCI: acpiphp: Move host bridge hotplug
> to pci_root.c" has moved PCI host bridge hotplug logic from acpiphp
> to pci_root, but there is still PCI host bridge hotplug related
> dead code left in acpiphp. So remove those dead code.
>
> Now companion ACPI devices are always created before corresponding
> PCI devices. And the ACPI event handle_hotplug_event_bridge() will be
> installed only if it has associated PCI device. So remove dead code to
> handle bridge hot-adding in function handle_hotplug_event_bridge().
>
> Signed-off-by: Jiang Liu 

Applied to pci/yinghai-root-bus-hotplug for v3.9.  Thanks!

> ---
>  drivers/pci/hotplug/acpiphp.h  |   13 +---
>  drivers/pci/hotplug/acpiphp_glue.c |  124 
> 
>  2 files changed, 14 insertions(+), 123 deletions(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp.h b/drivers/pci/hotplug/acpiphp.h
> index b3ead7a..b70ac00 100644
> --- a/drivers/pci/hotplug/acpiphp.h
> +++ b/drivers/pci/hotplug/acpiphp.h
> @@ -79,7 +79,6 @@ struct acpiphp_bridge {
> /* Ejectable PCI-to-PCI bridge (PCI bridge and PCI function) */
> struct acpiphp_func *func;
>
> -   int type;
> int nr_slots;
>
> u32 flags;
> @@ -146,10 +145,6 @@ struct acpiphp_attention_info
>  /* PCI bus bridge HID */
>  #define ACPI_PCI_HOST_HID  "PNP0A03"
>
> -/* PCI BRIDGE type */
> -#define BRIDGE_TYPE_HOST   0
> -#define BRIDGE_TYPE_P2P1
> -
>  /* ACPI _STA method value (ignore bit 4; battery present) */
>  #define ACPI_STA_PRESENT   (0x0001)
>  #define ACPI_STA_ENABLED   (0x0002)
> @@ -158,13 +153,7 @@ struct acpiphp_attention_info
>  #define ACPI_STA_ALL   (0x000f)
>
>  /* bridge flags */
> -#define BRIDGE_HAS_STA (0x0001)
> -#define BRIDGE_HAS_EJ0 (0x0002)
> -#define BRIDGE_HAS_HPP (0x0004)
> -#define BRIDGE_HAS_PS0 (0x0010)
> -#define BRIDGE_HAS_PS1 (0x0020)
> -#define BRIDGE_HAS_PS2 (0x0040)
> -#define BRIDGE_HAS_PS3 (0x0080)
> +#define BRIDGE_HAS_EJ0 (0x0001)
>
>  /* slot flags */
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
> b/drivers/pci/hotplug/acpiphp_glue.c
> index acb7af2..4681d2c 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -325,8 +325,8 @@ static void init_bridge_misc(struct acpiphp_bridge 
> *bridge)
> return;
> }
>
> -   /* install notify handler */
> -   if (bridge->type != BRIDGE_TYPE_HOST) {
> +   /* install notify handler for P2P bridges */
> +   if (!pci_is_root_bus(bridge->pci_bus)) {
> if ((bridge->flags & BRIDGE_HAS_EJ0) && bridge->func) {
> status = 
> acpi_remove_notify_handler(bridge->func->handle,
> ACPI_SYSTEM_NOTIFY,
> @@ -369,27 +369,12 @@ static struct acpiphp_func 
> *acpiphp_bridge_handle_to_function(acpi_handle handle
>  static inline void config_p2p_bridge_flags(struct acpiphp_bridge *bridge)
>  {
> acpi_handle dummy_handle;
> +   struct acpiphp_func *func;
>
> if (ACPI_SUCCESS(acpi_get_handle(bridge->handle,
> -   "_STA", &dummy_handle)))
> -   bridge->flags |= BRIDGE_HAS_STA;
> -
> -   if (ACPI_SUCCESS(acpi_get_handle(bridge->handle,
> -   "_EJ0", &dummy_handle)))
> +   "_EJ0", &dummy_handle))) {
> bridge->flags |= BRIDGE_HAS_EJ0;
>
> -   if (ACPI_SUCCESS(acpi_get_handle(bridge->handle,
> -   "_PS0", &dummy_handle)))
> -   bridge->flags |= BRIDGE_HAS_PS0;
> -
> -   if (ACPI_SUCCESS(acpi_get_handle(bridge->handle,
> -   "_PS3", &dummy_handle)))
> -   bridge->flags |= BRIDGE_HAS_PS3;
> -
> -   /* is this ejectable p2p bridge? */
> -   if (bridge->flags & BRIDGE_HAS_EJ0) {
> -   struct acpiphp_func *func;
> -
> dbg("found ejectable p2p bridge\n");
>
> /* make link between PCI bridge and PCI function */
> @@ -412,7 +397,6 @@ static void add_host_bridge(struct acpi_pci_root *root)
> if (bridge == NULL)
> return;
>
> -   bridge->type = BRIDGE_TYPE_HOST;
> bridge->handle = handle;
>
> bridge->pci_bus = root->bus;
> @@ -432,7 +416,6 @@ static void add_p2p_bridge(acpi_handle *handle)
> return;
> }
>
> -   bridge->type = BRIDGE_TYPE_P2P;
> bridge->handle = handle;
> config_p2p_bridge_flags(bridge);
>
> @@ -543,7 +526,7 @@ static void cleanup_bridge(struct acpiphp_bridge *bridge)
> acpi_status status;
> acpi_handle handle = b

Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase

2013-01-24 Thread Bjorn Helgaas

On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim  wrote:
> During early boot phase, PCI bus subsystem is not yet initialized.
> If panic is occured in early boot phase and panic_timeout is set,
> code flow go into emergency_restart() and hit mach_reboot_fixups(), then
> encounter another panic. When second panic, we can't hold a panic_lock, so
> code flow go into panic_smp_self_stop() which prevent system to restart.
>
> For avoid second panic, skip reboot_fixups in early boot phase.
> It makes panic_timeout works in early boot phase.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Signed-off-by: Joonsoo Kim 
>
> diff --git a/arch/x86/kernel/reboot_fixups_32.c 
> b/arch/x86/kernel/reboot_fixups_32.c
> index c8e41e9..b9b8ec9 100644
> --- a/arch/x86/kernel/reboot_fixups_32.c
> +++ b/arch/x86/kernel/reboot_fixups_32.c
> @@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
> if (in_interrupt())
> return;
>
> +   /* During early boot phase, PCI is not yet initialized */
> +   if (system_state == SYSTEM_BOOTING)
> +   return;
> +
> for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
> cur = &(fixups_table[i]);
> dev = pci_get_device(cur->vendor, cur->device, NULL);

I guess you're saying that if we call pci_get_device() too early, it
panics?  Did you figure out why that happens?

If we call pci_get_device() before PCI has been initialized, it would
be good if it just returned NULL, indicating that we didn't find any
matching device.  I looked briefly, and I thought that's what would
happen, but apparently I'm missing something.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci-sysfs: replace mutex_lock with mutex_trylock to avoid potential deadlock situation

2013-01-24 Thread Bjorn Helgaas

On Thu, Dec 27, 2012 at 12:42 AM, Lin Feng  wrote:
> There is a potential deadlock situation when we manipulate the pci-sysfs user
> interfaces from different bus hierarchy simultaneously, described as 
> following:
>
> path1: sysfs remove device: | path2: sysfs rescan device:
> sysfs_schedule_callback_work()  | sysfs_write_file()
>   remove_callback() |   flush_write_buffer()
> *1* mutex_lock(&pci_remove_rescan_mutex)|*2*  sysfs_get_active(attr_sd)
>   ...   | dev_attr_store()
> device_remove_file()|   dev_rescan_store()
>   ...   |*4*  
> mutex_lock(&pci_remove_rescan_mutex)
> *3*   sysfs_deactivate(sd)  | ...
> wait_for_completion()   |*5*  sysfs_put_active(attr_sd)
> *6* mutex_unlock(&pci_remove_rescan_mutex)
>
> If path1 firstly holds the pci_remove_rescan_mutex at *1*, then another path
> called path2 actived and runs to *2* before path1 runs to *3*, we now runs
> to a deadlock situation:
> Path1 holds the mutex waiting path2 to decrease sysfs_dirent's s_active
> counter at *5*, but path2 is blocked at *4* when trying to get the
> pci_remove_rescan_mutex. The mutex won't be put by path1 until it reach
> *6*, but it's now blocked at *3*.
>
> The circumvented approach is to avoid manipulating(remove/scan) the pci-tree 
> at
> the same time. If we find someone else is manipulating the pci-tree we simply
> abort current operation without touching the pci-tree concurrently.
>
> *dmesg ifno*:
> (snip)
> 1000e :1c:00.0: eth9: Intel(R) PRO/1000 Network Connection
> sd 13:2:0:0: [sdb] Attached SCSI disk
> e1000e :1c:00.0: eth9: MAC: 0, PHY: 4, PBA No: D50228-005
> e1000e :1c:00.1: Disabling ASPM  L1
> e1000e :1c:00.1: Interrupt Throttling Rate (ints/sec) set to dynamic 
> conservative mode
> e1000e :1c:00.1: irq 143 for MSI/MSI-X
> e1000e :1c:00.1: eth10: (PCI Express:2.5GT/s:Width x4) 00:15:17:cd:96:bf
> e1000e :1c:00.1: eth10: Intel(R) PRO/1000 Network Connection
> e1000e :1c:00.1: eth10: MAC: 0, PHY: 4, PBA No: D50228-005
> INFO: task bash:62982 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> bashD  0 62982  62978 0x0080
>  88038b277db8 0082 88038b277fd8 00013940
>  88038b276010 00013940 00013940 00013940
>  88038b277fd8 00013940 880377449e30 8806e822c670
> Call Trace:
>  [] schedule+0x29/0x70
>  [] schedule_preempt_disabled+0xe/0x10
>  [] __mutex_lock_slowpath+0xd3/0x150
>  [] mutex_lock+0x2b/0x50
>  [] dev_rescan_store+0x5c/0x80
>  [] dev_attr_store+0x20/0x30
>  [] sysfs_write_file+0xef/0x170
>  [] vfs_write+0xc8/0x190
>  [] sys_write+0x51/0x90
>  [] system_call_fastpath+0x16/0x1b
> INFO: task bash:64141 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> bashD 81610460 0 64141  64136 0x0080
>  8803540e9db8 0086 8803540e9fd8 00013940
>  8803540e8010 00013940 00013940 00013940
>  8803540e9fd8 00013940 8807db338a10 8806f09abc60
> Call Trace:
>  [] schedule+0x29/0x70
>  [] schedule_preempt_disabled+0xe/0x10
>  [] __mutex_lock_slowpath+0xd3/0x150
>  [] mutex_lock+0x2b/0x50
>  [] dev_rescan_store+0x5c/0x80
>  [] dev_attr_store+0x20/0x30
>  [] sysfs_write_file+0xef/0x170
>  [] vfs_write+0xc8/0x190
>  [] sys_write+0x51/0x90
>  [] system_call_fastpath+0x16/0x1b
> INFO: task kworker/u:3:64451 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u:3 D 81610460 0 64451  2 0x0080
>  8807d51b7a30 0046 8807d51b7fd8 00013940
>  8807d51b6010 00013940 00013940 00013940
>  8807d51b7fd8 00013940 8807db339420 88037744b250
> Call Trace:
>  [] schedule+0x29/0x70
>  [] schedule_timeout+0x19d/0x220
>  [] ? __slab_free+0x1f2/0x2f0
>  [] wait_for_common+0x11e/0x190
>  [] ? try_to_wake_up+0x2c0/0x2c0
>  [] wait_for_completion+0x1d/0x20
>  [] sysfs_addrm_finish+0xb8/0xd0
>  [] ? sysfs_schedule_callback+0x1e0/0x1e0
>  [] sysfs_hash_and_remove+0x60/0xb0
>  [] sysfs_remove_file+0x39/0x50
>  [] device_remove_file+0x17/0x20
>  [] bus_remove_device+0xdc/0x180
>  [] device_del+0x120/0x1d0
>  [] device_unregister+0x22/0x60
>  [] pci_stop_bus_device+0x94/0xa0
>  [] pci_stop_bus_device+0x40/0xa0
>  [] pci_stop_bus_device+0x40/0xa0
>  [] pci_stop_bus_device+0x40/0xa0
>  [] pci_stop_and_remove_bus_device+0x16/0x30
>  [] remove_callback+0x29/0x40
>  [] sysfs_schedule_callback_work+0x24/0x70
>  [] process_one_work+0x179/0x4b0
>  [] worker_thread+0x12e/0x330
>  [] ? manage_workers+0x110/0x110
>  [] kthread+0x9e/0xb0
>  [] kernel_thread_helper+0x4/0x1

Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase

2013-01-24 Thread Bjorn Helgaas

[+cc Greg for driver core]

On Fri, Jan 25, 2013 at 10:13:03AM +0900, Joonsoo Kim wrote:
> Hello, Bjorn.
> 
> On Thu, Jan 24, 2013 at 10:45:13AM -0700, Bjorn Helgaas wrote:
> > On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim  wrote:
> > > During early boot phase, PCI bus subsystem is not yet initialized.
> > > If panic is occured in early boot phase and panic_timeout is set,
> > > code flow go into emergency_restart() and hit mach_reboot_fixups(), then
> > > encounter another panic. When second panic, we can't hold a panic_lock, so
> > > code flow go into panic_smp_self_stop() which prevent system to restart.
> > >
> > > For avoid second panic, skip reboot_fixups in early boot phase.
> > > It makes panic_timeout works in early boot phase.
> > >
> > > Cc: Thomas Gleixner 
> > > Cc: Ingo Molnar 
> > > Cc: "H. Peter Anvin" 
> > > Signed-off-by: Joonsoo Kim 
> > >
> > > diff --git a/arch/x86/kernel/reboot_fixups_32.c 
> > > b/arch/x86/kernel/reboot_fixups_32.c
> > > index c8e41e9..b9b8ec9 100644
> > > --- a/arch/x86/kernel/reboot_fixups_32.c
> > > +++ b/arch/x86/kernel/reboot_fixups_32.c
> > > @@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
> > > if (in_interrupt())
> > > return;
> > >
> > > +   /* During early boot phase, PCI is not yet initialized */
> > > +   if (system_state == SYSTEM_BOOTING)
> > > +   return;
> > > +
> > > for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
> > > cur = &(fixups_table[i]);
> > > dev = pci_get_device(cur->vendor, cur->device, NULL);
> > 
> > I guess you're saying that if we call pci_get_device() too early, it
> > panics?  Did you figure out why that happens?
> > 
> > If we call pci_get_device() before PCI has been initialized, it would
> > be good if it just returned NULL, indicating that we didn't find any
> > matching device.  I looked briefly, and I thought that's what would
> > happen, but apparently I'm missing something.
> 
> In bus_find_device(), klist_iter_init_node() is called with 
> @bus->p->klist_device. Before initialization, bus->p is NULL,
> so panic is occured.

I see.  pci_bus_type.p is initialized by __bus_register() in this path:

  pci_driver_init# postcore_initcall
bus_register(&pci_bus_type)
  __bus_register
priv = kzalloc(sizeof(struct subsys_private))
bus->p = priv
klist_init(&priv->klist_devices, klist_devices_get, klist_devices_put)

I was hoping we could statically initialize the klist, but that doesn't
seem likely.

But I wonder if we could do something like the following.  If we could,
then callers wouldn't have to worry about whether or not the bus has been
initialized.

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 24eb078..ede19b8 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -290,7 +290,7 @@ int bus_for_each_dev(struct bus_type *bus, struct device 
*start,
struct device *dev;
int error = 0;
 
-   if (!bus)
+   if (!bus || !bus->p)
return -EINVAL;
 
klist_iter_init_node(&bus->p->klist_devices, &i,
@@ -324,7 +324,7 @@ struct device *bus_find_device(struct bus_type *bus,
struct klist_iter i;
struct device *dev;
 
-   if (!bus)
+   if (!bus || !bus->p)
return NULL;
 
klist_iter_init_node(&bus->p->klist_devices, &i,
@@ -440,7 +440,7 @@ int bus_for_each_drv(struct bus_type *bus, struct 
device_driver *start,
struct device_driver *drv;
int error = 0;
 
-   if (!bus)
+   if (!bus || !bus->p)
return -EINVAL;
 
klist_iter_init_node(&bus->p->klist_drivers, &i,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86, reboot: skip reboot_fixups in early boot phase

2013-01-24 Thread Bjorn Helgaas

On Thu, Jan 24, 2013 at 9:14 PM, Greg Kroah-Hartman
 wrote:
> On Thu, Jan 24, 2013 at 07:59:01PM -0700, Bjorn Helgaas wrote:
>> [+cc Greg for driver core]
>>
>> On Fri, Jan 25, 2013 at 10:13:03AM +0900, Joonsoo Kim wrote:
>> > Hello, Bjorn.
>> >
>> > On Thu, Jan 24, 2013 at 10:45:13AM -0700, Bjorn Helgaas wrote:
>> > > On Fri, Dec 28, 2012 at 6:50 AM, Joonsoo Kim  wrote:
>> > > > During early boot phase, PCI bus subsystem is not yet initialized.
>> > > > If panic is occured in early boot phase and panic_timeout is set,
>> > > > code flow go into emergency_restart() and hit mach_reboot_fixups(), 
>> > > > then
>> > > > encounter another panic. When second panic, we can't hold a 
>> > > > panic_lock, so
>> > > > code flow go into panic_smp_self_stop() which prevent system to 
>> > > > restart.
>> > > >
>> > > > For avoid second panic, skip reboot_fixups in early boot phase.
>> > > > It makes panic_timeout works in early boot phase.
>> > > >
>> > > > Cc: Thomas Gleixner 
>> > > > Cc: Ingo Molnar 
>> > > > Cc: "H. Peter Anvin" 
>> > > > Signed-off-by: Joonsoo Kim 
>> > > >
>> > > > diff --git a/arch/x86/kernel/reboot_fixups_32.c 
>> > > > b/arch/x86/kernel/reboot_fixups_32.c
>> > > > index c8e41e9..b9b8ec9 100644
>> > > > --- a/arch/x86/kernel/reboot_fixups_32.c
>> > > > +++ b/arch/x86/kernel/reboot_fixups_32.c
>> > > > @@ -89,6 +89,10 @@ void mach_reboot_fixups(void)
>> > > > if (in_interrupt())
>> > > > return;
>> > > >
>> > > > +   /* During early boot phase, PCI is not yet initialized */
>> > > > +   if (system_state == SYSTEM_BOOTING)
>> > > > +   return;
>> > > > +
>> > > > for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
>> > > > cur = &(fixups_table[i]);
>> > > > dev = pci_get_device(cur->vendor, cur->device, NULL);
>> > >
>> > > I guess you're saying that if we call pci_get_device() too early, it
>> > > panics?  Did you figure out why that happens?
>> > >
>> > > If we call pci_get_device() before PCI has been initialized, it would
>> > > be good if it just returned NULL, indicating that we didn't find any
>> > > matching device.  I looked briefly, and I thought that's what would
>> > > happen, but apparently I'm missing something.
>> >
>> > In bus_find_device(), klist_iter_init_node() is called with
>> > @bus->p->klist_device. Before initialization, bus->p is NULL,
>> > so panic is occured.
>>
>> I see.  pci_bus_type.p is initialized by __bus_register() in this path:
>>
>>   pci_driver_init# postcore_initcall
>> bus_register(&pci_bus_type)
>>   __bus_register
>> priv = kzalloc(sizeof(struct subsys_private))
>>   bus->p = priv
>>   klist_init(&priv->klist_devices, klist_devices_get, klist_devices_put)
>>
>> I was hoping we could statically initialize the klist, but that doesn't
>> seem likely.
>>
>> But I wonder if we could do something like the following.  If we could,
>> then callers wouldn't have to worry about whether or not the bus has been
>> initialized.
>
> 
>
> I have no objection to that patch, but really, someone wants to call
> pci_find_device() before PCI is initialized?  Can't that be fixed
> instead, as that is the root problem, not the driver core.
>
> But, to paper over your subsystem's bugs, I guess I can take it :)

The caller is in the native_machine_emergency_restart() path.
Joonsoo's original patch does what I think you're suggesting:

>> > > > +   /* During early boot phase, PCI is not yet initialized */
>> > > > +   if (system_state == SYSTEM_BOOTING)
>> > > > +   return;
>> > > > +
>> > > > for (i=0; i < ARRAY_SIZE(fixups_table); i++) {
>> > > > cur = &(fixups_table[i]);
>> > > > dev = pci_get_device(cur->vendor, cur->device, NULL);

I think it's sort of ugly to check system_state before using
pci_get_device(), and there's not really an obvious connection between
system_state and PCI initialization.

But if you prefer that, Joonsoo's original patch is fine with me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci: remove redundant function calls in pci_reassigndev_resource_alignment()

2013-01-24 Thread Bjorn Helgaas

On Fri, Dec 28, 2012 at 1:55 AM, Lin Feng  wrote:
>
>
> On 12/28/2012 03:45 PM, Yinghai Lu wrote:
>> On Thu, Dec 27, 2012 at 11:31 PM, Lin Feng  wrote:
>>> pci_reassigndev_resource_alignment() potentially calls
>>> pci_specified_resource_alignment() twice, which is redundant.
>>>
>>> pci_is_reassigndev() is only called in pci_reassigndev_resource_alignment(),
>>> and from sematic/functionality aspects pci_specified_resource_alignment() is
>>> sufficient to substitute, so also make some cleanup.
>>>
>>> Signed-off-by: Lin Feng 
>>> Cc: Yinghai Lu 
>>> ---
>>>  drivers/pci/pci.c |   16 ++--
>>>  1 files changed, 2 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 5cb5820..789f401 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -3766,18 +3766,6 @@ resource_size_t 
>>> pci_specified_resource_alignment(struct pci_dev *dev)
>>> return align;
>>>  }
>>>
>>> -/**
>>> - * pci_is_reassigndev - check if specified PCI is target device to reassign
>>> - * @dev: the PCI device to check
>>> - *
>>> - * RETURNS: non-zero for PCI device is a target device to reassign,
>>> - *  or zero is not.
>>> - */
>>> -int pci_is_reassigndev(struct pci_dev *dev)
>>> -{
>>> -   return (pci_specified_resource_alignment(dev) != 0);
>>> -}
>>> -
>>>  /*
>>>   * This function disables memory decoding and releases memory resources
>>>   * of the device specified by kernel's boot parameter 
>>> 'pci=resource_alignment='.
>>> @@ -3792,7 +3780,8 @@ void pci_reassigndev_resource_alignment(struct 
>>> pci_dev *dev)
>>> resource_size_t align, size;
>>> u16 command;
>>>
>>> -   if (!pci_is_reassigndev(dev))
>>> +   align = pci_specified_resource_alignment(dev);
>>> +   if (!align)
>>> return;
>>>
>>> if (dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
>>> @@ -3808,7 +3797,6 @@ void pci_reassigndev_resource_alignment(struct 
>>> pci_dev *dev)
>>> command &= ~PCI_COMMAND_MEMORY;
>>> pci_write_config_word(dev, PCI_COMMAND, command);
>>>
>>> -   align = pci_specified_resource_alignment(dev);
>>> for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
>>> r = &dev->resource[i];
>>> if (!(r->flags & IORESOURCE_MEM))
>>> --
>>> 1.7.1
>>>
>>
>> looks the same as the one that i posted 6 months ago.
>>
>> http://copilotco.com/mail-archives/linux-kernel.2012/msg18498.html
> Sorry, I didn't notice your patchset before :)
> Yes, they are the same. But why does nobody care this?

I applied Yinghai's patch to my pci/misc branch, headed for v3.9.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] PCI: Document PCIE BUS MPS parameters

2013-01-24 Thread Bjorn Helgaas

[+cc Jon, can you make sure this documentation is accurate?]

On Wed, Jan 23, 2013 at 7:58 PM, Yijing Wang  wrote:
> v0->v1: Update MPS parameters as non-arch and add MRRS
> description into pcie_bus_perf parameter suggested
> by Andrew Murray.
> v1->v2: Update some semantic problems and add MPS and MRRS
> explanation suggested by Joe Lawrence and Randy Dunlap.
>
> Document PCIE BUS MPS parameters pcie_bus_tune_off, pcie_bus_safe,
> pcie_bus_peer2peer, pcie_bus_perf into Documentation/kernel-parameters.txt.
> These parameters were introduced by Jon Mason  at
> commit 5f39e6705 and commit b03e7495a8.
>
> Signed-off-by: Yijing Wang 
> ---
>  Documentation/kernel-parameters.txt |   16 
>  1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 363e348..0bb279f 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2227,6 +2227,22 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
> This sorting is done to get a device
> order compatible with older (<= 2.4) kernels.
> nobfsortDon't sort PCI devices into breadth-first 
> order.
> +   pcie_bus_tune_off   Disable PCIe MPS (Max Payload Size)
> +   turning and using the BIOS-configured MPS 
> defaults.

s/turning/tuning/
s/using/use/

> +   pcie_bus_safe   Use the smallest common denominator MPS
> +   of the entire tree below a root complex for 
> every
> +   device on that fabric. Can avoid inconsistent 
> MPS
> +   problem caused by hotplug.

I'm not sure about this.  "smallest common denominator" is colloquial
and not exact.  I think you probably mean "the smallest supported MPS
of any device below a root complex."

I'm also not sure how this will avoid MPS problems caused by hotplug.
I think you have to assume that a hot-added device may support only a
128B MPS, which means the only way to guarantee that you can use that
device is to set MPS=128 for any device that will send packets to the
hot-added device.  Or we could reconfigure things at the time of the
hot-add, but I don't think we do that today (except for the hot-added
device itself).

> +   pcie_bus_perf   Configure pcie device MPS to the largest

Capitalize "PCIe" consistently.  Better yet, drop it completely since
it's obvious that we're only talking about PCIe here.

> +   allowable MPS based on its parent bus. Also 
> set
> +   MRRS (Max Read Request Size) to the largest 
> supported
> +   value (no larger than the MPS that the device 
> or bus
> +   can support) for Max performance.

s/Max/best/

> +   pcie_bus_peer2peer  Make the system-wide MPS the smallest
> +   possible value (128B). This configuration 
> could prevent
> +   peer to peer DMA transmission from working by 
> having
> +   the MPS on one root port different than the 
> MPS on
> +   another.

I think the idea is that pcie_bus_peer2peer would *allow* peer-to-peer
DMA to work, which is the opposite of what you wrote.  Maybe something
like this would be better:

  Set every device's MPS to 128B, which every device is guaranteed to support.
  This configuration allows peer-to-peer DMA between any pair of devices,
  possibly at the cost of reduced performance.

> cbiosize=nn[KMG]The fixed amount of bus space which is
> reserved for the CardBus bridge's IO window.
> The default value is 256 bytes.
> --
> 1.7.1
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci-sysfs: replace mutex_lock with mutex_trylock to avoid potential deadlock situation

2013-01-25 Thread Bjorn Helgaas

[+cc Yinghai]

On Fri, Jan 25, 2013 at 3:02 AM, Gu Zheng  wrote:
> Hi Bjorn,
> Thanks for your review and comments! Please refer to inlined comments 
> below.
>
> On 01/25/2013 07:12 AM, Bjorn Helgaas wrote:
>
>> On Thu, Dec 27, 2012 at 12:42 AM, Lin Feng  wrote:
>>> There is a potential deadlock situation when we manipulate the pci-sysfs 
>>> user
>>> interfaces from different bus hierarchy simultaneously, described as 
>>> following:
>>>
>>> path1: sysfs remove device: | path2: sysfs rescan device:
>>> sysfs_schedule_callback_work()  | sysfs_write_file()
>>>   remove_callback() |   flush_write_buffer()
>>> *1* mutex_lock(&pci_remove_rescan_mutex)|*2*  sysfs_get_active(attr_sd)
>>>   ...   | dev_attr_store()
>>> device_remove_file()|   dev_rescan_store()
>>>   ...   |*4*  
>>> mutex_lock(&pci_remove_rescan_mutex)
>>> *3*   sysfs_deactivate(sd)  | ...
>>> wait_for_completion()   |*5*  sysfs_put_active(attr_sd)
>>> *6* mutex_unlock(&pci_remove_rescan_mutex)
> ...snip...
>>> Reported-by: Taku Izumi 
>>> Signed-off-by: Lin Feng 
>>> Signed-off-by: Gu Zheng 
>>> ---
>>>  drivers/pci/pci-sysfs.c |   42 ++
>>>  1 files changed, 26 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>> index 05b78b1..d2efbb0 100644
>>> --- a/drivers/pci/pci-sysfs.c
>>> +++ b/drivers/pci/pci-sysfs.c
>>> @@ -295,10 +295,13 @@ static ssize_t bus_rescan_store(struct bus_type *bus,
> const char *buf,
>>> return -EINVAL;
>>>
>>> if (val) {
>>> -   mutex_lock(&pci_remove_rescan_mutex);
>>> -   while ((b = pci_find_next_bus(b)) != NULL)
>>> -   pci_rescan_bus(b);
>>> -   mutex_unlock(&pci_remove_rescan_mutex);
>>> +   if (mutex_trylock(&pci_remove_rescan_mutex)) {
>>> +   while ((b = pci_find_next_bus(b)) != NULL)
>>> +   pci_rescan_bus(b);
>>> +   mutex_unlock(&pci_remove_rescan_mutex);
>>> +   } else {
>>> +   return 0;
>> What are the semantics of returning 0 from a sysfs store function?
>> Does the user's write just get dropped?  I would think we'd return
>> "count" for that case.
>
> Oh, yes, return "count" seems suitable here, although we did not reach the
> user's target goal(rescan the bus), but the user's write has been flushed yet.
> But the user still can not judge whether pci_rescan_bus() was successfully 
> done
> only by the return value. Shall we return a suitable error here to tell the 
> user
> that his write was written, but pci_rescan_bus() was not done ?
>
>> Is there some sort of automatic retry in libc
>> or something if we return zero?
>
> No, there is not any extra operations in libc if we return zero indeed.
>
>> Are you relying on the user code to
>> notice that nothing was written and do its own retry?
>>
>
> Yes, but it seems impractical.
>
>> The last seems most likely, but that seems like it complicates the
>> user's life unnecessarily.
>>
>>> +   }
>>> }
>>> return count;
>>>  }
>>> @@ -319,9 +322,12 @@ dev_rescan_store(struct device *dev, struct 
>>> device_attribute *attr,
>>> return -EINVAL;
>>>
>>> if (val) {
>>> -   mutex_lock(&pci_remove_rescan_mutex);
>>> -   pci_rescan_bus(pdev->bus);
>>> -   mutex_unlock(&pci_remove_rescan_mutex);
>>> +   if (mutex_trylock(&pci_remove_rescan_mutex)) {
>>> +   pci_rescan_bus(pdev->bus);
>>> +   mutex_unlock(&pci_remove_rescan_mutex);
>>> +   } else {
>>> +   return 0;
>>> +   }
>>> }
>>> return count;
>>>  }
>>> @@ -330,9 +336,10 @@ static void remove_callback(struct device *dev)
>>>  {
>>> struct pci_dev *pdev = to_pci_dev(dev);
>>>
>>> -   mutex_lock(&pci_remove_rescan_mutex);

Re: [PATCH] pci-sysfs: replace mutex_lock with mutex_trylock to avoid potential deadlock situation

2013-01-25 Thread Bjorn Helgaas

On Fri, Jan 25, 2013 at 11:57 AM, Yinghai Lu  wrote:
> On Fri, Jan 25, 2013 at 9:44 AM, Bjorn Helgaas  wrote:
>> [+cc Yinghai]
>>
>> On Fri, Jan 25, 2013 at 3:02 AM, Gu Zheng
>
> Hi, Gu,
>
> Can you check if two patches in
>
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=shortlog;h=refs/heads/for-pci-root-bus-hotplug-part3
>
> could solve your problem?
>
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=277df390baeab7ba6aa136356b677a096c890c0c
>
> PCI: Rescan bus using callback method too
>
>
> http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=282e2db3b58d56b5236ee755e1527574df0d298e
>
> PCI, sysfs: Clean up rescan/remove with scheule_callback

You ignored and clipped my concerns about similar synchronization
issues outside sysfs, so let me quote it again here:

I'm sorry that you tripped over this deadlock, because now I'm worried
about related locking issues outside sysfs :)  The mutex you're
fiddling with is only in sysfs, but the routines *protected* by that
mutex are used in other places, too.  So what happens when a hotplug
driver does a rescan at the same time a user does a rescan or remove
via sysfs?  I don't even know what the rules are for protecting
scan/remove, but I don't have confidence that the issue you're fixing
is the only one.

If we're going to fix the sysfs deadlock (and we should), I want to
either see an argument for why we don't have a problem outside of
sysfs, or I want to fix sysfs and non-sysfs at the same time.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v10 00/11] PCI, ACPI: pci root bus hotplug support / pci match_driver

2013-01-25 Thread Bjorn Helgaas

On Tue, Jan 22, 2013 at 3:19 PM, Yinghai Lu  wrote:
> On Tue, Jan 22, 2013 at 2:09 PM, Rafael J. Wysocki  wrote:
>> On Monday, January 21, 2013 01:20:41 PM Yinghai Lu wrote:
>>> It includes
>>> 1. preparing patches for pci root bus hotadd/hotremove support
>>> 2. move root bus hotadd from acpiphp to pci_root.c
>>> 3. add hot-remove support
>>> 4. add acpi_hp_work to be shared with acpiphp and root-bus hotplug
>>> 5. add match_driver to add pci device to device tree early but
>>>not attach driver for hotplug path.
>>>
>>> based on pci/next + pm/acpi-scan
>>>
>>> could get from
>>> 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git 
>>> for-pci-root-bus-hotplug
>>>
>>> -v9: merges several patches together for easy review, requested by Rafael.
>>> -v10: address comments from Rafael.
>>>
>>> Jiang Liu (2):
>>>   PCI: Fix a device reference count leakage issue in pci_dev_present()
>>>   PCI: make PCI device create/destroy logic symmetric
>>>
>>> Tang Chen (1):
>>>   PCI, ACPI: debug print for installation of acpi root bridge's
>>> notifier
>>>
>>> Yinghai Lu (8):
>>>   PCI, acpiphp: Add is_hotplug_bridge detection
>>>   PCI: Add root bus children dev's res to fail list
>>>   PCI: Set dev_node early for pci_dev
>>>   PCI, ACPI, acpiphp: Rename alloc_acpiphp_hp_work() to alloc_acpi_hp_work
>>>   PCI, acpiphp: Move and enhance hotplug support of pci host bridge
>>>   PCI, acpiphp: Don't bailout even no slots found yet.
>>>   PCI: Skip attaching driver in device_add()
>>>   PCI: Put pci dev to device tree as early as possible
>>
>> OK
>>
>> Please feel free to add
>>
>> Acked-by: Rafael J. Wysocki 
>>
>> to all of the patches in this series I haven't acked already.

I first pulled in
"git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
acpi-scan" again (to pci/acpi-scan2), added your acks, Rafael, and put
this series on a pci/yinghai-root-bus branch based on pci/acpi-scan2.

I reworked some of the changelogs a bit, but I don't think I made any
code changes except that in [10/11] I just inlined the
pci_bus_attach_device() code rather than making a new function, since
it's small, there's only one caller, and I didn't think we needed any
more pci_* and pci_bus_* functions than we already have.

Let me know if I messed anything up.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/14] Rewrite Tegra PCIe driver

2013-01-28 Thread Bjorn Helgaas

On Wed, Jan 9, 2013 at 1:43 PM, Thierry Reding
 wrote:
> This patch series contains an almost complete rewrite of the Tegra PCIe
> driver. The code is moved to the drivers/pci/host directory and turned
> into a proper platform driver, adding MSI and DT support while at it.
> Other PCI host controller drivers can be added to that directory in an
> attempt to make it easier to factor out common code.
>
> Patches 1 to 4 add generic OF helpers to parse a PCI node's ranges
> property as well as obtain the bus, device and function numbers from a
> node's reg property.
>
> Patch 5 provides an implementation of a cache for I/O mappings. This can
> be used in situations where a large physical memory region needs to be
> ioremap()'ed. On Tegra, the PCIe standard and extended configuration
> spaces can be accessed through a 256 MiB window. Mapping that in one
> chunk is wasteful and may not always work because the vmalloc area may
> not be large enough. The implementation in this patch uses an LRU based
> approach to map a limited amount of pages on an as-needed basis.
>
> Patches 6 and 7 prepare the ARM PCI code to enable PCI host controller
> drivers to load after the init stage, which can happen due to deferred
> probing, and to allow private data to be passed for each controller.
>
> Patches 8 and 9 move some of the Tegra specific code around to allow it
> to be used from outside the arch/arm/mach-tegra directory.
>
> Patch 10 moves the Tegra PCIe controller driver to the drivers/pci/host
> directory and turns it into a proper platform driver. It also adds MSI
> (based on patches by NVIDIA) and DT support.
>
> Patches 11 to 14 add device tree based probing for the TEC, Harmony and
> TrimSlice boards.
>
> Thierry
>
> Andrew Murray (1):
>   of/pci: Provide support for parsing PCI DT ranges property
>
> Thierry Reding (13):
>   of/pci: Add of_pci_get_devfn() function
>   of/pci: Add of_pci_get_bus() function
>   of/pci: Add of_pci_parse_bus_range() function
>   lib: Add I/O map cache implementation
>   ARM: pci: Keep pci_common_init() around after init
>   ARM: pci: Allow passing per-controller private data
>   ARM: tegra: Move tegra_pcie_xclk_clamp() to PMC
>   ARM: tegra: Move pmc.h to include/mach
>   PCI: tegra: Move PCIe driver to drivers/pci/host
>   ARM: tegra: tamonten: Add PCIe support
>   ARM: tegra: tec: Add PCIe support
>   ARM: tegra: harmony: Initialize PCIe from DT
>   ARM: tegra: trimslice: Initialize PCIe from DT
>
>  .../bindings/pci/nvidia,tegra20-pcie.txt   |  115 ++
>  arch/arm/Kconfig   |2 +
>  arch/arm/boot/dts/tegra20-harmony.dts  |   20 +-
>  arch/arm/boot/dts/tegra20-tamonten.dtsi|2 +-
>  arch/arm/boot/dts/tegra20-tec.dts  |  198 +++
>  arch/arm/boot/dts/tegra20-trimslice.dts|   12 +
>  arch/arm/boot/dts/tegra20.dtsi |   45 +
>  arch/arm/include/asm/mach/pci.h|1 +
>  arch/arm/kernel/bios32.c   |9 +-
>  arch/arm/mach-tegra/Kconfig|5 -
>  arch/arm/mach-tegra/Makefile   |3 -
>  arch/arm/mach-tegra/board-dt-tegra20.c |   24 -
>  arch/arm/mach-tegra/board-harmony-pcie.c   |   84 --
>  arch/arm/mach-tegra/board.h|4 +-
>  arch/arm/mach-tegra/common.c   |2 +-
>  arch/arm/mach-tegra/include/mach/pmc.h |   24 +
>  arch/arm/mach-tegra/iomap.h|3 -
>  arch/arm/mach-tegra/pcie.c |  887 -
>  arch/arm/mach-tegra/pmc.c  |   16 +
>  arch/arm/mach-tegra/pmc.h  |   23 -
>  drivers/of/address.c   |   63 +
>  drivers/of/of_pci.c|   78 +-
>  drivers/pci/Kconfig|2 +
>  drivers/pci/Makefile   |3 +
>  drivers/pci/host/Kconfig   |8 +
>  drivers/pci/host/Makefile  |1 +
>  drivers/pci/host/pci-tegra.c   | 1322 
> 
>  include/linux/io.h |   12 +
>  include/linux/of_address.h |9 +
>  include/linux/of_pci.h |3 +
>  lib/ioremap.c  |  266 
>  31 files changed, 2203 insertions(+), 1043 deletions(-)
>  create mode 100644 
> Documentation/devicetree/bindings/pci/nvidia,tegra20-pcie.txt
>  delete mode 100644 arch/arm/mach-tegra/board-harmony-pcie.c
>  create mode 100644 arch/arm/mach-tegra/include/mach/pmc.h
>  delete mode 100644 arch/arm/mach-tegra/pcie.c
>  delete mode 100644 arch/arm/mach-tegra/pmc.h
>  create mode 100644 drivers/pci/host/Kconfig
>  create mode 100644 drivers/pci/host/Makefile
>  create mode 100644 drivers/pci/host/pci-tegra.c

Hi Thierry,

Since

Re: [RFC PATCH v5 0/8] introduce PCI bus notifier chain to get rid of the ACPI PCI subdriver interfaces

2013-01-28 Thread Bjorn Helgaas

On Fri, Jan 18, 2013 at 9:07 AM, Jiang Liu  wrote:
> This is an RFC patchset to address review comments in thread at:
> https://patchwork.kernel.org/patch/1946851/. The patch just pasts
> compilation. If no objection to the new implementation, I will
> go on to modify acpiphp driver and conduct tests.
>
> The main changes from V4 to V5 includes:
> 1) introduce a dedicated notifier chain for PCI buses
> 2) change pci_slot as built-in driver
> 3) unify the way to create/destroy PCI slots
> 4) introduce a kernel option to disable PCIe native hotplug
>
> TODO:
> 1) change acpiphp as built-in and unify the way to create/destroy ACPI
>based hotplug slots.
> 2) change other ACPI PCI subdriver in Yinghai's root bridge hotplug series
>to use the PCI bus notifier chain.
> 3) Remove the ACPI PCI subdriver interface eventaully.
>
> Jiang Liu (8):
>   PCI: make PCI device create/destroy logic symmetric
>   PCI: split registration of PCI bus devices into two stages
>   PCI: add a blocking notifier chain for PCI bus addition/removal
>   ACPI, PCI: avoid building pci_slot as module
>   PCI, ACPI: hook PCI bus notifications to create/destroy PCI slots
>   pci_slot: replace printk(KERN_xxx) with pr_xxx()
>   PCI/PCIe: add "pci=nopciehp" to disable PCIe native hotplug
>   PCI/PCIe: only claim PME from firmware when CONFIG_PCIE_PME is
> enabled
>
>  Documentation/kernel-parameters.txt |2 +
>  drivers/acpi/Kconfig|5 +-
>  drivers/acpi/internal.h |5 +
>  drivers/acpi/pci_root.c |8 +-
>  drivers/acpi/pci_slot.c |  217 
> ++-
>  drivers/acpi/scan.c |1 +
>  drivers/pci/bus.c   |   26 -
>  drivers/pci/pci.c   |2 +
>  drivers/pci/pci.h   |1 +
>  drivers/pci/pcie/portdrv_core.c |7 +-
>  drivers/pci/pcie/portdrv_pci.c  |3 +
>  drivers/pci/probe.c |7 +-
>  drivers/pci/remove.c|   15 +--
>  include/linux/pci.h |   21 
>  14 files changed, 142 insertions(+), 178 deletions(-)

I think the problem we're trying to solve is that we don't initialize
hot-added devices, correctly, e.g., we don't set up AER, we don't
update acpi/pci_slot stuff, we probably don't set up PME etc.  We also
have similar issues like IOMMU init on powerpc.

Notifier chains seem like an unnecessarily complicated way to deal
with this.  They're great for communicating between modules that stay
at arm's length from each other.  But that's not the case here --
everything is PCI and is quite closely coupled.  I think AER, PME,
slot, etc., should  be initialized directly in pci_device_add() or
somewhere nearby.

This might sound a bit radical because it implies some fairly
far-reaching changes.  It means this code can't be a module (the only
one that can be built as a module today is pciehp, and I think
everybody agrees that we should make it static as soon as we can
figure out the acpiphp/pciehp issue).  I think it also means the
pcieportdrv concept is of dubious value, since all the services should
be known at build-time and we probably don't need a registration
interface for them.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v5 4/8] ACPI, PCI: avoid building pci_slot as module

2013-01-28 Thread Bjorn Helgaas

On Sun, Jan 20, 2013 at 5:01 PM, Rafael J. Wysocki  wrote:
> On Saturday, January 19, 2013 12:07:42 AM Jiang Liu wrote:
>> As discussed in thread at https://patchwork.kernel.org/patch/1946851/,
>> there's no value in supporting CONFIG_ACPI_PCI_SLOT=m any more.
>> So change Kconfig and code to only support building pci_slot as
>> built-in driver.
>>
>> Signed-off-by: Jiang Liu 
>
> Acked-by: Rafael J. Wysocki 

Acked-by: Bjorn Helgaas 

I think we should eventually get rid of acpi_pci_register_driver() and
do this initialization directly from acpi_pci_root_add().  But
removing the module option here is a good first step.

Rafael, do you want to apply this (and [6/8]) via your tree?  If not,
I can take it.

>> ---
>>  drivers/acpi/Kconfig|5 +
>>  drivers/acpi/internal.h |5 +
>>  drivers/acpi/pci_slot.c |   13 +
>>  drivers/acpi/scan.c |1 +
>>  4 files changed, 8 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index 0300bf6..7efd0d0 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -299,7 +299,7 @@ config ACPI_DEBUG_FUNC_TRACE
>> is about half of the penalty and is rarely useful.
>>
>>  config ACPI_PCI_SLOT
>> - tristate "PCI slot detection driver"
>> + bool "PCI slot detection driver"
>>   depends on SYSFS
>>   default n
>>   help
>> @@ -308,9 +308,6 @@ config ACPI_PCI_SLOT
>> i.e., segment/bus/device/function tuples, with physical slots in
>> the system.  If you are unsure, say N.
>>
>> -   To compile this driver as a module, choose M here:
>> -   the module will be called pci_slot.
>> -
>>  config X86_PM_TIMER
>>   bool "Power Management Timer Support" if EXPERT
>>   depends on X86
>> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
>> index e050254..7374cfc 100644
>> --- a/drivers/acpi/internal.h
>> +++ b/drivers/acpi/internal.h
>> @@ -67,6 +67,11 @@ struct acpi_ec {
>>
>>  extern struct acpi_ec *first_ec;
>>
>> +#ifdef   CONFIG_ACPI_PCI_SLOT
>> +void acpi_pci_slot_init(void);
>> +#else
>> +static inline void acpi_pci_slot_init(void) { }
>> +#endif
>>  int acpi_pci_root_init(void);
>>  int acpi_ec_init(void);
>>  int acpi_ec_ecdt_probe(void);
>> diff --git a/drivers/acpi/pci_slot.c b/drivers/acpi/pci_slot.c
>> index d22585f..a7d7e77 100644
>> --- a/drivers/acpi/pci_slot.c
>> +++ b/drivers/acpi/pci_slot.c
>> @@ -330,19 +330,8 @@ static struct dmi_system_id acpi_pci_slot_dmi_table[] 
>> __initdata = {
>>   {}
>>  };
>>
>> -static int __init
>> -acpi_pci_slot_init(void)
>> +void __init acpi_pci_slot_init(void)
>>  {
>>   dmi_check_system(acpi_pci_slot_dmi_table);
>>   acpi_pci_register_driver(&acpi_pci_slot_driver);
>> - return 0;
>>  }
>> -
>> -static void __exit
>> -acpi_pci_slot_exit(void)
>> -{
>> - acpi_pci_unregister_driver(&acpi_pci_slot_driver);
>> -}
>> -
>> -module_init(acpi_pci_slot_init);
>> -module_exit(acpi_pci_slot_exit);
>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>> index f8a0d0f..cb964ac 100644
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ -1732,6 +1732,7 @@ int __init acpi_scan_init(void)
>>
>>   acpi_power_init();
>>   acpi_pci_root_init();
>> + acpi_pci_slot_init();
>>
>>   /*
>>* Enumerate devices in the ACPI namespace.
>>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v5 4/8] ACPI, PCI: avoid building pci_slot as module

2013-01-28 Thread Bjorn Helgaas

On Mon, Jan 28, 2013 at 2:29 PM, Yinghai Lu  wrote:
> On Mon, Jan 28, 2013 at 1:09 PM, Bjorn Helgaas  wrote:
>> On Sun, Jan 20, 2013 at 5:01 PM, Rafael J. Wysocki  wrote:
>>> On Saturday, January 19, 2013 12:07:42 AM Jiang Liu wrote:
>>>> As discussed in thread at https://patchwork.kernel.org/patch/1946851/,
>>>> there's no value in supporting CONFIG_ACPI_PCI_SLOT=m any more.
>>>> So change Kconfig and code to only support building pci_slot as
>>>> built-in driver.
>>>>
>>>> Signed-off-by: Jiang Liu 
>>>
>>> Acked-by: Rafael J. Wysocki 
>>
>> Acked-by: Bjorn Helgaas 
>>
>> I think we should eventually get rid of acpi_pci_register_driver() and
>> do this initialization directly from acpi_pci_root_add().  But
>> removing the module option here is a good first step.
>>
>> Rafael, do you want to apply this (and [6/8]) via your tree?  If not,
>> I can take it.
>
> If bios have messed up slot name or idx, we will get strange 1-1
> other crazy name.
>
> if you really need to put it as built-in, may need to some command
> line that user could switch it off.

It would save us all a lot of time if you gave an example and worked
through the scenario where this is a problem.

We already have the choice of having pci_slot built-in, so if there's
a bug in that config, we already have the bug.  This patch merely
removes a config where the bug might be covered up.

I don't know why "adding a command line switch" appeals to you as the
solution to every problem.  As far as I'm concerned that's not a
solution to ANY problem.  It might be a band-aid to enable users to
limp along while we figure out a correct solution, but it's certainly
not a resolution.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] PCI changes for v3.9

2013-02-23 Thread Bjorn Helgaas

The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed:

  Linux 3.8-rc2 (2013-01-02 18:13:21 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git 
tags/pci-v3.9-changes

for you to fetch changes up to 018ba0a6efada61b9bc17500101d81c3d35807c2:

  Merge branch 'pci/yinghai-root-bus-hotplug' into next (2013-02-19 11:42:17 
-0700)



PCI changes for the v3.9 merge window:

  Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)

  PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)

  Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)

  Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)


Alex Williamson (1):
  PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS

Bjorn Helgaas (23):
  ACPI: Remove unused struct acpi_pci_root.id member
  x86/PCI: Remove unused pci_root_bus
  frv/PCI: Remove unused pci_root_bus
  mn10300/PCI: Remove unused pci_root_bus
  PCI: Use "unsigned long" for __pci_enable_device_flags to match ioport.h
  PCI: Drop "__" prefix on __pci_enable_device_flags()
  Merge branch 'pci/misc' into next
  Merge branch 'pci/yinghai-survey-resources' into next
  Merge branch 'acpi-scan' of git://git.kernel.org/.../rafael/linux-pm into 
pci/yinghai-survey-resources+acpi-scan
  Merge branch 'pci/yinghai-survey-resources+acpi-scan' into next
  Merge branch 'pci/rafael-set-root-bridge-handle' into next
  Merge branch 'acpi-scan' of git://git.kernel.org/.../rafael/linux-pm into 
pci/acpi-scan2
  Merge branch 'pci/yijing-ari' into next
  Merge branch 'pci/acpi-scan2' into next
  Merge branch 'pci/yinghai-root-bus-hotplug' into next
  Merge branch 'pci/yinghai-root-bus-hotplug' into next
  Merge branch 'pci/joe-aspm' into next
  Merge branch 'pci/misc' into next
  PCI: Use atomic_inc_return() rather than atomic_add_return()
  Merge branch 'pci/konstantin-runtime-pm' into next
  Merge branch 'pci/jiang-pci_slot-kconfig' into next
  Merge branch 'pci/misc' into next
  Merge branch 'pci/yinghai-root-bus-hotplug' into next

Jiang Liu (6):
  ACPI: remove unused acpi_op_bind and acpi_op_unbind
  PCI: Fix reference count leak in pci_dev_present()
  PCI: Make device create/destroy logic symmetric
  PCI: acpiphp: Create companion ACPI devices before creating PCI devices
  PCI: acpiphp: Remove dead code for PCI host bridge hotplug
  ACPI / PCI: Make pci_slot built-in only, not a module

Joe Lawrence (1):
  PCI/ASPM: Don't touch ASPM if forcibly disabled

Konstantin Khlebnikov (2):
  PCI: Disable Bus Master unconditionally in pci_device_shutdown()
  PCI: Catch attempts to disable already-disabled devices

Mika Westerberg (1):
  ACPI / scan: Fix check of device_attach() return value.

Myron Stowe (3):
  PCI: Introduce accessor to retrieve PCIe Capabilities Register
  PCI: Use PCI Express Capability accessor
  PCI/ASPM: Deallocate upstream link state even if device is not PCIe

Paul Bolle (1):
  PCI: pciehp: Drop suspend/resume ENTRY messages

Rafael J. Wysocki (25):
  ACPI: Separate adding ACPI device objects from probing ACPI drivers
  ACPI: Change the ordering of PCI root bridge driver registrarion
  ACPI: Make acpi_bus_add() and acpi_bus_start() visibly different
  ACPI: Reduce the usage of struct acpi_bus_ops
  ACPI: Replace struct acpi_bus_ops with enum type
  ACPI: Change the ordering of acpi_bus_check_add()
  ACPI / PCI: Fold acpi_pci_root_start() into acpi_pci_root_add()
  ACPI: Remove acpi_start_single_object() and acpi_bus_start()
  ACPI: Remove the arguments of acpi_bus_add() that are not used
  ACPI: Drop the seco

Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-25 Thread Bjorn Helgaas

On Mon, Feb 25, 2013 at 9:37 AM, Shuah Khan  wrote:
> On Wed, 2013-02-20 at 18:19 -0700, Bjorn Helgaas wrote:
>> On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
>> > pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
>> > it doesn't have interfaces to return PCI bus and PCI device id. Drivers
>> > (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
>> > and AMD_IOMMU driver also has a module specific interface to calculate PCI
>> > device id from bus number and devfn.
>> >
>> > Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI 
>> > device
>> > id respectively to avoid the need for duplicate definitions in other 
>> > modules.
>> > AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver 
>> > defines
>> > an interface to calculate device id from bus number, and devfn pair.
>> >
>> > Signed-off-by: Shuah Khan 
>> > ---
>> >  include/uapi/linux/pci.h |4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
>> > index 3c292bc0..6b2c8b3 100644
>> > --- a/include/uapi/linux/pci.h
>> > +++ b/include/uapi/linux/pci.h
>> > @@ -30,6 +30,10 @@
>> >  #define PCI_DEVFN(slot, func)  slot) & 0x1f) << 3) | ((func) & 0x07))
>> >  #define PCI_SLOT(devfn)(((devfn) >> 3) & 0x1f)
>> >  #define PCI_FUNC(devfn)((devfn) & 0x07)
>> > +#define PCI_DEVID(bus, devfn)  u16)bus) << 8) | devfn)
>> > +
>> > +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
>> > +#define PCI_BUS(x) (((x) >> 8) & 0xff)
>> >
>> >  /* Ioctls for /proc/bus/pci/X/Y nodes. */
>> >  #define PCIIOC_BASE('P' << 24 | 'C' << 16 | 'I' << 8)
>>
>> David, can you point me at a description of include/uapi ... what is
>> there and why, and how we should decide what new things go in
>> include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
>> there should be something in Documentation/?
>>
>> I'm guessing it's something to do with being exported to userland, but
>> I'm not sure the things in this patch (PCI_DEV_ID, PCI_BUS) are really
>> exportable in the sense of being used for syscalls, etc.
>>
>
> Bjorn,David,
>
> Looks like the following thread answers some of the questions about when
> this uapi export was done on the existing defines.
>
> https://lkml.org/lkml/2011/7/28/198
>
> Sounds like the concern is that the older defines PCI_DEVFN, PCI_SLOT,
> PCI_FUNC,  and PCI_DEVID could be exported, but not the new ones I
> added. I could find any discussion on whether these four older defines
> are exportable or the reasons for the export in the above thread.

I think David's disintegration script took include/linux/pci.h, left
the #ifdef __KERNEL__ parts there, and moved everything else (which
wasn't much) to include/uapi/linux/pci.h.

It's obvious that the PCIIOC_ #defines need to be exported to
user-space for ioctls.  It's not obvious to me why PCI_DEVFN,
PCI_SLOT, and PCI_FUNC need to be exported to user-space.  But I can
imagine user-space using functionality like that, even if it's not
connected to a kernel interface.  I assume the intent of the
disintegration is that only include/uapi would be exposed to
user-space, so keeping those definitions in include/linux/pci.h would
break any user programs that used them.

> So the question is if uapi/linux.pci.h isn't the right place, do you
> have a recommendation on where they belong. The only alternative I can
> think of is include/linux/pci.h. It makes functional and logical sense
> to add the new defines to where the existing ones are defines. At least,
> not knowing the details of the change that moved PCI_DEVFN etc. to
> uapi/pci.h, that is my conclusion.

Using the linux-fullhist tree, I found these:

059d367 Import 2.1.82 -- moved PCI_DEVFN outside #ifdef __KERNEL__
b039547 Import 2.1.76 -- PCI_DEVFN was inside #ifdef __KERNEL__
f6d9739 Import 2.1.68pre1 -- added #ifdef __KERNEL__ (enclosing PCI_DEVFN)
940649f Import 1.3.0 -- added PCI_DEVFN

There's no indication of *why* PCI_DEVFN was exported, of course.

Bottom line, I think it's reasonable to keep PCI_DEVFN, et al., in
uapi/linux/pci.h to keep from breaking user-programs, even though if
we were adding them today we would probably put them in the
kernel-only linux/pci.h.  For the new ones you're adding, I'd propose
putting them in the kernel-only linux/pci.h because we know no user
programs use them.

It's not nice and consistent, but it does follow the simple rule of
"don't expose things to user-space unnecessarily."  We might want to
add a comment to keep somebody from cleaning it up later.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-25 Thread Bjorn Helgaas

On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
> and AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
> id respectively to avoid the need for duplicate definitions in other modules.
> AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
> an interface to calculate device id from bus number, and devfn pair.
>
> Signed-off-by: Shuah Khan 
> ---
>  include/uapi/linux/pci.h |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index 3c292bc0..6b2c8b3 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -30,6 +30,10 @@
>  #define PCI_DEVFN(slot, func)  slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)(((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)((devfn) & 0x07)
> +#define PCI_DEVID(bus, devfn)  u16)bus) << 8) | devfn)
> +
> +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> +#define PCI_BUS(x) (((x) >> 8) & 0xff)

BTW, in the next round, maybe we should call this PCI_BUS_NR() or
similar to avoid confusion with "struct pci_bus"?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Abysmal HDD/USB write speed after sleep on a UEFI system

2013-02-25 Thread Bjorn Helgaas

On Tue, Feb 19, 2013 at 9:22 AM, Alan Stern  wrote:
> On Tue, 12 Feb 2013, Bjorn Helgaas wrote:
>
>> [+cc linux-pci, Rafael, Alan]
>>
>> [https://bugzilla.kernel.org/show_bug.cgi?id=53551]
>>
>> On Tue, Feb 12, 2013 at 1:13 PM, Artem S. Tashkinov  
>> wrote:
>> > Feb 13, 2013 01:32:53 AM, Linus Torvalds wrote:
>> > On Tue, Feb 12, 2013 at 10:29 AM, Artem S. Tashkinov wrote:
>> >>> Feb 12, 2013 11:30:20 PM, Linus Torvalds wrote:
>> >>>>
>> >>>>A few things to try to pinpoint:
>> >>>>
>> >>>> (a) Is it *only* write performance that suffers, or is it other
>> >>>>performance too? Networking (DMA? Perhaps only writing *to* the
>> >>>>network?)? CPU?
>> >>>
>> >>> I  've tested hdpard -tT --direct and the output on boot and after 
>> >>> suspend
>> >>> is quite similar.
>> >>>
>> >>> I  've also checked my network read/write speed, and it  's the same
>> >>> ~ 100MBit/sec (I have no 1Gbit computers on my network
>> >>> unfortunately).
>> >>
>> >>Ok. So it really sounds like just USB and HD writes. Which is quite
>> >>odd, since they have basically nothing in common I can think of
>> >>(except the obvious block layer issues).
>
> There's a slight chance that we might get some ideas by comparing
> usbmon traces showing disk activity before and after the
> problem-causing suspend.

Where are we at with this, Artem?  I assume it's still a problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Abysmal HDD/USB write speed after sleep on a UEFI system

2013-02-26 Thread Bjorn Helgaas

On Mon, Feb 25, 2013 at 11:35 PM, Artem S. Tashkinov  wrote:
> Feb 26, 2013 03:57:52 AM, Bjorn Helgaas wrote:
>>
>>Where are we at with this, Artem?  I assume it's still a problem.
>>
>
> Yes, it is, Bjorn.
>
> In order to eliminate this problem I switched back to MBR yesterday, because
> so far I haven't received any instructions or guidance as to how I can debug
> it further. I'm absolutely sure USB write speed is just another manifestation 
> of
> it so I decided not to debug USB specifically (it just doesn't make too much
> sense).
>
> What I see is that something terribly wrong is going on but if Linus has no 
> ideas
> I, as an average Joe, don't have a slightest clue as to what I can do.
>
> The bug report with necessary, but seemingly useless information, can be
> found here: https://bugzilla.kernel.org/show_bug.cgi?id=53551
>
> If anyone comes up with new ideas I can quickly try UEFI again now that I
> have two HDDs at my disposal (the old one is formatted as GPT, the new one is
> MBR).

The ideas I saw are:

1) Figure out whether it ever worked.  If an older kernel worked
correctly and a newer one is broken, bisection is at least a
possibility.  You mentioned that it did work before (Feb 12), but in
the past you never suspended twice in one boot session, whereas maybe
you did when seeing the problem?

2) Try the "setpci" to set the MSI address back to the original value
to see if it makes a difference (see my Feb 12 message).

3) Collect "lspci -vvv -" output to investigate the XHCI
Unsupported Request errors.

4) Use usbmon to collect traces before and after the suspend.

I googled around a bit looking for similar reports.  I found lots of
suspend issues, mostly with Windows, but no leads yet.  It looks like
the board has been around for a while, so you would think we'd have
some other reports of a problem this bad.  But maybe it really is
related to UEFI and nobody really uses that yet?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci: do not try to assign irq 255

2013-02-27 Thread Bjorn Helgaas

[+cc Andy]

On Wed, Feb 20, 2013 at 11:53 PM, Hannes Reinecke  wrote:
> On 02/20/2013 05:57 PM, Yinghai Lu wrote:
>>
>> On Tue, Feb 19, 2013 at 11:58 PM, Hannes Reinecke  wrote:


>>> Apparently this device is meant to use MSI _only_ so the BIOS developer
>>> didn't feel the need to assign an INTx here.
>>>
>>> According to PCI-3.0, section 6.8 (Message Signalled Interrupts):

 It is recommended that devices implement interrupt pins to
 provide compatibility in systems that do not support MSI
 (devices default to interrupt pins). However, it is expected
 that the need for interrupt pins will diminish over time.
 Devices that do not support interrupt pins due to pin
 constraints (rely on polling for device service) may implement
 messages to increase performance without adding additional pins. >
 Therefore, system configuration software must not assume that a
 message capable device has an interrupt pin.
>>>
>>>
>>> Which sounds to me as if the implementation is valid...
>>
>>
>> it seems you mess pin with interrupt line.
>>
>> current code:
>>  unsigned char irq;
>>
>>  pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
>>  dev->pin = irq;
>>  if (irq)
>>  pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
>>  dev->irq = irq;
>>
>> so if the device does not have interrupt pin implemented, pin should be
>> zero.
>> and  pin and irq in dev should
>> be all 0.
>>
> But the device _has_ an interrupt pin implemented.
> The whole point here is that the interrupt line is _NOT_ zero.
>
> 00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series
> Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04) (prog-if 30
> [XHCI])
> Subsystem: Hewlett-Packard Company Device [103c:179b]
> Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> SERR-  Interrupt: pin A routed to IRQ 255
> Region 0: Memory at d472 (64-bit, non-prefetchable) [size=64K]
> Capabilities: [70] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
> PME(D0-,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [80] MSI: Enable- Count=1/8 Maskable- 64bit+
> Address:   Data: 
>
> So at one point we have to decide that ->irq is not valid, despite it being
> not set to zero.
> An alternative fix would be this:
>
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index 68a921d..4a480cb 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -469,6 +469,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
> } else {
> dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
>  pin_name(pin));
> +   dev->irq = 0;
> }
> return 0;
> }
>
> Which probably is a better solution, as here ->irq is _definitely_
> not valid, so we should reset it to '0' to avoid confusion on upper
> layers.

I didn't like the pci_read_irq() change because the PCI spec doesn't
say anything about any PCI_INTERRUPT_LINE values being invalid.

I like this solution better, but I still don't quite understand it.
We have the following code in acpi_pci_irq_enable().  We have
previously tried to look up "gsi," but the _PRT doesn't mention this
device, so we have "gsi == -1" at this point:

/*
 * No IRQ known to the ACPI subsystem - maybe the BIOS /
 * driver reported one, then use it. Exit in any case.
 */
if (gsi < 0) {
u32 dev_gsi;
/* Interrupt Line values above 0xF are forbidden */
if (dev->irq > 0 && (dev->irq <= 0xF) &&
(acpi_isa_irq_to_gsi(dev->irq, &dev_gsi) == 0)) {
dev_warn(&dev->dev, "PCI INT %c: no GSI -
using ISA IRQ %d\n",
 pin_name(pin), dev->irq);
acpi_register_gsi(&dev->dev, dev_gsi,
  ACPI_LEVEL_SENSITIVE,
  ACPI_ACTIVE_LOW);
} else {
dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
 pin_name(pin));
}

return 0;
}

1) I don't know where the restriction of 0x1-0xF came from.
Presumably this value of dev->irq came from PCI_INTERRUPT_LINE, and I
don't know what forbids values > 0xF.  The test was added by Andy
Grover in the attached commit.  This is ancient history; probably Andy
doesn't remember either :)

2) The proposed change (setting "dev->irq = 0" when we didn't find
anything in the _PRT and dev->irq > 0xF) throws away some informatio

Re: [PATCH] ACPI, PCI: Get PRT entry during acpi_pci_enable_irq()

2013-02-19 Thread Bjorn Helgaas

On Fri, Feb 15, 2013 at 6:37 PM, Yinghai Lu  wrote:
> On Fri, Feb 15, 2013 at 5:26 PM, Yinghai Lu  wrote:
>> On Fri, Feb 15, 2013 at 4:39 PM, Bjorn Helgaas  wrote:
>>> On Thu, Feb 14, 2013 at 5:50 PM, Yinghai Lu  wrote:
>>>> On Tue, Feb 12, 2013 at 12:22 PM, Rafael J. Wysocki  wrote:
>>>>> On Tuesday, February 12, 2013 11:11:23 AM Yinghai Lu wrote:
>>>>>> Peter Hurley found "irq 18 nobody cared" with pci-next, and dmesg has
>>>>>>
>>>>>> [8.983246] pci :00:1e.0: can't derive routing for PCI INT A
>>>>>> [8.983600] snd_ctxfi :09:02.0: PCI INT A: no GSI - using ISA IRQ 
>>>>>> 5
>>>>>>
>>>>>> bisect to
>>>>>> | commit 4f535093cf8f6da8cfda7c36c2c1ecd2e9586ee4
>>>>>> | PCI: Put pci_dev in device tree as early as possible
>>>>>>
>>>>>> It turns out we need to call acpi_pci_irq_add_prt() after the pci bridges
>>>>>> are scanned.
>>>>>>
>>>>>> Bjorn said:
>>>>>>  The bus number binding means acpi_pci_irq_add_prt() has to happen
>>>>>>  after enumerating everything below a bridge, and it will prevent us
>>>>>>  from doing any bus number reassignment for hotplug.
>>>>>>
>>>>>>  I think we should remove the bus numbers from the cached _PRT (or
>>>>>>  maybe even remove the _PRT caching completely).  When we enable a PCI
>>>>>>  device's IRQ, we should search up the PCI device tree looking for a
>>>>>>  _PRT associated with each node, and applying normal PCI bridge
>>>>>>  swizzling when we don't find a _PRT.  I think this can be done without
>>>>>>  using PCI bus numbers at all.
>>>>>>
>>>>>> So here we try to remove _PRT caching completely.
>>>>>>
>>>>>> -v2: check !handle early.
>>>>>>
>>>>>> Reported-and-tested-by: Peter Hurley 
>>>>>> Suggested-by: Bjorn Helgaas 
>>>>>> Signed-off-by: Yinghai Lu 
>>>>>
>>>>> Acked-by: Rafael J. Wysocki 
>>>>>
>>>>>> ---
>>>>>>  drivers/acpi/pci_irq.c  |   95 
>>>>>> +---
>>>>>>  drivers/acpi/pci_root.c |   18 
>>>>>>  drivers/pci/pci-acpi.c  |   24 ---
>>>>>>  include/acpi/acpi_drivers.h |5 --
>>>>>>  4 files changed, 38 insertions(+), 104 deletions(-)
>>>>
>>>> Bjorn,
>>>>
>>>> Can you put this one into pci/next?
>>>
>>> I'm not sure what this patch is based on or what the best way to merge
>>> it is.  It doesn't apply cleanly to my next or
>>> pci/yinghai-root-bus-hotplug branches.
>>
>> My fault, that is based on pci/next + pm/linux-next
>>
>> linux-next removed
>> acpi_power_resource_(un)register_device ...
>>
>>>
>>> I did apply it manually on top of pci/yinghai-root-bus-hotplug to try
>>> it out, but we need to tweak the messages a little bit.
>>>
>>> Previously we printed "ACPI: PCI Interrupt Routing Table [%s._PRT]"
>>> once when loading it, which was fine.  Now we print it every time we
>>> look at a _PRT, which is too much because it isn't really adding any
>>> information.
>>>
>>> We also print "ACPI Exception: AE_NOT_FOUND, Evaluating _PRT
>>> [AE_NOT_FOUND] (20121018/pci_irq-259)" if we find ACPI nodes without
>>> _PRTs, which we shouldn't do, because that's a common and normal
>>> situation.
>>
>> Sure. Can you have separated patch to do that ?
>>
>> Or want me to resend the patch.
>
> Please check attached updated version that remove print out ...
>
> and it could be applied cleanly on top of pci/yinghai-root-bus-hotplug

Thanks, I applied this to pci/yinghai-root-bus-hotplug and merged it
into my next branch.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: WARNING: at drivers/pci/pci.c:1397 pci_disable_device

2013-02-19 Thread Bjorn Helgaas

On Tue, Feb 19, 2013 at 11:34 AM, Jiri Slaby  wrote:
> Hi,
>
> so I hit that one:
> +   dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
> + "disabling already-disabled device");
>
> during suspend (to ram):
> WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x90/0xa0()
> Hardware name: To Be Filled By O.E.M.
> Device e1000e
> disabling already-disabled device
> Modules linked in: dvb_usb_dib0700 dib0090 dib7000p dib7000m dib0070
> dib8000 dib3000mc dibx000_common microcode
> Pid: 31027, comm: kworker/u:35 Not tainted 3.8.0-rc7-next-20130218_64+ #1768
> Call Trace:
>  [] ? do_pci_disable_device+0x30/0x60
>  [] warn_slowpath_common+0x7f/0xc0
>  [] warn_slowpath_fmt+0x46/0x50
>  [] pci_disable_device+0x90/0xa0
>  [] __e1000_shutdown+0x262/0x8b0
>  [] e1000_suspend+0x23/0x50
>  [] ? wait_for_completion+0x31/0x100
>  [] pci_pm_suspend+0x77/0x140
>  [] ? __pm_runtime_barrier+0x1a/0x130
>  [] ? pci_pm_poweroff+0xf0/0xf0
>  [] dpm_run_callback+0x58/0x90
>  [] __device_suspend+0xeb/0x280
>  [] async_suspend+0x1f/0xa0
>  [] async_run_entry_fn+0x3b/0x140
>  [] process_one_work+0x174/0x410
>  [] worker_thread+0x116/0x400
>  [] ? busy_worker_rebind_fn+0xc0/0xc0
>  [] kthread+0xc0/0xd0
>  [] ? kthread_create_on_node+0x130/0x130
>  [] ret_from_fork+0x7c/0xb0
>  [] ? kthread_create_on_node+0x130/0x130
> ---[ end trace 6c5060f8b8fb9175 ]---
> e1000e :00:19.0: System wakeup enabled by ACPI

I think Konstantin posted an e1000e patch that would fix this, but
it's going via the e1000e folks.  Hopefully both my tree and the
e1000e patch will be merged soon.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci: do not try to assign irq 255

2013-02-19 Thread Bjorn Helgaas

On Mon, Feb 18, 2013 at 3:09 AM, Hannes Reinecke  wrote:
> The PCI config space reseves a byte for the interrupt line,
> so irq 255 actually refers to 'not set'.
> However, the 'irq' field for struct pci_dev is an integer,
> so the original meaning is lost, causing the system to
> assign an interrupt '255', which fails.
>
> So we should _not_ assign an interrupt value here, and
> allow upper layers to fixup things.
>
> This patch make PCI devices with MSI interrupts only
> (like the xhci device on certain HP laptops) work properly.
>
> Cc: Frederik Himpe 
> Cc: Oliver Neukum 
> Cc: David Haerdeman 
> Cc: linux-...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Hannes Reinecke 
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6186f03..a2db887f 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -923,7 +923,8 @@ static void pci_read_irq(struct pci_dev *dev)
> dev->pin = irq;
> if (irq)
> pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
> -   dev->irq = irq;
> +   if (irq < 255)
> +   dev->irq = irq;
>  }
>
>  void set_pcie_port_type(struct pci_dev *pdev)

Is there a bugzilla or other URL with more information (problem
description, hardware involved, dmesg logs, acpidump, etc)?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/4] pci: Add PCI_BUS() and PCI_DEVID() interfaces to return bus number and device id

2013-02-20 Thread Bjorn Helgaas

On Mon, Feb 11, 2013 at 4:00 PM, Shuah Khan  wrote:
> pci defines PCI_DEVFN(), PCI_SLOT(), and PCI_FUNC() interfaces, however,
> it doesn't have interfaces to return PCI bus and PCI device id. Drivers
> (AMD IOMMU, and AER) implement module specific definitions for PCI_BUS()
> and AMD_IOMMU driver also has a module specific interface to calculate PCI
> device id from bus number and devfn.
>
> Add PCI_BUS and PCI_DEVID interfaces to return PCI bus number and PCI device
> id respectively to avoid the need for duplicate definitions in other modules.
> AER driver code and AMD IOMMU driver define PCI_BUS. AMD IOMMU driver defines
> an interface to calculate device id from bus number, and devfn pair.
>
> Signed-off-by: Shuah Khan 
> ---
>  include/uapi/linux/pci.h |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index 3c292bc0..6b2c8b3 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -30,6 +30,10 @@
>  #define PCI_DEVFN(slot, func)  slot) & 0x1f) << 3) | ((func) & 0x07))
>  #define PCI_SLOT(devfn)(((devfn) >> 3) & 0x1f)
>  #define PCI_FUNC(devfn)((devfn) & 0x07)
> +#define PCI_DEVID(bus, devfn)  u16)bus) << 8) | devfn)
> +
> +/* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> +#define PCI_BUS(x) (((x) >> 8) & 0xff)
>
>  /* Ioctls for /proc/bus/pci/X/Y nodes. */
>  #define PCIIOC_BASE('P' << 24 | 'C' << 16 | 'I' << 8)

David, can you point me at a description of include/uapi ... what is
there and why, and how we should decide what new things go in
include/uapi/linux/pci.h as opposed to include/linux/pci.h?  Maybe
there should be something in Documentation/?

I'm guessing it's something to do with being exported to userland, but
I'm not sure the things in this patch (PCI_DEV_ID, PCI_BUS) are really
exportable in the sense of being used for syscalls, etc.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] PCI: fix system hang issue of Marvell SATA host controller

2013-03-12 Thread Bjorn Helgaas

On Tue, Mar 12, 2013 at 3:22 AM, Xiangliang Yu  wrote:
> Hi, Myron
>> > Now, the situation is like this:
>> > I captured the PCIE trace with analyzer and found that 1st BE is 0x
>> > when
>> > accessing IO port space. But 9125 spec has some limitation, and the BE
>> > must
>> > be
>> > 0x0100, to access the 2nd byte only. So, the chip will go to bad.
>>
>> Great, this is new, interesting, data.  Is the 9125 spec publicly
>> accessible and/or could you elaborate on the "some limitation"
>> comment?
> 9125 spec is publicly accessible.

Please provide a URL for the spec.

>> A byte enable (BE) of 0x suggests the CPU did a 32-bit I/O port
>> read.  Does the 9125 device only support one-byte I/O port accesses
>> and when presented with larger request types it doesn't respond
>> properly?
> Yes, the hardware engineer had confirmed the situation.

Please provide a URL for the erratum describing this 9125 issue.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-03-12 Thread Bjorn Helgaas

On Sat, Mar 9, 2013 at 2:20 AM, Chris Clayton  wrote:
> Hi Bjorn,
>
>
> On 03/08/13 22:57, Bjorn Helgaas wrote:
>>
>> [+cc Rafael, in case the _OSC thing rings a bell with him]
>>
>> On Fri, Mar 8, 2013 at 3:44 AM, Chris Clayton 
>> wrote:
>>>
>>> On 03/08/13 00:39, Bjorn Helgaas wrote:
>>>>
>>>> On Thu, Mar 7, 2013 at 1:21 PM, Chris Clayton 
>>>> wrote:
>>>>>
>>>>> On 03/07/13 17:30, Bjorn Helgaas wrote:

>>>> It looks like something's going wrong when we evaluate _OSC.  Can you
>>>> collect an acpidump from your machine?
>>>>
>>> A bziped file containing the output from acpidump is attached.
>>
>>
>> Thanks.  I opened this bug report:
>> https://bugzilla.kernel.org/show_bug.cgi?id=54981 to keep track of
>> your logs.
>>
>
> Excellent, thanks.
>
>
>> Here's your _OSC method from the acpidump:
>>
>>  Method (_OSC, 4, Serialized) {
>>  ...
>>  If (LAnd (LEqual (Arg0, GUID), NEXP))
>>  ... # normal case
>>  Else {
>>  Or (CDW1, 0x04, CDW1)  # "unrecognized UUID" error
>>  Return (Local0)
>>  }
>>
>> It fails with "unrecognized UUID" if either (1) we supply the wrong
>> UUID or (2) "NEXP" is false.  I have no idea what NEXP is; your
>> DSDT.dsl never sets it, so maybe it's related to a BIOS setup option
>> or something?  I found a BIOS manual [1] but didn't see anything
>> likely.  I guess it might be worth you looking, or maybe trying a
>> "reset to defaults" if it's not too destructive for you.  You don't
>
>
> As the manual showed, there aren't too many user-changeable settings in the
> BIOS on this machine, so I tried a "reset to defaults". Unfortunately, it
> made no difference.

Yeah, no surprise, I guess, since it works fine in Windows with the
current BIOS settings.

I looked at the DSDTs from several other machines and quite a few of
them use similar NEXP tests.  NEXP is in the GNVS area, which is
apparently used for communication between BIOS/ACPI/SMI, and I think
it means "Native PCIe support."  The BIOS probably initializes it,
possibly based on the machine type or something.

The _OSC description (ACPI 5.0 sec 6.2.10.3) is pretty clear that if a
host bridge doesn't have an _OSC method, the OS should not use
features like PCIe native hotplug.  That's why the Linux pciehp driver
isn't doing anything here.  It's possible that Windows is using PCIe
native hotplug anyway (maybe they figure "unrecognized UUID" is
different from _OSC being completely absent), or maybe there's some
way it could be using ACPI hotplug (though I *think* that would
require some more methods that your box doesn't suppy).

My guess is the former, and maybe we can figure out a way to relax
Linux's strictness around _OSC handling.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-03-15 Thread Bjorn Helgaas

On Tue, Mar 12, 2013 at 4:20 PM, Bjorn Helgaas  wrote:
> On Sat, Mar 9, 2013 at 2:20 AM, Chris Clayton  
> wrote:
>> On 03/08/13 22:57, Bjorn Helgaas wrote:

>>> Thanks.  I opened this bug report:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=54981 to keep track of
>>> your logs.

Hi Chris,

The current Linux acpiphp driver doesn't do anything unless it finds
devices with _EJ0 or _RMV methods, and your DSDT has neither.  But I
think that implementation is incorrect because I'm not convinced that
those methods are required in order to do hotplug via ACPI.  For
example, your DSDT *does* have an _L01 method that does notifications
to the root ports.  I suspect that hotplug events on your box generate
an SCI that invokes that method.  Linux basically ignores the
resulting notify events, and I suspect that hotplug works on Windows
because it is paying attention to them.

Can you build a kernel with CONFIG_ACPI_DEBUG=y, do the following, and
attach all the output to the bugzilla?

1) Boot with empty ExpressCard slot (without using "pcie_ports=native")
2) # echo 0x00010004 > /sys/module/acpi/parameters/debug_layer
3) # echo 0x0804 > /sys/module/acpi/parameters/debug_level
4) # lspci -vv
5) # setpci -s 1c.3 0x42.w
6) # setpci -s 1c.3 0x5a.w
7) # setpci -s 1c.3 0xd8.l
8) Insert ExpressCard
9) # setpci -s 1c.3 0x5a.w
10) # dmesg

Here's what I think we'll see:

- Slot Implemented (bit 8 of XCAP at 0x42) set, indicating a slot is
implemented below this root port
- Hot Plug SCI Enable (bit 30 of MPC at 0xd8) set, indicating that the
root port should generate an SCI whenever a hotplug event is detected
- Presence Detect State (bit 6 of SLTSTS at 0x5a) change from 0 with
the slot empty to 1 with the slot occupied
- pciehp doing nothing (since _OSC didn't grant the OS permission to
use PCIe native hotplug)
- dmesg indication of the SCI, leading to a Bus Check notification to
\_SB.PCI0.RP04, which is the 1c.3 root port leading to the ExpressCard
slot

A Bus Check notification means we're supposed to re-enumerate starting
at that device.  If we *did* re-enumerate, we would find the new
ExpressCard.

Thanks,
  Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] seq_file: Use seq_puts when seq_printf has only a format with no args

2013-03-16 Thread Bjorn Helgaas

On Sat, Mar 16, 2013 at 7:50 AM, Joe Perches  wrote:
> Instead of converting the 800 or so uses of seq_printf with
> a constant format (without a % substitution) to seq_puts,
> maybe there's another way to slightly speed up these outputs.
>
> Taking a similar approach to commit abd84d60eb
> ("tracing: Optimize trace_printk() with one arg to use trace_puts()")
> use the preprocessor to convert seq_printf(seq, "string constant")
> to seq_puts(seq, "string constant")
>
> By stringifying __VA_ARGS__, we can, at compile time, determine
> the number of args that are being passed to seq_printf() and
> call seq_puts or seq_printf appropriately.
>
> The actual function definition for seq_printf must now
> be enclosed in parenthesis to avoid further macro expansion.

This is certainly a neat trick.

But I don't really like the fact that it complicates things for every
future code reader, especially when a trivial change in the caller
would accomplish the same thing.  Do you have any idea how much
performance we would gain in exchange for the complication?

Checkpatch could look for additions of seq_printf() with constant formats.

> Signed-off-by: Joe Perches 
> ---
>  fs/seq_file.c|  7 ++-
>  include/linux/seq_file.h | 24 
>  2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 38bb59f..d3a957d 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -405,7 +405,12 @@ int seq_vprintf(struct seq_file *m, const char *f, 
> va_list args)
>  }
>  EXPORT_SYMBOL(seq_vprintf);
>
> -int seq_printf(struct seq_file *m, const char *f, ...)
> +/*
> + * seq_printf is also a macro that expands to seq_printf or seq_puts.
> + * This means that the actual function definition must be in parenthesis
> + * to prevent the preprocessor from expanding this function name again.
> + */
> +int (seq_printf)(struct seq_file *m, const char *f, ...)
>  {
> int ret;
> va_list args;
> diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
> index 68a04a3..7255f01 100644
> --- a/include/linux/seq_file.h
> +++ b/include/linux/seq_file.h
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct seq_operations;
>  struct file;
> @@ -92,6 +93,29 @@ int seq_write(struct seq_file *seq, const void *data, 
> size_t len);
>  __printf(2, 3) int seq_printf(struct seq_file *, const char *, ...);
>  __printf(2, 0) int seq_vprintf(struct seq_file *, const char *, va_list 
> args);
>
> +/*
> + * A little optimization trick is done here. If there's only one
> + * argument, there's no need to scan the string for printf formats.
> + * seq_puts() will suffice. But how can we take advantage of
> + * using seq_puts() when seq_printf() has only one argument?
> + * By stringifying the args and checking the size we can tell
> + * whether or not there are args. __stringify(__VA_ARGS__) will
> + * turn into "" with a size of 1 when there are no args, anything
> + * else will be bigger. All we need to do is define a string to this,
> + * and then take its size and compare to 1. If it's bigger, use
> + * seq_printf() otherwise, optimize it to seq_puts(). Then just
> + * let gcc optimize the rest.  The actual function definition of
> + * seq_printf must be (seq_printf) to prevent further macro expansion.
> + */
> +#define seq_printf(seq, fmt, ...)  \
> +do {   \
> +   char va_args[] = __stringify(__VA_ARGS__);  \
> +   if (sizeof(va_args) > 1)\
> +   seq_printf(seq, fmt, ##__VA_ARGS__);\
> +   else\
> +   seq_puts(seq, fmt); \
> +} while (0)
> +
>  int seq_path(struct seq_file *, const struct path *, const char *);
>  int seq_dentry(struct seq_file *, struct dentry *, const char *);
>  int seq_path_root(struct seq_file *m, const struct path *path,
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 6731 matches

Mail list logo