Re: [Linux-stm32] [PATCH v2 00/14] Introduce STM32MP1 RCC in secured mode

2021-03-11 Thread Alex G.
2-As you suggest, create a new "secure" dtb per boards (Not my wish for maintenance perspectives). I agree with Alex (G) that the "secure" option should be opt-in. That way existing setups remain working and no extra requirements are imposed on MP1 users. Esp. since as

Re: [PATCH v2 00/14] Introduce STM32MP1 RCC in secured mode

2021-03-09 Thread Alex G.
On 1/26/21 3:01 AM, gabriel.fernan...@foss.st.com wrote: From: Gabriel Fernandez Platform STM32MP1 can be used in configuration where some clocks and IP resets can relate as secure resources. These resources are moved from a RCC clock/reset handle to a SCMI clock/reset_domain handle. The RCC c

Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"

2021-02-02 Thread Alex G.
On 2/2/21 2:16 PM, Bjorn Helgaas wrote: On Tue, Feb 02, 2021 at 01:50:20PM -0600, Alex G. wrote: On 1/29/21 3:56 PM, Bjorn Helgaas wrote: On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote: On 1/28/21 5:51 PM, Sinan Kaya wrote: On 1/28/2021 6:39 PM, Bjorn Helgaas wrote: AFAICT, this

Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"

2021-02-02 Thread Alex G.
On 1/29/21 3:56 PM, Bjorn Helgaas wrote: On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote: On 1/28/21 5:51 PM, Sinan Kaya wrote: On 1/28/2021 6:39 PM, Bjorn Helgaas wrote: AFAICT, this thread petered out with no resolution. If the bandwidth change notifications are important to

Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"

2021-01-28 Thread Alex G.
On 1/28/21 5:51 PM, Sinan Kaya wrote: On 1/28/2021 6:39 PM, Bjorn Helgaas wrote: AFAICT, this thread petered out with no resolution. If the bandwidth change notifications are important to somebody, please speak up, preferably with a patch that makes the notifications disabled by default and add

Re: [PATCH v2 1/2] drm/bridge: sii902x: Enable I/O and core VCC supplies if present

2020-10-20 Thread Alex G.
On 10/20/20 2:16 AM, Sam Ravnborg wrote: Hi Alex. [snip] diff --git a/drivers/gpu/drm/bridge/sii902x.c b/drivers/gpu/drm/bridge/sii902x.c index 33fd33f953ec..d15e9f2c0d8a 100644 --- a/drivers/gpu/drm/bridge/sii902x.c +++ b/drivers/gpu/drm/bridge/sii902x.c @@ -17,6 +17,7 @@ #include

Re: [PATCH v2 1/2] drm/bridge: sii902x: Enable I/O and core VCC supplies if present

2020-10-19 Thread Alex G.
On 9/28/20 12:30 PM, Alexandru Gagniuc wrote: On the SII9022, the IOVCC and CVCC12 supplies must reach the correct voltage before the reset sequence is initiated. On most boards, this assumption is true at boot-up, so initialization succeeds. However, when we try to initialize the chip with inco

Re: [PATCH 1/2] drm/bridge: sii902x: Enable I/O and core VCC supplies if present

2020-09-28 Thread Alex G.
On 9/26/20 1:49 PM, Sam Ravnborg wrote: Hi Alexandru On Thu, Sep 24, 2020 at 03:05:05PM -0500, Alexandru Gagniuc wrote: On the SII9022, the IOVCC and CVCC12 supplies must reach the correct voltage before the reset sequence is initiated. On most boards, this assumption is true at boot-up, so ini

Re: [PATCH 4/5] PCI: only return true when dev io state is really changed

2020-09-25 Thread Alex G.
Hi Ethan, On 9/24/20 9:34 PM, Ethan Zhao wrote: When uncorrectable error happens, AER driver and DPC driver interrupt handlers likely call pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with pci_channel_io_frozen the same time. If pci_dev_set_io_state() return true even if

Re: [PATCH 1/2] drm/bridge: sii902x: Enable I/O and core VCC supplies if present

2020-09-24 Thread Alex G.
On 9/24/20 3:22 PM, Fabio Estevam wrote: Hi Fabio, On Thu, Sep 24, 2020 at 5:16 PM Alexandru Gagniuc wrote: + ret = regulator_enable(sii902x->cvcc12); + if (ret < 0) { + dev_err(dev, "Failed to enable cvcc12 supply: %d\n", ret); + regulator_disable(sii9
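The sii902x threads describe a power-up ordering constraint: IOVCC and CVCC12 must be up before the reset sequence starts, and a failure on the second supply should undo the first. Below is a user-space sketch of that unwind pattern. The fake_regulator type and its helpers are hypothetical stand-ins, not the kernel regulator API. The IOVCC-then-CVCC12 order is an assumption read off the quoted snippet.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a voltage regulator; not the kernel API. */
struct fake_regulator {
    const char *name;
    int enabled;      /* 1 while the supply is on */
    int fail_enable;  /* force an enable failure, for testing */
};

static int fake_regulator_enable(struct fake_regulator *reg)
{
    if (reg->fail_enable)
        return -EIO;
    reg->enabled = 1;
    return 0;
}

static void fake_regulator_disable(struct fake_regulator *reg)
{
    assert(reg->enabled); /* only disable what was enabled */
    reg->enabled = 0;
}

/* Bring up IOVCC, then CVCC12; switch IOVCC back off if CVCC12 fails. */
static int power_on_supplies(struct fake_regulator *iovcc,
                             struct fake_regulator *cvcc12)
{
    int ret;

    ret = fake_regulator_enable(iovcc);
    if (ret < 0) {
        fprintf(stderr, "Failed to enable %s supply: %d\n", iovcc->name, ret);
        return ret;
    }

    ret = fake_regulator_enable(cvcc12);
    if (ret < 0) {
        fprintf(stderr, "Failed to enable %s supply: %d\n", cvcc12->name, ret);
        fake_regulator_disable(iovcc);
        return ret;
    }

    /* Both supplies are up; only now would the chip reset sequence start. */
    return 0;
}
```

The point of the unwind is that a failed probe leaves no supply half-enabled, so a later retry starts from a known state.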

Re: [PATCH v3 3/3] PCI: pciehp: Add dmi table for in-band presence disabled

2019-10-21 Thread Alex G.
On 10/21/19 1:19 PM, Stuart Hayes wrote: On 10/21/19 8:47 AM, Mika Westerberg wrote: On Thu, Oct 17, 2019 at 03:32:56PM -0400, Stuart Hayes wrote: Some systems have in-band presence detection disabled for hot-plug PCI slots, but do not report this in the slot capabilities 2 (SLTCAP2) register

Re: [PATCH 0/3] PCI: pciehp: Do not turn off slot if presence comes up after link

2019-10-02 Thread Alex G.
On 10/1/19 11:13 PM, Lukas Wunner wrote: On Tue, Oct 01, 2019 at 05:14:16PM -0400, Stuart Hayes wrote: This patch set is based on a patch set [1] submitted many months ago by Alexandru Gagniuc, who is no longer working on it. [1] https://patchwork.kernel.org/cover/10909167/ [v3,0/4] PCI: p

Re: [PATCH 3/3] PCI: pciehp: Add dmi table for in-band presence disabled

2019-10-01 Thread Alex G.
On 10/1/19 4:14 PM, Stuart Hayes wrote: Some systems have in-band presence detection disabled for hot-plug PCI slots, but do not report this in the slot capabilities 2 (SLTCAP2) register. On these systems, presence detect can become active well after the link is reported to be active, which c

Re: [PATCH] Revert "PCI/LINK: Report degraded links via link bandwidth notification"

2019-04-29 Thread Alex G
On 4/29/19 1:56 PM, Bjorn Helgaas wrote: From: Bjorn Helgaas This reverts commit e8303bb7a75c113388badcc49b2a84b4121c1b3e. e8303bb7a75c added logging whenever a link changed speed or width to a state that is considered degraded. Unfortunately, it cannot differentiate signal integrity-related

Re: [PATCH] PCI: Add link_change error handler and vfio-pci user

2019-04-24 Thread Alex G
On 4/24/19 12:19 PM, Alex Williamson wrote: On Wed, 24 Apr 2019 16:45:45 + wrote: On 4/23/2019 5:42 PM, Alex Williamson wrote: diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 7e12d0163863..233cd4b5b6e8 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2403,6 +2403

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-23 Thread Alex G
On 4/22/19 5:43 PM, Alex Williamson wrote: On systems that don't support any PCIe services other than bandwidth notification, pcie_message_numbers() can return zero vectors, causing the vector reallocation in pcie_port_enable_irq_vec() to retry with zero, which fails, resulting in fallback to
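The zero-vector problem above comes from how the port driver sizes its MSI/MSI-X request: it takes the highest interrupt message number any enabled service needs, plus one. Below is a simplified model of that calculation, not the kernel source. The SVC_* bit values are made up. It assumes that bandwidth notification signals through the same Interrupt Message Number (PCIe Capabilities register, bits 13:9) as PME and hotplug. If SVC_BWNOTIF were left out of the first test, a port with only bandwidth notification enabled would get nvec == 0.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative service bits; the values are not the kernel's. */
#define SVC_PME     (1u << 0)
#define SVC_HP      (1u << 1)
#define SVC_AER     (1u << 2)
#define SVC_BWNOTIF (1u << 3)

/* Interrupt Message Number field of the PCIe Capabilities register. */
#define EXP_FLAGS_IRQ       0x3e00u
#define EXP_FLAGS_IRQ_SHIFT 9

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Number of vectors needed so every enabled service's message number is valid. */
static uint32_t message_numbers(uint32_t mask, uint16_t exp_flags,
                                uint32_t aer_msg_num)
{
    uint32_t nvec = 0;

    assert(aer_msg_num < 32); /* the AER message number is a 5-bit field */

    /* PME, hotplug and bandwidth notification share one message number. */
    if (mask & (SVC_PME | SVC_HP | SVC_BWNOTIF))
        nvec = ((exp_flags & EXP_FLAGS_IRQ) >> EXP_FLAGS_IRQ_SHIFT) + 1;

    if (mask & SVC_AER)
        nvec = max_u32(nvec, aer_msg_num + 1);

    return nvec;
}
```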

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-23 Thread Alex G
On 4/23/19 12:10 PM, Bjorn Helgaas wrote: On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote: On 4/22/19 7:33 PM, Alex Williamson wrote: There is nothing wrong happening here that needs to fill logs. I thought maybe if I enabled notification of autonomous bandwidth changes that it might

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-23 Thread Alex G
On 4/23/19 11:22 AM, Alex Williamson wrote: Nor should pci-core decide what link speed changes are intended or errors. Minimally we should be enabling drivers to receive this feedback. Thanks, Not errors. pci core reports that a link speed change event has occurred. Period. Alex

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-23 Thread Alex G
On 4/23/19 10:34 AM, Alex Williamson wrote: On Tue, 23 Apr 2019 09:33:53 -0500 Alex G wrote: On 4/22/19 7:33 PM, Alex Williamson wrote: On Mon, 22 Apr 2019 19:05:57 -0500 Alex G wrote: echo :07:00.0:pcie010 | sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind That's

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-23 Thread Alex G
On 4/22/19 7:33 PM, Alex Williamson wrote: On Mon, 22 Apr 2019 19:05:57 -0500 Alex G wrote: echo :07:00.0:pcie010 | sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind That's a bad solution for users, this is meaningless tracking of a device whose driver is act

Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation

2019-04-22 Thread Alex G
On 4/22/19 5:43 PM, Alex Williamson wrote: [ 329.725607] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link) [ 708.151488] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by
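The figures in that log follow from per-lane encoding overhead: 2.5 and 5 GT/s links use 8b/10b encoding, and 8 and 16 GT/s links use 128b/130b. Below is a sketch of the arithmetic. The integer-division per-lane values mirror how the kernel's available-bandwidth message is computed, as far as we know; the function names are our own.

```c
#include <assert.h>

/* Usable Mb/s per lane after line encoding (integer division). */
static unsigned int lane_mbps(unsigned int mt_per_s)
{
    switch (mt_per_s) {
    case 2500:  return 2500 * 8 / 10;      /* 8b/10b    -> 2000  */
    case 5000:  return 5000 * 8 / 10;      /* 8b/10b    -> 4000  */
    case 8000:  return 8000 * 128 / 130;   /* 128b/130b -> 7876  */
    case 16000: return 16000 * 128 / 130;  /* 128b/130b -> 15753 */
    default:    return 0;                  /* unknown speed */
    }
}

/* Total link bandwidth in Mb/s for a given speed and lane count. */
static unsigned int link_mbps(unsigned int mt_per_s, unsigned int width)
{
    assert(width >= 1 && width <= 32);
    return width * lane_mbps(mt_per_s);
}
```

With these values, link_mbps(2500, 16) is 32000 and link_mbps(5000, 16) is 64000. Printed as "%u.%03u", they give the "32.000 Gb/s" and "64.000 Gb/s" figures quoted above.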

Re: [PATCH v1 1/3] PCI / ACPI: Do not export pci_get_hp_params()

2019-04-22 Thread Alex G
On 4/22/19 3:58 PM, Bjorn Helgaas wrote: On Fri, Feb 08, 2019 at 10:24:11AM -0600, Alexandru Gagniuc wrote: This is only used within drivers/pci, and there is no reason to make it available outside of the PCI core. Signed-off-by: Alexandru Gagniuc Applied the whole series to pci/hotplug for

Fixing the GHES driver vs not causing issues in the first place

2019-03-29 Thread Alex G.
The issue of dying inside the GHES driver has popped up a few times before. I've looked into fixing this before, but we didn't quite come to agreement because the verbiage in the ACPI spec is vague:     " When a fatal uncorrected error occurs, the system is       restarted to prevent propagation o

Re: [PATCH v2] PCI/LINK: bw_notification: Do not leave interrupt handler NULL

2019-03-25 Thread Alex G.
On 3/25/19 5:25 PM, Bjorn Helgaas wrote: On Fri, Mar 22, 2019 at 07:36:51PM -0500, Alexandru Gagniuc wrote: A threaded IRQ with a NULL handler does not work with level-triggered interrupts. request_threaded_irq() will return an error: genirq: Threaded irq requested with handler=NULL and !ONE
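The genirq error quoted above enforces a specific rule. With a NULL primary handler, the core substitutes a default one that only wakes the thread. Without ONESHOT, the line would be unmasked before the thread runs, so a level-triggered source would fire again immediately. The request is therefore refused unless the irq chip is marked oneshot-safe. Below is a simplified user-space model of that rule, not the kernel code. MODEL_IRQF_ONESHOT and the example handlers are hypothetical. The fix that was eventually settled on avoids the issue by supplying a non-NULL primary handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQF_ONESHOT 0x1u /* stands in for IRQF_ONESHOT */

typedef int (*irq_fn)(int irq, void *data);

/* Hypothetical handlers: a quick primary check and the threaded part. */
static int example_primary(int irq, void *data) { (void)irq; (void)data; return 1; }
static int example_thread(int irq, void *data)  { (void)irq; (void)data; return 0; }

static int model_request_threaded_irq(irq_fn handler, irq_fn thread_fn,
                                      unsigned int flags, bool chip_oneshot_safe)
{
    if (handler == NULL && thread_fn == NULL)
        return -EINVAL; /* nothing to run at all */

    /* "Threaded irq requested with handler=NULL and !ONESHOT" */
    if (handler == NULL && !(flags & MODEL_IRQF_ONESHOT) && !chip_oneshot_safe)
        return -EINVAL;

    return 0;
}
```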

Re: [PATCH] PCI/LINK: Request a one-shot IRQ with NULL handler

2019-03-25 Thread Alex G.
Hi Borislav, Thanks for the update. We've settled on a different fix [1], since Lukas was not happy with IRQF_ONESHOT [2]. Alex [1] https://lore.kernel.org/linux-pci/20190323003700.7294-1-mr.nuke...@gmail.com/ [2] https://lore.kernel.org/linux-pci/20190318043314.noyj6t4sh26sp...@wunner.de/

Re: [PATCH v3] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2019-03-20 Thread Alex G
On 3/20/19 4:44 PM, Linus Torvalds wrote: On Wed, Mar 20, 2019 at 1:52 PM Bjorn Helgaas wrote: AFAICT, the consensus there was that it would be better to find some sort of platform solution instead of dealing with it in individual drivers. The PCI core isn't really a driver, but I think the s

Re: [PATCH] PCI/LINK: bw_notification: Do not leave interrupt handler NULL

2019-03-20 Thread Alex G.
On 3/20/19 8:46 AM, Bjorn Helgaas wrote: Hi Alexandru, On Mon, Mar 18, 2019 at 08:12:04PM -0500, Alexandru Gagniuc wrote: A threaded IRQ with a NULL handler does not work with level-triggered interrupts. request_threaded_irq() will return an error: genirq: Threaded irq requested with handle

Re: [GIT PULL] PCI changes for v5.1

2019-03-17 Thread Alex G
On 3/17/19 4:18 PM, Linus Torvalds wrote: On Fri, Mar 8, 2019 at 9:31 AM Bjorn Helgaas wrote: - Report PCIe links that become degraded at run-time (Alexandru Gagniuc) Gaah. Only now as I'm about to do the rc1 release am I looking at new runtime warnings, and noticing that this causes

Re: [PATCH v2] PCI: pciehp: Report degraded links via link bandwidth notification

2018-12-27 Thread Alex G.
On 12/7/18 12:20 PM, Alexandru Gagniuc wrote: A warning is generated when a PCIe device is probed with a degraded link, but there was no similar mechanism to warn when the link becomes degraded after probing. The Link Bandwidth Notification provides this mechanism. Use the link bandwidth notific

Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-11-05 Thread Alex G.
ping On 09/18/2018 05:15 PM, Alexandru Gagniuc wrote: When a PCI device is gone, we don't want to send IO to it if we can avoid it. We expose functionality via the irq_chip structure. As users of that structure may not know about the underlying PCI device, it's our responsibility to guard agains
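The idea in the MSI patch is to avoid sending IO to a device that is already gone. Below is a minimal model of that guard. struct model_dev and its fields are hypothetical, not struct pci_dev. Keeping the cached mask up to date even when the hardware write is skipped is our own design choice for the sketch. We do not claim it is what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; not struct pci_dev. */
struct model_dev {
    bool disconnected;        /* set once surprise removal is detected */
    uint32_t msi_mask;        /* cached copy of the MSI mask bits */
    unsigned int cfg_writes;  /* config writes actually issued */
};

static void model_cfg_write(struct model_dev *dev, uint32_t val)
{
    assert(!dev->disconnected); /* never issue IO to a removed device */
    dev->cfg_writes++;
    (void)val;
}

/* Update the cached mask, but skip the hardware access if the device is gone. */
static void model_msi_mask(struct model_dev *dev, uint32_t clear, uint32_t set)
{
    dev->msi_mask = (dev->msi_mask & ~clear) | set;
    if (dev->disconnected)
        return;
    model_cfg_write(dev, dev->msi_mask);
}
```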

Re: [PATCH] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

2018-08-29 Thread Alex G.
Should I resubmit this rebased on 4.19-rc*, or just leave this patch as is? Alex On 07/30/2018 04:21 PM, Alexandru Gagniuc wrote: When a PCI device is gone, we don't want to send IO to it if we can avoid it. We expose functionality via the irq_chip structure. As users of that structure may not

Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER

2018-08-09 Thread Alex G.
On 08/09/2018 02:18 PM, Bjorn Helgaas wrote: On Thu, Aug 09, 2018 at 02:00:23PM -0500, Alex G. wrote: On 08/09/2018 01:29 PM, Bjorn Helgaas wrote: On Thu, Aug 09, 2018 at 04:46:32PM +, alex_gagn...@dellteam.com wrote: On 08/09/2018 09:16 AM, Bjorn Helgaas wrote: (snip)

Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER

2018-08-09 Thread Alex G.
On 08/09/2018 01:29 PM, Bjorn Helgaas wrote: On Thu, Aug 09, 2018 at 04:46:32PM +, alex_gagn...@dellteam.com wrote: On 08/09/2018 09:16 AM, Bjorn Helgaas wrote: (snip) enable_ecrc_checking() disable_ecrc_checking() I don't immediately see how this would affect FFS, but the bit

Re: [PATCH v3] PCI/AER: Do not clear AER bits if we don't own AER

2018-08-07 Thread Alex G.
On 08/07/2018 08:14 PM, Bjorn Helgaas wrote: On Mon, Jul 30, 2018 at 06:35:31PM -0500, Alexandru Gagniuc wrote: When we don't own AER, we shouldn't touch the AER error bits. Clearing error bits willy-nilly might cause firmware to miss some errors. In theory, these bits get cleared by FFS, or

Re: [PATCH v5] PCI: Check for PCIe downtraining conditions

2018-07-31 Thread Alex G.
On 07/31/2018 01:40 AM, Tal Gilboa wrote: [snip] @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)   /* Advanced Error Reporting */   pci_aer_init(dev); +    /* Check link and detect downtrain errors */ +    pcie_check_upstream_link(dev); This is called for e

Re: [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER

2018-07-24 Thread Alex G.
On 07/23/2018 11:52 AM, Alexandru Gagniuc wrote: When we don't own AER, we shouldn't touch the AER error bits. Clearing error bits willy-nilly might cause firmware to miss some errors. In theory, these bits get cleared by FFS, or via ACPI _HPX method. These mechanisms are not subject to the pr

Re: [PATCH v5] PCI: Check for PCIe downtraining conditions

2018-07-23 Thread Alex G.
On 07/23/2018 05:14 PM, Jakub Kicinski wrote: On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote: On 7/24/2018 12:01 AM, Jakub Kicinski wrote: On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote: PCIe downtraining happens when both the device and PCIe port are capable of a larger
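In this thread, downtraining means that both the device and its upstream port support a faster or wider link than the one that was negotiated. Below is a sketch of that comparison on raw register values. The field masks are the Link Capabilities and Link Status layouts from the PCIe spec. We assume that the encoded speed values increase with link speed, as they do for 2.5 through 16 GT/s. The function itself is hypothetical. It does not handle the function-0 and virtual-function questions raised in the v2 review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Capabilities / Link Status field masks. */
#define LNKCAP_SLS  0x0000000fu /* Max Link Speed */
#define LNKCAP_MLW  0x000003f0u /* Maximum Link Width */
#define LNKSTA_CLS  0x000fu     /* Current Link Speed */
#define LNKSTA_NLW  0x03f0u     /* Negotiated Link Width */
#define WIDTH_SHIFT 4

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

/* Downtrained if the link runs below what BOTH ends are capable of. */
static bool link_downtrained(uint32_t dev_lnkcap, uint32_t port_lnkcap,
                             uint16_t lnksta)
{
    unsigned int cap_speed = min_u(dev_lnkcap & LNKCAP_SLS,
                                   port_lnkcap & LNKCAP_SLS);
    unsigned int cap_width = min_u((dev_lnkcap & LNKCAP_MLW) >> WIDTH_SHIFT,
                                   (port_lnkcap & LNKCAP_MLW) >> WIDTH_SHIFT);
    unsigned int cur_speed = lnksta & LNKSTA_CLS;
    unsigned int cur_width = (lnksta & LNKSTA_NLW) >> WIDTH_SHIFT;

    assert(cur_width != 0); /* only meaningful while the link is up */
    return cur_speed < cap_speed || cur_width < cap_width;
}
```

For example, an x16 Gen3 device (LNKCAP 0x103) behind an x16 Gen3 port that trains at x8 (LNKSTA 0x83) is reported as downtrained. The same device behind an x8 port is not, because the port is the limit.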

Re: [PATCH v3] PCI: Check for PCIe downtraining conditions

2018-07-23 Thread Alex G.
On 07/23/2018 12:21 AM, Tal Gilboa wrote: On 7/19/2018 6:49 PM, Alex G. wrote: On 07/18/2018 08:38 AM, Tal Gilboa wrote: On 7/16/2018 5:17 PM, Bjorn Helgaas wrote: [+cc maintainers of drivers that already use pcie_print_link_status() and GPU folks] [snip] +    /* Multi-function PCIe

Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER

2018-07-19 Thread Alex G.
On 07/19/2018 11:58 AM, Sinan Kaya wrote: On 7/19/2018 8:55 AM, Alex G. wrote: I find the intent clearer if we check it here rather than having to do the mental parsing of the state of aer_cap. I don't feel too strong about my comment to be honest. This was a style/maintenance co

Re: [PATCH] PCI/AER: Do not clear AER bits if we don't own AER

2018-07-19 Thread Alex G.
On 07/17/2018 10:41 AM, Sinan Kaya wrote: On 7/17/2018 8:31 AM, Alexandru Gagniuc wrote: +    if (pcie_aer_get_firmware_first(dev)) +    return -EIO; Can you move this to closer to the caller pci_aer_init()? I could move it there. although pci_cleanup_aer_error_status_regs() is call

Re: [PATCH v3] PCI: Check for PCIe downtraining conditions

2018-07-19 Thread Alex G.
On 07/18/2018 08:38 AM, Tal Gilboa wrote: On 7/16/2018 5:17 PM, Bjorn Helgaas wrote: [+cc maintainers of drivers that already use pcie_print_link_status() and GPU folks] [snip] +    /* Multi-function PCIe share the same link/status. */ +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtf

Re: [PATCH v3] PCI: Check for PCIe downtraining conditions

2018-07-19 Thread Alex G.
On 07/18/2018 04:53 PM, Bjorn Helgaas wrote: [+cc Mike (hfi1)] On Mon, Jul 16, 2018 at 10:28:35PM +, alex_gagn...@dellteam.com wrote: On 7/16/2018 4:17 PM, Bjorn Helgaas wrote: ... The easiest way to detect this is with pcie_print_link_status(), since the bottleneck is usually the link

Re: [PATCH v3] PCI/AER: Fix aerdrv loading with "pcie_ports=native" parameter

2018-07-03 Thread Alex G.
On 07/03/2018 11:38 AM, Bjorn Helgaas wrote: > On Mon, Jul 02, 2018 at 11:16:01AM -0500, Alexandru Gagniuc wrote: >> According to the documentation, "pcie_ports=native", linux should use >> native AER and DPC services. While that is true for the _OSC method >> parsing, this is not the only place

Re: [PATCH v2] PCI/AER: Fix aerdrv loading with "pcie_ports=native" parameter

2018-07-02 Thread Alex G.
On 07/02/2018 08:16 AM, Bjorn Helgaas wrote: > On Sat, Jun 30, 2018 at 11:39:00PM -0500, Alex G wrote: >> On 06/30/2018 04:31 PM, Bjorn Helgaas wrote: >>> [+cc Borislav, linux-acpi, since this involves APEI/HEST] >> >> Borislav is not the relevant maintainer here,

Re: [PATCH v2] PCI/AER: Fix aerdrv loading with "pcie_ports=native" parameter

2018-06-30 Thread Alex G
On 06/30/2018 04:31 PM, Bjorn Helgaas wrote: [+cc Borislav, linux-acpi, since this involves APEI/HEST] Borislav is not the relevant maintainer here, since we're not contingent on APEI handling. I think Keith has a lot more experience with this part of the kernel. On Tue, Jun 19, 2018 at 02

Re: [PATCH] PCI: DPC: Clear AER status bits before disabling port containment

2018-06-26 Thread Alex G.
On 06/19/2018 04:57 PM, Bjorn Helgaas wrote: > On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote: >> On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote: >>> On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote: AER status bits are sticky, and they survive system resets. Downstrea

Re: [PATCH v2] PCI: Check for PCIe downtraining conditions

2018-06-01 Thread Alex G.
On 06/01/2018 10:10 AM, Sinan Kaya wrote: > On 6/1/2018 11:06 AM, Alex G. wrote: >> On 06/01/2018 10:03 AM, Sinan Kaya wrote: >>> On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote: >>>> + /* Multi-function PCIe share the same link/status. */ >>

Re: [PATCH v2] PCI: Check for PCIe downtraining conditions

2018-06-01 Thread Alex G.
On 06/01/2018 10:12 AM, Andy Shevchenko wrote: > On Fri, Jun 1, 2018 at 6:01 PM, Alexandru Gagniuc > wrote: >> PCIe downtraining happens when both the device and PCIe port are >> capable of a larger bus width or higher speed than negotiated. >> Downtraining might be indicative of other problems i

Re: [PATCH v2] PCI: Check for PCIe downtraining conditions

2018-06-01 Thread Alex G.
On 06/01/2018 10:03 AM, Sinan Kaya wrote: > On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote: >> +/* Multi-function PCIe share the same link/status. */ >> +if (PCI_FUNC(dev->devfn) != 0) >> +return; > > How about virtual functions? I have almost no clue about those. Is your conce

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 12:27 PM, Alex G. wrote: > On 05/31/2018 12:11 PM, Sinan Kaya wrote: >> On 5/31/2018 12:49 PM, Alex G. wrote: >>>>bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap); >>>>bw_avail = pcie_bandwidth_available(dev, &

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 10:30 AM, Sinan Kaya wrote: > On 5/31/2018 11:05 AM, Alexandru Gagniuc wrote: >> +if (dev_cur_speed < max_link_speed) >> +pci_warn(dev, "PCIe downtrain: link speed is %s (%s capable)", >> + pcie_bus_speed_name(dev_cur_speed), >> +

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 12:11 PM, Sinan Kaya wrote: > On 5/31/2018 12:49 PM, Alex G. wrote: >>> bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap); >>> bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width, >>> *parent

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 11:49 AM, Alex G. wrote: > > > On 05/31/2018 11:13 AM, Sinan Kaya wrote: >> On 5/31/2018 12:01 PM, Alex G. wrote: >>>> PCI: Add pcie_print_link_status() to log link speed and whether it's >>>> limited >>> This one

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 11:13 AM, Sinan Kaya wrote: > On 5/31/2018 12:01 PM, Alex G. wrote: >>> PCI: Add pcie_print_link_status() to log link speed and whether it's >>> limited >> This one, I have, but it's not what I need. This looks at the available >

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 10:54 AM, Sinan Kaya wrote: > On 5/31/2018 11:46 AM, Alex G. wrote: >>> https://lkml.org/lkml/2018/3/30/553 >> Oh, pcie_get_speed_cap()/pcie_get_width_cap() seems to handle the >> capability. Not seeing one for status and speed name. >> >>>

Re: [PATCH] PCI: Check for PCIe downtraining conditions

2018-05-31 Thread Alex G.
On 05/31/2018 10:38 AM, Sinan Kaya wrote: > On 5/31/2018 11:29 AM, alex_gagn...@dellteam.com wrote: >> On 5/31/2018 10:28 AM, Sinan Kaya wrote: >>> On 5/31/2018 11:05 AM, Alexandru Gagniuc wrote: +static void pcie_max_link_cap(struct pci_dev *dev, enum pci_bus_speed *speed, +

Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices

2018-05-23 Thread Alex G.
On 05/23/2018 09:32 AM, Jes Sorensen wrote: > On 05/23/2018 10:26 AM, Matthew Wilcox wrote: >> On Wed, May 23, 2018 at 10:20:10AM -0400, Jes Sorensen wrote: +++ b/drivers/pci/pcie/aer/aerdrv_stats.c @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-2.0 >>> >>> Fix the formatting plea

Re: [PATCH 1/5] PCI/AER: Define and allocate aer_stats structure for AER capable devices

2018-05-23 Thread Alex G.
On 05/23/2018 09:20 AM, Jes Sorensen wrote: > On 05/22/2018 06:28 PM, Rajat Jain wrote: >> new file mode 100644 >> index ..b9f251992209 >> --- /dev/null >> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c >> @@ -0,0 +1,64 @@ >> +// SPDX-License-Identifier: GPL-2.0 > > Fix the formatting pleas

Re: [PATCH 5/5] Documentation/PCI: Add details of PCI AER statistics

2018-05-22 Thread Alex G.
On 05/22/2018 05:28 PM, Rajat Jain wrote: > Add the PCI AER statistics details to > Documentation/PCI/pcieaer-howto.txt > > Signed-off-by: Rajat Jain > --- > Documentation/PCI/pcieaer-howto.txt | 35 + > 1 file changed, 35 insertions(+) > > diff --git a/Documentation

Re: [PATCH 2/5] PCI/AER: Add sysfs stats for AER capable devices

2018-05-22 Thread Alex G.
On 05/22/2018 05:28 PM, Rajat Jain wrote: > Add the following AER sysfs stats to represent the counters for each > kind of error as seen by the device: > > dev_total_cor_errs > dev_total_fatal_errs > dev_total_nonfatal_errs > > Signed-off-by: Rajat Jain > --- > drivers/pci/pci-sysfs.c

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 01:45 PM, Luck, Tony wrote: > On Tue, May 22, 2018 at 01:19:34PM -0500, Alex G. wrote: >> Firmware started passing "fatal" GHES headers with the explicit intent of >> crashing an OS. At the same time, we've learnt how to handle these errors in >>

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 01:13 PM, Rafael J. Wysocki wrote: (snip) Of course, you are free to have a differing opinion and I don't have to convince you about my point. You need to convince me about your point to get the patch in through my tree, which you haven't done so far. My point is that crossing yo

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 01:10 PM, Rafael J. Wysocki wrote: On Tue, May 22, 2018 at 7:57 PM, Luck, Tony wrote: On Tue, May 22, 2018 at 04:54:26PM +0200, Borislav Petkov wrote: I especially don't want to have the case where a PCIe error is *really* fatal and then we noodle in some handlers debating about

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 12:57 PM, Luck, Tony wrote: On Tue, May 22, 2018 at 04:54:26PM +0200, Borislav Petkov wrote: I especially don't want to have the case where a PCIe error is *really* fatal and then we noodle in some handlers debating about the severity because it got marked as recoverable intermitte

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 09:54 AM, Borislav Petkov wrote: > On Tue, May 22, 2018 at 09:39:15AM -0500, Alex G. wrote: >> No, the problem is with the current approach, not with mine. The problem >> is trying to handle the error outside of the existing handler. That's a >> no-no,

Re: [PATCH v6 2/2] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-22 Thread Alex G.
On 05/22/2018 10:15 AM, Tyler Baicar wrote: > On 5/22/2018 10:32 AM, Alex G. wrote: >> I think the biggest problem is having a policy to panic on "fatal" >> errors, instead of letting the error handler make that decision. I'd >> much rather kill that stupid po

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 08:50 AM, Borislav Petkov wrote: > On Tue, May 22, 2018 at 08:38:39AM -0500, Alex G. wrote: >>> It looks like the *real* reason for this change is that you >>> re-introduce ghes_severity() as a different function in the second >>> patch. >> >

Re: [PATCH v6 2/2] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-22 Thread Alex G.
On 05/22/2018 04:02 AM, Rafael J. Wysocki wrote: > On Mon, May 21, 2018 at 3:49 PM, Alexandru Gagniuc > wrote: >> The policy was to panic() when GHES said that an error is "Fatal". >> This logic is wrong for several reasons, as it doesn't account for the >> cause of the error. >> >> PCIe fatal er

Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-22 Thread Alex G.
On 05/22/2018 03:55 AM, Rafael J. Wysocki wrote: > On Mon, May 21, 2018 at 3:49 PM, Alexandru Gagniuc > wrote: >> ghes_severity() is a misnomer in this case, as it implies the severity >> of the entire GHES structure. Instead, it maps one CPER value to a >> GHES_SEV* value. ghes_cper_severity()
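The rename argument hinges on what the function does. It maps one CPER severity to a Linux-internal GHES_SEV_* value, and those values are ordered so that larger means more severe. Below is a sketch of that mapping, using the constant values from include/linux/cper.h and include/acpi/ghes.h. The ghes_worst() helper is our own illustration of why the ordering matters. Raw CPER codes are not ordered by severity: CORRECTED (2) is numerically larger than RECOVERABLE (0).

```c
#include <assert.h>

/* CPER severities (UEFI values, as in include/linux/cper.h). */
#define CPER_SEV_RECOVERABLE   0x0
#define CPER_SEV_FATAL         0x1
#define CPER_SEV_CORRECTED     0x2
#define CPER_SEV_INFORMATIONAL 0x3

/* Linux-internal ordering: a larger value means more severe. */
enum {
    GHES_SEV_NO          = 0x0,
    GHES_SEV_CORRECTED   = 0x1,
    GHES_SEV_RECOVERABLE = 0x2,
    GHES_SEV_PANIC       = 0x3,
};

static int ghes_cper_severity(int cper_sev)
{
    switch (cper_sev) {
    case CPER_SEV_INFORMATIONAL: return GHES_SEV_NO;
    case CPER_SEV_CORRECTED:     return GHES_SEV_CORRECTED;
    case CPER_SEV_RECOVERABLE:   return GHES_SEV_RECOVERABLE;
    case CPER_SEV_FATAL:         return GHES_SEV_PANIC;
    default:                     return GHES_SEV_PANIC; /* unknown: assume worst */
    }
}

/* Hypothetical helper: the worst mapped severity over several sections. */
static int ghes_worst(const int *cper_sevs, int n)
{
    int worst = GHES_SEV_NO;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int sev = ghes_cper_severity(cper_sevs[i]);
        if (sev > worst)
            worst = sev;
    }
    return worst;
}
```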

Re: [PATCH v6 2/2] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-21 Thread Alex G.
On 05/21/2018 09:27 AM, Tyler Baicar wrote: > On 5/21/2018 9:49 AM, Alexandru Gagniuc wrote: (snip) > Hello Alex, > > There is a compile warning if CONFIG_HAVE_ACPI_APEI_NMI is not selected. > >   CC  drivers/acpi/apei/ghes.o > drivers/acpi/apei/ghes.c:483:12: warning: ‘ghes_severity’ defin

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 12:41 PM, Borislav Petkov wrote: > On Fri, May 11, 2018 at 12:01:52PM -0500, Alex G. wrote: >> I understand your concern with unhandled AER errors evolving into MCE's. >> That's extremely rare, but when it happens you still panic due to the >> MCE. >

Re: [RFC PATCH v4 2/3] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-11 Thread Alex G.
On 05/11/2018 11:19 AM, Borislav Petkov wrote: > On Fri, May 11, 2018 at 11:12:24AM -0500, Alex G. wrote: >> Because the GHES structure uses CPER values, but all the code is written >> to use GHES_SEV_ values. GHES_SEV_ is a made up enum, specifically for >> linux. >

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 11:29 AM, Borislav Petkov wrote: > On Fri, May 11, 2018 at 11:12:25AM -0500, Alex G. wrote: >>> I think *you* didn't get it: IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER) is not >>> enough of a check to confirm that there actually *is* an AER driver to >>>

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 11:02 AM, Borislav Petkov wrote: > On Fri, May 11, 2018 at 10:54:09AM -0500, Alex G. wrote: >> That being clarified, should I replace "crackmonkey" with "broken" in >> the commit message? > > Keep your opinion *outside* of commit messag

Re: [RFC PATCH v4 2/3] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-11 Thread Alex G.
On 05/11/2018 10:58 AM, Borislav Petkov wrote: > On Fri, May 11, 2018 at 10:45:49AM -0500, Alex G. wrote: >> >> >> On 05/11/2018 10:39 AM, Borislav Petkov wrote: >>> On Mon, Apr 30, 2018 at 04:33:51PM -0500, Alexandru Gagniuc wrote: >>>> ghes_severity()

Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-05-11 Thread Alex G.
On 05/11/2018 10:40 AM, Borislav Petkov wrote: > On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote: >> The policy was to panic() when GHES said that an error is "Fatal". >> This logic is wrong for several reasons, as it doesn't take into >> account what caused the error. >> >> PCI

Re: [RFC PATCH v4 2/3] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-11 Thread Alex G.
On 05/11/2018 10:39 AM, Borislav Petkov wrote: > On Mon, Apr 30, 2018 at 04:33:51PM -0500, Alexandru Gagniuc wrote: >> ghes_severity() is a misnomer in this case, as it implies the severity >> of the entire GHES structure. Instead, it maps one CPER value to a >> monotonically increasing number. >

Re: [PATCH] nvme-pci: Avoid use of goto in nvme_reset_work()

2018-05-10 Thread Alex G.
On 05/10/2018 12:00 PM, Keith Busch wrote: > On Thu, May 10, 2018 at 11:46:33AM -0500, Alexandru Gagniuc wrote: >> This patch started as a challenge from Keith relating to code >> structuring with goto vs return. I think the final result improves >> readability on two counts: >> First, it clarifi

Re: [RFC PATCH v4 2/3] acpi: apei: Rename ghes_severity() to ghes_cper_severity()

2018-05-04 Thread Alex G.
On 05/04/2018 06:56 AM, Shiju Jose wrote: Hi Alex, Hi -Original Message- From: Alexandru Gagniuc [mailto:mr.nuke...@gmail.com] [snip] -static inline int ghes_severity(int severity) +static inline int ghes_cper_severity(int severity) [...] else ratelimi

Re: [RFC PATCH v3 3/3] acpi: apei: Warn when GHES marks correctable errors as "fatal"

2018-05-02 Thread Alex G.
On 05/02/2018 02:10 PM, Pavel Machek wrote: > On Thu 2018-04-26 13:20:57, Borislav Petkov wrote: >> On Wed, Apr 25, 2018 at 03:39:51PM -0500, Alexandru Gagniuc wrote: >>> There seems to be a culture amongst BIOS teams to want to crash the >>> OS when an error can't be handled in firmware. Marking G

Re: [PATCH RESEND] PCI/AER: Use a common function to print AER error bits

2018-04-30 Thread Alex G.
On 04/30/2018 12:15 PM, Bjorn Helgaas wrote: > On Sat, Apr 28, 2018 at 12:07:48PM -0500, Alex G. wrote: (snip) >> I could update the offending line to say: >> + info.first_error = PCI_ERR_CAP_FEP(aer->cap_control); > > That's what I would have expected. So I
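The line under discussion reads the First Error Pointer, which is the low five bits of the AER Capabilities and Control register. That value is a bit index into the uncorrectable error status register. Below is a small sketch of extracting it and checking it against a status value. The macro matches the PCI_ERR_CAP_FEP() definition as we understand it. first_error_is_set() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* First Error Pointer: bits 4:0 of AER Capabilities and Control. */
#define PCI_ERR_CAP_FEP(x) ((x) & 0x1f)

/* Returns 1 if the first-error pointer names a bit that is set in status. */
static int first_error_is_set(uint32_t cap_control, uint32_t status)
{
    uint32_t fep = PCI_ERR_CAP_FEP(cap_control);

    assert(fep < 32);
    return (int)((status >> fep) & 1u);
}
```

For example, a cap_control of 0x1f4 yields a pointer of 20, which is the Unsupported Request bit in the uncorrectable status register.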

Re: [PATCH RESEND] PCI/AER: Use a common function to print AER error bits

2018-04-28 Thread Alex G.
On 04/28/2018 11:46 AM, Alex G. wrote: On 04/27/2018 05:43 PM, Bjorn Helgaas wrote: On Tue, Apr 17, 2018 at 12:09:43PM -0500, Alexandru Gagniuc wrote: (snip) +    memset(&info, 0, sizeof(info)); +    info.severity = aer_severity; +    info.status = status; +    info.mask =

Re: [PATCH RESEND] PCI/AER: Use a common function to print AER error bits

2018-04-28 Thread Alex G.
On 04/27/2018 05:43 PM, Bjorn Helgaas wrote: On Tue, Apr 17, 2018 at 12:09:43PM -0500, Alexandru Gagniuc wrote: On errors reported from CPER, cper_print_bits() was used to log the AER bits. This resulted in hard-to-understand messages, without a prefix. Instead use __aer_print_error() for both n

Re: [RFC PATCH v3 3/3] acpi: apei: Warn when GHES marks correctable errors as "fatal"

2018-04-26 Thread Alex G.
On 04/26/2018 06:20 AM, Borislav Petkov wrote: Pasting the same comment from last time since you missed it: "No, I don't want any of that crap issuing stuff in dmesg and then people opening bugs and running around and trying to replace hardware. We either can handle the error and log a normal r

Re: [RFC PATCH v3 2/3] acpi: apei: Do not panic() on PCIe errors reported through GHES

2018-04-26 Thread Alex G.
Hi Borislav, On 04/26/2018 06:19 AM, Borislav Petkov wrote: On Wed, Apr 25, 2018 at 03:39:50PM -0500, Alexandru Gagniuc wrote: @@ -932,7 +971,7 @@ static void __process_error(struct ghes *ghes) static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs) { struct ghes *ghes; -

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-25 Thread Alex G.
On 04/25/2018 12:15 PM, Borislav Petkov wrote: > On Wed, Apr 25, 2018 at 10:00:53AM -0500, Alex G. wrote: >> Firmware-first. > > Ok, my guess was right. > >> We could probably use more of the native AER print functions, but that's >> beyond the scope of th

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-25 Thread Alex G.
On 04/25/2018 09:01 AM, Borislav Petkov wrote: > On Mon, Apr 23, 2018 at 11:19:25PM -0500, Alex G. wrote: >> That tells you what FFS said about the error. > > I betcha those status and command values have a human-readable counterparts. > > Btw, what do you abbreviate

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-23 Thread Alex G.
On 04/22/2018 05:48 AM, Borislav Petkov wrote: On Thu, Apr 19, 2018 at 05:55:08PM -0500, Alex G. wrote: How does such an error look like, in detail? It's green on the soft side, with lots of red accents, as well as some textured white shades: [ 51.414616] pciehp :b0:06.0:pc

Re: [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages

2018-04-20 Thread Alex G.
On 04/20/2018 02:27 AM, James Morse wrote: > Hi Alex, > > On 04/16/2018 10:59 PM, Alex G. wrote: >> On 04/13/2018 11:38 AM, James Morse wrote: >>> This assumes a cache-invalidate will clear the error, which I don't > think we're >>> guaranteed

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-19 Thread Alex G.
On 04/19/2018 02:03 PM, Borislav Petkov wrote: > (snip useful explanation). > > On Thu, Apr 19, 2018 at 12:40:54PM -0500, Alex G. wrote: >> On the r740xd, FW just hides those errors from the OS with no further >> notification. On this machine BIOS sets things up such that n

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-19 Thread Alex G.
SURPRISE!!! On 04/19/2018 11:45 AM, Borislav Petkov wrote: > On Thu, Apr 19, 2018 at 11:26:57AM -0500, Alex G. wrote: >> At a very high level, I'm working with Dell on improving server >> reliability, with a focus on NVME hotplug and surprise removal. One of >> the fe

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-19 Thread Alex G.
On 04/19/2018 10:35 AM, James Morse wrote: > Hi Alex, > > (I haven't read through all this yet, just on this one:) > > On 04/19/2018 03:57 PM, Alex G. wrote: >> Maybe it's better move the AER handling to NMI/IRQ context, since >> ghes_handle_aer() is only s

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-19 Thread Alex G.
On 04/19/2018 10:40 AM, Borislav Petkov wrote: > On Thu, Apr 19, 2018 at 09:57:07AM -0500, Alex G. wrote: >> ghes_severity() is a one-to-one mapping from a set of unsorted >> severities to monotonically increasing numbers. The "one-to-one" mapping >> part of t

Re: [RFC PATCH v2 2/4] acpi: apei: Split GHES handlers outside of ghes_do_proc

2018-04-19 Thread Alex G.
On 04/19/2018 10:29 AM, Borislav Petkov wrote: > On Thu, Apr 19, 2018 at 09:57:08AM -0500, Alex G. wrote: >> And that was the motivation behind my splitting it in this patch. > > By "split" I don't mean add a function pointer which gets selected and > then cal

Re: [RFC PATCH v2 4/4] acpi: apei: Warn when GHES marks correctable errors as "fatal"

2018-04-19 Thread Alex G.
On 04/18/2018 12:54 PM, Borislav Petkov wrote: > On Mon, Apr 16, 2018 at 04:59:03PM -0500, Alexandru Gagniuc wrote: (snip) >> + >> +corrected_sev = max(corrected_sev, sec_sev); >> +} >> + >> +if ((sev >= GHES_SEV_PANIC) && (corrected_sev < sev)) { >> +pr_warn("FIR

Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

2018-04-19 Thread Alex G.
On 04/18/2018 12:54 PM, Borislav Petkov wrote: > On Mon, Apr 16, 2018 at 04:59:02PM -0500, Alexandru Gagniuc wrote: >> Firmware is evil: >> - ACPI was created to "try and make the 'ACPI' extensions somehow >> Windows specific" in order to "work well with NT and not the others >> even if they a

Re: [RFC PATCH v2 2/4] acpi: apei: Split GHES handlers outside of ghes_do_proc

2018-04-19 Thread Alex G.
On 04/19/2018 09:30 AM, Borislav Petkov wrote: > On Thu, Apr 19, 2018 at 09:19:03AM -0500, Alex G. wrote: >> On the other side, you lose readability as soon as you get a few more >> handlers and the function becomes too long. > > No you don't - you split it p

Re: [RFC PATCH v2 2/4] acpi: apei: Split GHES handlers outside of ghes_do_proc

2018-04-19 Thread Alex G.
On 04/18/2018 12:52 PM, Borislav Petkov wrote: > On Mon, Apr 16, 2018 at 04:59:01PM -0500, Alexandru Gagniuc wrote: >> static void ghes_do_proc(struct ghes *ghes, >> const struct acpi_hest_generic_status *estatus) >> { >> int sev, sec_sev; >> struct acpi_hest_gene

Re: [RFC PATCH v2 1/4] EDAC, GHES: Remove unused argument to ghes_edac_report_mem_error

2018-04-17 Thread Alex G.
On 04/17/2018 04:36 AM, Borislav Petkov wrote: > On Mon, Apr 16, 2018 at 04:59:00PM -0500, Alexandru Gagniuc wrote: > > <--- Insert commit message here. > > A possible candidate would be some blurb about what commit removed the > use of that first arg. I didn't consider any commit message pork

Re: [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages

2018-04-16 Thread Alex G.
On 04/13/2018 11:38 AM, James Morse wrote: > Hi Alex, > > On 09/04/18 19:11, Alex G. wrote: >> On 04/06/2018 01:24 PM, James Morse wrote: >> Do you have any ETA on when your SEA patches are going to make it >> upstream? There's not much point in updating my patch