2- As you suggest, create a new "secure" dtb per board (not my
preference from a maintenance perspective).
I agree with Alex (G) that the "secure" option should be opt-in.
That way existing setups remain working and no extra requirements
are imposed on MP1 users. Esp. since as
On 1/26/21 3:01 AM, gabriel.fernan...@foss.st.com wrote:
From: Gabriel Fernandez
Platform STM32MP1 can be used in a configuration where some clocks and
IP resets are handled as secure resources.
These resources are moved from a RCC clock/reset handle to a SCMI
clock/reset_domain handle.
The RCC c
On 2/2/21 2:16 PM, Bjorn Helgaas wrote:
On Tue, Feb 02, 2021 at 01:50:20PM -0600, Alex G. wrote:
On 1/29/21 3:56 PM, Bjorn Helgaas wrote:
On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
On 1/28/21 5:51 PM, Sinan Kaya wrote:
On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
AFAICT, this
On 1/29/21 3:56 PM, Bjorn Helgaas wrote:
On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
On 1/28/21 5:51 PM, Sinan Kaya wrote:
On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
AFAICT, this thread petered out with no resolution.
If the bandwidth change notifications are important to
On 1/28/21 5:51 PM, Sinan Kaya wrote:
On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
AFAICT, this thread petered out with no resolution.
If the bandwidth change notifications are important to somebody,
please speak up, preferably with a patch that makes the notifications
disabled by default and add
On 10/20/20 2:16 AM, Sam Ravnborg wrote:
Hi Alex.
[snip]
diff --git a/drivers/gpu/drm/bridge/sii902x.c b/drivers/gpu/drm/bridge/sii902x.c
index 33fd33f953ec..d15e9f2c0d8a 100644
--- a/drivers/gpu/drm/bridge/sii902x.c
+++ b/drivers/gpu/drm/bridge/sii902x.c
@@ -17,6 +17,7 @@
#include
On 9/28/20 12:30 PM, Alexandru Gagniuc wrote:
On the SII9022, the IOVCC and CVCC12 supplies must reach the correct
voltage before the reset sequence is initiated. On most boards, this
assumption is true at boot-up, so initialization succeeds.
However, when we try to initialize the chip with inco
On 9/26/20 1:49 PM, Sam Ravnborg wrote:
Hi Alexandru
On Thu, Sep 24, 2020 at 03:05:05PM -0500, Alexandru Gagniuc wrote:
On the SII9022, the IOVCC and CVCC12 supplies must reach the correct
voltage before the reset sequence is initiated. On most boards, this
assumption is true at boot-up, so ini
Hi Ethan,
On 9/24/20 9:34 PM, Ethan Zhao wrote:
When uncorrectable error happens, AER driver and DPC driver interrupt
handlers likely call
pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with
pci_channel_io_frozen at the same time.
If pci_dev_set_io_state() returns true even if
On 9/24/20 3:22 PM, Fabio Estevam wrote:
Hi Fabio,
On Thu, Sep 24, 2020 at 5:16 PM Alexandru Gagniuc wrote:
+ ret = regulator_enable(sii902x->cvcc12);
+ if (ret < 0) {
+ dev_err(dev, "Failed to enable cvcc12 supply: %d\n", ret);
+ regulator_disable(sii9
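The SII9022 messages above say IOVCC and CVCC12 must be stable before the reset sequence starts, and the quoted diff adds an error path that disables one supply when the other fails to enable. That is the usual enable-in-order, unwind-on-failure pattern. A minimal userspace sketch follows; `struct fake_regulator`, `fake_regulator_enable()`, `power_on_supplies()` and the IOVCC-first order are hypothetical stand-ins, not the driver's code or the kernel regulator API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a regulator handle; not the kernel API. */
struct fake_regulator {
	const char *name;
	bool enabled;
	bool fail_enable;	/* simulate a supply that refuses to turn on */
};

static int fake_regulator_enable(struct fake_regulator *reg)
{
	if (reg->fail_enable)
		return -5;	/* stand-in for -EIO */
	reg->enabled = true;
	return 0;
}

static void fake_regulator_disable(struct fake_regulator *reg)
{
	reg->enabled = false;
}

/*
 * Enable both supplies before any reset sequence starts. If the second
 * one fails, disable the first so a failed probe leaves nothing enabled.
 * The IOVCC-then-CVCC12 order is an assumption for illustration.
 */
static int power_on_supplies(struct fake_regulator *iovcc,
			     struct fake_regulator *cvcc12)
{
	int ret;

	ret = fake_regulator_enable(iovcc);
	if (ret < 0) {
		fprintf(stderr, "Failed to enable %s supply: %d\n",
			iovcc->name, ret);
		return ret;
	}

	ret = fake_regulator_enable(cvcc12);
	if (ret < 0) {
		fprintf(stderr, "Failed to enable %s supply: %d\n",
			cvcc12->name, ret);
		fake_regulator_disable(iovcc);	/* unwind the first supply */
		return ret;
	}

	/* Both rails are up; the reset sequence would start here. */
	return 0;
}
```

Unwinding on failure means a failed probe does not leave one supply enabled with nothing using it.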
On 10/21/19 1:19 PM, Stuart Hayes wrote:
On 10/21/19 8:47 AM, Mika Westerberg wrote:
On Thu, Oct 17, 2019 at 03:32:56PM -0400, Stuart Hayes wrote:
Some systems have in-band presence detection disabled for hot-plug PCI
slots, but do not report this in the slot capabilities 2 (SLTCAP2) register
On 10/1/19 11:13 PM, Lukas Wunner wrote:
On Tue, Oct 01, 2019 at 05:14:16PM -0400, Stuart Hayes wrote:
This patch set is based on a patch set [1] submitted many months ago by
Alexandru Gagniuc, who is no longer working on it.
[1] https://patchwork.kernel.org/cover/10909167/
[v3,0/4] PCI: p
On 10/1/19 4:14 PM, Stuart Hayes wrote:
Some systems have in-band presence detection disabled for hot-plug PCI slots,
but do not report this in the slot capabilities 2 (SLTCAP2) register. On
these systems, presence detect can become active well after the link is
reported to be active, which c
On 4/29/19 1:56 PM, Bjorn Helgaas wrote:
From: Bjorn Helgaas
This reverts commit e8303bb7a75c113388badcc49b2a84b4121c1b3e.
e8303bb7a75c added logging whenever a link changed speed or width to a
state that is considered degraded. Unfortunately, it cannot differentiate
signal integrity-related
On 4/24/19 12:19 PM, Alex Williamson wrote:
On Wed, 24 Apr 2019 16:45:45 +
wrote:
On 4/23/2019 5:42 PM, Alex Williamson wrote:
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 7e12d0163863..233cd4b5b6e8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2403,6 +2403
On 4/22/19 5:43 PM, Alex Williamson wrote:
On systems that don't support any PCIe services other than bandwidth
notification, pcie_message_numbers() can return zero vectors, causing
the vector reallocation in pcie_port_enable_irq_vec() to retry with
zero, which fails, resulting in fallback to
On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
On 4/22/19 7:33 PM, Alex Williamson wrote:
There is nothing wrong happening here that needs to fill logs. I
thought maybe if I enabled notification of autonomous bandwidth
changes that it might
On 4/23/19 11:22 AM, Alex Williamson wrote:
Nor should pci-core decide what link speed changes are intended or
errors. Minimally we should be enabling drivers to receive this
feedback. Thanks,
Not errors. pci core reports that a link speed change event has occurred.
Period.
Alex
On 4/23/19 10:34 AM, Alex Williamson wrote:
On Tue, 23 Apr 2019 09:33:53 -0500
Alex G wrote:
On 4/22/19 7:33 PM, Alex Williamson wrote:
On Mon, 22 Apr 2019 19:05:57 -0500
Alex G wrote:
echo :07:00.0:pcie010 |
sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
That
On 4/22/19 7:33 PM, Alex Williamson wrote:
On Mon, 22 Apr 2019 19:05:57 -0500
Alex G wrote:
echo :07:00.0:pcie010 |
sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
That's a bad solution for users, this is meaningless tracking of a
device whose driver is act
On 4/22/19 5:43 PM, Alex Williamson wrote:
[ 329.725607] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at :00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[ 708.151488] vfio-pci :07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by
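The figures in these log lines follow from per-lane encoding overhead: 2.5 GT/s with 8b/10b encoding carries 2 Gb/s per lane, so an x16 link gives 32 Gb/s, and 5 GT/s x16 gives 64 Gb/s. A small sketch of that arithmetic, assuming speeds expressed in MT/s; the function names are illustrative and are not the PCI core's.

```c
#include <stdio.h>

/*
 * Usable Mb/s per lane after line encoding: 8b/10b for 2.5 and 5 GT/s,
 * 128b/130b for 8 and 16 GT/s. Speeds are given in MT/s.
 */
static unsigned int lane_mbps(unsigned int speed_mts)
{
	switch (speed_mts) {
	case 2500:  return 2500 * 8 / 10;	/* 2000 */
	case 5000:  return 5000 * 8 / 10;	/* 4000 */
	case 8000:  return 8000 * 128 / 130;	/* 7876 (truncated) */
	case 16000: return 16000 * 128 / 130;	/* 15753 (truncated) */
	default:    return 0;			/* unknown speed */
	}
}

/* Total link bandwidth in Mb/s: per-lane rate times lane count. */
static unsigned int link_bw_mbps(unsigned int speed_mts, unsigned int width)
{
	return lane_mbps(speed_mts) * width;
}

/* Render Mb/s in the "NN.NNN Gb/s" style used by the log lines. */
static int format_bw(char *buf, size_t len, unsigned int bw_mbps)
{
	return snprintf(buf, len, "%u.%03u Gb/s", bw_mbps / 1000, bw_mbps % 1000);
}
```

With these numbers, `format_bw(buf, sizeof(buf), link_bw_mbps(2500, 16))` produces "32.000 Gb/s", matching the available-bandwidth figure in the first log line.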
On 4/22/19 3:58 PM, Bjorn Helgaas wrote:
On Fri, Feb 08, 2019 at 10:24:11AM -0600, Alexandru Gagniuc wrote:
This is only used within drivers/pci, and there is no reason to make
it available outside of the PCI core.
Signed-off-by: Alexandru Gagniuc
Applied the whole series to pci/hotplug for
The issue of dying inside the GHES driver has popped up a few times before.
I've looked into fixing this before, but we didn't quite come to agreement
because the verbiage in the ACPI spec is vague:
" When a fatal uncorrected error occurs, the system is
restarted to prevent propagation o
On 3/25/19 5:25 PM, Bjorn Helgaas wrote:
On Fri, Mar 22, 2019 at 07:36:51PM -0500, Alexandru Gagniuc wrote:
A threaded IRQ with a NULL handler does not work with level-triggered
interrupts. request_threaded_irq() will return an error:
genirq: Threaded irq requested with handler=NULL and !ONE
Hi Borislav,
Thanks for the update. We've settled on a different fix [1], since Lukas
was not happy with IRQF_ONESHOT [2].
Alex
[1]
https://lore.kernel.org/linux-pci/20190323003700.7294-1-mr.nuke...@gmail.com/
[2]
https://lore.kernel.org/linux-pci/20190318043314.noyj6t4sh26sp...@wunner.de/
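The genirq error quoted in this thread can be modeled as a request-time check: with a NULL primary handler, the default handler only wakes the thread, so the interrupt must be requested with IRQF_ONESHOT to keep the line masked until the thread has run; otherwise a level-triggered source that is still asserted fires again. The sketch below is a simplified userspace model under that assumption; `check_threaded_irq()`, `SKETCH_IRQF_ONESHOT` and `demo_thread_fn()` are hypothetical, and the real genirq code has additional cases (for example, irq chips that are safe without ONESHOT) that are ignored here.

```c
#include <stddef.h>

#define SKETCH_IRQF_ONESHOT 0x1u	/* hypothetical flag bit */
#define SKETCH_EINVAL 22		/* stand-in for EINVAL */

typedef int (*irq_fn)(int irq, void *data);

/* Hypothetical thread function used only to exercise the check. */
static int demo_thread_fn(int irq, void *data)
{
	(void)irq;
	(void)data;
	return 0;
}

/*
 * Simplified request-time check. A NULL primary handler means the default
 * one only wakes the thread; without ONESHOT the line is unmasked right
 * away, so a still-asserted level-triggered source would fire again.
 */
static int check_threaded_irq(irq_fn primary, irq_fn thread, unsigned int flags)
{
	if (primary == NULL && thread == NULL)
		return -SKETCH_EINVAL;		/* nothing to run at all */
	if (primary == NULL && !(flags & SKETCH_IRQF_ONESHOT))
		return -SKETCH_EINVAL;		/* "handler=NULL and !ONESHOT" */
	return 0;
}
```

The two passing cases correspond to the two ways out of the error: request the interrupt with ONESHOT, or supply a real primary handler.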
On 3/20/19 4:44 PM, Linus Torvalds wrote:
On Wed, Mar 20, 2019 at 1:52 PM Bjorn Helgaas wrote:
AFAICT, the consensus there was that it would be better to find some
sort of platform solution instead of dealing with it in individual
drivers. The PCI core isn't really a driver, but I think the s
On 3/20/19 8:46 AM, Bjorn Helgaas wrote:
Hi Alexandru,
On Mon, Mar 18, 2019 at 08:12:04PM -0500, Alexandru Gagniuc wrote:
A threaded IRQ with a NULL handler does not work with level-triggered
interrupts. request_threaded_irq() will return an error:
genirq: Threaded irq requested with handle
On 3/17/19 4:18 PM, Linus Torvalds wrote:
On Fri, Mar 8, 2019 at 9:31 AM Bjorn Helgaas wrote:
- Report PCIe links that become degraded at run-time (Alexandru Gagniuc)
Gaah. Only now as I'm about to do the rc1 release am I looking at new
runtime warnings, and noticing that this causes
On 12/7/18 12:20 PM, Alexandru Gagniuc wrote:
A warning is generated when a PCIe device is probed with a degraded
link, but there was no similar mechanism to warn when the link becomes
degraded after probing. The Link Bandwidth Notification provides this
mechanism.
Use the link bandwidth notific
ping
On 09/18/2018 05:15 PM, Alexandru Gagniuc wrote:
When a PCI device is gone, we don't want to send IO to it if we can
avoid it. We expose functionality via the irq_chip structure. As
users of that structure may not know about the underlying PCI device,
it's our responsibility to guard agains
Should I resubmit this rebased on 4.19-rc*, or just leave this patch as is?
Alex
On 07/30/2018 04:21 PM, Alexandru Gagniuc wrote:
When a PCI device is gone, we don't want to send IO to it if we can
avoid it. We expose functionality via the irq_chip structure. As
users of that structure may not
On 08/09/2018 02:18 PM, Bjorn Helgaas wrote:
On Thu, Aug 09, 2018 at 02:00:23PM -0500, Alex G. wrote:
On 08/09/2018 01:29 PM, Bjorn Helgaas wrote:
On Thu, Aug 09, 2018 at 04:46:32PM +, alex_gagn...@dellteam.com wrote:
On 08/09/2018 09:16 AM, Bjorn Helgaas wrote:
(snip)
On 08/09/2018 01:29 PM, Bjorn Helgaas wrote:
On Thu, Aug 09, 2018 at 04:46:32PM +, alex_gagn...@dellteam.com wrote:
On 08/09/2018 09:16 AM, Bjorn Helgaas wrote:
(snip)
enable_ecrc_checking()
disable_ecrc_checking()
I don't immediately see how this would affect FFS, but the bit
On 08/07/2018 08:14 PM, Bjorn Helgaas wrote:
On Mon, Jul 30, 2018 at 06:35:31PM -0500, Alexandru Gagniuc wrote:
When we don't own AER, we shouldn't touch the AER error bits. Clearing
error bits willy-nilly might cause firmware to miss some errors. In
theory, these bits get cleared by FFS, or
On 07/31/2018 01:40 AM, Tal Gilboa wrote:
[snip]
@@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct
pci_dev *dev)
/* Advanced Error Reporting */
pci_aer_init(dev);
+ /* Check link and detect downtrain errors */
+ pcie_check_upstream_link(dev);
This is called for e
On 07/23/2018 11:52 AM, Alexandru Gagniuc wrote:
When we don't own AER, we shouldn't touch the AER error bits. Clearing
error bits willy-nilly might cause firmware to miss some errors. In
theory, these bits get cleared by FFS, or via ACPI _HPX method. These
mechanisms are not subject to the pr
On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
PCIe downtraining happens when both the device and PCIe port are
capable of a larger
On 07/23/2018 12:21 AM, Tal Gilboa wrote:
On 7/19/2018 6:49 PM, Alex G. wrote:
On 07/18/2018 08:38 AM, Tal Gilboa wrote:
On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
[+cc maintainers of drivers that already use pcie_print_link_status()
and GPU folks]
[snip]
+ /* Multi-function PCIe
On 07/19/2018 11:58 AM, Sinan Kaya wrote:
On 7/19/2018 8:55 AM, Alex G. wrote:
I find the intent clearer if we check it here rather than having to do
the mental parsing of the state of aer_cap.
I don't feel too strongly about my comment, to be honest. This was a
style/maintenance co
On 07/17/2018 10:41 AM, Sinan Kaya wrote:
On 7/17/2018 8:31 AM, Alexandru Gagniuc wrote:
+ if (pcie_aer_get_firmware_first(dev))
+ return -EIO;
Can you move this to closer to the caller pci_aer_init()?
I could move it there, although pci_cleanup_aer_error_status_regs() is
call
On 07/18/2018 08:38 AM, Tal Gilboa wrote:
On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
[+cc maintainers of drivers that already use pcie_print_link_status()
and GPU folks]
[snip]
+ /* Multi-function PCIe share the same link/status. */
+ if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtf
On 07/18/2018 04:53 PM, Bjorn Helgaas wrote:
[+cc Mike (hfi1)]
On Mon, Jul 16, 2018 at 10:28:35PM +, alex_gagn...@dellteam.com wrote:
On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
...
The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link
On 07/03/2018 11:38 AM, Bjorn Helgaas wrote:
> On Mon, Jul 02, 2018 at 11:16:01AM -0500, Alexandru Gagniuc wrote:
>> According to the documentation, with "pcie_ports=native", Linux should use
>> native AER and DPC services. While that is true for the _OSC method
>> parsing, this is not the only place
On 07/02/2018 08:16 AM, Bjorn Helgaas wrote:
> On Sat, Jun 30, 2018 at 11:39:00PM -0500, Alex G wrote:
>> On 06/30/2018 04:31 PM, Bjorn Helgaas wrote:
>>> [+cc Borislav, linux-acpi, since this involves APEI/HEST]
>>
>> Borislav is not the relevant maintainer here,
On 06/30/2018 04:31 PM, Bjorn Helgaas wrote:
[+cc Borislav, linux-acpi, since this involves APEI/HEST]
Borislav is not the relevant maintainer here, since we're not contingent
on APEI handling. I think Keith has a lot more experience with this part
of the kernel.
On Tue, Jun 19, 2018 at 02
On 06/19/2018 04:57 PM, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote:
>> On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
>>> On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
AER status bits are sticky, and they survive system resets. Downstrea
On 06/01/2018 10:10 AM, Sinan Kaya wrote:
> On 6/1/2018 11:06 AM, Alex G. wrote:
>> On 06/01/2018 10:03 AM, Sinan Kaya wrote:
>>> On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote:
>>>> + /* Multi-function PCIe share the same link/status. */
>>
On 06/01/2018 10:12 AM, Andy Shevchenko wrote:
> On Fri, Jun 1, 2018 at 6:01 PM, Alexandru Gagniuc
> wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems i
On 06/01/2018 10:03 AM, Sinan Kaya wrote:
> On 6/1/2018 11:01 AM, Alexandru Gagniuc wrote:
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if (PCI_FUNC(dev->devfn) != 0)
>> +		return;
>
> How about virtual functions?
I have almost no clue about those. Is your conce
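The downtraining definition quoted above (both the device and the port support more than was negotiated) and the review points in this thread (only function 0 owns the shared link, virtual functions should be skipped) can be combined into one predicate. A hedged sketch, assuming a hypothetical `struct link_info` that already holds the negotiated link state and the maximum speed/width of both the device and its upstream port; this is not the patch under review.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one PCIe function and its link. */
struct link_info {
	unsigned int func;			/* function number, as PCI_FUNC(devfn) */
	bool is_virtfn;				/* SR-IOV virtual function */
	unsigned int cur_speed, cur_width;	/* negotiated (speed in MT/s) */
	unsigned int dev_max_speed, dev_max_width;	/* device capability */
	unsigned int port_max_speed, port_max_width;	/* upstream port capability */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * True if the link trained below what BOTH ends support. Only function 0
 * is checked, since all functions share one link, and VFs are skipped.
 */
static bool link_is_downtrained(const struct link_info *l)
{
	if (l->func != 0 || l->is_virtfn)
		return false;

	return l->cur_speed < min_u(l->dev_max_speed, l->port_max_speed) ||
	       l->cur_width < min_u(l->dev_max_width, l->port_max_width);
}
```

Comparing against the minimum of the two capabilities is what keeps a link that is merely limited by a slower port from being reported as degraded.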
On 05/31/2018 12:27 PM, Alex G. wrote:
> On 05/31/2018 12:11 PM, Sinan Kaya wrote:
>> On 5/31/2018 12:49 PM, Alex G. wrote:
>>>> bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>>> bw_avail = pcie_bandwidth_available(dev, &
On 05/31/2018 10:30 AM, Sinan Kaya wrote:
> On 5/31/2018 11:05 AM, Alexandru Gagniuc wrote:
>> +	if (dev_cur_speed < max_link_speed)
>> +		pci_warn(dev, "PCIe downtrain: link speed is %s (%s capable)",
>> +			 pcie_bus_speed_name(dev_cur_speed),
>> +
On 05/31/2018 12:11 PM, Sinan Kaya wrote:
> On 5/31/2018 12:49 PM, Alex G. wrote:
>>> bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>> bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width,
>>> *parent
On 05/31/2018 11:49 AM, Alex G. wrote:
>
>
> On 05/31/2018 11:13 AM, Sinan Kaya wrote:
>> On 5/31/2018 12:01 PM, Alex G. wrote:
>>>> PCI: Add pcie_print_link_status() to log link speed and whether it's
>>>> limited
>>> This one
On 05/31/2018 11:13 AM, Sinan Kaya wrote:
> On 5/31/2018 12:01 PM, Alex G. wrote:
>>> PCI: Add pcie_print_link_status() to log link speed and whether it's
>>> limited
>> This one, I have, but it's not what I need. This looks at the available
>
On 05/31/2018 10:54 AM, Sinan Kaya wrote:
> On 5/31/2018 11:46 AM, Alex G. wrote:
>>> https://lkml.org/lkml/2018/3/30/553
>> Oh, pcie_get_speed_cap()/pcie_get_width_cap() seems to handle the
>> capability. Not seeing one for status and speed name.
>>
>>>
On 05/31/2018 10:38 AM, Sinan Kaya wrote:
> On 5/31/2018 11:29 AM, alex_gagn...@dellteam.com wrote:
>> On 5/31/2018 10:28 AM, Sinan Kaya wrote:
>>> On 5/31/2018 11:05 AM, Alexandru Gagniuc wrote:
+static void pcie_max_link_cap(struct pci_dev *dev, enum pci_bus_speed
*speed,
+
On 05/23/2018 09:32 AM, Jes Sorensen wrote:
> On 05/23/2018 10:26 AM, Matthew Wilcox wrote:
>> On Wed, May 23, 2018 at 10:20:10AM -0400, Jes Sorensen wrote:
+++ b/drivers/pci/pcie/aer/aerdrv_stats.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
>>>
>>> Fix the formatting plea
On 05/23/2018 09:20 AM, Jes Sorensen wrote:
> On 05/22/2018 06:28 PM, Rajat Jain wrote:
>> new file mode 100644
>> index ..b9f251992209
>> --- /dev/null
>> +++ b/drivers/pci/pcie/aer/aerdrv_stats.c
>> @@ -0,0 +1,64 @@
>> +// SPDX-License-Identifier: GPL-2.0
>
> Fix the formatting pleas
On 05/22/2018 05:28 PM, Rajat Jain wrote:
> Add the PCI AER statistics details to
> Documentation/PCI/pcieaer-howto.txt
>
> Signed-off-by: Rajat Jain
> ---
> Documentation/PCI/pcieaer-howto.txt | 35 +
> 1 file changed, 35 insertions(+)
>
> diff --git a/Documentation
On 05/22/2018 05:28 PM, Rajat Jain wrote:
> Add the following AER sysfs stats to represent the counters for each
> kind of error as seen by the device:
>
> dev_total_cor_errs
> dev_total_fatal_errs
> dev_total_nonfatal_errs
>
> Signed-off-by: Rajat Jain
> ---
> drivers/pci/pci-sysfs.c
On 05/22/2018 01:45 PM, Luck, Tony wrote:
> On Tue, May 22, 2018 at 01:19:34PM -0500, Alex G. wrote:
>> Firmware started passing "fatal" GHES headers with the explicit intent of
>> crashing an OS. At the same time, we've learnt how to handle these errors in
>>
On 05/22/2018 01:13 PM, Rafael J. Wysocki wrote:
(snip)
Of course, you are free to have a differing opinion and I don't have
to convince you about my point. You need to convince me about your
point to get the patch in through my tree, which you haven't done so
far.
My point is that crossing yo
On 05/22/2018 01:10 PM, Rafael J. Wysocki wrote:
On Tue, May 22, 2018 at 7:57 PM, Luck, Tony wrote:
On Tue, May 22, 2018 at 04:54:26PM +0200, Borislav Petkov wrote:
I especially don't want to have the case where a PCIe error is *really*
fatal and then we noodle in some handlers debating about
On 05/22/2018 12:57 PM, Luck, Tony wrote:
On Tue, May 22, 2018 at 04:54:26PM +0200, Borislav Petkov wrote:
I especially don't want to have the case where a PCIe error is *really*
fatal and then we noodle in some handlers debating about the severity
because it got marked as recoverable intermitte
On 05/22/2018 09:54 AM, Borislav Petkov wrote:
> On Tue, May 22, 2018 at 09:39:15AM -0500, Alex G. wrote:
>> No, the problem is with the current approach, not with mine. The problem
>> is trying to handle the error outside of the existing handler. That's a
>> no-no,
On 05/22/2018 10:15 AM, Tyler Baicar wrote:
> On 5/22/2018 10:32 AM, Alex G. wrote:
>> I think the biggest problem is having a policy to panic on "fatal"
>> errors, instead of letting the error handler make that decision. I'd
>> much rather kill that stupid po
On 05/22/2018 08:50 AM, Borislav Petkov wrote:
> On Tue, May 22, 2018 at 08:38:39AM -0500, Alex G. wrote:
>>> It looks like the *real* reason for this change is that you
>>> re-introduce ghes_severity() as a different function in the second
>>> patch.
>>
>
On 05/22/2018 04:02 AM, Rafael J. Wysocki wrote:
> On Mon, May 21, 2018 at 3:49 PM, Alexandru Gagniuc
> wrote:
>> The policy was to panic() when GHES said that an error is "Fatal".
>> This logic is wrong for several reasons, as it doesn't account for the
>> cause of the error.
>>
>> PCIe fatal er
On 05/22/2018 03:55 AM, Rafael J. Wysocki wrote:
> On Mon, May 21, 2018 at 3:49 PM, Alexandru Gagniuc
> wrote:
>> ghes_severity() is a misnomer in this case, as it implies the severity
>> of the entire GHES structure. Instead, it maps one CPER value to a
>> GHES_SEV* value. ghes_cper_severity()
On 05/21/2018 09:27 AM, Tyler Baicar wrote:
> On 5/21/2018 9:49 AM, Alexandru Gagniuc wrote:
(snip)
> Hello Alex,
>
> There is a compile warning if CONFIG_HAVE_ACPI_APEI_NMI is not selected.
>
> CC drivers/acpi/apei/ghes.o
> drivers/acpi/apei/ghes.c:483:12: warning: ‘ghes_severity’ defin
On 05/11/2018 12:41 PM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 12:01:52PM -0500, Alex G. wrote:
>> I understand your concern with unhandled AER errors evolving into MCEs.
>> That's extremely rare, but when it happens you still panic due to the
>> MCE.
>
On 05/11/2018 11:19 AM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 11:12:24AM -0500, Alex G. wrote:
>> Because the GHES structure uses CPER values, but all the code is written
>> to use GHES_SEV_ values. GHES_SEV_ is a made up enum, specifically for
>> linux.
>
On 05/11/2018 11:29 AM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 11:12:25AM -0500, Alex G. wrote:
>>> I think *you* didn't get it: IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER) is not
>>> enough of a check to confirm that there actually *is* an AER driver to
>>>
On 05/11/2018 11:02 AM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 10:54:09AM -0500, Alex G. wrote:
>> That being clarified, should I replace "crackmonkey" with "broken" in
>> the commit message?
>
> Keep your opinion *outside* of commit messag
On 05/11/2018 10:58 AM, Borislav Petkov wrote:
> On Fri, May 11, 2018 at 10:45:49AM -0500, Alex G. wrote:
>>
>>
>> On 05/11/2018 10:39 AM, Borislav Petkov wrote:
>>> On Mon, Apr 30, 2018 at 04:33:51PM -0500, Alexandru Gagniuc wrote:
>>>> ghes_severity()
On 05/11/2018 10:40 AM, Borislav Petkov wrote:
> On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote:
>> The policy was to panic() when GHES said that an error is "Fatal".
>> This logic is wrong for several reasons, as it doesn't take into
>> account what caused the error.
>>
>> PCI
On 05/11/2018 10:39 AM, Borislav Petkov wrote:
> On Mon, Apr 30, 2018 at 04:33:51PM -0500, Alexandru Gagniuc wrote:
>> ghes_severity() is a misnomer in this case, as it implies the severity
>> of the entire GHES structure. Instead, it maps one CPER value to a
>> monotonically increasing number.
>
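The point about ghes_severity() is easier to see with the values side by side: raw CPER severities are not ordered by seriousness (fatal is 1, corrected is 2), while the Linux GHES_SEV_* values are, so taking the worst of several sections becomes a plain max(). In the sketch below, the enum values reflect the UEFI CPER encoding and the kernel's GHES_SEV_* ordering as understood here and should be checked against the kernel headers; `cper_to_ghes_sev()` and `worst_sev()` are illustrative names, not kernel functions.

```c
/* CPER section severities (UEFI encoding): not ordered by seriousness. */
enum cper_sev {
	CPER_SEV_RECOVERABLE = 0,
	CPER_SEV_FATAL = 1,
	CPER_SEV_CORRECTED = 2,
	CPER_SEV_INFORMATIONAL = 3,
};

/* Linux-internal ordering: a larger value means more severe. */
enum ghes_sev {
	GHES_SEV_NO = 0,
	GHES_SEV_CORRECTED = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC = 3,
};

/* One-to-one map from an unsorted CPER value to an ordered GHES value. */
static enum ghes_sev cper_to_ghes_sev(int cper)
{
	switch (cper) {
	case CPER_SEV_INFORMATIONAL: return GHES_SEV_NO;
	case CPER_SEV_CORRECTED:     return GHES_SEV_CORRECTED;
	case CPER_SEV_RECOVERABLE:   return GHES_SEV_RECOVERABLE;
	case CPER_SEV_FATAL:         return GHES_SEV_PANIC;
	default:                     return GHES_SEV_PANIC; /* unknown: assume worst */
	}
}

/* Because the mapped values are ordered, the worst section is a max(). */
static enum ghes_sev worst_sev(const int *cper, int n)
{
	enum ghes_sev worst = GHES_SEV_NO;

	for (int i = 0; i < n; i++) {
		enum ghes_sev s = cper_to_ghes_sev(cper[i]);

		if (s > worst)
			worst = s;
	}
	return worst;
}
```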
On 05/10/2018 12:00 PM, Keith Busch wrote:
> On Thu, May 10, 2018 at 11:46:33AM -0500, Alexandru Gagniuc wrote:
>> This patch started as a challenge from Keith relating to code
>> structuring with goto vs return. I think the final result improves
>> readability on two counts:
>> First, it clarifi
On 05/04/2018 06:56 AM, Shiju Jose wrote:
Hi Alex,
Hi
-Original Message-
From: Alexandru Gagniuc [mailto:mr.nuke...@gmail.com]
[snip]
-static inline int ghes_severity(int severity)
+static inline int ghes_cper_severity(int severity)
[...]
else
ratelimi
On 05/02/2018 02:10 PM, Pavel Machek wrote:
> On Thu 2018-04-26 13:20:57, Borislav Petkov wrote:
>> On Wed, Apr 25, 2018 at 03:39:51PM -0500, Alexandru Gagniuc wrote:
>>> There seems to be a culture amongst BIOS teams to want to crash the
>>> OS when an error can't be handled in firmware. Marking G
On 04/30/2018 12:15 PM, Bjorn Helgaas wrote:
> On Sat, Apr 28, 2018 at 12:07:48PM -0500, Alex G. wrote:
(snip)
>> I could update the offending line to say:
>> + info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
>
> That's what I would have expected. So I
On 04/28/2018 11:46 AM, Alex G. wrote:
On 04/27/2018 05:43 PM, Bjorn Helgaas wrote:
On Tue, Apr 17, 2018 at 12:09:43PM -0500, Alexandru Gagniuc wrote:
(snip)
+ memset(&info, 0, sizeof(info));
+ info.severity = aer_severity;
+ info.status = status;
+ info.mask =
On 04/27/2018 05:43 PM, Bjorn Helgaas wrote:
On Tue, Apr 17, 2018 at 12:09:43PM -0500, Alexandru Gagniuc wrote:
On errors reported from CPER, cper_print_bits() was used to log the
AER bits. This resulted in hard-to-understand messages, without a
prefix. Instead use __aer_print_error() for both n
On 04/26/2018 06:20 AM, Borislav Petkov wrote:
Pasting the same comment from last time since you missed it:
"No, I don't want any of that crap issuing stuff in dmesg and then people
opening bugs and running around and trying to replace hardware.
We either can handle the error and log a normal r
Hi Borislav,
On 04/26/2018 06:19 AM, Borislav Petkov wrote:
On Wed, Apr 25, 2018 at 03:39:50PM -0500, Alexandru Gagniuc wrote:
@@ -932,7 +971,7 @@ static void __process_error(struct ghes *ghes)
static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
{
struct ghes *ghes;
-
On 04/25/2018 12:15 PM, Borislav Petkov wrote:
> On Wed, Apr 25, 2018 at 10:00:53AM -0500, Alex G. wrote:
>> Firmware-first.
>
> Ok, my guess was right.
>
>> We could probably use more of the native AER print functions, but that's
>> beyond the scope of th
On 04/25/2018 09:01 AM, Borislav Petkov wrote:
> On Mon, Apr 23, 2018 at 11:19:25PM -0500, Alex G. wrote:
>> That tells you what FFS said about the error.
>
> I betcha those status and command values have human-readable counterparts.
>
> Btw, what do you abbreviate
On 04/22/2018 05:48 AM, Borislav Petkov wrote:
On Thu, Apr 19, 2018 at 05:55:08PM -0500, Alex G. wrote:
What does such an error look like, in detail?
It's green on the soft side, with lots of red accents, as well as some
textured white shades:
[ 51.414616] pciehp :b0:06.0:pc
On 04/20/2018 02:27 AM, James Morse wrote:
> Hi Alex,
>
> On 04/16/2018 10:59 PM, Alex G. wrote:
>> On 04/13/2018 11:38 AM, James Morse wrote:
>>> This assumes a cache-invalidate will clear the error, which I don't think we're
>>> guaranteed
> (snip useful explanation).
>
> On Thu, Apr 19, 2018 at 12:40:54PM -0500, Alex G. wrote:
>> On the r740xd, FW just hides those errors from the OS with no further
>> notification. On this machine BIOS sets things up such that n
SURPRISE!!!
On 04/19/2018 11:45 AM, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 11:26:57AM -0500, Alex G. wrote:
>> At a very high level, I'm working with Dell on improving server
>> reliability, with a focus on NVME hotplug and surprise removal. One of
>> the fe
On 04/19/2018 10:35 AM, James Morse wrote:
> Hi Alex,
>
> (I haven't read through all this yet, just on this one:)
>
> On 04/19/2018 03:57 PM, Alex G. wrote:
>> Maybe it's better to move the AER handling to NMI/IRQ context, since
>> ghes_handle_aer() is only s
On 04/19/2018 10:40 AM, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 09:57:07AM -0500, Alex G. wrote:
>> ghes_severity() is a one-to-one mapping from a set of unsorted
>> severities to monotonically increasing numbers. The "one-to-one" mapping
>> part of t
On 04/19/2018 10:29 AM, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 09:57:08AM -0500, Alex G. wrote:
>> And that was the motivation behind my splitting it in this patch.
>
> By "split" I don't mean add a function pointer which gets selected and
> then cal
On 04/18/2018 12:54 PM, Borislav Petkov wrote:
> On Mon, Apr 16, 2018 at 04:59:03PM -0500, Alexandru Gagniuc wrote:
(snip)
>> +
>> +corrected_sev = max(corrected_sev, sec_sev);
>> +}
>> +
>> +if ((sev >= GHES_SEV_PANIC) && (corrected_sev < sev)) {
>> +pr_warn("FIR
On 04/18/2018 12:54 PM, Borislav Petkov wrote:
> On Mon, Apr 16, 2018 at 04:59:02PM -0500, Alexandru Gagniuc wrote:
>> Firmware is evil:
>> - ACPI was created to "try and make the 'ACPI' extensions somehow
>> Windows specific" in order to "work well with NT and not the others
>> even if they a
On 04/19/2018 09:30 AM, Borislav Petkov wrote:
> On Thu, Apr 19, 2018 at 09:19:03AM -0500, Alex G. wrote:
>> On the other side, you lose readability as soon as you get a few more
>> handlers and the function becomes too long.
>
> No you don't - you split it p
On 04/18/2018 12:52 PM, Borislav Petkov wrote:
> On Mon, Apr 16, 2018 at 04:59:01PM -0500, Alexandru Gagniuc wrote:
>> static void ghes_do_proc(struct ghes *ghes,
>> const struct acpi_hest_generic_status *estatus)
>> {
>> int sev, sec_sev;
>> struct acpi_hest_gene
On 04/17/2018 04:36 AM, Borislav Petkov wrote:
> On Mon, Apr 16, 2018 at 04:59:00PM -0500, Alexandru Gagniuc wrote:
>
> <--- Insert commit message here.
>
> A possible candidate would be some blurb about what commit removed the
> use of that first arg.
I didn't consider any commit message pork
On 04/13/2018 11:38 AM, James Morse wrote:
> Hi Alex,
>
> On 09/04/18 19:11, Alex G. wrote:
>> On 04/06/2018 01:24 PM, James Morse wrote:
>> Do you have any ETA on when your SEA patches are going to make it
>> upstream? There's not much point in updating my patch