On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk wrote:
> To recap: a dual-socket Xeon (E5 v4) server system had been running a
> bunch of KVM workloads just fine for over 6 weeks. Suddenly hard
> lockups occurred on cpu 13 in task_numa_migrate(), and cpu 0 in
> idle_balance(). That conditi
problem with
an upstream kernel or any other, yet to my limited understanding of
the evidence it appears there may indeed be a real problem lurking in
there.
I will follow up with the grsec folks.
> On Thu, Nov 2, 2017 at 5:51 PM, Ed Swierk wrote:
>> Ping?
>>
>> On Wed,
Ping?
On Wed, Oct 25, 2017 at 9:35 PM, Ed Swierk wrote:
>
> Ping?
>
> On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk wrote:
> >
> > Ping for Peter, Ingo and other sched maintainers:
> >
> > I'd appreciate any feedback on this hard lockup issue, whic
Ping?
On Mon, Oct 16, 2017 at 4:11 PM, Ed Swierk wrote:
>
> Ping for Peter, Ingo and other sched maintainers:
>
> I'd appreciate any feedback on this hard lockup issue, which occurred
> on a system running kernel 4.4.52-grsec.
>
> To recap: a dual-socket Xeon (E5
Ping for Peter, Ingo and other sched maintainers:
I'd appreciate any feedback on this hard lockup issue, which occurred
on a system running kernel 4.4.52-grsec.
To recap: a dual-socket Xeon (E5 v4) server system had been running a
bunch of KVM workloads just fine for over 6 weeks. Suddenly hard
l
Continuing the conversation with the voices in my head...
On Mon, Oct 9, 2017 at 10:45 PM, Ed Swierk wrote:
> Based on the addresses in the stack and registers, here's what I think
> happened.
>
> On cpu 13:
>
> - task_numa_fault() calls task_numa_migrate(), which select
On Fri, Oct 6, 2017 at 6:25 PM, Ed Swierk wrote:
> I'm trying to untangle a series of problems that suddenly occurred on
> a dual-socket Xeon server system that had been running a bunch of KVM
> workloads just fine for over 6 weeks (4.4.52-grsec kernel,
> Debian-derived userspac
I'm trying to untangle a series of problems that suddenly occurred on
a dual-socket Xeon server system that had been running a bunch of KVM
workloads just fine for over 6 weeks (4.4.52-grsec kernel,
Debian-derived userspace).
Here are the highlights, with timestamps in seconds:
[3851435] NMI watc
On all supported platforms, the TS Reading (TSR) field in the
Temperature (TEMP) register is 9 bits wide. Values above 0x100 (78
degrees C) are plausible, so don't mask out the topmost bit. And the
register itself is 16 bits wide, so use readw() rather than readl().
Signed-off-by: Ed S
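The point about the 9-bit TSR field can be illustrated with a small standalone sketch. This is not the driver code: the mask name, the function names, and the half-degree/-50 C conversion are assumptions, chosen only because they agree with the statement above that 0x100 corresponds to 78 degrees C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask for the 9-bit TS Reading field (bits 8:0). */
#define TEMP_TSR_MASK 0x01ffu

/*
 * Convert a raw 16-bit TEMP register value to millidegrees Celsius.
 * Assumption: half-degree steps with a -50 C offset, which matches
 * 0x100 -> 78 C as stated in the commit message.
 */
int temp_reg_to_millicelsius(uint16_t temp_reg)
{
	int tsr = (int)(temp_reg & TEMP_TSR_MASK);

	return tsr * 1000 / 2 - 50000;
}

/*
 * Same conversion with an 8-bit mask. The topmost TSR bit is dropped,
 * so a reading of 0x100 wraps around to the bottom of the range.
 */
int temp_reg_to_millicelsius_8bit(uint16_t temp_reg)
{
	int tsr = (int)(temp_reg & 0x00ffu);

	return tsr * 1000 / 2 - 50000;
}

/* Illustrative checks only. */
void thermal_examples(void)
{
	assert(temp_reg_to_millicelsius(0x100) == 78000);
	assert(temp_reg_to_millicelsius_8bit(0x100) == -50000);
}
```

With the narrower mask, a sensor at 78 degrees C would be reported as -50 degrees C under these assumptions, which is why the topmost bit matters.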
configure it, and the dynamic shutdown state
should not prevent the driver from loading. The ETS flag itself
indicates whether the thermal sensor is enabled, so use it instead of
the TSDSS flag on all hardware platforms.
Signed-off-by: Ed Swierk
---
drivers/thermal/intel_pch_thermal.c | 4 ++--
1 file
I have a Linux kernel 4.4 system hosting a number of kvm VMs. Physical
interface eth0 connects to an 802.1Q trunk port on an external switch. Each VM
has a virtual interface (e1000 or virtio-net) connected to the physical NIC
through a macvtap interface and a VLAN interface; traffic between the
On 8/31/16 13:57, Aaro Koskinen wrote:
> This series implements multiple RX group support that should improve
> the networking performance on multi-core OCTEONs. Basically we register
> IRQ and NAPI for each group, and ask the HW to select the group for
> the incoming packets based on hash.
>
> Te
On 8/31/16 14:20, Aaro Koskinen wrote:
> On Wed, Aug 31, 2016 at 09:20:07AM -0700, Ed Swierk wrote:
>> Here's my workaround:
>
> [...]
>
>> -static int cvm_oct_poll(struct oct_rx_group *rx_group, int budget)
>> +static int cvm_oct_poll(int group, int budget)
>
Aaro Koskinen wrote:
> Oops, looks like I tested without CONFIG_NET_POLL_CONTROLLER enabled
> and that seems to be broken. Sorry.
I'm not using CONFIG_NET_POLL_CONTROLLER either; the problem is in the
normal cvm_oct_napi_poll() path.
Here's my workaround:
--- a/drivers/staging/octeon/ethernet-rx
Hi Aaro,
On Tue, Aug 30, 2016 at 11:47 AM, Aaro Koskinen wrote:
> This series implements multiple RX group support that should improve
> the networking performance on multi-core OCTEONs. Basically we register
> IRQ and NAPI for each group, and ask the HW to select the group for
> the incoming pac
On Tue, Aug 16, 2016 at 3:07 AM, Juergen Gross wrote:
> On 15/08/16 17:02, Jan Beulich wrote:
>> This should really only be done for XS_TRANSACTION_END messages, or
>> else at least some of the xenstore-* tools don't work anymore.
>>
>> Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced
On Wed, Jul 13, 2016 at 10:36 AM, Jason Gunthorpe
wrote:
> I think your bios is broken?
The BIOS is broken in many ways. I already have to pass
memmap=256M$0x8000, otherwise PCIe extended config space
(MMCONFIG) is inaccessible. Also I found memmap=0x7000$0x7a7d
works around "APEI: Can no
On Wed, Jul 13, 2016 at 9:19 AM, Ed Swierk wrote:
> v9: Include command duration in existing error messages rather than
> logging an extra debug message. Rebase onto Jarkko's tree.
Incidentally, with Jarkko's tree the tpm_tis module refuses to
initialize (with or without force=1)
el configured with DYNAMIC_DEBUG=y.
Signed-off-by: Ed Swierk
Reviewed-by: Jarkko Sakkinen
---
drivers/char/tpm/tpm-interface.c | 19 +--
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 5e3c1b6..a
Call tpm_getcap() from tpm_get_timeouts() to eliminate redundant
code. Return all errors to the caller rather than swallowing them
(e.g. when tpm_transmit_cmd() returns nonzero).
Signed-off-by: Ed Swierk
---
drivers/char/tpm/tpm-interface.c | 74 +++-
1 file
: Ed Swierk
---
drivers/char/tpm/tpm-interface.c | 148 ++-
drivers/char/tpm/tpm_tis_core.c | 32 +++--
include/linux/tpm.h | 3 +-
3 files changed, 94 insertions(+), 89 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers
chip-specific override of command durations as well as protocol
timeouts
- overrides ST19NP18 TPM command duration to avoid lockups
Ed Swierk (5):
tpm_tis: Improve reporting of IO errors
tpm: Add optional logging of TPM command durations
tpm: Clean up reading of timeout and duration capabil
to know whether any commands are immune to being blocked by
this process. So it seems safest to ignore the chip's reported command
durations, and use a value much higher than any observed duration,
like 180 sec (which is the duration this chip reports for "long"
commands).
Signed-off-b
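A rough sketch of the override idea described above, not the actual patch: the struct, its field names, and the function are hypothetical, and the "reported" short and medium values used in the check are placeholders. Only the 180-second figure comes from the text.

```c
#include <assert.h>

/* Value taken from the commit message: much higher than any observed duration. */
#define OVERRIDE_DURATION_SEC 180u

/* Hypothetical table of command durations, in seconds. */
struct cmd_durations {
	unsigned int short_sec;
	unsigned int medium_sec;
	unsigned int long_sec;
};

/*
 * Ignore whatever the chip reported and use one conservative value
 * for every duration class.
 */
void override_durations(struct cmd_durations *d)
{
	d->short_sec = OVERRIDE_DURATION_SEC;
	d->medium_sec = OVERRIDE_DURATION_SEC;
	d->long_sec = OVERRIDE_DURATION_SEC;
}

/* Illustrative check; the short and medium inputs are placeholders. */
void override_example(void)
{
	struct cmd_durations d = { 1, 2, 180 };

	override_durations(&d);
	assert(d.short_sec == 180 && d.medium_sec == 180 && d.long_sec == 180);
}
```

The trade-off is that a command that genuinely hangs takes longer to time out, in exchange for never cutting short a command that is merely blocked.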
Mysterious TPM behavior can be difficult to track down through all the
layers of software. Add error messages for conditions that should
never happen. Also include the manufacturer ID along with other chip
data printed during init.
Signed-off-by: Ed Swierk
Reviewed-by: Jarkko Sakkinen
el configured with DYNAMIC_DEBUG=y.
Signed-off-by: Ed Swierk
Reviewed-by: Jarkko Sakkinen
---
drivers/char/tpm/tpm-interface.c | 17 +
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index c50637d..c
: Ed Swierk
---
drivers/char/tpm/tpm-interface.c | 143 +--
drivers/char/tpm/tpm_tis.c | 35 +++---
include/linux/tpm.h | 3 +-
3 files changed, 88 insertions(+), 93 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers
both timeouts and
durations via a single callback.
This series
- improves TPM command error reporting
- adds optional logging of TPM command durations
- allows chip-specific override of command durations as well as protocol
timeouts
- overrides ST19NP18 TPM command duration to avoid lockups
Ed
On Mon, Jun 20, 2016 at 6:54 PM, Ed Swierk wrote:
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -461,9 +461,19 @@ ssize_t tpm_getcap(struct device *dev, __be32 subcap_id,
> cap_t *cap,
> tpm_cmd.params.getcap_in.subcap_size
single callback.
This series
- improves TPM command error reporting
- adds optional logging of TPM command durations
- allows chip-specific override of command durations as well as protocol
timeouts
- overrides ST19NP18 TPM command duration to avoid lockups
Ed Swierk (5):
tpm_tis: Improve
Factor sending the TPM_GetCapability command and validating the result
from tpm_get_timeouts() into a new function. Return all errors to the
caller rather than swallowing them (e.g. when tpm_transmit_cmd()
returns nonzero).
Signed-off-by: Ed Swierk
---
drivers/char/tpm/tpm-interface.c | 96
: Ed Swierk
---
drivers/char/tpm/tpm-interface.c | 139 +--
drivers/char/tpm/tpm_tis.c | 35 +++---
include/linux/tpm.h | 3 +-
3 files changed, 85 insertions(+), 92 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers
reporting
- adds optional logging of TPM command durations
- allows chip-specific override of command durations as well as protocol
timeouts
- overrides ST19NP18 TPM command duration to avoid lockups
Ed Swierk (5):
tpm_tis: Improve reporting of IO errors
tpm: Add optional logging of TPM command
On Fri, Jun 10, 2016 at 12:42 PM, Jarkko Sakkinen
wrote:
> On Fri, Jun 10, 2016 at 10:34:15AM -0700, Ed Swierk wrote:
>> On Fri, Jun 10, 2016 at 5:19 AM, Jarkko Sakkinen
>> wrote:
>> > On Wed, Jun 08, 2016 at 04:00:17PM -0700, Ed Swierk wrote:
>> >> Some TPM
On Fri, Jun 10, 2016 at 5:19 AM, Jarkko Sakkinen
wrote:
> On Wed, Jun 08, 2016 at 04:00:17PM -0700, Ed Swierk wrote:
>> Some TPM chips report bogus command durations in their capabilities,
>> just as others report incorrect timeouts. Rework tpm_get_timeouts()
>> to allow chi
durations as well as protocol
timeouts
- overrides ST19NP18 TPM command duration to avoid lockups
Ed Swierk (4):
tpm_tis: Improve reporting of IO errors
tpm: Add optional logging of TPM command durations
tpm: Allow TPM chip drivers to override reported command durations
tpm_tis: Increase
: Ed Swierk
---
drivers/char/tpm/tpm-interface.c | 177 +--
drivers/char/tpm/tpm_tis.c | 35 ++--
include/linux/tpm.h | 3 +-
3 files changed, 106 insertions(+), 109 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers
On Wed, Jun 8, 2016 at 12:05 PM, Jason Gunthorpe
wrote:
> On Tue, Jun 07, 2016 at 05:45:39PM -0700, Ed Swierk wrote:
>> + case 0x32041114: /* Atmel 3204 */
>> + chip->vendor.timeout_a = TIS_SHORT_TIMEOUT * HZ / 1000;
>> + chip->vendor.timeou
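The `TIS_SHORT_TIMEOUT * HZ / 1000` expression in the quoted hunk scales a millisecond value to timer ticks. A minimal standalone sketch of that arithmetic follows; the function name and the sample inputs are illustrative assumptions, not values taken from the driver.

```c
#include <assert.h>

/*
 * Convert milliseconds to timer ticks for a given tick rate (ticks per
 * second), using the same multiply-then-divide order as the quoted
 * expression. Integer division truncates any fractional tick.
 */
unsigned long msec_to_ticks(unsigned long msec, unsigned long hz)
{
	return msec * hz / 1000;
}

/* Illustrative check: 750 ms at 250 ticks per second is 187.5, truncated to 187. */
void ticks_example(void)
{
	assert(msec_to_ticks(750, 250) == 187);
}
```

Because of the truncation, the resulting timeout can be up to one tick shorter than requested. The kernel's `msecs_to_jiffies()` helper is the usual alternative to open-coding the arithmetic.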
ST19NP18 TPM command duration to avoid lockups
Ed Swierk (4):
tpm_tis: Improve reporting of IO errors
tpm: Add optional logging of TPM command durations
tpm: Allow TPM chip drivers to override reported command durations
tpm_tis: Increase ST19NP18 TPM command duration to avoid chip lockup
I believe e967ef02 "MIPS: Fix restart of indirect syscalls" should be
backported to all stable kernels.
It would be a surprising coincidence if parisc suffers from the same problem.
--Ed
On Thu, Dec 17, 2015 at 4:54 AM, Mathieu Desnoyers
wrote:
> - On Dec 16, 2015, at 5:09 PM, Mathieu Desn
call the equivalent ethtool_ops functions provided
by network drivers with built-in phy support.
Signed-off-by: Ed Swierk
---
include/linux/phy.h | 9 +
net/core/ethtool.c | 45 ++---
2 files changed, 43 insertions(+), 11 deletions(-)
diff --git a
On 9/19/07, Ayaz Abdulla wrote:
> It seems that you are powering down the phy even if WOL is enabled.
Right; I've updated the patch to skip powering down the phy when wol is enabled.
> Secondly, can you powerdown the phy at the same time you start
> performing autoneg restart?
Log "no link during initialization" at KERN_INFO as it's not an error,
and occurs every time the interface comes up (when the
forcedeth-phy-power-down patch is applied).
Signed-off-by: Ed Swierk
forcedeth-open-no-link-printk.patch
Bring the physical link down when the interface is down, by placing
the PHY in power-down state. This mirrors the behavior of other
drivers including e1000 and tg3.
Signed-off-by: Ed Swierk
forcedeth-phy-power-down.patch
On 9/11/07, Herbert Xu wrote:
> Please make it 65535 without an Ethernet header and 65521
> with an Ethernet header.
Here is a revised patch that allows MTUs up to 65535 for tap
interfaces and up to 65521 for tun interfaces.
(If I set the MTU to 65521 on a tun interface, ping
the value used by the e1000 driver, so it seems
like a safe upper limit.
Signed-off-by: Ed Swierk
---
tap-change-mtu.patch
Ed Swierk writes:
> I made the following change to kexec:
>
> [deleted]
>
> but I'm now seeing intermittent corruption of the initrd in the new kernel--a
> few bytes at different locations each time.
>
> I suspect I've neglected some other i
I'm attempting to get kexec to pass a command line longer than 256 bytes to the
new kernel, using kexec-tools-testing-20070330 and kernel 2.6.22.1. The new
kernel is a bzImage, and I'm using kexec to tack on a command line and an
initrd.
I made the following change to kexec:
--- kexec-tools-tes