Hi Greg,
On 26.01.2021 [08:29:25 +0100], Greg Kroah-Hartman wrote:
> On Mon, Jan 25, 2021 at 11:55:11AM -0800, Scott Branden wrote:
> > Hi All,
> >
> > The 5.10 LTS kernel being officially LTS supported for 2 years
> > presents a problem: why would anyone select a 5.10 kernel with 2
> > year LTS
On 17.09.2018 [13:33:15 +0200], Peter Zijlstra wrote:
> On Fri, Sep 14, 2018 at 06:25:44PM +0200, Jan H. Schönherr wrote:
> > On 09/14/2018 01:12 PM, Peter Zijlstra wrote:
> > > On Fri, Sep 07, 2018 at 11:39:47PM +0200, Jan H. Schönherr wrote:
>
> > >> B) Why would I want this?
> > >
> > >>In
On 01.11.2018 [12:03:40 -0700], Nishanth Aravamudan wrote:
> Hi,
>
> tl;dr: I see a kernel NULL pointer dereference with Linus' master
> (7c6c54b5) when enabling the IO cgroup2 controller at runtime. Is this
> PEBKAC and if so what config option am I missing?
Actually, t
On 26.09.2018 [10:25:19 -0700], Nishanth Aravamudan wrote:
> On 13.09.2018 [21:19:38 +0200], Jan H. Schönherr wrote:
> > Here is an "extra" patch containing bug fixes and warning removals,
> > that I have accumulated up to this point.
> >
> > It goes on top
On 13.09.2018 [21:19:38 +0200], Jan H. Schönherr wrote:
> Here is an "extra" patch containing bug fixes and warning removals,
> that I have accumulated up to this point.
>
> It goes on top of the other 60 patches. (When it is time for v2,
> these fixes will be integrated into the appropriate patch
On 13.09.2018 [13:31:36 +0200], Jan H. Schönherr wrote:
> On 09/13/2018 01:15 AM, Nishanth Aravamudan wrote:
> > [...] if I just try to set machine's
> > cpu.scheduled to 1, with no other changes (not even changing any child
> > cgroup's cpu.scheduled
On 13.09.2018 [01:18:14 +0200], Jan H. Schönherr wrote:
> On 09/12/2018 09:34 PM, Jan H. Schönherr wrote:
> > That said, I see a hang, too. It seems to happen, when there is a
> > cpu.scheduled!=0 group that is not a direct child of the root task group.
> > You seem to have "/sys/fs/cgroup/cpu/mach
On 12.09.2018 [21:34:14 +0200], Jan H. Schönherr wrote:
> On 09/12/2018 02:24 AM, Nishanth Aravamudan wrote:
> > [ I am not subscribed to LKML, please keep me CC'd on replies ]
> >
> > I tried a simple test with several VMs (in my initial test, I have 48
> > idle 1
[ I am not subscribed to LKML, please keep me CC'd on replies ]
On 07.09.2018 [23:39:47 +0200], Jan H. Schönherr wrote:
> This patch series extends CFS with support for coscheduling. The
> implementation is versatile enough to cover many different
> coscheduling use-cases, while at the same time b
On 05.11.2015 [11:58:39 -0800], Christoph Hellwig wrote:
> Looks fine,
>
> Reviewed-by: Christoph Hellwig
>
> ... but I doubt we'll ever bother updating it. Most architectures
> with larger page sizes also have iommus and would need different settings
> for different iommus vs direct mapping for
On 03.11.2015 [13:46:25 +], Keith Busch wrote:
> On Tue, Nov 03, 2015 at 05:18:24AM -0800, Christoph Hellwig wrote:
> > On Fri, Oct 30, 2015 at 02:35:11PM -0700, Nishanth Aravamudan wrote:
> > > diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> >
On 30.10.2015 [21:48:48 +], Keith Busch wrote:
> On Fri, Oct 30, 2015 at 02:35:11PM -0700, Nishanth Aravamudan wrote:
> > Given that it's 4K just about everywhere by default (and sort of
> > implicitly expected to be, I guess), I think I'd prefer we default to
> >
On 29.10.2015 [18:49:55 -0700], David Miller wrote:
> From: Nishanth Aravamudan
> Date: Thu, 29 Oct 2015 08:57:01 -0700
>
> > So, would that imply changing just the NVMe driver code rather than
> > adding the dma_page_shift API at all? What about
> > architectures
On 29.10.2015 [17:20:43 +], Busch, Keith wrote:
> On Thu, Oct 29, 2015 at 08:57:01AM -0700, Nishanth Aravamudan wrote:
> > On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote:
> > > We had a quick chat about this issue and I think we simply should
> > > default to
On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote:
> On Wed, Oct 28, 2015 at 01:59:23PM +, Busch, Keith wrote:
> > The "new" interface for all the other architectures is the same as the
> > old one we've been using for the last 5 years.
> >
> > I welcome x86 maintainer feedback to confir
On 28.10.2015 [11:20:05 +0900], Benjamin Herrenschmidt wrote:
> On Tue, 2015-10-27 at 18:54 -0700, Nishanth Aravamudan wrote:
> >
> > In "bypass" mode, what TCE size is used? Is it guaranteed to be 4K?
>
> None :-) The TCEs are completely bypassed. You get a N:M
On 28.10.2015 [12:00:20 +1100], Alexey Kardashevskiy wrote:
> On 10/28/2015 09:27 AM, Nishanth Aravamudan wrote:
> >On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote:
> >>On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote:
> >>>On Power, the kernel's pa
On 27.10.2015 [17:53:22 -0700], David Miller wrote:
> From: Nishanth Aravamudan
> Date: Tue, 27 Oct 2015 15:20:10 -0700
>
> > Well, looks like I should spin up a v4 anyways for the powerpc changes.
> > So, to make sure I understand your point, should I make the generic
>
On 28.10.2015 [09:57:48 +1100], Julian Calaby wrote:
> Hi Nishanth,
>
> On Wed, Oct 28, 2015 at 9:20 AM, Nishanth Aravamudan
> wrote:
> > On 26.10.2015 [18:27:46 -0700], David Miller wrote:
> >> From: Nishanth Aravamudan
> >> Date: Fri, 23 Oct 2015 13:54:2
On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote:
> On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote:
> >On Power, the kernel's page size can differ from the IOMMU's page size,
> >so we need to override the generic implementation, which always returns
> >
On 27.10.2015 [16:56:10 +1100], Alexey Kardashevskiy wrote:
> On 10/24/2015 07:59 AM, Nishanth Aravamudan wrote:
> >When DDW (Dynamic DMA Windows) are present for a device, we have stored
> >the TCE (Translation Control Entry) size in a special device tree
> >property. Check i
On 26.10.2015 [18:27:46 -0700], David Miller wrote:
> From: Nishanth Aravamudan
> Date: Fri, 23 Oct 2015 13:54:20 -0700
>
> > 1) add a generic dma_get_page_shift implementation that just returns
> > PAGE_SHIFT
>
> I won't object to this patch series, but if I had
[Apologies for the subject line, should just have the [RFC PATCH 5/7]]
On 23.10.2015 [14:00:08 -0700], Nishanth Aravamudan wrote:
> In order to cleanly expose the desired IOMMU page shift via the new
> dma_get_page_shift API, we need to have the sparc constants available in
> a mor
ge size for the default device
page size, rather than the kernel's page size.
With this patch, an NVMe device survives our internal hardware
exerciser; the kernel BUGs within a few seconds without the patch.
Signed-off-by: Nishanth Aravamudan
---
v1 -> v2:
Based upon feedback from Chris
On sparc, the kernel's page size differs from the IOMMU's page size, so
override the generic implementation, which always returns the kernel's
page size, and return IOMMU_PAGE_SHIFT instead.
Signed-off-by: Nishanth Aravamudan
---
I know very little about sparc, so please cor
In order to cleanly expose the desired IOMMU page shift via the new
dma_get_page_shift API, we need to have the sparc constants available in
a more typical location. There should be no functional impact to this
move, but it is untested.
Signed-off-by: Nishanth Aravamudan
---
arch/sparc/include
oking the value up in struct iommu_table. If we don't find
an iommu_table, fall back to the kernel's page size.
Signed-off-by: Nishanth Aravamudan
---
arch/powerpc/platforms/pseries/iommu.c | 36 ++
1 file changed, 36 insertions(+)
diff --git a/arch/po
. DDW is a pseries-specific feature, so allow
platforms to override the implementation of dma_get_page_shift if
desired.
Signed-off-by: Nishanth Aravamudan
---
arch/powerpc/include/asm/machdep.h | 3 ++-
arch/powerpc/kernel/dma.c | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff
[Sorry, subject should have been 0/7!]
On 23.10.2015 [13:54:20 -0700], Nishanth Aravamudan wrote:
> We received a bug report recently when DDW (64-bit direct DMA on Power)
> is not enabled for NVMe devices. In that case, we fall back to 32-bit
> DMA via the IOMMU, which is always done vi
otherwise.
Signed-off-by: Nishanth Aravamudan
---
arch/powerpc/include/asm/dma-mapping.h | 3 +++
arch/powerpc/kernel/dma.c | 9 +
2 files changed, 12 insertions(+)
diff --git a/arch/powerpc/include/asm/dma-mapping.h
b/arch/powerpc/include/asm/dma-mapping.h
index 7f522c0..
Drivers like NVMe need to be able to determine the page size used for
DMA transfers. Add a new API that defaults to returning PAGE_SHIFT on all
architectures.
Signed-off-by: Nishanth Aravamudan
---
v1 -> v2:
Based upon feedback from Christoph Hellwig, implement the IOMMU page
size lookup a
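As a rough illustration of the API described in the patch above (not the posted kernel code): the sketch below models a generic default that returns the kernel page shift, plus an optional override for a device that sits behind an IOMMU with smaller pages. It is a user-space model; the `struct device` layout, the `get_page_shift` hook, the helper names and the 64K/4K values are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration: 64K kernel pages, 4K IOMMU pages. */
#define MODEL_PAGE_SHIFT        16
#define MODEL_IOMMU_PAGE_SHIFT  12

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct device {
    /* NULL means "no override": fall back to the generic default. */
    unsigned long (*get_page_shift)(const struct device *dev);
};

/* Generic default: the DMA page size equals the kernel page size. */
static unsigned long generic_dma_get_page_shift(const struct device *dev)
{
    (void)dev;
    return MODEL_PAGE_SHIFT;
}

/* Hypothetical override for a device mapped through a 4K-page IOMMU. */
static unsigned long iommu_4k_get_page_shift(const struct device *dev)
{
    (void)dev;
    return MODEL_IOMMU_PAGE_SHIFT;
}

/* Entry point a driver would call to size its device pages. */
static unsigned long dma_get_page_shift(const struct device *dev)
{
    if (dev && dev->get_page_shift)
        return dev->get_page_shift(dev);
    return generic_dma_get_page_shift(dev);
}
```

A driver using this model would take `1UL << dma_get_page_shift(dev)` as its device page size rather than assuming the kernel page size.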
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).
The NVMe device driver, though, assumes that the DMA alignment for the
PR
On 15.10.2015 [15:52:19 -0700], Nishanth Aravamudan wrote:
> On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote:
> > Hi Nishanth,
> >
> > sorry for the late reply.
> >
> > > > On Power, since it's technically variable, we'd need a function.
On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote:
> Hi Nishanth,
>
> sorry for the late reply.
>
> > > On Power, since it's technically variable, we'd need a function. So are
> > > you suggesting define'ing it to a function just on Power and leaving it
> > > a constant elsewhere?
> > >
>
Hi Christoph,
On 12.10.2015 [14:06:51 -0700], Nishanth Aravamudan wrote:
> On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote:
> > Do we need a function here or can we just have an IOMMU_PAGE_SHIFT define
> > with an #ifndef in common code?
>
> On Power, since it's
On 12.10.2015 [09:03:52 -0700], Nishanth Aravamudan wrote:
> On 06.10.2015 [14:19:43 +1100], David Gibson wrote:
> > On Fri, Oct 02, 2015 at 10:18:00AM -0700, Nishanth Aravamudan wrote:
> > > We will leverage this macro in the NVMe driver, which needs to know the
> > >
On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote:
> Do we need a function here or can we just have an IOMMU_PAGE_SHIFT define
> with an #ifndef in common code?
On Power, since it's technically variable, we'd need a function. So are
you suggesting define'ing it to a function just on Power and
On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote:
> Do we need a function here or can we just have an IOMMU_PAGE_SHIFT define
> with an #ifndef in common code?
I suppose we could do that -- I wasn't sure if the macro would be
palatable.
> Also not all architectures use dma-mapping-common.h
On 06.10.2015 [14:19:43 +1100], David Gibson wrote:
> On Fri, Oct 02, 2015 at 10:18:00AM -0700, Nishanth Aravamudan wrote:
> > We will leverage this macro in the NVMe driver, which needs to know the
> > configured IOMMU page shift to properly configure its device's page
>
On 03.10.2015 [07:35:09 +1000], Benjamin Herrenschmidt wrote:
> On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote:
> > Right, I did start with your advice and tried that approach, but it
> > turned out I was wrong about the actual issue at the time. The problem
>
On 03.10.2015 [06:51:06 +1000], Benjamin Herrenschmidt wrote:
> On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote:
>
> > 1) add a generic dma_get_page_shift implementation that just returns
> > PAGE_SHIFT
>
> So you chose to return the granularity of the iomm
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).
The NVMe device driver, though, assumes that the DMA alignment for the
PR
oking the value up in struct iommu_table. If we don't find
an iommu_table, fall back to the kernel's page size.
Signed-off-by: Nishanth Aravamudan
diff --git a/arch/powerpc/platforms/pseries/iommu.c
b/arch/powerpc/platforms/pseries/iommu.c
index 0946b98..1bf6471 100644
--- a/arch/po
. DDW is a pseries-specific feature, so allow
platforms to override the implementation of dma_get_page_shift if
desired.
Signed-off-by: Nishanth Aravamudan
diff --git a/arch/powerpc/include/asm/machdep.h
b/arch/powerpc/include/asm/machdep.h
index cab6753..5c372e3 100644
--- a/arch/powerpc/include
otherwise.
Signed-off-by: Nishanth Aravamudan
diff --git a/arch/powerpc/include/asm/dma-mapping.h
b/arch/powerpc/include/asm/dma-mapping.h
index 7f522c0..c5638f4 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -125,6 +125,9 @@ static inline v
Drivers like NVMe need to be able to determine the page size used for
DMA transfers. Add a new API that defaults to returning PAGE_SHIFT on all
architectures.
Signed-off-by: Nishanth Aravamudan
diff --git a/include/asm-generic/dma-mapping-common.h
b/include/asm-generic/dma-mapping-common.h
index
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).
The NVMe device driver, though, assumes that the DMA alignment for the
P
On 02.10.2015 [10:25:44 -0700], Christoph Hellwig wrote:
> Hi Nishanth,
>
> please expose this value through the generic DMA API instead of adding
> architecture specific hacks to drivers.
Ok, I'm happy to do that instead -- what I struggled with is that I
don't have enough knowledge of the vario
survives our internal hardware
exerciser; the kernel BUGs within a few seconds without the patch.
Signed-off-by: Nishanth Aravamudan
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 7920c27..969a95e 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).
The NVMe device driver, though, assumes that the DMA alignment for the
PR
We will leverage this macro in the NVMe driver, which needs to know the
configured IOMMU page shift to properly configure its device's page
size.
Signed-off-by: Nishanth Aravamudan
---
Given this is available, it seems reasonable to expose -- and it doesn't
really make sense to make
On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote:
> Once we have made the distinction between nid and chipid
> create a 1:1 mapping between them. This makes compacting the
> nids easy later.
>
> No functionality change.
>
> Signed-off-by: Raghavendra K T
> ---
> arch/powerpc/mm/numa.c | 36
On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, the chipid
> obtained from the device tree is mapped directly to nid.
chipid is a OPA
On 27.09.2015 [23:59:12 +0530], Raghavendra K T wrote:
> Create arrays that map serial nids and sparse chipids.
>
> Note: My original idea had only two arrays of chipid to nid map. Final
> code is inspired by driver/acpi/numa.c that maps a proximity node with
> a logical node by Takayoshi Kochi ,
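The two-array mapping mentioned in the quoted patch description might look roughly like the sketch below. This is a user-space model, not the code from the patch: the array names, `MODEL_MAX_CHIPS`, `MODEL_NO_NODE` and the first-come assignment order are assumptions for illustration. Each sparse chipid gets the next free serial nid, and a reverse table allows lookups in the other direction.

```c
#include <assert.h>

#define MODEL_MAX_CHIPS 32      /* assumed upper bound on chipid values */
#define MODEL_NO_NODE   (-1)    /* marker for "not mapped" */

static int chipid_to_nid[MODEL_MAX_CHIPS];  /* sparse chipid -> serial nid */
static int nid_to_chipid[MODEL_MAX_CHIPS];  /* serial nid -> sparse chipid */
static int nr_mapped_nids;

/* Reset both tables so that nothing is mapped. */
static void map_init(void)
{
    for (int i = 0; i < MODEL_MAX_CHIPS; i++) {
        chipid_to_nid[i] = MODEL_NO_NODE;
        nid_to_chipid[i] = MODEL_NO_NODE;
    }
    nr_mapped_nids = 0;
}

/* Return the serial nid for a chipid, assigning the next free one on first use. */
static int map_chipid(int chipid)
{
    if (chipid < 0 || chipid >= MODEL_MAX_CHIPS)
        return MODEL_NO_NODE;
    if (chipid_to_nid[chipid] == MODEL_NO_NODE) {
        chipid_to_nid[chipid] = nr_mapped_nids;
        nid_to_chipid[nr_mapped_nids] = chipid;
        nr_mapped_nids++;
    }
    return chipid_to_nid[chipid];
}
```

With the chipids 0, 1, 16 and 17 from the problem description, this model hands out nids 0 through 3 in the order the chips are first seen.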
On 27.09.2015 [23:59:10 +0530], Raghavendra K T wrote:
> There is no change in the functionality
>
> Signed-off-by: Raghavendra K T
> ---
> arch/powerpc/mm/numa.c | 42 +-
> 1 file changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/mm
On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote:
> Once we have made the distinction between nid and chipid
> create a 1:1 mapping between them. This makes compacting the
> nids easy later.
Didn't the previous patch just do the opposite of...
> @@ -286,7 +308,7 @@ int of_node_to_nid(struct
On 28.09.2015 [13:44:42 +0300], Denis Kirjanov wrote:
> On 9/27/15, Raghavendra K T wrote:
> > Problem description:
> > Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> > numbered (possibly) as 0,1,16,17. At a lower level, the chipid
> > obtained from the device tree is natural
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote:
> On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
> > On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> > > Some architectures like POWER can have a NUMA node_possible_map that
> > > contains
tch node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges
Acked-by: Nishanth Aravamudan
> ---
> net/openvswitch/flow_table.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow_tab
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
Couldn't this also be fi
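To make the sparse-map problem in the quoted description concrete, here is a small user-space model. It assumes the per-node slots are indexed by node id, which the truncated text above does not confirm. The node ids 0, 1, 16 and 17 are borrowed from the powerpc example elsewhere in this listing, and every name here is hypothetical. Counting possible nodes gives 4, but addressing node 16 or 17 needs at least 18 slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NUMNODES 32   /* assumed size of the possible-node map */

/* Sparse possible map: only nodes 0, 1, 16 and 17 can exist. */
static const bool node_possible[MODEL_MAX_NUMNODES] = {
    [0] = true, [1] = true, [16] = true, [17] = true,
};

/* Models a count of possible nodes: how many bits are set. */
static size_t model_num_possible_nodes(void)
{
    size_t count = 0;
    for (size_t n = 0; n < MODEL_MAX_NUMNODES; n++)
        if (node_possible[n])
            count++;
    return count;
}

/* Models "highest possible node id plus one": slots needed to index by id. */
static size_t model_nr_node_ids(void)
{
    size_t highest = 0;
    for (size_t n = 0; n < MODEL_MAX_NUMNODES; n++)
        if (node_possible[n])
            highest = n + 1;
    return highest;
}

/* True when a table with `slots` entries can be indexed by `node`. */
static bool node_index_fits(size_t slots, size_t node)
{
    return node < slots;
}
```

In this model, a table sized by the count cannot be indexed by node 16, while one sized by the highest id plus one can.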
On 15.07.2015 [16:35:16 -0400], Tejun Heo wrote:
> Hello,
>
> On Thu, Jul 02, 2015 at 04:02:02PM -0700, Nishanth Aravamudan wrote:
> > we currently emit at boot:
> >
> > [0.00] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7
> >
> > After this commit, we
On 08.07.2015 [18:22:09 -0700], David Rientjes wrote:
> On Thu, 2 Jul 2015, Nishanth Aravamudan wrote:
>
> > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
> > have an ordering issue during boot with early calls to cpu_to_node().
> > The value ret
On 08.07.2015 [16:16:23 -0700], Nishanth Aravamudan wrote:
> On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
> > On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
> > > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
> > > ha
On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
> On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
> > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
> > have an ordering issue during boot with early calls to cpu_to_node().
>
>
A simple move to a wrapper function to numa_cpu_lookup_table, now that
power has the early_cpu_to_node() API.
Signed-off-by: Nishanth Aravamudan
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ec9ec20..7bf333b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc
c: [0] 0 1 2 3 [1] 4 5 6 7
Signed-off-by: Nishanth Aravamudan
diff --git a/arch/powerpc/include/asm/topology.h
b/arch/powerpc/include/asm/topology.h
index 5f1048e..f2c4c89 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -39,6 +39,8 @@ static inlin
On 24.06.2015 [07:13:36 -0500], Nathan Fontenot wrote:
> On 06/23/2015 11:01 PM, Bharata B Rao wrote:
> > So will it be correct to say that memory hotplug to memory-less node
> > isn't supported by PowerPC kernel ? Should I enforce the same in QEMU
> > for PowerKVM ?
> >
>
> I'm not sure if that i
On 12.06.2015 [16:47:03 +1000], Gavin Shan wrote:
> On Fri, Jun 12, 2015 at 04:19:17PM +1000, Alexey Kardashevskiy wrote:
> >The existing code puts all devices from a root PE to the same IOMMU group.
> >However it is a possible situation when subordinate buses belong to
> >separate PEs, in this cas
On 08.05.2015 [15:47:26 -0700], Andrew Morton wrote:
> On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka wrote:
>
> > On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote:
> > > On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote:
> > >>> What I find somew
On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote:
> On 03.04.2015 [09:57:35 +0200], Vlastimil Babka wrote:
> > On 03/31/2015 11:48 AM, Michal Hocko wrote:
> > >On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote:
> > >>On 27.03.2015 [13:17:59 -0700], Dave
On 10.04.2015 [10:31:53 +0200], Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 03:29:56PM -0700, Nishanth Aravamudan wrote:
> > > No, that's very much not the same. Even if it were dealing with hotplug
> > > it would still assume the cpu to return to the same node.
>
On 10.04.2015 [11:08:10 +0200], Peter Zijlstra wrote:
> On Fri, Apr 10, 2015 at 10:31:53AM +0200, Peter Zijlstra wrote:
> > Please, step back, look at what you're doing and ask yourself, will any
> > sane person want to use this? Can they use this?
> >
> > If so, start by describing the desired us
On 10.04.2015 [14:37:19 +0300], Konstantin Khlebnikov wrote:
> On 10.04.2015 01:58, Nishanth Aravamudan wrote:
> >On 09.04.2015 [07:27:28 +0300], Konstantin Khlebnikov wrote:
> >>On Thu, Apr 9, 2015 at 2:07 AM, Nishanth Aravamudan
> >> wrote:
> >>>On
On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote:
> On 08.04.2015 19:59, Konstantin Khlebnikov wrote:
> >Node 0 might be offline as well as any other numa node,
> >in this case kernel cannot handle memory allocation and crashes.
Isn't the bug that numa_node_id() returned an offline node
On 07.04.2015 [12:21:47 +0200], Peter Zijlstra wrote:
> On Mon, Apr 06, 2015 at 02:45:58PM -0700, Nishanth Aravamudan wrote:
> > Hi Peter,
> >
> > As you are very aware, I think, power has some odd NUMA topologies (and
> > changes to those topologies) at run-time
Hi Peter,
As you are very aware, I think, power has some odd NUMA topologies (and
changes to those topologies) at run-time. In particular, we can see
a topology at boot:
Node 0: all cpus
Node 7: no cpus
Then we get a notification from the hypervisor that a core (or two) have
moved from node
On 03.04.2015 [20:24:45 +0200], Michal Hocko wrote:
> On Fri 03-04-15 10:43:57, Nishanth Aravamudan wrote:
> > On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote:
> [...]
> > > I would expect kswapd would be looping endlessly because the zone
> > > wouldn't be
On 03.04.2015 [09:57:35 +0200], Vlastimil Babka wrote:
> On 03/31/2015 11:48 AM, Michal Hocko wrote:
> >On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote:
> >>On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote:
> >>>On 03/27/2015 12:28 PM, Nishanth Aravamudan w
On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote:
> On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote:
> > On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote:
> > > On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote:
> > > > @@ -2585,7 +2585,7 @@ static bool pfm
On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote:
> On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote:
> > @@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
> >
> > for (i = 0; i <= ZONE_NORMAL; i++) {
> >
[ Sorry, typo'd anton's address ]
On 27.03.2015 [12:28:50 -0700], Nishanth Aravamudan wrote:
> Based upon 675becce15 ("mm: vmscan: do not throttle based on pfmemalloc
> reserves if node has no ZONE_NORMAL") from Mel.
>
> We have a system with the following t
ge, the afore-mentioned 16M hugepage allocation succeeds
and correctly round-robins between Nodes 1 and 3.
Signed-off-by: Nishanth Aravamudan
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcd90c8..033c2b7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2585,7 +2585,7 @@ static bool pfmemalloc
Hi John!
On 19.02.2015 [11:03:26 -0800], John Stultz wrote:
> Hey Nish! Long time!
yep :)
> On Thu, Feb 19, 2015 at 10:35 AM, Nishanth Aravamudan
> wrote:
> > Hi John,
> >
> > We're seeing an interesting issue with the openposix testcase
> > difftim
Hi John,
We're seeing an interesting issue with the openposix testcase
difftime/1-1, which basically calls gtod/time, sleeps, calls time/gtod,
then difftime and sees if they disagree. The issue occurs with either
vDSO implementations or direct syscalls.
We are seeing failures on ppc64le and x86_6
Hi Gerry,
On 25.07.2014 [09:50:01 +0800], Jiang Liu wrote:
>
>
> On 2014/7/25 7:32, Nishanth Aravamudan wrote:
> > On 23.07.2014 [16:20:24 +0800], Jiang Liu wrote:
> >>
> >>
> >> On 2014/7/22 1:57, Nishanth Aravamudan wrote:
> >>> On 21.07
On 28.07.2014 [07:30:40 -0600], Grant Likely wrote:
> On Mon, 21 Jul 2014 10:52:41 -0700, Nishanth Aravamudan
> wrote:
> > On 11.07.2014 [15:37:39 +0800], Jiang Liu wrote:
> > > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> > > may ret
On 23.07.2014 [16:20:24 +0800], Jiang Liu wrote:
>
>
> On 2014/7/22 1:57, Nishanth Aravamudan wrote:
> > On 21.07.2014 [10:41:59 -0700], Tony Luck wrote:
> >> On Mon, Jul 21, 2014 at 10:23 AM, Nishanth Aravamudan
> >> wrote:
> >>> It seems like the i
On 11.07.2014 [15:37:47 +0800], Jiang Liu wrote:
> With typical CPU hot-addition flow on x86, PCI host bridges embedded
> in physical processor are always associated with NUMA_NO_NODE, which
> may cause sub-optimal performance.
> 1) Handle CPU hot-addition notification
> acpi_processor_add()
On 11.07.2014 [15:37:46 +0800], Jiang Liu wrote:
> With current implementation, all CPUs within a NUMA node will be
> associated with another NUMA node if the node has no memory installed.
> ---
> arch/x86/Kconfig|3 +++
> arch/x86/kernel/acpi/boot.c |5 -
> arch/x86/ker
On 21.07.2014 [12:53:33 -0700], Alexander Duyck wrote:
> I do agree the description should probably be changed. There shouldn't be
> any panics involved, only a performance impact as it will be reallocating
> always if it is on a node with no memory.
Yep, thanks for the review.
> My intention on
On 21.07.2014 [10:41:59 -0700], Tony Luck wrote:
> On Mon, Jul 21, 2014 at 10:23 AM, Nishanth Aravamudan
> wrote:
> > It seems like the issue is the order of onlining of resources on a
> > specific x86 platform?
>
> Yes. When we online a node the BIOS hits us with
On 11.07.2014 [15:37:39 +0800], Jiang Liu wrote:
> When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> may return a node without memory, and later cause system failure/panic
> when calling kmalloc_node() and friends with returned node id.
> So use cpu_to_mem()/numa_mem_id()
On 11.07.2014 [15:37:45 +0800], Jiang Liu wrote:
> Current kernel only updates _mem_id_[cpu] for onlined CPUs when memory
> configuration changes. So kernel may allocate memory from remote node
> for a CPU if the CPU is still in absent or offline state even if the
> node associated with the CPU has
On 11.07.2014 [15:37:32 +0800], Jiang Liu wrote:
> When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> may return a node without memory, and later cause system failure/panic
> when calling kmalloc_node() and friends with returned node id.
> So use cpu_to_mem()/numa_mem_id()
On 11.07.2014 [15:37:34 +0800], Jiang Liu wrote:
> When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> may return a node without memory, and later cause system failure/panic
> when calling kmalloc_node() and friends with returned node id.
> So use cpu_to_mem()/numa_mem_id()
Hi Jiang,
On 11.07.2014 [15:37:17 +0800], Jiang Liu wrote:
> Previously we have posted a patch to fix a memory crash issue caused by
> memoryless node on x86 platforms, please refer to
> http://comments.gmane.org/gmane.linux.kernel/1687425
>
> As suggested by David Rientjes, the most suitable fix fo
Hi Paul,
On 11.07.2014 [08:14:05 -0700], Paul E. McKenney wrote:
> On Fri, Jul 11, 2014 at 03:37:18PM +0800, Jiang Liu wrote:
> > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> > may return a node without memory, and later cause system failure/panic
> > when calling k
USE_PERCPU_NUMA_NODE_ID"). Those commits also
helped improve memory consumption with these kinds of environments.
Signed-off-by: Nishanth Aravamudan
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 51a3ff7..91ff531 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powe
In the presence of memoryless nodes, the workqueue code incorrectly uses
cpu_to_node() to determine what node to prefer memory allocations come
from. cpu_to_mem() should be used instead, which will use the nearest
NUMA node with memory.
Signed-off-by: Nishanth Aravamudan
diff --git a/kernel
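The difference between cpu_to_node() and cpu_to_mem() described in the commit message above can be sketched with a toy user-space model. The topology, the `model_` function names and the distance metric (absolute difference of node ids) are assumptions made for the example; a real kernel would consult its own topology and distance information.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_NODES 4
#define MODEL_NR_CPUS  8

/* Assumed toy topology: nodes 0 and 2 have CPUs but no memory. */
static const bool node_has_memory[MODEL_NR_NODES] = { false, true, false, true };
static const int cpu_home_node[MODEL_NR_CPUS] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/* Models cpu_to_node(): the node the CPU belongs to, memory or not. */
static int model_cpu_to_node(int cpu)
{
    return cpu_home_node[cpu];
}

/*
 * Models cpu_to_mem(): the nearest node that actually has memory.
 * Distance is |a - b| here, which is a simplification; ties go to
 * the lower node id. Returns -1 if no node has memory.
 */
static int model_cpu_to_mem(int cpu)
{
    int home = cpu_home_node[cpu];
    int best = -1;
    int best_dist = MODEL_NR_NODES + 1;

    for (int n = 0; n < MODEL_NR_NODES; n++) {
        int dist = n > home ? n - home : home - n;
        if (node_has_memory[n] && dist < best_dist) {
            best = n;
            best_dist = dist;
        }
    }
    return best;
}
```

For a CPU on memoryless node 0, the model's cpu_to_node() still reports node 0, while cpu_to_mem() reports node 1, the closest node that can satisfy an allocation.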