Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-22 Thread Nishanth Aravamudan
Hi Greg, On 26.01.2021 [08:29:25 +0100], Greg Kroah-Hartman wrote: > On Mon, Jan 25, 2021 at 11:55:11AM -0800, Scott Branden wrote: > > Hi All, > > > > The 5.10 LTS kernel being officially LTS supported for 2 years > > presents a problem: why would anyone select a 5.10 kernel with 2 > > year LTS

Re: [RFC 00/60] Coscheduling for Linux

2018-11-02 Thread Nishanth Aravamudan
On 17.09.2018 [13:33:15 +0200], Peter Zijlstra wrote: > On Fri, Sep 14, 2018 at 06:25:44PM +0200, Jan H. Schönherr wrote: > > On 09/14/2018 01:12 PM, Peter Zijlstra wrote: > > > On Fri, Sep 07, 2018 at 11:39:47PM +0200, Jan H. Schönherr wrote: > > > >> B) Why would I want this? > > > > > >>In

Re: Kernel panic when enabling cgroup2 io controller at runtime

2018-11-01 Thread Nishanth Aravamudan
On 01.11.2018 [12:03:40 -0700], Nishanth Aravamudan wrote: > Hi, > > tl;dr: I see a kernel NULL pointer dereference with Linus' master > (7c6c54b5) when enabling the IO cgroup2 controller at runtime. Is this > PEBKAC and if so what config option am I missing? Actually, t

Re: [RFC 61/60] cosched: Accumulated fixes and improvements

2018-09-26 Thread Nishanth Aravamudan
On 26.09.2018 [10:25:19 -0700], Nishanth Aravamudan wrote: > On 13.09.2018 [21:19:38 +0200], Jan H. Schönherr wrote: > > Here is an "extra" patch containing bug fixes and warning removals, > > that I have accumulated up to this point. > > > > It goes on top

Re: [RFC 61/60] cosched: Accumulated fixes and improvements

2018-09-26 Thread Nishanth Aravamudan
On 13.09.2018 [21:19:38 +0200], Jan H. Schönherr wrote: > Here is an "extra" patch containing bug fixes and warning removals, > that I have accumulated up to this point. > > It goes on top of the other 60 patches. (When it is time for v2, > these fixes will be integrated into the appropriate patch

Re: [RFC 00/60] Coscheduling for Linux

2018-09-13 Thread Nishanth Aravamudan
On 13.09.2018 [13:31:36 +0200], Jan H. Schönherr wrote: > On 09/13/2018 01:15 AM, Nishanth Aravamudan wrote: > > [...] if I just try to set machine's > > cpu.scheduled to 1, with no other changes (not even changing any child > > cgroup's cpu.scheduled

Re: [RFC 00/60] Coscheduling for Linux

2018-09-12 Thread Nishanth Aravamudan
On 13.09.2018 [01:18:14 +0200], Jan H. Schönherr wrote: > On 09/12/2018 09:34 PM, Jan H. Schönherr wrote: > > That said, I see a hang, too. It seems to happen, when there is a > > cpu.scheduled!=0 group that is not a direct child of the root task group. > > You seem to have "/sys/fs/cgroup/cpu/mach

Re: [RFC 00/60] Coscheduling for Linux

2018-09-12 Thread Nishanth Aravamudan
On 12.09.2018 [21:34:14 +0200], Jan H. Schönherr wrote: > On 09/12/2018 02:24 AM, Nishanth Aravamudan wrote: > > [ I am not subscribed to LKML, please keep me CC'd on replies ] > > > > I tried a simple test with several VMs (in my initial test, I have 48 > > idle 1

Re: [RFC 00/60] Coscheduling for Linux

2018-09-11 Thread Nishanth Aravamudan
[ I am not subscribed to LKML, please keep me CC'd on replies ] On 07.09.2018 [23:39:47 +0200], Jan H. Schönherr wrote: > This patch series extends CFS with support for coscheduling. The > implementation is versatile enough to cover many different > coscheduling use-cases, while at the same time b

Re: [PATCH 1/1 v4] drivers/nvme: default to 4k device page size

2015-11-06 Thread Nishanth Aravamudan
On 05.11.2015 [11:58:39 -0800], Christoph Hellwig wrote: > Looks fine, > > Reviewed-by: Christoph Hellwig > > ... but I doubt we'll ever bother updating it. Most architectures > with larger page sizes also have iommus and would need different settings > for different iommus vs direct mapping for

Re: [PATCH 1/1 v4] drivers/nvme: default to 4k device page size

2015-11-05 Thread Nishanth Aravamudan
On 05.11.2015 [11:58:39 -0800], Christoph Hellwig wrote: > Looks fine, > > Reviewed-by: Christoph Hellwig > > ... but I doubt we'll ever bother updating it. Most architectures > with larger page sizes also have iommus and would need different settings > for different iommus vs direct mapping for

[PATCH 1/1 v4] drivers/nvme: default to 4k device page size

2015-11-05 Thread Nishanth Aravamudan
On 03.11.2015 [13:46:25 +], Keith Busch wrote: > On Tue, Nov 03, 2015 at 05:18:24AM -0800, Christoph Hellwig wrote: > > On Fri, Oct 30, 2015 at 02:35:11PM -0700, Nishanth Aravamudan wrote: > > > diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c >

Re: [PATCH 1/1 v3] drivers/nvme: default to 4k device page size

2015-10-30 Thread Nishanth Aravamudan
On 30.10.2015 [21:48:48 +], Keith Busch wrote: > On Fri, Oct 30, 2015 at 02:35:11PM -0700, Nishanth Aravamudan wrote: > > Given that it's 4K just about everywhere by default (and sort of > > implicitly expected to be, I guess), I think I'd prefer we default to >

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-30 Thread Nishanth Aravamudan
On 29.10.2015 [18:49:55 -0700], David Miller wrote: > From: Nishanth Aravamudan > Date: Thu, 29 Oct 2015 08:57:01 -0700 > > > So, would that imply changing just the NVMe driver code rather than > > adding the dma_page_shift API at all? What about > > architectures

[PATCH 1/1 v3] drivers/nvme: default to 4k device page size

2015-10-30 Thread Nishanth Aravamudan
On 29.10.2015 [17:20:43 +], Busch, Keith wrote: > On Thu, Oct 29, 2015 at 08:57:01AM -0700, Nishanth Aravamudan wrote: > > On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote: > > > We had a quick chat about this issue and I think we simply should > > > default to

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-29 Thread Nishanth Aravamudan
On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote: > On Wed, Oct 28, 2015 at 01:59:23PM +, Busch, Keith wrote: > > The "new" interface for all the other architectures is the same as the > > old one we've been using for the last 5 years. > > > > I welcome x86 maintainer feedback to confir

Re: [PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 28.10.2015 [11:20:05 +0900], Benjamin Herrenschmidt wrote: > On Tue, 2015-10-27 at 18:54 -0700, Nishanth Aravamudan wrote: > > > > In "bypass" mode, what TCE size is used? Is it guaranteed to be 4K? > > None :-) The TCEs are completely bypassed. You get a N:M

Re: [PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 28.10.2015 [12:00:20 +1100], Alexey Kardashevskiy wrote: > On 10/28/2015 09:27 AM, Nishanth Aravamudan wrote: > >On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote: > >>On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote: > >>>On Power, the kernel's pa

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-27 Thread Nishanth Aravamudan
On 27.10.2015 [17:53:22 -0700], David Miller wrote: > From: Nishanth Aravamudan > Date: Tue, 27 Oct 2015 15:20:10 -0700 > > > Well, looks like I should spin up a v4 anyways for the powerpc changes. > > So, to make sure I understand your point, should I make the generic >

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-27 Thread Nishanth Aravamudan
On 28.10.2015 [09:57:48 +1100], Julian Calaby wrote: > Hi Nishanth, > > On Wed, Oct 28, 2015 at 9:20 AM, Nishanth Aravamudan > wrote: > > On 26.10.2015 [18:27:46 -0700], David Miller wrote: > >> From: Nishanth Aravamudan > >> Date: Fri, 23 Oct 2015 13:54:2

Re: [PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 27.10.2015 [17:02:16 +1100], Alexey Kardashevskiy wrote: > On 10/24/2015 07:57 AM, Nishanth Aravamudan wrote: > >On Power, the kernel's page size can differ from the IOMMU's page size, > >so we need to override the generic implementation, which always returns > >

Re: [PATCH 4/7 v2] pseries/iommu: implement DDW-aware dma_get_page_shift

2015-10-27 Thread Nishanth Aravamudan
On 27.10.2015 [16:56:10 +1100], Alexey Kardashevskiy wrote: > On 10/24/2015 07:59 AM, Nishanth Aravamudan wrote: > >When DDW (Dynamic DMA Windows) are present for a device, we have stored > >the TCE (Translation Control Entry) size in a special device tree > >property. Check i

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-27 Thread Nishanth Aravamudan
On 26.10.2015 [18:27:46 -0700], David Miller wrote: > From: Nishanth Aravamudan > Date: Fri, 23 Oct 2015 13:54:20 -0700 > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > I won't object to this patch series, but if I had

Re: [PATCH 5/7] [RFC PATCH 5/7] sparc: rename kernel/iommu_common.h -> include/asm/iommu_common.h

2015-10-23 Thread Nishanth Aravamudan
[Apologies for the subject line, should just have the [RFC PATCH 5/7]] On 23.10.2015 [14:00:08 -0700], Nishanth Aravamudan wrote: > In order to cleanly expose the desired IOMMU page shift via the new > dma_get_page_shift API, we need to have the sparc constants available in > a mor

[PATCH 7/7 v2] drivers/nvme: default to the IOMMU page size

2015-10-23 Thread Nishanth Aravamudan
ge size for the default device page size, rather than the kernel's page size. With this patch, a NVMe device survives our internal hardware exerciser; the kernel BUGs within a few seconds without the patch. Signed-off-by: Nishanth Aravamudan --- v1 -> v2: Based upon feedback from Chris

[RFC PATCH 6/7] sparc/dma-mapping: override dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
On sparc, the kernel's page size differs from the IOMMU's page size, so override the generic implementation, which always returns the kernel's page size, and return IOMMU_PAGE_SHIFT instead. Signed-off-by: Nishanth Aravamudan --- I know very little about sparc, so please cor

[PATCH 5/7] [RFC PATCH 5/7] sparc: rename kernel/iommu_common.h -> include/asm/iommu_common.h

2015-10-23 Thread Nishanth Aravamudan
In order to cleanly expose the desired IOMMU page shift via the new dma_get_page_shift API, we need to have the sparc constants available in a more typical location. There should be no functional impact to this move, but it is untested. Signed-off-by: Nishanth Aravamudan --- arch/sparc/include

[PATCH 4/7 v2] pseries/iommu: implement DDW-aware dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
oking the value up in struct iommu_table. If we don't find an iommu_table, fall back to the kernel's page size. Signed-off-by: Nishanth Aravamudan --- arch/powerpc/platforms/pseries/iommu.c | 36 ++ 1 file changed, 36 insertions(+) diff --git a/arch/po
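A minimal userspace sketch of the lookup order these pseries patches describe: use the TCE size recorded for a DDW window if one exists, otherwise the value from the device's IOMMU table, otherwise the kernel page size. All structures, field names, and constant values here are simplified hypothetical stand-ins, not the real powerpc definitions.

```c
#include <stddef.h>

/* Illustrative only: a kernel built with 64K pages. */
#define MODEL_PAGE_SHIFT 16

/* Hypothetical stand-in for an IOMMU table; only the page shift is modeled. */
struct iommu_table_model {
    unsigned long it_page_shift;
};

/* Hypothetical per-device view of what the platform code can consult. */
struct dev_model {
    int has_ddw;                          /* DDW window described in the device tree */
    unsigned long ddw_tce_shift;          /* TCE shift recorded for that window */
    const struct iommu_table_model *tbl;  /* NULL when no IOMMU table is attached */
};

/* Lookup order: DDW property, then IOMMU table, then kernel page size. */
unsigned long model_dma_get_page_shift(const struct dev_model *dev)
{
    if (dev->has_ddw)
        return dev->ddw_tce_shift;
    if (dev->tbl != NULL)
        return dev->tbl->it_page_shift;
    return MODEL_PAGE_SHIFT;
}
```

The final fallback keeps the pre-patch behavior for devices that have neither a DDW window nor an IOMMU table.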

[PATCH 3/7 v2] powerpc/dma: implement per-platform dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
. DDW is a pseries-specific feature, so allow platforms to override the implementation of dma_get_page_shift if desired. Signed-off-by: Nishanth Aravamudan --- arch/powerpc/include/asm/machdep.h | 3 ++- arch/powerpc/kernel/dma.c | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff

Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-23 Thread Nishanth Aravamudan
[Sorry, subject should have been 0/7!] On 23.10.2015 [13:54:20 -0700], Nishanth Aravamudan wrote: > We received a bug report recently when DDW (64-bit direct DMA on Power) > is not enabled for NVMe devices. In that case, we fall back to 32-bit > DMA via the IOMMU, which is always done vi

[PATCH 2/7 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-23 Thread Nishanth Aravamudan
otherwise. Signed-off-by: Nishanth Aravamudan --- arch/powerpc/include/asm/dma-mapping.h | 3 +++ arch/powerpc/kernel/dma.c | 9 + 2 files changed, 12 insertions(+) diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index 7f522c0..

[PATCH 1/7 v3] dma-mapping: add generic dma_get_page_shift API

2015-10-23 Thread Nishanth Aravamudan
Drivers like NVMe need to be able to determine the page size used for DMA transfers. Add a new API that defaults to return PAGE_SHIFT on all architectures. Signed-off-by: Nishanth Aravamudan --- v1 -> v2: Based upon feedback from Christoph Hellwig, implement the IOMMU page size lookup a
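The split between a generic default and an architecture override can be modeled in userspace C roughly as follows. The name dma_get_page_shift comes from the thread, but the argument list, the override macro, and the 64K/4K values (kernel pages versus 4K TCEs on Power) are assumptions for illustration, not the patch's actual code.

```c
#include <stddef.h>

/* Illustrative configuration: 64K kernel pages, 4K IOMMU (TCE) pages. */
#define PAGE_SHIFT 16
#define IOMMU_PAGE_SHIFT 12
/* Hypothetical switch: define to model an architecture that overrides the default. */
#define ARCH_OVERRIDES_DMA_PAGE_SHIFT 1

struct device; /* opaque in this model */

#ifdef ARCH_OVERRIDES_DMA_PAGE_SHIFT
/* Architecture override: report the IOMMU page size. */
static unsigned long dma_get_page_shift(struct device *dev)
{
    (void)dev;
    return IOMMU_PAGE_SHIFT;
}
#else
/* Generic default: the DMA page size equals the kernel page size. */
static unsigned long dma_get_page_shift(struct device *dev)
{
    (void)dev;
    return PAGE_SHIFT;
}
#endif

/* Driver side: size device pages from the DMA layer rather than PAGE_SIZE. */
unsigned long device_page_size(struct device *dev)
{
    return 1UL << dma_get_page_shift(dev);
}
```

With the override in place the driver sees 4096-byte device pages even though the kernel page is 65536 bytes. Removing the override define makes it fall back to the kernel page size.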

[PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA

2015-10-23 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PR

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-19 Thread Nishanth Aravamudan
On 15.10.2015 [15:52:19 -0700], Nishanth Aravamudan wrote: > On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote: > > Hi Nishanth, > > > > sorry for the late reply. > > > > > > On Power, since it's technically variable, we'd need a function.

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-15 Thread Nishanth Aravamudan
On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote: > Hi Nishanth, > > sorry for the late reply. > > > > On Power, since it's technically variable, we'd need a function. So are > > > you suggesting define'ing it to a function just on Power and leaving it > > > a constant elsewhere? > > > >

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-14 Thread Nishanth Aravamudan
Hi Christoph, On 12.10.2015 [14:06:51 -0700], Nishanth Aravamudan wrote: > On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote: > > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define > > with an #ifndef in common code? > > On Power, since it's

Re: [PATCH 1/2] powerpc/iommu: expose IOMMU page shift

2015-10-12 Thread Nishanth Aravamudan
On 12.10.2015 [09:03:52 -0700], Nishanth Aravamudan wrote: > On 06.10.2015 [14:19:43 +1100], David Gibson wrote: > > On Fri, Oct 02, 2015 at 10:18:00AM -0700, Nishanth Aravamudan wrote: > > > We will leverage this macro in the NVMe driver, which needs to know the > > >

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-12 Thread Nishanth Aravamudan
On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote: > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define > with an #ifndef in common code? On Power, since it's technically variable, we'd need a function. So are you suggesting define'ing it to a function just on Power and

Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-12 Thread Nishanth Aravamudan
On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote: > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define > with an #ifndef in common code? I suppose we could do that -- I wasn't sure if the macro would be palatable. > Also not all architectures use dma-mapping-common.h

Re: [PATCH 1/2] powerpc/iommu: expose IOMMU page shift

2015-10-12 Thread Nishanth Aravamudan
On 06.10.2015 [14:19:43 +1100], David Gibson wrote: > On Fri, Oct 02, 2015 at 10:18:00AM -0700, Nishanth Aravamudan wrote: > > We will leverage this macro in the NVMe driver, which needs to know the > > configured IOMMU page shift to properly configure its device's page >

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
On 03.10.2015 [07:35:09 +1000], Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote: > > Right, I did start with your advice and tried that approach, but it > > turned out I was wrong about the actual issue at the time. The problem >

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
On 03.10.2015 [06:51:06 +1000], Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote: > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > So you chose to return the granularity of the iomm

[PATCH 5/5 v2] drivers/nvme: default to the IOMMU page size

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PR

[PATCH 4/5 v2] pseries/iommu: implement DDW-aware dma_get_page_shift

2015-10-02 Thread Nishanth Aravamudan
oking the value up in struct iommu_table. If we don't find an iommu_table, fall back to the kernel's page size. Signed-off-by: Nishanth Aravamudan diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 0946b98..1bf6471 100644 --- a/arch/po

[PATCH 3/5 v2] powerpc/dma: implement per-platform dma_get_page_shift

2015-10-02 Thread Nishanth Aravamudan
. DDW is a pseries-specific feature, so allow platforms to override the implementation of dma_get_page_shift if desired. Signed-off-by: Nishanth Aravamudan diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index cab6753..5c372e3 100644 --- a/arch/powerpc/include

[PATCH 2/5 v2] powerpc/dma-mapping: override dma_get_page_shift

2015-10-02 Thread Nishanth Aravamudan
otherwise. Signed-off-by: Nishanth Aravamudan diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h index 7f522c0..c5638f4 100644 --- a/arch/powerpc/include/asm/dma-mapping.h +++ b/arch/powerpc/include/asm/dma-mapping.h @@ -125,6 +125,9 @@ static inline v

[PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-02 Thread Nishanth Aravamudan
Drivers like NVMe need to be able to determine the page size used for DMA transfers. Add a new API that defaults to return PAGE_SHIFT on all architectures. Signed-off-by: Nishanth Aravamudan diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h index

[PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the P

Re: [PATCH 2/2] drivers/nvme: default to the IOMMU page size on Power

2015-10-02 Thread Nishanth Aravamudan
On 02.10.2015 [10:25:44 -0700], Christoph Hellwig wrote: > Hi Nishanth, > > please expose this value through the generic DMA API instead of adding > architecture specific hacks to drivers. Ok, I'm happy to do that instead -- what I struggled with is that I don't have enough knowledge of the vario

[PATCH 2/2] drivers/nvme: default to the IOMMU page size on Power

2015-10-02 Thread Nishanth Aravamudan
survives our internal hardware exerciser; the kernel BUGs within a few seconds without the patch. Signed-off-by: Nishanth Aravamudan diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c index 7920c27..969a95e 100644 --- a/drivers/block/nvme-core.c +++ b/drivers/block/nvme-core

[PATCH 0/2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PR

[PATCH 1/2] powerpc/iommu: expose IOMMU page shift

2015-10-02 Thread Nishanth Aravamudan
We will leverage this macro in the NVMe driver, which needs to know the configured IOMMU page shift to properly configure its device's page size. Signed-off-by: Nishanth Aravamudan --- Given this is available, it seems reasonable to expose -- and it doesn't really make sense to make

Re: [PATCH RFC 3/5] powerpc:numa create 1:1 mappaing between chipid and nid

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote: > Once we have made the distinction between nid and chipid > create a 1:1 mapping between them. This makes compacting the > nids easy later. > > No functionality change. > > Signed-off-by: Raghavendra K T > --- > arch/powerpc/mm/numa.c | 36

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote: > Problem description: > Powerpc has sparse node numbering, i.e. on a 4 node system nodes are > numbered (possibly) as 0,1,16,17. At a lower level, we map the chipid > got from device tree is naturally mapped (directly) to nid. chipid is a OPA

Re: [PATCH RFC 4/5] powerpc:numa Add helper functions to maintain chipid to nid mapping

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:12 +0530], Raghavendra K T wrote: > Create arrays that maps serial nids and sparse chipids. > > Note: My original idea had only two arrays of chipid to nid map. Final > code is inspired by driver/acpi/numa.c that maps a proximity node with > a logical node by Takayoshi Kochi ,
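One way to picture the chipid-to-serial-nid idea from this RFC is a pair of arrays that hand out the next free nid the first time a chipid is seen, so sparse chipids 0, 1, 16, 17 become nids 0 to 3. This is a hypothetical sketch: the array names, bounds, and allocate-on-first-use policy are assumptions, not the code in the series.

```c
/* Hypothetical bounds for the model. */
#define MAX_CHIPID 256
#define MAX_NUMNODES 16
#define NO_MAPPING (-1)

static int chipid_to_nid_map[MAX_CHIPID];
static int nid_to_chipid_map[MAX_NUMNODES];
static int next_nid;

/* Reset both directions of the mapping. */
void mapping_init(void)
{
    for (int i = 0; i < MAX_CHIPID; i++)
        chipid_to_nid_map[i] = NO_MAPPING;
    for (int i = 0; i < MAX_NUMNODES; i++)
        nid_to_chipid_map[i] = NO_MAPPING;
    next_nid = 0;
}

/* Return the serial nid for chipid, allocating the next free one on first use. */
int chipid_to_nid(int chipid)
{
    if (chipid < 0 || chipid >= MAX_CHIPID)
        return NO_MAPPING;
    if (chipid_to_nid_map[chipid] != NO_MAPPING)
        return chipid_to_nid_map[chipid];
    if (next_nid >= MAX_NUMNODES)
        return NO_MAPPING;
    chipid_to_nid_map[chipid] = next_nid;
    nid_to_chipid_map[next_nid] = chipid;
    return next_nid++;
}

/* Reverse lookup: which chipid was assigned this nid. */
int nid_to_chipid(int nid)
{
    if (nid < 0 || nid >= MAX_NUMNODES)
        return NO_MAPPING;
    return nid_to_chipid_map[nid];
}
```

Keeping both directions lets code that still needs the firmware chipid recover it from a compact nid.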

Re: [PATCH RFC 2/5] powerpc:numa Rename functions referring to nid as chipid

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:10 +0530], Raghavendra K T wrote: > There is no change in the functionality > > Signed-off-by: Raghavendra K T > --- > arch/powerpc/mm/numa.c | 42 +- > 1 file changed, 21 insertions(+), 21 deletions(-) > > diff --git a/arch/powerpc/mm

Re: [PATCH RFC 3/5] powerpc:numa create 1:1 mappaing between chipid and nid

2015-09-28 Thread Nishanth Aravamudan
On 27.09.2015 [23:59:11 +0530], Raghavendra K T wrote: > Once we have made the distinction between nid and chipid > create a 1:1 mapping between them. This makes compacting the > nids easy later. Didn't the previous patch just do the opposite of... > @@ -286,7 +308,7 @@ int of_node_to_nid(struct

Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support

2015-09-28 Thread Nishanth Aravamudan
On 28.09.2015 [13:44:42 +0300], Denis Kirjanov wrote: > On 9/27/15, Raghavendra K T wrote: > > Problem description: > > Powerpc has sparse node numbering, i.e. on a 4 node system nodes are > > numbered (possibly) as 0,1,16,17. At a lower level, we map the chipid > > got from device tree is natural

Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote: > On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote: > > On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote: > > > Some architectures like POWER can have a NUMA node_possible_map that > > > contains

Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Nishanth Aravamudan
tch node_online_map on boot. > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861 > > Signed-off-by: Chris J Arges Acked-by: Nishanth Aravamudan > --- > net/openvswitch/flow_table.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/openvswitch/flow_tab

Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote: > Some architectures like POWER can have a NUMA node_possible_map that > contains sparse entries. This causes memory corruption with openvswitch > since it allocates flow_cache with a multiple of num_possible_nodes() and Couldn't this also be fi
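The sizing problem in the openvswitch threads above comes down to counting set bits versus taking the highest id plus one. The userspace model below uses an illustrative sparse possible map of nodes 0, 1, 16, 17 (the example layout quoted in the powerpc NUMA threads). The model_ helpers are hypothetical analogues of num_possible_nodes() and nr_node_ids, not the kernel implementations.

```c
/* Illustrative sparse possible map: nodes 0, 1, 16 and 17. */
static const int possible_nodes[] = { 0, 1, 16, 17 };
#define N_POSSIBLE ((int)(sizeof(possible_nodes) / sizeof(possible_nodes[0])))

/* Analogue of num_possible_nodes(): how many nodes are in the map. */
int model_num_possible_nodes(void)
{
    return N_POSSIBLE;
}

/* Analogue of nr_node_ids: highest possible node id plus one. */
int model_nr_node_ids(void)
{
    int max = -1;
    for (int i = 0; i < N_POSSIBLE; i++)
        if (possible_nodes[i] > max)
            max = possible_nodes[i];
    return max + 1;
}

/* 1 if an array with `entries` slots can be indexed by every possible node id. */
int array_covers_all_nodes(int entries)
{
    for (int i = 0; i < N_POSSIBLE; i++)
        if (possible_nodes[i] >= entries)
            return 0;
    return 1;
}
```

An array sized by the count has 4 slots but is indexed by node 16, which is out of bounds. Sizing by nr_node_ids (18 here) covers every possible id at the cost of some unused slots.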

Re: [RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-15 Thread Nishanth Aravamudan
On 15.07.2015 [16:35:16 -0400], Tejun Heo wrote: > Hello, > > On Thu, Jul 02, 2015 at 04:02:02PM -0700, Nishanth Aravamudan wrote: > > we currently emit at boot: > > > > [0.00] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 > > > > After this commit, we

Re: [RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-10 Thread Nishanth Aravamudan
On 08.07.2015 [18:22:09 -0700], David Rientjes wrote: > On Thu, 2 Jul 2015, Nishanth Aravamudan wrote: > > > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we > > have an ordering issue during boot with early calls to cpu_to_node(). > > The value ret

Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-10 Thread Nishanth Aravamudan
On 08.07.2015 [16:16:23 -0700], Nishanth Aravamudan wrote: > On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote: > > On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote: > > > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we > > > ha

Re: [RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-08 Thread Nishanth Aravamudan
On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote: > On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote: > > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we > > have an ordering issue during boot with early calls to cpu_to_node(). > >

[RFC PATCH 2/2] powerpc/smp: use early_cpu_to_node() instead of direct references to numa_cpu_lookup_table

2015-07-02 Thread Nishanth Aravamudan
A simple move to a wrapper function to numa_cpu_lookup_table, now that power has the early_cpu_to_node() API. Signed-off-by: Nishanth Aravamudan diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index ec9ec20..7bf333b 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc

[RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot

2015-07-02 Thread Nishanth Aravamudan
c: [0] 0 1 2 3 [1] 4 5 6 7 Signed-off-by: Nishanth Aravamudan diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 5f1048e..f2c4c89 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -39,6 +39,8 @@ static inlin
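The boot-ordering issue in these threads can be modeled as two lookups: one backed by per-CPU data that is still zero early in boot, and an early variant that reads a table filled from firmware topology. This is a hypothetical userspace model. The names, the 8-CPU/2-node layout (echoing the pcpu-alloc lines quoted above), and the fallback to node 0 for unset entries are assumptions, not the patch's code.

```c
#define MODEL_NR_CPUS 8

/* Node ids known from firmware topology early in boot (illustrative values). */
static const int boot_cpu_node_table[MODEL_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Per-CPU node ids; they stay 0 until per-CPU areas are set up later in boot. */
static int percpu_node_id[MODEL_NR_CPUS];

/* Models cpu_to_node() backed by per-CPU data: wrong if called too early. */
int model_cpu_to_node(int cpu)
{
    return percpu_node_id[cpu];
}

/* Models an early lookup reading the boot table; unset (-1) entries fall back to node 0. */
int model_early_cpu_to_node(int cpu)
{
    int nid = boot_cpu_node_table[cpu];
    return nid < 0 ? 0 : nid;
}

/* Later in boot: copy the table into per-CPU storage. */
void model_setup_percpu(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        percpu_node_id[cpu] = boot_cpu_node_table[cpu];
}
```

Before setup, the per-CPU lookup puts CPU 4 on node 0, matching the "[0] 4 5 6 7" grouping. The early lookup already reports node 1.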

Re: powerpc,numa: Memory hotplug to memory-less nodes ?

2015-06-25 Thread Nishanth Aravamudan
On 24.06.2015 [07:13:36 -0500], Nathan Fontenot wrote: > On 06/23/2015 11:01 PM, Bharata B Rao wrote: > > So will it be correct to say that memory hotplug to memory-less node > > isn't supported by PowerPC kernel ? Should I enforce the same in QEMU > > for PowerKVM ? > > > > I'm not sure if that i

Re: [PATCH kernel] powerpc/powernv/ioda2: Add devices only from buses which belong to PE

2015-06-12 Thread Nishanth Aravamudan
On 12.06.2015 [16:47:03 +1000], Gavin Shan wrote: > On Fri, Jun 12, 2015 at 04:19:17PM +1000, Alexey Kardashevskiy wrote: > >The existing code puts all devices from a root PE to the same IOMMU group. > >However it is a possible situation when subordinate buses belong to > >separate PEs, in this cas

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-05-08 Thread Nishanth Aravamudan
On 08.05.2015 [15:47:26 -0700], Andrew Morton wrote: > On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka wrote: > > > On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote: > > > On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote: > > >>> What I find somew

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-05-05 Thread Nishanth Aravamudan
On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote: > On 03.04.2015 [09:57:35 +0200], Vlastimil Babka wrote: > > On 03/31/2015 11:48 AM, Michal Hocko wrote: > > >On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote: > > >>On 27.03.2015 [13:17:59 -0700], Dave

Re: Topology updates and NUMA-level sched domains

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [10:31:53 +0200], Peter Zijlstra wrote: > On Thu, Apr 09, 2015 at 03:29:56PM -0700, Nishanth Aravamudan wrote: > > > No, that's very much not the same. Even if it were dealing with hotplug > > > it would still assume the cpu to return to the same node. >

Re: Topology updates and NUMA-level sched domains

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [11:08:10 +0200], Peter Zijlstra wrote: > On Fri, Apr 10, 2015 at 10:31:53AM +0200, Peter Zijlstra wrote: > > Please, step back, look at what you're doing and ask yourself, will any > > sane person want to use this? Can they use this? > > > > If so, start by describing the desired us

Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-10 Thread Nishanth Aravamudan
On 10.04.2015 [14:37:19 +0300], Konstantin Khlebnikov wrote: > On 10.04.2015 01:58, Tanisha Aravamudan wrote: > >On 09.04.2015 [07:27:28 +0300], Konstantin Khlebnikov wrote: > >>On Thu, Apr 9, 2015 at 2:07 AM, Nishanth Aravamudan > >> wrote: > >>>On

Re: [PATCH] of: return NUMA_NO_NODE from fallback of_node_to_nid()

2015-04-08 Thread Nishanth Aravamudan
On 08.04.2015 [20:04:04 +0300], Konstantin Khlebnikov wrote: > On 08.04.2015 19:59, Konstantin Khlebnikov wrote: > >Node 0 might be offline as well as any other numa node, > >in this case kernel cannot handle memory allocation and crashes. Isn't the bug that numa_node_id() returned an offline node

Re: Topology updates and NUMA-level sched domains

2015-04-07 Thread Nishanth Aravamudan
On 07.04.2015 [12:21:47 +0200], Peter Zijlstra wrote: > On Mon, Apr 06, 2015 at 02:45:58PM -0700, Nishanth Aravamudan wrote: > > Hi Peter, > > > > As you are very aware, I think, power has some odd NUMA topologies (and > > changes to those topologies) at run-time

Topology updates and NUMA-level sched domains

2015-04-06 Thread Nishanth Aravamudan
Hi Peter, As you are very aware, I think, power has some odd NUMA topologies (and changes to those topologies) at run-time. In particular, we can see a topology at boot: Node 0: all CPUs; Node 7: no CPUs. Then we get a notification from the hypervisor that a core (or two) have moved from node

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-04-03 Thread Nishanth Aravamudan
On 03.04.2015 [20:24:45 +0200], Michal Hocko wrote: > On Fri 03-04-15 10:43:57, Nishanth Aravamudan wrote: > > On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote: > [...] > > > I would expect kswapd would be looping endlessly because the zone > > > wouldn't be

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-04-03 Thread Nishanth Aravamudan
On 03.04.2015 [09:57:35 +0200], Vlastimil Babka wrote: > On 03/31/2015 11:48 AM, Michal Hocko wrote: > >On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote: > >>On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: > >>>On 03/27/2015 12:28 PM, Nishanth Aravamudan w

Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-04-03 Thread Nishanth Aravamudan
On 31.03.2015 [11:48:29 +0200], Michal Hocko wrote: > On Fri 27-03-15 15:23:50, Nishanth Aravamudan wrote: > > On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: > > > On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote: > > > > @@ -2585,7 +2585,7 @@ static bool pfm

[PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages

2015-03-27 Thread Nishanth Aravamudan
On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote: > On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote: > > @@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat) > > > > for (i = 0; i <= ZONE_NORMAL; i++) { > >

Re: [PATCH] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable zones

2015-03-27 Thread Nishanth Aravamudan
[ Sorry, typo'd anton's address ] On 27.03.2015 [12:28:50 -0700], Nishanth Aravamudan wrote: > Based upon 675becce15 ("mm: vmscan: do not throttle based on pfmemalloc > reserves if node has no ZONE_NORMAL") from Mel. > > We have a system with the following t

[PATCH] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable zones

2015-03-27 Thread Nishanth Aravamudan
ge, the afore-mentioned 16M hugepage allocation succeeds and correctly round-robins between Nodes 1 and 3. Signed-off-by: Nishanth Aravamudan diff --git a/mm/vmscan.c b/mm/vmscan.c index dcd90c8..033c2b7 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2585,7 +2585,7 @@ static bool pfmemalloc
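The idea in this vmscan patch can be sketched as a simplified watermark check that only counts zones with something to reclaim and declines to throttle when no zone qualifies. The zone structure is a stand-in, and the "free pages must exceed half the reserve" threshold is an assumption of this sketch rather than a quote of mm/vmscan.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-zone counters; not the kernel's struct zone. */
struct zone_model {
    unsigned long free_pages;
    unsigned long min_wmark;
    unsigned long reclaimable_pages;
};

/* Sum reserves and free pages only over zones that have something to reclaim.
   If no zone qualifies, report OK so the caller does not throttle. */
bool model_pfmemalloc_watermark_ok(const struct zone_model *zones, size_t nr)
{
    unsigned long free_pages = 0, reserve = 0;

    for (size_t i = 0; i < nr; i++) {
        if (zones[i].reclaimable_pages == 0)
            continue;
        free_pages += zones[i].free_pages;
        reserve += zones[i].min_wmark;
    }
    if (reserve == 0)
        return true;
    return free_pages > reserve / 2;
}
```

Skipping zones with nothing to reclaim avoids throttling on a reserve that kswapd can never restore. This matches the situation on a node whose memory is all consumed by hugepages.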

Re: time / gtod seconds value out of sync?

2015-02-19 Thread Nishanth Aravamudan
Hi John! On 19.02.2015 [11:03:26 -0800], John Stultz wrote: > Hey Nish! Long time! yep :) > On Thu, Feb 19, 2015 at 10:35 AM, Nishanth Aravamudan > wrote: > > Hi John, > > > > We're seeing an interesting issue with the openposix testcase > > difftim

time / gtod seconds value out of sync?

2015-02-19 Thread Nishanth Aravamudan
Hi John, We're seeing an interesting issue with the openposix testcase difftime/1-1, which basically calls gtod/time, sleeps, calls time/gtod, then difftime and sees if they disagree. The issue occurs with either vDSO implementations or direct syscalls. We are seeing failures on ppc64le and x86_6
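The sequence described in this report can be sketched in userspace C. This is a hypothetical reconstruction, not the openposix difftime/1-1 source. It assumes the invariant under test is that a clock read taken later never reports an earlier whole second than a read taken before it. `seconds_ordered()` and `check_time_gtod()` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* A later clock read must never report an earlier whole second. */
bool seconds_ordered(time_t earlier, time_t later)
{
    return difftime(later, earlier) >= 0.0;
}

/* gtod then time(), sleep, time() then gtod. Returns false if either
 * adjacent pair disagrees about the current second. */
bool check_time_gtod(unsigned int sleep_secs)
{
    struct timeval tv1, tv2;
    time_t t1, t2;

    gettimeofday(&tv1, NULL);
    t1 = time(NULL);
    sleep(sleep_secs);
    t2 = time(NULL);
    gettimeofday(&tv2, NULL);

    return seconds_ordered(tv1.tv_sec, t1) &&
           seconds_ordered(t2, tv2.tv_sec);
}
```

Running `check_time_gtod()` in a loop is one way to look for the mismatch; whether it reproduces depends on the kernel and on how `time()` is serviced (vDSO or syscall).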

Re: [RFC Patch V1 00/30] Enable memoryless node on x86 platforms

2014-08-18 Thread Nishanth Aravamudan
Hi Gerry, On 25.07.2014 [09:50:01 +0800], Jiang Liu wrote: > > > On 2014/7/25 7:32, Nishanth Aravamudan wrote: > > On 23.07.2014 [16:20:24 +0800], Jiang Liu wrote: > >> > >> > >> On 2014/7/22 1:57, Nishanth Aravamudan wrote: > >>> On 21.07

Re: [RFC Patch V1 22/30] mm, of: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-28 Thread Nishanth Aravamudan
On 28.07.2014 [07:30:40 -0600], Grant Likely wrote: > On Mon, 21 Jul 2014 10:52:41 -0700, Nishanth Aravamudan > wrote: > > On 11.07.2014 [15:37:39 +0800], Jiang Liu wrote: > > > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id() > > > may ret

Re: [RFC Patch V1 00/30] Enable memoryless node on x86 platforms

2014-07-24 Thread Nishanth Aravamudan
On 23.07.2014 [16:20:24 +0800], Jiang Liu wrote: > > > On 2014/7/22 1:57, Nishanth Aravamudan wrote: > > On 21.07.2014 [10:41:59 -0700], Tony Luck wrote: > >> On Mon, Jul 21, 2014 at 10:23 AM, Nishanth Aravamudan > >> wrote: > >>> It seems like the i

Re: [RFC Patch V1 30/30] x86, NUMA: Online node earlier when doing CPU hot-addition

2014-07-24 Thread Nishanth Aravamudan
On 11.07.2014 [15:37:47 +0800], Jiang Liu wrote: > With typical CPU hot-addition flow on x86, PCI host bridges embedded > in physical processor are always associated with NUMA_NO_NODE, which > may cause sub-optimal performance. > 1) Handle CPU hot-addition notification > acpi_processor_add()

Re: [RFC Patch V1 29/30] mm, x86: Enable memoryless node support to better support CPU/memory hotplug

2014-07-24 Thread Nishanth Aravamudan
On 11.07.2014 [15:37:46 +0800], Jiang Liu wrote: > With current implementation, all CPUs within a NUMA node will be > associated with another NUMA node if the node has no memory installed. > --- > arch/x86/Kconfig|3 +++ > arch/x86/kernel/acpi/boot.c |5 - > arch/x86/ker

Re: [RFC Patch V1 15/30] mm, igb: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-21 Thread Nishanth Aravamudan
On 21.07.2014 [12:53:33 -0700], Alexander Duyck wrote: > I do agree the description should probably be changed. There shouldn't be > any panics involved, only a performance impact as it will be reallocating > always if it is on a node with no memory. Yep, thanks for the review. > My intention on
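The "reallocating always" effect can be modeled in a few lines. This is a hedged sketch, not igb code. It assumes the driver recycles an RX page only when the page's node equals the node it treats as local. `can_reuse_rx_page()` and `count_reallocations()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical recycling test: reuse a page only if it is "local". */
bool can_reuse_rx_page(int page_nid, int local_nid)
{
    return page_nid == local_nid;
}

/* Count how many of n recycle attempts fall back to a fresh allocation
 * when every page is served from page_nid. */
unsigned count_reallocations(unsigned n, int page_nid, int local_nid)
{
    unsigned reallocs = 0;

    for (unsigned i = 0; i < n; i++)
        if (!can_reuse_rx_page(page_nid, local_nid))
            reallocs++;
    return reallocs;
}
```

With a memoryless local node (the answer cpu_to_node() gives), pages always come from another node, so every attempt reallocates. Comparing against the nearest node with memory (what cpu_to_mem() is meant to return) lets pages be recycled again. The cost is performance, not a panic.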

Re: [RFC Patch V1 00/30] Enable memoryless node on x86 platforms

2014-07-21 Thread Nishanth Aravamudan
On 21.07.2014 [10:41:59 -0700], Tony Luck wrote: > On Mon, Jul 21, 2014 at 10:23 AM, Nishanth Aravamudan > wrote: > > It seems like the issue is the order of onlining of resources on a > > specific x86 platform? > > Yes. When we online a node the BIOS hits us with

Re: [RFC Patch V1 22/30] mm, of: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-21 Thread Nishanth Aravamudan
On 11.07.2014 [15:37:39 +0800], Jiang Liu wrote: > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id() > may return a node without memory, and later cause system failure/panic > when calling kmalloc_node() and friends with returned node id. > So use cpu_to_mem()/numa_mem_id()

Re: [RFC Patch V1 28/30] mm: Update _mem_id_[] for every possible CPU when memory configuration changes

2014-07-21 Thread Nishanth Aravamudan
On 11.07.2014 [15:37:45 +0800], Jiang Liu wrote: > Current kernel only updates _mem_id_[cpu] for onlined CPUs when memory > configuration changes. So kernel may allocate memory from remote node > for a CPU if the CPU is still in absent or offline state even if the > node associated with the CPU has
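The proposed change, refreshing the per-CPU memory-node entry for every possible CPU rather than only online ones, can be illustrated with a small userspace model. This is not the kernel's `_mem_id_[]` code. The distance table, `nearest_with_memory()`, and `update_mem_ids()` are hypothetical. "Nearest" is approximated as the minimum distance, with ties going to the lower node id.

```c
#include <stdbool.h>

#define NR_NODES_MODEL 4

/* Nearest node with memory to `node`, or -1 if no node has memory. */
static int nearest_with_memory(int node, const int dist[][NR_NODES_MODEL],
                               const bool has_memory[])
{
    int best = -1;

    for (int n = 0; n < NR_NODES_MODEL; n++)
        if (has_memory[n] && (best < 0 || dist[node][n] < dist[node][best]))
            best = n;
    return best;
}

/* Recompute mem_id[] for every possible CPU, online or not, so a CPU
 * brought online later already has a correct entry. */
void update_mem_ids(int mem_id[], const int cpu_node[], int nr_possible,
                    const int dist[][NR_NODES_MODEL], const bool has_memory[])
{
    for (int cpu = 0; cpu < nr_possible; cpu++)
        mem_id[cpu] = nearest_with_memory(cpu_node[cpu], dist, has_memory);
}
```

When memory is hot-added to a node, calling `update_mem_ids()` again makes that node's CPUs prefer it, whether or not they are currently online.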

Re: [RFC Patch V1 15/30] mm, igb: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-21 Thread Nishanth Aravamudan
On 11.07.2014 [15:37:32 +0800], Jiang Liu wrote: > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id() > may return a node without memory, and later cause system failure/panic > when calling kmalloc_node() and friends with returned node id. > So use cpu_to_mem()/numa_mem_id()

Re: [RFC Patch V1 17/30] mm, intel_powerclamp: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-21 Thread Nishanth Aravamudan
On 11.07.2014 [15:37:34 +0800], Jiang Liu wrote: > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id() > may return a node without memory, and later cause system failure/panic > when calling kmalloc_node() and friends with returned node id. > So use cpu_to_mem()/numa_mem_id()

Re: [RFC Patch V1 00/30] Enable memoryless node on x86 platforms

2014-07-21 Thread Nishanth Aravamudan
Hi Jiang, On 11.07.2014 [15:37:17 +0800], Jiang Liu wrote: > Previously we have posted a patch to fix a memory crash issue caused by > memoryless node on x86 platforms, please refer to > http://comments.gmane.org/gmane.linux.kernel/1687425 > > As suggested by David Rientjes, the most suitable fix fo

Re: [RFC Patch V1 01/30] mm, kernel: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-21 Thread Nishanth Aravamudan
Hi Paul, On 11.07.2014 [08:14:05 -0700], Paul E. McKenney wrote: > On Fri, Jul 11, 2014 at 03:37:18PM +0800, Jiang Liu wrote: > > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id() > > may return a node without memory, and later cause system failure/panic > > when calling k

[RFC 2/2] powerpc: reorder per-cpu NUMA information's initialization

2014-07-17 Thread Nishanth Aravamudan
USE_PERCPU_NUMA_NODE_ID"). Those commits also helped improve memory consumption with these kinds of environments. Signed-off-by: Nishanth Aravamudan diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 51a3ff7..91ff531 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powe

[RFC 1/2] workqueue: use the nearest NUMA node, not the local one

2014-07-17 Thread Nishanth Aravamudan
In the presence of memoryless nodes, the workqueue code incorrectly uses cpu_to_node() to determine what node to prefer memory allocations come from. cpu_to_mem() should be used instead, which will use the nearest NUMA node with memory. Signed-off-by: Nishanth Aravamudan diff --git a/kernel
