On Wed 31-05-17 23:35:48, Pasha Tatashin wrote:
OK, so why cannot we make zero_struct_page 8x 8B stores; other arches
would do memset. You said it would be slower, but would that be
measurable? I am sorry to be so persistent here, but I would be much
happier if this didn't depend on the deferred initialization. If this is
absolutely a no-go [...]
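The zero_struct_page idea above can be sketched as follows. This is a minimal illustration, not the kernel's actual code: `struct page_like` is a 64-byte stand-in for `struct page`, and the function name mirrors the proposal rather than any final API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct page: 64 bytes on typical 64-bit configs.
 * The real layout lives in include/linux/mm_types.h. */
struct page_like {
    uint64_t words[8];
};

/*
 * Sketch of the suggested generic zero_struct_page(): eight explicit
 * 8-byte stores instead of a memset() call. Architectures where an
 * optimized memset() wins (as argued for SPARC and ppc64 in this
 * thread) would keep calling memset() instead.
 */
static inline void zero_struct_page(struct page_like *page)
{
    page->words[0] = 0;
    page->words[1] = 0;
    page->words[2] = 0;
    page->words[3] = 0;
    page->words[4] = 0;
    page->words[5] = 0;
    page->words[6] = 0;
    page->words[7] = 0;
}
```

With optimization enabled, compilers typically merge these eight stores into the widest stores the target supports, without the call overhead or size checks of a general-purpose memset().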
From: Michal Hocko
Date: Wed, 31 May 2017 18:31:31 +0200
> On Tue 30-05-17 13:16:50, Pasha Tatashin wrote:
> > Could you be more specific? E.g. how are other stores done in
> > __init_single_page safe then? I am sorry to be dense here, but how does
> > the full 64B store differ from other stores done in the same function?
On Tue 30-05-17 13:16:50, Pasha Tatashin wrote:
> > Could you be more specific? E.g. how are other stores done in
> > __init_single_page safe then? I am sorry to be dense here, but how does
> > the full 64B store differ from other stores done in the same function?
>
> Hi Michal,
>
> It is safe to do regular 8-byte and smaller stores (stx, st, sth, stb)
> without [...]
On Fri 26-05-17 12:45:55, Pasha Tatashin wrote:
> Hi Michal,
>
> I have considered your proposals:
>
> 1. Making memset(0) unconditional inside __init_single_page() is not
> going to work because it slows down SPARC and ppc64. On SPARC even the
> BSTI optimization that I proposed earlier won't work, because after
> consulting with other engineers I [...]
On Fri, 2017-05-12 at 13:37 -0400, David Miller wrote:
> > Right now it is larger, but what I suggested is to add a new optimized
> > routine just for this case, which would do STBI for 64 bytes but
> > without membar (do membar at the end of memmap_init_zone() and
> > deferred_init_memmap()) [...]
On Mon 15-05-17 16:44:26, Pasha Tatashin wrote:
> On 05/15/2017 03:38 PM, Michal Hocko wrote:
> > I do not think this is the right approach. Your measurements just show
> > that sparc could have a more optimized memset for small sizes. If you
> > keep the same memset only for the parallel initialization [...]
On Mon 15-05-17 14:12:10, Pasha Tatashin wrote:
> Hi Michal,
>
> After looking at your suggested memblock_virt_alloc_core() change again,
> I decided to keep what I have. I do not want to inline
> memblock_virt_alloc_internal(), because it is not a performance-critical
> path, and by inlining it we will unnecessarily increase the text size on
> all platforms [...]
From: Pasha Tatashin
Date: Fri, 12 May 2017 13:24:52 -0400
> Right now it is larger, but what I suggested is to add a new optimized
> routine just for this case, which would do STBI for 64 bytes but
> without membar (do membar at the end of memmap_init_zone() and
> deferred_init_memmap())
>
> #de[...]
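The batched-barrier idea above can be sketched in portable C. This is an illustration of the scheme, not the actual sparc64 code: the per-page routine stands in for the proposed STBI loop without membar, the C11 fence stands in for the single membar issued once per range, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-byte stand-in for struct page. */
struct page_like {
    uint64_t words[8];
};

/* Zero one struct page with plain stores. On sparc64 this slot would
 * be filled by block-initializing stores (STBI) *without* the usual
 * trailing membar. */
static void zero_page_struct_no_barrier(struct page_like *page)
{
    memset(page, 0, sizeof(*page));
}

/* The batching proposed in the thread: instead of paying for a memory
 * barrier after every 64-byte block, zero the whole range first and
 * publish it with a single barrier at the end, as memmap_init_zone()
 * or deferred_init_memmap() would. */
static void init_page_range(struct page_like *map, size_t nr_pages)
{
    for (size_t i = 0; i < nr_pages; i++)
        zero_page_struct_no_barrier(&map[i]);

    /* One ordering point for the whole range, not one per page. */
    atomic_thread_fence(memory_order_release);
}
```

The cost of the barrier is then amortized over the entire zone's struct pages rather than paid per 64-byte block.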
On 05/12/2017 12:57 PM, David Miller wrote:
From: Pasha Tatashin
Date: Thu, 11 May 2017 16:59:33 -0400
> We should either keep memset() only for deferred struct pages, as what
> I have in my patches.
>
> Another option is to add a new function struct_page_clear() which would
> default to memset() and to something else on platforms that decide to
> optimize it.
>
> On SPARC it would call STBIs, and we would do [...]

From: Pasha Tatashin
Date: Thu, 11 May 2017 16:47:05 -0400
> So, moving memset() into __init_single_page() benefits Intel. I am
> actually surprised why memset() is so slow on Intel when it is called
> from memblock. But it hurts SPARC; I guess these membars at the end of
> memset() kill the performance [...]
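The struct_page_clear() option could look roughly like this. The override hook and names are illustrative (the thread only proposes the function, not its mechanism); the `#ifndef` guard is one common kernel pattern for letting an architecture supply its own version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-byte stand-in for struct page. */
struct page_like {
    uint64_t words[8];
};

/*
 * Sketch of the proposed struct_page_clear(): a generic memset()
 * default that an architecture can replace with an optimized variant.
 * SPARC would define ARCH_HAS_STRUCT_PAGE_CLEAR and provide a version
 * built on block-initializing stores (STBI).
 */
#ifndef ARCH_HAS_STRUCT_PAGE_CLEAR
static inline void struct_page_clear(struct page_like *page)
{
    memset(page, 0, sizeof(*page));
}
#endif
```

Callers such as __init_single_page() would then use one name everywhere, and only platforms with something faster than memset() would carry extra code.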
Have you measured that? I do not think it would be super hard to
measure. I would be quite surprised if this added much, if anything at
all, as the whole struct page should be in the cache line already. We do
set the reference count and other struct members. Almost nobody should
be looking at our page [...]
From: Michal Hocko
Date: Thu, 11 May 2017 10:05:38 +0200
> Anyway, do you agree that doing the struct page initialization along
> with the other writes to it shouldn't add measurable overhead compared
> to pre-zeroing a larger block of struct pages? We already have an
> exclusive cache line and [...]
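The argument above can be made concrete with a simplified __init_single_page()-style sketch. The field names are stand-ins, not the real `struct page` layout: the point is only that the zeroing and the ordinary init stores hit the same already-exclusive cache line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the struct page fields touched at init time. */
struct page_like {
    unsigned long flags;
    int refcount;
    unsigned long private;
    uint64_t pad[5];        /* rest of the 64-byte structure */
};

/*
 * Sketch of the "initialize along with other writes" argument: the
 * function already stores the reference count and other members, so
 * the line is exclusive in the local CPU's cache and the extra zeroing
 * stores should cost little on top.
 */
static void init_single_page(struct page_like *page, unsigned long flags)
{
    memset(page, 0, sizeof(*page));  /* zero while the line is hot */
    page->flags = flags;             /* then the normal init stores */
    page->refcount = 1;
}
```

Compared with pre-zeroing the whole memmap at allocation time, this touches each line once instead of twice, which is where the reported speedups come from.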
On Wed 10-05-17 11:19:43, David S. Miller wrote:
> From: Michal Hocko
> Date: Wed, 10 May 2017 16:57:26 +0200
>
> > Have you measured that? [...]
On Wed, May 10, 2017 at 02:00:26PM -0400, David Miller wrote:
> From: Matthew Wilcox
> Date: Wed, 10 May 2017 10:17:03 -0700
> > On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote:
> >> I guess it might be clearer if you understand what the block
> >> initializing stores do on sparc64.
From: Matthew Wilcox
Date: Wed, 10 May 2017 10:17:03 -0700
> On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote:
> > From: Michal Hocko
> > Date: Wed, 10 May 2017 16:57:26 +0200
> >
> > > Have you measured that? [...]
On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote:
> From: Michal Hocko
> Date: Wed, 10 May 2017 16:57:26 +0200
>
> > Have you measured that? [...]
From: Pasha Tatashin
Date: Wed, 10 May 2017 11:01:40 -0400
> Perhaps you are right, and I will measure on x86. But I suspect the hit
> can become unacceptable on some platforms: there is an overhead of
> calling a function, even if it is leaf-optimized, and there is an
> overhead in memset() to check [...]
From: Michal Hocko
Date: Wed, 10 May 2017 16:57:26 +0200
> Have you measured that? I do not think it would be super hard to
> measure. [...]
On Wed 10-05-17 09:42:22, Pasha Tatashin wrote:
> > Well, I didn't object to this particular part. I was mostly concerned
> > about
> > http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatas...@oracle.com
> > and the "zero" argument for other functions. I guess we can do without
> > that. I _think_ that we should simply _always_ initialize the pa[...]
On Tue 09-05-17 14:54:50, Pasha Tatashin wrote:
[...]
> > The implementation just looks too large compared to what I would
> > expect. E.g. do we really need to add a zero argument to a large part
> > of the memblock API? Wouldn't it be easier to simply export
> > memblock_virt_alloc_internal (or its tiny wrapper [...]
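The API question being debated can be sketched in a self-contained way. All names below are illustrative stand-ins for the memblock API (plain malloc() replaces the real allocator): the choice is between threading a "zero" flag through every public wrapper and exposing one internal entry point that takes the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Single internal entry point carrying the "zero" decision, standing
 * in for memblock_virt_alloc_internal(). */
static void *virt_alloc_internal(size_t size, bool zero)
{
    void *p = malloc(size);
    if (p && zero)
        memset(p, 0, size);
    return p;
}

/* Thin public wrappers. The alternative debated in the thread is to
 * export the internal function directly instead of duplicating a zero
 * argument across every wrapper in the public API. */
static void *virt_alloc(size_t size)     { return virt_alloc_internal(size, true); }
static void *virt_alloc_raw(size_t size) { return virt_alloc_internal(size, false); }
```

With one exported entry point, the deferred-init path can ask for raw (non-zeroed) memory without widening the signatures of the rest of the allocation API.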
Hi Michal,
> I like the idea of postponing the zeroing from the allocation to the
> init time. To be honest the improvement looks much larger than I would
> expect (BTW, this should be a part of the changelog rather than an
> outside link).
The improvements are larger because this time was never measured [...]
On Fri 05-05-17 13:03:07, Pavel Tatashin wrote:
> Changelog:
> v2 - v3
> - Addressed David's comments about one change per patch:
>   * Split changes to platforms into 4 patches
>   * Made "do not zero vmemmap_buf" a separate patch
> v1 - v2
> - Per request, added s390 to deferred "struct page"