On Wed, 21 Nov 2018, Michal Hocko wrote:
> On Mon 19-11-18 21:44:41, Hugh Dickins wrote:
> [...]
> > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> >
> > We have all assumed that it is essential to hold a page reference while
> > waiting on a page lock: partly to guarantee that
On Mon 19-11-18 21:44:41, Hugh Dickins wrote:
[...]
> [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
>
> We have all assumed that it is essential to hold a page reference while
> waiting on a page lock: partly to guarantee that there is still a struct
> page when MEMORY_HOTREMOVE
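For context: the patch's core change is to drop the caller's page reference before sleeping rather than after waking. A simplified sketch of the reworked migration wait path (based on the description above, not the patch text itself):

        page = migration_entry_to_page(entry);
        if (!get_page_unless_zero(page))
                goto out;
        pte_unmap_unlock(ptep, ptl);
        /*
         * put_and_wait_on_page_locked() drops the reference taken above
         * before going to sleep, so the sleeping waiter no longer holds
         * the extra reference that makes migration fail with -EAGAIN.
         */
        put_and_wait_on_page_locked(page);
        return;
out:
        pte_unmap_unlock(ptep, ptl);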
On Tue, 20 Nov 2018, Hugh Dickins wrote:
> On Tue, 20 Nov 2018, Vlastimil Babka wrote:
> > >
> > > finish_wait(q, wait);
> >
> > ... the code continues by:
> >
> >         if (thrashing) {
> >                 if (!PageSwapBacked(page))
> >
> > So maybe we should not set 'thrashing' true when
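The worry here: once the reference is dropped before sleeping, the page may be freed while we wait, so re-testing page flags such as PageSwapBacked() after finish_wait() is unsafe. One way out, sketched below, is to latch the flag before sleeping (an assumption about the shape of the fix, not necessarily the final patch):

        bool thrashing = false;
        bool delayacct = false;

        if (bit_nr == PG_locked && !PageUptodate(page) && PageWorkingset(page)) {
                if (!PageSwapBacked(page)) {
                        delayacct_thrashing_start();
                        delayacct = true;       /* latch the flag now */
                }
                psi_memstall_enter(&pflags);
                thrashing = true;
        }

        /* ... the wait loop, which may outlive our page reference ... */

        finish_wait(q, wait);

        if (thrashing) {
                if (delayacct)  /* do not re-test PageSwapBacked(page) here */
                        delayacct_thrashing_end();
                psi_memstall_leave(&pflags);
        }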
On Tue, 20 Nov 2018, Baoquan He wrote:
> On 11/20/18 at 02:38pm, Vlastimil Babka wrote:
> > On 11/20/18 6:44 AM, Hugh Dickins wrote:
> > > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> > >
> > > We have all assumed that it is essential to hold a page reference while
> > > wait
On Tue, 20 Nov 2018, Vlastimil Babka wrote:
> On 11/20/18 6:44 AM, Hugh Dickins wrote:
> > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> >
> > We have all assumed that it is essential to hold a page reference while
> > waiting on a page lock: partly to guarantee that there is
On 11/20/18 at 03:05pm, Michal Hocko wrote:
> > Yes, I applied Hugh's patch 8 hours ago, then our QE Ping operated on
> > that machine, after many times of hot removing/adding, the endless
> > looping during migration is not seen any more. The test result for
> > Hugh's patch is positive. I even s
On Tue 20-11-18 21:58:03, Baoquan He wrote:
> Hi,
>
> On 11/20/18 at 02:38pm, Vlastimil Babka wrote:
> > On 11/20/18 6:44 AM, Hugh Dickins wrote:
> > > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> > >
> > > We have all assumed that it is essential to hold a page reference wh
Hi,
On 11/20/18 at 02:38pm, Vlastimil Babka wrote:
> On 11/20/18 6:44 AM, Hugh Dickins wrote:
> > [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
> >
> > We have all assumed that it is essential to hold a page reference while
> > waiting on a page lock: partly to guarantee that t
On 11/20/18 6:44 AM, Hugh Dickins wrote:
> [PATCH] mm: put_and_wait_on_page_locked() while page is migrated
>
> We have all assumed that it is essential to hold a page reference while
> waiting on a page lock: partly to guarantee that there is still a struct
> page when MEMORY_HOTREMOVE is configu
On Tue, 20 Nov 2018, Baoquan He wrote:
> On 11/19/18 at 09:59pm, Michal Hocko wrote:
> > On Mon 19-11-18 12:34:09, Hugh Dickins wrote:
> > > I'm glad that I delayed: what I had then (migration_waitqueue instead
> > > of using page_waitqueue) was not wrong, but what I've been using the
> > > last co
On 11/19/18 at 09:59pm, Michal Hocko wrote:
> On Mon 19-11-18 12:34:09, Hugh Dickins wrote:
> > I'm glad that I delayed: what I had then (migration_waitqueue instead
> > of using page_waitqueue) was not wrong, but what I've been using the
> > last couple of months is rather better (and can be put t
On Mon 19-11-18 12:34:09, Hugh Dickins wrote:
> On Mon, 19 Nov 2018, Michal Hocko wrote:
> > On Mon 19-11-18 15:10:16, Michal Hocko wrote:
> > [...]
> > > In other words, why can't we do the following?
> >
> > Baoquan, this is certainly not the right fix but I would be really
> > curious whether
On Mon, 19 Nov 2018, Michal Hocko wrote:
> On Mon 19-11-18 15:10:16, Michal Hocko wrote:
> [...]
> > In other words, why can't we do the following?
>
> Baoquan, this is certainly not the right fix but I would be really
> curious whether it makes the problem go away.
>
> > diff --git a/mm/migrate
On Mon 19-11-18 15:10:16, Michal Hocko wrote:
[...]
> In other words, why can't we do the following?
Baoquan, this is certainly not the right fix but I would be really
curious whether it makes the problem go away.
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f7e4bfdc13b7..7ccab29bcf9a 1006
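For orientation, the wait path being experimented on looked roughly like this before the change (simplified from __migration_entry_wait() of that era):

        page = migration_entry_to_page(entry);
        /*
         * Take a reference so the struct page cannot be freed under us,
         * then sleep until migration unlocks the page.  This reference
         * is exactly what the migration side trips over.
         */
        if (!get_page_unless_zero(page))
                goto out;
        pte_unmap_unlock(ptep, ptl);
        wait_on_page_locked(page);
        put_page(page);
        return;
out:
        pte_unmap_unlock(ptep, ptl);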
On Mon 19-11-18 17:48:35, Vlastimil Babka wrote:
> On 11/19/18 5:46 PM, Vlastimil Babka wrote:
> > On 11/19/18 5:46 PM, Michal Hocko wrote:
> >> On Mon 19-11-18 17:36:21, Vlastimil Babka wrote:
> >>>
> >>> So what protects us from locking a page whose refcount dropped to zero
> >>> and is being fr
On 11/19/18 5:46 PM, Vlastimil Babka wrote:
> On 11/19/18 5:46 PM, Michal Hocko wrote:
>> On Mon 19-11-18 17:36:21, Vlastimil Babka wrote:
>>>
>>> So what protects us from locking a page whose refcount dropped to zero
>>> and is being freed? The checks in the freeing path won't be happy about a
>>> st
On 11/19/18 5:46 PM, Michal Hocko wrote:
> On Mon 19-11-18 17:36:21, Vlastimil Babka wrote:
>>
>> So what protects us from locking a page whose refcount dropped to zero
>> and is being freed? The checks in the freeing path won't be happy about a
>> stray lock.
>
> Nothing really prevents that. But do
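Background for the "stray lock" concern: PG_locked is among the flags the page allocator refuses to see at free time, so locking a page that has meanwhile been freed shows up as a bad-page report. Roughly (the flag mask follows include/linux/page-flags.h; free_page_is_bad() is an illustrative stand-in for the real check in mm/page_alloc.c):

        #define PAGE_FLAGS_CHECK_AT_FREE                                \
                (1UL << PG_lru          | 1UL << PG_locked      |       \
                 1UL << PG_private      | 1UL << PG_private_2   |       \
                 1UL << PG_writeback    | 1UL << PG_reserved    |       \
                 1UL << PG_slab         | 1UL << PG_active      |       \
                 1UL << PG_unevictable  | __PG_MLOCKED)

        /* illustrative stand-in: a freed page must have none of these set */
        static inline bool free_page_is_bad(struct page *page)
        {
                return unlikely(page->flags & PAGE_FLAGS_CHECK_AT_FREE);
        }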
On Mon 19-11-18 17:36:21, Vlastimil Babka wrote:
> On 11/19/18 3:10 PM, Michal Hocko wrote:
> > On Mon 19-11-18 13:51:21, Michal Hocko wrote:
> >> On Mon 19-11-18 13:40:33, Michal Hocko wrote:
> >>> How are
> >>> we supposed to converge when the swapin code waits for the migration to
> >>> finish w
On 11/19/18 3:10 PM, Michal Hocko wrote:
> On Mon 19-11-18 13:51:21, Michal Hocko wrote:
>> On Mon 19-11-18 13:40:33, Michal Hocko wrote:
>>> How are
>>> we supposed to converge when the swapin code waits for the migration to
>>> finish with the reference count elevated?
Indeed this looks wrong. H
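The non-convergence comes from the reference count check on the migration side; roughly (simplified from migrate_page_move_mapping()):

        /*
         * Migration only proceeds when the count matches what the
         * mapping itself accounts for.  A swapin waiter sleeping with
         * an extra reference makes this fail with -EAGAIN on every
         * pass, so neither side can make progress.
         */
        expected_count += hpage_nr_pages(page) + page_has_private(page);
        if (page_count(page) != expected_count)
                return -EAGAIN;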
On Mon 19-11-18 13:51:21, Michal Hocko wrote:
> On Mon 19-11-18 13:40:33, Michal Hocko wrote:
> > On Mon 19-11-18 18:52:02, Baoquan He wrote:
> > [...]
> >
> > There are a few stacks directly in the offline path but those should be
> > OK.
> > The real culprit seems to be the swap-in code
> >
> > >
On Mon 19-11-18 13:40:33, Michal Hocko wrote:
> On Mon 19-11-18 18:52:02, Baoquan He wrote:
> [...]
>
> There are a few stacks directly in the offline path but those should be
> OK.
> The real culprit seems to be the swap-in code
>
> > [ +1.734416] CPU: 255 PID: 5558 Comm: stress Tainted: G
On Mon 19-11-18 18:52:02, Baoquan He wrote:
[...]
There are a few stacks directly in the offline path but those should be
OK.
The real culprit seems to be the swap-in code
> [ +1.734416] CPU: 255 PID: 5558 Comm: stress Tainted: G L
> 4.20.0-rc2+ #7
> [ +0.007927] Hardware name: 9
On 11/16/18 at 10:14am, Michal Hocko wrote:
> Could you try to apply this debugging patch on top please? It will dump
> a stack trace for each reference count elevation for the one page that fails
> to migrate after multiple passes.
Thanks, applied and fixed two code issues. The dmesg has been sent to
y
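The debugging patch itself is not shown in this thread excerpt; a minimal sketch of the idea (hypothetical names, not Michal's actual patch) would be a hook in the reference count path that dumps a stack for one page under observation:

        /* hypothetical debug aid, not the actual patch */
        static struct page *debug_page; /* the page failing to migrate */

        static inline void debug_page_ref(struct page *page)
        {
                if (unlikely(page == READ_ONCE(debug_page)))
                        dump_stack();   /* who is bumping the count? */
        }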
On Fri 16-11-18 09:24:33, Baoquan He wrote:
> On 11/15/18 at 03:32pm, Michal Hocko wrote:
> > On Thu 15-11-18 21:38:40, Baoquan He wrote:
> > > On 11/15/18 at 02:19pm, Michal Hocko wrote:
> > > > On Thu 15-11-18 21:12:11, Baoquan He wrote:
> > > > > On 11/15/18 at 09:30am, Michal Hocko wrote:
> > >
On 11/15/18 at 03:32pm, Michal Hocko wrote:
> On Thu 15-11-18 21:38:40, Baoquan He wrote:
> > On 11/15/18 at 02:19pm, Michal Hocko wrote:
> > > On Thu 15-11-18 21:12:11, Baoquan He wrote:
> > > > On 11/15/18 at 09:30am, Michal Hocko wrote:
> > > [...]
> > > > > It would also be good to find out whe
On Thu 15-11-18 21:38:40, Baoquan He wrote:
> On 11/15/18 at 02:19pm, Michal Hocko wrote:
> > On Thu 15-11-18 21:12:11, Baoquan He wrote:
> > > On 11/15/18 at 09:30am, Michal Hocko wrote:
> > [...]
> > > > It would also be good to find out whether this is fs-specific. E.g. does
> > > > it make any
On Thu 15-11-18 21:23:42, Baoquan He wrote:
> On 11/15/18 at 02:19pm, Michal Hocko wrote:
> > On Thu 15-11-18 21:12:11, Baoquan He wrote:
> > > On 11/15/18 at 09:30am, Michal Hocko wrote:
> > [...]
> > > > It would also be good to find out whether this is fs-specific. E.g. does
> > > > it make any
On 11/15/18 at 02:19pm, Michal Hocko wrote:
> On Thu 15-11-18 21:12:11, Baoquan He wrote:
> > On 11/15/18 at 09:30am, Michal Hocko wrote:
> [...]
> > > It would also be good to find out whether this is fs-specific. E.g. does
> > > it make any difference if you use a different one for your stress
>
On Thu 15-11-18 21:12:11, Baoquan He wrote:
> On 11/15/18 at 09:30am, Michal Hocko wrote:
[...]
> > It would also be good to find out whether this is fs-specific. E.g. does
> > it make any difference if you use a different one for your stress
> > testing?
>
> Created a ramdisk and put stress bin t
On 11/15/18 at 09:30am, Michal Hocko wrote:
> On Thu 15-11-18 15:53:56, Baoquan He wrote:
> > On 11/15/18 at 08:30am, Michal Hocko wrote:
> > > On Thu 15-11-18 13:10:34, Baoquan He wrote:
> > > > On 11/14/18 at 04:00pm, Michal Hocko wrote:
> > > > > On Wed 14-11-18 22:52:50, Baoquan He wrote:
> > >
On 15.11.18 10:52, Baoquan He wrote:
> On 11/15/18 at 10:42am, David Hildenbrand wrote:
>> I am wondering why it is always the last memory block of that device
>> (and even that node). Coincidence?
>
> I remember one or two times it was the last 6G or 4G which stalled there,
> the size of a memory block
On 11/15/18 at 10:42am, David Hildenbrand wrote:
> I am wondering why it is always the last memory block of that device
> (and even that node). Coincidence?
I remember one or two times it was the last 6G or 4G which stalled there,
the size of a memory block is 2G. But most of the time it's the last memory
b
On 15.11.18 09:30, Michal Hocko wrote:
> On Thu 15-11-18 15:53:56, Baoquan He wrote:
>> On 11/15/18 at 08:30am, Michal Hocko wrote:
>>> On Thu 15-11-18 13:10:34, Baoquan He wrote:
>>>> On 11/14/18 at 04:00pm, Michal Hocko wrote:
>>>>> On Wed 14-11-18 22:52:50, Baoquan He wrote:
>>>>>> On 11/14/18 a
On Thu 15-11-18 15:53:56, Baoquan He wrote:
> On 11/15/18 at 08:30am, Michal Hocko wrote:
> > On Thu 15-11-18 13:10:34, Baoquan He wrote:
> > > On 11/14/18 at 04:00pm, Michal Hocko wrote:
> > > > On Wed 14-11-18 22:52:50, Baoquan He wrote:
> > > > > On 11/14/18 at 10:01am, Michal Hocko wrote:
> > >
On 11/15/18 at 08:30am, Michal Hocko wrote:
> On Thu 15-11-18 13:10:34, Baoquan He wrote:
> > On 11/14/18 at 04:00pm, Michal Hocko wrote:
> > > On Wed 14-11-18 22:52:50, Baoquan He wrote:
> > > > On 11/14/18 at 10:01am, Michal Hocko wrote:
> > > > > I have seen an issue when the migration cannot ma
On Thu 15-11-18 13:10:34, Baoquan He wrote:
> On 11/14/18 at 04:00pm, Michal Hocko wrote:
> > On Wed 14-11-18 22:52:50, Baoquan He wrote:
> > > On 11/14/18 at 10:01am, Michal Hocko wrote:
> > > > I have seen an issue where the migration cannot make forward progress
> > > > because of a glibc page
On 11/14/18 at 04:00pm, Michal Hocko wrote:
> On Wed 14-11-18 22:52:50, Baoquan He wrote:
> > On 11/14/18 at 10:01am, Michal Hocko wrote:
> > > I have seen an issue where the migration cannot make forward progress
> > > because of a glibc page with a reference count bumping up and down. Most
> > >
On Wed 14-11-18 22:52:50, Baoquan He wrote:
> On 11/14/18 at 10:01am, Michal Hocko wrote:
> > I have seen an issue where the migration cannot make forward progress
> > because of a glibc page with a reference count bumping up and down. Most
> > probable explanation is the faultaround code. I am wo
On 11/14/18 at 10:01am, Michal Hocko wrote:
> I have seen an issue where the migration cannot make forward progress
> because of a glibc page with a reference count bumping up and down. Most
> probable explanation is the faultaround code. I am working on this and
> will post a patch soon. In any c
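The fault-around connection: filemap_map_pages() takes a short-lived speculative reference on every page it maps in, and under constant faulting those transient references keep perturbing the count that migration expects to be stable. Roughly (simplified from mm/filemap.c of that era):

        /*
         * Fault-around: each candidate page is pinned speculatively for
         * the duration of the mapping attempt and released afterwards,
         * bumping the reference count up and down as described above.
         */
        if (!page_cache_get_speculative(page))
                goto next;

        /* has the page been truncated or replaced meanwhile? */
        if (unlikely(page != xas_reload(&xas)))
                goto skip;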
On Wed 14-11-18 10:48:09, David Hildenbrand wrote:
> On 14.11.18 10:41, Michal Hocko wrote:
> > On Wed 14-11-18 10:25:57, David Hildenbrand wrote:
> >> On 14.11.18 10:00, Baoquan He wrote:
> >>> Hi David,
> >>>
> >>> On 11/14/18 at 09:18am, David Hildenbrand wrote:
> >>>> Code seems to be waiting f
[Cc Vladimir]
On Wed 14-11-18 15:09:09, Baoquan He wrote:
> Hi,
>
> Tested memory hotplug on a bare metal system; hot removing always
> triggers a lockup. Usually it takes several hot plug/unplug cycles, then
> the hot removing will hang there at the last block. This is with memory pressure
> added by exe
On 14.11.18 10:41, Michal Hocko wrote:
> On Wed 14-11-18 10:25:57, David Hildenbrand wrote:
>> On 14.11.18 10:00, Baoquan He wrote:
>>> Hi David,
>>>
>>> On 11/14/18 at 09:18am, David Hildenbrand wrote:
>>>> Code seems to be waiting for the mem_hotplug_lock in read.
>>>> We hold mem_hotplug_lock in
On Wed 14-11-18 10:25:57, David Hildenbrand wrote:
> On 14.11.18 10:00, Baoquan He wrote:
> > Hi David,
> >
> > On 11/14/18 at 09:18am, David Hildenbrand wrote:
> >> Code seems to be waiting for the mem_hotplug_lock in read.
> >> We hold mem_hotplug_lock in write whenever we online/offline/add/rem
On Wed 14-11-18 10:22:31, David Hildenbrand wrote:
> >>
> >> The real question is, however, why offlining of the last block doesn't
> >> succeed. In __offline_pages() we basically have an endless loop (while
> >> holding the mem_hotplug_lock in write). Now I consider this piece of
> >> code very pr
>>> Failing on ENOMEM is a questionable thing. I haven't seen that happening
>>> wildly but if that is the case then I wouldn't be opposed.
>>>
You mentioned memory pressure: if our host is under memory pressure we
can easily end up in an endless loop there, because we
basical
On 14.11.18 10:00, Baoquan He wrote:
> Hi David,
>
> On 11/14/18 at 09:18am, David Hildenbrand wrote:
>> Code seems to be waiting for the mem_hotplug_lock in read.
>> We hold mem_hotplug_lock in write whenever we online/offline/add/remove
>> memory. There are two ways to trigger offlining of memor
>>
>> The real question is, however, why offlining of the last block doesn't
>> succeed. In __offline_pages() we basically have an endless loop (while
>> holding the mem_hotplug_lock in write). Now I consider this piece of
>> code very problematic (we should automatically fail after X
>> attempts/a
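For reference, the loop in question has roughly this shape (heavily simplified from __offline_pages(); apart from a pending signal there is no bail-out, it retries migration forever):

repeat:
        ret = -EINTR;
        if (signal_pending(current))
                goto failed_removal;

        cond_resched();
        lru_add_drain_all();
        drain_all_pages(zone);

        pfn = scan_movable_pages(start_pfn, end_pfn);
        if (pfn) {
                /* pages left: migrate them and try the whole thing again */
                ret = do_migrate_range(pfn, end_pfn);
                goto repeat;
        }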
On Wed 14-11-18 09:18:09, David Hildenbrand wrote:
> On 14.11.18 08:09, Baoquan He wrote:
> > Hi,
> >
> > Tested memory hotplug on a bare metal system; hot removing always
> > triggers a lockup. Usually it takes several hot plug/unplug cycles, then
> > the hot removing will hang there at the last block. S
Hi David,
On 11/14/18 at 09:18am, David Hildenbrand wrote:
> Code seems to be waiting for the mem_hotplug_lock in read.
> We hold mem_hotplug_lock in write whenever we online/offline/add/remove
> memory. There are two ways to trigger offlining of memory:
>
> 1. Offlining via "cat offline > /sys/d
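For context, mem_hotplug_lock is a percpu rwsem; the write side brackets every online/offline/add/remove operation while readers only need the memory layout to stay stable. Roughly (as in mm/memory_hotplug.c of that era):

        void mem_hotplug_begin(void)    /* writers: hotplug paths */
        {
                cpus_read_lock();
                percpu_down_write(&mem_hotplug_lock);
        }

        void mem_hotplug_done(void)
        {
                percpu_up_write(&mem_hotplug_lock);
                cpus_read_unlock();
        }

        void get_online_mems(void)      /* readers */
        {
                percpu_down_read(&mem_hotplug_lock);
        }

        void put_online_mems(void)
        {
                percpu_up_read(&mem_hotplug_lock);
        }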
On 14.11.18 08:09, Baoquan He wrote:
> Hi,
>
> Tested memory hotplug on a bare metal system; hot removing always
> triggers a lockup. Usually it takes several hot plug/unplug cycles, then
> the hot removing will hang there at the last block. This is with memory
> pressure added by executing "stress -m 200"