Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-06 Thread Tetsuo Handa
Tetsuo Handa wrote: > Michal Hocko wrote: > > > I assert that we should fix af5679fbc669f31f. > > > > If you can come up with reasonable patch which doesn't complicate the > > code and it is a clear win for both this particular workload as well as > > others then why not. > > Why can't we do "at

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Tetsuo Handa
Michal Hocko wrote: > > I assert that we should fix af5679fbc669f31f. > > If you can come up with reasonable patch which doesn't complicate the > code and it is a clear win for both this particular workload as well as > others then why not. Why can't we do "at least MMF_OOM_SKIP should be set und

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Michal Hocko
On Thu 06-09-18 10:00:00, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Wed 05-09-18 22:53:33, Tetsuo Handa wrote: > > > On 2018/09/05 22:40, Michal Hocko wrote: > > > > Changelog said > > > > > > > > "Although this is possible in principle let's wait for it to actually > > > > happen in real

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Tetsuo Handa
Michal Hocko wrote: > On Wed 05-09-18 22:53:33, Tetsuo Handa wrote: > > On 2018/09/05 22:40, Michal Hocko wrote: > > > Changelog said > > > > > > "Although this is possible in principle let's wait for it to actually > > > happen in real life before we make the locking more complex again." > > >

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Michal Hocko
On Wed 05-09-18 22:53:33, Tetsuo Handa wrote: > On 2018/09/05 22:40, Michal Hocko wrote: > > Changelog said > > > > "Although this is possible in principle let's wait for it to actually > > happen in real life before we make the locking more complex again." > > > > So what is the real life workl

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Tetsuo Handa
On 2018/09/05 22:40, Michal Hocko wrote: > Changelog said > > "Although this is possible in principle let's wait for it to actually > happen in real life before we make the locking more complex again." > > So what is the real life workload that hits it? The log you have pasted > below doesn't te

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Michal Hocko
On Wed 05-09-18 22:20:58, Tetsuo Handa wrote: > On 2018/08/24 9:31, Tetsuo Handa wrote: > > For now, I don't think we need to add af5679fbc669f31f to the list for > > CVE-2016-10723, for af5679fbc669f31f might cause premature next OOM victim > > selection (especially with CONFIG_PREEMPT=y kernels)

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-09-05 Thread Tetsuo Handa
On 2018/08/24 9:31, Tetsuo Handa wrote: > For now, I don't think we need to add af5679fbc669f31f to the list for > CVE-2016-10723, for af5679fbc669f31f might cause premature next OOM victim > selection (especially with CONFIG_PREEMPT=y kernels) due to > >__alloc_pages_may_oom():

[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-27 Thread Michal Hocko
From: Michal Hocko Tetsuo Handa has reported that it is possible to bypass the short sleep for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress") and lock up the system if OOM. The primary

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-23 Thread Tetsuo Handa
> this issue would be appreciated. > > > > > > > Commit 9bfe5ded054b ("mm, oom: remove sleep from under oom_lock") is a > > mitigation for CVE-2016-10723. > > > > "[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at > > should_reclaim_r

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-23 Thread David Rientjes
ep from under oom_lock") is a > mitigation for CVE-2016-10723. > > "[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at > should_reclaim_retry()." is independent from CVE-2016-10723. > Thanks, Tetsuo. Should commit af5679fbc669 ("mm, oom: remove oom_lock from oom_reaper") also be added to the list for CVE-2016-10723?

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-23 Thread Tetsuo Handa
CVE-2016-10723. "[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry()." is independent from CVE-2016-10723. We haven't made sure that the OOM reaper / exit_mmap() will get enough CPU resources. For example, under a cluster of concurrently allocating

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-23 Thread David Rientjes
On Wed, 22 Aug 2018, Tetsuo Handa wrote: > On 2018/08/03 15:16, Michal Hocko wrote: > > On Fri 03-08-18 07:05:54, Tetsuo Handa wrote: > >> On 2018/07/31 14:09, Michal Hocko wrote: > >>> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: > On 2018/07/31 4:10, Michal Hocko wrote: > > Since shoul

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-22 Thread Michal Hocko
On Wed 22-08-18 06:07:40, Tetsuo Handa wrote: > On 2018/08/03 15:16, Michal Hocko wrote: [...] > >> Now that Roman's cgroup aware OOM killer patchset will be dropped from > >> linux-next.git , > >> linux-next.git will get the sleeping point removed. Please send this patch > >> to linux-next.git .

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-21 Thread Tetsuo Handa
On 2018/08/03 15:16, Michal Hocko wrote: > On Fri 03-08-18 07:05:54, Tetsuo Handa wrote: >> On 2018/07/31 14:09, Michal Hocko wrote: >>> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: On 2018/07/31 4:10, Michal Hocko wrote: > Since should_reclaim_retry() should be a natural reschedule point

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-02 Thread Michal Hocko
On Fri 03-08-18 07:05:54, Tetsuo Handa wrote: > On 2018/07/31 14:09, Michal Hocko wrote: > > On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: > >> On 2018/07/31 4:10, Michal Hocko wrote: > >>> Since should_reclaim_retry() should be a natural reschedule point, > >>> let's do the short sleep for PF_WQ_W

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-08-02 Thread Tetsuo Handa
On 2018/07/31 14:09, Michal Hocko wrote: > On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: >> On 2018/07/31 4:10, Michal Hocko wrote: >>> Since should_reclaim_retry() should be a natural reschedule point, >>> let's do the short sleep for PF_WQ_WORKER threads unconditionally in >>> order to guarantee

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-31 Thread Michal Hocko
On Tue 31-07-18 20:30:08, Tetsuo Handa wrote: > On 2018/07/31 20:15, Michal Hocko wrote: > >>> I will send the patch to Andrew if the patch is ok. > >> > >> Andrew, can we send the "we used to have a sleeping point in the oom path > >> but this has > >> been removed recently" patch to linux.git ?

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-31 Thread Tetsuo Handa
On 2018/07/31 20:15, Michal Hocko wrote: >>> I will send the patch to Andrew if the patch is ok. >> >> Andrew, can we send the "we used to have a sleeping point in the oom path >> but this has >> been removed recently" patch to linux.git ? > > This can really wait for the next merge window IMHO.

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-31 Thread Michal Hocko
On Tue 31-07-18 19:47:45, Tetsuo Handa wrote: > On 2018/07/31 14:09, Michal Hocko wrote: > > On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: > >> On 2018/07/31 4:10, Michal Hocko wrote: > >>> Since should_reclaim_retry() should be a natural reschedule point, > >>> let's do the short sleep for PF_WQ_W

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-31 Thread Tetsuo Handa
On 2018/07/31 14:09, Michal Hocko wrote: > On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: >> On 2018/07/31 4:10, Michal Hocko wrote: >>> Since should_reclaim_retry() should be a natural reschedule point, >>> let's do the short sleep for PF_WQ_WORKER threads unconditionally in >>> order to guarantee

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Michal Hocko
On Tue 31-07-18 06:01:48, Tetsuo Handa wrote: > On 2018/07/31 4:10, Michal Hocko wrote: > > Since should_reclaim_retry() should be a natural reschedule point, > > let's do the short sleep for PF_WQ_WORKER threads unconditionally in > > order to guarantee that other pending work items are started. T

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Tetsuo Handa
On 2018/07/31 4:10, Michal Hocko wrote: > Since should_reclaim_retry() should be a natural reschedule point, > let's do the short sleep for PF_WQ_WORKER threads unconditionally in > order to guarantee that other pending work items are started. This will > workaround this problem and it is less frag

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Tejun Heo
Hello, Michal. On Mon, Jul 30, 2018 at 08:51:10PM +0200, Michal Hocko wrote: > > Yeah, workqueue can choke on things like that and kthread indefinitely > > busy looping doesn't do anybody any good. > > Yeah, I do agree. But this is much easier said than done ;) Sure > we have that hack that does

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Michal Hocko
better. >From 9bbea6516bb99615aff5ba5699865aa2d48333cc Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 26 Jul 2018 14:40:03 +0900 Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry(). Tetsuo Handa has reported that it is possible to bypass t

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Michal Hocko
On Mon 30-07-18 08:44:24, Tejun Heo wrote: > Hello, > > On Tue, Jul 31, 2018 at 12:25:04AM +0900, Tetsuo Handa wrote: > > WQ_MEM_RECLAIM guarantees that "struct task_struct" is preallocated. But > > WQ_MEM_RECLAIM does not guarantee that the pending work is started as soon > > as an item was queue

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Tejun Heo
Hello, On Tue, Jul 31, 2018 at 12:25:04AM +0900, Tetsuo Handa wrote: > WQ_MEM_RECLAIM guarantees that "struct task_struct" is preallocated. But > WQ_MEM_RECLAIM does not guarantee that the pending work is started as soon > as an item was queued. Same rule applies to both WQ_MEM_RECLAIM workqueues

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Tetsuo Handa
On 2018/07/30 23:54, Tejun Heo wrote: > Hello, > > On Mon, Jul 30, 2018 at 04:46:47PM +0200, Michal Hocko wrote: >> On Mon 30-07-18 23:34:23, Tetsuo Handa wrote: >>> On 2018/07/30 18:32, Michal Hocko wrote: >> [...] This one is waiting for draining and we are in mm_percpu_wq WQ context w

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Tejun Heo
Hello, On Mon, Jul 30, 2018 at 04:46:47PM +0200, Michal Hocko wrote: > On Mon 30-07-18 23:34:23, Tetsuo Handa wrote: > > On 2018/07/30 18:32, Michal Hocko wrote: > [...] > > > This one is waiting for draining and we are in mm_percpu_wq WQ context > > > which has its rescuer so no other activity ca

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Michal Hocko
On Mon 30-07-18 23:34:23, Tetsuo Handa wrote: > On 2018/07/30 18:32, Michal Hocko wrote: [...] > > This one is waiting for draining and we are in mm_percpu_wq WQ context > > which has its rescuer so no other activity can block us for ever. So > > this certainly shouldn't deadlock. It can be dead sl

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Tetsuo Handa
b40032 Mon Sep 17 00:00:00 2001 >>>> From: Michal Hocko >>>> Date: Thu, 26 Jul 2018 14:40:03 +0900 >>>> Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at >>>> should_reclaim_retry(). >>>> >>>> Tetsuo Handa has repor

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-30 Thread Michal Hocko
> > all just back off on the oom_lock trylock? In other words what is > > preventing from the oom killer invocation? > > All __GFP_FS allocations got stuck at direct reclaim or workqueue. OK, I see. This is important information which was missing in the previous examination. [...] > >

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-27 Thread Tetsuo Handa
256 [ 444.239138] in-flight: 5:disk_events_workfn [ 444.241022] workqueue mm_percpu_wq: flags=0x8 [ 444.242829] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [ 444.245057] pending: vmstat_update, drain_local_pages_wq BAR(498) > > [...] > >> Since the patch shown be

Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-26 Thread Michal Hocko
om_lock trylock? In other words what is preventing from the oom killer invocation? [...] > Since the patch shown below was suggested by Michal Hocko at > https://marc.info/?l=linux-mm&m=152723708623015 , it is from Michal Hocko. > > >From cd8095242de13ace61eefca0c3d6f2a5

[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().

2018-07-26 Thread Tetsuo Handa
14:40:03 +0900 Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry(). Tetsuo Handa has reported that it is possible to bypass the short sleep for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5 ("mm, vmstat: allow WQ concurrency to discove