Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > I assert that we should fix af5679fbc669f31f.
> >
> > If you can come up with a reasonable patch which doesn't complicate the
> > code and is a clear win both for this particular workload and for
> > others, then why not.
>
> Why can't we do "at
Michal Hocko wrote:
> > I assert that we should fix af5679fbc669f31f.
>
> If you can come up with a reasonable patch which doesn't complicate the
> code and is a clear win both for this particular workload and for
> others, then why not.
Why can't we do "at least MMF_OOM_SKIP should be set und
On Thu 06-09-18 10:00:00, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 05-09-18 22:53:33, Tetsuo Handa wrote:
> > > On 2018/09/05 22:40, Michal Hocko wrote:
> > > > Changelog said
> > > >
> > > > "Although this is possible in principle let's wait for it to actually
> > > > happen in real life before we make the locking more complex again."
Michal Hocko wrote:
> On Wed 05-09-18 22:53:33, Tetsuo Handa wrote:
> > On 2018/09/05 22:40, Michal Hocko wrote:
> > > Changelog said
> > >
> > > "Although this is possible in principle let's wait for it to actually
> > > happen in real life before we make the locking more complex again."
> > >
On Wed 05-09-18 22:53:33, Tetsuo Handa wrote:
> On 2018/09/05 22:40, Michal Hocko wrote:
> > Changelog said
> >
> > "Although this is possible in principle let's wait for it to actually
> > happen in real life before we make the locking more complex again."
> >
> > So what is the real life workload that hits it?
On 2018/09/05 22:40, Michal Hocko wrote:
> Changelog said
>
> "Although this is possible in principle let's wait for it to actually
> happen in real life before we make the locking more complex again."
>
> So what is the real life workload that hits it? The log you have pasted
> below doesn't te
On Wed 05-09-18 22:20:58, Tetsuo Handa wrote:
> On 2018/08/24 9:31, Tetsuo Handa wrote:
> > For now, I don't think we need to add af5679fbc669f31f to the list for
> > CVE-2016-10723, for af5679fbc669f31f might cause premature next OOM victim
> > selection (especially with CONFIG_PREEMPT=y kernels)
On 2018/08/24 9:31, Tetsuo Handa wrote:
> For now, I don't think we need to add af5679fbc669f31f to the list for
> CVE-2016-10723, for af5679fbc669f31f might cause premature next OOM victim
> selection (especially with CONFIG_PREEMPT=y kernels) due to
>
>__alloc_pages_may_oom():
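For reference, the back-off being pointed at lives in mm/page_alloc.c; a
condensed sketch of the v4.18-era code (simplified, not a verbatim copy)
looks roughly like this:

	static inline struct page *
	__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
			      const struct alloc_context *ac,
			      unsigned long *did_some_progress)
	{
		struct page *page = NULL;

		*did_some_progress = 0;

		/*
		 * If the trylock fails, somebody else is already handling
		 * the OOM situation for us: report progress and retry the
		 * allocation instead of selecting another victim.
		 */
		if (!mutex_trylock(&oom_lock)) {
			*did_some_progress = 1;
			schedule_timeout_uninterruptible(1);
			return NULL;
		}

		/* ... try get_page_from_freelist(), then out_of_memory() ... */

		mutex_unlock(&oom_lock);
		return page;
	}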
From: Michal Hocko
Tetsuo Handa has reported that it is possible to bypass the short sleep
for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5
("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make
any progress") and lock up the system if OOM.
The primary
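That short sleep is the PF_WQ_WORKER special case which commit
373ccbe5927034b5 added; in the v4.18-era should_reclaim_retry() it only runs
on the branch that decides to retry, so it can be skipped entirely. A
condensed sketch (not verbatim upstream code):

	static inline bool
	should_reclaim_retry(gfp_t gfp_mask, unsigned order,
			     struct alloc_context *ac, int alloc_flags,
			     bool did_some_progress, int *no_progress_loops)
	{
		struct zone *zone;
		struct zoneref *z;

		/* ... give up after MAX_RECLAIM_RETRIES no-progress loops ... */

		for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
						ac->high_zoneidx, ac->nodemask) {
			unsigned long available = zone_reclaimable_pages(zone) +
				zone_page_state_snapshot(zone, NR_FREE_PAGES);

			if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
						ac_classzone_idx(ac), alloc_flags,
						available)) {
				/*
				 * WQ concurrency control does not notice a
				 * congested workqueue when the worker never
				 * sleeps, hence the explicit one-jiffy sleep
				 * instead of cond_resched().
				 */
				if (current->flags & PF_WQ_WORKER)
					schedule_timeout_uninterruptible(1);
				else
					cond_resched();
				return true;
			}
		}
		return false;
	}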
> this issue would be appreciated.
> > >
> >
> > Commit 9bfe5ded054b ("mm, oom: remove sleep from under oom_lock") is a
> > mitigation for CVE-2016-10723.
> >
> > "[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
> > should_reclaim_retry()." is independent of CVE-2016-10723.
> Commit 9bfe5ded054b ("mm, oom: remove sleep from under oom_lock") is a
> mitigation for CVE-2016-10723.
>
> "[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
> should_reclaim_retry()." is independent of CVE-2016-10723.
>
Thanks, Tetsuo. Should commit af5679fbc669 ("mm, oom: remove oom_lock
from oom_reaper") also be added to the list for CVE-2016-10723?
CVE-2016-10723.
"[PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
should_reclaim_retry()." is independent of CVE-2016-10723.
We haven't made sure that the OOM reaper / exit_mmap() will get enough CPU
resources. For example, under a cluster of concurrently allocating
On Wed, 22 Aug 2018, Tetsuo Handa wrote:
> On 2018/08/03 15:16, Michal Hocko wrote:
> > On Fri 03-08-18 07:05:54, Tetsuo Handa wrote:
> >> On 2018/07/31 14:09, Michal Hocko wrote:
> >>> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
> On 2018/07/31 4:10, Michal Hocko wrote:
> > Since should_reclaim_retry() should be a natural reschedule point
On Wed 22-08-18 06:07:40, Tetsuo Handa wrote:
> On 2018/08/03 15:16, Michal Hocko wrote:
[...]
> >> Now that Roman's cgroup aware OOM killer patchset will be dropped from
> >> linux-next.git ,
> >> linux-next.git will get the sleeping point removed. Please send this patch
> >> to linux-next.git .
On 2018/08/03 15:16, Michal Hocko wrote:
> On Fri 03-08-18 07:05:54, Tetsuo Handa wrote:
>> On 2018/07/31 14:09, Michal Hocko wrote:
>>> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
On 2018/07/31 4:10, Michal Hocko wrote:
> Since should_reclaim_retry() should be a natural reschedule point
On Fri 03-08-18 07:05:54, Tetsuo Handa wrote:
> On 2018/07/31 14:09, Michal Hocko wrote:
> > On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
> >> On 2018/07/31 4:10, Michal Hocko wrote:
> >>> Since should_reclaim_retry() should be a natural reschedule point,
> >>> let's do the short sleep for PF_WQ_WORKER threads unconditionally in
> >>> order to guarantee that other pending work items are started.
On 2018/07/31 14:09, Michal Hocko wrote:
> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
>> On 2018/07/31 4:10, Michal Hocko wrote:
>>> Since should_reclaim_retry() should be a natural reschedule point,
>>> let's do the short sleep for PF_WQ_WORKER threads unconditionally in
>>> order to guarantee
On Tue 31-07-18 20:30:08, Tetsuo Handa wrote:
> On 2018/07/31 20:15, Michal Hocko wrote:
> >>> I will send the patch to Andrew if the patch is ok.
> >>
> >> Andrew, can we send the "we used to have a sleeping point in the oom path
> >> but this has
> >> been removed recently" patch to linux.git ?
On 2018/07/31 20:15, Michal Hocko wrote:
>>> I will send the patch to Andrew if the patch is ok.
>>
>> Andrew, can we send the "we used to have a sleeping point in the oom path
>> but this has
>> been removed recently" patch to linux.git ?
>
> This can really wait for the next merge window IMHO.
On Tue 31-07-18 19:47:45, Tetsuo Handa wrote:
> On 2018/07/31 14:09, Michal Hocko wrote:
> > On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
> >> On 2018/07/31 4:10, Michal Hocko wrote:
> >>> Since should_reclaim_retry() should be a natural reschedule point,
> >>> let's do the short sleep for PF_WQ_WORKER threads unconditionally in
> >>> order to guarantee that other pending work items are started.
On 2018/07/31 14:09, Michal Hocko wrote:
> On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
>> On 2018/07/31 4:10, Michal Hocko wrote:
>>> Since should_reclaim_retry() should be a natural reschedule point,
>>> let's do the short sleep for PF_WQ_WORKER threads unconditionally in
>>> order to guarantee
On Tue 31-07-18 06:01:48, Tetsuo Handa wrote:
> On 2018/07/31 4:10, Michal Hocko wrote:
> > Since should_reclaim_retry() should be a natural reschedule point,
> > let's do the short sleep for PF_WQ_WORKER threads unconditionally in
> > order to guarantee that other pending work items are started. T
On 2018/07/31 4:10, Michal Hocko wrote:
> Since should_reclaim_retry() should be a natural reschedule point,
> let's do the short sleep for PF_WQ_WORKER threads unconditionally in
> order to guarantee that other pending work items are started. This will
> workaround this problem and it is less frag
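Schematically, the proposal moves that hunk to a common exit of
should_reclaim_retry(), so the sleep (or cond_resched()) happens on every
call rather than only when a retry is decided. A sketch of the idea, not
the exact patch that was posted:

	--- a/mm/page_alloc.c	(sketch)
	+++ b/mm/page_alloc.c	(sketch)
	@@ should_reclaim_retry() @@
	 {
	+	bool ret = false;
	 	struct zone *zone;
	 	struct zoneref *z;
	@@ should_reclaim_retry() @@
	 		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
	 					ac_classzone_idx(ac), alloc_flags,
	 					available)) {
	-			if (current->flags & PF_WQ_WORKER)
	-				schedule_timeout_uninterruptible(1);
	-			else
	-				cond_resched();
	-			return true;
	+			ret = true;
	+			goto out;
	 		}
	 	}
	-	return false;
	+out:
	+	/*
	+	 * Always reschedule here so that a PF_WQ_WORKER looping in the
	+	 * allocator gives WQ concurrency control a chance to start other
	+	 * pending work items, whether or not we are going to retry.
	+	 */
	+	if (current->flags & PF_WQ_WORKER)
	+		schedule_timeout_uninterruptible(1);
	+	else
	+		cond_resched();
	+	return ret;
	 }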
Hello, Michal.
On Mon, Jul 30, 2018 at 08:51:10PM +0200, Michal Hocko wrote:
> > Yeah, workqueue can choke on things like that and kthread indefinitely
> > busy looping doesn't do anybody any good.
>
> Yeah, I do agree. But this is much easier said than done ;) Sure
> we have that hack that does
better.
>From 9bbea6516bb99615aff5ba5699865aa2d48333cc Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 26 Jul 2018 14:40:03 +0900
Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
should_reclaim_retry().
Tetsuo Handa has reported that it is possible to bypass the short sleep
On Mon 30-07-18 08:44:24, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 31, 2018 at 12:25:04AM +0900, Tetsuo Handa wrote:
> > WQ_MEM_RECLAIM guarantees that "struct task_struct" is preallocated. But
> > WQ_MEM_RECLAIM does not guarantee that the pending work is started as soon
> > as an item was queued. Same rule applies to both WQ_MEM_RECLAIM workqueues
Hello,
On Tue, Jul 31, 2018 at 12:25:04AM +0900, Tetsuo Handa wrote:
> WQ_MEM_RECLAIM guarantees that "struct task_struct" is preallocated. But
> WQ_MEM_RECLAIM does not guarantee that the pending work is started as soon
> as an item was queued. Same rule applies to both WQ_MEM_RECLAIM workqueues
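In other words, WQ_MEM_RECLAIM only guarantees that a rescuer task exists so
the workqueue can eventually make forward progress under memory pressure;
queue_work() merely marks the item pending. A minimal hypothetical example
(all names below are made up for illustration):

	#include <linux/module.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *my_reclaim_wq;

	static void my_reclaim_work_fn(struct work_struct *work)
	{
		/* something reclaim depends on, e.g. draining a per-cpu cache */
	}
	static DECLARE_WORK(my_reclaim_work, my_reclaim_work_fn);

	static int __init my_reclaim_init(void)
	{
		/*
		 * WQ_MEM_RECLAIM preallocates a rescuer thread, so items on
		 * this workqueue can be processed even when no new kworker
		 * can be created under memory pressure...
		 */
		my_reclaim_wq = alloc_workqueue("my_reclaim_wq", WQ_MEM_RECLAIM, 0);
		if (!my_reclaim_wq)
			return -ENOMEM;

		/*
		 * ...but queueing does not mean running: if the existing
		 * kworkers on this CPU never sleep (e.g. they loop in the
		 * page allocator), the item can stay pending indefinitely.
		 */
		queue_work(my_reclaim_wq, &my_reclaim_work);
		return 0;
	}
	module_init(my_reclaim_init);
	MODULE_LICENSE("GPL");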
On 2018/07/30 23:54, Tejun Heo wrote:
> Hello,
>
> On Mon, Jul 30, 2018 at 04:46:47PM +0200, Michal Hocko wrote:
>> On Mon 30-07-18 23:34:23, Tetsuo Handa wrote:
>>> On 2018/07/30 18:32, Michal Hocko wrote:
>> [...]
This one is waiting for draining and we are in mm_percpu_wq WQ context
which has its rescuer so no other activity can block us for ever.
Hello,
On Mon, Jul 30, 2018 at 04:46:47PM +0200, Michal Hocko wrote:
> On Mon 30-07-18 23:34:23, Tetsuo Handa wrote:
> > On 2018/07/30 18:32, Michal Hocko wrote:
> [...]
> > > This one is waiting for draining and we are in mm_percpu_wq WQ context
> > > which has its rescuer so no other activity can block us for ever. So
> > > this certainly shouldn't deadlock.
On Mon 30-07-18 23:34:23, Tetsuo Handa wrote:
> On 2018/07/30 18:32, Michal Hocko wrote:
[...]
> > This one is waiting for draining and we are in mm_percpu_wq WQ context
> > which has its rescuer so no other activity can block us for ever. So
> > this certainly shouldn't deadlock. It can be dead slow
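For context, mm_percpu_wq is itself a WQ_MEM_RECLAIM workqueue and the
per-cpu drain work is queued on it; roughly, condensed from mm/vmstat.c and
mm/page_alloc.c of that era (not verbatim):

	/* mm/vmstat.c */
	struct workqueue_struct *mm_percpu_wq;

	void __init init_mm_internals(void)
	{
		mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
		/* ... */
	}

	/* mm/page_alloc.c */
	static DEFINE_PER_CPU(struct work_struct, pcpu_drain);
	static cpumask_t cpus_with_pcps;

	void drain_all_pages(struct zone *zone)
	{
		int cpu;

		/* ... fill cpus_with_pcps with CPUs holding pcp pages ... */

		for_each_cpu(cpu, &cpus_with_pcps) {
			struct work_struct *work = per_cpu_ptr(&pcpu_drain, cpu);

			INIT_WORK(work, drain_local_pages_wq);
			queue_work_on(cpu, mm_percpu_wq, work);
		}
		/*
		 * The rescuer guarantees each item runs eventually, but
		 * flush_work() can still wait a long time before it starts.
		 */
		for_each_cpu(cpu, &cpus_with_pcps)
			flush_work(per_cpu_ptr(&pcpu_drain, cpu));
	}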
b40032 Mon Sep 17 00:00:00 2001
>>>> From: Michal Hocko
>>>> Date: Thu, 26 Jul 2018 14:40:03 +0900
>>>> Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
>>>> should_reclaim_retry().
>>>>
>>>> Tetsuo Handa has reported that it is possible to bypass the short sleep
> > all just back off on the oom_lock trylock? In other words, what is
> > preventing the oom killer invocation?
>
> All __GFP_FS allocations got stuck at direct reclaim or workqueue.
OK, I see. This is important information which was missing in the
previous examination.
[...]
> > >
256
[ 444.239138] in-flight: 5:disk_events_workfn
[ 444.241022] workqueue mm_percpu_wq: flags=0x8
[ 444.242829] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[ 444.245057] pending: vmstat_update, drain_local_pages_wq BAR(498)
>
> [...]
>
>> Since the patch shown below was suggested by Michal Hocko at
oom_lock trylock? In other words, what is
preventing the oom killer invocation?
[...]
> Since the patch shown below was suggested by Michal Hocko at
> https://marc.info/?l=linux-mm&m=152723708623015 , it is from Michal Hocko.
>
> >From cd8095242de13ace61eefca0c3d6f2a5
Date: Thu, 26 Jul 2018 14:40:03 +0900
Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
should_reclaim_retry().
Tetsuo Handa has reported that it is possible to bypass the short sleep
for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5
("mm, vmstat: allow WQ concurrency to discove