On Fri 08-02-13 17:29:18, Michal Hocko wrote:
[...]
> OK, I have checked the allocator slow path and you are right, even
> GFP_KERNEL will not fail. This can lead to similar deadlocks - e.g. an
> OOM-killed task blocked on down_write(mmap_sem) while the page fault
> handler holds mmap_sem for reading
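To make the scenario concrete, here is a minimal user-space analogue of that deadlock. It is only a sketch: pthreads stand in for mmap_sem and for the memcg OOM wait, and none of the names below are kernel code.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;
static volatile int victim_exited;

/* The OOM-killed task: it must take mmap_sem for writing before it can exit. */
static void *victim(void *arg)
{
        pthread_rwlock_wrlock(&mmap_sem);  /* blocks behind the reader below */
        pthread_rwlock_unlock(&mmap_sem);
        victim_exited = 1;                 /* never reached */
        return NULL;
}

/* The page fault path: it holds mmap_sem for reading while the memcg charge
 * waits for the OOM victim to exit. */
static void *page_fault(void *arg)
{
        pthread_rwlock_rdlock(&mmap_sem);
        while (!victim_exited)             /* "wait for the victim to die" */
                sleep(1);
        pthread_rwlock_unlock(&mmap_sem);
        return NULL;
}

int main(void)
{
        pthread_t fault_thread, victim_thread;

        pthread_create(&fault_thread, NULL, page_fault, NULL);
        sleep(1);                          /* make sure the reader holds the lock */
        pthread_create(&victim_thread, NULL, victim, NULL);

        pthread_join(fault_thread, NULL);  /* hangs: neither thread can progress */
        pthread_join(victim_thread, NULL);
        puts("not reached");
        return 0;
}

Built with gcc -pthread, the program intentionally never terminates, which is exactly the situation described above: the reader cannot drop the lock until the victim exits, and the victim cannot exit until it gets the lock for writing.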
On Thu 07-02-13 20:27:00, Greg Thelen wrote:
> On Tue, Feb 05 2013, Michal Hocko wrote:
>
> > On Tue 05-02-13 10:09:57, Greg Thelen wrote:
> >> On Tue, Feb 05 2013, Michal Hocko wrote:
> >>
> >> > On Tue 05-02-13 08:48:23, Greg Thelen wrote:
> >> >> On Tue, Feb 05 2013, Michal Hocko wrote:
> >> >
On Fri 08-02-13 10:40:13, KAMEZAWA Hiroyuki wrote:
> (2013/02/07 20:01), Kamezawa Hiroyuki wrote:
[...]
> >Hmm. Do we need to increase the "limit" virtually at memcg oom until
> >the oom-killed process dies?
>
> Here is my naive idea...
and the next step would be
http://en.wikipedia.org/wiki/Cre
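For what it is worth, one way to read the quoted question (the actual "naive idea" patch is not shown here) is the toy model below: while an OOM kill is in flight in a memcg, charges are checked against a temporarily raised limit so the dying victim can still allocate and finish exiting. All names and the headroom value are made up for illustration.

#include <stdio.h>

#define OOM_HEADROOM (1024 * 1024)  /* hypothetical extra bytes while OOM is in flight */

struct memcg {
        long usage;
        long limit;
        int oom_victims;  /* victims selected by the memcg OOM killer but not yet exited */
};

/* Returns 0 on success, -1 for ENOMEM. */
static int charge(struct memcg *cg, long nr_bytes)
{
        long effective_limit = cg->limit;

        if (cg->oom_victims)  /* "increase the limit virtually" until the victim dies */
                effective_limit += OOM_HEADROOM;

        if (cg->usage + nr_bytes > effective_limit)
                return -1;

        cg->usage += nr_bytes;
        return 0;
}

int main(void)
{
        struct memcg cg = { .usage = 100, .limit = 100, .oom_victims = 1 };

        /* The exiting victim can still charge a little, so it can finish dying. */
        printf("charge while OOM in flight: %d\n", charge(&cg, 4096));  /* 0 */

        cg.oom_victims = 0;
        printf("charge after OOM settled:   %d\n", charge(&cg, 4096));  /* -1 */
        return 0;
}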
On Tue, Feb 05 2013, Michal Hocko wrote:
> On Tue 05-02-13 10:09:57, Greg Thelen wrote:
>> On Tue, Feb 05 2013, Michal Hocko wrote:
>>
>> > On Tue 05-02-13 08:48:23, Greg Thelen wrote:
>> >> On Tue, Feb 05 2013, Michal Hocko wrote:
>> >>
>> >> > On Tue 05-02-13 15:49:47, azurIt wrote:
>> >> > [...]
(2013/02/07 21:31), Michal Hocko wrote:
On Thu 07-02-13 20:01:45, KAMEZAWA Hiroyuki wrote:
(2013/02/06 23:01), Michal Hocko wrote:
On Wed 06-02-13 02:17:21, azurIt wrote:
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
(2013/02/07 20:01), Kamezawa Hiroyuki wrote:
(2013/02/06 23:01), Michal Hocko wrote:
On Wed 06-02-13 02:17:21, azurIt wrote:
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
Here is the log where OOM, again, killed
On Thu 07-02-13 20:01:45, KAMEZAWA Hiroyuki wrote:
> (2013/02/06 23:01), Michal Hocko wrote:
> >On Wed 06-02-13 02:17:21, azurIt wrote:
> >>>5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
> >>>mentioned in a follow-up email. Here is the full patch:
> >>
> >>
> >>Here is the l
(2013/02/06 23:01), Michal Hocko wrote:
On Wed 06-02-13 02:17:21, azurIt wrote:
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
Here is the log where OOM, again, killed MySQL server [search for "(mysqld)"]:
http://w
On Wed 06-02-13 15:01:19, Michal Hocko wrote:
> On Wed 06-02-13 02:17:21, azurIt wrote:
> > >5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
> > >mentioned in a follow-up email. Here is the full patch:
> >
> >
> > Here is the log where OOM, again, killed MySQL server [search
On Wed 06-02-13 02:17:21, azurIt wrote:
> >5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
> >mentioned in a follow-up email. Here is the full patch:
>
>
> Here is the log where OOM, again, killed MySQL server [search for "(mysqld)"]:
> http://www.watchdog.sk/lkml/oom_mysqld
>5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
>mentioned in a follow-up email. Here is the full patch:
Here is the log where OOM, again, killed MySQL server [search for "(mysqld)"]:
http://www.watchdog.sk/lkml/oom_mysqld6
azur
On Tue 05-02-13 10:09:57, Greg Thelen wrote:
> On Tue, Feb 05 2013, Michal Hocko wrote:
>
> > On Tue 05-02-13 08:48:23, Greg Thelen wrote:
> >> On Tue, Feb 05 2013, Michal Hocko wrote:
> >>
> >> > On Tue 05-02-13 15:49:47, azurIt wrote:
> >> > [...]
> >> >> Just to be sure - am I supposed to apply these two patches?
On Tue, Feb 05 2013, Michal Hocko wrote:
> On Tue 05-02-13 08:48:23, Greg Thelen wrote:
>> On Tue, Feb 05 2013, Michal Hocko wrote:
>>
>> > On Tue 05-02-13 15:49:47, azurIt wrote:
>> > [...]
>> >> Just to be sure - am I supposed to apply these two patches?
>> >> http://watchdog.sk/lkml/patches/
>>
On Tue 05-02-13 08:48:23, Greg Thelen wrote:
> On Tue, Feb 05 2013, Michal Hocko wrote:
>
> > On Tue 05-02-13 15:49:47, azurIt wrote:
> > [...]
> >> Just to be sure - am I supposed to apply these two patches?
> >> http://watchdog.sk/lkml/patches/
> >
> > 5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
On Tue, Feb 05 2013, Michal Hocko wrote:
> On Tue 05-02-13 15:49:47, azurIt wrote:
> [...]
>> Just to be sure - am I supposed to apply these two patches?
>> http://watchdog.sk/lkml/patches/
>
> 5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
> mentioned in a follow-up email. Here is the full patch:
>5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
>mentioned in a follow-up email.
Oh, it wasn't complete? I used it in my last test.. sorry, I'm a little confused
by all those patches. Will try it tonight and report back.
On Tue 05-02-13 15:49:47, azurIt wrote:
[...]
> I have another old problem which is maybe also related to this. I
> wasn't connecting it with this before, but now I'm not sure. Two of our
> servers, which are affected by this cgroup problem, are also randomly
> freezing completely (a few times per month
On Tue 05-02-13 15:49:47, azurIt wrote:
[...]
> Just to be sure - am I supposed to apply these two patches?
> http://watchdog.sk/lkml/patches/
5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
mentioned in a follow-up email. Here is the full patch:
---
From f2bf8437d5b9bb38a95a
>Sorry to get back to this so late, but I was busy as hell since the
>beginning of the year.
Thank you for your time!
>Has the issue repeated since then?
Yes, it's happening all the time, but meanwhile I wrote a script which is
monitoring the problem and killing frozen processes when it occurs
On Fri 25-01-13 17:31:30, Michal Hocko wrote:
> On Fri 25-01-13 16:07:23, azurIt wrote:
> > Any news? Thnx!
>
> Sorry, but I didn't get to this one yet.
Sorry to get back to this so late, but I was busy as hell since the
beginning of the year.
Has the issue repeated since then?
You said you d
2.2012 12:08
> > Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from
> > add_to_page_cache_locked
> >
> > CC: linux-kernel@vger.kernel.org, linux...@kvack.org, "cgroups mailinglist"
> > , "KAMEZAWA Hiroyuki"
> > , "Johannes
Any news? Thnx!
azur
> Od: "Michal Hocko"
> Komu: azurIt
> Dátum: 30.12.2012 12:08
> Predmet: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from
> add_to_page_cache_locked
>
> CC: linu
On Sun 30-12-12 02:09:47, azurIt wrote:
> >which suggests that the patch is incomplete and that I am blind :/
> >mem_cgroup_cache_charge calls __mem_cgroup_try_charge for the page cache
> >and that one doesn't check GFP_MEMCG_NO_OOM. So you need the following
> >follow-up patch on top of the one yo
>which suggests that the patch is incomplete and that I am blind :/
>mem_cgroup_cache_charge calls __mem_cgroup_try_charge for the page cache
>and that one doesn't check GFP_MEMCG_NO_OOM. So you need the following
>follow-up patch on top of the one you already have (which should catch
>all the rema
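The follow-up patch itself is cut off above, but the shape of the fix can be sketched with a self-contained model. This is not the kernel code: the only point is that the page-cache charge wrapper has to derive its oom argument from GFP_MEMCG_NO_OOM (the flag introduced by the patch under test) instead of hard-coding oom = true. The flag value below is invented for the example.

#include <stdio.h>
#include <stdbool.h>

#define GFP_MEMCG_NO_OOM 0x01u  /* hypothetical value; the real flag comes from the patch */

/* Stand-in for __mem_cgroup_try_charge(): over the limit, either invoke the
 * memcg OOM killer (oom == true) or fail with ENOMEM (oom == false). */
static int try_charge(unsigned int gfp_mask, bool oom, bool over_limit)
{
        (void)gfp_mask;
        if (!over_limit)
                return 0;
        if (oom)
                return 0;       /* pretend the OOM killer freed something */
        return -1;              /* -ENOMEM back to the caller */
}

/* Stand-in for mem_cgroup_cache_charge(): before the follow-up it passed
 * oom = true unconditionally; afterwards it honours GFP_MEMCG_NO_OOM. */
static int cache_charge(unsigned int gfp_mask, bool over_limit)
{
        bool oom = !(gfp_mask & GFP_MEMCG_NO_OOM);

        return try_charge(gfp_mask, oom, over_limit);
}

int main(void)
{
        /* A no-OOM caller (e.g. the page cache add path) over the limit now
         * sees ENOMEM instead of triggering the memcg OOM killer. */
        printf("%d\n", cache_charge(GFP_MEMCG_NO_OOM, true));  /* -1 */
        /* Everyone else keeps the old behaviour. */
        printf("%d\n", cache_charge(0, true));                 /*  0 */
        return 0;
}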
On Mon 24-12-12 14:38:50, azurIt wrote:
> >OK, good to hear and fingers crossed. I will try to get back to the
> >original problem and a better solution sometimes early next year when
> >all the things settle a bit.
>
>
> Btw, I noticed one more thing when the problem is happening (=when any
> cgroup
On Mon 24-12-12 14:25:26, azurIt wrote:
> >OK, good to hear and fingers crossed. I will try to get back to the
> >original problem and a better solution sometimes early next year when
> >all the things settle a bit.
>
>
> Michal, the problem, unfortunately, happened again :( twice. When it
> happened
>OK, good to hear and fingers crossed. I will try to get back to the
>original problem and a better solution sometimes early next year when
>all the things settle a bit.
Btw, I noticed one more thing when the problem is happening (=when any cgroup is
stuck), I forgot to mention it before, sorry :(
>OK, good to hear and fingers crossed. I will try to get back to the
>original problem and a better solution sometimes early next year when
>all the things settle a bit.
Michal, the problem, unfortunately, happened again :( twice. When it happened the first
time (two days ago) I didn't want to believe it
On Tue 18-12-12 15:22:23, azurIt wrote:
> >It should mitigate the problem. The real fix shouldn't be that specific
> >(as per discussion in other thread). The chance this will get upstream
> >is not big and that means that it will not get to the stable tree
> >either.
>
>
> OOM is no longer killi
>It should mitigate the problem. The real fix shouldn't be that specific
>(as per discussion in other thread). The chance this will get upstream
>is not big and that means that it will not get to the stable tree
>either.
OOM is no longer killing processes outside target cgroups, so everything loo
On Mon 17-12-12 19:23:01, azurIt wrote:
> >[Ohh, I am really an idiot. I screwed the first patch]
> >- bool oom = true;
> >+ bool oom = !(gfp_mask | GFP_MEMCG_NO_OOM);
> >
> >Which obviously doesn't work. It should read !(gfp_mask & GFP_MEMCG_NO_OOM).
> > No idea how I could have missed
>[Ohh, I am really an idiot. I screwed the first patch]
>- bool oom = true;
>+ bool oom = !(gfp_mask | GFP_MEMCG_NO_OOM);
>
>Which obviously doesn't work. It should read !(gfp_mask & GFP_MEMCG_NO_OOM).
> No idea how I could have missed that. I am really sorry about that.
:D no problem
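The one-character difference is easy to check outside the kernel; the flag value below is made up purely for the demonstration.

#include <stdio.h>
#include <stdbool.h>

#define GFP_MEMCG_NO_OOM 0x01u  /* hypothetical value for the demonstration */

int main(void)
{
        unsigned int gfp_with_flag = 0x10u | GFP_MEMCG_NO_OOM;
        unsigned int gfp_without_flag = 0x10u;

        /* Broken: x | FLAG is non-zero for any non-zero mask, so oom comes
         * out false for every caller, not just the GFP_MEMCG_NO_OOM ones. */
        bool oom_broken = !(gfp_without_flag | GFP_MEMCG_NO_OOM);

        /* Fixed: x & FLAG actually tests whether the flag is set. */
        bool oom_with = !(gfp_with_flag & GFP_MEMCG_NO_OOM);        /* false */
        bool oom_without = !(gfp_without_flag & GFP_MEMCG_NO_OOM);  /* true  */

        printf("broken: %d, fixed(flag set): %d, fixed(flag clear): %d\n",
               oom_broken, oom_with, oom_without);  /* prints 0, 0, 1 */
        return 0;
}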
On Mon 17-12-12 02:34:30, azurIt wrote:
> >I would try to limit changes to minimum. So the original kernel you were
> >using + the first patch to prevent OOM from the write path + 2 debugging
> >patches.
>
>
> It didn't take down the whole system this time (but I was
> prepared to record a video o
>I would try to limit changes to minimum. So the original kernel you were
>using + the first patch to prevent OOM from the write path + 2 debugging
>patches.
It didn't take down the whole system this time (but I was prepared to record a
video of the console ;) ), here it is:
http://www.watchdog.sk/lk
>I would try to limit changes to minimum. So the original kernel you were
>using + the first patch to prevent OOM from the write path + 2 debugging
>patches.
ok.
>But was it at least related to the debugging from the patch, or was it
>rather a totally unrelated thing?
I wasn't reading it much
On Mon 10-12-12 11:18:17, azurIt wrote:
> >Hmm, this is _really_ surprising. The latest patch didn't add any new
> >logging actually. It just enhanced messages which were already printed
> >out previously + changed few functions to be not inlined so they show up
> >in the traces. So the only expla
>Hmm, this is _really_ surprising. The latest patch didn't add any new
>logging actually. It just enhanced messages which were already printed
>out previously + changed few functions to be not inlined so they show up
>in the traces. So the only explanation is that the workload has changed
>or the
On Mon 10-12-12 02:20:38, azurIt wrote:
[...]
> Michal,
Hi,
> this was printing so many debug messages to the console that the whole
> server hangs
Hmm, this is _really_ surprising. The latest patch didn't add any new
logging actually. It just enhanced messages which were already printed
out previ
>There are no other callers AFAICS so I am getting clueless. Maybe more
>debugging will tell us something (the inlining has been reduced for thp
>paths which can reduce performance in thp page fault heavy workloads but
>this will give us better traces - I hope).
Michal,
this was printing so many
On Thu 06-12-12 11:12:49, azurIt wrote:
> >Dohh. The very same stack mem_cgroup_newpage_charge called from the page
> >fault. The heavy inlining is not particularly helping here... So there
> >must be some other THP charge leaking out.
> >[/me is diving into the code again]
> >
> >* do_huge_pmd_ano
>Dohh. The very same stack mem_cgroup_newpage_charge called from the page
>fault. The heavy inlining is not particularly helping here... So there
>must be some other THP charge leaking out.
>[/me is diving into the code again]
>
>* do_huge_pmd_anonymous_page falls back to handle_pte_fault
>* do_hug
On Thu 06-12-12 01:29:24, azurIt wrote:
> >OK, so the ENOMEM seems to be leaking from mem_cgroup_newpage_charge.
> >This can only happen if this was an atomic allocation request
> >(!__GFP_WAIT) or if oom is not allowed which is the case only for
> >transparent huge page allocation.
> >The first ca
>OK, so the ENOMEM seems to be leaking from mem_cgroup_newpage_charge.
>This can only happen if this was an atomic allocation request
>(!__GFP_WAIT) or if oom is not allowed which is the case only for
>transparent huge page allocation.
>The first case can be excluded (in the clean 3.2 stable kernel
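The two escape hatches described above can be summarised in a small decision helper. This is a sketch of the reasoning only, with an illustrative value for __GFP_WAIT rather than the definition from the kernel headers.

#include <stdio.h>
#include <stdbool.h>

#define __GFP_WAIT 0x10u  /* illustrative value; means the caller may sleep/reclaim */

/* When can a memcg charge hand -ENOMEM straight back to the page fault?
 * Per the reasoning above: only for atomic charges (no __GFP_WAIT) or when
 * the charge was made with the memcg OOM killer disabled (the THP path). */
static bool charge_can_return_enomem(unsigned int gfp_mask, bool oom_allowed)
{
        if (!(gfp_mask & __GFP_WAIT))
                return true;    /* no reclaim, no OOM: fail immediately */
        if (!oom_allowed)
                return true;    /* e.g. transparent huge page charge */
        return false;           /* otherwise the charge reclaims or OOMs */
}

int main(void)
{
        printf("regular fault, oom allowed : %d\n",
               charge_can_return_enomem(__GFP_WAIT, true));   /* 0 */
        printf("THP charge, oom disabled   : %d\n",
               charge_can_return_enomem(__GFP_WAIT, false));  /* 1 */
        printf("atomic charge              : %d\n",
               charge_can_return_enomem(0, true));            /* 1 */
        return 0;
}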
On Wed 05-12-12 02:36:44, azurIt wrote:
> >The following should print the traces when we hand over ENOMEM to the
> >caller. It should catch all charge paths (migration is not covered but
> >that one is not important here). If we don't see any traces from here
> >and there is still global OOM striki
>The following should print the traces when we hand over ENOMEM to the
>caller. It should catch all charge paths (migration is not covered but
>that one is not important here). If we don't see any traces from here
>and there is still global OOM striking then there must be something else
>to trigger
On Fri 30-11-12 17:19:23, Michal Hocko wrote:
[...]
> The important question is why you see VM_FAULT_OOM and whether memcg
> charging failure can trigger that. I do not see how this could happen
> right now because __GFP_NORETRY is not used for user pages (except for
> THP which disable memcg OOM
>The only strange thing I noticed is that some groups have 0 limit. Is
>this intentional?
>grep memory.limit_in_bytes cgroups | grep -v uid | sed 's@.*/@@' | sort | uniq -c
>      3 memory.limit_in_bytes:0
These are users who are not allowed to run anything.
azur
On Fri 30-11-12 17:26:51, azurIt wrote:
> >Could you also post your complete containers configuration, maybe there
> >is something strange in there (basically grep . -r YOUR_CGROUP_MNT
> >except for tasks files which are of no use right now).
>
>
> Here it is:
> http://www.watchdog.sk/lkml/cgroup
>Could you also post your complete containers configuration, maybe there
>is something strange in there (basically grep . -r YOUR_CGROUP_MNT
>except for tasks files which are of no use right now).
Here it is:
http://www.watchdog.sk/lkml/cgroups.gz
On Fri 30-11-12 16:59:37, azurIt wrote:
> >> Here is the full boot log:
> >> www.watchdog.sk/lkml/kern.log
> >
> >The log is not complete. Could you paste the complete dmesg output? Or
> >even better, do you have logs from the previous run?
>
>
> What is missing there? All kernel messages are loggi
>> Here is the full boot log:
>> www.watchdog.sk/lkml/kern.log
>
>The log is not complete. Could you paste the complete dmesg output? Or
>even better, do you have logs from the previous run?
What is missing there? All kernel messages are logged into /var/log/kern.log
(it's the same as dmesg), dme
On Fri 30-11-12 16:08:11, azurIt wrote:
> >DMA32 zone usually fills up the first 4G unless your HW remaps the rest
> >of the memory above 4G or you have a numa machine and the rest of the
> >memory is at other node. Could you post your memory map printed during
> >the boot? (e820: BIOS-provided phys
On Fri 30-11-12 16:03:47, Michal Hocko wrote:
[...]
> Anyway, the more interesting thing is the gfp_mask: a GFP_NOWAIT allocation
> from the page fault? Huh, this shouldn't happen - ever.
OK, it starts making sense now. The message came from
pagefault_out_of_memory which doesn't have gfp nor the requir
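A side note on why the report decoded as GFP_NOWAIT at all: pagefault_out_of_memory() has no gfp context, so the mask it reports is empty, and with the usual definitions an empty mask is literally GFP_NOWAIT. A quick check, assuming the 3.2-era flag definitions (treat the values below as an assumption, not a quote from gfp.h):

#include <stdio.h>

/* Mirrors of the 3.2-era definitions in include/linux/gfp.h, as I recall
 * them; the point is only the relation, not the exact numbers. */
#define ___GFP_HIGH     0x20u
#define __GFP_HIGH      ___GFP_HIGH
#define GFP_ATOMIC      (__GFP_HIGH)
#define GFP_NOWAIT      (GFP_ATOMIC & ~__GFP_HIGH)

int main(void)
{
        /* GFP_NOWAIT expands to 0, so an OOM report that carries no gfp
         * context at all decodes as a "GFP_NOWAIT" allocation. */
        printf("GFP_NOWAIT = %#x\n", GFP_NOWAIT);  /* prints 0 */
        return 0;
}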
>DMA32 zone usually fills up the first 4G unless your HW remaps the rest
>of the memory above 4G or you have a numa machine and the rest of the
>memory is at other node. Could you post your memory map printed during
>the boot? (e820: BIOS-provided physical RAM map: and following lines)
Here is the
On Fri 30-11-12 15:44:31, Michal Hocko wrote:
> On Fri 30-11-12 14:44:27, azurIt wrote:
> > >Anyway your system is under both global and local memory pressure. You
> > >didn't see apache going down previously because it was probably the one
> > >which was stuck and could be killed.
> > >Anyway you
On Fri 30-11-12 14:44:27, azurIt wrote:
> >Anyway your system is under both global and local memory pressure. You
> >didn't see apache going down previously because it was probably the one
> >which was stuck and could be killed.
> >Anyway you need to setup your system more carefully.
>
>
> There
>Anyway your system is under both global and local memory pressure. You
>didn't see apache going down previously because it was probably the one
>which was stuck and could be killed.
>Anyway you need to setup your system more carefully.
There is also evidence that the system has enough memory
>Anyway your system is under both global and local memory pressure. You
>didn't see apache going down previously because it was probably the one
>which was stuck and could be killed.
>Anyway you need to setup your system more carefully.
No, it wasn't, I'm 1000% sure (I was on SSH). Here is the me
On Fri 30-11-12 03:29:18, azurIt wrote:
> >Here we go with the patch for 3.2.34. Could you test with this one,
> >please?
>
>
> Michal, unfortunately I had to boot to another kernel because the one
> with this patch keeps killing my MySQL server :( it was, probably,
> doing it on OOM in any cgrou
>Here we go with the patch for 3.2.34. Could you test with this one,
>please?
Michal, unfortunately I had to boot to another kernel because the one with this
patch keeps killing my MySQL server :( it was, probably, doing it on OOM in any
cgroup - looks like OOM was not choosing processes only f
>Here we go with the patch for 3.2.34. Could you test with this one,
>please?
I installed the kernel with this patch and will report back if the problem occurs again OR
in a few weeks if everything is OK. Thank you!
azurIt
>Here we go with the patch for 3.2.34. Could you test with this one,
>please?
Michal, regarding your conversation with Johannes Weiner, should I try this
patch or not?
azur
Here we go with the patch for 3.2.34. Could you test with this one,
please?
---
From 0d2d915c16f93918051b7ab8039d30b5a922049c Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Mon, 26 Nov 2012 11:47:57 +0100
Subject: [PATCH] memcg: do not trigger OOM from add_to_page_cache_locked
memcg oom kille