> From: Johannes Weiner
> To: "Ma, Xindong"
> Date: 28.11.2013 07:54
> Subject: Re: [PATCH] Fix race between oom kill and task exit
>
> CC: "a...@linux-foundation.org" , "mho...@suse.cz"
> , "rient...@google.com" ,
> "ru...@rustcorp.com.au" , "linux...@kvack.org"
> , "linux-kernel@vger.kernel
>On Wed, Oct 09, 2013 at 08:44:50PM +0200, azurIt wrote:
>> Johannes,
>>
>> i'm very sorry to say it but today something strange happened.. :) i was
>> just right at the computer so i noticed it almost immediately but i don't
>> have much info. Serv
>Hi azur,
>
>On Mon, Oct 07, 2013 at 01:01:49PM +0200, azurIt wrote:
>> >On Thu, Sep 26, 2013 at 06:54:59PM +0200, azurIt wrote:
>> >> On Wed, Sep 18, 2013 at 02:19:46PM -0400, Johannes Weiner wrote:
>> >> >Here is an update. Full replacement on top of 3.2 since we tried a
>> >> >dead end and it would be more painful to revert individual changes.
> CC: "Michal Hocko" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.kern
ernel.org
>On Wed, Sep 18, 2013 at 02:19:46PM -0400, Johannes Weiner wrote:
>> On Wed, Sep 18, 2013 at 02:04:55PM -0400, Johannes Weiner wrote:
>> > On Wed, Sep 18, 2013 at 04:03:04PM +0200, azurIt wrote:
>> > > > CC: "Johannes Weiner" , "Andrew Mor
ernel.org
>On Wed, Sep 18, 2013 at 02:19:46PM -0400, Johannes Weiner wrote:
>> On Wed, Sep 18, 2013 at 02:04:55PM -0400, Johannes Weiner wrote:
>> > On Wed, Sep 18, 2013 at 04:03:04PM +0200, azurIt wrote:
>> > > > CC: "Johannes Weiner" , "Andrew Mor
ernel.org
>On Wed, Sep 18, 2013 at 02:19:46PM -0400, Johannes Weiner wrote:
>> On Wed, Sep 18, 2013 at 02:04:55PM -0400, Johannes Weiner wrote:
>> > On Wed, Sep 18, 2013 at 04:03:04PM +0200, azurIt wrote:
>> > > > CC: "Johannes Weiner" , "Andrew Mor
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.ke
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vge
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@v
> CC: "Michal Hocko" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vge
__
> From: Johannes Weiner
> To: azurIt
> Date: 17.09.2013 02:02
> Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
>
> CC: "Michal Hocko" , "Andrew Morton"
> , "Davi
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.ke
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.ke
> CC: "Michal Hocko" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vge
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.ke
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.ke
> CC: "Johannes Weiner" , "Andrew Morton"
> , "David Rientjes" ,
> "KAMEZAWA Hiroyuki" , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vger.k
> CC: "Andrew Morton" , "Michal Hocko"
> , "David Rientjes" , "KAMEZAWA Hiroyuki"
> , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vge
> CC: "Andrew Morton" , "Michal Hocko"
> , "David Rientjes" , "KAMEZAWA Hiroyuki"
> , "KOSAKI Motohiro"
> , linux...@kvack.org,
> cgro...@vger.kernel.org, x...@kernel.org, linux-a...@vger.kernel.org,
> linux-kernel@vge
>On Tue, Sep 10, 2013 at 11:32:47PM +0200, azurIt wrote:
>> >On Tue, Sep 10, 2013 at 11:08:53PM +0200, azurIt wrote:
>> >> >On Tue, Sep 10, 2013 at 09:32:53PM +0200, azurIt wrote:
>> >> >> Here is full kernel log between 6:00 and 7:59:
>Hi azur,
>
>On Wed, Sep 04, 2013 at 10:18:52AM +0200, azurIt wrote:
>> > CC: "Andrew Morton" , "Michal Hocko"
>> > , "David Rientjes" , "KAMEZAWA
>> > Hiroyuki" , "KOSAKI Motohiro"
>> > ,
>On Thu 05-09-13 14:33:43, azurIt wrote:
>[...]
>> >Just to be sure I got you right. You have killed all the processes from
>> >the group you have sent stacks for, right? If that is the case I am
>> >really curious about processes sitting in sleep_on_page_killable
>[...]
>> My script detected another frozen cgroup today, sending stacks. Is
>> there anything interesting?
>
>3 tasks are sleeping and waiting for somebody to take an action to
>resolve memcg OOM. Is the memcg oom killer enabled for that group? If
>yes, which task has been selected to be killed?
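For context: in the 3.2-era code the waiting Michal describes happens in
mem_cgroup_handle_oom(), the same function he later suggests instrumenting
with printks. One task takes memcg_oom_lock and runs
mem_cgroup_out_of_memory() while every other task that hits the limit sleeps
until the kill resolves the OOM. A minimal user-space analogue of that
arbitration, with illustrative names and a deliberately simplified protocol
(not the kernel's actual implementation):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  oom_wait = PTHREAD_COND_INITIALIZER;
static bool oom_in_progress = false;

/* Stand-in for mem_cgroup_out_of_memory(): select and kill a victim. */
static void out_of_memory(void)
{
    printf("thread %lu: killing a victim to free memory\n",
           (unsigned long)pthread_self());
}

/* Stand-in for mem_cgroup_handle_oom(): the first task to hit the limit
 * becomes the killer; the others sleep until the OOM is resolved. */
static void *hit_limit(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&oom_lock);
    if (!oom_in_progress) {
        oom_in_progress = true;            /* we won the race: we kill */
        pthread_mutex_unlock(&oom_lock);

        out_of_memory();

        pthread_mutex_lock(&oom_lock);
        oom_in_progress = false;
        pthread_cond_broadcast(&oom_wait); /* wake the sleepers */
    } else {
        /* These are the "tasks waiting for somebody to take an action";
         * if the wakeup never comes, the whole group looks frozen. */
        while (oom_in_progress)
            pthread_cond_wait(&oom_wait, &oom_lock);
    }
    pthread_mutex_unlock(&oom_lock);
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, hit_limit, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return 0;
}

The stuck cgroups reported throughout this thread correspond to the case
where the sleepers are never woken.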
>[...]
>> My script has just detected (and killed) another frozen cgroup. I
>> must say that i'm not 100% sure that cgroup was really frozen but it
>> has 99% or more memory usage for at least 30 seconds (well, or it has
>> 99% memory usage in both cases the script was checking it). Here
>> a
>Hello azur,
>
>On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
>> >>Hi azur,
>> >>
>> >>here is the x86-only rollup of the series for 3.2.
>> >>
>> >>Thanks!
>> >>Johannes
>> >>---
>> >
>Hi azur,
>
>here is the x86-only rollup of the series for 3.2.
>
>Thanks!
>Johannes
>---
Johannes,
unfortunately, one problem arises: I have (again) a cgroup which cannot be
deleted :( it's a user who had very high memory usage and was reaching his
limit very often. Do you need any info which i
>Hi azur,
>
>here is the x86-only rollup of the series for 3.2.
>
>Thanks!
>Johannes
Hi Johannes,
i'm running the kernel with this new patch for 1 day now without any problems!
Will report back in a few weeks or months, or in case any problems occur. Thank
you!
azur
>azurIt, this is the combined backport for 3.2, x86 + generic bits +
>debugging. It would be fantastic if you could give this another shot
>once you get back from vacation. Thanks!
>
>Johannes
Hi Johannes,
is this still up to date? Thank you.
azur
> CC: "Michal Hocko" , linux-kernel@vger.kernel.org,
> linux...@kvack.org, "cgroups mailinglist" ,
> "KAMEZAWA Hiroyuki"
>On Fri, Jul 05, 2013 at 09:02:46PM +0200, azurIt wrote:
>> >I looked at your debug messages but could not find anythin
, "cgroups mailinglist" ,
>> "KAMEZAWA Hiroyuki" , righi.and...@gmail.com
>>On Wed 10-07-13 18:25:06, azurIt wrote:
>>> >> Now i realized that i forgot to remove UID from that cgroup before
>>> >> trying to remove it, so cgroup cannot be remove
> CC: "Johannes Weiner" , linux-kernel@vger.kernel.org,
> linux...@kvack.org, "cgroups mailinglist" ,
> "KAMEZAWA Hiroyuki" , righi.and...@gmail.com
>On Wed 10-07-13 18:25:06, azurIt wrote:
>> >> Now i realized that i forgot to remove UID
>> Now i realized that i forgot to remove the UID from that cgroup before
>> trying to remove it, so the cgroup cannot be removed anyway (we are using
>> a third party cgroup extension called cgroup-uid from Andrea Righi, which
>> is able to associate all of a user's processes with a target cgroup). Look
>> here for cgroup-u
>I looked at your debug messages but could not find anything that would
>hint at a deadlock. All tasks are stuck in the refrigerator, so I
>assume you use the freezer cgroup and enabled it somehow?
Yes, i'm really using the freezer cgroup BUT i was checking that it's not
causing problems - unfortunatel
>It's not a kernel thread that does it because all kernel-context
>handle_mm_fault() calls are annotated properly, which means the task must be
>userspace and, since tasks is empty, must have exited before synchronizing.
>
>Can you try with the following patch on top?
Michal and Johannes,
i have some obser
>It would be really interesting to see what those tasks are blocked on.
Ok, i got it! The problem occurred two times and behaved differently each
time; I was running the kernel with that latest patch.
1.) It didn't impact the whole server, only one cgroup. Here are the
stacks:
http://watchdog.
Michal,
>> I'm unable to send you stacks or more info because problem is taking
>> down the whole server for some time now (don't know what exactly
>> caused it to start happening, maybe newer versions of 3.2.x).
>
>So you are not testing with the same kernel with just the old patch
>replaced by
>Here we go. I hope I didn't screw anything (Johannes might double check)
>because there were quite some changes in the area since 3.2. Nothing
>earth shattering though. Please note that I have only compile tested
>this. Also make sure you remove the previous patches you have from me.
Hello Michal,
nice to read you! :) Yes, i'm still on 3.2. Could you be so kind and try to
backport it? Thank you very much!
azur
__
> Od: "Michal Hocko"
> Komu: azurIt
> Dátum: 06.06.2013 18:04
> Pre
>I am not sure how much time I'll have for this today but just to make
>sure we are on the same page, could you point me to the two patches you
>have applied in the meantime?
Here:
http://watchdog.sk/lkml/patches2
>Unfortunately I am not able to reproduce this behavior even if I try
>to hammer OOM like mad so I am afraid I cannot help you much without
>further debugging patches.
>I do realize that experimenting in your environment is a problem but I
>do not have many options left. Please do not use strace and rat
>stuck in the ptrace code.
But this happens _after_ the cgroup was frozen and i tried to strace one of
its processes (to see what's happening):
Feb 8 01:29:46 server01 kernel: [ 1187.540672] grsec: From 178.40.250.111:
process /usr/lib/apache2/mpm-itk/apache2(apache2:18211) attached to via
>
>I assume you have checked that the killed processes eventually die,
>right?
>
When i killed them by hand, yes, they disappeared from the process list (i saw
it). I don't know if they really died when OOM killed them.
>Well, I do not see anything suspicious during that time period
>(timestamps t
>Which means that the oom killer didn't try to kill any task more than
>once which is good because it tells us that the killed task manages to
>die before we trigger oom again. So this is definitely not a deadlock.
>You are just hitting OOM very often.
>$ grep "killed as a result of limit" kern2.lo
>kernel log would be sufficient.
Full kernel log from the kernel with your newest patch:
http://watchdog.sk/lkml/kern2.log
>This limit is for top level groups, right? Those seem to have children
>which have 62MB charged - is that a limit for those children?
It was the limit for the parent cgroup and proce
>
>Do you have logs from that time period?
>
>I have only glanced through the stacks and most of the threads are
>waiting in the mem_cgroup_handle_oom (mostly from the page fault path
>where we do not have other options than waiting) which suggests that
>your memory limit is seriously underestimate
wrote this e-mail and go to my lovely
bed ;)
__
> Od: "Michal Hocko"
> Komu: azurIt
> Dátum: 06.02.2013 17:00
> Predmet: [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG_OOM is
> set
>
>5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
>mentioned in a follow-up email. Here is the full patch:
Here is the log where OOM, again, killed the MySQL server [search for
"(mysqld)"]:
http://www.watchdog.sk/lkml/oom_mysqld6
azur
>5-memcg-fix-1.patch is not complete. It doesn't contain the follow-up I
>mentioned in a follow-up email.
ou, it wasn't complete? i used it in my last test.. sorry, i'm little confused
by all those patches. will try it this night and report back.
>Sorry to get back to this so late but I was busy as hell since the
>beginning of the year.
Thank you for your time!
>Has the issue repeated since then?
Yes, it's happening all the time but meanwhile i wrote a script which is
monitoring the problem and killing frozen processes when it oc
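The script itself is never posted in the thread; below is a rough C sketch of
the kind of watchdog azur describes, assuming a cgroup v1 memory controller
mounted at /sys/fs/cgroup/memory. The group path is hypothetical; the 99%
threshold and 30-second window are taken from his earlier description.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical cgroup v1 paths -- adjust to the real mount and group. */
#define GROUP "/sys/fs/cgroup/memory/user1234"
#define USAGE GROUP "/memory.usage_in_bytes"
#define LIMIT GROUP "/memory.limit_in_bytes"
#define TASKS GROUP "/tasks"

static long long read_ll(const char *path)
{
    FILE *f = fopen(path, "r");
    long long v = -1;

    if (f) {
        if (fscanf(f, "%lld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

/* SIGKILL every pid currently listed in the group's tasks file. */
static void kill_group(void)
{
    FILE *f = fopen(TASKS, "r");
    int pid;

    if (!f)
        return;
    while (fscanf(f, "%d", &pid) == 1)
        kill(pid, SIGKILL);
    fclose(f);
}

int main(void)
{
    int seconds_at_limit = 0;

    for (;;) {
        long long usage = read_ll(USAGE);
        long long limit = read_ll(LIMIT);

        /* Count how long the group has sat at >= 99% of its limit. */
        if (usage > 0 && limit > 0 && usage >= limit / 100 * 99)
            seconds_at_limit++;
        else
            seconds_at_limit = 0;

        if (seconds_at_limit >= 30) {   /* looks frozen: kill it */
            kill_group();
            seconds_at_limit = 0;
        }
        sleep(1);
    }
    return 0;
}

Of course, SIGKILL only helps if the stuck tasks are still killable, which is
exactly what the patches discussed in this thread try to guarantee.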
Any news? Thnx!
azur
__
> Od: "Michal Hocko"
> Komu: azurIt
> Dátum: 30.12.2012 12:08
> Predmet: Re: [PATCH for 3.2.34] memcg: do not trigger OOM from
> add_to_page_cache_locked
>
> CC: linu
>which suggests that the patch is incomplete and that I am blind :/
>mem_cgroup_cache_charge calls __mem_cgroup_try_charge for the page cache
>and that one doesn't check GFP_MEMCG_NO_OOM. So you need the following
>follow-up patch on top of the one you already have (which should catch
>all the rema
>OK, good to hear and fingers crossed. I will try to get back to the
>original problem and a better solution sometimes early next year when
>all the things settle a bit.
Btw, i noticed one more thing when the problem is happening (=when any cgroup
is stuck), i forgot to mention it before, sorry :(
>OK, good to hear and fingers crossed. I will try to get back to the
>original problem and a better solution sometimes early next year when
>all the things settle a bit.
Michal, the problem, unfortunately, happened again :( twice. When it happened
the first time (two days ago) i didn't want to believe it
>It should mitigate the problem. The real fix shouldn't be that specific
>(as per discussion in other thread). The chance this will get upstream
>is not big and that means that it will not get to the stable tree
>either.
OOM is no longer killing processes outside target cgroups, so everything loo
>[Ohh, I am really an idiot. I screwed the first patch]
>- bool oom = true;
>+ bool oom = !(gfp_mask | GFP_MEMCG_NO_OOM);
>
>Which obviously doesn't work. It should read !(gfp_mask & GFP_MEMCG_NO_OOM).
>No idea how I could have missed that. I am really sorry about that.
:D no problem
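The one-character slip is worth spelling out: OR-ing in a nonzero flag makes
the expression nonzero for every gfp_mask, so the negation is always false and
OOM would never be enabled for any caller; AND actually tests the bit. A
standalone demonstration (the flag value below is a stand-in, not the patch's
real one):

#include <assert.h>
#include <stdio.h>

#define GFP_MEMCG_NO_OOM 0x01u   /* stand-in value for the patch's flag */

int main(void)
{
    unsigned int gfp_mask = 0x10u;   /* a caller that did NOT set the flag */

    int oom_buggy = !(gfp_mask | GFP_MEMCG_NO_OOM);  /* always 0 */
    int oom_fixed = !(gfp_mask & GFP_MEMCG_NO_OOM);  /* 1: OOM allowed */

    assert(oom_buggy == 0);  /* buggy version disables OOM unconditionally */
    assert(oom_fixed == 1);  /* fixed version allows OOM when flag is unset */

    printf("buggy oom=%d, fixed oom=%d\n", oom_buggy, oom_fixed);
    return 0;
}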
>I would try to limit changes to minimum. So the original kernel you were
>using + the first patch to prevent OOM from the write path + 2 debugging
>patches.
It didn't take down the whole system this time (but i was prepared to record a
video of console ;) ), here it is:
http://www.watchdog.sk/lk
>I would try to limit changes to minimum. So the original kernel you were
>using + the first patch to prevent OOM from the write path + 2 debugging
>patches.
ok.
>But was it at least related to the debugging from the patch or was it
>rather a totally unrelated thing?
I wasn't reading it much
>Hmm, this is _really_ surprising. The latest patch didn't add any new
>logging actually. It just enhanced messages which were already printed
>out previously + changed a few functions to be not inlined so they show up
>in the traces. So the only explanation is that the workload has changed
>or the
>There are no other callers AFAICS so I am getting clueless. Maybe more
>debugging will tell us something (the inlining has been reduced for thp
>paths which can reduce performance in thp page fault heavy workloads but
>this will give us better traces - I hope).
Michal,
this was printing so many
>Dohh. The very same stack mem_cgroup_newpage_charge called from the page
>fault. The heavy inlining is not particularly helping here... So there
>must be some other THP charge leaking out.
>[/me is diving into the code again]
>
>* do_huge_pmd_anonymous_page falls back to handle_pte_fault
>* do_hug
>OK, so the ENOMEM seems to be leaking from mem_cgroup_newpage_charge.
>This can only happen if this was an atomic allocation request
>(!__GFP_WAIT) or if oom is not allowed which is the case only for
>transparent huge page allocation.
>The first case can be excluded (in the clean 3.2 stable kernel
>The following should print the traces when we hand over ENOMEM to the
>caller. It should catch all charge paths (migration is not covered but
>that one is not important here). If we don't see any traces from here
>and there is still global OOM striking then there must be something else
>to trigger
>The only strange thing I noticed is that some groups have 0 limit. Is
>this intentional?
>grep memory.limit_in_bytes cgroups | grep -v uid | sed 's@.*/@@' | sort | uniq -c
> 3 memory.limit_in_bytes:0
These are users who are not allowed to run anything.
azur
>Could you also post your complete containers configuration, maybe there
>is something strange in there (basically grep . -r YOUR_CGROUP_MNT
>except for tasks files which are of no use right now).
Here it is:
http://www.watchdog.sk/lkml/cgroups.gz
>> Here is the full boot log:
>> www.watchdog.sk/lkml/kern.log
>
>The log is not complete. Could you paste the complete dmesg output? Or
>even better, do you have logs from the previous run?
What is missing there? All kernel messages are logged into /var/log/kern.log
(it's the same as dmesg), dme
>The DMA32 zone usually fills up the first 4G unless your HW remaps the rest
>of the memory above 4G or you have a NUMA machine and the rest of the
>memory is at another node. Could you post your memory map printed during
>the boot? (e820: BIOS-provided physical RAM map: and following lines)
Here is the
>Anyway your system is under both global and local memory pressure. You
>didn't see apache going down previously because it was probably the one
>which was stuck and could be killed.
>Anyway you need to setup your system more carefully.
There is also evidence that the system has enough memory
>Anyway your system is under both global and local memory pressure. You
>didn't see apache going down previously because it was probably the one
>which was stuck and could be killed.
>Anyway you need to setup your system more carefully.
No, it wasn't, i'm 1000% sure (i was on SSH). Here is the me
s only from cgroup which is
out of memory. Here is the log from syslog:
http://www.watchdog.sk/lkml/oom_mysqld
Maybe i should mention that MySQL server has its own cgroup (called 'mysql')
but with no limits on any resources.
azurIt
>Here we go with the patch for 3.2.34. Could you test with this one,
>please?
I installed the kernel with this patch, will report back if the problem occurs
again OR in a few weeks if everything is ok. Thank you!
azurIt
>Here we go with the patch for 3.2.34. Could you test with this one,
>please?
Michal, regarding your conversation with Johannes Weiner, should i try this
patch or not?
azur
>This issue has been around for a while so frankly I don't think it's
>urgent enough to rush things.
Well, it's quite urgent at least for us :( i didn't report this so far cos i
wasn't sure it's a kernel thing. I will be really happy and thankful if a fix
for this can go to 3.2 in some near fu
>This is hackish but it should help you in this case. Kamezawa, what do
>you think about that? Should we generalize this and prepare something
>like mem_cgroup_cache_charge_locked which would add __GFP_NORETRY
>automatically and use the function whenever we are in a locked context?
>To be honest I
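The generalized helper Michal floats here is only sketched in the
conversation. Modeled in standalone C (the flag values and the stubbed charge
function are illustrative, not the kernel's), the idea is simply to force a
non-retrying charge attempt whenever the caller holds locks, so a memcg OOM is
never triggered and waited for under a lock:

#include <errno.h>
#include <stdio.h>

typedef unsigned int gfp_t;

#define __GFP_WAIT    0x10u     /* illustrative values, not the kernel's */
#define __GFP_NORETRY 0x1000u

/* Stub for the real charge path: with __GFP_NORETRY the charge gives up
 * with -ENOMEM instead of triggering (and waiting for) the OOM killer. */
static int mem_cgroup_cache_charge(gfp_t gfp_mask)
{
    if (gfp_mask & __GFP_NORETRY)
        return -ENOMEM;         /* caller must handle the failure */
    return 0;                   /* pretend the charge succeeded */
}

/* The proposed variant for locked contexts: adds __GFP_NORETRY
 * automatically so the OOM killer is never invoked under a lock. */
static int mem_cgroup_cache_charge_locked(gfp_t gfp_mask)
{
    return mem_cgroup_cache_charge(gfp_mask | __GFP_NORETRY);
}

int main(void)
{
    printf("unlocked charge: %d\n", mem_cgroup_cache_charge(__GFP_WAIT));
    printf("locked charge:   %d\n", mem_cgroup_cache_charge_locked(__GFP_WAIT));
    return 0;
}

The trade-off, as the thread notes, is that every locked caller must then
cope with -ENOMEM instead of relying on the OOM killer to make progress.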
>> Thank you very much, i will install it ASAP (probably this night).
>
>Please don't. If my analysis is correct which I am almost 100% sure it
>is then it would cause excessive logging. I am sorry I cannot come up
>with something else in the mean time.
Ok then. I will, meanwhile, try to contact
>Inlined at the end of the email. Please note I have compile tested
>it. It might produce a lot of output.
Thank you very much, i will install it ASAP (probably this night).
>dmesg | grep "Out of memory"
>doesn't tell anything, right?
Only messages for other cgroups but not for the frozen one
>So there are a lot of attempts to allocate which fail, every second!
Yes, as i said, the cgroup was taking 100% of (allocated) CPU core(s). Not sure
if all processes were using CPU but a _few_ of them (not only one) for sure.
>Could you take a few snapshots over time?
Here it is, now from a different server, a snapshot was taken every second for
10 minutes (hope it's enough):
www.watchdog.sk/lkml/memcg-bug-2.tar.gz
>If you could instrument mem_cgroup_handle_oom with some printks (before
>we take the memcg_oom_lock, before we schedule and into
>mem_cgroup_out_of_memory)
If you send me a patch i can do it. I'm, unfortunately, not able to code it.
>> It, luckily, happened again so i have more info.
>>
>> - t