Hi,

I am really sorry it took so long, but I was constantly preempted by
other stuff. I hope I have good news for you, though. Johannes has
found a nice way to overcome the deadlock issues from memcg OOM, which
might help you. Would you be willing to test with his patch
(http://permalink.gmane.org/gmane.linux.kernel.mm/101437)? Unlike my
patch, which handles just the i_mutex case, his patch handles all
possible locks.

I can backport the patch for your kernel (are you still using the 3.2
kernel, or have you moved to a newer one?).

On Fri 22-02-13 09:23:32, azurIt wrote:
> >Unfortunately I am not able to reproduce this behavior even if I try
> >to hammer OOM like mad so I am afraid I cannot help you much without
> >further debugging patches.
> >I do realize that experimenting in your environment is a problem, but I
> >do not have many options left. Please do not use strace; collect
> >/proc/pid/stack instead. It would also be helpful to get the group/tasks
> >file to have a full list of tasks in the group.
> 
> 
> 
> Hi Michal,
> 
> 
> Sorry that I didn't respond for a while. Today I installed a kernel with your 
> two patches and I'm running it now. I'm still having problems with OOM, which 
> is not able to handle low memory and is not killing processes. Here is some 
> info:
> 
> - data from cgroup 1258 while it was under OOM and no processes were killed 
> (so OOM didn't stop and the cgroup was frozen)
> http://watchdog.sk/lkml/memcg-bug-6.tar.gz
> 
> I noticed the problem at about 8:39 and waited until 8:57 (nothing happened). Then 
> I killed process 19864, which seemed to help: the other processes probably ended 
> and the cgroup started to work. But the problem occurred again about 20 seconds 
> later, so I killed all processes at 8:58. The problem has been occurring all the 
> time since then. All processes (in that cgroup) are always in state 'D' when 
> it occurs.
> 
> 
> - kernel log from boot until now
> http://watchdog.sk/lkml/kern3.gz
> 
> 
> Btw, something probably also happened at about 3:09, but I wasn't able to 
> gather any data because my 'load check script' killed all apache processes 
> (load was more than 100).
> 
> 
> 
> azur
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
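
For reference, the data asked for in the quoted text above (the group's
tasks file plus /proc/pid/stack for every task in it) could be gathered
with something like the following. This is only a rough sketch, not a
tested tool: the function names are made up for illustration, and it
assumes a cgroup v1 memory hierarchy, a kernel that exposes
/proc/<pid>/stack, and root privileges to read it.

```python
import os


def read_tasks(cgroup_dir):
    """Return the PIDs listed in the cgroup's `tasks` file."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


def collect_stacks(cgroup_dir, proc_root="/proc"):
    """Map each PID in the group to the contents of its kernel stack file.

    Tasks can exit between reading `tasks` and opening their stack file,
    so a failed read is recorded as a placeholder instead of aborting.
    """
    stacks = {}
    for pid in read_tasks(cgroup_dir):
        path = os.path.join(proc_root, str(pid), "stack")
        try:
            with open(path) as f:
                stacks[pid] = f.read()
        except OSError as err:
            # Typically: task already gone, or insufficient privileges.
            stacks[pid] = "<unavailable: %s>" % err.strerror
    return stacks
```

Calling collect_stacks() on the memcg directory of the stuck group (for
example /sys/fs/cgroup/memory/<group>, where the mount point is an
assumption and depends on the setup) a few times while the group is
hung should show where the tasks in state 'D' are blocked.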

-- 
Michal Hocko
SUSE Labs
