On Wed 07-11-12 14:53:40, Andrew Morton wrote:
> On Wed, 7 Nov 2012 23:46:40 +0100
> Michal Hocko <mho...@suse.cz> wrote:
> 
> > > Realistically, is anyone likely to hurt from this?
> > 
> > The primary motivation for the fix was a real report by a customer.
> 
> Describe it please and I'll copy it to the changelog.

The original issue (the wrong task gets killed in a small group with
memcg swappiness=0) was reported on top of our 3.0-based kernel (with
fe35004f backported). I have tried to replicate it with the test case
mentioned in https://lkml.org/lkml/2012/10/10/223.

As David correctly pointed out (https://lkml.org/lkml/2012/10/10/418),
a significant role was played by the fact that all the processes in the
group have CAP_SYS_ADMIN, but oom_score_adj has a similar effect.
Say there is 2G of swap space, which is 524288 pages. If you add the
CAP_SYS_ADMIN bonus then you get a bias of -15728 on the score. This
means that all tasks with less than 60M get the minimum score, and it is
the ordering of the tasks which determines who gets killed as a result.
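
The arithmetic above can be reproduced with a short sketch. It assumes
4 KiB pages and a CAP_SYS_ADMIN bonus of 3% (30/1000) of the total
pages, which is what the -15728 figure implies; the function and
constant names are hypothetical and are not kernel identifiers.

```python
# Rough reproduction of the numbers quoted above.
# Assumptions (not stated in the mail): 4 KiB pages, and a
# CAP_SYS_ADMIN bonus equal to 30/1000 of the total pages.
PAGE_SIZE = 4096


def admin_bonus_pages(totalpages, permille=30):
    """Pages subtracted from the badness score (integer arithmetic)."""
    return totalpages * permille // 1000


swap_pages = 2 * 1024**3 // PAGE_SIZE          # 2G of swap in pages
bonus = admin_bonus_pages(swap_pages)          # size of the bias
bonus_mib = bonus * PAGE_SIZE / 2**20          # same bias in MiB

print(swap_pages, bonus, round(bonus_mib, 1))
```

Under these assumptions the bias works out to roughly 61 MiB, which
matches the "less than 60M" threshold mentioned above: any task smaller
than that ends up clamped to the minimum score.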

To summarize: users of small groups (relative to the swap size) with
tasks that have CAP_SYS_ADMIN or use oom_score_adj are affected the
most; others might see an unexpected oom_badness calculation.
Whether this workload is representative, I don't know, but I think
that it is worth fixing and pushing to stable as well.
-- 
Michal Hocko
SUSE Labs