Hugh Dickins wrote: > On Wed, 19 Dec 2007, Balbir Singh wrote: >> Hugh Dickins wrote: >> >> We always call mem_cgroup_isolate_pages() from shrink_(in)active_pages >> under spin_lock_irq of the zone's lru lock. That's the reason that we >> don't explicitly use it in the routine. > > Indeed, thanks. > >>> 2. There's mem_cgroup_charge and mem_cgroup_cache_charge (wouldn't the >>> former be better called mem_cgroup_charge_mapped? why does the latter >> Yes, it would be. After we've refactored the code, the new name makes sense. >> >>> test MEM_CGROUP_TYPE_ALL instead of MEM_CGROUP_TYPE_CACHED? I still don't >>> understand your enums there). >> We do that to ensure that we charge page cache pages only when the >> accounting type is set to MEM_CGROUP_TYPE_ALL. If the type is anything >> else, we ignore cached pages, we did not have MEM_CGROUP_TYPE_CACHED >> initially when the patches went in. > > I think you've given yourself far too many degrees of freedom here. > > Please explain to me how the system is supposed to behave when I > echo -n 2 >/cg/0/memory.control_type > > From the name in the enum (MEM_CGROUP_TYPE_CACHED, doesn't even have > an "= 2", yet this is a part of the API?), I think it's supposed to > account pages into and out of the page cache, but not as they're > mapped into and out of userspace. But from the code, no references > whatever to MEM_CGROUP_TYPE_CACHED or MEM_CGROUP_TYPE_MAPPED, > it looks as if it'll behave just like MEM_CGROUP_TYPE_MAPPED > (which we hope = 1). > > As you say, you didn't have MEM_CGROUP_TYPE_CACHED originally: > it looks like it got added to the straightforward case of MAPPED, > in an apparently flexible but not fully thought out way. >
Yes, I think your argument makes sense. I had initially thought of making the enums a mask MEM_CGROUP_TYPE_MAPPED = 0x1, MEM_CGROUP_TYPE_CACHED = 0x2, MEM_CGROUP_TYPE_ALL = 0x3 and check for control_type & MEM_CGROUP_TYPE_CACHED >> But there's only mem_cgroup_uncharge. >>> So when, for example, an add_to_page_cache fails, the uncharge may not >>> balance the charge? >>> >> We use mem_cgroup_uncharge() everywhere. The reason being, we might >> switch control type, we uncharge pages that have a page_cgroup >> associated with them, hence once we;ve charged, uncharge does not >> distinguish between charge types. > > Ah, so this is the meaning of that > /* > * This can handle cases when a page is not charged at all and we > * are switching between handling the control_type. > */ > if (!pc) > return; > > if (atomic_dec_and_test(&pc->ref_cnt)) { > at the beginning of mem_cgroup_uncharge. > > Sorry, that doesn't fly. You've no locking between acquiring pc from > the page->page_cgroup, testing pc here, and decrementing its ref_cnt. > When that ref_cnt goes down to zero, clear_page_cgroup may NULLify > page->page_cgroup and kfree the pc. > Yes, we need to lock the page group. > So if you cannot really keep track of the ref_cnt (because of > changing control_type), that atomic_dec_and_test is in danger > of corrupting someone else's memory - isn't it? > Yes, my bad :( > I can just about imagine that some admins will want control_type > MAPPED and others CACHED and others MAPPED+CACHED. But is there > actually a need for one cgroup to be controlling MAPPED while > another on the same machine is controlling MAPPED+CACHED? And > does that make sense - there'll be weirdnesses, won't there? > And is there actually a need to change a cgroup's control_type > on the fly while it's alive? > > Of course it's nice to be flexible and allow for such possibilities; > but not if that just opens windows for corruption. My own view is > that at present you should just account mapped and cached for all, > and strip out these extra degrees of freedom: add them back in some > future when the locking is worked out. > I am going to rip out the control_type interface for now. The locking needs fixing irrespective of control_type. I am sending out a patch to rip out control_type. > Hugh Thanks for your detailed comments, -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/