Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > Of course they have specific affinity needs, that's why they used > mempolicies. No. Good grief. If they are just looking for some set of memory banks, not to other node-specific hardware, then they might not need a specific node. Consider for example a multi-threaded, compute b

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > Of course there's actual and real world examples of this, because right > now we're not meeting the full intent of the application. Please describe one, an actual one, not a hypothetical one, of which you have personal knowledge. There are many refinements we could add, an endless

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > But, with Choice C, my intent is still preserved in the mempolicy even > though it's not effected because my access rights to the node has changed. Choice B, as I'm coding it, has this property as well. -- I won't rest till it's the best ...

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > That's what Choice C is intended to replace Yes, one remaps nodes it can't provide, and the other removes nodes it can't provide. Yup - that's a logical difference. So ... I would think that the only solution that would be satisfactory to apps that require specific hardware nodes

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread David Rientjes
On Tue, 30 Oct 2007, Paul Jackson wrote: > > Those applications that currently rely on the remapping are going to be > > broken anyway because they are unknowingly receiving different nodes than > > they intended, this is the objection to remapping that Lee agreed with. > > No, they may or may

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > but the remap certainly doesn't help respect the intent of the > application and the mempolicies they have set up when influenced > by an outside entity such as cpusets. ... guess that depends on the intent, doesn't it? -- I won't rest till it's the best ...

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > If your argument is that most applications are written to implement > mempolicies without necessarily thinking too much about its cpuset > placement or interactions with cpusets, then the requirement of remapping > nodes when a cpuset changes for effected mempolicies isn't actuall

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread David Rientjes
On Tue, 30 Oct 2007, Paul Jackson wrote: > We've already got two Choices, one released and one in the oven. Is > there an actual, real world situation, motivating this third Choice? > Let's put Choice C into the lower oven, then. Of course there's actual and real world examples of this, becaus

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
David wrote: > The nodemask passed to set_mempolicy() will always have exactly one > meaning: the system nodes that the policy is intended for. Ok - that makes the meaning of Choice C clearer to me. Thank-you. We've already got two Choices, one released and one in the oven. Is there an actual,

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread David Rientjes
On Mon, 29 Oct 2007, Paul Jackson wrote: > But in any case, we (the kernel) are just providing the mechanisms. > If they don't fit one's needs, don't use them ;). > The kernel is providing the mechanism to interleave over a set of nodes or prefer a single node for allocations, but it also provid

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread David Rientjes
On Mon, 29 Oct 2007, Paul Jackson wrote: > Blindsiding users with a unilateral change like this will leave > orphaned bits gasping in agony on the computer room floor. It can > sometimes take months of elapsed time and hundreds of hours of various > people's time across a dozen departments in th

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread David Rientjes
On Mon, 29 Oct 2007, Lee Schermerhorn wrote: > And even when the intent is to preserve the cpuset relative positions of > the nodes in the nodemask, this really only makes sense if the original > and modified cpusets have the same physical topology w/rt multi-level > NUMA interconnects. This is s

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread David Rientjes
On Mon, 29 Oct 2007, Paul Jackson wrote:
> > Policies such as MPOL_INTERLEAVE always get AND'd with
> > pol->cpuset_mems_allowed.
>
> Not AND'd - Folded, as in bitmap_remap().
>
> > If that yields numa_no_nodes, MPOL_DEFAULT is used instead.
>
> Not an issue with Folding.
>
> >
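
The AND-versus-fold distinction quoted above can be sketched in a few lines. This is only an illustrative model using Python sets, not kernel code: `and_mask` and `remap_mask` are hypothetical names, and `remap_mask` assumes the fold behaves like the kernel's `bitmap_remap()` (the n-th node of the old allowed set maps to the n-th node of the new set, wrapping around, and nodes outside the old set map to themselves).

```python
def and_mask(requested, allowed):
    """Intersection: requested nodes outside `allowed` are simply dropped."""
    return set(requested) & set(allowed)


def remap_mask(requested, old_allowed, new_allowed):
    """Simplified model of a bitmap_remap()-style fold (an assumption, see lead-in).

    A requested node at ordinal position n within `old_allowed` is mapped to
    the node at position (n mod len(new_allowed)) within `new_allowed`.
    Requested nodes not in `old_allowed` are left unchanged.
    """
    if not old_allowed or not new_allowed:
        # With an empty old or new set there is nothing to fold against.
        return set(requested)
    old_sorted = sorted(old_allowed)
    new_sorted = sorted(new_allowed)
    result = set()
    for node in requested:
        if node in old_allowed:
            ordinal = old_sorted.index(node)
            result.add(new_sorted[ordinal % len(new_sorted)])
        else:
            result.add(node)  # identity mapping for nodes outside the old set
    return result


if __name__ == "__main__":
    requested = {0, 1, 2, 3}
    # The cpuset moves from nodes 0-3 to nodes 4-5.
    print(and_mask(requested, {4, 5}))                    # intersection is empty
    print(remap_mask(requested, {0, 1, 2, 3}, {4, 5}))    # folds onto both new nodes
```

In this model an intersection can come up empty when the cpuset moves, which is the case the quoted text says falls back to MPOL_DEFAULT, whereas a fold onto a non-empty new set always yields at least one node.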

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Andi Kleen
On Tuesday 30 October 2007 20:47:51 Paul Jackson wrote: > Andi, Christoph, or whomever: > > Are there any good regression tests of mempolicy functionality? numactl has some basic tests (make test). I think newer LTP also has some but i haven't looked at them. And there is Lee's memtoy which do

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
Lee wrote: > Paul: Andi has a regression test in the numactl source package. Good - thanks. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Lee Schermerhorn
On Tue, 2007-10-30 at 12:47 -0700, Paul Jackson wrote: > Andi, Christoph, or whomever: > > Are there any good regression tests of mempolicy functionality? Paul: Andi has a regression test in the numactl source package. Try: http://freshmeat.net/redir/numactl/62210/url_tgz/numactl-1.0.

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-30 Thread Paul Jackson
Andi, Christoph, or whomever: Are there any good regression tests of mempolicy functionality? This patch I'm coding is delicate enough that I probably broke something. It would be nice to catch it sooner rather than later. -- I won't rest till it's the best ...

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
> So the user space asks for 8 nodes because it knows the machine > has that many from /sys and it only gets 4 if a cpuset says so? That's > just bad semantics. And is not likely to make the user programs happy. That's no different than what can happen today -- if a task actually is in an 8 node c

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Andi Kleen
On Monday 29 October 2007 20:35:58 Paul Jackson wrote: > Lee wrote: > > 2. As this thread progresses, you've discussed relaxing the requirement > > that applications pass a valid subset of mems_allowed. I.e., something > > that was illegal becomes legal. An API change, I think. But, a > > backw

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Christoph Lameter
On Mon, 29 Oct 2007, Paul Jackson wrote: > The more I have stared at this, the more certain I've become that we > need to make the mbind/mempolicy calls modal -- the default mode > continues to interpret node numbers and masks just as these calls do > now, and the alternative mode provides the so

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Christoph Lameter
On Mon, 29 Oct 2007, Lee Schermerhorn wrote: > Note: I don't [didn't] think I need to ref count the nodemasks > associated with the mempolicies because they are allocated when the > mempolicy is and destroyed when the policy is--not shared. Just like > the custom zonelist for bind policy, and we

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > In libnuma in numactl-1.0.2 that I recently grabbed off Andi's site, > numa_available() indeed issues this call. But, I don't see any internal > calls to numa_available() [comments says all other calls undefined when > numa_available() returns an error] nor any other calls to > get_me

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > > Indeed, if there was much such usage, I suspect they'd > > be complaining that the current kernel API was borked, and > > they'd be filing a request for enhancement -asking- for just > > this subtle change in the kernel API's here. In other words, > > this subtle

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > Again, we stumble upon the notion of "intent". If the intent is just to > spread allocations to share bandwidth, it probably doesn't matter. If, > on the other hand, the original mask was carefully constructed, taking > into consideration the distances between the memories specified

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > If most apps use libnuma APIs instead of directly calling the sys calls, > libnuma could query something as simple as an environment variable, or a > new flag to get_mempolicy(), or the value of a file in its current > cpuset--but I'd like to avoid a dependency on libcpuset--to determ

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > 2. As this thread progresses, you've discussed relaxing the requirement > that applications pass a valid subset of mems_allowed. I.e., something > that was illegal becomes legal. An API change, I think. But, a > backward compatible one, so that's OK, right? :-) The more I have sta

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Lee Schermerhorn
On Mon, 2007-10-29 at 11:41 -0700, Paul Jackson wrote: > Lee wrote: > > Maybe it's just me, but I think it's pretty presumptuous to think we can > > infer the intent of the application from the nodemask w/o additional > > flags such as Christoph proposed [cpuset relative]--especially for > > subset

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > Maybe it's just me, but I think it's pretty presumptuous to think we can > infer the intent of the application from the nodemask w/o additional > flags such as Christoph proposed [cpuset relative]--especially for > subsets of the cpuset. E.g., the application could intend the nodemask

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Lee Schermerhorn
On Mon, 2007-10-29 at 10:33 -0700, Paul Jackson wrote: > Lee wrote: > > I only brought it up again because now you all are considering another > > nodemask per policy. > > The patch David and I are discussing will replace the > cpuset_mems_allowed nodemask in struct mempolicy, not > add a new node

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Andi Kleen
> > Another thing occurs to me: perhaps numactl would need an additional > 'nodes' specifier such as 'allowed'. Alternatively, 'all' could be > redefined to mean 'all allowed'. This is independent of how you specify > 'all allowed' to the system call. cpuset support in libnuma/numactl is still

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Paul Jackson
Lee wrote: > I only brought it up again because now you all are considering another > nodemask per policy. The patch David and I are discussing will replace the cpuset_mems_allowed nodemask in struct mempolicy, not add a new nodemask. In other words, the meaning and name of that existing nodemask

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Lee Schermerhorn
On Sat, 2007-10-27 at 16:19 -0700, Paul Jackson wrote: > David wrote: > > I think there's a mixup in the flag name [MPOL_MF_RELATIVE] there > > Most likely. The discussion involving that flag name was kinda mixed up ;). > > > but I actually would recommend against any flag to effect Choice A. >

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Lee Schermerhorn
On Sat, 2007-10-27 at 12:16 -0700, David Rientjes wrote: > On Fri, 26 Oct 2007, David Rientjes wrote: > > > Hacking and requiring an updated version of libnuma to allow empty > > nodemasks to be passed is a poor solution; if mempolicy's are supposed to > > be independent from cpusets, then what

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 14:39 -0700, David Rientjes wrote: > On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > > So, you pass the subset, you don't set the flag to indicate you want > > interleaving over all available. You must be thinking of some other use > > for saving the subset mask that I'm no

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-29 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 14:37 -0700, Christoph Lameter wrote: > On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > > > > Now, if we could replace the 'cpuset_mems_allowed' nodemask with a > > > > pointer to something stable, it might be a win. > > > > > > The memory policies are already shared and ha

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread Paul Jackson
> Let's add a Choice C: > > Any nodemask that is passed to set_mempolicy() is saved as > the intent of the application in struct mempolicy. Yes > All policies are effected on a contextualized per-allocation > basis. "contextualized" - I guess that means converted to cpuset relat

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread Paul Jackson
David wrote: > The problem that I see with immediately offering both choices is that we > don't know if anybody is actually reverting back to Choice A behavior > because libnuma, by default, would use it. That's going to make it very > painful to remove later. Yes, that's a problem. I would

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread David Rientjes
On Sun, 28 Oct 2007, Paul Jackson wrote: > > If we can't identify any applications that would be broken by this, what's > > the difference in simply implementing Choice B and then, if we hear > > complaints, add your hack to revert back to Choice A behavior based on the > > get_mempolicy() call

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread Paul Jackson
David wrote: > If we can't identify any applications that would be broken by this, what's > the difference in simply implementing Choice B and then, if we hear > complaints, add your hack to revert back to Choice A behavior based on the > get_mempolicy() call you specified is always part of libn

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread David Rientjes
On Sun, 28 Oct 2007, Paul Jackson wrote: > And, unless someone in the know tells us otherwise, I have to assume > that this could break them. Now, the odds are that they simply don't > run that solution stack on any system making active use of cpusets, > so the odds are this would be no problem f

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread Paul Jackson
> Nobody can show an example of an application that would be broken because > of this and, given the scenario and sequence of events that it requires to > be broken when implementing the default as Choice B, I don't think it's as > much of an issue as you believe. Well, neither you nor I have s

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread David Rientjes
On Sun, 28 Oct 2007, Paul Jackson wrote: > The Linux documentation is not a legal contract. Anytime we change the > actual behaviour of the code, we have to ask ourselves what will be the > impact of that change on existing users and usages. The burden is on > us to minimize breaking things (by

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread Paul Jackson
David wrote: > From a standpoint of the MPOL_PREFERRED memory policy itself, there > is no documented behavior or standard that specifies its interaction > with cpusets. Thus, it's "undefined." We are completely free > to implement an undefined behavior as we choose and change it as > Linux matur

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-28 Thread David Rientjes
On Sat, 27 Oct 2007, Paul Jackson wrote: > > but I actually would recommend against any flag to effect Choice A. > > It's simply going to be too complex to describe and is going to be a > > headache to code and support. > > While I am sorely tempted to agree entirely with this, I suspect that >

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
David wrote: > I think there's a mixup in the flag name [MPOL_MF_RELATIVE] there Most likely. The discussion involving that flag name was kinda mixed up ;). > but I actually would recommend against any flag to effect Choice A. > It's simply going to be too complex to describe and is going to be

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
David wrote: > I prefer Choice B because it does not force mempolicies to have any > dependence on cpusets with regard to what nodemask is passed. Yes, well said. > It would be very good to store the passed nodemask to set_mempolicy in > struct mempolicy, Yes - that's what I'm intending to do

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
> > You have chosen (1) above, which keeps Choice A as the default. > > There can be different defaults for the user space API via libnuma that > are independent from the kernel API which needs to remain stable. The kernel > API can be extended but not changed. Yes - the user level code can have

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread David Rientjes
On Fri, 26 Oct 2007, David Rientjes wrote: > Hacking and requiring an updated version of libnuma to allow empty > nodemasks to be passed is a poor solution; if mempolicy's are supposed to > be independent from cpusets, then what semantics does an empty nodemask > actually imply when using MPOL_

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread David Rientjes
On Fri, 26 Oct 2007, Paul Jackson wrote: > > Yes. We should default to Choice B. Add an option MPOL_MF_RELATIVE to > > enable that functionality? A new version of numactl can then enable > > that by default for newer applications. > > I'm confused. If B is the default, then we don't need a flag

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Christoph Lameter
On Sat, 27 Oct 2007, Paul Jackson wrote: > > Tough. The API needs to remain stable. > > Good - that I understand. Your position is clear now. > > You have chosen (1) above, which keeps Choice A as the default. There can be different defaults for the user space API via libnuma that are indepd

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread David Rientjes
On Fri, 26 Oct 2007, Paul Jackson wrote: > Choice A: > as it does today, the second node in the tasks cpuset or it could > mean > > Choice B: > the fourth node in the cpuset, if available, just as > it did in the case above involving a cpuset on nodes 10 and 11. > Thanks for des

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
> > Are you saying: > > 1) The kernel continues to default to Choice A, unless > > the flag enables Choice B, or > > 2) The kernel defaults to the new Choice B, unless the > > flag reverts to the old Choice A? > > If 2) is keeping the API semantics then 2. No .. (1) keeps the same API s

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, Paul Jackson wrote: > Are you saying: > 1) The kernel continues to default to Choice A, unless > the flag enables Choice B, or > 2) The kernel defaults to the new Choice B, unless the > flag reverts to the old Choice A? If 2) is keeping the API semantics then 2. >

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
I'm still confused, Christoph. Are you saying: 1) The kernel continues to default to Choice A, unless the flag enables Choice B, or 2) The kernel defaults to the new Choice B, unless the flag reverts to the old Choice A? Alternative (2) breaks libnuma and hence numactl until it is chang

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, Paul Jackson wrote: > Christoph wrote: > > Yes. We should default to Choice B. Add an option MPOL_MF_RELATIVE to > > enable that functionality? A new version of numactl can then enable > > that by default for newer applications. > > I'm confused. If B is the default, then w

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
Christoph wrote: > Yes. We should default to Choice B. Add an option MPOL_MF_RELATIVE to > enable that functionality? A new version of numactl can then enable > that by default for newer applications. I'm confused. If B is the default, then we don't need a flag to enable it, rather we need a fla

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, Paul Jackson wrote: > Choice B lets the task calculate its mempolicy mask as if it owned > the entire system, and express whatever elaborate mempolicy placement > it might need, when blessed with enough memory nodes to matter. > The system would automatically scrunch that requ

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
Issue: Are the nodes and nodemasks passed into set_mempolicy() to be presumed relative to the cpuset or not? [Careful, this question doesn't mean what you might think it means.] Let's say our system has 100 nodes, numbered 0-99, and we have a task in a cpuset that includes the twenty

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > So, you pass the subset, you don't set the flag to indicate you want > interleaving over all available. You must be thinking of some other use > for saving the subset mask that I'm not seeing here. Maybe restoring to > the exact nodes requested if t

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > > Now, if we could replace the 'cpuset_mems_allowed' nodemask with a > > > pointer to something stable, it might be a win. > > > > The memory policies are already shared and have refcounters for that > > purpose. > > I must have missed that in th

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 14:18 -0700, David Rientjes wrote: > On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > > You don't need to save the entire mask--just note that NODE_MASK_ALL was > > passed--like with my internal MPOL_CONTEXT flag. This would involve > > special casing NODE_MASK_ALL in the er

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 14:17 -0700, Christoph Lameter wrote: > On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > > For some systems [not mine], the nodemasks can get quite large. I have > > a patch, that I've tested atop Mel Gorman's "onezonelist" patches that > > replaces the nodemasks embedded i

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > You don't need to save the entire mask--just note that NODE_MASK_ALL was > passed--like with my internal MPOL_CONTEXT flag. This would involve > special casing NODE_MASK_ALL in the error checking, as currently > set_mempolicy() complains loudly if yo

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > For some systems [not mine], the nodemasks can get quite large. I have > a patch, that I've tested atop Mel Gorman's "onezonelist" patches that > replaces the nodemasks embedded in struct mempolicy with pointers to > dynamically allocated ones. How

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Christoph Lameter wrote:
> We would need two fields in the policy structure
>
> 1. The specified nodemask (generally ignored)
>
What I've called pol->passed_nodemask.

> 2. The effective nodemask (specified & cpuset_mems_allowed)
>
Which is pol->v.nodes.

> If we have the
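
The two-field idea in the message above can be modelled roughly as follows. This is a sketch under stated assumptions, not the proposed patch: the class `PolicySketch` and its `rebind` method are hypothetical, the field names merely echo `pol->passed_nodemask` and `pol->v.nodes` from the message, and Python sets stand in for kernel nodemasks.

```python
class PolicySketch:
    """Toy model of a mempolicy that keeps both the requested and effective nodes."""

    def __init__(self, passed_nodemask, cpuset_mems_allowed):
        # The nodemask exactly as the application passed it (its "intent").
        self.passed_nodemask = frozenset(passed_nodemask)
        # The effective nodemask: specified & cpuset_mems_allowed.
        self.nodes = self.passed_nodemask & frozenset(cpuset_mems_allowed)

    def rebind(self, new_mems_allowed):
        """Recompute the effective nodes when the cpuset's allowed nodes change.

        Because the original request is saved, nodes that were unavailable
        earlier can reappear once the cpuset grows to include them.
        """
        self.nodes = self.passed_nodemask & frozenset(new_mems_allowed)


if __name__ == "__main__":
    pol = PolicySketch(passed_nodemask={0, 1, 2, 3}, cpuset_mems_allowed={0, 1})
    print(sorted(pol.nodes))   # only the nodes that are both requested and allowed
    pol.rebind({0, 1, 2})      # the cpuset gains node 2
    print(sorted(pol.nodes))   # node 2 is now used because it was requested originally
```

The point of keeping the passed nodemask separately is that the effective set can always be rederived from the original request, rather than from whatever survived the previous cpuset change.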

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 13:45 -0700, David Rientjes wrote: > On Fri, 26 Oct 2007, Paul Jackson wrote: > > > Without at least this sort of change to MPOL_INTERLEAVE nodemasks, > > allowing either empty nodemasks (Lee's proposal) or extending them > > outside the current cpuset (what I'm cooking up no

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, David Rientjes wrote: > You would pass NODE_MASK_ALL if your intent was to interleave over > everything you have access to, yes. Otherwise you can pass whatever you > want access to and your interleaved nodemask becomes > mpol_rebind_policy()'s newmask formal (the cpuset's

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Christoph Lameter wrote: > > Well, passing a single node to set_mempolicy() for MPOL_INTERLEAVE doesn't > > make a whole lot of sense in the first place. I prefer your solution of > > allowing set_mempolicy(MPOL_INTERLEAVE, NODE_MASK_ALL) to mean "interleave > > me over ev

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, David Rientjes wrote: > Well, passing a single node to set_mempolicy() for MPOL_INTERLEAVE doesn't > make a whole lot of sense in the first place. I prefer your solution of > allowing set_mempolicy(MPOL_INTERLEAVE, NODE_MASK_ALL) to mean "interleave > me over everything I'

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Paul Jackson wrote: > Without at least this sort of change to MPOL_INTERLEAVE nodemasks, > allowing either empty nodemasks (Lee's proposal) or extending them > outside the current cpuset (what I'm cooking up now), there is no way > for a task that is currently confined to a si

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 11:46 -0700, David Rientjes wrote: > On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > > Actually, my patch doesn't change the set_mempolicy() API at all, it > > just co-opts a currently unused/illegal value for the nodemask to > > indicate "all allowed nodes". Again, I need

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Michael Kerrisk
On 10/26/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > Michael wrote: > > PS Note my new address for man-pages: [EMAIL PROTECTED] > > Noted. > > > Is there anything I can do to assist? > > Got any spare round tuit's ;)? I ran out quite some time ago unfortunately. Cheers, Michael

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
Michael wrote: > PS Note my new address for man-pages: [EMAIL PROTECTED] Noted. > Is there anything I can do to assist? Got any spare round tuit's ;)? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Michael Kerrisk
On 10/26/07, Paul Jackson <[EMAIL PROTECTED]> wrote: > Lee wrote: > > Paul: what do you think about subsetting the cpuset.txt into a man page > > or 2 that can be referenced by other man pages' See Also sections? > > Oh dear --- looking back in my work queue I have with my employer, I > see I have

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
David wrote: > I personally prefer an approach where cpusets take the responsibility for > determining how policies change (they use set_mempolicy() anyway to effect > their mems boundaries) because it's cpusets that has changed the available > nodemask out from beneath the application. Agreed.

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
David wrote: > If something that was previously unaccepted is now allowed with a > newly-introduced semantic, that's an API change. Agreed, as I wrote earlier: > It should work with libnuma and be > fully upward compatible with current code (except perhaps code that > depends on getting an error

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > Actually, my patch doesn't change the set_mempolicy() API at all, it > just co-opts a currently unused/illegal value for the nodemask to > indicate "all allowed nodes". Again, I need to provide a libnuma API to > request this. Soon come, mon... >

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread David Rientjes
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > That's what my "cpuset-independent interleave" patch does. David > doesn't like the "null node mask" interface because it doesn't work with > libnuma. I plan to fix that, but I'm chasing other issues. I should > get back to the mempol work after to

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Fri, 2007-10-26 at 10:04 -0700, Paul Jackson wrote: > Lee wrote: > > Paul: what do you think about subsetting the cpuset.txt into a man page > > or 2 that can be referenced by other man pages' See Also sections? > > Oh dear --- looking back in my work queue I have with my employer, I > see I h

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Christoph Lameter
On Fri, 26 Oct 2007, Lee Schermerhorn wrote: > > With that MPOL_INTERLEAVE would be context dependent and no longer > > needs translation. Lee had similar ideas. Lee: Could we make > > MPOL_INTERLEAVE generally cpuset context dependent? > > > > That's what my "cpuset-independent interleave" pa

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Paul Jackson
Lee wrote: > Paul: what do you think about subsetting the cpuset.txt into a man page > or 2 that can be referenced by other man pages' See Also sections? Oh dear --- looking back in my work queue I have with my employer, I see I have a task that is now over a year old, still unfinished, to provid

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Thu, 2007-10-25 at 20:58 -0700, David Rientjes wrote: > On Thu, 25 Oct 2007, Paul Jackson wrote: > > > The user space man pages for set_mempolicy(2) are now even more > > behind the curve, by not mentioning that MPOL_INTERLEAVE's mask > > might mean nothing, if (1) in a cpuset marked memory_spr

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Thu, 2007-10-25 at 19:11 -0700, David Rientjes wrote: > On Thu, 25 Oct 2007, Paul Jackson wrote: > > > David - could you describe the real world situation in which you > > are finding that this new 'interleave_over_allowed' option, aka > > 'memory_spread_user', is useful? I'm not always oppose

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-26 Thread Lee Schermerhorn
On Thu, 2007-10-25 at 17:28 -0700, Christoph Lameter wrote: > On Thu, 25 Oct 2007, David Rientjes wrote: > > > The problem occurs when you add cpusets into the mix and permit the > > allowed nodes to change without knowledge to the application. Right now, > > a simple remap is done so if the ca

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Paul Jackson
David wrote: > I think that documenting the change in the man page as saying that > "the nodemask will include all allowed nodes if the mems_allowed > of a memory_spread_user cpuset is expanded" is better. Ok. I'm inclined the other way, but not certain enough of my position to push the point any

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread David Rientjes
On Thu, 25 Oct 2007, Paul Jackson wrote: > The user space man pages for set_mempolicy(2) are now even more > behind the curve, by not mentioning that MPOL_INTERLEAVE's mask > might mean nothing, if (1) in a cpuset marked memory_spread_user, > (2) after the cpuset has changed 'mems'. > Yeah. The

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Paul Jackson
David wrote: > Yes, when using cpusets for resource control. If memory pressure is being > felt for that cpuset and additional mems are added to alleviate possible > OOM conditions, it is insufficient to allow tasks within that cpuset to > continue using memory policies that prohibit them from

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread David Rientjes
On Thu, 25 Oct 2007, Paul Jackson wrote: > Are you seeing this in a real world situation? Can you describe the > situation? I don't mean just describing how it looks to this kernel > code, but what is going on in the system, what sort of job mix or > applications, what kind of users, ... In sho

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Paul Jackson
> Yes, when a task with MPOL_INTERLEAVE has its cpuset mems_allowed expanded > to include more memory. The task itself can't access all that memory with > the memory policy of its choice. That much I could have guessed (did guess, actually.) Are you seeing this in a real world situation? Can

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread David Rientjes
On Thu, 25 Oct 2007, Paul Jackson wrote: > David - could you describe the real world situation in which you > are finding that this new 'interleave_over_allowed' option, aka > 'memory_spread_user', is useful? I'm not always opposed to special > case solutions; but they do usually require special

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Paul Jackson
Christoph wrote: > With that MPOL_INTERLEAVE would be context dependent and no longer > needs translation. Lee had similar ideas. Lee: Could we make > MPOL_INTERLEAVE generally cpuset context dependent? Well ... MPOL_INTERLEAVE already is essentially cpuset relative. So long as the cpuset size

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread David Rientjes
On Thu, 25 Oct 2007, Paul Jackson wrote: > Can we call this "memory_spread_user" instead, or something else > matching "memory_spread_*" ? > Sounds better. I was hoping somebody was going to come forward with an alternative that sounded better than interleave_over_allowed. > How a

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Paul Jackson
I'm probably going to be ok with this ... after a bit. 1) First concern - my primary issue: One thing I really want to change, the name of the per-cpuset file that controls this option. You call it "interleave_over_allowed". Take a look at the existing per-cpuset file names:

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Christoph Lameter
On Thu, 25 Oct 2007, David Rientjes wrote: > The problem occurs when you add cpusets into the mix and permit the > allowed nodes to change without knowledge to the application. Right now, > a simple remap is done so if the cardinality of the set of nodes > decreases, you're interleaving over a

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread David Rientjes
On Thu, 25 Oct 2007, Christoph Lameter wrote: > More interactions between cpusets and memory policies. We have to be > careful here to keep clean semantics. > I agree. > Isn't it a bit surprising for an application that has set up a custom > MPOL_INTERLEAVE policy if the nodes suddenly change

Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread Christoph Lameter
On Thu, 25 Oct 2007, David Rientjes wrote: > Adds a new 'interleave_over_allowed' option to cpusets. > > When a task with an MPOL_INTERLEAVE memory policy is attached to a cpuset > with this option set, the interleaved nodemask becomes the cpuset's > mems_allowed. When the cpuset's mems_allowed

[patch 2/2] cpusets: add interleave_over_allowed option

2007-10-25 Thread David Rientjes
Adds a new 'interleave_over_allowed' option to cpusets. When a task with an MPOL_INTERLEAVE memory policy is attached to a cpuset with this option set, the interleaved nodemask becomes the cpuset's mems_allowed. When the cpuset's mems_allowed changes, the interleaved nodemask for all tasks with M
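For readers following the thread, the difference between the existing remap and the proposed option can be sketched in a few lines of Python. This is only a simplified model under stated assumptions, not the kernel code or the patch itself: the function names `remap_nodes` and `interleave_over_allowed_nodes` are hypothetical, and nodemasks are represented as plain sets of node numbers. The remap model assumes each policy node is moved to the node holding the same ordinal position in the new allowed set, wrapping around when the new set is smaller.

```python
def remap_nodes(policy_nodes, old_allowed, new_allowed):
    """Simplified model of the remap discussed in the thread (assumption:
    a node at ordinal position n in old_allowed moves to position
    n mod len(new_allowed) in new_allowed)."""
    old = sorted(old_allowed)
    new = sorted(new_allowed)
    if not new:
        # Nothing to map onto; leave the policy nodemask unchanged.
        return set(policy_nodes)
    result = set()
    for node in policy_nodes:
        if node in old_allowed:
            ordinal = old.index(node)
            result.add(new[ordinal % len(new)])
        else:
            # Nodes outside the old allowed set are left where they are.
            result.add(node)
    return result


def interleave_over_allowed_nodes(policy_nodes, old_allowed, new_allowed):
    """Model of the proposed option as described in the patch summary:
    the interleave nodemask simply becomes the cpuset's mems_allowed."""
    return set(new_allowed)


# A task interleaves over nodes {0, 1}; the cpuset then grows to {0, 1, 2, 3}.
policy = {0, 1}
old_mems = {0, 1}
new_mems = {0, 1, 2, 3}

print(sorted(remap_nodes(policy, old_mems, new_mems)))
print(sorted(interleave_over_allowed_nodes(policy, old_mems, new_mems)))
```

In this model the remap keeps the task on its original two nodes after the cpuset grows, which is the situation David describes, while the option spreads the interleave over every node the cpuset now allows.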