Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
Hi Kim, On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote: > On Fri, 27 Apr 2018 17:09:14 +0100 > Will Deacon wrote: > > On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote: > > > On Fri, 27 Apr 2018 15:37:20 +0100 > > > Will Deacon wrote: > > > > For anything under drivers/perf/, I'd prefer not to have these prints > > > > and instead see efforts to improve error reporting via the perf system > > > > call interface. > > > > > > We'd all prefer that, and for all PMU drivers, why should ones under > > > drivers/perf be treated differently? > > > > Because they're the ones I maintain... > > You represent a minority on your opinion on this matter though. I'm not sure that's true -- you tend to be the only one raising this issue; I think most people don't actually care one way or the other. If you know of others who care about it too then I suggest you work together as a group to solve the problem in a way which is amenable to the upstream maintainers. Then you won't have to worry about fixing it personally. You could even propose a topic for plumbers if there is enough interest in a general framework for propagating error messages to userspace. > > > As you are already aware, I've personally tried to fix this problem - > > > that has existed since before the introduction of the perf tool (I > > > consider it a syscall-independent enhanced error interface), multiple > > > times, and failed. > > > > Why is that my problem? Try harder? > > It's your problem because we're here reviewing a patch that happens to > fall under your maintainership. I'll be the first person to tell you > I'm obviously incompetent and haven't been able to come up with a > solution that is acceptable for everyone up to and including Linus > Torvalds. I'm just noticing a chronic usability problem that can be > easily alleviated in the context of this patch review. I don't see how incompetence has anything to do with it and, like most people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver internals. No arguments on the usability problem, but this ain't the fix you're looking for: it's a bodge. We've been over this many times before. > > > So until someone comes up with a solution that works for everyone > > > up to and including Linus Torvalds (who hasn't put up a problem > > > pulling PMU drivers emitting things to dmesg so far, by the way), this > > > keep PMU drivers' errors silent preference of yours is unnecessarily > > > impeding people trying to measure system performance on Arm based > > > machines - all other archs' maintainers are fine with PMU drivers using > > > dmesg. > > > > Good for them, although I'm pretty sure that at least the x86 folks are > > against this crap too. > > Unfortunately, it doesn't affect them nearly as much as it does our > more diverse platforms, which is why I don't think they care to do > much about it. x86 has its fair share of PMUs too, so I don't think we're special in this regard. The main difference seems to be that they have better support in the perf tool for diagnosing problems. > > > > Anyway, I think this driver has bigger problems that need addressing. > > > > > > To me it represents yet another PMU driver submission - as the years go > > > by - that is lacking in the user messaging area. Which reminds me, can > > > you take another look at applying this?: > > > > As I said before, I'm not going to take anything that logs above pr_debug > > for things that are directly triggerable from userspace. Spin a version > > Why? There are plenty of things that emit stuff into dmesg that are > directly triggerable from userspace. Is it because it upsets fuzzing > tests? How about those be run with a patched kernel that somehow > mitigates the printing? [we've been over this before] There are two camps that might find this information useful: 1. People writing userspace support for a new PMU using the perf_event_open syscall 2. People trying to use a tool which abstracts the PMU details to some degree (e.g. perf tool) Those in (1) can get by with pr_debug. Sure, they have to recompile, but it's not like there are many people in this camp and they'll probably be working with the PMU driver writer to some degree anyway and taking new kernel drops. Those in (2) may not have access to dmesg, absolutely should not be able to spam it (since this could hide other important messages), will probably struggle to match an internal message from the PMU driver to their invocation of the high-level tool and may well be in a multi-user environment so will struggle to identify the messages that they are responsible for. What they actually want is for the perf tool to give them more information, and dmesg is not the right channel for that, for similar reasons. > > using pr_debug and I'll queue it. > > How about using a ratelimited dev_err variant? Nah, I don't want dmesg being parsed by users of perf tool. pr_debug should be suffici
Re: [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range
On 04/30/2018 06:40 PM, Kees Cook wrote: > I like this series overall, thanks! No objections from me. One thing I > noted, though: > > On Fri, Apr 27, 2018 at 2:00 PM, Waiman Long wrote: >> if (param->min && *param->min > val) { >> if (clamp) { >> val = *param->min; >> + clamped = true; >> } else { >> return -EINVAL; >> } > This appears as a common bit of logic in many places in the series. It > seems like it'd make sense to make this a helper of some kind? > > -Kees > We can't have an inline helper function because the types are different. We may be able to use a helper macro, but helper macro like that may be not well accepted by the kernel community. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl
On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks wrote: > The decision to log a seccomp action will always be subject to the > value of the kernel.seccomp.actions_logged sysctl, even for processes > that are being inspected via the audit subsystem, in an upcoming patch. > Therefore, we need to emit an audit record on attempts at writing to the > actions_logged sysctl when auditing is enabled. > > This patch updates the write handler for the actions_logged sysctl to > emit an audit record on attempts to write to the sysctl. Successful > writes to the sysctl will result in a record that includes a normalized > list of logged actions in the "actions" field and a "res" field equal to > 0. Unsuccessful writes to the sysctl will result in a record that > doesn't include the "actions" field and has a "res" field equal to 1. > > Not all unsuccessful writes to the sysctl are audited. For example, an > audit record will not be emitted if an unprivileged process attempts to > open the sysctl file for reading since that access control check is not > part of the sysctl's write handler. > > Below are some example audit records when writing various strings to the > actions_logged sysctl. > > Writing "not-a-real-action" emits: > > type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0 > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" > op=seccomp-logging res=1 > > Writing "kill_process kill_thread errno trace log" emits: > > type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0 > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" > op=seccomp-logging actions="kill_process kill_thread errno trace log" > res=0 I've got some additional comments regarding the fields in the code below, but it would be good to hear Steve comment on the "actions" field since his userspace tools are extremely picky about what they will accept. It looks like you are treating the actions as an untrusted string, which is good, so I suspect you are okay, but still ... > Writing the string "log log errno trace kill_process kill_thread", which > is unordered and contains the log action twice, results in the same > value as the previous example for the actions field: > > type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0 > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" > op=seccomp-logging actions="kill_process kill_thread errno trace log" > res=0 > > No audit records are generated when reading the actions_logged sysctl. > > Suggested-by: Steve Grubb > Signed-off-by: Tyler Hicks > --- > include/linux/audit.h | 3 +++ > kernel/auditsc.c | 37 + > kernel/seccomp.c | 43 ++- > 3 files changed, 74 insertions(+), 9 deletions(-) ... > diff --git a/include/linux/audit.h b/include/linux/audit.h > index 75d5b03..b311d7d 100644 > --- a/include/linux/audit.h > +++ b/include/linux/audit.h > @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent, > const struct dentry *dentry, > const unsigned char type); > extern void __audit_seccomp(unsigned long syscall, long signr, int code); > +extern void audit_seccomp_actions_logged(const char *names, int res); > extern void __audit_ptrace(struct task_struct *t); > > static inline bool audit_dummy_context(void) > @@ -502,6 +503,8 @@ static inline void __audit_seccomp(unsigned long syscall, > long signr, int code) > { } > static inline void audit_seccomp(unsigned long syscall, long signr, int code) > { } > +static inline void audit_seccomp_actions_logged(const char *names, int res) > +{ } > static inline int auditsc_get_stamp(struct audit_context *ctx, > struct timespec64 *t, unsigned int *serial) > { > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index 4e0a4ac..3496238 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -2478,6 +2478,43 @@ void __audit_seccomp(unsigned long syscall, long > signr, int code) > audit_log_end(ab); > } > > +void audit_seccomp_actions_logged(const char *names, int res) > +{ > + struct tty_struct *tty; > + const struct cred *cred; > + struct audit_buffer *ab; > + char comm[sizeof(current->comm)]; > + > + if (!audit_enabled) > + return; > + > + ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE); > + if (unlikely(!ab)) > + return; Instead of NULL, let's pass current->audit_context to audit_log_start(). Yes, most of the AUDIT_CONFIG_CHANGE users pass NULL but all of that is going to need to change because of 1) the audit container ID work and 2) it makes sense to connect records that are related. Let's do it right the first time here :) > + cred = current_cred(); > + tty = audit_get_tty(current); > + audit_log_format(ab, "pid=%d uid=%u auid=%u tty=%s ses=%u", > +task_tgid_nr(curren
Re: [PATCH 3/3] seccomp: Don't special case audited processes when logging
On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks wrote: > Seccomp logging for "handled" actions such as RET_TRAP, RET_TRACE, or > RET_ERRNO can be very noisy for processes that are being audited. This > patch modifies the seccomp logging behavior to treat processes that are > being inspected via the audit subsystem the same as processes that > aren't under inspection. Handled actions will no longer be logged just > because the process is being inspected. Since v4.14, applications have > the ability to request logging of handled actions by using the > SECCOMP_FILTER_FLAG_LOG flag when loading seccomp filters. > > With this patch, the logic for deciding if an action will be logged is: > > if action == RET_ALLOW: > do not log > else if action not in actions_logged: > do not log > else if action == RET_KILL: > log > else if action == RET_LOG: > log > else if filter-requests-logging: > log > else: > do not log > > Reported-by: Steve Grubb > Signed-off-by: Tyler Hicks > --- > Documentation/userspace-api/seccomp_filter.rst | 7 --- > include/linux/audit.h | 10 +- > kernel/auditsc.c | 2 +- > kernel/seccomp.c | 15 +-- > 4 files changed, 7 insertions(+), 27 deletions(-) > > diff --git a/Documentation/userspace-api/seccomp_filter.rst > b/Documentation/userspace-api/seccomp_filter.rst > index 099c412..82a468b 100644 > --- a/Documentation/userspace-api/seccomp_filter.rst > +++ b/Documentation/userspace-api/seccomp_filter.rst > @@ -207,13 +207,6 @@ directory. Here's a description of each file in that > directory: > to the file do not need to be in ordered form but reads from the file > will be ordered in the same way as the actions_avail sysctl. > > - It is important to note that the value of ``actions_logged`` does not > - prevent certain actions from being logged when the audit subsystem is > - configured to audit a task. If the action is not found in > - ``actions_logged`` list, the final decision on whether to audit the > - action for that task is ultimately left up to the audit subsystem to > - decide for all seccomp return values other than ``SECCOMP_RET_ALLOW``. > - > The ``allow`` string is not accepted in the ``actions_logged`` sysctl > as it is not possible to log ``SECCOMP_RET_ALLOW`` actions. Attempting > to write ``allow`` to the sysctl will result in an EINVAL being > diff --git a/include/linux/audit.h b/include/linux/audit.h > index b311d7d..1964fbd 100644 > --- a/include/linux/audit.h > +++ b/include/linux/audit.h > @@ -232,7 +232,7 @@ extern void __audit_file(const struct file *); > extern void __audit_inode_child(struct inode *parent, > const struct dentry *dentry, > const unsigned char type); > -extern void __audit_seccomp(unsigned long syscall, long signr, int code); > +extern void audit_seccomp(unsigned long syscall, long signr, int code); > extern void audit_seccomp_actions_logged(const char *names, int res); > extern void __audit_ptrace(struct task_struct *t); > > @@ -303,12 +303,6 @@ static inline void audit_inode_child(struct inode > *parent, > } > void audit_core_dumps(long signr); > > -static inline void audit_seccomp(unsigned long syscall, long signr, int code) > -{ > - if (audit_enabled && unlikely(!audit_dummy_context())) > - __audit_seccomp(syscall, signr, code); > -} > - > static inline void audit_ptrace(struct task_struct *t) > { > if (unlikely(!audit_dummy_context())) > @@ -499,8 +493,6 @@ static inline void audit_inode_child(struct inode *parent, > { } > static inline void audit_core_dumps(long signr) > { } > -static inline void __audit_seccomp(unsigned long syscall, long signr, int > code) > -{ } > static inline void audit_seccomp(unsigned long syscall, long signr, int code) > { } > static inline void audit_seccomp_actions_logged(const char *names, int res) > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index 3496238..1e64b91 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -2464,7 +2464,7 @@ void audit_core_dumps(long signr) > audit_log_end(ab); > } > > -void __audit_seccomp(unsigned long syscall, long signr, int code) > +void audit_seccomp(unsigned long syscall, long signr, int code) > { > struct audit_buffer *ab; Since it is a bit unusual, it might be nice to add a comment at the top of audit_seccomp() that the event filtering is being done in the seccomp_log() function, and we may need to force auditing independent of the audit_enabled and dummy context state. > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index e28ddcc..947cc0f 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -584,18 +584,13 @@ static inline void seccomp_log(unsigned long syscall, > long signr, u32 action, > } > >
Re: [PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl
On Tuesday, May 1, 2018 11:18:55 AM EDT Paul Moore wrote: > On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks wrote: > > The decision to log a seccomp action will always be subject to the > > value of the kernel.seccomp.actions_logged sysctl, even for processes > > that are being inspected via the audit subsystem, in an upcoming patch. > > Therefore, we need to emit an audit record on attempts at writing to the > > actions_logged sysctl when auditing is enabled. > > > > This patch updates the write handler for the actions_logged sysctl to > > emit an audit record on attempts to write to the sysctl. Successful > > writes to the sysctl will result in a record that includes a normalized > > list of logged actions in the "actions" field and a "res" field equal to > > 0. Unsuccessful writes to the sysctl will result in a record that > > doesn't include the "actions" field and has a "res" field equal to 1. > > > > Not all unsuccessful writes to the sysctl are audited. For example, an > > audit record will not be emitted if an unprivileged process attempts to > > open the sysctl file for reading since that access control check is not > > part of the sysctl's write handler. > > > > Below are some example audit records when writing various strings to the > > actions_logged sysctl. > > > > Writing "not-a-real-action" emits: > > type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0 > > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" > > op=seccomp-logging res=1 > > > > Writing "kill_process kill_thread errno trace log" emits: > > type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0 > > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" > > op=seccomp-logging actions="kill_process kill_thread errno trace log" > > res=0 > > I've got some additional comments regarding the fields in the code > below, but it would be good to hear Steve comment on the "actions" > field since his userspace tools are extremely picky about what they > will accept. Its not that the audit user space applications are picky, its that we have a coding standard that everyone needs to abide by so that any parser coded to the specification works. In short, we should not have spaces inside the "" because that can trick a naive parser. What we typically do in a situation like this is add a comma as a separator. But having "" means that the value is untrusted and subject to escaping. I don't think that is the case here. Output is not controlled by the user. Its a list of well known names. > It looks like you are treating the actions as an untrusted string, which is > good, so I suspect you are okay, but still The function below that logs names is calling audit_log_format which does not handle untrusted strings. I would suggest not treating it as an untrusted string, but as a string with no spaces in it. actions=kill_process,kill_thread,errno,trace,log > > Writing the string "log log errno trace kill_process kill_thread", which > > is unordered and contains the log action twice, results in the same > > > > value as the previous example for the actions field: > > type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0 > > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" > > op=seccomp-logging actions="kill_process kill_thread errno trace log" > > res=0 > > > > No audit records are generated when reading the actions_logged sysctl. > > > > Suggested-by: Steve Grubb > > Signed-off-by: Tyler Hicks > > --- > > > > include/linux/audit.h | 3 +++ > > kernel/auditsc.c | 37 + > > kernel/seccomp.c | 43 ++- > > 3 files changed, 74 insertions(+), 9 deletions(-) > > ... > > > diff --git a/include/linux/audit.h b/include/linux/audit.h > > index 75d5b03..b311d7d 100644 > > --- a/include/linux/audit.h > > +++ b/include/linux/audit.h > > @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent, > > > > const struct dentry *dentry, > > const unsigned char type); > > > > extern void __audit_seccomp(unsigned long syscall, long signr, int > > code); > > > > +extern void audit_seccomp_actions_logged(const char *names, int res); > > > > extern void __audit_ptrace(struct task_struct *t); > > > > static inline bool audit_dummy_context(void) > > > > @@ -502,6 +503,8 @@ static inline void __audit_seccomp(unsigned long > > syscall, long signr, int code)> > > { } > > static inline void audit_seccomp(unsigned long syscall, long signr, int > > code) { } > > > > +static inline void audit_seccomp_actions_logged(const char *names, int > > res) +{ } > > > > static inline int auditsc_get_stamp(struct audit_context *ctx, > > > > struct timespec64 *t, unsigned int *serial) > > > > { > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > index 4e0a4ac..3496238 100644 > > --- a/kernel
Re: [PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl
On Tue, May 1, 2018 at 12:41 PM, Steve Grubb wrote: > On Tuesday, May 1, 2018 11:18:55 AM EDT Paul Moore wrote: >> On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks wrote: >> > The decision to log a seccomp action will always be subject to the >> > value of the kernel.seccomp.actions_logged sysctl, even for processes >> > that are being inspected via the audit subsystem, in an upcoming patch. >> > Therefore, we need to emit an audit record on attempts at writing to the >> > actions_logged sysctl when auditing is enabled. >> > >> > This patch updates the write handler for the actions_logged sysctl to >> > emit an audit record on attempts to write to the sysctl. Successful >> > writes to the sysctl will result in a record that includes a normalized >> > list of logged actions in the "actions" field and a "res" field equal to >> > 0. Unsuccessful writes to the sysctl will result in a record that >> > doesn't include the "actions" field and has a "res" field equal to 1. >> > >> > Not all unsuccessful writes to the sysctl are audited. For example, an >> > audit record will not be emitted if an unprivileged process attempts to >> > open the sysctl file for reading since that access control check is not >> > part of the sysctl's write handler. >> > >> > Below are some example audit records when writing various strings to the >> > actions_logged sysctl. >> > >> > Writing "not-a-real-action" emits: >> > type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0 >> > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" >> > op=seccomp-logging res=1 >> > >> > Writing "kill_process kill_thread errno trace log" emits: >> > type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0 >> > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" >> > op=seccomp-logging actions="kill_process kill_thread errno trace log" >> > res=0 >> >> I've got some additional comments regarding the fields in the code >> below, but it would be good to hear Steve comment on the "actions" >> field since his userspace tools are extremely picky about what they >> will accept. > > Its not that the audit user space applications are picky, its that we have a > coding standard that everyone needs to abide by so that any parser coded to > the specification works. We're getting a bit off track, but the issue with the "specification" is that there has not been enough checking and enforcement of the "specification" to give it much weight. We've made some fixes to records of lower impact, but I'm not going to merge disruptive record changes. After a while, the status-quo becomes the "specification". At some point in the future I plan to submit patches which disconnect the audit data from the record format for in-kernel callers; this is really the only way we can enforce any type of record format. The current in-kernel audit API is for too open for misuse and abuse. > In short, we should not have spaces inside the "" > because that can trick a naive parser. What we typically do in a situation > like this is add a comma as a separator. But having "" means that the value > is untrusted and subject to escaping. I don't think that is the case here. > Output is not controlled by the user. Its a list of well known names. > >> It looks like you are treating the actions as an untrusted string, which is >> good, so I suspect you are okay, but still > > The function below that logs names is calling audit_log_format which does not > handle untrusted strings. I would suggest not treating it as an untrusted > string, but as a string with no spaces in it. > > actions=kill_process,kill_thread,errno,trace,log Yes, my mistake, I suspect I was distracted by the comm logging. >> > Writing the string "log log errno trace kill_process kill_thread", which >> > is unordered and contains the log action twice, results in the same >> > >> > value as the previous example for the actions field: >> > type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0 >> > auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee" >> > op=seccomp-logging actions="kill_process kill_thread errno trace log" >> > res=0 >> > >> > No audit records are generated when reading the actions_logged sysctl. >> > >> > Suggested-by: Steve Grubb >> > Signed-off-by: Tyler Hicks >> > --- >> > >> > include/linux/audit.h | 3 +++ >> > kernel/auditsc.c | 37 + >> > kernel/seccomp.c | 43 ++- >> > 3 files changed, 74 insertions(+), 9 deletions(-) >> >> ... >> >> > diff --git a/include/linux/audit.h b/include/linux/audit.h >> > index 75d5b03..b311d7d 100644 >> > --- a/include/linux/audit.h >> > +++ b/include/linux/audit.h >> > @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent, >> > >> > const struct dentry *dentry, >> > const unsigned char type); >> > >> > extern void __audit_seccomp(unsigned long syscall, lon
Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
Hello, Waiman. Sorry about the delay. On Thu, Apr 19, 2018 at 09:47:03AM -0400, Waiman Long wrote: > With the addition of "cpuset.cpus.isolated", it makes sense to add the > restriction that load balancing can only be turned off if the CPUs in > the isolated cpuset are subset of "cpuset.cpus.isolated". > > Signed-off-by: Waiman Long > --- > Documentation/cgroup-v2.txt | 7 --- > kernel/cgroup/cpuset.c | 29 ++--- > 2 files changed, 30 insertions(+), 6 deletions(-) > > diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt > index 8d89dc2..c4227ee 100644 > --- a/Documentation/cgroup-v2.txt > +++ b/Documentation/cgroup-v2.txt > @@ -1554,9 +1554,10 @@ Cpuset Interface Files > and will not be moved to other CPUs. > > This flag is hierarchical and is inherited by child cpusets. It > - can be turned off only when the CPUs in this cpuset aren't > - listed in the cpuset.cpus of other sibling cgroups, and all > - the child cpusets, if present, have this flag turned off. > + can be explicitly turned off only when it is a direct child of > + the root cgroup and the CPUs in this cpuset are subset of the > + root's "cpuset.cpus.isolated". Moreover, the CPUs cannot be > + listed in the "cpuset.cpus" of other sibling cgroups. It is a little bit convoluted that the isolation requires coordination among root's isolated file and the first-level children's cpus file and the flag. Maybe I'm missing something but can't we do something like the following? * Add isolated flag file, which can only be modified on empty (in terms of cpus) first level children. * Once isolated flag is set, CPUs can only be added to the cpus file iff they aren't being used by anyone else and automatically become isolated. The first level cpus file is owned by the root cgroup anyway, so there's no danger regarding delegation or whatever and the interface would be a lot simpler. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
On 05/01/2018 03:51 PM, Tejun Heo wrote: > Hello, Waiman. > > Sorry about the delay. > > On Thu, Apr 19, 2018 at 09:47:03AM -0400, Waiman Long wrote: >> With the addition of "cpuset.cpus.isolated", it makes sense to add the >> restriction that load balancing can only be turned off if the CPUs in >> the isolated cpuset are subset of "cpuset.cpus.isolated". >> >> Signed-off-by: Waiman Long >> --- >> Documentation/cgroup-v2.txt | 7 --- >> kernel/cgroup/cpuset.c | 29 ++--- >> 2 files changed, 30 insertions(+), 6 deletions(-) >> >> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt >> index 8d89dc2..c4227ee 100644 >> --- a/Documentation/cgroup-v2.txt >> +++ b/Documentation/cgroup-v2.txt >> @@ -1554,9 +1554,10 @@ Cpuset Interface Files >> and will not be moved to other CPUs. >> >> This flag is hierarchical and is inherited by child cpusets. It >> -can be turned off only when the CPUs in this cpuset aren't >> -listed in the cpuset.cpus of other sibling cgroups, and all >> -the child cpusets, if present, have this flag turned off. >> +can be explicitly turned off only when it is a direct child of >> +the root cgroup and the CPUs in this cpuset are subset of the >> +root's "cpuset.cpus.isolated". Moreover, the CPUs cannot be >> +listed in the "cpuset.cpus" of other sibling cgroups. > It is a little bit convoluted that the isolation requires coordination > among root's isolated file and the first-level children's cpus file > and the flag. Maybe I'm missing something but can't we do something > like the following? > > * Add isolated flag file, which can only be modified on empty (in > terms of cpus) first level children. > > * Once isolated flag is set, CPUs can only be added to the cpus file > iff they aren't being used by anyone else and automatically become > isolated. > > The first level cpus file is owned by the root cgroup anyway, so > there's no danger regarding delegation or whatever and the interface > would be a lot simpler. I think that will work too. We currently don't have a flag to make a file visible on first-level children only, but it shouldn't be hard to make one. Putting CPUs into an isolated child cpuset means removing it from the root's effective CPUs. So I would probably like to expose the read-only cpus.effective in the root cgroup so that we can check changes in the effective cpu list. I will renew the patchset will your suggestion. Thanks, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
Hello, On Tue, May 01, 2018 at 04:33:45PM -0400, Waiman Long wrote: > I think that will work too. We currently don't have a flag to make a > file visible on first-level children only, but it shouldn't be hard to > make one. I think it'd be fine to make the flag file exist on all !root cgroups but only writable on the first level children. > Putting CPUs into an isolated child cpuset means removing it from the > root's effective CPUs. So I would probably like to expose the read-only > cpus.effective in the root cgroup so that we can check changes in the > effective cpu list. Ah, yeah, that makes sense. > I will renew the patchset will your suggestion. Thank you very much. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated
On 05/01/2018 04:58 PM, Tejun Heo wrote: > Hello, > > On Tue, May 01, 2018 at 04:33:45PM -0400, Waiman Long wrote: >> I think that will work too. We currently don't have a flag to make a >> file visible on first-level children only, but it shouldn't be hard to >> make one. > I think it'd be fine to make the flag file exist on all !root cgroups > but only writable on the first level children. Right. This flag will be inherited by child cgroups like the sched_load_balance. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit
> The sysctl parameters msgmni, shmmni and semmni have an inherent limit > of IPC_MNI (32k). However, users may not be aware of that because they > can write a value much higher than that without getting any error or > notification. Reading the parameters back will show the newly written > values which are not real. > > Enforcing the limit by failing sysctl parameter write, however, may > cause regressions if existing user setup scripts set those parameters > above 32k as those scripts will now fail in this case. I have a serious problem with this approach. Have you made any effort to identify any code that sets these values above 32k? Have you looked to see if these applications actually care if you return an error when a value is set too large? Right now this seems like a lot of work to avoid breaking applications and or users that may or may not exist. If you can find something that will care sure. We need to avoid breaking userspace and causing regressions. However as this stands it looks you are making maintenance of the kernel more difficult to avoid having to look to see if there are monsters under the bed. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] coresight: Remove %px for printing pcsr value
The driver prints pcsr twice: the first time it uses specifier %px to print hexadecimal pcsr value and the second time uses specifier %pS for output kernel symbols. As suggested by Kees, using %pS should be sufficient and %px isn't necessary; the reason is if the pcsr is a kernel space address, we can easily get to know the code line from %pS format, on the other hand, if the pcsr value doesn't fall into kernel space range (e.g. if the CPU is stuck in firmware), %pS also gives out pcsr hexadecimal value. So this commit removes useless %px and update section "Output format" in the document for alignment between the code and document. Suggested-by: Kees Cook Cc: Mathieu Poirier Signed-off-by: Leo Yan --- Documentation/trace/coresight-cpu-debug.txt | 4 ++-- drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/trace/coresight-cpu-debug.txt b/Documentation/trace/coresight-cpu-debug.txt index 2b9b51c..89ab09e 100644 --- a/Documentation/trace/coresight-cpu-debug.txt +++ b/Documentation/trace/coresight-cpu-debug.txt @@ -177,11 +177,11 @@ Here is an example of the debugging output format: ARM external debug module: coresight-cpu-debug 85.debug: CPU[0]: coresight-cpu-debug 85.debug: EDPRSR: 0001 (Power:On DLK:Unlock) -coresight-cpu-debug 85.debug: EDPCSR: [] handle_IPI+0x174/0x1d8 +coresight-cpu-debug 85.debug: EDPCSR: handle_IPI+0x174/0x1d8 coresight-cpu-debug 85.debug: EDCIDSR: coresight-cpu-debug 85.debug: EDVIDSR: 9000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) coresight-cpu-debug 852000.debug: CPU[1]: coresight-cpu-debug 852000.debug: EDPRSR: 0001 (Power:On DLK:Unlock) -coresight-cpu-debug 852000.debug: EDPCSR: [] debug_notifier_call+0x23c/0x358 +coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358 coresight-cpu-debug 852000.debug: EDCIDSR: coresight-cpu-debug 852000.debug: EDVIDSR: 9000 (State:Non-secure Mode:EL1/0 Width:64bits VMID:0) diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c index 9cdb3fb..78a054e 100644 --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c @@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata *drvdata) } pc = debug_adjust_pc(drvdata); - dev_emerg(dev, " EDPCSR: [<%px>] %pS\n", (void *)pc, (void *)pc); + dev_emerg(dev, " EDPCSR: %pS\n", (void *)pc); if (drvdata->edcidsr_present) dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] coresight: Remove %px for printing pcsr value
On Tue, May 1, 2018 at 10:00 PM, Leo Yan wrote: > The driver prints pcsr twice: the first time it uses specifier %px to > print hexadecimal pcsr value and the second time uses specifier %pS for > output kernel symbols. > > As suggested by Kees, using %pS should be sufficient and %px isn't > necessary; the reason is if the pcsr is a kernel space address, we can > easily get to know the code line from %pS format, on the other hand, if > the pcsr value doesn't fall into kernel space range (e.g. if the CPU is > stuck in firmware), %pS also gives out pcsr hexadecimal value. > > So this commit removes useless %px and update section "Output format" > in the document for alignment between the code and document. > > Suggested-by: Kees Cook > Cc: Mathieu Poirier > Signed-off-by: Leo Yan Thanks! Reviewed-by: Kees Cook -Kees > --- > Documentation/trace/coresight-cpu-debug.txt | 4 ++-- > drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/Documentation/trace/coresight-cpu-debug.txt > b/Documentation/trace/coresight-cpu-debug.txt > index 2b9b51c..89ab09e 100644 > --- a/Documentation/trace/coresight-cpu-debug.txt > +++ b/Documentation/trace/coresight-cpu-debug.txt > @@ -177,11 +177,11 @@ Here is an example of the debugging output format: > ARM external debug module: > coresight-cpu-debug 85.debug: CPU[0]: > coresight-cpu-debug 85.debug: EDPRSR: 0001 (Power:On DLK:Unlock) > -coresight-cpu-debug 85.debug: EDPCSR: [] > handle_IPI+0x174/0x1d8 > +coresight-cpu-debug 85.debug: EDPCSR: handle_IPI+0x174/0x1d8 > coresight-cpu-debug 85.debug: EDCIDSR: > coresight-cpu-debug 85.debug: EDVIDSR: 9000 (State:Non-secure > Mode:EL1/0 Width:64bits VMID:0) > coresight-cpu-debug 852000.debug: CPU[1]: > coresight-cpu-debug 852000.debug: EDPRSR: 0001 (Power:On DLK:Unlock) > -coresight-cpu-debug 852000.debug: EDPCSR: [] > debug_notifier_call+0x23c/0x358 > +coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358 > coresight-cpu-debug 852000.debug: EDCIDSR: > coresight-cpu-debug 852000.debug: EDVIDSR: 9000 (State:Non-secure > Mode:EL1/0 Width:64bits VMID:0) > diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c > b/drivers/hwtracing/coresight/coresight-cpu-debug.c > index 9cdb3fb..78a054e 100644 > --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c > +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c > @@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata *drvdata) > } > > pc = debug_adjust_pc(drvdata); > - dev_emerg(dev, " EDPCSR: [<%px>] %pS\n", (void *)pc, (void *)pc); > + dev_emerg(dev, " EDPCSR: %pS\n", (void *)pc); > > if (drvdata->edcidsr_present) > dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr); > -- > 2.7.4 > -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] coresight: Remove %px for printing pcsr value
On Tue, May 01, 2018 at 10:29:46PM -0700, Kees Cook wrote: > On Tue, May 1, 2018 at 10:00 PM, Leo Yan wrote: > > The driver prints pcsr twice: the first time it uses specifier %px to > > print hexadecimal pcsr value and the second time uses specifier %pS for > > output kernel symbols. > > > > As suggested by Kees, using %pS should be sufficient and %px isn't > > necessary; the reason is if the pcsr is a kernel space address, we can > > easily get to know the code line from %pS format, on the other hand, if > > the pcsr value doesn't fall into kernel space range (e.g. if the CPU is > > stuck in firmware), %pS also gives out pcsr hexadecimal value. > > > > So this commit removes useless %px and update section "Output format" > > in the document for alignment between the code and document. > > > > Suggested-by: Kees Cook > > Cc: Mathieu Poirier > > Signed-off-by: Leo Yan > > Thanks! > > Reviewed-by: Kees Cook Thanks for reviewing, Kees. > -Kees > > > --- > > Documentation/trace/coresight-cpu-debug.txt | 4 ++-- > > drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +- > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/trace/coresight-cpu-debug.txt > > b/Documentation/trace/coresight-cpu-debug.txt > > index 2b9b51c..89ab09e 100644 > > --- a/Documentation/trace/coresight-cpu-debug.txt > > +++ b/Documentation/trace/coresight-cpu-debug.txt > > @@ -177,11 +177,11 @@ Here is an example of the debugging output format: > > ARM external debug module: > > coresight-cpu-debug 85.debug: CPU[0]: > > coresight-cpu-debug 85.debug: EDPRSR: 0001 (Power:On DLK:Unlock) > > -coresight-cpu-debug 85.debug: EDPCSR: [] > > handle_IPI+0x174/0x1d8 > > +coresight-cpu-debug 85.debug: EDPCSR: handle_IPI+0x174/0x1d8 > > coresight-cpu-debug 85.debug: EDCIDSR: > > coresight-cpu-debug 85.debug: EDVIDSR: 9000 (State:Non-secure > > Mode:EL1/0 Width:64bits VMID:0) > > coresight-cpu-debug 852000.debug: CPU[1]: > > coresight-cpu-debug 852000.debug: EDPRSR: 0001 (Power:On DLK:Unlock) > > -coresight-cpu-debug 852000.debug: EDPCSR: [] > > debug_notifier_call+0x23c/0x358 > > +coresight-cpu-debug 852000.debug: EDPCSR: debug_notifier_call+0x23c/0x358 > > coresight-cpu-debug 852000.debug: EDCIDSR: > > coresight-cpu-debug 852000.debug: EDVIDSR: 9000 (State:Non-secure > > Mode:EL1/0 Width:64bits VMID:0) > > diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c > > b/drivers/hwtracing/coresight/coresight-cpu-debug.c > > index 9cdb3fb..78a054e 100644 > > --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c > > +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c > > @@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata > > *drvdata) > > } > > > > pc = debug_adjust_pc(drvdata); > > - dev_emerg(dev, " EDPCSR: [<%px>] %pS\n", (void *)pc, (void *)pc); > > + dev_emerg(dev, " EDPCSR: %pS\n", (void *)pc); > > > > if (drvdata->edcidsr_present) > > dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr); > > -- > > 2.7.4 > > > > > > -- > Kees Cook > Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html