Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-01 Thread Will Deacon
Hi Kim,

On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 17:09:14 +0100
> Will Deacon  wrote:
> > On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > > On Fri, 27 Apr 2018 15:37:20 +0100
> > > Will Deacon  wrote:
> > > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > > and instead see efforts to improve error reporting via the perf system
> > > > call interface.
> > > 
> > > We'd all prefer that, and for all PMU drivers, why should ones under
> > > drivers/perf be treated differently?
> > 
> > Because they're the ones I maintain...
> 
> You represent a minority on your opinion on this matter though.

I'm not sure that's true -- you tend to be the only one raising this issue;
I think most people don't actually care one way or the other. If you know of
others who care about it too then I suggest you work together as a group to
solve the problem in a way which is amenable to the upstream maintainers.
Then you won't have to worry about fixing it personally. You could even
propose a topic for plumbers if there is enough interest in a general
framework for propagating error messages to userspace.

> > > As you are already aware, I've personally tried to fix this problem -
> > > that has existed since before the introduction of the perf tool (I
> > > consider it a syscall-independent enhanced error interface), multiple
> > > times, and failed.
> > 
> > Why is that my problem? Try harder?
> 
> It's your problem because we're here reviewing a patch that happens to
> fall under your maintainership.  I'll be the first person to tell you
> I'm obviously incompetent and haven't been able to come up with a
> solution that is acceptable for everyone up to and including Linus
> Torvalds.  I'm just noticing a chronic usability problem that can be
> easily alleviated in the context of this patch review.

I don't see how incompetence has anything to do with it and, like most
people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
internals. No arguments on the usability problem, but this ain't the fix
you're looking for: it's a bodge. We've been over this many times before.

> > > So until someone comes up with a solution that works for everyone
> > > up to and including Linus Torvalds (who hasn't put up a problem
> > > pulling PMU drivers emitting things to dmesg so far, by the way), this
> > > keep PMU drivers' errors silent preference of yours is unnecessarily
> > > impeding people trying to measure system performance on Arm based
> > > machines - all other archs' maintainers are fine with PMU drivers using
> > > dmesg.
> > 
> > Good for them, although I'm pretty sure that at least the x86 folks are
> > against this crap too.
> 
> Unfortunately, it doesn't affect them nearly as much as it does our
> more diverse platforms, which is why I don't think they care to do
> much about it.

x86 has its fair share of PMUs too, so I don't think we're special in this
regard. The main difference seems to be that they have better support in
the perf tool for diagnosing problems.

> > > > Anyway, I think this driver has bigger problems that need addressing.
> > > 
> > > To me it represents yet another PMU driver submission - as the years go
> > > by - that is lacking in the user messaging area.  Which reminds me, can
> > > you take another look at applying this?:
> > 
> > As I said before, I'm not going to take anything that logs above pr_debug
> > for things that are directly triggerable from userspace. Spin a version
> 
> Why?  There are plenty of things that emit stuff into dmesg that are
> directly triggerable from userspace.  Is it because it upsets fuzzing
> tests?  How about those be run with a patched kernel that somehow
> mitigates the printing?

[we've been over this before]

There are two camps that might find this information useful:

  1. People writing userspace support for a new PMU using the
 perf_event_open syscall

  2. People trying to use a tool which abstracts the PMU details to some
 degree (e.g. perf tool)

Those in (1) can get by with pr_debug. Sure, they have to recompile, but
it's not like there are many people in this camp and they'll probably be
working with the PMU driver writer to some degree anyway and taking new
kernel drops.

Those in (2) may not have access to dmesg, absolutely should not be able
to spam it (since this could hide other important messages), will probably
struggle to match an internal message from the PMU driver to their
invocation of the high-level tool and may well be in a multi-user
environment so will struggle to identify the messages that they are
responsible for. What they actually want is for the perf tool to give
them more information, and dmesg is not the right channel for that, for
similar reasons.

> > using pr_debug and I'll queue it.
> 
> How about using a ratelimited dev_err variant?

Nah, I don't want dmesg being parsed by users of perf tool. pr_debug should
be suffici

Re: [PATCH v6 3/8] sysctl: Warn when a clamped sysctl parameter is set out of range

2018-05-01 Thread Waiman Long
On 04/30/2018 06:40 PM, Kees Cook wrote:
> I like this series overall, thanks! No objections from me. One thing I
> noted, though:
>
> On Fri, Apr 27, 2018 at 2:00 PM, Waiman Long  wrote:
>> if (param->min && *param->min > val) {
>> if (clamp) {
>> val = *param->min;
>> +   clamped = true;
>> } else {
>> return -EINVAL;
>> }
> This appears as a common bit of logic in many places in the series. It
> seems like it'd make sense to make this a helper of some kind?
>
> -Kees
>
We can't have an inline helper function because the types are different.
We may be able to use a helper macro, but helper macro like that may be
not well accepted by the kernel community.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl

2018-05-01 Thread Paul Moore
On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks  wrote:
> The decision to log a seccomp action will always be subject to the
> value of the kernel.seccomp.actions_logged sysctl, even for processes
> that are being inspected via the audit subsystem, in an upcoming patch.
> Therefore, we need to emit an audit record on attempts at writing to the
> actions_logged sysctl when auditing is enabled.
>
> This patch updates the write handler for the actions_logged sysctl to
> emit an audit record on attempts to write to the sysctl. Successful
> writes to the sysctl will result in a record that includes a normalized
> list of logged actions in the "actions" field and a "res" field equal to
> 0. Unsuccessful writes to the sysctl will result in a record that
> doesn't include the "actions" field and has a "res" field equal to 1.
>
> Not all unsuccessful writes to the sysctl are audited. For example, an
> audit record will not be emitted if an unprivileged process attempts to
> open the sysctl file for reading since that access control check is not
> part of the sysctl's write handler.
>
> Below are some example audit records when writing various strings to the
> actions_logged sysctl.
>
> Writing "not-a-real-action" emits:
>
>  type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0
>  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
>  op=seccomp-logging res=1
>
> Writing "kill_process kill_thread errno trace log" emits:
>
>  type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0
>  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
>  op=seccomp-logging actions="kill_process kill_thread errno trace log"
>  res=0

I've got some additional comments regarding the fields in the code
below, but it would be good to hear Steve comment on the "actions"
field since his userspace tools are extremely picky about what they
will accept.  It looks like you are treating the actions as an
untrusted string, which is good, so I suspect you are okay, but still
...

> Writing the string "log log errno trace kill_process kill_thread", which
> is unordered and contains the log action twice, results in the same
> value as the previous example for the actions field:
>
>  type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0
>  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
>  op=seccomp-logging actions="kill_process kill_thread errno trace log"
>  res=0
>
> No audit records are generated when reading the actions_logged sysctl.
>
> Suggested-by: Steve Grubb 
> Signed-off-by: Tyler Hicks 
> ---
>  include/linux/audit.h |  3 +++
>  kernel/auditsc.c  | 37 +
>  kernel/seccomp.c  | 43 ++-
>  3 files changed, 74 insertions(+), 9 deletions(-)

...

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 75d5b03..b311d7d 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent,
> const struct dentry *dentry,
> const unsigned char type);
>  extern void __audit_seccomp(unsigned long syscall, long signr, int code);
> +extern void audit_seccomp_actions_logged(const char *names, int res);
>  extern void __audit_ptrace(struct task_struct *t);
>
>  static inline bool audit_dummy_context(void)
> @@ -502,6 +503,8 @@ static inline void __audit_seccomp(unsigned long syscall, 
> long signr, int code)
>  { }
>  static inline void audit_seccomp(unsigned long syscall, long signr, int code)
>  { }
> +static inline void audit_seccomp_actions_logged(const char *names, int res)
> +{ }
>  static inline int auditsc_get_stamp(struct audit_context *ctx,
>   struct timespec64 *t, unsigned int *serial)
>  {
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 4e0a4ac..3496238 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2478,6 +2478,43 @@ void __audit_seccomp(unsigned long syscall, long 
> signr, int code)
> audit_log_end(ab);
>  }
>
> +void audit_seccomp_actions_logged(const char *names, int res)
> +{
> +   struct tty_struct *tty;
> +   const struct cred *cred;
> +   struct audit_buffer *ab;
> +   char comm[sizeof(current->comm)];
> +
> +   if (!audit_enabled)
> +   return;
> +
> +   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> +   if (unlikely(!ab))
> +   return;

Instead of NULL, let's pass current->audit_context to
audit_log_start().  Yes, most of the AUDIT_CONFIG_CHANGE users pass
NULL but all of that is going to need to change because of 1) the
audit container ID work and 2) it makes sense to connect records that
are related.  Let's do it right the first time here :)

> +   cred = current_cred();
> +   tty = audit_get_tty(current);
> +   audit_log_format(ab, "pid=%d uid=%u auid=%u tty=%s ses=%u",
> +task_tgid_nr(curren

Re: [PATCH 3/3] seccomp: Don't special case audited processes when logging

2018-05-01 Thread Paul Moore
On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks  wrote:
> Seccomp logging for "handled" actions such as RET_TRAP, RET_TRACE, or
> RET_ERRNO can be very noisy for processes that are being audited. This
> patch modifies the seccomp logging behavior to treat processes that are
> being inspected via the audit subsystem the same as processes that
> aren't under inspection. Handled actions will no longer be logged just
> because the process is being inspected. Since v4.14, applications have
> the ability to request logging of handled actions by using the
> SECCOMP_FILTER_FLAG_LOG flag when loading seccomp filters.
>
> With this patch, the logic for deciding if an action will be logged is:
>
>   if action == RET_ALLOW:
> do not log
>   else if action not in actions_logged:
> do not log
>   else if action == RET_KILL:
> log
>   else if action == RET_LOG:
> log
>   else if filter-requests-logging:
> log
>   else:
> do not log
>
> Reported-by: Steve Grubb 
> Signed-off-by: Tyler Hicks 
> ---
>  Documentation/userspace-api/seccomp_filter.rst |  7 ---
>  include/linux/audit.h  | 10 +-
>  kernel/auditsc.c   |  2 +-
>  kernel/seccomp.c   | 15 +--
>  4 files changed, 7 insertions(+), 27 deletions(-)
>
> diff --git a/Documentation/userspace-api/seccomp_filter.rst 
> b/Documentation/userspace-api/seccomp_filter.rst
> index 099c412..82a468b 100644
> --- a/Documentation/userspace-api/seccomp_filter.rst
> +++ b/Documentation/userspace-api/seccomp_filter.rst
> @@ -207,13 +207,6 @@ directory. Here's a description of each file in that 
> directory:
> to the file do not need to be in ordered form but reads from the file
> will be ordered in the same way as the actions_avail sysctl.
>
> -   It is important to note that the value of ``actions_logged`` does not
> -   prevent certain actions from being logged when the audit subsystem is
> -   configured to audit a task. If the action is not found in
> -   ``actions_logged`` list, the final decision on whether to audit the
> -   action for that task is ultimately left up to the audit subsystem to
> -   decide for all seccomp return values other than ``SECCOMP_RET_ALLOW``.
> -
> The ``allow`` string is not accepted in the ``actions_logged`` sysctl
> as it is not possible to log ``SECCOMP_RET_ALLOW`` actions. Attempting
> to write ``allow`` to the sysctl will result in an EINVAL being
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index b311d7d..1964fbd 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -232,7 +232,7 @@ extern void __audit_file(const struct file *);
>  extern void __audit_inode_child(struct inode *parent,
> const struct dentry *dentry,
> const unsigned char type);
> -extern void __audit_seccomp(unsigned long syscall, long signr, int code);
> +extern void audit_seccomp(unsigned long syscall, long signr, int code);
>  extern void audit_seccomp_actions_logged(const char *names, int res);
>  extern void __audit_ptrace(struct task_struct *t);
>
> @@ -303,12 +303,6 @@ static inline void audit_inode_child(struct inode 
> *parent,
>  }
>  void audit_core_dumps(long signr);
>
> -static inline void audit_seccomp(unsigned long syscall, long signr, int code)
> -{
> -   if (audit_enabled && unlikely(!audit_dummy_context()))
> -   __audit_seccomp(syscall, signr, code);
> -}
> -
>  static inline void audit_ptrace(struct task_struct *t)
>  {
> if (unlikely(!audit_dummy_context()))
> @@ -499,8 +493,6 @@ static inline void audit_inode_child(struct inode *parent,
>  { }
>  static inline void audit_core_dumps(long signr)
>  { }
> -static inline void __audit_seccomp(unsigned long syscall, long signr, int 
> code)
> -{ }
>  static inline void audit_seccomp(unsigned long syscall, long signr, int code)
>  { }
>  static inline void audit_seccomp_actions_logged(const char *names, int res)
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 3496238..1e64b91 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2464,7 +2464,7 @@ void audit_core_dumps(long signr)
> audit_log_end(ab);
>  }
>
> -void __audit_seccomp(unsigned long syscall, long signr, int code)
> +void audit_seccomp(unsigned long syscall, long signr, int code)
>  {
> struct audit_buffer *ab;

Since it is a bit unusual, it might be nice to add a comment at the
top of audit_seccomp() that the event filtering is being done in the
seccomp_log() function, and we may need to force auditing independent
of the audit_enabled and dummy context state.

> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index e28ddcc..947cc0f 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -584,18 +584,13 @@ static inline void seccomp_log(unsigned long syscall, 
> long signr, u32 action,
> }
>
> 

Re: [PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl

2018-05-01 Thread Steve Grubb
On Tuesday, May 1, 2018 11:18:55 AM EDT Paul Moore wrote:
> On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks  wrote:
> > The decision to log a seccomp action will always be subject to the
> > value of the kernel.seccomp.actions_logged sysctl, even for processes
> > that are being inspected via the audit subsystem, in an upcoming patch.
> > Therefore, we need to emit an audit record on attempts at writing to the
> > actions_logged sysctl when auditing is enabled.
> > 
> > This patch updates the write handler for the actions_logged sysctl to
> > emit an audit record on attempts to write to the sysctl. Successful
> > writes to the sysctl will result in a record that includes a normalized
> > list of logged actions in the "actions" field and a "res" field equal to
> > 0. Unsuccessful writes to the sysctl will result in a record that
> > doesn't include the "actions" field and has a "res" field equal to 1.
> > 
> > Not all unsuccessful writes to the sysctl are audited. For example, an
> > audit record will not be emitted if an unprivileged process attempts to
> > open the sysctl file for reading since that access control check is not
> > part of the sysctl's write handler.
> > 
> > Below are some example audit records when writing various strings to the
> > actions_logged sysctl.
> > 
> > Writing "not-a-real-action" emits:
> >  type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0
> >  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
> >  op=seccomp-logging res=1
> > 
> > Writing "kill_process kill_thread errno trace log" emits:
> >  type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0
> >  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
> >  op=seccomp-logging actions="kill_process kill_thread errno trace log"
> >  res=0
> 
> I've got some additional comments regarding the fields in the code
> below, but it would be good to hear Steve comment on the "actions"
> field since his userspace tools are extremely picky about what they
> will accept.

Its not that the audit user space applications are picky, its that we have a 
coding standard that everyone needs to abide by so that any parser coded to 
the specification works. In short, we should not have spaces inside the "" 
because that can trick a naive parser. What we typically do in a situation 
like this is add a comma as a separator. But having "" means that the value 
is untrusted and subject to escaping. I don't think that is the case here. 
Output is not controlled by the user. Its a list of well known names.

> It looks like you are treating the actions as an untrusted string, which is
> good, so I suspect you are okay, but still

The function below that logs names is calling audit_log_format which does not 
handle untrusted strings. I would suggest not treating it as an untrusted 
string, but as a string with no spaces in it. 

actions=kill_process,kill_thread,errno,trace,log


> > Writing the string "log log errno trace kill_process kill_thread", which
> > is unordered and contains the log action twice, results in the same
> > 
> > value as the previous example for the actions field:
> >  type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0
> >  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
> >  op=seccomp-logging actions="kill_process kill_thread errno trace log"
> >  res=0
> > 
> > No audit records are generated when reading the actions_logged sysctl.
> > 
> > Suggested-by: Steve Grubb 
> > Signed-off-by: Tyler Hicks 
> > ---
> > 
> >  include/linux/audit.h |  3 +++
> >  kernel/auditsc.c  | 37 +
> >  kernel/seccomp.c  | 43 ++-
> >  3 files changed, 74 insertions(+), 9 deletions(-)
> 
> ...
> 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 75d5b03..b311d7d 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent,
> > 
> > const struct dentry *dentry,
> > const unsigned char type);
> >  
> >  extern void __audit_seccomp(unsigned long syscall, long signr, int
> >  code);
> > 
> > +extern void audit_seccomp_actions_logged(const char *names, int res);
> > 
> >  extern void __audit_ptrace(struct task_struct *t);
> >  
> >  static inline bool audit_dummy_context(void)
> > 
> > @@ -502,6 +503,8 @@ static inline void __audit_seccomp(unsigned long
> > syscall, long signr, int code)> 
> >  { }
> >  static inline void audit_seccomp(unsigned long syscall, long signr, int
> >  code) { }
> > 
> > +static inline void audit_seccomp_actions_logged(const char *names, int
> > res) +{ }
> > 
> >  static inline int auditsc_get_stamp(struct audit_context *ctx,
> >  
> >   struct timespec64 *t, unsigned int *serial)
> >  
> >  {
> > 
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 4e0a4ac..3496238 100644
> > --- a/kernel

Re: [PATCH 2/3] seccomp: Audit attempts to modify the actions_logged sysctl

2018-05-01 Thread Paul Moore
On Tue, May 1, 2018 at 12:41 PM, Steve Grubb  wrote:
> On Tuesday, May 1, 2018 11:18:55 AM EDT Paul Moore wrote:
>> On Fri, Apr 27, 2018 at 3:16 PM, Tyler Hicks  wrote:
>> > The decision to log a seccomp action will always be subject to the
>> > value of the kernel.seccomp.actions_logged sysctl, even for processes
>> > that are being inspected via the audit subsystem, in an upcoming patch.
>> > Therefore, we need to emit an audit record on attempts at writing to the
>> > actions_logged sysctl when auditing is enabled.
>> >
>> > This patch updates the write handler for the actions_logged sysctl to
>> > emit an audit record on attempts to write to the sysctl. Successful
>> > writes to the sysctl will result in a record that includes a normalized
>> > list of logged actions in the "actions" field and a "res" field equal to
>> > 0. Unsuccessful writes to the sysctl will result in a record that
>> > doesn't include the "actions" field and has a "res" field equal to 1.
>> >
>> > Not all unsuccessful writes to the sysctl are audited. For example, an
>> > audit record will not be emitted if an unprivileged process attempts to
>> > open the sysctl file for reading since that access control check is not
>> > part of the sysctl's write handler.
>> >
>> > Below are some example audit records when writing various strings to the
>> > actions_logged sysctl.
>> >
>> > Writing "not-a-real-action" emits:
>> >  type=CONFIG_CHANGE msg=audit(1524600971.363:119): pid=1651 uid=0
>> >  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
>> >  op=seccomp-logging res=1
>> >
>> > Writing "kill_process kill_thread errno trace log" emits:
>> >  type=CONFIG_CHANGE msg=audit(1524601023.982:131): pid=1658 uid=0
>> >  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
>> >  op=seccomp-logging actions="kill_process kill_thread errno trace log"
>> >  res=0
>>
>> I've got some additional comments regarding the fields in the code
>> below, but it would be good to hear Steve comment on the "actions"
>> field since his userspace tools are extremely picky about what they
>> will accept.
>
> Its not that the audit user space applications are picky, its that we have a
> coding standard that everyone needs to abide by so that any parser coded to
> the specification works.

We're getting a bit off track, but the issue with the "specification"
is that there has not been enough checking and enforcement of the
"specification" to give it much weight.  We've made some fixes to
records of lower impact, but I'm not going to merge disruptive record
changes.  After a while, the status-quo becomes the "specification".

At some point in the future I plan to submit patches which disconnect
the audit data from the record format for in-kernel callers; this is
really the only way we can enforce any type of record format.  The
current in-kernel audit API is for too open for misuse and abuse.

> In short, we should not have spaces inside the ""
> because that can trick a naive parser. What we typically do in a situation
> like this is add a comma as a separator. But having "" means that the value
> is untrusted and subject to escaping. I don't think that is the case here.
> Output is not controlled by the user. Its a list of well known names.
>
>> It looks like you are treating the actions as an untrusted string, which is
>> good, so I suspect you are okay, but still
>
> The function below that logs names is calling audit_log_format which does not
> handle untrusted strings. I would suggest not treating it as an untrusted
> string, but as a string with no spaces in it.
>
> actions=kill_process,kill_thread,errno,trace,log

Yes, my mistake, I suspect I was distracted by the comm logging.

>> > Writing the string "log log errno trace kill_process kill_thread", which
>> > is unordered and contains the log action twice, results in the same
>> >
>> > value as the previous example for the actions field:
>> >  type=CONFIG_CHANGE msg=audit(1524601204.365:152): pid=1704 uid=0
>> >  auid=1000 tty=pts8 ses=1 comm="tee" exe="/usr/bin/tee"
>> >  op=seccomp-logging actions="kill_process kill_thread errno trace log"
>> >  res=0
>> >
>> > No audit records are generated when reading the actions_logged sysctl.
>> >
>> > Suggested-by: Steve Grubb 
>> > Signed-off-by: Tyler Hicks 
>> > ---
>> >
>> >  include/linux/audit.h |  3 +++
>> >  kernel/auditsc.c  | 37 +
>> >  kernel/seccomp.c  | 43 ++-
>> >  3 files changed, 74 insertions(+), 9 deletions(-)
>>
>> ...
>>
>> > diff --git a/include/linux/audit.h b/include/linux/audit.h
>> > index 75d5b03..b311d7d 100644
>> > --- a/include/linux/audit.h
>> > +++ b/include/linux/audit.h
>> > @@ -233,6 +233,7 @@ extern void __audit_inode_child(struct inode *parent,
>> >
>> > const struct dentry *dentry,
>> > const unsigned char type);
>> >
>> >  extern void __audit_seccomp(unsigned long syscall, lon

Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated

2018-05-01 Thread Tejun Heo
Hello, Waiman.

Sorry about the delay.

On Thu, Apr 19, 2018 at 09:47:03AM -0400, Waiman Long wrote:
> With the addition of "cpuset.cpus.isolated", it makes sense to add the
> restriction that load balancing can only be turned off if the CPUs in
> the isolated cpuset are subset of "cpuset.cpus.isolated".
> 
> Signed-off-by: Waiman Long 
> ---
>  Documentation/cgroup-v2.txt |  7 ---
>  kernel/cgroup/cpuset.c  | 29 ++---
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index 8d89dc2..c4227ee 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1554,9 +1554,10 @@ Cpuset Interface Files
>   and will not be moved to other CPUs.
>  
>   This flag is hierarchical and is inherited by child cpusets. It
> - can be turned off only when the CPUs in this cpuset aren't
> - listed in the cpuset.cpus of other sibling cgroups, and all
> - the child cpusets, if present, have this flag turned off.
> + can be explicitly turned off only when it is a direct child of
> + the root cgroup and the CPUs in this cpuset are subset of the
> + root's "cpuset.cpus.isolated".  Moreover, the CPUs cannot be
> + listed in the "cpuset.cpus" of other sibling cgroups.

It is a little bit convoluted that the isolation requires coordination
among root's isolated file and the first-level children's cpus file
and the flag.  Maybe I'm missing something but can't we do something
like the following?

* Add isolated flag file, which can only be modified on empty (in
  terms of cpus) first level children.

* Once isolated flag is set, CPUs can only be added to the cpus file
  iff they aren't being used by anyone else and automatically become
  isolated.

The first level cpus file is owned by the root cgroup anyway, so
there's no danger regarding delegation or whatever and the interface
would be a lot simpler.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated

2018-05-01 Thread Waiman Long
On 05/01/2018 03:51 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> Sorry about the delay.
>
> On Thu, Apr 19, 2018 at 09:47:03AM -0400, Waiman Long wrote:
>> With the addition of "cpuset.cpus.isolated", it makes sense to add the
>> restriction that load balancing can only be turned off if the CPUs in
>> the isolated cpuset are subset of "cpuset.cpus.isolated".
>>
>> Signed-off-by: Waiman Long 
>> ---
>>  Documentation/cgroup-v2.txt |  7 ---
>>  kernel/cgroup/cpuset.c  | 29 ++---
>>  2 files changed, 30 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
>> index 8d89dc2..c4227ee 100644
>> --- a/Documentation/cgroup-v2.txt
>> +++ b/Documentation/cgroup-v2.txt
>> @@ -1554,9 +1554,10 @@ Cpuset Interface Files
>>  and will not be moved to other CPUs.
>>  
>>  This flag is hierarchical and is inherited by child cpusets. It
>> -can be turned off only when the CPUs in this cpuset aren't
>> -listed in the cpuset.cpus of other sibling cgroups, and all
>> -the child cpusets, if present, have this flag turned off.
>> +can be explicitly turned off only when it is a direct child of
>> +the root cgroup and the CPUs in this cpuset are subset of the
>> +root's "cpuset.cpus.isolated".  Moreover, the CPUs cannot be
>> +listed in the "cpuset.cpus" of other sibling cgroups.
> It is a little bit convoluted that the isolation requires coordination
> among root's isolated file and the first-level children's cpus file
> and the flag.  Maybe I'm missing something but can't we do something
> like the following?
>
> * Add isolated flag file, which can only be modified on empty (in
>   terms of cpus) first level children.
>
> * Once isolated flag is set, CPUs can only be added to the cpus file
>   iff they aren't being used by anyone else and automatically become
>   isolated.
>
> The first level cpus file is owned by the root cgroup anyway, so
> there's no danger regarding delegation or whatever and the interface
> would be a lot simpler.

I think that will work too. We currently don't have a flag to make a
file visible on first-level children only, but it shouldn't be hard to
make one.

Putting CPUs into an isolated child cpuset means removing it from the
root's effective CPUs. So I would probably like to expose the read-only
cpus.effective in the root cgroup so that we can check changes in the
effective cpu list.

I will renew the patchset will your suggestion.

Thanks,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated

2018-05-01 Thread Tejun Heo
Hello,

On Tue, May 01, 2018 at 04:33:45PM -0400, Waiman Long wrote:
> I think that will work too. We currently don't have a flag to make a
> file visible on first-level children only, but it shouldn't be hard to
> make one.

I think it'd be fine to make the flag file exist on all !root cgroups
but only writable on the first level children.

> Putting CPUs into an isolated child cpuset means removing it from the
> root's effective CPUs. So I would probably like to expose the read-only
> cpus.effective in the root cgroup so that we can check changes in the
> effective cpu list.

Ah, yeah, that makes sense.

> I will renew the patchset will your suggestion.

Thank you very much.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 4/5] cpuset: Restrict load balancing off cpus to subset of cpus.isolated

2018-05-01 Thread Waiman Long
On 05/01/2018 04:58 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, May 01, 2018 at 04:33:45PM -0400, Waiman Long wrote:
>> I think that will work too. We currently don't have a flag to make a
>> file visible on first-level children only, but it shouldn't be hard to
>> make one.
> I think it'd be fine to make the flag file exist on all !root cgroups
> but only writable on the first level children.

Right. This flag will be inherited by child cgroups like the
sched_load_balance.

Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 0/8] ipc: Clamp *mni to the real IPCMNI limit & increase that limit

2018-05-01 Thread Eric W. Biederman


> The sysctl parameters msgmni, shmmni and semmni have an inherent limit
> of IPC_MNI (32k). However, users may not be aware of that because they
> can write a value much higher than that without getting any error or
> notification. Reading the parameters back will show the newly written
> values which are not real.
>
> Enforcing the limit by failing sysctl parameter write, however, may
> cause regressions if existing user setup scripts set those parameters
> above 32k as those scripts will now fail in this case.

I have a serious problem with this approach.  Have you made any effort
to identify any code that sets these values above 32k?  Have you looked
to see if these applications actually care if you return an error when
a value is set too large?

Right now this seems like a lot of work to avoid breaking applications
and or users that may or may not exist.  If you can find something that
will care sure.  We need to avoid breaking userspace and causing
regressions.  However as this stands it looks you are making maintenance
of the kernel more difficult to avoid having to look to see if there are
monsters under the bed.



Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] coresight: Remove %px for printing pcsr value

2018-05-01 Thread Leo Yan
The driver prints pcsr twice: the first time it uses specifier %px to
print hexadecimal pcsr value and the second time uses specifier %pS for
output kernel symbols.

As suggested by Kees, using %pS should be sufficient and %px isn't
necessary; the reason is if the pcsr is a kernel space address, we can
easily get to know the code line from %pS format, on the other hand, if
the pcsr value doesn't fall into kernel space range (e.g. if the CPU is
stuck in firmware), %pS also gives out pcsr hexadecimal value.

So this commit removes useless %px and update section "Output format"
in the document for alignment between the code and document.

Suggested-by: Kees Cook 
Cc: Mathieu Poirier 
Signed-off-by: Leo Yan 
---
 Documentation/trace/coresight-cpu-debug.txt   | 4 ++--
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/coresight-cpu-debug.txt 
b/Documentation/trace/coresight-cpu-debug.txt
index 2b9b51c..89ab09e 100644
--- a/Documentation/trace/coresight-cpu-debug.txt
+++ b/Documentation/trace/coresight-cpu-debug.txt
@@ -177,11 +177,11 @@ Here is an example of the debugging output format:
 ARM external debug module:
 coresight-cpu-debug 85.debug: CPU[0]:
 coresight-cpu-debug 85.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
-coresight-cpu-debug 85.debug:  EDPCSR:  [] 
handle_IPI+0x174/0x1d8
+coresight-cpu-debug 85.debug:  EDPCSR:  handle_IPI+0x174/0x1d8
 coresight-cpu-debug 85.debug:  EDCIDSR: 
 coresight-cpu-debug 85.debug:  EDVIDSR: 9000 (State:Non-secure 
Mode:EL1/0 Width:64bits VMID:0)
 coresight-cpu-debug 852000.debug: CPU[1]:
 coresight-cpu-debug 852000.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
-coresight-cpu-debug 852000.debug:  EDPCSR:  [] 
debug_notifier_call+0x23c/0x358
+coresight-cpu-debug 852000.debug:  EDPCSR:  debug_notifier_call+0x23c/0x358
 coresight-cpu-debug 852000.debug:  EDCIDSR: 
 coresight-cpu-debug 852000.debug:  EDVIDSR: 9000 (State:Non-secure 
Mode:EL1/0 Width:64bits VMID:0)
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 9cdb3fb..78a054e 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata *drvdata)
}
 
pc = debug_adjust_pc(drvdata);
-   dev_emerg(dev, " EDPCSR:  [<%px>] %pS\n", (void *)pc, (void *)pc);
+   dev_emerg(dev, " EDPCSR:  %pS\n", (void *)pc);
 
if (drvdata->edcidsr_present)
dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] coresight: Remove %px for printing pcsr value

2018-05-01 Thread Kees Cook
On Tue, May 1, 2018 at 10:00 PM, Leo Yan  wrote:
> The driver prints pcsr twice: the first time it uses specifier %px to
> print hexadecimal pcsr value and the second time uses specifier %pS for
> output kernel symbols.
>
> As suggested by Kees, using %pS should be sufficient and %px isn't
> necessary; the reason is if the pcsr is a kernel space address, we can
> easily get to know the code line from %pS format, on the other hand, if
> the pcsr value doesn't fall into kernel space range (e.g. if the CPU is
> stuck in firmware), %pS also gives out pcsr hexadecimal value.
>
> So this commit removes useless %px and update section "Output format"
> in the document for alignment between the code and document.
>
> Suggested-by: Kees Cook 
> Cc: Mathieu Poirier 
> Signed-off-by: Leo Yan 

Thanks!

Reviewed-by: Kees Cook 

-Kees

> ---
>  Documentation/trace/coresight-cpu-debug.txt   | 4 ++--
>  drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/trace/coresight-cpu-debug.txt 
> b/Documentation/trace/coresight-cpu-debug.txt
> index 2b9b51c..89ab09e 100644
> --- a/Documentation/trace/coresight-cpu-debug.txt
> +++ b/Documentation/trace/coresight-cpu-debug.txt
> @@ -177,11 +177,11 @@ Here is an example of the debugging output format:
>  ARM external debug module:
>  coresight-cpu-debug 85.debug: CPU[0]:
>  coresight-cpu-debug 85.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
> -coresight-cpu-debug 85.debug:  EDPCSR:  [] 
> handle_IPI+0x174/0x1d8
> +coresight-cpu-debug 85.debug:  EDPCSR:  handle_IPI+0x174/0x1d8
>  coresight-cpu-debug 85.debug:  EDCIDSR: 
>  coresight-cpu-debug 85.debug:  EDVIDSR: 9000 (State:Non-secure 
> Mode:EL1/0 Width:64bits VMID:0)
>  coresight-cpu-debug 852000.debug: CPU[1]:
>  coresight-cpu-debug 852000.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
> -coresight-cpu-debug 852000.debug:  EDPCSR:  [] 
> debug_notifier_call+0x23c/0x358
> +coresight-cpu-debug 852000.debug:  EDPCSR:  debug_notifier_call+0x23c/0x358
>  coresight-cpu-debug 852000.debug:  EDCIDSR: 
>  coresight-cpu-debug 852000.debug:  EDVIDSR: 9000 (State:Non-secure 
> Mode:EL1/0 Width:64bits VMID:0)
> diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
> b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> index 9cdb3fb..78a054e 100644
> --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> @@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata *drvdata)
> }
>
> pc = debug_adjust_pc(drvdata);
> -   dev_emerg(dev, " EDPCSR:  [<%px>] %pS\n", (void *)pc, (void *)pc);
> +   dev_emerg(dev, " EDPCSR:  %pS\n", (void *)pc);
>
> if (drvdata->edcidsr_present)
> dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr);
> --
> 2.7.4
>



-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] coresight: Remove %px for printing pcsr value

2018-05-01 Thread Leo Yan
On Tue, May 01, 2018 at 10:29:46PM -0700, Kees Cook wrote:
> On Tue, May 1, 2018 at 10:00 PM, Leo Yan  wrote:
> > The driver prints pcsr twice: the first time it uses specifier %px to
> > print hexadecimal pcsr value and the second time uses specifier %pS for
> > output kernel symbols.
> >
> > As suggested by Kees, using %pS should be sufficient and %px isn't
> > necessary; the reason is if the pcsr is a kernel space address, we can
> > easily get to know the code line from %pS format, on the other hand, if
> > the pcsr value doesn't fall into kernel space range (e.g. if the CPU is
> > stuck in firmware), %pS also gives out pcsr hexadecimal value.
> >
> > So this commit removes useless %px and update section "Output format"
> > in the document for alignment between the code and document.
> >
> > Suggested-by: Kees Cook 
> > Cc: Mathieu Poirier 
> > Signed-off-by: Leo Yan 
> 
> Thanks!
> 
> Reviewed-by: Kees Cook 

Thanks for reviewing, Kees.

> -Kees
> 
> > ---
> >  Documentation/trace/coresight-cpu-debug.txt   | 4 ++--
> >  drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/trace/coresight-cpu-debug.txt 
> > b/Documentation/trace/coresight-cpu-debug.txt
> > index 2b9b51c..89ab09e 100644
> > --- a/Documentation/trace/coresight-cpu-debug.txt
> > +++ b/Documentation/trace/coresight-cpu-debug.txt
> > @@ -177,11 +177,11 @@ Here is an example of the debugging output format:
> >  ARM external debug module:
> >  coresight-cpu-debug 85.debug: CPU[0]:
> >  coresight-cpu-debug 85.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
> > -coresight-cpu-debug 85.debug:  EDPCSR:  [] 
> > handle_IPI+0x174/0x1d8
> > +coresight-cpu-debug 85.debug:  EDPCSR:  handle_IPI+0x174/0x1d8
> >  coresight-cpu-debug 85.debug:  EDCIDSR: 
> >  coresight-cpu-debug 85.debug:  EDVIDSR: 9000 (State:Non-secure 
> > Mode:EL1/0 Width:64bits VMID:0)
> >  coresight-cpu-debug 852000.debug: CPU[1]:
> >  coresight-cpu-debug 852000.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
> > -coresight-cpu-debug 852000.debug:  EDPCSR:  [] 
> > debug_notifier_call+0x23c/0x358
> > +coresight-cpu-debug 852000.debug:  EDPCSR:  debug_notifier_call+0x23c/0x358
> >  coresight-cpu-debug 852000.debug:  EDCIDSR: 
> >  coresight-cpu-debug 852000.debug:  EDVIDSR: 9000 (State:Non-secure 
> > Mode:EL1/0 Width:64bits VMID:0)
> > diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
> > b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > index 9cdb3fb..78a054e 100644
> > --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > @@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata 
> > *drvdata)
> > }
> >
> > pc = debug_adjust_pc(drvdata);
> > -   dev_emerg(dev, " EDPCSR:  [<%px>] %pS\n", (void *)pc, (void *)pc);
> > +   dev_emerg(dev, " EDPCSR:  %pS\n", (void *)pc);
> >
> > if (drvdata->edcidsr_present)
> > dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr);
> > --
> > 2.7.4
> >
> 
> 
> 
> -- 
> Kees Cook
> Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html