Re: [PATCH v11 7/9] coresight: add support for CPU debug module

2017-05-24 Thread kbuild test robot
Hi Leo,

[auto build test WARNING on next-20170522]
[cannot apply to linus/master linux/master robh/for-next v4.9-rc8 v4.9-rc7 
v4.9-rc6 v4.12-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Leo-Yan/coresight-enable-debug-module/20170524-075217
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   drivers/hwtracing/coresight/coresight-cpu-debug.c: In function 
'debug_adjust_pc':
>> drivers/hwtracing/coresight/coresight-cpu-debug.c:263:44: warning: left 
>> shift count >= width of type [-Wshift-count-overflow]
  return (unsigned long)drvdata->edpcsr_hi << 32 |
   ^~

vim +263 drivers/hwtracing/coresight/coresight-cpu-debug.c

   247  if (drvdata->edvidsr_present)
   248  drvdata->edvidsr = readl_relaxed(drvdata->base + 
EDVIDSR);
   249  
   250  out:
   251  /* Restore EDPRCR register */
   252  writel_relaxed(save_edprcr, drvdata->base + EDPRCR);
   253  
   254  CS_LOCK(drvdata->base);
   255  }
   256  
   257  static unsigned long debug_adjust_pc(struct debug_drvdata *drvdata)
   258  {
   259  unsigned long arm_inst_offset = 0, thumb_inst_offset = 0;
   260  unsigned long pc;
   261  
   262  if (IS_ENABLED(CONFIG_64BIT))
 > 263  return (unsigned long)drvdata->edpcsr_hi << 32 |
   264 (unsigned long)drvdata->edpcsr;
   265  
   266  pc = (unsigned long)drvdata->edpcsr;
   267  
   268  if (drvdata->pc_has_offset) {
   269  arm_inst_offset = 8;
   270  thumb_inst_offset = 4;
   271  }

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v4 next 1/3] modules:capabilities: allow __request_module() to take a capability argument

2017-05-24 Thread Djalal Harouni
On Tue, May 23, 2017 at 9:19 PM, Kees Cook  wrote:
> On Tue, May 23, 2017 at 3:29 AM, Djalal Harouni  wrote:
[...]

>> I think if there is an interface request_module_capable() , then code
>> will use it. The DCCP code path did not check capabilities at all and
>> called request_module(), other code does the same.
>>
>> A new interface can be abused, the result of this: we may break
>> "modules_autoload_mode" in mode 0 and 1. In the long term code will
>> want to change may_autoload_module() to also allow mode 1 to load a
>> module with CAP_NET_ADMIN or other caps in its own userns, resulting
>> in "modules_autoload_mode == 0 == 1". Without userns in the game we
>> may just see request_module_capable(CAP_SYS_ADMIN, ...)  . There is
>> already some code maybe phonet sockets ? that require CAP_SYS_ADMIN to
>> get the appropriate protocol and no one will be able to review all
>> this code or track new patches with request_module_capable() callers.
>
> I'm having some trouble following what you're saying here, but if I
> understand, you're worried about getting the kernel into a state where
> autoload state 0 == 1. Autoload 0 is "business as usual", and autoload
> 1 is "CAP_SYS_MODULE required to be able to trigger a module auto-load
> operation, or CAP_NET_ADMIN for modules with a 'netdev-%s' alias."

Indeed.


> In the v4 patch, under autoload==1, CAP_NET_ADMIN is needed to load
> netdev- modules:
>
> if (no_module && capable(CAP_NET_ADMIN))
>no_module = __request_module(true, CAP_NET_ADMIN,
> "netdev-%s", name);
>
> and in the LSM hook, CAP_NET_ADMIN is passed as an allowable "alias"
> for the CAP_SYS_MODULE requirement:
>
>else if (modules_autoload_mode == MODULES_AUTOLOAD_PRIVILEGED) {
>/* Check CAP_SYS_MODULE then allow_cap if valid */
>if (capable(CAP_SYS_MODULE) ||
>(allow_cap > 0 && capable(allow_cap)))
>   return 0;
>}
>
> What I see is some needless double-checking. Since you're making
> changes to the request_module() API, it would be possible to have

That check is *not* a double check and it is *really* needed in v4
since how may_autoload_module() was implemented. It first checks if
'autoload' == 0 == ALLOWED, if so then it allows the operation
regardless of the capability. That's why I didn't want to touch
current network logic and assumed that net code knows what it should
do.


> request_module_cap(), which could be checked instead of open-coding
> it:
>
>  if (no_module)
> no_module = request_module_cap(CAP_NET_ADMIN, "netdev-%s", name);
>
> If I'm understanding your objection correctly, it's that you want to
> ONLY ever provide this one-time alias for CAP_SYS_MODULE with the
> netdev-%s things, and you don't want to risk having other module
> loading start using request_module_cap() which would lead to
> CAP_SYS_MODULE aliases in other places?

Yes. We can't really track capabilities usage or new code.


> If the goal is to make sure that only privileged processes are
> autoloading, I don't think adding a well defined interface for
> cap-checks (request_module_cap()) would lead to a slippery slope. The
> worst case scenario (which would never happen) would be all
> request_module() users would convert to request_module_cap(). This

I am also concerned a bit with new code. In the documentation we
explicitly say CAP_SYS_MODULE, and new code should not break that
assumption.


> would mean that all module loading would require specific privileges.
> That seems in line with autoload==1. They would not be tied to
> CAP_SYS_MODULE, though, which is, I suspect, what you're concerned
> about.

Indeed, it is just easy to say hey it needs CAP_SYS_MODULE. The
capability usage in the module subsystem more precisely with explicit
loading is clean. CAP_SYS_MODULE is not overloaded, it has clear
focus. As you say it, we should be concerned if we blindly trust
callers and end up *aliasing* CAP_SYS_MODULE with some other cap...


> Even in the existing code, there is a sense about CAP_NET_ADMIN and
> CAP_SYS_MODULE having different privilege levels, in that
> CAP_NET_ADMIN can only load netdev-%s modules, but CAP_SYS_MODULE can
> load any module. What about refining request_module_cap() to _require_
> an explicit string prefix instead of an arbitrary format string? e.g.
> request_module_cap(CAP_NET_ADMIN, "netdev", "%s", name) which would
> make requests for ("netdev-%s", name)
>
> I see a few options:
>
> 1) keep what you have for v4, and hope other places don't use
> __request_module. (I'm not a fan of this.)

Yes even if it is documented I wouldn't bet on it, though. :-)


> 2) switch the logic on autoload==1 from OR to AND: both the specified
> caps _and_ CAP_SYS_MODULE are required. (This seems like it might make
> autoload==1 less useful.)

That will restrict some userspace that works only with CAP_NET_ADMIN.


> 3) use the request_m

Re: [PATCH v5 0/3] watchdog: allow setting deadline for opening /dev/watchdogN

2017-05-24 Thread Esben Haabendal
On 22 May 2017 at 16:06, Rasmus Villemoes  wrote:
>
> Rasmus Villemoes (3):
>   watchdog: introduce watchdog_worker_should_ping helper
>   watchdog: introduce watchdog.open_timeout commandline parameter
>   watchdog: introduce CONFIG_WATCHDOG_OPEN_TIMEOUT
>
>  Documentation/watchdog/watchdog-parameters.txt | 10 +++
>  drivers/watchdog/Kconfig   |  9 +++
>  drivers/watchdog/watchdog_dev.c| 37 
> +++---
>  3 files changed, 52 insertions(+), 4 deletions(-)


Thanks. This is definitely useful for embedded applications.

If this get merged, it should be able to replace several out-of-tree
hacks to accomplish the same thing.

For all 3 patches:
Reviewed-by: Esben Haabendal 

/Esben
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: per-cgroup memory reclaim stats

2017-05-24 Thread Johannes Weiner
On Thu, May 11, 2017 at 08:16:23PM +0100, Roman Gushchin wrote:
> Track the following reclaim counters for every memory cgroup:
> PGREFILL, PGSCAN, PGSTEAL, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE and
> PGLAZYFREED.
> 
> These values are exposed using the memory.stats interface of cgroup v2.
> 
> The meaning of each value is the same as for global counters,
> available using /proc/vmstat.
> 
> Also, for consistency, rename mem_cgroup_count_vm_event() to
> count_memcg_event_mm().
> 
> Signed-off-by: Roman Gushchin 
> Suggested-by: Johannes Weiner 
> Cc: Johannes Weiner 
> Cc: Tejun Heo 
> Cc: Li Zefan 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 
> Cc: cgro...@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux...@kvack.org

Acked-by: Johannes Weiner 

Andrew, if there aren't any other objections, could you pick this one
up please?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint

2017-05-24 Thread Tejun Heo
Hello,

On Mon, May 22, 2017 at 12:56:08PM -0400, Waiman Long wrote:
> All controllers can use the special sub-directory if userland chooses to
> do so. The problem that I am trying to address in this patch is to allow
> more natural hierarchy that reflect a certain purpose, like the task
> classification done by systemd. Restricting tasks only to leaf nodes
> makes the hierarchy unnatural and probably difficult to manage.

I see but how is this different from userland just creating the leaf
cgroup?  I'm not sure what this actually enables in terms of what can
be achieved with cgroup.  I suppose we can argue that this is more
convenient but I'd like to keep the interface orthogonal as much as
reasonably possible.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint

2017-05-24 Thread Tejun Heo
Hello, Mike.

On Sat, May 20, 2017 at 04:10:07AM +0200, Mike Galbraith wrote:
> On Fri, 2017-05-19 at 16:38 -0400, Tejun Heo wrote:
> > Hello, Waiman.
> > 
> > On Mon, May 15, 2017 at 09:34:11AM -0400, Waiman Long wrote:
> > > The rationale behind the cgroup v2 no internal process constraint is
> > > to avoid resouorce competition between internal processes and child
> > > cgroups. However, not all controllers have problem with internal
> > > process competiton. Enforcing this rule may lead to unnatural process
> > > hierarchy and unneeded levels for those controllers.
> > 
> > This isn't necessarily something we can determine by looking at the
> > current state of controllers.  It's true that some controllers - pid
> > and perf - inherently only care about membership of each task but at
> > the same time neither really suffers from the constraint either.  CPU
> > which is the problematic one here...
> 
> (+ cpuacct + cpuset)

Yeah, cpuacct and cpuset are in the same boat as perf.  cpuset is
completely so and we can move the tree walk to the reader side or
aggregate propagation for cpuacct as necessary.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 13/17] cgroup: Allow fine-grained controllers control in cgroup v2

2017-05-24 Thread Tejun Heo
Hello, Waiman.

On Fri, May 19, 2017 at 05:20:01PM -0400, Waiman Long wrote:
> > This breaks the invariant that in a cgroup its resource control knobs
> > control distribution of resources from its parent.  IOW, the resource
> > control knobs of a cgroup always belong to the parent.  This is also
> > reflected in how delegation is done.  The delegatee assumes ownership
> > of the cgroup itself and the ability to manage sub-cgroups but doesn't
> > get the ownership of the resource control knobs as otherwise the
> > parent would lose control over how it distributes its resources.
> 
> One twist that I am thinking is to have a controller enabled by the
> parent in subtree_control, but then allow the child to either disable it
> or set it in pass-through mode by writing to controllers file. IOW, a
> child cannot enable a controller without parent's permission. Once a
> child has permission, it can do whatever it wants. A parent cannot force
> a child to have a controller enabled.

Heh, I think I need more details to follow your proposal.  Anyways,
what we need to guarantee is that a descendant is never allowed to
pull in more resources than its ancestors want it to.

> > Another aspect is that most controllers aren't that sensitive to
> > nesting several levels.  Expensive operations can be and already are
> > aggregated and the performance overhead of several levels of nesting
> > barely shows up.  Skipping levels can be an interesting optimization
> > approach and we can definitely support from the core side; however,
> > it'd be a lot nicer if we could do that optimization transparently
> > (e.g. CPU can skip multi level queueing if there usually is only one
> > item at some levels).
> 
> The trend that I am seeing is that the total number of controllers is
> going to grow over time. New controllers may be sensitive to the level
> of nesting like the cpu controller. I am also thinking about how systemd
> is using the cgroup filesystem for task classification purpose without
> any controller attached to it. With this scheme, we can accommodate all
> the different needs without using different cgroup filesystems.

I'm not sure about that.  It's true that cgroup hierarchy is being
used more but there are only so many hard / complex resources that we
deal with - cpu, memory and io.  Beyond those, other uses are usually
about identifying membership (perf, net) or propagating and
restricting attributes (cpuset).  pids can be considered an exception
but we have it only because pids can globally run out a lot sooner
than can be controlled through memory.  Even then, it's trivial.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 23/31] gcc-plugins.txt: standardize document format

2017-05-24 Thread Kees Cook
On Thu, May 18, 2017 at 6:22 PM, Mauro Carvalho Chehab
 wrote:
> Each text file under Documentation follows a different
> format. Some doesn't even have titles!
>
> Change its representation to follow the adopted standard,
> using ReST markups for it to be parseable by Sphinx:
>
> - promote main title;
> - use the right markup for footnotes;
> - use bold markup for files name;
> - identify literal blocks;
> - add blank lines to avoid Sphinx to complain;
> - remove numeration from titles.
>
> Signed-off-by: Mauro Carvalho Chehab 

Acked-by: Kees Cook 

This should probably get moved under "Kernel API documentation" but
may need a new sub-category, maybe "instrumentation"? Things like
KASan could be put under that too.

-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 13/17] cgroup: Allow fine-grained controllers control in cgroup v2

2017-05-24 Thread Waiman Long
On 05/24/2017 01:31 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Fri, May 19, 2017 at 05:20:01PM -0400, Waiman Long wrote:
>>> This breaks the invariant that in a cgroup its resource control knobs
>>> control distribution of resources from its parent.  IOW, the resource
>>> control knobs of a cgroup always belong to the parent.  This is also
>>> reflected in how delegation is done.  The delegatee assumes ownership
>>> of the cgroup itself and the ability to manage sub-cgroups but doesn't
>>> get the ownership of the resource control knobs as otherwise the
>>> parent would lose control over how it distributes its resources.
>> One twist that I am thinking is to have a controller enabled by the
>> parent in subtree_control, but then allow the child to either disable it
>> or set it in pass-through mode by writing to controllers file. IOW, a
>> child cannot enable a controller without parent's permission. Once a
>> child has permission, it can do whatever it wants. A parent cannot force
>> a child to have a controller enabled.
> Heh, I think I need more details to follow your proposal.  Anyways,
> what we need to guarantee is that a descendant is never allowed to
> pull in more resources than its ancestors want it to.

What I am saying is as follows:
/ A
P - B
   \ C

# echo +memory > P/cgroups.subtree_control
# echo -memory > P/A/cgroup.controllers
# echo "#memory" > P/B/cgroup.controllers

The parent grants the memory controller to its children - A, B and C.
Child A has the memory controller explicitly disabled. Child B has the
memory controller in pass-through mode, while child C has the memory
controller enabled by default. "echo +memory > cgroup.controllers" is
not allowed. There are 2 possible choices with regard to the '-' or '#'
prefixes. We can allow them before the grant from the parent or only
after that. In the former case, the state remains dormant until after
the grant from the parent.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 13/17] cgroup: Allow fine-grained controllers control in cgroup v2

2017-05-24 Thread Tejun Heo
Hello,

On Wed, May 24, 2017 at 01:49:46PM -0400, Waiman Long wrote:
> What I am saying is as follows:
> / A
> P - B
>\ C
> 
> # echo +memory > P/cgroups.subtree_control
> # echo -memory > P/A/cgroup.controllers
> # echo "#memory" > P/B/cgroup.controllers
> 
> The parent grants the memory controller to its children - A, B and C.
> Child A has the memory controller explicitly disabled. Child B has the
> memory controller in pass-through mode, while child C has the memory
> controller enabled by default. "echo +memory > cgroup.controllers" is
> not allowed. There are 2 possible choices with regard to the '-' or '#'
> prefixes. We can allow them before the grant from the parent or only
> after that. In the former case, the state remains dormant until after
> the grant from the parent.

Ah, I see, you want cgroup.controllers to be able to mask available
controllers by the parent.  Can you expand your example with further
nesting and how #memory on cgroup.controllers would affect the nested
descendant?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-24 Thread Djalal Harouni
On Tue, May 23, 2017 at 9:48 AM, Solar Designer  wrote:
>> >>> On Mon, May 22, 2017 at 2:08 PM, Solar Designer  
>> >>> wrote:
>> >>> > On Mon, May 22, 2017 at 01:57:03PM +0200, Djalal Harouni wrote:
>> >>> >> *) When modules_autoload_mode is set to (2), automatic module loading 
>> >>> >> is
>> >>> >> disabled for all. Once set, this value can not be changed.
>> >>> >
>> >>> > What purpose does this securelevel-like property ("Once set, this value
>> >>> > can not be changed.") serve here?  I think this mode 2 is needed, but
>> >>> > without this extra property, which is bypassable by e.g. explicitly
>> >>> > loaded kernel modules anyway (and that's OK).
>
> On Mon, May 22, 2017 at 04:07:56PM -0700, Kees Cook wrote:
>> I'm on the fence. For modules_disabled and Yama, it was tied to
>> CAP_SYS_ADMIN, basically designed to be a at-boot setting that could
>> not later be undone by an attacker gaining that privilege, keeping
>> them out of either kernel memory or existing user process memory.
>> Here, it's CAP_SYS_MODULE... it's hard to imagine the situation where
>> a CAP_SYS_MODULE-capable process could write to this sysctl but NOT
>> issue direct modprobe requests, but it's _possible_ via crazy symlink
>> games to trick capable processes into writing to sysctls. We've seen
>> this multiple times before, and it's a way for attackers to turn a
>> single privileged write into a privileged exec.
>
> OK, tricking a process via crazy symlink games is finally a potentially
> valid reason.  The question then becomes: are there perhaps so many
> other important sysctl's, disk files, etc. (which the vulnerable capable
> process could similarly be tricked into writing) so that specifically
> resetting modules_autoload_mode isn't particularly lucrative?  I think
> that the answer to that is usually yes.  Another related question: do we
> really want to inconsistently single out a handful of sysctl's for this
> kind of extra protection?  I think not.
>
> I agree there are some other settings where being unable to reset them
> makes sense, but I think this isn't one of those.
>

Alright, I already replied to Andy, since it was requested I will drop
it. I definitely prefer that we have something merged and usable ;-)


>> I might turn the question around, though: why would we want to have it
>> changeable at this setting?
>
> Convenience for the sysadmin - being able to correct one's error (e.g.,
> wrong order of shell commands), respond to new findings (thought module
> autoloading was unneeded after some point, then found out some software
> relies on it), change one's mind, reuse a system differently than
> originally intended without a forced reboot.
>
>> I'm fine leaving that piece off, either way.
>
> I'm also fine with either decision.  I just thought I'd point out what
> looked weird to me.
>
> I think this is an important patch that should get in, but primarily
> for modules_autoload_mode=1, which many distros could make the default
> (and maybe the kernel eventually should?)

I do not think that desktop, or interactive systems will set it.
Ubuntu has snaps for their app store, and they can use the per-task
flag and other vendors too Flatpak, etc.


> For modules_autoload_mode=2, we already seem to have the equivalent of
> modprobe=/bin/true (or does it differ subtly, maybe in return values?),
> which I already use at startup on a GPU box like this (preloading
> modules so that the OpenCL backends wouldn't need the autoloading):
>
> nvidia-smi
> nvidia-modprobe -u -c=0
> #modprobe nvidia_uvm
> #modprobe fglrx
>
> sysctl -w kernel.modprobe=/bin/true
> sysctl -w kernel.hotplug=/bin/true
>
> but it's good to also have this supported more explicitly and more
> consistently through modules_autoload_mode=2 while we're at it.  So I
> support having this mode as well.  I just question the need to have it
> non-resettable.
>

Ok, yes with mode=2 it is clear interface, it logs what was denied, it
is consistent with the per-task flag that we want for sandboxes and
desktop, it will work with CONFIG_STATIC_USERMODEHELPER, more
importantly it will avoid doing the usermod upcall, even if one day
the kernel support upcall into namespaces, I'm pretty sure that no one
will like the idea of namespaced modprobe paths, modules are global
anyway, this allows to say we are safe by default from any future
change that may make an upcall into the wrong context, we just avoid
that.


-- 
tixxdz
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


make pdfdoc errors with 4.12-rc2

2017-05-24 Thread Jim Davis
make pdfdocs, on my Fedora 25 desktop with the latest patches, fails with

Running Sphinx v1.5.2

[...lots-o-stuff...]

make PDFLATEX=xelatex LATEXOPTS="-interaction=batchmode" -C
Documentation/output/./latex || exit;
xelatex -interaction=batchmode 'linux-input.tex'
This is XeTeX, Version 3.14159265-2.6-0.6 (TeX Live 2016)
(preloaded format=xelatex)
 restricted \write18 enabled.
entering extended mode
Makefile:67: recipe for target 'linux-input.pdf' failed
make[2]: *** [linux-input.pdf] Error 1
Documentation/Makefile.sphinx:87: recipe for target 'pdfdocs' failed
make[1]: *** [pdfdocs] Error 2
Makefile:1472: recipe for target 'pdfdocs' failed
make: *** [pdfdocs] Error 2


-- 
Jim
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 13/17] cgroup: Allow fine-grained controllers control in cgroup v2

2017-05-24 Thread Waiman Long
On 05/24/2017 01:56 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, May 24, 2017 at 01:49:46PM -0400, Waiman Long wrote:
>> What I am saying is as follows:
>> / A
>> P - B
>>\ C
>>
>> # echo +memory > P/cgroups.subtree_control
>> # echo -memory > P/A/cgroup.controllers
>> # echo "#memory" > P/B/cgroup.controllers
>>
>> The parent grants the memory controller to its children - A, B and C.
>> Child A has the memory controller explicitly disabled. Child B has the
>> memory controller in pass-through mode, while child C has the memory
>> controller enabled by default. "echo +memory > cgroup.controllers" is
>> not allowed. There are 2 possible choices with regard to the '-' or '#'
>> prefixes. We can allow them before the grant from the parent or only
>> after that. In the former case, the state remains dormant until after
>> the grant from the parent.
> Ah, I see, you want cgroup.controllers to be able to mask available
> controllers by the parent.  Can you expand your example with further
> nesting and how #memory on cgroup.controllers would affect the nested
> descendant?
>
> Thanks.
>
I would allow enabling the controller in subtree_control if granted from
the parent and not explicitly disabled. IOW, both B and C can "echo
+memory" to their subtree_control to grant memory controller to their
children, but not A. A has to re-enable memory controller or set it to
pass-through mode before it can enable it in subtree_control. I need to
clarify that "echo +memory > cgroup.controllers" is allowed to re-enable
it, but not without the granting from its parent.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint

2017-05-24 Thread Waiman Long
On 05/24/2017 01:05 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, May 22, 2017 at 12:56:08PM -0400, Waiman Long wrote:
>> All controllers can use the special sub-directory if userland chooses to
>> do so. The problem that I am trying to address in this patch is to allow
>> more natural hierarchy that reflect a certain purpose, like the task
>> classification done by systemd. Restricting tasks only to leaf nodes
>> makes the hierarchy unnatural and probably difficult to manage.
> I see but how is this different from userland just creating the leaf
> cgroup?  I'm not sure what this actually enables in terms of what can
> be achieved with cgroup.  I suppose we can argue that this is more
> convenient but I'd like to keep the interface orthogonal as much as
> reasonably possible.
>
> Thanks.
>
I am just thinking that it is a bit more natural with the concept of the
special resource domain sub-directory. You are right that the same
effect can be achieved by proper placement of tasks and enabling of
controllers.

A (cpu,memory) [T1] - B(cpu,memory) [T2]
  \ cgroups.resource_domain (memory)

A (cpu,memory)  - B(cpu,memory) [T2]
\ C (memory) [T1]

With respect to the tasks T1 and T2, the above 2 configurations are the
same.

I am OK to drop this patch. However, I still think the current
no-internal process constraint is too restricting. I will suggest either

 1. Allow internal processes and document the way to avoid internal
process competition as shown above from the userland, or
 2. Mark only certain controllers as not allowing internal processes
when they are enabled.

What do you think about this?

Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 23/31] gcc-plugins.txt: standardize document format

2017-05-24 Thread Mauro Carvalho Chehab
Em Wed, 24 May 2017 10:35:42 -0700
Kees Cook  escreveu:

> On Thu, May 18, 2017 at 6:22 PM, Mauro Carvalho Chehab
>  wrote:
> > Each text file under Documentation follows a different
> > format. Some doesn't even have titles!
> >
> > Change its representation to follow the adopted standard,
> > using ReST markups for it to be parseable by Sphinx:
> >
> > - promote main title;
> > - use the right markup for footnotes;
> > - use bold markup for files name;
> > - identify literal blocks;
> > - add blank lines to avoid Sphinx to complain;
> > - remove numeration from titles.
> >
> > Signed-off-by: Mauro Carvalho Chehab   
> 
> Acked-by: Kees Cook 
> 
> This should probably get moved under "Kernel API documentation" but
> may need a new sub-category, maybe "instrumentation"? Things like
> KASan could be put under that too.

Yeah, I guess that most documents under Documentation/
will need to be renamed and placed into an existing or new book.

Kasan documentation is currently under dev-tools, with is, currently,
an unsorted book with:

   coccinelle
   sparse
   kcov
   gcov
   kasan
   ubsan
   kmemleak
   kmemcheck
   gdb-kernel-debugging
   kgdb

I agree with you: it probably makes sense to split external development
tools, like coccinelle/sparse from Kernel instrumentation, like kgdb,
kasan, gcc-plugins, etc.

So, perhaps we can change the content of Documentation/dev-tools/index.rst
to something like.



Development tools for the kernel


This document describe tools and instrumentation features of the Linux Kernel
used by developers to do quality assurance (QA).

This section describes Kernel internal features designed to provide mechanisms
for developers to test their code.

.. toctree::
   :maxdepth: 2

   (add here books like kasan, printk related docs, gcc-plugins, etc)

This section describes external tools used to ensure Kernel quality
assurance (QA).

.. toctree::
   :maxdepth: 2

   (add here books related external tools and robots, like coccinelle,
sparse, kernel build robot, ktest, Coverity, etc)

.. only::  subproject and html

   Indices
   ===

   * :ref:`genindex`



Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-24 Thread Tejun Heo
Hello, Waiman.

On Mon, May 22, 2017 at 01:13:16PM -0400, Waiman Long wrote:
> > Maybe I'm misunderstanding the design, but this seems to push the
> > processes which belong to the threaded subtree to the parent which is
> > part of the usual resource domain hierarchy thus breaking the no
> > internal competition constraint.  I'm not sure this is something we'd
> > want.  Given that the limitation of the original threaded mode was the
> > required nesting below root and that we treat root special anyway
> > (exactly in the way necessary), I wonder whether it'd be better to
> > simply allow root to be both domain and thread root.
> 
> Yes, root can be both domain and thread root. I haven't placed any
> restriction on that.

I've been playing with the proposed "make the parent resource domain".
Unfortunately, the parent - child relationship becomes weird.

The parent becomes the thread root, which means that its
cgroup.threads file becomes writable and threads can be put in there.
It's really weird to write to a child's interface and have the
parent's behavior changed.  This becomes weirder with delegation.  If
a cgroup is delegated, its cgroup.threads should be delegated too but
if the child enables threaded mode, that makes the undelegated parent
thread root, which means that either 1. the delegatee can't migrate
threads to the thread root or 2. if the parent's cgroup.threads is
writeable, the delegatee can mass with other descendants under it
which shouldn't be allowed.

I think the operation of making a cgroup a thread root should happen
on the cgroup where that's requested; otherwise, nesting becomes too
twisted.  This should be solvable.  Will think more about it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-24 Thread Waiman Long
On 05/24/2017 04:36 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, May 22, 2017 at 01:13:16PM -0400, Waiman Long wrote:
>>> Maybe I'm misunderstanding the design, but this seems to push the
>>> processes which belong to the threaded subtree to the parent which is
>>> part of the usual resource domain hierarchy thus breaking the no
>>> internal competition constraint.  I'm not sure this is something we'd
>>> want.  Given that the limitation of the original threaded mode was the
>>> required nesting below root and that we treat root special anyway
>>> (exactly in the way necessary), I wonder whether it'd be better to
>>> simply allow root to be both domain and thread root.
>> Yes, root can be both domain and thread root. I haven't placed any
>> restriction on that.
> I've been playing with the proposed "make the parent resource domain".
> Unfortunately, the parent - child relationship becomes weird.
>
> The parent becomes the thread root, which means that its
> cgroup.threads file becomes writable and threads can be put in there.
> It's really weird to write to a child's interface and have the
> parent's behavior changed.  This becomes weirder with delegation.  If
> a cgroup is delegated, its cgroup.threads should be delegated too but
> if the child enables threaded mode, that makes the undelegated parent
> thread root, which means that either 1. the delegatee can't migrate
> threads to the thread root or 2. if the parent's cgroup.threads is
> writeable, the delegatee can mass with other descendants under it
> which shouldn't be allowed.
>
> I think the operation of making a cgroup a thread root should happen
> on the cgroup where that's requested; otherwise, nesting becomes too
> twisted.  This should be solvable.  Will think more about it.
>
> Thanks.
>
An alternative is to have separate enabling for thread root. For example,

# echo root > cgroup.threads
# echo enable > child/cgroup.threads

The first statement make the current cgroup the thread root. However,
setting it to a thread root doesn't make its child to be threaded. This
have to be explicitly done on each of the children. Once a child cgroup
is made to be threaded, all its descendants will be threaded. That will
have the same effect as the current patch.

With delegation, do you mean the relationship between a container and
its host?

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-24 Thread Tejun Heo
Hello,

On Wed, May 24, 2017 at 05:17:13PM -0400, Waiman Long wrote:
> An alternative is to have separate enabling for thread root. For example,
> 
> # echo root > cgroup.threads
> # echo enable > child/cgroup.threads
> 
> The first statement make the current cgroup the thread root. However,
> setting it to a thread root doesn't make its child to be threaded. This
> have to be explicitly done on each of the children. Once a child cgroup
> is made to be threaded, all its descendants will be threaded. That will
> have the same effect as the current patch.

Yeah, I'm toying with different ideas.  I'll get back to you once
things get more concrete.

> With delegation, do you mean the relationship between a container and
> its host?

It can be but doesn't have to be.  For example, it can be delegations
to users without namespace / container being involved.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kmod: preempt on kmod_umh_threads_get()

2017-05-24 Thread Luis R. Rodriguez
On Fri, May 19, 2017 at 03:27:12PM -0700, Dmitry Torokhov wrote:
> On Thu, May 18, 2017 at 08:24:43PM -0700, Luis R. Rodriguez wrote:
> > In theory it is possible multiple concurrent threads will try to
> > kmod_umh_threads_get() and as such atomic_inc(&kmod_concurrent) at
> > the same time, therefore enabling a small time during which we've
> > bumped kmod_concurrent but have not really enabled work. By using
> > preemption we mitigate this a bit.
> > 
> > Preemption is not needed when we kmod_umh_threads_put().
> > 
> > Signed-off-by: Luis R. Rodriguez 
> > ---
> >  kernel/kmod.c | 24 ++--
> >  1 file changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 563600fc9bb1..7ea11dbc7564 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -113,15 +113,35 @@ static int call_modprobe(char *module_name, int wait)
> >  
> >  static int kmod_umh_threads_get(void)
> >  {
> > +   int ret = 0;
> > +
> > +   /*
> > +* Disabling preemption makes sure that we are not rescheduled here
> > +*
> > +* Also preemption helps kmod_concurrent is not increased by mistake
> > +* for too long given in theory two concurrent threads could race on
> > +* atomic_inc() before we atomic_read() -- we know that's possible
> > +* and but we don't care, this is not used for object accounting and
> > +* is just a subjective threshold. The alternative is a lock.
> > +*/
> > +   preempt_disable();
> > atomic_inc(&kmod_concurrent);
> > if (atomic_read(&kmod_concurrent) <= max_modprobes)
> 
> That is very "fancy" way to basically say:
> 
>   if (atomic_inc_return(&kmod_concurrent) <= max_modprobes)

Do you mean to combine the atomic_inc() and atomic_read() in one as you noted
(as that is not a change in this patch), *or* that using a memory barrier here
with atomic_inc_return() should suffice to address the same and avoid an
explicit preemption  enable / disable ?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kmod: preempt on kmod_umh_threads_get()

2017-05-24 Thread Dmitry Torokhov
On Thu, May 25, 2017 at 02:14:52AM +0200, Luis R. Rodriguez wrote:
> On Fri, May 19, 2017 at 03:27:12PM -0700, Dmitry Torokhov wrote:
> > On Thu, May 18, 2017 at 08:24:43PM -0700, Luis R. Rodriguez wrote:
> > > In theory it is possible multiple concurrent threads will try to
> > > kmod_umh_threads_get() and as such atomic_inc(&kmod_concurrent) at
> > > the same time, therefore enabling a small time during which we've
> > > bumped kmod_concurrent but have not really enabled work. By using
> > > preemption we mitigate this a bit.
> > > 
> > > Preemption is not needed when we kmod_umh_threads_put().
> > > 
> > > Signed-off-by: Luis R. Rodriguez 
> > > ---
> > >  kernel/kmod.c | 24 ++--
> > >  1 file changed, 22 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > index 563600fc9bb1..7ea11dbc7564 100644
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -113,15 +113,35 @@ static int call_modprobe(char *module_name, int 
> > > wait)
> > >  
> > >  static int kmod_umh_threads_get(void)
> > >  {
> > > + int ret = 0;
> > > +
> > > + /*
> > > +  * Disabling preemption makes sure that we are not rescheduled here
> > > +  *
> > > +  * Also preemption helps kmod_concurrent is not increased by mistake
> > > +  * for too long given in theory two concurrent threads could race on
> > > +  * atomic_inc() before we atomic_read() -- we know that's possible
> > > +  * and but we don't care, this is not used for object accounting and
> > > +  * is just a subjective threshold. The alternative is a lock.
> > > +  */
> > > + preempt_disable();
> > >   atomic_inc(&kmod_concurrent);
> > >   if (atomic_read(&kmod_concurrent) <= max_modprobes)
> > 
> > That is very "fancy" way to basically say:
> > 
> > if (atomic_inc_return(&kmod_concurrent) <= max_modprobes)
> 
> Do you mean to combine the atomic_inc() and atomic_read() in one as you noted
> (as that is not a change in this patch), *or* that using a memory barrier here
> with atomic_inc_return() should suffice to address the same and avoid an
> explicit preemption  enable / disable ?

I am saying that atomic_inc_return() will avoid situation where you have
more than one threads incrementing the counter and believing that they
are [not] allowed to start modprobe.

I have no idea why you think preempt_disable() would help here. It only
ensures that current thread will not be preempted between the point
where you update the counter and where you check the result. It does not
stop interrupts nor does it affect other threads that might be updating
the same counter.

Thanks.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/3] watchdog: introduce watchdog.open_timeout commandline parameter

2017-05-24 Thread Guenter Roeck

On 05/22/2017 07:06 AM, Rasmus Villemoes wrote:

The watchdog framework takes care of feeding a hardware watchdog until
userspace opens /dev/watchdogN. If that never happens for some reason
(buggy init script, corrupt root filesystem or whatnot) but the kernel
itself is fine, the machine stays up indefinitely. This patch allows
setting an upper limit for how long the kernel will take care of the
watchdog, thus ensuring that the watchdog will eventually reset the
machine.

This is particularly useful for embedded devices where some fallback
logic is implemented in the bootloader (e.g., use a different root
partition, boot from network, ...).

The open timeout is also used as a maximum time for an application to
re-open /dev/watchdogN after closing it.

A value of 0 (the default) means infinite timeout, preserving the
current behaviour.

Signed-off-by: Rasmus Villemoes 
---
  Documentation/watchdog/watchdog-parameters.txt |  9 +
  drivers/watchdog/watchdog_dev.c| 26 +-
  2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/watchdog/watchdog-parameters.txt 
b/Documentation/watchdog/watchdog-parameters.txt
index 914518a..4801ec6 100644
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ b/Documentation/watchdog/watchdog-parameters.txt
@@ -8,6 +8,15 @@ See Documentation/admin-guide/kernel-parameters.rst for 
information on
  providing kernel parameters for builtin drivers versus loadable
  modules.
  
+The watchdog core currently understands one parameter,


We have a second parameter queued, handle_boot_enabled. Please rephrase
so we don't have to change the text with each new parameter.

Thanks,
Guenter


+watchdog.open_timeout. This is the maximum time, in milliseconds, for
+which the watchdog framework will take care of pinging a hardware
+watchdog until userspace opens the corresponding /dev/watchdogN
+device. A value of 0 (the default) means an infinite timeout. Setting
+this to a non-zero value can be useful to ensure that either userspace
+comes up properly, or the board gets reset and allows fallback logic
+in the bootloader to try something else.
+
  
  -

  acquirewdt:
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index caa4b90..c807067 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -66,6 +66,7 @@ struct watchdog_core_data {
struct mutex lock;
unsigned long last_keepalive;
unsigned long last_hw_keepalive;
+   unsigned long open_deadline;
struct delayed_work work;
unsigned long status;   /* Internal status bits */
  #define _WDOG_DEV_OPEN0   /* Opened ? */
@@ -80,6 +81,21 @@ static struct watchdog_core_data *old_wd_data;
  
  static struct workqueue_struct *watchdog_wq;
  
+static unsigned open_timeout;

+module_param(open_timeout, uint, 0644);
+
+static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
+{
+   if (!open_timeout)
+   return false;
+   return time_is_before_jiffies(data->open_deadline);
+}
+
+static void watchdog_set_open_deadline(struct watchdog_core_data *data)
+{
+   data->open_deadline = jiffies + msecs_to_jiffies(open_timeout);
+}
+
  static inline bool watchdog_need_worker(struct watchdog_device *wdd)
  {
/* All variables in milli-seconds */
@@ -196,7 +212,13 @@ static bool watchdog_worker_should_ping(struct 
watchdog_core_data *wd_data)
  {
struct watchdog_device *wdd = wd_data->wdd;
  
-	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));

+   if (!wdd)
+   return false;
+
+   if (watchdog_active(wdd))
+   return true;
+
+   return watchdog_hw_running(wdd) && 
!watchdog_past_open_deadline(wd_data);
  }
  
  static void watchdog_ping_work(struct work_struct *work)

@@ -857,6 +879,7 @@ static int watchdog_release(struct inode *inode, struct 
file *file)
watchdog_ping(wdd);
}
  
+	watchdog_set_open_deadline(wd_data);

watchdog_update_worker(wdd);
  
  	/* make sure that /dev/watchdog can be re-opened */

@@ -955,6 +978,7 @@ static int watchdog_cdev_register(struct watchdog_device 
*wdd, dev_t devno)
  
  	/* Record time of most recent heartbeat as 'just before now'. */

wd_data->last_hw_keepalive = jiffies - 1;
+   watchdog_set_open_deadline(wd_data);
  
  	/*

 * If the watchdog is running, prevent its driver from being unloaded,



--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kmod: preempt on kmod_umh_threads_get()

2017-05-24 Thread Luis R. Rodriguez
On Wed, May 24, 2017 at 05:45:37PM -0700, Dmitry Torokhov wrote:
> On Thu, May 25, 2017 at 02:14:52AM +0200, Luis R. Rodriguez wrote:
> > On Fri, May 19, 2017 at 03:27:12PM -0700, Dmitry Torokhov wrote:
> > > On Thu, May 18, 2017 at 08:24:43PM -0700, Luis R. Rodriguez wrote:
> > > > In theory it is possible multiple concurrent threads will try to
> > > > kmod_umh_threads_get() and as such atomic_inc(&kmod_concurrent) at
> > > > the same time, therefore enabling a small time during which we've
> > > > bumped kmod_concurrent but have not really enabled work. By using
> > > > preemption we mitigate this a bit.
> > > > 
> > > > Preemption is not needed when we kmod_umh_threads_put().
> > > > 
> > > > Signed-off-by: Luis R. Rodriguez 
> > > > ---
> > > >  kernel/kmod.c | 24 ++--
> > > >  1 file changed, 22 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > index 563600fc9bb1..7ea11dbc7564 100644
> > > > --- a/kernel/kmod.c
> > > > +++ b/kernel/kmod.c
> > > > @@ -113,15 +113,35 @@ static int call_modprobe(char *module_name, int 
> > > > wait)
> > > >  
> > > >  static int kmod_umh_threads_get(void)
> > > >  {
> > > > +   int ret = 0;
> > > > +
> > > > +   /*
> > > > +* Disabling preemption makes sure that we are not rescheduled 
> > > > here
> > > > +*
> > > > +* Also preemption helps kmod_concurrent is not increased by 
> > > > mistake
> > > > +* for too long given in theory two concurrent threads could 
> > > > race on
> > > > +* atomic_inc() before we atomic_read() -- we know that's 
> > > > possible
> > > > +* and but we don't care, this is not used for object 
> > > > accounting and
> > > > +* is just a subjective threshold. The alternative is a lock.
> > > > +*/
> > > > +   preempt_disable();
> > > > atomic_inc(&kmod_concurrent);
> > > > if (atomic_read(&kmod_concurrent) <= max_modprobes)
> > > 
> > > That is very "fancy" way to basically say:
> > > 
> > >   if (atomic_inc_return(&kmod_concurrent) <= max_modprobes)
> > 
> > Do you mean to combine the atomic_inc() and atomic_read() in one as you 
> > noted
> > (as that is not a change in this patch), *or* that using a memory barrier 
> > here
> > with atomic_inc_return() should suffice to address the same and avoid an
> > explicit preemption  enable / disable ?
> 
> I am saying that atomic_inc_return() will avoid situation where you have
> more than one threads incrementing the counter and believing that they
> are [not] allowed to start modprobe.
> 
> I have no idea why you think preempt_disable() would help here. It only
> ensures that current thread will not be preempted between the point
> where you update the counter and where you check the result. It does not
> stop interrupts nor does it affect other threads that might be updating
> the same counter.

The preemption was inspired by __module_get() and try_module_get(), was that
rather silly ?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 20/29] sync_file.txt: standardize document format

2017-05-24 Thread Gustavo Padovan
Hi Mauro,

2017-05-18 Mauro Carvalho Chehab :

> Each text file under Documentation follows a different
> format. Some doesn't even have titles!
> 
> Change its representation to follow the adopted standard,
> using ReST markups for it to be parseable by Sphinx:
> - Use markup for document title and authorship;
> - Mark literal blocks;
> - Use a numbered list for references.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/sync_file.txt | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)

We went ahead and applied this to drm-misc-next. Thanks.

Gustavo

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] kmod: preempt on kmod_umh_threads_get()

2017-05-24 Thread Dmitry Torokhov
On Thu, May 25, 2017 at 03:00:17AM +0200, Luis R. Rodriguez wrote:
> On Wed, May 24, 2017 at 05:45:37PM -0700, Dmitry Torokhov wrote:
> > On Thu, May 25, 2017 at 02:14:52AM +0200, Luis R. Rodriguez wrote:
> > > On Fri, May 19, 2017 at 03:27:12PM -0700, Dmitry Torokhov wrote:
> > > > On Thu, May 18, 2017 at 08:24:43PM -0700, Luis R. Rodriguez wrote:
> > > > > In theory it is possible multiple concurrent threads will try to
> > > > > kmod_umh_threads_get() and as such atomic_inc(&kmod_concurrent) at
> > > > > the same time, therefore enabling a small time during which we've
> > > > > bumped kmod_concurrent but have not really enabled work. By using
> > > > > preemption we mitigate this a bit.
> > > > > 
> > > > > Preemption is not needed when we kmod_umh_threads_put().
> > > > > 
> > > > > Signed-off-by: Luis R. Rodriguez 
> > > > > ---
> > > > >  kernel/kmod.c | 24 ++--
> > > > >  1 file changed, 22 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > > index 563600fc9bb1..7ea11dbc7564 100644
> > > > > --- a/kernel/kmod.c
> > > > > +++ b/kernel/kmod.c
> > > > > @@ -113,15 +113,35 @@ static int call_modprobe(char *module_name, int 
> > > > > wait)
> > > > >  
> > > > >  static int kmod_umh_threads_get(void)
> > > > >  {
> > > > > + int ret = 0;
> > > > > +
> > > > > + /*
> > > > > +  * Disabling preemption makes sure that we are not rescheduled 
> > > > > here
> > > > > +  *
> > > > > +  * Also preemption helps kmod_concurrent is not increased by 
> > > > > mistake
> > > > > +  * for too long given in theory two concurrent threads could 
> > > > > race on
> > > > > +  * atomic_inc() before we atomic_read() -- we know that's 
> > > > > possible
> > > > > +  * and but we don't care, this is not used for object 
> > > > > accounting and
> > > > > +  * is just a subjective threshold. The alternative is a lock.
> > > > > +  */
> > > > > + preempt_disable();
> > > > >   atomic_inc(&kmod_concurrent);
> > > > >   if (atomic_read(&kmod_concurrent) <= max_modprobes)
> > > > 
> > > > That is very "fancy" way to basically say:
> > > > 
> > > > if (atomic_inc_return(&kmod_concurrent) <= max_modprobes)
> > > 
> > > Do you mean to combine the atomic_inc() and atomic_read() in one as you 
> > > noted
> > > (as that is not a change in this patch), *or* that using a memory barrier 
> > > here
> > > with atomic_inc_return() should suffice to address the same and avoid an
> > > explicit preemption  enable / disable ?
> > 
> > I am saying that atomic_inc_return() will avoid situation where you have
> > more than one threads incrementing the counter and believing that they
> > are [not] allowed to start modprobe.
> > 
> > I have no idea why you think preempt_disable() would help here. It only
> > ensures that current thread will not be preempted between the point
> > where you update the counter and where you check the result. It does not
> > stop interrupts nor does it affect other threads that might be updating
> > the same counter.
> 
> The preemption was inspired by __module_get() and try_module_get(), was that
> rather silly ?

As far as I can see prrempt_disable() was needed in __module_get() when
modules user per-cpu refcounts: you did not want to move away from CPU
while manipulating refcount.

Now that modules use simple atomics for refcounting I think these
preempt_disable() and preempt_enable() can be removed.

Thanks.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] rtmutex: update rt-mutex-design

2017-05-24 Thread Alex Shi
The rt-mutex-design documents didn't gotten meaningful update from its
first version. Even after owner's pending bit was removed in commit 8161239a8bcc
("rtmutex: Simplify PI algorithm and make highest prio task get lock")
and priority list 'plist' changed to rbtree. And Peter Zijlstra did some
clean up and fix for deadline task changes on tip tree.

So update it to latest code and make it meaningful.

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Sebastian Siewior 
Cc: Mathieu Poirier 
Cc: Juri Lelli 
Cc: Thomas Gleixner 
To: linux-doc@vger.kernel.org
To: linux-ker...@vger.kernel.org
To: Jonathan Corbet 
To: Ingo Molnar 
To: Peter Zijlstra 
---
 Documentation/locking/rt-mutex-design.txt | 418 +++---
 1 file changed, 97 insertions(+), 321 deletions(-)

diff --git a/Documentation/locking/rt-mutex-design.txt 
b/Documentation/locking/rt-mutex-design.txt
index 8666070..1a0da32 100644
--- a/Documentation/locking/rt-mutex-design.txt
+++ b/Documentation/locking/rt-mutex-design.txt
@@ -97,9 +97,9 @@ waiter   - A waiter is a struct that is stored on the stack 
of a blocked
a process being blocked on the mutex, it is fine to allocate
the waiter on the process's stack (local variable).  This
structure holds a pointer to the task, as well as the mutex that
-   the task is blocked on.  It also has the plist node structures to
-   place the task in the waiter_list of a mutex as well as the
-   pi_list of a mutex owner task (described below).
+   the task is blocked on.  It also has a rbtree node structures to
+   place the task in waiters rbtree of a mutex as well as the
+   pi_waiters rbtree of a mutex owner task (described below).
 
waiter is sometimes used in reference to the task that is waiting
on a mutex. This is the same as waiter->task.
@@ -179,53 +179,35 @@ again.
  |
F->L5-+
 
-
-Plist
--
-
-Before I go further and talk about how the PI chain is stored through lists
-on both mutexes and processes, I'll explain the plist.  This is similar to
-the struct list_head functionality that is already in the kernel.
-The implementation of plist is out of scope for this document, but it is
-very important to understand what it does.
-
-There are a few differences between plist and list, the most important one
-being that plist is a priority sorted linked list.  This means that the
-priorities of the plist are sorted, such that it takes O(1) to retrieve the
-highest priority item in the list.  Obviously this is useful to store processes
-based on their priorities.
-
-Another difference, which is important for implementation, is that, unlike
-list, the head of the list is a different element than the nodes of a list.
-So the head of the list is declared as struct plist_head and nodes that will
-be added to the list are declared as struct plist_node.
-
+If process G has the highest priority in the chain, then all the tasks up
+the chain (A and B in this example), must have their priorities increased
+to that of G.
 
 Mutex Waiter List
 -
 
 Every mutex keeps track of all the waiters that are blocked on itself. The 
mutex
-has a plist to store these waiters by priority.  This list is protected by
+has a rbtree to store these waiters by priority.  This tree is protected by
 a spin lock that is located in the struct of the mutex. This lock is called
-wait_lock.  Since the modification of the waiter list is never done in
+wait_lock.  Since the modification of the waiter tree is never done in
 interrupt context, the wait_lock can be taken without disabling interrupts.
 
 
-Task PI List
+Task PI Tree
 
 
-To keep track of the PI chains, each process has its own PI list.  This is
-a list of all top waiters of the mutexes that are owned by the process.
-Note that this list only holds the top waiters and not all waiters that are
+To keep track of the PI chains, each process has its own PI rbtree.  This is
+a tree of all top waiters of the mutexes that are owned by the process.
+Note that this tree only holds the top waiters and not all waiters that are
 blocked on mutexes owned by the process.
 
-The top of the task's PI list is always the highest priority task that
+The top of the task's PI tree is always the highest priority task that
 is waiting on a mutex that is owned by the task.  So if the task has
 inherited a priority, it will always be the priority of the task that is
-at the top of this list.
+at the top of this tree.
 
-This list is stored in the task structure of a process as a plist called
-pi_list.  This list is protected by a spin lock also in the task structure,
+This tree is stored in the task structure of a process as a rbtree called
+pi_waiters.  It is protected by a spin lock also in the task structure,
 called pi_lock.  This lock may also be taken in interrupt context, so when
 locking the pi_lock, inter

[PATCH v2 2/3] rtmutex: update rt-mutex

2017-05-24 Thread Alex Shi
The rtmutex remove a pending owner bit in in rt_mutex::owner, in
commit 8161239a8bcc ("rtmutex: Simplify PI algorithm and make highest prio task 
get lock")
But the document was changed accordingly. Updating it to a meaningful
state.

BTW, as 'Steven Rostedt' mentioned:
There is still technically a "Pending Owner", it's just not called
that anymore. The pending owner happens to be the top_waiter of a lock
that has no owner and has been woken up to grab the lock.

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Sebastian Siewior 
Cc: Mathieu Poirier 
Cc: Juri Lelli 
Cc: Thomas Gleixner 
To: linux-doc@vger.kernel.org
To: linux-ker...@vger.kernel.org
To: Jonathan Corbet 
To: Ingo Molnar 
To: Peter Zijlstra 
---
 Documentation/locking/rt-mutex.txt | 58 +-
 1 file changed, 26 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/rt-mutex.txt 
b/Documentation/locking/rt-mutex.txt
index 243393d..35793e0 100644
--- a/Documentation/locking/rt-mutex.txt
+++ b/Documentation/locking/rt-mutex.txt
@@ -28,14 +28,13 @@ magic bullet for poorly designed applications, but it allows
 well-designed applications to use userspace locks in critical parts of
 an high priority thread, without losing determinism.
 
-The enqueueing of the waiters into the rtmutex waiter list is done in
+The enqueueing of the waiters into the rtmutex waiter tree is done in
 priority order. For same priorities FIFO order is chosen. For each
 rtmutex, only the top priority waiter is enqueued into the owner's
-priority waiters list. This list too queues in priority order. Whenever
+priority waiters tree. This tree too queues in priority order. Whenever
 the top priority waiter of a task changes (for example it timed out or
-got a signal), the priority of the owner task is readjusted. [The
-priority enqueueing is handled by "plists", see include/linux/plist.h
-for more details.]
+got a signal), the priority of the owner task is readjusted. The
+priority enqueueing is handled by "pi_waiters".
 
 RT-mutexes are optimized for fastpath operations and have no internal
 locking overhead when locking an uncontended mutex or unlocking a mutex
@@ -46,34 +45,29 @@ is used]
 The state of the rt-mutex is tracked via the owner field of the rt-mutex
 structure:
 
-rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1
-are used to keep track of the "owner is pending" and "rtmutex has
-waiters" state.
+lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
+keep track of the "lock has waiters" state.
 
- owner bit1bit0
- NULL  0   0   mutex is free (fast acquire possible)
- NULL  0   1   invalid state
- NULL  1   0   Transitional state*
- NULL  1   1   invalid state
- taskpointer   0   0   mutex is held (fast release possible)
- taskpointer   0   1   task is pending owner
- taskpointer   1   0   mutex is held and has waiters
- taskpointer   1   1   task is pending owner and mutex has waiters
+ ownerbit0
+ NULL 0   lock is free (fast acquire possible)
+ NULL 1   lock is free and has waiters and the top waiter
+   is going to take the lock*
+ taskpointer  0   lock is held (fast release possible)
+ taskpointer  1   lock is held and has waiters**
 
-Pending-ownership handling is a performance optimization:
-pending-ownership is assigned to the first (highest priority) waiter of
-the mutex, when the mutex is released. The thread is woken up and once
-it starts executing it can acquire the mutex. Until the mutex is taken
-by it (bit 0 is cleared) a competing higher priority thread can "steal"
-the mutex which puts the woken up thread back on the waiters list.
+The fast atomic compare exchange based acquire and release is only
+possible when bit 0 of lock->owner is 0.
 
-The pending-ownership optimization is especially important for the
-uninterrupted workflow of high-prio tasks which repeatedly
-takes/releases locks that have lower-prio waiters. Without this
-optimization the higher-prio thread would ping-pong to the lower-prio
-task [because at unlock time we always assign a new owner].
+(*) It also can be a transitional state when grabbing the lock
+with ->wait_lock is held. To prevent any fast path cmpxchg to the lock,
+we need to set the bit0 before looking at the lock, and the owner may be
+NULL in this small time, hence this can be a transitional state.
 
-(*) The "mutex has waiters" bit gets set to take the lock. If the lock
-doesn't already have an owner, this bit is quickly cleared if there are
-no waiters.  So this is a transitional state to synchronize with looking
-at the owner field of the mutex and the mutex owner releasing the lock.
+(**) There is a small time when bit 0 is set but there are no
+waiters. This can happen when grabbing the lock in the slow path.
+To prevent a cmpxchg of the owner releasing the lock