Re: [PATCH] docs: Bump version to 5.x

2019-01-14 Thread Andrew Donnellan

On 14/1/19 5:12 pm, Joel Stanley wrote:

This shows up in the index of https://www.kernel.org/doc/html/latest/ so
I figured it should be updated.

Fixes: bfeffd15528 ("Linux 5.0-rc1")
Signed-off-by: Joel Stanley 
--
We could also remove the version number instead of applying this patch.


You missed all the 4.x references in "Installing the kernel source" :)

I also don't think the README matches the contemporary understanding of 
"release notes" so perhaps that phrase should be dropped too.



---
  Documentation/admin-guide/README.rst | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/README.rst 
b/Documentation/admin-guide/README.rst
index 0797eec76be1..a09baa324951 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -1,9 +1,9 @@
  .. _readme:

-Linux kernel release 4.x 
+Linux kernel release 5.x 
  =

-These are the release notes for Linux version 4.  Read them carefully,
+These are the release notes for Linux version 5.  Read them carefully,
  as they tell you what this is all about, explain how to install the
  kernel, and what to do if something goes wrong.

@@ -406,3 +406,4 @@ If something goes wrong

 gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
 disregards the starting offset for which the kernel is compiled.
+



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



updating user verbs documentation

2019-01-14 Thread Joel Nider
A small patchset to update the verbs API documentation with some
information regarding the ioctl syscall. First patch converts the
file format to ReST, since this is the new preferred format. 2nd
patch links this file to the main index so we can actually find
it by browsing (search will work in any case). 3rd patch adds the
new content, documenting a bit of the internal workings of the
kernel side of the API functions. The goal is to make it easier
for developers unfamiliar with the structure to understand what
is going on when adding a new function.

[PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst
[PATCH 2/3] docs-rst: update index file with infiniband docs
[PATCH 3/3] docs-rst: infiniband: update verbs API details



[PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

2019-01-14 Thread Joel Nider
Replace the existing Documentation/infiniband/user_verbs.txt with
Documentation/infiniband/user_verbs.rst. No substantial changes to
the content - just some minor reformatting to have the rendering
come out nicely.
This is in preparation for updating the content in a subsequent
patch.

Signed-off-by: Joel Nider 
---
 Documentation/infiniband/user_verbs.rst | 70 +
 Documentation/infiniband/user_verbs.txt | 69 
 2 files changed, 70 insertions(+), 69 deletions(-)
 create mode 100644 Documentation/infiniband/user_verbs.rst
 delete mode 100644 Documentation/infiniband/user_verbs.txt

diff --git a/Documentation/infiniband/user_verbs.rst 
b/Documentation/infiniband/user_verbs.rst
new file mode 100644
index 000..ffc4aec
--- /dev/null
+++ b/Documentation/infiniband/user_verbs.rst
@@ -0,0 +1,70 @@
+======================
+Userspace Verbs Access
+======================
+The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
+enables direct userspace access to IB hardware via "verbs," as
+described in chapter 11 of the InfiniBand Architecture Specification.
+
+To use the verbs, the libibverbs library, available from
+https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
+device-independent API for using the ib_uverbs interface.
+libibverbs also requires appropriate device-dependent kernel and
+userspace driver for your InfiniBand hardware.  For example, to use
+a Mellanox HCA, you will need the ib_mthca kernel module and the
+libmthca userspace driver be installed.
+
+User-kernel communication
+=========================
+Userspace communicates with the kernel for slow path, resource
+management operations via the /dev/infiniband/uverbsN character
+devices.  Fast path operations are typically performed by writing
+directly to hardware registers mmap()ed into userspace, with no
+system call or context switch into the kernel.
+
+Commands are sent to the kernel via write()s on these device files.
+The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
+The structs for commands that require a response from the kernel
+contain a 64-bit field used to pass a pointer to an output buffer.
+Status is returned to userspace as the return value of the write()
+system call.
+
+Resource management
+===================
+Since creation and destruction of all IB resources is done by
+commands passed through a file descriptor, the kernel can keep track
+of which resources are attached to a given userspace context.  The
+ib_uverbs module maintains idr tables that are used to translate
+between kernel pointers and opaque userspace handles, so that kernel
+pointers are never exposed to userspace and userspace cannot trick
+the kernel into following a bogus pointer.
+
+This also allows the kernel to clean up when a process exits and
+prevent one process from touching another process's resources.
+
+Memory pinning
+==============
+Direct userspace I/O requires that memory regions that are potential
+I/O targets be kept resident at the same physical address.  The
+ib_uverbs module manages pinning and unpinning memory regions via
+get_user_pages() and put_page() calls.  It also accounts for the
+amount of memory pinned in the process's locked_vm, and checks that
+unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.
+
+Pages that are pinned multiple times are counted each time they are
+pinned, so the value of locked_vm may be an overestimate of the
+number of pages pinned by a process.
+
+/dev files
+==========
+To create the appropriate character device files automatically with
+udev, a rule like::
+
+   KERNEL=="uverbs*", NAME="infiniband/%k"
+
+can be used.  This will create device nodes named::
+
+/dev/infiniband/uverbs0
+
+and so on.  Since the InfiniBand userspace verbs should be safe for
+use by non-privileged processes, it may be useful to add an
+appropriate MODE or GROUP to the udev rule.
diff --git a/Documentation/infiniband/user_verbs.txt 
b/Documentation/infiniband/user_verbs.txt
deleted file mode 100644
index df049b9..000
--- a/Documentation/infiniband/user_verbs.txt
+++ /dev/null
@@ -1,69 +0,0 @@
-USERSPACE VERBS ACCESS
-
-  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
-  enables direct userspace access to IB hardware via "verbs," as
-  described in chapter 11 of the InfiniBand Architecture Specification.
-
-  To use the verbs, the libibverbs library, available from
-  https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
-  device-independent API for using the ib_uverbs interface.
-  libibverbs also requires appropriate device-dependent kernel and
-  userspace driver for your InfiniBand hardware.  For example, to use
-  a Mellanox HCA, you will need the ib_mthca kernel module and the
-  libmthca userspace driver be installed.
-
-User-kernel communication
-
-  Userspace communicates with the kernel for slow path, resource
-  management operations via

[PATCH 2/3] docs-rst: update index file with infiniband docs

2019-01-14 Thread Joel Nider
Link the previously converted Documentation/infiniband/user_verbs.rst
to the main index by creating a new subsystem (Infiniband) under the
root document. This manifests as a new section under "Kernel API
Documentation" in the index.html, as well as a new section in the
table of contents pane.

This has been tested with 'make htmldocs'.
---
 Documentation/conf.py  |  2 ++
 Documentation/index.rst|  1 +
 Documentation/infiniband/conf.py   | 10 ++
 Documentation/infiniband/index.rst |  9 +
 4 files changed, 22 insertions(+)
 create mode 100644 Documentation/infiniband/conf.py
 create mode 100644 Documentation/infiniband/index.rst

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 72647a3..ff71088 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -389,6 +389,8 @@ latex_documents = [
  'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
 ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
  'The kernel development community', 'manual'),
+('infiniband/index', 'infiniband.tex', 'Infiniband subsystem',
+ 'The kernel development community', 'manual'),
 ('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
  'The kernel development community', 'manual'),
 ('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To 
Hacking The Linux Kernel',
diff --git a/Documentation/index.rst b/Documentation/index.rst
index c858c2e..8d91ea5 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -82,6 +82,7 @@ needed).
core-api/index
media/index
networking/index
+   infiniband/index
input/index
gpu/index
security/index
diff --git a/Documentation/infiniband/conf.py b/Documentation/infiniband/conf.py
new file mode 100644
index 000..dc42d33
--- /dev/null
+++ b/Documentation/infiniband/conf.py
@@ -0,0 +1,10 @@
+# -*- coding: utf-8; mode: python -*-
+
+project = "Linux Infiniband Documentation"
+
+tags.add("subproject")
+
+latex_documents = [
+('index', 'infiniband.tex', project,
+ 'The kernel development community', 'manual'),
+]
diff --git a/Documentation/infiniband/index.rst 
b/Documentation/infiniband/index.rst
new file mode 100644
index 000..2dedc65
--- /dev/null
+++ b/Documentation/infiniband/index.rst
@@ -0,0 +1,9 @@
+Infiniband Documentation
+========================
+
+Contents:
+
+.. toctree::
+   :maxdepth: 1
+
+   user_verbs
-- 
2.7.4



[PATCH 3/3] docs-rst: infiniband: update verbs API details

2019-01-14 Thread Joel Nider
It is important to understand the existing framework when implementing
a new verb. The majority of existing API functions are implemented using
the write syscall, but this has been superseded by the ioctl syscall
for new commands. This patch updates the documentation regarding how
to go about implementing a new verb, focusing on the new ioctl
interface.

The documentation is far from complete, but this is a good step in the
right direction. Future patches can add more detail according to need.
Also, the interface is still undergoing substantial changes so an
effort was made to document only the stable parts so as to avoid
incorrect information since documentation changes tend to lag behind
code changes.
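
As a rough illustration of the two dispatch styles this patch documents
(not the kernel code itself), the Python sketch below models write() as a
direct index into a handler table and ioctl() as a keyed lookup. The
handler names, the command numbers and the (object, method) key layout are
assumptions made for the example, and a plain dict stands in for the
kernel's radix tree.

```python
# Illustrative model only: handler names and command numbers are hypothetical.

def create_cq(payload):
    """Stand-in for a command handler; echoes what it was asked to do."""
    return ("create_cq", payload)


def destroy_cq(payload):
    """Second stand-in handler."""
    return ("destroy_cq", payload)


# write(): the command number is used directly as an index into a table
# of function pointers (here, a list of callables).
WRITE_CMD_TABLE = [None, create_cq, destroy_cq]


def dispatch_write(cmd, payload):
    if not 0 <= cmd < len(WRITE_CMD_TABLE) or WRITE_CMD_TABLE[cmd] is None:
        raise ValueError(f"unsupported write command {cmd}")
    return WRITE_CMD_TABLE[cmd](payload)


# ioctl(): the handler (method) is found by searching on identifiers taken
# from the request header. The kernel uses a radix tree for this search;
# a dict is swapped in here purely for brevity.
IOCTL_METHODS = {
    ("cq", "create"): create_cq,
    ("cq", "destroy"): destroy_cq,
}


def dispatch_ioctl(object_id, method_id, attrs):
    handler = IOCTL_METHODS.get((object_id, method_id))
    if handler is None:
        raise ValueError(f"no method {method_id!r} on object {object_id!r}")
    return handler(attrs)
```

The practical difference for someone adding a verb is that the write()
path needs a free slot in a fixed table, while the ioctl() path needs a
method registered against an object.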

Signed-off-by: Joel Nider 
---
 Documentation/infiniband/user_verbs.rst | 69 -
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/Documentation/infiniband/user_verbs.rst 
b/Documentation/infiniband/user_verbs.rst
index ffc4aec..f0c7cd3 100644
--- a/Documentation/infiniband/user_verbs.rst
+++ b/Documentation/infiniband/user_verbs.rst
@@ -21,12 +21,79 @@ devices.  Fast path operations are typically performed by writing
 directly to hardware registers mmap()ed into userspace, with no
 system call or context switch into the kernel.
 
-Commands are sent to the kernel via write()s on these device files.
+There are currently two methods for executing commands in the kernel: write() and ioctl().
+Older commands are sent to the kernel via write()s on the device files
+mentioned earlier. New commands must use the ioctl() method. For completeness,
+both mechanisms are described here.
+
+The interface between userspace and kernel is kept in sync by checking the
+version number. In the kernel, it is defined by IB_USER_VERBS_ABI_VERSION
+(in include/uapi/rdma/ib_user_verbs.h).
+
+Write system call
+-----------------
 The ABI is defined in drivers/infiniband/include/ib_user_verbs.h.
 The structs for commands that require a response from the kernel
 contain a 64-bit field used to pass a pointer to an output buffer.
 Status is returned to userspace as the return value of the write()
 system call.
+The entry point to the kernel is the ib_uverbs_write() function, which is
+invoked as a response to the 'write' system call. The requested function is
+looked up from an array called uverbs_cmd_table which contains function pointers
+to the various command handlers.
+
+Write Command Handlers
+~~
+These command handler functions are declared
+with the IB_VERBS_DECLARE_CMD macro in drivers/infiniband/core/uverbs.h. There
+are also extended commands, which are kept in a similar manner in the
+uverbs_ex_cmd_table. The extended commands use 64-bit values in the command
+header, as opposed to the 32-bit values used in the regular command table.
+
+
+Ioctl system call
+-----------------
+The entry point for the 'ioctl' system call is the ib_uverbs_ioctl() function.
+Unlike write(), ioctl() accepts a 'cmd' parameter, which must have the value
+defined by RDMA_VERBS_IOCTL. More documentation regarding the ioctl numbering
+scheme can be found in: Documentation/ioctl/ioctl-number.txt. The
+command-specific information is passed as a pointer in the 'arg' parameter,
+which is cast as a 'struct ib_uverbs_ioctl_hdr*'.
+
+The way command handler functions (methods) are looked up is more complicated
+than the array index used for write(). Here, the ib_uverbs_cmd_verbs() function
+uses a radix tree to search for the correct command handler. If the lookup
+succeeds, the method is invoked by ib_uverbs_run_method().
+
+Ioctl Command Handlers
+~~
+Command handlers (also known as 'methods') for ioctl are declared with the
+UVERBS_HANDLER macro. The handler is registered for use by the
+DECLARE_UVERBS_NAMED_METHOD macro, which binds the name of the handler with its
+attributes. By convention, the methods are implemented in files named with the
+prefix 'uverbs_std_types_'.
+
+Each method can accept a set of parameters called attributes. There are 6
+types of attributes: idr, fd, pointer, enum, const and flags. The idr attribute
+declares an indirect (translated) handle for the method, and
+specifies the object that the method will act upon. The first attribute should
+be a handle to the uobj (ib_uobject) which contains private data. There may be
+0 or more
+additional attributes, including other handles. The 'pointer' attribute must be
+specified as 'in' or 'out', depending on if it is an input from userspace, or
+meant to return a value to userspace.
+
+The method also needs to be bound to an object, which is done with the
+DECLARE_UVERBS_NAMED_OBJECT macro. This macro takes a variable
+number of methods and stores them in an array attached to the object.
+
+Objects are declared using DECLARE_UVERBS_NAMED_OBJECT macro. Most of the
+objects (including pd, mw, cq, etc.) are defined in uverbs_std_types.c,
+and the remaining objects are declared in files that are prefixed

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 09:52:14AM +0800, lijiang wrote:
> I would like to remove this variable and post again.

No, you should remove the vmcoreinfo export too:

kernel/crash_core.c:398:VMCOREINFO_OSRELEASE(init_uts_ns.name.release);

after making sure userspace is not using it and *then* remove the
documentation.

But you can do that in a separate patch, so that it can be reverted if
there is trouble.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 01:30:30PM +0800, lijiang wrote:
> I noticed that the checkpatch was coded in Perl. But i am not familiar with
> the Perl program language, that would be beyond my ability to do this, i have
> to learn the Perl program language step by step. :-)

You could give it a try - it is not hard :-)

And there's no hurry for this, take your time.

> Do you mean this one 'KERNEL_IMAGE_SIZE'?

I mean, all those which are unused. Optimally, you should look at the
tools and see whether they're using those exports and if not, remove
them. But no hurry here too, take your time.

My final goal is to have this up-to-date documentation of what is
exported and what is used by user tools so that people can look at it
first before carelessly exporting yet another thing.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH] Documentation/dev-tools: Use gcc version number instead of svn revision number

2019-01-14 Thread Sebastian Andrzej Siewior
svn commit 231296 matches commit d29e939c63b71 ("Add fuzzing coverage
support") in the gcc git. The change is part of gcc 6.1.0.

Replace the svn commit number with a gcc version which everyone can
easily compare.

Signed-off-by: Sebastian Andrzej Siewior 
---
 Documentation/dev-tools/kcov.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kcov.rst b/Documentation/dev-tools/kcov.rst
index c2f6452e38ed0..42b6126777998 100644
--- a/Documentation/dev-tools/kcov.rst
+++ b/Documentation/dev-tools/kcov.rst
@@ -22,7 +22,7 @@ Prerequisites
 
 CONFIG_KCOV=y
 
-CONFIG_KCOV requires gcc built on revision 231296 or later.
+CONFIG_KCOV requires gcc 6.1.0 or later.
 
 If the comparison operands need to be collected, set::
 
-- 
2.20.1



Re: [PATCH v2 5/5] psi: introduce psi monitor

2019-01-14 Thread Peter Zijlstra
On Thu, Jan 10, 2019 at 02:07:18PM -0800, Suren Baghdasaryan wrote:
> +/*
> + * psi_update_work represents slowpath accounting part while
> + * psi_group_change represents hotpath part.
> + * There are two potential races between these path:
> + * 1. Changes to group->polling when slowpath checks for new stall, then
> + *hotpath records new stall and then slowpath resets group->polling
> + *flag. This leads to the exit from the polling mode while monitored
> + *states are still changing.
> + * 2. Slowpath overwriting an immediate update scheduled from the hotpath
> + *with a regular update further in the future and missing the
> + *immediate update.
> + * Both races are handled with a retry cycle in the slowpath:
> + *
> + *HOTPATH: |SLOWPATH:
> + * |
> + * A) times[cpu] += delta  | E) delta = times[*]
> + * B) start_poll = (delta[poll_mask] &&|if delta[poll_mask]:
> + *  cmpxchg(g->polling, 0, 1) == 0)| F)   polling_until = now +
> + * |  grace_period
> + * |if now > polling_until:
> + *if start_poll:   |  if g->polling:
> + * C)   mod_delayed_work(1)| G) g->polling = polling = 0
> + *else if !delayed_work_pending(): | H) goto SLOWPATH
> + * D)   schedule_delayed_work(PSI_FREQ)|else:
> + * |  if !g->polling:
> + * | I) g->polling = polling = 1
> + * | J) if delta && first_pass:
> + * |  next_avg = calculate_averages()
> + * |  if polling:
> + * |next_poll = poll_triggers()
> + * |if (delta && first_pass) || polling:
> + * | K)   mod_delayed_work(
> + * |  min(next_avg, next_poll))
> + * |  if !polling:
> + * |first_pass = false
> + * | L) goto SLOWPATH
> + *
> + * Race #1 is represented by (EABGD) sequence in which case slowpath
> + * deactivates polling mode because it misses new monitored stall and hotpath
> + * doesn't activate it because at (B) g->polling is not yet reset by slowpath
> + * in (G). This race is handled by the (H) retry, which in the race described
> + * above results in the new sequence of (EABGDHEIK) that reactivates polling
> + * mode.
> + *
> + * Race #2 is represented by polling==false && (JABCK) sequence which
> + * overwrites immediate update scheduled at (C) with a later (next_avg) update
> + * scheduled at (K). This race is handled by the (L) retry which results in the
> + * new sequence of polling==false && (JABCKLEIK) that reactivates polling mode
> + * and reschedules next polling update (next_poll).
> + *
> + * Note that retries can't result in an infinite loop because retry #1 happens
> + * only during polling reactivation and retry #2 happens only on the first
> + * pass. Constant reactivations are impossible because polling will stay active
> + * for at least grace_period. Worst case scenario involves two retries (HEJKLE)
> + */

I'm having a fairly hard time with this. There's a distinct lack of
memory ordering, and a suspicious mixing of atomic ops (cmpxchg) and
regular loads and stores (without READ_ONCE/WRITE_ONCE even).

Please clarify.

(also, you look to have a whole bunch of line-breaks that are really not
needed; concatenated, the line would not be over 80 chars).


Re: [PATCH v2] cpuidle: Add 'above' and 'below' idle state metrics

2019-01-14 Thread Daniel Lezcano


Hi Rafael,

sorry for the delay.

On 10/01/2019 11:20, Rafael J. Wysocki wrote:

[ ... ]

>>>   if (entered_state >= 0) {
>>> + s64 diff, delay = drv->states[entered_state].exit_latency;
>>> + int i;
>>> +
>>>   /*
>>>* Update cpuidle counters
>>>* This can be moved to within driver enter routine,
>>> @@ -260,6 +262,33 @@ int cpuidle_enter_state(struct cpuidle_d
>>>   dev->last_residency = (int)diff;
>>
>> Shouldn't we subtract the 'delay' from the computed 'diff' in any case ?
> 
> No.
> 
>> Otherwise the 'last_residency' accumulates the effective sleep time and
>> the time to wakeup. We are interested in the sleep time only for
>> prediction and metrics no ?
> 
> Yes, but 'delay' is the worst-case latency and not the actual one
> experienced, most of the time, and (on average) we would underestimate
> the sleep time if it was always subtracted.

IMO, the exit latency is more or less constant for the cpu power down
state. When it is the cluster power down state, the first cpu waking up
has the worst latency, then the others have the same latency as the cpu
power down state.

If we can model that, the gray area you mention below can be reduced.
There are platforms where the exit latency is very high [1] and not
taking it into account will give very wrong metrics.

> The idea here is to only count the wakeup as 'above' if the total
> 'last_residency' is below the target residency of the idle state that
> was asked for (as in that case we know for certain that the CPU has
> been woken up too early) and to only count it as 'below' if the
> difference between 'last_residency' and 'delay' is greater than or
> equal to the target residency of a deeper idle state (as in that case
> we know for certain that the CPU has been woken up too late).
> 
> Of course, this means that there is a "gray area" in which we are not
> really sure if the sleep time has matched the idle state that was
> asked for, but there's not much we can do about that IMO.

There is another aspect of the metric which can be improved: the 'above'
and the 'below' give a rough indication of the correctness of the
prediction but they don't tell us which idle state we should have
selected (we can be constantly choosing state3 instead of state1 for
example).

It would be nice to add a 'missed' field for each idle state, so when
we check if there is an 'above' or a 'below' condition, we increment the
'missed' field of the idle state we should have selected.
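
For readers following the 'above'/'below' rule quoted earlier in this
message, here is a small Python sketch of it. It is a model built from the
description in the thread, not the patch itself: the function name, the
dict field names and the sample numbers are all made up for illustration,
and states are assumed to be ordered from shallowest to deepest.

```python
# Sketch of the classification rule described in the thread; all names
# and the list-of-dicts layout are hypothetical.

def classify_wakeup(last_residency, entered, states):
    """Return 'above', 'below', or None for one idle period.

    states is ordered from shallowest to deepest; each entry carries
    'target_residency' and 'exit_latency' in microseconds.
    """
    state = states[entered]

    # Woken up before the target residency of the state that was asked
    # for: the CPU was certainly woken up too early.
    if last_residency < state["target_residency"]:
        return "above"

    # Subtract the worst-case exit latency, then compare with deeper
    # states: if the remainder still covers a deeper state's target
    # residency, the CPU was certainly woken up too late.
    slept = last_residency - state["exit_latency"]
    for deeper in states[entered + 1:]:
        if slept >= deeper["target_residency"]:
            return "below"

    # Everything else is the "gray area": neither counter is bumped.
    return None
```

With three example states of target residency 1, 20 and 100 us, entering
the middle one and staying 25 us falls into the gray area, which is the
case the discussion above is about.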

  -- Daniel

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/hisilicon/hi3660.dtsi#n199


-- 
  Linaro.org │ Open source software for ARM SoCs




Re: [PATCH v10 08/12] mfd: intel-peci-client: Add PECI client driver

2019-01-14 Thread Joel Stanley
On Tue, 8 Jan 2019 at 08:11, Jae Hyun Yoo  wrote:
>
> This commit adds PECI client driver.

It looks like it's a PECI driver for the three CPU families, and it
implements cpu and dimm temp, with sideband functions deferred to the
future. If you add that information with a few more words it would
make for a nicer commit message.

> Signed-off-by: Jae Hyun Yoo 

Reviewed-by: Joel Stanley 


Re: [PATCH v10 09/12] Documentation: hwmon: Add documents for PECI hwmon client drivers

2019-01-14 Thread Joel Stanley
On Tue, 8 Jan 2019 at 08:11, Jae Hyun Yoo  wrote:
>
> This commit adds hwmon documents for PECI cputemp and dimmtemp drivers.
>
> Cc: Guenter Roeck 
> Cc: Jean Delvare 
> Cc: Jonathan Corbet 
> Cc: Jason M Biils 
> Cc: Randy Dunlap 
> Signed-off-by: Jae Hyun Yoo 
> Reviewed-by: Haiyue Wang 
> Reviewed-by: James Feist 
> Reviewed-by: Vernon Mauery 
> Acked-by: Guenter Roeck 

Reviewed-by: Joel Stanley 


[PATCH] docs/core-api: memory-allocation: add mention of kmem_cache_create_usercopy

2019-01-14 Thread Mike Rapoport
Mention that when a part of a slab cache might be exported to the
userspace, the cache should be created using kmem_cache_create_usercopy()

Signed-off-by: Mike Rapoport 
---
 Documentation/core-api/memory-allocation.rst | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst 
b/Documentation/core-api/memory-allocation.rst
index 8954a88..51a200d 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -113,9 +113,11 @@ see :c:func:`kvmalloc_node` reference documentation. Note that
 
 If you need to allocate many identical objects you can use the slab
 cache allocator. The cache should be set up with
-:c:func:`kmem_cache_create` before it can be used. Afterwards
-:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate
-memory from that cache.
+:c:func:`kmem_cache_create` or :c:func:`kmem_cache_create_usercopy`
+before it can be used. The second function should be used if a part of
+the cache might be copied to the userspace.  After the cache is
+created :c:func:`kmem_cache_alloc` and its convenience wrappers can
+allocate memory from that cache.
 
 When the allocated memory is no longer needed it must be freed. You
 can use :c:func:`kvfree` for the memory allocated with `kmalloc`,
-- 
2.7.4



[PATCH 1/1 RESEND] doc: net: fix bad references to network drivers

2019-01-14 Thread Otto Sabart
Fix "reference to nonexisting document" warnings.

Fixes: b255e500c8dc ("net: documentation: build a directory structure
for drivers")
Signed-off-by: Otto Sabart 
---
 Documentation/networking/index.rst | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/index.rst 
b/Documentation/networking/index.rst
index 6a47629ef8ed..59e86de662cd 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -11,19 +11,19 @@ Contents:
batman-adv
can
can_ucan_protocol
-   dpaa2/index
-   e100
-   e1000
-   e1000e
-   fm10k
-   igb
-   igbvf
-   ixgb
-   ixgbe
-   ixgbevf
-   i40e
-   iavf
-   ice
+   device_drivers/freescale/dpaa2/index
+   device_drivers/intel/e100
+   device_drivers/intel/e1000
+   device_drivers/intel/e1000e
+   device_drivers/intel/fm10k
+   device_drivers/intel/igb
+   device_drivers/intel/igbvf
+   device_drivers/intel/ixgb
+   device_drivers/intel/ixgbe
+   device_drivers/intel/ixgbevf
+   device_drivers/intel/i40e
+   device_drivers/intel/iavf
+   device_drivers/intel/ice
kapi
z8530book
msg_zerocopy
-- 
2.17.2





Re: [PATCH] fgraph: record function return value

2019-01-14 Thread Mark Rutland
On Sat, Jan 12, 2019 at 02:57:01PM +0800, Changbin Du wrote:
> This patch adds a new trace option 'funcgraph-retval' and is disabled by
> default. When this option is enabled, fgraph tracer will show the return
> value of each function. This is useful to find/analyze a original error
> source in a call graph.
> 
> One limitation is that kernel doesn't know the prototype of functions. So
> fgraph assumes all functions have a retvalue of type int. You must ignore
> the value of *void* function. And if the retvalue looks like an error code
> then both hexadecimal and decimal number are displayed.

This sounds more confusing than helpful, and it sounds like this has
overlap with FTRACE_WITH_REGS functionality.
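
To make the limitation concrete, the Python sketch below models what
"assume every function returns int" implies for a 64-bit return register.
The output format and function names are hypothetical, not taken from the
patch; the only kernel convention relied on is that error codes occupy
the range -1..-MAX_ERRNO (4095).

```python
# Hypothetical model of interpreting a raw 64-bit return register as int.

MAX_ERRNO = 4095  # kernel convention: errors are -1 .. -4095


def as_s32(raw):
    """Truncate a raw register value to a signed 32-bit integer."""
    low = raw & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def format_retval(raw):
    """Show hex always, and decimal too when the value looks like an errno."""
    low = raw & 0xFFFFFFFF
    val = as_s32(raw)
    if -MAX_ERRNO <= val < 0:
        return f"ret={low:#x}/{val}"
    return f"ret={low:#x}"
```

Note how a 64-bit pointer or a leftover register value from a void
function is silently truncated to its low 32 bits, so the reader of the
trace cannot tell it apart from a genuine int result.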

> diff --git a/arch/arm64/kernel/entry-ftrace.S 
> b/arch/arm64/kernel/entry-ftrace.S
> index 81b8eb5c4633..223f4ad269d4 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -202,6 +202,7 @@ ENTRY(return_to_handler)
>   stp x4, x5, [sp, #32]
>   stp x6, x7, [sp, #48]
>  
> + mov x1, x0  // return value
>   mov x0, x29 // parent's fp
>   bl  ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
>   mov x30, x0 // restore the original return address

What about indirect return values? Those go via x8.

Additionally, in some cases (e.g. static functions with cross-function
optimization), the compiler might not follow the usual PCS, so the
return value might not be in x0 regardless. Maybe such functions aren't
hooked by ftrace today?

Generally, I don't think that this is going to be reliable.

> +config HAVE_FTRACE_RETVAL
> + bool
> +
>  config HAVE_DYNAMIC_FTRACE
>   bool
>   help
> @@ -160,6 +163,7 @@ config FUNCTION_GRAPH_TRACER
>   depends on HAVE_FUNCTION_GRAPH_TRACER
>   depends on FUNCTION_TRACER
>   depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
> + select HAVE_FTRACE_RETVAL if (X86 || ARM)

... but not arm64?

Thanks,
Mark.


Re: [RFC AFBC 03/12] drm/afbc: Add AFBC modifier usage documentation

2019-01-14 Thread Jani Nikula
On Fri, 11 Jan 2019, Liviu Dudau  wrote:
> On Thu, Jan 03, 2019 at 05:44:26PM -0300, Ezequiel Garcia wrote:
>> Hi Liviu,
>> 
>> On Mon, 2018-12-03 at 11:31 +, Ayan Halder wrote:
>> > From: Brian Starkey 
>> > 
>> > AFBC is a flexible, proprietary, lossless compression protocol and
>> > format, with a number of defined DRM format modifiers. To facilitate
>> > consistency and compatibility between different AFBC producers and
>> > consumers, document the expectations for usage of the AFBC DRM format
>> > modifiers in a new .rst chapter.
>> > 
>> > Signed-off-by: Brian Starkey 
>> > Reviewed-by: Liviu Dudau 
>> > ---
>> 
>> I can't find this commit anywhere. Did you decide to reject
>> this or perhaps it just fell thru the cracks?
>
> Cracks have opened wide enough to let this through, sorry about that!
>
> I've now sent a pull request to get it merged.

Okay, so this is a very late comment, so feel free to ignore or,
perhaps, add a change on top.

Documentation/gpu mostly contains files that document high level stuff,
mostly one file per driver (with names matching the directories under
drivers/gpu/drm) or one file per drm core functional area.

Perhaps start an arm.rst, or at least name it more descriptively, say
arm-fbc.rst? Contrast msm-crash-dump.rst.

BR,
Jani.


>
> Best regards,
> Liviu
>
>> 
>> Thanks!
>> Ezequiel
>> 
>> 
>> >  Documentation/gpu/afbc.rst| 233 
>> > ++
>> >  Documentation/gpu/drivers.rst |   1 +
>> >  MAINTAINERS   |   1 +
>> >  include/uapi/drm/drm_fourcc.h |   3 +
>> >  4 files changed, 238 insertions(+)
>> >  create mode 100644 Documentation/gpu/afbc.rst
>> > 
>> > diff --git a/Documentation/gpu/afbc.rst b/Documentation/gpu/afbc.rst
>> > new file mode 100644
>> > index 000..922d955
>> > --- /dev/null
>> > +++ b/Documentation/gpu/afbc.rst
>> > @@ -0,0 +1,233 @@
>> > +====================================
>> > + Arm Framebuffer Compression (AFBC)
>> > +====================================
>> > +
>> > +AFBC is a proprietary lossless image compression protocol and format.
>> > +It provides fine-grained random access and minimizes the amount of
>> > +data transferred between IP blocks.
>> > +
>> > +AFBC can be enabled on drivers which support it via use of the AFBC
>> > +format modifiers defined in drm_fourcc.h. See DRM_FORMAT_MOD_ARM_AFBC(*).
>> > +
>> > +All users of the AFBC modifiers must follow the usage guidelines laid
>> > +out in this document, to ensure compatibility across different AFBC
>> > +producers and consumers.
>> > +
>> > +Components and Ordering
>> > +=======================
>> > +
>> > +AFBC streams can contain several components - where a component
>> > +corresponds to a color channel (i.e. R, G, B, X, A, Y, Cb, Cr).
>> > +The assignment of input/output color channels must be consistent
>> > +between the encoder and the decoder for correct operation, otherwise
>> > +the consumer will interpret the decoded data incorrectly.
>> > +
>> > +Furthermore, when the lossless colorspace transform is used
>> > +(AFBC_FORMAT_MOD_YTR, which should be enabled for RGB buffers for
>> > +maximum compression efficiency), the component order must be:
>> > +
>> > + * Component 0: R
>> > + * Component 1: G
>> > + * Component 2: B
>> > +
>> > +The component ordering is communicated via the fourcc code in the
>> > +fourcc:modifier pair. In general, component '0' is considered to
>> > +reside in the least-significant bits of the corresponding linear
>> > +format. For example, COMP(bits):
>> > +
>> > + * DRM_FORMAT_ABGR8888
>> > +
>> > +   * Component 0: R(8)
>> > +   * Component 1: G(8)
>> > +   * Component 2: B(8)
>> > +   * Component 3: A(8)
>> > +
>> > + * DRM_FORMAT_BGR888
>> > +
>> > +   * Component 0: R(8)
>> > +   * Component 1: G(8)
>> > +   * Component 2: B(8)
>> > +
>> > + * DRM_FORMAT_YUYV
>> > +
>> > +   * Component 0: Y(8)
>> > +   * Component 1: Cb(8, 2x1 subsampled)
>> > +   * Component 2: Cr(8, 2x1 subsampled)
>> > +
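
For illustration only, here is a rough sketch of what "component 0 in the
least-significant bits" means for a 32-bit ABGR pixel with 8 bits per
component. The struct and function names are made up for this example; they
are not part of drm_fourcc.h or of any AFBC driver.

```c
#include <stdint.h>

/* Hypothetical container for the four 8-bit components of one pixel. */
struct sketch_abgr_components {
	uint8_t r;	/* component 0: bits  7..0  */
	uint8_t g;	/* component 1: bits 15..8  */
	uint8_t b;	/* component 2: bits 23..16 */
	uint8_t a;	/* component 3: bits 31..24 */
};

/* Split a 32-bit ABGR pixel into components, component 0 first. */
static struct sketch_abgr_components sketch_unpack_abgr(uint32_t pixel)
{
	struct sketch_abgr_components c = {
		.r = (uint8_t)(pixel & 0xff),
		.g = (uint8_t)((pixel >> 8) & 0xff),
		.b = (uint8_t)((pixel >> 16) & 0xff),
		.a = (uint8_t)((pixel >> 24) & 0xff),
	};

	return c;
}
```

With pixel 0x44332211 this gives R=0x11, G=0x22, B=0x33 and A=0x44, which
matches the component list quoted above.
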
>> > +In AFBC, 'X' components are not treated any differently from any other
>> > +component. Therefore, an AFBC buffer with fourcc DRM_FORMAT_XBGR8888
>> > +encodes with 4 components, like so:
>> > +
>> > + * DRM_FORMAT_XBGR8888
>> > +
>> > +   * Component 0: R(8)
>> > +   * Component 1: G(8)
>> > +   * Component 2: B(8)
>> > +   * Component 3: X(8)
>> > +
>> > +Please note, however, that the inclusion of a "wasted" 'X' channel is
>> > +bad for compression efficiency, and so it's recommended to avoid
>> > +formats containing 'X' bits. If a fourth component is
>> > +required/expected by the encoder/decoder, then it is recommended to
>> > +instead use an equivalent format with alpha, setting all alpha bits to
>> > +'1'. If there is no requirement for a fourth component, then a format
>> > +which doesn't include alpha can be used, e.g. DRM_FORMAT_BGR888.
>> > +
>> > +Number of Planes
>> > +================
>> > +
>> > +Formats which are typically multi-planar in linear layouts (e.g. YUV
>> > +420), 

Re: [RFC AFBC 03/12] drm/afbc: Add AFBC modifier usage documentation

2019-01-14 Thread Brian Starkey
Hi Jani,

On Mon, Jan 14, 2019 at 02:23:46PM +0200, Jani Nikula wrote:
> On Fri, 11 Jan 2019, Liviu Dudau  wrote:
> > On Thu, Jan 03, 2019 at 05:44:26PM -0300, Ezequiel Garcia wrote:
> >> Hi Liviu,
> >> 
> >> On Mon, 2018-12-03 at 11:31 +0000, Ayan Halder wrote:
> >> > From: Brian Starkey 
> >> > 
> >> > AFBC is a flexible, proprietary, lossless compression protocol and
> >> > format, with a number of defined DRM format modifiers. To facilitate
> >> > consistency and compatibility between different AFBC producers and
> >> > consumers, document the expectations for usage of the AFBC DRM format
> >> > modifiers in a new .rst chapter.
> >> > 
> >> > Signed-off-by: Brian Starkey 
> >> > Reviewed-by: Liviu Dudau 
> >> > ---
> >> 
> >> I can't find this commit anywhere. Did you decide to reject
> >> this or perhaps it just fell thru the cracks?
> >
> > Cracks have opened wide enough to let this through, sorry about that!
> >
> > I've now sent a pull request to get it merged.
> 
> Okay, so this is a very late comment, so feel free to ignore or,
> perhaps, add a change on top.
> 
> Documentation/gpu mostly contains files that document high level stuff,
> mostly one file per driver (with names matching the directories under
> drivers/gpu/drm) or one file per drm core functional area.
> 
> Perhaps start an arm.rst, or at least name it more descriptively, say
> arm-fbc.rst? Contrast msm-crash-dump.rst.

I did deliberately put it at the top-level, as AFBC is implemented by
IPs from many different vendors. The intention of this file is to try
and ensure interop between those different vendors' drivers. I fear
that if we namespace it 'arm' then it will be regarded as
Arm-specific, whereas it's meant to set the standard for the AFBC
implementations in all vendors' DRM drivers. That only applies if they
use the AFBC modifiers Arm has defined, but IMO that's what we should
be pushing for instead of having each vendor define their own local
AFBC modifiers, because that will make interop a nightmare. The Arm
definitions should cover all conformant implementations.

AFBC is a relatively well-established name, whereas arm-fbc is not a
term anyone will be familiar with. We could name the file
"arm-afbc.rst", though I am slightly against namespacing it that way
for the reason mentioned above - it does/will/should apply to more
than just the gpu/drm/arm tree.

Best regards,
-Brian

> 
> BR,
> Jani.
> 
> 
> >
> > Best regards,
> > Liviu
> >
> >> 
> >> Thanks!
> >> Ezequiel
> >> 
> >> 
> >> >  Documentation/gpu/afbc.rst| 233 
> >> > ++
> >> >  Documentation/gpu/drivers.rst |   1 +
> >> >  MAINTAINERS   |   1 +
> >> >  include/uapi/drm/drm_fourcc.h |   3 +
> >> >  4 files changed, 238 insertions(+)
> >> >  create mode 100644 Documentation/gpu/afbc.rst
> >> > 
> >> > diff --git a/Documentation/gpu/afbc.rst b/Documentation/gpu/afbc.rst
> >> > new file mode 100644
> >> > index 0000000..922d955
> >> > --- /dev/null
> >> > +++ b/Documentation/gpu/afbc.rst
> >> > @@ -0,0 +1,233 @@
> >> > +===================================
> >> > + Arm Framebuffer Compression (AFBC)
> >> > +===================================
> >> > +
> >> > +AFBC is a proprietary lossless image compression protocol and format.
> >> > +It provides fine-grained random access and minimizes the amount of
> >> > +data transferred between IP blocks.
> >> > +
> >> > +AFBC can be enabled on drivers which support it via use of the AFBC
> >> > +format modifiers defined in drm_fourcc.h. See 
> >> > DRM_FORMAT_MOD_ARM_AFBC(*).
> >> > +
> >> > +All users of the AFBC modifiers must follow the usage guidelines laid
> >> > +out in this document, to ensure compatibility across different AFBC
> >> > +producers and consumers.
> >> > +
> >> > +Components and Ordering
> >> > +=======================
> >> > +
> >> > +AFBC streams can contain several components - where a component
> >> > +corresponds to a color channel (i.e. R, G, B, X, A, Y, Cb, Cr).
> >> > +The assignment of input/output color channels must be consistent
> >> > +between the encoder and the decoder for correct operation, otherwise
> >> > +the consumer will interpret the decoded data incorrectly.
> >> > +
> >> > +Furthermore, when the lossless colorspace transform is used
> >> > +(AFBC_FORMAT_MOD_YTR, which should be enabled for RGB buffers for
> >> > +maximum compression efficiency), the component order must be:
> >> > +
> >> > + * Component 0: R
> >> > + * Component 1: G
> >> > + * Component 2: B
> >> > +
> >> > +The component ordering is communicated via the fourcc code in the
> >> > +fourcc:modifier pair. In general, component '0' is considered to
> >> > +reside in the least-significant bits of the corresponding linear
> >> > +format. For example, COMP(bits):
> >> > +
> >> > + * DRM_FORMAT_ABGR8888
> >> > +
> >> > +   * Component 0: R(8)
> >> > +   * Component 1: G(8)
> >> > +   * Component 2: B(8)
> >> > +   * Component 3: A(8)
> >> > +
> >> > + * DRM_FORMAT_BGR

Re: [RFC AFBC 03/12] drm/afbc: Add AFBC modifier usage documentation

2019-01-14 Thread Jani Nikula
On Mon, 14 Jan 2019, Brian Starkey  wrote:
> AFBC is a relatively well-established name, whereas arm-fbc is not a
> term anyone will be familiar with.

First time I ever heard of AFBC. ;)

> We could name the file "arm-afbc.rst", though I am slightly against
> namespacing it that way for the reason mentioned above - it
> does/will/should apply to more than just the gpu/drm/arm tree.

Fair enough. It's not like the name is part of the ABI, so I guess let's
go with this, and we can rename later if we come up with a better name.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH 0/5] Improve the latency tracers

2019-01-14 Thread Steven Rostedt
On Sat, 12 Jan 2019 12:05:33 +0800
Changbin Du  wrote:

> Hi Steven, Have you checked this series yet? :)
> 


Not yet, I'll try to do it today.

-- Steve


Re: [PATCH] Documentation: fix coding-style.rst Sphinx warning

2019-01-14 Thread Jonathan Corbet
On Sun, 13 Jan 2019 19:28:58 -0800
Randy Dunlap  wrote:

> Fix Sphinx warning in coding-style.rst:
> 
> Documentation/process/coding-style.rst:446: WARNING: Inline interpreted text 
> or phrase reference start-string without end-string.
> 
> Signed-off-by: Randy Dunlap 

Applied, thanks.

jon


Re: [PATCH] Documentation: add ibmvmc to toctree(index) and fix warnings

2019-01-14 Thread Jonathan Corbet
On Sun, 13 Jan 2019 19:21:46 -0800
Randy Dunlap  wrote:

> Fix Sphinx warnings in ibmvmc.rst, add an index.rst file in
> Documentation/misc-devices/, and insert that index file into the
> top-level index file.
> 
> Documentation/misc-devices/ibmvmc.rst:2: WARNING: Explicit markup ends 
> without a blank line; unexpected unindent.
> Documentation/misc-devices/ibmvmc.rst:: WARNING: document isn't included in 
> any toctree
> 
> Signed-off-by: Randy Dunlap 
> Cc: Steven Royer 
> Cc: Jonathan Corbet 

Rather than make another new top-level entry to add to the mess there, I
do wonder if this isn't better placed in the driver-api manual.  I *think*
that's appropriate, though the document is a bit vague to use that way.

Oh well, bringing it in and killing some warnings is better than what we
have now, so I've applied it.  We can always move it around later.

Thanks,

jon


Re: [PATCH] fgraph: record function return value

2019-01-14 Thread Changbin Du
On Mon, Jan 14, 2019 at 12:11:56PM +0000, Mark Rutland wrote:
> On Sat, Jan 12, 2019 at 02:57:01PM +0800, Changbin Du wrote:
> > This patch adds a new trace option 'funcgraph-retval' and is disabled by
> > default. When this option is enabled, fgraph tracer will show the return
> > value of each function. This is useful to find/analyze an original error
> > source in a call graph.
> > 
> > One limitation is that kernel doesn't know the prototype of functions. So
> > fgraph assumes all functions have a retvalue of type int. You must ignore
> > the value of *void* function. And if the retvalue looks like an error code
> > then both hexadecimal and decimal number are displayed.
> 
> This sounds more confusing than helpful, and it sounds like this has
> overlap with FTRACE_WITH_REGS functionality.
> 
Actually this is similar to return probes. Kprobes are in the same
situation and use regs_return_value() to get the return value. On x86 it is:

static inline long regs_return_value(struct pt_regs *regs)
{
	return PT_REGS_AX(regs);
}

On arm it is:

static inline long regs_return_value(struct pt_regs *regs)
{
	return regs->ARM_r0;
}

Due to the lack of prototype info we cannot handle complex types.

FTRACE_WITH_REGS saves all general registers but here only one.
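
To make the "assume int, show hex and decimal if it looks like an error
code" idea concrete, here is a small userspace sketch. It is not the code
from the patch: the function names are invented, the output format is only
an example, and the "looks like an error code" test is an assumption that
borrows the [-4095, -1] range the kernel uses for MAX_ERRNO.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_ERRNO 4095	/* assumption: mirrors the kernel's MAX_ERRNO */

/* Treat small negative ints as probable -errno values. */
static int sketch_looks_like_errno(int v)
{
	return v < 0 && v >= -SKETCH_MAX_ERRNO;
}

/* Format one raw return register without knowing the function prototype. */
static void sketch_format_retval(char *buf, size_t len, uint64_t raw)
{
	/* No prototype info is available, so interpret the low 32 bits as int. */
	int as_int = (int)(uint32_t)raw;

	if (sketch_looks_like_errno(as_int))
		snprintf(buf, len, "ret=0x%" PRIx32 "/%d", (uint32_t)raw, as_int);
	else
		snprintf(buf, len, "ret=0x%" PRIx64, raw);
}
```

A pointer-sized value falls through to the plain hex branch, and a void
function still produces whatever happened to be left in the register, which
is the limitation described in the commit message.
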
> > diff --git a/arch/arm64/kernel/entry-ftrace.S 
> > b/arch/arm64/kernel/entry-ftrace.S
> > index 81b8eb5c4633..223f4ad269d4 100644
> > --- a/arch/arm64/kernel/entry-ftrace.S
> > +++ b/arch/arm64/kernel/entry-ftrace.S
> > @@ -202,6 +202,7 @@ ENTRY(return_to_handler)
> > stp x4, x5, [sp, #32]
> > stp x6, x7, [sp, #48]
> >  
> > +   mov x1, x0  // return value
> > mov x0, x29 // parent's fp
> > bl  ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
> > mov x30, x0 // restore the original return address
> 
> What about indirect return values? Those go via x8.
> 
> Additionally, in some cases (e.g. static functions with cross-function
> optimization), the compiler might not follow the usual PCS, so the
> return value might not be in x0 regardless. Maybe such functions aren't
> hooked by ftrace today?
I think these functions have been optimized out so ftrace will not trace
them.

> 
> Generally, I don't think that this is going to be reliable.
> 
> > +config HAVE_FTRACE_RETVAL
> > +   bool
> > +
> >  config HAVE_DYNAMIC_FTRACE
> > bool
> > help
> > @@ -160,6 +163,7 @@ config FUNCTION_GRAPH_TRACER
> > depends on HAVE_FUNCTION_GRAPH_TRACER
> > depends on FUNCTION_TRACER
> > depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
> > +   select HAVE_FTRACE_RETVAL if (X86 || ARM)
> 
> ... but not arm64?
> 
> Thanks,
> Mark.

-- 
Thanks,
Changbin Du


Re: [PATCH] fgraph: record function return value

2019-01-14 Thread Steven Rostedt
On Sat, 12 Jan 2019 14:57:01 +0800
Changbin Du  wrote:

> This patch adds a new trace option 'funcgraph-retval' and is disabled by
> default. When this option is enabled, fgraph tracer will show the return
> value of each function. This is useful to find/analyze an original error
> source in a call graph.
> 
> One limitation is that kernel doesn't know the prototype of functions. So
> fgraph assumes all functions have a retvalue of type int. You must ignore
> the value of *void* function. And if the retvalue looks like an error code
> then both hexadecimal and decimal number are displayed.
> 
> In this patch, only x86 and ARM platforms are supported.
> 
> Here is example showing the error is caused by vmx_create_vcpu() and the
> error code is -5 (-EIO).
> 
> with echo 1 > /sys/kernel/debug/tracing/options/funcgraph-retval
> 
>  3)   |  kvm_vm_ioctl() {
>  3)   |mutex_lock() {
>  3)   |  _cond_resched() {
>  3)   0.234 us|rcu_all_qs(); /* ret=0x8000 */
>  3)   0.704 us|  } /* ret=0x0 */
>  3)   1.226 us|} /* ret=0x0 */
>  3)   0.247 us|mutex_unlock(); /* ret=0x8880738ed040 */
>  3)   |kvm_arch_vcpu_create() {
>  3)   |  vmx_create_vcpu() {
>  3) + 17.969 us   |kmem_cache_alloc(); /* ret=0x88813a980040 */
>  3) + 15.948 us   |kmem_cache_alloc(); /* ret=0x88813aa99200 */
>  3)   0.653 us|allocate_vpid.part.88(); /* ret=0x1 */
>  3)   6.964 us|kvm_vcpu_init(); /* ret=0xfffb */
>  3)   0.323 us|free_vpid.part.89(); /* ret=0x1 */
>  3)   9.985 us|kmem_cache_free(); /* ret=0x8000 */
>  3)   9.491 us|kmem_cache_free(); /* ret=0x8000 */
>  3) + 69.858 us   |  } /* ret=0xfffb/-5 */
>  3) + 70.631 us   |} /* ret=0xfffb/-5 */
>  3)   |mutex_lock() {
>  3)   |  _cond_resched() {
>  3)   0.199 us|rcu_all_qs(); /* ret=0x8000 */
>  3)   0.594 us|  } /* ret=0x0 */
>  3)   1.067 us|} /* ret=0x0 */
>  3)   0.337 us|mutex_unlock(); /* ret=0x8880738ed040 */
>  3) + 92.730 us   |  } /* ret=0xfffb/-5 */
> 
> Signed-off-by: Changbin Du 
> ---
>

Hi Changbin,

I'm rewriting a lot of the function graph tracer code to have
kretprobes be able to work on top of it. It's still a work in progress.
It would be easier to add something to that work when it's done than to
do it now.

Thanks!

-- Steve


Re: [PATCH 2/3] docs-rst: update index file with infiniband docs

2019-01-14 Thread Jason Gunthorpe
On Mon, Jan 14, 2019 at 11:00:50AM +0200, Joel Nider wrote:
> Link the previously converted Documentation/infiniband/user_verbs.rst
> to the main index by creating a new subsystem (Infiniband) under the
> root document. This manifests as a new section under "Kernel API
> Documentation" in the index.html, as well as a new section in the
> table of contents pane.
> 
> This has been tested with 'make htmldocs'.
> ---

Missed signed-off-by

>  Documentation/conf.py  |  2 ++
>  Documentation/index.rst|  1 +
>  Documentation/infiniband/conf.py   | 10 ++
>  Documentation/infiniband/index.rst |  9 +
>  4 files changed, 22 insertions(+)
>  create mode 100644 Documentation/infiniband/conf.py
>  create mode 100644 Documentation/infiniband/index.rst
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 72647a3..ff71088 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -389,6 +389,8 @@ latex_documents = [
>   'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
>  ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
>   'The kernel development community', 'manual'),
> +('infiniband/index', 'infiniband.tex', 'Infiniband subsystem',
> + 'The kernel development community', 'manual'),
>  ('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
>   'The kernel development community', 'manual'),
>  ('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To 
> Hacking The Linux Kernel',
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index c858c2e..8d91ea5 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -82,6 +82,7 @@ needed).
> core-api/index
> media/index
> networking/index
> +   infiniband/index
> input/index
> gpu/index
> security/index
> diff --git a/Documentation/infiniband/conf.py 
> b/Documentation/infiniband/conf.py
> new file mode 100644
> index 0000000..dc42d33
> --- /dev/null
> +++ b/Documentation/infiniband/conf.py
> @@ -0,0 +1,10 @@
> +# -*- coding: utf-8; mode: python -*-
> +
> +project = "Linux Infiniband Documentation"
> +
> +tags.add("subproject")
> +
> +latex_documents = [
> +('index', 'infiniband.tex', project,
> + 'The kernel development community', 'manual'),
> +]
> diff --git a/Documentation/infiniband/index.rst 
> b/Documentation/infiniband/index.rst
> new file mode 100644
> index 0000000..2dedc65
> --- /dev/null
> +++ b/Documentation/infiniband/index.rst
> @@ -0,0 +1,9 @@
> +Infiniband Documentation
> +
> +
> +Contents:
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   user_verbs
> -- 
> 2.7.4
> 


Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

2019-01-14 Thread Jason Gunthorpe
On Mon, Jan 14, 2019 at 11:00:49AM +0200, Joel Nider wrote:
> Replace the existing Documentation/infiniband/user_verbs.txt with
> Documentation/infiniband/user_verbs.rst. No substantial changes to
> the content - just some minor reformatting to have the rendering
> come out nicely.
> This is in preparation for updating the content in a subsequent
> patch.
> 
> Signed-off-by: Joel Nider 
> ---
>  Documentation/infiniband/user_verbs.rst | 70 
> +
>  Documentation/infiniband/user_verbs.txt | 69 
>  2 files changed, 70 insertions(+), 69 deletions(-)
>  create mode 100644 Documentation/infiniband/user_verbs.rst
>  delete mode 100644 Documentation/infiniband/user_verbs.txt

Thanks for getting this going Joel, I've been mulling over writing more
docs for this area for a while now.

Jonathan/linux-doc: Can you Ack at least the build system parts of
this please? I can take it through the rdma tree, unless you prefer
otherwise?

Cheers,
Jason


Re: [PATCH v4] coding-style: Clarify the expectations around bool

2019-01-14 Thread Jason Gunthorpe
On Fri, Jan 11, 2019 at 07:29:40AM -1000, Joey Pabalinas wrote:
> On Thu, Jan 10, 2019 at 11:48:13PM +0000, Jason Gunthorpe wrote:
> > There has been some confusion since checkpatch started warning about bool
> > use in structures, and people have been avoiding using it.
> > 
> > Many people feel there is still a legitimate place for bool in structures,
> > so provide some guidance on bool usage derived from the entire thread that
> > spawned the checkpatch warning.
> 
> Hey Jason,
> 
> I very much agree that the bool expectations could be much clearer, and this
> patch is a nice step in that direction! Just a couple small nitpicks:
> 
> > +Do not use bool if cache line layout or size of the value matters, its size
> > +and alignment varies based on the compiled architecture. Structures that 
> > are
> > +optimized for alignment and size should not use bool.
> 
> +Do not use bool if cache line layout or size of the value matters, as its 
> size
> ^
> |
> Adding an "as" makes the sentence flow a bit cleaner: --
> 
> > +into a single bitwise 'flags' argument and 'flags' can often a more 
> > readable
> > +alternative if the call-sites have naked true/false constants.
> 
> +into a single bitwise 'flags' argument and 'flags' can often be a more 
> readable
>   ^
>   |
> Missing a "be" here: -
> 
> Ack from me after those two corrections.
> 
> Reviewed-by: Joey Pabalinas 

done, thanks

Jason


Re: [PATCH v4] coding-style: Clarify the expectations around bool

2019-01-14 Thread Jason Gunthorpe
On Sun, Jan 13, 2019 at 08:49:36AM -0800, Matthew Wilcox wrote:
> On Thu, Jan 10, 2019 at 11:48:13PM +0000, Jason Gunthorpe wrote:
> > +The Linux kernel bool type is an alias for the C99 _Bool type. bool values 
> > can
> > +only evaluate to 0 or 1, and implicit or explicit conversion to bool
> > +automatically converts the value to true or false. When using bool types 
> > the
> > +!! construction is not needed, which eliminates a class of bugs.
> > +
> > +When working with bool values the true and false definitions should be used
> > +instead of 0 and 1.
> > +
> > +bool function return types and stack variables are always fine to use 
> > whenever
> > +appropriate. Use of bool is encouraged to improve readability and is often 
> > a
> > +better option than 'int' for storing boolean values.
> 
> It's awkward to start a sentence with a lower case letter.  How about
> rephrasing this paragraph and the following one as:
> 
>   Using bool as the return type of a function or as a variable is always
>   fine when appropriate.  It often improves readability and is a better option
>   than int for storing boolean values.  Using bool in data structures is
>   more debatable; its size and alignment can vary between architectures.

This is more concise, but I think if the coding style is not going to
give concrete advice then it should at least provide some general
information so the reader can try to make an informed choice.

That is why I had it expand on some of the rationale a little bit,
along with a concrete direction to not use bool in the cases Linus
specifically called out.

> > +Do not use bool if cache line layout or size of the value matters, its size
> > +and alignment varies based on the compiled architecture. Structures that 
> > are
> > +optimized for alignment and size should not use bool.
> > +
> > +If a structure has many true/false values, consider consolidating them 
> > into a
> > +bitfield with 1 bit members, or using an appropriate fixed width type, 
> > such as
> > +u8.
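
As a rough illustration of the two points above (conversion to bool and
packing true/false members), here is a userspace sketch. All names are
hypothetical, and uint8_t stands in for the kernel's u8.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_BIG 0x100u	/* hypothetical flag bit above bit 7 */

/* Conversion to bool yields true for any non-zero value, so no !! is needed. */
static bool sketch_flag_set_bool(unsigned int flags)
{
	return flags & SKETCH_FLAG_BIG;
}

/* The same test returned as an 8-bit integer truncates 0x100 to 0. */
static uint8_t sketch_flag_set_u8_buggy(unsigned int flags)
{
	return (uint8_t)(flags & SKETCH_FLAG_BIG);
}

/* Several true/false members packed into 1-bit fields of a fixed-width type. */
struct sketch_state {
	uint8_t enabled:1;
	uint8_t dirty:1;
	uint8_t locked:1;
};
```

The second helper is the class of bug that the !! idiom exists to prevent
when an integer type is used; with a bool return type the conversion does
the right thing on its own.
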

Jason


Re: [PATCH v4] coding-style: Clarify the expectations around bool

2019-01-14 Thread Jason Gunthorpe
On Sun, Jan 13, 2019 at 05:01:39PM +0100, Federico Vaga wrote:

> > -17) Don't re-invent the kernel macros
> > +17) Using bool
> > +--------------
> > +
> > +The Linux kernel bool type is an alias for the C99 _Bool type. bool
> > values can
> > +only evaluate to 0 or 1, and implicit or explicit conversion to bool
> > +automatically converts the value to true or false. When using bool
> > types the
> > +!! construction is not needed, which eliminates a class of bugs.
> > +
> > +When working with bool values the true and false definitions should be
> > used
> > +instead of 0 and 1.
> 
> A very minor thing. I would suggest to keep consistent, in the
> statement, the mapping between definitions ("true and false [...]")
> and their correspondent integer values ("[...] instead of 1 and 0").
> 
> In few words, I propose to change "0 and 1" into "1 and 0".

Hm, sure, seems harmless

> > +Similarly for function arguments, many true/false values can be
> > consolidated
> > +into a single bitwise 'flags' argument and 'flags' can often a more
> > readable
> > +alternative if the call-sites have naked true/false constants.
> 
> Of course, English is not my primary language, but it looks to me
> that here a "be" is missing: "[...] and 'flags' can often a more
> readable alternative [...]".

yes, thanks
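
For what it's worth, the readability point about 'flags' versus naked
true/false constants can be sketched like this. The flag names and both
functions are invented purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical flag names, for illustration only. */
#define SKETCH_F_CREATE	(1u << 0)
#define SKETCH_F_EXCL	(1u << 1)
#define SKETCH_F_SYNC	(1u << 2)

/* Hard to read at the call site: sketch_open_bools(true, false, true) */
static unsigned int sketch_open_bools(bool create, bool excl, bool sync)
{
	return (create ? SKETCH_F_CREATE : 0u) |
	       (excl ? SKETCH_F_EXCL : 0u) |
	       (sync ? SKETCH_F_SYNC : 0u);
}

/* Self-describing: sketch_open_flags(SKETCH_F_CREATE | SKETCH_F_SYNC) */
static unsigned int sketch_open_flags(unsigned int flags)
{
	/* Keep only the bits this sketch knows about. */
	return flags & (SKETCH_F_CREATE | SKETCH_F_EXCL | SKETCH_F_SYNC);
}
```

Both calls shown in the comments request the same thing, but only the
second one tells the reader what is being asked for.
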
 
> > +Otherwise limited use of bool in structures and arguments can improve
> > +readability.
> 
> I'm going to update the Italian translations for this. Do you want
> me to contribute directly to this patch? Otherwise I will send a
> dedicated patch later when this one get accepted.
 
I think you should send it as an update I guess? I don't really know
the process for translations

Jason


Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

2019-01-14 Thread Jonathan Corbet
On Mon, 14 Jan 2019 09:56:21 -0700
Jason Gunthorpe  wrote:

> >  Documentation/infiniband/user_verbs.rst | 70 
> > +
> >  Documentation/infiniband/user_verbs.txt | 69 
> > 
> >  2 files changed, 70 insertions(+), 69 deletions(-)
> >  create mode 100644 Documentation/infiniband/user_verbs.rst
> >  delete mode 100644 Documentation/infiniband/user_verbs.txt  
> 
> Thanks for getting this going Joe, I've been mulling over writing more
> docs for this area for a while now.
> 
> Jonathan/linux-doc: Can you Ack at least the build system parts of
> this please? I can take it through the rdma tree, unless you prefer
> otherwise?

Can I make a request?  This appears to be user-oriented documentation; can
we please place it into the userspace-api manual, rather than keeping it
in its own silo?

Thanks,

jon


RE: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Kazuhito Hagio
On 1/11/2019 7:33 AM, Borislav Petkov wrote:
> On Thu, Jan 10, 2019 at 08:19:43PM +0800, Lianbo Jiang wrote:
>> +init_uts_ns.name.release
>> +------------------------
>> +
>> +The version of the Linux kernel. Used to find the corresponding source
>> +code from which the kernel has been built.
>> +
> 
> ...
> 
>> +
>> +init_uts_ns
>> +-----------
>> +
>> +This is the UTS namespace, which is used to isolate two specific
>> +elements of the system that relate to the uname(2) system call. The UTS
>> +namespace is named after the data structure used to store information
>> +returned by the uname(2) system call.
>> +
>> +User-space tools can get the kernel name, host name, kernel release
>> +number, kernel version, architecture name and OS type from it.
> 
> Already asked this but no reply so lemme paste my question again:
> 
> "And this document already fulfills its purpose - those two vmcoreinfo
> exports are redundant and the first one can be removed.
> 
> And now that we agreed that VMCOREINFO is not an ABI and is very tightly
> coupled to the kernel version, init_uts_ns.name.release can be removed,
> yes?
> 
> Or is there anything speaking against that?"

As for makedumpfile, it would not be impossible to remove
init_uts_ns.name.release (OSRELEASE), but some fixes would be needed.
Historically, OSRELEASE has been a more or less mandatory entry in
vmcoreinfo since vmcoreinfo was introduced, so makedumpfile uses its
existence to check whether a vmcoreinfo is sane.

Also, I think crash will need to be fixed as well if it is removed.
So I hope it will be left as it is.

Thanks,
Kazu



Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 05:48:48PM +0000, Kazuhito Hagio wrote:
> As for makedumpfile, it will be not impossible to remove the
> init_uts_ns.name.release (OSRELEASE), but some fixes are needed.
> Because historically OSRELEASE has been a kind of a mandatory entry
> in vmcoreinfo from the beginning of vmcoreinfo, so makedumpfile uses
> its existence to check whether a vmcoreinfo is sane.

Well, init_uts_ns is exported in vmcoreinfo anyway - makedumpfile
can simply test init_uts_ns.name.release just as well. And the
"historically" argument doesn't matter because vmcoreinfo is not an ABI.

So makedumpfile needs to be changed to check that new export.

> Also, I think crash also will need to be fixed if it is removed.

Yes, I'm expecting user tools to be fixed and then exports removed.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH] docs/core-api/mm: fix GFP combinations section name

2019-01-14 Thread Mike Rapoport
Fix the mismatch between "Useful GFP flag combinations" section naming in
the DOC: section in include/linux/gfp.h and
Documentation/core-api/mm-api.rst

Signed-off-by: Mike Rapoport 
---
 Documentation/core-api/mm-api.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/core-api/mm-api.rst 
b/Documentation/core-api/mm-api.rst
index aa8e54b8..128e8a7 100644
--- a/Documentation/core-api/mm-api.rst
+++ b/Documentation/core-api/mm-api.rst
@@ -35,7 +35,7 @@ users will want to use a plain ``GFP_KERNEL``.
:doc: Reclaim modifiers
 
 .. kernel-doc:: include/linux/gfp.h
-   :doc: Common combinations
+   :doc: Useful GFP flag combinations
 
 The Slab Cache
 ==============
-- 
2.7.4



Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

2019-01-14 Thread Joel Nider
Jonathan Corbet  wrote on 01/14/2019 07:34:21 PM:

> From: Jonathan Corbet
> To: Jason Gunthorpe
> Cc: Joel Nider, Leon Romanovsky, Doug Ledford, Mike Rapoport,
> linux-d...@vger.kernel.org, linux-ker...@vger.kernel.org
> Date: 01/14/2019 07:37 PM
> Subject: Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst
> 
> On Mon, 14 Jan 2019 09:56:21 -0700
> Jason Gunthorpe  wrote:
> 
> > >  Documentation/infiniband/user_verbs.rst | 70 +
> > >  Documentation/infiniband/user_verbs.txt | 69
> > >  2 files changed, 70 insertions(+), 69 deletions(-)
> > >  create mode 100644 Documentation/infiniband/user_verbs.rst
> > >  delete mode 100644 Documentation/infiniband/user_verbs.txt
> > >
> > > Thanks for getting this going Joel, I've been mulling over writing more
> > docs for this area for a while now.
> > 
> > Jonathan/linux-doc: Can you Ack at least the build system parts of
> > this please? I can take it through the rdma tree, unless you prefer
> > otherwise?
> 
> Can I make a request?  This appears to be user-oriented documentation; can
> we please place it into the userspace-api manual, rather than keeping it
> in its own silo?

I knew someone was going to ask me to do that :-)
I agree that userspace stuff should all be together, but I guess for 
historical reasons this one stayed in the infiniband section. So if Jason 
is ok with moving the doc, I'll take a look in the morning to see how best 
to work that into the patchset.

> Thanks,
> 
> jon
> 




Re: [PATCH 1/3] docs-rst: infiniband: Convert user verbs doc to rst

2019-01-14 Thread Jason Gunthorpe
On Mon, Jan 14, 2019 at 08:52:14PM +0200, Joel Nider wrote:
> > > Jonathan/linux-doc: Can you Ack at least the build system parts of
> > > this please? I can take it through the rdma tree, unless you prefer
> > > otherwise?
> > 
> > Can I make a request?  This appears to be user-oriented documentation;  can
> > we please place it into the userspace-api manual, rather than keeping it
> > in its own silo?
> 
> I knew someone was going to ask me to do that :-)
> I agree that userspace stuff should all be together, but I guess for 
> historical reasons this one stayed in the infiniband section. So if Jason 
> is ok with moving the doc, I'll take a look in the morning to see how best 
> to work that into the patchset.

I don't mind

Jason


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 05:48:48PM +0000, Kazuhito Hagio wrote:
> > As for makedumpfile, it will be not impossible to remove the
> > init_uts_ns.name.release (OSRELEASE), but some fixes are needed.
> > Because historically OSRELEASE has been a kind of a mandatory entry
> > in vmcoreinfo from the beginning of vmcoreinfo, so makedumpfile uses
> > its existence to check whether a vmcoreinfo is sane.
> 
> Well, init_uts_ns is exported in vmcoreinfo anyway - makedumpfile
> can simply test init_uts_ns.name.release just as well. And the
> "historically" argument doesn't matter because vmcoreinfo is not an ABI.
> 
> So makedumpfile needs to be changed to check that new export.
> 
> > Also, I think crash also will need to be fixed if it is removed.

Preferably it would be left as-is.  The crash utility has a
"crash --osrelease vmcore" option that only looks at the dumpfile header
and just dumps the string.  With respect to compressed kdumps, crash could
alternatively look at the utsname data that is stored in the
diskdump_header.utsname field, but with ELF vmcores, there is no such
back-up.
What's the problem with leaving it alone?

Dave


> 
> Yes, I'm expecting user tools to be fixed and then exports removed.
> 
> --
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
> 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 01:58:32PM -0500, Dave Anderson wrote:
> Preferably it would be left as-is.  The crash utility has a
> "crash --osrelease vmcore" option that only looks at the dumpfile header,
> and just dump the string.  With respect to compressed kdumps, crash could
> alternatively look at the utsname data that is stored in the
> diskdump_header.utsname field, but with ELF vmcores, there is no such
> back-up.

Well, there is:

00000000  4f 53 52 45 4c 45 41 53  45 3d 35 2e 30 2e 30 2d  |OSRELEASE=5.0.0-|
00000010  72 63 32 2b 0a 50 41 47  45 53 49 5a 45 3d 34 30  |rc2+.PAGESIZE=40|
00000020  39 36 0a 53 59 4d 42 4f  4c 28 6d 65 6d 5f 73 65  |96.SYMBOL(mem_se|
00000030  63 74 69 6f 6e 29 3d 66  66 66 66 66 66 66 66 38  |ction)=ffffffff8|
00000040  34 35 31 61 31 61 38 0a  53 59 4d 42 4f 4c 28 69  |451a1a8.SYMBOL(i|
00000050  6e 69 74 5f 75 74 73 5f  6e 73 29 3d 66 66 66 66  |nit_uts_ns)=ffff|
00000060  66 66 66 66 38 32 30 31  33 35 34 30 0a 53 59 4d  |ffff82013540.SYM|

This address has it.

> What's the problem with leaving it alone?

The problem is that I'd like to get all those vmcoreinfo exports under
control and to not have people frivolously export whatever they feel
like, for obvious reasons, and to get rid of the duplicate/unused pieces
being part of vmcoreinfo.

I'm guessing removing OSRELEASE would simplify the kernel a bit by
getting rid of the VMCOREINFO_OSRELEASE define and export, and userspace
can read out the kernel version from init_uts_ns which is also exported.
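
As the dump above shows, vmcoreinfo is plain KEY=value text with one entry
per line. A minimal Python sketch of splitting such a blob into a dictionary
follows. It assumes the note has already been extracted from the vmcore as
bytes, and parse_vmcoreinfo is a made-up helper name, not part of
makedumpfile or crash:

```python
def parse_vmcoreinfo(note: bytes) -> dict:
    """Split a vmcoreinfo text blob into a {key: value} dictionary."""
    entries = {}
    for line in note.decode("ascii", errors="replace").splitlines():
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no '=' at all
            entries[key] = value
    return entries


# Sample entries taken from the hexdump in this message.
sample = (
    b"OSRELEASE=5.0.0-rc2+\n"
    b"PAGESIZE=4096\n"
    b"SYMBOL(init_uts_ns)=ffffffff82013540\n"
)
info = parse_vmcoreinfo(sample)
release = info["OSRELEASE"]                          # the release string itself
uts_address = int(info["SYMBOL(init_uts_ns)"], 16)   # only an address
```

Note that the SYMBOL(init_uts_ns) entry yields only a kernel virtual
address, while OSRELEASE carries the release string directly.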

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2 5/5] psi: introduce psi monitor

2019-01-14 Thread Suren Baghdasaryan
On Mon, Jan 14, 2019 at 2:22 AM Peter Zijlstra  wrote:
>
> On Thu, Jan 10, 2019 at 02:07:18PM -0800, Suren Baghdasaryan wrote:
> > +/*
> > + * psi_update_work represents slowpath accounting part while
> > + * psi_group_change represents hotpath part.
> > + * There are two potential races between these path:
> > + * 1. Changes to group->polling when slowpath checks for new stall, then
> > + *    hotpath records new stall and then slowpath resets group->polling
> > + *    flag. This leads to the exit from the polling mode while monitored
> > + *    states are still changing.
> > + * 2. Slowpath overwriting an immediate update scheduled from the hotpath
> > + *    with a regular update further in the future and missing the
> > + *    immediate update.
> > + * Both races are handled with a retry cycle in the slowpath:
> > + *
> > + *    HOTPATH:                             |    SLOWPATH:
> > + *                                         |
> > + * A) times[cpu] += delta                  | E) delta = times[*]
> > + * B) start_poll = (delta[poll_mask] &&    |    if delta[poll_mask]:
> > + *      cmpxchg(g->polling, 0, 1) == 0)    | F)   polling_until = now +
> > + *                                         |              grace_period
> > + *                                         |    if now > polling_until:
> > + *    if start_poll:                       |      if g->polling:
> > + * C)   mod_delayed_work(1)                | G)     g->polling = polling = 0
> > + *    else if !delayed_work_pending():     | H)     goto SLOWPATH
> > + * D)   schedule_delayed_work(PSI_FREQ)    |    else:
> > + *                                         |      if !g->polling:
> > + *                                         | I)     g->polling = polling = 1
> > + *                                         | J) if delta && first_pass:
> > + *                                         |      next_avg = calculate_averages()
> > + *                                         |      if polling:
> > + *                                         |        next_poll = poll_triggers()
> > + *                                         |    if (delta && first_pass) || polling:
> > + *                                         | K)   mod_delayed_work(
> > + *                                         |          min(next_avg, next_poll))
> > + *                                         |      if !polling:
> > + *                                         |        first_pass = false
> > + *                                         | L)     goto SLOWPATH
> > + *
> > + * Race #1 is represented by (EABGD) sequence in which case slowpath
> > + * deactivates polling mode because it misses new monitored stall and hotpath
> > + * doesn't activate it because at (B) g->polling is not yet reset by slowpath
> > + * in (G). This race is handled by the (H) retry, which in the race described
> > + * above results in the new sequence of (EABGDHEIK) that reactivates polling
> > + * mode.
> > + *
> > + * Race #2 is represented by polling==false && (JABCK) sequence which
> > + * overwrites immediate update scheduled at (C) with a later (next_avg) update
> > + * scheduled at (K). This race is handled by the (L) retry which results in the
> > + * new sequence of polling==false && (JABCKLEIK) that reactivates polling mode
> > + * and reschedules next polling update (next_poll).
> > + *
> > + * Note that retries can't result in an infinite loop because retry #1 happens
> > + * only during polling reactivation and retry #2 happens only on the first
> > + * pass. Constant reactivations are impossible because polling will stay active
> > + * for at least grace_period. Worst case scenario involves two retries (HEJKLE)
> > + */
>
> I'm having a fairly hard time with this. There's a distinct lack of
> memory ordering, and a suspicious mixing of atomic ops (cmpxchg) and
> regular loads and stores (without READ_ONCE/WRITE_ONCE even).
>
> Please clarify.

Thanks for the feedback.
I do mix atomic and regular loads with g->polling only because the
slowpath is the only one that resets it back to 0, so
cmpxchg(g->polling, 1, 0) == 1 at (G) would always return 1.
Setting g->polling back to 1 at (I) indeed needs an atomic operation
but at that point it does not matter whether hotpath or slowpath sets
it. In either case we will schedule a polling update.
Am I missing anything?

For memory ordering (which Johannes also pointed out) the critical point is:

times[cpu] += delta       | if g->polling:
smp_wmb()                 |   g->polling = polling = 0
cmpxchg(g->polling, 0, 1) |   smp_rmb()
                          |   delta = times[*] (through goto SLOWPATH)

So that hotpath writes to times[] then g->polling and slowpath reads
g->polling then times[]. cmpxchg() implies a full barrier, so we can
drop smp_wmb(). Something like this:

times[cpu] += delta       | if g->polling:
cmpxchg(g->polling, 0, 1) |   g->polling = polling = 0
                          |   smp_rmb()
                          |   delta = t

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson




- Original Message -
> On Mon, Jan 14, 2019 at 01:58:32PM -0500, Dave Anderson wrote:
> > Preferably it would be left as-is.  The crash utility has a
> > "crash --osrelease vmcore" option that only looks at the dumpfile header,
> > and just dump the string.  With respect to compressed kdumps, crash could
> > alternatively look at the utsname data that is stored in the
> > diskdump_header.utsname field, but with ELF vmcores, there is no such
> > back-up.
> 
> Well, there is:
> 
> 00000000  4f 53 52 45 4c 45 41 53  45 3d 35 2e 30 2e 30 2d  |OSRELEASE=5.0.0-|
> 00000010  72 63 32 2b 0a 50 41 47  45 53 49 5a 45 3d 34 30  |rc2+.PAGESIZE=40|
> 00000020  39 36 0a 53 59 4d 42 4f  4c 28 6d 65 6d 5f 73 65  |96.SYMBOL(mem_se|
> 00000030  63 74 69 6f 6e 29 3d 66  66 66 66 66 66 66 66 38  |ction)=ffffffff8|
> 00000040  34 35 31 61 31 61 38 0a  53 59 4d 42 4f 4c 28 69  |451a1a8.SYMBOL(i|
> 00000050  6e 69 74 5f 75 74 73 5f  6e 73 29 3d 66 66 66 66  |nit_uts_ns)=ffff|
> 00000060  66 66 66 66 38 32 30 31  33 35 34 30 0a 53 59 4d  |ffff82013540.SYM|
> 
> This address has it.

There's no reading of the dumpfile's memory involved, and that being the case,
the vmlinux file is not utilized.  That's the whole point of the crash option,
i.e., taking a vmcore file, and trying to determine what kernel should be used
with it:

  $ man crash
  ...
       --osrelease dumpfile
              Display the OSRELEASE vmcoreinfo string from a kdump dumpfile header.
  ...


> 
> > What's the problem with leaving it alone?
> 
> The problem is that I'd like to get all those vmcoreinfo exports under
> control and to not have people frivolously export whatever they feel
> like, for obvious reasons, and to get rid of the duplicate/unused pieces
> being part of vmcoreinfo.

Well, I just don't agree that the OSRELEASE item is "frivolous".  It's been in
place, and depended upon, for many years.

Dave



Re: [PATCH v2 5/5] psi: introduce psi monitor

2019-01-14 Thread Johannes Weiner
On Mon, Jan 14, 2019 at 11:30:12AM -0800, Suren Baghdasaryan wrote:
> On Mon, Jan 14, 2019 at 2:22 AM Peter Zijlstra  wrote:
> >
> > On Thu, Jan 10, 2019 at 02:07:18PM -0800, Suren Baghdasaryan wrote:
> > > +/*
> > > + * psi_update_work represents slowpath accounting part while
> > > + * psi_group_change represents hotpath part.
> > > + * There are two potential races between these path:
> > > + * 1. Changes to group->polling when slowpath checks for new stall, then
> > > + *    hotpath records new stall and then slowpath resets group->polling
> > > + *    flag. This leads to the exit from the polling mode while monitored
> > > + *    states are still changing.
> > > + * 2. Slowpath overwriting an immediate update scheduled from the hotpath
> > > + *    with a regular update further in the future and missing the
> > > + *    immediate update.
> > > + * Both races are handled with a retry cycle in the slowpath:
> > > + *
> > > + *    HOTPATH:                             |    SLOWPATH:
> > > + *                                         |
> > > + * A) times[cpu] += delta                  | E) delta = times[*]
> > > + * B) start_poll = (delta[poll_mask] &&    |    if delta[poll_mask]:
> > > + *      cmpxchg(g->polling, 0, 1) == 0)    | F)   polling_until = now +
> > > + *                                         |              grace_period
> > > + *                                         |    if now > polling_until:
> > > + *    if start_poll:                       |      if g->polling:
> > > + * C)   mod_delayed_work(1)                | G)     g->polling = polling = 0
> > > + *    else if !delayed_work_pending():     | H)     goto SLOWPATH
> > > + * D)   schedule_delayed_work(PSI_FREQ)    |    else:
> > > + *                                         |      if !g->polling:
> > > + *                                         | I)     g->polling = polling = 1
> > > + *                                         | J) if delta && first_pass:
> > > + *                                         |      next_avg = calculate_averages()
> > > + *                                         |      if polling:
> > > + *                                         |        next_poll = poll_triggers()
> > > + *                                         |    if (delta && first_pass) || polling:
> > > + *                                         | K)   mod_delayed_work(
> > > + *                                         |          min(next_avg, next_poll))
> > > + *                                         |      if !polling:
> > > + *                                         |        first_pass = false
> > > + *                                         | L)     goto SLOWPATH
> > > + *
> > > + * Race #1 is represented by (EABGD) sequence in which case slowpath
> > > + * deactivates polling mode because it misses new monitored stall and hotpath
> > > + * doesn't activate it because at (B) g->polling is not yet reset by slowpath
> > > + * in (G). This race is handled by the (H) retry, which in the race described
> > > + * above results in the new sequence of (EABGDHEIK) that reactivates polling
> > > + * mode.
> > > + *
> > > + * Race #2 is represented by polling==false && (JABCK) sequence which
> > > + * overwrites immediate update scheduled at (C) with a later (next_avg) update
> > > + * scheduled at (K). This race is handled by the (L) retry which results in the
> > > + * new sequence of polling==false && (JABCKLEIK) that reactivates polling mode
> > > + * and reschedules next polling update (next_poll).
> > > + *
> > > + * Note that retries can't result in an infinite loop because retry #1 happens
> > > + * only during polling reactivation and retry #2 happens only on the first
> > > + * pass. Constant reactivations are impossible because polling will stay active
> > > + * for at least grace_period. Worst case scenario involves two retries (HEJKLE)
> > > + */
> >
> > I'm having a fairly hard time with this. There's a distinct lack of
> > memory ordering, and a suspicious mixing of atomic ops (cmpxchg) and
> > regular loads and stores (without READ_ONCE/WRITE_ONCE even).
> >
> > Please clarify.
> 
> Thanks for the feedback.
> I do mix atomic and regular loads with g->polling only because the
> slowpath is the only one that resets it back to 0, so
> cmpxchg(g->polling, 1, 0) == 1 at (G) would always return 1.
> Setting g->polling back to 1 at (I) indeed needs an atomic operation
> but at that point it does not matter whether hotpath or slowpath sets
> it. In either case we will schedule a polling update.
> Am I missing anything?
> 
> For memory ordering (which Johannes also pointed out) the critical point is:
> 
> times[cpu] += delta       | if g->polling:
> smp_wmb()                 |   g->polling = polling = 0
> cmpxchg(g->polling, 0, 1) |   smp_rmb()
>                           |   delta = times[*] (through goto SLOWPATH)
> 
> So that hotpath writes to times[] then g->polling and slowpath reads
> g->polling then times

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 02:36:47PM -0500, Dave Anderson wrote:
> There's no reading of the dumpfile's memory involved, and that being the case,
> the vmlinux file is not utilized.  That's the whole point of the crash option,
> i.e., taking a vmcore file, and trying to determine what kernel should be used
> with it:
>
>   $ man crash
>   ...
>        --osrelease dumpfile
>               Display the OSRELEASE vmcoreinfo string from a kdump dumpfile header.

I don't understand - if you have the vmcoreinfo (which I assume is part
of the vmcore, yes, no?) you can go and dig out the kernel version from
it, no?

Why should you not utilize the vmcore file?

(I'm most likely missing something.)

> Well, I just don't agree that the OSRELEASE item is "frivolous". It's
> been in place, and depended upon, for many years.

Yeah, no. The ABI argument is moot in this case as in the last couple
of months people have been persuading me that vmcoreinfo is not ABI. So
you guys need to make up your mind what is it. And if it is an ABI, it
wasn't documented anywhere.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 02:36:47PM -0500, Dave Anderson wrote:
> > There's no reading of the dumpfile's memory involved, and that being the
> > case, the vmlinux file is not utilized.  That's the whole point of the crash
> > option, i.e., taking a vmcore file, and trying to determine what kernel
> > should be used with it:
> >
> >   $ man crash
> >   ...
> >        --osrelease dumpfile
> >               Display the OSRELEASE vmcoreinfo string from a kdump dumpfile header.
> 
> I don't understand - if you have the vmcoreinfo (which I assume is part
> of the vmcore, yes, no?) you can go and dig out the kernel version from
> it, no?
> 
> Why should you not utilize the vmcore file?

That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and 
pulls out the OSRELEASE string from it, so that a user can determine what
vmlinux file should be used with that vmcore for normal crash analysis.

Dave

> 
> (I'm most likely missing something.)
> 
> > Well, I just don't agree that the OSRELEASE item is "frivolous". It's
> > been in place, and depended upon, for many years.
> 
> Yeah, no. The ABI argument is moot in this case as in the last couple
> of months people have been persuading me that vmcoreinfo is not ABI. So
> you guys need to make up your mind what is it. And if it is an ABI, it
> wasn't documented anywhere.
> 
> --
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
> 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 03:07:33PM -0500, Dave Anderson wrote:
> That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and 
> pulls out the OSRELEASE string from it, so that a user can determine what
> vmlinux file should be used with that vmcore for normal crash analysis.

And the vmcoreinfo is part of the vmcore, right?

So it can just as well read out the address of init_uts_ns and get the
kernel version from there.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 03:07:33PM -0500, Dave Anderson wrote:
> > That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and
> > pulls out the OSRELEASE string from it, so that a user can determine what
> > vmlinux file should be used with that vmcore for normal crash analysis.
> 
> And the vmcoreinfo is part of the vmcore, right?

Correct.

> 
> So it can just as well read out the address of init_uts_ns and get the
> kernel version from there.

No.  It needs *both* the vmlinux file and the vmcore file in order to read
kernel virtual memory, so just having a kernel virtual address is insufficient.

So it's a chicken-and-egg situation.  This particular --osrelease option is used
to determine *what* vmlinux file would be required for an actual crash analysis
session.

Dave
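
To illustrate the point above: the OSRELEASE string sits in the dumpfile as
plain text, so it can be found without a vmlinux and without translating any
kernel virtual address. The Python sketch below naively scans a byte buffer
for it. This is only an illustration under that assumption. The crash
utility itself reads the dumpfile header, and find_osrelease is a
hypothetical name:

```python
from typing import Optional


def find_osrelease(blob: bytes) -> Optional[str]:
    """Return the text after 'OSRELEASE=' up to the next newline, if any."""
    marker = b"OSRELEASE="
    start = blob.find(marker)
    if start < 0:
        return None  # no OSRELEASE entry in this buffer
    start += len(marker)
    end = blob.find(b"\n", start)
    if end < 0:
        end = len(blob)  # entry runs to the end of the buffer
    return blob[start:end].decode("ascii", errors="replace")


# Bytes shaped like the start of the vmcoreinfo data shown in this thread.
header = b"\x00\x01OSRELEASE=5.0.0-rc2+\nPAGESIZE=4096\n"
release = find_osrelease(header)
```

With only SYMBOL(init_uts_ns) available, the same scan would return an
address, which still has to be resolved against the matching vmlinux.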

 


Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov
On Mon, Jan 14, 2019 at 03:26:32PM -0500, Dave Anderson wrote:
> No.  It needs *both* the vmlinux file and the vmcore file in order to read
> kernel virtual memory, so just having a kernel virtual address is
> insufficient.
>
> So it's a chicken-and-egg situation.  This particular --osrelease option is
> used to determine *what* vmlinux file would be required for an actual crash
> analysis session.

Ok, that makes sense. I could've used that explanation when reviewing
the documentation. Do you mind skimming through this:

https://lkml.kernel.org/r/2019045650.gg4...@zn.tnic

in case we've missed explaining relevant usage - like that above - of
some of the vmcoreinfo members?

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2 5/5] psi: introduce psi monitor

2019-01-14 Thread Suren Baghdasaryan
On Mon, Jan 14, 2019 at 11:42 AM Johannes Weiner  wrote:
>
> On Mon, Jan 14, 2019 at 11:30:12AM -0800, Suren Baghdasaryan wrote:
> > On Mon, Jan 14, 2019 at 2:22 AM Peter Zijlstra  wrote:
> > >
> > > On Thu, Jan 10, 2019 at 02:07:18PM -0800, Suren Baghdasaryan wrote:
> > > > +/*
> > > > + * psi_update_work represents slowpath accounting part while
> > > > + * psi_group_change represents hotpath part.
> > > > + * There are two potential races between these path:
> > > > > + * 1. Changes to group->polling when slowpath checks for new stall, then
> > > > > + *    hotpath records new stall and then slowpath resets group->polling
> > > > > + *    flag. This leads to the exit from the polling mode while monitored
> > > > > + *    states are still changing.
> > > > > + * 2. Slowpath overwriting an immediate update scheduled from the hotpath
> > > > > + *    with a regular update further in the future and missing the
> > > > > + *    immediate update.
> > > > > + * Both races are handled with a retry cycle in the slowpath:
> > > > > + *
> > > > > + *    HOTPATH:                             |    SLOWPATH:
> > > > > + *                                         |
> > > > > + * A) times[cpu] += delta                  | E) delta = times[*]
> > > > > + * B) start_poll = (delta[poll_mask] &&    |    if delta[poll_mask]:
> > > > > + *      cmpxchg(g->polling, 0, 1) == 0)    | F)   polling_until = now +
> > > > > + *                                         |              grace_period
> > > > > + *                                         |    if now > polling_until:
> > > > > + *    if start_poll:                       |      if g->polling:
> > > > > + * C)   mod_delayed_work(1)                | G)     g->polling = polling = 0
> > > > > + *    else if !delayed_work_pending():     | H)     goto SLOWPATH
> > > > > + * D)   schedule_delayed_work(PSI_FREQ)    |    else:
> > > > > + *                                         |      if !g->polling:
> > > > > + *                                         | I)     g->polling = polling = 1
> > > > > + *                                         | J) if delta && first_pass:
> > > > > + *                                         |      next_avg = calculate_averages()
> > > > > + *                                         |      if polling:
> > > > > + *                                         |        next_poll = poll_triggers()
> > > > > + *                                         |    if (delta && first_pass) || polling:
> > > > > + *                                         | K)   mod_delayed_work(
> > > > > + *                                         |          min(next_avg, next_poll))
> > > > > + *                                         |      if !polling:
> > > > > + *                                         |        first_pass = false
> > > > > + *                                         | L)     goto SLOWPATH
> > > > > + *
> > > > > + * Race #1 is represented by (EABGD) sequence in which case slowpath
> > > > > + * deactivates polling mode because it misses new monitored stall and hotpath
> > > > > + * doesn't activate it because at (B) g->polling is not yet reset by slowpath
> > > > > + * in (G). This race is handled by the (H) retry, which in the race described
> > > > > + * above results in the new sequence of (EABGDHEIK) that reactivates polling
> > > > > + * mode.
> > > > > + *
> > > > > + * Race #2 is represented by polling==false && (JABCK) sequence which
> > > > > + * overwrites immediate update scheduled at (C) with a later (next_avg) update
> > > > > + * scheduled at (K). This race is handled by the (L) retry which results in the
> > > > > + * new sequence of polling==false && (JABCKLEIK) that reactivates polling mode
> > > > > + * and reschedules next polling update (next_poll).
> > > > > + *
> > > > > + * Note that retries can't result in an infinite loop because retry #1 happens
> > > > > + * only during polling reactivation and retry #2 happens only on the first
> > > > > + * pass. Constant reactivations are impossible because polling will stay active
> > > > > + * for at least grace_period. Worst case scenario involves two retries (HEJKLE)
> > > > > + */
> > >
> > > I'm having a fairly hard time with this. There's a distinct lack of
> > > memory ordering, and a suspicious mixing of atomic ops (cmpxchg) and
> > > regular loads and stores (without READ_ONCE/WRITE_ONCE even).
> > >
> > > Please clarify.
> >
> > Thanks for the feedback.
> > I do mix atomic and regular loads with g->polling only because the
> > slowpath is the only one that resets it back to 0, so
> > cmpxchg(g->polling, 1, 0) == 1 at (G) would always return 1.
> > Setting g->polling back to 1 at (I) indeed needs an atomic operation
> > but at that point it does not matter whether hotpath or slowpath sets
> > it. In either case we will schedule a polling update.
> > Am I missing anything?
> >
> > For memory ordering (which Johannes also pointed out) the critical point is:
> >
> > times[cpu] += delta   | if g->poll

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Dave Anderson



- Original Message -
> On Mon, Jan 14, 2019 at 03:26:32PM -0500, Dave Anderson wrote:
> > No.  It needs *both* the vmlinux file and the vmcore file in order to read
> > kernel virtual memory, so just having a kernel virtual address is
> > insufficient.
> >
> > So it's a chicken-and-egg situation.  This particular --osrelease option is
> > used to determine *what* vmlinux file would be required for an actual crash
> > analysis session.
> 
> Ok, that makes sense. I could've used that explanation when reviewing
> the documentation. Do you mind skimming through this:
> 
> https://lkml.kernel.org/r/2019045650.gg4...@zn.tnic
> 
> in case we've missed explaining relevant usage - like that above - of
> some of the vmcoreinfo members?

Yeah, I've been watching the thread, and the document looks fine to me.
It's just that when I saw the discussion of this one being removed that
I felt the need to respond...  ;-)

Dave


Re: [RFC PATCH] x86/speculation: Don't inherit TIF_SSBD on execve()

2019-01-14 Thread Waiman Long
On 01/11/2019 02:52 PM, Thomas Gleixner wrote:
> On Wed, 19 Dec 2018, Waiman Long wrote:
>
>> With the default SPEC_STORE_BYPASS_SECCOMP/SPEC_STORE_BYPASS_PRCTL mode,
>> the TIF_SSBD bit will be inherited when a new task is fork'ed or cloned.
>>
>> As only certain class of applications (like Java) requires disabling
>> speculative store bypass for security purpose, it may not make sense to
>> allow the TIF_SSBD bit to be inherited across execve() boundary where the
>> new application may not need SSBD at all and is probably not aware that
>> SSBD may have been turned on. This may cause an unnecessary performance
>> loss of up to 20% in some cases.
> Lot's of MAY's here. Aside of that this fundamentally changes the
> behaviour. I'm not really a fan of doing that.
>
> If there are good reasons to have a non-inherited variant, then we rather
> introduce that instead of changing the existing semantics without a way for
> existing userspace to notice.

I understand your point. How about adding a ",noexec" auxiliary option
to the spec_store_bypass_disable command line to activate this new
behavior without changing the default? Would that be acceptable?

Cheers,
Longman


Re: [PATCH v10 09/12] Documentation: hwmon: Add documents for PECI hwmon client drivers

2019-01-14 Thread Jae Hyun Yoo

On 1/14/2019 3:43 AM, Joel Stanley wrote:

On Tue, 8 Jan 2019 at 08:11, Jae Hyun Yoo  wrote:


This commit adds hwmon documents for PECI cputemp and dimmtemp drivers.

Cc: Guenter Roeck 
Cc: Jean Delvare 
Cc: Jonathan Corbet 
Cc: Jason M Biils 
Cc: Randy Dunlap 
Signed-off-by: Jae Hyun Yoo 
Reviewed-by: Haiyue Wang 
Reviewed-by: James Feist 
Reviewed-by: Vernon Mauery 
Acked-by: Guenter Roeck 


Reviewed-by: Joel Stanley 



Hi Joel,

Thank you so much for your careful review on this patch series. I'll
submit v11 soon to address your review comments.

Regards,
Jae


Re: [PATCH v10 08/12] mfd: intel-peci-client: Add PECI client driver

2019-01-14 Thread Jae Hyun Yoo

On 1/14/2019 3:42 AM, Joel Stanley wrote:

On Tue, 8 Jan 2019 at 08:11, Jae Hyun Yoo  wrote:


This commit adds PECI client driver.


It looks like it's a PECI driver for the three CPU families, and it
implements cpu and dimm temp, with sideband functions deferred to the
future. If you add that information with a few more words it would
make for a nicer commit message.



Yes, that would be nicer. I'll add the description into the commit
message.

Thanks for your review!

Regards,
Jae


Signed-off-by: Jae Hyun Yoo 


Reviewed-by: Joel Stanley 



Re: [PATCH v2] cpuidle: Add 'above' and 'below' idle state metrics

2019-01-14 Thread Rafael J. Wysocki
On Mon, Jan 14, 2019 at 11:39 AM Daniel Lezcano
 wrote:
>
>
> Hi Rafael,
>
> sorry for the delay.
>
> On 10/01/2019 11:20, Rafael J. Wysocki wrote:
>
> [ ... ]
>
> >>>   if (entered_state >= 0) {
> >>> + s64 diff, delay = drv->states[entered_state].exit_latency;
> >>> + int i;
> >>> +
> >>>   /*
> >>>* Update cpuidle counters
> >>>* This can be moved to within driver enter routine,
> >>> @@ -260,6 +262,33 @@ int cpuidle_enter_state(struct cpuidle_d
> >>>   dev->last_residency = (int)diff;
> >>
> >> Shouldn't we subtract the 'delay' from the computed 'diff' in any case ?
> >
> > No.
> >
> >> Otherwise the 'last_residency' accumulates the effective sleep time and
> >> the time to wakeup. We are interested in the sleep time only for
> >> prediction and metrics no ?
> >
> > Yes, but 'delay' is the worst-case latency and not the actual one
> > experienced, most of the time, and (on average) we would underestimate
> > the sleep time if it was always subtracted.
>
> IMO, the exit latency is more or less constant for the cpu power down
> state. When it is the cluster power down state, the first cpu waking up
> has the worst latency, then the others have the same as the cpu power
> down state.
>
> If we can model that, the gray area you mention below can be reduced.
> There are platforms where the exit latency is very high [1] and not
> taking it into account will give very wrong metrics.

That is kind of a special case, though, and there is no way for the
cpuidle core to distinguish it from all of the other cases.

> > The idea here is to only count the wakeup as 'above' if the total
> > 'last_residency' is below the target residency of the idle state that
> > was asked for (as in that case we know for certain that the CPU has
> > been woken up too early) and to only count it as 'below' if the
> > difference between 'last_residency' and 'delay' is greater than or
> > equal to the target residency of a deeper idle state (as in that case
> > we know for certain that the CPU has been woken up too late).
> >
> > Of course, this means that there is a "gray area" in which we are not
> > really sure if the sleep time has matched the idle state that was
> > asked for, but there's not much we can do about that IMO.
>
> There is another aspect of the metric which can be improved: the 'above'
> and the 'below' give a rough indication about the correctness of the
> prediction but they don't tell us which idle state we should have
> selected (we can be constantly choosing state3 instead of state1 for
> example).
>
> It would be nice to add a 'missed' field for each idle state, so when
> we check if there is an 'above' or a 'below' condition, we increment the
> 'missed' field for the idle state we should have selected.

That's a governor's job however.  That's why there is the ->reflect
governor callback after all, among other things.
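
The 'above'/'below' accounting rule described earlier in this message can be
sketched in Python as follows. This is a simplified model and not the kernel
code: the function and parameter names are invented for illustration, all
times are assumed to be in microseconds, and only a single deeper idle state
is considered:

```python
from typing import Optional


def classify_wakeup(last_residency: int,
                    exit_latency: int,
                    target_residency: int,
                    deeper_target_residency: Optional[int] = None) -> Optional[str]:
    """Classify one idle period as 'above', 'below', or None (no count)."""
    # Woken up too early: the total time spent did not even reach the
    # target residency of the state that was requested.
    if last_residency < target_residency:
        return "above"
    # Woken up too late: even after subtracting the worst-case exit
    # latency, a deeper state's target residency would have been met.
    if (deeper_target_residency is not None
            and last_residency - exit_latency >= deeper_target_residency):
        return "below"
    # Either a good match or the "gray area" where we cannot be certain.
    return None


early = classify_wakeup(50, 10, 100, 500)
late = classify_wakeup(600, 10, 100, 500)
gray = classify_wakeup(505, 10, 100, 500)
```

The third call lands in the gray area: 505 exceeds the deeper state's target
of 500, but 505 - 10 does not, so nothing is counted.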


Re: [PATCH v1 1/2] Documentation/filesystems: add binderfs

2019-01-14 Thread Jonathan Corbet
On Fri, 11 Jan 2019 14:40:59 +0100
Christian Brauner  wrote:

> This documents the Android binderfs filesystem used to dynamically add and
> remove binder devices that are private to each instance.
> 
> Signed-off-by: Christian Brauner 

Two quick notes:

> ---
> /* Changelog */
> v1:
> - switch from *.txt to *.rst format
> ---
>  Documentation/filesystems/binderfs.rst | 70 ++
>  1 file changed, 70 insertions(+)
>  create mode 100644 Documentation/filesystems/binderfs.rst

You didn't add it to index.rst, so it won't actually become part of the
docs build.

> diff --git a/Documentation/filesystems/binderfs.rst b/Documentation/filesystems/binderfs.rst
> new file mode 100644
> index 000000000000..74a744b42db7
> --- /dev/null
> +++ b/Documentation/filesystems/binderfs.rst
> @@ -0,0 +1,70 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +The Android binderfs Filesystem
> +===============================
> +
> +Android binderfs is a filesystem for the Android binder IPC mechanism.  It
> +allows to dynamically add and remove binder devices at runtime.  Binder devices
> +located in a new binderfs instance are independent of binder devices located in
> +other binderfs instances.  Mounting a new binderfs instance makes it possible
> +to get a set of private binder devices.
> +
> +Mounting binderfs
> +-----------------
> +
> +Android binderfs can be mounted with:
> +
> +::

This can be more readably formatted as:

Android binderfs can be mounted with::

I've applied the patches, taking the liberty of fixing both of those
things up.  Thanks!

jon


Re: [PATCH] Documentation/sysctl/vm.txt: Fix drop_caches bit number

2019-01-14 Thread Jonathan Corbet
On Fri, 11 Jan 2019 17:14:10 +0100
Vincent Whitchurch  wrote:

> Bits are usually numbered starting from zero, so 4 should be bit 2, not
> bit 3.
> 
> Suggested-by: Matthew Wilcox 
> Signed-off-by: Vincent Whitchurch 
> ---
>  Documentation/sysctl/vm.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 187ce4f599a2..6af24cdb25cc 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -237,7 +237,7 @@ used:
>   cat (1234): drop_caches: 3
>  
>  These are informational only.  They do not mean that anything is wrong
> -with your system.  To disable them, echo 4 (bit 3) into drop_caches.
> +with your system.  To disable them, echo 4 (bit 2) into drop_caches.

Applied, thanks.

jon
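
A quick way to check the numbering: with bits counted from zero, the value 4
has only bit 2 set. A small Python sketch (set_bits is an illustrative helper
name, not something from the kernel tree):

```python
def set_bits(value: int) -> list:
    """Return the zero-based positions of the bits set in value."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]


print(set_bits(4))  # [2]
print(set_bits(3))  # [0, 1]
```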


Re: [PATCH] Documentation/dev-tools: Use gcc version number instead svn revision number

2019-01-14 Thread Jonathan Corbet
On Mon, 14 Jan 2019 11:08:07 +0100
Sebastian Andrzej Siewior  wrote:

> svn commit 231296 matches commit d29e939c63b71 ("Add fuzzing coverage
> support") in the gcc git. The change is part of gcc 6.1.0.
> 
> Replace the svn commit number with a gcc version which everyone can
> easily compare.
> 
> Signed-off-by: Sebastian Andrzej Siewior 

Makes sense to me.  Applied, thanks.

jon


Re: [PATCH] docs/core-api: memory-allocation: add mention of kmem_cache_create_userspace

2019-01-14 Thread Jonathan Corbet
On Mon, 14 Jan 2019 13:47:34 +0200
Mike Rapoport  wrote:

> Mention that when a part of a slab cache might be exported to the
> userspace, the cache should be created using kmem_cache_create_usercopy()
> 
> Signed-off-by: Mike Rapoport 

Hmm...I didn't know that :)

Applied, thanks.

jon


Re: [PATCH] docs/core-api/mm: fix GFP combinations section name

2019-01-14 Thread Jonathan Corbet
On Mon, 14 Jan 2019 20:32:58 +0200
Mike Rapoport  wrote:

> Fix the mismatch between "Useful GFP flag combinations" section naming in
> the DOC: section in include/linux/gfp.h and
> Documentation/core-api/mm-api.rst
> 
> Signed-off-by: Mike Rapoport 
> ---
>  Documentation/core-api/mm-api.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst
> index aa8e54b8..128e8a7 100644
> --- a/Documentation/core-api/mm-api.rst
> +++ b/Documentation/core-api/mm-api.rst
> @@ -35,7 +35,7 @@ users will want to use a plain ``GFP_KERNEL``.
>     :doc: Reclaim modifiers
>  
>  .. kernel-doc:: include/linux/gfp.h
> -   :doc: Common combinations
> +   :doc: Useful GFP flag combinations

This also eliminates one warning:

./include/linux/gfp.h:1: warning: no structured comments found

I'll add that to the changelog.  Applied, thanks.

jon


Re: [PATCH v2] docs-rst: doc-guide: Minor grammar fixes

2019-01-14 Thread Jonathan Corbet
On Mon, 14 Jan 2019 09:14:59 +0200
Joel Nider  wrote:

> While using this guide to learn the new documentation method, I saw
> a few phrases that I felt could be improved. These small changes
> improve the grammar and choice of words to further enhance the
> installation instructions.
> 
> Signed-off-by: Joel Nider 
> Acked-by: Matthew Wilcox 
> ---
> v2: address Matthew's comment

Applied, thanks.

jon


Re: [PATCH] fgraph: record function return value

2019-01-14 Thread Changbin Du
Hi Steven,
On Mon, Jan 14, 2019 at 11:21:15AM -0500, Steven Rostedt wrote:
> On Sat, 12 Jan 2019 14:57:01 +0800
> Changbin Du  wrote:
> 
> > This patch adds a new trace option, 'funcgraph-retval', which is disabled
> > by default. When this option is enabled, the fgraph tracer shows the return
> > value of each function. This is useful for finding and analyzing the
> > original source of an error in a call graph.
> > 
> > One limitation is that the kernel doesn't know the prototypes of functions,
> > so fgraph assumes all functions have a return value of type int. You must
> > ignore the value shown for *void* functions. If the return value looks like
> > an error code, both the hexadecimal and decimal numbers are displayed.
> > 
> > In this patch, only x86 and ARM platforms are supported.
> > 
> > Here is an example showing that the error is caused by vmx_create_vcpu()
> > and that the error code is -5 (-EIO).
> > 
> > With echo 1 > /sys/kernel/debug/tracing/options/funcgraph-retval:
> > 
> >  3)               |  kvm_vm_ioctl() {
> >  3)               |    mutex_lock() {
> >  3)               |      _cond_resched() {
> >  3)   0.234 us    |        rcu_all_qs(); /* ret=0x8000 */
> >  3)   0.704 us    |      } /* ret=0x0 */
> >  3)   1.226 us    |    } /* ret=0x0 */
> >  3)   0.247 us    |    mutex_unlock(); /* ret=0x8880738ed040 */
> >  3)               |    kvm_arch_vcpu_create() {
> >  3)               |      vmx_create_vcpu() {
> >  3) + 17.969 us   |        kmem_cache_alloc(); /* ret=0x88813a980040 */
> >  3) + 15.948 us   |        kmem_cache_alloc(); /* ret=0x88813aa99200 */
> >  3)   0.653 us    |        allocate_vpid.part.88(); /* ret=0x1 */
> >  3)   6.964 us    |        kvm_vcpu_init(); /* ret=0xfffb */
> >  3)   0.323 us    |        free_vpid.part.89(); /* ret=0x1 */
> >  3)   9.985 us    |        kmem_cache_free(); /* ret=0x8000 */
> >  3)   9.491 us    |        kmem_cache_free(); /* ret=0x8000 */
> >  3) + 69.858 us   |      } /* ret=0xfffb/-5 */
> >  3) + 70.631 us   |    } /* ret=0xfffb/-5 */
> >  3)               |    mutex_lock() {
> >  3)               |      _cond_resched() {
> >  3)   0.199 us    |        rcu_all_qs(); /* ret=0x8000 */
> >  3)   0.594 us    |      } /* ret=0x0 */
> >  3)   1.067 us    |    } /* ret=0x0 */
> >  3)   0.337 us    |    mutex_unlock(); /* ret=0x8880738ed040 */
> >  3) + 92.730 us   |  } /* ret=0xfffb/-5 */
> > 
> > Signed-off-by: Changbin Du 
> > ---
> >
> 
> Hi Changbin,
> 
> I'm rewriting a lot of the function graph tracer code so that
> kretprobes can work on top of it. It's still a work in progress.
> It would be easier to add something to that work when it's done than to
> do it now.
> 
I can't wait to see it! I can rebase my changes after your work. Thanks!

> Thanks!
> 
> -- Steve

-- 
Cheers,
Changbin Du
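
The display rule described in the quoted patch (treat the value as an int, and print both hexadecimal and decimal when it looks like an error code) can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch's code: format_retval() is a hypothetical name, and the error range of -4095 to -1 is assumed to mirror the kernel's MAX_ERRNO convention. The exact hex width printed by the patch may differ from this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERRNO 4095 /* assumption: mirrors the kernel's MAX_ERRNO */

/* Hypothetical helper: format a raw return register value for display. */
static int format_retval(char *buf, size_t len, unsigned long raw)
{
	/* Without prototypes, interpret the low 32 bits as an int. */
	int val = (int)(unsigned int)raw;

	/* Looks like an error code: show hexadecimal and decimal. */
	if (val < 0 && val >= -MAX_ERRNO)
		return snprintf(buf, len, "ret=0x%x/%d", (unsigned int)val, val);

	/* Otherwise show the raw value in hexadecimal only. */
	return snprintf(buf, len, "ret=0x%lx", raw);
}

/* Self-check covering an error code and a plain success value. */
void format_retval_selfcheck(void)
{
	char buf[64];

	format_retval(buf, sizeof(buf), (unsigned long)-5L);
	assert(strcmp(buf, "ret=0xfffffffb/-5") == 0);

	format_retval(buf, sizeof(buf), 0UL);
	assert(strcmp(buf, "ret=0x0") == 0);
}
```

As the patch description notes, a value printed for a void function carries no meaning, since whatever happens to be in the return register is formatted the same way.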