Re: [PATCH v6 02/13] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-15 Thread John Hubbard

On 8/15/21 8:37 AM, Christoph Hellwig wrote:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ae31622deef..d48a1f0889d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1218,7 +1218,7 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page, int refs,
  static inline __must_check bool try_get_page(struct page *page)
  {
 	page = compound_head(page);
-	if (WARN_ON_ONCE(page_ref_count(page) <= 0))
+	if (WARN_ON_ONCE(page_ref_count(page) < (int)!is_zone_device_page(page)))


Please avoid the overly long line.  In fact I'd be tempted to just not
bother here and keep the old, looser check.  Especially given that
John has a patch ready that removes try_get_page entirely.



Yes. Andrew has accepted it into mmotm.

Ralph's patch here was written well before my cleanup that removed
try_grab_page() [1]. But now that we're here, if you drop this hunk then
it will make merging easier, I think.


[1] https://lore.kernel.org/r/20210813044133.1536842-4-jhubb...@nvidia.com

thanks,
--
John Hubbard
NVIDIA


Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-24 Thread John Hubbard

On 8/24/21 2:32 AM, Christian König wrote:

Am 24.08.21 um 11:06 schrieb Gal Pressman:

On 23/08/2021 13:43, Christian König wrote:

Am 21.08.21 um 11:16 schrieb Gal Pressman:

On 20/08/2021 17:32, Jason Gunthorpe wrote:

On Fri, Aug 20, 2021 at 03:58:33PM +0300, Gal Pressman wrote:

...

IIUC, we're talking about three different exporter "types":
- Dynamic with move_notify (requires ODP)
- Dynamic with revoke_notify
- Static

Which changes do we need to make the third one work?

Basically none at all in the framework.

You just need to properly use the dma_buf_pin() function when you start using a
buffer (e.g. before you create an attachment) and the dma_buf_unpin() function
after you are done with the DMA-buf.

I replied to your previous mail, but I'll ask again.
Doesn't the pin operation migrate the memory to host memory?


Sorry, I missed your previous reply.

And yes, at least for the amdgpu driver we migrate the memory to host memory as soon as it is pinned,
and I would expect that other GPU drivers do something similar.


Well...for many topologies, migrating to host memory will result in a
dramatically slower p2p setup. For that reason, some GPU drivers may
want to allow pinning of video memory in some situations.

Ideally, you've got modern ODP devices and you don't even need to pin.
But if not, and you still hope to do high performance p2p between a GPU
and a non-ODP Infiniband device, then you would need to leave the pinned
memory in vidmem.

So I think we don't want to rule out that behavior, right? Or is the
thinking more like, "you're lucky that this old non-ODP setup works at
all, and we'll make it work by routing through host/cpu memory, but it
will be slow"?


thanks,
--
John Hubbard
NVIDIA



This is intentional, since we don't want any P2P to video memory with pinned objects and want to
avoid running into a situation where one device is doing P2P to video memory while another device
needs the DMA-buf in host memory.


You can still do P2P with pinned objects; it's just up to the exporting driver
whether it is allowed or not.

The other option is what Daniel suggested: that we have some kind of revoke. This is essentially what
our KFD is doing as well when doing interop with 3D GFX, but from Jason's responses I have a bit of
doubt that this will actually work on the hardware level for RDMA.


Regards,
Christian.




Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-24 Thread John Hubbard

On 8/24/21 10:32 AM, Jason Gunthorpe wrote:
...

And yes, at least for the amdgpu driver we migrate the memory to host
memory as soon as it is pinned, and I would expect that other GPU drivers
do something similar.


Well...for many topologies, migrating to host memory will result in a
dramatically slower p2p setup. For that reason, some GPU drivers may
want to allow pinning of video memory in some situations.

Ideally, you've got modern ODP devices and you don't even need to pin.
But if not, and you still hope to do high performance p2p between a GPU
and a non-ODP Infiniband device, then you would need to leave the pinned
memory in vidmem.

So I think we don't want to rule out that behavior, right? Or is the
thinking more like, "you're lucky that this old non-ODP setup works at
all, and we'll make it work by routing through host/cpu memory, but it
will be slow"?


I think it depends on the user: if the user creates memory which is
permanently located on the GPU then it should be pinnable in this way
without forced migration. But if the memory is inherently migratable
then it just cannot be pinned in the GPU at all, as we can't
indefinitely block migration from happening, e.g. if the CPU touches it
later or something.



OK. I just want to avoid creating any API-level assumptions that dma_buf_pin()
necessarily implies or requires migrating to host memory.

thanks,
--
John Hubbard
NVIDIA


Re: [RFC] Make use of non-dynamic dmabuf in RDMA

2021-08-24 Thread John Hubbard

On 8/24/21 11:17 PM, Christian König wrote:
...

I think it depends on the user: if the user creates memory which is
permanently located on the GPU then it should be pinnable in this way
without forced migration. But if the memory is inherently migratable
then it just cannot be pinned in the GPU at all, as we can't
indefinitely block migration from happening, e.g. if the CPU touches it
later or something.


Yes, exactly, that's the point. GPUs especially have a great variety of setups.

For example, we have APUs where the local memory is just stolen system memory, and all buffers must be
migratable because you might need all of this stolen memory for scanout or page tables. In this
case P2P only makes sense to avoid the migration overhead in the first place.


Then you have dGPUs where only a fraction of the VRAM is accessible from the PCIe bus. Here you also
absolutely don't want to pin any buffers, because that can easily crash when we need to migrate
something into the visible window for CPU access.


The only real option where you could do P2P with buffer pinning is those compute boards where we
know that everything is always accessible to everybody and we will never need to migrate anything.
But even then you want some mechanism like cgroups to take care of limiting this. Otherwise any
runaway process can bring down your whole system.


The key question, at least for me as a GPU maintainer, is whether we are going to see modern compute boards
together with old non-ODP setups. Since those compute boards are usually used with new hardware
(like PCIe v4, for example), the answer, I think, is most likely "no".




That is a really good point. Times have changed and I guess ODP is on most
(all?) of the new Infiniband products now, and maybe we don't need to worry so
much about providing first-class support for non-ODP setups.

I've got to drag my brain into 2021+! :)

thanks,
--
John Hubbard
NVIDIA


Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread John Hubbard

On 3/30/21 3:24 PM, Jason Gunthorpe wrote:
...

As far as I can tell this has always been called try_to_munlock() even though
it appears to do the opposite.


Maybe we should change it then?


/**
  * try_to_munlock - try to munlock a page
  * @page: the page to be munlocked
  *
  * Called from munlock code.  Checks all of the VMAs mapping the page
  * to make sure nobody else has this page mlocked. The page will be
  * returned with PG_mlocked cleared if no other vmas have it mlocked.
  */


In other words it sets PG_mlocked if one or more VMAs have it mlocked. So
try_to_mlock() might be a better name, except that seems to have the potential
for confusion as well because it's only called from the munlock code path and
never for mlock.


That explanation makes more sense. This function looks like it is
'set PG_mlocked of the page if any vm->flags has VM_LOCKED'

Maybe call it check_vm_locked or something then and reword the above
comment?

(and why is it OK to read vm->flags for this without any locking?)


Something needs attention here..


I think the code is correct, but perhaps the naming could be better. Would be
interested in hearing any thoughts on renaming try_to_munlock() to try_to_mlock(),
as the current name appears to be based on the context it is called from (munlock)
rather than what it does (mlock).


The point of this patch is to make it clearer, after all, so I'd
change something and maybe slightly clarify the comment.



I'd add that, after looking around the calling code, this is a really unhappy
pre-existing situation. Anyone reading this has to remember at which point in
the call stack the naming transitions from "do the opposite of what the name
says", to "do what the name says".

+1 for renaming "munlock*" items to "mlock*", where applicable. good grief.

Although, it seems reasonable to tack such renaming patches onto the tail end
of this series. But whatever works.

thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread John Hubbard

On 3/30/21 3:56 PM, Alistair Popple wrote:
...

+1 for renaming "munlock*" items to "mlock*", where applicable. good grief.


At least the situation was weird enough to prompt further investigation :)

Renaming to mlock* doesn't feel like the right solution to me either though. I
am not sure if you saw me responding to myself earlier but I am thinking
renaming try_to_munlock() -> page_mlocked() and try_to_munlock_one() ->
page_mlock_one() might be better. Thoughts?



Quite confused by this naming idea. Because: try_to_munlock() returns
void, so a boolean-style name such as "page_mlocked()" is already not a
good fit.

Even more important, though, is that try_to_munlock() is mlock-ing the
page, right? Is there some subtle point I'm missing? It really is doing
an mlock to the best of my knowledge here. Although the kerneldoc
comment for try_to_munlock() seems questionable too:

/**
 * try_to_munlock - try to munlock a page
 * @page: the page to be munlocked
 *
 * Called from munlock code.  Checks all of the VMAs mapping the page
 * to make sure nobody else has this page mlocked. The page will be
 * returned with PG_mlocked cleared if no other vmas have it mlocked.
 */

...because I don't see where, in *this* routine, it clears PG_mlocked!
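
To make that concrete, roughly this is the effective behavior; a greatly
simplified sketch, approximating the TTU_MUNLOCK walk, and not the literal
mm/rmap.c code (mlock_vma_page() is the mm-internal helper that sets
PG_mlocked):

/*
 * Simplified sketch only: for each VMA that maps the page, the walk
 * effectively does the following. Note that it *sets* PG_mlocked when a
 * VM_LOCKED mapping is found, despite the "munlock" in the name.
 */
static bool munlock_check_one_vma(struct page *page, struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_LOCKED) {
		mlock_vma_page(page);	/* re-marks the page as mlocked */
		return false;		/* stop walking, we have our answer */
	}
	return true;			/* keep checking the remaining VMAs */
}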

Obviously we agree that a routine should be named based on what it does,
rather than on who calls it. So I think that still leads to:

 try_to_munlock() --> try_to_mlock()
 try_to_munlock_one() --> try_to_mlock_one()

Sorry if I'm missing something really obvious.



This is actually inspired by a suggestion in Documentation/vm/unevictable-lru.rst,
which warns about this problem:

try_to_munlock() Reverse Map Scan
---------------------------------

.. warning::
[!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
page_referenced() reverse map walker.



This is actually rather bad advice! page_referenced() returns an
int-that-is-really-a-boolean, whereas try_to_munlock(), at least as it
stands now, returns void. Usually when I'm writing a TODO item, I'm in a
hurry, and I think that's what probably happened here, too. :)



Although, it seems reasonable to tack such renaming patches onto the tail end
of this series. But whatever works.


Unless anyone objects strongly I will roll the rename into this patch as there
is only one caller of try_to_munlock.

  - Alistair



No objections here. :)

thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread John Hubbard

On 3/30/21 8:56 PM, John Hubbard wrote:

On 3/30/21 3:56 PM, Alistair Popple wrote:
...

+1 for renaming "munlock*" items to "mlock*", where applicable. good grief.


At least the situation was weird enough to prompt further investigation :)

Renaming to mlock* doesn't feel like the right solution to me either though. I
am not sure if you saw me responding to myself earlier but I am thinking
renaming try_to_munlock() -> page_mlocked() and try_to_munlock_one() ->
page_mlock_one() might be better. Thoughts?



Quite confused by this naming idea. Because: try_to_munlock() returns
void, so a boolean-style name such as "page_mlocked()" is already not a
good fit.

Even more important, though, is that try_to_munlock() is mlock-ing the
page, right? Is there some subtle point I'm missing? It really is doing
an mlock to the best of my knowledge here. Although the kerneldoc
comment for try_to_munlock() seems questionable too:

/**
* try_to_munlock - try to munlock a page
* @page: the page to be munlocked
*
* Called from munlock code.  Checks all of the VMAs mapping the page
* to make sure nobody else has this page mlocked. The page will be
* returned with PG_mlocked cleared if no other vmas have it mlocked.
*/

...because I don't see where, in *this* routine, it clears PG_mlocked!

Obviously we agree that a routine should be named based on what it does,
rather than on who calls it. So I think that still leads to:

     try_to_munlock() --> try_to_mlock()
     try_to_munlock_one() --> try_to_mlock_one()

Sorry if I'm missing something really obvious.


Actually, re-reading your and Jason's earlier points in the thread, I see
that I'm *not* missing anything, and we are actually in agreement about how
the code operates. OK, good!

Also, as you point out above, maybe the "try_" prefix is not really accurate
either, given how this works. So maybe we have arrived at something like:

try_to_munlock() --> page_mlock() // or mlock_page()...
try_to_munlock_one() --> page_mlock_one()



thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH rdma-core v7 5/6] tests: Add tests for dma-buf based memory regions

2021-01-31 Thread John Hubbard

On 1/25/21 11:57 AM, Jianxin Xiong wrote:

Define a set of unit tests similar to regular MR tests and a set of
tests for send/recv and rdma traffic using dma-buf MRs. Add a utility
function to generate access flags for dma-buf based MRs because the
set of supported flags is smaller.

Signed-off-by: Jianxin Xiong 


Hi Jianxin,

It's awesome to see a GPU to IB test suite here!


---
  tests/args_parser.py |   4 +
  tests/test_mr.py | 266 ++-
  tests/utils.py   |  26 +
  3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/tests/args_parser.py b/tests/args_parser.py
index 446535a..5bc53b0 100644
--- a/tests/args_parser.py
+++ b/tests/args_parser.py
@@ -19,6 +19,10 @@ class ArgsParser(object):
  parser.add_argument('--port',
  help='Use port  of RDMA device', type=int,
  default=1)
+parser.add_argument('--gpu', nargs='?', type=int, const=0, default=0,
+help='GPU unit to allocate dmabuf from')
+parser.add_argument('--gtt', action='store_true', default=False,
+help='Allocate dmabuf from GTT instead of VRAM')


Just to be kind to non-GPU people, how about:

s/GTT/GTT (Graphics Translation Table)/



  parser.add_argument('-v', '--verbose', dest='verbosity',
  action='store_const',
  const=2, help='Verbose output')
diff --git a/tests/test_mr.py b/tests/test_mr.py
index b88ad23..03a645f 100644
--- a/tests/test_mr.py
+++ b/tests/test_mr.py
@@ -1,5 +1,6 @@
  # SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB)
  # Copyright (c) 2019 Mellanox Technologies, Inc. All rights reserved. See COPYING file
+# Copyright (c) 2020 Intel Corporation. All rights reserved. See COPYING file
  """
  Test module for pyverbs' mr module.
  """
@@ -9,15 +10,18 @@ import errno
  
  from tests.base import PyverbsAPITestCase, RCResources, RDMATestCase

  from pyverbs.pyverbs_error import PyverbsRDMAError, PyverbsError
-from pyverbs.mr import MR, MW, DMMR, MWBindInfo, MWBind
+from pyverbs.mr import MR, MW, DMMR, DmaBufMR, MWBindInfo, MWBind
  from pyverbs.qp import QPCap, QPInitAttr, QPAttr, QP
  from pyverbs.mem_alloc import posix_memalign, free
  from pyverbs.wr import SendWR
+from pyverbs.dmabuf import DmaBuf
  import pyverbs.device as d
  from pyverbs.pd import PD
  import pyverbs.enums as e
  import tests.utils as u
  
+MAX_IO_LEN = 1048576

+
  
  class MRRes(RCResources):

  def __init__(self, dev_name, ib_port, gid_index,
@@ -423,3 +427,263 @@ class DMMRTest(PyverbsAPITestCase):
  dm_mr = DMMR(pd, dm_mr_len, e.IBV_ACCESS_ZERO_BASED,
   dm=dm, offset=dm_mr_offset)
  dm_mr.close()
+
+
+def check_dmabuf_support(unit=0):
+    """
+    Check if dma-buf allocation is supported by the system.
+    Skip the test on failure.
+    """
+    device_num = 128 + unit
+    try:
+        DmaBuf(1, unit=unit)


unit?? This is a GPU, never anything else! Let's s/unit/gpu/ throughout, yes?

thanks,
--
John Hubbard
NVIDIA


+    except PyverbsRDMAError as ex:
+        if ex.error_code == errno.ENOENT:
+            raise unittest.SkipTest(f'Device /dev/dri/renderD{device_num} is not present')
+        if ex.error_code == errno.EACCES:
+            raise unittest.SkipTest(f'Lack of permission to access /dev/dri/renderD{device_num}')
+        if ex.error_code == errno.EOPNOTSUPP:
+            raise unittest.SkipTest(f'Allocating dmabuf is not supported by /dev/dri/renderD{device_num}')
+
+
+def check_dmabuf_mr_support(pd, unit=0):
+    """
+    Check if dma-buf MR registration is supported by the driver.
+    Skip the test on failure
+    """
+    try:
+        DmaBufMR(pd, 1, 0, unit=unit)
+    except PyverbsRDMAError as ex:
+        if ex.error_code == errno.EOPNOTSUPP:
+            raise unittest.SkipTest('Reg dma-buf MR is not supported by the RDMA driver')
+
+
+class DmaBufMRTest(PyverbsAPITestCase):
+    """
+    Test various functionalities of the DmaBufMR class.
+    """
+    def setUp(self):
+        super().setUp()
+        self.unit = self.config['gpu']
+        self.gtt = self.config['gtt']
+
+    def test_dmabuf_reg_mr(self):
+        """
+        Test ibv_reg_dmabuf_mr()
+        """
+        check_dmabuf_support(self.unit)
+        for ctx, attr, attr_ex in self.devices:
+            with PD(ctx) as pd:
+                check_dmabuf_mr_support(pd, self.unit)
+                flags = u.get_dmabuf_access_flags(ctx)

Re: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-02-03 Thread John Hubbard

On 12/15/20 1:27 PM, Jianxin Xiong wrote:

This patch series adds dma-buf importer role to the RDMA driver in
attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
chosen for a few reasons: first, the API is relatively simple and allows
a lot of flexibility in implementing the buffer manipulation ops.
Second, it doesn't require page structure. Third, dma-buf is already
supported in many GPU drivers. However, we are aware that existing GPU
drivers don't allow pinning device memory via the dma-buf interface.
Pinning would simply cause the backing storage to migrate to system RAM.
True peer-to-peer access is only possible using dynamic attach, which
requires on-demand paging support from the NIC to work. For this reason,
this series only works with ODP capable NICs.
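
For reference, a minimal sketch of what the importer side of the "dynamic
attach" mode mentioned above looks like, assuming the mainline dma-buf API;
the my_* names are invented and are not from this series:

#include <linux/dma-buf.h>

/*
 * Called by the exporter when it is about to move the buffer. The importer
 * must invalidate its mapping and re-establish it later, which is why
 * on-demand paging (ODP) support in the NIC is needed for this mode.
 */
static void my_move_notify(struct dma_buf_attachment *attach)
{
	/* ... tear down the NIC mapping for this attachment ... */
}

static const struct dma_buf_attach_ops my_importer_ops = {
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

static struct dma_buf_attachment *my_attach(struct dma_buf *dmabuf,
					    struct device *dev, void *priv)
{
	/* Dynamic attach: the exporter may keep the buffer in VRAM and move
	 * it later, calling my_move_notify() when it does. */
	return dma_buf_dynamic_attach(dmabuf, dev, &my_importer_ops, priv);
}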


Hi,

Looking ahead to after this patchset is merged...

Are there design thoughts out there, about the future of pinning to vidmem,
for this? It would allow a huge group of older GPUs and NICs and such to
do p2p with this approach, and it seems like a natural next step, right?


thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-02-04 Thread John Hubbard

On 2/4/21 10:44 AM, Alex Deucher wrote:
...

The argument is that vram is a scarce resource, but I don't know if
that is really the case these days.  At this point, we often have as
much vram as system ram if not more.


I thought the main argument was that GPU memory could move at any time
between the GPU and CPU and the DMA buf would always track its current
location?


I think the reason for that is that VRAM is scarce so we have to be
able to move it around.  We don't enforce the same limitations for
buffers in system memory.  We could just support pinning dma-bufs in
vram like we do with system ram.  Maybe with some conditions, e.g.,
p2p is possible, and the device has a large BAR so you aren't tying up
the BAR window.



Excellent. And yes, we are already building systems in which VRAM is
definitely not scarce, but on the other hand, those newer systems can
also handle GPU (and NIC) page faults, so not really an issue. For that,
we just need to enhance HMM so that it does peer to peer.

We also have some older hardware with large BAR1 apertures, specifically
for this sort of thing.

And again, for slightly older hardware, without pinning to VRAM there is
no way to use this solution here for peer-to-peer. So I'm glad to see that
so far you're not ruling out the pinning option.



thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH rdma-core 2/3] pyverbs,tests: Cosmetic improvements for dma-buf allocation routines

2021-02-04 Thread John Hubbard

On 2/4/21 10:50 AM, Jianxin Xiong wrote:

Rename the parameter 'unit' to 'gpu'. Expand GTT to the full name in the
comments.

Signed-off-by: Jianxin Xiong 
---
  pyverbs/dmabuf.pyx | 12 
  pyverbs/dmabuf_alloc.c | 12 
  pyverbs/dmabuf_alloc.h |  2 +-
  pyverbs/mr.pyx |  6 ++--
  tests/test_mr.py   | 78 +-
  5 files changed, 55 insertions(+), 55 deletions(-)



Looks good!

If you care, you might want to add a space, like this, to the few GTT cases:

GTT (Graphics Translation Table)

Obviously not worth spinning another version for that, as it is still readable
as-is. Just mentioning it for the sake of pointless perfectionism, and in case
someone ever wonders why it was missed during a review. :) Either way, feel free
to add:

Reviewed-by: John Hubbard 


thanks,
--
John Hubbard
NVIDIA



diff --git a/pyverbs/dmabuf.pyx b/pyverbs/dmabuf.pyx
index b9406bd..9ed7f02 100644
--- a/pyverbs/dmabuf.pyx
+++ b/pyverbs/dmabuf.pyx
@@ -12,7 +12,7 @@ from pyverbs.mr cimport DmaBufMR
  cdef extern from "dmabuf_alloc.h":
  cdef struct dmabuf:
  pass
-dmabuf *dmabuf_alloc(unsigned long size, int unit, int gtt)
+dmabuf *dmabuf_alloc(unsigned long size, int gpu, int gtt)
  void dmabuf_free(dmabuf *dmabuf)
  int dmabuf_get_drm_fd(dmabuf *dmabuf)
  int dmabuf_get_fd(dmabuf *dmabuf)
@@ -20,20 +20,20 @@ cdef extern from "dmabuf_alloc.h":
  
  
  cdef class DmaBuf:

-def __init__(self, size, unit=0, gtt=0):
+def __init__(self, size, gpu=0, gtt=0):
  """
  Allocate DmaBuf object from a GPU device. This is done through the
  DRI device interface. Usually this requires the effective user id
  being a member of the 'render' group.
  :param size: The size (in number of bytes) of the buffer.
-:param unit: The unit number of the GPU to allocate the buffer from.
-:param gtt: Allocate from GTT instead of VRAM.
+:param gpu: The GPU unit to allocate the buffer from.
+:param gtt: Allocate from GTT(Graphics Translation Table) instead of VRAM.
  :return: The newly created DmaBuf object on success.
  """
  self.dmabuf_mrs = weakref.WeakSet()
-self.dmabuf = dmabuf_alloc(size, unit, gtt)
+self.dmabuf = dmabuf_alloc(size, gpu, gtt)
  if self.dmabuf == NULL:
-raise PyverbsRDMAErrno(f'Failed to allocate dmabuf of size {size} on unit {unit}')
+raise PyverbsRDMAErrno(f'Failed to allocate dmabuf of size {size} on gpu {gpu}')
  self.drm_fd = dmabuf_get_drm_fd(self.dmabuf)
  self.fd = dmabuf_get_fd(self.dmabuf)
  self.map_offset = dmabuf_get_offset(self.dmabuf)
diff --git a/pyverbs/dmabuf_alloc.c b/pyverbs/dmabuf_alloc.c
index 05eae75..93267bf 100644
--- a/pyverbs/dmabuf_alloc.c
+++ b/pyverbs/dmabuf_alloc.c
@@ -95,7 +95,7 @@ static int amdgpu_mmap_offset(struct drm *drm, uint32_t handle,
return 0;
  }
  
-static struct drm *drm_open(int unit)

+static struct drm *drm_open(int gpu)
  {
char path[32];
struct drm_version version = {};
@@ -107,7 +107,7 @@ static struct drm *drm_open(int unit)
if (!drm)
return NULL;
  
-	snprintf(path, sizeof(path), "/dev/dri/renderD%d", unit + 128);

+   snprintf(path, sizeof(path), "/dev/dri/renderD%d", gpu + 128);
  
  	drm->fd = open(path, O_RDWR);

if (drm->fd < 0)
@@ -204,10 +204,10 @@ struct dmabuf {
  /*
   * dmabuf_alloc - allocate a dmabuf from GPU
   * @size - byte size of the buffer to allocate
- * @unit - the GPU unit to use
- * @gtt - if true, allocate from GTT instead of VRAM
+ * @gpu - the GPU unit to use
+ * @gtt - if true, allocate from GTT(Graphics Translation Table) instead of VRAM
   */
-struct dmabuf *dmabuf_alloc(uint64_t size, int unit, int gtt)
+struct dmabuf *dmabuf_alloc(uint64_t size, int gpu, int gtt)
  {
struct dmabuf *dmabuf;
int err;
@@ -216,7 +216,7 @@ struct dmabuf *dmabuf_alloc(uint64_t size, int unit, int gtt)
if (!dmabuf)
return NULL;
  
-	dmabuf->drm = drm_open(unit);

+   dmabuf->drm = drm_open(gpu);
if (!dmabuf->drm)
goto out_free;
  
diff --git a/pyverbs/dmabuf_alloc.h b/pyverbs/dmabuf_alloc.h

index f1b03c5..4698b11 100644
--- a/pyverbs/dmabuf_alloc.h
+++ b/pyverbs/dmabuf_alloc.h
@@ -10,7 +10,7 @@
  
  struct dmabuf;
  
-struct dmabuf *dmabuf_alloc(uint64_t size, int unit, int gtt);

+struct dmabuf *dmabuf_alloc(uint64_t size, int gpu, int gtt);
  void dmabuf_free(struct dmabuf *dmabuf);
  int dmabuf_get_drm_fd(struct dmabuf *dmabuf);
  int dmabuf_get_fd(struct dmabuf *dmabuf);
diff --git a/pyverbs/mr.pyx b/pyverbs/mr.pyx
index aad47e2..d05d044 100644
--- a/pyverbs/mr.pyx
+++ b/pyverbs/mr.pyx
@@ -384,7 +384,7 @@ cdef class DMMR(M

Re: [PATCH v16 0/4] RDMA: Add dma-buf support

2021-02-05 Thread John Hubbard

On 2/5/21 7:53 AM, Daniel Vetter wrote:

On Fri, Feb 05, 2021 at 11:43:19AM -0400, Jason Gunthorpe wrote:

On Fri, Feb 05, 2021 at 04:39:47PM +0100, Daniel Vetter wrote:


And again, for slightly older hardware, without pinning to VRAM there is
no way to use this solution here for peer-to-peer. So I'm glad to see that
so far you're not ruling out the pinning option.


Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
ZONE_MOVEABLE (at least if you don't have a pinned vram contingent in your
cgroups) or something like that, so we could benefit from the work to make
sure pin_user_pages and all these never end up in there?


ZONE_DEVICE should already not be returned from GUP.

I've understood in the hmm case the idea was a CPU touch of some
ZONE_DEVICE pages would trigger a migration to CPU memory, GUP would
want to follow the same logic, presumably it comes for free with the
fault handler somehow


Oh I didn't know this, I thought the proposed p2p direct i/o patches would
just use the fact that underneath ZONE_DEVICE there's "normal" struct
pages. And so I got worried that maybe also pin_user_pages can creep in.
But I didn't read the patches in full detail:

https://lore.kernel.org/linux-block/20201106170036.18713-12-log...@deltatee.com/

But if you're saying that this all needs specific code and all the gup/pup
code we have is excluded, I think we can make sure that we're not ever
building features that require time-unlimited pinning of ZONE_DEVICE.
Which I think we want.



From an HMM perspective, the above sounds about right. HMM relies on the
GPU/device memory being ZONE_DEVICE, *and* on that memory *not* being pinned.
(HMM's mmu notifier callbacks act as a sort of virtual pin, but not a refcount
pin.)

It's a nice clean design point that we need to preserve, and fortunately it
doesn't conflict with anything I'm seeing here. But I want to say this out
loud because I see some doubt about it creeping into the discussion.
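
To illustrate that "virtual pin" a bit more concretely, here is a minimal
sketch of the notifier side, assuming the mainline mmu_interval_notifier API;
everything driver-specific (locking included) is omitted and the my_* names
are invented:

#include <linux/mmu_notifier.h>

/*
 * Called when the CPU side invalidates part of the mirrored range: the
 * driver updates or tears down its device page tables, instead of holding
 * a refcount pin that would block migration.
 */
static bool my_invalidate(struct mmu_interval_notifier *mni,
			  const struct mmu_notifier_range *range,
			  unsigned long cur_seq)
{
	mmu_interval_set_seq(mni, cur_seq);
	/* ... invalidate device mappings for [range->start, range->end) ... */
	return true;
}

static const struct mmu_interval_notifier_ops my_notifier_ops = {
	.invalidate = my_invalidate,
};

/* Registration goes through mmu_interval_notifier_insert(), and lookups pair
 * mmu_interval_read_begin()/mmu_interval_read_retry() with hmm_range_fault()
 * rather than pinning the pages. */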

thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 0/9] Add support for SVM atomics in Nouveau

2021-02-09 Thread John Hubbard

On 2/9/21 5:37 AM, Daniel Vetter wrote:

On Tue, Feb 9, 2021 at 1:57 PM Alistair Popple  wrote:


On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:


Recent changes to pin_user_pages() prevent the creation of pinned pages in
ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
as attempts to migrate may fail which would be fatal to userspace.

In this case migration of the pinned page is unnecessary as the page can be
unpinned at anytime by having the driver revoke atomic permission as it
does for the migrate_to_ram() callback. However a method of calling this
when memory needs to be moved has yet to be resolved so any discussion is
welcome.


Why do we need to pin for gpu atomics? You still have the callback for
cpu faults, so you
can move the page as needed, and hence a long-term pin sounds like the
wrong approach.


Technically a real long term unmoveable pin isn't required, because as you say
the page can be moved as needed at any time. However I needed some way of
stopping the CPU page from being freed once the userspace mappings for it had
been removed. Obviously I could have just used get_page() but from the
perspective of page migration the result is much the same as a pin - a page
which can't be moved because of the extra refcount.


long term pin vs short term page reference aren't fully fleshed out.
But the rule more or less is:
- short term page reference: _must_ get released in finite time for
migration and other things, either because you have a callback, or
because it's just for direct I/O, which will complete. This means
short term pins will delay migration, but not foul it completely



GPU atomic operations to sysmem are hard to categorize, because application
programmers could easily write programs that do a long series of atomic
operations. Such a program would be a little weird, but it's hard to rule out.




- long term pin: the page cannot be moved, all migration must fail.
Also this will have an impact on COW behaviour for fork (but not sure
where those patches are, John Hubbard will know).



That would be Jason's commit 57efa1fe59576 ("mm/gup: prevent gup_fast from racing
with COW during fork"), which is in linux-next 20201216.




So I think for your use case here you want a) short term page
reference to make sure it doesn't disappear plus b) callback to make
sure migrate isn't blocked.

Breaking ZONE_MOVEABLE with either allowing long term pins or failing
migrations because you don't release your short term page reference
isn't good.


The normal solution of registering an MMU notifier to unpin the page when it
needs to be moved also doesn't work as the CPU page tables now point to the
device-private page and hence the migration code won't call any invalidate
notifiers for the CPU page.


Yeah you need some other callback for migration on the page directly.
It's a bit awkward since there is one already for struct
address_space, but that's owned by the address_space/page cache, not
HMM. So I think we need something else, maybe something for each
ZONE_DEVICE?



This direction sounds at least...possible. Using MMU notifiers instead of pins
is definitely appealing. I'm not quite clear on the callback idea above, but
overall it seems like taking advantage of the ZONE_DEVICE tracking of pages
(without having to put anything additional in each struct page) could work.

Additional notes or ideas here are definitely welcome.



thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 0/9] Add support for SVM atomics in Nouveau

2021-02-10 Thread John Hubbard

On 2/10/21 4:59 AM, Daniel Vetter wrote:
...

GPU atomic operations to sysmem are hard to categorize, because application
programmers could easily write programs that do a long series of atomic
operations. Such a program would be a little weird, but it's hard to rule out.


Yeah, but we can forcefully break this whenever we feel like by revoking
the page, moving it, and then reinstating the gpu pte again and letting it
continue.


Oh yes, that's true.



If that's not possible then what we need here instead is an mlock() type of
thing I think.

No need for that, then.


thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] drm/etnaviv: User FOLL_LONGTERM in userptr

2021-03-01 Thread John Hubbard

On 3/1/21 01:52, Daniel Vetter wrote:

There's no mmu notifier or anything like that, releasing this pin is
entirely up to userspace. Hence FOLL_LONGTERM.

No cc: stable for this patch since a lot of the infrastructure around
FOLL_LONGETRM (like not allowing it for pages currently sitting in


  ^FOLL_LONGTERM


ZONE_MOVEABLE before they're migrated) is still being worked on. So
not big benefits yet.


Yes. Great write-up, that's very clear, and it's exactly where we're at.

Reviewed-by: John Hubbard 


thanks,
--
John Hubbard
NVIDIA



Cc: John Hubbard 
Signed-off-by: Daniel Vetter 
Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: etna...@lists.freedesktop.org
---
  drivers/gpu/drm/etnaviv/etnaviv_gem.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index a9e696d05b33..db69f19ab5bc 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -689,7 +689,8 @@ static int etnaviv_gem_userptr_get_pages(struct etnaviv_gem_object *etnaviv_obj)
struct page **pages = pvec + pinned;
  
 		ret = pin_user_pages_fast(ptr, num_pages,
-					  FOLL_WRITE | FOLL_FORCE, pages);
+					  FOLL_WRITE | FOLL_FORCE | FOLL_LONGTERM,
+					  pages);
if (ret < 0) {
unpin_user_pages(pvec, pinned);
kvfree(pvec);


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH rdma-core 5/5] tests: Bug fix for get_access_flags()

2020-11-24 Thread John Hubbard

Just some silly nits I stumbled across while trying to understand the tests.

On 11/23/20 9:53 AM, Jianxin Xiong wrote:

The filter defintion is wrong and causes get_access_flags() always


 definition


returning empty list. As a result, the MR tests using this function
are effectively skipped (but report success).

Also fix a typo in the comments.


Was there another typo somewhere? All I see is an *added* typo...



Signed-off-by: Jianxin Xiong 
---
  tests/utils.py | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/utils.py b/tests/utils.py
index 0ad7110..eee44b4 100644
--- a/tests/utils.py
+++ b/tests/utils.py
@@ -55,8 +55,8 @@ def filter_illegal_access_flags(element):
  :param element: A list of access flags to check
  :return: True if this list is legal, else False
  """
-if e.IBV_ACCESS_REMOTE_ATOMIC in element or e.IBV_ACCESS_REMOTE_WRITE:
-if e.IBV_ACCESS_LOCAL_WRITE:
+if e.IBV_ACCESS_REMOTE_ATOMIC in element or e.IBV_ACCESS_REMOTE_WRITE in element:
+if not e.IBV_ACCESS_LOCAL_WRITE in element:
  return False
  return True
  
@@ -69,7 +69,7 @@ def get_access_flags(ctx):

  added as well.
  After verifying that the flags selection is legal, it is appended to an
  array, assuming it wasn't previously appended.
-:param ctx: Device Context to check capabilities
+:param ctx: Device Coyyntext to check capabilities


I liked the old spelling. "Coyyntext" just doesn't sound as good. :)


  :param num: Size of initial collection
  :return: A random legal value for MR flags
  """



thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [resource] 22b17dc667: Kernel panic - not syncing: Fatal exception

2020-11-02 Thread John Hubbard
  0010 DS:  ES:  CR0: 80050033
[   29.017409] CR2:  CR3: 000100120003 CR4: 007706f0
[   29.024539] DR0:  DR1:  DR2: 
[   29.031671] DR3:  DR6: fffe0ff0 DR7: 0400
[   29.038804] PKRU: 5554
[   29.041508] Kernel panic - not syncing: Fatal exception
ACPI MEMORY or I/O RESET_REG.


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml



Thanks,
oliver.s...@intel.com



thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM

2020-11-04 Thread John Hubbard

On 11/4/20 10:17 AM, Jason Gunthorpe wrote:

On Wed, Nov 04, 2020 at 04:41:19PM +, Christoph Hellwig wrote:

On Wed, Nov 04, 2020 at 04:37:58PM +, Christoph Hellwig wrote:

On Wed, Nov 04, 2020 at 05:26:58PM +0100, Daniel Vetter wrote:

What we're discussing is whether gup_fast and pup_fast also obey this,
or fall over and can give you the struct page that's backing the
dma_mmap_* memory. Since the _fast variant doesn't check for
vma->vm_flags, and afaict that's the only thing which closes this gap.
And like you restate, that would be a bit of a problem. So where's that
check which Jason & I aren't spotting?


remap_pte_range uses pte_mkspecial to set up the PTEs, and gup_pte_range
errors out on pte_special.  Of course this only works for the
CONFIG_ARCH_HAS_PTE_SPECIAL case, for other architectures we do have
a real problem.


Except that we don't really support pte-level gup-fast without
CONFIG_ARCH_HAS_PTE_SPECIAL, and in fact all architectures selecting
HAVE_FAST_GUP also select ARCH_HAS_PTE_SPECIAL, so we should be fine.


Mm, I thought it was probably the special flag..

Knowing that CONFIG_HAVE_FAST_GUP can't be set without
CONFIG_ARCH_HAS_PTE_SPECIAL is pretty insightful, can we put that in
the Kconfig?

config HAVE_FAST_GUP
 depends on MMU
 depends on ARCH_HAS_PTE_SPECIAL
 bool


Well, the !CONFIG_ARCH_HAS_PTE_SPECIAL case points out in a comment that
gup-fast is not *completely* unavailable there, so I don't think you want
to shut it off like that:

/*
 * If we can't determine whether or not a pte is special, then fail immediately
 * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
 * to be special.
 *
 * For a futex to be placed on a THP tail page, get_futex_key requires a
 * get_user_pages_fast_only implementation that can pin pages. Thus it's still
 * useful to have gup_huge_pmd even if we can't operate on ptes.
 */


thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM

2020-11-05 Thread John Hubbard

On 11/5/20 4:49 AM, Jason Gunthorpe wrote:

On Thu, Nov 05, 2020 at 10:25:24AM +0100, Daniel Vetter wrote:

/*
  * If we can't determine whether or not a pte is special, then fail immediately
  * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not
  * to be special.
  *
  * For a futex to be placed on a THP tail page, get_futex_key requires a
  * get_user_pages_fast_only implementation that can pin pages. Thus it's still
  * useful to have gup_huge_pmd even if we can't operate on ptes.
  */


We've recently added support for hugepage faults in gpu drivers, and I'm not
seeing a pud_mkhugespecial anywhere. So I'm not sure this works, but it's probably
just me missing something again.


It means ioremap can't create an IO page PUD, it has to be broken up.

Does ioremap even create anything larger than PTEs?



From my reading, yes. See ioremap_try_huge_pmd().

thanks,
--
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Phyr Starter

2022-01-20 Thread John Hubbard

On 1/20/22 6:12 AM, Christoph Hellwig wrote:

On Tue, Jan 11, 2022 at 12:17:18AM -0800, John Hubbard wrote:

Zooming in on the pinning aspect for a moment: last time I attempted to
convert O_DIRECT callers from gup to pup, I recall wanting very much to
record, in each bio_vec, whether these pages were acquired via FOLL_PIN,
or some non-FOLL_PIN method. Because at the end of the IO, it is not
easy to disentangle which pages require put_page() and which require
unpin_user_page*().


I don't think that is a problem.  Pinning only needs to happen for
ITER_IOVEC, and the only non-user page there is the ZERO_PAGE added
for padding, which can be special cased.


I am really glad to hear you say that. Because I just worked through it
again in detail yesterday (including your and others' old emails about
this), and tentatively reached the same conclusion from seeing the call 
paths. But I wanted to confirm with someone who actually knows this code 
well, and that's not me. :)


Things like dio_refill_pages() are mixing in the zero page, but like you 
say, that can be handled. I have a few ideas for that.


Now that the goal is considerably narrower than it was in 2019
("convert DIO callers to pup", instead of "convert the world to pup",
ha), this looks quite feasible after all.



thanks,
--
John Hubbard
NVIDIA


Re: Phyr Starter

2022-01-11 Thread John Hubbard

On 1/10/22 11:34, Matthew Wilcox wrote:

TLDR: I want to introduce a new data type:

struct phyr {
 phys_addr_t addr;
 size_t len;
};

and use it to replace bio_vec as well as using it to replace the array
of struct pages used by get_user_pages() and friends.

---


This would certainly solve quite a few problems at once. Very compelling.

Zooming in on the pinning aspect for a moment: last time I attempted to
convert O_DIRECT callers from gup to pup, I recall wanting very much to
record, in each bio_vec, whether these pages were acquired via FOLL_PIN,
or some non-FOLL_PIN method. Because at the end of the IO, it is not
easy to disentangle which pages require put_page() and which require
unpin_user_page*().

And changing the bio_vec for *that* purpose was not really acceptable.

But now that you're looking to change it in a big way (and with some
spare bits available...oohh!), maybe I can go that direction after all.

Or, are you looking at a design in which any phyr is implicitly FOLL_PIN'd
if it exists at all?

Or any other thoughts in this area are very welcome.



There are two distinct problems I want to address: doing I/O to memory
which does not have a struct page and efficiently doing I/O to large
blobs of physically contiguous memory, regardless of whether it has a
struct page.  There are some other improvements which I regard as minor.

There are many types of memory that one might want to do I/O to that do
not have a struct page, some examples:
  - Memory on a graphics card (or other PCI card, but gfx seems to be
the primary provider of DRAM on the PCI bus today)
  - DAX, or other pmem (there are some fake pages today, but this is
mostly a workaround for the IO problem today)
  - Guest memory being accessed from the hypervisor (KVM needs to
create structpages to make this happen.  Xen doesn't ...)
All of these kinds of memories can be addressed by the CPU and so also
by a bus master.  That is, there is a physical address that the CPU
can use which will address this memory, and there is a way to convert
that to a DMA address which can be programmed into another device.
There's no intent here to support memory which can be accessed by a
complex scheme like writing an address to a control register and then
accessing the memory through a FIFO; this is for memory which can be
accessed by DMA and CPU loads and stores.

For get_user_pages() and friends, we currently fill an array of struct
pages, each one representing PAGE_SIZE bytes.  For an application that
is using 1GB hugepages, writing 2^18 entries is a significant overhead.
It also makes drivers hard to write as they have to recoalesce the
struct pages, even though the VM can tell it whether those 2^18 pages
are contiguous.

On the minor side, struct phyr can represent any mappable chunk of memory.
A bio_vec is limited to 2^32 bytes, while on 64-bit machines a phyr
can represent larger than 4GB.  A phyr is the same size as a bio_vec
on 64 bit (16 bytes), and the same size for 32-bit with PAE (12 bytes).
It is smaller for 32-bit machines without PAE (8 bytes instead of 12).

Finally, it may be possible to stop using scatterlist to describe the
input to the DMA-mapping operation.  We may be able to get struct
scatterlist down to just dma_address and dma_length, with chaining
handled through an enclosing struct.
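
Purely as an illustration of that last idea (these names are invented and do
not exist in the kernel), the slimmed-down element might look like:

/* Hypothetical sketch only: a scatterlist reduced to the DMA-side fields,
 * with chaining handled by an enclosing structure rather than per-entry
 * chain pointers. */
struct slim_sg_entry {
	dma_addr_t	dma_address;
	unsigned int	dma_length;
};

struct slim_sg_table {
	struct slim_sg_entry	*entries;
	unsigned int		nents;
	struct slim_sg_table	*next;	/* chaining via the enclosing struct */
};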

I would like to see phyr replace bio_vec everywhere it's currently used.
I don't have time to do that work now because I'm busy with folios.
If someone else wants to take that on, I shall cheer from the sidelines.


I'm starting to wonder if I should jump in here, in order to get this
as a way to make the O_DIRECT conversion much cleaner. But let's see.


What I do intend to do is:

  - Add an interface to gup.c to pin/unpin N phyrs
  - Add a sg_map_phyrs()
This will take an array of phyrs and allocate an sg for them
  - Whatever else I need to do to make one RDMA driver happy with
this scheme

At that point, I intend to stop and let others more familiar with this
area of the kernel continue the conversion of drivers.
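
To make the shape of that plan concrete (the function names here are
hypothetical; none of these interfaces exist yet), the gup.c and mapping
entry points might look something like:

/* Hypothetical signatures only, sketching the "pin/unpin N phyrs" and
 * sg_map_phyrs() items above. */
int pin_user_phyrs_fast(unsigned long start, unsigned long nr_bytes,
			unsigned int gup_flags,
			struct phyr *phyrs, unsigned int max_phyrs);

void unpin_user_phyrs(struct phyr *phyrs, unsigned int nr_phyrs);

struct sg_table *sg_map_phyrs(struct device *dev,
			      const struct phyr *phyrs, unsigned int nr_phyrs,
			      enum dma_data_direction dir);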

P.S. If you've had the Prodigy song running through your head the whole
time you've been reading this email ... I'm sorry / You're welcome.
If people insist, we can rename this to phys_range or something boring,
but I quite like the spelling of phyr with the pronunciation of "fire".


A more conservative or traditional name might look like:

phys_vec (maintains some resemblance to what it's replacing)
phys_range
phys_addr_range

phyr is rather cool, but it is also awfully close to "phys" for
reading comfort. And there is a lot to be said for self-descriptive names,
which phyr is not.

And of course, you're signing up for another huge naming debate with Linus
if you go with the "cool" name here. :)


thanks,
--
John Hubbard
NVIDIA


Re: [PATCH v2 hmm 09/11] mm/hmm: Poison hmm_range during unregister

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> Trying to misuse a range outside its lifetime is a kernel bug. Use WARN_ON
> and poison bytes to detect this condition.
> 
> Signed-off-by: Jason Gunthorpe 
> Reviewed-by: Jérôme Glisse 
> ---
> v2
> - Keep range start/end valid after unregistration (Jerome)
> ---
>  mm/hmm.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 6802de7080d172..c2fecb3ecb11e1 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -937,7 +937,7 @@ void hmm_range_unregister(struct hmm_range *range)
>   struct hmm *hmm = range->hmm;
>  
>   /* Sanity check this really should not happen. */

That comment can also be deleted, as it has the same meaning as
the WARN_ON() that you just added.

> - if (hmm == NULL || range->end <= range->start)
> + if (WARN_ON(range->end <= range->start))
>   return;
>  
>   mutex_lock(&hmm->lock);
> @@ -948,7 +948,10 @@ void hmm_range_unregister(struct hmm_range *range)
>   range->valid = false;
>   mmput(hmm->mm);
>   hmm_put(hmm);
> - range->hmm = NULL;
> +
> + /* The range is now invalid, leave it poisoned. */

To be precise, we are poisoning the range's back pointer to its
owning hmm instance.  Maybe this is clearer:

/*
 * The range is now invalid, so poison its hmm pointer.
 * Leave other range-> fields in place, for the caller's use.
 */

...or something like that?

> + range->valid = false;
> + memset(&range->hmm, POISON_INUSE, sizeof(range->hmm));
>  }
>  EXPORT_SYMBOL(hmm_range_unregister);
>  
> 

The above are very minor documentation points, so:

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 03/11] mm/hmm: Hold a mmgrab from hmm to mm

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> So long as a struct hmm pointer exists, so should the struct mm it is
> linked to. Hold the mmgrab() as soon as a hmm is created, and mmdrop() it
> once the hmm refcount goes to zero.
> 
> Since mmdrop() (i.e. a 0 kref on struct mm) is now impossible with a !NULL
> mm->hmm, delete the hmm_hmm_destroy().
> 
> Signed-off-by: Jason Gunthorpe 
> Reviewed-by: Jérôme Glisse 
> ---
> v2:
>  - Fix error unwind paths in hmm_get_or_create (Jerome/Jason)
> ---
>  include/linux/hmm.h |  3 ---
>  kernel/fork.c   |  1 -
>  mm/hmm.c| 22 --
>  3 files changed, 4 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 2d519797cb134a..4ee3acabe5ed22 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -586,14 +586,11 @@ static inline int hmm_vma_fault(struct hmm_mirror *mirror,
>  }
>  
>  /* Below are for HMM internal use only! Not to be used by device driver! */
> -void hmm_mm_destroy(struct mm_struct *mm);
> -
>  static inline void hmm_mm_init(struct mm_struct *mm)
>  {
>   mm->hmm = NULL;
>  }
>  #else /* IS_ENABLED(CONFIG_HMM_MIRROR) */
> -static inline void hmm_mm_destroy(struct mm_struct *mm) {}
>  static inline void hmm_mm_init(struct mm_struct *mm) {}
>  #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b2b87d450b80b5..588c768ae72451 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -673,7 +673,6 @@ void __mmdrop(struct mm_struct *mm)
>   WARN_ON_ONCE(mm == current->active_mm);
>   mm_free_pgd(mm);
>   destroy_context(mm);
> - hmm_mm_destroy(mm);


This is particularly welcome, not to have an "HMM is special" case
in such a core part of process/mm code. 


>   mmu_notifier_mm_destroy(mm);
>   check_mm(mm);
>   put_user_ns(mm->user_ns);
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 8796447299023c..cc7c26fda3300e 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -82,6 +83,7 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>   hmm->notifiers = 0;
>   hmm->dead = false;
>   hmm->mm = mm;
> + mmgrab(hmm->mm);
>  
>   spin_lock(&mm->page_table_lock);
>   if (!mm->hmm)
> @@ -109,6 +111,7 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>   mm->hmm = NULL;
>   spin_unlock(&mm->page_table_lock);
>  error:
> + mmdrop(hmm->mm);
>   kfree(hmm);
>   return NULL;
>  }
> @@ -130,6 +133,7 @@ static void hmm_free(struct kref *kref)
>   mm->hmm = NULL;
>   spin_unlock(&mm->page_table_lock);
>  
> + mmdrop(hmm->mm);
>   mmu_notifier_call_srcu(&hmm->rcu, hmm_free_rcu);
>  }
>  
> @@ -138,24 +142,6 @@ static inline void hmm_put(struct hmm *hmm)
>   kref_put(&hmm->kref, hmm_free);
>  }
>  
> -void hmm_mm_destroy(struct mm_struct *mm)
> -{
> - struct hmm *hmm;
> -
> - spin_lock(&mm->page_table_lock);
> - hmm = mm_get_hmm(mm);
> - mm->hmm = NULL;
> - if (hmm) {
> - hmm->mm = NULL;
> - hmm->dead = true;
> -         spin_unlock(&mm->page_table_lock);
> - hmm_put(hmm);
> - return;
> - }
> -
> - spin_unlock(&mm->page_table_lock);
> -}
> -
>  static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  {
>   struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
> 

Failed to find any problems with this. :)

Reviewed-by: John Hubbard 

thanks,
-- 
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 11/11] mm/hmm: Remove confusing comment and logic from hmm_release

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> hmm_release() is called exactly once per hmm. ops->release() cannot
> accidentally trigger any action that would recurse back onto
> hmm->mirrors_sem.
> 
> This fixes a use after-free race of the form:
> 
>CPU0   CPU1
>hmm_release()
>  up_write(&hmm->mirrors_sem);
>  hmm_mirror_unregister(mirror)
>   down_write(&hmm->mirrors_sem);
>   up_write(&hmm->mirrors_sem);
>   kfree(mirror)
>  mirror->ops->release(mirror)
> 
> The only user we have today for ops->release is an empty function, so this
> is unambiguously safe.
> 
> As a consequence of plugging this race drivers are not allowed to
> register/unregister mirrors from within a release op.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  mm/hmm.c | 28 +---
>  1 file changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 709d138dd49027..3a45dd3d778248 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>   WARN_ON(!list_empty(&hmm->ranges));
>   mutex_unlock(&hmm->lock);
>  
> - down_write(&hmm->mirrors_sem);
> - mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
> -   list);
> - while (mirror) {
> - list_del_init(&mirror->list);
> - if (mirror->ops->release) {
> - /*
> -  * Drop mirrors_sem so the release callback can wait
> -  * on any pending work that might itself trigger a
> -  * mmu_notifier callback and thus would deadlock with
> -  * us.
> -  */
> - up_write(&hmm->mirrors_sem);
> + down_read(&hmm->mirrors_sem);

This is cleaner and simpler, but I suspect it is leading to the deadlock
that Ralph Campbell is seeing in his driver testing. (And in general, holding
a lock during a driver callback usually leads to deadlocks.)

Ralph, is this the one? It's the only place in this patchset where I can
see a lock around a callback to driver code, that wasn't there before. So
I'm pretty sure it is the one...


thanks,
-- 
John Hubbard
NVIDIA
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 hmm 04/11] mm/hmm: Simplify hmm_get_or_create and make it reliable

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> As coded this function can false-fail in various racy situations. Make it
> reliable by running only under the write side of the mmap_sem and avoiding
> the false-failing compare/exchange pattern.
> 
> Also make the locking very easy to understand by only ever reading or
> writing mm->hmm while holding the write side of the mmap_sem.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
> v2:
> - Fix error unwind of mmgrab (Jerome)
> - Use hmm local instead of 2nd container_of (Jerome)
> ---
>  mm/hmm.c | 80 
>  1 file changed, 29 insertions(+), 51 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index cc7c26fda3300e..dc30edad9a8a02 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -40,16 +40,6 @@
>  #if IS_ENABLED(CONFIG_HMM_MIRROR)
>  static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
>  
> -static inline struct hmm *mm_get_hmm(struct mm_struct *mm)
> -{
> - struct hmm *hmm = READ_ONCE(mm->hmm);
> -
> - if (hmm && kref_get_unless_zero(&hmm->kref))
> - return hmm;
> -
> - return NULL;
> -}
> -
>  /**
>   * hmm_get_or_create - register HMM against an mm (HMM internal)
>   *
> @@ -64,11 +54,20 @@ static inline struct hmm *mm_get_hmm(struct mm_struct *mm)
>   */
>  static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>  {
> - struct hmm *hmm = mm_get_hmm(mm);
> - bool cleanup = false;
> + struct hmm *hmm;
>  
> - if (hmm)
> - return hmm;
> + lockdep_assert_held_exclusive(&mm->mmap_sem);
> +
> + if (mm->hmm) {
> + if (kref_get_unless_zero(&mm->hmm->kref))
> + return mm->hmm;
> + /*
> +  * The hmm is being freed by some other CPU and is pending a
> +  * RCU grace period, but this CPU can NULL now it since we
> +  * have the mmap_sem.
> +  */
> + mm->hmm = NULL;
> + }
>  
>   hmm = kmalloc(sizeof(*hmm), GFP_KERNEL);
>   if (!hmm)
> @@ -83,57 +82,36 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm)
>   hmm->notifiers = 0;
>   hmm->dead = false;
>   hmm->mm = mm;
> - mmgrab(hmm->mm);
> -
> - spin_lock(&mm->page_table_lock);
> - if (!mm->hmm)
> - mm->hmm = hmm;
> - else
> - cleanup = true;
> - spin_unlock(&mm->page_table_lock);
>  
> - if (cleanup)
> - goto error;
> -
> - /*
> -  * We should only get here if hold the mmap_sem in write mode ie on
> -  * registration of first mirror through hmm_mirror_register()
> -  */
>   hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
> - if (__mmu_notifier_register(&hmm->mmu_notifier, mm))
> - goto error_mm;
> + if (__mmu_notifier_register(&hmm->mmu_notifier, mm)) {
> + kfree(hmm);
> + return NULL;
> + }
>  
> + mmgrab(hmm->mm);
> + mm->hmm = hmm;
>   return hmm;
> -
> -error_mm:
> - spin_lock(&mm->page_table_lock);
> - if (mm->hmm == hmm)
> - mm->hmm = NULL;
> - spin_unlock(&mm->page_table_lock);
> -error:
> - mmdrop(hmm->mm);
> - kfree(hmm);
> - return NULL;
>  }
>  
>  static void hmm_free_rcu(struct rcu_head *rcu)
>  {
> - kfree(container_of(rcu, struct hmm, rcu));
> + struct hmm *hmm = container_of(rcu, struct hmm, rcu);
> +
> + down_write(&hmm->mm->mmap_sem);
> + if (hmm->mm->hmm == hmm)
> + hmm->mm->hmm = NULL;
> + up_write(&hmm->mm->mmap_sem);
> + mmdrop(hmm->mm);
> +
> + kfree(hmm);
>  }
>  
>  static void hmm_free(struct kref *kref)
>  {
>   struct hmm *hmm = container_of(kref, struct hmm, kref);
> - struct mm_struct *mm = hmm->mm;
> -
> - mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);
>  
> - spin_lock(&mm->page_table_lock);
> - if (mm->hmm == hmm)
> - mm->hmm = NULL;
> - spin_unlock(&mm->page_table_lock);
> -
> - mmdrop(hmm->mm);
> + mmu_notifier_unregister_no_release(&hmm->mmu_notifier, hmm->mm);
>   mmu_notifier_call_srcu(&hmm->rcu, hmm_free_rcu);
>  }
>  
> 

Yes.

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 10/11] mm/hmm: Do not use list*_rcu() for hmm->ranges

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> This list is always read and written while holding hmm->lock so there is
> no need for the confusing _rcu annotations.
> 
> Signed-off-by: Jason Gunthorpe 
> Reviewed-by: Jérôme Glisse 
> ---
>  mm/hmm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index c2fecb3ecb11e1..709d138dd49027 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -911,7 +911,7 @@ int hmm_range_register(struct hmm_range *range,
>   mutex_lock(&hmm->lock);
>  
>   range->hmm = hmm;
> - list_add_rcu(&range->list, &hmm->ranges);
> + list_add(&range->list, &hmm->ranges);
>  
>   /*
>* If there are any concurrent notifiers we have to wait for them for
> @@ -941,7 +941,7 @@ void hmm_range_unregister(struct hmm_range *range)
>   return;
>  
>   mutex_lock(&hmm->lock);
> - list_del_rcu(&range->list);
> + list_del(&range->list);
>   mutex_unlock(&hmm->lock);
>  
>   /* Drop reference taken by hmm_range_register() */
> 

Good point. I'd overlooked this many times.

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 07/11] mm/hmm: Use lockdep instead of comments

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> So we can check locking at runtime.
> 
> Signed-off-by: Jason Gunthorpe 
> Reviewed-by: Jérôme Glisse 
> ---
> v2
> - Fix missing & in lockdeps (Jason)
> ---
>  mm/hmm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index f67ba32983d9f1..c702cd72651b53 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -254,11 +254,11 @@ static const struct mmu_notifier_ops 
> hmm_mmu_notifier_ops = {
>   *
>   * To start mirroring a process address space, the device driver must 
> register
>   * an HMM mirror struct.
> - *
> - * THE mm->mmap_sem MUST BE HELD IN WRITE MODE !
>   */
>  int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm)
>  {
> + lockdep_assert_held_exclusive(&mm->mmap_sem);
> +
>   /* Sanity check */
>   if (!mm || !mirror || !mirror->ops)
>   return -EINVAL;
> 

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 02/11] mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_register

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> Ralph observes that hmm_range_register() can only be called by a driver
> while a mirror is registered. Make this clear in the API by passing in the
> mirror structure as a parameter.
> 
> This also simplifies understanding the lifetime model for struct hmm, as
> the hmm pointer must be valid as part of a registered mirror so all we
> need in hmm_register_range() is a simple kref_get.
> 
> Suggested-by: Ralph Campbell 
> Signed-off-by: Jason Gunthorpe 
> ---
> v2
> - Include the oneline patch to nouveau_svm.c
> ---
>  drivers/gpu/drm/nouveau/nouveau_svm.c |  2 +-
>  include/linux/hmm.h   |  7 ---
>  mm/hmm.c  | 15 ++-
>  3 files changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
> b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index 93ed43c413f0bb..8c92374afcf227 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -649,7 +649,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
>   range.values = nouveau_svm_pfn_values;
>   range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT;
>  again:
> - ret = hmm_vma_fault(&range, true);
> + ret = hmm_vma_fault(&svmm->mirror, &range, true);
>   if (ret == 0) {
>   mutex_lock(&svmm->mutex);
>   if (!hmm_vma_range_done(&range)) {
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 688c5ca7068795..2d519797cb134a 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -505,7 +505,7 @@ static inline bool hmm_mirror_mm_is_alive(struct 
> hmm_mirror *mirror)
>   * Please see Documentation/vm/hmm.rst for how to use the range API.
>   */
>  int hmm_range_register(struct hmm_range *range,
> -struct mm_struct *mm,
> +struct hmm_mirror *mirror,
>  unsigned long start,
>  unsigned long end,
>  unsigned page_shift);
> @@ -541,7 +541,8 @@ static inline bool hmm_vma_range_done(struct hmm_range 
> *range)
>  }
>  
>  /* This is a temporary helper to avoid merge conflict between trees. */
> -static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> +static inline int hmm_vma_fault(struct hmm_mirror *mirror,
> + struct hmm_range *range, bool block)
>  {
>   long ret;
>  
> @@ -554,7 +555,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, 
> bool block)
>   range->default_flags = 0;
>   range->pfn_flags_mask = -1UL;
>  
> - ret = hmm_range_register(range, range->vma->vm_mm,
> + ret = hmm_range_register(range, mirror,
>range->start, range->end,
>PAGE_SHIFT);
>   if (ret)
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 547002f56a163d..8796447299023c 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -925,13 +925,13 @@ static void hmm_pfns_clear(struct hmm_range *range,
>   * Track updates to the CPU page table see include/linux/hmm.h
>   */
>  int hmm_range_register(struct hmm_range *range,
> -struct mm_struct *mm,
> +struct hmm_mirror *mirror,
>  unsigned long start,
>  unsigned long end,
>  unsigned page_shift)
>  {
>   unsigned long mask = ((1UL << page_shift) - 1UL);
> - struct hmm *hmm;
> + struct hmm *hmm = mirror->hmm;
>  
>   range->valid = false;
>   range->hmm = NULL;
> @@ -945,15 +945,12 @@ int hmm_range_register(struct hmm_range *range,
>   range->start = start;
>   range->end = end;
>  
> - hmm = hmm_get_or_create(mm);
> - if (!hmm)
> - return -EFAULT;
> -
>   /* Check if hmm_mm_destroy() was call. */
> - if (hmm->mm == NULL || hmm->dead) {
> -     hmm_put(hmm);
> +     if (hmm->mm == NULL || hmm->dead)
>   return -EFAULT;
> - }
> +
> + range->hmm = hmm;
> + kref_get(&hmm->kref);
>  
>   /* Initialize range to track CPU page table updates. */
>   mutex_lock(&hmm->lock);
> 

I'm not a qualified Nouveau reviewer, but this looks obviously correct,
so:

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 05/11] mm/hmm: Remove duplicate condition test before wait_event_timeout

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> The wait_event_timeout macro already tests the condition as its first
> action, so there is no reason to open code another version of this, all
> that does is skip the might_sleep() debugging in common cases, which is
> not helpful.
> 
> Further, based on prior patches, we can no simplify the required condition

  "now simplify"

> test:
>  - If range is valid memory then so is range->hmm
>  - If hmm_release() has run then range->valid is set to false
>at the same time as dead, so no reason to check both.
>  - A valid hmm has a valid hmm->mm.
> 
> Also, add the READ_ONCE for range->valid as there is no lock held here.
> 
> Signed-off-by: Jason Gunthorpe 
> Reviewed-by: Jérôme Glisse 
> ---
>  include/linux/hmm.h | 12 ++--
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 4ee3acabe5ed22..2ab35b40992b24 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -218,17 +218,9 @@ static inline unsigned long hmm_range_page_size(const 
> struct hmm_range *range)
>  static inline bool hmm_range_wait_until_valid(struct hmm_range *range,
> unsigned long timeout)
>  {
> - /* Check if mm is dead ? */
> - if (range->hmm == NULL || range->hmm->dead || range->hmm->mm == NULL) {
> - range->valid = false;
> - return false;
> - }
> - if (range->valid)
> - return true;
> - wait_event_timeout(range->hmm->wq, range->valid || range->hmm->dead,
> + wait_event_timeout(range->hmm->wq, range->valid,
>  msecs_to_jiffies(timeout));
> - /* Return current valid status just in case we get lucky */
> - return range->valid;
> + return READ_ONCE(range->valid);

Just to ensure that I actually understand the model: I'm assuming that the 
READ_ONCE is there solely to ensure that range->valid is read *after* the
wait_event_timeout() returns. Is that correct?
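
In other words, my mental model is the toy sketch below (a made-up flag
rather than range->valid, just to check my understanding):

#include <linux/wait.h>
#include <linux/compiler.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static bool flag;       /* written by another thread; no lock held here */

static bool wait_for_flag(unsigned long timeout)
{
        wait_event_timeout(wq, flag, timeout);
        /*
         * No lock is held, so annotate the final read: READ_ONCE() forces
         * a single, fresh load of 'flag' after the wait returns, rather
         * than whatever value the compiler may have cached earlier.
         */
        return READ_ONCE(flag);
}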


>  }
>  
>  /*
> 

In any case, it looks good, so:

Reviewed-by: John Hubbard 

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 01/11] mm/hmm: fix use after free with struct hmm in the mmu notifiers

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
...
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 8e7403f081f44a..547002f56a163d 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
...
> @@ -125,7 +130,7 @@ static void hmm_free(struct kref *kref)
>   mm->hmm = NULL;
>   spin_unlock(&mm->page_table_lock);
>  
> - kfree(hmm);
> + mmu_notifier_call_srcu(&hmm->rcu, hmm_free_rcu);


It occurred to me to wonder if it is best to use the MMU notifier's
instance of srcu, instead of creating a separate instance for HMM.
But this really does seem appropriate, since we are after all using
this to synchronize with MMU notifier callbacks. So, fine.


>  }
>  
>  static inline void hmm_put(struct hmm *hmm)
> @@ -153,10 +158,14 @@ void hmm_mm_destroy(struct mm_struct *mm)
>  
>  static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  {
> - struct hmm *hmm = mm_get_hmm(mm);
> + struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
>   struct hmm_mirror *mirror;
>   struct hmm_range *range;
>  
> + /* hmm is in progress to free */

Well, sometimes, yes. :)

Maybe this wording is clearer (if we need any comment at all):

/* Bail out if hmm is in the process of being freed */

> + if (!kref_get_unless_zero(&hmm->kref))
> + return;
> +
>   /* Report this HMM as dying. */
>   hmm->dead = true;
>  
> @@ -194,13 +203,15 @@ static void hmm_release(struct mmu_notifier *mn, struct 
> mm_struct *mm)
>  static int hmm_invalidate_range_start(struct mmu_notifier *mn,
>   const struct mmu_notifier_range *nrange)
>  {
> - struct hmm *hmm = mm_get_hmm(nrange->mm);
> + struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
>   struct hmm_mirror *mirror;
>   struct hmm_update update;
>   struct hmm_range *range;
>   int ret = 0;
>  
> - VM_BUG_ON(!hmm);
> + /* hmm is in progress to free */

Same here.

> + if (!kref_get_unless_zero(&hmm->kref))
> + return 0;
>  
>   update.start = nrange->start;
>   update.end = nrange->end;
> @@ -245,9 +256,11 @@ static int hmm_invalidate_range_start(struct 
> mmu_notifier *mn,
>  static void hmm_invalidate_range_end(struct mmu_notifier *mn,
>   const struct mmu_notifier_range *nrange)
>  {
> - struct hmm *hmm = mm_get_hmm(nrange->mm);
> + struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
>  
> - VM_BUG_ON(!hmm);
> + /* hmm is in progress to free */

And here.

> + if (!kref_get_unless_zero(&hmm->kref))
> + return;
>  
>   mutex_lock(&hmm->lock);
>   hmm->notifiers--;
> 

Elegant fix. Regardless of the above chatter I added, you can add:

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 06/11] mm/hmm: Hold on to the mmget for the lifetime of the range

2019-06-07 Thread John Hubbard
!mmget_not_zero(hmm->mm))
>   return -EFAULT;
>  
>   range->hmm = hmm;
> @@ -955,6 +951,7 @@ void hmm_range_unregister(struct hmm_range *range)
>  
>   /* Drop reference taken by hmm_range_register() */
>   range->valid = false;
> + mmput(hmm->mm);
>   hmm_put(hmm);
>   range->hmm = NULL;
>  }
> @@ -982,10 +979,7 @@ long hmm_range_snapshot(struct hmm_range *range)
>   struct vm_area_struct *vma;
>   struct mm_walk mm_walk;
>  
> - /* Check if hmm_mm_destroy() was call. */
> - if (hmm->mm == NULL || hmm->dead)
> - return -EFAULT;
> -
> + lockdep_assert_held(&hmm->mm->mmap_sem);
>   do {
>   /* If range is no longer valid force retry. */
>   if (!range->valid)
> @@ -1080,9 +1074,7 @@ long hmm_range_fault(struct hmm_range *range, bool 
> block)
>   struct mm_walk mm_walk;
>   int ret;
>  
> - /* Check if hmm_mm_destroy() was call. */
> - if (hmm->mm == NULL || hmm->dead)
> - return -EFAULT;
> + lockdep_assert_held(&hmm->mm->mmap_sem);
>  
>   do {
>   /* If range is no longer valid force retry. */
> 

Nice cleanup.

Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 08/11] mm/hmm: Remove racy protection against double-unregistration

2019-06-07 Thread John Hubbard
On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> No other register/unregister kernel API attempts to provide this kind of
> protection as it is inherently racy, so just drop it.
> 
> Callers should provide their own protection, it appears nouveau already
> does, but just in case drop a debugging POISON.
> 
> Signed-off-by: Jason Gunthorpe 
> Reviewed-by: Jérôme Glisse 
> ---
>  mm/hmm.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index c702cd72651b53..6802de7080d172 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -284,18 +284,13 @@ EXPORT_SYMBOL(hmm_mirror_register);
>   */
>  void hmm_mirror_unregister(struct hmm_mirror *mirror)
>  {
> - struct hmm *hmm = READ_ONCE(mirror->hmm);
> -
> - if (hmm == NULL)
> - return;
> + struct hmm *hmm = mirror->hmm;
>  
>   down_write(&hmm->mirrors_sem);
>   list_del_init(&mirror->list);
> - /* To protect us against double unregister ... */
> - mirror->hmm = NULL;
>   up_write(&hmm->mirrors_sem);
> -
>   hmm_put(hmm);
> + memset(&mirror->hmm, POISON_INUSE, sizeof(mirror->hmm));

I hadn't thought of POISON_* for these types of cases; it's a
good technique to remember.

I noticed that this is now done outside of the lock, but that
follows directly from your commit description, so that all looks 
correct.
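
For anyone else reading along, the general shape of the trick is roughly
this (hypothetical "widget" names, not the hmm code itself):

#include <linux/poison.h>
#include <linux/string.h>

struct widget;

struct widget_mirror {
        struct widget *widget;
};

static void widget_mirror_unregister(struct widget_mirror *mirror)
{
        /* ... unlink from lists, drop the reference ... */

        /*
         * Scribble a known-bad value over the now-stale pointer, so a
         * buggy double unregister (or any other later use) faults loudly
         * instead of quietly dereferencing freed state.
         */
        memset(&mirror->widget, POISON_INUSE, sizeof(mirror->widget));
}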

>  }
>  EXPORT_SYMBOL(hmm_mirror_unregister);
>  
> 


Reviewed-by: John Hubbard 

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v2 hmm 01/11] mm/hmm: fix use after free with struct hmm in the mmu notifiers

2019-06-09 Thread John Hubbard
On 6/7/19 5:34 AM, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2019 at 07:29:08PM -0700, John Hubbard wrote:
>> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
>>> From: Jason Gunthorpe 
>> ...
>>> @@ -153,10 +158,14 @@ void hmm_mm_destroy(struct mm_struct *mm)
>>>  
>>>  static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>>>  {
>>> -   struct hmm *hmm = mm_get_hmm(mm);
>>> +   struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
>>> struct hmm_mirror *mirror;
>>> struct hmm_range *range;
>>>  
>>> +   /* hmm is in progress to free */
>>
>> Well, sometimes, yes. :)
> 
> It think it is in all cases actually.. The only way we see a 0 kref
> and still reach this code path is if another thread has alreay setup
> the hmm_free in the call_srcu..
> 
>> Maybe this wording is clearer (if we need any comment at all):
> 
> I always find this hard.. This is a very standard pattern when working
> with RCU - however in my experience few people actually know the RCU
> patterns, and missing the _unless_zero is a common bug I find when
> looking at code.
> 
> This is mm/ so I can drop it, what do you think?
> 

I forgot to respond to this section, so catching up now:

I think we're talking about slightly different things. I was just
noting that the comment above the "if" statement was only accurate
if the branch is taken, which is why I recommended this combination
of comment and code:

/* Bail out if hmm is in the process of being freed */
if (!kref_get_unless_zero(&hmm->kref))
return;

As for the actual _unless_zero part, I think that's good to have.
And it's a good reminder if nothing else, even in mm/ code.
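
To spell out, with a made-up object, the race that a bare kref_get() would
leave open here (this is my simplified reading of it):

#include <linux/kref.h>

struct obj {
        struct kref kref;
        /* ... */
};

static void obj_release(struct kref *kref);     /* frees via call_srcu() elsewhere */

static int obj_notifier_cb(struct obj *obj)
{
        /*
         * Another CPU may already have dropped the last reference, in
         * which case the object is only kept around by the pending SRCU
         * grace period. A plain kref_get() would "resurrect" that dying
         * object, so bail out instead once the count has hit zero.
         */
        if (!kref_get_unless_zero(&obj->kref))
                return 0;       /* tear-down in progress, nothing to do */

        /* ... safe to use obj here ... */

        kref_put(&obj->kref, obj_release);
        return 0;
}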

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH v3 hmm 05/11] mm/hmm: Remove duplicate condition test before wait_event_timeout

2019-06-09 Thread John Hubbard
On 6/7/19 6:31 AM, Jason Gunthorpe wrote:
> The wait_event_timeout macro already tests the condition as its first
> action, so there is no reason to open code another version of this, all
> that does is skip the might_sleep() debugging in common cases, which is
> not helpful.
> 
> Further, based on prior patches, we can now simplify the required condition
> test:
>  - If range is valid memory then so is range->hmm
>  - If hmm_release() has run then range->valid is set to false
>at the same time as dead, so no reason to check both.
>  - A valid hmm has a valid hmm->mm.
> 
> Allowing the return value of wait_event_timeout() (along with its internal
> barriers) to compute the result of the function.
> 
> Signed-off-by: Jason Gunthorpe 
> ---


Reviewed-by: John Hubbard 


thanks,
-- 
John Hubbard
NVIDIA



> v3
> - Simplify the wait_event_timeout to not check valid
> ---
>  include/linux/hmm.h | 13 ++---
>  1 file changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 1d97b6d62c5bcf..26e7c477490c4e 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -209,17 +209,8 @@ static inline unsigned long hmm_range_page_size(const 
> struct hmm_range *range)
>  static inline bool hmm_range_wait_until_valid(struct hmm_range *range,
> unsigned long timeout)
>  {
> - /* Check if mm is dead ? */
> - if (range->hmm == NULL || range->hmm->dead || range->hmm->mm == NULL) {
> - range->valid = false;
> - return false;
> - }
> - if (range->valid)
> - return true;
> - wait_event_timeout(range->hmm->wq, range->valid || range->hmm->dead,
> -msecs_to_jiffies(timeout));
> - /* Return current valid status just in case we get lucky */
> - return range->valid;
> + return wait_event_timeout(range->hmm->wq, range->valid,
> +   msecs_to_jiffies(timeout)) != 0;
>  }
>  
>  /*
> 


Re: [PATCH v2 hmm 01/11] mm/hmm: fix use after free with struct hmm in the mmu notifiers

2019-06-09 Thread John Hubbard
On 6/7/19 5:34 AM, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2019 at 07:29:08PM -0700, John Hubbard wrote:
>> On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
>>> From: Jason Gunthorpe 
>> ...
>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>> index 8e7403f081f44a..547002f56a163d 100644
>>> +++ b/mm/hmm.c
>> ...
>>> @@ -125,7 +130,7 @@ static void hmm_free(struct kref *kref)
>>> mm->hmm = NULL;
>>> spin_unlock(&mm->page_table_lock);
>>>  
>>> -   kfree(hmm);
>>> +   mmu_notifier_call_srcu(&hmm->rcu, hmm_free_rcu);
>>
>>
>> It occurred to me to wonder if it is best to use the MMU notifier's
>> instance of srcu, instead of creating a separate instance for HMM.
> 
> It *has* to be the MMU notifier SRCU because we are synchornizing
> against the read side of that SRU inside the mmu notifier code, ie:
> 
> int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> id = srcu_read_lock(&srcu);
> hlist_for_each_entry_rcu(mn, &range->mm->mmu_notifier_mm->list, 
> hlist) {
> if (mn->ops->invalidate_range_start) {
>^
> 
> Here 'mn' is really hmm (hmm = container_of(mn, struct hmm,
> mmu_notifier)), so we must protect the memory against free for the mmu
> notifier core.
> 
> Thus we have no choice but to use its SRCU.

Ah right. It's embarrassingly obvious when you say it out loud. :)
Thanks for explaining.

> 
> CH also pointed out a more elegant solution, which is to get the write
> side of the mmap_sem during hmm_mirror_unregister - no notifier
> callback can be running in this case. Then we delete the kref, srcu
> and so forth.
> 
> This is much clearer/saner/better, but.. requries the callers of
> hmm_mirror_unregister to be safe to get the mmap_sem write side.
> 
> I think this is true, so maybe this patch should be switched, what do
> you think?

OK, your follow-up notes that we'll leave it as is, got it.


thanks,
-- 
John Hubbard
NVIDIA

Re: [RFC] mm/hmm: pass mmu_notifier_range to sync_cpu_device_pagetables

2019-06-10 Thread John Hubbard
On 6/8/19 4:41 AM, Jason Gunthorpe wrote:
> On Sat, Jun 08, 2019 at 02:10:08AM -0700, Christoph Hellwig wrote:
>> On Fri, Jun 07, 2019 at 05:14:52PM -0700, Ralph Campbell wrote:
>>> HMM defines its own struct hmm_update which is passed to the
>>> sync_cpu_device_pagetables() callback function. This is
>>> sufficient when the only action is to invalidate. However,
>>> a device may want to know the reason for the invalidation and
>>> be able to see the new permissions on a range, update device access
>>> rights or range statistics. Since sync_cpu_device_pagetables()
>>> can be called from try_to_unmap(), the mmap_sem may not be held
>>> and find_vma() is not safe to be called.
>>> Pass the struct mmu_notifier_range to sync_cpu_device_pagetables()
>>> to allow the full invalidation information to be used.
>>>
>>> Signed-off-by: Ralph Campbell 
>>>
>>> I'm sending this out now since we are updating many of the HMM APIs
>>> and I think it will be useful.
>>
>> This is the right thing to do.  But the really right thing is to just
>> kill the hmm_mirror API entirely and move to mmu_notifiers.  At least
>> for noveau this already is way simpler, although right now it defeats
>> Jasons patch to avoid allocating the struct hmm in the fault path.
>> But as said before that can be avoided by just killing struct hmm,
>> which for many reasons is the right thing to do anyway.
>>
>> I've got a series here, which is a bit broken (epecially the last
>> patch can't work as-is), but should explain where I'm trying to head:
>>
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-mirror-simplification
> 
> At least the current hmm approach does rely on the collision retry
> locking scheme in struct hmm/struct hmm_range for the pagefault side
> to work right.
> 
> So, before we can apply patch one in this series we need to fix
> hmm_vma_fault() and all its varients. Otherwise the driver will be
> broken.
> 
> I'm hoping to first define what this locking should be (see other
> emails to Ralph) then, ideally, see if we can extend mmu notifiers to
> get it directly withouth hmm stuff.
> 
> Then we apply your patch one and the hmm ops wrapper dies.
> 

This all makes sense, and thanks for all this work to simplify and clarify
HMM. It's going to make it a lot easier to work with, when the dust settles.

thanks,
-- 
John Hubbard
NVIDIA

Re: [Nouveau] [PATCH 03/22] mm: remove hmm_devmem_add_resource

2019-06-14 Thread John Hubbard
On 6/13/19 2:43 AM, Christoph Hellwig wrote:
> This function has never been used since it was first added to the kernel
> more than a year and a half ago, and if we ever grow a consumer of the
> MEMORY_DEVICE_PUBLIC infrastructure it can easily use devm_memremap_pages
> directly now that we've simplified the API for it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/hmm.h |  3 ---
>  mm/hmm.c| 54 -
>  2 files changed, 57 deletions(-)
> 

No objections here, good cleanup.

Reviewed-by: John Hubbard  

thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 4867b9da1b6c..5761a39221a6 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -688,9 +688,6 @@ struct hmm_devmem {
>  struct hmm_devmem *hmm_devmem_add(const struct hmm_devmem_ops *ops,
> struct device *device,
> unsigned long size);
> -struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
> -struct device *device,
> -struct resource *res);
>  
>  /*
>   * hmm_devmem_page_set_drvdata - set per-page driver data field
> diff --git a/mm/hmm.c b/mm/hmm.c
> index ff2598eb7377..0c62426d1257 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -1445,58 +1445,4 @@ struct hmm_devmem *hmm_devmem_add(const struct 
> hmm_devmem_ops *ops,
>   return devmem;
>  }
>  EXPORT_SYMBOL_GPL(hmm_devmem_add);
> -
> -struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
> -struct device *device,
> -struct resource *res)
> -{
> - struct hmm_devmem *devmem;
> - void *result;
> - int ret;
> -
> - if (res->desc != IORES_DESC_DEVICE_PUBLIC_MEMORY)
> - return ERR_PTR(-EINVAL);
> -
> - dev_pagemap_get_ops();
> -
> - devmem = devm_kzalloc(device, sizeof(*devmem), GFP_KERNEL);
> - if (!devmem)
> - return ERR_PTR(-ENOMEM);
> -
> - init_completion(&devmem->completion);
> - devmem->pfn_first = -1UL;
> - devmem->pfn_last = -1UL;
> - devmem->resource = res;
> - devmem->device = device;
> - devmem->ops = ops;
> -
> - ret = percpu_ref_init(&devmem->ref, &hmm_devmem_ref_release,
> -   0, GFP_KERNEL);
> - if (ret)
> - return ERR_PTR(ret);
> -
> - ret = devm_add_action_or_reset(device, hmm_devmem_ref_exit,
> - &devmem->ref);
> - if (ret)
> - return ERR_PTR(ret);
> -
> - devmem->pfn_first = devmem->resource->start >> PAGE_SHIFT;
> - devmem->pfn_last = devmem->pfn_first +
> -(resource_size(devmem->resource) >> PAGE_SHIFT);
> - devmem->page_fault = hmm_devmem_fault;
> -
> - devmem->pagemap.type = MEMORY_DEVICE_PUBLIC;
> - devmem->pagemap.res = *devmem->resource;
> - devmem->pagemap.page_free = hmm_devmem_free;
> - devmem->pagemap.altmap_valid = false;
> - devmem->pagemap.ref = &devmem->ref;
> - devmem->pagemap.data = devmem;
> - devmem->pagemap.kill = hmm_devmem_ref_kill;
> -
> - result = devm_memremap_pages(devmem->device, &devmem->pagemap);
> - if (IS_ERR(result))
> - return result;
> - return devmem;
> -}
> -EXPORT_SYMBOL_GPL(hmm_devmem_add_resource);
>  #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
> 

Re: [PATCH 05/22] mm: export alloc_pages_vma

2019-06-14 Thread John Hubbard
On 6/13/19 2:43 AM, Christoph Hellwig wrote:
> noveau is currently using this through an odd hmm wrapper, and I plan

  "nouveau"

> to switch it to the real thing later in this series.
> 
> Signed-off-by: Christoph Hellwig 
> ---

Reviewed-by: John Hubbard  

thanks,
-- 
John Hubbard
NVIDIA

>  mm/mempolicy.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 01600d80ae01..f9023b5fba37 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2098,6 +2098,7 @@ alloc_pages_vma(gfp_t gfp, int order, struct 
> vm_area_struct *vma,
>  out:
>   return page;
>  }
> +EXPORT_SYMBOL_GPL(alloc_pages_vma);
>  
>  /**
>   *   alloc_pages_current - Allocate pages.
> 

Re: [PATCH] drm/nouveau/dmem: missing mutex_lock in error path

2019-06-14 Thread John Hubbard
On 6/13/19 5:11 PM, Ralph Campbell wrote:
> In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
> calling nouveau_dmem_chunk_alloc().
> Reacquire the lock before continuing to the next page.
> 
> Signed-off-by: Ralph Campbell 
> ---
> 
> I found this while testing Jason Gunthorpe's hmm tree but this is
> independant of those changes. I guess it could go through
> David Airlie's tree for nouveau or Jason's tree.
> 

Hi Ralph,

btw, was this the fix for the crash you were seeing? It might be nice to
mention it in the commit description, if you are seeing real symptoms.


>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 27aa4e72abe9..00f7236af1b9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -379,9 +379,10 @@ nouveau_dmem_pages_alloc(struct nouveau_drm *drm,
>   ret = nouveau_dmem_chunk_alloc(drm);
>   if (ret) {
>   if (c)
> - break;

Actually, the pre-existing code is a little concerning. Your change preserves
the behavior, but it seems questionable to be doing a "return 0" (whether
via the above break, or your change) when it's in this partially allocated
state. It's reporting success when it only allocates part of what was requested,
and it doesn't fill in the pages array either.
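
Just to illustrate the sort of thing I mean, a helper like this could either
fully succeed or unwind, rather than report partial success (sketch only;
the dev_mem names are made up, this is not a proposed nouveau patch):

#include <linux/errno.h>

struct page;
struct dev_mem;

struct page *dev_mem_page_alloc(struct dev_mem *dmem);
void dev_mem_page_free(struct dev_mem *dmem, struct page *page);

static int alloc_device_pages(struct dev_mem *dmem, unsigned long npages,
                              struct page **pages)
{
        unsigned long i;

        for (i = 0; i < npages; i++) {
                pages[i] = dev_mem_page_alloc(dmem);
                if (!pages[i])
                        goto err_unwind;        /* never claim success for i < npages */
        }
        return 0;

err_unwind:
        while (i--)
                dev_mem_page_free(dmem, pages[i]);
        return -ENOMEM;
}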



> + return 0;
>   return ret;
>   }
> + mutex_lock(&drm->dmem->mutex);
>   continue;
>   }
>  
> 

The above comment is about pre-existing potential problems, but your patch 
itself
looks correct, so:

Reviewed-by: John Hubbard  


thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-14 Thread John Hubbard
On 6/13/19 5:43 PM, Ira Weiny wrote:
> On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
>> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
>>>
...
>> Hum, so the only thing this config does is short circuit here:
>>
>> static inline bool is_device_public_page(const struct page *page)
>> {
>> return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
>> IS_ENABLED(CONFIG_DEVICE_PUBLIC) &&
>> is_zone_device_page(page) &&
>> page->pgmap->type == MEMORY_DEVICE_PUBLIC;
>> }
>>
>> Which is called all over the place.. 
> 
>   yes but the earlier patch:
> 
> [PATCH 03/22] mm: remove hmm_devmem_add_resource
> 
> Removes the only place type is set to MEMORY_DEVICE_PUBLIC.
> 
> So I think it is ok.  Frankly I was wondering if we should remove the public
> type altogether but conceptually it seems ok.  But I don't see any users of it
> so...  should we get rid of it in the code rather than turning the config off?
> 
> Ira

That seems reasonable. I recall that the hope was for those IBM Power 9
systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
memory, and so the memory really is visible to the CPU. And the IBM team
was thinking of taking advantage of it. But I haven't seen anything on
that front for a while.

So maybe it will get re-added as part of a future patchset to use that
kind of memory, but yes, we should not hesitate to clean house at this
point, and delete unused code.


thanks,
-- 
John Hubbard
NVIDIA

> 
>>
>> So, yes, we really don't want any distro or something to turn this on
>> until it has a use.
>>
>> Reviewed-by: Jason Gunthorpe 
>>
>> Jason
>> ___
>> Linux-nvdimm mailing list
>> linux-nvd...@lists.01.org
>> https://lists.01.org/mailman/listinfo/linux-nvdimm

Re: [Nouveau] [PATCH 02/22] mm: remove the struct hmm_device infrastructure

2019-06-14 Thread John Hubbard
On 6/13/19 2:43 AM, Christoph Hellwig wrote:
> This code is a trivial wrapper around device model helpers, which
> should have been integrated into the driver device model usage from
> the start.  Assuming it actually had users, which it never had since
> the code was added more than 1 1/2 years ago.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/hmm.h | 20 
>  mm/hmm.c| 80 -
>  2 files changed, 100 deletions(-)
> 

Yes. This code is definitely unnecessary, and it's a good housecleaning here.

(As to the history: I know that there was some early "HMM dummy device" 
testing when the HMM code was much younger, but such testing has long since
been superseded by more elaborate testing with real drivers.)


Reviewed-by: John Hubbard  


thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 0fa8ea34ccef..4867b9da1b6c 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -717,26 +717,6 @@ static inline unsigned long 
> hmm_devmem_page_get_drvdata(const struct page *page)
>  {
>   return page->hmm_data;
>  }
> -
> -
> -/*
> - * struct hmm_device - fake device to hang device memory onto
> - *
> - * @device: device struct
> - * @minor: device minor number
> - */
> -struct hmm_device {
> - struct device   device;
> - unsigned intminor;
> -};
> -
> -/*
> - * A device driver that wants to handle multiple devices memory through a
> - * single fake device can use hmm_device to do so. This is purely a helper 
> and
> - * it is not strictly needed, in order to make use of any HMM functionality.
> - */
> -struct hmm_device *hmm_device_new(void *drvdata);
> -void hmm_device_put(struct hmm_device *hmm_device);
>  #endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
>  #else /* IS_ENABLED(CONFIG_HMM) */
>  static inline void hmm_mm_destroy(struct mm_struct *mm) {}
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 886b18695b97..ff2598eb7377 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -1499,84 +1499,4 @@ struct hmm_devmem *hmm_devmem_add_resource(const 
> struct hmm_devmem_ops *ops,
>   return devmem;
>  }
>  EXPORT_SYMBOL_GPL(hmm_devmem_add_resource);
> -
> -/*
> - * A device driver that wants to handle multiple devices memory through a
> - * single fake device can use hmm_device to do so. This is purely a helper
> - * and it is not needed to make use of any HMM functionality.
> - */
> -#define HMM_DEVICE_MAX 256
> -
> -static DECLARE_BITMAP(hmm_device_mask, HMM_DEVICE_MAX);
> -static DEFINE_SPINLOCK(hmm_device_lock);
> -static struct class *hmm_device_class;
> -static dev_t hmm_device_devt;
> -
> -static void hmm_device_release(struct device *device)
> -{
> - struct hmm_device *hmm_device;
> -
> - hmm_device = container_of(device, struct hmm_device, device);
> - spin_lock(&hmm_device_lock);
> - clear_bit(hmm_device->minor, hmm_device_mask);
> - spin_unlock(&hmm_device_lock);
> -
> - kfree(hmm_device);
> -}
> -
> -struct hmm_device *hmm_device_new(void *drvdata)
> -{
> - struct hmm_device *hmm_device;
> -
> - hmm_device = kzalloc(sizeof(*hmm_device), GFP_KERNEL);
> - if (!hmm_device)
> - return ERR_PTR(-ENOMEM);
> -
> - spin_lock(&hmm_device_lock);
> - hmm_device->minor = find_first_zero_bit(hmm_device_mask, 
> HMM_DEVICE_MAX);
> - if (hmm_device->minor >= HMM_DEVICE_MAX) {
> - spin_unlock(&hmm_device_lock);
> - kfree(hmm_device);
> - return ERR_PTR(-EBUSY);
> - }
> - set_bit(hmm_device->minor, hmm_device_mask);
> - spin_unlock(&hmm_device_lock);
> -
> - dev_set_name(&hmm_device->device, "hmm_device%d", hmm_device->minor);
> - hmm_device->device.devt = MKDEV(MAJOR(hmm_device_devt),
> - hmm_device->minor);
> - hmm_device->device.release = hmm_device_release;
> - dev_set_drvdata(&hmm_device->device, drvdata);
> - hmm_device->device.class = hmm_device_class;
> - device_initialize(&hmm_device->device);
> -
> - return hmm_device;
> -}
> -EXPORT_SYMBOL(hmm_device_new);
> -
> -void hmm_device_put(struct hmm_device *hmm_device)
> -{
> - put_device(&hmm_device->device);
> -}
> -EXPORT_SYMBOL(hmm_device_put);
> -
> -static int __init hmm_init(void)
> -{
> - int ret;
> -
> - ret = alloc_chrdev_region(&hmm_device_devt, 0,
> -   HMM_DEVICE_MAX,
> -   "hmm_

Re: [PATCH 04/22] mm: don't clear ->mapping in hmm_devmem_free

2019-06-14 Thread John Hubbard
On 6/13/19 2:43 AM, Christoph Hellwig wrote:
> ->mapping isn't even used by HMM users, and the field at the same offset
> in the zone_device part of the union is declared as pad.  (Which btw is
> rather confusing, as DAX uses ->pgmap and ->mapping from two different
> sides of the union, but DAX doesn't use hmm_devmem_free).
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/hmm.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 0c62426d1257..e1dc98407e7b 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -1347,8 +1347,6 @@ static void hmm_devmem_free(struct page *page, void 
> *data)
>  {
>   struct hmm_devmem *devmem = data;
>  
> - page->mapping = NULL;
> -
>   devmem->ops->free(devmem, page);
>  }
>  
> 

Yes, I think that line was unnecessary. I see from git history that it was
originally being set to NULL from within __put_devmap_managed_page(), and then
in commit 2fa147bdbf672c53386a8f5f2c7fe358004c3ef8, Dan moved it out of there,
and stashed it specifically here. But it appears to have been unnecessary from
the beginning.

Reviewed-by: John Hubbard  

thanks,
-- 
John Hubbard
NVIDIA

Re: [Nouveau] [PATCH 22/22] mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR

2019-06-14 Thread John Hubbard
On 6/13/19 2:43 AM, Christoph Hellwig wrote:
> The migrate_vma helper is only used by noveau to migrate device private
> pages around.  Other HMM_MIRROR users like amdgpu or infiniband don't
> need it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/nouveau/Kconfig | 1 +
>  mm/Kconfig  | 1 -
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
> index 66c839d8e9d1..96b9814e6d06 100644
> --- a/drivers/gpu/drm/nouveau/Kconfig
> +++ b/drivers/gpu/drm/nouveau/Kconfig
> @@ -88,6 +88,7 @@ config DRM_NOUVEAU_SVM
>   depends on DRM_NOUVEAU
>   depends on HMM_MIRROR
>   depends on STAGING
> + select MIGRATE_VMA_HELPER
>   default n
>   help
> Say Y here if you want to enable experimental support for
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 73676cb4693f..eca88679b624 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -679,7 +679,6 @@ config HMM_MIRROR
>   bool "HMM mirror CPU page table into a device page table"
>   depends on MMU
>   select MMU_NOTIFIER
> - select MIGRATE_VMA_HELPER
>   help
> Select HMM_MIRROR if you want to mirror range of the CPU page table 
> of a
> process into a device page table. Here, mirror means "keep 
> synchronized".
> 

For those who have out-of-tree drivers that need migrate_vma(), but are not
Nouveau, could we pretty please allow a way to select that independently?

It's not a big deal, as I expect the Nouveau option will normally be selected,
but it would be nice, because there is a valid configuration in which Nouveau
is not selected, yet our driver still wants to run.

Maybe we can add something like this on top of what you have?

diff --git a/mm/Kconfig b/mm/Kconfig
index eca88679b624..330996632513 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -670,7 +670,10 @@ config ZONE_DEVICE
  If FS_DAX is enabled, then say Y.
 
 config MIGRATE_VMA_HELPER
-   bool
+   bool "migrate_vma() helper routine"
+   help
+ Provides a migrate_vma() routine that GPUs and other
+ device drivers may need.
 
 config DEV_PAGEMAP_OPS
bool



thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH 06/22] mm: factor out a devm_request_free_mem_region helper

2019-06-16 Thread John Hubbard
On 6/13/19 2:43 AM, Christoph Hellwig wrote:
> Keep the physical address allocation that hmm_add_device does with the
> rest of the resource code, and allow future reuse of it without the hmm
> wrapper.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/ioport.h |  2 ++
>  kernel/resource.c  | 39 +++
>  mm/hmm.c   | 33 -
>  3 files changed, 45 insertions(+), 29 deletions(-)

Some trivial typos noted below, but this accurately moves the code
into a helper routine, looks good.

Reviewed-by: John Hubbard  


> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..76a33ae3bf6c 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -286,6 +286,8 @@ static inline bool resource_overlaps(struct resource *r1, 
> struct resource *r2)
> return (r1->start <= r2->end && r1->end >= r2->start);
>  }
>  
> +struct resource *devm_request_free_mem_region(struct device *dev,
> + struct resource *base, unsigned long size);
>  
>  #endif /* __ASSEMBLY__ */
>  #endif   /* _LINUX_IOPORT_H */
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 158f04ec1d4f..99c58134ed1c 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -1628,6 +1628,45 @@ void resource_list_free(struct list_head *head)
>  }
>  EXPORT_SYMBOL(resource_list_free);
>  
> +#ifdef CONFIG_DEVICE_PRIVATE
> +/**
> + * devm_request_free_mem_region - find free region for device private memory
> + *
> + * @dev: device struct to bind the resource too

   "to"

> + * @size: size in bytes of the device memory to add
> + * @base: resource tree to look in
> + *
> + * This function tries to find an empty range of physical address big enough 
> to
> + * contain the new resource, so that it can later be hotpluged as ZONE_DEVICE

"hotplugged"

> + * memory, which in turn allocates struct pages.
> + */
> +struct resource *devm_request_free_mem_region(struct device *dev,
> + struct resource *base, unsigned long size)
> +{
> + resource_size_t end, addr;
> + struct resource *res;
> +
> + size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
> + end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> + addr = end - size + 1UL;
> +
> + for (; addr > size && addr >= base->start; addr -= size) {
> + if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
> + REGION_DISJOINT)
> + continue;
> +
> + res = devm_request_mem_region(dev, addr, size, dev_name(dev));
> + if (!res)
> + return ERR_PTR(-ENOMEM);
> + res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> + return res;
> + }
> +
> + return ERR_PTR(-ERANGE);
> +}
> +EXPORT_SYMBOL_GPL(devm_request_free_mem_region);
> +#endif /* CONFIG_DEVICE_PRIVATE */
> +
>  static int __init strict_iomem(char *str)
>  {
>   if (strstr(str, "relaxed"))
> diff --git a/mm/hmm.c b/mm/hmm.c
> index e1dc98407e7b..13a16faf0a77 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -26,8 +26,6 @@
>  #include 
>  #include 
>  
> -#define PA_SECTION_SIZE (1UL << PA_SECTION_SHIFT)
> -
>  #if IS_ENABLED(CONFIG_HMM_MIRROR)
>  static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
>  
> @@ -1372,7 +1370,6 @@ struct hmm_devmem *hmm_devmem_add(const struct 
> hmm_devmem_ops *ops,
> unsigned long size)
>  {
>   struct hmm_devmem *devmem;
> - resource_size_t addr;
>   void *result;
>   int ret;
>  
> @@ -1398,32 +1395,10 @@ struct hmm_devmem *hmm_devmem_add(const struct 
> hmm_devmem_ops *ops,
>   if (ret)
>   return ERR_PTR(ret);
>  
> - size = ALIGN(size, PA_SECTION_SIZE);
> - addr = min((unsigned long)iomem_resource.end,
> -(1UL << MAX_PHYSMEM_BITS) - 1);
> - addr = addr - size + 1UL;
> -
> - /*
> -  * FIXME add a new helper to quickly walk resource tree and find free
> -  * range
> -  *
> -  * FIXME what about ioport_resource resource ?
> -  */
> - for (; addr > size && addr >= iomem_resource.start; addr -= size) {
> - ret = region_intersects(addr, size, 0, IORES_DESC_NONE);
> - if (ret != REGION_DISJOINT)
> - continue;
> -
> - devmem->resource = devm_request_me

Re: [PATCH] drm/nouveau/dmem: missing mutex_lock in error path

2019-06-16 Thread John Hubbard
On 6/14/19 10:39 AM, Ralph Campbell wrote:
> On 6/13/19 5:49 PM, John Hubbard wrote:
>> On 6/13/19 5:11 PM, Ralph Campbell wrote:
...
>> Actually, the pre-existing code is a little concerning. Your change preserves
>> the behavior, but it seems questionable to be doing a "return 0" (whether
>> via the above break, or your change) when it's in this partially allocated
>> state. It's reporting success when it only allocates part of what was 
>> requested,
>> and it doesn't fill in the pages array either.
>>
>>
>>
>>> +    return 0;
>>>   return ret;
>>>   }
>>> +    mutex_lock(&drm->dmem->mutex);
>>>   continue;
>>>       }
>>>  
>>
>> The above comment is about pre-existing potential problems, but your patch 
>> itself
>> looks correct, so:
>>
>> Reviewed-by: John Hubbard 
>>
>>
>> thanks,
>>
> The crash was the NULL pointer bug in Christoph's patch #10.
> I sent a separate reply for that.
> 
> Below is the console output I got, then I made the changes just based on
> code inspection. Do you think I should include it in the change log?

Yes, I think it's good to have it in there. If you look at git log,
you'll see that it's common to include the symptoms, including the
backtrace. It helps people see if they are hitting the same problem,
for one thing.

> 
> As for the "return 0", If you follow the call chain,
> nouveau_dmem_pages_alloc() is only ever called for one page so this
> currently "works" but I agree it is a bit of a time bomb. There are a
> number of other bugs that I can see that need fixing but I think those
> should be separate patches.
> 

Yes of course. I called it out for the benefit of the email list, not to
say that your patch needs any changes. 

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-26 Thread John Hubbard
On 6/19/19 12:27 PM, Jason Gunthorpe wrote:
> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
>> On 6/13/19 5:43 PM, Ira Weiny wrote:
>>> On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
>>>> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
>>>>>
>> ...
>>> So I think it is ok.  Frankly I was wondering if we should remove the public
>>> type altogether but conceptually it seems ok.  But I don't see any users of 
>>> it
>>> so...  should we get rid of it in the code rather than turning the config 
>>> off?
>>>
>>> Ira
>>
>> That seems reasonable. I recall that the hope was for those IBM Power 9
>> systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
>> memory, and so the memory really is visible to the CPU. And the IBM team
>> was thinking of taking advantage of it. But I haven't seen anything on
>> that front for a while.
> 
> Does anyone know who those people are and can we encourage them to
> send some patches? :)
> 

I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there
in order to provide an alternative way to do things (such as migrate memory
to and from a device), in case the combination of existing and near-future
NUMA APIs was insufficient. This probably came as a follow-up to the early
2017-ish conversations about NUMA, in which the linux-mm recommendation was
"try using HMM mechanisms, and if those are inadequate, then maybe we can
look at enhancing NUMA so that it has better handling of advanced (GPU-like)
devices".

In the end, however, _PUBLIC was never used, nor does anyone in the local
(NVIDIA + IBM) kernel vicinity seem to have plans to use it.  So it really
does seem safe to remove, although of course it's good to start with 
BROKEN and see if anyone pops up and complains.

thanks,
-- 
John Hubbard
NVIDIA

Re: [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-26 Thread John Hubbard
On 6/25/19 10:45 PM, Michal Hocko wrote:
> On Tue 25-06-19 20:15:28, John Hubbard wrote:
>> On 6/19/19 12:27 PM, Jason Gunthorpe wrote:
>>> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
>>>> On 6/13/19 5:43 PM, Ira Weiny wrote:
>>>>> On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
>>>>>> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
>>>>>>>
>>>> ...
>>>>> So I think it is ok.  Frankly I was wondering if we should remove the 
>>>>> public
>>>>> type altogether but conceptually it seems ok.  But I don't see any users 
>>>>> of it
>>>>> so...  should we get rid of it in the code rather than turning the config 
>>>>> off?
>>>>>
>>>>> Ira
>>>>
>>>> That seems reasonable. I recall that the hope was for those IBM Power 9
>>>> systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
>>>> memory, and so the memory really is visible to the CPU. And the IBM team
>>>> was thinking of taking advantage of it. But I haven't seen anything on
>>>> that front for a while.
>>>
>>> Does anyone know who those people are and can we encourage them to
>>> send some patches? :)
>>>
>>
>> I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there
>> in order to provide an alternative way to do things (such as migrate memory
>> to and from a device), in case the combination of existing and near-future
>> NUMA APIs was insufficient. This probably came as a follow-up to the early
>> 2017-ish conversations about NUMA, in which the linux-mm recommendation was
>> "try using HMM mechanisms, and if those are inadequate, then maybe we can
>> look at enhancing NUMA so that it has better handling of advanced (GPU-like)
>> devices".
> 
> Yes that was the original idea. It sounds so much better to use a common
> framework rather than awkward special cased cpuless NUMA nodes with
> a weird semantic. User of the neither of the two has shown up so I guess
> that the envisioned HW just didn't materialized. Or has there been a
> completely different approach chosen?

The HW showed up, alright: it's the IBM Power 9, which provides HW-based
memory coherency between its CPUs and GPUs. So on this system, the CPU is
allowed to access GPU memory, which *could* be modeled as DEVICE_PUBLIC.

However, what happened was that the system worked well enough with a combination
of the device driver, plus NUMA APIs, plus heaven knows what sort of /proc 
tuning
might have also gone on. :) No one saw the need to reach for the DEVICE_PUBLIC
functionality.

> 
>> In the end, however, _PUBLIC was never used, nor does anyone in the local
>> (NVIDIA + IBM) kernel vicinity seem to have plans to use it.  So it really
>> does seem safe to remove, although of course it's good to start with 
>> BROKEN and see if anyone pops up and complains.
> 
> Well, I do not really see much of a difference. Preserving an unused
> code which doesn't have any user in sight just adds a maintenance burden
> whether the code depends on BROKEN or not. We can always revert patches
> which remove the code once a real user shows up.

Sure, I don't see much difference either. Either way seems fine.

thanks,
-- 
John Hubbard
NVIDIA

[PATCH 3/3] gup: new put_user_page_dirty*() helpers

2019-07-21 Thread john . hubbard
From: John Hubbard 

While converting call sites to use put_user_page*() [1], quite a few
places ended up needing a single-page routine to put and dirty a
page.

Provide put_user_page_dirty() and put_user_page_dirty_lock(),
and use them in a few places: net/xdp, drm/via/, drivers/infiniband.

Cc: Jason Gunthorpe 
Cc: Jan Kara 
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c|  2 +-
 drivers/infiniband/core/umem.c   |  2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c |  2 +-
 include/linux/mm.h   | 10 ++
 net/xdp/xdp_umem.c   |  2 +-
 5 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 219827ae114f..d30b2d75599f 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -189,7 +189,7 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
for (i = 0; i < vsg->num_pages; ++i) {
if (NULL != (page = vsg->pages[i])) {
if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   put_user_pages_dirty(&page, 1);
+   put_user_page_dirty(page);
else
put_user_page(page);
}
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..a7337cc3ca20 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
page = sg_page_iter_page(&sg_iter);
if (umem->writable && dirty)
-   put_user_pages_dirty_lock(&page, 1);
+   put_user_page_dirty_lock(page);
else
put_user_page(page);
}
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..d2ded624fb2a 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -76,7 +76,7 @@ static void usnic_uiom_put_pages(struct list_head 
*chunk_list, int dirty)
page = sg_page(sg);
pa = sg_phys(sg);
if (dirty)
-   put_user_pages_dirty_lock(&page, 1);
+   put_user_page_dirty_lock(page);
else
put_user_page(page);
usnic_dbg("pa: %pa\n", &pa);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..c0584c6d9d78 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1061,6 +1061,16 @@ void put_user_pages_dirty(struct page **pages, unsigned 
long npages);
 void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
 void put_user_pages(struct page **pages, unsigned long npages);
 
+static inline void put_user_page_dirty(struct page *page)
+{
+   put_user_pages_dirty(&page, 1);
+}
+
+static inline void put_user_page_dirty_lock(struct page *page)
+{
+   put_user_pages_dirty_lock(&page, 1);
+}
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 9cbbb96c2a32..1d122e52c6de 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -171,7 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
for (i = 0; i < umem->npgs; i++) {
struct page *page = umem->pgs[i];
 
-   put_user_pages_dirty_lock(&page, 1);
+   put_user_page_dirty_lock(page);
}
 
kfree(umem->pgs);
-- 
2.22.0



[PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-21 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..219827ae114f 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -189,8 +189,9 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
for (i = 0; i < vsg->num_pages; ++i) {
if (NULL != (page = vsg->pages[i])) {
if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   SetPageDirty(page);
-   put_page(page);
+   put_user_pages_dirty(&page, 1);
+   else
+   put_user_page(page);
}
}
/* fall through */
-- 
2.22.0



[PATCH 2/3] net/xdp: convert put_page() to put_user_page*()

2019-07-21 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Björn Töpel 
Cc: Magnus Karlsson 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 net/xdp/xdp_umem.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..9cbbb96c2a32 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -171,8 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
for (i = 0; i < umem->npgs; i++) {
struct page *page = umem->pgs[i];
 
-   set_page_dirty_lock(page);
-   put_page(page);
+   put_user_pages_dirty_lock(&page, 1);
}
 
kfree(umem->pgs);
-- 
2.22.0



[PATCH 0/4] put_user_page: new put_user_page_dirty*() helpers

2019-07-21 Thread john . hubbard
From: John Hubbard 

Hi,

Here is the first small batch of call site conversions for put_page()
to put_user_page().

This batch includes some, but not all, of the places that benefit from the
two new put_user_page_dirty*() helper functions. (The ordering of call site
conversion patch submission makes it better to wait until later, to convert
the rest.)
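
To make the conversion pattern concrete, here is a minimal before/after
sketch of a single-page call site (illustrative only, not one of the call
sites converted in this series):

	/* Before: open-coded dirty-and-release of a gup-pinned page. */
	set_page_dirty_lock(page);
	put_page(page);

	/* After: the dedicated helper introduced in this series. */
	put_user_page_dirty_lock(page);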

There are about 50+ patches in my tree [1], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've
  had to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set. That
  can only be applied after all call sites are converted, but it's
  good to get an early look at it.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

[1] https://github.com/johnhubbard/linux/tree/gup_dma_core

John Hubbard (4):
  drivers/gpu/drm/via: convert put_page() to put_user_page*()
  net/xdp: convert put_page() to put_user_page*()
  net/rds: convert put_page() to put_user_page*()
  gup: new put_user_page_dirty*() helpers

 drivers/gpu/drm/via/via_dmablit.c|  5 +++--
 drivers/infiniband/core/umem.c   |  2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c |  2 +-
 include/linux/mm.h   | 10 ++
 net/rds/info.c   |  5 ++---
 net/rds/message.c|  2 +-
 net/rds/rdma.c   | 15 +++
 net/xdp/xdp_umem.c   |  3 +--
 8 files changed, 26 insertions(+), 18 deletions(-)

-- 
2.22.0



Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-22 Thread John Hubbard
On 7/22/19 2:33 AM, Christoph Hellwig wrote:
> On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubb...@gmail.com wrote:
>>  for (i = 0; i < vsg->num_pages; ++i) {
>>  if (NULL != (page = vsg->pages[i])) {
>>  if (!PageReserved(page) && (DMA_FROM_DEVICE == 
>> vsg->direction))
>> -SetPageDirty(page);
>> -put_page(page);
>> +put_user_pages_dirty(&page, 1);
>> +else
>> +put_user_page(page);
>>  }
> 
> Can't just pass a dirty argument to put_user_pages?  Also do we really

Yes, and in fact that would help a lot more than the single page case,
which is really just cosmetic after all.

> need a separate put_user_page for the single page case?
> put_user_pages_dirty?

Not really. I'm still zeroing in on the ideal API for all these call sites,
and I agree that the approach below is cleaner.

> 
> Also the PageReserved check looks bogus, as I can't see how a reserved
> page can end up here.  So IMHO the above snippled should really look
> something like this:
> 
>   put_user_pages(vsg->pages[i], vsg->num_pages,
>   vsg->direction == DMA_FROM_DEVICE);
> 
> in the end.
> 

Agreed.

thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 3/3] gup: new put_user_page_dirty*() helpers

2019-07-22 Thread John Hubbard
On 7/21/19 9:30 PM, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> While converting call sites to use put_user_page*() [1], quite a few
> places ended up needing a single-page routine to put and dirty a
> page.
> 
> Provide put_user_page_dirty() and put_user_page_dirty_lock(),
> and use them in a few places: net/xdp, drm/via/, drivers/infiniband.
> 

Please disregard this one, I'm going to drop it, as per the
discussion in patch 1.

thanks,
-- 
John Hubbard
NVIDIA

> Cc: Jason Gunthorpe 
> Cc: Jan Kara 
> Signed-off-by: John Hubbard 
> ---
>  drivers/gpu/drm/via/via_dmablit.c|  2 +-
>  drivers/infiniband/core/umem.c   |  2 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c |  2 +-
>  include/linux/mm.h   | 10 ++
>  net/xdp/xdp_umem.c   |  2 +-
>  5 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/via/via_dmablit.c 
> b/drivers/gpu/drm/via/via_dmablit.c
> index 219827ae114f..d30b2d75599f 100644
> --- a/drivers/gpu/drm/via/via_dmablit.c
> +++ b/drivers/gpu/drm/via/via_dmablit.c
> @@ -189,7 +189,7 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
> *vsg)
>   for (i = 0; i < vsg->num_pages; ++i) {
>   if (NULL != (page = vsg->pages[i])) {
>   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
> vsg->direction))
> - put_user_pages_dirty(&page, 1);
> + put_user_page_dirty(page);
>   else
>   put_user_page(page);
>   }
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 08da840ed7ee..a7337cc3ca20 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
> ib_umem *umem, int d
>   for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>   page = sg_page_iter_page(&sg_iter);
>   if (umem->writable && dirty)
> - put_user_pages_dirty_lock(&page, 1);
> + put_user_page_dirty_lock(page);
>   else
>   put_user_page(page);
>   }
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
> b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 0b0237d41613..d2ded624fb2a 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -76,7 +76,7 @@ static void usnic_uiom_put_pages(struct list_head 
> *chunk_list, int dirty)
>   page = sg_page(sg);
>   pa = sg_phys(sg);
>   if (dirty)
> - put_user_pages_dirty_lock(&page, 1);
> + put_user_page_dirty_lock(page);
>   else
>   put_user_page(page);
>   usnic_dbg("pa: %pa\n", &pa);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0334ca97c584..c0584c6d9d78 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1061,6 +1061,16 @@ void put_user_pages_dirty(struct page **pages, 
> unsigned long npages);
>  void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
>  void put_user_pages(struct page **pages, unsigned long npages);
>  
> +static inline void put_user_page_dirty(struct page *page)
> +{
> + put_user_pages_dirty(&page, 1);
> +}
> +
> +static inline void put_user_page_dirty_lock(struct page *page)
> +{
> + put_user_pages_dirty_lock(&page, 1);
> +}
> +
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 9cbbb96c2a32..1d122e52c6de 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -171,7 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>   for (i = 0; i < umem->npgs; i++) {
>   struct page *page = umem->pgs[i];
>  
> - put_user_pages_dirty_lock(&page, 1);
> + put_user_page_dirty_lock(page);
>   }
>  
>   kfree(umem->pgs);
> 


Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-22 Thread John Hubbard
On 7/22/19 12:07 PM, Matthew Wilcox wrote:
> On Mon, Jul 22, 2019 at 11:53:54AM -0700, John Hubbard wrote:
>> On 7/22/19 2:33 AM, Christoph Hellwig wrote:
>>> On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubb...@gmail.com wrote:
>>>>for (i = 0; i < vsg->num_pages; ++i) {
>>>>if (NULL != (page = vsg->pages[i])) {
>>>>if (!PageReserved(page) && (DMA_FROM_DEVICE == 
>>>> vsg->direction))
>>>> -  SetPageDirty(page);
>>>> -  put_page(page);
>>>> +  put_user_pages_dirty(&page, 1);
>>>> +  else
>>>> +  put_user_page(page);
>>>>}
>>>
>>> Can't just pass a dirty argument to put_user_pages?  Also do we really
>>
>> Yes, and in fact that would help a lot more than the single page case,
>> which is really just cosmetic after all.
>>
>>> need a separate put_user_page for the single page case?
>>> put_user_pages_dirty?
>>
>> Not really. I'm still zeroing in on the ideal API for all these call sites,
>> and I agree that the approach below is cleaner.
> 
> so enum { CLEAN = 0, DIRTY = 1, LOCK = 2, DIRTY_LOCK = 3 };
> ?
> 

Sure. In fact, I just applied something similar to bio_release_pages()
locally, in order to reconcile Christoph's and Jerome's approaches 
(they each needed to add a bool arg), so I'm all about the enums in the
arg lists. :)

I'm going to post that one shortly, let's see how it goes over. heh.

thanks,
-- 
John Hubbard
NVIDIA


[PATCH 1/3] mm/gup: introduce __put_user_pages()

2019-07-22 Thread john . hubbard
From: John Hubbard 

Add a more capable variation of put_user_pages() to the
API set, and call it from the simple ones.

The new __put_user_pages() takes an enum that handles the various
combinations of needing to call set_page_dirty() or
set_page_dirty_lock(), before calling put_user_page().
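
As an illustrative sketch (mirroring the call-site conversions later in this
series; "pages" and "npages" stand in for whatever the caller pinned via
get_user_pages()), a call site picks the flag that matches its old behavior:

	/* Was: set_page_dirty_lock() + put_page() in a loop. */
	__put_user_pages(pages, npages, PUP_FLAGS_DIRTY_LOCK);

	/* Was: plain put_page() in a loop (read-only pages). */
	__put_user_pages(pages, npages, PUP_FLAGS_CLEAN);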

Cc: Matthew Wilcox 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Signed-off-by: John Hubbard 
---
 include/linux/mm.h |  58 ++-
 mm/gup.c   | 137 ++---
 2 files changed, 124 insertions(+), 71 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..7218585681b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1057,8 +1057,62 @@ static inline void put_user_page(struct page *page)
put_page(page);
 }
 
-void put_user_pages_dirty(struct page **pages, unsigned long npages);
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
+enum pup_flags_t {
+   PUP_FLAGS_CLEAN = 0,
+   PUP_FLAGS_DIRTY = 1,
+   PUP_FLAGS_LOCK  = 2,
+   PUP_FLAGS_DIRTY_LOCK= 3,
+};
+
+void __put_user_pages(struct page **pages, unsigned long npages,
+ enum pup_flags_t flags);
+
+/**
+ * put_user_pages_dirty() - release and dirty an array of gup-pinned pages
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ * "gup-pinned page" refers to a page that has had one of the get_user_pages()
+ * variants called on that page.
+ *
+ * For each page in the @pages array, make that page (or its head page, if a
+ * compound page) dirty, if it was previously listed as clean. Then, release
+ * the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * set_page_dirty(), which does not lock the page, is used here.
+ * Therefore, it is the caller's responsibility to ensure that this is
+ * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *
+ */
+static inline void put_user_pages_dirty(struct page **pages,
+   unsigned long npages)
+{
+   __put_user_pages(pages, npages, PUP_FLAGS_DIRTY);
+}
+
+/**
+ * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ * For each page in the @pages array, make that page (or its head page, if a
+ * compound page) dirty, if it was previously listed as clean. Then, release
+ * the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ */
+static inline void put_user_pages_dirty_lock(struct page **pages,
+unsigned long npages)
+{
+   __put_user_pages(pages, npages, PUP_FLAGS_DIRTY_LOCK);
+}
+
 void put_user_pages(struct page **pages, unsigned long npages);
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..6831ef064d76 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,87 +29,86 @@ struct follow_page_context {
unsigned int page_mask;
 };
 
-typedef int (*set_dirty_func_t)(struct page *page);
-
-static void __put_user_pages_dirty(struct page **pages,
-  unsigned long npages,
-  set_dirty_func_t sdf)
-{
-   unsigned long index;
-
-   for (index = 0; index < npages; index++) {
-   struct page *page = compound_head(pages[index]);
-
-   /*
-* Checking PageDirty at this point may race with
-* clear_page_dirty_for_io(), but that's OK. Two key cases:
-*
-* 1) This code sees the page as already dirty, so it skips
-* the call to sdf(). That could happen because
-* clear_page_dirty_for_io() called page_mkclean(),
-* followed by set_page_dirty(). However, now the page is
-* going to get written back, which meets the original
-* intention of setting it dirty, so all is well:
-* clear_page_dirty_for_io() goes on to call
-* TestClearPageDirty(), and write the page back.
-*
-* 2) This code sees the page as clean, so it calls sdf().
-* The page stays dirty, despite being written back, so it
-* gets written back again in the next writeback cycle.
-* This is harmless.
-*/
-   if (!PageDirty(page))
-   sdf(page);
-
-   put_user_page(page);
-   }
-}
-
 /**
- * put_user_pages_dirty() - release and dirty an array of gup-pi

[PATCH 0/3] introduce __put_user_pages(), convert a few call sites

2019-07-22 Thread john . hubbard
From: John Hubbard 

As discussed in [1] just now, this adds a more capable variation of
put_user_pages() to the API set, and uses it to simplify both the
main implementation, and (especially) the call sites.

Thanks to Christoph for the simplifying ideas, and Matthew for (again)
recommending an enum in the API. Matthew, I seem to recall you asked
for enums before this, so I'm sorry it took until now for me to add
them. :)

The new __put_user_pages() takes an enum that handles the various
combinations of needing to call set_page_dirty() or
set_page_dirty_lock(), before calling put_user_page().

I'm using the same CC list as in [1], even though IB is no longer
included in the series. That way everyone can see what the end result
turns out to be.

Notes about the remaining patches to come:

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've
  had to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set. That
  can only be applied after all call sites are converted, but it's
  good to get an early look at it.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

[1] https://lore.kernel.org/r/20190722093355.gb29...@lst.de
[2] https://github.com/johnhubbard/linux/tree/gup_dma_core

John Hubbard (3):
  mm/gup: introduce __put_user_pages()
  drivers/gpu/drm/via: convert put_page() to put_user_page*()
  net/xdp: convert put_page() to put_user_page*()

 drivers/gpu/drm/via/via_dmablit.c |  11 +--
 include/linux/mm.h|  58 +++-
 mm/gup.c  | 149 +++---
 net/xdp/xdp_umem.c|   9 +-
 4 files changed, 135 insertions(+), 92 deletions(-)

-- 
2.22.0



[PATCH 3/3] net/xdp: convert put_page() to put_user_page*()

2019-07-22 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Björn Töpel 
Cc: Magnus Karlsson 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 net/xdp/xdp_umem.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..0325a17915de 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-   unsigned int i;
-
-   for (i = 0; i < umem->npgs; i++) {
-   struct page *page = umem->pgs[i];
-
-   set_page_dirty_lock(page);
-   put_page(page);
-   }
+   put_user_pages_dirty_lock(umem->pgs, umem->npgs);
 
kfree(umem->pgs);
umem->pgs = NULL;
-- 
2.22.0



[PATCH 2/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-22 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..754f2bb97d61 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-   struct page *page;
int i;
 
switch (vsg->state) {
@@ -186,13 +185,9 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
kfree(vsg->desc_pages);
/* fall through */
case dr_via_pages_locked:
-   for (i = 0; i < vsg->num_pages; ++i) {
-   if (NULL != (page = vsg->pages[i])) {
-   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
+   __put_user_pages(vsg->pages, vsg->num_pages,
+(vsg->direction == DMA_FROM_DEVICE) ?
+PUP_FLAGS_DIRTY : PUP_FLAGS_CLEAN);
/* fall through */
case dr_via_pages_alloc:
vfree(vsg->pages);
-- 
2.22.0



Re: [PATCH 3/3] net/xdp: convert put_page() to put_user_page*()

2019-07-22 Thread John Hubbard

On 7/22/19 5:25 PM, Ira Weiny wrote:

On Mon, Jul 22, 2019 at 03:34:15PM -0700, john.hubb...@gmail.com wrote:

From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Björn Töpel 
Cc: Magnus Karlsson 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Signed-off-by: John Hubbard 
---
  net/xdp/xdp_umem.c | 9 +
  1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..0325a17915de 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
  
  static void xdp_umem_unpin_pages(struct xdp_umem *umem)

  {
-   unsigned int i;
-
-   for (i = 0; i < umem->npgs; i++) {
-   struct page *page = umem->pgs[i];
-
-   set_page_dirty_lock(page);
-   put_page(page);
-   }
+   put_user_pages_dirty_lock(umem->pgs, umem->npgs);


What is the difference between this and

__put_user_pages(umem->pgs, umem->npgs, PUP_FLAGS_DIRTY_LOCK);

?


No difference.



I'm a bit concerned with adding another form of the same interface.  We should
either have 1 call with flags (enum in this case) or multiple calls.  Given the
previous discussion lets move in the direction of having the enum but don't
introduce another caller of the "old" interface.


I disagree that this is a "problem". There is no maintenance pitfall here; there
are merely two ways to call the put_user_page*() API. Both are correct, and
neither one will get you into trouble.

Not only that, but there is ample precedent for this approach in other
kernel APIs.



So I think on this patch NAK from me.

I also don't like having a __* call in the exported interface but there is a
__get_user_pages_fast() call so I guess there is precedent.  :-/



I thought about this carefully, and looked at other APIs. And I noticed that
things like __get_user_pages*() are how it's often done:

* The leading underscores are often used for the more elaborate form of the
call (as opposed to decorating the core function name with "_flags", for
example).

* There are often calls in which you can either call the simpler form, or the
form with flags and additional options, and yes, you'll get the same result.

Obviously, this stuff is all subject to a certain amount of opinion, but I
think I'm on really solid ground as far as precedent goes. So I'm pushing
back on the NAK... :)

thanks,
--
John Hubbard
NVIDIA



Re: [PATCH 1/3] mm/gup: introduce __put_user_pages()

2019-07-22 Thread John Hubbard

On 7/22/19 10:53 PM, Christoph Hellwig wrote:

On Mon, Jul 22, 2019 at 03:34:13PM -0700, john.hubb...@gmail.com wrote:

+enum pup_flags_t {
+   PUP_FLAGS_CLEAN = 0,
+   PUP_FLAGS_DIRTY = 1,
+   PUP_FLAGS_LOCK  = 2,
+   PUP_FLAGS_DIRTY_LOCK= 3,
+};


Well, the enum defeats the ease of just being able to pass a boolean
expression to the function, which would simplify a lot of the caller,
so if we need to support the !locked version I'd rather see that as
a separate helper.

But do we actually have callers where not using the _lock version is
not a bug?  set_page_dirty makes sense in the context of a file systems
that have a reference to the inode the page hangs off, but that is
(almost?) never the case for get_user_pages.



I'm seeing about 18 places where set_page_dirty() is used, in the call site
conversions so far, and about 20 places where set_page_dirty_lock() is
used. So without knowing how many of the former (if any) represent bugs,
you can see why the proposal here supports both DIRTY and DIRTY_LOCK.

Anyway, yes, I could change it, based on your estimation that most of the 
set_page_dirty() calls really should be set_page_dirty_lock().

In that case, we would end up with approximately the following:

/* Here, "dirty" really means, "call set_page_dirty_lock()": */
void __put_user_pages(struct page **pages, unsigned long npages,
  bool dirty);

/* Here, "dirty" really means, "call set_page_dirty()": */
void __put_user_pages_unlocked(struct page **pages, unsigned long npages,
       bool dirty);

?


thanks,
--
John Hubbard
NVIDIA


Re: [PATCH 3/3] net/xdp: convert put_page() to put_user_page*()

2019-07-23 Thread John Hubbard
On 7/23/19 11:06 AM, Ira Weiny wrote:
> On Mon, Jul 22, 2019 at 09:41:34PM -0700, John Hubbard wrote:
>> On 7/22/19 5:25 PM, Ira Weiny wrote:
>>> On Mon, Jul 22, 2019 at 03:34:15PM -0700, john.hubb...@gmail.com wrote:
...
>> Obviously, this stuff is all subject to a certain amount of opinion, but I
>> think I'm on really solid ground as far as precedent goes. So I'm pushing
>> back on the NAK... :)
> 
> Fair enough...  However, we have discussed in the past how GUP can be a
> confusing interface to use.
> 
> So I'd like to see it be more directed.  Only using the __put_user_pages()
> version allows us to ID callers easier through a grep of PUP_FLAGS_DIRTY_LOCK
> in addition to directing users to use that interface rather than having to 
> read
> the GUP code to figure out that the 2 calls above are equal.  It is not a huge
> deal but...
> 

OK, combining all the feedback to date, which is:

* the leading double underscore is unloved,

* set_page_dirty() is under investigation, but likely guilty of incitement
  to cause bugs,


...we end up with this:

void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
   bool make_dirty)

...which I have a v2 patchset for, ready to send out. It makes IB all pretty 
too. :)
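
A minimal sketch of what a converted call site looks like under that
signature (the function below is hypothetical; the pattern matches the IB
conversions in the v2 series):

	static void example_release_user_pages(struct page **pages,
					       size_t npages, bool dirty)
	{
		/* Replaces the old "if (dirty) ... else ..." branching. */
		put_user_pages_dirty_lock(pages, npages, dirty);
	}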


thanks,
-- 
John Hubbard
NVIDIA


[PATCH v2 3/3] net/xdp: convert put_page() to put_user_page*()

2019-07-23 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Björn Töpel 
Cc: Magnus Karlsson 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 net/xdp/xdp_umem.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..17c4b3d3dc34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-   unsigned int i;
-
-   for (i = 0; i < umem->npgs; i++) {
-   struct page *page = umem->pgs[i];
-
-   set_page_dirty_lock(page);
-   put_page(page);
-   }
+   put_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
 
kfree(umem->pgs);
umem->pgs = NULL;
-- 
2.22.0



[PATCH v2 2/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-23 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..b5b5bf0ba65e 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-   struct page *page;
int i;
 
switch (vsg->state) {
@@ -186,13 +185,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
kfree(vsg->desc_pages);
/* fall through */
case dr_via_pages_locked:
-   for (i = 0; i < vsg->num_pages; ++i) {
-   if (NULL != (page = vsg->pages[i])) {
-   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
+   put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+ (vsg->direction == DMA_FROM_DEVICE));
/* fall through */
case dr_via_pages_alloc:
vfree(vsg->pages);
-- 
2.22.0



[PATCH v2 0/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-07-23 Thread john . hubbard
From: John Hubbard 

Changes since v1:

* Instead of providing __put_user_pages(), add an argument to
  put_user_pages_dirty_lock(), and delete put_user_pages_dirty().
  This is based on the following points:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

* Added the Infiniband driver back to the patch series, because it is
  a caller of put_user_pages_dirty_lock().

Unchanged parts from the v1 cover letter (except for the diffstat):

Notes about the remaining patches to come:

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've
  had to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set. That
  can only be applied after all call sites are converted, but it's
  good to get an early look at it.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

John Hubbard (3):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  drivers/gpu/drm/via: convert put_page() to put_user_page*()
  net/xdp: convert put_page() to put_user_page*()

 drivers/gpu/drm/via/via_dmablit.c  |  10 +-
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|   8 +-
 include/linux/mm.h |   5 +-
 mm/gup.c   | 116 +
 net/xdp/xdp_umem.c |   9 +-
 9 files changed, 62 insertions(+), 106 deletions(-)

-- 
2.22.0



[PATCH v2 1/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-07-23 Thread john . hubbard
From: John Hubbard 

Provide more capable variation of put_user_pages_dirty_lock(),
and delete put_user_pages_dirty(). This is based on the
following:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

This leads to the following API choices:

* put_user_pages_dirty_lock(page, npages, make_dirty)

* There is no put_user_pages_dirty(). You have to
  hand code that, in the rare case that it's
  required.
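
A minimal sketch of that hand-coding (illustrative only, not part of this
patch; it assumes the caller has pages[]/npages from get_user_pages*() and
knows that plain set_page_dirty() is safe for these pages):

	unsigned long i;

	for (i = 0; i < npages; i++) {
		struct page *page = compound_head(pages[i]);

		/* Same skip-if-already-dirty logic the old helper used. */
		if (!PageDirty(page))
			set_page_dirty(page);
		put_user_page(page);
	}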

Cc: Matthew Wilcox 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Ira Weiny 
Cc: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|   8 +-
 include/linux/mm.h |   5 +-
 mm/gup.c   | 116 +
 7 files changed, 59 insertions(+), 90 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..965cf9dea71a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
 
for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
page = sg_page_iter_page(&sg_iter);
-   if (umem->writable && dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
}
 
sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index b89a9b9aef7a..469acb961fbd 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, npages);
-   else
-   put_user_pages(p, npages);
+   put_user_pages_dirty_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index bfbfbb7e0ff4..6bf764e41891 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,10 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 int dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, num_pages);
-   else
-   put_user_pages(p, num_pages);
+   put_user_pages_dirty_lock(p, num_pages, dirty);
 }
 
 /**
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..62e6ffa9ad78 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head 
*chunk_list, int dirty)
for_each_sg(chunk->page_list, sg, chunk->nents, i) {
page = sg_page(sg);
pa = sg_phys(sg);
-   if (dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, dirty);
usnic_dbg("pa: %pa\n", &pa);
}
kfree(chunk);
diff --git a/drivers/infiniband/sw/siw/siw_mem.c 
b/drivers/infiniband/sw/siw/siw_mem.c
index 67171c82b0c4..358d440efa11 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -65,13 +65,7 @@ static void siw_free_plist(struct siw_page_chunk *chunk, int 
num_pages,
 {
struct page **p = chunk->plist;
 
-   while (num_pages--) {
-   if (!PageDirty(*p) && dirty)
-   put_user_pages_dirty_lock(p, 1);
-   else
-   put_user_page(*p);
-   p++;
-   }
+   put_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
 }
 
 void siw_umem_release(struct siw_umem *

Re: [PATCH v2 1/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-07-23 Thread John Hubbard
On 7/23/19 6:26 PM, john.hubb...@gmail.com wrote:
> From: John Hubbard 
...
> +  * 2) This code sees the page as clean, so it calls
> +  * set_page_dirty(). The page stays dirty, despite being
> +  * written back, so it gets written back again in the
> +  * next writeback cycle. This is harmless.
> +  */
> + if (!PageDirty(page))
> + set_page_dirty_lock(page);
> + break;

ahem, the above "break" should not be there, it's an artifact, sorry about 
that. Will correct on the next iteration.

thanks,
-- 
John Hubbard
NVIDIA


> + put_user_page(page);
> + }
>  }
>  EXPORT_SYMBOL(put_user_pages_dirty_lock);
>  
> 


[PATCH v3 0/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-07-23 Thread john . hubbard
From: John Hubbard 

Hi,

I apologize for the extra emails (v2 was sent pretty recently), but I
didn't want to leave a known-broken version sitting out there, creating
problems.

Changes since v2:

* Critical bug fix: remove a stray "break;" from the new routine.

Changes since v1:

* Instead of providing __put_user_pages(), add an argument to
  put_user_pages_dirty_lock(), and delete put_user_pages_dirty().
  This is based on the following points:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

* Added the Infiniband driver back to the patch series, because it is
  a caller of put_user_pages_dirty_lock().

Unchanged parts from the v1 cover letter (except for the diffstat):

Notes about the remaining patches to come:

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've
  had to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set. That
  can only be applied after all call sites are converted, but it's
  good to get an early look at it.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

John Hubbard (3):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  drivers/gpu/drm/via: convert put_page() to put_user_page*()
  net/xdp: convert put_page() to put_user_page*()

 drivers/gpu/drm/via/via_dmablit.c  |  10 +-
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|   8 +-
 include/linux/mm.h |   5 +-
 mm/gup.c   | 115 +
 net/xdp/xdp_umem.c |   9 +-
 9 files changed, 61 insertions(+), 106 deletions(-)

-- 
2.22.0



[PATCH v3 3/3] net/xdp: convert put_page() to put_user_page*()

2019-07-23 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Björn Töpel 
Cc: Magnus Karlsson 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 net/xdp/xdp_umem.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..17c4b3d3dc34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-   unsigned int i;
-
-   for (i = 0; i < umem->npgs; i++) {
-   struct page *page = umem->pgs[i];
-
-   set_page_dirty_lock(page);
-   put_page(page);
-   }
+   put_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
 
kfree(umem->pgs);
umem->pgs = NULL;
-- 
2.22.0



[PATCH v3 1/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-07-23 Thread john . hubbard
From: John Hubbard 

Provide more capable variation of put_user_pages_dirty_lock(),
and delete put_user_pages_dirty(). This is based on the
following:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

This leads to the following API choices:

* put_user_pages_dirty_lock(page, npages, make_dirty)

* There is no put_user_pages_dirty(). You have to
  hand code that, in the rare case that it's
  required.

Cc: Matthew Wilcox 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Ira Weiny 
Cc: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|   8 +-
 include/linux/mm.h |   5 +-
 mm/gup.c   | 115 +
 7 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..965cf9dea71a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
 
for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
page = sg_page_iter_page(&sg_iter);
-   if (umem->writable && dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
}
 
sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index b89a9b9aef7a..469acb961fbd 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, npages);
-   else
-   put_user_pages(p, npages);
+   put_user_pages_dirty_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index bfbfbb7e0ff4..6bf764e41891 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,10 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 int dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, num_pages);
-   else
-   put_user_pages(p, num_pages);
+   put_user_pages_dirty_lock(p, num_pages, dirty);
 }
 
 /**
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..62e6ffa9ad78 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head 
*chunk_list, int dirty)
for_each_sg(chunk->page_list, sg, chunk->nents, i) {
page = sg_page(sg);
pa = sg_phys(sg);
-   if (dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, dirty);
usnic_dbg("pa: %pa\n", &pa);
}
kfree(chunk);
diff --git a/drivers/infiniband/sw/siw/siw_mem.c 
b/drivers/infiniband/sw/siw/siw_mem.c
index 67171c82b0c4..358d440efa11 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -65,13 +65,7 @@ static void siw_free_plist(struct siw_page_chunk *chunk, int 
num_pages,
 {
struct page **p = chunk->plist;
 
-   while (num_pages--) {
-   if (!PageDirty(*p) && dirty)
-   put_user_pages_dirty_lock(p, 1);
-   else
-   put_user_page(*p);
-   p++;
-   }
+   put_user_pages_dirty_lock(chunk->plist, num_pages, dirty);
 }
 
 void siw_umem_release(struct siw_umem *

[PATCH v3 2/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-23 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..b5b5bf0ba65e 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-   struct page *page;
int i;
 
switch (vsg->state) {
@@ -186,13 +185,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
kfree(vsg->desc_pages);
/* fall through */
case dr_via_pages_locked:
-   for (i = 0; i < vsg->num_pages; ++i) {
-   if (NULL != (page = vsg->pages[i])) {
-   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
+   put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+ (vsg->direction == DMA_FROM_DEVICE));
/* fall through */
case dr_via_pages_alloc:
vfree(vsg->pages);
-- 
2.22.0



[PATCH v4 2/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-07-31 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..b5b5bf0ba65e 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-   struct page *page;
int i;
 
switch (vsg->state) {
@@ -186,13 +185,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
kfree(vsg->desc_pages);
/* fall through */
case dr_via_pages_locked:
-   for (i = 0; i < vsg->num_pages; ++i) {
-   if (NULL != (page = vsg->pages[i])) {
-   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
+   put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+ (vsg->direction == DMA_FROM_DEVICE));
/* fall through */
case dr_via_pages_alloc:
vfree(vsg->pages);
-- 
2.22.0


[PATCH v5 1/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-08-01 Thread john . hubbard
From: John Hubbard 

Provide more capable variation of put_user_pages_dirty_lock(),
and delete put_user_pages_dirty(). This is based on the
following:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

This leads to the following API choices:

* put_user_pages_dirty_lock(page, npages, make_dirty)

* There is no put_user_pages_dirty(). You have to
  hand code that, in the rare case that it's
  required.

Reviewed-by: Christoph Hellwig 
Cc: Matthew Wilcox 
Cc: Jan Kara 
Cc: Ira Weiny 
Cc: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |  13 +--
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|  18 +---
 include/linux/mm.h |   5 +-
 mm/gup.c   | 115 +
 7 files changed, 60 insertions(+), 106 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..965cf9dea71a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
 
for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
page = sg_page_iter_page(&sg_iter);
-   if (umem->writable && dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
}
 
sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index b89a9b9aef7a..469acb961fbd 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, npages);
-   else
-   put_user_pages(p, npages);
+   put_user_pages_dirty_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index bfbfbb7e0ff4..26c1fb8d45cc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -37,15 +37,6 @@
 
 #include "qib.h"
 
-static void __qib_release_user_pages(struct page **p, size_t num_pages,
-int dirty)
-{
-   if (dirty)
-   put_user_pages_dirty_lock(p, num_pages);
-   else
-   put_user_pages(p, num_pages);
-}
-
 /**
  * qib_map_page - a safety wrapper around pci_map_page()
  *
@@ -124,7 +115,7 @@ int qib_get_user_pages(unsigned long start_page, size_t 
num_pages,
 
return 0;
 bail_release:
-   __qib_release_user_pages(p, got, 0);
+   put_user_pages_dirty_lock(p, got, false);
 bail:
atomic64_sub(num_pages, ¤t->mm->pinned_vm);
return ret;
@@ -132,7 +123,7 @@ int qib_get_user_pages(unsigned long start_page, size_t 
num_pages,
 
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
-   __qib_release_user_pages(p, num_pages, 1);
+   put_user_pages_dirty_lock(p, num_pages, true);
 
/* during close after signal, mm can be NULL */
if (current->mm)
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..62e6ffa9ad78 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head 
*chunk_list, int dirty)
for_each_sg(chunk->page_list, sg, chunk->nents, i) {
page = sg_page(sg);
pa = sg_phys(sg);
-   if (dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, dirty);
usnic_dbg("pa: %pa\n", &pa);
}
kfree

[PATCH v5 0/3] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-08-01 Thread john . hubbard
From: John Hubbard 

Changes since v4:

* Christoph Hellwig's review applied: deleted siw_free_plist() and
  __qib_release_user_pages(), now that put_user_pages_dirty_lock() does
  what those routines were doing.

* Applied Björn's ACK for net/xdp, and Christoph's Reviewed-by for patch
  #1.

Changes since v3:

* Fixed an unused variable warning in siw_mem.c

Changes since v2:

* Critical bug fix: remove a stray "break;" from the new routine.

Changes since v1:

* Instead of providing __put_user_pages(), add an argument to
  put_user_pages_dirty_lock(), and delete put_user_pages_dirty().
  This is based on the following points:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

* Added the Infiniband driver back to the patch series, because it is
  a caller of put_user_pages_dirty_lock().

Unchanged parts from the v1 cover letter (except for the diffstat):

Notes about the remaining patches to come:

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

* The block/bio related changes (Jerome mostly wrote those, but I've
  had to move stuff around extensively, and add a little code)

* mm/ changes

* other subsystem patches

* an RFC that shows the current state of the tracking patch set. That
  can only be applied after all call sites are converted, but it's
  good to get an early look at it.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").



John Hubbard (3):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  drivers/gpu/drm/via: convert put_page() to put_user_page*()
  net/xdp: convert put_page() to put_user_page*()

 drivers/gpu/drm/via/via_dmablit.c  |  10 +-
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |  13 +--
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|  18 +---
 include/linux/mm.h |   5 +-
 mm/gup.c   | 115 +
 net/xdp/xdp_umem.c |   9 +-
 9 files changed, 63 insertions(+), 122 deletions(-)

-- 
2.22.0



[PATCH v5 2/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()

2019-08-01 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..b5b5bf0ba65e 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-   struct page *page;
int i;
 
switch (vsg->state) {
@@ -186,13 +185,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
kfree(vsg->desc_pages);
/* fall through */
case dr_via_pages_locked:
-   for (i = 0; i < vsg->num_pages; ++i) {
-   if (NULL != (page = vsg->pages[i])) {
-   if (!PageReserved(page) && (DMA_FROM_DEVICE == 
vsg->direction))
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
+   put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+ (vsg->direction == DMA_FROM_DEVICE));
/* fall through */
case dr_via_pages_alloc:
vfree(vsg->pages);
-- 
2.22.0



[PATCH v5 3/3] net/xdp: convert put_page() to put_user_page*()

2019-08-01 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Acked-by: Björn Töpel 
Cc: Magnus Karlsson 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 net/xdp/xdp_umem.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..17c4b3d3dc34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-   unsigned int i;
-
-   for (i = 0; i < umem->npgs; i++) {
-   struct page *page = umem->pgs[i];
-
-   set_page_dirty_lock(page);
-   put_page(page);
-   }
+   put_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
 
kfree(umem->pgs);
umem->pgs = NULL;
-- 
2.22.0



[PATCH 29/34] mm/process_vm_access.c: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Al Viro 
Cc: Andrea Arcangeli 
Cc: Christopher Yeoh 
Cc: Dave Hansen 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Jann Horn 
Cc: Lorenzo Stoakes 
Cc: Mathieu Desnoyers 
Cc: Mike Rapoport 
Cc: Rashika Kheria 
Signed-off-by: John Hubbard 
---
 mm/process_vm_access.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 357aa7bef6c0..4d29d54ec93f 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -96,7 +96,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
flags |= FOLL_WRITE;
 
while (!rc && nr_pages && iov_iter_count(iter)) {
-   int pages = min(nr_pages, max_pages_per_loop);
+   int pinned_pages = min(nr_pages, max_pages_per_loop);
int locked = 1;
size_t bytes;
 
@@ -106,14 +106,15 @@ static int process_vm_rw_single_vec(unsigned long addr,
 * current/current->mm
 */
down_read(&mm->mmap_sem);
-   pages = get_user_pages_remote(task, mm, pa, pages, flags,
- process_pages, NULL, &locked);
+   pinned_pages = get_user_pages_remote(task, mm, pa, pinned_pages,
+flags, process_pages, NULL,
+&locked);
if (locked)
up_read(&mm->mmap_sem);
-   if (pages <= 0)
+   if (pinned_pages <= 0)
return -EFAULT;
 
-   bytes = pages * PAGE_SIZE - start_offset;
+   bytes = pinned_pages * PAGE_SIZE - start_offset;
if (bytes > len)
bytes = len;
 
@@ -122,10 +123,9 @@ static int process_vm_rw_single_vec(unsigned long addr,
 vm_write);
len -= bytes;
start_offset = 0;
-   nr_pages -= pages;
-   pa += pages * PAGE_SIZE;
-   while (pages)
-   put_page(process_pages[--pages]);
+   nr_pages -= pinned_pages;
+   pa += pinned_pages * PAGE_SIZE;
+   put_user_pages(process_pages, pinned_pages);
}
 
return rc;
-- 
2.22.0


[PATCH 06/34] drm/i915: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Note that this effectively changes the code's behavior in
i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(),
instead of set_page_dirty(). This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.gb...@lst.de

Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: David Airlie 
Cc: intel-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..c18008d3cc2a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -527,7 +527,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct 
*_work)
}
mutex_unlock(&obj->mm.lock);
 
-   release_pages(pvec, pinned);
+   put_user_pages(pvec, pinned);
kvfree(pvec);
 
i915_gem_object_put(obj);
@@ -640,7 +640,7 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
__i915_gem_userptr_set_active(obj, true);
 
if (IS_ERR(pages))
-   release_pages(pvec, pinned);
+   put_user_pages(pvec, pinned);
kvfree(pvec);
 
return PTR_ERR_OR_ZERO(pages);
@@ -663,11 +663,8 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
i915_gem_gtt_finish_pages(obj, pages);
 
for_each_sgt_page(page, sgt_iter, pages) {
-   if (obj->mm.dirty)
-   set_page_dirty(page);
-
mark_page_accessed(page);
-   put_page(page);
+   put_user_pages_dirty_lock(&page, 1, obj->mm.dirty);
}
obj->mm.dirty = false;
 
-- 
2.22.0


[PATCH 30/34] crypt: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Herbert Xu 
Cc: David S. Miller 
Cc: linux-cry...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 crypto/af_alg.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 879cf23f7489..edd358ea64da 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -428,10 +428,7 @@ static void af_alg_link_sg(struct af_alg_sgl *sgl_prev,
 
 void af_alg_free_sg(struct af_alg_sgl *sgl)
 {
-   int i;
-
-   for (i = 0; i < sgl->npages; i++)
-   put_page(sgl->pages[i]);
+   put_user_pages(sgl->pages, sgl->npages);
 }
 EXPORT_SYMBOL_GPL(af_alg_free_sg);
 
@@ -668,7 +665,7 @@ static void af_alg_free_areq_sgls(struct af_alg_async_req 
*areq)
for_each_sg(tsgl, sg, areq->tsgl_entries, i) {
if (!sg_page(sg))
continue;
-   put_page(sg_page(sg));
+   put_user_page(sg_page(sg));
}
 
sock_kfree_s(sk, tsgl, areq->tsgl_entries * sizeof(*tsgl));
-- 
2.22.0


[PATCH 25/34] mm/frame_vector.c: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Dan Williams 
Cc: Jan Kara 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Signed-off-by: John Hubbard 
---
 mm/frame_vector.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index c64dca6e27c2..f590badac776 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -120,7 +120,6 @@ EXPORT_SYMBOL(get_vaddr_frames);
  */
 void put_vaddr_frames(struct frame_vector *vec)
 {
-   int i;
struct page **pages;
 
if (!vec->got_ref)
@@ -133,8 +132,7 @@ void put_vaddr_frames(struct frame_vector *vec)
 */
if (WARN_ON(IS_ERR(pages)))
goto out;
-   for (i = 0; i < vec->nr_frames; i++)
-   put_page(pages[i]);
+   put_user_pages(pages, vec->nr_frames);
vec->got_ref = false;
 out:
vec->nr_frames = 0;
-- 
2.22.0


[PATCH 32/34] goldfish_pipe: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Note that this effectively changes the code's behavior in
release_user_pages(): it now ultimately calls set_page_dirty_lock(),
instead of set_page_dirty(). This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.gb...@lst.de

Cc: Greg Kroah-Hartman 
Cc: Roman Kiryanov 
Signed-off-by: John Hubbard 
---
 drivers/platform/goldfish/goldfish_pipe.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/platform/goldfish/goldfish_pipe.c 
b/drivers/platform/goldfish/goldfish_pipe.c
index cef0133aa47a..2bd21020e288 100644
--- a/drivers/platform/goldfish/goldfish_pipe.c
+++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -288,15 +288,12 @@ static int pin_user_pages(unsigned long first_page,
 static void release_user_pages(struct page **pages, int pages_count,
   int is_write, s32 consumed_size)
 {
-   int i;
+   bool dirty = !is_write && consumed_size > 0;
 
-   for (i = 0; i < pages_count; i++) {
-   if (!is_write && consumed_size > 0)
-   set_page_dirty(pages[i]);
-   put_page(pages[i]);
-   }
+   put_user_pages_dirty_lock(pages, pages_count, dirty);
 }
 
+
 /* Populate the call parameters, merging adjacent pages together */
 static void populate_rw_params(struct page **pages,
   int pages_count,
-- 
2.22.0


[PATCH 23/34] uprobes: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Signed-off-by: John Hubbard 
---
 kernel/events/uprobes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 84fa00497c49..4a575de8cec8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -397,7 +397,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, 
short d)
ret = 0;
 out:
kunmap_atomic(kaddr);
-   put_page(page);
+   put_user_page(page);
return ret;
 }
 
@@ -504,7 +504,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct 
mm_struct *mm,
ret = __replace_page(vma, vaddr, old_page, new_page);
put_page(new_page);
 put_old:
-   put_page(old_page);
+   put_user_page(old_page);
 
if (unlikely(ret == -EAGAIN))
goto retry;
@@ -1981,7 +1981,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned 
long vaddr)
return result;
 
copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
-   put_page(page);
+   put_user_page(page);
  out:
/* This needs to return true for any variant of the trap insn */
return is_trap_insn(&opcode);
-- 
2.22.0


[PATCH 28/34] mm/madvise.c: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Dan Williams 
Cc: Daniel Black 
Cc: Jan Kara 
Cc: Jérôme Glisse 
Cc: Matthew Wilcox 
Cc: Mike Kravetz 
Signed-off-by: John Hubbard 
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 968df3aa069f..1c6881a761a5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -672,7 +672,7 @@ static int madvise_inject_error(int behavior,
 * routine is responsible for pinning the page to prevent it
 * from being released back to the page allocator.
 */
-   put_page(page);
+   put_user_page(page);
ret = memory_failure(pfn, 0);
if (ret)
return ret;
-- 
2.22.0


[PATCH 01/34] mm/gup: add make_dirty arg to put_user_pages_dirty_lock()

2019-08-02 Thread john . hubbard
From: John Hubbard 

Provide more capable variation of put_user_pages_dirty_lock(),
and delete put_user_pages_dirty(). This is based on the
following:

1. Lots of call sites become simpler if a bool is passed
into put_user_page*(), instead of making the call site
choose which put_user_page*() variant to call.

2. Christoph Hellwig's observation that set_page_dirty_lock()
is usually correct, and set_page_dirty() is usually a
bug, or at least questionable, within a put_user_page*()
calling chain.

This leads to the following API choices:

* put_user_pages_dirty_lock(page, npages, make_dirty)

* There is no put_user_pages_dirty(). You have to
  hand code that, in the rare case that it's
  required.
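
In call-site terms, the simplification is roughly the following. This is an
illustrative sketch only: example_unpin() is a hypothetical caller, and the
hunks below show the real conversions.

static void example_unpin(struct page **pages, unsigned long npages,
			  bool make_dirty)
{
	/* Before this patch, each call site picked a variant itself: */
	/*
	 *	if (make_dirty)
	 *		put_user_pages_dirty_lock(pages, npages);
	 *	else
	 *		put_user_pages(pages, npages);
	 */

	/* After this patch, the decision is simply passed in as a bool. */
	put_user_pages_dirty_lock(pages, npages, make_dirty);
}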

Cc: Matthew Wilcox 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Ira Weiny 
Cc: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c|   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c|  10 +-
 include/linux/mm.h |   5 +-
 mm/gup.c   | 115 +
 7 files changed, 58 insertions(+), 92 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 08da840ed7ee..965cf9dea71a 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,10 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
 
for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
page = sg_page_iter_page(&sg_iter);
-   if (umem->writable && dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
}
 
sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index b89a9b9aef7a..469acb961fbd 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,10 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, npages);
-   else
-   put_user_pages(p, npages);
+   put_user_pages_dirty_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c 
b/drivers/infiniband/hw/qib/qib_user_pages.c
index bfbfbb7e0ff4..6bf764e41891 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,10 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
 int dirty)
 {
-   if (dirty)
-   put_user_pages_dirty_lock(p, num_pages);
-   else
-   put_user_pages(p, num_pages);
+   put_user_pages_dirty_lock(p, num_pages, dirty);
 }
 
 /**
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c 
b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 0b0237d41613..62e6ffa9ad78 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -75,10 +75,7 @@ static void usnic_uiom_put_pages(struct list_head 
*chunk_list, int dirty)
for_each_sg(chunk->page_list, sg, chunk->nents, i) {
page = sg_page(sg);
pa = sg_phys(sg);
-   if (dirty)
-   put_user_pages_dirty_lock(&page, 1);
-   else
-   put_user_page(page);
+   put_user_pages_dirty_lock(&page, 1, dirty);
usnic_dbg("pa: %pa\n", &pa);
}
kfree(chunk);
diff --git a/drivers/infiniband/sw/siw/siw_mem.c 
b/drivers/infiniband/sw/siw/siw_mem.c
index 67171c82b0c4..ab83a9cec562 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -63,15 +63,7 @@ struct siw_mem *siw_mem_id2obj(struct siw_device *sdev, int 
stag_index)
 static void siw_free_plist(struct siw_page_chunk *chunk, int num_pages,
   bool dirty)
 {
-   struct page **p = chunk->plist;
-
-   while (num_pages--) {
-   if (!PageDirty(*p) && dirty)
-   put_user_pages_dirty_lock(p, 1);
-   else
-   put_user_page(*p);
-   p++;
-   }
+  

[PATCH 17/34] vfio: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Note that this effectively changes the code's behavior in
put_pfn(): it now ultimately calls set_page_dirty_lock(),
instead of set_page_dirty(). This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.gb...@lst.de

Cc: Alex Williamson 
Cc: k...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/vfio/vfio_iommu_type1.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f30fa8..5a5461a14299 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -320,9 +320,9 @@ static int put_pfn(unsigned long pfn, int prot)
 {
if (!is_invalid_reserved_pfn(pfn)) {
struct page *page = pfn_to_page(pfn);
-   if (prot & IOMMU_WRITE)
-   SetPageDirty(page);
-   put_page(page);
+   bool dirty = prot & IOMMU_WRITE;
+
+   put_user_pages_dirty_lock(&page, 1, dirty);
return 1;
}
return 0;
@@ -356,7 +356,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
 */
if (ret > 0 && vma_is_fsdax(vmas[0])) {
ret = -EOPNOTSUPP;
-   put_page(page[0]);
+   put_user_page(page[0]);
}
}
up_read(&mm->mmap_sem);
-- 
2.22.0


[PATCH 21/34] fs/exec.c: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 fs/exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index f7f6a140856a..ee442151582f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -227,7 +227,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, 
unsigned long pos,
 
 static void put_arg_page(struct page *page)
 {
-   put_page(page);
+   put_user_page(page);
 }
 
 static void free_arg_pages(struct linux_binprm *bprm)
-- 
2.22.0


[PATCH 10/34] genwqe: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

This changes the release code slightly, because each page slot in the
page_list[] array is no longer checked for NULL. However, that check
was wrong anyway, because the get_user_pages() pattern of usage here
never allowed for NULL entries within a range of pinned pages.

Cc: Frank Haverkamp 
Cc: "Guilherme G. Piccoli" 
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Signed-off-by: John Hubbard 
---
 drivers/misc/genwqe/card_utils.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/genwqe/card_utils.c b/drivers/misc/genwqe/card_utils.c
index 2e1c4d2905e8..2a888f31d2c5 100644
--- a/drivers/misc/genwqe/card_utils.c
+++ b/drivers/misc/genwqe/card_utils.c
@@ -517,24 +517,13 @@ int genwqe_free_sync_sgl(struct genwqe_dev *cd, struct 
genwqe_sgl *sgl)
 /**
  * genwqe_free_user_pages() - Give pinned pages back
  *
- * Documentation of get_user_pages is in mm/gup.c:
- *
- * If the page is written to, set_page_dirty (or set_page_dirty_lock,
- * as appropriate) must be called after the page is finished with, and
- * before put_page is called.
+ * The pages may have been written to, so we call put_user_pages_dirty_lock(),
+ * rather than put_user_pages().
  */
 static int genwqe_free_user_pages(struct page **page_list,
unsigned int nr_pages, int dirty)
 {
-   unsigned int i;
-
-   for (i = 0; i < nr_pages; i++) {
-   if (page_list[i] != NULL) {
-   if (dirty)
-   set_page_dirty_lock(page_list[i]);
-   put_page(page_list[i]);
-   }
-   }
+   put_user_pages_dirty_lock(page_list, nr_pages, dirty);
return 0;
 }
 
-- 
2.22.0


[PATCH 22/34] orangefs: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Signed-off-by: John Hubbard 
---
 fs/orangefs/orangefs-bufmap.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c
index 2bb916d68576..f2f33a16d604 100644
--- a/fs/orangefs/orangefs-bufmap.c
+++ b/fs/orangefs/orangefs-bufmap.c
@@ -168,10 +168,7 @@ static DEFINE_SPINLOCK(orangefs_bufmap_lock);
 static void
 orangefs_bufmap_unmap(struct orangefs_bufmap *bufmap)
 {
-   int i;
-
-   for (i = 0; i < bufmap->page_count; i++)
-   put_page(bufmap->page_array[i]);
+   put_user_pages(bufmap->page_array, bufmap->page_count);
 }
 
 static void
@@ -280,7 +277,7 @@ orangefs_bufmap_map(struct orangefs_bufmap *bufmap,
 
for (i = 0; i < ret; i++) {
SetPageError(bufmap->page_array[i]);
-   put_page(bufmap->page_array[i]);
+   put_user_page(bufmap->page_array[i]);
}
return -ENOMEM;
}
-- 
2.22.0


[PATCH 00/34] put_user_pages(): miscellaneous call sites

2019-08-02 Thread john . hubbard
From: John Hubbard 

Hi,

These are best characterized as miscellaneous conversions: many (not all)
call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
out a few call sites that require some more work. These are mostly pretty
simple ones.

It's probably best to send all of these via Andrew's -mm tree, assuming
that there are no significant merge conflicts with ongoing work in other
trees (which I doubt, given that these are small changes).

These patches apply to the latest linux.git. Patch #1 is also already in
Andrew's tree, but given the broad non-linux-mm Cc list, I thought it
would be more convenient to just include that patch here, so that people
can use linux.git as the base--even though these are probably destined
for linux-mm.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions"). That commit
has an extensive description of the problem and the planned steps to
solve it, but the highlights are:

1) Provide put_user_page*() routines, intended to be used
for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of
call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
implement tracking of these pages. This tracking will be separate from
the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
special handling (especially in writeback paths) when the pages are
backed by a filesystem.

And a few references, also from that commit:

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
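
For anyone skimming the series, the basic conversion pattern looks roughly
like the sketch below. The example_release_*() functions are hypothetical
call sites; put_user_pages() is the helper introduced by fc1d8e7cca2d.

#include <linux/mm.h>

/* Before: an open-coded loop that loses the "GUP-pinned" information. */
static void example_release_old(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		put_page(pages[i]);
}

/*
 * After: a single call that documents the pages came from
 * get_user_pages*(), which is what enables the tracking in step (3).
 */
static void example_release_new(struct page **pages, unsigned long npages)
{
	put_user_pages(pages, npages);
}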


Ira Weiny (1):
  fs/binfmt_elf: convert put_page() to put_user_page*()

John Hubbard (33):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  net/rds: convert put_page() to put_user_page*()
  net/ceph: convert put_page() to put_user_page*()
  x86/kvm: convert put_page() to put_user_page*()
  drm/etnaviv: convert release_pages() to put_user_pages()
  drm/i915: convert put_page() to put_user_page*()
  drm/radeon: convert put_page() to put_user_page*()
  media/ivtv: convert put_page() to put_user_page*()
  media/v4l2-core/mm: convert put_page() to put_user_page*()
  genwqe: convert put_page() to put_user_page*()
  scif: convert put_page() to put_user_page*()
  vmci: convert put_page() to put_user_page*()
  rapidio: convert put_page() to put_user_page*()
  oradax: convert put_page() to put_user_page*()
  staging/vc04_services: convert put_page() to put_user_page*()
  drivers/tee: convert put_page() to put_user_page*()
  vfio: convert put_page() to put_user_page*()
  fbdev/pvr2fb: convert put_page() to put_user_page*()
  fsl_hypervisor: convert put_page() to put_user_page*()
  xen: convert put_page() to put_user_page*()
  fs/exec.c: convert put_page() to put_user_page*()
  orangefs: convert put_page() to put_user_page*()
  uprobes: convert put_page() to put_user_page*()
  futex: convert put_page() to put_user_page*()
  mm/frame_vector.c: convert put_page() to put_user_page*()
  mm/gup_benchmark.c: convert put_page() to put_user_page*()
  mm/memory.c: convert put_page() to put_user_page*()
  mm/madvise.c: convert put_page() to put_user_page*()
  mm/process_vm_access.c: convert put_page() to put_user_page*()
  crypt: convert put_page() to put_user_page*()
  nfs: convert put_page() to put_user_page*()
  goldfish_pipe: convert put_page() to put_user_page*()
  kernel/events/core.c: convert put_page() to put_user_page*()

 arch/x86/kvm/svm.c|   4 +-
 crypto/af_alg.c   |   7 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   9 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |   2 +-
 drivers/infiniband/core/umem.c|   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c   |   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c|   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c  |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c   |  10 +-
 drivers/media/pci/ivtv/ivtv-udma.c|  14 +--
 drivers/media/pci/ivtv/ivtv-yuv.c |  10 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |   3 +-
 drivers/misc/genwqe/card_utils.c  |  17 +--
 drivers/misc/mic/scif/scif_rma.c  |  17 ++-
 drivers/misc/vmw_vmci/vmci_context.c  |   2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c   |  11 +-
 drivers/platform/goldfish/goldfish_pipe.c |   9 +-
 drivers/rapidio/devices/rio_mport_cdev.c  |   9 +-
 drivers/sbus/char/oradax.c|   2 +-
 .../interface/vchiq_arm/vchiq_2835_arm.c  |  10 +-
 drivers/tee/tee_shm.c |  10 +-
 drivers/vfio/vfio_i

[PATCH 18/34] fbdev/pvr2fb: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Bartlomiej Zolnierkiewicz 
Cc: Kees Cook 
Cc: Al Viro 
Cc: Bhumika Goyal 
Cc: Arvind Yadav 
Cc: dri-devel@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/video/fbdev/pvr2fb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index 7ff4b6b84282..0e4f9aa6444d 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -700,8 +700,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
ret = count;
 
 out_unmap:
-   for (i = 0; i < nr_pages; i++)
-   put_page(pages[i]);
+   put_user_pages(pages, nr_pages);
 
kfree(pages);
 
-- 
2.22.0


[PATCH 08/34] media/ivtv: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Andy Walls 
Cc: Mauro Carvalho Chehab 
Cc: linux-me...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/media/pci/ivtv/ivtv-udma.c | 14 --
 drivers/media/pci/ivtv/ivtv-yuv.c  | 10 +++---
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/media/pci/ivtv/ivtv-udma.c 
b/drivers/media/pci/ivtv/ivtv-udma.c
index 5f8883031c9c..7c7f33c2412b 100644
--- a/drivers/media/pci/ivtv/ivtv-udma.c
+++ b/drivers/media/pci/ivtv/ivtv-udma.c
@@ -92,7 +92,7 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long 
ivtv_dest_addr,
 {
struct ivtv_dma_page_info user_dma;
struct ivtv_user_dma *dma = &itv->udma;
-   int i, err;
+   int err;
 
IVTV_DEBUG_DMA("ivtv_udma_setup, dst: 0x%08x\n", (unsigned 
int)ivtv_dest_addr);
 
@@ -119,8 +119,7 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long 
ivtv_dest_addr,
IVTV_DEBUG_WARN("failed to map user pages, returned %d instead 
of %d\n",
   err, user_dma.page_count);
if (err >= 0) {
-   for (i = 0; i < err; i++)
-   put_page(dma->map[i]);
+   put_user_pages(dma->map, err);
return -EINVAL;
}
return err;
@@ -130,9 +129,7 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long 
ivtv_dest_addr,
 
/* Fill SG List with new values */
if (ivtv_udma_fill_sg_list(dma, &user_dma, 0) < 0) {
-   for (i = 0; i < dma->page_count; i++) {
-   put_page(dma->map[i]);
-   }
+   put_user_pages(dma->map, dma->page_count);
dma->page_count = 0;
return -ENOMEM;
}
@@ -153,7 +150,6 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long 
ivtv_dest_addr,
 void ivtv_udma_unmap(struct ivtv *itv)
 {
struct ivtv_user_dma *dma = &itv->udma;
-   int i;
 
IVTV_DEBUG_INFO("ivtv_unmap_user_dma\n");
 
@@ -170,9 +166,7 @@ void ivtv_udma_unmap(struct ivtv *itv)
ivtv_udma_sync_for_cpu(itv);
 
/* Release User Pages */
-   for (i = 0; i < dma->page_count; i++) {
-   put_page(dma->map[i]);
-   }
+   put_user_pages(dma->map, dma->page_count);
dma->page_count = 0;
 }
 
diff --git a/drivers/media/pci/ivtv/ivtv-yuv.c 
b/drivers/media/pci/ivtv/ivtv-yuv.c
index cd2fe2d444c0..9465a7d450b6 100644
--- a/drivers/media/pci/ivtv/ivtv-yuv.c
+++ b/drivers/media/pci/ivtv/ivtv-yuv.c
@@ -81,8 +81,7 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct 
ivtv_user_dma *dma,
 uv_pages, uv_dma.page_count);
 
if (uv_pages >= 0) {
-   for (i = 0; i < uv_pages; i++)
-   put_page(dma->map[y_pages + i]);
+   put_user_pages(&dma->map[y_pages], uv_pages);
rc = -EFAULT;
} else {
rc = uv_pages;
@@ -93,8 +92,7 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct 
ivtv_user_dma *dma,
 y_pages, y_dma.page_count);
}
if (y_pages >= 0) {
-   for (i = 0; i < y_pages; i++)
-   put_page(dma->map[i]);
+   put_user_pages(dma->map, y_pages);
/*
 * Inherit the -EFAULT from rc's
 * initialization, but allow it to be
@@ -112,9 +110,7 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct 
ivtv_user_dma *dma,
/* Fill & map SG List */
if (ivtv_udma_fill_sg_list (dma, &uv_dma, ivtv_udma_fill_sg_list (dma, 
&y_dma, 0)) < 0) {
IVTV_DEBUG_WARN("could not allocate bounce buffers for highmem 
userspace buffers\n");
-   for (i = 0; i < dma->page_count; i++) {
-   put_page(dma->map[i]);
-   }
+   put_user_pages(dma->map, dma->page_count);
dma->page_count = 0;
return -ENOMEM;
}
-- 
2.22.0


[PATCH 16/34] drivers/tee: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Jens Wiklander 
Signed-off-by: John Hubbard 
---
 drivers/tee/tee_shm.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 2da026fd12c9..c967d0420b67 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -31,16 +31,13 @@ static void tee_shm_release(struct tee_shm *shm)
 
poolm->ops->free(poolm, shm);
} else if (shm->flags & TEE_SHM_REGISTER) {
-   size_t n;
int rc = teedev->desc->ops->shm_unregister(shm->ctx, shm);
 
if (rc)
dev_err(teedev->dev.parent,
"unregister shm %p failed: %d", shm, rc);
 
-   for (n = 0; n < shm->num_pages; n++)
-   put_page(shm->pages[n]);
-
+   put_user_pages(shm->pages, shm->num_pages);
kfree(shm->pages);
}
 
@@ -313,16 +310,13 @@ struct tee_shm *tee_shm_register(struct tee_context *ctx, 
unsigned long addr,
return shm;
 err:
if (shm) {
-   size_t n;
-
if (shm->id >= 0) {
mutex_lock(&teedev->mutex);
idr_remove(&teedev->idr, shm->id);
mutex_unlock(&teedev->mutex);
}
if (shm->pages) {
-   for (n = 0; n < shm->num_pages; n++)
-   put_page(shm->pages[n]);
+   put_user_pages(shm->pages, shm->num_pages);
kfree(shm->pages);
}
}
-- 
2.22.0


[PATCH 00/34] put_user_pages(): miscellaneous call sites

2019-08-02 Thread john . hubbard
From: John Hubbard 

Hi,

These are best characterized as miscellaneous conversions: many (not all)
call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
out a few call sites that require some more work. These are mostly pretty
simple ones.

It's probably best to send all of these via Andrew's -mm tree, assuming
that there are no significant merge conflicts with ongoing work in other
trees (which I doubt, given that these are small changes).

These patches apply to the latest linux.git. Patch #1 is also already in
Andrew's tree, but given the broad non-linux-mm Cc list, I thought it
would be more convenient to just include that patch here, so that people
can use linux.git as the base--even though these are probably destined
for linux-mm.

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions"). That commit
has an extensive description of the problem and the planned steps to
solve it, but the highlights are:

1) Provide put_user_page*() routines, intended to be used
for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of
call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
implement tracking of these pages. This tracking will be separate from
the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
special handling (especially in writeback paths) when the pages are
backed by a filesystem.

And a few references, also from that commit:

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"


Ira Weiny (1):
  fs/binfmt_elf: convert put_page() to put_user_page*()

John Hubbard (33):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  net/rds: convert put_page() to put_user_page*()
  net/ceph: convert put_page() to put_user_page*()
  x86/kvm: convert put_page() to put_user_page*()
  drm/etnaviv: convert release_pages() to put_user_pages()
  drm/i915: convert put_page() to put_user_page*()
  drm/radeon: convert put_page() to put_user_page*()
  media/ivtv: convert put_page() to put_user_page*()
  media/v4l2-core/mm: convert put_page() to put_user_page*()
  genwqe: convert put_page() to put_user_page*()
  scif: convert put_page() to put_user_page*()
  vmci: convert put_page() to put_user_page*()
  rapidio: convert put_page() to put_user_page*()
  oradax: convert put_page() to put_user_page*()
  staging/vc04_services: convert put_page() to put_user_page*()
  drivers/tee: convert put_page() to put_user_page*()
  vfio: convert put_page() to put_user_page*()
  fbdev/pvr2fb: convert put_page() to put_user_page*()
  fsl_hypervisor: convert put_page() to put_user_page*()
  xen: convert put_page() to put_user_page*()
  fs/exec.c: convert put_page() to put_user_page*()
  orangefs: convert put_page() to put_user_page*()
  uprobes: convert put_page() to put_user_page*()
  futex: convert put_page() to put_user_page*()
  mm/frame_vector.c: convert put_page() to put_user_page*()
  mm/gup_benchmark.c: convert put_page() to put_user_page*()
  mm/memory.c: convert put_page() to put_user_page*()
  mm/madvise.c: convert put_page() to put_user_page*()
  mm/process_vm_access.c: convert put_page() to put_user_page*()
  crypt: convert put_page() to put_user_page*()
  nfs: convert put_page() to put_user_page*()
  goldfish_pipe: convert put_page() to put_user_page*()
  kernel/events/core.c: convert put_page() to put_user_page*()

 arch/x86/kvm/svm.c|   4 +-
 crypto/af_alg.c   |   7 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   9 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |   2 +-
 drivers/infiniband/core/umem.c|   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c   |   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c|   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c  |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c   |  10 +-
 drivers/media/pci/ivtv/ivtv-udma.c|  14 +--
 drivers/media/pci/ivtv/ivtv-yuv.c |  10 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |   3 +-
 drivers/misc/genwqe/card_utils.c  |  17 +--
 drivers/misc/mic/scif/scif_rma.c  |  17 ++-
 drivers/misc/vmw_vmci/vmci_context.c  |   2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c   |  11 +-
 drivers/platform/goldfish/goldfish_pipe.c |   9 +-
 drivers/rapidio/devices/rio_mport_cdev.c  |   9 +-
 drivers/sbus/char/oradax.c|   2 +-
 .../interface/vchiq_arm/vchiq_2835_arm.c  |  10 +-
 drivers/tee/tee_shm.c |  10 +-
 drivers/vfio/vfio_i

[PATCH 15/34] staging/vc04_services: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Eric Anholt 
Cc: Stefan Wahren 
Cc: Greg Kroah-Hartman 
Cc: Mihaela Muraru 
Cc: Suniel Mahesh 
Cc: Al Viro 
Cc: Sidong Yang 
Cc: Kishore KP 
Cc: linux-rpi-ker...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: de...@driverdev.osuosl.org
Signed-off-by: John Hubbard 
---
 .../vc04_services/interface/vchiq_arm/vchiq_2835_arm.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c 
b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
index 61c69f353cdb..ec92b4c50e95 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
@@ -336,10 +336,7 @@ cleanup_pagelistinfo(struct vchiq_pagelist_info 
*pagelistinfo)
}
 
if (pagelistinfo->pages_need_release) {
-   unsigned int i;
-
-   for (i = 0; i < pagelistinfo->num_pages; i++)
-   put_page(pagelistinfo->pages[i]);
+   put_user_pages(pagelistinfo->pages, pagelistinfo->num_pages);
}
 
dma_free_coherent(g_dev, pagelistinfo->pagelist_buffer_size,
@@ -454,10 +451,7 @@ create_pagelist(char __user *buf, size_t count, unsigned 
short type)
   __func__, actual_pages, num_pages);
 
/* This is probably due to the process being killed */
-   while (actual_pages > 0) {
-   actual_pages--;
-   put_page(pages[actual_pages]);
-   }
+   put_user_pages(pages, actual_pages);
cleanup_pagelistinfo(pagelistinfo);
return NULL;
}
-- 
2.22.0


Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

2019-08-02 Thread John Hubbard
On 8/1/19 7:16 PM, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> Hi,
> 
> These are best characterized as miscellaneous conversions: many (not all)
> call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
> out a few call sites that require some more work. These are mostly pretty
> simple ones.
> 
> It's probably best to send all of these via Andrew's -mm tree, assuming
> that there are no significant merge conflicts with ongoing work in other
> trees (which I doubt, given that these are small changes).
> 

In case anyone is wondering, this truncated series is due to a script failure:
git-send-email chokes when it hits email addresses whose names have a
comma in them, as happened here with patch 0003.  

Please disregard this set and reply to the other thread.

thanks,
-- 
John Hubbard
NVIDIA

[PATCH 09/34] media/v4l2-core/mm: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Mauro Carvalho Chehab 
Cc: Kees Cook 
Cc: Hans Verkuil 
Cc: Sakari Ailus 
Cc: Jan Kara 
Cc: Robin Murphy 
Cc: Souptick Joarder 
Cc: Dan Williams 
Cc: linux-me...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c 
b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 66a6c6c236a7..d6eeb437ec19 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,7 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
BUG_ON(dma->sglen);
 
if (dma->pages) {
-   for (i = 0; i < dma->nr_pages; i++)
-   put_page(dma->pages[i]);
+   put_user_pages(dma->pages, dma->nr_pages);
kfree(dma->pages);
dma->pages = NULL;
}
-- 
2.22.0


[PATCH 20/34] xen: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: xen-de...@lists.xenproject.org
Signed-off-by: John Hubbard 
---
 drivers/xen/gntdev.c  | 5 +
 drivers/xen/privcmd.c | 7 +--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4c339c7e66e5..2586b3df2bb6 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -864,10 +864,7 @@ static int gntdev_get_page(struct gntdev_copy_batch 
*batch, void __user *virt,
 
 static void gntdev_put_pages(struct gntdev_copy_batch *batch)
 {
-   unsigned int i;
-
-   for (i = 0; i < batch->nr_pages; i++)
-   put_page(batch->pages[i]);
+   put_user_pages(batch->pages, batch->nr_pages);
batch->nr_pages = 0;
 }
 
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 2f5ce7230a43..29e461dbee2d 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -611,15 +611,10 @@ static int lock_pages(
 
 static void unlock_pages(struct page *pages[], unsigned int nr_pages)
 {
-   unsigned int i;
-
if (!pages)
return;
 
-   for (i = 0; i < nr_pages; i++) {
-   if (pages[i])
-   put_page(pages[i]);
-   }
+   put_user_pages(pages, nr_pages);
 }
 
 static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
-- 
2.22.0


[PATCH 05/34] drm/etnaviv: convert release_pages() to put_user_pages()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Joerg Roedel 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index e8778ebb72e6..a0144a5ee325 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -686,7 +686,7 @@ static int etnaviv_gem_userptr_get_pages(struct 
etnaviv_gem_object *etnaviv_obj)
ret = get_user_pages_fast(ptr, num_pages,
  !userptr->ro ? FOLL_WRITE : 0, pages);
if (ret < 0) {
-   release_pages(pvec, pinned);
+   put_user_pages(pvec, pinned);
kvfree(pvec);
return ret;
}
@@ -710,7 +710,7 @@ static void etnaviv_gem_userptr_release(struct 
etnaviv_gem_object *etnaviv_obj)
if (etnaviv_obj->pages) {
int npages = etnaviv_obj->base.size >> PAGE_SHIFT;
 
-   release_pages(etnaviv_obj->pages, npages);
+   put_user_pages(etnaviv_obj->pages, npages);
kvfree(etnaviv_obj->pages);
}
 }
-- 
2.22.0


[PATCH 31/34] nfs: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Trond Myklebust 
Cc: Anna Schumaker 
Cc: linux-...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 fs/nfs/direct.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 0cb442406168..b00b89dda3c5 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -278,9 +278,7 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter)
 
 static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
 {
-   unsigned int i;
-   for (i = 0; i < npages; i++)
-   put_page(pages[i]);
+   put_user_pages(pages, npages);
 }
 
 void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
-- 
2.22.0


[PATCH 27/34] mm/memory.c: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Aneesh Kumar K.V 
Cc: Huang Ying 
Cc: Jérôme Glisse 
Cc: Matthew Wilcox 
Cc: Michal Hocko 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Souptick Joarder 
Cc: Will Deacon 
Signed-off-by: John Hubbard 
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..8870968496ea 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4337,7 +4337,7 @@ int __access_remote_vm(struct task_struct *tsk, struct 
mm_struct *mm,
buf, maddr + offset, bytes);
}
kunmap(page);
-   put_page(page);
+   put_user_page(page);
}
len -= bytes;
buf += bytes;
-- 
2.22.0


[PATCH 19/34] fsl_hypervisor: convert put_page() to put_user_page*()

2019-08-02 Thread john . hubbard
From: John Hubbard 

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

This changes the release code slightly, because each page slot in the
page_list[] array is no longer checked for NULL. However, that check
was wrong anyway, because the get_user_pages() pattern of usage here
never allowed for NULL entries within a range of pinned pages.

Cc: Al Viro 
Cc: Kees Cook 
Cc: Rob Herring 
Signed-off-by: John Hubbard 
---
 drivers/virt/fsl_hypervisor.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/virt/fsl_hypervisor.c b/drivers/virt/fsl_hypervisor.c
index 93d5bebf9572..a8f78d572c45 100644
--- a/drivers/virt/fsl_hypervisor.c
+++ b/drivers/virt/fsl_hypervisor.c
@@ -292,11 +292,8 @@ static long ioctl_memcpy(struct fsl_hv_ioctl_memcpy __user 
*p)
virt_to_phys(sg_list), num_pages);
 
 exit:
-   if (pages) {
-   for (i = 0; i < num_pages; i++)
-   if (pages[i])
-   put_page(pages[i]);
-   }
+   if (pages)
+   put_user_pages(pages, num_pages);
 
kfree(sg_list_unaligned);
kfree(pages);
-- 
2.22.0

