from:"Myklebust, Trond"

Re: [PATCH 11/14] nfs: idr_destroy() no longer needs idr_remove_all()

2013-01-29 Thread Myklebust, Trond

On Tue, 2013-01-29 at 17:58 -0500, J. Bruce Fields wrote:
> On Fri, Jan 25, 2013 at 05:31:09PM -0800, Tejun Heo wrote:
> > idr_destroy() can destroy idr by itself and idr_remove_all() is being
> > deprecated.  Drop reference to idr_remove_all().  Note that the code
> > wasn't completely correct before because idr_remove() on all entries
> > doesn't necessarily release all idr_layers which could lead to memory
> > leak.
> 
> Seems fine, but actually this is client-side so I think you meant the cc
> to be to Trond.

Acked-by: Trond Myklebust 

No problems whatsoever with removing a comment. :-)

My worry is more about Tejun's comment that we did actually need a call
to idr_remove_all() in the original code. Do we need to queue up a fix
for the 3.8 and existing stable kernels?

Cheers
  Trond

> > Signed-off-by: Tejun Heo 
> > Cc: "J. Bruce Fields" 
> > Cc: linux-...@vger.kernel.org
> > ---
> > This patch depends on an earlier idr patch and given the trivial
> > nature of the patch, I think it would be best to route these together
> > through -mm.  Please holler if there's any objection.
> > 
> > Thanks.
> > 
> >  fs/nfs/client.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 9f3c664..84d8eae 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -197,7 +197,6 @@ error_0:
> >  EXPORT_SYMBOL_GPL(nfs_alloc_client);
> >  
> >  #if IS_ENABLED(CONFIG_NFS_V4)
> > -/* idr_remove_all is not needed as all id's are removed by nfs_put_client 
> > */
> >  void nfs_cleanup_cb_ident_idr(struct net *net)
> >  {
> > struct nfs_net *nn = net_generic(net, nfs_net_id);
> > -- 
> > 1.8.1
> > 

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] Please pull NFS client fixes

2013-01-31 Thread Myklebust, Trond

Hi Linus,
The following changes since commit 949db153b6466c6f7cad5a427ecea94985927311:

  Linux 3.8-rc5 (2013-01-25 11:57:28 -0800)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.8-4

for you to fetch changes up to c489ee290bdbbace6bb63ebe6ebd4dd605819495:

  NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session (2013-01-30 
17:45:15 -0500)


NFS client bugfixe for Linux 3.8

- Error reporting in nfs_xdev_mount incorrectly maps all errors to ENOMEM
- Fix an NFSv4 refcounting issue
- Fix a mount failure when the server reboots during NFSv4 trunking discovery
- NFSv4.1 mounts may need to run the lease recovery thread.
- Don't silently fail setattr() requests on mountpoints
- Fix a SUNRPC socket/transport livelock and priority queue issue
- We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.


Trond Myklebust (7):
  NFS: Fix error reporting in nfs_xdev_mount
  NFSv4: Fix NFSv4 reference counting for trunked sessions
  NFSv4: Fix NFSv4 trunking discovery
  NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
  NFS: Don't silently fail setattr() requests on mountpoints
  SUNRPC: When changing the queue priority, ensure that we change the owner
  NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session

 fs/nfs/namespace.c  | 20 +
 fs/nfs/nfs4client.c | 62 ++---
 fs/nfs/nfs4state.c  | 22 ---
 fs/nfs/super.c  | 22 ---
 net/sunrpc/sched.c  | 18 +++-
 5 files changed, 86 insertions(+), 58 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 15/19] sunrpc: don't warn for unused variable 'buf'

2013-01-25 Thread Myklebust, Trond

> -Original Message-
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Friday, January 25, 2013 5:44 PM
> To: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org; Arnd Bergmann; J. Bruce Fields;
> Myklebust, Trond; linux-...@vger.kernel.org; net...@vger.kernel.org
> Subject: [PATCH 15/19] sunrpc: don't warn for unused variable 'buf'
> 
> When RPC_DEBUG is unset, the dprintk() macro does nothing, causing the
> 'buf' variable in svc_printk to become unused.
> Marking it as __maybe_unused avoids a harmless gcc warning.
> 
> Without this patch, building at91_dt_defconfig results in:
> 
> net/sunrpc/svc.c: In function 'svc_printk':
> net/sunrpc/svc.c:1051:7: warning: unused variable 'buf' [-Wunused-variable]
> 
> Signed-off-by: Arnd Bergmann 
> Cc: "J. Bruce Fields" 
> Cc: Trond Myklebust 
> Cc: linux-...@vger.kernel.org
> Cc: net...@vger.kernel.org
> ---
>  net/sunrpc/svc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index dbf12ac..b1f5223
> 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1047,7 +1047,7 @@ void svc_printk(struct svc_rqst *rqstp, const char
> *fmt, ...)  {
>   struct va_format vaf;
>   va_list args;
> - charbuf[RPC_MAX_ADDRBUFLEN];
> + char buf[RPC_MAX_ADDRBUFLEN] __maybe_unused;
> 
>   va_start(args, fmt);

Alternatively, just declare it using the RPC_IFDEBUG() macro.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Myklebust, Trond

On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> >>>> Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> >>>>>> sendfile64() pretty much already has the right arguments for a
> >>>>>> "copyfile", however it would be nice to add a 'flags' parameter: the
> >>>>>> NFSv4.2 version would use that to specify whether or not to copy file
> >>>>>> metadata.
> >>>>> That would seem to be enough to me and has the advantage that it is an
> >>>>> relatively obvious extension to something that is at least not totally
> >>>>> unknown to developers.
> >>>>>
> >>>>> Do we need more than that for non-NFS paths I wonder? What does reflink
> >>>>> need or the SCSI mechanism?
> >>>> For virt we would like to be able to specify arbitrary block ranges.
> >>>> Copying an entire file helps some copy operations like storage
> >>>> migration.  However, it is not enough to convert the guest's offloaded
> >>>> copies to host-side offloaded copies.
> >>> So how would a system call based on sendfile64() plus my flag parameter
> >>> prevent an underlying implementation from meeting your criterion?
> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
> >> it's missing an out_fd_offset.  The host will want to offload the
> >> guest's copies by calling sendfile on block ranges of a guest disk image
> >> file that correspond to the mappings of the in and out files in the
> >> guest.
> >>
> >> You could make it work with some locking and out_fd seeking to set the
> >> write offset before calling sendfile64()+flags, but ugh.
> >>
> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >>out_offset, size_t count, int flags);
> >>
> >> That seems closer.
> >>
> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> because that's the very first thing that's going to be requested after
> >> people prototype having to iterate calling sendfile() for each
> >> contiguous copy region.
> > I thought the first thing people would ask for is to atomically create a
> > new file and copy the old file into it (at least on local file systems).
> >   The idea is that nothing should see an empty destination file, either
> > by race or by crash.  (This feature would perhaps be described as a
> > pony, but it should be implementable.)
> >
> > This would be like a better link(2).
> >
> > --Andy
> 
> Why would this need to be atomic? That would seem to be a very difficult 
> property to provide across all target types with multi-GB sized files...

Right. It may sound cool, but what's the real-life use case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Myklebust, Trond

On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
>  wrote:
> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> >> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> >> >>>> Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> >> >>>>>> sendfile64() pretty much already has the right arguments for a
> >> >>>>>> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> >>>>>> NFSv4.2 version would use that to specify whether or not to copy 
> >> >>>>>> file
> >> >>>>>> metadata.
> >> >>>>> That would seem to be enough to me and has the advantage that it is 
> >> >>>>> an
> >> >>>>> relatively obvious extension to something that is at least not 
> >> >>>>> totally
> >> >>>>> unknown to developers.
> >> >>>>>
> >> >>>>> Do we need more than that for non-NFS paths I wonder? What does 
> >> >>>>> reflink
> >> >>>>> need or the SCSI mechanism?
> >> >>>> For virt we would like to be able to specify arbitrary block ranges.
> >> >>>> Copying an entire file helps some copy operations like storage
> >> >>>> migration.  However, it is not enough to convert the guest's offloaded
> >> >>>> copies to host-side offloaded copies.
> >> >>> So how would a system call based on sendfile64() plus my flag parameter
> >> >>> prevent an underlying implementation from meeting your criterion?
> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
> >> >> it's missing an out_fd_offset.  The host will want to offload the
> >> >> guest's copies by calling sendfile on block ranges of a guest disk image
> >> >> file that correspond to the mappings of the in and out files in the
> >> >> guest.
> >> >>
> >> >> You could make it work with some locking and out_fd seeking to set the
> >> >> write offset before calling sendfile64()+flags, but ugh.
> >> >>
> >> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >> >>out_offset, size_t count, int flags);
> >> >>
> >> >> That seems closer.
> >> >>
> >> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> >> because that's the very first thing that's going to be requested after
> >> >> people prototype having to iterate calling sendfile() for each
> >> >> contiguous copy region.
> >> > I thought the first thing people would ask for is to atomically create a
> >> > new file and copy the old file into it (at least on local file systems).
> >> >   The idea is that nothing should see an empty destination file, either
> >> > by race or by crash.  (This feature would perhaps be described as a
> >> > pony, but it should be implementable.)
> >> >
> >> > This would be like a better link(2).
> >> >
> >> > --Andy
> >>
> >> Why would this need to be atomic? That would seem to be a very difficult
> >> property to provide across all target types with multi-GB sized files...
> >
> > Right. It may sound cool, but what's the real-life use case?
> >
> 
> Download file from some source and then verify it.  Now copyfile it
> into my repository of known-good files.
> 
> Admittedly I could link + unlink or rename it there, but I consider
> hard links to be rather evil, especially when cow links are available.

Rename is the right way to do that as it can't corrupt the data after
you have verified it. copyfile can...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Myklebust, Trond

On Mon, 2013-02-25 at 15:35 -0800, Andy Lutomirski wrote:
> On Mon, Feb 25, 2013 at 3:28 PM, Myklebust, Trond
>  wrote:
> > On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
> >> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
> >>  wrote:
> >> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> >> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> >> >> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> >> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> >> >> >>>> Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> >> >> >>>>>> sendfile64() pretty much already has the right arguments for a
> >> >> >>>>>> "copyfile", however it would be nice to add a 'flags' parameter: 
> >> >> >>>>>> the
> >> >> >>>>>> NFSv4.2 version would use that to specify whether or not to copy 
> >> >> >>>>>> file
> >> >> >>>>>> metadata.
> >> >> >>>>> That would seem to be enough to me and has the advantage that it 
> >> >> >>>>> is an
> >> >> >>>>> relatively obvious extension to something that is at least not 
> >> >> >>>>> totally
> >> >> >>>>> unknown to developers.
> >> >> >>>>>
> >> >> >>>>> Do we need more than that for non-NFS paths I wonder? What does 
> >> >> >>>>> reflink
> >> >> >>>>> need or the SCSI mechanism?
> >> >> >>>> For virt we would like to be able to specify arbitrary block 
> >> >> >>>> ranges.
> >> >> >>>> Copying an entire file helps some copy operations like storage
> >> >> >>>> migration.  However, it is not enough to convert the guest's 
> >> >> >>>> offloaded
> >> >> >>>> copies to host-side offloaded copies.
> >> >> >>> So how would a system call based on sendfile64() plus my flag 
> >> >> >>> parameter
> >> >> >>> prevent an underlying implementation from meeting your criterion?
> >> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying 
> >> >> >> because
> >> >> >> it's missing an out_fd_offset.  The host will want to offload the
> >> >> >> guest's copies by calling sendfile on block ranges of a guest disk 
> >> >> >> image
> >> >> >> file that correspond to the mappings of the in and out files in the
> >> >> >> guest.
> >> >> >>
> >> >> >> You could make it work with some locking and out_fd seeking to set 
> >> >> >> the
> >> >> >> write offset before calling sendfile64()+flags, but ugh.
> >> >> >>
> >> >> >>   ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >> >> >>out_offset, size_t count, int flags);
> >> >> >>
> >> >> >> That seems closer.
> >> >> >>
> >> >> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> >> >> because that's the very first thing that's going to be requested 
> >> >> >> after
> >> >> >> people prototype having to iterate calling sendfile() for each
> >> >> >> contiguous copy region.
> >> >> > I thought the first thing people would ask for is to atomically 
> >> >> > create a
> >> >> > new file and copy the old file into it (at least on local file 
> >> >> > systems).
> >> >> >   The idea is that nothing should see an empty destination file, 
> >> >> > either
> >> >> > by race or by crash.  (This feature would perhaps be described as a
> >> >> > pony, but it should be implementable.)
> >> >> >
> >> >> > This would be like a better link(2).
> >> >> >
> >> >> > --Andy
> >> >>
> >> >> Why would this need to be atomic? That would seem to be a very difficult
> >> >> property to provide across all target types with multi-GB sized files...
> >> >
> >> > Right. It may sound cool, but what's the real-life use case?
> >> >
> >>
> >> Download file from some source and then verify it.  Now copyfile it
> >> into my repository of known-good files.
> >>
> >> Admittedly I could link + unlink or rename it there, but I consider
> >> hard links to be rather evil, especially when cow links are available.
> >
> > Rename is the right way to do that as it can't corrupt the data after
> > you have verified it. copyfile can...
> 
> ...copyfile doesn't exist.

Wrong! The underlying NFS and SCSI copy offload protocols are fully
defined at this time, and will constrain any implementation that you may
dream up.

>   I think it would be neat if it couldn't
> corrupt data.

It would also be neat if the moon were made of cheese... The underlying
NFS and SCSI protocols do not guarantee perfect copies; the copy may,
for instance, be interrupted due to external circumstances.

> In any case, this may be a bad idea -- presumably you'd have to fsync
> the file you're copying *from* first to avoid a massive performance
> hit.

You have to do that anyway.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the nfsd tree

2013-02-28 Thread Myklebust, Trond

On Fri, 2013-03-01 at 12:04 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the nfsd tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> net/sunrpc/xprtsock.c:1923:30: error: 'struct rpc_task' has no member named 
> 'tk_xprt'
> 
> Caused by commit dc107402ae06 ("SUNRPC: make AF_LOCAL connect
> synchronous") interacting with commit 77102893ae68 ("SUNRPC: Nuke the
> tk_xprt macro") from Linus' tree.
> 
> I have no idea how to fix this, so I have used the version of the nfsd
> tree from next-20130228 for today.

Hi Bruce,

The attached patch should suffice to fix this up.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 2e7e09c..c1d8476 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1918,9 +1918,8 @@ out:
 	return status;
 }
 
-static void xs_local_connect(struct rpc_task *task)
+static void xs_local_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 {
-	struct rpc_xprt *xprt = task->tk_xprt;
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 	int ret;

[GIT PULL] Please pull NFS client bugfixes

2013-02-20 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 88b62b915b0b7e25870eb0604ed9a92ba4bfc9f7:

  Linux 3.8-rc6 (2013-02-01 12:08:14 +1100)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.9-1

for you to fetch changes up to 666b3d803a511fbc9bc5e5ea8ce66010cf03ea13:

  NLM: Ensure that we resend all pending blocking locks after a reclaim 
(2013-02-19 12:18:27 -0500)


NFS client bugfixes for Linux 3.9

- Fix an Oops in the pNFS layoutget code
- Fix a number of NFSv4 and v4.1 state recovery deadlocks and hangs
  due to the interaction of the session drain lock and state management
  locks.
- Remove task->tk_xprt, which was hiding a lot of RCU dereferencing bugs
- Fix a long standing NFSv3 posix lock recovery bug.
- Revert commit 324d003b0cd82151adbaecefef57b73f7959a469. It turned out
  that the root cause of the deadlock was due to interactions with the
  workqueues that have now been resolved.


Jeff Layton (1):
  sunrpc: silence build warning in gss_fill_context

Tim Gardner (1):
  nfs: remove kfree() redundant null checks

Trond Myklebust (18):
  SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()
  SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback
  SUNRPC: Fix an RCU dereference in xs_local_rpcbind
  SUNRPC: Pass pointers to struct rpc_xprt to the congestion window
  SUNRPC: Fix an RCU dereference in xprt_reserve
  SUNRPC: Avoid RCU dereferences in the transport bind and connect code
  SUNRPC: Nuke the tk_xprt macro
  Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"
  SUNRPC: Add missing static declaration to _gss_mech_get_by_name
  NFSv4: Allow the state manager to mark an open_owner as being recovered
  NFSv4.1: Prevent deadlocks between state recovery and file locking
  NFSv4.1: Don't lose locks when a server reboots during delegation return
  NFSv4: Fix up the return values of nfs4_open_delegation_recall
  NFSv4: Ensure delegation recall and byte range lock removal don't conflict
  NFSv4: Fix a reboot recovery race when opening a file
  NFSv4.1: Fix an ABBA locking issue with session and state serialisation
  NFSv4.1: Fix bulk recall and destroy of layouts
  NLM: Ensure that we resend all pending blocking locks after a reclaim

Weston Andros Adamson (1):
  NFSv4.1: Don't decode skipped layoutgets

fanchaoting (1):
  umount oops when remove blocklayoutdriver first

 fs/lockd/clntproc.c   |   3 +
 fs/nfs/blocklayout/blocklayout.c  |   1 +
 fs/nfs/callback_proc.c|  61 ++
 fs/nfs/delegation.c   | 154 --
 fs/nfs/delegation.h   |   1 +
 fs/nfs/getroot.c  |   3 +-
 fs/nfs/inode.c|   5 +-
 fs/nfs/internal.h |   1 -
 fs/nfs/nfs4_fs.h  |   4 +
 fs/nfs/nfs4proc.c | 133 -
 fs/nfs/nfs4state.c|  11 ++-
 fs/nfs/objlayout/objio_osd.c  |   1 +
 fs/nfs/pnfs.c | 150 -
 fs/nfs/pnfs.h |   7 +-
 fs/nfs/super.c|  49 ---
 fs/nfs/unlink.c   |   5 +-
 include/linux/sunrpc/sched.h  |   1 -
 include/linux/sunrpc/xprt.h   |   6 +-
 net/sunrpc/auth_gss/auth_gss.c|   5 +-
 net/sunrpc/auth_gss/gss_mech_switch.c |   4 +-
 net/sunrpc/clnt.c |  16 ++--
 net/sunrpc/xprt.c |  21 +++--
 net/sunrpc/xprtrdma/rpc_rdma.c|   4 +-
 net/sunrpc/xprtrdma/transport.c   |   7 +-
 net/sunrpc/xprtrdma/xprt_rdma.h   |   6 +-
 net/sunrpc/xprtsock.c |  16 ++--
 26 files changed, 415 insertions(+), 260 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Myklebust, Trond

On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
> We have debated the need to have a system call to allow for offloading copy 
> operations, for example to an NFS server (part to the new NFS 4.2 
> specification), SCSI target device (two different SCSI commands do this), 
> local 
> file systems (reflink, etc) and I suspect many other possible parts of the 
> stack 
> could implement this.

sendfile64() pretty much already has the right arguments for a
"copyfile", however it would be nice to add a 'flags' parameter: the
NFSv4.2 version would use that to specify whether or not to copy file
metadata.

> The earliest discussion of such a system call I saw happened back in 2001, I 
> know we had another more recent flurry (2-3 years back?) as well that got 
> tangled up and died away.
>
> Given the new popularity of this in storage devices and the use case for virt 
> guests, any chance to get a proposal floated this year that might be able to 
> land upstream in our life times :) ?

I'm planning on soon dusting off the NFS prototype that NetApp wrote 3
years ago and converting at least the client implementation into
something that can go upstream. We do also have a server prototype for
Linux, but the copy offload between 2 different servers is a hack and
would need significant work.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sunrpc/auth_gss: fix sparse warnings for gss_mech_switch

2013-02-21 Thread Myklebust, Trond

On Thu, 2013-02-21 at 20:10 +0200, Silviu-Mihai Popescu wrote:
> This fixes the following sparse warnings:
>   * net/sunrpc/auth_gss/gss_mech_switch.c:143:21: warning: symbol
> '_gss_mech_get_by_name' was not declared. Should it be static?
>   * net/sunrpc/auth_gss/gss_mech_switch.c:208:21: warning: symbol
> '_gss_mech_get_by_pseudoflavor' was not declared. Should it be static?
> 
> Signed-off-by: Silviu-Mihai Popescu 
> ---
>  net/sunrpc/auth_gss/gss_mech_switch.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c 
> b/net/sunrpc/auth_gss/gss_mech_switch.c
> index b174fcd..f0f4eee 100644
> --- a/net/sunrpc/auth_gss/gss_mech_switch.c
> +++ b/net/sunrpc/auth_gss/gss_mech_switch.c
> @@ -140,7 +140,7 @@ gss_mech_get(struct gss_api_mech *gm)
>  
>  EXPORT_SYMBOL_GPL(gss_mech_get);
>  
> -struct gss_api_mech *
> +static struct gss_api_mech *
>  _gss_mech_get_by_name(const char *name)
>  {
>   struct gss_api_mech *pos, *gm = NULL;
> @@ -205,7 +205,7 @@ mech_supports_pseudoflavor(struct gss_api_mech *gm, u32 
> pseudoflavor)
>   return 0;
>  }
>  
> -struct gss_api_mech *_gss_mech_get_by_pseudoflavor(u32 pseudoflavor)
> +static struct gss_api_mech *_gss_mech_get_by_pseudoflavor(u32 pseudoflavor)
>  {
>   struct gss_api_mech *gm = NULL, *pos;
>  

This is identical to commit c5f5e9c5d2e9178fb0bfe4f44f0afcc8ad6488ef
(SUNRPC: Add missing static declaration to _gss_mech_get_by_name) which
is already upstream.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Myklebust, Trond

On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> >>>
> >> sendfile64() pretty much already has the right arguments for a
> >> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> NFSv4.2 version would use that to specify whether or not to copy file
> >> metadata.
> > 
> > That would seem to be enough to me and has the advantage that it is an
> > relatively obvious extension to something that is at least not totally
> > unknown to developers.
> > 
> > Do we need more than that for non-NFS paths I wonder? What does reflink
> > need or the SCSI mechanism?
> 
> For virt we would like to be able to specify arbitrary block ranges.
> Copying an entire file helps some copy operations like storage
> migration.  However, it is not enough to convert the guest's offloaded
> copies to host-side offloaded copies.

So how would a system call based on sendfile64() plus my flag parameter
prevent an underlying implementation from meeting your criterion?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Myklebust, Trond

On Thu, 2013-02-21 at 23:05 +0100, Ric Wheeler wrote:
> On 02/21/2013 09:00 PM, Paolo Bonzini wrote:
> > Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> >>> sendfile64() pretty much already has the right arguments for a
> >>> "copyfile", however it would be nice to add a 'flags' parameter: the
> >>> NFSv4.2 version would use that to specify whether or not to copy file
> >>> metadata.
> >> That would seem to be enough to me and has the advantage that it is an
> >> relatively obvious extension to something that is at least not totally
> >> unknown to developers.
> >>
> >> Do we need more than that for non-NFS paths I wonder? What does reflink
> >> need or the SCSI mechanism?
> > For virt we would like to be able to specify arbitrary block ranges.
> > Copying an entire file helps some copy operations like storage
> > migration.  However, it is not enough to convert the guest's offloaded
> > copies to host-side offloaded copies.
> >
> > Paolo
> 
> I don't think that the NFS protocol allows arbitrary ranges, but the SCSI 
> commands are ranged based.
> 
> If I remember what the windows people said at a SNIA event a few years back, 
> they have a requirement that the target file be pre-allocated (at least for 
> the 
> SCSI based copy). Not clear to me where they iterate over that target file to 
> do 
> the block range copies, but I suspect it is in their kernel.

The NFSv4.2 copy offload protocol _does_ allow the copying of arbitrary
byte ranges. The main target for that functionality is indeed
virtualisation and thin provisioning of virtual machines.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: New copyfile system call - discuss before LSF?

2013-02-21 Thread Myklebust, Trond

> -Original Message-
> From: Zach Brown [mailto:z...@redhat.com]
> Sent: Thursday, February 21, 2013 5:25 PM
> To: Myklebust, Trond
> Cc: Paolo Bonzini; Ric Wheeler; Linux FS Devel; linux-kernel@vger.kernel.org;
> Chris L. Mason; Christoph Hellwig; Alexander Viro; Martin K. Petersen;
> Hannes Reinecke; Joel Becker
> Subject: Re: New copyfile system call - discuss before LSF?
> 
> On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote:
> > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> > > Il 21/02/2013 15:57, Ric Wheeler ha scritto:
> > > >>>
> > > >> sendfile64() pretty much already has the right arguments for a
> > > >> "copyfile", however it would be nice to add a 'flags' parameter:
> > > >> the
> > > >> NFSv4.2 version would use that to specify whether or not to copy
> > > >> file metadata.
> > > >
> > > > That would seem to be enough to me and has the advantage that it
> > > > is an relatively obvious extension to something that is at least
> > > > not totally unknown to developers.
> > > >
> > > > Do we need more than that for non-NFS paths I wonder? What does
> > > > reflink need or the SCSI mechanism?
> > >
> > > For virt we would like to be able to specify arbitrary block ranges.
> > > Copying an entire file helps some copy operations like storage
> > > migration.  However, it is not enough to convert the guest's
> > > offloaded copies to host-side offloaded copies.
> >
> > So how would a system call based on sendfile64() plus my flag
> > parameter prevent an underlying implementation from meeting your
> criterion?
> 
> If I'm guessing correctly, sendfile64()+flags would be annoying because it's
> missing an out_fd_offset.  The host will want to offload the guest's copies by
> calling sendfile on block ranges of a guest disk image file that correspond to
> the mappings of the in and out files in the guest.
> 
> You could make it work with some locking and out_fd seeking to set the
> write offset before calling sendfile64()+flags, but ugh.
> 
>  ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
>   out_offset, size_t count, int flags);
> 
> That seems closer.

psendfile() ?

I fully agree that sounds reasonable... Just being an ass. :-)

> We might also want to pre-emptively offer iovs instead of offsets, because
> that's the very first thing that's going to be requested after people 
> prototype
> having to iterate calling sendfile() for each contiguous copy region.

vpsendfile() then? I agree that might be a little more future-proof. 
Particularly given that the underlying protocols tend to be fully asynchronous, 
and so it makes sense to queue up more than one copy at a time...

Cheers,
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: New copyfile system call - discuss before LSF?

2013-02-22 Thread Myklebust, Trond

> -Original Message-
> From: Zach Brown [mailto:z...@redhat.com]
> Sent: Friday, February 22, 2013 1:22 PM
> To: Ric Wheeler
> Cc: Paolo Bonzini; Myklebust, Trond; Linux FS Devel; linux-
> ker...@vger.kernel.org; Chris L. Mason; Christoph Hellwig; Alexander Viro;
> Martin K. Petersen; Hannes Reinecke; Joel Becker
> Subject: Re: New copyfile system call - discuss before LSF?
> 
> > This seems to be suspiciously close to a clear consensus on how to
> > move forward after many years of spinning our wheels. Anyone want to
> > promote an actual patch before we change our collective minds?
> 
> It seems like we'd want to start with the exisiting (presumably
> bitrotten) prototypes that Trond has for nfs and that Martin has for
> block->scsi.  Mash the new syscall on top of and get them working in
> current mainline.
> 
> I'd be happy to take responsibility for making forward progress if no one else
> has the bandwidth.
> 
> Trond, Martin, would that make sense?  Are the most recent versions of the
> prototypes available somewhere?

Hi Zach,

The wildly bitrotten NFS copyfile prototype can be found on


ftp://ftp.netapp.com/frm-ntap/opensource/linux_copyfileat/v2/linux_copyfileat_v2.tgz

Please open with extreme caution and apply the resulting patches to a Linux 
2.6.34.2 kernel...

Cheers
   Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Fix memset in NFS zap caches

2012-09-03 Thread Myklebust, Trond

On Mon, 2012-09-03 at 20:35 +0200, Andi Kleen wrote:
> Fix memset in nfs_zap_caches_locked
> 
> This memset overruns the buffer by 4 bytes on 64bit systems.
> 
> gcc 4.8 correct complains:
> 
> /backup/lsrc/git/linux-lto-2.6/fs/nfs/inode.c: In function
> 'nfs_zap_caches_locked':
> /backup/lsrc/git/linux-lto-2.6/fs/nfs/inode.c:157:41: warning: argument
> to 'sizeof' in 'memset' call is the same pointer type '__be32 *' as the
> destination; expected '__be32' or an explicit length
> [-Wsizeof-pointer-memaccess]
>   memset(NFS_COOKIEVERF(inode), 0, sizeof(NFS_COOKIEVERF(inode)));
>  ^
> Add a * to sizeof the correct type.
> 
> Signed-off-by: Andi Kleen 
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index c6e895f..69e7f0f 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -154,7 +154,7 @@ static void nfs_zap_caches_locked(struct inode *inode)
>   nfsi->attrtimeo = NFS_MINATTRTIMEO(inode);
>   nfsi->attrtimeo_timestamp = jiffies;
>  
> - memset(NFS_COOKIEVERF(inode), 0, sizeof(NFS_COOKIEVERF(inode)));
> + memset(NFS_COOKIEVERF(inode), 0, sizeof(*NFS_COOKIEVERF(inode)));
>   if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode))
>   nfsi->cache_validity |= 
> NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL|NFS_INO_REVAL_PAGECACHE;
>   else

Hi Andi,

No, this is a gcc bug.

NFS_COOKIEVERF(inode) resolves to an array, so the current code is
correct. The above change will cause the 2nd half of the array to remain
uninitialised...

Cheers
  Trond


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH] Fix memset in NFS zap caches

2012-09-03 Thread Myklebust, Trond

On Mon, 2012-09-03 at 20:47 +0200, Andi Kleen wrote:
> > No, this is a gcc bug.
> > 
> > NFS_COOKIEVERF(inode) resolves to an array, so the current code is
> > correct. The above change will cause the 2nd half of the array to remain
> > uninitialised...
> 
> Are you sure?
> 
> include/linux/nfs_fs.h:268:static inline __be32 *NFS_COOKIEVERF(const struct 
> inode *inode)
> 
> That doesn't look like an array type to me. 

Argh... It used to be a #define, but got converted in the commit
99fadcd7646 static inline blitz...

OK, let's just get rid of the NFS_COOKIEVERF thing altogether. At this
point it is clearly just obfuscating the code.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH] Fix memset in NFS zap caches

2012-09-03 Thread Myklebust, Trond

On Mon, 2012-09-03 at 14:52 -0400, Trond Myklebust wrote:
> On Mon, 2012-09-03 at 20:47 +0200, Andi Kleen wrote:
> > > No, this is a gcc bug.
> > > 
> > > NFS_COOKIEVERF(inode) resolves to an array, so the current code is
> > > correct. The above change will cause the 2nd half of the array to remain
> > > uninitialised...
> > 
> > Are you sure?
> > 
> > include/linux/nfs_fs.h:268:static inline __be32 *NFS_COOKIEVERF(const 
> > struct inode *inode)
> > 
> > That doesn't look like an array type to me. 
> 
> Argh... It used to be a #define, but got converted in the commit
> 99fadcd7646 static inline blitz...
> 
> OK, let's just get rid of the NFS_COOKIEVERF thing altogether. At this
> point it is clearly just obfuscating the code.
> 
This should do the right thing:

8<--
From c8879cbdf7c4697e450c4a001f24b88e04b70857 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 3 Sep 2012 14:56:02 -0400
Subject: [PATCH] NFS: Fix the initialisation of the readdir 'cookieverf'
 array

When the NFS_COOKIEVERF helper macro was converted into a static
inline function, we broke the initialisation of the readdir cookies,
since it depended on a 'sizeof(NFS_COOKIEVERF(inode))'.

At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.

Reported-by: Andi Kleen 
Signed-off-by: Trond Myklebust 
---
 fs/nfs/inode.c | 2 +-
 fs/nfs/nfs3proc.c  | 2 +-
 fs/nfs/nfs4proc.c  | 4 ++--
 include/linux/nfs_fs.h | 5 -
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c6e895f..9b47610 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -154,7 +154,7 @@ static void nfs_zap_caches_locked(struct inode *inode)
nfsi->attrtimeo = NFS_MINATTRTIMEO(inode);
nfsi->attrtimeo_timestamp = jiffies;
 
-   memset(NFS_COOKIEVERF(inode), 0, sizeof(NFS_COOKIEVERF(inode)));
+   memset(NFS_I(inode)->cookieverf, 0, sizeof(NFS_I(inode)->cookieverf));
if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode))
nfsi->cache_validity |= 
NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL|NFS_INO_REVAL_PAGECACHE;
else
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index d6b3b5f..6932209 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -643,7 +643,7 @@ nfs3_proc_readdir(struct dentry *dentry, struct rpc_cred 
*cred,
  u64 cookie, struct page **pages, unsigned int count, int plus)
 {
struct inode*dir = dentry->d_inode;
-   __be32  *verf = NFS_COOKIEVERF(dir);
+   __be32  *verf = NFS_I(dir)->cookieverf;
struct nfs3_readdirargs arg = {
.fh = NFS_FH(dir),
.cookie = cookie,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 74f5c26..1e50326 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3215,11 +3215,11 @@ static int _nfs4_proc_readdir(struct dentry *dentry, 
struct rpc_cred *cred,
dentry->d_parent->d_name.name,
dentry->d_name.name,
(unsigned long long)cookie);
-   nfs4_setup_readdir(cookie, NFS_COOKIEVERF(dir), dentry, &args);
+   nfs4_setup_readdir(cookie, NFS_I(dir)->cookieverf, dentry, &args);
res.pgbase = args.pgbase;
status = nfs4_call_sync(NFS_SERVER(dir)->client, NFS_SERVER(dir), &msg, 
&args.seq_args, &res.seq_res, 0);
if (status >= 0) {
-   memcpy(NFS_COOKIEVERF(dir), res.verifier.data, 
NFS4_VERIFIER_SIZE);
+   memcpy(NFS_I(dir)->cookieverf, res.verifier.data, 
NFS4_VERIFIER_SIZE);
status += args.pgbase;
}
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1f8fc7f..4b03f56 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -265,11 +265,6 @@ static inline const struct nfs_rpc_ops *NFS_PROTO(const 
struct inode *inode)
return NFS_SERVER(inode)->nfs_client->rpc_ops;
 }
 
-static inline __be32 *NFS_COOKIEVERF(const struct inode *inode)
-{
-   return NFS_I(inode)->cookieverf;
-}
-
 static inline unsigned NFS_MINATTRTIMEO(const struct inode *inode)
 {
struct nfs_server *nfss = NFS_SERVER(inode);
-- 
1.7.11.4


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bugfixes

2012-09-12 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 086600430493e04b802bee6e5b3ce0458e4eb77f:

  NFSv3: Ensure that do_proc_get_root() reports errors correctly (2012-08-20 
12:52:42 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.6-4

for you to fetch changes up to 7b281ee026552f10862b617a2a51acf49c829554:

  NFS: fsync() must exit with an error if page writeback failed (2012-09-11 
15:38:32 -0400)


NFS client bugfixes for Linux 3.6

- Final (hopefully) fix for the range checking code in NFSv4 getacl. This
  should fix the Oopses being seen when the acl size is close to PAGE_SIZE.
- Fix a regression with the legacy binary mount code
- Fix a regression in the readdir cookieverf initialisation
- Fix an RPC over UDP regression
- Ensure that we report all errors in the NFSv4 open code
- Ensure that fsync() reports all relevant synchronisation errors.


Trond Myklebust (6):
  NFS: Fix the initialisation of the readdir 'cookieverf' array
  NFS: Fix a problem with the legacy binary mount code
  NFSv4: Fix range checking in __nfs4_get_acl_uncached and 
__nfs4_proc_set_acl
  NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
  SUNRPC: Fix a UDP transport regression
  NFS: fsync() must exit with an error if page writeback failed

Weston Andros Adamson (1):
  NFS: return error from decode_getfh in decode open

 fs/nfs/file.c   |  4 ++-
 fs/nfs/inode.c  |  2 +-
 fs/nfs/nfs3proc.c   |  2 +-
 fs/nfs/nfs4file.c   |  4 ++-
 fs/nfs/nfs4proc.c   | 55 +++--
 fs/nfs/nfs4xdr.c| 17 ++---
 fs/nfs/super.c  |  2 ++
 include/linux/nfs_fs.h  |  5 
 include/linux/nfs_xdr.h |  2 +-
 include/linux/sunrpc/xprt.h |  3 +++
 net/sunrpc/xprt.c   | 34 ++---
 net/sunrpc/xprtrdma/transport.c |  1 +
 net/sunrpc/xprtsock.c   |  3 +++
 13 files changed, 70 insertions(+), 64 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [ 042/108] NFS: Alias the nfs module to nfs4

2012-09-12 Thread Myklebust, Trond

On Wed, 2012-09-12 at 21:08 -0400, Josh Boyer wrote:
> On Wed, Sep 12, 2012 at 7:28 PM, Greg Kroah-Hartman
>  wrote:
> > From: Greg KH 
> >
> > 3.5-stable review patch.  If anyone has any objections, please let me know.
> >
> > --
> >
> > commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream.
> >
> > This allows distros to remove the line from their modprobe
> > configuration.
> >
> > Signed-off-by: Bryan Schumaker 
> > Signed-off-by: Trond Myklebust 
> > [bwh: Backported to 3.2: adjust context]
> > Signed-off-by: Ben Hutchings 
> > Signed-off-by: Greg Kroah-Hartman 
> 
> Could someone elaborate why this is acceptable now, while it wasn't
> during the timeframe I sent an identical patch months ago?
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/1204.3/02064.html
> 
> We had some trouble in Fedora 17 because of this and would up with
> patches to nfs-utils because the modalias was NAKed.  If we're bringing
> it into stable (and upstream) at this point, it would be good to know
> what changed so I can go and undo what we did because I was told no.
> 
> Please don't get me wrong, I'm all for simplicity.  I'm just perhaps
> so simple that the brief changelog confuses me because it doesn't
> actually describe why the original solution wasn't followed up on.

Linux-3.6 converts NFSv2/v3/v4 into modules, and so modprobe.conf
entries of the form

  alias nfs4 nfs

will now fail to ensure that 'mount -t nfs4 ...' works. In order to
facilitate the task for the distros when dealing with kernels >= 3.5.x,
we've added the automatic module alias so that they can remove the
hard-coded aliases in modprobe.conf while remaining backward-compatible.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v2] SUNRPC: check current nsproxy before set of node name on client creation

2012-09-13 Thread Myklebust, Trond

On Thu, 2012-09-13 at 16:11 +0400, Stanislav Kinsbursky wrote:
> 10.09.2012 19:41, Myklebust, Trond пишет:
> > On Mon, 2012-09-10 at 19:37 +0400, Stanislav Kinsbursky wrote:
> >> Hi, Trond.
> >> So, if I understand you right, we can create rpc client (or increase usage
> >> counter) on NSMPROC_MON call and destroy (or decrease usage counter) on
> >> NSMPROC_UNMON call.
> >> Will this solution works?
> >
> > The rpc client(s) will need to be per-net-namespace, which complicates
> > matters a little bit, but yes, creation at NSMPROC_MON, and destruction
> > at NSMPROC_UNMON should work.
> >
> 
> Hi, Trond.
> With reference-counted NSM client I got this:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0008
> IP: [] encode_mon_id+0x2e/0x64 [lockd]
> PGD 0
> Oops:  [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: nfsv3 nfs_acl nfs lockd veth sunrpc bridge stp llc 
> i2c_piix4 
> i2c_core virtio_blk virtio_net floppy virtio_pci virtio_ring virtio
> CPU 1
> Pid: 1174, comm: bash Not tainted 3.5.0-kitchensink+ #44 Bochs Bochs
> RIP: 0010:[]  [] encode_mon_id+0x2e/0x64 
> [lockd]
> RSP: 0018:88001d0f1a08  EFLAGS: 00010246
> RAX:  RBX: 88001d0f1c38 RCX: 
> RDX: 88001c85803f RSI: 88001d23b204 RDI: 88001d0f1a48
> RBP: 88001d0f1a18 R08: 88001c858040 R09: 0003
> R10: 88001a5aba10 R11: 88001d0f1ac8 R12: 88001d0f1a48
> R13: 88001a8a3138 R14: a008d580 R15: a00c0db5
> FS:  7f0d60eea700() GS:88001f70() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 0008 CR3: 1db3d000 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process bash (pid: 1174, threadinfo 88001d0f, task 88001d1f4160)
> Stack:
>   88001d0f1a48 88001c858030 88001d0f1a28 a00c0dc9
>   88001d0f1ab8 a00731a0 88001a5aba10 88001d0f1c38
>   88001c858040 88001a8a3140 88001c858854 88001a8a3140
> Call Trace:
>   [] nsm_xdr_enc_unmon+0x14/0x16 [lockd]
>   [] rpcauth_wrap_req+0xa1/0xb2 [sunrpc]
>   [] ? xprt_prepare_transmit+0x83/0x93 [sunrpc]
>   [] call_transmit+0x1e4/0x263 [sunrpc]
>   [] __rpc_execute+0xe7/0x313 [sunrpc]
>   [] ? call_transmit_status+0xb8/0xb8 [sunrpc]
>   [] ? wake_up_bit+0x25/0x2a
>   [] rpc_execute+0x9d/0xa5 [sunrpc]
>   [] rpc_run_task+0x7e/0x86 [sunrpc]
>   [] rpc_call_sync+0x44/0x65 [sunrpc]
>   [] nsm_mon_unmon+0x81/0xad [lockd]
>   [] nsm_unmonitor+0x82/0x13a [lockd]
>   [] nlm_destroy_host_locked+0x93/0xc9 [lockd]
>   [] nlmclnt_release_host+0xb3/0xc3 [lockd]
>   [] nlmclnt_done+0x1a/0x26 [lockd]
>   [] nfs_destroy_server+0x24/0x26 [nfs]
>   [] nfs_free_server+0xad/0x134 [nfs]
>   [] nfs_kill_super+0x22/0x26 [nfs]
>   [] deactivate_locked_super+0x26/0x57
>   [] deactivate_super+0x45/0x4c
>   [] mntput_no_expire+0x110/0x119
>   [] mntput+0x2a/0x2c
>   [] release_mounts+0x72/0x84
>   [] put_mnt_ns+0x6f/0x7e
>   [] free_nsproxy+0x1f/0x87
>   [] switch_task_namespaces+0x5c/0x65
>   [] exit_task_namespaces+0x10/0x12
>   [] do_exit+0x69b/0x84f
>   [] do_group_exit+0x83/0xae
>   [] sys_exit_group+0x17/0x1b
>   [] system_call_fastpath+0x16/0x1b
> Code: e5 41 54 53 66 66 66 66 90 48 89 f3 49 89 fc 48 8b 76 18 e8 93 ff ff ff 
> 4c 
> 89 e7 65 48 8b 04 25 c0 b9 00 00 48 8b 80 00 05 00 00 <48> 8b 70 08 48 83 c6 
> 45 
> e8 73 ff ff ff 4c 89 e7 be 0c 00 00 00
> RIP  [] encode_mon_id+0x2e/0x64 [lockd]
> 
> 
> There are more places, where NFS and Lockd code dereference utsname().
> In XDR layer, for instance.
> 
> So, probably, we have to tie NFS to utsns as well as to netns.
> Add one more ns to xprt? What do you think?
> 

We've already saved the utsname in the rpc_client as clnt->cl_nodename.
All XDR users can be trivially converted to use that.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH 0/3] lockd: use per-net refrence-counted NSM clients

2012-09-14 Thread Myklebust, Trond

On Fri, 2012-09-14 at 13:01 -0400, Chuck Lever wrote:
> What happens if statd is restarted?

Nothing unusual. Why?

> Sent from my iPhone
> 
> On Sep 14, 2012, at 10:25 AM, Stanislav Kinsbursky 
>  wrote:
> 
> > This is a bug fix for https://bugzilla.redhat.com/show_bug.cgi?id=830862.
> > 
> > The problem is that with NFSv4 mount in container (with separated mount
> > namesapce) and active lock on it, dying child reaped of this container will
> > try to umount NFS and doing this will try to create RPC client to send
> > unmonitor request to statd.
> > But creation of RCP client requires valid current->nsproxy (for operation 
> > with
> > utsname()) and during umount on child reaper exit it's equal to zero.
> > 
> > Proposed solution is to introduce refrence-counter per-net NSM client, which
> > is created on fist monitor call and destroyed after the lst monitor call.
> > 
> > The following series implements...
> > 
> > ---
> > 
> > Stanislav Kinsbursky (3):
> >  lockd: use rpc client's cl_nodename for id encoding
> >  lockd: per-net NSM client creation and destruction helpers introduced
> >  lockd: create and use per-net NSM RPC clients on MON/UNMON requests
> > 
> > 
> > fs/lockd/mon.c   |   91 
> > +++---
> > fs/lockd/netns.h |4 ++
> > fs/lockd/svc.c   |1 +
> > 3 files changed, 77 insertions(+), 19 deletions(-)
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: linux-next: build failure after merge of the nfs tree

2012-10-01 Thread Myklebust, Trond

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Stephen Rothwell
> Sent: Monday, October 01, 2012 8:39 PM
> To: Trond Myklebust
> Cc: linux-n...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: linux-next: build failure after merge of the nfs tree
> 
> Hi Trond,
> 
> After merging the nfs tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> fs/nfs/callback.c: In function 'nfs_minorversion_callback_svc_setup':
> fs/nfs/callback.c:206:21: error: 'ENOTSUP' undeclared (first use in this
> function)
> 
> Caused by commit 141e1f553b6e ("NFSv4: Fix the minor version callback
> channel startup").
> 

Hi Stephen,

Thanks! We've already fixed this issue and pushed out a new tree...

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Myklebust, Trond

On Tue, 2012-10-23 at 12:46 -0400, J. Bruce Fields wrote:
> On Tue, Oct 23, 2012 at 05:32:21PM +0100, Nix wrote:
> > On 23 Oct 2012, J. Bruce Fields uttered the following:
> > > nfs-utils shouldn't be capable of oopsing the kernel, so from my
> > > (selfish) point of view I'd actually rather you stick with whatever you
> > > have and try to reproduce the oops.
> > 
> > Reproduced in 3.6.3, not in 3.6.1, not tried 3.6.2. Capturing it was
> > rendered somewhat difficult by an ext4/JBD2 bug which leads to data loss
> > in /var on every reboot out of 3.6.1 and on some reboots out of 3.6.3 (I
> > have runs of NULs in my logs now, which keep eating the oopses):
> > 
> > [while in 3.6.1]
> > [   88.565698] JBD2: Spotted dirty metadata buffer (dev = dm-5, blocknr = 
> > 0). There's a risk of filesystem corruption in case of system crash.
> > [   88.799263] JBD2: Spotted dirty metadata buffer (dev = dm-5, blocknr = 
> > 0). There's a risk of filesystem corruption in case of system crash.
> > [   89.648152] [ cut here ]
> > [   89.648386] WARNING: at fs/inode.c:280 drop_nlink+0x25/0x42()
> > [   89.648614] Hardware name: empty
> > [   89.648833] Modules linked in: firewire_ohci firewire_core [last 
> > unloaded: microcode]
> > [   89.649382] Pid: 1484, comm: dhcpd Not tainted 3.6.1-dirty #1
> > [   89.649610] Call Trace:
> > [   89.649832]  [] warn_slowpath_common+0x83/0x9b
> > [   89.650063]  [] warn_slowpath_null+0x1a/0x1c
> > [   89.650292]  [] drop_nlink+0x25/0x42
> > [   89.650533]  [] ext4_dec_count+0x26/0x28
> > [   89.650763]  [] ext4_rename+0x4ec/0x7b4
> > [   89.650993]  [] ? vfs_rename+0xbe/0x3b7
> > [   89.651224]  [] vfs_rename+0x27c/0x3b7
> > [   89.651454]  [] sys_renameat+0x1b1/0x228
> > [   89.651682]  [] ? fsnotify+0x226/0x249
> > [   89.651911]  [] ? security_inode_permission+0x1e/0x20
> > [   89.652145]  [] ? vfs_write+0x116/0x142
> > [   89.652372]  [] sys_rename+0x1b/0x1e
> > [   89.652601]  [] system_call_fastpath+0x16/0x1b
> > [...]
> > [while having just booted into 3.6.1 after some time in 3.6.3: the FS
> >  was clean, and fscked on the previous boot into 3.6.3 after a previous
> >  instance of this bug]
> > Oct 23 17:18:26 spindle crit: [   67.625319] EXT4-fs error (device dm-5): 
> > mb_free_blocks:1300: group 65, block 2143748:freeing already freed block 
> > (bit 13828)
> > 
> > This may well be a 3.6.1-specific bug fixed in 3.6.3, but it's hard to
> > tell since most of my reboots are 3.6.1->3.6.3 or vice versa right now.
> > 
> > 
> > Anyway, here's the NFSv4 oops (not a panic: it helps if I remember to
> > turn off panic_on_oops when I come home from the holidays).
> > 
> > It's a lockd problem, and probably happens during delivery of mail over
> > NFS (my mailserver load soars when it happens):
> > 
> > [  813.110354] [ cut here ]
> > [  813.110585] kernel BUG at fs/lockd/mon.c:150!
> 
> So nsm_mon_unmon() is being passed a NULL client.
> 
> There are three container patches between 3.6.1 and 3.6.3:
> 
>   lockd: per-net NSM client creation and destruction helpers introduced
>   lockd: use rpc client's cl_nodename for id encoding
>   lockd: create and use per-net NSM RPC clients on MON/UNMON requests
> 
> and that last does change nsm_monitor's call to nsm_mon_unmon, so that's
> almost certainly it
> 
> Looks like there's some confusion about whether nsm_client_get() returns
> NULL or an error?

nsm_client_get() looks extremely racy in the case where ln->nsm_users ==
0.

Since we never recheck the value of ln->nsm_users after taking
nsm_create_mutex, what is stopping 2 different threads from both setting
ln->nsm_clnt and re-initialising ln->nsm_users?


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Myklebust, Trond

> -Original Message-
> From: Nix [mailto:n...@esperi.org.uk]
> Sent: Tuesday, October 23, 2012 1:36 PM
> To: Myklebust, Trond
> Cc: J. Bruce Fields; Ted Ts'o; linux-kernel@vger.kernel.org; Schumaker,
> Bryan; Peng Tao; gre...@linuxfoundation.org; linux-...@vger.kernel.org;
> Stanislav Kinsbursky
> Subject: Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also
> an unrelated ext4 data loss bug)
> 
> On 23 Oct 2012, n...@esperi.org.uk uttered the following:
> 
> > On 23 Oct 2012, Trond Myklebust spake thusly:
> >> On Tue, 2012-10-23 at 12:46 -0400, J. Bruce Fields wrote:
> >>> Looks like there's some confusion about whether nsm_client_get()
> >>> returns NULL or an error?
> >>
> >> nsm_client_get() looks extremely racy in the case where ln->nsm_users
> >> == 0.  Since we never recheck the value of ln->nsm_users after taking
> >> nsm_create_mutex, what is stopping 2 different threads from both
> >> setting
> >> ln->nsm_clnt and re-initialising ln->nsm_users?
> >
> > Yep. At the worst possible time:
> >
> > spin_lock(&ln->nsm_clnt_lock);
> > if (ln->nsm_users) {
> > if (--ln->nsm_users)
> > ln->nsm_clnt = NULL;
> > (1) shutdown = !ln->nsm_users;
> > }
> > spin_unlock(&ln->nsm_clnt_lock);
> >
> > If a thread reinitializes nsm_users at point (1), after the
> > assignment, we could well end up with ln->nsm_clnt NULL and shutdown
> > false. A bit later, nsm_mon_unmon gets called with a NULL clnt, and boom.
> 
> Possible fix if so, utterly untested so far (will test when I can face yet 
> another
> reboot and fs-corruption-recovery-hell cycle, in a few hours), may ruin
> performance, violate locking hierarchies, and consume
> kittens:
> 
> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c index e4fb3ba..da91cdf 100644
> --- a/fs/lockd/mon.c
> +++ b/fs/lockd/mon.c
> @@ -98,7 +98,6 @@ static struct rpc_clnt *nsm_client_get(struct net *net)
>   spin_unlock(&ln->nsm_clnt_lock);
>   goto out;
>   }
> - spin_unlock(&ln->nsm_clnt_lock);
> 
>   mutex_lock(&nsm_create_mutex);
>   clnt = nsm_create(net);
> @@ -108,6 +107,7 @@ static struct rpc_clnt *nsm_client_get(struct net *net)
>   ln->nsm_users = 1;
>   }
>   mutex_unlock(&nsm_create_mutex);
> + spin_unlock(&ln->nsm_clnt_lock);

You can't hold a spinlock while sleeping. Both mutex_lock() and nsm_create() 
can definitely sleep.

The correct way to do this is to grab the spinlock and recheck the value of 
ln->nsm_users inside the 'if (!IS_ERR())' condition. If it is still zero, bump 
it and set ln->nsm_clnt, otherwise bump it, get the existing ln->nsm_clnt and 
call rpc_shutdown_clnt() on the redundant nsm client after dropping the 
spinlock.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Myklebust, Trond

On Tue, 2012-10-23 at 17:44 +, Myklebust, Trond wrote:
> You can't hold a spinlock while sleeping. Both mutex_lock() and nsm_create() 
> can definitely sleep.
> 
> The correct way to do this is to grab the spinlock and recheck the value of 
> ln->nsm_users inside the 'if (!IS_ERR())' condition. If it is still zero, 
> bump it and set ln->nsm_clnt, otherwise bump it, get the existing 
> ln->nsm_clnt and call rpc_shutdown_clnt() on the redundant nsm client after 
> dropping the spinlock.
> 
> Cheers
>   Trond

Can you please check if the following patch fixes the issue?

Cheers
  Trond

8<
From 44a070455d246e09de0cefc8875833f21ca655e8 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Tue, 23 Oct 2012 13:51:58 -0400
Subject: [PATCH] LOCKD: fix races in nsm_client_get

Commit e9406db20fecbfcab646bad157b4cfdc7cadddfb (lockd: per-net
NSM client creation and destruction helpers introduced) contains
a nasty race on initialisation of the per-net NSM client because
it doesn't check whether or not the client is set after grabbing
the nsm_create_mutex.

Reported-by: Nix 
Signed-off-by: Trond Myklebust 
---
 fs/lockd/mon.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index e4fb3ba..9755603 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -88,7 +88,7 @@ static struct rpc_clnt *nsm_create(struct net *net)
 static struct rpc_clnt *nsm_client_get(struct net *net)
 {
static DEFINE_MUTEX(nsm_create_mutex);
-   struct rpc_clnt *clnt;
+   struct rpc_clnt *clnt, *new;
struct lockd_net *ln = net_generic(net, lockd_net_id);
 
spin_lock(&ln->nsm_clnt_lock);
@@ -101,11 +101,19 @@ static struct rpc_clnt *nsm_client_get(struct net *net)
spin_unlock(&ln->nsm_clnt_lock);
 
mutex_lock(&nsm_create_mutex);
-   clnt = nsm_create(net);
-   if (!IS_ERR(clnt)) {
-   ln->nsm_clnt = clnt;
-   smp_wmb();
-   ln->nsm_users = 1;
+   new = nsm_create(net);
+   clnt = new;
+   if (!IS_ERR(new)) {
+   spin_lock(&ln->nsm_clnt_lock);
+   if (!ln->nsm_users) {
+   ln->nsm_clnt = new;
+   new = NULL;
+   }
+   clnt = ln->nsm_clnt;
+   ln->nsm_users++;
+   spin_unlock(&ln->nsm_clnt_lock);
+   if (new)
+   rpc_shutdown_client(new);
}
mutex_unlock(&nsm_create_mutex);
 out:
-- 
1.7.11.7


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: Heads-up: 3.6.2 / 3.6.3 NFS server oops: 3.6.2+ regression? (also an unrelated ext4 data loss bug)

2012-10-23 Thread Myklebust, Trond

On Tue, 2012-10-23 at 13:57 -0400, Trond Myklebust wrote:
> On Tue, 2012-10-23 at 17:44 +0000, Myklebust, Trond wrote:
> > You can't hold a spinlock while sleeping. Both mutex_lock() and 
> > nsm_create() can definitely sleep.
> > 
> > The correct way to do this is to grab the spinlock and recheck the value of 
> > ln->nsm_users inside the 'if (!IS_ERR())' condition. If it is still zero, 
> > bump it and set ln->nsm_clnt, otherwise bump it, get the existing 
> > ln->nsm_clnt and call rpc_shutdown_clnt() on the redundant nsm client after 
> > dropping the spinlock.
> > 
> > Cheers
> >   Trond
> 
> Can you please check if the following patch fixes the issue?
> 
> Cheers
>   Trond
> 
Meh... This one gets rid of the 100% redundant mutex...

8<---
From 4187c816a15df12544ebcfa6b961fce96458e244 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Tue, 23 Oct 2012 13:51:58 -0400
Subject: [PATCH] LOCKD: fix races in nsm_client_get

Commit e9406db20fecbfcab646bad157b4cfdc7cadddfb (lockd: per-net
NSM client creation and destruction helpers introduced) contains
a nasty race on initialisation of the per-net NSM client because
it doesn't check whether or not the client is set after grabbing
the nsm_create_mutex.

Reported-by: Nix 
Signed-off-by: Trond Myklebust 
Cc: sta...@vger.kernel.org
---
 fs/lockd/mon.c | 43 ++-
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index e4fb3ba..fe69560 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -85,29 +85,38 @@ static struct rpc_clnt *nsm_create(struct net *net)
return rpc_create(&args);
 }
 
+static struct rpc_clnt *nsm_client_set(struct lockd_net *ln,
+   struct rpc_clnt *clnt)
+{
+   spin_lock(&ln->nsm_clnt_lock);
+   if (ln->nsm_users == 0) {
+   if (clnt == NULL)
+   goto out;
+   ln->nsm_clnt = clnt;
+   }
+   clnt = ln->nsm_clnt;
+   ln->nsm_users++;
+out:
+   spin_unlock(&ln->nsm_clnt_lock);
+   return clnt;
+}
+
 static struct rpc_clnt *nsm_client_get(struct net *net)
 {
-   static DEFINE_MUTEX(nsm_create_mutex);
-   struct rpc_clnt *clnt;
+   struct rpc_clnt *clnt, *new;
struct lockd_net *ln = net_generic(net, lockd_net_id);
 
-   spin_lock(&ln->nsm_clnt_lock);
-   if (ln->nsm_users) {
-   ln->nsm_users++;
-   clnt = ln->nsm_clnt;
-   spin_unlock(&ln->nsm_clnt_lock);
+   clnt = nsm_client_set(ln, NULL);
+   if (clnt != NULL)
goto out;
-   }
-   spin_unlock(&ln->nsm_clnt_lock);
 
-   mutex_lock(&nsm_create_mutex);
-   clnt = nsm_create(net);
-   if (!IS_ERR(clnt)) {
-   ln->nsm_clnt = clnt;
-   smp_wmb();
-   ln->nsm_users = 1;
-   }
-   mutex_unlock(&nsm_create_mutex);
+   clnt = new = nsm_create(net);
+   if (IS_ERR(clnt))
+   goto out;
+
+   clnt = nsm_client_set(ln, new);
+   if (clnt != new)
+   rpc_shutdown_client(new);
 out:
return clnt;
 }
-- 
1.7.11.7


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH 182/193] net/sunrpc: remove CONFIG_EXPERIMENTAL

2012-10-23 Thread Myklebust, Trond

On Tue, 2012-10-23 at 13:04 -0700, Kees Cook wrote:
> This config item has not carried much meaning for a while now and is
> almost always enabled by default. As agreed during the Linux kernel
> summit, remove it.
> 
> CC: Trond Myklebust 
> CC: "J. Bruce Fields" 
> CC: "David S. Miller" 
> Signed-off-by: Kees Cook 
> ---
>  net/sunrpc/Kconfig |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
> index 03d03e3..516fe2c 100644
> --- a/net/sunrpc/Kconfig
> +++ b/net/sunrpc/Kconfig
> @@ -10,7 +10,7 @@ config SUNRPC_BACKCHANNEL
>  
>  config SUNRPC_XPRT_RDMA
>   tristate
> - depends on SUNRPC && INFINIBAND && INFINIBAND_ADDR_TRANS && EXPERIMENTAL
> + depends on SUNRPC && INFINIBAND && INFINIBAND_ADDR_TRANS
>   default SUNRPC && INFINIBAND
>   help
> This option allows the NFS client and server to support

Acked-by: Trond Myklebust 
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

[GIT PULL] Please pull more NFS related bugfixes

2012-10-25 Thread Myklebust, Trond

Hi Linus,

This pull fixes a fairly urgent issue with the NFSv2/v3 statd code that
is causing Oopses, as well as some long standing races with the SUNRPC
tcp code.

The following changes since commit 0e9e3e306c7e472bdcffa34c4c4584301eda03b3:

  Merge tag 'stable/for-linus-3.7-rc2-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen (2012-10-24 05:17:27 
+0300)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.7-3

for you to fetch changes up to e498daa81295d02f7359af313c2b7f87e1062207:

  LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero (2012-10-24 
10:46:22 -0400)


NFS bugfixes for Linux 3.7

- Fix the NFSv2/v3 kernel statd protocol, which broke due to net namespace
  related changes.
- Fix a number of races in the SUNRPC TCP disconnect/reconnect code.


Trond Myklebust (6):
  SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT
  Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
  SUNRPC: Prevent races in xs_abort_connection()
  SUNRPC: Get rid of the xs_error_report socket callback
  LOCKD: fix races in nsm_client_get
  LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero

 fs/lockd/mon.c| 57 +--
 net/sunrpc/xprtsock.c | 41 +---
 2 files changed, 42 insertions(+), 56 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: [PATCH] lockd: fix races in per-net NSM client handling

2012-10-31 Thread Myklebust, Trond

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Pawel Sikora
> Sent: Wednesday, October 31, 2012 2:03 PM
> To: Greg KH
> Cc: skinsbur...@parallels.com; sta...@vger.kernel.org; linux-
> ker...@vger.kernel.org; bagg...@pld-linux.org; ar...@pld-linux.org
> Subject: Re: [PATCH] lockd: fix races in per-net NSM client handling
> 
> On Wednesday 31 of October 2012 10:49:46 Greg KH wrote:
> > On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> > > Hi,
> > >
> > > the patch metioned in https://lkml.org/lkml/2012/10/24/175 seems to
> > > fix the 3.6.3 oops (while 3.6.2 works fine) at 16-cores opteron server.
> > > please queue this path for 3.6.$next.
> >
> > Is it in Linus's tree already?  If so, what is the git commit id?
> 
> the mainstream contains some lock deamon fixes already:
> 
> * e498daa LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
> * a4ee8d9 LOCKD: fix races in nsm_client_get
> 
> but i don't know where is the right fix. Stanislav, could you put some light 
> on
> this?

The above 2 patches (which are already in 3.6.5) replace Stanislav's patch, 
which will not be merged upstream.

Trond

[GIT PULL] Please pull NFS client bugfixes

2012-11-03 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 08f05c49749ee655bef921d12160960a273aad47:

  Return the right error value when dup[23]() newfd argument is too large 
(2012-10-30 21:27:28 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.7-4

for you to fetch changes up to 998f40b550f257e436485291802fa938e4cf580f:

  NFS4: nfs4_opendata_access should return errno (2012-11-02 18:51:54 -0400)


NFS bugfixes for Linux 3.7

- Fix a bunch of deadlock situations:
  * State recovery can deadlock if we fail to release sequence ids before
scheduling the recovery thread.
  * Calling deactivate_super() from an RPC workqueue thread can deadlock
because of the call to rpc_shutdown_client.
- Display the device name correctly in /proc/*/mounts
- Fix a number of incorrect error return values:
  * When NFSv3 mounts fail due to a timeout.
  * On NFSv4.1 backchannel setup failure
  * On NFSv4 open access checks
- pnfs_find_alloc_layout() must check the layout pointer for NULL
- Fix a regression in the legacy DNS resolved


Ben Hutchings (1):
  nfs: Show original device name verbatim in /proc/*/mount{s,info}

Bryan Schumaker (1):
  NFS: Wait for session recovery to finish before returning

NeilBrown (1):
  NFS: fix bug in legacy DNS resolver.

Scott Mayhew (1):
  nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts

Trond Myklebust (3):
  NFSv4.1: We must release the sequence id when we fail to get a session 
slot
  NFSv4: nfs4_locku_done must release the sequence id
  NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly

Weston Andros Adamson (3):
  NFS: add nfs_sb_deactive_async to avoid deadlock
  SUNRPC: return proper errno from backchannel_rqst
  NFS4: nfs4_opendata_access should return errno

Yanchuan Nian (1):
  nfs: Check whether a layout pointer is NULL before free it

 fs/nfs/dns_resolve.c  |  5 +++--
 fs/nfs/inode.c|  5 -
 fs/nfs/internal.h |  6 +++--
 fs/nfs/mount_clnt.c   |  2 +-
 fs/nfs/namespace.c| 19 +++-
 fs/nfs/nfs4namespace.c|  3 ++-
 fs/nfs/nfs4proc.c | 46 +++---
 fs/nfs/pnfs.c |  4 ++--
 fs/nfs/super.c| 51 ++-
 fs/nfs/unlink.c   |  2 +-
 net/sunrpc/backchannel_rqst.c |  2 +-
 11 files changed, 110 insertions(+), 35 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [ 243/262] NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidate

2012-11-03 Thread Myklebust, Trond

On Fri, 2012-09-28 at 11:52 -0700, Greg Kroah-Hartman wrote:
> From: Greg KH 
> 
> 3.5-stable review patch.  If anyone has any objections, please let me know.


Hi Greg,

Can we please apply this patch to the older stable kernels too? I see
that Ben has already applied it to the 3.2 kernel, but it really wants
to go into 3.4 and 3.0 as well...

Thanks!
  Trond

> --
> 
> From: Trond Myklebust 
> 
> [Fixed upstream as part of 0b728e1911c, but that's a much larger patch,
> this is only the nfs portion backported as needed.]
> 
> Fix the following Oops in 3.5.1:
> 
>  BUG: unable to handle kernel NULL pointer dereference at 0038
>  IP: [] nfs_lookup_revalidate+0x2d/0x480 [nfs]
>  PGD 337c63067 PUD 0
>  Oops:  [#1] SMP
>  CPU 5
>  Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc 
> af_packet binfmt_misc cpufreq_conservative cpufreq_userspace 
> cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel 
> joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp 
> serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel 
> microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit 
> sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common scsi_dh_rdac 
> scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan ata_piix thermal 
> processor thermal_sys
> 
>  Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro X8DTT/X8DTT
>  RIP: 0010:[]  [] 
> nfs_lookup_revalidate+0x2d/0x480 [nfs]
>  RSP: 0018:8801b418bd38  EFLAGS: 00010292
>  RAX: fff6 RBX: 88032016d800 RCX: 0020
>  RDX:  RSI:  RDI: 8801824a7b00
>  RBP: 8801b418bdf8 R08: 7f0034323030 R09: f04c03ed
>  R10: 8801824a7b00 R11: 0002 R12: 8801824a7b00
>  R13: 8801824a7b00 R14:  R15: 8803201725d0
>  FS:  2b53a46cb700() GS:88033fc2() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 0038 CR3: 00020a426000 CR4: 07e0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: 0ff0 DR7: 0400
>  Process java (pid: 30431, threadinfo 8801b418a000, task 8801b5d20600)
>  Stack:
>   8801b418be44 88032016d800 8801b418bdf8 
>   8801824a7b00 8801b418bdd7 8803201725d0 8116a9c0
>   8801b5c38dc0 0007 88032016d800 
>  Call Trace:
>   [] lookup_dcache+0x80/0xe0
>   [] __lookup_hash+0x23/0x90
>   [] lookup_one_len+0xc5/0x100
>   [] nfs_sillyrename+0xe3/0x210 [nfs]
>   [] vfs_unlink.part.25+0x7f/0xe0
>   [] do_unlinkat+0x1ac/0x1d0
>   [] system_call_fastpath+0x16/0x1b
>   [<2b5348b5f527>] 0x2b5348b5f526
>  Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 89 5c 
> 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30  46 38 40 0f 
> 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89
>  RIP  [] nfs_lookup_revalidate+0x2d/0x480 [nfs]
>   RSP 
>  CR2: 0038
>  ---[ end trace 845113ed191985dd ]---
> 
> This Oops affects 3.5 kernels and older, and is due to lookup_one_len()
> calling down to the dentry revalidation code with a NULL pointer
> to struct nameidata.
> 
> It is fixed upstream by commit 0b728e1911c (stop passing nameidata *
> to ->d_revalidate())
> 
> Reported-by: Richard Ems 
> Signed-off-by: Trond Myklebust 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  fs/nfs/dir.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1123,7 +1123,7 @@ static int nfs_lookup_revalidate(struct
>   struct nfs_fattr *fattr = NULL;
>   int error;
>  
> - if (nd->flags & LOOKUP_RCU)
> + if (nd && (nd->flags & LOOKUP_RCU))
>   return -ECHILD;
>  
>   parent = dget_parent(dentry);
> @@ -1526,7 +1526,7 @@ static int nfs4_lookup_revalidate(struct
>   struct inode *dir;
>   int openflags, ret = 0;
>  
> - if (nd->flags & LOOKUP_RCU)
> + if (nd && (nd->flags & LOOKUP_RCU))
>   return -ECHILD;
>  
>   inode = dentry->d_inode;
> 
> 

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client updates for 3.6...

2012-07-30 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 9249e17fe094d853d1ef7475dd559a2cc7e23d42:

  VFS: Pass mount flags to sget() (2012-07-14 16:38:34 +0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.6-1

for you to fetch changes up to f44106e2173f08ccb1c9195d85a6c22388b461c1:

  nfs: fix fl_type tests in NFSv4 code (2012-07-30 18:09:13 -0400)

Please note that due to my own family vacation, I've been asking Bryan
to collect some of the patches that came in the past 2 weeks. I'm
therefore planning on sending a second pull request on Wednesday
containing the stuff that he has been accumulating.

This is all stuff that I had queued up before I left, plus a couple of
regression fixes.

Cheers
  Trond


NFS client updates for Linux 3.6

Features include:
- More preparatory patches for modularising NFSv2/v3/v4.
  Split out the various NFSv2/v3/v4-specific code into separate
  files
- More preparation for the NFSv4 migration code
- Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters
- pNFS fast failover when the data servers are down
- Various cleanups and debugging patches


Andy Adamson (6):
  NFSv4.1: Use session max response size for GETDEVICEINFO gdia_maxcount
  NFSv4.1 handle OPEN O_CREATE mdsthreshold
  NFSv4.1 return the LAYOUT for each file with failed DS connection I/O
  NFSv4.1 don't send LAYOUTCOMMIT if data resent through MDS
  NFSv4.1 mark layout when already returned
  NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list

Bryan Schumaker (23):
  NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()
  NFS: Use nfs4_destroy_server() to clean up NFS v4
  NFS: Create a v4-specific fsync function
  NFS: Create a have_delegation rpc_op
  NFS: Create a return_delegation rpc op
  NFS: Create a free_client rpc_op
  NFS: Create an alloc_client rpc_op
  NFS: Create an read_pageio_init() function
  NFS: Create an write_pageio_init() function
  NFS: Create custom NFS v4 write_inode() function
  NFS: Split out NFS v2 inode operations
  NFS: Split out NFS v3 inode operations
  NFS: Split out NFS v4 inode operations
  NFS: Create an init_nfs_v4() function
  NFS: Initialize v4 sysctls from nfs_init_v4()
  NFS: Split out NFS v4 file operations
  NFS: Move the v4 getroot code to nfs4getroot.c
  NFS: Initialize the NFS v4 client from init_nfs_v4()
  NFS: Split out NFS v4 server creating code
  NFS: Create a single nfs_clone_super() function
  NFS: Split out the NFS v4 filesystem types
  NFS: Split out NFS v4 client functions
  NFS: exit_nfs_v4() shouldn't be an __exit function

Chuck Lever (10):
  NFS: Fix up TEST_STATEID and FREE_STATEID return code handling
  NFS: Don't free a state ID the server does not recognize
  NFS: State reclaim clears OPEN and LOCK state
  NFS: Clean up nfs41_check_expired_stateid()
  NFS: Clean up TEST_STATEID and FREE_STATEID error reporting
  NFS: nfs_getaclargs.acl_len is a size_t
  SUNRPC: Add rpcauth_list_flavors()
  NFS: When state recovery fails, waiting tasks should exit
  NFS: Treat NFS4ERR_CLID_INUSE as a fatal error
  NFS: Clean up nfs4_proc_setclientid() and friends

Fred Isaman (2):
  NFS: fix pnfs regression with directio reads
  NFS: fix pnfs regression with directio writes

Jeff Layton (1):
  nfs: fix fl_type tests in NFSv4 code

Joe Perches (1):
  sunrpc: clnt: Add missing braces

Randy Dunlap (1):
  nfs: fix stub return type warnings

Trond Myklebust (23):
  NFSv2/v3: Remove incorrect dprintks from the readdir reply code
  SUNRPC: xdr_read_pages needs to clear xdr->page_ptr.
  SUNRPC: Clean up xdr_set_iov()
  SUNRPC: Don't decode beyond the end of the RPC reply message
  SUNRPC: xdr_read_pages should return the amount of XDR encoded page data
  NFS: Let xdr_read_pages() check for buffer overflows
  SUNRPC: Add the helper xdr_stream_pos
  NFSv4: Simplify the GETATTR attribute length calculation
  NFSv3: Don't open code stream position calculation in decode_getacl3resok
  SUNRPC: Remove open coded stream position calculation in xdr_read_pages
  SUNRPC: Simplify the end-of-buffer calculation in xdr_read_pages
  SUNRPC: Clean up xdr_read_pages
  SUNRPC: Clean up xdr_enter_page
  NFSv4.1: Handle slot recalls before doing state recovery
  NFSv4.1: Clean up nfs4_recall_slot()
  NFSv4.1: Cleanup - move nfs4_has_session tests out of state manager loop
  NFSv4.1: Clean up nfs4_reclaim_lease
  SUNRPC: Remove unused function xdr_encode_pages
  NFSv4: Decode getdevicelist should use nfs4_verifier
  NFS: Cleanup - only store the write verifier in struct nfs_page
  NFS: Simplify NFSv4.1 Kconfig
  Merge commit '9249e17f

Re: [PATCH] SUNRPC: return negative value in case rpcbind client creation error

2012-07-30 Thread Myklebust, Trond

On Fri, 2012-07-20 at 15:57 +0400, Stanislav Kinsbursky wrote:
> Without this patch kernel will panic on LockD start, because lockd_up() checks
> lockd_up_net() result for negative value.
> >From my pow it's better to return negative value from rpcbind routines 
> >instead
> of replacing all such checks like in lockd_up().
> 
> Signed-off-by: Stanislav Kinsbursky 
> ---
>  net/sunrpc/rpcb_clnt.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
> index 92509ff..a70acae 100644
> --- a/net/sunrpc/rpcb_clnt.c
> +++ b/net/sunrpc/rpcb_clnt.c
> @@ -251,7 +251,7 @@ static int rpcb_create_local_unix(struct net *net)
>   if (IS_ERR(clnt)) {
>   dprintk("RPC:   failed to create AF_LOCAL rpcbind "
>   "client (errno %ld).\n", PTR_ERR(clnt));
> - result = -PTR_ERR(clnt);
> + result = PTR_ERR(clnt);
>   goto out;
>   }
>  
> @@ -298,7 +298,7 @@ static int rpcb_create_local_net(struct net *net)
>   if (IS_ERR(clnt)) {
>   dprintk("RPC:   failed to create local rpcbind "
>   "client (errno %ld).\n", PTR_ERR(clnt));
> - result = -PTR_ERR(clnt);
> + result = PTR_ERR(clnt);
>   goto out;
>   }

Who is supposed to carry this patch? Is it Bruce or is it me?

Cheers
  Trond

Re: linux-next: manual merge of the akpm tree with the nfs tree

2012-07-31 Thread Myklebust, Trond

On Tue, 2012-07-31 at 11:33 +0100, Mel Gorman wrote:
> On Tue, Jul 31, 2012 at 02:24:41PM +1000, Stephen Rothwell wrote:
> > Hi Andrew,
> > 
> > Today's linux-next merge of the akpm tree got a conflict in
> > net/sunrpc/xprtsock.c between commit 5cf02d09b50b ("nfs: skip commit in
> > releasepage if we're freeing memory for fs-related reasons") from the nfs
> > tree and commit "nfs: enable swap on NFS" from the akpm tree.
> > 
> > Just context changes?  I fixed it up (I think - see below) and can carry
> > the fix as necessary.
> 
> Functionally it looks fine. As you say, it all looks like context
> changes. Arguably code like this
> 
> current->flags &= ~PF_FSTRANS
> 
> could use tsk_restore_flags instead() even though it should never be
> necessary as PF_FSTRANS would not be set on function entry. However,
> it would set up a depedency between the patch sets that is undesirable.
> If both sets get merged then it might make sense as a cleanup to use
> tsk_restore_flags() but not until then.
> 
> Thanks Stephen.
> 

Do we really need to set both PF_FSTRANS and PF_MEMALLOC here? The
reason why I merged the PF_FSTRANS patch is that we have the deadlock
problem when allocating a new socket even before we add swap-over-nfs.
Adding PF_FSTRANS to disallow entry into the NFS layer by the memory
allocator fixes that issue.
What value does PF_MEMALLOC add? Is that in order to prevent recursion
into other areas of the swap code (say, if you mix swap-over-nfs with
ordinary swap-to-disk)?

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: linux-next: manual merge of the akpm tree with the nfs tree

2012-07-31 Thread Myklebust, Trond

On Tue, 2012-07-31 at 16:19 +0100, Mel Gorman wrote:
> On Tue, Jul 31, 2012 at 02:37:24PM +0000, Myklebust, Trond wrote:
> > On Tue, 2012-07-31 at 11:33 +0100, Mel Gorman wrote:
> > > On Tue, Jul 31, 2012 at 02:24:41PM +1000, Stephen Rothwell wrote:
> > > > Hi Andrew,
> > > > 
> > > > Today's linux-next merge of the akpm tree got a conflict in
> > > > net/sunrpc/xprtsock.c between commit 5cf02d09b50b ("nfs: skip commit in
> > > > releasepage if we're freeing memory for fs-related reasons") from the 
> > > > nfs
> > > > tree and commit "nfs: enable swap on NFS" from the akpm tree.
> > > > 
> > > > Just context changes?  I fixed it up (I think - see below) and can carry
> > > > the fix as necessary.
> > > 
> > > Functionally it looks fine. As you say, it all looks like context
> > > changes. Arguably code like this
> > > 
> > > current->flags &= ~PF_FSTRANS
> > > 
> > > could use tsk_restore_flags instead() even though it should never be
> > > necessary as PF_FSTRANS would not be set on function entry. However,
> > > it would set up a depedency between the patch sets that is undesirable.
> > > If both sets get merged then it might make sense as a cleanup to use
> > > tsk_restore_flags() but not until then.
> > > 
> > > Thanks Stephen.
> > > 
> > 
> > Do we really need to set both PF_FSTRANS and PF_MEMALLOC here? The
> > reason why I merged the PF_FSTRANS patch is that we have the deadlock
> > problem when allocating a new socket even before we add swap-over-nfs.
> > Adding PF_FSTRANS to disallow entry into the NFS layer by the memory
> > allocator fixes that issue.
> 
> PF_FSTRANS is to prevent recursion into NFS and is set whether swap-over-NFS
> is used or not and for all requests.
> 
> > What value does PF_MEMALLOC add? Is that in order to prevent recursion
> > into other areas of the swap code (say, if you mix swap-over-nfs with
> > ordinary swap-to-disk)?
> > 
> 
> PF_MEMALLOC is normally to prevent the page reclaim recursing into
> itself. Page reclaim can call the page allocator and that cannot re-enter
> page reclaim.
> 
> In the case of swap-over-NFS, PF_MEMALLOC is set only if the socket is
> being used for swapping. In softirq context, the allocation request is
> allowed to use PFMEMALLOC reserves to avoid deadlock.
> 
> I do not see an obvious way to collapse the two flags together.
> PF_FSTRANS should not mean the PFMEMALLOC reserves can be used and
> PFMEMALLOC is not set for all requests.

Right, but in this case, we're talking about a GFP_KERNEL allocation
that always happens in an rpciod workqueue process context, so we still
won't be able to access the PFMEMALLOC reserves if I understand you
correctly?

I understand the value of preventing the page reclaim recursing into
itself, but in this case, we're talking about a separate process that is
operating on behalf of the allocator (much like kswapd does).

Cheers
  Trond
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: linux-next: manual merge of the akpm tree with the nfs tree

2012-07-31 Thread Myklebust, Trond

On Tue, 2012-07-31 at 11:44 -0700, Andrew Morton wrote:
> On Tue, 31 Jul 2012 18:50:22 +0100
> Mel Gorman  wrote:
> 
> > Stephen Rothwell reported a merge conflict between a MM patch "nfs:
> > enable swap on NFS" and an NFS patch "nfs: skip commit in
> > releasepage if we're freeing memory for fs-related reasons".
> 
> grumble.  This happened becase new stuff was added to -next right in
> the middle of the merge window.  Please don't dothat.
> 
Normally, I would have added it when Jeff asked me to, but I've been on
vacation for 2 weeks, and this is a bugfix that needs to go into
stable...

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull second wave of NFS client changes for 3.6

2012-07-31 Thread Myklebust, Trond

Hi Linus,

The following changes since commit f44106e2173f08ccb1c9195d85a6c22388b461c1:

  nfs: fix fl_type tests in NFSv4 code (2012-07-30 18:09:13 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.6-2

for you to fetch changes up to ad0fcd4eb68059de02e1766948263c71b8a5b1dc:

  nfs: explicitly reject LOCK_MAND flock() requests (2012-07-31 14:42:20 -0400)

Cheers,
  Trond


NFS client updates for Linux 3.6

Features include:
- Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
  separate modules.
- Fix Oopses in the NFSv4 idmapper
- Fix a deadlock whereby rpciod tries to allocate a new socket and
  ends up recursing into the NFS code due to memory reclaim.
- Increase the number of permitted callback connections.


Bryan Schumaker (10):
  NFS: Add version registering framework
  NFS: Remove the NFS v4 xdev mount function
  NFS: Create a try_mount rpc op
  NFS: Only initialize the ACL client in the v3 case
  NFS: Pass super operations and xattr handlers in the nfs_subversion
  NFS: Split out remaining NFS v4 inode functions
  NFS: Keep module parameters in the generic NFS client
  NFS: Convert v2 into a module
  NFS: Convert v3 into a module
  NFS: Convert v4 into a module

David Howells (1):
  NFS: Fix a number of bugs in the idmapper

Jeff Layton (3):
  sunrpc: clarify comments on rpc_make_runnable
  nfs: skip commit in releasepage if we're freeing memory for fs-related 
reasons
  nfs: explicitly reject LOCK_MAND flock() requests

NeilBrown (1):
  nfs: increase number of permitted callback connections.

Peng Tao (1):
  pnfsblock: bail out partial page IO

Stanislav Kinsbursky (1):
  SUNRPC: return negative value in case rpcbind client creation error

 fs/nfs/Kconfig   |   6 +-
 fs/nfs/Makefile  |  27 +++---
 fs/nfs/blocklayout/blocklayout.c |  39 +++-
 fs/nfs/callback.c|  28 +-
 fs/nfs/callback.h|   2 +-
 fs/nfs/client.c  | 191 +--
 fs/nfs/delegation.h  |   2 +-
 fs/nfs/dir.c |  20 +++-
 fs/nfs/direct.c  |   2 +-
 fs/nfs/dns_resolve.c |   4 +
 fs/nfs/file.c|  31 ++-
 fs/nfs/idmap.c   |  29 --
 fs/nfs/inode.c   | 105 ++---
 fs/nfs/internal.h|  38 +---
 fs/nfs/namespace.c   |  17 +---
 fs/nfs/netns.h   |   2 +-
 fs/nfs/nfs.h |  29 ++
 fs/nfs/nfs2super.c   |  31 +++
 fs/nfs/nfs3client.c  |  65 +
 fs/nfs/nfs3proc.c|   3 +
 fs/nfs/nfs3super.c   |  31 +++
 fs/nfs/nfs4_fs.h |  13 ++-
 fs/nfs/nfs4client.c  |  23 ++---
 fs/nfs/nfs4proc.c|   9 +-
 fs/nfs/nfs4super.c   | 106 --
 fs/nfs/nfs4xdr.c |   6 --
 fs/nfs/pagelist.c|   4 +
 fs/nfs/pnfs.c|   2 +
 fs/nfs/proc.c|   3 +
 fs/nfs/read.c|   5 +
 fs/nfs/super.c   | 172 ---
 fs/nfs/write.c   |  35 +++
 include/linux/nfs_fs.h   |   6 +-
 include/linux/nfs_fs_sb.h|   7 +-
 include/linux/nfs_idmap.h|   2 +-
 include/linux/nfs_xdr.h  |   9 +-
 net/sunrpc/rpcb_clnt.c   |   4 +-
 net/sunrpc/sched.c   |   7 +-
 net/sunrpc/xprtrdma/transport.c  |   3 +-
 net/sunrpc/xprtsock.c|  10 ++
 40 files changed, 725 insertions(+), 403 deletions(-)
 create mode 100644 fs/nfs/nfs.h
 create mode 100644 fs/nfs/nfs2super.c
 create mode 100644 fs/nfs/nfs3client.c
 create mode 100644 fs/nfs/nfs3super.c

[GIT PULL] Please pull one NFS client bugfix

2013-08-29 Thread Myklebust, Trond

Hi Linus,

The following changes since commit fa8218def1b1a16f0a410e2c1c767b4738cc81fa:

  Merge tag 'regmap-v3.11-rc7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap (2013-08-27 
10:10:30 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-5

for you to fetch changes up to 347e2233b7667e336d9f671f1a52dfa3f0416e2c:

  SUNRPC: Fix memory corruption issue on 32-bit highmem systems (2013-08-28 
15:43:43 -0400)


NFS client bugfix for 3.11

- Stable patch to fix a highmem-related data corruption issue on 32-bit
  ARM platforms


Trond Myklebust (1):
  SUNRPC: Fix memory corruption issue on 32-bit highmem systems

 net/sunrpc/xdr.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v3 24/25] sunrpc: Change how dentry's d_lock field is accessed

2013-07-08 Thread Myklebust, Trond

On Thu, 2013-07-04 at 05:20 +0100, Al Viro wrote:
> On Wed, Jul 03, 2013 at 04:25:32PM -0400, Waiman Long wrote:
> > There is no change in logic and everything should just work.
> 
> > -   spin_lock(&file->f_path.dentry->d_lock);
> > +   d_lock(file->f_path.dentry);
> > if (!d_unhashed(file->f_path.dentry))
> > clnt = RPC_I(inode)->private;
> > if (clnt != NULL && atomic_inc_not_zero(&clnt->cl_count)) {
> > -   spin_unlock(&file->f_path.dentry->d_lock);
> > +   d_unlock(file->f_path.dentry);
> 
> Could somebody explain WTF is being protected here?  It's not ->private -
> that gets set (and, more importantly, cleared) without ->d_lock in sight.
> Trond, that seems to be your code from about three years ago (introduced
> in "SUNRPC: Fix a race in rpc_info_open").  What's going on there?

AFAICR we're using the fact that the dentry will remain hashed until
we're in the process of freeing up the rpc_client. By testing that the
dentry is hashed under the dentry->d_lock, we are assured that the
non-NULL 'clnt' is still pointing to a valid rpc_client, and that it is
safe to access clnt->cl_count.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client updates

2013-07-08 Thread Myklebust, Trond

Hi Linus,

The following changes since commit f722406faae2d073cc1d01063d1123c35425939e:

  Linux 3.10-rc1 (2013-05-11 17:14:08 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-1

for you to fetch changes up to 959d921f5eb8878ea16049a7f6e9bcbb6dfbcb88:

  Merge branch 'labeled-nfs' into linux-next (2013-06-28 16:29:51 -0400)



NFS client updates for Linux 3.11

Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
  add support for NFSv4.1 state protection.

Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation


Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been reviewed
and acked by James Morris.


Andy Adamson (6):
  NFSv4.1 Fix a pNFS session draining deadlock
  NFSv4.1 end back channel session draining
  NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
  NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
  NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
  NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs

Bryan Schumaker (4):
  NFS: Make callbacks minor version generic
  NFS: Add in v4.2 callback operation
  NFS: Apply v4.1 capabilities to v4.2
  NFS: Improve legacy idmapping fallback

Chuck Lever (3):
  NFS: Fix SETCLIENTID fallback if GSS is not available
  NFS: Fix security flavor negotiation with legacy binary mounts
  NFS: Set NFS_CS_MIGRATION for NFSv4 mounts

David Quigley (10):
  Security: Add hook to calculate context based on a negative dentry.
  Security: Add Hook to test if the particular xattr is part of a MAC model.
  LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
  SELinux: Add new labeling type native labels
  NFSv4: Add label recommended attribute and NFSv4 flags
  NFSv4: Extend fattr bitmaps to support all 3 words
  NFS:Add labels to client function prototypes
  NFS: Add label lifecycle management
  NFS: Client implementation of Labeled-NFS
  NFS: Extend NFS xattr handlers to accept the security namespace

Djalal Harouni (1):
  NFSv4: SETCLIENTID add the format string for the NETID

Jeff Layton (5):
  rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
  nfs: refactor "need_mount" code out of nfs_try_mount
  nfs: move server_authlist into nfs_try_mount_request
  nfs: have nfs_mount fake up a auth_flavs list when the server didn't 
provide it
  nfs: have NFSv3 try server-specified auth flavors in turn

Stanislav Kinsbursky (4):
  SUNRPC: fix races on PipeFS MOUNT notifications
  SUNRPC: fix races on PipeFS UMOUNT notifications
  SUNRPC: split client creation routine into setup and registration
  SUNRPC: PipeFS MOUNT notification optimization for dying clients

Steve Dickson (4):
  NFS: Add NFSv4.2 protocol constants
  NFSv4.2: Added NFS v4.2 support to the NFS client
  NFSv4: Introduce new label structure
  Kconfig: Add Kconfig entry for Labeled NFS V4 client

Trond Myklebust (26):
  SUNRPC: Fix a bug in gss_create_upcall
  SUNRPC: Faster detection if gssd is actually running
  SUNRPC: Convert auth_gss pipe detection to work in namespaces
  SUNRPC: Prevent an rpc_task wakeup race
  NFSv4: Fix a thinko in nfs4_try_open_cached
  NFSv4.1: Ensure that layoutget is called using the layout credential
  NFSv4.1: Ensure that layoutreturn uses the correct credential
  NFSv4.1: Ensure that reclaim_complete uses the right credential
  NFSv4.1: Ensure that test_stateid and free_stateid use correct credentials
  NFSv4.1: Use layout credentials for get_deviceinfo calls
  NFSv4.1: Enable state protection
  NFSv4.1: Simplify setting the layout header credential
  SUNRPC: Fix a potential race in rpc_execute
  SUNRPC: Remove unused function rpc_queue_empty
  SUNRPC: Remove the unused helpers task_for_each() and task_for_first()
  SUNRPC: Remove unused functions rpc_task_set/has_priority
  SUNRPC: Remove redundant call to rpc_set_running() in __rpc_execute()
  NFSv4: Remove redundant check for FMODE_EXEC in nfs_finish_open
  NFSv4: Cleanup: pass the nfs_open_context to nfs4_do_open
  NFSv4: Refactor _nfs4_open_and_get_state to set ctx->state
  NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code
  NFSv4: Close another NFSv4 recovery race
  NFSv4: Move the DNS resolver into the NFSv4 module

[GIT PULL] Please pull NFS client updates

2013-07-11 Thread Myklebust, Trond

Hi Linus,

The following pull request mainly contains some small readdir
optimisations that had dependencies on Al Viro's readdir rewrite. There
is also a fix for a nasty deadlock which surfaced earlier in this merge
window.

The following changes since commit a82a729f04232ccd0b59406574ba4cf20027a49d:

  Merge branch 'akpm' (updates from Andrew Morton) (2013-07-09 13:33:36 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-2

for you to fetch changes up to 245268c951262b861bc1be4e9dc812352499:

  SUNRPC: Fix a deadlock in rpc_client_register() (2013-07-10 15:58:55 -0400)


NFS client updates for Linux 3.11 (part 2)

Highlights include:
- Fix an_rpc pipefs regression that causes a deadlock on mount
- Readdir optimisations by Scott Mayhew and Jeff Layton
- clean up the rpc_pipefs dentry operation setup


Fengguang Wu (1):
  rpc_pipe: rpc_dir_inode_operations can be static

Jeff Layton (2):
  nfs: set verifier on existing dentries in nfs_prime_dcache
  rpc_pipe: set dentry operations at d_alloc time

Scott Mayhew (3):
  NFS: Make nfs_attribute_cache_expired() non-static
  NFS: Make nfs_readdir revalidate less often
  NFS: Allow nfs_updatepage to extend a write under additional circumstances

Trond Myklebust (1):
  SUNRPC: Fix a deadlock in rpc_client_register()

 fs/nfs/dir.c   |  6 --
 fs/nfs/inode.c |  2 +-
 fs/nfs/write.c | 31 +++
 include/linux/nfs_fs.h |  1 +
 net/sunrpc/clnt.c  | 16 +---
 net/sunrpc/rpc_pipe.c  | 25 -
 6 files changed, 58 insertions(+), 23 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client updates for 3.12

2013-09-09 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 7c6d4dca777d6423cb9ccdc019cad94c75adcbe4:

  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha (2013-07-23 
14:39:57 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-1

for you to fetch changes up to b1b3e136948a2bf4915326acb0d825d7d180753f:

  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity (2013-09-07 18:39:25 
-0400)


NFS client updates for Linux 3.12

Highlights include:

- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
  lease loss due to a network partition, where doing so may result in data
  corruption. Add a kernel parameter to control choice of legacy behaviour
  or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
  NFS clients from being able to manipulate our lease and file lockingr
  state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients
- Fix the broken NFSv4 security auto-negotiation between client and server
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support.


Andy Adamson (10):
  NFSv4.1 Use the mount point rpc_clnt for layoutreturn
  NFS Remove unused authflavour parameter from init_client
  NFSv4.1 Increase NFS4_DEF_SLOT_TABLE_SIZE
  NFSv4.1 Use clientid management rpc_clnt for secinfo
  NFSv4.1 Use clientid management rpc_clnt for secinfo_no_name
  SUNRPC: don't map EKEYEXPIRED to EACCES in call_refreshresult
  SUNRPC new rpc_credops to test credential expiry
  NFS avoid expired credential keys for buffered writes
  SUNRPC refactor rpcauth_checkverf error returns
  NFSv4.1 Use MDS auth flavor for data server connection

Chuck Lever (20):
  NFS: Fix return type of nfs4_end_drain_session() stub
  NFS: Use root's credential for lease management when keytab is missing
  NFS: Never use user credentials for lease renewal
  NFS: When displaying session slot numbers, use "%u" consistently
  NFS: Rename nfs41_call_sync_data as a common data structure
  NFS: Clean up nfs4_setup_sequence()
  NFS: Common versions of sequence helper functions
  NFS: Add RPC callouts to start NFSv4.0 synchronous requests
  NFS: Remove unused call_sync minor version op
  NFS: Enable slot table helpers for NFSv4.0
  NFS: Add global helper to set up a stand-along nfs4_slot_table
  NFS: Add global helper for releasing slot table resources
  NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking
  NFS: NFSv4.0 transport blocking
  NFS: Enable nfs4_setup_sequence() for DELEGRETURN
  NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER
  NFS: Add nfs4_sequence calls for OPEN_CONFIRM
  NFS: Update session draining barriers for NFSv4.0 transport blocking
  When CONFIG_NFS_V4_1 is not enabled, "make C=2" emits this warning:
  NFS: Fix warning introduced by NFSv4.0 transport blocking patches

Jeff Layton (2):
  rpc_pipe: convert back to simple_dir_inode_operations
  nfs: verify open flags before allowing an atomic open

Nadav Shemer (1):
  nfs: fix open(O_RDONLY|O_TRUNC) in NFS4.0

NeilBrown (2):
  NFS: remove incorrect "Lock reclaim failed!" warning.
  NFSv4: Don't try to recover NFSv4 locks when they are lost.

Trond Myklebust (63):
  NFSv4: encode_attrs should not backfill the bitmap and attribute length
  NFSv4: Fix nfs4_init_uniform_client_string for net namespaces
  NFSv4: Refuse mount attempts with proto=udp
  NFS: Remove the NFSv4 "open optimisation" from nfs_permission
  NFSv3: Deal with a sparse warning in nfs3_proc_create
  NFSv4: Deal with a sparse warning in nfs4_opendata_alloc
  NFSv4: Deal with some more sparse warnings
  NFSv4: Deal with a sparse warning in nfs_idmap_get_key()
  NFSv4: Fix an incorrect pointer declaration in 
decode_first_pnfs_layout_type
  NFS: Clean up nfs_sillyrename()
  NFS: refactor code for calculating the crc32 hash of a filehandle
  NFS: Add event tracing for generic NFS events
  NFS: Pass in lookup flags from nfs_atomic_open to nfs_lookup
  NFS: Add event tracing for generic NFS lookups
  NFS: Add tracepoints for debugging generic file create events
  NFS: Add tracepoints for debugging directory changes
  NFS: Add tracepoints for debugging NFS rename and

Re: Kernel size increase of +256 KiB (was: Re: RPCSEC_GSS: Share all credential caches on a per-transport basis)

2013-09-12 Thread Myklebust, Trond

On Thu, 2013-09-12 at 15:24 +0200, Geert Uytterhoeven wrote:
> On Mon, Sep 9, 2013 at 6:57 PM, Linux Kernel Mailing List
>  wrote:
> > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> > index 5ec15bb..dc4b449 100644
> > --- a/net/sunrpc/auth_gss/auth_gss.c
> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > @@ -51,6 +51,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "../netns.h"
> >
> > @@ -71,6 +72,9 @@ static unsigned int gss_expired_cred_retry_delay = 
> > GSS_RETRY_EXPIRED;
> >   * using integrity (two 4-byte integers): */
> >  #define GSS_VERF_SLACK 100
> >
> > +static DEFINE_HASHTABLE(gss_auth_hash_table, 16);
> > +static DEFINE_SPINLOCK(gss_auth_hash_lock);
> 
> Today's m68k/atari-defconfig kernel no longer boots, as it became larger than
> 4 MiB.
> 
> bloat-o-meter tells me:
> 
> function old new   delta
> gss_auth_hash_table-  262144 +262144
> 
> Woops...

Whoops indeed. The above should have declared 16 buckets, and not 1<<16.
I fell for Sasha's subtle trap...

> Are you trying to game Tim's survey? ;-)
> (question 13 at http://www.embeddedlinuxconference.com/cgi-bin/survey.cgi)
> 
> Can this memory be allocated dynamically / only when it's used?

:-) It's declared inside a module, so that should already be the case,
however I'll send in a patch to change the above to the intended:

DEFINE_HASHTABLE(gss_auth_hash_table, 4);

Thanks Geert!

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] sunrpc: Add missing kuids conversion for printing

2013-09-12 Thread Myklebust, Trond

On Thu, 2013-09-12 at 15:09 +0200, Geert Uytterhoeven wrote:
> m68k/allmodconfig:
> 
> net/sunrpc/auth_generic.c: In function ‘generic_key_timeout’:
> net/sunrpc/auth_generic.c:241: warning: format ‘%d’ expects type ‘int’, but
> argument 2 has type ‘kuid_t’
> 
> commit cdba321e291f0fbf5abda4d88340292b858e3d4d ("sunrpc: Convert kuids and
> kgids to uids and gids for printing") forgot to convert one instance.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---

Thanks! Applied...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

[GIT PULL] Please pull NFS client changes (part 2)

2013-09-12 Thread Myklebust, Trond

Hi Linus,

The following changes since commit b1b3e136948a2bf4915326acb0d825d7d180753f:

  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity (2013-09-07 18:39:25 
-0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-2

for you to fetch changes up to 23c323af0375a7f63732bed0386aba5935b8de69:

  SUNRPC: No, I did not intend to create a 256KiB hashtable (2013-09-12 
10:16:31 -0400)


NFS client bugfixes:

- Fix a few credential reference leaks resulting from the SP4_MACH_CRED
  NFSv4.1 state protection code.
- Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the
  intended 64 byte structure.
- Fix a long standing XDR issue with FREE_STATEID
- Fix a potential WARN_ON spamming issue
- Fix a missing dprintk() kuid conversion

New features:
- Enable the NFSv4.1 state protection support for the WRITE and COMMIT
  operations.


Andy Adamson (1):
  NFSv4.1 fix decode_free_stateid

Geert Uytterhoeven (1):
  sunrpc: Add missing kuids conversion for printing

Trond Myklebust (1):
  SUNRPC: No, I did not intend to create a 256KiB hashtable

Weston Andros Adamson (4):
  NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT
  NFSv4.1: fix SECINFO* use of put_rpccred
  NFSv4.1: sp4_mach_cred: no need to ref count creds
  NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE

 fs/nfs/nfs4_fs.h   | 10 +-
 fs/nfs/nfs4proc.c  | 22 ++
 fs/nfs/nfs4xdr.c   | 17 ++---
 net/sunrpc/auth_generic.c  |  2 +-
 net/sunrpc/auth_gss/auth_gss.c |  2 +-
 5 files changed, 23 insertions(+), 30 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: Kernel size increase of +256 KiB (was: Re: RPCSEC_GSS: Share all credential caches on a per-transport basis)

2013-09-12 Thread Myklebust, Trond

On Thu, 2013-09-12 at 21:20 +0200, Geert Uytterhoeven wrote:
> On Thu, Sep 12, 2013 at 4:13 PM, Myklebust, Trond
>  wrote:
> >> > --- a/net/sunrpc/auth_gss/auth_gss.c
> >> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> >> > @@ -51,6 +51,7 @@
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > +#include 
> >> >
> >> >  #include "../netns.h"
> >> >
> >> > @@ -71,6 +72,9 @@ static unsigned int gss_expired_cred_retry_delay = 
> >> > GSS_RETRY_EXPIRED;
> >> >   * using integrity (two 4-byte integers): */
> >> >  #define GSS_VERF_SLACK 100
> >> >
> >> > +static DEFINE_HASHTABLE(gss_auth_hash_table, 16);
> >> > +static DEFINE_SPINLOCK(gss_auth_hash_lock);
> >>
> >> Today's m68k/atari-defconfig kernel no longer boots, as it became larger 
> >> than
> >> 4 MiB.
> >>
> >> bloat-o-meter tells me:
> >>
> >> function old new   delta
> >> gss_auth_hash_table-  262144 +262144
> >>
> >> Woops...
> >
> > Whoops indeed. The above should have declared 16 buckets, and not 1<<16.
> > I fell for Sasha's subtle trap...
> >
> >> Are you trying to game Tim's survey? ;-)
> >> (question 13 at http://www.embeddedlinuxconference.com/cgi-bin/survey.cgi)
> >>
> >> Can this memory be allocated dynamically / only when it's used?
> >
> > :-) It's declared inside a module, so that should already be the case,
> 
> Only for the modular case. What about builtin, e.g. for nfsroot?
> 
> Or is it better to not build in NFS_V4 support in that case?
> 
> config NFS_V4
>   If unsure, say Y.
> 
> config NFSD_V4
>   If unsure, say N.
> 
> So that's why my defconfig has NFS_V4 but not NFSD_V4.

It should be possible now to compile in NFSv3 support (and/or NFSv2),
while keeping NFSv4 a module. That will usually result in
CONFIG_SUNRPC_GSS=m...

Of course, if your defconfig doesn't have module support then, yes, your
only option to avoid compiling in rpcsec_gss is to not select NFSv4 at
all.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: [RFC][PATCH 0/4] SunRPC/NFS: Use no_printk() in

2013-09-26 Thread Myklebust, Trond

> -Original Message-
> From: J. Bruce Fields [mailto:bfie...@fieldses.org]
> Sent: Thursday, September 26, 2013 10:21 AM
> To: David Howells
> Cc: Myklebust, Trond; o...@lixom.net; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [RFC][PATCH 0/4] SunRPC/NFS: Use no_printk() in
> 
> On Thu, Sep 26, 2013 at 03:45:02PM +0100, David Howells wrote:
> >
> >
> > Here's a series of patches to make SunRPC/NFS use no_printk() to
> > implement its null dfprintk() macro (ie. when RPC_DEBUG is disabled).
> > This prevents 'unused variable' errors from occurring when a variable
> > is set only for use in debugging statements and renders RPC/NFS_IFDEBUG
> unnecessary.
> 
> Does this patch series fix any actual warnings?  Or does it just change the 
> way
> that we prevent the warnings?
> 

Right. If this is just code churn, then let's drop it. Otherwise, please 
explain why it is a good idea.

Cheers,
  Trond

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and dfprintk()

2013-09-26 Thread Myklebust, Trond

> -Original Message-
> From: David Howells [mailto:dhowe...@redhat.com]
> Sent: Thursday, September 26, 2013 10:36 AM
> To: Joe Perches
> Cc: dhowe...@redhat.com; bfie...@fieldses.org; Myklebust, Trond;
> o...@lixom.net; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 3/4] SunRPC: Use no_printk() for the null dprintk() and
> dfprintk()
> 
> Joe Perches  wrote:
> 
> > no_printk doesn't prevent any argument side-effects from being
> > optimized away by the compiler.
> >
> > ie:
> > dprintk("%d", func())
> > func is now always called when before it wasn't.
> 
> Yes, I know.  There are half a dozen places where this is the case.  Those 
> I've
> wrapped in ifdebug(FACILITY) { ... } in the code.  It's not the nicest, but at
> least the compiler always gets to see everything, rather than bits of it 
> getting
> hidden by the preprocessor - which means the call points will be less likely 
> to
> bit rot over time.

Your assumption is that RPC_DEBUG is disabled for most compiles. That is not 
the case.

Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [RFC] extending splice for copy offloading

2013-09-28 Thread Myklebust, Trond

> -Original Message-
> From: Miklos Szeredi [mailto:mik...@szeredi.hu]
> Sent: Saturday, September 28, 2013 12:50 AM
> To: Zach Brown
> Cc: J. Bruce Fields; Ric Wheeler; Anna Schumaker; Kernel Mailing List; Linux-
> Fsdevel; linux-...@vger.kernel.org; Myklebust, Trond; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
> 
> On Fri, Sep 27, 2013 at 10:50 PM, Zach Brown  wrote:
> >> Also, I don't get the first option above at all.  The argument is
> >> that it's safer to have more copies?  How much safety does another
> >> copy on the same disk really give you?  Do systems that do dedup
> >> provide interfaces to turn it off per-file?
> 
> I don't see the safety argument very compelling either.  There are real
> semantic differences, however: ENOSPC on a write to a
> (apparentlíy) already allocated block.  That could be a bit unexpected.  Do we
> need a fallocate extension to deal with shared blocks?

The above has been the case for all enterprise storage arrays ever since the 
invention of snapshots. The NFSv4.2 spec does allow you to set a per-file 
attribute that causes the storage server to always preallocate enough buffers 
to guarantee that you can rewrite the entire file, however the fact that we've 
lived without it for said 20 years leads me to believe that demand for it is 
going to be limited. I haven't put it top of the list of features we care to 
implement...

Cheers,
   Trond

RE: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

> -Original Message-
> From: Ric Wheeler [mailto:rwhee...@redhat.com]
> Sent: Monday, September 30, 2013 10:29 AM
> To: Miklos Szeredi
> Cc: J. Bruce Fields; Myklebust, Trond; Zach Brown; Anna Schumaker; Kernel
> Mailing List; Linux-Fsdevel; linux-...@vger.kernel.org; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
> 
> On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
> > On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler 
> wrote:
> >> On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
> >>> On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields
> >>> 
> >>> wrote:
> >>>>> My other worry is about interruptibility/restartability.  Ideas?
> >>>>>
> >>>>> What happens on splice(from, to, 4G) and it's a non-reflink copy?
> >>>>> Can the page cache copy be made restartable?   Or should splice() be
> >>>>> allowed to return a short count?  What happens on (non-reflink)
> >>>>> remote copies and huge request sizes?
> >>>> If I were writing an application that required copies to be
> >>>> restartable, I'd probably use the largest possible range in the
> >>>> reflink case but break the copy into smaller chunks in the splice case.
> >>>>
> >>> The app really doesn't want to care about that.  And it doesn't want
> >>> to care about restartability, etc..  It's something the *kernel* has
> >>> to care about.   You just can't have uninterruptible syscalls that
> >>> sleep for a "long" time, otherwise first you'll just have annoyed
> >>> users pressing ^C in vain; then, if the sleep is even longer,
> >>> warnings about task sleeping too long.
> >>>
> >>> One idea is letting splice() return a short count, and so the app
> >>> can safely issue SIZE_MAX requests and the kernel can decide if it
> >>> can copy the whole file in one go or if it wants to do it in smaller
> >>> chunks.
> >>>
> >> You cannot rely on a short count. That implies that an offloaded copy
> >> starts at byte 0 and the short count first bytes are all valid.
> > Huh?
> >
> > - app calls splice(from, 0, to, 0, SIZE_MAX)
> >   1) VFS calls ->direct_splice(from, 0,  to, 0, SIZE_MAX)
> >  1.a) fs reflinks the whole file in a jiffy and returns the size of the 
> > file
> >  1 b) fs does copy offload of, say, 64MB and returns 64M
> >   2) VFS does page copy of, say, 1MB and returns 1MB
> > - app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset
> > ...
> >
> > The point is: the app is always doing the same (incrementing offset
> > with the return value from splice) and the kernel can decide what is
> > the best size it can service within a single uninterruptible syscall.
> >
> > Wouldn't that work?
> >
> > Thanks,
> > Miklos
> 
> No.
> 
> Keep in mind that the offload operation in (1) might fail partially. The 
> target
> file (the copy) is allocated, the question is what ranges have valid data.
> 
> I don't see that (2) is interesting or really needed to be done in the kernel.
> If nothing else, it tends to confuse the discussion
> 

Anna's figures, that were presented at Plumber's, show that (2) is still worth 
doing on the _server_ for the case of NFS.

Cheers
  Trond

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> It would be nice if there would be way if the file system would get a 
> hint that the target file is supposed to be copy of another file. That 
> way distributed file systems could also create the target-file with the 
> correct meta-information (same storage targets as in-file has).
> Well, if we cannot agree on that, file system with a custom protocol at 
> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not 
> sure if this would work for pNFS, though.

splice() does not create new files. What you appear to be asking for
lies way outside the scope of that system call interface.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >> It would be nice if there would be way if the file system would get a
> >> hint that the target file is supposed to be copy of another file. That
> >> way distributed file systems could also create the target-file with the
> >> correct meta-information (same storage targets as in-file has).
> >> Well, if we cannot agree on that, file system with a custom protocol at
> >> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >> sure if this would work for pNFS, though.
> >
> > splice() does not create new files. What you appear to be asking for
> > lies way outside the scope of that system call interface.
> >
> 
> Sorry I know, definitely outside the scope of splice, but in the context 
> of offloaded file copies. So the question is, what is the best way to 
> address/discuss that?

Why does it need to be addressed in the first place?

What is preventing an application from retrieving and setting this
information using standard libc functions such as fstat()+open(), and
supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
where appropriate?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>> It would be nice if there would be way if the file system would get a
> >>>> hint that the target file is supposed to be copy of another file. That
> >>>> way distributed file systems could also create the target-file with the
> >>>> correct meta-information (same storage targets as in-file has).
> >>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>> sure if this would work for pNFS, though.
> >>>
> >>> splice() does not create new files. What you appear to be asking for
> >>> lies way outside the scope of that system call interface.
> >>>
> >>
> >> Sorry I know, definitely outside the scope of splice, but in the context
> >> of offloaded file copies. So the question is, what is the best way to
> >> address/discuss that?
> >
> > Why does it need to be addressed in the first place?
> 
> An offloaded copy is still not efficient if different storage 
> servers/targets used by from-file and to-file.

So? 

> >
> > What is preventing an application from retrieving and setting this
> > information using standard libc functions such as fstat()+open(), and
> > supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> > where appropriate?
> >
> 
> At a minimum this requires network and metadata overhead. And while I'm 
> working on FhGFS now, I still wonder what other file system need to do - 
> for example Lustre pre-allocates storage-target files on creating a 
> file, so file layout changes mean even more overhead there.

The problem you are describing is limited to a narrow set of storage
architectures. If copy offload using splice() doesn't make sense for
those architectures, then don't implement it for them.
You might be able to provide ioctls() to do these special hinted file
creations for those filesystems that need it, but the vast majority
don't, and you shouldn't enforce it on them.

> Anyway, if we could agree on to use libattr or libacl to teach the file 
> system about the upcoming splice call I would be fine.

libattr and libacl are generic libraries that exist to manipulate xattrs
and acls. They do not need to contain Lustre-specific code.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there would be way if the file system would get a
> >>>>>> hint that the target file is supposed to be copy of another file. That
> >>>>>> way distributed file systems could also create the target-file with the
> >>>>>> correct meta-information (same storage targets as in-file has).
> >>>>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>>
> >>>>
> >>>> Sorry I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets used by from-file and to-file.
> >
> > So?
> 
> mds1: orig-file
> oss1/target1: orig-chunk1
> 
> mds1: target-file
> ossN/targetN: target-chunk1
> 
> clientN: Performs the copy
> 
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same 
> target. Copy offload then even could done from the underlying fs, 
> similiar as local splice.
> If different ossN servers are used copies still have to be done over 
> network by these storage servers, although the client only would need to 
> initiate the copy. Still faster, but also not ideal.
> 
> >
> >>>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>>
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file system need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
> 
> But it _does_ make sense. The file system just needs a hint that a 
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
> 
> And exactly for that we need a standard - it does not make sense if each 
> and every distributed file system implements its own 
> ioctl/libattr/libacl interface for that.
> 
> >
> >> Anyway, if we could agree on to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
> >
> 
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own 
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry, 
> didn't find a better name yet), which would take in-file-path and 
> out-file-path and allow the file system to create out-file-path with the 
> same meta-layout as in-file-path. And it would need some flags, such as 
> AUTO (file system decides if it makes sense to do a local copy) and 
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [RFC] extending splice for copy offloading

2013-09-30 Thread Myklebust, Trond

On Mon, 2013-09-30 at 16:08 -0400, Ric Wheeler wrote:
> On 09/30/2013 04:00 PM, Bernd Schubert wrote:
> > pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own 
> > interface? And userspace needs to address all of them differently? 
> 
> The NFS and SCSI groups have each defined a standard which Zach's proposal 
> abstracts into a common user API.
> 
> Distributed file systems tend to be rather unique and do not have similar 
> standard bodies, but a lot of them could hide server specific implementations 
> under the current proposed interfaces.
> 
> What is not a good idea is to drag out the core, simple copy offload 
> discussion 
> for another 5 years to pull in every odd use case :)

Agreed. The whole idea of a common system call interface should be to
allow us to abstract away the underlying storage and filesystem
architectures. If filesystem developers also want a way to expose that
underlying architecture to applications in order to enable further
optimisations, then that belongs in a separate discussion.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bugfixes

2013-09-30 Thread Myklebust, Trond


Hi Linus,

The following changes since commit 4a10c2ac2f368583138b774ca41fac4207911983:

  Linux 3.12-rc2 (2013-09-23 15:41:09 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-4

for you to fetch changes up to 367156d9a87b21b5232dd93107c5fc61b09ba2ef:

  NFS: Give "flavor" an initial value to fix a compile warning (2013-09-29 
16:03:34 -0400)


NFS client bugfixes for 3.12

- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root filesystem
- Fix a memory ordering issue in the pNFS files layout driver


Anna Schumaker (1):
  NFS: Give "flavor" an initial value to fix a compile warning

Trond Myklebust (3):
  NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem 
method
  NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
  NFSv4.1: Ensure memory ordering between nfs4_ds_connect and 
nfs4_fl_prepare_ds

Weston Andros Adamson (1):
  NFSv4.1: try SECINFO_NO_NAME flavs until one works

 fs/nfs/dir.c   |  2 +-
 fs/nfs/nfs4file.c  |  3 ++-
 fs/nfs/nfs4filelayoutdev.c | 20 +---
 fs/nfs/nfs4proc.c  | 58 +-
 include/linux/nfs_xdr.h|  3 ++-
 5 files changed, 63 insertions(+), 23 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

[GIT PULL] Please pull an NFS client bugfix

2013-09-21 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 272b98c6455f00884f0350f775c5342358ebb73f:

  Linux 3.12-rc1 (2013-09-16 16:17:51 -0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.12-3

for you to fetch changes up to a0f6ed8ebe4f6d494ef70f67d4c0c153cbf59577:

  RPCSEC_GSS: fix crash on destroying gss auth (2013-09-18 10:18:44 -0500)


NFS client bugfix for 3.12

- Fix a regression due to incorrect sharing of gss auth caches


J. Bruce Fields (1):
  RPCSEC_GSS: fix crash on destroying gss auth

 net/sunrpc/auth_gss/auth_gss.c | 11 +++
 1 file changed, 11 insertions(+)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v2] SUNRPC: check current nsproxy before set of node name on client creation

2012-09-07 Thread Myklebust, Trond

On Mon, 2012-08-13 at 08:10 -0400, Jeff Layton wrote:
> On Mon, 13 Aug 2012 15:37:31 +0400
> Stanislav Kinsbursky  wrote:
> 
> > v2:
> > 1) rpc_clnt_set_nodename() prototype updated.
> > 2) fixed errors in comment.
> > 
> > When child reaper exits, it can destroy mount namespace it belongs to, and 
> > if
> > there are NFS mounts inside, then it will try to umount them. But in this
> > point current->nsproxy is set to NULL and all namespaces will be destroyed 
> > one
> > by one. I.e. we can't dereference current->nsproxy to obtain uts namespace.
> > 
> > Signed-off-by: Stanislav Kinsbursky 
> > ---
> >  net/sunrpc/clnt.c |   16 +---
> >  1 files changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > index 9a9676e..8fbcbc8 100644
> > --- a/net/sunrpc/clnt.c
> > +++ b/net/sunrpc/clnt.c
> > @@ -277,8 +277,18 @@ void rpc_clients_notifier_unregister(void)
> > return rpc_pipefs_notifier_unregister(&rpc_clients_block);
> >  }
> >  
> > -static void rpc_clnt_set_nodename(struct rpc_clnt *clnt, const char 
> > *nodename)
> > +static void rpc_clnt_set_nodename(struct rpc_clnt *clnt)
> >  {
> > +   const char *nodename;
> > +
> > +   /*
> > +* We have to protect against dying child reaper, which has released
> > +* its nsproxy already and is trying to destroy mount namespace.
> > +*/
> > +   if (current->nsproxy == NULL)
> > +   return;
> > +
> > +   nodename = utsname()->nodename;
> > clnt->cl_nodelen = strlen(nodename);
> > if (clnt->cl_nodelen > UNX_MAXNODENAME)
> > clnt->cl_nodelen = UNX_MAXNODENAME;
> > @@ -365,7 +375,7 @@ static struct rpc_clnt * rpc_new_client(const struct 
> > rpc_create_args *args, stru
> > }
> >  
> > /* save the nodename */
> > -   rpc_clnt_set_nodename(clnt, utsname()->nodename);
> > +   rpc_clnt_set_nodename(clnt);
> > rpc_register_client(clnt);
> > return clnt;
> >  
> > @@ -524,7 +534,7 @@ rpc_clone_client(struct rpc_clnt *clnt)
> > err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
> > if (err != 0)
> > goto out_no_path;
> > -   rpc_clnt_set_nodename(new, utsname()->nodename);
> > +   rpc_clnt_set_nodename(new);
> > if (new->cl_auth)
> > atomic_inc(&new->cl_auth->au_count);
> > atomic_inc(&clnt->cl_count);
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Acked-by: Jeff Layton 

OK, colour me confused (again). Why should a umount trigger an
rpc_create() or rpc_clone_client()?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v2] SUNRPC: check current nsproxy before set of node name on client creation

2012-09-08 Thread Myklebust, Trond

On Sat, 2012-09-08 at 08:59 +0300, Stanislav Kinsbursky wrote:
> 08.09.2012 01:32, Myklebust, Trond пишет:
> > On Mon, 2012-08-13 at 08:10 -0400, Jeff Layton wrote:
> >> On Mon, 13 Aug 2012 15:37:31 +0400
> >> Stanislav Kinsbursky  wrote:
> >>
> >>> v2:
> >>> 1) rpc_clnt_set_nodename() prototype updated.
> >>> 2) fixed errors in comment.
> >>>
> >>> When child reaper exits, it can destroy mount namespace it belongs to, 
> >>> and if
> >>> there are NFS mounts inside, then it will try to umount them. But in this
> >>> point current->nsproxy is set to NULL and all namespaces will be 
> >>> destroyed one
> >>> by one. I.e. we can't dereference current->nsproxy to obtain uts 
> >>> namespace.
> >>>
> >>> Signed-off-by: Stanislav Kinsbursky 
> >>> ---
> >>>   net/sunrpc/clnt.c |   16 +---
> >>>   1 files changed, 13 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> >>> index 9a9676e..8fbcbc8 100644
> >>> --- a/net/sunrpc/clnt.c
> >>> +++ b/net/sunrpc/clnt.c
> >>> @@ -277,8 +277,18 @@ void rpc_clients_notifier_unregister(void)
> >>>   return rpc_pipefs_notifier_unregister(&rpc_clients_block);
> >>>   }
> >>>   
> >>> -static void rpc_clnt_set_nodename(struct rpc_clnt *clnt, const char 
> >>> *nodename)
> >>> +static void rpc_clnt_set_nodename(struct rpc_clnt *clnt)
> >>>   {
> >>> + const char *nodename;
> >>> +
> >>> + /*
> >>> +  * We have to protect against dying child reaper, which has released
> >>> +  * its nsproxy already and is trying to destroy mount namespace.
> >>> +  */
> >>> + if (current->nsproxy == NULL)
> >>> + return;
> >>> +
> >>> + nodename = utsname()->nodename;
> >>>   clnt->cl_nodelen = strlen(nodename);
> >>>   if (clnt->cl_nodelen > UNX_MAXNODENAME)
> >>>   clnt->cl_nodelen = UNX_MAXNODENAME;
> >>> @@ -365,7 +375,7 @@ static struct rpc_clnt * rpc_new_client(const struct 
> >>> rpc_create_args *args, stru
> >>>   }
> >>>   
> >>>   /* save the nodename */
> >>> - rpc_clnt_set_nodename(clnt, utsname()->nodename);
> >>> + rpc_clnt_set_nodename(clnt);
> >>>   rpc_register_client(clnt);
> >>>   return clnt;
> >>>   
> >>> @@ -524,7 +534,7 @@ rpc_clone_client(struct rpc_clnt *clnt)
> >>>   err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
> >>>   if (err != 0)
> >>>   goto out_no_path;
> >>> - rpc_clnt_set_nodename(new, utsname()->nodename);
> >>> + rpc_clnt_set_nodename(new);
> >>>   if (new->cl_auth)
> >>>   atomic_inc(&new->cl_auth->au_count);
> >>>   atomic_inc(&clnt->cl_count);
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>> the body of a message to majord...@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> Acked-by: Jeff Layton 
> > OK, colour me confused (again).
> 
> What color?
> 
> > Why should a umount trigger an
> > rpc_create() or rpc_clone_client()?
> 
> It calls nsm_create().
> Here is the trace (https://bugzilla.redhat.com/show_bug.cgi?id=830862, 
> comment 68):

Right, but if we're using NFSv3 lock monitoring, we know in advance that
we're going to need an nsm call to localhost. Why can't we just cache
the one that we used to start lock monitoring in the first place?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v2] SUNRPC: check current nsproxy before set of node name on client creation

2012-09-10 Thread Myklebust, Trond

On Mon, 2012-09-10 at 12:43 +0400, Stanislav Kinsbursky wrote:
> 08.09.2012 18:33, Myklebust, Trond пишет:
> > On Sat, 2012-09-08 at 08:59 +0300, Stanislav Kinsbursky wrote:
> >> 08.09.2012 01:32, Myklebust, Trond пишет:
> >>> On Mon, 2012-08-13 at 08:10 -0400, Jeff Layton wrote:
> >>>> On Mon, 13 Aug 2012 15:37:31 +0400
> >>>> Stanislav Kinsbursky  wrote:
> >>>>
> >>>>> v2:
> >>>>> 1) rpc_clnt_set_nodename() prototype updated.
> >>>>> 2) fixed errors in comment.
> >>>>>
> >>>>> When child reaper exits, it can destroy mount namespace it belongs to, 
> >>>>> and if
> >>>>> there are NFS mounts inside, then it will try to umount them. But in 
> >>>>> this
> >>>>> point current->nsproxy is set to NULL and all namespaces will be 
> >>>>> destroyed one
> >>>>> by one. I.e. we can't dereference current->nsproxy to obtain uts 
> >>>>> namespace.
> >>>>>
> >>>>> Signed-off-by: Stanislav Kinsbursky 
> >>>>> ---
> >>>>>net/sunrpc/clnt.c |   16 +---
> >>>>>1 files changed, 13 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> >>>>> index 9a9676e..8fbcbc8 100644
> >>>>> --- a/net/sunrpc/clnt.c
> >>>>> +++ b/net/sunrpc/clnt.c
> >>>>> @@ -277,8 +277,18 @@ void rpc_clients_notifier_unregister(void)
> >>>>> return rpc_pipefs_notifier_unregister(&rpc_clients_block);
> >>>>>}
> >>>>>
> >>>>> -static void rpc_clnt_set_nodename(struct rpc_clnt *clnt, const char 
> >>>>> *nodename)
> >>>>> +static void rpc_clnt_set_nodename(struct rpc_clnt *clnt)
> >>>>>{
> >>>>> +   const char *nodename;
> >>>>> +
> >>>>> +   /*
> >>>>> +* We have to protect against dying child reaper, which has 
> >>>>> released
> >>>>> +* its nsproxy already and is trying to destroy mount namespace.
> >>>>> +*/
> >>>>> +   if (current->nsproxy == NULL)
> >>>>> +   return;
> >>>>> +
> >>>>> +   nodename = utsname()->nodename;
> >>>>> clnt->cl_nodelen = strlen(nodename);
> >>>>> if (clnt->cl_nodelen > UNX_MAXNODENAME)
> >>>>> clnt->cl_nodelen = UNX_MAXNODENAME;
> >>>>> @@ -365,7 +375,7 @@ static struct rpc_clnt * rpc_new_client(const 
> >>>>> struct rpc_create_args *args, stru
> >>>>> }
> >>>>>
> >>>>> /* save the nodename */
> >>>>> -   rpc_clnt_set_nodename(clnt, utsname()->nodename);
> >>>>> +   rpc_clnt_set_nodename(clnt);
> >>>>> rpc_register_client(clnt);
> >>>>> return clnt;
> >>>>>
> >>>>> @@ -524,7 +534,7 @@ rpc_clone_client(struct rpc_clnt *clnt)
> >>>>> err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
> >>>>> if (err != 0)
> >>>>> goto out_no_path;
> >>>>> -   rpc_clnt_set_nodename(new, utsname()->nodename);
> >>>>> +   rpc_clnt_set_nodename(new);
> >>>>> if (new->cl_auth)
> >>>>> atomic_inc(&new->cl_auth->au_count);
> >>>>> atomic_inc(&clnt->cl_count);
> >>>>>
> >>>>> --
> >>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>>>> the body of a message to majord...@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>> Acked-by: Jeff Layton 
> >>> OK, colour me confused (again).
> >>
> >> What color?
> >>
> >>> Why should a umount trigger an
> >>> rpc_create() or rpc_clone_client()?
> >>
> >> It calls nsm_create().
> >> Here is the trace (https://bugzilla.redhat.com/show_bug.cgi?id=830862,
> >> comment 68):
> >
> > Right, but if we're using NFSv3 lock monitoring, we know in advance that
> > we're going to need an nsm call to localhost. Why can't we just cache
> > the one that we used to start lock monitoring in the first place?
> >
> 
> Do you suggest to cache the call or the client for the call?

Hi Stanislav,

Sorry, I agree that the above was unclear. My intention was to suggest
that we should cache a reference to the rpc client that we used to
connect to rpc.statd when initiating lock monitoring.

Basically, I'm suggesting that we should do something similar to the
rpcbind rpc_client caching scheme in net/sunrpc/rpcb_clnt.c.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v2] SUNRPC: check current nsproxy before set of node name on client creation

2012-09-10 Thread Myklebust, Trond

On Mon, 2012-09-10 at 19:37 +0400, Stanislav Kinsbursky wrote:
> Hi, Trond.
> So, if I understand you right, we can create rpc client (or increase usage 
> counter) on NSMPROC_MON call and destroy (or decrease usage counter) on 
> NSMPROC_UNMON call.
> Will this solution works?

The rpc client(s) will need to be per-net-namespace, which complicates
matters a little bit, but yes, creation at NSMPROC_MON, and destruction
at NSMPROC_UNMON should work.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [GIT PULL] Disintegrate UAPI for nfs

2012-10-09 Thread Myklebust, Trond

On Tue, 2012-10-09 at 14:30 +0100, David Howells wrote:
> Can you merge the following branch into the nfs tree please.
> 
> This is to complete part of the Userspace API (UAPI) disintegration for which
> the preparatory patches were pulled recently.  After these patches, userspace
> headers will be segregated into:
> 
>   include/uapi/linux/.../foo.h
> 
> for the userspace interface stuff, and:
> 
>   include/linux/.../foo.h
> 
> for the strictly kernel internal stuff.
> 
> ---
> The following changes since commit 9e2d8656f5e8aa214e66b462680cf86b210b74a8:
> 
>   Merge branch 'akpm' (Andrew's patch-bomb) (2012-10-09 16:23:15 +0900)
> 
> are available in the git repository at:
> 
> 
>   git://git.infradead.org/users/dhowells/linux-headers.git 
> tags/disintegrate-nfs-20121009
> 
> for you to fetch changes up to e3dd9a52cb5552c46c2a4ca7ccdfb4dab5c72457:
> 
>   UAPI: (Scripted) Disintegrate include/linux/sunrpc (2012-10-09 09:49:04 
> +0100)
> 
> 
> UAPI Disintegration 2012-10-09
> 
> 
> David Howells (2):
>   UAPI: (Scripted) Disintegrate include/linux/nfsd
>   UAPI: (Scripted) Disintegrate include/linux/sunrpc
> 
>  include/linux/nfsd/Kbuild   |   5 --
>  include/linux/nfsd/debug.h  |  31 +
>  include/linux/nfsd/export.h |  52 +--
>  include/linux/nfsd/nfsfh.h  | 111 +---
>  include/linux/nfsd/stats.h  |   8 +--
>  include/linux/sunrpc/Kbuild |   1 -
>  include/linux/sunrpc/debug.h|  39 +---
>  include/uapi/linux/nfsd/Kbuild  |   5 ++
>  include/{ => uapi}/linux/nfsd/cld.h |   0
>  include/uapi/linux/nfsd/debug.h |  40 
>  include/uapi/linux/nfsd/export.h|  58 +
>  include/uapi/linux/nfsd/nfsfh.h | 122 
> 
>  include/uapi/linux/nfsd/stats.h |  17 +
>  include/uapi/linux/sunrpc/Kbuild|   1 +
>  include/uapi/linux/sunrpc/debug.h   |  48 ++
>  15 files changed, 296 insertions(+), 242 deletions(-)
>  rename include/{ => uapi}/linux/nfsd/cld.h (100%)
>  create mode 100644 include/uapi/linux/nfsd/debug.h
>  create mode 100644 include/uapi/linux/nfsd/export.h
>  create mode 100644 include/uapi/linux/nfsd/nfsfh.h
>  create mode 100644 include/uapi/linux/nfsd/stats.h
>  create mode 100644 include/uapi/linux/sunrpc/debug.h

These files look like they are mainly NFS server related, so perhaps it
is better to merge through Bruce's tree.

Bruce?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-10-09 Thread Myklebust, Trond

On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote:
> Cc'ing Eric since I seem to recall he suggested doing it this way?
> 
> Seems OK to me, but maybe that swap_root should be in common code?  (Or
> maybe we could use set_fs_root()?)
> 
> I'm assuming it's up to Trond to take this.--b.

I'm reluctant to do that at this time since the original proposal was
precisely that of export set_fs_root() and using it around the AF_LOCAL
socket bind. That proposal was NACKed by Al Viro.

If Al is OK with the idea of us creating a private version of
set_fs_root, then I'd like to see an official Acked-by: that we can
append to this commit.

Cheers
  Trond

> On Mon, Oct 08, 2012 at 02:56:32PM +0400, Stanislav Kinsbursky wrote:
> > Today, there is a problem in connecting of local SUNRPC thansports. These
> > transports uses UNIX sockets and connection itself is done by rpciod
> > workqueue.
> > But all local transports are connecting in rpciod context. I.e. UNIX sockets
> > lookup is done in context of process file system root.
> > This works nice until we will try to mount NFS from process with other root 
> > -
> > for example in container. This container can have it's own (nested) root and
> > rcpbind process, listening on it's own unix sockets. But NFS mount attempt 
> > in
> > this container will register new service (Lockd for example) in global 
> > rpcbind
> > - not containers's one.
> > This patch solves the problem by switching rpciod kernel thread's file 
> > system
> > root to the right one (stored on transport) while connecting of local
> > transports.
> > 
> > Signed-off-by: Stanislav Kinsbursky 
> > ---
> >  net/sunrpc/xprtsock.c |   52 
> > +++--
> >  1 files changed, 50 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index dfb..ecbced1 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -37,6 +37,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #ifdef CONFIG_SUNRPC_BACKCHANNEL
> >  #include 
> >  #endif
> > @@ -255,6 +256,11 @@ struct sock_xprt {
> > void(*old_state_change)(struct sock *);
> > void(*old_write_space)(struct sock *);
> > void(*old_error_report)(struct sock *);
> > +
> > +   /*
> > +* Saved transport creator root. Required for local transports only.
> > +*/
> > +   struct path root;
> >  };
> >  
> >  /*
> > @@ -1876,6 +1882,30 @@ static int xs_local_finish_connecting(struct 
> > rpc_xprt *xprt,
> > return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
> >  }
> >  
> > +/*
> > + * __xs_local_swap_root - swap current root to tranport's root helper.
> > + * @new_root - path to set to current fs->root.
> > + * @old_root - optinal paoinet to save current root to
> > + *
> > + * This routine is requieed to connecting unix sockets to absolute path in
> > + * proper root environment.
> > + * Note: no path_get() will be called. I.e. caller have to hold reference 
> > to
> > + * new_root.
> > + */
> > +static void __xs_local_swap_root(struct path *new_root, struct path 
> > *old_root)
> > +{
> > +   struct fs_struct *fs = current->fs;
> > +
> > +   spin_lock(&fs->lock);
> > +   write_seqcount_begin(&fs->seq);
> > +   if (old_root)
> > +   *old_root = fs->root;
> > +   fs->root = *new_root;
> > +   write_seqcount_end(&fs->seq);
> > +   spin_unlock(&fs->lock);
> > +}
> > +
> > +
> >  /**
> >   * xs_local_setup_socket - create AF_LOCAL socket, connect to a local 
> > endpoint
> >   * @xprt: RPC transport to connect
> > @@ -1891,6 +1921,7 @@ static void xs_local_setup_socket(struct work_struct 
> > *work)
> > struct rpc_xprt *xprt = &transport->xprt;
> > struct socket *sock;
> > int status = -EIO;
> > +   struct path root;
> >  
> > current->flags |= PF_FSTRANS;
> >  
> > @@ -1907,7 +1938,10 @@ static void xs_local_setup_socket(struct work_struct 
> > *work)
> > dprintk("RPC:   worker connecting xprt %p via AF_LOCAL to %s\n",
> > xprt, xprt->address_strings[RPC_DISPLAY_ADDR]);
> >  
> > +   __xs_local_swap_root(&transport->root, &root);
> > status = xs_local_finish_connecting(xprt, sock);
> > +   __xs_local_swap_root(&root, NULL);
> > +
> > switch (status) {
> > case 0:
> > dprintk("RPC:   xprt %p connected to %s\n",
> > @@ -2256,6 +2290,18 @@ static void xs_connect(struct rpc_task *task)
> > }
> >  }
> >  
> > +static void xs_local_destroy(struct rpc_xprt *xprt)
> > +{
> > +   struct sock_xprt *transport = container_of(xprt, struct sock_xprt, 
> > xprt);
> > +   struct path root = transport->root;
> > +
> > +   dprintk("RPC:   xs_local_destroy xprt %p\n", xprt);
> > +
> > +   xs_destroy(xprt);
> > +
> > +   path_put(&root);
> > +}
> > +
> >  /**
> >   * xs_local_print_stats - display AF_LOCAL socket-specifc stats
> >   * @xprt: rpc_xprt struct containing statis

[GIT PULL] Please pull NFS client updates for 3.7

2012-10-10 Thread Myklebust, Trond

Hi Linus,

The following changes since commit c46de2263f42fb4bbde411b9126f471e9343cb22:

  Merge branch 'for-linus' of git://git.kernel.dk/linux-block (2012-09-19 
11:04:34 -0700)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.7-1

for you to fetch changes up to af283885b70248268617955a5ea5476647bd556b:

  pnfsblock: cleanup nfs4_blkdev_get (2012-10-08 19:32:40 -0400)


NFS client updates for Linux 3.7

Features include:

- Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
  Aside from the issues discussed at the LKS, distros are shipping
  NFSv4.1 with all the trimmings.
- Fix fdatasync()/fsync() for the corner case of a server reboot.
- NFSv4 OPEN access fix: finally distinguish correctly between
  open-for-read and open-for-execute permissions in all situations.
- Ensure that the TCP socket is closed when we're in CLOSE_WAIT
- More idmapper bugfixes
- Lots of pNFS bugfixes and cleanups to remove unnecessary state and
  make the code easier to read.
- In cases where a pNFS read or write fails, allow the client to
  resume trying layoutgets after two minutes of read/write-through-mds.
- More net namespace fixes to the NFSv4 callback code.
- More net namespace fixes to the NFSv3 locking code.
- More NFSv4 migration preparatory patches.
  Including patches to detect network trunking in both NFSv4 and NFSv4.1
- pNFS block updates to optimise LAYOUTGET calls.


Andy Adamson (3):
  NFSv4.0 reclaim reboot state when re-establishing clientid
  NFSv4 reduce attribute requests for open reclaim
  NFSv4 set open access operation call flag in nfs4_init_opendata_res

Bryan Schumaker (5):
  SUNRPC: Set alloc_slot for backchannel tcp ops
  NFS: Always use the open stateid when checking for expired opens
  NFS: Remove bad delegations during open recovery
  NFS: Use kzalloc() instead of kmalloc() in the idmapper
  NFS: Set key construction data for the legacy upcall

Chuck Lever (11):
  NFS: nfs_parsed_mount_options can use unsigned int
  NFS: Slow down state manager after an unhandled error
  SUNRPC: Clean up dprintk messages in rpc_pipe.c
  SUNRPC: Use __func__ in dprintk() in auth_gss.c
  SUNRPC: Refactor rpc_clone_client()
  SUNRPC: Introduce rpc_clone_client_set_auth()
  NFS: Introduce "migration" mount option
  NFS: Use the same nfs_client_id4 for every server
  NFS: Discover NFSv4 server trunking when mounting
  NFS: Add nfs4_unique_id boot parameter
  NFS: nfs41_walk_client_list(): re-lock before iterating

Daniel Walter (1):
  nfs: replace strict_strto* with kstrto*

NeilBrown (1):
  NFS4: avoid underflow when converting error to pointer.

Peng Tao (10):
  NFSv41: fix DIO write_io calculation
  NFS41: fix error of setting blocklayoutdriver
  Revert "pnfsblock: bail out partial page IO"
  pnfsblock: fix partial page buffer wirte
  pnfsblock: fix non-aligned DIO read
  pnfsblock: fix non-aligned DIO write
  NFS: track direct IO left bytes
  NFS41: send real write size in layoutget
  NFS41: send real read size in layoutget
  pnfsblock: cleanup nfs4_blkdev_get

Stanislav Kinsbursky (18):
  NFS: pass net to nfs_callback_down()
  NFS: callback service creation function introduced
  NFS: move per-net callback thread initialization to nfs_callback_up_net()
  NFS: callback up - transport backchannel cleanup
  NFS: callback service start function introduced
  NFS: callback up - users counting cleanup
  NFS: make nfs_callback_tcpport per network context
  NFS: make nfs_callback_tcpport6 per network context
  NFS: callback per-net usage counting introduced
  NFS: add debug messages to callback down function
  lockd: per-net NSM client creation and destruction helpers introduced
  lockd: use rpc client's cl_nodename for id encoding
  lockd: create and use per-net NSM RPC clients on MON/UNMON requests
  nfs: include NFSv4 header in netns.h
  nfs: declare nfs_callback_tcp_port in header
  nfs: declare nfs_xdev_mount as static
  nfs: include nfs4_fh.h in nfs4sysctl.c
  nfs: include internal.h in getroot.h

Trond Myklebust (58):
  SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT
  SUNRPC: Fix the return value of xdr_align_pages()
  NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value
  NFSv4: Remove BUG_ON() and ACCESS_ONCE() calls in the idmapper
  NFSv4: Clean up the legacy idmapper upcall
  NFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall data
  SUNRPC: Optimise away unnecessary data moves in xdr_align_pages
  NFS: Convert nfs_get_lock_context to return an ERR_PTR on failure
  NFS: Clean up helper function nfs4_select_rw_stateid()
  NFSv4: Convert the

RE: [PATCH 041/270] pnfsblock: fix partial page buffer wirte

2012-11-30 Thread Myklebust, Trond

> -Original Message-
> From: Greg Kroah-Hartman [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, November 29, 2012 8:46 PM
> To: Ben Hutchings
> Cc: Peng Tao; Myklebust, Trond; linux-kernel@vger.kernel.org;
> sta...@vger.kernel.org; kernel-t...@lists.ubuntu.com; Herton Ronaldo
> Krzesinski
> Subject: Re: [PATCH 041/270] pnfsblock: fix partial page buffer wirte
> 
> On Tue, Nov 27, 2012 at 02:33:24AM +, Ben Hutchings wrote:
> > On Mon, 2012-11-26 at 14:55 -0200, Herton Ronaldo Krzesinski wrote:
> > > 3.5.7u1 -stable review patch.  If anyone has any objections, please let me
> know.
> > >
> > > --
> > >
> > > From: Peng Tao 
> > >
> > > commit fe6e1e8d9fad86873eb74a26e80a8f91f9e870b5 upstream.
> > >
> > > If applications use flock to protect its write range, generic NFS
> > > will not do read-modify-write cycle at page cache level. Therefore
> > > LD should know how to handle non-sector aligned writes. Otherwise
> > > there will be data corruption.
> > >
> > > Signed-off-by: Peng Tao 
> > > Signed-off-by: Trond Myklebust 
> > > Signed-off-by: Herton Ronaldo Krzesinski
> > > 
> > [...]
> >
> > I notice that this fix is missing from 3.4, and will need backporting.
> 
> I don't trust myself with backporting this, as I got it wrong.  So if one of 
> the
> NFS developers wants to do this (same goes for the other NFS patch), I'll
> gladly take it.
> 

Tao would be the right person to do this since he has access to pNFS blocks 
hardware and can test the results.

Cheers
  Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-30 Thread Myklebust, Trond

On Fri, 2012-11-30 at 02:00 +, Al Viro wrote:
> On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
> > >   Very interesting.  Do you have anything mounted on the corresponding
> > > directories on server?  The picture looks like you are getting empty
> > > fhandles in readdir+ respons for exactly the same directories that happen
> > > to be mountpoints on client.  In any case, we shouldn't do that blind
> > > d_drop() - empty fhandles can happen.  The only remaining question is
> > > why do they happen on that set of entries.  From my reading of
> > > encode_entryplus_baggage() it looks like we have compose_entry_fh()
> > > failing for those entries and those entries alone.  One possible cause
> > > would be d_mountpoint(dchild) being true on server.  If it is true, we
> > > can declare the case closed; if not, I really wonder what's going on.
> > 
> > Those directories do have the server's own copies of the said directories 
> > bind mounted at the moment in a separate mount namespace.
> > 
> > Unmounting those directories on the server does appear to stop the WARN_ON 
> > from triggering.
> 
> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?

The reason for the choice of d_drop over d_invalidate() is the d_count
checks. It really doesn't matter whether or not the client thinks it has
users for a directory if the server is telling you that it is ESTALE. So
we force a d_drop to prevent further lookups from finding it.

IOW: It is there in order to fix the case where the user does
'rmdir("foo"); mkdir("foo")' on the server.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: [PATCH 20/22] nfs: add support for read_iter, write_iter

2012-10-22 Thread Myklebust, Trond

> -Original Message-
> From: Dave Kleikamp [mailto:dave.kleik...@oracle.com]
> Sent: Monday, October 22, 2012 11:15 AM
> To: linux-fsde...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; Zach Brown; Maxim V. Patlasov; Dave
> Kleikamp; Myklebust, Trond; linux-...@vger.kernel.org
> Subject: [PATCH 20/22] nfs: add support for read_iter, write_iter
> 
> This patch implements the read_iter and write_iter file operations which
> allow kernel code to initiate directIO. This allows the loop device to read 
> and
> write directly to the server, bypassing the page cache.
> 
> Signed-off-by: Dave Kleikamp 
> Cc: Zach Brown 
> Cc: Trond Myklebust 
> Cc: linux-...@vger.kernel.org
> ---
>  fs/nfs/direct.c| 169 +---
> -
>  fs/nfs/file.c  |  48 ++
>  fs/nfs/internal.h  |   2 +
>  fs/nfs/nfs4file.c  |   2 +
>  include/linux/nfs_fs.h |   6 +-
>  5 files changed, 155 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 4532781..b1fda1c 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -90,6 +90,7 @@ struct nfs_direct_req {
>   int flags;
>  #define NFS_ODIRECT_DO_COMMIT(1) /* an unstable reply
> was received */
>  #define NFS_ODIRECT_RESCHED_WRITES   (2) /* write verification
> failed */
> +#define NFS_ODIRECT_MARK_DIRTY   (4) /* mark read pages
> dirty */
>   struct nfs_writeverfverf;   /* unstable write verifier */
>  };
> 
> @@ -131,15 +132,13 @@ ssize_t nfs_direct_IO(int rw, struct kiocb *iocb,
> struct iov_iter *iter,
> 
>   return -EINVAL;
>  #else
> - const struct iovec *iov = iov_iter_iovec(iter);
> -
>   VM_BUG_ON(iocb->ki_left != PAGE_SIZE);
>   VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
> 
>   if (rw == READ || rw == KERNEL_READ)
> - return nfs_file_direct_read(iocb, iov, iter->nr_segs, pos,
> + return nfs_file_direct_read(iocb, iter, pos,
>   rw == READ ? true : false);
> - return nfs_file_direct_write(iocb, iov, iter->nr_segs, pos,
> + return nfs_file_direct_write(iocb, iter, pos,
>   rw == WRITE ? true : false);
>  #endif /* CONFIG_NFS_SWAP */
>  }
> @@ -277,7 +276,8 @@ static void nfs_direct_read_completion(struct
> nfs_pgio_header *hdr)
>   hdr->good_bytes & ~PAGE_MASK,
>   PAGE_SIZE);
>   }
> - if (!PageCompound(page)) {
> + if ((dreq->flags & NFS_ODIRECT_MARK_DIRTY) &&
> + !PageCompound(page)) {
>   if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
>   if (bytes < hdr->good_bytes)
>   set_page_dirty(page);
> @@ -414,10 +414,9 @@ static ssize_t
> nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
>   return result < 0 ? (ssize_t) result : -EFAULT;  }
> 
> -static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
> -   const struct iovec *iov,
> -   unsigned long nr_segs,
> -   loff_t pos, bool uio)
> +static ssize_t nfs_direct_read_schedule(struct nfs_direct_req *dreq,
> + struct iov_iter *iter, loff_t pos,
> + bool uio)
>  {
>   struct nfs_pageio_descriptor desc;
>   ssize_t result = -EINVAL;
> @@ -429,16 +428,47 @@ static ssize_t
> nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
>   get_dreq(dreq);
>   desc.pg_dreq = dreq;
> 
> - for (seg = 0; seg < nr_segs; seg++) {
> - const struct iovec *vec = &iov[seg];
> - result = nfs_direct_read_schedule_segment(&desc, vec,
> pos, uio);
> - if (result < 0)
> - break;
> - requested_bytes += result;
> - if ((size_t)result < vec->iov_len)
> - break;
> - pos += vec->iov_len;
> - }
> + if (iov_iter_has_iovec(iter)) {
> + const struct iovec *iov = iov_iter_iovec(iter);
> + if (uio)
> + dreq->flags = NFS_ODIRECT_MARK_DIRTY;
> + for (seg = 0; seg < iter->nr_segs; seg++) {
> + const struct iovec *vec = &iov[seg];
> + result = nfs_direct_read_schedule_segment(&desc,
>

Re: [ 02/37] lockd: use rpc clients cl_nodename for id encoding

2012-10-22 Thread Myklebust, Trond

On Sun, 2012-10-21 at 09:26 -0700, Greg Kroah-Hartman wrote:
> On Sat, Oct 20, 2012 at 12:15:18AM +0100, Ben Hutchings wrote:
> > On Thu, Oct 18, 2012 at 08:16:25PM -0700, Greg Kroah-Hartman wrote:
> > > 3.0-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > > 
> > > --
> > > 
> > > From: Stanislav Kinsbursky 
> > > 
> > > commit 303a7ce92064c285a04c870f2dc0192fdb2968cb upstream.
> > > 
> > > Taking hostname from uts namespace if not safe, because this cuold be
> > > performind during umount operation on child reaper death. And in this case
> > > current->nsproxy is NULL already.
> >  
> > In this case (3.0.y) you haven't included the following change
> > (commit cb7323fffa85 'lockd: create and use per-net NSM RPC clients on
> > MON/UNMON requests') that makes lockd actually use cl_nodename.  I
> > think this patch alone won't fix the bug, as nsm_args::nodename can
> > end up pointing to freed memory.
> > 
> > (I also wonder whether clients should really be per-net or per UTS
> > namespace, and whether those should be orthogonal namespaces at all.)
> 
> Hm, Trond, should I also include the other commit above in the next
> 3.0-stable release?
> 
> Or should this one be dropped?

Hi Greg,

Applying this patch shouldn't be harmful, but since it isn't actually
fixing a problem (there being no net-namespace code in Linux-3.0), I'd
suggest just dropping it.

Thanks!
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client bugfixes...

2012-10-22 Thread Myklebust, Trond

Hi Linus,

The following changes since commit ddffeb8c4d0331609ef2581d84de4d763607bd37:

  Linux 3.7-rc1 (2012-10-14 14:41:04 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.7-2

for you to fetch changes up to e9b7e91745fa9df94900c8ab08e633f336686098:

  NFSv4: Fix the return value for nfs_callback_start_svc (2012-10-16 13:14:42 
-0400)


NFS client bugfixes for Linux 3.7

- Do not call pnfs_return_layout() from an rpciod context
- nfs4_ds_disconnect can cause Oopses. Kill it...
- Fix the return value for nfs_callback_start_svc
- Fix a number of compile warnings


Trond Myklebust (6):
  NFSv4.1: Kill nfs4_ds_disconnect()
  NFSv4.1: Do not call pnfs_return_layout() from an rpciod context
  NFSv4.1: Use kcalloc() to allocate zeroed arrays instead of kzalloc()
  NFSv4: fs/nfs/nfs4getroot.c needs to include "internal.h"
  NFSv4.1: Declare osd_pri_2_pnfs_err(), objio_init_read/write to be static
  NFSv4: Fix the return value for nfs_callback_start_svc

 fs/nfs/callback.c|  2 +-
 fs/nfs/nfs4filelayout.c  | 21 -
 fs/nfs/nfs4filelayout.h  |  1 -
 fs/nfs/nfs4filelayoutdev.c   | 22 --
 fs/nfs/nfs4getroot.c |  1 +
 fs/nfs/objlayout/objio_osd.c |  6 +++---
 fs/nfs/pnfs.h|  1 +
 7 files changed, 22 insertions(+), 32 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

RE: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs

2012-12-07 Thread Myklebust, Trond

> -Original Message-
> From: linux-nfs-ow...@vger.kernel.org [mailto:linux-nfs-
> ow...@vger.kernel.org] On Behalf Of Pavel Shilovsky
> Sent: Friday, December 07, 2012 9:43 PM
> To: Christoph Hellwig
> Cc: linux-c...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> fsde...@vger.kernel.org; wine-de...@winehq.org; linux-
> n...@vger.kernel.org
> Subject: Re: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs
> 
> Christoph Hellwig писал 07.12.2012 20:16:
> > On Thu, Dec 06, 2012 at 10:26:28PM +0400, Pavel Shilovsky wrote:
> >> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags -
> >> this change can benefit cifs and nfs modules. While this change is ok
> >> for network filesystems, itsn't not targeted for local filesystems
> >> due security problems (e.g. when a user process can deny root to
> >> delete a file).
> >>
> >> Share flags are used by Windows applications and WINE have to deal
> >> with them too. While WINE can process open share flags itself on
> >> local filesystems, it can't do it if a file stored on a network share
> >> and is used by several clients. This patchset makes it possible for
> >> CIFS/SMB2.0/SMB3.0.
> >
> > I don't think introducing user visible flags that are only supported
> > on a single network filesystem is a good idea.
> 
> It can bring benefits for both CIFS and NFS filesystems - so, at least two 
> ones.
> 
> >
> > I'm not even sure adding these flags does make a lot of sense, but
> > assuming we'd actually want this (and I'd like some more detailed
> > explanation) I think we'd at least need to make sure that:
> >
> >  a) opening files with the new modes gives a proper error message if
> > not
> > supported
> 
> It makes us add such checks for all other filesystems, if I understand right, 
> -
> not a problem, I think.
> 
> >  b) there needs to be local support for them as well
> >  c) we need to think really hard when they should be supported, and
> > need
> > a good rational for it.  I can't see how we could do it
> > unconditionally for all users as that would introduce easy denial
> > of services attacks the way I understand the semantics (correct me
> > if I'm wrong).  So a mount option like you currently do probably
> > is
> > the least bad even if don't fell overly happy about that version.
> >
> > What is the reason your special wine use case can't simply use a
> > userspace cifs client?  Given that wine uses windows filesystem
> > semantics and cifs does as well tunnelling it through a Posix-like API
> > inbetween is never going to be perfect.
> 
> Ideally we should not make any difference between underlying filesystems
> in Wine: an application requests an open of the file and we issue this open
> with flags it passed. Since Wineserver can process share flags locally itself 
> (for
> one linux user), we only need to add this support for CIFS (that is actively
> used by Wine applications because of it's Windows nature). Bringing these
> flags for local filesystems can benefit Wine too: it will help in cases when
> Wine applications of different users on the same machine use the same file
> and can make all those things easier, of course.
> 
> The problem is the possibility of denial-of-service attacks here. We can try 
> to
> prevent them by:
> 1) specifying an extra security bit on the file that indicates that share 
> flags are
> accepted (like we have for mandatory locks now) and setting it for
> neccessary files only, or
> 2) adding a special mount option (but it it probably makes sense if we
> decided to add this support for CIFS and NFS only).

Why not just put it under the control of LSM? It seems to me that this doesn't 
so much want to be a per-mount switch but rather deserves to be a per-process 
MAC (i.e. is this running in a Wine sandbox or not)...

Cheers
   Trond

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-11-14 Thread Myklebust, Trond

On Wed, 2012-11-14 at 16:01 -0500, J. Bruce Fields wrote:
> On Mon, Nov 12, 2012 at 12:37:54PM +0400, Stanislav Kinsbursky wrote:
> > 07.11.2012 22:33, J. Bruce Fields пишет:
> > >On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote:
> > >>On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote:
> > >>>On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote:
> > So you're worried that a bug in the nfs code could modify the root and
> > then not restore it?
> > >>>
> > >>>At least the link you pointed to earlier never sets it back.
> > >>
> > >>This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687
> > >>
> > >>  +   get_fs_root(current->fs, &root);
> > >>  +   set_fs_root(current->fs, &transport->root);
> > >>  +
> > >>  status = xs_local_finish_connecting(xprt, sock);
> > >>  +
> > >>  +   set_fs_root(current->fs, &root);
> > >>  +   path_put(&root);
> > >>
> > >>>Instead
> > >>>of messing with it I'd rather have the sunrpc code use vfs_path_lookup
> > >>>and not care about current->fs->root at all.
> > >>
> > >>The annoyance is that the lookup happens somewhere lower down in the
> > >>networking code (net/unix/af_unix.c:unix_find_other, I think).  So we'd
> > >>need some new (internal) API.  We'd likely be the only user of that new
> > >>API.
> > >
> > >So, if the only drawback is really just the risk of introducing a bug
> > >that leaves the fs_root changed--the above seems simple enough for that
> > >not to be a great risk, right?
> > >
> > 
> > If we unshare rpciod fs struct (which is exported already), then we
> 
> I'm not sure what you mean by that.  Do workqueues actually have their
> own dedicated set of associated tasks?  I thought all workqueues shared
> a common pool of tasks these days.
> 
> > won't affect other kthreads by root swapping.
> > But would be great to hear Trond's opinion about this approach.
> > 
> > Trond, could you tell us your feeling about all this?
> 
> I think it's often easier to get people to comment on an actual patch,
> and this one would be quite short, so try that

unshare() would break expectations for other users of workqueue threads
unless you "reshare()" afterwards. Either way that's going to be
seriously ugly.

OK, let's look at this again. Do we ever use AF_LOCAL connections for
anything other than synchronous rpc calls to the local rpcbind daemon in
order to register/unregister new services? If not, then let's just move
the AF_LOCAL connection back into the process context and out of rpciod.

That implies that the process needs to be privileged, but it needs
privileges in order to start RPC daemons anyway.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-11-14 Thread Myklebust, Trond

On Wed, 2012-11-14 at 16:42 -0500, J. Bruce Fields wrote:
> On Wed, Nov 14, 2012 at 09:36:33PM +0000, Myklebust, Trond wrote:
> > On Wed, 2012-11-14 at 16:01 -0500, J. Bruce Fields wrote:
> > > On Mon, Nov 12, 2012 at 12:37:54PM +0400, Stanislav Kinsbursky wrote:
> > > > 07.11.2012 22:33, J. Bruce Fields пишет:
> > > > >On Tue, Nov 06, 2012 at 08:36:05AM -0500, J. Bruce Fields wrote:
> > > > >>On Tue, Nov 06, 2012 at 08:10:18AM -0500, Christoph Hellwig wrote:
> > > > >>>On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote:
> > > > >>>>So you're worried that a bug in the nfs code could modify the root 
> > > > >>>>and
> > > > >>>>then not restore it?
> > > > >>>
> > > > >>>At least the link you pointed to earlier never sets it back.
> > > > >>
> > > > >>This? http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687
> > > > >>
> > > > >>  +   get_fs_root(current->fs, &root);
> > > > >>  +   set_fs_root(current->fs, &transport->root);
> > > > >>  +
> > > > >>  status = xs_local_finish_connecting(xprt, sock);
> > > > >>  +
> > > > >>  +   set_fs_root(current->fs, &root);
> > > > >>  +   path_put(&root);
> > > > >>
> > > > >>>Instead
> > > > >>>of messing with it I'd rather have the sunrpc code use 
> > > > >>>vfs_path_lookup
> > > > >>>and not care about current->fs->root at all.
> > > > >>
> > > > >>The annoyance is that the lookup happens somewhere lower down in the
> > > > >>networking code (net/unix/af_unix.c:unix_find_other, I think).  So 
> > > > >>we'd
> > > > >>need some new (internal) API.  We'd likely be the only user of that 
> > > > >>new
> > > > >>API.
> > > > >
> > > > >So, if the only drawback is really just the risk of introducing a bug
> > > > >that leaves the fs_root changed--the above seems simple enough for that
> > > > >not to be a great risk, right?
> > > > >
> > > > 
> > > > If we unshare rpciod fs struct (which is exported already), then we
> > > 
> > > I'm not sure what you mean by that.  Do workqueues actually have their
> > > own dedicated set of associated tasks?  I thought all workqueues shared
> > > a common pool of tasks these days.
> > > 
> > > > won't affect other kthreads by root swapping.
> > > > But would be great to hear Trond's opinion about this approach.
> > > > 
> > > > Trond, could you tell us your feeling about all this?
> > > 
> > > I think it's often easier to get people to comment on an actual patch,
> > > and this one would be quite short, so try that
> > 
> > unshare() would break expectations for other users of workqueue threads
> > unless you "reshare()" afterwards. Either way that's going to be
> > seriously ugly.
> > 
> > OK, let's look at this again. Do we ever use AF_LOCAL connections for
> > anything other than synchronous rpc calls to the local rpcbind daemon in
> > order to register/unregister new services?
> 
> Simo's patches use them for upcalls to svcgssd.  Those will always be
> done from server threads.

Any reason why you can't set that up when you start nfsd?

> > If not, then let's just move
> > the AF_LOCAL connection back into the process context and out of rpciod.
> 
> Remind me how this helps?

rpciod shares the 'init' process net namespace and chroot properties.
If, however you call bind() from the (containerised) process that was
used to start nfsd, then you will be using filesystem root (and net
namespace) of that container.

> --b.
> 
> > 
> > That implies that the process needs to be privileged, but it needs
> > privileges in order to start RPC daemons anyway.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-11-15 Thread Myklebust, Trond

On Wed, 2012-11-14 at 22:14 -0800, Eric W. Biederman wrote:
> "J. Bruce Fields"  writes:
> 
> > On Wed, Nov 14, 2012 at 09:51:33PM +, Myklebust, Trond wrote:
> >> On Wed, 2012-11-14 at 16:42 -0500, J. Bruce Fields wrote:
> >> > Simo's patches use them for upcalls to svcgssd.  Those will always be
> >> > done from server threads.
> >> 
> >> Any reason why you can't set that up when you start nfsd?
> >
> > Oh, right, I was thinking of the upcalls themselves--right, the connect
> > we should be able to do on server start, I agree.
> >
> >> 
> >> > > If not, then let's just move
> >> > > the AF_LOCAL connection back into the process context and out of 
> >> > > rpciod.
> >> > 
> >> > Remind me how this helps?
> >> 
> >> rpciod shares the 'init' process net namespace and chroot properties.
> >> If, however you call bind() from the (containerised) process that was
> >> used to start nfsd, then you will be using filesystem root (and net
> >> namespace) of that container.
> >
> > Got it.
> 
> If you can move the connect and bind into the server start that does
> sound like a very good and maintainable solution.  I suspect it might
> even be a smidge better for error handling.
> 
> Is there ever a reason to reconnect one of these sockets?

Not for the rpcbind case, however you can easily get into a situation
where the user restarts the gss daemon. The good news is that the gss
upcall code that uses AF_LOCAL hasn't been merged upstream yet, so that
particular interface is not yet locked in stone.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: nfsd oops on Linus' current tree.

2013-01-03 Thread Myklebust, Trond

On Thu, 2013-01-03 at 17:08 -0500, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jan 03, 2013 at 03:11:20PM -0500, J. Bruce Fields wrote:
> > Both rpciod and nfsiod already set WQ_MEM_RECLAIM.
> > 
> > But, right, looking at kernel/workqueue.c, it seems that the dedicated
> > "rescuer" threads are invoked only in the case when work is stalled
> > because a new worker thread isn't allocated quickly enough.
> 
> Because that's the *only* case where progress can't be guaranteed
> otherwise.
> 
> > So, what to do that's simplest enough that it would work for
> > post-rc2/stable?  I was happy having just a simple dedicated
> > thread--these are only started when nfsd is, so there's no real thread
> > proliferation problem.
> 
> The analysis is likely completely wrong, so please don't go off doing
> something unnecessary.  Please take look at what's causing the
> deadlocks again.

The analysis is a no-brainer:
We see a deadlock due to one work item waiting for completion of another
work item that is queued on the same CPU. There is no other dependency
between the two work items.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: nfsd oops on Linus' current tree.

2013-01-03 Thread Myklebust, Trond

On Thu, 2013-01-03 at 17:26 -0500, Tejun Heo wrote:
> Hello, Trond.
> 
> On Thu, Jan 03, 2013 at 10:12:32PM +, Myklebust, Trond wrote:
> > > The analysis is likely completely wrong, so please don't go off doing
> > > something unnecessary.  Please take look at what's causing the
> > > deadlocks again.
> > 
> > The analysis is a no-brainer:
> > We see a deadlock due to one work item waiting for completion of another
> > work item that is queued on the same CPU. There is no other dependency
> > between the two work items.
> 
> What do you mean "waiting for completion of"?  Is one flushing the
> other?  Or is one waiting for the other to take certain action?  How
> are the two work items related?  Are they queued on the same
> workqueue?  Can you create a minimal repro case of the observed
> deadlock?

The two work items are running on different work queues. One is running
on the nfsiod work queue, and is waiting for the other to complete on
the rpciod work queue. Basically, the nfsiod work item is trying to shut
down an RPC session, and it is waiting for each outstanding RPC call to
finish running a clean-up routine.

We can't call flush_work(), since we don't have a way to pin the
work_struct for any long period of time, so we queue all the RPC calls
up, then sleep on a global wait queue for 1 second or until the last RPC
call wakes us up (see rpc_shutdown_client()).

In the deadlock scenario, it looks as if one (or more) of the RPC calls
are getting queued on the same CPU (but on the rpciod workqueue) as the
shutdown process (running on nfsiod).


> Ooh, BTW, there was a bug where workqueue code created a false
> dependency between two work items.  Workqueue currently considers two
> work items to be the same if they're on the same address and won't
> execute them concurrently - ie. it makes a work item which is queued
> again while being executed wait for the previous execution to
> complete.
> 
> If a work function frees the work item, and then waits for an event
> which should be performed by another work item and *that* work item
> recycles the freed work item, it can create a false dependency loop.
> There really is no reliable way to detect this short of verifying
> every memory free.  A patch is queued to make such occurrences less
> likely (work functions should also match for two work items considered
> the same), but if you're seeing this, the best thing to do is freeing
> the work item at the end of the work function.

That's interesting... I wonder if we may have been hitting that issue.

>From what I can see, we do actually free the write RPC task (and hence
the work_struct) before we call the asynchronous unlink completion...

Dros, can you see if reverting commit
324d003b0cd82151adbaecefef57b73f7959a469 + commit 
168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9 and then applying the attached
patch also fixes the hang on a pristine 3.7.x kernel?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index b6bdb18..400f7ec 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -91,12 +91,13 @@ void nfs_readdata_release(struct nfs_read_data *rdata)
 	put_nfs_open_context(rdata->args.context);
 	if (rdata->pages.pagevec != rdata->pages.page_array)
 		kfree(rdata->pages.pagevec);
-	if (rdata != &read_header->rpc_data)
-		kfree(rdata);
-	else
+	if (rdata == &read_header->rpc_data) {
 		rdata->header = NULL;
+		rdata = NULL;
+	}
 	if (atomic_dec_and_test(&hdr->refcnt))
 		hdr->completion_ops->completion(hdr);
+	kfree(rdata);
 }
 EXPORT_SYMBOL_GPL(nfs_readdata_release);
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b673be3..45d9250 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -126,12 +126,13 @@ void nfs_writedata_release(struct nfs_write_data *wdata)
 	put_nfs_open_context(wdata->args.context);
 	if (wdata->pages.pagevec != wdata->pages.page_array)
 		kfree(wdata->pages.pagevec);
-	if (wdata != &write_header->rpc_data)
-		kfree(wdata);
-	else
+	if (wdata == &write_header->rpc_data) {
 		wdata->header = NULL;
+		wdata = NULL;
+	}
 	if (atomic_dec_and_test(&hdr->refcnt))
 		hdr->completion_ops->completion(hdr);
+	kfree(wdata);
 }
 EXPORT_SYMBOL_GPL(nfs_writedata_release);

Re: nfsd oops on Linus' current tree.

2013-01-03 Thread Myklebust, Trond

On Thu, 2013-01-03 at 18:11 -0500, Trond Myklebust wrote:
> On Thu, 2013-01-03 at 17:26 -0500, Tejun Heo wrote:
> > Ooh, BTW, there was a bug where workqueue code created a false
> > dependency between two work items.  Workqueue currently considers two
> > work items to be the same if they're on the same address and won't
> > execute them concurrently - ie. it makes a work item which is queued
> > again while being executed wait for the previous execution to
> > complete.
> > 
> > If a work function frees the work item, and then waits for an event
> > which should be performed by another work item and *that* work item
> > recycles the freed work item, it can create a false dependency loop.
> > There really is no reliable way to detect this short of verifying
> > every memory free.  A patch is queued to make such occurrences less
> > likely (work functions should also match for two work items considered
> > the same), but if you're seeing this, the best thing to do is freeing
> > the work item at the end of the work function.
> 
> That's interesting... I wonder if we may have been hitting that issue.
> 
> From what I can see, we do actually free the write RPC task (and hence
> the work_struct) before we call the asynchronous unlink completion...
> 
> Dros, can you see if reverting commit
> 324d003b0cd82151adbaecefef57b73f7959a469 + commit 
> 168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9 and then applying the attached
> patch also fixes the hang on a pristine 3.7.x kernel?

Actually, we probably also need to look at rpc_free_task, so the
following patch, instead...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index b6bdb18..400f7ec 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -91,12 +91,13 @@ void nfs_readdata_release(struct nfs_read_data *rdata)
 	put_nfs_open_context(rdata->args.context);
 	if (rdata->pages.pagevec != rdata->pages.page_array)
 		kfree(rdata->pages.pagevec);
-	if (rdata != &read_header->rpc_data)
-		kfree(rdata);
-	else
+	if (rdata == &read_header->rpc_data) {
 		rdata->header = NULL;
+		rdata = NULL;
+	}
 	if (atomic_dec_and_test(&hdr->refcnt))
 		hdr->completion_ops->completion(hdr);
+	kfree(rdata);
 }
 EXPORT_SYMBOL_GPL(nfs_readdata_release);
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b673be3..45d9250 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -126,12 +126,13 @@ void nfs_writedata_release(struct nfs_write_data *wdata)
 	put_nfs_open_context(wdata->args.context);
 	if (wdata->pages.pagevec != wdata->pages.page_array)
 		kfree(wdata->pages.pagevec);
-	if (wdata != &write_header->rpc_data)
-		kfree(wdata);
-	else
+	if (wdata == &write_header->rpc_data) {
 		wdata->header = NULL;
+		wdata = NULL;
+	}
 	if (atomic_dec_and_test(&hdr->refcnt))
 		hdr->completion_ops->completion(hdr);
+	kfree(wdata);
 }
 EXPORT_SYMBOL_GPL(nfs_writedata_release);
 
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d17a704..500407a 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -936,14 +936,13 @@ struct rpc_task *rpc_new_task(const struct rpc_task_setup *setup_data)
 
 static void rpc_free_task(struct rpc_task *task)
 {
-	const struct rpc_call_ops *tk_ops = task->tk_ops;
-	void *calldata = task->tk_calldata;
+	unsigned short tk_flags = task->tk_flags;
 
-	if (task->tk_flags & RPC_TASK_DYNAMIC) {
+	rpc_release_calldata(task->tk_ops, task->tk_calldata);
+	if (tk_flags & RPC_TASK_DYNAMIC) {
 		dprintk("RPC: %5u freeing task\n", task->tk_pid);
 		mempool_free(task, rpc_task_mempool);
 	}
-	rpc_release_calldata(tk_ops, calldata);
 }
 
 static void rpc_async_release(struct work_struct *work)

Re: [PATCH] pnfs: Increase the refcount when LAYOUTGET fails the first time

2013-01-04 Thread Myklebust, Trond

On Fri, 2013-01-04 at 20:19 +0800, ycn...@gmail.com wrote:
> From: Yanchuan Nian 
> 
> The layout will be set unusable if LAYOUTGET fails. Is it reasonable to
> increase the refcount iff LAYOUTGET fails the first time?
> 
> Signed-off-by: Yanchuan Nian 
> ---
>  fs/nfs/pnfs.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index e7165d9..d00260b 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -254,7 +254,7 @@ static void
>  pnfs_layout_set_fail_bit(struct pnfs_layout_hdr *lo, int fail_bit)
>  {
>   lo->plh_retry_timestamp = jiffies;
> - if (test_and_set_bit(fail_bit, &lo->plh_flags))
> + if (!test_and_set_bit(fail_bit, &lo->plh_flags))
>   atomic_inc(&lo->plh_refcount);
>  }
>  
Thank for spotting this! Applying to the bugfixes branch...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] nfs: fix null checking in nfs_get_option_str()

2013-01-04 Thread Myklebust, Trond

On Fri, 2013-01-04 at 03:22 -0500, Xi Wang wrote:
> The following null pointer check is broken.
> 
>   *option = match_strdup(args);
>   return !option;
> 
> The pointer `option' must be non-null, and thus `!option' is always false.
> Use `!*option' instead.
> 
> The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some
> cut-and-paste code.").
> 
> Signed-off-by: Xi Wang 
> Cc: sta...@vger.kernel.org
> ---
>  fs/nfs/super.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index c25cadf8..2e7e8c8 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -1152,7 +1152,7 @@ static int nfs_get_option_str(substring_t args[], char 
> **option)
>  {
>   kfree(*option);
>   *option = match_strdup(args);
> - return !option;
> + return !*option;
>  }
>  
>  static int nfs_get_option_ul(substring_t args[], unsigned long *option)

Thank you! Applied to the bugfixes branch.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] nfs: avoid dereferencing null pointer in initiate_bulk_draining

2013-01-05 Thread Myklebust, Trond

On Sat, 2013-01-05 at 14:19 -0500, Nickolai Zeldovich wrote:
> Fix an inverted null pointer check in initiate_bulk_draining().
> 
> Signed-off-by: Nickolai Zeldovich 
> ---
>  fs/nfs/callback_proc.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index c89b26b..264d1aa 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -206,7 +206,7 @@ static u32 initiate_bulk_draining(struct nfs_client *clp,
>  
>   list_for_each_entry(lo, &server->layouts, plh_layouts) {
>   ino = igrab(lo->plh_inode);
> - if (ino)
> + if (!ino)
>   continue;
>   spin_lock(&ino->i_lock);
>   /* Is this layout in the process of being freed? */

Thanks for spotting. Applied to the 'bugfixes' branch.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] Please pull NFS client changesets for Linux 3.8

2012-12-17 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 0e4a43ed08e2f44aa7b96aa95d0a540d675483e1:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes 
(2012-11-07 13:38:56 +0100)

are available in the git repository at:


  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.8-1

for you to fetch changes up to cd6c5968582a273561464fe6b1e8cc8214be02df:

  SUNRPC: continue run over clients list on PipeFS event instead of break 
(2012-12-17 12:19:16 -0500)

Cheers
  Trond


NFS client updates for Linux 3.8

Features include:

- Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code
  Remove altogether where possible, and replace with WARN_ON_ONCE and
  appropriate error returns where not.
- NFSv4.1 client adds session dynamic slot table management. There is
  matching server side code that has been submitted to Bruce for
  consideration. Together, this code allows the server to dynamically
  manage the amount of memory it allocates to the duplicate request
  cache for each client. It will constantly resize those caches to
  reserve more memory for clients that are hot while shrinking caches
  for those that are quiescent.

In addition, there are assorted bugfixes for the generic NFS write code,
fixes to deal with the drop_nlink() warnings, and yet another fix for
NFSv4 getacl.


Andy Adamson (2):
  SUNRPC set gss gc_expiry to full lifetime
  SUNRPC handle EKEYEXPIRED in call_refreshresult

Bryan Schumaker (2):
  NFS: Add sequence_priviliged_ops for nfs4_proc_sequence()
  NFS: Remove _nfs_call_sync_session

Jeff Layton (3):
  nfs: don't extend writes to cover entire page if pagecache is invalid
  nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
  nfs: fix page dirtying in NFS DIO read codepath

Jim Rees (1):
  NFS: Reduce stack use in encode_exchange_id()

NeilBrown (1):
  NFS: avoid NULL dereference in nfs_destroy_server

Stanislav Kinsbursky (1):
  SUNRPC: continue run over clients list on PipeFS event instead of break

Sven Wegener (1):
  NFSv4: Check for buffer length in __nfs4_get_acl_uncached

Trond Myklebust (68):
  NFS: Get rid of unnecessary asserts
  NFS: Remove asserts from the NFS XDR code
  NFSv4: Remove the BUG_ON() from nfs4_get_lease_time_prepare()...
  NFSv4.1: Remove unused function last_byte_offset
  NFSv4.1: Remove assertion BUG_ON()s from the files and generic layout code
  NFS: Remove BUG_ON() calls from the generic writeback code
  NFSv4: Get rid of unnecessary BUG_ON()s
  NFS: Remove BUG_ON()s in the fs/nfs/inode.c
  NFS: Remove the BUG_ON() in the mount code
  lockd: Remove unnecessary BUG_ON()s in the xdr client code
  lockd: Remove trivial BUG_ON()s from the NSM code
  lockd: Remove BUG_ON()s in fs/lockd/host.c
  lockd: Remove BUG_ON()s from fs/lockd/clntproc.c
  SUNRPC: Clean up rpc_bind_new_program
  SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info
  NFSv4.1: Handle session reset and bind_conn_to_session before lease check
  NFSv4.1: Don't confuse CREATE_SESSION arguments and results
  NFSv4.1: Adjust CREATE_SESSION arguments when mounting a new filesystem
  NFSv4.1: We must bump the clientid sequence number after CREATE_SESSION
  NFSv4.1: nfs4_alloc_slots doesn't need zeroing
  NFSv4.1: clean up nfs4_recall_slot to use nfs4_alloc_slots
  NFSv4.1: Shrink struct nfs4_sequence_res by moving sr_renewal_time
  NFSv4: Fix a compile time warning when #undef CONFIG_NFS_V4_1
  NFSv4.1: Shrink struct nfs4_sequence_res by moving the session pointer
  NFSv4.1: Label each entry in the session slot tables with its slot number
  NFSv4.1: Simplify struct nfs4_sequence_args too
  NFSv4.1: Simplify slot allocation
  NFSv4.1: Clean up nfs4_free_slot
  NFSv4.1: Ensure that the client tracks the server target_highest_slotid
  NFSv4.1: Reset the sequence number for slots that have been deallocated
  NFSv4.1: Fix nfs4_callback_recallslot to work with dynamic slot allocation
  NFSv4.1: Don't confuse target_highest_slotid and max_slots in 
cb_recall_slot
  NFSv4.1: Allow the server to recall all but one slot
  NFSv4.1: Support dynamic resizing of the session slot table
  NFSv4.1: Allow SEQUENCE to resize the slot table on the fly
  NFSv4.1: Remove the state manager code to resize the slot table
  NFSv4.1: CB_RECALL_SLOT must schedule a sequence op after updating targets
  NFSv4.1: If slot allocation fails due to OOM, retry more quickly
  NFSv4.1: Clean up session draining
  NFSv4: Move nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease
  NFSv4.1: Cleanup move session slot management to fs/nfs/nfs4session.c
  NFS: Remove unused function slot_idx
  NFSv4.1: Move slot table and sess

Re: nfsd oops on Linus' current tree.

2012-12-21 Thread Myklebust, Trond

On Fri, 2012-12-21 at 13:08 -0500, J. Bruce Fields wrote:
> On Fri, Dec 21, 2012 at 10:33:48AM -0500, Dave Jones wrote:
> > Did a mount from a client (also running Linus current), and the
> > server spat this out..
> > 
> > [ 6936.306135] [ cut here ]
> > [ 6936.306154] WARNING: at net/sunrpc/clnt.c:617 
> > rpc_shutdown_client+0x12a/0x1b0 [sunrpc]()
> 
> This is a warning added by 168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9
> "SUNRPC: add WARN_ON_ONCE for potential deadlock", pointing out that
> nfsd is calling shutdown_client from a workqueue, which is a problem
> because shutdown_client has to wait on rpc tasks that run on a
> workqueue.
> 
> I don't believe there's any circular dependency among the workqueues
> (we're calling shutdown_client from callback_wq, not rpciod_workqueue),

We were getting deadlocks with rpciod when calling rpc_shutdown_client
from the nfsiod workqueue.

The problem here is that the workqueues all run using the same pool of
threads, and so you can get "interesting" deadlocks when one of these
threads has to wait for another one.

> but 168e4b39d1afb.. says that we could get a deadlock if both are
> running on the same kworker thread.
> 
> I'm not sure what to do about that.
> 

The question is if you really do need the call to rpc_killall_tasks and
the synchronous wait for completion of old tasks? If you don't care,
then we could just have you call rpc_release_client() in order to
release your reference on the rpc_client.

> > [ 6936.306156] Hardware name: 
> > [ 6936.306157] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 
> > nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs 
> > coretemp iTCO_wdt iTCO_vendor_support snd_emu10k1 microcode snd_util_mem 
> > snd_ac97_codec ac97_bus snd_hwdep snd_seq snd_pcm snd_page_alloc snd_timer 
> > e1000e snd_rawmidi snd_seq_device snd emu10k1_gp pcspkr i2c_i801 soundcore 
> > gameport lpc_ich mfd_core i82975x_edac edac_core vhost_net tun macvtap 
> > macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc 
> > btrfs libcrc32c zlib_deflate usb_storage firewire_ohci firewire_core 
> > sata_sil crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core 
> > floppy
> > [ 6936.306214] Pid: 52, comm: kworker/u:2 Not tainted 3.7.0+ #34
> > [ 6936.306216] Call Trace:
> > [ 6936.306224]  [] warn_slowpath_common+0x7f/0xc0
> > [ 6936.306227]  [] warn_slowpath_null+0x1a/0x20
> > [ 6936.306235]  [] rpc_shutdown_client+0x12a/0x1b0 
> > [sunrpc]
> > [ 6936.306240]  [] ? delay_tsc+0x98/0xf0
> > [ 6936.306252]  [] 
> > nfsd4_process_cb_update.isra.16+0x4b/0x230 [nfsd]
> > [ 6936.306256]  [] ? __rcu_read_unlock+0x5c/0xa0
> > [ 6936.306260]  [] ? debug_object_deactivate+0x46/0x130
> > [ 6936.306269]  [] nfsd4_do_callback_rpc+0x8d/0xa0 [nfsd]
> > [ 6936.306272]  [] process_one_work+0x207/0x760
> > [ 6936.306274]  [] ? process_one_work+0x197/0x760
> > [ 6936.306277]  [] ? worker_thread+0x21e/0x440
> > [ 6936.306285]  [] ? 
> > nfsd4_process_cb_update.isra.16+0x230/0x230 [nfsd]
> > [ 6936.306289]  [] worker_thread+0x15e/0x440
> > [ 6936.306292]  [] ? rescuer_thread+0x250/0x250
> > [ 6936.306295]  [] kthread+0xed/0x100
> > [ 6936.306299]  [] ? put_lock_stats.isra.25+0xe/0x40
> > [ 6936.306302]  [] ? kthread_create_on_node+0x160/0x160
> > [ 6936.306307]  [] ret_from_fork+0x7c/0xb0
> > [ 6936.306310]  [] ? kthread_create_on_node+0x160/0x160
> > [ 6936.306312] ---[ end trace 5bab69e086ae3c6f ]---
> > [ 6936.363213] [ cut here ]
> > [ 6936.363226] WARNING: at fs/nfsd/vfs.c:937 
> > nfsd_vfs_read.isra.13+0x197/0x1b0 [nfsd]()
> 
> This warning is unrelated, and is probably just carelessness on my part:
> I couldn't see why this condition would happen, and I stuck the warning
> in there without looking much harder.  Probably we should just revert
> 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e "nfsd: warn on odd reply state
> in nfsd_vfs_read" while I go stare at the code.
> 
> --b.
> 
> > [ 6936.363229] Hardware name: 
> > [ 6936.363230] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 
> > nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs 
> > coretemp iTCO_wdt iTCO_vendor_support snd_emu10k1 microcode snd_util_mem 
> > snd_ac97_codec ac97_bus snd_hwdep snd_seq snd_pcm snd_page_alloc snd_timer 
> > e1000e snd_rawmidi snd_seq_device snd emu10k1_gp pcspkr i2c_i801 soundcore 
> > gameport lpc_ich mfd_core i82975x_edac edac_core vhost_net tun macvtap 
> > macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc 
> > btrfs libcrc32c zlib_deflate usb_storage firewire_ohci firewire_core 
> > sata_sil crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core 
> > floppy
> > [ 6936.363284] Pid: 699, comm: nfsd Tainted: GW3.7.0+ #34
> > [ 6936.363286] Call Trace:
> > [ 6936.363293]  [] warn_slowpath_common+0x7f/0xc0
> > [ 6936.363296]  [] warn_slowpath_null+0x1a/0x20
> > [ 6936.363302]  [] nfsd_vfs_read.isra.13+0x197/0x1b0 
>

RE: nfsd oops on Linus' current tree.

2012-12-21 Thread Myklebust, Trond

Apologies for top-posting. The SSD on my laptop died, and so I'm stuck using 
webmail for this account...

Our experience with nfsiod is that the WQ_MEM_RECLAIM option still deadlocks 
despite the "rescuer thread". The CPU that is running the workqueue will 
deadlock with any rpciod task that is assigned to the same CPU. Interestingly 
enough, the WQ_UNBOUND option also appears able to deadlock in the same 
situation.

Sorry, I have no explanation why...

Cheers,
  Trond

From: J. Bruce Fields [bfie...@fieldses.org]
Sent: Friday, December 21, 2012 6:08 PM
To: Myklebust, Trond
Cc: Dave Jones; Linux Kernel; linux-...@vger.kernel.org; Adamson, Dros
Subject: Re: nfsd oops on Linus' current tree.

On Fri, Dec 21, 2012 at 06:40:54PM +0000, Myklebust, Trond wrote:
> On Fri, 2012-12-21 at 13:08 -0500, J. Bruce Fields wrote:
> > On Fri, Dec 21, 2012 at 10:33:48AM -0500, Dave Jones wrote:
> > > Did a mount from a client (also running Linus current), and the
> > > server spat this out..
> > >
> > > [ 6936.306135] [ cut here ]
> > > [ 6936.306154] WARNING: at net/sunrpc/clnt.c:617 
> > > rpc_shutdown_client+0x12a/0x1b0 [sunrpc]()
> >
> > This is a warning added by 168e4b39d1afb79a7e3ea6c3bb246b4c82c6bdb9
> > "SUNRPC: add WARN_ON_ONCE for potential deadlock", pointing out that
> > nfsd is calling shutdown_client from a workqueue, which is a problem
> > because shutdown_client has to wait on rpc tasks that run on a
> > workqueue.
> >
> > I don't believe there's any circular dependency among the workqueues
> > (we're calling shutdown_client from callback_wq, not rpciod_workqueue),
>
> We were getting deadlocks with rpciod when calling rpc_shutdown_client
> from the nfsiod workqueue.
>
> The problem here is that the workqueues all run using the same pool of
> threads, and so you can get "interesting" deadlocks when one of these
> threads has to wait for another one.

OK, coming back after reviewing Documentation/workqueue.txt: the
workqueues potentially involved here are rpciod and the server's
laundromat and callback workqueues.

The former is created with

alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);

and the latter two are both created with
create_singlethread_workqueue(), which also sets WQ_MEM_RECLAIM.

As I understand it, that ensures that each of the three has at least one
"rescuer" thread dedicated to it--so there shouldn't be any deadlock as
long as there's no circular dependency among the three.

So think this is a false positive--am I missing something?

> > but 168e4b39d1afb.. says that we could get a deadlock if both are
> > running on the same kworker thread.
> >
> > I'm not sure what to do about that.
> >
>
> The question is if you really do need the call to rpc_killall_tasks and
> the synchronous wait for completion of old tasks? If you don't care,
> then we could just have you call rpc_release_client() in order to
> release your reference on the rpc_client.

No, the waiting's intentional: for example, the laundromat wq is what's
responsible for reaping expired clients, and it wants to wait for
callback rpc's to exit.  (If I remember right they may reference data
structures we're about to clean up).

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: nfsd oops on Linus' current tree.

2012-12-21 Thread Myklebust, Trond

Please reread what I said. There was no obvious circular dependency, because 
nfsiod and rpciod are separate workqueues, both created with WQ_MEM_RECLAIM. 
Dros' experience shows, however that a call to rpc_shutdown_client in an nfsiod 
work item will deadlock with rpciod if the RPC task's work item has been 
assigned to the same CPU as the one running the rpc_shutdown_client work item.

I can't tell right now if that is intentional (in which case the WARN_ON in the 
rpc code is correct), or if it is a bug in the workqueue code. For now, we're 
assuming the former.

From: J. Bruce Fields [bfie...@fieldses.org]
Sent: Friday, December 21, 2012 6:26 PM
To: Myklebust, Trond
Cc: Dave Jones; Linux Kernel; linux-...@vger.kernel.org; Adamson, Dros
Subject: Re: nfsd oops on Linus' current tree.

On Fri, Dec 21, 2012 at 11:15:40PM +, Myklebust, Trond wrote:
> Apologies for top-posting. The SSD on my laptop died, and so I'm stuck using 
> webmail for this account...

Fun!  If that happens to me on this trip, I've got a week trying to hack
the kernel from my cell phone

> Our experience with nfsiod is that the WQ_MEM_RECLAIM option still deadlocks 
> despite the "rescuer thread". The CPU that is running the workqueue will 
> deadlock with any rpciod task that is assigned to the same CPU. Interestingly 
> enough, the WQ_UNBOUND option also appears able to deadlock in the same 
> situation.
>
> Sorry, I have no explanation why...

As I said:

> there shouldn't be any deadlock as long as there's no circular
> dependency among the three.

There was a circular dependency (of rpciod on itself), so having a
dedicated rpciod rescuer thread wouldn't help--once the rescuer thread
is waiting for work queued to do the same queue you're asking for
trouble.

The last argument in

alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);

ensures that it will never allow more than 1 piece of work to run per
CPU, so the deadlock should be pretty easy to hit.

And with UNBOUND that's only one piece of work globally, so yeah all you
need is an rpc at shutdown time and it should deadlock every time.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 2/4] SUNRPC: fix races on PipeFS MOUNT notifications

2013-06-17 Thread Myklebust, Trond

On Tue, 2013-06-11 at 18:39 +0400, Stanislav Kinsbursky wrote:
> Below are races, when RPC client can be created without PiepFS dentries
> 
> CPU#0 CPU#1
> - -
> rpc_new_clientrpc_fill_super
> rpc_setup_pipedir
> mutex_lock(&sn->pipefs_sb_lock)
> rpc_get_sb_net == NULL
> (no per-net PipeFS superblock)
>   sn->pipefs_sb = sb;
>   notifier_call_chain(MOUNT)
>   (client is not in the list)
> rpc_register_client
> (client without pipes dentries)
> 
> To fix this patch:
> 1) makes PipeFS mount notification call with pipefs_sb_lock being held.
> 2) releases pipefs_sb_lock on new SUNRPC client creation only after
> registration.
> 
> Signed-off-by: Stanislav Kinsbursky 
> Cc: sta...@vger.kernel.org

Hi Stanislav,

This isn't going to apply to the stable kernels without the cleanup
patch. Could you please reorganise this patch series so that the cleanup
comes last.

Thanks,
  Trond


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.10-rc3 NFSv3 mount issues

2013-05-30 Thread Myklebust, Trond

On Thu, 2013-05-30 at 16:26 -0400, Chuck Lever wrote:
> On May 30, 2013, at 4:19 PM, Jim Schutt  wrote:
> 
> > Hi,
> > 
> > I've been trying to test 3.10-rc3 on some diskless clients, and found
> > that I can no longer mount my root file system via NFSv3.
> > 
> > I poked around looking at NFS changes for 3.10, and found these two
> > commits:
> > 
> > d497ab9751 "NFSv3: match sec= flavor against server list"
> > 4580a92d44 "NFS: Use server-recommended security flavor by default (NFSv3)"
> > 
> > If I revert both of these commits from 3.10-rc3, then my diskless
> > client can mount its root file system.
> > 
> > The busybox mount command fails like this, when using 3.10-rc3:
> > 
> > / # mount  -t nfs -o ro,nolock,vers=3,proto=tcp 
> > 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x /mnt
> > mount: mounting 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x on /mnt 
> > failed: Invalid argument
> > 
> > The commit messages for both these commits seem to say that mounting
> > with the "sys=sec" option should work, but unfortunately, my busybox doesn't
> > seem to understand the "sec=" mount option:
> > 
> > / # mount  -t nfs -o ro,nolock,vers=3,proto=tcp,sec=sys 
> > 172.17.0.122:/gmi/images/jaschut/ceph.toss-2x /mnt
> > mount: invalid number 'sys'
> > 
> > My NFS server is based on RHEL6, and is not using any "sec=" option
> > in its export for this file system.  I did try exporting with "sec=sys",
> > but it didn't seem to make any difference either.
> > 
> > So far, this seems like a regression to me 
> > Any ideas what I might be doing wrong?  How can I
> > help make this work again?
> 
> 3.10-rc3 appears to be missing the fix for this.  See:
> 
>   http://marc.info/?l=linux-nfs&m=136855668104598&w=2
> 
> Trond, can we get this applied?
> 

For some reason it got lost in the mail heap. I've applied it now to the
'bugfixes' branch. Will push upstream in the next few days...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] Please pull 2 NFS client bugfixes

2013-05-31 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 83c168bf8017212a9d502536f9dcd0b54d24e330:

  NFS: Fix SETCLIENTID fallback if GSS is not available (2013-05-23 18:50:40 
-0400)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.10-4

for you to fetch changes up to eb54d43707c69340581940e1fcaecb4d7d17b814:

  NFS: Fix security flavor negotiation with legacy binary mounts (2013-05-30 
16:31:34 -0400)


NFS client fixes:

- Fix a regression that broke NFS mounting using klibc and busybox
- Stable fix to check access modes correctly on NFSv4 delegated open()


Chuck Lever (1):
  NFS: Fix security flavor negotiation with legacy binary mounts

Trond Myklebust (1):
  NFSv4: Fix a thinko in nfs4_try_open_cached

 fs/nfs/nfs4proc.c | 2 +-
 fs/nfs/super.c| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v3 2/4] SUNRPC: fix races on PipeFS UMOUNT notifications

2013-06-25 Thread Myklebust, Trond

On Mon, 2013-06-24 at 11:52 +0400, Stanislav Kinsbursky wrote:
> CPU#0   CPU#1
> -   -
> rpc_kill_sb
> sn->pipefs_sb = NULLrpc_release_client
> (UMOUNT_EVENT)  rpc_free_auth
> rpc_pipefs_event
> rpc_get_client_for_event
> !atomic_inc_not_zero(cl_count)
> 
> atomic_inc(cl_count)
> rpc_free_client
> rpc_clnt_remove_pipedir
> 
> 
> To fix this, this patch does the following:
> 
> 1) Calls RPC_PIPEFS_UMOUNT notification with sn->pipefs_sb_lock being held.
> 2) Removes SUNRPC client from the list AFTER pipes destroying.
> 3) Doesn't hold RPC client on notification: if client in the list, then it
> can't be destroyed while sn->pipefs_sb_lock in hold by notification caller.
> 
> Signed-off-by: Stanislav Kinsbursky 
> Cc: sta...@vger.kernel.org
> ---
>  net/sunrpc/clnt.c |5 +
>  net/sunrpc/rpc_pipe.c |2 +-
>  2 files changed, 2 insertions(+), 5 deletions(-)



> diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
> index c512448..efca2f7 100644
> --- a/net/sunrpc/rpc_pipe.c
> +++ b/net/sunrpc/rpc_pipe.c
> @@ -1165,7 +1165,6 @@ static void rpc_kill_sb(struct super_block *sb)
>   goto out;
>   }
>   sn->pipefs_sb = NULL;
> - mutex_unlock(&sn->pipefs_sb_lock);
>   dprintk("RPC:   sending pipefs UMOUNT notification for net %p%s\n",
>   net, NET_NAME(net));
>   blocking_notifier_call_chain(&rpc_pipefs_notifier_list,
> @@ -1173,6 +1172,7 @@ static void rpc_kill_sb(struct super_block *sb)
>  sb);
>   put_net(net);
>  out:
> + mutex_unlock(&sn->pipefs_sb_lock);

Is this safe to do after the put_net()?

>   kill_litter_super(sb);
>  }
>  
> 


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] Please pull NFS client bugfixes

2013-03-26 Thread Myklebust, Trond

Hi Linus,

The following changes since commit 6dbe51c251a327e012439c4772097a13df43c5b8:

  Linux 3.9-rc1 (2013-03-03 15:11:05 -0800)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.9-3

for you to fetch changes up to 1166fde6a923c30f4351515b6a9a1efc513e7d00:

  SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked 
(2013-03-25 11:23:40 -0400)


NFS client bugfixes for Linux 3.9

- Fix an NFSv4 idmapper regression
- Fix an Oops in the pNFS blocks client
- Fix up various issues with pNFS layoutcommit
- Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked


Trond Myklebust (5):
  NFSv4: Fix the string length returned by the idmapper
  NFSv4.1: Fix a race in pNFS layoutcommit
  NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
  NFSv4.1: Add a helper pnfs_commit_and_return_layout
  SUNRPC: Add barriers to ensure read ordering in 
rpc_wake_up_task_queue_locked

fanchaoting (1):
  pnfs-block: removing DM device maybe cause oops when call dev_remove

 fs/nfs/blocklayout/blocklayoutdm.c |  4 +-
 fs/nfs/idmap.c | 13 +++---
 fs/nfs/nfs4filelayout.c|  1 -
 fs/nfs/nfs4proc.c  | 16 +---
 fs/nfs/pnfs.c  | 81 +-
 fs/nfs/pnfs.h  |  6 +++
 net/sunrpc/sched.c |  9 -
 7 files changed, 96 insertions(+), 34 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: sunrpc/clnt.c: BUG kmalloc-256 (Not tainted): Poison overwritten

2013-07-14 Thread Myklebust, Trond

On Sun, 2013-07-14 at 10:02 +0200, Toralf Förster wrote:
> This bisected commit produces at a 32 bit user mode linux guest the attached 
> BUG :
> 
> commit 245268c951262b861bc1be4e9dc812352499
> Author: Trond Myklebust 
> Date:   Wed Jul 10 15:33:01 2013 -0400
> 
> SUNRPC: Fix a deadlock in rpc_client_register()
> 
> Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on
> PipeFS MOUNT notifications) introduces a regression when we call
> rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.
> 
> By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
> end up deadlocking in gss_pipes_dentries_create_net().
> Fix is to register the client and release the mutex before calling
> rpcauth_create().
> 
> Reported-by: Weston Andros Adamson 
> Tested-by: Weston Andros Adamson 
> Cc: Stanislav Kinsbursky 
> Cc:  # : 3848160: SUNRPC: fix races on PipeFS 
> MOUNT
> Cc:  # : e73f4cc: SUNRPC: split client creation
> Signed-off-by: Trond Myklebust 
> 
> 
> 
> 
> 
> 
> 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1042]: Version 1.2.6 starting
> 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1042]: Backgrounding to 
> notify hosts...
> 2013-07-13T22:09:07.000+02:00 trinity sm-notify[1043]: Running as root.  
> chown /var/lib/nfs to choose different user
> 2013-07-13T22:09:10.000+02:00 trinity mount[1047]: mount to NFS server 
> 'n22stab4' failed: No route to host, retrying
> 2013-07-13T22:09:11.000+02:00 trinity dhcpcd[971]: eth0: sending IPv6 Router 
> Solicitation
> 2013-07-13T22:09:11.000+02:00 trinity dhcpcd[971]: eth0: no IPv6 Routers 
> available
> 2013-07-13T22:09:13.000+02:00 trinity mount[1048]: mount to NFS server 
> 'n22stab4' failed: No route to host, retrying
> 2013-07-13T22:09:13.647+02:00 trinity kernel: 
> =
> 2013-07-13T22:09:13.647+02:00 trinity kernel: BUG kmalloc-256 (Not tainted): 
> Poison overwritten
> 2013-07-13T22:09:13.647+02:00 trinity kernel: 
> -
> 2013-07-13T22:09:13.647+02:00 trinity kernel:
> 2013-07-13T22:09:13.647+02:00 trinity kernel: Disabling lock debugging due to 
> kernel taint
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: 0x49b1ed18-0x49b1ed1b. 
> First byte 0x74 instead of 0x6b
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Allocated in 
> rpc_new_client+0x81/0x3a0 age=300 cpu=0 pid=1049
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Freed in 
> rpc_new_client+0x35a/0x3a0 age=300 cpu=0 pid=1049
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Slab 0x0b87bac0 
> objects=13 used=13 fp=0x  (null) flags=0x0080
> 2013-07-13T22:09:13.647+02:00 trinity kernel: INFO: Object 0x49b1ed10 
> @offset=3344 fp=0x49b1ee40
> 2013-07-13T22:09:13.653+02:00 trinity kernel:
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Bytes b4 49b1ed00: f6 03 00 00 
> bd 94 ff ff 5a 5a 5a 5a 5a 5a 5a 5a  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed10: 6b 6b 6b 6b 6b 
> 6b 6b 6b 74 3a 85 49 6b 6b 6b 6b  t:.I
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed20: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed30: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed40: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed50: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed60: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed70: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.653+02:00 trinity kernel: Object 49b1ed80: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1ed90: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1eda0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edb0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edc0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edd0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1ede0: 6b 6b 6b 6b 6b 
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 2013-07-13T22:09:13.660+02:00 trinity kernel: Object 49b1edf0: 6b 6b 6b 6b 6b 
> 6b

Re: [Ksummit-2013-discuss] [ATTEND] How to act on LKML

2013-07-16 Thread Myklebust, Trond

On Tue, 2013-07-16 at 19:31 -0400, Ric Wheeler wrote:
> On 07/16/2013 07:12 PM, Sarah Sharp wrote:
> > On Tue, Jul 16, 2013 at 06:54:59PM -0400, Steven Rostedt wrote:
> >> On Tue, 2013-07-16 at 15:43 -0700, Sarah Sharp wrote:
> >>
> >>> Yes, that's true.  Some kernel developers are better at moderating their
> >>> comments and tone towards individuals who are "sensitive".  Others
> >>> simply don't give a shit.  So we need to figure out how to meet
> >>> somewhere in the middle, in order to establish a baseline of civility.
> >> I have to ask this because I'm thick, and don't really understand,
> >> but ...
> >>
> >> What problem exactly are we trying to solve here?
> > Personal attacks are not cool Steve.  Some people simply don't care if a
> > verbal tirade is directed at them.  Others do not want anyone to attack
> > them personally, but they're fine with people attacking their code.
> >
> > Bystanders that don't understand the kernel community structure are
> > discouraged from contributing because they don't want to be verbally
> > abused, and they really don't want to see either personal attacks or
> > intense belittling, demeaning comments about code.
> >
> > In order to make our community better, we need to figure out where the
> > baseline of "good" behavior is.  We need to define what behavior we want
> > from both maintainers and patch submitters.  E.g. "No regressions" and
> > "don't break userspace" and "no personal attacks".  That needs to be
> > written down somewhere, and it isn't.  If it's documented somewhere,
> > point me to the file in Documentation.  Hint: it's not there.
> >
> > That is the problem.
> >
> > Sarah Sharp
> 
> The problem you are pointing out - and it is a problem - makes us less 
> effective 
> as a community.

Not really. Most of the people who already work as part of this
community are completely used to it. We've created the environment, and
have no problems with it.

Where it could possibly be a problem is when it comes to recruiting
_new_ members to our community. Particularly so given that some
journalists take a special pleasure in reporting particularly juicy
comments and antics. That would tend to scare off a lot of gun-shy
newbies.
On the other hand, it might tend to bias our recruitment toward people
of a more "special" disposition. Perhaps we finally need the services of
a social scientist to help us find out...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [PATCH] fs/nfs/inode.c: adjust code alignment

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 16:47 +0200, Julia Lawall wrote:
> From: Julia Lawall 
> 
> Signed-off-by: Julia Lawall 
> 
> ---
> 
> This patch adjusts the code so that the alignment matches the current
> semantics.  I have no idea if it is the intended semantics, though.  Should
> the call to nfs_setsecurity also be under the else?
> 

>  fs/nfs/inode.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index af6e806..d8ad685 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -463,7 +463,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh
> *fh, struct nfs_fattr *fattr, st
> unlock_new_inode(inode);
> } else
> nfs_refresh_inode(inode, fattr);
> -   nfs_setsecurity(inode, fattr, label);
> +   nfs_setsecurity(inode, fattr, label);
> dprintk("NFS: nfs_fhget(%s/%Ld fh_crc=0x%08x ct=%d)\n",
> inode->i_sb->s_id,
> (long long)NFS_FILEID(inode),

Hi Julia,

Thanks for pointing this out! Given that the 'then' clause of the if
statement already calls nfs_setsecurity before unlocking the inode, I
suspect that the above _should_ really be part of the 'else' clause. 

That said, I can't see that calling nfs_setsecurity twice on the inode
can cause any unintended side-effects, so I suggest that we rather queue
the patch up for inclusion in 3.12.
Steve and Dave, any comments?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 16:50 +0100, Nix wrote:
> On 5 Aug 2013, Jeff Layton said:
> 
> > On Mon, 5 Aug 2013 11:04:27 -0400
> > Jeff Layton  wrote:
> >
> >> On Mon, 05 Aug 2013 15:48:01 +0100
> >> Nix  wrote:
> >> 
> >> > On 5 Aug 2013, Jeff Layton stated:
> >> > 
> >> > > On Sun, 04 Aug 2013 16:40:58 +0100
> >> > > Nix  wrote:
> >> > >
> >> > >> I just got this panic on 3.10.4, in the middle of a large parallel
> >> > >> compilation (of Chromium, as it happens) over NFSv3:
> >> > >> 
> >> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference 
> >> > >> at 0008
> >> > >> [16364.527571] IP: [] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527611] PGD 0 
> >> > >> [16364.527626] Oops:  [#1] PREEMPT SMP 
> >> > >> [16364.527656] Modules linked in: [last unloaded: microcode] 
> >> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 
> >> > >> 3.10.4-05315-gf4ce424-dirty #1
> >> > >> [16364.527730] Hardware name: System manufacturer System Product 
> >> > >> Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> > >> [16364.527775] task: 88041a97ad60 ti: 8803501d4000 task.ti: 
> >> > >> 8803501d4000
> >> > >> [16364.527813] RIP: 0010:[] [] 
> >> > >> nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527860] RSP: 0018:8803501d5c58  EFLAGS: 00010282
> >> > >> [16364.527889] RAX: 88041a97ad60 RBX: 8803e49c8800 RCX: 
> >> > >> 
> >> > >> [16364.527926] RDX:  RSI: 004a RDI: 
> >> > >> 8803e49c8b54
> >> > >> [16364.527962] RBP: 8803501d5c68 R08: 00015720 R09: 
> >> > >> 
> >> > >> [16364.527998] R10: 7000 R11: 8803501d5d58 R12: 
> >> > >> 8803501d5d58
> >> > >> [16364.528034] R13: 88041bd2bc00 R14:  R15: 
> >> > >> 8803fc9e2900
> >> > >> [16364.528070] FS:  () GS:88042fa0() 
> >> > >> knlGS:
> >> > >> [16364.528111] CS:  0010 DS:  ES:  CR0: 80050033
> >> > >> [16364.528142] CR2: 0008 CR3: 01c0b000 CR4: 
> >> > >> 001407f0
> >> > >> [16364.528177] DR0:  DR1:  DR2: 
> >> > >> 
> >> > >> [16364.528214] DR3:  DR6: 0ff0 DR7: 
> >> > >> 0400
> >> > >> [16364.528303] Stack:
> >> > >> [16364.528316]  8803501d5d58 8803e49c8800 8803501d5cd8 
> >> > >> 81245418 
> >> > >> [16364.528369]   8803516f0bc0 8803d7b7b6c0 
> >> > >> 81215c81 
> >> > >> [16364.528418]  88030007 88041bd2bdc8 8801aabe9650 
> >> > >> 8803fc9e2900 
> >> > >> [16364.528467] Call Trace:
> >> > >> [16364.528485]  [] nlmclnt_proc+0x148/0x5fb
> >> > >> [16364.528516]  [] ? nfs_put_lock_context+0x69/0x6e
> >> > >> [16364.528550]  [] nfs3_proc_lock+0x21/0x23
> >> > >> [16364.528581]  [] do_unlk+0x96/0xb2
> >> > >> [16364.528608]  [] nfs_flock+0x5a/0x71
> >> > >> [16364.528637]  [] locks_remove_flock+0x9e/0x113
> >> > >> [16364.528668]  [] __fput+0xb6/0x1e6
> >> > >> [16364.528695]  [] fput+0xe/0x10
> >> > >> [16364.528724]  [] task_work_run+0x7e/0x98
> >> > >> [16364.528754]  [] do_exit+0x3cc/0x8fa
> >> > >> [16364.528782]  [] ? SyS_wait4+0xa5/0xc2
> >> > >> [16364.528811]  [] do_group_exit+0x6f/0xa2
> >> > >> [16364.528843]  [] SyS_exit_group+0x17/0x17
> >> > >> [16364.528876]  [] system_call_fastpath+0x16/0x1b
> >> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 
> >> > >> 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 
> >> > >> 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 
> >> > >> 38 48 8b 
> >> > >> [16364.529176] RIP [] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.529264]  RSP 
> >> > >> [16364.529283] CR2: 0008
> >> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> [...]
> > The listing and disassembly from nlmclnt_proc is not terribly
> > interesting unfortunately. You really want to do the listing and
> > disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).
> 
> Oh, sorry! Wrong end of the oops :)
> 
> 0x81245157 is in nlmclnt_setlockargs (fs/lockd/clntproc.c:131).
> 126 struct nlm_args *argp = &req->a_args;
> 127 struct nlm_lock *lock = &argp->lock;
> 128
> 129 nlmclnt_next_cookie(&argp->cookie);
> 130 memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), 
> sizeof(struct nfs_fh));
> 131 lock->caller  = utsname()->nodename;
> 132 lock->oh.data = req->a_owner;
> 133 lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), 
> "%u@%s",
> 134 (unsigned 
> int)fl->fl_u.nfs_fl.owner->pid,
> 135 utsname()->nodename);
> 
>0x81245102 <+0>: callq  0x81613b00 <__fentry__>
>0x81245107 <+5>: push   %rbp
>0x81245108 <+6>:

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 16:15:01 +
> "Myklebust, Trond"  wrote:
> 
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust 
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> > 
> > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > which case we're in entirely the wrong namespace.
> > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > exit_task_namespaces() outside of exit_notify()) now means that
> > exit_task_work() is called after exit_task_namespaces(), which
> > triggers an Oops when we're freeing up the locks.
> > 
> > Signed-off-by: Trond Myklebust 
> > Cc: Toralf Förster 
> > Cc: Oleg Nesterov 
> > Cc: Nix 
> > Cc: Jeff Layton 
> > ---
> >  fs/lockd/clntproc.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > index 9760ecb..acd3947 100644
> > --- a/fs/lockd/clntproc.c
> > +++ b/fs/lockd/clntproc.c
> > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, 
> > struct file_lock *fl)
> >  {
> > struct nlm_args *argp = &req->a_args;
> > struct nlm_lock *lock = &argp->lock;
> > +   char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> >  
> > nlmclnt_next_cookie(&argp->cookie);
> > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct 
> > nfs_fh));
> > -   lock->caller  = utsname()->nodename;
> > +   lock->caller  = nodename;
> > lock->oh.data = req->a_owner;
> > lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > -   utsname()->nodename);
> > +   nodename);
> > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > lock->fl.fl_start = fl->fl_start;
> > lock->fl.fl_end = fl->fl_end;
> 
> Looks good to me...
> 
> Reviewed-by: Jeff Layton 
> 
> Trond, any thoughts on the other oops that Nix posted? The issue there
> seems to be that we're trying to do the pathwalk to the rpcbind unix
> socket from exit_task_work(), but that's happening after we've already
> called exit_fs().
> 
> The trivial answer seems to be to simply call exit_task_work() before
> exit_fs() there, but it seems like we ought to be doing the upcall to
> rpcbind in a mount namespace from which we know we can reach the
> socket...

Isn't it enough to just do the same thing as we did for gss proxy? i.e.
set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.

See attachment.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From ab56d77893815b1b9f0aaa7a89cee7c832a31cff Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH] SUNRPC: Don't auto-disconnect from the local rpcbind socket

There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix 
Signed-off-by: Trond Myklebust 
Cc: Jeff Layton 
Cc: sta...@vger.kernel.org # 3.9.x
---
 net/sunrpc/rpcb_clnt.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 3df764d..4b00555 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -238,6 +238,15 @@ static int rpcb_create_local_unix(struct net *net)
 		.program	= &rpcb_program,
 		.version	= RPCBVERS_2,
 		.authflavor	= RPC_AUTH_NULL,
+		/*
+		 * We turn off the idle timeout to prevent the kernel
+		 * from automatically disconnecting the socket.
+		 * Otherwise, we'd have to cache the mount namespace
+		 * of the caller and somehow pass that to the socket
+		 * reconnect code.
+		 */
+		.flags		= RPC_CLNT_CREATE_NOPING |
+  RPC_CLNT_CREATE_NO_IDLE_TIMEOUT,
 	};
 	struct rpc_clnt *clnt, *clnt4;
 	int result = 0;
-- 
1.8.3.1

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 19:33 +0100, Nix wrote:
> On 5 Aug 2013, Trond Myklebust told this:
> > Does the attached patch fix the problem?
> 
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust 
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> >  nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> 
> It makes it worse. Much, much worse. From a crash every so often when
> I'm doing compilations over NFS, I get an immediate panic on startx,
> long long before I even try to replicate the earlier panic:
> 
> [   83.432358] task: 88041aaa5ac0 ti: 8804199e2000 task.ti: 
> 8804199e2000
> [   83.432428] RIP: 0010:[] [] 
> encode_nlm4_lock+0x26/0xbe
> [   83.432512] RSP: 0018:8804199e3a78  EFLAGS: 00010286
> [   83.432564] RAX:  RBX: 88041a577038 RCX: 
> 
> [   83.432630] RDX: 8804193b3098 RSI: 88041a577038 RDI: 
> 008c
> [   83.432697] RBP: 8804199e3aa8 R08: 8804193b3098 R09: 
> 0001
> [   83.432763] R10: 88042fa12980 R11: 88042fa12980 R12: 
> 8804199e3ae8
> [   83.432830] R13: 008c R14: 8804199e3fd8 R15: 
> 815de80e
> [   83.432898] FS:  7f594b40c740() GS:88042fa0() 
> knlGS:
> [   83.432974] CS:  0010 DS:  ES:  CR0: 80050033
> [   83.433028] CR2: 008c CR3: 00041ab3d000 CR4: 
> 001407f0
> [   83.433095] DR0:  DR1:  DR2: 
> 
> [   83.433176] DR3:  DR6: 0ff0 DR7: 
> 0400
> [   83.433255] Stack:
> [   83.433276]  88041a44fb70 88040004 8804199e3ae8 
> 88041a577010 
> [   83.433360]  8804188e0e00 8804199e3fd8 8804199e3ac8 
> 8124b0d7 
> [   83.433443]  8804188e0e00 8124b086 8804199e3b38 
> 815e6032 
> [   83.433616] Call Trace:
> [   83.433646]  [] nlm4_xdr_enc_lockargs+0x51/0x76
> [   83.433707]  [] ? nlm4_xdr_enc_cancargs+0x56/0x56
> [   83.433769]  [] rpcauth_wrap_req+0x57/0x62
> [   83.433826]  [] call_transmit+0x17c/0x1f9
> [   83.433880]  [] __rpc_execute+0xe8/0x2ca
> [   83.433935]  [] rpc_execute+0x76/0x9d
> [   83.433986]  [] rpc_run_task+0x78/0x80
> [   83.434039]  [] rpc_call_sync+0x88/0x9e
> [   83.434092]  [] nlmclnt_call+0xb5/0x240
> [   83.434146]  [] nlmclnt_proc+0x226/0x5fb
> [   83.434226]  [] nfs3_proc_lock+0x21/0x23
> [   83.434280]  [] do_setlk+0x65/0xee
> [   83.434329]  [] nfs_lock+0x14e/0x162
> [   83.434382]  [] vfs_lock_file+0x29/0x35
> [   83.434435]  [] fcntl_setlk+0x139/0x2c5
> [   83.434490]  [] SyS_fcntl+0x2b6/0x47d
> [   83.434543]  [] system_call_fastpath+0x16/0x1b
> [   83.434600] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 c0 48 83 c9 ff 48 89 
> e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 4c 8b 2e 4c 89 ef  
> ae 4c 89 e7 48 f7 d1 4c 8d 71 ff 41 8d 76 04 e8 9f 16 3a 00 
> [   83.435077] RIP [] encode_nlm4_lock+0x26/0xbe
> [   83.435140]  RSP 
> [   83.435197] CR2: 008c
> 
> That's here:
> 
> (gdb) list *(encode_nlm4_lock+0x26)
> 0x8124af69 is in encode_nlm4_lock (fs/lockd/clnt4xdr.c:329).
> 324  *  string caller_name;
> 325  */
> 326 static void encode_caller_name(struct xdr_stream *xdr, const char 
> *name)
> 327 {
> 328 /* NB: client-side does not set lock->len */
> 329 u32 length = strlen(name);
> 330 __be32 *p;
> 331
> 332 p = xdr_reserve_space(xdr, 4 + length);
> 333 xdr_encode_opaque(p, name, length);
> 
>0x8124af69 <+38>:repnz scas %es:(%rdi),%al
> 
> Pretty clearly, "name" can be NULL after this patch...
> 
Yes. This scheme will only work if we make sure that host->h_rpcclnt is
initialised at mount time. Here is a v2 patch that should do the right
thing.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 9a1b6bf818e74bb7aabaecb59492b739f2f4d742 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 12:06:12 -0400
Subject: [PATCH v2] LOCKD: Don't call utsname()->nodename from
 nlmclnt_setlockargs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
which case we're in entirely the wrong namespace.

Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
exit_task_namespaces() outside of exit_notify()) now means that
exit_task_work() is called after exit_task_namespaces(), which
triggers an Oops when we're freeing up the locks.

Fix this by ensuring that we initialise the nlm_host's rpc_client at mount
time, so that the cl_nodename field is initialised to the value of
utsname()->nodename that the net namespace uses. Then replace the
lockd callers of utsn

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond

On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 18:18:03 +
> "Myklebust, Trond"  wrote:
> 
> > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > On Mon, 5 Aug 2013 16:15:01 +
> > > "Myklebust, Trond"  wrote:
> > > 
> > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > From: Trond Myklebust 
> > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > >  nlmclnt_setlockargs
> > > > MIME-Version: 1.0
> > > > Content-Type: text/plain; charset=UTF-8
> > > > Content-Transfer-Encoding: 8bit
> > > > 
> > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > which case we're in entirely the wrong namespace.
> > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > triggers an Oops when we're freeing up the locks.
> > > > 
> > > > Signed-off-by: Trond Myklebust 
> > > > Cc: Toralf Förster 
> > > > Cc: Oleg Nesterov 
> > > > Cc: Nix 
> > > > Cc: Jeff Layton 
> > > > ---
> > > >  fs/lockd/clntproc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > index 9760ecb..acd3947 100644
> > > > --- a/fs/lockd/clntproc.c
> > > > +++ b/fs/lockd/clntproc.c
> > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst 
> > > > *req, struct file_lock *fl)
> > > >  {
> > > > struct nlm_args *argp = &req->a_args;
> > > > struct nlm_lock *lock = &argp->lock;
> > > > +   char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > >  
> > > > nlmclnt_next_cookie(&argp->cookie);
> > > > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), 
> > > > sizeof(struct nfs_fh));
> > > > -   lock->caller  = utsname()->nodename;
> > > > +   lock->caller  = nodename;
> > > > lock->oh.data = req->a_owner;
> > > > lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), 
> > > > "%u@%s",
> > > > (unsigned 
> > > > int)fl->fl_u.nfs_fl.owner->pid,
> > > > -   utsname()->nodename);
> > > > +   nodename);
> > > > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > > lock->fl.fl_start = fl->fl_start;
> > > > lock->fl.fl_end = fl->fl_end;
> > > 
> > > Looks good to me...
> > > 
> > > Reviewed-by: Jeff Layton 
> > > 
> > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > socket from exit_task_work(), but that's happening after we've already
> > > called exit_fs().
> > > 
> > > The trivial answer seems to be to simply call exit_task_work() before
> > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > rpcbind in a mount namespace from which we know we can reach the
> > > socket...
> > 
> > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> > 
> > See attachment.
> 
> Yeah, that looks like a reasonable thing to do...
> 
> OTOH, Is there any other way for a unix socket to end up disconnected
> other than if we were to close it? Maybe if rpcbind stopped, the socket
> unlinked and recreated and then started again?
> 
> If so then you still could potentially end up in this situation even if
> you didn't autoclose it.

True. How about something like the following instead. Note the change to
the original patch...
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 00326ed6442c66021cd4b5e19e80f3e2027d5d42 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH v2 1/2] SUNRPC:

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Myklebust, Trond

On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> On 6 Aug 2013, Trond Myklebust verbalised:
> > True. How about something like the following instead. Note the change to
> > the original patch...
> 
> Well, with those applied I could reboot without a panic for the first
> time since 3.8.x: looking good. I'll give it a reboot or two with a
> system that's not hot from booting though.
> 

Could you please also try applying only the 1/2 patch, to see if that
suffices to quell the shutdown panic?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-07 Thread Myklebust, Trond

On Wed, 2013-08-07 at 22:01 +0100, Nix wrote:
> On 7 Aug 2013, Trond Myklebust said:
> 
> > On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> >> On 6 Aug 2013, Trond Myklebust verbalised:
> >> > True. How about something like the following instead. Note the change to
> >> > the original patch...
> >> 
> >> Well, with those applied I could reboot without a panic for the first
> >> time since 3.8.x: looking good. I'll give it a reboot or two with a
> >> system that's not hot from booting though.
> >
> > Could you please also try applying only the 1/2 patch, to see if that
> > suffices to quell the shutdown panic?
> 
> It doesn't suffice. I see this severely truncated oops:
> 
> [  115.799092] BUG: unable to handle kernel NULL pointer dereference at 
> 0008
> [  115.800284] IP: [] path_init+0x11c/0x36f
> [  115.801463] PGD 0 
> [  115.802625] Oops:  [#1] PREEMPT SMP 
> [  115.803805] Modules linked in: [last unloaded: microcode] 
> [  115.804995] CPU: 3 PID: 1191 Comm: sleep Not tainted 
> 3.10.5-05317-g3c9f6fa-dirty #2
> [  115.806207] Hardware name: System manufacturer System Product 
> Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> [  115.807453] task: 8804189a ti: 8803f74d6000 task.ti: 
> 8803f74d6000
> 
OK. Then I'll mark them both for stable inclusion in 3.9+.

Thanks for testing!
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: [GIT PULL] x86 fixes for 3.11-rc2

2013-07-18 Thread Myklebust, Trond

On Thu, 2013-07-18 at 17:46 -0700, Linus Torvalds wrote:

> Finnish is hard. But good for swearing.

Only because the ratio of vowels to consonants causes an immediate
outbreak of swearing among those who try...

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

[GIT PULL] Please pull NFS client fixes

2013-07-19 Thread Myklebust, Trond

Hi Linus,

The following changes since commit ad81f0545ef01ea651886dddac4bef6cec930092:

  Linux 3.11-rc1 (2013-07-14 15:18:27 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-3.11-3

for you to fetch changes up to b4a2cf76ab7c08628c62b2062dacefa496b59dfd:

  NFSv4: Fix a regression against the FreeBSD server (2013-07-17 16:54:46 -0400)


NFS client bugfixes for 3.11

- Fix a regression against NFSv4 FreeBSD servers when creating a new file
- Fix another regression in rpc_client_register()


Trond Myklebust (2):
  SUNRPC: Fix another issue with rpc_client_register()
  NFSv4: Fix a regression against the FreeBSD server

 fs/nfs/nfs4xdr.c  | 21 ++---
 net/sunrpc/clnt.c |  1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com

Re: Linux 3.11-rc2

2013-07-22 Thread Myklebust, Trond

On Tue, 2013-07-23 at 03:04 +0200, rydb...@euromail.se wrote:
> Hi Trond, Linus,
> 
> On Sun, Jul 21, 2013 at 12:53:10PM -0700, Linus Torvalds wrote:
> > So it's been another week, and -rc2 is out there.
> 
> This one happens to break nfs in a rather blunt-instrument fashion -
> creating files on a nfs4 partition [1] no longer works. Bisection
> yields this commit as the culprit:
> 
> commit b4a2cf76ab7c08628c62b2062dacefa496b59dfd
> Author: Trond Myklebust 
> Date:   Wed Jul 17 16:43:16 2013 -0400
> 
> NFSv4: Fix a regression against the FreeBSD server
> 
> Technically, the Linux client is allowed by the NFSv4 spec to send
> 3 word bitmaps as part of an OPEN request. However, this causes the
> current FreeBSD server to return NFS4ERR_ATTRNOTSUPP errors.
> 
> Fix the regression by making the Linux client use a 2 word bitmap unless
> doing NFSv4.2 with labeled NFS.
> 
> Signed-off-by: Trond Myklebust 
> 
> Reverting the patch returns things to normal.

- Can you please provide me with a binary tcpdump or wireshark dump that
demonstrates the problem?

- What server is this?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

1 2 >

1 - 100 of 175 matches

Mail list logo