Re: [PATCH 06/19] list: add a new LRU list type

2012-11-28 Thread Christoph Hellwig
On Wed, Nov 28, 2012 at 10:14:33AM +1100, Dave Chinner wrote:
> From: Dave Chinner 
> 
> Several subsystems use the same construct for LRU lists - a list
> head, a spin lock and and item count. They also use exactly the same
> code for adding and removing items from the LRU. Create a generic
> type for these LRU lists.
> 
> This is the beginning of generic, node aware LRUs for shrinkers to
> work with.

Can you please add kerneldoc comments for the functions, and add
symbolic constants for the possible return values from the isolate
callback?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 15/19] xfs: convert dquot cache lru to list_lru

2012-11-28 Thread Christoph Hellwig
> + if (!xfs_dqflock_nowait(dqp))
> + xfs_dqunlock(dqp);
> + goto out_miss_busy;

This seems to miss braces.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] writeback: replace custom worker pool implementation with unbound workqueue

2013-03-21 Thread Christoph Hellwig
On Thu, Mar 21, 2013 at 12:32:52PM +0100, Jan Kara wrote:
> On Wed 20-03-13 22:07:18, Tejun Heo wrote:
> > Hello, Dave.
> > 
> > On Thu, Mar 21, 2013 at 12:57:21PM +1100, Dave Chinner wrote:
> > > When you have a system that has 50+ active filesystems (pretty
> > > common in the distributed storage environments were every disk has
> > > it's own filesystem), knowing which filesystem(s) are getting stuck
> > > in writeback from the sysrq-w or hangcheck output is pretty damn
> > > important
> > 
> > Hmm... I guess, then, what we can do is adding a worker description
> > which is printed out with task dump.  ie. A work function can call
> > work_set_desc(fmt, arg).  It won't be expensive and can be
> > implemented relatively easily.  Does that sound good enough?
>   Yeah, that sounds good! Thanks!

How about automatically doing this based on the workqueue name?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.5.2: moving files from xfs/disk -> nfs: radix_tree_lookup_slot+0xe/0x10

2012-09-01 Thread Christoph Hellwig
I'd suspect it's something with the actual radix tree code, Ccing
linux-mm in case they know more.

On Mon, Aug 27, 2012 at 11:00:10AM -0400, Justin Piszcz wrote:
> Hi,
> 
> Moving ~276GB of files (mainly large backups) and everything has
> seemed to lockup on the client moving data to the server, it is still
> in this state..
> 
> [75716.705697] INFO: task sync:8790 blocked for more than 120 seconds.
> [75716.705701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [75716.705703] syncD 88040ec54830 0  8790   2141 
> 0x
> [75716.705708]  88001fff1d08 0086 81862fc0
> 88001fff1fd8
> [75716.705713]  88001fff1fd8 4000 88041d958670
> 88040ec54830
> [75716.705716]  88001fff1c38 812dcaee 88001fff1c58
> 88001fff1d78
> [75716.705720] Call Trace:
> [75716.705729]  [] ? radix_tree_lookup_slot+0xe/0x10
> [75716.705733]  [] ? find_get_pages_tag+0xc6/0x150
> [75716.705738]  [] ? __enqueue_entity+0x70/0x80
> [75716.705742]  [] ? __sync_filesystem+0x90/0x90
> [75716.705747]  [] schedule+0x24/0x70
> [75716.705751]  [] schedule_timeout+0x1a9/0x210
> [75716.705755]  [] ? calc_period_shift+0x60/0x60
> [75716.705760]  [] ? check_preempt_curr+0x75/0xa0
> [75716.705764]  [] wait_for_common+0xc0/0x150
> [75716.705767]  [] ? try_to_wake_up+0x280/0x280
> [75716.705770]  [] ? __sync_filesystem+0x90/0x90
> [75716.705773]  [] wait_for_completion+0x18/0x20
> [75716.705777]  [] writeback_inodes_sb_nr+0x77/0xa0
> [75716.705782]  [] ?
> shrink_dcache_for_umount_subtree+0x111/0x1d0
> [75716.705785]  [] writeback_inodes_sb+0x29/0x40
> [75716.705788]  [] __sync_filesystem+0x47/0x90
> [75716.705791]  [] sync_one_sb+0x1b/0x20
> [75716.705795]  [] iterate_supers+0xe1/0xf0
> [75716.705798]  [] sys_sync+0x2b/0x60
> [75716.705802]  [] system_call_fastpath+0x1a/0x1f
> [75836.701197] INFO: task sync:8790 blocked for more than 120 seconds.
> 
> Thoughts?
> 
> Justin.
> 
> ___
> xfs mailing list
> x...@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] VFS: File System Mount Wide O_DIRECT Support

2012-09-04 Thread Christoph Hellwig
On Tue, Sep 04, 2012 at 06:17:47PM +0800, Li Wang wrote:
> For file system created on file-backed loop device, there will be two-levels 
> of 
> page cache present, which typically doubles the memory consumption. 

And the right fix is to not use buffer I/O on the backing file instead
of hacks like this.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


XFS status update for August 2012

2012-09-13 Thread Christoph Hellwig
After the release of Linux 3.5 in July, August saw the big XFS merge for
the Linux 3.6 merge window.  In the meantime the XFS tree has seen low
activity, with refined support for SEEK_HOLE/SEEK_DATA that takes unwritten
extents into account as the main feature.  A couple of fixes also went in
and mostly got back ported to mainline for Linux 3.5 as well.

The mailing list had a lot of traffic about interesting updates that didn't
make it into the tree yet, with the most interesting being: remount support
for the inode64 option, a patch series allowing to trim back large
preallocations when the filesystem is out of space, and a large rewrite
of the XFS-internal sync code.

On the user land side xfs_db saw a fix for a very long standing bug go into
the tree and xfstests saw a few fixes.  The mailing list list saw various other
interesting updates including a German translation for the xfsdump messages,
32-bit project ID support in xfsdump, a lot of xfstests updates including a
patch to use the upstream util-linux version of fstrim in xfstests, support
for tmpfs in xfstests, various ext3 and btrfs updates, but none of them
made it into the git tree.  There also was a lot of discussion of older
patch to reorganize the xfstests source tree.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [NFS]: Lock daemon start/stop rework.

2008-01-30 Thread Christoph Hellwig
On Wed, Jan 30, 2008 at 02:41:34PM +0300, Denis V. Lunev wrote:
> The pid of the locking daemon can be substituted with a task struct
> without a problem. Namely, the value if filled in the context of the lockd
> thread and used in lockd_up/lockd_down.
> 
> It is possible to save task struct instead and use it to kill the process.
> The safety of this operation is guaranteed by the RCU, i.e. task can't
> disappear without passing a quiscent state.

We have a patch series pending on the nfs list that does this plus a lot
more in the area.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] add perform_write to a_ops

2008-02-04 Thread Christoph Hellwig
On Mon, Feb 04, 2008 at 06:04:10PM +0100, Miklos Szeredi wrote:
> a_ops->perform_write() was left out from Nick Piggin's new a_ops
> patchset, as it was non-essential, and postponed for later inclusion.
> 
> This short series reintroduces it, but only adds the fuse
> implementation and not simple_perform_write(), which I'm not sure
> would be a significant improvement.
> 
> This allows larger than 4k buffered writes for fuse, which is one of
> the most requested features.
> 
> This goes on top of the "fuse: writable mmap" patches.

Please don't do this, but rather implement your own .aio_write.  There's
very little in generic_file_aio_write that wouldn't be handle by
->perform_write and we should rather factor those up or move to higher
layers than adding this ill-defined abstraction.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] add perform_write to a_ops

2008-02-04 Thread Christoph Hellwig
On Mon, Feb 04, 2008 at 09:52:06PM +0100, Miklos Szeredi wrote:
> Moving up to higher layers might not be possible, due to lock/unlock
> of i_mutex being inside generic_file_aio_write().

Well some bits can be moved up.  Here's my grand plan which I plan
to implement once I get some time for it (or let someone else do
if they beat me):

 - generic_segment_checks goes to fs/read_write.c before caling into
   the filesystem
 - dito for vfs_check_frozen
 - generic_write_checks is a suitable helper already
 - dito for remove_suid
 - dito for file_update_time
 - after that there's not a whole lot left in generic_file_aio_write,
   except for direct I/O handling which will probably be very fs-specific
   if you have your own buffered I/O code

generic_file_buffered_write is an almost trivial wrapper around what's
->perform_write in Nick's earlier patches and a helper for the syncing
activity.



> 
> But with fuse being the only user, it's not a huge issue duplicating
> some code.
> 
> Nick, were there any other candidates, that would want to use such an
> interface in the future?
> 
> Thanks,
> Miklos
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 05/49] audit: add a new "type" field to audit_names struct

2012-10-02 Thread Christoph Hellwig
> +#define  AUDIT_TYPE_UNKNOWN  0   /* we don't know yet */
> +#define AUDIT_TYPE_NORMAL1   /* a "normal" audit record */

I don't care about tab vs space after the #define, but at least be
consistent.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 02/49] audit: pass in dentry to audit_copy_inode wherever possible

2012-10-02 Thread Christoph Hellwig
On Mon, Oct 01, 2012 at 08:16:11PM -0400, Jeff Layton wrote:
> In some cases, we were passing in NULL even when we have a dentry.
> 
> Reported-by: Eric Paris 
> Signed-off-by: Jeff Layton 
> ---
>  kernel/auditsc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 4b96415..5c45b9b 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2226,7 +2226,7 @@ void __audit_inode_child(const struct dentry *dentry,
>   if (!strcmp(dname, n->name) ||
>!audit_compare_dname_path(dname, n->name, &dirlen)) {
>   if (inode)
> - audit_copy_inode(n, NULL, inode);
> + audit_copy_inode(n, dentry, inode);

Btw, the calling conventions here also seems fairly ugly.

Instead of the optional dentry parameter I'd have a audit_copy_inode
that takes just the name and the inode, and an optional direct  call
to audit_copy_fcaps for those callers that have a dentry.  That would
also allow removing the branch for the dentry == NULL case in
audit_copy_fcaps.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v7 03/49] audit: no need to walk list in audit_inode if name is NULL

2012-10-02 Thread Christoph Hellwig
On Mon, Oct 01, 2012 at 08:16:12PM -0400, Jeff Layton wrote:
> If name is NULL then the condition in the loop will never be true. Also,
> with this change, we can eliminate the check for n->name == NULL since
> the equivalence check will never be true if it is.

Given that name == NULL is a static condition it seems like these
should be two different calls, E.g. audit_dentry and audit_path.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/6] target/file: Re-enable optional fd_buffered_io=1 operation

2012-10-03 Thread Christoph Hellwig
On Tue, Oct 02, 2012 at 01:16:44PM -0700, Nicholas A. Bellinger wrote:
>  * Optionally allow fd_buffered_io=1 to be enabled for people
>  * who want use the fs buffer cache as an WriteCache mechanism.
>  *
>  * This means that in event of a hard failure, there is a risk
>  * of silent data-loss if the SCSI client has *not* performed a
>  * forced unit access (FUA) write, or issued SYNCHRONIZE_CACHE
>  * to write-out the entire device cache.
>  */

Oh, I get Vlads flame.  This doesn't simply disable O_DSYNC now but also
sets WCE=1.  In this case I don't really get the point of the patch, why
can't we simply set it from configfs?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 06/21] ocfs2: drop vmtruncate

2012-10-23 Thread Christoph Hellwig
On Tue, Oct 23, 2012 at 01:54:48AM -0700, Joel Becker wrote:
> On Sat, Oct 20, 2012 at 02:19:00PM +0200, Marco Stornelli wrote:
> > Removed vmtruncate
> > 
> > Signed-off-by: Marco Stornelli 
> 
> Acked-by: Joel Becker 
> 
> Do you want me to pull this, or are you going to send it with your set?

Is this really correct?  It now updates i_size before starting a
transaction.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 00/22] loop: Issue O_DIRECT aio using bio_vec

2012-10-23 Thread Christoph Hellwig
On Mon, Oct 22, 2012 at 10:15:00AM -0500, Dave Kleikamp wrote:
> This is the current version of the patchset I presented at the LSF-MM
> Summit in San Francisco in April. I apologize for letting it go so
> long before re-submitting.
> 
> This patchset was begun by Zach Brown and was originally submitted for
> review in October, 2009. Feedback was positive, and I have picked up
> where he left off, porting his patches to the latest mainline kernel
> and adding support more file systems.
> 
> This patch series adds a kernel interface to fs/aio.c so that kernel code can
> issue concurrent asynchronous IO to file systems.  It adds an aio command and
> file system methods which specify io memory with pages instead of userspace
> addresses.
> 
> This series was written to reduce the current overhead loop imposes by
> performing synchronus buffered file system IO from a kernel thread.  These
> patches turn loop into a light weight layer that translates bios into iocbs.
> 
> The downside of this is that in its current implementation, performance takes
> a big hit for non-synchonous I/O, since the underlying page cache is bypassed.
> The tradeoff is that all writes to the loop device make it to the underlying
> media, making loop-mounted file systems recoverable.

It also seems to still not fully kill thr old aio_read/write codepath.
At least XFS isn't touched yet.  It also doesn't seem to kill the nasty
hack for in-kernel direct I/O introduced with the swap over nfs code
(grep for REQ_KERNEL / KERNEL_READ / KERNEL_WRITE)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kconfig/menuconfig: use TAILQ instead of CIRCLEQ

2012-10-23 Thread Christoph Hellwig
On Thu, Oct 18, 2012 at 07:33:45PM +0200, Yann E. MORIN wrote:
> Some systems (eg. Cygwin, FreeBSD) are missing the CIRCLEQ macros.
> They were removed in Y2000 from FreeBSD:
> http://svnweb.freebsd.org/base?view=revision&revision=70469
> 
> The reason was that TAILQ are perfectly capable of doing the exact
> same things:
> 
> http://www.mavetju.org/mail/view_thread.php?list=freebsd-arch&id=915145&thread=yes
> 
> (Thank Yaakov for the pointers!)
> 
> So, switch to using TAILQ instead, which are more portable.

Why not use the kernels list.h?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/21] procfs: drop vmtruncate

2012-10-24 Thread Christoph Hellwig
On Sat, Oct 20, 2012 at 02:17:33PM +0200, Marco Stornelli wrote:
> Removed vmtruncate
> 
> Signed-off-by: Marco Stornelli 

As Al pointed out we probably shouldn't even allow truncate on procfs.
Can look into refusing it instead, please?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: acpi dsts loading and populate_rootfs

2008-02-11 Thread Christoph Hellwig
[skipping the populate_rootfs discussion as it seems you have a better
 handle on that than me]

On Sun, Feb 10, 2008 at 12:58:09PM +0100, Eric Piel wrote:
> >And while we're at it the file reading thing in there is utter crap
> >aswell.  You really should be using the firmware loader which works
> >perfectly fine if you initramfs is set up for it.  So please folks,
> >back to the drawing board, do it properly and send it out to lkml
> >for review please.
> Christoph, if you have seen this part of the code, you have probably 
> also read the big fat warning explaining why this cannot be done by 
> firmware loader (ie: userspace cannot be run at this early time, 
> corresponding to acpi_early_init()). However, you probably know the 
> kernel ten times better than me. Could you explain what I misunderstood 
> when writing this warning, and give me some hints about how to use the 
> firmware loader in this case?

Sorry, I misparsed the comment.  I took it for the usual I'm too lazy
to put something that could load firmware into initramfs excuse.

But thinking about it is there a reason acpi initialization needs to
happen so early that we can't even have userspace in initramfs running?

But if we can't use real userspace this could should at least be written
like the pseudo-userspace in init/do_mounts*.c, using the sys_ syscall
implementations.

As an additional comment the stat + open approach is racy and not a good
idea.  Please just open the file using sys_open, it will tell you
if the file doesn't exist and then use fstat on it to find the
length.  It would also be useful if this kind of code is not hidden
inside acpi but rather done somewhere close to the early init code
because that's where people would expect this kind of nastiness._
syscall
implementations.

As an additional comment the stat + open approach is racy and not a good
idea.  Please just open the file using sys_open, it will tell you
if the file doesn't exist and then use fstat on it to find the
length.  It would also be useful if this kind of code is not hidden
inside acpi but rather done somewhere close to the early init code
because that's where people would expect this kind of nastiness.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 00/30] Read-only bind mounts (-mm resend)

2008-02-11 Thread Christoph Hellwig
On Fri, Feb 08, 2008 at 02:26:41PM -0800, Dave Hansen wrote:
> This is against current Linus git
> (a4ffc0a0b240a29cbe489f6db9dae112a49ef1c1).
> 
> This rolls up all the -mm bugfixes that were accumulated, and
> addresses some new review comments from Al.  Also contains some
> reworking from hch and a patch from Jeff Dike.

Dave, you you please send this on to Linus ASAP?  It's getting really
late in the .25 circle.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUGFIX 1/2] gdth: scan for scsi devices

2008-02-12 Thread Christoph Hellwig
On Tue, Feb 12, 2008 at 07:35:22PM +0200, Boaz Harrosh wrote:
> 
> The patch: "gdth: switch to modern scsi host registration"
> 
> missed one simple fact when moving a way from scsi_module.c.
> That is to call scsi_scan_host() on the probed host.
> With this the gdth driver from 2.6.24 is again able to
> see drives and boot.

Doh, someone please hand me a brown paper bag.  My first series
of patches had this but it got dropped when I rebased it over various
janitor cleanups.

The patch looks obviously correct, thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/8] Create and populate toplevel tests/ for kernel tests

2008-02-12 Thread Christoph Hellwig
On Mon, Feb 11, 2008 at 04:14:52PM +0530, Ananth N Mavinakayanahalli wrote:
> The following series of patches create and populate the toplevel tests/
> directory. This will henceforth be the place where all in-kernel tests
> live.
> 
> All patches against 2.6.25-rc1 and are just code movement without any
> change in functionality.

ACK to patches 1-7, and I agree with Ingo that the x86-specific test
should stay under arch/x86.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BTRFS partition usage...

2008-02-12 Thread Christoph Hellwig
On Tue, Feb 12, 2008 at 03:35:57PM -0800, David Miller wrote:
> What XFS does is really unfortunate, let's learn from it's
> mistake.

I'd rather say what Sun did with their disklabels was rather unfortunate :)
But yeah, new filesystem should cater for it's braindamage because it
doesn't have any kind of runtime cost at all.   XFS was designed for
IRIX back then which had disklabels just like the SUN ones, just without
the braindamage of having the disklabel inside the partition..

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-13 Thread Christoph Hellwig
On Tue, Feb 12, 2008 at 12:50:51PM -0800, Greg KH wrote:
> I can run the numbers, but almost every one of those changes has at
> least 2 signed-off-by: on them, so they should all be being reviewed
> properly.

Good joke..

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add MS_BIND_FLAGS mount flag

2008-02-13 Thread Christoph Hellwig
On Tue, Feb 12, 2008 at 09:45:15PM -0800, Paul Menage wrote:
> From: Paul Menage <[EMAIL PROTECTED]>
>
> Add a new mount() flag, MS_BIND_FLAGS.
>
> MS_BIND_FLAGS indicates that a bind mount should take its per-mount flags
> from the arguments passed to mount() rather than from the source
> mountpoint.
>
> This flag allows you to create a bind mount with the desired per-mount
> flags in a single operation, rather than having to do a bind mount
> followed by a remount, which is fiddly and can block for non-trivial
> periods of time (on sb->s_umount?).
>
> For recursive bind mounts, only the root of the tree being bound
> inherits the per-mount flags from the mount() arguments; sub-mounts
> inherit their per-mount flags from the source tree as usual.

I think this concept is reasonable, but I don't think MS_BIND_FLAGS
is a descriptive name for this flag.  MS_EXPLICIT_FLAGS might be better
but still isn't optimal.

> +static struct vfsmount *
> +__copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> + int flag, int mnt_flags)

> +struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry,
> +int flag)
> +{
> + return __copy_tree(mnt, dentry, flag, mnt->mnt_flags);
> +}

Please just change copy_tree to have an additional parameter.  There's
just four callers of it in the tree, and three of them want the new
parameter.

> + /* Use the source mount flags unless the user passed MS_BIND_FLAGS */

I think this comment could be made a little more explicit.

/*
 * If MS_EXPLICIT_FLAGS is passed in we will take the paramters
 * the user has specified.  The default behaviour for bind
 * mounts is however to take the flags from existing mount
 * instances for the same superblock.
 */
 
> #define MS_RELATIME   (1<<21) /* Update atime relative to mtime/ctime. */
> #define MS_KERNMOUNT  (1<<22) /* this is a kern_mount call */
> #define MS_I_VERSION  (1<<23) /* Update inode I_version field */
> +#define MS_BIND_FLAGS(1<<24) /* For MS_BIND, take mnt_flags from
> +  * args, not from source mountpoint */
> #define MS_ACTIVE (1<<30)
> #define MS_NOUSER (1<<31)

this looks like whitespace damage to me.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

2008-02-13 Thread Christoph Hellwig
On Wed, Feb 13, 2008 at 11:03:45AM -0600, James Bottomley wrote:
> > I don't understand please explain. 
> > What does a driver need to do if it needs a consistent shutdown retine?
> > module or built in? unload or shutdown?
> 
> It needs to register a reboot notifier, which gdth does.

Well, for crappy legacy driver that's the way, but it's not really
recommended.  As soon as a driver uses the proper driver models,
e.g. gdth for pci using Jeff's pci hotplug patches it can just
implement the ->shutdown method that is called before shutdown/kexec
and can do the right thing.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sysv: [bl]e*_add_cpu conversion

2008-02-13 Thread Christoph Hellwig
On Wed, Feb 13, 2008 at 12:06:21AM +0100, [EMAIL PROTECTED] wrote:
> From: Marcin Slusarz <[EMAIL PROTECTED]>
> 
> replace all:
> big/little_endian_variable = 
> cpu_to_[bl]eX([bl]eX_to_cpu(big/little_endian_variable) +
>   expression_in_cpu_byteorder);
> with:
>   [bl]eX_add_cpu(&big/little_endian_variable, 
> expression_in_cpu_byteorder);
> generated with semantic patch

The patch looks correct to me, so ACK.

> diff --git a/fs/sysv/sysv.h b/fs/sysv/sysv.h
> index 42d51d1..38ebe3f 100644
> --- a/fs/sysv/sysv.h
> +++ b/fs/sysv/sysv.h
> @@ -217,9 +217,9 @@ static inline __fs32 fs32_add(struct sysv_sb_info *sbi, 
> __fs32 *n, int d)
>   if (sbi->s_bytesex == BYTESEX_PDP)
>   *(__u32*)n = PDP_swab(PDP_swab(*(__u32*)n)+d);
>   else if (sbi->s_bytesex == BYTESEX_LE)
> - *(__le32*)n = cpu_to_le32(le32_to_cpu(*(__le32*)n)+d);
> + le32_add_cpu((__le32 *)n, d);
>   else
> - *(__be32*)n = cpu_to_be32(be32_to_cpu(*(__be32*)n)+d);
> + be32_add_cpu((__be32 *)n, d);
>   return *n;

but now that you've named the be and le primitives *_add_cpu it would
be nice if you could submit a second patch to sysv to rename fs*_add
to fs*_add_cpu aswell.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] spufs support multiple probes markers

2008-02-13 Thread Christoph Hellwig
On Tue, Feb 12, 2008 at 06:56:50PM -0500, Mathieu Desnoyers wrote:
> Update spufs to the new linux kernel markers API, which supports connecting
> more than one probe to a single marker.

Compiles and works for me.  But saying I like the odd API would be lying.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Feature Removals for 2.6.25

2008-02-14 Thread Christoph Hellwig
On Thu, Feb 14, 2008 at 10:24:43AM -0800, Arjan van de Ven wrote:
> Bill Davidsen wrote:
> >Note that because the hardware is old, it's highly likely that most of 
> >it will be retired before that sk98lin driver needs a change. I can't 
> >see anyone using sk98lin on a new system, so it would be less 
> >contentious to let the hardware (or users) die of natural causes if you 
> >can.
> >
> 
> the problem is that the new one DOES NOT GET FIXED.
> THAT is a huge problem; it means we have a buggy driver...

It's also utterly missing the point.  The skge driver supports all the
hardware that sk98lin supports.  If it doesn't work bug Stephen who's
a very responsive maintainer.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 00/10] mount ownership and unprivileged mount syscall (v8)

2008-02-15 Thread Christoph Hellwig
On Thu, Feb 14, 2008 at 10:21:03PM -0800, Andrew Morton wrote:
> Linus has just merged all the VFS renaming patches, so the decks
> are clear for looking at this work.
> 
> However David and Christoph are beavering away on the r-o-bind-mounts
> patches and I expect that there will be overlaps with unprivileged mounts.
> 
> Could we coordinate things a bit please?  Decide who goes first, review
> and maybe even test each others work, etc?

Al is setting up a git tree for VFS work.  per-mount r/o will go in
as one of the first things, aswell as his rework of the path lookup
logic to fix the intents mess.

> 
> Thanks.
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 00/10] mount ownership and unprivileged mount syscall (v8)

2008-02-15 Thread Christoph Hellwig
On Fri, Feb 15, 2008 at 01:09:51AM -0800, Andrew Morton wrote:
> > > However David and Christoph are beavering away on the r-o-bind-mounts
> > > patches and I expect that there will be overlaps with unprivileged mounts.
> > > 
> > > Could we coordinate things a bit please?  Decide who goes first, review
> > > and maybe even test each others work, etc?
> > 
> > Al is setting up a git tree for VFS work.  per-mount r/o will go in
> > as one of the first things, aswell as his rework of the path lookup
> > logic to fix the intents mess.
> > 
> 
> That didn't answer my question..

Well, Al as the defacto VFS maintainer will decide on the ordering.
Reviewing this stuff properly is still on my todo list, but currently
I'm busy with more important things.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 07/30] r/o bind mounts: stub functions

2008-02-15 Thread Christoph Hellwig
On Fri, Feb 15, 2008 at 05:11:19PM -0800, Andrew Morton wrote:
> > It would be nice if an initial patch which introduces the new
> > functionality you need for r/o bind mounts could get introduced into
> > mainline *first*, and then people could add patches that call
> > mnt_want_write(), et. al into their trees gradually.
> 
> Yes, I expect that merging a handful of do-nothing mnt_foo_write()
> functions into mainline right now would ease life.
> 
> > otherwise akpm gets grumpy
> 
> itym "less than usually cheery"

Haha,

once we put pieces in the first three patches would be useful aswell,
to easily catch additions in the next cycle that might be adding
NULL-vfsmount calls to dentry_open.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sysv: [bl]e*_add_cpu conversion

2008-02-16 Thread Christoph Hellwig
On Thu, Feb 14, 2008 at 02:04:37AM -0500, Christoph Hellwig wrote:
> > diff --git a/fs/sysv/sysv.h b/fs/sysv/sysv.h
> > index 42d51d1..38ebe3f 100644
> > --- a/fs/sysv/sysv.h
> > +++ b/fs/sysv/sysv.h
> > @@ -217,9 +217,9 @@ static inline __fs32 fs32_add(struct sysv_sb_info *sbi, 
> > __fs32 *n, int d)
> > if (sbi->s_bytesex == BYTESEX_PDP)
> > *(__u32*)n = PDP_swab(PDP_swab(*(__u32*)n)+d);
> > else if (sbi->s_bytesex == BYTESEX_LE)
> > -   *(__le32*)n = cpu_to_le32(le32_to_cpu(*(__le32*)n)+d);
> > +   le32_add_cpu((__le32 *)n, d);
> > else
> > -   *(__be32*)n = cpu_to_be32(be32_to_cpu(*(__be32*)n)+d);
> > +   be32_add_cpu((__be32 *)n, d);
> > return *n;
> 
> but now that you've named the be and le primitives *_add_cpu it would
> be nice if you could submit a second patch to sysv to rename fs*_add
> to fs*_add_cpu aswell.

Btw, the same applies to ufs aswell as it's using the same construct.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] ext3 freeze feature

2008-02-16 Thread Christoph Hellwig
On Fri, Feb 15, 2008 at 08:51:15PM +0900, Takashi Sato wrote:
> So XFS_IOC_FREEZE and XFS_IOC_THAW cannot be lifted to generic code simply.
> I think we should create new generic numbers for freeze and thaw

Actually we've lifted specific ioctls to the generic layer before all
the time in drivers.  That's the only way to make functionality that was
specific to a single driver (or in this case filesystem) generic.  If
the numbering issues confuses you make sure to add a big comment
describing it

> And xfs_freeze calls XFS_IOC_FREEZE with a magic number 1, but what is 1?

As Eric said it's ignored.

> Instead, I'd like the sec to timeout on freeze API in order to thaw
> the filesystem automatically.  It can prevent a filesystem from staying
> frozen forever.
> (Because a freezer may cause a deadlock by accessing the frozen filesystem.)

Timeout based locking is generally a horrible idea, there's a reason
we don't have any primitives for that in the kernel :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xfs: silence GCC warning

2012-11-02 Thread Christoph Hellwig
Looks good, Dave has actually sent it a tidbit earlier as part
of his series with fixes for 3.7-rc

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-30 Thread Christoph Hellwig
On Mon, Jul 30, 2012 at 09:31:06AM +0200, Paolo Bonzini wrote:
> You only need to add REQ_FLUSH support.  The virtio-blk protocol does
> not support REQ_FUA, because there's no easy way to do it in userspace.

A bio-based driver needs to handle both REQ_FLUSH and REQ_FUA as it does
not get the sequencing of REQ_FUA into REQ_FLUSH that request based drivers
can request.  To what the REQ_FUA request gets translated is a different story.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-30 Thread Christoph Hellwig
On Mon, Jul 30, 2012 at 12:43:12PM +0800, Asias He wrote:
> I think we can add REQ_FLUSH & REQ_FUA support to bio path and that 
> deserves another patch.

Adding it is a requirement for merging the code.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V4 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-07-30 Thread Christoph Hellwig
On Mon, Jul 30, 2012 at 11:25:51AM +0930, Rusty Russell wrote:
> I consider this approach a half-way step.  Quick attempts on my laptop
> and I couldn't find a case where the bio path was a loss, but in theory
> if the host wasn't doing any reordering and it was a slow device, you'd
> want the guest to do so.
> 
> I'm not sure if current qemu can be configured to do such a thing?


The host kernel will do the I/O scheduling for you unless you explicitly
disable it.  And we should be able to assume an administrator will only
disable it when they have a reason for it - if not they'll get worse
performance for non-virtualized workloads as well.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Hang in XFS reclaim on 3.7.0-rc3

2012-10-30 Thread Christoph Hellwig
On Tue, Oct 30, 2012 at 09:37:06PM +0100, Torsten Kaiser wrote:
> On Mon, Oct 29, 2012 at 11:26 PM, Dave Chinner  wrote:

For some reason I only managed to get the two mails from Torsten into my
xfs list folder, but the one from Dave is missing.  I did see Dave's
mail in my linux-mm folder, though.

> > From: Dave Chinner 
> >
> > Signed-off-by: Dave Chinner 

Looks good,

Reviewed-by: Christoph Hellwig 

And I agree that vmap needs to be fixed to pass through the gfp flags
ASAP.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


XFS status update for September 2012

2012-10-30 Thread Christoph Hellwig
September saw the release of Linux 3.6 which included a slightly smaller
than usual XFS update:

 51 files changed, 2607 insertions(+), 2455 deletions(-)

The bulk of the changes in Linux 3.6 is made up by native support for
discontinuous buffers, with various minor bug fixes and cleanups making up
the rest.  In the meantime the mailing list was busy with multiple revisions
of the sync/unmount code rewrite, and the patch series to make the inode64
mount option remountable and the default.

On the user space side xfsprogs saw new support for the preadv, pwritev and
sync_file_range system calls in xfs_io as well as minor fixes to libxcmd and
mkfs.xfs, xfsdump grew support for dumping and restoring 32-bit project ids
and and saw an update to the German translation.  Xfstests saw a large number
of fixes, and a single new test case.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 06/14] skd: remove SCSI subsystem specific includes

2013-10-01 Thread Christoph Hellwig
On Mon, Sep 30, 2013 at 03:25:47PM +0200, Bartlomiej Zolnierkiewicz wrote:
> This is not a SCSI host driver so remove SCSI subsystem specific
> includes.

The sad thing is that it is a driver for a device speaking SCSI, but for
some reason that was never discussed it is written to the block layer.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 16/26] mm: Provide get_user_pages_unlocked()

2013-10-02 Thread Christoph Hellwig
On Wed, Oct 02, 2013 at 04:27:57PM +0200, Jan Kara wrote:
> Provide a wrapper for get_user_pages() which takes care of acquiring and
> releasing mmap_sem. Using this function reduces amount of places in
> which we deal with mmap_sem.
> 
> Signed-off-by: Jan Kara 

Seem like this should be the default one and the one without the locked
should be __unlocked.  Maybe a grand rename is in order?


get_user_pages_fast -> get_user_pages
get_user_pages_unlocked -> __get_user_pages
get_user_pages_unlocked -> __get_user_pages_locked

steering people to the most sensible ones by default?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Crypto: change all occurances of "kerneli" to kernel Bug #60848

2013-10-02 Thread Christoph Hellwig
On Wed, Oct 02, 2013 at 11:56:23AM -0400, Kevin Mulvey wrote:
> replace all kerneli with kernel

kerneli is correct in this case.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] percpu_counter: fix irq restore in __percpu_counter_add

2013-10-03 Thread Christoph Hellwig
The current version of "percpu_counter: make APIs irq safe" in the blk-mq
tree doesn't properly restore irqs, thus causing the boot of the blk-multique
tree to fail with various irqs_disabled() BUGs and related issues.
 
Signed-off-by: Christoph Hellwig 

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 1ae43a7..7473ee3 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -82,7 +82,7 @@ void __percpu_counter_add(struct percpu_counter *fbc, s64 
amount, s32 batch)
unsigned long flags;
raw_spin_lock_irqsave(&fbc->lock, flags);
fbc->count += count;
-   raw_spin_unlock(&fbc->lock);
+   raw_spin_unlock_irqrestore(&fbc->lock, flags);
__this_cpu_write(*fbc->counters, 0);
} else {
__this_cpu_write(*fbc->counters, count);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-10-03 Thread Christoph Hellwig
On Sat, Jul 20, 2013 at 09:48:28AM -0500, Mike Christie wrote:
> What about the attached only compile tested patch. The patch has the mq
> block code work like the non mq code for bio cleanups.
> 
> 

> blk-mq: blk-mq should free bios in pass through case
> 
> For non mq calls, the block layer will free the bios when
> blk_finish_request is called.
e 
> For mq calls, the blk mq code wants the caller to do this.
> 
> This patch has the blk mq code work like the non mq code
> and has the block layer free the bios.
> 
> Signed-off-by: Mike Christie 

This patch breaks booting for me in the current blk multiqueue tree,
with an apparent double free of a bio when using virtio-blk in writeback
mode (cache=writeback or cache=none in qemu):

[   15.253608] [ cut here ]
[   15.256422] kernel BUG at /work/hch/linux/fs/bio.c:498!
[   15.256879] invalid opcode:  [#1] SMP 
[   15.256879] Modules linked in:
[   15.256879] CPU: 3 PID: 353 Comm: kblockd Not tainted 3.11.0+ #25
[   15.256879] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   15.256879] task: 88007d75e0c0 ti: 88007d676000 task.ti: 
88007d676000
[   15.256879] RIP: 0010:[]  [] 
bio_put+0x8a/0x90
[   15.256879] RSP: 0018:88007fd83b50  EFLAGS: 00010046
[   15.256879] RAX:  RBX: 88007d713080 RCX: 0035
[   15.256879] RDX: 0002 RSI: 88007ad50338 RDI: 88007d713080
[   15.256879] RBP: 88007fd83b60 R08: 7010 R09: 007ad5033808
[   15.256879] R10: ff672b1b7d38ce02 R11: 028b R12: 
[   15.256879] R13:  R14: 88007b4c36c0 R15: 88007b40d608
[   15.256879] FS:  () GS:88007fd8() 
knlGS:
[   15.256879] CS:  0010 DS:  ES:  CR0: 8005003b
[   15.256879] CR2: 0138 CR3: 02124000 CR4: 06e0
[   15.256879] Stack:
[   15.256879]  88007d713080  88007fd83b80 
811ae8a3
[   15.256879]  88007fd83bf0 1000 88007fd83b90 
811b3268
[   15.256879]  88007fd83bc0 816ac847 88007b4c36c0 
88007fd99d00
[   15.256879] Call Trace:
[   15.256879]   
[   15.256879]  [] end_bio_bh_io_sync+0x33/0x50
[   15.256879]  [] bio_endio+0x18/0x30
[   15.256879]  [] blk_mq_complete_request+0x47/0xd0
[   15.256879]  [] __blk_mq_end_io+0x19/0x20
[   15.256879]  [] blk_mq_end_io+0x68/0xd0
[   15.256879]  [] blk_flush_complete_seq+0xe2/0x370
[   15.256879]  [] flush_end_io+0x11b/0x200
[   15.256879]  [] blk_mq_complete_request+0x75/0xd0
[   15.256879]  [] __blk_mq_end_io+0x19/0x20
[   15.256879]  [] blk_mq_end_io+0x68/0xd0
[   15.256879]  [] virtblk_done+0xef/0x260
[   15.256879]  [] vring_interrupt+0x30/0x60
[   15.256879]  [] handle_irq_event_percpu+0x54/0x1f0
[   15.256879]  [] handle_irq_event+0x43/0x70
[   15.256879]  [] handle_edge_irq+0x6f/0x120
[   15.256879]  [] handle_irq+0x58/0x140
[   15.256879]  [] ? irq_enter+0x4f/0x90
[   15.256879]  [] do_IRQ+0x55/0xd0
[   15.256879]  [] common_interrupt+0x72/0x72
[   15.256879]  [] ? sched_clock_local+0x25/0xa0
[   15.256879]  [] ? __do_softirq+0xb0/0x250
[   15.256879]  [] ? __do_softirq+0xa9/0x250
[   15.256879]  [] irq_exit+0xae/0xd0
[   15.256879]  [] smp_apic_timer_interrupt+0x45/0x60
[   15.256879]  [] apic_timer_interrupt+0x72/0x80
[   15.256879]   
[   15.256879]  [] ? retint_restore_args+0x13/0x13
[   15.256879]  [] ? _raw_spin_unlock_irq+0x32/0x40
[   15.256879]  [] ? _raw_spin_unlock_irq+0x2b/0x40
[   15.256879]  [] rescuer_thread+0xe4/0x2f0
[   15.256879]  [] ? process_scheduled_works+0x40/0x40
[   15.256879]  [] kthread+0xd6/0xe0
[   15.256879]  [] ? _raw_spin_unlock_irq+0x2b/0x40
[   15.256879]  [] ? __init_kthread_worker+0x70/0x70
[   15.256879]  [] ret_from_fork+0x7c/0xb0
[   15.256879]  [] ? __init_kthread_worker+0x70/0x70
[   15.256879] Code: ff 41 8b 44 24 08 48 89 df 49 8b 74 24 10 48 29 c7 e8 cb 
88 f8 ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 90 48 89 df e8 b8 5c fc ff eb 9b <0f> 0b 
0f 1f 40 00 55 48 89 e5 41 57 45 31 ff 41 56 41 55 41 54 
[   15.256879] RIP  [] bio_put+0x8a/0x90
[   15.256879]  RSP 
[   15.256879] ---[ end trace 1f201608bfddfca7 ]---
[   15.256879] Kernel panic - not syncing: Fatal exception in interrupt
[   15.256879] Shutting down cpus with NMI

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 17/17] RCU'd vfsmounts

2013-10-04 Thread Christoph Hellwig
On Fri, Oct 04, 2013 at 03:53:51AM +0100, Al Viro wrote:
> Speaking of those two - I really want to see file_table.c one killed.
> Christoph, do you have anything along the lines of getting rid of the
> mark_files_ro() nonsense?  After all, a combination of r/w vfsmount
> and a superblock with MS_RDONLY in flags should do about the right thing
> these days...  I can probably knock something together tomorrow, but
> you've brought that thing up quite a few times, so if you happen to have
> a patch more or less ready...

I used to have a patch for it that was also sent to the list long ago,
but it got rid of the sysrq emergency remount r/o, which people didn't like.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/5] [PATCH 1/5] percpu_counter: fix irq restore in __percpu_counter_add

2013-10-04 Thread Christoph Hellwig
The current version of "percpu_counter: make APIs irq safe" in the blk-mq
tree doesn't properly restore irqs, thus causing the boot of the blk-multique
tree to fail with various irqs_disabled() BUGs and related issues.

Signed-off-by: Christoph Hellwig 
---
 lib/percpu_counter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 1ae43a7..7473ee3 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -82,7 +82,7 @@ void __percpu_counter_add(struct percpu_counter *fbc, s64 
amount, s32 batch)
unsigned long flags;
raw_spin_lock_irqsave(&fbc->lock, flags);
fbc->count += count;
-   raw_spin_unlock(&fbc->lock);
+   raw_spin_unlock_irqrestore(&fbc->lock, flags);
__this_cpu_write(*fbc->counters, 0);
} else {
__this_cpu_write(*fbc->counters, count);
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/5] [PATCH 3/5] block: remove request ref_count

2013-10-04 Thread Christoph Hellwig
This reference count has been around since before git history, but the only
place where it's used is in blk_execute_rq, and ther it is entirely useless
as it is incremented before submitting the request and decremented in the
end_io handler before waking up the submitter thread.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c   |3 ---
 block/blk-exec.c   |7 ---
 include/linux/blkdev.h |2 --
 3 files changed, 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f80df88..5620e58 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -110,7 +110,6 @@ void blk_rq_init(struct request_queue *q, struct request 
*rq)
rq->cmd = rq->__cmd;
rq->cmd_len = BLK_MAX_CDB;
rq->tag = -1;
-   rq->ref_count = 1;
rq->start_time = jiffies;
set_start_time_ns(rq);
rq->part = NULL;
@@ -1249,8 +1248,6 @@ void __blk_put_request(struct request_queue *q, struct 
request *req)
 {
if (unlikely(!q))
return;
-   if (unlikely(--req->ref_count))
-   return;
 
blk_pm_put_request(req);
 
diff --git a/block/blk-exec.c b/block/blk-exec.c
index e706213..7972da7 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -24,7 +24,6 @@ static void blk_end_sync_rq(struct request *rq, int error)
struct completion *waiting = rq->end_io_data;
 
rq->end_io_data = NULL;
-   __blk_put_request(rq->q, rq);
 
/*
 * complete last, if this is a stack request the process (and thus
@@ -103,12 +102,6 @@ int blk_execute_rq(struct request_queue *q, struct gendisk 
*bd_disk,
int err = 0;
unsigned long hang_check;
 
-   /*
-* we need an extra reference to the request, so we can look at
-* it after io completion
-*/
-   rq->ref_count++;
-
if (!rq->sense) {
memset(sense, 0, sizeof(sense));
rq->sense = sense;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 06df51d..8771c0b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -168,8 +168,6 @@ struct request {
 
unsigned short ioprio;
 
-   int ref_count;
-
void *special;  /* opaque pointer available for LLD use */
char *buffer;   /* kaddr of the current segment if available */
 
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/5] [PATCH 4/5] block: make blk_get/put_request work for blk-mq drivers

2013-10-04 Thread Christoph Hellwig
Consumers of the block layer shouldn't have to care if a driver uses
blk-mq or not, so make blk_get/put_request call the mq equivalents
underneath.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c  |   31 ---
 block/blk-mq.c|1 -
 drivers/block/mtip32xx/mtip32xx.c |2 +-
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5620e58..6d7fd79 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1080,7 +1080,8 @@ retry:
goto retry;
 }
 
-struct request *blk_get_request(struct request_queue *q, int rw, gfp_t 
gfp_mask)
+static struct request *blk_old_get_request(struct request_queue *q, int rw,
+   gfp_t gfp_mask)
 {
struct request *rq;
 
@@ -1097,6 +1098,14 @@ struct request *blk_get_request(struct request_queue *q, 
int rw, gfp_t gfp_mask)
 
return rq;
 }
+
+struct request *blk_get_request(struct request_queue *q, int rw, gfp_t 
gfp_mask)
+{
+   if (q->mq_ops)
+   return blk_mq_alloc_request(q, rw, gfp_mask);
+   else
+   return blk_old_get_request(q, rw, gfp_mask);
+}
 EXPORT_SYMBOL(blk_get_request);
 
 /**
@@ -1133,12 +1142,7 @@ EXPORT_SYMBOL(blk_get_request);
 struct request *blk_make_request(struct request_queue *q, struct bio *bio,
 gfp_t gfp_mask)
 {
-   struct request *rq;
-
-   if (q->mq_ops)
-   rq = blk_mq_alloc_request(q, bio_data_dir(bio), gfp_mask);
-   else
-   rq = blk_get_request(q, bio_data_dir(bio), gfp_mask);
+   struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask);
 
if (unlikely(!rq))
return ERR_PTR(-ENOMEM);
@@ -1276,12 +1280,17 @@ EXPORT_SYMBOL_GPL(__blk_put_request);
 
 void blk_put_request(struct request *req)
 {
-   unsigned long flags;
struct request_queue *q = req->q;
 
-   spin_lock_irqsave(q->queue_lock, flags);
-   __blk_put_request(q, req);
-   spin_unlock_irqrestore(q->queue_lock, flags);
+   if (q->mq_ops)
+   blk_mq_free_request(req);
+   else {
+   unsigned long flags;
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   __blk_put_request(q, req);
+   spin_unlock_irqrestore(q->queue_lock, flags);
+   }
 }
 EXPORT_SYMBOL(blk_put_request);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 423089d..709747f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -268,7 +268,6 @@ void blk_mq_free_request(struct request *rq)
hctx = q->mq_ops->map_queue(q, ctx->cpu);
__blk_mq_free_request(hctx, ctx, rq);
 }
-EXPORT_SYMBOL(blk_mq_free_request);
 
 void blk_mq_finish_request(struct request *rq, int error)
 {
diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 4427178..3c4c668 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -158,7 +158,7 @@ static struct mtip_cmd *mtip_get_int_command(struct 
driver_data *dd)
 
 static void mtip_put_int_command(struct driver_data *dd, struct mtip_cmd *cmd)
 {
-   blk_mq_free_request(blk_mq_rq_from_pdu(cmd));
+   blk_put_request(blk_mq_rq_from_pdu(cmd));
 }
 
 /*
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/5] [PATCH 5/5] block: use blk-exec.c infrastructure for blk-mq

2013-10-04 Thread Christoph Hellwig
There is no need to reinvent blk_execute_rq for blk-mq if we can easily
reuse the sync and async versions already present.

Signed-off-by: Christoph Hellwig 
---
 block/blk-exec.c   |7 +++
 block/blk-mq.c |   36 
 drivers/block/virtio_blk.c |5 -
 include/linux/blk-mq.h |1 -
 4 files changed, 11 insertions(+), 38 deletions(-)

diff --git a/block/blk-exec.c b/block/blk-exec.c
index 7972da7..47aef02 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "blk.h"
@@ -58,6 +59,12 @@ void blk_execute_rq_nowait(struct request_queue *q, struct 
gendisk *bd_disk,
 
rq->rq_disk = bd_disk;
rq->end_io = done;
+
+   if (q->mq_ops) {
+   blk_mq_insert_request(q, rq, true);
+   return;
+   }
+
/*
 * need to check this before __blk_run_queue(), because rq can
 * be freed before that returns.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 709747f..dece2e2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -958,42 +958,6 @@ run_queue:
blk_mq_run_hw_queue(hctx, !is_sync || is_flush_fua);
 }
 
-static void blk_mq_execute_end_io(struct request *rq, int error)
-{
-   struct completion *wait = rq->end_io_data;
-
-   complete(wait);
-}
-
-/*
- * Insert request, pass to device, and wait for it to finish.
- */
-int blk_mq_execute_rq(struct request_queue *q, struct request *rq)
-{
-   DECLARE_COMPLETION_ONSTACK(wait);
-   unsigned long hang_check;
-   int err = 0;
-
-   rq->end_io_data = &wait;
-   rq->end_io = blk_mq_execute_end_io;
-   blk_mq_insert_request(q, rq, true);
-
-   /* Prevent hang_check timer from firing at us during very long I/O */
-   hang_check = sysctl_hung_task_timeout_secs;
-   if (hang_check)
-   while (!wait_for_completion_io_timeout(&wait, hang_check * 
(HZ/2)));
-   else
-   wait_for_completion_io(&wait);
-
-   if (rq->errors)
-   err = -EIO;
-
-   blk_mq_finish_request(rq, rq->errors);
-
-   return err;
-}
-EXPORT_SYMBOL(blk_mq_execute_rq);
-
 /*
  * Default mapping to a software queue, since we use one per CPU.
  */
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index e0b1115..0352df1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -410,6 +410,7 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
struct virtio_blk *vblk = disk->private_data;
struct request *req;
struct bio *bio;
+   int err;
 
bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
   GFP_KERNEL);
@@ -423,7 +424,9 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
}
 
req->cmd_type = REQ_TYPE_SPECIAL;
-   return blk_mq_execute_rq(vblk->disk->queue, req);
+   err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+   blk_put_request(req);
+   return err;
 }
 
 static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2fea261..4fddab2 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -121,7 +121,6 @@ void blk_mq_init_commands(struct request_queue *, void 
(*init)(void *data, struc
 void blk_mq_flush_plug(struct request_queue *, bool);
 void blk_mq_insert_request(struct request_queue *, struct request *, bool);
 void blk_mq_insert_requests(struct request_queue *, struct list_head *, bool, 
bool);
-int blk_mq_execute_rq(struct request_queue *, struct request *);
 void blk_mq_run_queues(struct request_queue *q, bool async);
 void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/5] blk-mq fixes and improvements

2013-10-04 Thread Christoph Hellwig
The first two patches make blk-mq work fine again for me in a KVM VM,
the others make sure that consumers don't have to care if the underlying queue
uses blk-mq or not and remove some code at the same time.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/5] [PATCH 2/5] revert: "blk-mq: blk-mq should free bios in pass through case"

2013-10-04 Thread Christoph Hellwig
This patch causes boot failures when using REQ_FLUSH requests.  Also the
following statement in the commit log:

For non mq calls, the block layer will free the bios when
blk_finish_request is called.

For mq calls, the blk mq code wants the caller to do this.

Seems incorrect as far as I can follow the code as blk_finish_request only
calls __blk_put_request which then completes the bios if req->end_io is
not set.  This matches the blk-mq behaviour before this patch, so reverting
it makes the code more similar to the legacy case in addition to fixing the
boot failure.

Signed-off-by: Christoph Hellwig 
---
 block/blk-flush.c |2 +-
 block/blk-mq.c|   14 ++
 block/blk-mq.h|1 +
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3e4cc9c..c56c37d 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,7 +231,7 @@ static void flush_end_io(struct request *flush_rq, int 
error)
unsigned long flags = 0;
 
if (q->mq_ops) {
-   blk_mq_free_request(flush_rq);
+   blk_mq_finish_request(flush_rq, error);
spin_lock_irqsave(&q->mq_flush_lock, flags);
}
running = &q->flush_queue[q->flush_running_idx];
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 93563e0..423089d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -270,7 +270,7 @@ void blk_mq_free_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_free_request);
 
-static void blk_mq_finish_request(struct request *rq, int error)
+void blk_mq_finish_request(struct request *rq, int error)
 {
struct bio *bio = rq->bio;
unsigned int bytes = 0;
@@ -286,17 +286,22 @@ static void blk_mq_finish_request(struct request *rq, int 
error)
 
blk_account_io_completion(rq, bytes);
blk_account_io_done(rq);
+   blk_mq_free_request(rq);
 }
 
 void blk_mq_complete_request(struct request *rq, int error)
 {
trace_block_rq_complete(rq->q, rq);
-   blk_mq_finish_request(rq, error);
 
+   /*
+* If ->end_io is set, it's responsible for doing the rest of the
+* completion.
+*/
if (rq->end_io)
rq->end_io(rq, error);
else
-   blk_mq_free_request(rq);
+   blk_mq_finish_request(rq, error);
+
 }
 
 void __blk_mq_end_io(struct request *rq, int error)
@@ -984,7 +989,8 @@ int blk_mq_execute_rq(struct request_queue *q, struct 
request *rq)
if (rq->errors)
err = -EIO;
 
-   blk_mq_free_request(rq);
+   blk_mq_finish_request(rq, rq->errors);
+
return err;
 }
 EXPORT_SYMBOL(blk_mq_execute_rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 52bf1f9..42d0110 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,6 +27,7 @@ void blk_mq_complete_request(struct request *rq, int error);
 void blk_mq_run_request(struct request *rq, bool run_queue, bool async);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_init_flush(struct request_queue *q);
+void blk_mq_finish_request(struct request *rq, int error);
 
 /*
  * CPU hotplug helpers
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 17/17] RCU'd vfsmounts

2013-10-04 Thread Christoph Hellwig
That patch looks fine to me. Having the s_readonly_remount infrastructure
around certainly makes things easier.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/5] [PATCH 2/5] revert: "blk-mq: blk-mq should free bios in pass through case"

2013-10-05 Thread Christoph Hellwig
On Fri, Oct 04, 2013 at 12:39:33PM -0500, Mike Christie wrote:
> Sorry, messed up function name. I meant blk_end_request*.
> 
> For blk_execute_rq_nowait/blk_execute_rq and normal request use, the
> lower levels free the bios as they are completed by one of the
> blk_finish_request* calls. The caller of of
> blk_execute_rq_nowait/blk_execute_rq does not have to worry about
> freeing bios. It just frees the request when it is done with it.

Are you talking about bios or requests?  All these functions deal with
requests, so the talk of bios really confuses me.

That beeing said the old ones all require the caller to free the
request, and complicate that with the useless refcounting that my patch
3 removes.  Take a look at the other patches how all the calling
conventions can be nicely unified.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lustre: why does cfs_get_random_bytes() exist?

2013-10-05 Thread Christoph Hellwig
On Sat, Oct 05, 2013 at 10:21:21AM -0400, Theodore Ts'o wrote:
> add_device_randomness() is called from __dev_open() and
> dev_set_mac_address() in net/core/dev.c.  This is above the ethernet
> and infiniband level.  So as long as it looks like a Linux network
> device, and they are setting the hardware media access address in the
> standard place (dev->dev_addr), it should work fine.
> 
> If they don't then they should fix their drivers to call
> add_device_randomness(); the answer shouldn't be to make every single
> users of the Linux random number generation infrastructure work around
> the problem at the subsystem or file system level!

Besides these points a filesystem isn't the place to work around entropy
pool issues on some obscure platform.  Even if they can't fix it this
way for some reason the workaround still should be in the interaction
of the arch code and the Linux prng infrastructure and not in Lustre.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/5] [PATCH 2/5] revert: "blk-mq: blk-mq should free bios in pass through case"

2013-10-06 Thread Christoph Hellwig
On Sat, Oct 05, 2013 at 03:20:05PM -0500, Mike Christie wrote:
> The functions take in requests as the argument, but they end up
> operating on bios too. The scsi layer does
> scsi_io_completion->scsi_end_request-> blk_end_request ->
> blk_end_bidi_request -> blk_update_bidi_request -> blk_update_request.
> That function will then complete bios on the request passed in. It does
> not matter if the request is a REQ_TYPE_FS or REQ_TYPE_BLOCK_PC.
> 
> With my patch I was trying to make the block layer do the same for mq
> REQ_TYPE_BLOCK_PC requests inserted into the queue with
> blk_execute_rq_nowait (Nick's patch had support for something like that)
> by having the block mq layer call blk_mq_finish_request instead of
> making the code that calls blk_execute_rq_nowait do it.

Ok, I get the point now.  Didn't see that issue because I've only been
testing the non-bio REQ_TYPE_BLOCK_PC case as exposed by the virtio-blk
serial attribute so far.

> > That beeing said the old ones all require the caller to free the
> > request, and complicate that with the useless refcounting that my patch
> > 3 removes.  Take a look at the other patches how all the calling
> > conventions can be nicely unified.
> 
> I agree. I like them. I am saying though it could be better because even
> with your patches the rq->end_io functions need to have the mq_ops check
> like the flush_end_io does. If my patch worked as intended we would have
> your improvements and we would not need the rq->end_io functions to have
> that check and call blk_mq_finish_request because the blk mq layer was
> doing it for them.
> 
> That is all I am saying. I would like to be able to remove that check
> and blk_mq_finish_request call from the rq->end_io callouts.

I've found the bug that caused the regression with your patch, it's that
blk-mq incorrectly completes bios midway through a flush sequence, while
the old path didn't.

I'll send out a series soon that fixes this and re-reverts your patch,
although I split that re-revert into two patches in addition to my new
fix, given that I think your mixing up of the blk_mq_finish_request /
blk_mq_free_spit plus the confusing description made it really hard to
grasp, but I'll leave it to Jens how he wants to apply it to his tree
and make it look after the next rebase.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/5] blk-mq: kill blk_mq_finish_request

2013-10-06 Thread Christoph Hellwig
It can be merged into the only caller, and doing so allows accounting for
I/O completion in the correct place as well.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |   14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c137541..99a600e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -284,11 +284,13 @@ static void blk_mq_bio_endio(struct request *rq, struct 
bio *bio, int error)
bio_endio(bio, error);
 }
 
-static void blk_mq_finish_request(struct request *rq, int error)
+void blk_mq_complete_request(struct request *rq, int error)
 {
struct bio *bio = rq->bio;
unsigned int bytes = 0;
 
+   trace_block_rq_complete(rq->q, rq);
+
while (bio) {
struct bio *next = bio->bi_next;
 
@@ -299,19 +301,13 @@ static void blk_mq_finish_request(struct request *rq, int 
error)
}
 
blk_account_io_completion(rq, bytes);
-   blk_account_io_done(rq);
-}
-
-void blk_mq_complete_request(struct request *rq, int error)
-{
-   trace_block_rq_complete(rq->q, rq);
-
-   blk_mq_finish_request(rq, error);
 
if (rq->end_io)
rq->end_io(rq, error);
else
blk_mq_free_request(rq);
+
+   blk_account_io_done(rq);
 }
 
 void __blk_mq_end_io(struct request *rq, int error)
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/5] blk-mq: more careful bio completion

2013-10-06 Thread Christoph Hellwig
Make sure we set bio errors correctly and don't complete bios midway
through a flush sequence.  Largely copied from the old I/O path.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index dece2e2..d2e568e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -269,6 +269,21 @@ void blk_mq_free_request(struct request *rq)
__blk_mq_free_request(hctx, ctx, rq);
 }
 
+static void blk_mq_bio_endio(struct request *rq, struct bio *bio, int error)
+{
+   if (error)
+   clear_bit(BIO_UPTODATE, &bio->bi_flags);
+   else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+   error = -EIO;
+
+   if (unlikely(rq->cmd_flags & REQ_QUIET))
+   set_bit(BIO_QUIET, &bio->bi_flags);
+
+   /* don't actually finish bio if it's part of flush sequence */
+   if (!(rq->cmd_flags & REQ_FLUSH_SEQ))
+   bio_endio(bio, error);
+}
+
 void blk_mq_finish_request(struct request *rq, int error)
 {
struct bio *bio = rq->bio;
@@ -279,7 +294,7 @@ void blk_mq_finish_request(struct request *rq, int error)
 
bio->bi_next = NULL;
bytes += bio->bi_size;
-   bio_endio(bio, error);
+   blk_mq_bio_endio(rq, bio, error);
bio = next;
}
 
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/5] blk-mq updates

2013-10-06 Thread Christoph Hellwig
Patch 1 makes sure bios are completed more carefully, fixing the regression
with the earlier patch from Mike as well as a few issues found during code
inspection.  Patches 2 and 3 are a split up and better documented version
of "blk-mq: blk-mq should free bios in pass through case", and the last two
are minor cleanups.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/5] blk-mq: always complete bios in blk_mq_complete_request

2013-10-06 Thread Christoph Hellwig
From: Mike Christie 

Complete bios for all requests going through blk_mq_complete_request.

This mirrors the old I/O path and avoids blk-mq special casing in every
->end_io handler.

[hch: split up and reworded the changelog]

Signed-off-by: Mike Christie 
Signed-off-by: Christoph Hellwig 
---
 block/blk-flush.c |1 -
 block/blk-mq.c|   13 -
 block/blk-mq.h|1 -
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index d835243..3e4cc9c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,7 +231,6 @@ static void flush_end_io(struct request *flush_rq, int 
error)
unsigned long flags = 0;
 
if (q->mq_ops) {
-   blk_mq_finish_request(flush_rq, error);
blk_mq_free_request(flush_rq);
spin_lock_irqsave(&q->mq_flush_lock, flags);
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c32719..c137541 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -284,7 +284,7 @@ static void blk_mq_bio_endio(struct request *rq, struct bio 
*bio, int error)
bio_endio(bio, error);
 }
 
-void blk_mq_finish_request(struct request *rq, int error)
+static void blk_mq_finish_request(struct request *rq, int error)
 {
struct bio *bio = rq->bio;
unsigned int bytes = 0;
@@ -306,17 +306,12 @@ void blk_mq_complete_request(struct request *rq, int 
error)
 {
trace_block_rq_complete(rq->q, rq);
 
-   /*
-* If ->end_io is set, it's responsible for doing the rest of the
-* completion.
-*/
+   blk_mq_finish_request(rq, error);
+
if (rq->end_io)
rq->end_io(rq, error);
-   else {
-   blk_mq_finish_request(rq, error);
+   else
blk_mq_free_request(rq);
-   }
-
 }
 
 void __blk_mq_end_io(struct request *rq, int error)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 42d0110..52bf1f9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ void blk_mq_complete_request(struct request *rq, int error);
 void blk_mq_run_request(struct request *rq, bool run_queue, bool async);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_init_flush(struct request_queue *q);
-void blk_mq_finish_request(struct request *rq, int error);
 
 /*
  * CPU hotplug helpers
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/5] blk-mq: dont call blk_mq_free_request from blk_mq_finish_request

2013-10-06 Thread Christoph Hellwig
From: Mike Christie 

These are fundamentally different operations, so keep them separate.

[hch: split up and reworded the changelog]

Signed-off-by: Mike Christie 
Signed-off-by: Christoph Hellwig 
---
 block/blk-flush.c |1 +
 block/blk-mq.c|5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index c56c37d..d835243 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -232,6 +232,7 @@ static void flush_end_io(struct request *flush_rq, int 
error)
 
if (q->mq_ops) {
blk_mq_finish_request(flush_rq, error);
+   blk_mq_free_request(flush_rq);
spin_lock_irqsave(&q->mq_flush_lock, flags);
}
running = &q->flush_queue[q->flush_running_idx];
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d2e568e..9c32719 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -300,7 +300,6 @@ void blk_mq_finish_request(struct request *rq, int error)
 
blk_account_io_completion(rq, bytes);
blk_account_io_done(rq);
-   blk_mq_free_request(rq);
 }
 
 void blk_mq_complete_request(struct request *rq, int error)
@@ -313,8 +312,10 @@ void blk_mq_complete_request(struct request *rq, int error)
 */
if (rq->end_io)
rq->end_io(rq, error);
-   else
+   else {
blk_mq_finish_request(rq, error);
+   blk_mq_free_request(rq);
+   }
 
 }
 
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/5] blk-mq: cleanup blk_mq_bio_to_request

2013-10-06 Thread Christoph Hellwig
Remove dead code and a dead argument.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |   15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 99a600e..2b85029 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -857,15 +857,8 @@ void blk_mq_insert_requests(struct request_queue *q, 
struct list_head *list,
from_schedule);
 }
 
-static void blk_mq_bio_to_request(struct request_queue *q,
- struct request *rq, struct bio *bio)
+static void blk_mq_bio_to_request(struct request *rq, struct bio *bio)
 {
-   unsigned int rw_flags;
-
-   rw_flags = bio_data_dir(bio);
-   if (rw_is_sync(bio->bi_rw))
-   rw_flags |= REQ_SYNC;
-
init_request_from_bio(rq, bio);
blk_account_io_start(rq, 1);
 }
@@ -915,7 +908,7 @@ static void blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
hctx->queued++;
 
if (unlikely(is_flush_fua)) {
-   blk_mq_bio_to_request(q, rq, bio);
+   blk_mq_bio_to_request(rq, bio);
blk_mq_put_ctx(ctx);
blk_insert_flush(rq);
goto run_queue;
@@ -930,7 +923,7 @@ static void blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
struct blk_plug *plug = current->plug;
 
if (plug) {
-   blk_mq_bio_to_request(q, rq, bio);
+   blk_mq_bio_to_request(rq, bio);
if (list_empty(&plug->list))
trace_block_plug(q);
else if (request_count >= BLK_MAX_REQUEST_COUNT) {
@@ -949,7 +942,7 @@ static void blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
blk_mq_attempt_merge(q, ctx, bio))
__blk_mq_free_request(hctx, ctx, rq);
else {
-   blk_mq_bio_to_request(q, rq, bio);
+   blk_mq_bio_to_request(rq, bio);
__blk_mq_insert_request(hctx, rq);
}
 
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH, RFC] block: use a separate plug list for blk-mq requests

2013-10-06 Thread Christoph Hellwig
blk_flush_plug_list became a bit of a mess with the introduction of blk-mq,
so I started looking into separating the blk-mq handling from it.  Turns
out that by doing this we can streamline the blk-mq submission path a lot.

If we branch out to a blk-mq specific code path early we can do the list sort
based on the hw ctx instead of the queue and thus avoid the later improvised
loop to sort again.  In addition we can also remove the hw irq disabling in
the submission path entirely and collapse a couple of functions in blk-mq.c,
all at the cost of an additional list_head in struct blk_plug which can go
away again as soon as we remove old-school request_fn based drivers.

Signed-off-by: Christoph Hellwig 

---
 block/blk-core.c   |   29 -
 block/blk-mq.c |   81 -
 include/linux/blk-mq.h |4 +-
 include/linux/blkdev.h |6 +++
 4 files changed, 62 insertions(+), 58 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6d7fd79..7bedff6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2887,6 +2887,7 @@ void blk_start_plug(struct blk_plug *plug)
 
plug->magic = PLUG_MAGIC;
INIT_LIST_HEAD(&plug->list);
+   INIT_LIST_HEAD(&plug->mq_list);
INIT_LIST_HEAD(&plug->cb_list);
 
/*
@@ -2973,28 +2974,21 @@ struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn 
unplug, void *data,
 }
 EXPORT_SYMBOL(blk_check_plugged);
 
-static void do_queue_unplug(struct request_queue *q, bool from_schedule,
-   unsigned int depth, struct list_head *list)
-{
-   if (q->mq_ops) {
-   trace_block_unplug(q, depth, !from_schedule);
-   blk_mq_insert_requests(q, list, 1, from_schedule);
-   } else
-   queue_unplugged(q, depth, from_schedule);
-}
-
 void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 {
struct request_queue *q;
unsigned long flags;
struct request *rq;
LIST_HEAD(list);
-   LIST_HEAD(q_list);
unsigned int depth;
 
BUG_ON(plug->magic != PLUG_MAGIC);
 
flush_plug_callbacks(plug, from_schedule);
+
+   if (!list_empty(&plug->mq_list))
+   blk_mq_flush_plug_list(plug, from_schedule);
+
if (list_empty(&plug->list))
return;
 
@@ -3019,17 +3013,10 @@ void blk_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
 * This drops the queue lock
 */
if (q)
-   do_queue_unplug(q, from_schedule, depth, 
&q_list);
+   queue_unplugged(q, depth, from_schedule);
q = rq->q;
depth = 0;
-   if (!q->mq_ops)
-   spin_lock(q->queue_lock);
-   }
-
-   if (q->mq_ops) {
-   depth++;
-   list_add_tail(&rq->queuelist, &q_list);
-   continue;
+   spin_lock(q->queue_lock);
}
 
/*
@@ -3055,7 +3042,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
 * This drops the queue lock
 */
if (q)
-   do_queue_unplug(q, from_schedule, depth, &q_list);
+   queue_unplugged(q, depth, from_schedule);
 
local_irq_restore(flags);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2b85029..6a86881 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -774,15 +775,18 @@ void blk_mq_run_request(struct request *rq, bool 
run_queue, bool async)
blk_mq_run_hw_queue(hctx, async);
 }
 
-static void __blk_mq_insert_requests(struct request_queue *q,
+static void blk_mq_insert_requests(struct request_queue *q,
 struct blk_mq_ctx *ctx,
 struct list_head *list,
-bool run_queue, bool from_schedule)
+int depth,
+bool from_schedule)
 
 {
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *current_ctx;
 
+   trace_block_unplug(q, depth, !from_schedule);
+
current_ctx = blk_mq_get_ctx(q);
 
if (!cpu_online(ctx->cpu))
@@ -806,55 +810,64 @@ static void __blk_mq_insert_requests(struct request_queue 
*q,
 
blk_mq_put_ctx(current_ctx);
 
-   if (run_queue)
-   blk_mq_run_hw_queue(hctx, from_schedule);
+   blk_mq_run_hw_queue(hctx, from_schedule);
 }
 
-void blk_mq_insert_requests(struct request_queue *q, struct list_head *list,
-   bool run_queue, bool from_schedule)
+static int plug_ctx_cmp(voi

[PATCH] blk-mq: fix blk_mq_start_stopped_hw_queues from irq context

2013-10-06 Thread Christoph Hellwig
The only caller of blk_mq_start_stopped_hw_queues is in irq context,
leading to lockdep splat when it actually gets called.  Fix this by
deferring the hw queue run to workqueue context.

Signed-off-by: Christoph Hellwig 

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2b85029..923e9e1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -686,7 +686,8 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q)
if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
continue;
 
-   blk_mq_start_hw_queue(hctx);
+   clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+   blk_mq_run_hw_queue(hctx, true);
}
 }
 EXPORT_SYMBOL(blk_mq_start_stopped_hw_queues);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/6] uas: make work list per-device

2013-09-17 Thread Christoph Hellwig
On Fri, Sep 13, 2013 at 01:27:12PM +0200, Gerd Hoffmann wrote:
> Simplifies locking, we'll protect the list with the device spin lock.
> Also plugs races which can happen when two devices operate on the
> global list.
> 
> While being at it rename the list head from "list" to "work", preparing
> for the addition of a second list.

Why do you even the list?  What would a ordered per-device workqueue not
provide?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sparc64: Export flush_ptrace_access() (needed by lustre)

2013-09-06 Thread Christoph Hellwig
On Thu, Sep 05, 2013 at 11:08:29AM +0200, Geert Uytterhoeven wrote:
> ERROR: "flush_ptrace_access" [drivers/staging/lustre/lustre/libcfs/libcfs.ko]
> undefined!

This seems to be more copy_to_user_page fallout, so instead of all these
arch patches we should figure out why lustre absolutely wants to use
this API that so far has been for core MM code only.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sparc64: Export flush_ptrace_access() (needed by lustre)

2013-09-06 Thread Christoph Hellwig
On Fri, Sep 06, 2013 at 05:56:13AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 05, 2013 at 11:08:29AM +0200, Geert Uytterhoeven wrote:
> > ERROR: "flush_ptrace_access" 
> > [drivers/staging/lustre/lustre/libcfs/libcfs.ko]
> > undefined!
> 
> This seems to be more copy_to_user_page fallout, so instead of all these
> arch patches we should figure out why lustre absolutely wants to use
> this API that so far has been for core MM code only.

Looking at the code is seems to be for a reimplementation of
__access_remote_vm.  So at very least it should use that, but I still
requiestion why it's doing that at all.

Geert, please don't just export stuff for staging/ use only.   Code
wouldn't be in staging if it weren't a steaming pile of junk, so
adjusting anyting in the real tree for it needs proper discussion first.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] staging: Disable lustre file system for MIPS, SH, and XTENSA

2013-09-09 Thread Christoph Hellwig
On Sun, Sep 08, 2013 at 06:59:45PM -0700, Greg Kroah-Hartman wrote:
> Can't we just export the functions for those arches?  Surely lutre
> isn't the first/only driver that needs this?

Lustre is.  These are core mm helpers, and lustre uses them to
reimplement another core VM function.  It then uses it to access
userspace environment variable.

In short all this code should be nuked, and no new symbols should be
exported.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [scsi] enclosure: remove all possible sysfs entries before add device

2013-09-09 Thread Christoph Hellwig
> Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U)

Please reproduce without this weird crap loaded.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] fs: remove vfs_follow_link

2013-09-09 Thread Christoph Hellwig
For a long time no filesystem has been using vfs_follow_link, and as seen
by recent filesystem submissions any new use is accidental as well.

Remove vfs_follow_link, document the replacement in
Documentation/filesystems/porting and also rename __vfs_follow_link
to match its only caller better.

Signed-off-by: Christoph Hellwig 

diff --git a/Documentation/filesystems/porting 
b/Documentation/filesystems/porting
index 206a1bd..f089058 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -451,3 +451,7 @@ in your dentry operations instead.
 --
 [mandatory]
->readdir() is gone now; switch to ->iterate()
+[mandatory]
+   vfs_follow_link has been removed.  Filesystems must use nd_set_link
+   from ->follow_link for normal symlinks, or nd_jump_link for magic
+   /proc/ style links.
diff --git a/fs/namei.c b/fs/namei.c
index f415c66..9419ce9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -674,7 +674,7 @@ static __always_inline void set_root_rcu(struct nameidata 
*nd)
}
 }
 
-static __always_inline int __vfs_follow_link(struct nameidata *nd, const char 
*link)
+static __always_inline int __follow_link(struct nameidata *nd, const char 
*link)
 {
int ret;
 
@@ -888,7 +888,7 @@ follow_link(struct path *link, struct nameidata *nd, void 
**p)
error = 0;
s = nd_get_link(nd);
if (s) {
-   error = __vfs_follow_link(nd, s);
+   error = __follow_link(nd, s);
if (unlikely(error))
put_link(nd, link, *p);
}
@@ -4244,11 +4244,6 @@ int generic_readlink(struct dentry *dentry, char __user 
*buffer, int buflen)
return res;
 }
 
-int vfs_follow_link(struct nameidata *nd, const char *link)
-{
-   return __vfs_follow_link(nd, link);
-}
-
 /* get the link contents into pagecache */
 static char *page_getlink(struct dentry * dentry, struct page **ppage)
 {
@@ -4360,7 +4355,6 @@ EXPORT_SYMBOL(vfs_path_lookup);
 EXPORT_SYMBOL(inode_permission);
 EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_create);
-EXPORT_SYMBOL(vfs_follow_link);
 EXPORT_SYMBOL(vfs_link);
 EXPORT_SYMBOL(vfs_mkdir);
 EXPORT_SYMBOL(vfs_mknod);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 529d871..49e71b0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2494,7 +2494,6 @@ extern const struct file_operations generic_ro_fops;
 #define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m))
 
 extern int vfs_readlink(struct dentry *, char __user *, int, const char *);
-extern int vfs_follow_link(struct nameidata *, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
 extern void *page_follow_link_light(struct dentry *, struct nameidata *);
 extern void page_put_link(struct dentry *, struct nameidata *, void *);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] staging: Disable lustre file system for MIPS, SH, and XTENSA

2013-09-10 Thread Christoph Hellwig
On Wed, Sep 11, 2013 at 01:14:11AM +0800, Peng Tao wrote:
> The problem is access_process_vm() is not exported since certain
> version of kernel including the latest. According to Christoph in the
> other mail, access_process_vm() is also a core mm function that is not
> supposed to be exported. Then what kind of change shall we make in
> order to keep current functionality?

You should remove the higher level functionality, kernel modules are
not supposed to look at userspace environment variables.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] staging: Disable lustre file system for MIPS, SH, and XTENSA

2013-09-11 Thread Christoph Hellwig
On Wed, Sep 11, 2013 at 10:51:50AM +0800, Peng Tao wrote:
> I'm not fighting against removing the piece of code. But if there is a
> strong reason to keep the functionality, we need to find a way to
> implement it. The convenience of using environment variables is that
> job scheduler can set the environment and other existing applications
> don't have to change. Are there other means to do the same? ioctl and
> upcall both need application change AFAIK.

There is no use case for it, the kernel has no business looking at these
variables.  Given that you think it's not even used I don't even know
why we're having this discussion.

Talking about nasty code, the whole linux-curproc.c is highly
questionable:

 - cfs_curproc_groups_nr:
unused and should be removed

 - cfs_cap_raise/cfs_cap_lower/cfs_cap_raised:
needs to go away, modyules must not change access permissions
on behalf of processes

 - the whole cfs_cap_t handling also needs to go away, passing around
   capabilities is not a concept the kernel supports for a reason

 - current_is_32bit:
Code should just use is_compat_task directly.


I've just taken the time to walk through this one file, but it seems
like most of libcfs is just as bad.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] blk-mq: fix blk_mq_start_stopped_hw_queues from irq context

2013-10-07 Thread Christoph Hellwig
On Sun, Oct 06, 2013 at 12:10:13PM -0600, Jens Axboe wrote:
> Thanks, applied. Might not be a bad idea to just mimic the run queue
> API, and provide a blk_mq_start_hw_queue(hctx, is_async) instead.

I have to see I really don't like this is_async, I'd much preper to
untangle those into separate versions further up the stack similar to
how the old blk_run_queue variants work.

But that'll have to wait a bit anyway, at least until you made a
decision on the plugging patch.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] Reworking dm-writeboost [was: Re: staging: Add dm-writeboost]

2013-10-08 Thread Christoph Hellwig
On Tue, Oct 08, 2013 at 10:43:07AM +1100, Dave Chinner wrote:
> > Maybe, writeboost should disable deferring barriers
> > if barrier_deadline_ms parameter is especially 0.
> > Linux kernel's layered architecture is obviously not always perfect
> > so there are similar cases in other boundaries
> > such as O_DIRECT to bypass the page cache.
> 
> Right - but you can't detect O_DIRECT at the dm layer. IOWs, you're
> relying on the user tweaking the corect knobs for their workload.

You can detect O_DIRECT writes by second guession a special combination
of REQ_ flags only used there, as cfg tries to treat it special:

#define WRITE_SYNC  (WRITE | REQ_SYNC | REQ_NOIDLE)
#define WRITE_ODIRECT   (WRITE | REQ_SYNC)

the lack of REQ_NOIDLE when REQ_SYNC is set gives it away.  Not related
to the FLUSH or FUA flags in any way, though.

Akira, can you explain the workloads where your delay of FLUSH or FUA
requests helps you in any way?  I very much agree with Dave's reasoning,
but if you found workloads where your hack helps we should make sure we
fix them at the place where they are issued.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Export symbols for splice in modules

2013-10-08 Thread Christoph Hellwig
On Tue, Oct 08, 2013 at 02:38:18PM +0200, Christian Ruppert wrote:
> The symbols splice_to_pipe, splice_grow_spd and splice_shrink_spd are not
> currently exported from the kernel. This prevents the implementation of
> drivers using splice in modules not statically linked with the kernel. This
> patch exports those symbols to make it possible to implement splice in
> kernel modules.

Please submit these together with your drivers.  In general it seems
a higher level interface should be exported anyway.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 27/29] get_signal_to_deliver: remove regs and cookie args

2013-10-08 Thread Christoph Hellwig
On Tue, Oct 08, 2013 at 01:36:17PM +0200, Richard Weinberger wrote:
> Both arguments are unused, remove them.

And with the get_signal macro left as the only caller, that one should
become the implementation function.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: scsi-mq updated to latest linux-block/new-queue

2013-10-10 Thread Christoph Hellwig
> What is the criteria for patches to include in your tree? I would suggest
> to consider this one https://lkml.org/lkml/2013/8/9/90 if it fits.

For this one you probably want to send a patch to Jens to move blk-mq-tag.h
to include/linux first instead of doing the relative include hack.

Also the  blk_mq_*tag* routines you use aren't exported, so a modular build of
the driver with that patch applied will fail.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] stable_kernel_rules.txt: Exclude networking from stable rules

2013-09-22 Thread Christoph Hellwig
This is also the preferred way to do it for XFS.  Maybe word it in a way
that we can easily add subsystems.

To me it generally seems to be the best way to do it - having random Ccs
and lots of stable trees doesn't seem like a very good way of handling
it.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] stable_kernel_rules.txt: Exclude networking from stable rules

2013-09-24 Thread Christoph Hellwig
On Mon, Sep 23, 2013 at 01:34:05PM -0700, Joe Perches wrote:
> Maybe adding a mechanism to MAINTAINERS would be better.
> Maybe a default B: (backport?) of sta...@vger.kernel.org
> with a per-subsystem override?

Sounds fine to me.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sysv: Add forgotten superblock lock init for v7 fs

2013-09-24 Thread Christoph Hellwig
On Wed, Sep 18, 2013 at 12:39:16AM +0200, Lubomir Rintel wrote:
> Superblock lock was replaced with (un)lock_super() removal, but left
> uninitialized for Seventh Edition UNIX filesystem in the following commit 
> (3.7):
> c07cb01 sysv: drop lock/unlock super
> 
> Signed-off-by: Lubomir Rintel 

Looks good.

Signed-off-by: Christoph Hellwig 

Al, can you add this to the next VFS pile?

> ---
>  fs/sysv/super.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/sysv/super.c b/fs/sysv/super.c
> index d0c6a00..eda1095 100644
> --- a/fs/sysv/super.c
> +++ b/fs/sysv/super.c
> @@ -487,6 +487,7 @@ static int v7_fill_super(struct super_block *sb, void 
> *data, int silent)
>   sbi->s_sb = sb;
>   sbi->s_block_base = 0;
>   sbi->s_type = FSTYPE_V7;
> + mutex_init(&sbi->s_lock);
>   sb->s_fs_info = sbi;
>   
>   sb_set_blocksize(sb, 512);
> -- 
> 1.7.1
> 
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/6] uas: make work list per-device

2013-09-24 Thread Christoph Hellwig
On Wed, Sep 18, 2013 at 09:33:04AM +0200, Gerd Hoffmann wrote:
> > > While being at it rename the list head from "list" to "work", preparing
> > > for the addition of a second list.
> > 
> > Why do you even the list?
> 
> The list was already there when I took over maintainance ...
> 
> > What would a ordered per-device workqueue not
> > provide?
> 
> Had no reason to look into replacing the list with something else so
> far.  Why do you suggest using a workqueue instead?
> 
> Note that the list is not used in a typical request workflow.  Only in
> case queuing an usb urb failed the request is linked into the list and
> the worker will try to submit the usb urb again.

The driver is only using the list to queue up things into workqueue
context as far as I can see.  Which means it's far easier to leave that
to the workqueue.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/22] Immutable biovecs, block layer changes

2013-09-24 Thread Christoph Hellwig
Just curious, what's the state of the remaining immutable bio work?

On Thu, Aug 08, 2013 at 02:15:29PM -0700, Kent Overstreet wrote:
> > What is preventing you from sending those out as well?  While it's not
> > absolutely nessecary it would certainly be good if we'd avoid a struct
> > bio size regression.
> 
> There's still some fairly significant changes, and I don't want to make
> too many invasive changes at once.
> 
> Main thing is making generic_make_request() take arbitrary size bios.
> After this series that's just two simple patches, but then the changes
> to make use of that will be changing behaviour in non obvious ways.
> 
> The way the merging changes work is it enables multi page bvecs - so a
> bvec can point to many physically contiguous pages. This moves segment
> merging to bio_add_page(), and gets rid of bi_phys_segments - now
> bi_vcnt == bi_phys_segments, we just split the bio if it's too big.
> 
> Then, with bio_add_page() building up large bios and the block layer
> splitting them as necessary, there shouldn't be any need for segment
> merging across bios anymore (because generally when that would've
> happened before, now we'd just be sending one larger bio down).
> 
> The remaining patches aren't terribly complicated though (less
> complicated than this patch series). Trickiest bit is multipage bvecs,
> and that's mostly just lots of code auditing - the way I convert
> existing code is by adding bio_for_each_page() - analagous to
> bio_for_each_segment, but giving you bvecs that point to single pages.
> So it's an easy conversion, just have to make sure nothing's missed.
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/22] Immutable biovecs, block layer changes

2013-09-24 Thread Christoph Hellwig
On Tue, Sep 24, 2013 at 09:20:14AM -0400, Mike Snitzer wrote:
> Have you been over the patchset?  Looks sane to you?

I looked over it, although I didn't dig into the details of all driver
patches, and I like what I see.  As said in the previous mail I'd love
to see the patches to shrink struct bio again, too.

> Given how disruptive this patchset is to the block layer I'm wondering
> how painful this change will be in combination with Jens' blk-mq
> changes.  I'd prefer to see blk-mq before immutable biovecs; but I have
> my own selfish reasons for that.

I don't see too much conflicts with blk-multiqueue as that is operating
at the request layer.  Blk-multiqueue defintively is a higher priority
for me, but as it already looks fairly good I have no idea what we're
blocked on for it anyway.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/11] lots of fixes for zero allocation length

2012-09-10 Thread Christoph Hellwig
On Fri, Sep 07, 2012 at 05:30:31PM +0200, Paolo Bonzini wrote:
> Here are the fixes for zero-length commands, that ultimately let
> all command go through normal processing, even if they have
> zero length.
> 
> The first patch is for PSCSI, the others are for IBLOCK and friends.
> I tried to order them so that the safest come first.  So for 3.6 you
> could choose to stop at patch 1, 2, 3 or 6, or not apply anything
> at all.
> 
> Testcases included in each patch, when relevant.

Can you please submit them for the scsi testsuite?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] Do a proper locking for mmap and block size change

2012-11-30 Thread Christoph Hellwig
On Fri, Nov 30, 2012 at 01:49:10PM +1100, Dave Chinner wrote:
> > Ugh. That's a big violation of how buffer-heads are supposed to work:
> > the block number is very much defined to be in multiples of b_size
> > (see for example "submit_bh()" that turns it into a sector number).
> > 
> > But you're right. The direct-IO code really *is* violating that, and
> > knows that get_block() ends up being defined in i_blkbits regardless
> > of b_size.
> 
> Same with mpage_readpages(), so it's not just direct IO that has
> this problem

The mpage code may actually fall back to BHs.

I have a version of the direct I/O code that uses the iomap_ops from the
multi-page write code that you originally started.  It uses the new op
as primary interface for direct I/O and provides a helper for
filesystems that still use buffer heads internally.  I'll try to dust it
off and send out a version for the current kernel.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] Do a proper locking for mmap and block size change

2012-11-30 Thread Christoph Hellwig
On Sat, Dec 01, 2012 at 09:40:41AM +1100, Dave Chinner wrote:
> So it was based on this interface?

Based on that.  I had dropped the inode operation as it's not really
a generic operation but a callback for either the buffered I/O code
or direct I/O and should be treated as such.  I've also split the
single multiplexer function into individual ones, but the underlying
data structure and fundamental operations are the same.

> (I went looking for this code on google a couple of days ago so I
> could point at it and say "we should be using an iomap structure,
> not buffer heads", but it looks like I never posted it to fsdevel or
> the xfs lists...)

Your version defintively was up on your kernel.org XFS tree, that's what
I started from.

I'll have a long plane right tonight, let's see if I can get the direct
I/O version updated.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] Linux KVM tool for v3.7-rc0

2012-10-21 Thread Christoph Hellwig
On Sun, Oct 21, 2012 at 01:07:31PM +1000, Dave Airlie wrote:
> Why couldn't this script just be a wrapper around qemu

It can be.  Here is my ususual one:

#!/bin/sh

/opt/qemu/bin/qemu-system-x86_64 \
-m 1500 \
-enable-kvm \
-drive if=none,file=/home/hch/qemu-root.img,cache=none,id=root \
-device virtio-blk-pci,drive=root \
-append "root=/dev/vda console=tty0 console=ttyS0,38400n8 
kgdboc=ttyS0,38400n8" \
-kernel arch/x86/boot/bzImage \
-nographic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI

2012-12-06 Thread Christoph Hellwig
On Wed, Dec 05, 2012 at 07:45:39AM -0800, Linus Torvalds wrote:
> On Wed, Dec 5, 2012 at 2:48 AM, Martin Steigerwald  
> wrote:
> >
> > Linus, while I am interested in an answer I think that Dave and Christoph
> > as Linux filesystem developers actually deserve one (instead of silently
> > being ignored which is also a decision in this matter).
> >
> > I did not see an answer in linux-2.6 commit log as of today so far.
> 
> Christ guys. This whole thread is retarded.
> 
> The *ONLY* reason people seem to have for reverting that is a "ooh, my
> feelings are hurt by how this was done, and now I'm a pissy bitch and
> I want to get this reverted".
> 
> Stop the f*cking around already.
> 
> If you want something reverted, you show me the *technical* reason for
> it. Not the "ooh, I'm so annoyed by how this was done" reason for it.
> 
> And if your little feelings got hurt, get your mommy to tuck you in,
> don't email me about it. Because I'm not exactly known for my deep
> emotional understanding and supportive personality, am I?

No, the problem is that the thing is not just a) wrong, but b) only made
it in through sneaky ways.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI

2012-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2012 at 12:14:02PM +1100, Dave Chinner wrote:
> On Wed, Dec 05, 2012 at 10:25:17AM -0800, Linus Torvalds wrote:
> > Yes, people can argue that "process" is about technical issues too,
> > but let's be honest: our process is fluid. Not everything gets
> > reviewed on the mailing list, and people *do* talk about things
> > face-to-face at conferences.
> 
> Yes, that is true.
> 
> But we don't review code at conferences. We have mailing lists for
> that.  I've lost count of the number of times I've heard "post your
> code" or "let's take it to the list" or "better to discuss this on
> the list where everyone can comment" at conferences such as
> Plumbers. It's a standard practise - talk at conferences, review
> on mailing lists.

Also the only conference outcome I remember is that everyone at LSF
except for Ted basically said "no fucking way".

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs

2012-12-07 Thread Christoph Hellwig
On Thu, Dec 06, 2012 at 10:26:28PM +0400, Pavel Shilovsky wrote:
> Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags - this 
> change can benefit cifs and nfs modules. While this change is ok for network 
> filesystems, itsn't not targeted for local filesystems due security problems 
> (e.g. when a user process can deny root to delete a file).
> 
> Share flags are used by Windows applications and WINE have to deal with them 
> too. While WINE can process open share flags itself on local filesystems, it 
> can't do it if a file stored on a network share and is used by several 
> clients. This patchset makes it possible for CIFS/SMB2.0/SMB3.0.

I don't think introducing user visible flags that are only supported on
a single network filesystem is a good idea.

I'm not even sure adding these flags does make a lot of sense, but
assuming we'd actually want this (and I'd like some more detailed
explanation) I think we'd at least need to make sure that:

 a) opening files with the new modes gives a proper error message if not
supported
 b) there needs to be local support for them as well
 c) we need to think really hard when they should be supported, and need
a good rational for it.  I can't see how we could do it
unconditionally for all users as that would introduce easy denial
of services attacks the way I understand the semantics (correct me
if I'm wrong).  So a mount option like you currently do probably is
the least bad even if don't fell overly happy about that version.

What is the reason your special wine use case can't simply use a
userspace cifs client?  Given that wine uses windows filesystem
semantics and cifs does as well tunnelling it through a Posix-like API
inbetween is never going to be perfect.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH reduce impact of FIFREEZE on userland processes

2012-12-08 Thread Christoph Hellwig
On Fri, Dec 07, 2012 at 11:42:55AM +1100, Dave Chinner wrote:
> The problem wth doing this is that the sync can delay the freeze
> process by quite some time under the exact conditions you describe.
> If you want freeze to take effect immediately (i.e instantly stop
> new modifications), then adding a sync will break this semantic.
> THere are existing users of freeze that require this behaviour...

But that's only because he uses the big hammer sync_filesystem() which
actually waits for I/O completion.  I agree that this is a bad idea,
but if we'd just do a writeback_inodes_sb() call in this place that
starts asynchronous writeout I think everyone would benefit.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] target/sbc: Make WRITE_SAME check differentiate between UNMAP=[1,0]

2012-11-15 Thread Christoph Hellwig
> + if (flags[0] & 0x08)
> + cmd->se_cmd_flags |= SCF_WRITE_SAME_DISCARD;

I don't like this flag at all.  We can still simply check the CDB
during ->execute_cmd and avoid this redundant flag.

Except for that bit the changes look fine, but should not be a patch
on their own.  Without an actual implementation this relaxation is
actively harmful.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] target: Add max_write_same_len device attribute

2012-11-15 Thread Christoph Hellwig
On Thu, Nov 08, 2012 at 08:07:17PM +, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> This patch adds a new max_write_same_len device attribute for use with
> WRITE_SAME w/ UNMAP=0 backend emulation.
> 
> Also, update block limits VPD emulation code in spc_emulate_evpd_b0() to
> set the default MAXIMUM WRITE SAME LENGTH value of zero.

why do we need an exposed attribute for this?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] target/iblock: Add WRITE_SAME w/ UNMAP=0 emulation support

2012-11-15 Thread Christoph Hellwig
> + /*
> +  * Enable WRITE_SAME emulation for IBLOCK, use scsi_debug.c default
> +  */

Why would we care what scsi_debug.c uses?

> + dev->dev_attrib.max_write_same_len = 0x;
>  
>   if (blk_queue_nonrot(q))
>   dev->dev_attrib.is_nonrot = 1;
> @@ -375,22 +379,80 @@ err:
>   return ret;
>  }

> +static struct bio *iblock_get_bio(struct se_cmd *, sector_t, u32);
> +static void iblock_submit_bios(struct bio_list *, int);
> +static void iblock_complete_cmd(struct se_cmd *);

I'd suggest moving the write_same callback below these to avoid
forward declarations.

> + if (cmd->se_cmd_flags & SCF_WRITE_SAME_DISCARD) {

I'd probably add separate write_same and write_same_unmap members to
the sbc_ops structure.  That'll keep decoding which one is used in the
SBC code, and it'll keep the implementations nicely separated.

> + if (sectors > cmd->se_dev->dev_attrib.max_write_same_len) {

This sort of check should stay in the SBC code.

> + sg = &cmd->t_data_sg[0];

Btw, it seems like we don't bother to ensure the S/G list length
is just one sector for WRITE SAME with either the unmap bit set or not.

Also please add testcases for WRITE SAME including corner cases like
incorrect transfer length to the scsi testsuite to ensure this code
has proper QA coverage.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-11-06 Thread Christoph Hellwig
On Tue, Nov 06, 2012 at 07:06:42AM -0500, J. Bruce Fields wrote:
> On Tue, Nov 06, 2012 at 02:14:50PM +0400, Stanislav Kinsbursky wrote:
> > 09.10.2012 23:35, J. Bruce Fields ??:
> > >Cc'ing Eric since I seem to recall he suggested doing it this way?
> > >
> > >Seems OK to me, but maybe that swap_root should be in common code?  (Or
> > >maybe we could use set_fs_root()?)
> > >
> > 
> > This patch is not good since, as Eric mentioned, all kernel threads
> > share same fs struct.
> > We can swap whole fs struct. Or we can unshare fs struct
> > (unshare_fs_struct() is exported) and swap root in this case.
> > But this approach is to close to set_fs_root() logic, which is not
> > exported and seems there are some valid reasons for it.
> 
> What are those reasons?
> 
> Googling found one previous thread:
> 
>   http://thread.gmane.org/gmane.linux.kernel/1259986/focus=47687
> 
> There Trond requests an ACK from Al or Cristoph for the export, but I
> don't see either an ACK or any objection.

I really don't think messing with current->fs for workqueue worker
threads is a good idea, as the worker threads are shared by different
workqueues and thus this can easily cause havoc for entirely unrelated
subsystems.

To do this properly you'll need to avoid current->fs references in the
sunrpc code.

And just in case it wasn't clear: the hack in this iteration is even
worse than the original.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-11-06 Thread Christoph Hellwig
On Tue, Nov 06, 2012 at 08:07:06AM -0500, J. Bruce Fields wrote:
> So you're worried that a bug in the nfs code could modify the root and
> then not restore it?

At least the link you pointed to earlier never sets it back.  Instead
of messing with it I'd rather have the sunrpc code use vfs_path_lookup
and not care about current->fs->root at all.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xfs: return -EINVAL instead of -EUCLEAN when mounting non-xfs

2013-01-03 Thread Christoph Hellwig
On Sun, Dec 30, 2012 at 02:16:50AM +0300, Sergei Trofimovich wrote:
> It fixes boot panic when trying to boot from btrfs filesystem.
> kernel tries to mount as xfs and gets fatal -EUCLEAN:
> 
> [0.17] VFS: Cannot open root device "ubda" or unknown-block(98,0): 
> error -117
> [0.17] Please append a correct "root=" boot option; here are the 
> available partitions:
> [0.17] 6200 1048576 ubda  driver: uml-blkdev
> [0.17] Kernel panic - not syncing: VFS: Unable to mount root fs on 
> unknown-block(98,0)
> 
> init/do_mounts.c expects only -EINVAL as 'retry another' option.
> Fixes regression introduced by commit 98021821a502db347bd9c7671b6e8ce07ea6

Looks reasonable, but  think xfs_readsb should simply be changed to
turn all EFSCORRUPTED returns into EINVAL if loud is not set.  The
place that changes the errno value would also be a perfect place to
comment why we are doing this in the code so that this knowledge doesn't
get lost.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/3] mm: Update file times when inodes are written after mmaped writes

2012-12-22 Thread Christoph Hellwig
NAK, we went through great trouble to get rid of the nasty layering
violation where the VM called file_update_time directly just a short
while ago, reintroducing that is a massive step back.

Make sure whatever "solution" for your problem you come up with keeps
the file update in the filesystem or generic helpers.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] target/iblock: Add WRITE_SAME w/ UNMAP=0 emulation support

2012-11-15 Thread Christoph Hellwig
On Thu, Nov 15, 2012 at 11:29:46AM -0800, Nicholas A. Bellinger wrote:
> Well at least for the latter that is because UNMAP=0 does not have a
> payload.  ;)

UNMAP=0 does have a payload, we just ignore it.  In fact I was told
that targets should check for a completely zeroed sector sized payload
for it if being pedantic.  I can't really find the justification for
that in the standard - the closest thing to it is the stance about
ignoring the unmap bit if the device is fully provisioned.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fs: revert commit bbdd6808 to fallocate UAPI

2012-11-20 Thread Christoph Hellwig
On Tue, Nov 20, 2012 at 10:04:27AM +1100, Dave Chinner wrote:
> The method of pushing of such a commit (i.e. written, committed and
> pushed by a tree maintainer as part of a larger subsystem merge)
> could be seen as designed to avoid review and discussion of a
> controversial change that is likely to be NAKed. A long-term
> subsystem maintainer should know better than to push changes in this
> manner.


Agreed, this needs to be reverted.

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Christoph Hellwig
Having RCU for modification mostly workloads never is a good idea, so
I don't think it makes sense to mention it here.

If you care about the overhead it's worth trying to use per-cpu lists,
though.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   8   9   10   >