On Sun, 17 May 2015 19:56:26 -0700 Linus Torvalds wrote:
> On Sun, May 17, 2015 at 4:16 PM, NeilBrown wrote:
> >
> > Just to be crystal clear about what I want:
> > I want the filesystem to be in control
>
> Yeah, no. Not going to happen.
>
> You seem to think that the dcache is "just" a cac
On Sun, May 17, 2015 at 8:42 PM, Al Viro wrote:
>
> "Rest of the path" makes no sense, obviously. "More of the path" (and _not_
> as a string, TYVM - we have those components in ->d_name.name of dentries we
> want revalidated [..])
For revalidate, yes we kind of have them as dentries. I say kind
On Sun, May 17, 2015 at 07:56:26PM -0700, Linus Torvalds wrote:
> > So for Al's example of revalidating multiple components at once, once the
> > VFS gets to a point in the path where d_revalidate says "I need more time",
> > the VFS just passes the rest of the path to the filesystem.
>
>
On Sun, May 17, 2015 at 4:16 PM, NeilBrown wrote:
>
> Just to be crystal clear about what I want:
> I want the filesystem to be in control
Yeah, no. Not going to happen.
You seem to think that the dcache is "just" a cache. It's not. It's a
cache, but that is absolutely not all that it is. It's
On Mon, May 18, 2015 at 09:39:07AM +1000, NeilBrown wrote:
> There is no reason to be so gloomy.
RTFS.
> The VFS would provide a generic_do_last() (or whatever) which handles
> everything correctly for local filesystems which keep the dcache precisely
> consistent and use it for all the valuable
On Sun, 17 May 2015 11:55:35 +0100 Al Viro wrote:
> As for Neil's point re do_last() and friends being much too convoluted - yes,
> they are. And it's not a result of trying to shoehorn everything in one
> model. "Just let NFS have at it" as soon as we reach do_last() won't make
> things any si
On Sun, 17 May 2015 09:43:34 -0700 Linus Torvalds wrote:
> On Sun, May 17, 2015 at 3:55 AM, Al Viro wrote:
> >
> > And that is complete crap. Multi-component lookups do make sense; once
> > we are at the edge of the area present in dcache, we _know_ there won't
> > be any existing mountpoints i
On Sun, May 17, 2015 at 9:43 AM, Linus Torvalds wrote:
>
> d_instantiate(dentry, inode);
>
> could decide that *before* it does that "d_instantiate()", it could
> pre-populate the child list of 'dentry' with the lookup information
> for 'b' (and possibly recursively for 'c' too under 'd').
On Sun, May 17, 2015 at 3:55 AM, Al Viro wrote:
>
> And that is complete crap. Multi-component lookups do make sense; once
> we are at the edge of the area present in dcache, we _know_ there won't
> be any existing mountpoints involved; parsing the components and feeding
> them to fs at once, alo
On Sat, May 16, 2015 at 09:04:34PM -0700, Linus Torvalds wrote:
> It's now about things like overlayfs etc, all those things.
Er... Bad example, that - overlayfs is _not_ fs-agnostic.
> When somebody does a lookup of a filename, it is not a "pass this
> filename to the filesystem". It very much
On Sat, 16 May 2015 21:04:34 -0700 Linus Torvalds wrote:
> On Sat, May 16, 2015 at 8:48 PM, Linus Torvalds wrote:
> >
> > Sorry, but that really is how it is. NFS isn't special enough for some
> > badly designed lookup models to matter one whit.
>
> Btw, it's not just about performance, altho
On Sat, May 16, 2015 at 8:48 PM, Linus Torvalds wrote:
>
> Sorry, but that really is how it is. NFS isn't special enough for some
> badly designed lookup models to matter one whit.
Btw, it's not just about performance, although the whole "we can do
cached lookups without ever having to et the fil
On Sat, May 16, 2015 at 8:12 PM, NeilBrown wrote:
>
> The problem isn't getting intermediates. The problem is that not having
> intermediates confuses the dcache. When the dcache is just providing a
> caching service, and not providing a consistency service, then it shouldn't
> let itself get co
On Sat, 16 May 2015 15:18:11 +0100 Al Viro wrote:
> On Sat, May 16, 2015 at 06:46:26AM +0100, Al Viro wrote:
>
> > Dealing with multi-component lookups isn't impossible and might be a good
> > idea, but only if all intermediates are populated. What information does
> > NFSv4 multi-component loo
On Sat, 16 May 2015 06:46:26 +0100 Al Viro wrote:
> On Sat, May 16, 2015 at 02:45:27PM +1000, NeilBrown wrote:
>
> > Yes, I've looked lately :-)
> > I think that all of RCU-walk, and probably some of REF-walk should happen
> > before the filesystem gets to see anything.
> > But once you hit a no
On Fri, May 15, 2015 at 9:31 PM, Al Viro wrote:
>
> Point, but... A lot of our problems comes from the fact that ->i_mutex
> doubles as protection against the addition to the list of children, on
> top of protection of directory itself.
Yeah, ok, we'd need to change that too. Maybe just make it
On Sat, May 16, 2015 at 06:46:26AM +0100, Al Viro wrote:
> Dealing with multi-component lookups isn't impossible and might be a good
> idea, but only if all intermediates are populated. What information does
> NFSv4 multi-component lookup give you? 9p one gives an array of FIDs,
> one per compon
On Sat, May 16, 2015 at 02:45:27PM +1000, NeilBrown wrote:
> Yes, I've looked lately :-)
> I think that all of RCU-walk, and probably some of REF-walk should happen
> before the filesystem gets to see anything.
> But once you hit a non-positive dentry or the parent of the target name, I'd
> rather
On Sat, 16 May 2015 02:47:18 +0100 Al Viro wrote:
> On Sat, May 16, 2015 at 11:25:03AM +1000, NeilBrown wrote:
> > But surely those things can be managed with a spinlock.
> >
> > I think a big part of the problem is that the VFS tries to control
> > filesystems rather than provide services to th
On Fri, May 15, 2015 at 08:37:20PM -0700, Linus Torvalds wrote:
> On May 15, 2015 8:17 PM, "Al Viro" wrote:
> >
> > What for? All we need is a flag, waitqueue and being woken
> > up when the flag gets cleared.
>
> You need to have the flag somewhere.
>
> The child dentry doesn't exist y
On Fri, May 15, 2015 at 07:23:11PM -0700, Linus Torvalds wrote:
> For filesystems that say that they are ok with, make lookup_slow()
> (and *only* lookup_slow for now) instead take the rwsem for reading,
> but in addition to that, take a hashed mutex.
>
> By "hashed mutex", I mean having a sma
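The "hashed mutex" idea quoted above can be illustrated in user space. This is only a rough sketch with pthreads, not kernel code: the quoted definition is cut off, so the bucket count, the FNV-1a hash, and the names `name_locks`, `name_lock_index()`, `name_lock()` and `name_unlock()` are all assumptions made for illustration. The point it shows is that two lookups of the same name in the same directory serialize on one mutex, while lookups of different names usually proceed in parallel.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bucket count; must be a power of two for the mask below. */
#define NAME_LOCK_BUCKETS 64

/* GNU range initializer (gcc/clang): every mutex starts out unlocked. */
static pthread_mutex_t name_locks[NAME_LOCK_BUCKETS] = {
    [0 ... NAME_LOCK_BUCKETS - 1] = PTHREAD_MUTEX_INITIALIZER
};

/* Pick a bucket from the parent directory identity plus the child name.
 * FNV-1a is an arbitrary choice here; any reasonable hash would do. */
size_t name_lock_index(const void *parent, const char *name)
{
    uint64_t h = 14695981039346656037ULL;

    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h ^= *p;
        h *= 1099511628211ULL;
    }
    h ^= (uint64_t)(uintptr_t)parent;
    h *= 1099511628211ULL;
    return (size_t)(h & (NAME_LOCK_BUCKETS - 1));
}

/* Lock the bucket covering (parent, name) and return it for unlocking. */
pthread_mutex_t *name_lock(const void *parent, const char *name)
{
    pthread_mutex_t *m = &name_locks[name_lock_index(parent, name)];
    int rc = pthread_mutex_lock(m);

    assert(rc == 0);
    (void)rc;
    return m;
}

void name_unlock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);

    assert(rc == 0);
    (void)rc;
}
```

Two different names that happen to land in the same bucket merely wait for each other unnecessarily; correctness does not depend on the hash being collision-free.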
On Fri, May 15, 2015 at 6:55 PM, Al Viro wrote:
>
> See upthread. It might be doable (provided that we turn ->i_mutex into
> rwsem, to keep the exclusion with directory _modifiers_), but it'll need
> a really non-trivial code review of a bunch of filesystems, especially ones
> that want to play w
On Fri, May 15, 2015 at 06:47:04PM -0700, Linus Torvalds wrote:
> Now, maybe we could solve it with a new sleeping lock in the dentry
> itself. Maybe we could allocate the new dentry early, add it to the
> directory the usual way, but mark it as being "not ready" (so that
> d_lookup() wouldn't use
On Sat, May 16, 2015 at 11:25:03AM +1000, NeilBrown wrote:
> But surely those things can be managed with a spinlock.
>
> I think a big part of the problem is that the VFS tries to control
> filesystems rather than provide services to them.
What with being the thing syscalls talk to for sending th
On Fri, May 15, 2015 at 6:25 PM, NeilBrown wrote:
>>
>> For example, simply that we only ever have one single dentry for a
>> particular name, and that we only ever have one active lookup per
>> dentry. Those things happen independently of - and before - the server
>> even sees the operation.
>
On Fri, May 15, 2015 at 05:45:56PM -0700, Linus Torvalds wrote:
> Al, do you have any ideas? Personally, I've wanted to make I_mutex a
> rwsem for a long time, but right now pretty much everything uses it
> for exclusion. For example, filename lookup is clearly just reading
> the directory, so it
On Fri, 15 May 2015 17:45:56 -0700 Linus Torvalds wrote:
> On Fri, May 15, 2015 at 4:30 PM, NeilBrown wrote:
> >
> > .. and I've been wondering what to do about i_mutex and NFS. I've had
> > customer reports of slowness in creating files that seems to be due to
> > i_mutex on the directory bein
On Fri, May 15, 2015 at 4:38 PM, Dave Chinner wrote:
>
> Right, because it's cold cache performance that everyone complains
> about.
People really do complain about the hot-cache one too.
Did you read the description of the sample benchmark that Jeremy
described Windows sales people for using?
On Fri, May 15, 2015 at 4:30 PM, NeilBrown wrote:
>
> .. and I've been wondering what to do about i_mutex and NFS. I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can be in
On Sat, May 16, 2015 at 01:10:27AM +0100, Al Viro wrote:
> Er... Remember the clusterfuck around the ->i_size and alignment
> checks on XFS DIO writes? Just this cycle. Correctness of XFS
> locking is nothing to boast about - it *is* convoluted as hell and you
> guys are not superhuman enough t
On Sat, May 16, 2015 at 09:38:08AM +1000, Dave Chinner wrote:
> > Both readdir() and path component lookup are technically read
> > operations, so why the hell do we use a mutex, rather than just
> > get a read-write lock for reading? Yeah, it's that (d) above. I
> > might trust xfs and ext4 to ge
On Sat, May 16, 2015 at 09:30:22AM +1000, NeilBrown wrote:
> .. and I've been wondering what to do about i_mutex and NFS. I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can
On Thu, May 14, 2015 at 04:57:22PM -0700, Jeremy Allison wrote:
> On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
> > On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison wrote:
> > >
> > > Of course we tell people to just set their filesystems
> > > up using mkfs.xfs -n version=ci :-).
On Thu, May 14, 2015 at 08:51:12AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 4:23 AM, Dave Chinner wrote:
> >
> > IIRC, ext4 readdir is not slow because of the use of the buffer
> > cache, it's slow because of the way it hashes dirents across blocks
> > on disk. i.e. it has locality
On Fri, May 15, 2015 at 03:15:48PM -0600, Andreas Dilger wrote:
> On May 14, 2015, at 5:23 AM, Dave Chinner wrote:
> >
> > On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> >> On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
> >>>
> >>> Maybe... I'd like to see the profiles, TBH
On Fri, 15 May 2015 15:15:48 -0600 Andreas Dilger wrote:
> On May 14, 2015, at 5:23 AM, Dave Chinner wrote:
> >
> > On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> >> On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
> >>>
> >>> Maybe... I'd like to see the profiles, TBH - es
On May 14, 2015, at 5:23 AM, Dave Chinner wrote:
>
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
>> On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
>>>
>>> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
>>> access() frequency on various loads. Sure,
On Thu, May 14, 2015 at 7:51 PM, Al Viro wrote:
>
> What's the benefit compared to c-i mount? Not hitting filesystem's
> ->d_hash() and ->d_compare()?
So the reason I'd be interested in per-access flags rather than mount flags are:
- only special apps should use this anyway. IOW, samba and per
On Thu, May 14, 2015 at 07:18:16PM -0700, Linus Torvalds wrote:
> The only difference - EVER - would be if you pass in the ICASE flag.
> Nothing I suggested would change semantics without it (the _hash_
> changes, but that doesn't change semantics, it's a purely internal
> random number).
>
> Now,
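The claim in the quote above, that changing the hash does not change semantics, can be illustrated with a small user-space sketch. It is not the kernel's dcache code: `folded_hash()`, `name_matches()` and the `icase` parameter are hypothetical names, and ASCII-only folding is an assumption. The hash is always computed over the case-folded name, so "Makefile" and "makefile" land in the same bucket, but the final comparison stays exact unless the caller asks for a case-insensitive match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASCII-only case folding; anything outside 'A'..'Z' is left alone. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hash of the folded name (32-bit FNV-1a, an arbitrary choice). The value
 * is purely internal: it only decides which bucket gets searched. */
uint32_t folded_hash(const char *name)
{
    uint32_t h = 2166136261u;

    assert(name != NULL);
    for (; *name; name++) {
        h ^= ascii_fold((unsigned char)*name);
        h *= 16777619u;
    }
    return h;
}

/* Decide whether a cached name satisfies a lookup. Without icase the
 * comparison is byte-exact, so ordinary lookups behave as before. */
bool name_matches(const char *cached, const char *wanted, bool icase)
{
    if (folded_hash(cached) != folded_hash(wanted))
        return false;

    for (;; cached++, wanted++) {
        unsigned char a = (unsigned char)*cached;
        unsigned char b = (unsigned char)*wanted;

        if (icase) {
            a = ascii_fold(a);
            b = ascii_fold(b);
        }
        if (a != b)
            return false;
        if (a == '\0')
            return true;
    }
}
```

With this arrangement a case-sensitive lookup of "makefile" still fails to match a cached "Makefile"; only a lookup that passes `icase` sees them as the same name.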
On Thu, May 14, 2015 at 6:26 PM, Al Viro wrote:
>
> Hold on. Should
> stat("blah", &buf)  => ENOENT, OK, let's create it
> mkdir("blah", 0)    => EEXIST, bugger, looks like a race
> stat("blah", &buf)  => ENOENT, Whiskey, Tango, Foxtrot
> be possible?
No. What
On Thu, May 14, 2015 at 05:25:39PM -0700, Linus Torvalds wrote:
> We can easily make things per-operation, by adding another flag. We
> already have per-operation flags like LOOKUP_FOLLOW, which decides if
> we follow the last symlink or not. We could add a LOOKUP_ICASE, which
> decides whether we
On Thu, May 14, 2015 at 4:36 PM, Al Viro wrote:
> On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
>
>> So ASCII-only case-insensitivity is sufficient for you guys?
>>
>> Doing case-insensitive lookups at a vfs layer level wouldn't be
>> impossible (add some new lookup flag, so it w
On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison wrote:
> >
> > Of course we tell people to just set their filesystems
> > up using mkfs.xfs -n version=ci :-).
>
> So ASCII-only case-insensitivity is sufficient for you guys?
No it'
On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
> So ASCII-only case-insensitivity is sufficient for you guys?
>
> Doing case-insensitive lookups at a vfs layer level wouldn't be
> impossible (add some new lookup flag, so it would *not* be
> per-filesystem, it would be per-operati
On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison wrote:
>
> Of course we tell people to just set their filesystems
> up using mkfs.xfs -n version=ci :-).
So ASCII-only case-insensitivity is sufficient for you guys?
Doing case-insensitive lookups at a vfs layer level wouldn't be
impossible (add so
On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
> >
> > Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> > access() frequency on various loads. Sure, make(1) and cc(1) really care
> > about stat() very much
On Thu, May 14, 2015 at 8:51 AM, Linus Torvalds wrote:
>
> Basically, in computer science, pretty much all performance work is
> about caching.
Credit where credit is due. Terje "almost all programming can be
viewed as an exercise in caching" Mathisen.
Linus
Al Viro writes:
> In particular, automounts will require
> discussing what exactly in the process' state is used for those - both
> with autofs/NFS/AFS/CIFS folks and with Eric (what netns should be used
> when we are crossing an NFSv4 referral point? Should it come from the
> NFS mount we'd foun
On Thu, May 14, 2015 at 4:23 AM, Dave Chinner wrote:
>
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.
No, you're just worrying about IO. Natural
On Thu 14-05-15 21:23:04, Dave Chinner wrote:
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> > And readdir() itself, for that matter - we have no good vfs-level
> > readdir caching, so it all ends up serialized on the inode
> > semaphore, and it all goes all the way into the fi
On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
> >
> > Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> > access() frequency on various loads. Sure, make(1) and cc(1) really care
> > about stat() very much
On Wed, May 13, 2015 at 8:30 PM, Al Viro wrote:
>
> Maybe... I'd like to see the profiles, TBH - especially getxattr() and
> access() frequency on various loads. Sure, make(1) and cc(1) really care
> about stat() very much, but I wouldn't be surprised if something like
> httpd or samba would be
On Wed, May 13, 2015 at 06:39:53PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 3:25 PM, Al Viro wrote:
> > More on top of the current vfs.git#for-next (== the posted patchset
> > with a couple of fixes): more fs/namei.c reorganization and stack footprint
> > reduction (below 1Kb
On Wed, May 13, 2015 at 3:25 PM, Al Viro wrote:
> More on top of the current vfs.git#for-next (== the posted patchset
> with a couple of fixes): more fs/namei.c reorganization and stack footprint
> reduction (below 1Kb now). One interesting piece of that is that we don't
> touch current->
More on top of the current vfs.git#for-next (== the posted patchset
with a couple of fixes): more fs/namei.c reorganization and stack footprint
reduction (below 1Kb now). One interesting piece of that is that we don't
touch current->fs->lock anymore - unlazy_walk() used to, but now we can