Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard
Hi, > Well, PowerPC "dcbt" does prefetch() correctly, it doesn't ever raise > exceptions, doesn't have any side effects, takes only enough CPU to > decode the address, and is ignored if it would have to do anything > other than load the cacheline from RAM. Prefetch streams are halted > w

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard
> OK, 200 cycles... > > But what is the cost of the conditional branch you added in prefetch(x) ? Much less than the tablewalk. On ppc64 a tablewalk of an address that is not populated in the hashtable will incur 2 cacheline lookups (primary and secondary buckets). This plus the MMU state machin

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard
> Yeah, I'm not at all surprised. Any implementation of "prefetch" that > doesn't just turn into a no-op if the TLB entry doesn't exist (which makes > them weaker for *actual* prefetching) will generally have a hard time with > a NULL pointer. Exactly because it will try to do a totally unneces

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Kyle Moffett
On Mar 08, 2007, at 02:24:04, Eric Dumazet wrote: Kyle Moffett a écrit : Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-by

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Bob Copeland
+ its done only when the path is needed.). Real filesystems probably + dont want to use it, because their dentries are present in global Pedantry: it's and don't? -Bob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PRO

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Valdis . Kletnieks
On Thu, 08 Mar 2007 08:24:04 +0100, Eric Dumazet said: > > But what is the cost of the conditional branch you added in prefetch(x) ? > > if (!x) return; > > (correctly predicted or not, but do powerPC have a BTB ?) > > About the NULL 'potential problem', maybe we could use a dummy nil (but > ma

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Linus Torvalds
On Thu, 8 Mar 2007, Eric Dumazet wrote: > > Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> > CC: Christoph Hellwig <[EMAIL PROTECTED]> > CC: Linus Torvalds <[EMAIL PROTECTED]> Acked-by: Linus Torvalds <[EMAIL PROTECTED]> Except you should fix the subject line when you send it out to Andrew ;)

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Christoph Hellwig
I'm sorry about complaining again, but.. > > + /* > + * Some filesystems want to provide dentry's pathname themselves, > + * instead of pre-building names at dentry creation. > + */ It's not _some_ filesystems. If real filesystem did this we'd be in horrible trouble. It's

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Eric Dumazet
Thanks again for your feedback Christoph :) I think I addressed all your remarks. Thank you [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname f

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Christoph Hellwig
On Thu, Mar 08, 2007 at 10:42:21AM +0100, Eric Dumazet wrote: > @@ -1823,6 +1823,9 @@ char * d_path(struct dentry *dentry, str > struct vfsmount *rootmnt; > struct dentry *root; > > + if (dentry->d_op && dentry->d_op->d_dname) > + return (dentry->d_op->d_dname)(dentry,

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Eric Dumazet
On Thursday 08 March 2007 09:56, Christoph Hellwig wrote: > This patch needs a lot more documentation. It needs some really big > comments on why this should never ever be used for a real filesystem > (real as in user mountable), and probably add an assert for that > invariant somewhere. Please a

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Christoph Hellwig
This patch needs a lot more documentation. It needs some really big comments on why this should never ever be used for a real filesystem (real as in user mountable), and probably add an assert for that invariant somewhere. Please also update Documentation/filesystems/Locking and Documentation/fil

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Michael K. Edwards
On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote: No, I just checked, and Intel's own optimization manual makes it clear that you should be careful. They talk about performance penalties due to resource constraints - which makes tons of sense with a core that is good at handling its own resour

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
Kyle Moffett a écrit : Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-byte cacheline and has no serializing effects whatsoev

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Michael K. Edwards wrote: > > People's prejudices against prefetch instructions are sometimes > traceable to the 3DNow! prefetch(w) botch, which some processors > "support" as no-ops and others are too aggressive about (Opteron > prefetches are reputed to be "strong", i. e.,

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Kyle Moffett
On Mar 07, 2007, at 20:25:14, Michael K. Edwards wrote: On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote In general, using software prefetching is just a stupid idea, unless - the prefetch really is very strict (ie for a linked list you do exactly the above kinds of things to make sure th

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Michael K. Edwards
On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote Yeah, I'm not at all surprised. Any implementation of "prefetch" that doesn't just turn into a no-op if the TLB entry doesn't exist (which makes them weaker for *actual* prefetching) will generally have a hard time with a NULL pointer. Exactly b

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Anton Blanchard wrote: > > Funny you mention this. We found some noticeable ppc64 regressions when > moving the dcache to standard list macros and had to do this to fix it > up: > > static inline void prefetch(const void *x) > { > if (unlikely(!x)) >

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Anton Blanchard
Hi, > Well, I hope a prefetch(NULL) is OK because we are doing millions of > them (see include/linux/list.h :) ) Funny you mention this. We found some noticeable ppc64 regressions when moving the dcache to standard list macros and had to do this to fix it up: static inline void prefetch(const v

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > I was thinking about being able to cache the name into the dentry, do you > think it's worth the pain ? (its not SMP safe for example...) Actually, it *can* be SMP-safe, if you do it right. Something like len = dentry->d_name.len; i

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
> Not only might memcpy() do a "prefetch for read" on the source for some > architectures (which in turn may end up being slow for an address that > isn't in the TLB, like NULL), but you depend on a very much internal Well, I hope a prefetch(NULL) is OK because we are doing millions of them (see

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > OK no problem here is the patch without messing f_path.mnt All right. I like it. Definitely worth putting into -mm, or just re-sending to me after 2.6.21 is out (I'll forget all about it otherwise). I have one more worry, namely this:: -

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
On Wednesday 07 March 2007 18:45, Linus Torvalds wrote: > On Wed, 7 Mar 2007, Eric Dumazet wrote: > > sockets already uses file->private_data. > > > > But calls to read()/write() (not send()/recv()) still need to go through > > the dentry, before entering socket land. > > Sure. The dentry and the i

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > sockets already uses file->private_data. > > But calls to read()/write() (not send()/recv()) still need to go through the > dentry, before entering socket land. Sure. The dentry and the inode need to *exist*, but they can be one single static dentr

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
(resending with a more convenient attachment) Please find enclosed the following patch, to prepare this path. [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to be

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
On Wednesday 07 March 2007 18:02, Linus Torvalds wrote: > On Wed, 7 Mar 2007, Eric Dumazet wrote: > > I would definitly *love* saving dentries for pipes (and sockets too), but > > how are you going to get the inode ? > > Don't use an inode at all. Lovely :) > > > pipes()/sockets() can use read()/

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > Crazy ideas : (some readers are going to kill me) First off, as noted earlier, you don't need crazy ideas. But: > 1) Use the low order bit of f_path.dentry to say : this pointer is not a > pointer to a dentry but the inode pointer (with the low ord

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > I would definitly *love* saving dentries for pipes (and sockets too), but how > are you going to get the inode ? Don't use an inode at all. > pipes()/sockets() can use read()/write()/rw_verify_area() and thus need > file->f_path.dentry->d_inode (so e

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Christoph Hellwig
On Wed, Mar 07, 2007 at 08:16:14AM +0100, Eric Dumazet wrote: > Crazy ideas : (some readers are going to kill me) > > 1) Use the low order bit of f_path.dentry to say : this pointer is not a > pointer to a dentry but the inode pointer (with the low order bit set to 1) > > OR > > 2) file->f_path

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet
Eric Dumazet a écrit : I would definitly *love* saving dentries for pipes (and sockets too), but how are you going to get the inode ? pipes()/sockets() can use read()/write()/rw_verify_area() and thus need file->f_path.dentry->d_inode (so each pipe needs a separate dentry) Are you suggesti

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Wed, 7 Mar 2007, Eric Dumazet wrote: > I would definitly *love* saving dentries for pipes (and sockets too), but how > are you going to get the inode ? I was not planning to touch anything but epoll, signalfd and timerfd files. > pipes()/sockets() can use read()/write()/rw_verify_area() and

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet
Linus Torvalds a écrit : I assume that the *only* reason for having multiple dentries is really just the output in /proc//fd/, right? Or is there any other reason to have separate dentries for these pseudo-files? It's a bit sad to waste that much memory (and time) on something like that. I

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Tue, 6 Mar 2007, Linus Torvalds wrote: > > > On Tue, 6 Mar 2007, Davide Libenzi wrote: > > > > I'm OK with everything that avoid code duplication due to those fake > > inodes. The change can be localized inside the existing API, so it doesn't > > really affect me externally. > > Can you t

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Linus Torvalds
On Tue, 6 Mar 2007, Davide Libenzi wrote: > > I'm OK with everything that avoid code duplication due to those fake > inodes. The change can be localized inside the existing API, so it doesn't > really affect me externally. Can you try with the first patch version that doesn't do anything spec

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Tue, 6 Mar 2007, Linus Torvalds wrote: > > [ Al Viro added to Cc: as the arbiter of good taste in the VFS layer. He > has veto powers even over my proposals ;^] > > On Tue, 6 Mar 2007, Davide Libenzi wrote: > > > > I currently have the dentry to carry the name of the file* class it is >

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Linus Torvalds
[ Al Viro added to Cc: as the arbiter of good taste in the VFS layer. He has veto powers even over my proposals ;^] On Tue, 6 Mar 2007, Davide Libenzi wrote: > > I currently have the dentry to carry the name of the file* class it is > linked to. I'd prefer to keep it that way, unless there a

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Tue, 6 Mar 2007, Avi Kivity wrote: > Right. For kvmfs this isn't a problem as there's a 1:1 relationship > between synthetic inodes and dentries. Perhaps d_alloc_anon() could be > extended to avoid the scan if it's a problem. I currently have the dentry to carry the name of the file* class

Re: [patch] epoll use a single inode ...

2007-03-05 Thread Davide Libenzi
On Tue, 6 Mar 2007, Avi Kivity wrote: > > /* File callbacks that implement the eventpoll file behaviour */ > > static const struct file_operations eventpoll_fops = { > > .release= ep_eventpoll_close, > > @@ -763,15 +767,18 @@ > > * using the inode number. > > */ > > err

Re: [patch] epoll use a single inode ...

2007-03-05 Thread Eric Dumazet
Davide Libenzi a écrit : * using the inode number. */ error = -ENOMEM; - sprintf(name, "[%lu]", inode->i_ino); + sprintf(name, "[%p]", ep); this.name = name; this.len = strlen(name); Small remark : you can avoid strlen(), since sprintf gives