On Fri, 2 Apr 2010, Ask Bjørn Hansen wrote:
I can't believe I'm doing this, but ...
:-) All for entertainment's sake...
The main point here is that we can't use 20 inodes per distribution. It's Just
Nuts. Sure, it's only something like 400k files/inodes now - but at the rate
it's going it'll be a lot more soon enough.
That's a problem, but not likely the biggest drag on server I/O you're
suffering. Might that be <ahem> rsync?
That reply doesn't even make sense.
Then you've ignored most of this thread. Inode counts themselves aren't
indicative of anything. It's the I/O access patterns that are. And my
assertion has been that the excessive stats by the server are a bigger
impediment to synchronization than the inode count.
You're right, I'm not arguing the need for the cruft. I've only pointed out
the obvious reality that trimming files only postpones the I/O management
issues that at some time are likely going to have to be addressed, anyway.
And that you'll get less bang for the buck (or man hour) by treating the
symptoms, not the disease.
For the record: if that's what you want to do, have at it. Let's just not
be disingenuous about the fact that we're abrogating our responsibilities as
technologists by refusing to address the real problems and weaknesses of the
platform.
You are confusing "we", "I" and "you" again.
Perhaps.
....
Yes, I (and I'm guessing everyone else who has thought about it for more than, say, 5
seconds) agree that having rsync remember the file tree to save the disk IO for each sync
sounds like an "obvious solution".
But reality is more complicated. If it was such an obviously good solution someone would
have done it by now. (For starters play this question: "What is the kernel
cache?").
It hasn't been done because it's outside the design scope of rsync.
It's meant to sync arbitrary filesets in which many, if not all, changes are
made out of band. It's decidedly non-trivial to implement in that mode
unless you're willing to accept a certain window in which your database may
be out of date.
But, in a situation like PAUSE, where the avenues by which files can be
introduced into the file sets are controlled, it does become trivial. It's
the gatekeeper, it knows who's been in or out.
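
To make the gatekeeper point concrete, here is a minimal sketch in Python. It
is not a description of how PAUSE or rsync actually work. It assumes every add
or delete passes through a single process that appends to a journal; the class
name ChangeJournal, the JSON-lines format and the journal path are all
hypothetical. A mirror that remembers the last sequence number it saw can ask
for just the changes since then, with no stat of the whole tree.

```python
import json
import os
import tempfile


class ChangeJournal:
    """Append-only log of file events, written only by the gatekeeper process.

    Hypothetical illustration: the name and on-disk format are assumptions.
    """

    def __init__(self, path):
        self.path = path

    def entries(self):
        # Read every journal entry; an absent file means nothing has happened yet.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def last_seq(self):
        entries = self.entries()
        return entries[-1]["seq"] if entries else 0

    def record(self, event, relpath):
        # event is "add" or "delete"; seq increases by one per entry.
        # Assumes a single writer, so no locking is attempted here.
        seq = self.last_seq() + 1
        entry = {"seq": seq, "event": event, "path": relpath}
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return seq

    def changes_since(self, seq):
        # Net effect after `seq`: each path maps to the last event seen for it.
        net = {}
        for entry in self.entries():
            if entry["seq"] > seq:
                net[entry["path"]] = entry["event"]
        return net
```

A mirror would store the highest seq it has applied and pass it to
changes_since() on its next run. Re-reading the whole journal on every call
keeps the sketch short; a real implementation would index or rotate it.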
Andreas' solution is much more sensible -- and as has been pointed out before
we DO USE THAT; but the problem here is not with clients who are interested
enough to do something special and dedicate resources to their CPAN mirroring.
By all means, I'm not opposed to any solution that actually addresses the
problem. I don't agree that it would be the fastest route to implementation,
but there's no question that File::Rsync::Mirror::Recent would help things.
I'd support (and help) that goal.
My objections are more properly directed to those stuck on just deleting
files from the tree.
--Arthur Corliss
Live Free or Die