On Sat, Feb 19, 2005 at 06:53:48PM +0100, Andrea Arcangeli wrote:
> On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:
> > The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
> > October of last year, so far) is at 141M, if you don't count the pristine
> > cache or working directory. That's already compressed, so you don't get
> > any extra bonuses. Darcs stores with each changeset both the old and new
> > versions of each hunk, which gives you some redundancy, and probably
> > accounts for the factor of two greater size than CVS. This gives a bit of
> > redundancy, which can be helpful in cases of repository corruption.
>
> Double size of the compressed backup is about the same as SVN with fsfs
> (not tested on l-k tree but in something much smaller). Why not to
> simply checksum instead of having data redundancy? Knowing when
> something is corrupted is a great feature, but doing raid1 without the
> user asking for it, is a worthless overhead.

There are internal issues that would cause trouble here--darcs assumes that
if it knows a given patch, it also knows the patch's inverse.

> > I hope to someday (when more pressing issues are dealt with) add a per-file
> > cache indicating which patches affect which files, which should largely
> > address this problem, although it won't help at all with files that are
> > touched by most of the changesets, and won't be optimal in any case. :(
>
> Wouldn't using the CVS format help an order of magnitude here? With
> CVS/SCCS format you can extract all the patchsets that affected a single
> file in an extremely efficient manner, memory utilization will be
> extremely low too (like in cvsps indeed). You'll only have to lookup the
> "global changeset file", and then open the few ,v files that are
> affected and extract their patchsets. cvsps does this optimally
> already. The only difference is that what cvsps is a "readonly" cache,
> while with a real SCM it would be a global file that control all the
> changesets in an atomic way.

The catch is that then we'd have to implement a smart server to keep users
from having to download the entire history with each update. That's not a
fundamentally hard issue, but it would definitely degrade darcs' ease of
use, besides putting additional load on the server. So if something like
this were implemented, I think it would definitely have to be as an
optional format.

Also, we couldn't actually store the data in CVS/SCCS format, since in
darcs a patch to a single file isn't uniquely determined by two states of
that file. But storing separately the patches relevant to different files
would definitely be an optimization worth considering.

-- 
David Roundy
http://www.darcs.net
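
Two of the ideas discussed above can be illustrated with a small sketch: a hunk that stores both its old and new text can be inverted without any extra data, and a per-file index can list which patches touch which file. This is not darcs code and does not reflect its on-disk format; the names `Hunk`, `invert`, `apply` and `build_file_index`, and the patch names `p1` to `p3`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    """One change to one file, storing both sides (hypothetical model)."""
    path: str
    line: int    # 0-based index of the first affected line
    old: tuple   # lines removed
    new: tuple   # lines added

    def invert(self):
        # Both sides are stored, so the inverse is just a swap.
        return Hunk(self.path, self.line, self.new, self.old)

    def apply(self, lines):
        end = self.line + len(self.old)
        # The stored old text doubles as a consistency check.
        if tuple(lines[self.line:end]) != self.old:
            raise ValueError("hunk does not match file contents")
        return lines[:self.line] + list(self.new) + lines[end:]


def build_file_index(patches):
    """Map each path to the names of the patches touching it, in order."""
    index = defaultdict(list)
    for name, hunks in patches:
        for path in sorted({h.path for h in hunks}):
            index[path].append(name)
    return dict(index)


# Illustrative history: three named patches over two files.
patches = [
    ("p1", [Hunk("a.c", 0, (), ("int x;",))]),
    ("p2", [Hunk("b.c", 0, (), ("int y;",))]),
    ("p3", [Hunk("a.c", 0, ("int x;",), ("long x;",))]),
]
index = build_file_index(patches)
```

With such an index, finding the history of `a.c` means reading only the patches listed under `index["a.c"]` rather than scanning every changeset. As noted above, this would not help for files touched by most of the changesets.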