On Fri, Oct 16, 2009 at 10:31 AM, Anthony <wikim...@inbox.org> wrote:
> On Fri, Oct 16, 2009 at 12:45 AM, jamesmikedup...@googlemail.com
>> if you want only the last 3 revisions checked out , it takes about 10
>> seconds and produces 300k of data.
>
> 10 seconds? That's horrible. Have you tried using svn?
On a reasonably fast network it actually only takes about 10 seconds to pull the entire edit history from his repo. It would take less if the history had been repacked as I described, but that kind of tight repacking makes it slower when you only want a portion of the history.

Still, many of the neat things that can be done by having the article in git are only possible if you have the complete history. For example, generating a blame map needs the entire history.

It would be nice if the git archival format were more efficient for the kinds of changes made in Wikipedia articles. Source code tends to have short lines, and a change tends to rewrite a significant portion of each line it touches, while edits on Wikipedia are far more likely to change only part of a very long line (really, a paragraph). So working with line-level deltas is efficient for source code but inefficient for Wikipedia data.

On this repository, git fast-export --all | lzma -9 produces a 900 kB output (505783 bytes if you want to be silly and use PAQ8HP12, which is pretty much the state of the art for English text, instead of LZMA). These methods don't provide fast random access, but it's still clear that there is a lot of room for improvement. ;)

I'm not sure if anyone is working on improved compression in git for these kinds of documents. Getting the entire history of a frequently edited article like this down to ~1-2 MB is roughly the point where I think it becomes reasonable for someone doing continued non-trivial work on the article to fetch the entire history, and thus gain access to functionality that needs most of the history.

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
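To make the point about line-level deltas concrete, here is a minimal Python sketch (not how git's delta encoder actually works) that counts how many characters a line-level diff versus a word-level diff would have to store as new text when one word changes in a paragraph kept on a single line. The helpers `line_delta_size` and `word_delta_size` are hypothetical and are built on the standard-library `difflib.SequenceMatcher`.

```python
import difflib


def _inserted_chars(old_items, new_items, sep_len):
    """Sum the length of items that a diff must store as new content."""
    matcher = difflib.SequenceMatcher(None, old_items, new_items, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            size += sum(len(item) + sep_len for item in new_items[j1:j2])
    return size


def line_delta_size(old: str, new: str) -> int:
    # Line granularity: any change inside a line stores the whole new line.
    return _inserted_chars(old.splitlines(), new.splitlines(), sep_len=0)


def word_delta_size(old: str, new: str) -> int:
    # Word granularity: only the changed words (plus one separator each) are stored.
    return _inserted_chars(old.split(), new.split(), sep_len=1)


old = ("Wikipedia articles keep each paragraph on a single long line of wikitext, "
       "so an edit that fixes one word still changes the whole line, while "
       "source code is mostly made of short lines.")
new = old.replace("fixes", "corrects")

print("line-level:", line_delta_size(old, new), "chars")
print("word-level:", word_delta_size(old, new), "chars")
```

The line-level count is the length of the entire paragraph, while the word-level count is just 9 characters ("corrects" plus one separator). For source code with short lines the two granularities would be much closer.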
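Why a blame map needs the whole history can be shown with a simplified, line-level attribution sketch in Python. This is a toy under stated assumptions, not the algorithm `git blame` uses: `blame` is a hypothetical helper that walks revisions oldest-first and carries each unchanged line's origin forward using `difflib.SequenceMatcher`. Feeding it a truncated history credits every line older than the first fetched revision to that revision instead of its true origin.

```python
import difflib


def blame(revisions):
    """Attribute each line of the last revision to the revision that introduced it.

    revisions: list of (rev_id, text) pairs in chronological order.
    Returns a list of (rev_id, line) pairs for the final revision.
    """
    attributed = []  # (rev_id, line) pairs for the current revision
    for rev_id, text in revisions:
        new_lines = text.splitlines()
        old_lines = [line for _, line in attributed]
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision they were first attributed to.
                updated.extend(attributed[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines are credited to the current revision.
                updated.extend((rev_id, line) for line in new_lines[j1:j2])
            # "delete": the old lines are gone, nothing to carry forward.
        attributed = updated
    return attributed


history = [("r1", "a\nb\nc"), ("r2", "a\nB\nc"), ("r3", "a\nB\nc\nd")]
print(blame(history))      # full history
print(blame(history[1:]))  # truncated history: "a" and "c" wrongly credited to r2
```

Because each step depends on the attribution from the step before, there is no way to recover the origin of a line without replaying the history back to the revision that introduced it.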
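As a rough illustration of why compressing the whole history as one stream (as in the git fast-export --all | lzma -9 measurement above) does so well, the Python sketch below builds a synthetic edit history, where each revision appends a clause to one long paragraph line, and compares compressing every revision separately with compressing all revisions as a single stream using the standard-library `lzma` module. The data and the `synthetic_history` and `compressed_sizes` helpers are hypothetical, and the numbers say nothing about the actual repository. The single-stream approach also shows the trade-off mentioned in the post: reading one revision back means decompressing everything before it.

```python
import lzma


def synthetic_history(n_revisions: int = 50) -> list[str]:
    """Hypothetical article history: each revision edits one long paragraph."""
    paragraphs = [
        f"Paragraph {i} discusses topic {i} in some detail, citing source {i * 7}. " * 4
        for i in range(20)
    ]
    history = []
    for rev in range(n_revisions):
        # Each edit appends a short clause to one (long) paragraph line.
        paragraphs[rev % len(paragraphs)] += f" Revision {rev} added this clause."
        history.append("\n".join(paragraphs))
    return history


def compressed_sizes(history: list[str]) -> tuple[int, int]:
    # Compress every revision on its own: no cross-revision redundancy is exploited.
    separate = sum(len(lzma.compress(text.encode("utf-8"), preset=9)) for text in history)
    # Compress the concatenated history: later revisions become long back-references.
    stream = "\n".join(history).encode("utf-8")
    together = len(lzma.compress(stream, preset=9))
    return separate, together


history = synthetic_history()
separate, together = compressed_sizes(history)
print(f"{len(history)} revisions: {separate} bytes compressed separately, "
      f"{together} bytes as one stream")
```

Note that git's own packfiles already use deltas between objects, so this compares against per-revision compression only, not against a packed git repository.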