On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <tstarl...@wikimedia.org> wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to Usemod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The Usemod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling
>
I have to say this is super cool. It's like digging up a time capsule
right before the 10th anniversary. One of my favorite early edits:

"This is the new WikiPedia! The idea here is to write a complete
encyclopedia from scratch, without peer review process, etc. Some
people think that this may be a hopeless endeavor, that the result
will necessarily suck. We aren't so sure. So, let's get to work!"

-Chad