On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <tstarl...@wikimedia.org> wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to UseMod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
> I've put the two log files up on the web, at:
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
> rclog contains IP addresses. The UseMod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
> -- Tim Starling
I have to say this is super cool. It's like digging up a time capsule
right before the 10th anniversary. One of my favorite early edits:

"This is the new WikiPedia!  The idea here is to write a complete
encyclopedia from scratch, without peer review process, etc.
Some people think that this may be a hopeless endeavor, that
the result will necessarily suck.  We aren't so sure.  So, let's get
to work!"


foundation-l mailing list
