This is definitely a tremendous asset leading up to our big bday in January. I hope we can extract and post some of the real gems.
Thanks for the resourcefulness and the sharing, Tim.

On Dec 14, 2010, at 10:04 AM, phoebe ayers wrote:

> On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <tstarl...@wikimedia.org> wrote:
>> I was looking through some old files in our SourceForge project. I
>> opened a file called wiki.tar.gz, and inside were three complete
>> backups of the text of Wikipedia, from February, March and August 2001!
>>
>> This is exciting, because there is lots of article history in here
>> which was assumed to be lost forever.
>>
>> I've long been interested in Wikipedia's history, and I've tried in
>> the past to locate such backups. I asked various people who might have
>> had one. I had given up hope.
>>
>> The history of particularly old Wikipedia articles, as seen in the
>> present Wikipedia database, is incomplete, due to UseMod's policy of
>> deleting old revisions of pages after about a month. The script which
>> Brion wrote to import the article histories from UseMod to MediaWiki
>> only fetched those revisions which hadn't been purged yet.
>>
>> I didn't want to believe that those revisions had been lost forever,
>> and I even opened the UseMod source code and stared forlornly at the
>> unlink() call. What I (and Brion before) missed is that UseMod appends
>> a record of every change made to two files, called diff_log and rclog.
>> In these two files is a record of every change made to Wikipedia from
>> January 15 to August 17, 2001.
>>
>> I've put the two log files up on the web, at:
>>
>> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>>
>> The 7-zip archive is only 8.4MB -- much more manageable than today's
>> backups.
>>
>> rclog contains IP addresses. The UseMod software made IP addresses of
>> logged-in users public, so the people who made these edits had no
>> expectation that their IP address would be kept private. That, coupled
>> with the passage of time, makes me think that no harm to user privacy
>> can come from releasing these files.
>>
>> -- Tim Starling
>
> AWESOME. This is so cool. I've copied the research list too, since
> there's many Wikipedia historians that will be eager to see the older
> versions.
>
> I hope we can get them up in a browsable way, like nostalgia.wikipedia.org!
>
> -- phoebe
>
> _______________________________________________
> foundation-l mailing list
> foundation-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

--
Jay Walsh
Head of Communications
WikimediaFoundation.org
blog.wikimedia.org
+1 (415) 839 6885 x 609, @jansonw