subject:"[Foundation-l] Wikistats is back"

Re: [Foundation-l] Wikistats is back

2009-01-05 Thread Brion Vibber

On 12/24/08 3:31 PM, Brian wrote: > I am still quite shocked at the amount of time the English Wikipedia takes > to dump, especially since we seem to have close links to folks who work at > MySQL. To me it seems that one of two things must be the case: > > 1. Wikipedia has outgrown MySQL, in the se

[Foundation-l] Wikistats is back

2009-01-02 Thread Erik Zachte

A week ago I published new wikistats files, for the first time in 7 months, only to retract them 2 days later, when it turned out that counts for some wikis were completely wrong. After some serious bug hunting I nailed the creepy creature that had been hiding in an unexpected corner (most bugs fin

Re: [Foundation-l] Wikistats is back

2009-01-02 Thread Samuel Klein

Woo!! Thank you belatedly for my New Year's dose of infodisiac. --SJ On Wed, Dec 24, 2008 at 5:50 PM, Erik Zachte wrote: > New wikistats reports have been published today, for the first time since > May 2008. The reports have been generated on the new wikistats server > 'Bayes', which is opera

Re: [Foundation-l] Wikistats is back

2009-01-02 Thread Gerard Meijssen

Hoi, On that note ... http://hardware.slashdot.org/article.pl?sid=09/01/02/1546214 Thanks, GerardM 2009/1/1 geni > 2008/12/25 Gerard Meijssen : > > Hoi, > > It is not one either. It has been said repeatedly that the process of a > > straightforward back up is something that is done on a reg

Re: [Foundation-l] Wikistats is back

2009-01-01 Thread geni

2008/12/25 Gerard Meijssen : > Hoi, > It is not one either. It has been said repeatedly that the process of a > straightforward back up is something that is done on a regular basis. No it hasn't -- geni ___ foundation-l mailing list foundation-l@lists

Re: [Foundation-l] Wikistats is back to May 2008 version

2008-12-25 Thread Ziko van Dijk

Dear Erik, It can happen; I look forward to the new figures all the more eagerly. Good that you let us know, because I was already writing a post for a mailing list about the very low figures for new German Wikipedians. Erik, it was nice to get to know you, and in 20

Re: [Foundation-l] Wikistats is back

2008-12-25 Thread Aryeh Gregor

On Wed, Dec 24, 2008 at 7:09 PM, Brian wrote: > Interesting. I realize that the dump is extremely large, but if 7zip is > really the bottleneck then to me the solutions are straightforward: > > 1. Offer an uncompressed version of the dump for download. Bandwidth is > cheap and downloads can be res

[Foundation-l] Wikistats is back to May 2008 version

2008-12-25 Thread Erik Zachte

There is something seriously wrong with the figures for some wikipedias in the new wikistats reports. The figures for some wikis are much too low. When comparing csv files (raw counts) produced in May 2008 and produced recently it is quite easy to tell the difference. For some wikis the data for mo

Re: [Foundation-l] Wikistats is back

2008-12-25 Thread John Vandenberg

On 12/25/08, Gerard Meijssen wrote: > Hoi, > It is not one either. It has been said repeatedly that the process of a > straightforward back up is something that is done on a regular basis. This > however includes a lot of information that we do not allow to be included in > the data export that is

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Gerard Meijssen

Hoi, It is not one either. It has been said repeatedly that the process of a straightforward back up is something that is done on a regular basis. This however includes a lot of information that we do not allow to be included in the data export that is made available to the public. So never mind wh

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread David Gerard

2008/12/25 geni : > I'd more be thinking of handing over a stack of hard drives to > Wikimedia chapter reps at Wikimania. 2TB external hard disk, gzip on the fly (gzipping is faster than the network - remember, Wikimedia gzips data going between internal servers in the same rack because CPU is

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread geni

2008/12/25 David Gerard : > 2008/12/25 Brian : > >> But at least this would allow Erik, researchers and archivers to get the >> dump faster than they can get the compressed version. The number of people >> who want this can't be > 100, can it? It would need to be metered by an API >> I guess. > > >

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Robert Rohde

On Wed, Dec 24, 2008 at 6:29 PM, Brian wrote: > I'm also curious, what is the estimated amount of time to decompress this > thing? Somewhere around 1 week, I'd guesstimate. -Robert Rohde

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread David Gerard

2008/12/25 Brian : > But at least this would allow Erik, researchers and archivers to get the > dump faster than they can get the compressed version. The number of people > who want this can't be > 100, can it? It would need to be metered by an API > I guess. Maybe we can run a sneakernet of DLT

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian

I'm also curious, what is the estimated amount of time to decompress this thing? On Wed, Dec 24, 2008 at 7:24 PM, Brian wrote: > But at least this would allow Erik, researchers and archivers to get the > dump faster than they can get the compressed version. The number of people > who want this c

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian

But at least this would allow Erik, researchers and archivers to get the dump faster than they can get the compressed version. The number of people who want this can't be > 100, can it? It would need to be metered by an API I guess. Cheers, Brian On Wed, Dec 24, 2008 at 7:18 PM, Robert Rohde wro

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Robert Rohde

On Wed, Dec 24, 2008 at 6:05 PM, Brian wrote: > Hi Robert, > > I'm not sure I agree with you.. > > (3 terabytes / 10 megabytes) seconds in days = 3.64 days > > That is, on my university connection I could download the dump in just a few > days. The only cost is bandwidth. While you might be corre

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian

Hi Robert, I'm not sure I agree with you.. (3 terabytes / 10 megabytes) seconds in days = 3.64 days That is, on my university connection I could download the dump in just a few days. The only cost is bandwidth. On Wed, Dec 24, 2008 at 6:46 PM, Robert Rohde wrote: > On Wed, Dec 24, 2008 at 4:0
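The estimate above can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 terabyte = 2^40 bytes, 1 megabyte = 2^20 bytes) and a sustained rate of 10 megabytes per second; the function name `download_days` is made up for illustration. Under those assumptions it reproduces the 3.64-day figure, while decimal units give roughly 3.47 days.

```python
SECONDS_PER_DAY = 86_400


def download_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the transfer time in days for a given size and sustained rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Binary units (assumption): 3 * 2**40 bytes at 10 * 2**20 bytes per second.
binary = download_days(3 * 2**40, 10 * 2**20)

# Decimal units for comparison: 3 * 10**12 bytes at 10 * 10**6 bytes per second.
decimal = download_days(3 * 10**12, 10 * 10**6)

print(f"binary units:  {binary:.2f} days")
print(f"decimal units: {decimal:.2f} days")
```

Either way the result is a matter of days, and it only holds if the link sustains the full rate for the whole transfer.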

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Robert Rohde

On Wed, Dec 24, 2008 at 4:09 PM, Brian wrote: > Interesting. I realize that the dump is extremely large, but if 7zip is > really the bottleneck then to me the solutions are straightforward: > > 1. Offer an uncompressed version of the dump for download. Bandwidth is > cheap and downloads can be res

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread David Gerard

2008/12/25 Erik Zachte : > Hi Brian, Brion once explained to me that the post processing of the dump is > the main bottleneck. > Compressing articles with tens of thousands of revisions is a major resource > drain. > Right now every dump is even compressed twice, into bzip2 (for wider > platform c

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian

Also, I wonder if these folks have been consulted for their expertise in compressing wikipedia data: http://prize.hutter1.net/ On Wed, Dec 24, 2008 at 5:09 PM, Brian wrote: > Interesting. I realize that the dump is extremely large, but if 7zip is > really the bottleneck then to me the solutions

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian

Interesting. I realize that the dump is extremely large, but if 7zip is really the bottleneck then to me the solutions are straightforward: 1. Offer an uncompressed version of the dump for download. Bandwidth is cheap and downloads can be resumed, unlike this dump process 2. The WMF offers a servi

[Foundation-l] Wikistats is back

2008-12-24 Thread Erik Zachte

Hi Brian, Brion once explained to me that the post processing of the dump is the main bottleneck. Compressing articles with tens of thousands of revisions is a major resource drain. Right now every dump is even compressed twice, into bzip2 (for wider platform compatibility) and 7zip format (for 2
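To get a rough feel for the cost of compressing a long revision history in two formats, here is a minimal sketch using Python's standard `bz2` and `lzma` modules on synthetic data. It is not the dump pipeline: the sample text, the `make_revisions` and `measure` helpers, and the revision count are all invented for illustration, and `lzma` writes an .xz stream rather than a .7z archive (both are LZMA-family formats).

```python
import bz2
import lzma
import time


def make_revisions(n: int) -> bytes:
    """Build n near-identical 'revisions' of one page (synthetic stand-in)."""
    base = "Wikistats reports are generated from the XML dumps. " * 40
    return "".join(
        f"<rev id={i}>{base} edit {i}</rev>\n" for i in range(n)
    ).encode()


def measure(name, compress, data):
    """Compress data once and return (name, compressed size, seconds taken)."""
    start = time.perf_counter()
    out = compress(data)
    return name, len(out), time.perf_counter() - start


data = make_revisions(2000)
results = [
    measure("bzip2", bz2.compress, data),
    measure("lzma", lzma.compress, data),
]

print(f"raw: {len(data)} bytes")
for name, size, secs in results:
    print(f"{name}: {size} bytes in {secs:.3f} s")
```

Sizes and timings depend on the machine and on the data, so the printed numbers say nothing about the real dumps; the point is only that each additional output format is a separate full pass over the same input.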

[Foundation-l] Wikistats is back

2008-12-24 Thread Erik Zachte

John: > For the "Page Views" data on some projects, the May data > looks unusually lower than the June data; > could it be that the May data isn't > a complete month for some projects? Yes, that is indeed the case. I will omit the incomplete month on subsequent reports. Erik Zachte

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Jon

Thank you Erik! Erik Zachte wrote: > New wikistats reports have been published today, for the first time since > May 2008. The reports have been generated on the new wikistats server > ‘Bayes’, which is operational since a few weeks. The dump process

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread Brian

Nice work Erik! I am still quite shocked at the amount of time the English Wikipedia takes to dump, especially since we seem to have close links to folks who work at MySQL. To me it seems that one of two things must be the case: 1. Wikipedia has outgrown MySQL, in the sense that, while we can put

Re: [Foundation-l] Wikistats is back

2008-12-24 Thread John Vandenberg

Thank you Erik! For the "Page Views" data on some projects, the May data looks unusually lower than the June data; could it be that the May data isn't a complete month for some projects? http://stats.wikimedia.org/wikisource/EN/TablesPageViewsMonthly.htm http://stats.wikimedia.org/wikiquote/EN/Tab

[Foundation-l] Wikistats is back

2008-12-24 Thread Erik Zachte

New wikistats reports have been published today, for the first time since May 2008. The reports have been generated on the new wikistats server 'Bayes', which is operational since a few weeks. The dump process itself had been restarted some weeks earlier, new dumps are now available for all 700+ w
