On 12/24/08 3:31 PM, Brian wrote:
> I am still quite shocked at the amount of time the English Wikipedia takes
> to dump, especially since we seem to have close links to folks who work at
> MySQL. To me it seems that one of two things must be the case:
>
> 1. Wikipedia has outgrown MySQL, in the se
A week ago I published new wikistats files, for the first time in 7 months,
only to retract them 2 days later, when it turned out that counts for some
wikis were completely wrong. After some serious bug hunting I nailed the
creepy creature that had been hiding in an unexpected corner (most bugs fin
Woo!! Thank you belatedly for my new year's dose of infodisiac. --SJ
On Wed, Dec 24, 2008 at 5:50 PM, Erik Zachte wrote:
> New wikistats reports have been published today, for the first time since
> May 2008. The reports have been generated on the new wikistats server
> 'Bayes', which is opera
Hoi,
On that note ...
http://hardware.slashdot.org/article.pl?sid=09/01/02/1546214
Thanks,
GerardM
2009/1/1 geni
> 2008/12/25 Gerard Meijssen :
> > Hoi,
> > It is not one either. It has been said repeatedly that the process of a
> > straightforward backup is something that is done on a reg
2008/12/25 Gerard Meijssen :
> Hoi,
> It is not one either. It has been said repeatedly that the process of a
> straightforward backup is something that is done on a regular basis.
No it hasn't.
--
geni
Dear Erik,
These things happen; I am awaiting the new figures all the more eagerly. Good
that you mentioned it, because I was already writing a mailing list
contribution about the very low figures for new German Wikipedians.
Erik, it was nice to get to know you, and in 20
On Wed, Dec 24, 2008 at 7:09 PM, Brian wrote:
> Interesting. I realize that the dump is extremely large, but if 7zip is
> really the bottleneck then to me the solutions are straightforward:
>
> 1. Offer an uncompressed version of the dump for download. Bandwidth is
> cheap and downloads can be res
There is something seriously wrong with the figures for some Wikipedias in
the new wikistats reports. The figures for some wikis are much too low. When
comparing csv files (raw counts) produced in May 2008 and produced recently
it is quite easy to tell the difference. For some wikis the data for mo
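A check along those lines is easy to script. Below is a minimal sketch in
Python, assuming hypothetical file names and a hypothetical two-column layout
(wiki code, raw count) rather than the real wikistats csv schema:

    # Minimal sketch: flag wikis whose raw counts dropped sharply between two
    # csv snapshots. File names and the two-column layout are hypothetical.
    import csv

    def load_counts(path):
        with open(path, newline="") as f:
            return {row[0]: int(row[1]) for row in csv.reader(f)}

    old = load_counts("counts_2008_05.csv")
    new = load_counts("counts_2008_12.csv")

    for wiki, old_count in sorted(old.items()):
        new_count = new.get(wiki, 0)
        if new_count < 0.5 * old_count:  # arbitrary "much too low" threshold
            print(wiki, old_count, "->", new_count)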
On 12/25/08, Gerard Meijssen wrote:
> Hoi,
> It is not one either. It has been said repeatedly that the process of a
> straightforward backup is something that is done on a regular basis. This,
> however, includes a lot of information that we do not allow to be included in
> the data export that is
Hoi,
It is not one either. It has been said repeatedly that the process of a
straightforward backup is something that is done on a regular basis. This,
however, includes a lot of information that we do not allow to be included in
the data export that is made available to the public. So never mind wh
2008/12/25 geni :
> I'd more be thinking of handing over a stack of hard drives to
> wikimedia chapter reps at wikimania .
2TB external hard disk, gzip on the fly (gzipping is faster than the
network - remember, Wikimedia gzips data going between internal
servers in the same rack because CPU is
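For what it's worth, "gzip on the fly" needs nothing exotic; a minimal sketch,
with hypothetical paths standing in for the dump and the external disk:

    # Sketch: stream a dump onto an external disk, compressing as it is written,
    # so no uncompressed copy is ever stored. Paths are hypothetical placeholders.
    import gzip
    import shutil

    SRC = "/data/dumps/enwiki-pages-meta-history.xml"       # hypothetical source
    DST = "/mnt/external/enwiki-pages-meta-history.xml.gz"  # hypothetical 2TB disk

    with open(SRC, "rb") as src, gzip.open(DST, "wb", compresslevel=6) as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)    # 1 MB chunks, constant memory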
2008/12/25 David Gerard :
> 2008/12/25 Brian :
>
>> But at least this would allow Erik, researchers and archivers to get the
>> dump faster than they can get the compressed version. The number of people
>> who want this can't be > 100, can it? It would need to be metered by an API
>> I guess.
>
>
>
On Wed, Dec 24, 2008 at 6:29 PM, Brian wrote:
> I'm also curious, what is the estimated amount of time to decompress this
> thing?
Somewhere around 1 week, I'd guesstimate.
-Robert Rohde
2008/12/25 Brian :
> But at least this would allow Erik, researchers and archivers to get the
> dump faster than they can get the compressed version. The number of people
> who want this can't be > 100, can it? It would need to be metered by an API
> I guess.
Maybe we can run a sneakernet of DLT
I'm also curious, what is the estimated amount of time to decompress this
thing?
On Wed, Dec 24, 2008 at 7:24 PM, Brian wrote:
> But at least this would allow Erik, researchers and archivers to get the
> dump faster than they can get the compressed version. The number of people
> who want this c
But at least this would allow Erik, researchers and archivers to get the
dump faster than they can get the compressed version. The number of people
who want this can't be > 100, can it? It would need to be metered by an API
I guess.
Cheers,
Brian
On Wed, Dec 24, 2008 at 7:18 PM, Robert Rohde wro
On Wed, Dec 24, 2008 at 6:05 PM, Brian wrote:
> Hi Robert,
>
> I'm not sure I agree with you...
>
> (3 terabytes / 10 megabytes) seconds in days = 3.64 days
>
> That is, on my university connection I could download the dump in just a few
> days. The only cost is bandwidth.
While you might be corre
Hi Robert,
I'm not sure I agree with you...
(3 terabytes / 10 megabytes) seconds in days = 3.64 days
That is, on my university connection I could download the dump in just a few
days. The only cost is bandwidth.
On Wed, Dec 24, 2008 at 6:46 PM, Robert Rohde wrote:
> On Wed, Dec 24, 2008 at 4:0
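As an aside, the 3.64-day figure works out exactly if "terabyte" and "megabyte"
are read as binary units (2**40 and 2**20 bytes), which is what the calculator
query above does; a quick sketch with the same round numbers:

    # Back-of-the-envelope download time: ~3 TB dump over a ~10 MB/s link.
    # Binary units reproduce the 3.64-day figure quoted above.
    dump_bytes = 3 * 2**40
    link_bytes_per_s = 10 * 2**20

    seconds = dump_bytes / link_bytes_per_s
    print(round(seconds / 86400, 2), "days")  # -> 3.64 days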
On Wed, Dec 24, 2008 at 4:09 PM, Brian wrote:
> Interesting. I realize that the dump is extremely large, but if 7zip is
> really the bottleneck then to me the solutions are straightforward:
>
> 1. Offer an uncompressed version of the dump for download. Bandwidth is
> cheap and downloads can be res
2008/12/25 Erik Zachte :
> Hi Brian, Brion once explained to me that the post-processing of the dump is
> the main bottleneck.
> Compressing articles with tens of thousands of revisions is a major resource
> drain.
> Right now every dump is even compressed twice, into bzip2 (for wider
> platform c
Also, I wonder if these folks have been consulted for their expertise in
compressing Wikipedia data: http://prize.hutter1.net/
On Wed, Dec 24, 2008 at 5:09 PM, Brian wrote:
> Interesting. I realize that the dump is extremely large, but if 7zip is
> really the bottleneck then to me the solutions
Interesting. I realize that the dump is extremely large, but if 7zip is
really the bottleneck then to me the solutions are straightforward:
1. Offer an uncompressed version of the dump for download. Bandwidth is
cheap and downloads can be resumed, unlike this dump process.
2. The WMF offers a servi
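On the resumability point in item 1: a minimal sketch of continuing an
interrupted download with an HTTP Range request is below; the URL is a
placeholder and the server is assumed to honour byte ranges:

    # Sketch: resume a partially downloaded dump via an HTTP Range request.
    # The URL is hypothetical; the server is assumed to support byte ranges.
    import os
    import urllib.request

    URL = "https://dumps.example.org/enwiki-pages-meta-history.xml"  # placeholder
    OUT = "enwiki-pages-meta-history.xml"

    resume_from = os.path.getsize(OUT) if os.path.exists(OUT) else 0
    req = urllib.request.Request(URL, headers={"Range": "bytes=%d-" % resume_from})

    with urllib.request.urlopen(req) as resp, open(OUT, "ab") as out:
        while True:
            chunk = resp.read(1024 * 1024)  # 1 MB at a time
            if not chunk:
                break
            out.write(chunk)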
Hi Brian, Brion once explained to me that the post-processing of the dump is
the main bottleneck.
Compressing articles with tens of thousands of revisions is a major resource
drain.
Right now every dump is even compressed twice, into bzip2 (for wider
platform compatibility) and 7zip format (for 2
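One practical upside of keeping the bzip2 copy is that it can be consumed as a
stream with nothing beyond the standard library, so the multi-terabyte
uncompressed XML never has to touch disk. A minimal sketch, with a hypothetical
local filename:

    # Sketch: read the bzip2 dump as a stream and count <page> elements,
    # without materialising the uncompressed XML. Filename is hypothetical.
    import bz2

    DUMP = "enwiki-pages-meta-history.xml.bz2"

    pages = 0
    with bz2.open(DUMP, "rt", encoding="utf-8") as f:
        for line in f:              # decompressed line by line, on the fly
            if "<page>" in line:
                pages += 1
    print(pages, "pages")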
John:
> For the "Page Views" data on some projects, the May data
> looks unusually lower than the June data;
> could it be that the May data isn't
> a complete month for some projects?
Yes, that is indeed the case. I will omit the incomplete month on subsequent
reports.
Erik Zachte
Thank you, Erik!
Erik Zachte wrote:
> New wikistats reports have been published today, for the first time since
> May 2008. The reports have been generated on the new wikistats server
> ‘Bayes’, which has been operational for a few weeks. The dump process
Nice work, Erik!
I am still quite shocked at the amount of time the English Wikipedia takes
to dump, especially since we seem to have close links to folks who work at
MySQL. To me it seems that one of two things must be the case:
1. Wikipedia has outgrown MySQL, in the sense that, while we can put
Thank you, Erik!
For the "Page Views" data on some projects, the May data looks
unusually lower than the June data; could it be that the May data isnt
a complete month for some projects?
http://stats.wikimedia.org/wikisource/EN/TablesPageViewsMonthly.htm
http://stats.wikimedia.org/wikiquote/EN/Tab
New wikistats reports have been published today, for the first time since
May 2008. The reports have been generated on the new wikistats server
Bayes, which has been operational for a few weeks. The dump process itself had
been restarted some weeks earlier; new dumps are now available for all 700+
w