Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Fastily
A little late to the party, I just learned about this change today. I maintain a number of bot tasks and database reports on enwp that rely on cross

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Huji Lee
I said it before, and I say it again: *some* databases should be available for cross-wiki JOIN everywhere. This would at least include commons_p and centralauth_p, but perhaps also enwiki_p and meta_p. I know that we discussed it before and better long-term solutions can be imagined (such as a data

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Aaron Halfaker
> > over a path of effort for the Clouds team It seems to me that the Cloud team is putting in all of the effort they can. I'm not sure where they would find more time and energy to implement a better solution. I imagine any better solution wouldn't be a matter of a few extra hours, but rather

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Huji Lee
I am not being critical of people (namely, the amazing Cloud team) here. I am being critical of decisions. That could even involve much higher-level decisions, e.g. should WMF have spent more money and hired more resources for this? It could very well be that I am uninformed, and these decisions wer

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Kimmo Virtanen
Hi, > This is painful. I think you raised some really good points about cross-joins with Central Auth and Commons as those are *designed* to be cross-referenced from other wikis. But ultimately, if there's no reasonable way to do it in the software (MariaDB) we have available, implementing o

[Cloud] Wikimedia remote hackathon on May 22/23!

2021-03-31 Thread Birgit Müller
Hello All, Many of us were hopeful that we would be able to organise an onsite hackathon and meet in person in 2021. While this is sadly not the case, we still wanted to offer the opportunity for the technical community to get together virtually, work together on various projects, and discuss new

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Joaquin Oltra Hernandez
Hi Fastily, we are aware of the use case for matching commons pages/images/sha1s between commons/big wikis and other wikis, as it has come up many times. I'm cataloging all the comments and examples that have come up in the last 5 months in order to provide categorized input to the parent task

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Roy Smith
Is it feasible to do a log analysis of the database servers to find out what tools are (were?) using cross-wiki joins? At least that would ensure that all the tool owners could be contacted directly to make sure they know this is happening. > On Mar 31, 2021, at 3:46 PM, Joaquin Oltra Hernande

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Brooke Storm
> On Mar 31, 2021, at 2:20 PM, Roy Smith wrote: > > Is it feasible to do a log analysis of the database servers to find out what > tools are (were?) using cross-wiki joins? At least that would ensure that > all the tool owners could be contacted directly to make sure they know this > is hap

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Daniel Schwen
> > I run FastilyBot on a Raspberry Pi, and needless to say it would be > grossly impractical for me to perform a "join" in the bot's code. > Why not run it on WMF Cloud? In-code joins will very likely work there and Cloud is supported. You are effectively asking to also support a second way here.
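
For readers unfamiliar with the term, an "in-code join" here means fetching rows from each wiki's database separately and matching them in the client. The following is only a minimal sketch of one way to do that (a hash join in Python), not anything taken from the thread. The `hash_join` helper and the sample rows are hypothetical, and a real tool would fetch the rows from the replicas rather than hard-code them.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Yield (left, right) pairs of rows whose `key` values are equal.

    Hypothetical helper: builds an in-memory index over right_rows,
    then streams left_rows against it.
    """
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)
    for row in left_rows:
        # .get() avoids creating empty index entries for misses.
        for match in index.get(row[key], ()):
            yield row, match


# Made-up sample data standing in for two separate query results,
# one per wiki database. Column names follow the `image` table.
enwiki_rows = [
    {"img_name": "Local_copy.jpg", "img_sha1": "aaa111"},
    {"img_name": "Only_on_enwiki.png", "img_sha1": "bbb222"},
]
commons_rows = [
    {"img_name": "Original.jpg", "img_sha1": "aaa111"},
    {"img_name": "Unrelated.svg", "img_sha1": "ccc333"},
]

duplicates = [
    (local["img_name"], remote["img_name"])
    for local, remote in hash_join(enwiki_rows, commons_rows, "img_sha1")
]
print(duplicates)
```

The index is built over one side only, so memory use scales with the size of that side. For a large table such as the Commons image table, it is usually better to index the smaller result set and stream the larger one.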

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Roy Smith
I'm just playing around on tools-sgebastion-08. I can dump the first 1 million image names in about half a minute: > tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf > -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select > img_name from image limit 10

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Brooke Storm
> On Mar 31, 2021, at 5:18 PM, Roy Smith wrote: > > I'm just playing around on tools-sgebastion-08. I can dump the first 1 > million image names about half a minute: > >> tools.spi-tools-dev:xw-join$ time mysql >> --defaults-file=$HOME/replica.my.cnf -h >> commonswiki.web.db.svc.wikimedia

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Roy Smith
Thanks for looking into this. I tried this again a little later, and it ran fine. Odd that the amount of memory used depends on the number of rows. I would expect it would stream results to stdout as they came in, but apparently not. Even weirder that the 100M example runs OOM in 10s, while