Re: [Cloud] [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-10 Thread Brooke Storm
This will be happening in around 10 minutes. ToolsDB will be read-only until we can get a consistent dump to rebuild replication. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm

[Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread Joaquin Oltra Hernandez
TLDR: Wiki Replicas' architecture is being redesigned for stability and performance. Cross database JOINs will not be available and a host connection will only allow querying its associated DB. See [1] for more details. Hi! In

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread MusikAnimal
Hi! Most tools query just a single db at a time, so I don't think this will be a massive problem. However some such as Global Contribs[0] and GUC[1] can theoretically query all of them from a single request. Creating new connections on-the-fly seems doable in production, the issue is how to work on

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread Brooke Storm
Hi MA, You could still accomplish the local environment you are describing by using 8 ssh tunnels. All the database name DNS aliases eventually reference the section names (s1, s2, s3, s4, etc., in the form of s1.analytics.db.svc.eqiad.wmflabs). An app could be written to connect to the corre
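
A minimal sketch of the tunnel layout described above, for local development. The host name pattern comes from the message. The section list s1 to s8 is inferred from "8 ssh tunnels", and the local port numbering and helper names are illustrative assumptions, not part of any Toolforge tooling.

```python
# Assumption: eight sections named s1..s8, inferred from "8 ssh tunnels".
SECTIONS = [f"s{n}" for n in range(1, 9)]

# Hypothetical choice: forward section i to local port BASE_LOCAL_PORT + i.
BASE_LOCAL_PORT = 3306
REMOTE_DB_PORT = 3306  # assumed default MySQL/MariaDB port on the replica


def section_host(section, cluster="analytics"):
    """Build a host name in the form quoted in the thread."""
    return f"{section}.{cluster}.db.svc.eqiad.wmflabs"


def tunnel_map(base_port=BASE_LOCAL_PORT):
    """Map each section to the local (host, port) its tunnel listens on."""
    return {s: ("127.0.0.1", base_port + i) for i, s in enumerate(SECTIONS)}


def ssh_forward_args(base_port=BASE_LOCAL_PORT):
    """Return ssh -L arguments, one local port forward per section."""
    args = []
    for section, (_, port) in tunnel_map(base_port).items():
        args += ["-L", f"{port}:{section_host(section)}:{REMOTE_DB_PORT}"]
    return args
```

An app would then look up the section for a wiki and connect to the matching local port from `tunnel_map()`.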

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread Gergo Tisza
On Tue, Nov 10, 2020 at 1:15 PM MusikAnimal wrote: > Hi! Most tools query just a single db at a time, so I don't think this > will be a massive problem. However some such as Global > Contribs[0] and GUC[1] can theoretically query all of them from a single > request. Creating new connections on-th

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread MusikAnimal
Ah yes, 8 tunnels is more than manageable. The `slice` column in the meta_p.wiki table is the one we need to connect to for said wiki, right? So in theory, I always have SSH tunnels open for every slice, and the first thing I do is check meta_p.wiki for the given wiki, then I know which of those s1

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread Brooke Storm
Yes, you might be able to use the meta_p.wiki table. However, when wikis are moved between sections, nothing updates the meta_p.wiki table at this time. Requests to noc.wikimedia.org are accurate and up to date, as far as I know. We only update meta_p when we add the wiki

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread MusikAnimal
Got it. The https://noc.wikimedia.org/conf/dblists/ lists are plenty fast and easy enough to parse. I'll just cache that. It would be neat if we could rely on the slice specified in meta_p in the future, as in my case we have to query meta_p.wiki regardless, but not a big deal :) Thank you! I thin
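
A rough sketch of parsing the dblists into a wiki-to-section lookup that can be cached. It assumes each list is a plain-text file with one database name per line and `#` comments, and that the files are named `<section>.dblist` under the URL above. Both the file naming and the function names are assumptions for illustration.

```python
from urllib.request import urlopen

# Assumed file naming under the directory mentioned in the thread.
DBLIST_URL = "https://noc.wikimedia.org/conf/dblists/{section}.dblist"


def parse_dblist(text):
    """Return database names from dblist text, skipping comments and blanks."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line:
            names.append(line)
    return names


def build_wiki_to_section(dblists):
    """Invert {section: dblist text} into {wiki database name: section}."""
    mapping = {}
    for section, text in dblists.items():
        for wiki in parse_dblist(text):
            mapping[wiki] = section
    return mapping


def fetch_dblists(sections, timeout=10):
    """Download the raw dblist text for each section (network access needed)."""
    result = {}
    for section in sections:
        url = DBLIST_URL.format(section=section)
        with urlopen(url, timeout=timeout) as resp:
            result[section] = resp.read().decode("utf-8")
    return result
```

The result of `build_wiki_to_section(fetch_dblists(...))` is a plain dict, so it can be stored in whatever cache the tool already uses and refreshed periodically to pick up wikis that move between sections.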

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread Huji Lee
Cross-wiki JOINS are used by some of the queries we run regularly for fawiki. One of those queries looks for articles that don't have an image in their infobox in fawiki, but do have one on enwiki, so that we can use/import that image. Another one JOINs fawiki data with commons data to look for red

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread AntiCompositeNumber
Most cross-db JOINs can be recreated using two queries and an external tool to filter the results. However, there are some queries that would be simply impractical due to the large amount of data involved, and the query for overlapping local and Commons images is one of them. There are basically tw
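
As an illustration of the "two queries and an external tool" approach, here is a small sketch that finds file names present in both a local wiki and Commons without a cross-database JOIN. Two in-memory SQLite databases stand in for the two replica connections, and the `image.img_name` table and column are used as a simplified stand-in for the real schema. The helper names and batch size are assumptions.

```python
import sqlite3


def make_db(names):
    """Create an in-memory stand-in for one wiki's `image` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE image (img_name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO image VALUES (?)", [(n,) for n in names])
    return conn


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def overlapping_names(local_conn, commons_conn, batch_size=500):
    """Emulate a cross-database JOIN with one query per connection."""
    # Query 1: fetch candidate names from the local wiki.
    local_names = [row[0] for row in
                   local_conn.execute("SELECT img_name FROM image")]
    overlap = []
    # Query 2 (batched): ask the other database which candidates it also has.
    for batch in chunked(local_names, batch_size):
        placeholders = ",".join("?" * len(batch))
        rows = commons_conn.execute(
            f"SELECT img_name FROM image WHERE img_name IN ({placeholders})",
            batch,
        )
        overlap.extend(row[0] for row in rows)
    return sorted(overlap)
```

This moves the filtering into the application, so every candidate name crosses the network at least once. That cost is what makes the approach impractical for the very large data sets mentioned above.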

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread xover
On Wed, Nov 11, 2020 at 5:26 AM AntiCompositeNumber wrote: > I understand the system engineering reasons for this change, but I > think it's worth underscoring exactly how disruptive it will be for > the queries that depended on this functionality. The use cases seem to be relatively few and rela