Hello, over the coming months, the following changes will be made to the
databases and the infrastructure. They might affect you if your tools or
queries rely on them. The list is ordered by how soon each change will
happen.

We understand that updating your tools and systems can be time-consuming,
hence we are giving advance notice. I truly apologize for the
inconvenience, but many of these changes are needed to keep the site running
smoothly.

Image table redesign

Around fourteen years after the creation of T28741
<https://phabricator.wikimedia.org/T28741>, we are implementing the changes
described therein. Currently, the current version of every file has a row
in the image table, and older versions of that file, if any, have rows in
the oldimage table. These two tables (image and oldimage) will be dropped
in around two months. They will be replaced by two main tables: file and
filerevision. Every file will have a row in the file table describing its
name and type. Every version of the file (current and old) will have a row
in filerevision holding version-specific information such as its size or
hash, similar to the existing distinction between pages and revisions.
Another improvement is that every file and file revision will get a unique
auto-increment ID, which simplifies many operations and queries. You can
check T28741 <https://phabricator.wikimedia.org/T28741> for more
information. The new tables are already accessible in wikireplicas, but the
data hasn’t been fully migrated yet.
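For tool maintainers, the shift can be sketched with a small in-memory
SQLite model in Python. This is a simplified, hypothetical sketch: only the
table names file and filerevision come from this announcement, while the
column names (file_id, file_name, file_type, file_latest, fr_file, fr_size,
fr_sha1, fr_timestamp) are assumptions loosely modeled on MediaWiki
conventions, so please check the actual schema on the replicas before
relying on them. It shows how a current-version lookup and an all-versions
lookup, which today need image and oldimage, become joins on one pair of
tables.

```python
import sqlite3

# Hypothetical, simplified model of the file/filerevision split.
# Column names are assumptions for illustration, not the real schema.
SCHEMA = """
CREATE TABLE file (
    file_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name   TEXT NOT NULL UNIQUE,
    file_type   TEXT NOT NULL,  -- simplified to text here
    file_latest INTEGER         -- fr_id of the current version
);
CREATE TABLE filerevision (
    fr_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    fr_file      INTEGER NOT NULL REFERENCES file(file_id),
    fr_size      INTEGER NOT NULL,
    fr_sha1      TEXT NOT NULL,
    fr_timestamp TEXT NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create one file with an old version and a current version."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    file_id = conn.execute(
        "INSERT INTO file (file_name, file_type) VALUES (?, ?)",
        ("Example.png", "BITMAP"),
    ).lastrowid
    latest = None
    for size, sha1, ts in [(1000, "aaa", "20240101000000"),
                           (2000, "bbb", "20250101000000")]:
        latest = conn.execute(
            "INSERT INTO filerevision (fr_file, fr_size, fr_sha1, fr_timestamp)"
            " VALUES (?, ?, ?, ?)",
            (file_id, size, sha1, ts),
        ).lastrowid
    # Point the file at its newest version.
    conn.execute("UPDATE file SET file_latest = ? WHERE file_id = ?",
                 (latest, file_id))
    return conn


def current_version(conn, name):
    """Roughly what a lookup in the old image table returned."""
    return conn.execute(
        "SELECT fr_size, fr_sha1 FROM file"
        " JOIN filerevision ON fr_id = file_latest"
        " WHERE file_name = ?", (name,)).fetchone()


def all_versions(conn, name):
    """Every version is one filerevision row; no image/oldimage UNION."""
    return conn.execute(
        "SELECT fr_id, fr_size FROM file"
        " JOIN filerevision ON fr_file = file_id"
        " WHERE file_name = ? ORDER BY fr_timestamp", (name,)).fetchall()


if __name__ == "__main__":
    demo = build_demo()
    print(current_version(demo, "Example.png"))
    print(all_versions(demo, "Example.png"))
```

With this layout, fetching the history of a file no longer means combining
two tables with different columns; each version is simply another
filerevision row with its own ID.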

Term store split out of Wikidata’s database

Wikidata’s database has been growing too fast, and we need to move the term
store (tables starting with wbt_) to a dedicated cluster to allow growth
and improve Wikidata’s performance by utilizing cache locality. The new
section will be called x3, and you will be able to access it in
wikireplicas. However, this also means you won’t be able to join these
tables with the rest of Wikidata’s database (such as the page table), since
they will reside on two physically separate servers. On the other hand,
this also means most of your queries to Wikidata’s database (and the term
store) will become faster. We are aiming for the switch to happen in three
months’ time. You can follow the work in T351820
<https://phabricator.wikimedia.org/T351820>.
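Since cross-section joins will stop working, one common workaround is to
query each section separately and combine the results in your tool. The
Python sketch below illustrates that pattern with two in-memory SQLite
databases standing in for the main Wikidata section and x3. The item_label
table is a hypothetical, flattened stand-in for the normalized wbt_ tables,
and the connection setup is not how you would connect to wikireplicas.

```python
import sqlite3


def make_main_db() -> sqlite3.Connection:
    """Stand-in for the main Wikidata section, which keeps `page`."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY,"
               " page_namespace INTEGER, page_title TEXT)")
    db.executemany("INSERT INTO page VALUES (?, ?, ?)",
                   [(1, 0, "Q42"), (2, 0, "Q64")])
    return db


def make_x3_db() -> sqlite3.Connection:
    """Stand-in for x3; item_label is a hypothetical flattened term table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item_label (item_id INTEGER, lang TEXT, label TEXT)")
    db.executemany("INSERT INTO item_label VALUES (?, ?, ?)",
                   [(42, "en", "Douglas Adams"), (64, "en", "Berlin")])
    return db


def labels_for_pages(main_db, x3_db, page_ids, lang="en"):
    """Return {page_id: label}, joining the two sections in Python."""
    # Step 1: query the main section for the pages.
    marks = ",".join("?" * len(page_ids))
    rows = main_db.execute(
        f"SELECT page_id, page_title FROM page WHERE page_id IN ({marks})",
        list(page_ids)).fetchall()
    # Map numeric item IDs (title "Q42" -> 42) back to page IDs.
    by_item = {int(title[1:]): pid for pid, title in rows
               if title.startswith("Q")}
    if not by_item:
        return {}
    # Step 2: query the term store separately with one batched IN list.
    marks = ",".join("?" * len(by_item))
    terms = x3_db.execute(
        f"SELECT item_id, label FROM item_label"
        f" WHERE lang = ? AND item_id IN ({marks})",
        [lang, *by_item]).fetchall()
    # Step 3: combine the two result sets in application code.
    return {by_item[item_id]: label for item_id, label in terms}


if __name__ == "__main__":
    print(labels_for_pages(make_main_db(), make_x3_db(), [1, 2]))
```

Batching IDs into an IN list, as above, keeps the number of round trips to
each server small compared with one term-store query per page.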

Additionally, the wbt_type table will be dropped and its mapping will be
hard-coded in the code instead. See gerrit:1110810
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1110810>
for more details. This has helped us simplify a lot of Wikibase code (example
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1110720>).
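In tool code, the same idea means replacing a join against wbt_type with a
constant lookup. The sketch below is hypothetical: the numeric IDs
(1 = label, 2 = description, 3 = alias) and the wbtl_type_id, wby_id and
wby_name column names are assumptions for illustration, so please confirm
the real values in gerrit:1110810 before hard-coding them in your own
queries.

```python
# Hypothetical hard-coded replacement for the wbt_type lookup table.
# The ID values are assumptions; verify them against gerrit:1110810.
TERM_TYPES = {1: "label", 2: "description", 3: "alias"}
TERM_TYPE_IDS = {name: type_id for type_id, name in TERM_TYPES.items()}


def term_type_name(type_id: int) -> str:
    """Map a numeric type ID to its name without querying wbt_type."""
    try:
        return TERM_TYPES[type_id]
    except KeyError:
        raise ValueError(f"unknown term type id: {type_id}") from None


def term_type_id(name: str) -> int:
    """Map a type name to its numeric ID without querying wbt_type."""
    try:
        return TERM_TYPE_IDS[name]
    except KeyError:
        raise ValueError(f"unknown term type: {name!r}") from None


def label_filter_sql() -> str:
    # Before: ... JOIN wbt_type ON wby_id = wbtl_type_id
    #             WHERE wby_name = 'label'
    # After: filter on the numeric ID directly, with no join.
    return f"wbtl_type_id = {term_type_id('label')}"


if __name__ == "__main__":
    print(label_filter_sql())
```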

Categorylinks normalization

Categorylinks is the next table in the series of links tables being
normalized via the linktarget table (parent ticket
<https://phabricator.wikimedia.org/T300222>, RFC
<https://phabricator.wikimedia.org/T222224>). As with the templatelinks and
pagelinks tables, cl_to will be dropped and the new field cl_target_id will
point to lt_id in the linktarget table instead. We will also drop the
cl_collation field and replace it with cl_collation_id, which will point to
the collation_id field of a new table called collation. We are aiming to
get this fully done by the end of the next quarter (end of June 2025), but
that depends on how fast the migration script can run, which is outside of
our control. You can follow the work in T299951
<https://phabricator.wikimedia.org/T299951>. It’s worth noting that after
this migration is done, we will start working on the imagelinks table.
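A query that used to filter on cl_to will need to go through linktarget
instead. The Python/SQLite sketch below models that shape. The fields
cl_target_id, lt_id, cl_collation_id and collation_id and the collation
table come from this announcement, while lt_namespace, lt_title, cl_from,
cl_sortkey, collation_name and the sample data are assumptions for
illustration. Namespace 14 is MediaWiki's Category namespace.

```python
import sqlite3

# Hypothetical, simplified model of categorylinks after normalization.
SCHEMA = """
CREATE TABLE linktarget (
    lt_id        INTEGER PRIMARY KEY,
    lt_namespace INTEGER NOT NULL,
    lt_title     TEXT NOT NULL
);
CREATE TABLE collation (
    collation_id   INTEGER PRIMARY KEY,
    collation_name TEXT NOT NULL
);
CREATE TABLE categorylinks (
    cl_from         INTEGER NOT NULL,  -- page_id of the categorized page
    cl_target_id    INTEGER NOT NULL REFERENCES linktarget(lt_id),
    cl_sortkey      TEXT NOT NULL,
    cl_collation_id INTEGER NOT NULL REFERENCES collation(collation_id)
);
"""

CATEGORY_NS = 14  # MediaWiki's Category: namespace


def build_demo() -> sqlite3.Connection:
    """One category target, one collation, two member pages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO linktarget VALUES (1, ?, 'Example_category')",
                 (CATEGORY_NS,))
    conn.execute("INSERT INTO collation VALUES (1, 'uppercase')")
    conn.executemany("INSERT INTO categorylinks VALUES (?, 1, ?, 1)",
                     [(10, "ALPHA"), (11, "BETA")])
    return conn


def pages_in_category(conn, title):
    """Old style: WHERE cl_to = ?. New style: resolve via linktarget."""
    return conn.execute(
        "SELECT cl_from, collation_name FROM categorylinks"
        " JOIN linktarget ON cl_target_id = lt_id"
        " JOIN collation ON cl_collation_id = collation_id"
        " WHERE lt_namespace = ? AND lt_title = ?"
        " ORDER BY cl_sortkey",
        (CATEGORY_NS, title)).fetchall()


if __name__ == "__main__":
    print(pages_in_category(build_demo(), "Example_category"))
```

The linktarget join is the same pattern used for templatelinks and
pagelinks, so tools already updated for those tables can likely reuse it.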

Thank you
-- 
*Amir Sarabadani (he/him)*
Staff Database Architect
Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/