Greetings XML Dump users,

TL;DR: We are pausing the XML Dumps, effective immediately, to correct runtime 
errors that we suspect are causing bad data dumps. We are working on a fix.

Longer:
Over the past couple of months, we have noticed a growing number of runtime 
errors coming from the process that generates the XML content Dumps. This 
process has always been subject to transient issues and may miss some 
revisions in some months, but the problem now recurs so often that we suspect 
data corruption in recent dumps.
We typically start a full dump (i.e. all revisions for all pages) on the 1st of 
the month for all wikis, and then we start a partial dump (i.e. all current 
revisions for all pages) on the 20th of the month. Most of the October 1 2024 
full runs are complete, except the French wiki and Wikidata wiki, which have 
failed for this month. The last successful copies of the French and Wikidata 
wikis are from September. All of the partial runs for the 20th of November are 
complete as well. However, any of these recent dumps may have underlying data 
quality issues.
In the interest of not dumping potentially bad data, we have decided to pause 
all future XML Dumps, starting from the date of this communication, until we 
find and fix the root cause of these errors.

We acknowledge that many folks and downstream processes will be impacted and 
apologize for any inconvenience that this may cause you.
We are prioritizing this work. If you are interested, you can follow updates at 
https://phabricator.wikimedia.org/T377594. Feel free to open additional tickets 
if your use cases are affected, and do please link them to the main ticket. 
Further, if you have the ability, we welcome data quality analyses of recent 
dumps, including any issues that you may have noticed in your use cases.
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
