Hi all,

We have a set of database reports (on users, articles, etc.) that we used to generate on a weekly basis.[1] Ever since the introduction of the *actor* table,[2] many of the user-related reports have become so slow that their SQL queries cannot finish within a reasonable time and are killed. Some other reports have also become slower over time; all of these are shown in red in [1].
One possible solution is to create a script, scheduled to run once a month, that would download the latest dump of the wiki database,[3] load it into MySQL/MariaDB, create some additional indexes to make our desired queries run faster, and generate the reports from that database. A separate script could then purge the data a few days later. We can use the current-version-only DB dumps for this purpose. (A rough sketch of what I have in mind is included after the footnotes below.)

I am guessing this process would take several hours to run (somewhere between 2 and 10) and would require about 2 GB of storage just to download and decompress the dump file, plus some additional space on the DB side (data, indexes, etc.).

Out of an abundance of caution, I thought I should ask for permission now rather than forgiveness later. Do we have a process for getting approval for projects that require gigabytes of storage and hours of computation, or is what I am proposing not even remotely considered a "large" project, meaning I am being overly cautious? Please advise!

Huji

[1] https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3
[2] https://phabricator.wikimedia.org/T223406
[3] https://dumps.wikimedia.org/fawiki/20200401/
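P.S. Here is the promised sketch of the monthly job, in Python. The table names, scratch database name, and example index are placeholders only, not a worked-out design, and I have not yet checked which per-table SQL dumps are actually published for fawiki:

    #!/usr/bin/env python3
    """Rough sketch of the proposed monthly report job (placeholders only)."""
    import subprocess

    DUMP_BASE = "https://dumps.wikimedia.org/fawiki/20200401/"  # see [3]
    TABLES = ["page", "categorylinks"]   # illustrative subset, not the real list
    DB_NAME = "fawiki_reports"           # scratch database, purged a few days later


    def run(cmd):
        """Run a shell command and fail loudly so cron mails us the error."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


    for table in TABLES:
        dump = "fawiki-20200401-{}.sql.gz".format(table)
        # 1. Download and decompress the per-table SQL dump (~2 GB total on disk).
        run(["wget", "-q", DUMP_BASE + dump])
        run(["gunzip", "-f", dump])
        # 2. Load it into the local MariaDB scratch database.
        run(["bash", "-c", "mysql {} < {}".format(DB_NAME, dump[:-3])])

    # 3. Create the extra indexes the report queries need (example only).
    run(["mysql", DB_NAME, "-e",
         "CREATE INDEX cl_to_extra ON categorylinks (cl_to);"])

    # 4. Generate the reports with the existing SQL against DB_NAME; a separate
    #    cron job drops the database a few days later to reclaim the space.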