Is the source code public? Maybe the queries could be improved. I ran into many such issues too after the actor migration, but after taking advantage of the specialized views[0] and of join decomposition (fetch just the actor IDs, i.e. rev_actor, then look up the actor_names in a separate query), my tools are seemingly as fast as they were before.
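The join-decomposition idea above can be sketched roughly as follows. This is a minimal, illustrative example using an in-memory SQLite database rather than the actual Wiki Replicas; the `revision`/`actor` column names mirror the MediaWiki schema, but the data and the `rev_id` filter are made up for the demonstration.

```python
import sqlite3

# Tiny in-memory stand-ins for MediaWiki's `revision` and `actor` tables
# (real column names, made-up data), to show the two-query pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_actor INTEGER);
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
    INSERT INTO revision VALUES (1, 10), (2, 11), (3, 10);
    INSERT INTO actor VALUES (10, 'Alice'), (11, 'Bob');
""")

# Step 1: query only the actor IDs from the (large) revision table,
# without joining anything.
actor_ids = {row[0] for row in conn.execute(
    "SELECT DISTINCT rev_actor FROM revision WHERE rev_id <= 3")}

# Step 2: resolve the (small) set of IDs to names in a separate query,
# instead of joining actor against every revision row.
placeholders = ",".join("?" * len(actor_ids))
names = dict(conn.execute(
    f"SELECT actor_id, actor_name FROM actor "
    f"WHERE actor_id IN ({placeholders})",
    sorted(actor_ids)))

print(names)  # e.g. {10: 'Alice', 11: 'Bob'}
```

The point of splitting the query in two is that each half can use a good index on its own table, which is what makes the pattern effective on the replicas after the actor migration.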
~ MA

[0] https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#Advanced_use_cases_of_specialized_views

On Mon, Apr 13, 2020 at 5:03 PM Huji Lee <huji.h...@gmail.com> wrote:
> I understand. However, I think that the use case we are looking at is
> relatively unique. I also think that the indexes we need may not be
> desirable for all the Wiki Replicas (they would often be multi-column
> indexes geared towards a specific set of queries), and I honestly don't
> want to go through the several weeks (months?) of discussion needed to
> justify them.
>
> Note that if we open the can of "more indexes on Wiki Replicas" worms,
> this would all of a sudden become an all-wiki discussion. I'm not sure
> there are more than a handful of wikis that do the level of page-level
> and user-level analytics that fawiki does, which means that for most
> wikis (and for most Wiki Replica databases) those additional indexes
> may not even be justified.
>
> Even if we were to generalize parts of this approach and bring it to
> Wiki Replicas, I would still argue that doing it at a smaller scale
> (one wiki DB for now) would be a reasonable starting point, no?
>
> On Mon, Apr 13, 2020 at 4:42 PM Bryan Davis <bd...@wikimedia.org> wrote:
>
>> On Sun, Apr 12, 2020 at 7:48 AM Huji Lee <huji.h...@gmail.com> wrote:
>> >
>> > One possible solution is to create a script which is scheduled to
>> > run once a month; the script would download the latest dump of the
>> > wiki database,[3] load it into MySQL/MariaDB, create some additional
>> > indexes that would make our desired queries run faster, and generate
>> > the reports using this database. A separate script can then purge
>> > the data a few days later.
>>
>> If I am understanding your proposal here, I think the main difference
>> from the current Wiki Replicas would be "create some additional
>> indexes that would make our desired queries run faster". We do have
>> some indexes and views in the Wiki Replicas which are specifically
>> designed to make common things faster today. If possible, adding to
>> these rather than building a one-off process of moving lots of data
>> around for your tool would be nice.
>>
>> I say this not because what you are proposing is a ridiculous
>> solution, but because it is a unique solution for your current problem
>> that will not help others who are having similar problems. Having one
>> tool use ToolsDB or a custom Cloud VPS project like this is possible,
>> but having 100 tools try to follow that pattern themselves is not.
>>
>> > Out of an abundance of caution, I thought I should ask for
>> > permission now, rather than forgiveness later. Do we have a process
>> > for getting approval for projects that require gigabytes of storage
>> > and hours of computation, or is what I proposed not even remotely
>> > considered a "large" project, meaning I am being overly cautious?
>>
>> <https://phabricator.wikimedia.org/project/view/2875/>
>>
>> Bryan
>> --
>> Bryan Davis              Technical Engagement      Wikimedia Foundation
>> Principal Software Engineer                               Boise, ID USA
>> [[m:User:BDavis_(WMF)]]                                      irc: bd808
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud