Alright, now that I've unearthed the page I was looking for, a fuller answer. The wiki replicas have to do a lot of database magic to remove private/deleted data, which means that the normal tables can be slow.
For your logging query, logging_logindex makes things faster because log_action is nulled for deleted log entries in logging and logging_userindex. In logging_logindex, rows with a deleted action are removed entirely: you lose the data for deleted log entries at the benefit of indexes on log_action. That's not a concern with this query because you're filtering on log action anyway. The actor view on the replicas has to do even more database magic and perform subqueries on 8 other tables, so it's almost always going to slow things down. If you use actor_logging instead of actor, you skip 7 of those subqueries. More information: https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Alternative_Views https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#The_actor_table_seems_really_slow--so_does_comment https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/templates/labs/db/views/maintain-views.yaml (configuration for the views on the replicas) As for why these queries were faster a few months ago, that's probably related to database server issues (https://phabricator.wikimedia.org/T246970 and https://phabricator.wikimedia.org/T247978). Using the optimized views brings the all-time query down to about 15 minutes (https://quarry.wmflabs.org/query/43984) and the 2019 query down to about 106 seconds (https://quarry.wmflabs.org/query/43985). AntiCompositeNumber On Wed, Apr 15, 2020 at 18:13 Maciej Jaros <e...@wp.pl> wrote: > Also note that I have a similar query that kind of works but takes much > longer to complete then it used to. > https://quarry.wmflabs.org/query/41680 > > Executed in 46.22 seconds as of Wed Jan 29 2020. > Resultset (1480 rows) > > Executed in 188.32 seconds as of Wed Apr 15 2020. > Resultset (1480 rows) > > So thats about 4 times longer then before on the same data. Well actor > table is probably longer, but I assume this will use indexes: `LEFT JOIN > actor a ON actor_id = log_actor`. > > Cheers, > Nux. > > Maciej Jaros (2020-04-15 23:56): > > Not sure if this is an appropriate place to discuss this but I'm not able > to run this query: > https://quarry.wmflabs.org/query/24267 > > Even a minor part of this query like below needs a significant time to > complete. > SELECT count(*) as review_count, log_actor > FROM logging > WHERE log_type = 'review' AND log_action = 'approve' > GROUP BY log_actor > > I tried with logging_userindex but that don't seem to help. Any chances on > adding extra indexes to make such queries work? I think querying log_type > and log_actor should be quite common for various user stats. > > Even this takes almost 200 seconds to complete: > SELECT count(*) as review_count, log_actor > FROM logging_userindex > WHERE log_type = 'review' AND log_action = 'approve' > AND log_timestamp >= 20190101000000 > AND log_timestamp <= 20190101235959 > GROUP BY log_actor > > Also note that the query 24267 used to work in January 2020. I searched > through the list, but haven't found any changes related to the logging > table since January. > > Cheers, > Nux. > > _______________________________________________ > Wikimedia Cloud Services mailing listcl...@lists.wikimedia.org (formerly > lab...@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud > > > _______________________________________________ > Wikimedia Cloud Services mailing list > Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) > https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud