Alright, now that I've unearthed the page I was looking for, a fuller
answer.
The wiki replicas have to do a lot of database magic to remove
private/deleted data, which means that the normal tables can be slow.

For your logging query, logging_logindex makes things faster because
log_action is nulled for deleted log entries in logging and
logging_userindex. In logging_logindex, rows with a deleted action are
removed entirely: you lose the data for deleted log entries at the benefit
of indexes on log_action. That's not a concern with this query because
you're filtering on log action anyway.

The actor view on the replicas has to do even more database magic and
perform subqueries on 8 other tables, so it's almost always going to slow
things down. If you use actor_logging instead of actor, you skip 7 of those
subqueries.

More information:
https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Alternative_Views
https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#The_actor_table_seems_really_slow--so_does_comment
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/templates/labs/db/views/maintain-views.yaml
(configuration for the views on the replicas)

As for why these queries were faster a few months ago, that's probably
related to database server issues (https://phabricator.wikimedia.org/T246970
and https://phabricator.wikimedia.org/T247978).

Using the optimized views brings the all-time query down to about 15
minutes (https://quarry.wmflabs.org/query/43984) and the 2019 query down to
about 106 seconds (https://quarry.wmflabs.org/query/43985).

AntiCompositeNumber

On Wed, Apr 15, 2020 at 18:13 Maciej Jaros <e...@wp.pl> wrote:

> Also note that I have a similar query that kind of works but takes much
> longer to complete then it used to.
> https://quarry.wmflabs.org/query/41680
>
> Executed in 46.22 seconds as of Wed Jan 29 2020.
> Resultset (1480 rows)
>
> Executed in 188.32 seconds as of Wed Apr 15 2020.
> Resultset (1480 rows)
>
> So thats about 4 times longer then before on the same data. Well actor
> table is probably longer, but I assume this will use indexes: `LEFT JOIN
> actor a ON actor_id = log_actor`.
>
> Cheers,
> Nux.
>
> Maciej Jaros (2020-04-15 23:56):
>
> Not sure if this is an appropriate place to discuss this but I'm not able
> to run this query:
> https://quarry.wmflabs.org/query/24267
>
> Even a minor part of this query like below needs a significant time to
> complete.
>    SELECT count(*) as review_count, log_actor
>     FROM logging
>     WHERE log_type = 'review' AND log_action = 'approve'
>     GROUP BY log_actor
>
> I tried with logging_userindex but that don't seem to help. Any chances on
> adding extra indexes to make such queries work? I think querying log_type
> and log_actor should be quite common for various user stats.
>
> Even this takes almost 200 seconds to complete:
>     SELECT count(*) as review_count, log_actor
>     FROM logging_userindex
>     WHERE log_type = 'review' AND log_action = 'approve'
>     AND log_timestamp >= 20190101000000
>     AND log_timestamp <= 20190101235959
>     GROUP BY log_actor
>
> Also note that the query 24267 used to work in January 2020. I searched
> through the list, but haven't found any changes related to the logging
> table since January.
>
> Cheers,
> Nux.
>
> _______________________________________________
> Wikimedia Cloud Services mailing listcl...@lists.wikimedia.org (formerly 
> lab...@lists.wikimedia.org)https://lists.wikimedia.org/mailman/listinfo/cloud
>
>
> _______________________________________________
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud

Reply via email to