All great answers; thank you! Let's wait and see what comes out of T224850.

On Tue, Jun 4, 2019 at 8:21 PM Bryan Davis <bd...@wikimedia.org> wrote:

> On Tue, Jun 4, 2019 at 4:54 PM Huji Lee <huji.h...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a query that used to do a GROUP BY on rev_user and now I have
> migrated it to do so using rev_actor instead. The latest version looks like
> this [0] and its performance is not good [1] at all. In fact, I left it
> running for fawiki for several hours and it did not generate any results
> (it used to generate results in about 30-40 minutes). Also, the execution
> plan that is shown in the second link involves some tables like
> filearchive, image and oldimage that are not part of the query.
> >
> > Is this a side effect of the revision_actor_temp table?
>
> The EXPLAIN output including filearchive, image, oldimage, and other
> apparently unrelated fields is caused by the view for the actor table.
> This is basically the same problem as I described in
> <https://lists.wikimedia.org/pipermail/cloud/2019-June/000683.html>
> for the view of the comment table. The flag for suppression lives in
> the related entity table, so the actor view runs correlated
> sub-queries for each entity table that could indicate that a specific
> row from the actor table has been suppressed.
>
> I created <https://phabricator.wikimedia.org/T224850> to look for a
> solution for this same class of issue in the comment table. If we find
> a fix there it seems likely that we could do the same thing for the
> actor table, whatever that may end up being.
>
> > Also, what is that "key0" about, which is not associated with any tables?
>
> This is a key generated by the database on a temporary table that is
> being used to answer your query. It looks like there is a little
> display bug in the sql-optimizer tool. The table name that should be
> shown is "<derived2>" which you can see in the HTML source of the
> page. See <
> https://mariadb.com/kb/en/library/derived-table-with-key-optimization/>
> for some details on this type of internal optimization.
>
> > Finally, any advice on how to make it more efficient is highly
> appreciated.
>
> The "Using filesort" notes in the plan are caused by the GROUP BYs. If
> you can figure out how to get rid of the GROUP BY clauses things
> should be faster, but to do that I think you might need to collect
> information using multiple queries and some scripting language to
> collect and correlate the data outside of the database.
>
> I would generally expect any query that tries to compute statistics
> for all editors who have ever been active on a wiki will be slow. You
> might try finding a good way to break the query into smaller batches.
>
> > [0]: https://pastebin.com/jWTzsiJY
> > [1]:
> https://tools.wmflabs.org/sql-optimizer?use=fawiki_p&sql=SELECT%0D%0A++actor_name%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+0+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+article%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+10+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+tpl%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+12+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+helppage%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+14+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+cat%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+100+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+portal%2C%0D%0A++COUNT%28rev_first%29+tot%0D%0AFROM+revision+r%0D%0AJOIN+%28%0D%0A++select%0D%0A++++MIN%28rev_id%29+rev_first%2C%0D%0A++++rev_page%0D%0A++FROM+revision%0D%0A++GROUP+BY+rev_page%0D%0A%29+f%0D%0A++ON+r.rev_id+%3D+f.rev_first%0D%0AJOIN+page%0D%0A++ON+page_id+%3D+r.rev_page%0D%0AJOIN+actor%0D%0A++ON+rev_actor+%3D+actor_id%0D%0ALEFT+JOIN+user_groups%0D%0A++ON+actor_user+%3D+ug_user%0D%0A++AND+ug_group+%3D+%22bot%22%0D%0AWHERE%0D%0A++actor_user+%3C%3E+0%0D%0A++AND+ug_group+IS+NULL%0D%0A++AND+page_namespace+IN+%28%0D%0A++++0%2C%0D%0A++++10%2C%0D%0A++++12%2C%0D%0A++++14%2C%0D%0A++++100%0D%0A++%29%0D%0AGROUP+BY+rev_actor%0D%0AORDER+BY+tot+desc%0D%0ALIMIT+300
>
>
> Bryan
> --
> Bryan Davis              Wikimedia Foundation    <bd...@wikimedia.org>
> [[m:User:BDavis_(WMF)]] Manager, Technical Engagement    Boise, ID USA
> irc: bd808                                        v:415.839.6885 x6855
>
> _______________________________________________
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud

Reply via email to