All great answers; thank you! Let's wait and see what comes out of T224850.
On Tue, Jun 4, 2019 at 8:21 PM Bryan Davis <bd...@wikimedia.org> wrote: > On Tue, Jun 4, 2019 at 4:54 PM Huji Lee <huji.h...@gmail.com> wrote: > > > > Hi all, > > > > I have a query that used to do a GROUP BY on rev_user and now I have > migrated it to do so using rev_actor instead. The latest version looks like > this [0] and its performance is not good [1] at all. In fact, I left it > running for fawiki for several hours and it did not generate any results > (it used to generate results in about 30-40 minutes). Also, the execution > plan that is shown in the second link involves some tables like > filearchive, image and oldimage that are not part of the query. > > > > Is this a side effect of the revision_actor_temp table? > > The EXPLAIN output including filearchive, image, oldimage, and other > apparently unrelated fields is caused by the view for the actor table. > This is basically the same problem as I described in > <https://lists.wikimedia.org/pipermail/cloud/2019-June/000683.html> > for the view of the comment table. The flag for suppression lives in > the related entity table, so the actor view runs correlated > sub-queries for each entity table that could indicate that a specific > row from the actor table has been suppressed. > > I created <https://phabricator.wikimedia.org/T224850> to look for a > solution for this same class of issue in the comment table. If we find > a fix there it seems likely that we could do the same thing for the > actor table, whatever that may end up being. > > > Also, what is that "key0" about, which is not associated with any tables? > > This is a key generated by the database on a temporary table that is > being used to answer your query. It looks like there is a little > display bug in the sql-optimizer tool. The table name that should be > shown is "<derived2>" which you can see in the HTML source of the > page. See < > https://mariadb.com/kb/en/library/derived-table-with-key-optimization/> > for some details on this type of internal optimization. > > > Finally, any advice on how to make it more efficient is highly > appreciated. > > The "Using filesort" notes in the plan are caused by the GROUP BYs. If > you can figure out how to get rid of the GROUP BY clauses things > should be faster, but to do that I think you might need to collect > information using multiple queries and some scripting language to > collect and correlate the data outside of the database. > > I would generally expect any query that tries to compute statistics > for all editors who have ever been active on a wiki will be slow. You > might try finding a good way to break the query into smaller batches. > > > [0]: https://pastebin.com/jWTzsiJY > > [1]: > https://tools.wmflabs.org/sql-optimizer?use=fawiki_p&sql=SELECT%0D%0A++actor_name%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+0+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+article%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+10+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+tpl%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+12+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+helppage%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+14+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+cat%2C%0D%0A++SUM%28%0D%0A++++CASE%0D%0A++++++WHEN+page_namespace+%3D+100+THEN+1%0D%0A++++++ELSE+0%0D%0A++++END%0D%0A++%29+portal%2C%0D%0A++COUNT%28rev_first%29+tot%0D%0AFROM+revision+r%0D%0AJOIN+%28%0D%0A++select%0D%0A++++MIN%28rev_id%29+rev_first%2C%0D%0A++++rev_page%0D%0A++FROM+revision%0D%0A++GROUP+BY+rev_page%0D%0A%29+f%0D%0A++ON+r.rev_id+%3D+f.rev_first%0D%0AJOIN+page%0D%0A++ON+page_id+%3D+r.rev_page%0D%0AJOIN+actor%0D%0A++ON+rev_actor+%3D+actor_id%0D%0ALEFT+JOIN+user_groups%0D%0A++ON+actor_user+%3D+ug_user%0D%0A++AND+ug_group+%3D+%22bot%22%0D%0AWHERE%0D%0A++actor_user+%3C%3E+0%0D%0A++AND+ug_group+IS+NULL%0D%0A++AND+page_namespace+IN+%28%0D%0A++++0%2C%0D%0A++++10%2C%0D%0A++++12%2C%0D%0A++++14%2C%0D%0A++++100%0D%0A++%29%0D%0AGROUP+BY+rev_actor%0D%0AORDER+BY+tot+desc%0D%0ALIMIT+300 > > > Bryan > -- > Bryan Davis Wikimedia Foundation <bd...@wikimedia.org> > [[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA > irc: bd808 v:415.839.6885 x6855 > > _______________________________________________ > Wikimedia Cloud Services mailing list > Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) > https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud