Re: [Cloud] Help with a query

2017-12-31 Thread Huji Lee
Sorry Brad, I had missed your email. Putting log_timestamp back in and restricting the query to a shorter time period made it work under 20 minutes: https://quarry.wmflabs.org/query/23829 Thank you all for your assistance! On Sun, Dec 31, 2017 at 1:44 PM, Brad Jorsch (Anomie) wrote: > On Sat,

Re: [Cloud] Help with a query

2017-12-31 Thread Bryan Davis
On Sat, Dec 30, 2017 at 8:55 PM, Huji Lee wrote: > What is the proper strategy for temporary (physical or in-memory) tables on > Toolforge? Allowing temp tables on the new Wiki Replica servers was investigated in . The resulting decision was that `CREATE

Re: [Cloud] Help with a query

2017-12-31 Thread Brad Jorsch (Anomie)
On Sat, Dec 30, 2017 at 10:55 PM, Huji Lee wrote: > and log_id > 9406768; > As I said earlier, it was bad advice to use log_id rather than log_timestamp for this query. -- Brad Jorsch (Anomie) Senior Software Engineer Wikimedia Foundation ___ Wikime

Re: [Cloud] Help with a query

2017-12-30 Thread Huji Lee
The query took more than an hour, so I killed it. I think a better strategy is to first run the query on logging, and get a list of logs, then run the subquery for retrieving the edit counts. My new query is shown below; however, I cannot create temporary tables with my user on Toolforge. I tried

Re: [Cloud] Help with a query

2017-12-30 Thread Brad Jorsch (Anomie)
On Sat, Dec 30, 2017 at 2:10 PM, Maximilian Doerr < maximilian.do...@gmail.com> wrote: > Isn’t there a logging_logindex table to use that should optimize this? > It looks like logging_logindex is useful for queries that are targeting indexes page_time, log_page_id_time, and type_action. It looks

Re: [Cloud] Help with a query

2017-12-30 Thread Huji Lee
There is a logging_logindex table. I incorporated all suggested changes into https://quarry.wmflabs.org/query/23829 Since I still think they query will take more than 20 minutes, I think quarry is not the way to go. I saved the query on the grid and tried to run it using this command: jsub -once

Re: [Cloud] Help with a query

2017-12-30 Thread Maximilian Doerr
Isn’t there a logging_logindex table to use that should optimize this? Cyberpower678 English Wikipedia Account Creation Team English Wikipedia Administrator Global User Renamer > On Dec 30, 2017, at 14:07, Brad Jorsch (Anomie) wrote: > > On Sat, Dec 30, 2017 at 1:07 PM, John

Re: [Cloud] Help with a query

2017-12-30 Thread Brad Jorsch (Anomie)
On Sat, Dec 30, 2017 at 1:07 PM, John wrote: > Use the logging_userindex table instead of logging > That won't make much difference, since the select on the logging table isn't targeting any user columns. On Sat, Dec 30, 2017 at 1:09 PM, John wrote: > I would also find the first log of 2017 a

Re: [Cloud] Help with a query

2017-12-30 Thread Huji Lee
All fantastic suggestions. The EXPLAIN results showed that changing to logging_userindex had potentially the largest impact (reducing the rows to be retrieved from 10mil to 5mil). I am rerunning the query with all suggested recommendations, and will report back. On Sat, Dec 30, 2017 at 1:32 PM, De

Re: [Cloud] Help with a query

2017-12-30 Thread Brad Jorsch (Anomie)
On Sat, Dec 30, 2017 at 1:00 PM, Huji Lee wrote: > The query tries to use indexes, as much as I could think of > You missed using effective indexes. Let's start by getting the EXPLAIN output for your query. https://tools.wmflabs.org/tools-info/optimizer.py? doesn't seem to like the subquery for

Re: [Cloud] Help with a query

2017-12-30 Thread Dennis Tobar
Replace count(*) with count(1) in the subquery. It could help (?) to improve the performance. Dennis Tobar Calderón (Enviado desde un móvil) El 30 dic. 2017 3:10 p. m., "John" escribió: > I would also find the first log of 2017 and use that instead of the > timestamp > > ___

Re: [Cloud] Help with a query

2017-12-30 Thread John
I would also find the first log of 2017 and use that instead of the timestamp ___ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud

Re: [Cloud] Help with a query

2017-12-30 Thread John
Use the logging_userindex table instead of logging On Sat, Dec 30, 2017 at 12:01 PM Huji Lee wrote: > I wrote this query to find all page moves done in fawiki in 2017, and > determine how many edits the performing user had prior to that page move. > The query tries to use indexes, as much as I c