Hi Ludo, On Sun, 11 Nov 2018 18:06:00 +0100 l...@gnu.org (Ludovic Courtès) wrote:
> Indeed, that solves the problem for this simple example, thanks! > > Now, if I go back to the big query that /api/latestbuilds makes¹, the > result is still pretty bad: > > --8<---------------cut here---------------start------------->8--- > sqlite> EXPLAIN QUERY PLAN SELECT * FROM ( > ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp, > Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status, > Builds.job_name, Builds.system, ...> Builds.nix_name, > Specifications.name ...> FROM Builds > ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id > ...> INNER JOIN Specifications ON Evaluations.specification = > Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid)) > ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation)) > ...> AND (:jobset IS NULL OR (:jobset = Specifications.name)) > ...> AND (:job IS NULL OR (:job = Builds.job_name)) > ...> AND (:system IS NULL OR (:system = Builds.system)) > ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation)) > ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status > >= 0) ...> OR (:status = 'pending' AND > >Builds.status < 0) > ...> OR (:status = 'succeeded' AND > Builds.status = 0) ...> OR (:status = 'failed' > AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL > OR :borderlowid IS NULL ...> OR ((:borderlowtime, :borderlowid) < > (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL > OR :borderhighid IS NULL ...> OR ((:borderhightime, :borderhighid) > > (Builds.stoptime, Builds.rowid))) ...> ORDER BY > ...> CASE WHEN :borderlowtime IS NULL > ...> OR :borderlowid IS NULL THEN Builds.stoptime > ...> ELSE -Builds.stoptime > ...> END DESC, > ...> CASE WHEN :borderlowtime IS NULL > ...> OR :borderlowid IS NULL THEN Builds.rowid > ...> ELSE -Builds.rowid > ...> END DESC > ...> LIMIT :nr) > ...> ORDER BY stoptime, rowid ASC; > 1|0|0|SCAN TABLE Builds > 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?) > 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX > sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR > ORDER BY 0|0|0|SCAN SUBQUERY 1 > 0|0|0|USE TEMP B-TREE FOR ORDER BY > --8<---------------cut here---------------end--------------->8--- > > I don’t really know what additional index to create (and I’d rather > let SQLite do it for me, if it were possible). I don't know if there is any automated process to assist you. I have the feeling that query optimization is more an art than science. Hm. This code smells ... It looks too complicated. I don't know if this brings you further concerning performance, here are some thoughts: One problematic part is this construct: :variable IS NULL OR :variable=my_column) Here is a very simple example: sqlite> CREATE TABLE tst ( ...> id INTEGER PRIMARY KEY AUTOINCREMENT, ...> name TEXT NOT NULL, ...> age INTEGER NOT NULL); sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age); sqlite> EXPLAIN QUERY PLAN ...> SELECT * FROM tst WHERE (23=23 OR id=:id); 0|0|0|SCAN TABLE tst sqlite> EXPLAIN QUERY PLAN ...> SELECT * FROM tst WHERE id=:id; 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) sqlite> EXPLAIN QUERY PLAN ...> SELECT * FROM tst WHERE name=:name AND age < 42; 0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND age<?) So, even when we have a constant part(23=23) in the OR clause, this leads to a full table scan. I think the optimizer cannot detect the fact that it is a constant boolean value. In the other examples, it is using the index. Even this OR-clause with two variables looks better: SELECT * FROM tst WHERE (id=:id1 OR id=:id2); 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) 0|0|0|EXECUTE LIST SUBQUERY 1 I double-checked with Postgresql and it is also performing a full table scan in the "boolean-constant OR :id=id" case. I could not find any references on the net about it. When this would be Java/JPA I would suggest to dynamically create the query. Can we do something in Scheme-DB too? I.e. pseudo-code (string-append sql-prefix (unless (empty? derivation) "AND :derivation=Builds.derivation") (unless (empty? jobset) "AND :jobset=Builds.jobset) ...) ;;; Should be more some kind of folding, because of the "AND" ;;; Parameter-filling needs to be considered too Two more things I noticed that are not directly performance oriented: We are directly relying on the rowid here, there is no explicit id-column. This could lead to unpredicted results and reorderings (6th Quirk in document): https://www.sqlite.org/rowidtable.html We should add a column: id INTEGER PRIMARY KEY AUTOINCREMENT Problem is that this concept of AUTOINCREMENT does only work for Primary Keys in Sqlite. So we need to degrade "derivation" to a secondary key, i.e. make it non-null and unique: derivation TEXT NOT NULL UNIQUE, Is there anything speaking against that? Lastly, the query has a limit and an order-by. The question is: Will the result be first ordered and then the limit taken? The answer (I know only for Postgresql and MySql, but I think it is the same for Sqlite, I haven't found any reference): The order is always executed first, but it has to be stable. In this case it is, because we order by Builds.rowid, which is a key. Did this happen intentionally or just by chance? Should we better add a note about that to the SQL code? Björn
pgpH9jTr_qiPk.pgp
Description: OpenPGP digital signature