Re: [tryton-dev] Slow queries when searching m2m fields in big databases

Jordi Esteve Thu, 11 Jun 2015 11:54:04 -0700

On 06/06/15 11:25, Cédric Krier wrote:

On 05 Jun 12:37, Cédric Krier wrote:

On 05 Jun 00:20, Albert Cervera i Areny wrote:

2015-06-04 19:11 GMT+02:00 Jordi Esteve <[email protected]>:

I'm analysing postgres performance in a big database (aprox. 1-2 million or
records of account.invoice.line and stock.move) and I have figured out that
slow queries are executed to get data related to m2m fields (I suppose the
same happens for o2m fields). I think the problem is how the search_domain
is build in these cases. For example, the m2m field invoice_lines in
stock.move model is asked with a domain like that:


[('stock_move', 'in', [1826865, 1826864]), ('invoice_line.id', '!=', None),
('invoice_line.company', '=', 1)]

The second condition ('invoice_line.id', '!=', None) is added by the get()
of the many2many field.
The third condition ('invoice_line.company', '=', 1) is added by a company
rule.

This domain is converted to the following SQL, and if account_invoice_line
table has 1 million records it is very slow (10-20 seconds) to execute and
consumes a lot of postgres resources because the subqueries returns
thousands (nearly million) of records:

SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
   FROM "account_invoice_line-stock_move" AS "a"
   LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = "a"."invoice_line")
   WHERE (("a"."stock_move" IN (1826865, 1826864))
     AND ("a"."invoice_line" IN (SELECT "c"."id" AS "id" FROM
"account_invoice_line" AS "c" WHERE ((("c"."id" IS NOT NULL))
     AND ("c"."id" IN (SELECT "d"."id" AS "id" FROM "account_invoice_line" AS
"d" WHERE (((("d"."company" = 1))) AND true)))))))
   ORDER BY "b"."description" ASC;


I don't know if it will be room to improve how a domain is converted to an
SQL expression. For example, based in the previous example, one like this
that applies the related conditions directly to the JOIN table instead of
the original table with subqueries.

SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
   FROM "account_invoice_line-stock_move" AS "a"
   LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = "a"."invoice_line")
   WHERE (("a"."stock_move" IN (1826865, 1826864))
     AND ("b"."id" IS NOT NULL)
     AND (("b"."company" = 1) AND true))
   ORDER BY "b"."description" ASC;

This query is not equivalent to the generated one, you'd have to make
something like

LEFT JOIN (select .....)

and not use the "b".* in the main where clause, otherwise the LEFT
nature of the join is lost.

Agree on point of view of purely SQL but for Tryton they will have the
same result because "id" is required and "invoice_line" is a foreign
key.

Anyway, I don't know how hard it would be to make tryton generate a
better query.

I think both queries have their advantage indeed it depends on the size
of each tables.
In our case, a join is faster because the database will make a join
between a small set of tuple from "account_invoice_line-stock_move" with
a large table "account_invoice_line". Instead of retrieving almost all
the tuples from "account_invoice_line" to make the IN test.
But if the table size were different then a IN clause could have better
performance.

So I think the improvement should add a parameter on the Many2One
definition to define which kind of "nested" search strategy it should
use.

I wrote a quick POC of this patch at https://bugs.tryton.org/issue4798
It will be great to get some timing report between both options on large
database.

I have tested the duplication and reception of supplier shipments in adatabase with 1,200,000 invoice lines in my laptop (old machine with apostgres tuned to use 400MB).

Before applying the patch this query was generated which takes 25864.758ms - 28027.834 ms (between 26 and 28 seconds):


SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
  FROM "account_invoice_line-stock_move" AS "a"
  LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = "a"."invoice_line")
  WHERE (("a"."stock_move" IN (1826865, 1826864))
    AND ("a"."invoice_line" IN (SELECT "c"."id" AS "id" FROM
"account_invoice_line" AS "c" WHERE ((("c"."id" IS NOT NULL))
    AND ("c"."id" IN (SELECT "d"."id" AS "id" FROM "account_invoice_line" AS
"d" WHERE (((("d"."company" = 1))) AND true)))))))
  ORDER BY "b"."description" ASC;

After applying the patch and setting target_search='join' in the m2ofield invoice_line in account.invoice.line-stock.move table, this querywas generated without the subquery which takes only 6.708 ms:


SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
"a"."invoice_line" AS "invoice_line", ...
  FROM "account_invoice_line-stock_move" AS "a"
  LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = "a"."invoice_line")
  WHERE (("a"."stock_move" IN (1826865, 1826864))
    AND ((("b"."id" IS NOT NULL))
    AND ("b"."id" IN (SELECT "c"."id" AS "id" FROM "account_invoice_line" AS
"c" WHERE (((("c"."company" = 1))) AND true)))))))
  ORDER BY "b"."description" ASC;

As we can see it is nearly 4000 times faster, so the improvement isimpressive. I suppose testing in a server with more and better resourcesthe difference will not be so high, but IMHO really worth it.

By the way, the default option in the new parameter 'target_search' form2o fields is 'subquery'. Maybe is it better to set 'join' by default?If not, then the m2o relations suspected to have a lot of records likemove lines, invoice lines, sale lines, stock moves, ... should have the'join' parameter set in the tryton modules?


--
Jordi Esteve
Consultor Zikzakmedia SL
[email protected]
Mòbil 679 170 693

Zikzakmedia SL
St. Jaume, 9, baixos, 2a
08720 Vilafranca del Penedès
Tel 93 890 2108

Re: [tryton-dev] Slow queries when searching m2m fields in big databases

Reply via email to