On 05 Jun 12:37, Cédric Krier wrote: > On 05 Jun 00:20, Albert Cervera i Areny wrote: > > 2015-06-04 19:11 GMT+02:00 Jordi Esteve <[email protected]>: > > > I'm analysing postgres performance in a big database (aprox. 1-2 million > > > or > > > records of account.invoice.line and stock.move) and I have figured out > > > that > > > slow queries are executed to get data related to m2m fields (I suppose the > > > same happens for o2m fields). I think the problem is how the search_domain > > > is build in these cases. For example, the m2m field invoice_lines in > > > stock.move model is asked with a domain like that: > > > > > > [('stock_move', 'in', [1826865, 1826864]), ('invoice_line.id', '!=', > > > None), > > > ('invoice_line.company', '=', 1)] > > > > > > The second condition ('invoice_line.id', '!=', None) is added by the get() > > > of the many2many field. > > > The third condition ('invoice_line.company', '=', 1) is added by a company > > > rule. > > > > > > This domain is converted to the following SQL, and if account_invoice_line > > > table has 1 million records it is very slow (10-20 seconds) to execute and > > > consumes a lot of postgres resources because the subqueries returns > > > thousands (nearly million) of records: > > > > > > SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move", > > > "a"."invoice_line" AS "invoice_line", ... > > > FROM "account_invoice_line-stock_move" AS "a" > > > LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = > > > "a"."invoice_line") > > > WHERE (("a"."stock_move" IN (1826865, 1826864)) > > > AND ("a"."invoice_line" IN (SELECT "c"."id" AS "id" FROM > > > "account_invoice_line" AS "c" WHERE ((("c"."id" IS NOT NULL)) > > > AND ("c"."id" IN (SELECT "d"."id" AS "id" FROM "account_invoice_line" > > > AS > > > "d" WHERE (((("d"."company" = 1))) AND true))))))) > > > ORDER BY "b"."description" ASC; > > > > > > > > > I don't know if it will be room to improve how a domain is converted to an > > > SQL expression. For example, based in the previous example, one like this > > > that applies the related conditions directly to the JOIN table instead of > > > the original table with subqueries. > > > > > > SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move", > > > "a"."invoice_line" AS "invoice_line", ... > > > FROM "account_invoice_line-stock_move" AS "a" > > > LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = > > > "a"."invoice_line") > > > WHERE (("a"."stock_move" IN (1826865, 1826864)) > > > AND ("b"."id" IS NOT NULL) > > > AND (("b"."company" = 1) AND true)) > > > ORDER BY "b"."description" ASC; > > > > This query is not equivalent to the generated one, you'd have to make > > something like > > > > LEFT JOIN (select .....) > > > > and not use the "b".* in the main where clause, otherwise the LEFT > > nature of the join is lost. > > Agree on point of view of purely SQL but for Tryton they will have the > same result because "id" is required and "invoice_line" is a foreign > key. > > > Anyway, I don't know how hard it would be to make tryton generate a > > better query. > > I think both queries have their advantage indeed it depends on the size > of each tables. > In our case, a join is faster because the database will make a join > between a small set of tuple from "account_invoice_line-stock_move" with > a large table "account_invoice_line". Instead of retrieving almost all > the tuples from "account_invoice_line" to make the IN test. > But if the table size were different then a IN clause could have better > performance. > > So I think the improvement should add a parameter on the Many2One > definition to define which kind of "nested" search strategy it should > use.
I wrote a quick POC of this patch at https://bugs.tryton.org/issue4798 It will be great to get some timing report between both options on large database. -- Cédric Krier - B2CK SPRL Email/Jabber: [email protected] Tel: +32 472 54 46 59 Website: http://www.b2ck.com/
