On 05 Jun 12:37, Cédric Krier wrote:
> On 05 Jun 00:20, Albert Cervera i Areny wrote:
> > 2015-06-04 19:11 GMT+02:00 Jordi Esteve <[email protected]>:
> > > I'm analysing postgres performance in a big database (aprox. 1-2 million 
> > > or
> > > records of account.invoice.line and stock.move) and I have figured out 
> > > that
> > > slow queries are executed to get data related to m2m fields (I suppose the
> > > same happens for o2m fields). I think the problem is how the search_domain
> > > is build in these cases. For example, the m2m field invoice_lines in
> > > stock.move model is asked with a domain like that:
> > >
> > > [('stock_move', 'in', [1826865, 1826864]), ('invoice_line.id', '!=', 
> > > None),
> > > ('invoice_line.company', '=', 1)]
> > >
> > > The second condition ('invoice_line.id', '!=', None) is added by the get()
> > > of the many2many field.
> > > The third condition ('invoice_line.company', '=', 1) is added by a company
> > > rule.
> > >
> > > This domain is converted to the following SQL, and if account_invoice_line
> > > table has 1 million records it is very slow (10-20 seconds) to execute and
> > > consumes a lot of postgres resources because the subqueries returns
> > > thousands (nearly million) of records:
> > >
> > > SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
> > > "a"."invoice_line" AS "invoice_line", ...
> > >   FROM "account_invoice_line-stock_move" AS "a"
> > >   LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = 
> > > "a"."invoice_line")
> > >   WHERE (("a"."stock_move" IN (1826865, 1826864))
> > >     AND ("a"."invoice_line" IN (SELECT "c"."id" AS "id" FROM
> > > "account_invoice_line" AS "c" WHERE ((("c"."id" IS NOT NULL))
> > >     AND ("c"."id" IN (SELECT "d"."id" AS "id" FROM "account_invoice_line" 
> > > AS
> > > "d" WHERE (((("d"."company" = 1))) AND true)))))))
> > >   ORDER BY "b"."description" ASC;
> > >
> > >
> > > I don't know if it will be room to improve how a domain is converted to an
> > > SQL expression. For example, based in the previous example, one like this
> > > that applies the related conditions directly to the JOIN table instead of
> > > the original table with subqueries.
> > >
> > > SELECT "a"."id" AS "id", "a"."stock_move" AS "stock_move",
> > > "a"."invoice_line" AS "invoice_line", ...
> > >   FROM "account_invoice_line-stock_move" AS "a"
> > >   LEFT JOIN "account_invoice_line" AS "b" ON ("b"."id" = 
> > > "a"."invoice_line")
> > >   WHERE (("a"."stock_move" IN (1826865, 1826864))
> > >     AND ("b"."id" IS NOT NULL)
> > >     AND (("b"."company" = 1) AND true))
> > >   ORDER BY "b"."description" ASC;
> > 
> > This query is not equivalent to the generated one, you'd have to make
> > something like
> > 
> > LEFT JOIN (select .....)
> > 
> > and not use the "b".* in the main where clause, otherwise the LEFT
> > nature of the join is lost.
> 
> Agree on point of view of purely SQL but for Tryton they will have the
> same result because "id" is required and "invoice_line" is a foreign
> key.
> 
> > Anyway, I don't know how hard it would be to make tryton generate a
> > better query.
> 
> I think both queries have their advantage indeed it depends on the size
> of each tables.
> In our case, a join is faster because the database will make a join
> between a small set of tuple from "account_invoice_line-stock_move" with
> a large table "account_invoice_line". Instead of retrieving almost all
> the tuples from "account_invoice_line" to make the IN test.
> But if the table size were different then a IN clause could have better
> performance.
> 
> So I think the improvement should add a parameter on the Many2One
> definition to define which kind of "nested" search strategy it should
> use.

I wrote a quick POC of this patch at https://bugs.tryton.org/issue4798
It will be great to get some timing report between both options on large
database.

-- 
Cédric Krier - B2CK SPRL
Email/Jabber: [email protected]
Tel: +32 472 54 46 59
Website: http://www.b2ck.com/

Reply via email to