On Mon, Aug 31, 2020 at 10:04 Thorsten Schöning <tschoen...@am-soft.de>
wrote:

> Hi all,
>
> I have lots of queries in which I need to restrict access to rows
> using some decimal row-ID, mostly with the operator IN in
> WHERE-clauses. Additionally I'm mostly embedding the IDs as a
> ","-separated list directly into the query, e.g. because I
> already hit a limitation of ~32k parameters of the JDBC-driver[1] for
> Postgres.
>
> I really thought that in most cases simply sending a large number of
> IDs embedded in the query is better than looping, because it saves
> round trips to the DB, the planner has all the information it needs
> to choose the best strategy, etc. OTOH, with recent tests and a
> number of IDs increased by about a factor of 100, I now see
> additional load in Tomcat before the query is even sent to the DB,
> and of course in the DB itself as well. I've attached an example
> query and plan.
>
> > ->  Hash  (cost=242592.66..242592.66 rows=6825 width=39) (actual
> time=91.117..91.117 rows=40044 loops=3)
> >       Buckets: 65536 (originally 8192)  Batches: 1 (originally 1)
> Memory Usage: 3016kB
> >       ->  Hash Join  (cost=137.57..242592.66 rows=6825 width=39) (actual
> time=10.194..82.412 rows=40044 loops=3)
> >             Hash Cond: (meter.meter_bcd = meter_bcd.id)
> >             ->  Index Scan using pk_meter on meter
> (cost=0.42..242237.10 rows=40044 width=25) (actual time=9.350..71.276
> rows=40044 loops=3)
> >                   Index Cond: (id = ANY ('{[...]}'::integer[]))
> >             ->  Hash  (cost=135.73..135.73 rows=113 width=22) (actual
> time=0.830..0.830 rows=113 loops=3)
>
> Do you know of any obvious limitations in the JDBC-driver's handling
> of such large queries? In the end, the query is mostly large text
> with only very few bind parameters.
>
> Do you know of any obvious problem in Postgres itself with such a
> query, e.g. when parsing it? Do things simply take as long as they
> take, roughly comparable to looping, or is there additional overhead
> the larger the query gets? From my naive expectation, comparing IDs
> shouldn't care whether things get looped or transmitted at once.
>
> I'm just trying to collect some input for where to look at to optimize
> things in the future. Thanks!
>

It is not good - it increases the memory needed for query parsing, and
both the optimizer and the executor get slower.
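As a client-side alternative, a minimal sketch (the `chunks` helper and the batch size are my own illustration, not from the thread): splitting the IDs into bounded batches keeps every individual statement small, at the cost of extra round trips.

```java
import java.util.ArrayList;
import java.util.List;

public class IdChunks {
    // Split a long ID list into batches of at most `size` elements,
    // so each generated statement stays small and well below any
    // driver-side parameter limit.
    static List<List<Integer>> chunks(List<Integer> ids, int size) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) ids.add(i);
        // Each batch would then be bound to its own small query.
        System.out.println(chunks(ids, 4)); // [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
    }
}
```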

Postgres currently has no optimization for searching in such long
literal lists - so this search is very slow compared to other methods.
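One common workaround (my sketch, not something Pavel spelled out; the table and column names follow the plan in the original mail, the connection setup is assumed) is to bind the whole ID list as a single array parameter and join against `unnest()`, so the query text stays tiny regardless of how many IDs there are:

```java
public class ArrayParamQuery {
    // Build the rewritten query: one array bind parameter instead of
    // an N-element IN list embedded as text.
    static String buildSql() {
        return "SELECT m.* FROM meter m "
             + "JOIN unnest(?::int[]) AS ids(id) ON m.id = ids.id";
    }

    public static void main(String[] args) {
        // With a live JDBC connection `conn` (assumed, not shown):
        //   java.sql.Array arr = conn.createArrayOf("integer", idArray);
        //   PreparedStatement ps = conn.prepareStatement(buildSql());
        //   ps.setArray(1, arr);
        System.out.println(buildSql());
    }
}
```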

I think this is a signal that something in the database design or
architecture is wrong. Sure, there can be exceptions, but Postgres has
no optimization for this pattern.

Regards

Pavel


> [1]: https://github.com/pgjdbc/pgjdbc/issues/90
>
> Kind regards,
>
> Thorsten Schöning
>
> --
> Thorsten Schöning       E-Mail: thorsten.schoen...@am-soft.de
> AM-SoFT IT-Systeme      http://www.AM-SoFT.de/
>
> Telefon...........05151-  9468- 55
> Fax...............05151-  9468- 88
> Mobil..............0178-8 9468- 04
>
> AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
> AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow
