Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-08-01 Thread Jeff Janes
On Tue, Aug 1, 2017 at 9:24 AM, Dmitry Lazurkin wrote: > On 08/01/2017 07:13 PM, Jeff Janes wrote: > > I think that HashSet is a Java-specific term. It is just a hash table in > which there is no data to store, just the key itself (and probably a cash > of the hashcode of that key), correct? > >

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-08-01 Thread Dmitry Lazurkin
On 08/01/2017 07:13 PM, Jeff Janes wrote: > I think that HashSet is a Java-specific term. It is just a hash table > in which there is no data to store, just the key itself (and probably > a cash of the hashcode of that key), correct? Yes. And in Java HashSet implemented on top of HashMap (: > I

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-08-01 Thread Jeff Janes
On Mon, Jul 31, 2017 at 12:29 PM, Dmitry Lazurkin wrote: > On 31.07.2017 19:42, Jeff Janes wrote: > > I think it is simply because no one has gotten around to implementing it > that way. When you can just write it as a values list instead, the > incentive to make the regular in-list work better

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Dmitry Lazurkin
On 31.07.2017 19:42, Jeff Janes wrote: > I think it is simply because no one has gotten around to implementing > it that way. When you can just write it as a values list instead, the > incentive to make the regular in-list work better is not all that strong. > > Cheers, > > Jeff I see from explai

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Mon, Jul 24, 2017 at 8:03 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > On Mon, Jul 24, 2017 at 7:58 PM, David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane wrote: >> >>> >>> The cost to form the inner hash is basically negligi

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane wr > > > regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id > IN > :values_clause; > QUERY PLAN > >

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Tue, Jul 25, 2017 at 2:03 AM, Dmitry Lazurkin wrote: > On 25.07.2017 05:50, Jeff Janes wrote: > >> It isn't either-or. It is the processing of millions of rows over the >> large in-list which is taking the time. Processing an in-list as a hash >> table would be great, but no one has gotten ar

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-31 Thread Jeff Janes
On Mon, Jul 24, 2017 at 3:12 PM, Dmitry Lazurkin wrote: > On 25.07.2017 00:31, David G. Johnston wrote: > > > Basically you want to write something like: > > SELECT * > FROM ids > JOIN ( :values_clause ) vc (vid) ON (vc.vid = ids.id)​ > > or > > WITH vc AS (SELECT vid FROM ORDER BY ... LIMIT

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-26 Thread Dmitry Lazurkin
On 23.07.2017 14:35, dilaz03 . wrote: > - IN-VALUES clause adds new node to plan. Has additional node big > overhead? How about filter by two or more IN-VALUES clause? > Hmmm. This works. -- Full table can fit in memory show shared_buffers; shared_buffers 4GB show work_mem; w

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-25 Thread Dmitry Lazurkin
On 25.07.2017 05:50, Jeff Janes wrote: It isn't either-or. It is the processing of millions of rows over the large in-list which is taking the time. Processing an in-list as a hash table would be great, but no one has gotten around to it implementing it yet. Maybe Dmitry will be the one to do

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread David G. Johnston
On Mon, Jul 24, 2017 at 8:11 PM, Tom Lane wrote: > ​[*docs] > If the data were perfectly distributed, with the same > * number of tuples going into each available bucket, then the bucketsize > * fraction would be 1/nbuckets. But this happy state of affairs will > occur > * only if (a) there

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Tom Lane
"David G. Johnston" writes: > On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane wrote: >> The cost to form the inner hash is basically negligible whether it's >> de-duped or not, but if it's not (known) de-duped then the cost >> estimate for the semijoin is going to rise some, and that discourages >> sel

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread David G. Johnston
On Mon, Jul 24, 2017 at 7:58 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane wrote: > >> >> The cost to form the inner hash is basically negligible whether it's >> de-duped or not, but if it's not (known) de-duped then the cost >> estimate fo

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread David G. Johnston
On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane wrote: > > The cost to form the inner hash is basically negligible whether it's > de-duped or not, but if it's not (known) de-duped then the cost > estimate for the semijoin is going to rise some, and that discourages > selecting it. > ​Why does the "has

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Jeff Janes
On Jul 24, 2017 14:19, "PT" wrote: On Mon, 24 Jul 2017 13:17:56 +0300 Dmitry Lazurkin wrote: > On 07/24/2017 01:40 AM, PT wrote: > > In this example you count approximately 40,000,000 values, which is > > about 40% of the table. > > 4 000 000 (: > > > If you really need these queries to be fast

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Tom Lane
"David G. Johnston" writes: > On Mon, Jul 24, 2017 at 3:22 PM, Dmitry Lazurkin wrote: >> ALTER TABLE ids ALTER COLUMN id SET NOT NULL; >> EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN >> :values_clause; >> >> Aggregate (cost=245006.46..245006.47 rows=1 width=8) (actual >> time

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Dmitry Lazurkin
On 25.07.2017 01:25, David G. Johnston wrote: On Mon, Jul 24, 2017 at 3:22 PM, Dmitry Lazurkin >wrote: ALTER TABLE ids ALTER COLUMN id SET NOT NULL; EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN :values_clause; Aggregate (cost=245006

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Dmitry Lazurkin
On 25.07.2017 01:15, David G. Johnston wrote: On Mon, Jul 24, 2017 at 3:12 PM, Dmitry Lazurkin >wrote: And I have one question. I don't understand why IN-VALUES doesn't use Semi-Join? PostgreSQL has Hash Semi-Join... For which task the database has node of

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread David G. Johnston
On Mon, Jul 24, 2017 at 3:22 PM, Dmitry Lazurkin wrote: > ALTER TABLE ids ALTER COLUMN id SET NOT NULL; > EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id IN > :values_clause; > > Aggregate (cost=245006.46..245006.47 rows=1 width=8) (actual > time=3824.095..3824.095 rows=1 loops=1)

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread David G. Johnston
On Mon, Jul 24, 2017 at 3:12 PM, Dmitry Lazurkin wrote: > And I have one question. I don't understand why IN-VALUES doesn't use > Semi-Join? PostgreSQL has Hash Semi-Join... For which task the database > has node of this type? > ​Semi-Join is canonically written as: SELECT * FROM tbl WHERE EXI

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Dmitry Lazurkin
On 25.07.2017 00:31, David G. Johnston wrote: Basically you want to write something like: SELECT * FROM ids JOIN ( :values_clause ) vc (vid) ON (vc.vid = ids.id )​ or WITH vc AS (SELECT vid FROM ORDER BY ... LIMIT ) SELECT * FROM ids JOIN vc ON (vid = ids.id

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Dmitry Lazurkin
On 25.07.2017 00:17, PT wrote: The IN clause is not what's taking all the time. It's the processing of millions of rows that's taking all the time. IN (...) - 17 sec IN (VALUES ...) - 4 sec So performance issue is with IN-clause. Perhaps you should better describe what it is you really want t

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread David G. Johnston
On Sun, Jul 23, 2017 at 4:35 AM, dilaz03 . wrote: > - IN-VALUES clause adds new node to plan. Has additional node big > overhead? How about filter by two or more IN-VALUES clause? > ​IN-VALUES is just another word for "TABLE" which is another word for "RELATION". Writing relational database que

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread PT
On Mon, 24 Jul 2017 13:17:56 +0300 Dmitry Lazurkin wrote: > On 07/24/2017 01:40 AM, PT wrote: > > In this example you count approximately 40,000,000 values, which is > > about 40% of the table. > > 4 000 000 (: > > > If you really need these queries to be faster, I would suggest > > materializin

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Dmitry Lazurkin
On 07/24/2017 01:40 AM, PT wrote: > In this example you count approximately 40,000,000 values, which is > about 40% of the table. 4 000 000 (: > If you really need these queries to be faster, I would suggest > materializing the data, i.e. create a table like: > > CREATE TABLE id_counts ( > id BI

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread Dmitry Lazurkin
On 07/24/2017 01:40 AM, PT wrote: > In this example you count approximately 40,000,000 values, which is > about 40% of the table. 4 000 000 (: > If you really need these queries to be faster, I would suggest > materializing the data, i.e. create a table like: > > CREATE TABLE id_counts ( > id BI

Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-23 Thread PT
On Sun, 23 Jul 2017 14:35:24 +0300 "dilaz03 ." wrote: > Hello. > > I have database with events with type from different souces identified by > id. I have query which filters events by IN-clause with many ids (1-500 > ids). I see poor perfomance of IN-clause and try to investigate this > problem.

[GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-23 Thread dilaz03 .
Hello. I have database with events with type from different souces identified by id. I have query which filters events by IN-clause with many ids (1-500 ids). I see poor perfomance of IN-clause and try to investigate this problem. SELECT version();