Re: Weird behaviour of ROLLUP/GROUPING
On Thu, Jan 17, 2019 at 08:27, Andrew Gierth wrote: > > "Guillaume" == Guillaume Lelarge writes: > > >> I will see about fixing this, somehow. > > Guillaume> Thanks a lot. > > I've committed a fix (to all supported branches, since this bug actually > precedes the addition of GROUPING SETS and can be triggered with a > simple GROUP BY if you try hard enough). The regression test says it > works now, but it'd be good if you could try it again on REL_11_STABLE > (at commit e74d8c5085 or later) to check that it fixes your case. > > I checked on REL9_6_STABLE and REL_11_STABLE, and it works great. Thanks a lot for the quick fix! -- Guillaume.
Re: Barman versus pgBackRest
Hello, One strong point of barman IMHO is transparently converting an incremental backup to a full backup for retention purposes, so retention specification is far more liberal than with pgbackrest, and configuring for incremental backup does not pose any limitations to the schedule of backups. In our environment our net connection to the remote site (repo) is extremely lousy (although within Switzerland, if that makes any difference), so with pgbackrest a full backup of our 1.3TB db would take about 2 days. Let's say it is set in cron weekly on Sunday (Sunday 1:00->Tuesday); then I would have to take incr backups from Wednesday->Saturday, and we would also have to take a full backup the next Sunday. With pgbarman we had to set: reuse_backup = link and retention_policy = RECOVERY WINDOW OF 14 DAYS, and just perform regular backups. So for barman every backup can be used as a base for the next backup, which achieves fast backups and reduced disk space. In pgbackrest one has to be explicit about the retention of both full backups and diff backups. It would be nice in pgbackrest to only have incremental backups and let the system do the necessary management (converting incremental to full) transparently and asynchronously, e.g. via cron. I have read about the --repo-hardlink option. " This gives the appearance that each backup is a full backup at the file-system level " So could we just take a first full backup and then switch permanently to incr backups? -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Postgres Automated Failover
Hi, I am planning to use Postgresql with TimescaleDb extension. I have to design a system similar to AWS RDS which supports automated failover, transparent minor version upgrades etc. In early days, I worked with PgPool II to enable heartbeat between the Postgres servers. Is there any new tool to achieve the same by which I can design a transparent failover system for Postgres? Also is there any multi-master Postgresql solution? Is Postgres-XC production ready? Any suggestions are appreciated. Thanks.
Re: Postgres Automated Failover
There is Postgres multi-master replication available; please explore BDR (bi-directional replication) multi-master nodes. On Thu, Jan 17, 2019 at 7:03 PM AI Rumman wrote: > Hi, > > I am planning to use Postgresql with TimescaleDb extension. I have to > design a system similar to AWS RDS which supports automated failover, > transparent minor version upgrades etc. > In early days, I worked with PgPool II to enable heartbeat between the > Postgres servers. Is there any new tool to achieve the same by which I can > design a transparent failover system for Postgres? > > Also is there any multi-master Postgresql solution? Is Postgres-XC > production ready? > > Any suggestions is appreciated. > > Thanks. >
Re: Postgres Automated Failover
On Thu, 17 Jan 2019 08:32:48 -0500 AI Rumman wrote: > Hi, > > I am planning to use Postgresql with TimescaleDb extension. I have to > design a system similar to AWS RDS which supports automated failover, > transparent minor version upgrades etc. > In early days, I worked with PgPool II to enable heartbeat between the > Postgres servers. Is there any new tool to achieve the same by which I can > design a transparent failover system for Postgres? There are multiple HA solutions, though none of them offer "transparent failover"; they are just automated failover. A transaction rolled back because of a failover will never be transparent from the application's point of view. Look at Patroni or PAF, depending on what you want to achieve.
Re: Barman versus pgBackRest
On Thu, 17 Jan 2019 13:09:18 +0200 Achilleas Mantzios wrote: > Hello, > > One strong point of barman IMHO is transparently converting an incremental > backup to a full backup for retention purposes, so retention specification is > far more liberal than with pgbackrest, and configuring for incremental backup > does not pose any limitations to the schedule of backups. In our environment > our net connection to the remote site (repo) is extremely lousy, (although > within Switzerland if that makes any difference), so with pgbackrest a full > backup of our 1.3TB db, would take about 2 days ,lets say set in cron weekly > on sunday, (Sunday 1:00->Tuesday), then I would have to take incr backups > from Wednesday->Saturday. And we would have to also take a full backup next > Sunday. With pgbarman we had to set : reuse_backup = link retention_policy = > RECOVERY WINDOW OF 14 DAYS and just perform regular backups. So for barman > every backup can be used as a base for the next backup, which achieves fast > backups and reduced disk space. In pgbackrest one has to be explicit about > the retention of both full backups and diff backups. it would be nice in > pgbackrest to only have incremental backups and let the system do the > necessary management (converting incremental to full) transparently and > asynchronously, e.g.via cron. I have read about the --repo-hardlink option. " > This gives the appearance that each backup is a full backup at the > file-system level " So could we just take a first full backup and then switch > permanently to incr backups? Funnily enough, I opened an issue a month ago about this feature. If you want to join the brainstorming, discuss, add some opinions and thoughts, see https://github.com/pgbackrest/pgbackrest/issues/644
Re: Postgres Automated Failover
Hi, In my opinion repmgr is worth a look. https://repmgr.org/ regards, fabio pardi On 17/01/2019 14:32, AI Rumman wrote: > Hi, > > I am planning to use Postgresql with TimescaleDb extension. I have to design > a system similar to AWS RDS which supports automated failover, transparent > minor version upgrades etc. > In early days, I worked with PgPool II to enable heartbeat between the > Postgres servers. Is there any new tool to achieve the same by which I can > design a transparent failover system for Postgres? > > Also is there any multi-master Postgresql solution? Is Postgres-XC production > ready? > > Any suggestions is appreciated. > > Thanks.
RE: Postgres Automated Failover
Hi, For PostgreSQL Automatic Failover, we are using repmgr too. Best Regards Didier ROS Expertise SGBD DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD -Original Message- From: f.pa...@portavita.eu [mailto:f.pa...@portavita.eu] Sent: Thursday, January 17, 2019 15:29 To: pgsql-general@lists.postgresql.org Subject: Re: Postgres Automated Failover Hi, In my opinion repmgr it's worth a look. https://repmgr.org/ regards, fabio pardi On 17/01/2019 14:32, AI Rumman wrote: > Hi, > > I am planning to use Postgresql with TimescaleDb extension. I have to design > a system similar to AWS RDS which supports automated failover, transparent > minor version upgrades etc. > In early days, I worked with PgPool II to enable heartbeat between the > Postgres servers. Is there any new tool to achieve the same by which I can > design a transparent failover system for Postgres? > > Also is there any multi-master Postgresql solution? Is Postgres-XC production > ready? > > Any suggestions is appreciated. > > Thanks.
Re: Barman versus pgBackRest
On 17/1/19 4:12 μ.μ., Jehan-Guillaume (ioguix) de Rorthais wrote: On Thu, 17 Jan 2019 13:09:18 +0200 Achilleas Mantzios wrote: Hello, One strong point of barman IMHO is transparently converting an incremental backup to a full backup for retention purposes, so retention specification is far more liberal than with pgbackrest, and configuring for incremental backup does not pose any limitations to the schedule of backups. In our environment our net connection to the remote site (repo) is extremely lousy, (although within Switzerland if that makes any difference), so with pgbackrest a full backup of our 1.3TB db, would take about 2 days ,lets say set in cron weekly on sunday, (Sunday 1:00->Tuesday), then I would have to take incr backups from Wednesday->Saturday. And we would have to also take a full backup next Sunday. With pgbarman we had to set : reuse_backup = link retention_policy = RECOVERY WINDOW OF 14 DAYS and just perform regular backups. So for barman every backup can be used as a base for the next backup, which achieves fast backups and reduced disk space. In pgbackrest one has to be explicit about the retention of both full backups and diff backups. it would be nice in pgbackrest to only have incremental backups and let the system do the necessary management (converting incremental to full) transparently and asynchronously, e.g.via cron. I have read about the --repo-hardlink option. " This gives the appearance that each backup is a full backup at the file-system level " So could we just take a first full backup and then switch permanently to incr backups? Funnily enough, I opened an issue a month ago about this feature. If you want to join the brainstorming, discuss, add some opinions and thoughts, see https://github.com/pgbackrest/pgbackrest/issues/644 Surely I enjoyed your convo there! -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
pgbouncer
Hello, I am a bit confused about the settings in pgbouncer What's exactly the pool_size? If I set 3, and I tried to connect from 4 shells, I am still able to connect. Same thing for max_db_connections. I set this to 1 and I am able to connect from 2 shells. This is kind of confusing and it is not really clear to me from the documentation. This is my config. [databases] cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer pool_size=120 cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio auth_user=pgbouncer pool_size=2 [pgbouncer] listen_port = 6543 listen_addr = * auth_type = md5 auth_file = /etc/pgbouncer/users.txt auth_query = select uname,phash from user_lookup($1) logfile = /var/log/pgbouncer.log pidfile = /home/postgres/pgbouncer.pid admin_users = admin user=postgres max_db_connections=1 log_connections=0 log_disconnections=0 max_client_conn=5 I just want to limit connections from the app etc. Thanks
Re: pgbouncer
On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu wrote: > > Hello, > I am a bit confused about the settings in pgbouncer > > What's exactly the pool_size? Roughly, the number of open connections pgbouncer will keep to PostgreSQL. > If I set 3, and I tried to connect from 4 shells, I am still able to connect. That would be the point - pgbouncer is sharing the 3 connections it keeps with PostgreSQL between the 4 client connections made to it. > Same thing for max_db_connections. I set this to 1 and I am able to connect > from 2 shells. Same as pool_size but basically a fail-safe since pools are per-user/per-database while this limit is per-database only. > This is kind of confusing and I'm not really cleared reading the > documentation. For each setting you need to understand whether it limits pgbouncer->PostgreSQL or client->pgbouncer Configurations in [databases] limit the former; [pgbouncer] options either provide defaults for the [databases] or limit clients. > [databases] > cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer > pool_size=120 > cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio > auth_user=pgbouncer pool_size=2 > > [pgbouncer] > listen_port = 6543 > listen_addr = * > auth_type = md5 > auth_file = /etc/pgbouncer/users.txt > auth_query = select uname,phash from user_lookup($1) > logfile = /var/log/pgbouncer.log > pidfile = /home/postgres/pgbouncer.pid > admin_users = admin > user=postgres > max_db_connections=1 So one open connection max per database/user pool but it is shared - i.e., actively executing queries running in parallel are limited to this number. > max_client_conn=5 > I just want to limit connections from the app etc. That limit is 5 David J.
Identifying comments
I know that "--" has been the single-line comment identifier since SQL was implemented by IBM on mainframes using Hollerith cards. Today, some RDBMSs also allow the C comment identifier pair, /* ... */. The postgres10 manual when searched for 'comments' presents the double hyphens and, in Section 4.1.3, on the top line of page 39, I read that "-- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment." I want only to confirm that I can use /* ... */ for multiline comments in my DDL and DML scripts. TIA, Rich
Re: Identifying comments [ANSWERED]
On Thu, 17 Jan 2019, Rich Shepard wrote: I want only to confirm that I can use /* ... */ for multiline comments in my DDL and DML scripts. Oops! Section 4.1.5 tells me that I can. Apologies to all, Rich
Re: Identifying comments [ANSWERED]
On Thu, Jan 17, 2019 at 8:39 AM Rich Shepard wrote: > > On Thu, 17 Jan 2019, Rich Shepard wrote: > > > I want only to confirm that I can use /* ... */ for multiline comments in my > > DDL and DML scripts. > > Oops! Section 4.1.5 tells me that I can. > You could also just try it... select 1 /* comment */ AS col_one executes just fine... David J.
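For completeness, here is a sketch of a multi-line block comment in a DDL script; the table name is made up purely for illustration, and the nesting behaviour is the one described in section 4.1.5:
/*
 * Multi-line block comment in a DDL script.
 * Per section 4.1.5, block comments nest, so /* an inner comment */ is fine
 * and everything up to the matching closing marker is ignored.
 */
CREATE TABLE demo_comments (
    id   integer PRIMARY KEY,  -- a trailing single-line comment
    note text
);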
Re: pgbouncer
David, Are you sure? The behaviour I experienced is different from what you described. On 17/01/2019 16:32, David G. Johnston wrote: > On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu wrote: >> >> Hello, >> I am a bit confused about the settings in pgbouncer >> >> What's exactly the pool_size? > > Roughly, the number of open connections pgbouncer will keep to PostgreSQL. > >> If I set 3, and I tried to connect from 4 shells, I am still able to connect. > > That would be the point - pgbouncer is sharing the 3 connections it > keeps with PostgreSQL between the 4 client connections made to it. pgbouncer will spawn connections until it reach the maximum allowed. Connections go 1:1 between pgbouncer and the client, and are not 'shared' pool_size is the size of the pool that pgbouncer keeps open for you. If not set, the default_pool_size is used. If you check the open connections to your host, you can verify it netstat --tcp -n | grep ... will show ESTABLISHED connections = pool_size client connections will be accepted by pgbouncer until pool_size is reached. after that, pgbouncer can spawn more connection only if reserve_pool_size allows it, and after a wait of reserve_pool_timeout . when your roof is reached (means you are opening pool_size + reserve_pool_size) then client connections will enter a queue. Entering a queue, does not mean be serviced by the database. It means wait your turn. If you connect manually, you will notice an open connection, (socket is opened) but not database shell. regards, fabio pardi > >> Same thing for max_db_connections. I set this to 1 and I am able to connect >> from 2 shells. > > Same as pool_size but basically a fail-safe since pools are > per-user/per-database while this limit is per-database only. > >> This is kind of confusing and I'm not really cleared reading the >> documentation. > > For each setting you need to understand whether it limits > pgbouncer->PostgreSQL or client->pgbouncer > > Configurations in [databases] limit the former; [pgbouncer] options > either provide defaults for the [databases] or limit clients. > >> [databases] >> cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer >> pool_size=120 >> cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio >> auth_user=pgbouncer pool_size=2 >> >> [pgbouncer] >> listen_port = 6543 >> listen_addr = * >> auth_type = md5 >> auth_file = /etc/pgbouncer/users.txt >> auth_query = select uname,phash from user_lookup($1) >> logfile = /var/log/pgbouncer.log >> pidfile = /home/postgres/pgbouncer.pid >> admin_users = admin >> user=postgres > >> max_db_connections=1 > So one open connection max per database/user pool but it is shared - > i.e., actively executing queries running in parallel are limited to > this number. > >> max_client_conn=5 >> I just want to limit connections from the app etc. > > That limit is 5 > > David J. >
Re: Refining query statement
On Tue, 15 Jan 2019, Thomas Kellerer wrote: select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact from Contacts as C join Organizations as O on C.org_id = O.org_id join Activities as A on C.contact_id = A.contact_id where A.next_contact <= 'today' and A.next_contact > '2018-12-31' order by c.contact_id, a.next_contact DESC; Combining this with Adrian's advice to use BETWEEN I have this statement that almost works: SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname, c.direct_phone, o.org_name, a.next_contact FROM Contacts AS c JOIN Organizations AS o ON c.org_id = o.org_id JOIN Activities AS a ON c.contact_id = a.contact_id WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date ORDER BY c.contact_id, a.next_contact DESC; It fails when the most recent next_contact column in Activities is NULL and an earlier row has a non-NULL date in the specified range. I tried specifying max(a.next_contact) and added GROUP BY, but then the result set returned the same o.org_name value in every row. The WHERE clause needs to exclude a contact_id where the most current row in Activities has NULL for the next_contact column. I've tried a few ideas but none work so I need to learn the proper syntax, and I don't find that in Rick van der Lans' or Joe Celko's books I have. Looking forward to learning, Rich
Re: pgbouncer
> If I set 3, and I tried to connect from 4 shells, I am still able to connect. That would be the point - pgbouncer is sharing the 3 connections it keeps with PostgreSQL between the 4 client connections made to it. Mmh, my pool_mode is per session. The 4 sessions were active, not doing any query, but connected to the shell. So that's my doubt. On Thu, Jan 17, 2019 at 16:32 David G. Johnston < david.g.johns...@gmail.com> wrote: > On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu > wrote: > > > > Hello, > > I am a bit confused about the settings in pgbouncer > > > > What's exactly the pool_size? > > Roughly, the number of open connections pgbouncer will keep to PostgreSQL. > > > If I set 3, and I tried to connect from 4 shells, I am still able to > connect. > > That would be the point - pgbouncer is sharing the 3 connections it > keeps with PostgreSQL between the 4 client connections made to it. > > > Same thing for max_db_connections. I set this to 1 and I am able to > connect from 2 shells. > > Same as pool_size but basically a fail-safe since pools are > per-user/per-database while this limit is per-database only. > > > This is kind of confusing and I'm not really cleared reading the > documentation. > > For each setting you need to understand whether it limits > pgbouncer->PostgreSQL or client->pgbouncer > > Configurations in [databases] limit the former; [pgbouncer] options > either provide defaults for the [databases] or limit clients. > > > [databases] > > cmdwela = host=127.0.0.1 port=5432 dbname=cmdwela auth_user=pgbouncer > pool_size=120 > > cicciopasticcio = host=127.0.0.1 port=5432 dbname=cicciopasticcio > auth_user=pgbouncer pool_size=2 > > > > [pgbouncer] > > listen_port = 6543 > > listen_addr = * > > auth_type = md5 > > auth_file = /etc/pgbouncer/users.txt > > auth_query = select uname,phash from user_lookup($1) > > logfile = /var/log/pgbouncer.log > > pidfile = /home/postgres/pgbouncer.pid > > admin_users = admin > > user=postgres > > > max_db_connections=1 > So one open connection max per database/user pool but it is shared - > i.e., actively executing queries running in parallel are limited to > this number. > > > max_client_conn=5 > > I just want to limit connections from the app etc. > > That limit is 5 > > David J. >
strange slow query performance
The following query runs as expected. explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS MainTable_type FROM MainTable WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND objectid=3161; QUERY PLAN - Index Scan using ix_MainTable_objectid_datetime on MainTable (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 loops=1) Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161)) InitPlan 1 (returns $3) -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.403..26.147 rows=19042 loops=1) -> Nested Loop (cost=0.81..1784.15 rows=618 width=4) (actual time=0.249..10.485 rows=2082 loops=1) -> Index Scan using pk_node on node n2ne (cost=0.41..16.80 rows=2 width=8) (actual time=0.103..0.233 rows=2 loops=1) Index Cond: (iid = ANY ('{14438,10814}'::integer[])) -> Index Only Scan using ix_node on node n1ne (cost=0.41..880.59 rows=309 width=8) (actual time=0.074..5.022 rows=1041 loops=2) Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright)) Heap Fetches: 30 -> Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne (cost=0.29..1.81 rows=57 width=8) (actual time=0.002..0.007 rows=9 loops=2082) Index Cond: (inodeid = n1ne.iid) Heap Fetches: 13973 Planning time: 5.693 ms Execution time: 33.383 ms (15 rows) But when we add a second condition to the where clause it causes the query performance to drop dramatically. Values in "any(array" do not make a difference. In this example they are the same but even for different values performance is still poor. Any ideas as to why this is happening?
explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS MainTable_type FROM MainTable WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND objectid=3161; QUERY PLAN -- - Index Scan using ix_MainTable_objectid_datetime on MainTable (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 rows=0 loops=1) Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid = 3161)) InitPlan 1 (returns $3) -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.438..28.484 rows=19042 loops=1) -> Nested Loop (cost=0.81..1784.15 rows=618 width=4) (actual time=0.172..7.141 rows=2082 loops=1) -> Index Scan using pk_node on node n2ne (cost=0.41..16.80 rows=2 width=8) (actual time=0.046..0.111 rows=2 loops=1) Index Cond: (iid = ANY ('{14438,10814}'::integer[])) -> Index Only Scan using ix_node on node n1ne (cost=0.41..880.59 rows=309 width=8) (actual time=0.073..3.438 rows=1041 loops=2) Index Cond: ((ileft >= n2ne.ileft) AND (ileft <= n2ne.iright)) Heap Fetches: 30 -> Index Only Scan using ix_nodeobject_inodeid_objectid on nodeobject ne (cost=0.29..1.81 rows=57 width=8) (actual time=0.003..0.010 rows=9 loops=2082) Index Cond: (inodeid = n1ne.iid) Heap Fetches: 13973 InitPlan 2 (returns $7) -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual time=0.056..11.786 rows=19042 loops=1) -> Nested Loop (cost=0.81..1784.15 rows=618 width=4) (actual time=0.034..1.343 rows=2082 loops=1) -> Index Scan using pk_node on node n2ne_1 (cost=0.41..16.80 rows=2 width=8) (actual time=0.007..0.016 rows=2 loops=1) Index Cond: (iid = ANY ('{14438,10814}'::integer[])) -> Index Only Scan using ix_node on node n1ne_1 (cost=0.41..880.59 rows=309 width=8) (actual time=0.012..0.581
Re: pgbouncer
On Thu, Jan 17, 2019 at 9:06 AM Fabio Pardi wrote: > Are you sure? The behaviour I experienced is different from what you > described. 85%... > > On 17/01/2019 16:32, David G. Johnston wrote: > > On Thu, Jan 17, 2019 at 8:18 AM Nicola Contu wrote: > >> > >> Hello, > >> I am a bit confused about the settings in pgbouncer > >> > >> What's exactly the pool_size? > > > > Roughly, the number of open connections pgbouncer will keep to PostgreSQL. > > > >> If I set 3, and I tried to connect from 4 shells, I am still able to > >> connect. > > > > That would be the point - pgbouncer is sharing the 3 connections it > > keeps with PostgreSQL between the 4 client connections made to it. > > > pgbouncer will spawn connections until it reach the maximum allowed. > Connections go 1:1 between pgbouncer and the client, and are not 'shared' The pgbouncer->PostgreSQL connections in the pool are shared among the different client connections the database/user pairing use. > > pool_size is the size of the pool that pgbouncer keeps open for you. If not > set, the default_pool_size is used. The "you" pronoun doesn't make sense - I am a person but pgbouncer only cares about the client connections I try to make and I can make multiple. > > If you check the open connections to your host, you can verify it > > netstat --tcp -n | grep ... > > will show ESTABLISHED connections = pool_size > > > client connections will be accepted by pgbouncer until pool_size is reached. As you say below pgbouncer accepts the connection and places it into a wait queue for one of the pool pgbouncer->PostgreSQL sessions to become free > after that, pgbouncer can spawn more connection only if reserve_pool_size > allows it, and after a wait of reserve_pool_timeout . Ignoring the accordion behavior for now > > when your roof is reached (means you are opening pool_size + > reserve_pool_size) then client connections will enter a queue. ceiling... > > Entering a queue, does not mean be serviced by the database. It means wait > your turn. If you connect manually, you will notice an open connection, > (socket is opened) but not database shell. There is no such thing as a database shell; clients might have shells. All that matters, though, is whether the connection from the client is open and it is able to send queries over it. That those queries don't get executed right away because they have been queued waiting for an available pgbouncer->PostgreSQL session is what I was describing: sharing the 2 connections/sessions among 5 connected clients. David J.
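One way to see directly how client connections map onto server connections is pgbouncer's admin console; the port and admin user below are taken from the config quoted earlier, the commands themselves are standard pgbouncer behaviour:
-- Connect to the virtual admin database, e.g.:  psql -p 6543 -U admin pgbouncer
SHOW POOLS;    -- per database/user pool: cl_active, cl_waiting, sv_active, sv_idle
SHOW CLIENTS;  -- every client connection, including those waiting in the queue
SHOW SERVERS;  -- the actual connections pgbouncer holds open to PostgreSQL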
Re: Refining query statement
On 1/17/19 8:14 AM, Rich Shepard wrote: On Tue, 15 Jan 2019, Thomas Kellerer wrote: select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact from Contacts as C join Organizations as O on C.org_id = O.org_id join Activities as A on C.contact_id = A.contact_id where A.next_contact <= 'today' and A.next_contact > '2018-12-31' order by c.contact_id, a.next_contact DESC; Combining this with Adrian's advice to use BETWEEN I have this statement that almost works: SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname, c.direct_phone, o.org_name, a.next_contact FROM Contacts AS c JOIN Organizations AS o ON c.org_id = o.org_id JOIN Activities AS a ON c.contact_id = a.contact_id WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date ORDER BY c.contact_id, a.next_contact DESC; It fails when the most recent next_contact column in Activities is NULL and an earier row has a non-NULL date in the specified range. I tried specifying max(a.next_contact) and added GROUP BY, but the result set all returned o.org_name columns to the same one. The WHERE clause needs to exclude a contact_id where the most current row in Activities has NULL for the next_contact column. I've tried a few ideas but none work so I need to learn the proper syntax, and I don't find that in Rick van der Lans' or Joe Celko's books I have. ? ... WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date AND a.next_contact IS NOT NULL ORDER BY c.contact_id, a.next_contact DESC; Looking forward to learning, Rich -- Adrian Klaver adrian.kla...@aklaver.com
Re: Refining query statement
On Thu, Jan 17, 2019 at 9:14 AM Rich Shepard wrote: > ORDER BY c.contact_id, a.next_contact DESC; > The WHERE clause needs to exclude a contact_id where the most current row in > Activities has NULL for the next_contact column. Why is next_contact allowed to be null? Your concept of "most current row" is strictly based upon next_contact so if next_contact is null it is impossible to know whether it is the most current row (consider that two activities could be missing the next_contact date - which of those is the most current?) David J.
Re: Refining query statement
On 1/17/19 8:14 AM, Rich Shepard wrote: On Tue, 15 Jan 2019, Thomas Kellerer wrote: select distinct on (C.contact_id) C.contact_id, C.lname, C.fname, C.direct_phone, O.org_name, A.next_contact from Contacts as C join Organizations as O on C.org_id = O.org_id join Activities as A on C.contact_id = A.contact_id where A.next_contact <= 'today' and A.next_contact > '2018-12-31' order by c.contact_id, a.next_contact DESC; Combining this with Adrian's advice to use BETWEEN I have this statement that almost works: SELECT DISTINCT ON (c.contact_id) c.contact_id, c.lname, c.fname, c.direct_phone, o.org_name, a.next_contact FROM Contacts AS c JOIN Organizations AS o ON c.org_id = o.org_id JOIN Activities AS a ON c.contact_id = a.contact_id WHERE next_contact BETWEEN '01/01/2019'::date AND 'today'::date ORDER BY c.contact_id, a.next_contact DESC; It fails when the most recent next_contact column in Activities is NULL and an earier row has a non-NULL date in the specified range. I tried specifying max(a.next_contact) and added GROUP BY, but the result set all returned o.org_name columns to the same one. The WHERE clause needs to exclude a contact_id where the most current row in Activities has NULL for the next_contact column. I've tried a few ideas but none work so I need to learn the proper syntax, and I don't find that in Rick van der Lans' or Joe Celko's books I have. Got to thinking more and realized the answer depends on what you want the query to produce. Can you let us know what is you are trying to pull out with the query? Looking forward to learning, Rich -- Adrian Klaver adrian.kla...@aklaver.com
Re: Refining query statement
On Thu, 17 Jan 2019, David G. Johnston wrote: Why is next_contact allowed to be null? David, There are a number of reasons. The prospect might have retired, told me to pound sand, or has put off a decision. Your concept of "most current row" is strictly based upon next_contact so if next_contact is null it is impossible to know whether it is the most current row (consider that two activities could be missing the next_contact date - which of those is the most current?) That's quite true. I realize I've been asking the question incorrectly. What I want is a list of contacts to make today. This includes ones that should have been made earlier but weren't and excludes earlier contacts that have no scheduled next contact (therefore, the nulls.). Does this clarify what I'm asking of the data? Thanks, Rich
Re: Refining query statement
On Thu, 17 Jan 2019, Adrian Klaver wrote: Got to thinking more and realized the answer depends on what you want the query to produce. Can you let us know what is you are trying to pull out with the query? Adrian, Certainly. Over breakfast I realized the same thing: the existing SELECT query is not asking for the correct data. As I replied in response to David's message, what I want is a list of contacts to make today. This includes ones that should have been made earlier but weren't and excludes earlier contacts that have no scheduled next contact (therefore, the nulls.). Thanks, Rich
Re: Refining query statement
On Thu, Jan 17, 2019 at 9:47 AM Rich Shepard wrote: > What I want is a list of contacts to make today. This includes ones that > should have been made earlier but weren't and excludes earlier contacts that > have no scheduled next contact (therefore, the nulls.). > > Does this clarify what I'm asking of the data? Yes...though now it just sounds like a flawed data model. How stuck are you in that regard? Those "future" contacts should have their own records and not be derived via an optional field on an existing record. Put differently, how do you know which activities are completed and which are not? Dave
Re: Refining query statement
On Thu, 17 Jan 2019, David G. Johnston wrote: Yes...though now it just sounds like a flawed data model. David, This is what I thought. How stuck are you in that regard? Those "future" contacts should have their own records and not be derived via an optional field on an existing record. My goal is to make a functioning business tracking application for my consulting services. Almost all my prior postgres databases hold environmental data for statistical and spatio-temporal analyses so writing a business application is a new experience for me and I want to get it correct. Put differently, how do you know which activities are completed and which are not? The direct answer is that a completed activity has a row with either a future next-activity date or a null (which is the case when the status of that organization or contact is 'no further contact'.) I should rename the Contacts table as People and the Activities table as Contacts. The original names came from a sales management system I used as a design guide, but they're probably confusing to others as well as to me. :-) I can provide my current schema (eight tables) to the list (perhaps as an attachment), an individual, or put in on a cloud site and pass the URL. Thanks, Rich
Re: Refining query statement
On 1/17/19 9:07 AM, Rich Shepard wrote: On Thu, 17 Jan 2019, David G. Johnston wrote: Yes...though now it just sounds like a flawed data model. David, This is what I thought. How stuck are you in that regard? Those "future" contacts should have their own records and not be derived via an optional field on an existing record. My goal is to make a functioning business tracking application for my consulting services. Almost all my prior postgres databases hold environmental data for statistical and spatio-temporal analyses so writing a business application is a new experience for me and I want to get it correct. Put differently, how do you know which activities are completed and which are not? The direct answer is that a completed activity has a row with either a future next-activity date or a null (which is the case when the status of that organization or contact is 'no further contact'.) To be clear the next-activity date = next_contact in the database, correct? NULL basically means unknown, so having it stand for something is a bit of a stretch. Seems to me a boolean field of name active to denote contacts you need to keep up with is in order. Then make the next_contact field NOT NULL and replace the current NULL values with 'infinity': https://www.postgresql.org/docs/10/datatype-datetime.html 8.5.1.4. Special Values As to current situation: 1) Use my previous suggestion. or 2) WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND 'today'::date https://www.postgresql.org/docs/10/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL I should rename the Contacts table as People and the Activities table as Contacts. The original names came from a sales management system I used as a design guide, but they're probably confusing to others as well as to me. :-) I can provide my current schema (eight tables) to the list (perhaps as an attachment), an individual, or put in on a cloud site and pass the URL. Thanks, Rich -- Adrian Klaver adrian.kla...@aklaver.com
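To illustrate option (2), a sketch of how the COALESCE form might slot into the earlier query; table and column names are taken from the previous messages and this has not been run against the real schema:
SELECT DISTINCT ON (c.contact_id)
       c.contact_id, c.lname, c.fname, c.direct_phone,
       o.org_name, a.next_contact
FROM Contacts AS c
JOIN Organizations AS o ON c.org_id = o.org_id
JOIN Activities AS a ON c.contact_id = a.contact_id
-- Treat a missing next_contact as 'infinity' so it falls outside the date window
WHERE COALESCE(a.next_contact, 'infinity') BETWEEN '2019-01-01'::date AND 'today'::date
ORDER BY c.contact_id, a.next_contact DESC;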
Re: Refining query statement
On Thu, Jan 17, 2019 at 10:07 AM Rich Shepard wrote: >The direct answer is that a completed activity has a row with either a > future next-activity date or a null (which is the case when the status of > that organization or contact is 'no further contact'.) Off the top of my head (and this is a model I am quite familiar with even if I'm doing this email at speed): I'd suggest an actual activity table: activity ( created_on date not null, due_on date not null, performed_on date null, contact_id bigint not null references (contact) ); contact ( contact_id bigserial primary key, told_me_to_go_pound_sand boolean default false ); Now, an exception report can be made for every contact where "told_me_to_go_pound_sand" is false and there is no record on activity where performed_on is null (and also pound sand is true and there is one performed_on is null) The report you want is basically everything on activity where performed_on is null and due_on is today or earlier. A unique partial index can be made (activity.contact_id) where performed_on is null; to ensure that at most only one open activity exists for each contact. David J.
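A slightly fleshed-out version of that sketch, including the partial unique index mentioned at the end; all names are illustrative, not a finished design:
CREATE TABLE contact (
    contact_id                bigserial PRIMARY KEY,
    told_me_to_go_pound_sand  boolean NOT NULL DEFAULT false
);

CREATE TABLE activity (
    activity_id   bigserial PRIMARY KEY,
    contact_id    bigint NOT NULL REFERENCES contact (contact_id),
    created_on    date NOT NULL DEFAULT current_date,
    due_on        date NOT NULL,
    performed_on  date              -- NULL means the activity is still open
);

-- At most one open (not yet performed) activity per contact
CREATE UNIQUE INDEX activity_one_open_per_contact
    ON activity (contact_id) WHERE performed_on IS NULL;

-- "What do I need to do today, or should already have done?"
SELECT a.contact_id, a.due_on
FROM activity AS a
WHERE a.performed_on IS NULL AND a.due_on <= current_date
ORDER BY a.due_on;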
Re: Refining query statement
On Thu, 17 Jan 2019, Adrian Klaver wrote: To be clear the next-activity date = next_contact in the database, correct? Adrian, Yes. I've renamed the Activities table to Contacts and the Contacts table to People. NULL basically means unknown, so having it stand for something is a bit of a stretch. Seems to me a boolean field of name active to denote contacts you need to keep up with is in order. Then make the next_contact field NOT NULL and replace the current NULL values with 'infinity': This makes good sense. As to current situation: 1) Use my previous suggestion. I did so, but will modify it as you suggest: 2) WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND 'today'::date Thanks again, Rich
Re: Refining query statement
On Thu, 17 Jan 2019, Adrian Klaver wrote: Seems to me a boolean field of name active to denote contacts you need to keep up with is in order. Then make the next_contact field NOT NULL and replace the current NULL values with 'infinity': WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND 'today'::date Adrian, Having added a new column, 'active' with boolean values (and a default of true), would this WHERE clause work: WHERE active = TRUE and next_contact <= 'today'::date ? Regards, Rich
Re: Postgres Automated Failover
Thanks, I'll check it out. On Thu, Jan 17, 2019 at 9:06 AM Jehan-Guillaume (ioguix) de Rorthais < iog...@free.fr> wrote: > On Thu, 17 Jan 2019 08:32:48 -0500 > AI Rumman wrote: > > > Hi, > > > > I am planning to use Postgresql with TimescaleDb extension. I have to > > design a system similar to AWS RDS which supports automated failover, > > transparent minor version upgrades etc. > > In early days, I worked with PgPool II to enable heartbeat between the > > Postgres servers. Is there any new tool to achieve the same by which I > can > > design a transparent failover system for Postgres? > > There is multiple HA solutions, none are "transparent failover" though. > They > are just automated-failover. A rollback-ed transaction because of failover > will > never be transparent from the application point of view. > > Look at Patroni or PAF, depending on what you want to achieve. >
Re: Refining query statement
On Thu, 17 Jan 2019, David G. Johnston wrote: Off the top of my head (and this is a model I am quite familiar with even if I'm doing this email at speed): I'd suggest an actual activity table: David, Adrian's suggestion of a Contacts table column called 'active' having a boolean data type seems to me to be more compact. Testing active = true as a condition in the WHERE clause will display only those People with active contact needs. Thanks very much, Rich
Re: Refining query statement
On 1/17/19 10:01 AM, Rich Shepard wrote: On Thu, 17 Jan 2019, Adrian Klaver wrote: Seems to me a boolean field of name active to denote contacts you need to keep up with is in order. Then make the next_contact field NOT NULL and replace the current NULL values with 'infinity': WHERE COALESCE(next_contact, 'infinity') BETWEEN '01/01/2019'::date AND 'today'::date Adrian, Having added a new column, 'active' with boolean values (and a default of true), would this WHERE clause work: WHERE active = TRUE and next_contact <= 'today'::date It would produce results, so yes it would work. The question are they the results you want? The above would return anything with a next_contact less then today. That could extend backwards to some undetermined point in the past, depending on how diligent you where on updating next_contact. Before you go much further I would look over the suggestions posted and then sit down with pen and paper and figure out what it you want to track and how you are going to maintain that tracking. Then draw up an outline of how you will achieve that in code. ? Regards, Rich -- Adrian Klaver adrian.kla...@aklaver.com
Re: Refining query statement
On Thu, 17 Jan 2019, Adrian Klaver wrote: It would produce results, so yes it would work. The question are they the results you want? Adrian, To which the answer is no as I just discovered. The above would return anything with a next_contact less then today. That could extend backwards to some undetermined point in the past, depending on how diligent you where on updating next_contact. Before you go much further I would look over the suggestions posted and then sit down with pen and paper and figure out what it you want to track and how you are going to maintain that tracking. Then draw up an outline of how you will achieve that in code. I have that tracking defined (who is active and needs a contact today or earlier) and am now working on how to code it so only the last contact for a person is examined to determine if the active column = true. I'll ponder that. Thanks, Rich
Need aws_oracle_ext.systimestamp function definition for Postgres
Hi all, I am working on a migration from Oracle to PostgreSQL using the AWS SCT. I have converted 823 tables out of 866 to Postgres. However, for a few tables I am facing challenges in conversion due to the "aws_oracle_ext.systimestamp" function. Below is the error for your reference. ERROR: function aws_oracle_ext.systimestamp() does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. I am searching for the function definition for aws_oracle_ext.systimestamp, similar to the one below for "aws_oracle_ext.sysdate". CREATE OR REPLACE FUNCTION aws_oracle_ext.sysdate() RETURNS timestamp without time zone AS $BODY$ DECLARE l_var1 interval ; BEGIN l_var1 := '0 hour'; /* Please type your value instead of 0 */ return (clock_timestamp()::TIMESTAMP(0) WITHOUT TIME ZONE) + l_var1; END; $BODY$ LANGUAGE plpgsql VOLATILE; Thank you in advance.
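Not the official AWS extension-pack source, but a sketch of what an analogous definition might look like, modeled on the sysdate() function above; Oracle's SYSTIMESTAMP returns a timestamp with time zone, and the aws_oracle_ext schema is assumed to exist already:
CREATE OR REPLACE FUNCTION aws_oracle_ext.systimestamp()
RETURNS timestamp with time zone AS
$BODY$
DECLARE
    l_var1 interval;
BEGIN
    l_var1 := '0 hour';  /* adjust if an offset from the server clock is needed */
    RETURN clock_timestamp() + l_var1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;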
Re: strange slow query performance
Ben Snaidero wrote: > The following query runs as expected. > > explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS > MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS > MainTable_type FROM MainTable >WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( > nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON > ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid > = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) > AND objectid=3161; > > QUERY PLAN > > - > Index Scan using ix_MainTable_objectid_datetime on MainTable > (cost=3254.91..3264.39 rows=1 width=32) (actual time=33.094..33.094 rows=0 > loops=1) >Index Cond: ((objectid = ANY ($3)) AND (objectid = 3161)) >InitPlan 1 (returns $3) > -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual > time=0.403..26.147 rows=19042 loops=1) > [...] > Planning time: 5.693 ms > Execution time: 33.383 ms > (15 rows) > > But when we add a second condition to the where clause it causes the query > performance to drop dramatically. Values in "any(array" do not make a > difference. In this example they are the same but even for different values > performance is still the poor. Any ideas as to why this is happening? > > explain analyze SELECT MainTable.gid AS MainTable_gid,MainTable.datetime AS > MainTable_datetime,MainTable.objectid AS MainTable_objectid,MainTable.type AS > MainTable_type FROM MainTable >WHERE objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( > nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON > ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid > = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) > AND objectid = ANY(ARRAY(SELECT NE.objectid AS entityId FROM ( > nodeobject NE INNER JOIN ( Node N1NE INNER JOIN Node N2NE ON > ((N1NE.iLeft >= N2NE.iLeft) AND (N1NE.iLeft <= N2NE.iRight))) ON NE.inodeid > = N1NE.iId ) WHERE N2NE.iId = ANY(ARRAY[14438,10814]))) AND objectid=3161; > > > QUERY PLAN > > > -- > Index Scan using ix_MainTable_objectid_datetime on MainTable > (cost=6509.66..6534.02 rows=1 width=32) (actual time=16442.004..16442.004 > rows=0 loops=1) >Index Cond: ((objectid = ANY ($3)) AND (objectid = ANY ($7)) AND (objectid > = 3161)) >InitPlan 1 (returns $3) > -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual > time=0.438..28.484 rows=19042 loops=1) > [...] >InitPlan 2 (returns $7) > -> Nested Loop (cost=1.10..3254.75 rows=16296 width=4) (actual > time=0.056..11.786 rows=19042 loops=1) > [...] > Planning time: 4.860 ms > Execution time: 16442.462 ms > (26 rows) Not sure what is going on, but why don't you put the condition "WHERE ne.objectid=3161" into the subselects? Then there should be at most one result row, and I can imagine that things will become faster. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
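In other words, something along these lines; this is only a sketch reusing the identifiers from the original query and it has not been run against the real schema:
EXPLAIN ANALYZE
SELECT m.gid, m.datetime, m.objectid, m.type
FROM MainTable AS m
WHERE m.objectid = 3161
  AND m.objectid = ANY(ARRAY(
        SELECT ne.objectid
        FROM nodeobject AS ne
        JOIN node AS n1ne ON ne.inodeid = n1ne.iid
        JOIN node AS n2ne ON n1ne.ileft BETWEEN n2ne.ileft AND n2ne.iright
        WHERE n2ne.iid = ANY(ARRAY[14438, 10814])
          AND ne.objectid = 3161));  -- the filter pushed down into the subselect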
Re: Varlena with recursive data structures?
Hi Karl, I'm going down this road myself. In addition to the files Tom Lane pointed out there is also some helpful documentation here: https://www.postgresql.org/docs/current/storage-toast.html#STORAGE-TOAST-INMEMORY On Wed, Jan 16, 2019 at 2:09 PM Sam Patterson wrote: > Hi all, > > I've recently started developing an extension for Postgres for which I'll > need to create a new variable-length base type. The type will require a > tree-like structure in order to parse sufficiently, which of course > probably means having some sort of recursive data structure, like a struct > that has members which are pointers to itself for child nodes. After doing > some research, specifically looking at how other variable-length data types > store their data, it seems almost all of them store the data in a binary > representation, using bit masks and offsets etc in order to store/access > the data whilst having an in-memory representation that's used to > manipulate the data. > > I presume the purpose for using this approach is because all the data in a > varlena type has to be contiguous, and the moment you start using pointers > this is no longer possible. So my question is, given a structure that looks > something like this, > > typedef struct Node > { > char *data; > Node *left; > Node *right; > } Node; > > am I right in saying that I wouldn't be able to store that representation > on-disk, but instead I'd have to transform it into some binary > representation and back again when writing/reading respectively, are there > any alternatives? > > Regards, > > Karl >
Re: strange slow query performance
On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero wrote: > Any ideas as to why this is happening? Not really, I would expect roughly double execution time, not an exponential increase. Still not experienced enough to diagnose with what has been provided but I will suggest you provide the version that you are running these queries against and confirm that the results are consistent across multiple runs to discount the possibility that locking or other transient work is involved. Curious if increasing work_mem helps but its mostly a trial-and-error thing for me (I wouldn't expect it to)... David J.
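If you do experiment with work_mem, it can be changed for just the current session before re-running EXPLAIN ANALYZE; the value below is only an example, not a recommendation:
SET work_mem = '64MB';   -- session-local; pick a value suited to the machine
SHOW work_mem;           -- confirm the setting before re-running the query
RESET work_mem;          -- return to the server default afterwards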
Re: Refining query statement
On Thu, 17 Jan 2019, Adrian Klaver wrote: Got to thinking more and realized the answer depends on what you want the query to produce. Can you let us know what is you are trying to pull out with the query? Adrian, et al., Took your advice and re-thought what I need the query to return. This allowed me to realize that I don't need a separate contact history query as I can incorporate it in a single query. The goal and pseudocode are: Purpose: List all contact information and contact history for active people where next contact date is less than today. For each person select person_id, lname, fname, and direct_phone from People. For each person get the org_name from Organizations. For each person get contact history in date order and next contact date from Contacts where active = True. Order by next contact dates in ascending order. Query code: SELECT p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, c.cont_date, c.cont_type, c.notes, c.next_contact, c.comment FROM People AS p JOIN Organizations AS o ON p.org_id = o.org_id JOIN Contacts AS c ON c.person_id = p.person_id WHERE c.active = TRUE AND c.next_contact <= 'today'::date GROUP BY o.org_name, p.person_id, c.person_id, c.cont_date, c.cont_type, c.next_contact ORDER BY p.person_id, c.next_contact The results are not correctly grouped or ordered; I'll work on fixing these issues. The other issue that needs fixing is identifying the most recent 'active' value in the Contacts table for each person_id and including that person in the results only when active = 't'. Here's a MWE of a redacted set of rows that should not have been returned by the query:
 person_id | cont_date  | cont_type  | notes                              | active | next_contact | comment
-----------+------------+------------+------------------------------------+--------+--------------+---------
        36 | 2018-12-12 | Conference | Notes on this conversation.        | t      | 2018-12-17   |
        36 | 2018-12-17 | Phone      | Left message asking for a meeting. | t      | 2019-01-03   |
        36 | 2019-01-03 | Phone      | Left another message.              | t      | 2019-01-07   |
        36 | 2019-01-07 | Phone      | Going into a meeting.              | t      | 2019-01-14   |
        36 | 2019-01-15 | Phone      | Will call when/if.                 | f      | infinity     |
(5 rows)
Because the most recent row's active value is 'f' these rows should not be included as there is no next contact date. I'll keep reading looking for this answer and pointers will be helpful. Regards, Rich
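For the "most recent row per person" part, one hedged sketch is to pick the latest Contacts row in a LATERAL subquery and filter on it; the names come from the query above and this is untested:
SELECT p.person_id, p.lname, p.fname, p.direct_phone,
       o.org_name, latest.next_contact
FROM People AS p
JOIN Organizations AS o ON p.org_id = o.org_id
JOIN LATERAL (
        SELECT c.active, c.next_contact
        FROM Contacts AS c
        WHERE c.person_id = p.person_id
        ORDER BY c.cont_date DESC   -- the most recent contact row for this person
        LIMIT 1
     ) AS latest ON true
WHERE latest.active
  AND latest.next_contact <= current_date
ORDER BY latest.next_contact;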
Re: Refining query statement
On 1/17/19 2:44 PM, Rich Shepard wrote: On Thu, 17 Jan 2019, Adrian Klaver wrote: Got to thinking more and realized the answer depends on what you want the query to produce. Can you let us know what is you are trying to pull out with the query? Adrian, et al., Took your advice and re-thought what I need the query to return. This allowed me to realize that I don't need a separate contact history query as I can incorporate it in a single query. The goal and pseudocode are: Purpose: List all contact information and contact history for active people where next contact date is less than today. For each person select person_id, lname, fname, and direct_phone from People. For each person get the org_name from Organizations. For each person get contact history in date order and next contact date from Contacts where active = True. Order by next contact dates in ascending order. Query code: SELECT p.person_id, p.lname, p.fname, p.direct_phone, o.org_name, c.cont_date, c.cont_type, c.notes, c.next_contact, c.comment FROM People AS p JOIN Organizations AS o ON p.org_id = o.org_id JOIN Contacts AS c ON c.person_id = p.person_id WHERE c.active = TRUE AND c.next_contact <= 'today'::date GROUP BY o.org_name, p.person_id, c.person_id, c.cont_date, c.cont_type, c.next_contact ORDER BY p.person_id, c.next_contact The results are not correctly grouped or ordered; I'll work on fixing these issues. The other issue that needs fixing is identifying the most recent 'active' value in the Contacts table for each person_id and including that person in the results only when active = 't'. Here's a MWE of a redacted set of rows that should not have been returned by the query:
 person_id | cont_date  | cont_type  | notes                              | active | next_contact | comment
-----------+------------+------------+------------------------------------+--------+--------------+---------
        36 | 2018-12-12 | Conference | Notes on this conversation.        | t      | 2018-12-17   |
        36 | 2018-12-17 | Phone      | Left message asking for a meeting. | t      | 2019-01-03   |
        36 | 2019-01-03 | Phone      | Left another message.              | t      | 2019-01-07   |
        36 | 2019-01-07 | Phone      | Going into a meeting.              | t      | 2019-01-14   |
        36 | 2019-01-15 | Phone      | Will call when/if.                 | f      | infinity     |
(5 rows)
Because the most recent row's active value is 'f' these rows should not be included as there is no next contact date. I'll keep reading looking for this answer and pointers will be helpful. I would think the active would be on People or Organizations. Then you can eliminate them from the query results before you ever got to the contact history. Also not sure how you got the above result when the query showed: WHERE c.active = TRUE AND c.next_contact <= 'today'::date Or did the history results come from an undisclosed query? Regards, Rich -- Adrian Klaver adrian.kla...@aklaver.com
Re: Refining query statement
On Thu, 17 Jan 2019, Adrian Klaver wrote: I would think the active would be on People or Organizations. Then you can eliminate then from the query results before you ever got to the contact history. Adrian, Excellent point. I don't know why I put active in the contacts table as it does make more sense to put it in the people table (as some clients have multiple sites, each with its own contact person). Also not sure how you the above result when the query showed: WHERE c.active = TRUE AND c.next_contact <= 'today'::date Or did the history results come from an undisclosed query? I don't recall which query I used for the MWE. I'll move the active column to the new table, as well as those values, and keep working on learning how to formulate complex, multi-table queries. Also, moving the active column to People, where there is only one row per person (rather than multiple contact rows per person), removes the need to find only the most recent row on the many side of the one-to-many relationship. Thanks for the advice, Rich
Re: Refining query statement
On Thu, Jan 17, 2019 at 3:44 PM Rich Shepard wrote: > FROM People AS p > JOIN Organizations AS o ON p.org_id = o.org_id > JOIN Contacts AS c ON c.person_id = p.person_id I would advise changing Contacts to "Activities" as the former can readily be interpreted (and is in the wild) as both "an instance of contacting a person" and "the person at the organization who is being contacted" (i.e., your People class). David J.
Re: Refining query statement
On Thu, 17 Jan 2019, David G. Johnston wrote: I would advise changing Contacts to "Activities" as the former can readily be interpreted (and is in the wild) as both "an instance of contacting a person" and "the person at the organization who is being contacted" (i.e., your People class). David, That's why I originally named the Contacts table Activities and the Peoples table as Contacts. :-) I'll change it back again since there's now a Peoples table. Many thanks! Rich
Re: strange slow query performance
On Thu, Jan 17, 2019 at 4:13 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Thu, Jan 17, 2019 at 9:19 AM Ben Snaidero > wrote: > > Any ideas as to why this is happening? > > Not really, I would expect roughly double execution time, not an > exponential increase. Still not experienced enough to diagnose with > what has been provided but I will suggest you provide the version that > you are running these queries against and confirm that the results are > consistent across multiple runs to discount the possibility that > locking or other transient work is involved. > > Curious if increasing work_mem helps but its mostly a trial-and-error > thing for me (I wouldn't expect it to)... > > David J. > Query was tested multiple times with similar results. I also tried increasing work_mem with the same results as well. Memory did not increase or even come close to the 10MB setting. Query does seem to occupy one full cpu for the duration of the run though. Query is running on windows with Postgres 9.6.7