Re: How to observe plan_cache_mode transition from custom to generic plan?

2021-09-06 Thread Mladen Gogala



On 9/6/21 2:26 AM, Laurenz Albe wrote:

"Bind variables" just being an Oraclism for parameters, it is*not*  a
mistake to use them in PostgreSQL.


Actually, it is a mistake because they don't give you any performance 
benefit and can potentially worsen the performance. There is no cursor 
sharing and generic plans can be much worse than "custom" plans, 
generated with the actual values. The only reason for using bind 
variables/parameters is to protect yourself from SQL injection. Creating 
SQL dynamically from input is the recipe for the "little Bobby Tables" 
situation: https://xkcd.com/327/



--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com





Re: How to observe plan_cache_mode transition from custom to generic plan?

2021-09-06 Thread Laurenz Albe
On Mon, 2021-09-06 at 03:07 -0400, Mladen Gogala wrote:
> 
> On 9/6/21 2:26 AM, Laurenz Albe wrote:
> > "Bind variables" just being an Oraclism for parameters, it is*not*  a
> > mistake to use them in PostgreSQL.
> 
> Actually, it is a mistake because they don't give you any performance 
> benefit and can potentially worsen the performance. There is no cursor 
> sharing and generic plans can be much worse than "custom" plans, 
> generated with the actual values. The only reason for using bind 
> variables/parameters is to protect yourself from SQL injection.

I disagree.  There is defnitely a performance benefit in not generating
a plan for each execution, even if plans are only cached per session.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





pg_upgrade - fe_sendauth: no password supplied

2021-09-06 Thread Nick Renders

Hello,

I have been trying to use the pg_upgrade command to update a PostgreSQL 
11 environment to 13 on macOS 11.


I have followed the steps in the documentation, but the command always 
fails when trying to connect to the original database. This is the 
command that is sent:


	/Library/PostgreSQL/13/bin/pg_upgrade -b /Library/PostgreSQL/11/bin -B 
/Library/PostgreSQL/13/bin -d /Volumes/Postgres_Data/PostgreSQL/11/data 
-D /Volumes/Postgres_Data/PostgreSQL/13/data -p 49156 -P 49155 -U 
postgres -j 24 -v


And this is what is logged:

connection to database failed: fe_sendauth: no password supplied
could not connect to source postmaster started with the command:
	"/Library/PostgreSQL/11/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D 
"/Volumes/Postgres_Data/PostgreSQL/11/data" -o "-p 49156 -b  -c 
listen_addresses='' -c unix_socket_permissions=0700 -c 
unix_socket_directories='/Volumes/Free/Upgrade'" start
	"/Library/PostgreSQL/11/bin/pg_ctl" -w -D 
"/Volumes/Postgres_Data/PostgreSQL/11/data" -o "" -m fast stop >> 
"pg_upgrade_server.log" 2>&1



According to the documentation, the connection should be established 
with the data in the .pgpass file. Its contents look like this (the 
password has been changed) :


localhost:49155:*:postgres:password1234
localhost:49156:*:postgres:password1234

The .pgpass file works without problems with the pg_dump and pg_restore 
commands, so I'm fairly certain its contents and privileges are set 
correctly.



The PostgreSQL documentation also mentions that you can update the 
pg_hba.conf file to use authentication method "peer". This has no effect 
either, however when I set it to "trust", the command goes through just 
fine.


So I have been able to do the upgrade, but I am still wondering why I 
got the error in the first place. Any idea why the .pgpass file isn't 
working with the pg_upgrade command?


Best regards,

Nick Renders




Re: update non-indexed value is slow if some non-related index/fk are enabled

2021-09-06 Thread rob stone



On Sun, 2021-09-05 at 11:21 -0400, Tom Lane wrote:
> "David G. Johnston"  writes:
> > On Friday, September 3, 2021, Philippe Doussot <
> > philippe.dous...@up.coop>
> > wrote:
> > > I don't understand why disabling all index from the table speed
> > > up the
> > > update because the boolean column is not indexed
> 
> > Index entries point to physical records.  You just deleted one
> > physical
> > record and added another.  The indexes need to be updated with that
> > information.
> 
> Yeah.  The OP's mental model is apparently update-in-place, but
> that's
> not how Postgres does things.
> 
> The index-update overhead is avoided if the update is "HOT", which
> requires that (a) no indexed column changes and (b) there is room
> on the same page for the new copy of the row.  Ensuring (b) requires
> running with a fairly low fill-factor, which bloats your table and
> thereby creates its own costs.  Still, that might be worth doing
> depending on your particular circumstances.
> 
> regards, tom lane
> 
> 
If the DDL for that table had the column defined like this:-

my_boolean  BOOLEAN,

instead of:-

my_boolean  BOOLEAN NOT NULL DEFAULT FALSE/TRUE, (whichever is
convenient)

then that column would contain either 'f' or 't' on insert instead of
null.

Then even if a fillfactor was not specified for that table, an update
of that single column (which does not appear in an index) would merely
swap the values.
Surely that would write it back in place?

Also, having boolean columns containing a null makes it difficult for
the getter's of that table deciding if 'null' is true or false.

Just an observation.

Rob






Re: update non-indexed value is slow if some non-related index/fk are enabled

2021-09-06 Thread Philippe Doussot

On 06/09/2021 10:21, rob stone wrote:


On Sun, 2021-09-05 at 11:21 -0400, Tom Lane wrote:

"David G. Johnston"  writes:

On Friday, September 3, 2021, Philippe Doussot <
philippe.dous...@up.coop>
wrote:

I don't understand why disabling all index from the table speed
up the
update because the boolean column is not indexed

Index entries point to physical records.  You just deleted one
physical
record and added another.  The indexes need to be updated with that
information.

Yeah.  The OP's mental model is apparently update-in-place, but
that's
not how Postgres does things.

The index-update overhead is avoided if the update is "HOT", which
requires that (a) no indexed column changes and (b) there is room
on the same page for the new copy of the row.  Ensuring (b) requires
running with a fairly low fill-factor, which bloats your table and
thereby creates its own costs.  Still, that might be worth doing
depending on your particular circumstances.

 regards, tom lane



If the DDL for that table had the column defined like this:-

my_boolean  BOOLEAN,

instead of:-

my_boolean  BOOLEAN NOT NULL DEFAULT FALSE/TRUE, (whichever is
convenient)

then that column would contain either 'f' or 't' on insert instead of
null.

Then even if a fillfactor was not specified for that table, an update
of that single column (which does not appear in an index) would merely
swap the values.
Surely that would write it back in place?

Also, having boolean columns containing a null makes it difficult for
the getter's of that table deciding if 'null' is true or false.

Just an observation.

Rob


Yes my columns are NOT NULL DEFAULT FALSE

but the update always change the ctid ( new row in page )

I whas hopping the same optimisation as you: Write in place.

For boolean it is maybe doable because the value is fixed in size but for 
variable length ..

I was also expecting no row rewrite if value don't change .. easy for boolean 
but not for bigger fields


Philippe




Re: update non-indexed value is slow if some non-related index/fk are enabled

2021-09-06 Thread Philippe Doussot

On 05/09/2021 17:21, Tom Lane wrote:

"David G. Johnston"  writes:

On Friday, September 3, 2021, Philippe Doussot
wrote:

I don't understand why disabling all index from the table speed up the
update because the boolean column is not indexed

Index entries point to physical records.  You just deleted one physical
record and added another.  The indexes need to be updated with that
information.

Yeah.  The OP's mental model is apparently update-in-place, but that's
not how Postgres does things.

The index-update overhead is avoided if the update is "HOT", which
requires that (a) no indexed column changes and (b) there is room
on the same page for the new copy of the row.  Ensuring (b) requires
running with a fairly low fill-factor, which bloats your table and
thereby creates its own costs.  Still, that might be worth doing
depending on your particular circumstances.

regards, tom lane

Hi,

Thank you both David and Tom for this precious information.

Yes, I was thinking that the update was made in-place because a boolean true or 
false take the same size :).
I did't expect the ctid was changed.

The main reason is effectivly
> (b) there is (no) room on the same page for the new copy of the row.

I now see that the default TABLE fillfactor is 100. No room for update in same 
page by default (our case).

The CTID is changing for each update (even if I UPDATE many time the same 
boolean column with same value false, false, false .. maybe a room for 
optimisation here )

And with some room (lower fillfactor) I see the HOT working well with the help 
from https://habr.com/en/company/postgrespro/blog/483768/ , Thanks Егор Рогов 
@erogov


Disabing index during update obviously put index out of sync with the new row 
location which require an reindex.

I can now better explain to my team why this update is slow ( without lower 
fillfactor).
Why it is quick with index disabled.
Why disabling index without reindexing it after enabling it is a very bad idea 
.. the ctid as changed (without HOT update) and index is out-of-date.

Many thanks

Philippe




PostgreSQL : column value filtering in Logical Replication

2021-09-06 Thread PALAYRET Jacques
Hello, 

Would it be possible to have a " Column value filtering in Logical Replication 
", on the publication side ? 
Il would not be a " Column Filtering " neither a " row filtering ". It would be 
the possibility to send NULL (instead of the column value, for a publication 
table), when a where clause is unsatisfied. 
In others terms, a way to filter column values according to a logical 
condition. 

CREATE PUBLICATION  [ FOR TABLE [ONLY] table_name [(colname [,…]) 
[WHERE ()] ] 
would become : 
CREATE PUBLICATION  [ FOR TABLE [ONLY] table_name [(colname [WHERE 
()] [,…]) [WHERE ()] ] 
-> The " col_where_clause " could be a where clause to filter or better a 
function to transform the column values. 


For example, with a col_where_clause (for column b) as : " b <= 50 " 

On publisher : 
a | b 
-+-- 
111 | 44.4 
222 | 55.5 
333 | 33.3 

on subscriber : 
a | b 
-+-- 
111 | 44.4 
222 | 
333 | 33.3 


Regards 
- Météo-France - 
PALAYRET JACQUES 
DCSC/MBD 
jacques.palay...@meteo.fr 
Fixe : +33 561078319 


Re: pg_upgrade - fe_sendauth: no password supplied

2021-09-06 Thread Tom Lane
"Nick Renders"  writes:
> [ pg_upgrade fails with ]
>   connection to database failed: fe_sendauth: no password supplied
>   could not connect to source postmaster started with the command:
>   "/Library/PostgreSQL/11/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D 
> "/Volumes/Postgres_Data/PostgreSQL/11/data" -o "-p 49156 -b  -c 
> listen_addresses='' -c unix_socket_permissions=0700 -c 
> unix_socket_directories='/Volumes/Free/Upgrade'" start

> According to the documentation, the connection should be established 
> with the data in the .pgpass file. Its contents look like this (the 
> password has been changed) :
>   localhost:49155:*:postgres:password1234
>   localhost:49156:*:postgres:password1234

I think this is explained by this statement in the libpq documentation:

The host name localhost is also searched for when the connection is a
Unix-domain socket connection and the host parameter matches libpq's
default socket directory path.

pg_upgrade will use a Unix-domain socket (unless on Windows), but it
intentionally puts it in a non-default place --- we can see
unix_socket_directories='/Volumes/Free/Upgrade'
in your example.  That's meant to ensure that outside clients can't
connect to the postmaster(s) during the upgrade, but it's not interacting
too well with this behavior of libpq.

I don't recall for sure, but I think you could have made this work
by putting the socket path (/Volumes/Free/Upgrade) instead of
"localhost" in the .pgpass file.

regards, tom lane




Re: update non-indexed value is slow if some non-related index/fk are enabled

2021-09-06 Thread David G. Johnston
On Monday, September 6, 2021, Philippe Doussot 
wrote:

> I whas hopping the same optimisation as you: Write in place.
>
>
>
How exactly would you expect “update-in-place” to work given the nature of
MVCC?

David J.


Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Daniel Westermann (DWE)
Hi,

while playing with PostgreSQL 14 Beta 3 I noticed a change when it comes to the 
visibility map and vacuum. Test case:


gbench=# create table t1 ( a int, b text ) with ( fillfactor = 100 );
CREATE TABLE
pgbench=# insert into t1 select a, a::text from generate_series(1,100) a;
INSERT 0 100
pgbench=# create index i1 on t1 (a);
CREATE INDEX


gbench=# select ctid,* from t1 order by 1 limit 5;
 ctid  | a | b 
---+---+---
 (0,1) | 1 | 1
 (0,2) | 2 | 2
 (0,3) | 3 | 3
 (0,4) | 4 | 4
 (0,5) | 5 | 5
(5 rows)


pgbench=# begin;
BEGIN
pgbench=*# update t1 set b ='xx' where a = 1;
UPDATE 1
pgbench=*# select ctid,* from t1 order by 1 limit 5;
 ctid  | a | b 
---+---+---
 (0,2) | 2 | 2
 (0,3) | 3 | 3
 (0,4) | 4 | 4
 (0,5) | 5 | 5
 (0,6) | 6 | 6
(5 rows)

pgbench=*# select ctid,* from t1 where a = 1;
  ctid   | a | b  
-+---+
 (4,203) | 1 | xx
(1 row)

pgbench=*# commit;

pgbench=# select pg_visibility_map('t1'::regclass, 0);
 pg_visibility_map 
---
 (f,f)
(1 row)

pgbench=# vacuum t1;
VACUUM
pgbench=# select pg_visibility_map('t1'::regclass, 0);
 pg_visibility_map 
---
 (f,f)   

Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Peter Geoghegan
It's a non-hot update, and so there is a single dead index tuple. You're
seeing the new optimization that makes vacuum skip indexes in marginal
cases.

Try running vacuum with index cleanup = on.

Peter Geoghegan
(Sent from my phone)


Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Daniel Westermann (DWE)
>It's a non-hot update, and so there is a single dead index tuple. You're 
>seeing the new optimization that makes vacuum skip indexes in >marginal cases.

>Try running vacuum with index cleanup = on.

Thank you, Peter


Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Laurenz Albe
On Mon, 2021-09-06 at 07:46 -0700, Peter Geoghegan wrote:
> It's a non-hot update, and so there is a single dead index tuple. You're 
> seeing
> the new optimization that makes vacuum skip indexes in marginal cases. 
> 
> Try running vacuum with index cleanup = on. 

It occurs to me that this new default "auto" setting for "index_cleanup"
may cause a performance regression for people who VACUUM tables frequently
in order to get fast index-only scans.

That is not a bug, but it would be good to alert the users.

It is not an incompatibility that warrants a mention in the release notes,
but perhaps somthing in
https://www.postgresql.org/docs/14/indexes-index-only-scans.html
and/or
https://www.postgresql.org/docs/14/routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP
could be added that recommends that people should consider frequent
VACUUM with "index_cleanup = on" for best performance with index-only scans.

Suggested patch attached, should be backpatched to v14.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
From d98f4c4cb62b564e8f9a26ed4e8da80dadfbc55c Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 6 Sep 2021 17:47:15 +0200
Subject: [PATCH] Document VACUUM tips for index-only scans

Add hints for tuning autovacuum to get efficient
index-only scans.  This has become even more relevant
than before, because the default "auto" option of
"index_cleanup" introduced by commit 3499df0dee
adds yet another thing to consider.
---
 doc/src/sgml/indices.sgml | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 56fbd45178..4257615b85 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1134,8 +1134,9 @@ SELECT x FROM tab WHERE x = 'key' AND z < 42;
problem.  PostgreSQL tracks, for each page in
a table's heap, whether all rows stored in that page are old enough to be
visible to all current and future transactions.  This information is
-   stored in a bit in the table's visibility map.  An
-   index-only scan, after finding a candidate index entry, checks the
+   stored in a bit in the table's
+   visibility map.
+   An index-only scan, after finding a candidate index entry, checks the
visibility map bit for the corresponding heap page.  If it's set, the row
is known visible and so the data can be returned with no further work.
If it's not set, the heap entry must be visited to find out whether it's
@@ -1155,6 +1156,14 @@ SELECT x FROM tab WHERE x = 'key' AND z < 42;
make this type of scan very useful in practice.
   
 
+  
+   To make sure that index-only scans are efficient, it can be a good idea
+   to see that the table is VACUUMed often enough.  This
+   can be done by lowering 
+   on that table and setting 
+   to auto.
+  
+
   

 INCLUDE
-- 
2.31.1



Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Tom Lane
Laurenz Albe  writes:
> It is not an incompatibility that warrants a mention in the release notes,
> but perhaps somthing in
> https://www.postgresql.org/docs/14/indexes-index-only-scans.html
> and/or
> https://www.postgresql.org/docs/14/routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP
> could be added that recommends that people should consider frequent
> VACUUM with "index_cleanup = on" for best performance with index-only scans.

If enough pages would change their all-visible state to make a significant
difference in index-only scan performance, VACUUM should not be skipping
the cleanup.  If it is, the threshold for that is too aggressive.

Assuming that that choice was made appropriately, I think the advice you
propose here will just cause people to waste lots of cycles on VACUUM
runs that have only marginal effects.

regards, tom lane




Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Laurenz Albe
On Mon, 2021-09-06 at 11:59 -0400, Tom Lane wrote:
> Laurenz Albe  writes:
> > It is not an incompatibility that warrants a mention in the release notes,
> > but perhaps somthing in
> > https://www.postgresql.org/docs/14/indexes-index-only-scans.html
> > and/or
> > https://www.postgresql.org/docs/14/routine-vacuuming.html#VACUUM-FOR-VISIBILITY-MAP
> > could be added that recommends that people should consider frequent
> > VACUUM with "index_cleanup = on" for best performance with index-only scans.
> 
> If enough pages would change their all-visible state to make a significant
> difference in index-only scan performance, VACUUM should not be skipping
> the cleanup.  If it is, the threshold for that is too aggressive.
> 
> Assuming that that choice was made appropriately, I think the advice you
> propose here will just cause people to waste lots of cycles on VACUUM
> runs that have only marginal effects.

#define BYPASS_THRESHOLD_PAGES  0.02/* i.e. 2% of rel_pages */

So up to an additional 2% of all pages can have the all-visible bit
unset with "index_cleanup = auto".

That is probably not worth worrying, right?

Yours,
Laurenz Albe





Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-06 Thread Matthias Apitz


What does the term 'over 20Lakh rows' mean? Thanks

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
August 13, 1961: Better a wall than a war. And, while the GDR was still 
existing,
no German troups and bombs have been killed in Yugoslavia, Afghanistan, 
Afrika...




Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-06 Thread Shubham Mittal
20 Lakh is the current no of rows in the task table.. on which the query is
executed..

On Mon, Sep 6, 2021, 11:44 PM Matthias Apitz  wrote:

>
> What does the term 'over 20Lakh rows' mean? Thanks
>
> matthias
> --
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/
> +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
> August 13, 1961: Better a wall than a war. And, while the GDR was still
> existing,
> no German troups and bombs have been killed in Yugoslavia, Afghanistan,
> Afrika...
>


Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-06 Thread Josef Šimánek
po 6. 9. 2021 v 20:14 odesílatel Matthias Apitz  napsal:
>
>
> What does the term 'over 20Lakh rows' mean? Thanks

AFAIK in India (and surrounding areas) 20 Lakh = 20 * 100 000 = 2 000 000

> matthias
> --
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
> August 13, 1961: Better a wall than a war. And, while the GDR was still 
> existing,
> no German troups and bombs have been killed in Yugoslavia, Afghanistan, 
> Afrika...
>
>




Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-06 Thread Matthias Apitz
El día lunes, septiembre 06, 2021 a las 11:45:34p. m. +0530, Shubham Mittal 
escribió:

> 20 Lakh is the current no of rows in the task table.. on which the query is
> executed..

Ahh, I never came accross this (Indian) unit 'lakh' and now understand
that we're are talking about https://en.wikipedia.org/wiki/Lakh 
and 20 Lakh are only 2.000.000 rows, which isn't a very big number.

Can't help with your query, though.

matthias


-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
August 13, 1961: Better a wall than a war. And, while the GDR was still 
existing,
no German troups and bombs have been killed in Yugoslavia, Afghanistan, 
Afrika...




Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Peter Geoghegan
On Mon, Sep 6, 2021 at 9:21 AM Laurenz Albe  wrote:
> #define BYPASS_THRESHOLD_PAGES  0.02/* i.e. 2% of rel_pages */
>
> So up to an additional 2% of all pages can have the all-visible bit
> unset with "index_cleanup = auto".
>
> That is probably not worth worrying, right?

I don't think it's worth worrying about. I would say that, since I
chose the exact threshold myself. The threshold was a bit arbitrary,
of course.

Note that Daniel's example had a non-HOT update, even though it's the
kind of update that we imagine can use HOT (because it didn't modify
an indexed column). He could have ensured a HOT update by lowering
heap fill factor, but why should that be necessary if updates are rare
anyway?

The bypass-index-vacuuming feature may have had a bit of a messaging
problem. It was something we usually talked about as being about
skipping index vacuuming, because that's what it actually does.
However, the feature isn't really about doing less work during VACUUM.
It's actually about doing *more* work during VACUUM -- more useful
work. Especially setting visibility map bits. But also freezing. Now
you can very aggressively tune VACUUM to do these things more often,
with no fear of that being way too expensive because of index
vacuuming that has only marginal value.

The threshold is not so much about any one VACUUM operation -- you
have to think about the aggregate effect on the table over time. Most
individual tables will never have the new optimization kick in even
once, because the workload just couldn't possibly allow it -- the 2%
threshold is vastly exceeded every single time. The cases that it
actually applies to are pretty much insert-only tables, perhaps with
some HOT updates. 100% clean inserts are probably very rare in the
real world. I believe that it's *vastly* more likely that such a table
will have pages that are ~98%+ free of LP_DEAD line pointers in heap
pages (i.e., the thing that BYPASS_THRESHOLD_PAGES applies to). To get
to 100% you cannot allow even one single insert transaction to abort
since the last VACUUM.

If you assume that BYPASS_THRESHOLD_PAGES is actually too low for your
workload (which is the opposite problem), then it doesn't matter very
much. The feature as coded should still have the desired effect of
skipping index vacuuming in *most* cases where it's unnecessary
(however you happen to define "unnecessary") -- the number of pages
with LP_DEAD items will naturally increase over time without index
vacuuming, until the threshold is crossed. Maybe still-unnecessary
index vacuuming will still take place in 1 out of 5 cases with the
feature. This is still much better than 5 out of 5. More importantly,
you can now aggressively tune vacuuming without noticeably increasing
the number of individual vacuums that still have the problem of
unnecessary index vacuuming. So if you go from 5 vacuums per day to 20
through tuning alone, the number of vacuum operations that do
unnecessary index vacuuming doesn't increase at all (except perhaps
due to rounding effects).

-- 
Peter Geoghegan




Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Peter Geoghegan
On Mon, Sep 6, 2021 at 8:59 AM Tom Lane  wrote:
> Assuming that that choice was made appropriately, I think the advice you
> propose here will just cause people to waste lots of cycles on VACUUM
> runs that have only marginal effects.

Right. The advice that they should receive (if any) is to tune
autovacuum aggressively, and enable autovacuum log output. The log
output reports on whether or not the implementation applied the
optimization in each case.

As I pointed out to Laurenz just now, users that care about index-only
scans are actually the big beneficiaries here. Now they can set
autovacuum_vacuum_insert_threshold very aggressively, without doing a
useless round of index vacuuming just because one inserting
transaction out of a million aborted. Once indexes are removed from
the equation (to the extent that that makes sense), each round of
vacuuming by autovacuum only needs to do work that is proportional to
the number of unset-in-vm heap pages.

I believe that that trade-off makes a lot of sense. Autovacuum has
little chance of keeping anything like 100% of all pages set in the VM
anyway. But it can get a lot closer to it in some cases now.

-- 
Peter Geoghegan




Re: Behavior change in PostgreSQL 14Beta3 or bug?

2021-09-06 Thread Peter Geoghegan
On Mon, Sep 6, 2021 at 7:52 AM Daniel Westermann (DWE)
 wrote:
> >Try running vacuum with index cleanup = on.
>
> Thank you, Peter

Thanks for testing!

-- 
Peter Geoghegan




Choosing an index on partitioned tables.

2021-09-06 Thread Tim Uckun
I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column  As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?




Re: Choosing an index on partitioned tables.

2021-09-06 Thread Tim Uckun
Hi Brent.

I looked at timescaledb. It does make partitioning on date ranges
automatic  which is awesome and as you said it does add a couple of extra
features on top of postgres but their cloud offering are much more
expensive than buying a generic postgres instance from AWS. A generic
t3.medium on amazon costs ~90 dollars per month in singapore and ~140 per
month from timescale.

Is it really worth the extra expense?



On Tue, Sep 7, 2021 at 4:06 PM Brent Wood  wrote:

> Hi Tim,
>
> I've had good success with TimescaleDB for large timesries databases (40b
> readings).
> https://www.timescale.com/
>
> You turn your timestamp table into a Timescale hypertable and it looks
> after the indexing and partitioning automatically, with the table accessed
> like a normal postgres table, but very quickly.
>
> It also adds some SQL functions to add a bit of time based query
> functionality.
>
>
> Cheers
>
> Brent Wood
>
> Principal Technician, Fisheries
> NIWA
> DDI:  +64 (4) 3860529
>
> --
> *From:* Tim Uckun 
> *Sent:* Tuesday, September 7, 2021 15:44
> *To:* pgsql-general 
> *Subject:* Choosing an index on partitioned tables.
>
> I have a series of tables which are going to be queries mostly on two
> columns. A timestamp table and a metric type column.
>
> My plan is to partition by date ranges which means the primary key has
> to include the timestamp column and the id column  As far as I know
> there is no way to specify an index type for those columns.
>
> The metric type is a text column and will not be very selective. It
> will have somewhere around 200 types of metrics and they will all be
> short, less than ten characters.
>
> Given that there will be a lot of records I was wondering what type of
> index would be ideal for that column. Seems like hash indexes would be
> ideal because only comparison will be = and they are smaller than
> Btrees but for a while they were not recommended.
>
> Would hash be the best or would something work better?
>
>
> 
> Brent Wood
> Principal Technician - GIS and Spatial Data Management
> Programme Leader - Environmental Information Delivery
> +64-4-386-0529
>
> National Institute of Water & Atmospheric Research Ltd (NIWA)
> 301 Evans Bay Parade Hataitai Wellington New Zealand
> *Connect with NIWA:* niwa.co.nz  Facebook
>  LinkedIn
>  Twitter
>  Instagram
>  To ensure compliance with legal
> requirements and to maintain cyber security standards, NIWA's IT systems
> are subject to ongoing monitoring, activity logging and auditing. This
> monitoring and auditing service may be provided by third parties. Such
> third parties can access information transmitted to, processed by and
> stored on NIWA's IT systems
>