Re: pg_stat_statements oddity with track = all

2020-12-01 Thread Nikolay Samokhvalov
On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud  wrote:

> Hi,
>
> Someone raised an interested point recently on pg_stat_kcache extension for
> handling nested statements, which also applies to pg_stat_statements.
>
...

> The only idea I have for that is to add a new field to entry key, for
> instance
> is_toplevel.


This particular problem has often bothered me when dealing with
pg_stat_statements contents under "track = all" (especially when
performing aggregated analysis, like you showed).
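
For example, a typical aggregated analysis looks something like this
(a rough sketch; note that with "track = all", nested statements are
counted in addition to the time already included in their top-level
callers, so such sums can exceed the real load; "is_toplevel" below is
only the column name proposed in this thread, it doesn't exist yet):

select sum(total_exec_time) as total_ms,  -- total_time in PG 12 and older
       sum(calls) as calls
from pg_stat_statements;

-- with the proposed flag it would become possible to separate the two:
-- select sum(total_exec_time) from pg_stat_statements where is_toplevel;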

I think the idea of having a flag to distinguish the top-level entries is
great.


> The immediate cons is obviously that it could amplify quite a lot
> the number of entries tracked, so people may need to increase
> pg_stat_statements.max to avoid slowdown if that makes them reach frequent
> entry eviction.
>

If all top-level records in pg_stat_statements have "true" in the new
column (is_toplevel), how would this lead to the need to increase
pg_stat_statements.max? The number of records would remain the same, as
before extending pg_stat_statements.


Re: pg_stat_statements oddity with track = all

2020-12-02 Thread Nikolay Samokhvalov
On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud  wrote:

> On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:
> > If all top-level records in pg_stat_statements have "true" in the new
> > column (is_toplevel), how would this lead to the need to increase
> > pg_stat_statements.max? The number of records would remain the same, as
> > before extending pg_stat_statements.
>
> If the same query is getting executed both at top level and as a nested
> statement, two entries will then be created.  That's probably unlikely for
> things like RI trigger queries, but I don't know what to expect for client
> application queries.
>

Right, but this is how things already work. The extra field you've proposed
won't increase the number of records so it shouldn't affect how users
choose pg_stat_statements.max.


Re: Commitfest 2020-11 is closed

2020-12-03 Thread Nikolay Samokhvalov
On Wed, Dec 2, 2020 at 2:36 PM Andrew Dunstan  wrote:

>
> On 12/2/20 5:13 PM, Thomas Munro wrote:
> >
> > I'm experimenting with Github's built in CI.  All other ideas welcome.
>
>
> I'd look very closely at gitlab.
>

+1.

Why:
- we've had a great experience with it for more than 2 years
- if needed, there is an open-source version of it
- it's possible to set up your own CI [custom] runners even when you're
working with their SaaS
- finally, it runs on Postgres itself


Re: Add important info about ANALYZE after create Functional Index

2020-10-26 Thread Nikolay Samokhvalov
On Mon, Oct 26, 2020 at 3:46 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> It would seem preferable to call the lack of auto-analyzing after these
> operations a bug and back-patch a fix that injects an analyze side-effect
> just before their completion.  It doesn't have to be smart either,
> analyzing things even if the created (or newly validated) index doesn't
> have statistics of its own isn't a problem in my book.
>

+1 to considering it a major problem of CREATE INDEX [CONCURRENTLY] for
indexes on expressions; it's very easy to forget, as I've observed many
times.
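
A trivial sketch of what people forget (table and column names are
made up for illustration):

create index concurrently i_users_lower_email on users (lower(email));
analyze users;  -- without this, the planner has no statistics for the
                -- lower(email) expression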

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

And another question: how should ANALYZE be run? If it's under the
user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

Thanks,
Nik


Re: Add important info about ANALYZE after create Functional Index

2020-10-26 Thread Nikolay Samokhvalov
On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Monday, October 26, 2020, Nikolay Samokhvalov 
> wrote:
>>
>> Although, this triggers a question – should ANALYZE be automated in, say,
>> pg_restore as well?
>>
>
> Independent concern.
>

It's the same class of issues – after we create some objects, we lack
statistics and want to automate their collection. If the approach is
automated in one case, it should be automated in the others, for
consistency.

And another question: how ANALYZE needs to be run? If it's under the
>> user's control, there is an option to use vacuumdb --analyze and benefit
>> from using -j to parallelize the work (and, in some cases, benefit from
>> using --analyze-in-stages). If we had ANALYZE as a part of building indexes
>> on expressions, should it be parallelized to the same extent as index
>> creation (controlled by max_parallel_maintenance_workers)?
>>
>
> None of that seems relevant here.  The only relevant parameter I see is
> what to specify for “table_and_columns”.
>

I'm not sure I follow.

Thanks,
Nik


Re: Add important info about ANALYZE after create Functional Index

2020-10-27 Thread Nikolay Samokhvalov
On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

> Would be nice if add some information about it into our docs but not sure
> where. I'm thinking about:
> - doc/src/sgml/ref/create_index.sgml
> - doc/src/sgml/maintenance.sgml (routine-reindex)
>

Attaching the patches for the docs, one for 11 and older, and another for
12+ (which has REINDEX CONCURRENTLY and does not suffer from the lack of
ANALYZE).

I still think that automating this is the right thing to do but, of course,
it's a much bigger topic than a quick fix for the docs.
From 7e846e6864b48be51bb0afee455f853debe10cb9 Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov 
Date: Tue, 27 Oct 2020 06:26:50 +
Subject: [PATCH 1/2] Rebuilding indexes on expressions requires ANALYZE

It is critically important to run ANALYZE before dropping the old index.
This is only relevant to Postgres versions up to 11, which do not have
REINDEX CONCURRENTLY.
---
 doc/src/sgml/maintenance.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 3b649575e9..c430207deb 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -895,7 +895,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
option can instead be recreated that way. If that is successful and the
resulting index is valid, the original index can then be replaced by
the newly built one using a combination of 
-   and . When an index is used to enforce
+   and . In the case of rebuilding an index on
+   an expression, it is important to run ANALYZE on the
+   table before dropping the original one. When an index is used to enforce
uniqueness or other constraints,  might
be necessary to swap the existing constraint with one enforced by
the new index. Review this alternate multistep rebuild approach
-- 
GitLab


From 58fef5bca6c38e5c6d6ff23896c5f7130631b244 Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov 
Date: Tue, 27 Oct 2020 00:03:34 -0700
Subject: [PATCH 2/2] Building indexes on expressions requires ANALYZE

It is critically important to run ANALYZE after an index on an
expression is created.

- add a note to "Indexes on Expressions"
- mention that ANALYZE is needed when performing index maintenance
for indexes on expressions (this makes sense only for Postgres versions
11 and older, since newer versions have support for REINDEX CONCURRENTLY,
which doesn't suffer from the lack of ANALYZE)
---
 doc/src/sgml/indices.sgml | 9 +
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 210a9e0adf..41adbbd4ad 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -734,6 +734,15 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
query.  Thus, indexes on expressions are useful when retrieval speed
is more important than insertion and update speed.
   
+
+  
+   Note
+   
+Once an index on an expression is successfully created, it is important to
+run ANALYZE on the corresponding table to gather
+statistics for the expression.
+   
+  
  
 
From c6ec56913d933f328f4e54ac1ab7123a02d4bccf Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov 
Date: Tue, 27 Oct 2020 07:01:34 +
Subject: [PATCH] Building indexes on expressions requires ANALYZE

It is critically important to run ANALYZE after an index on an
expression is created.
---
 doc/src/sgml/indices.sgml | 9 +
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 671299ff05..bb5d7dfdd5 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -741,6 +741,15 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
query.  Thus, indexes on expressions are useful when retrieval speed
is more important than insertion and update speed.
   
+
+  
+   Note
+   
+Once an index on an expression is successfully created, it is important to
+run ANALYZE on the corresponding table to gather
+statistics for the expression.
+   
+  
  


Re: Add sub-transaction overflow status in pg_stat_activity

2021-12-06 Thread Nikolay Samokhvalov
On Mon, Dec 6, 2021 at 8:16 PM Dilip Kumar  wrote:

> If the subtransaction cache is overflowed in some of the transactions
> then it will affect all the concurrent queries as they need to access
> the SLRU for checking the visibility of each tuple.  But currently
> there is no way to identify whether in any backend subtransaction is
> overflowed or what is the current active subtransaction count.


I think it's a good idea – I had the same need when recently researching
various issues with subtransactions [1] and needed to patch Postgres in
benchmarking environments. To be fair, for PG 13+ there is a way to
understand that the overflowed state is reached: on standbys, observe reads
of Subtrans in pg_stat_slru. But of course, it's an indirect way.
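For example, something like this on a standby (SLRU name as spelled in
PG 13; a growing blks_read for Subtrans is the indirect sign):

select name, blks_read, blks_hit
from pg_stat_slru
where name = 'Subtrans';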

I see that the patch adds two new columns to pg_stat_activity:
subxact_count and subxact_overflowed. This will be helpful to have.
Additionally, exposing the lastOverflowedXid value would also be good for
troubleshooting subtransaction edge and corner cases – a bug recently
fixed in all current versions [2] was really tricky to troubleshoot in
production because this value is not visible to DBAs.

[1]
https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful
[2] https://commitfest.postgresql.org/36/3399/


Re: pspg pager is finished

2021-03-20 Thread Nikolay Samokhvalov
On Fri, Mar 19, 2021 at 20:35 Pavel Stehule  wrote:

> Hi
>
> I finished work on pspg.
>
> https://github.com/okbob/pspg
>

Thank you, Pavel. I always use it when possible, and highly recommend it to
others.

> 
>
Nik


Re: lastOverflowedXid does not handle transaction ID wraparound

2021-10-25 Thread Nikolay Samokhvalov
On Thu, Oct 21, 2021 at 07:21 Stan Hu  wrote:

> On Wed, Oct 20, 2021 at 9:01 PM Kyotaro Horiguchi
>  wrote:
> >
> > lastOverflowedXid is the smallest subxid that possibly exists but
> > possiblly not known to the standby. So if all top-level transactions
> > older than lastOverflowedXid end, that means that all the
> > subtransactions in doubt are known to have been ended.
>
> Thanks for the patch! I verified that it appears to reset
> lastOverflowedXid properly.
>

Is it the right time to register the patch in the current commitfest?
(How do I do that?)

On a separate note, I think it would be really good to improve
observability for SLRUs – particularly for the Subtrans SLRU and these
overflow-related aspects. pg_stat_slru, added in PG 13, is really helpful,
but not enough to troubleshoot, analyze, and tune issues like this and the
patches related to SLRU. Basic ideas:
- expose to the user how many pages are currently used (especially useful
if SLRU sizes become configurable, see
https://commitfest.postgresql.org/34/2627/)
- Andrey Borodin also expressed the idea to extend pageinspect to allow
seeing the contents of SLRUs
- a more specific thing: allow seeing lastOverflowedXid somehow (via SQL or
in logs) – we see how important it is for standbys' health, but we cannot
see it now.

Any ideas in the direction of observability?

Nik

>


Re: lastOverflowedXid does not handle transaction ID wraparound

2021-11-01 Thread Nikolay Samokhvalov
On Mon, Oct 25, 2021 at 11:41 AM Nikolay Samokhvalov 
wrote:

> On Thu, Oct 21, 2021 at 07:21 Stan Hu  wrote:
>
>> On Wed, Oct 20, 2021 at 9:01 PM Kyotaro Horiguchi
>>  wrote:
>> >
>> > lastOverflowedXid is the smallest subxid that possibly exists but
>> > possiblly not known to the standby. So if all top-level transactions
>> > older than lastOverflowedXid end, that means that all the
>> > subtransactions in doubt are known to have been ended.
>>
>> Thanks for the patch! I verified that it appears to reset
>> lastOverflowedXid properly.
>
> ...

> Any ideas in the direction of observability?
>

Perhaps, anything additional should be considered separately.

The behavior discussed here looks like a bug.

I have also tested the patch. It works fully as expected; details of the
testing are below.

I think this is a serious bug hitting heavily loaded Postgres setups with
hot standbys, and I propose fixing it in all supported major versions ASAP
since the fix looks simple.

Any standby in a heavily loaded system (10k+ TPS) where subtransactions
are used may experience huge performance degradation [1]. This is what
happened recently with GitLab [2]. While a full solution to this problem is
something more complex, probably requiring changes in SLRU [3], the problem
discussed here definitely feels like a serious bug: if we fully get rid of
subtransactions, since the 32-bit lastOverflowedXid is not reset, in a new
XID epoch standbys start experiencing SubtransControlLock/SubtransSLRU
waits again – without any subtransactions at all. This problem is extremely
difficult to diagnose on one hand, and on the other it may make standbys
fully unresponsive while a long-lasting transaction lasts on the primary
("long" here may be a matter of minutes or even dozens of seconds – it
depends on the TPS level). It is especially hard to diagnose in PG 12 or
older, because those versions don't have pg_stat_slru yet, so one cannot
easily notice Subtrans reads.

The only current solution to this problem is to restart standby Postgres.

How I tested the patch. First, I reproduced the problem:
- current 15devel Postgres, installed on 2 x c5ad.2xlarge on AWS (8 vCPUs,
16 GiB), working as
primary + standby
- follow the steps described in [3] to initiate SubtransSLRU on the standby
- at some point, stop using SAVEPOINTs on the primary - use regular UPDATEs
instead, wait.

Using the following, observe procArray->lastOverflowedXid:

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bd3c7a47fe21949ba63da26f0d692b2ee618f885..ccf3274344d7ba52a6f28a10b08dbfc310cf97e9 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2428,6 +2428,9 @@ GetSnapshotData(Snapshot snapshot)
 		subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
 												  xmax);
 
+		if (random() % 10 == 0)
+			elog(WARNING, "procArray->lastOverflowedXid: %u", procArray->lastOverflowedXid);
+
 		if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
 			suboverflowed = true;
 	}

Once we stop using SAVEPOINTs on the primary, the value of
procArray->lastOverflowedXid stops changing, as expected.

Without the patch applied, lastOverflowedXid remains constant forever –
until the server is restarted. And as I mentioned, we start experiencing
SubtransSLRU waits and pg_subtrans reads.

With the patch, lastOverflowedXid is reset to 0, as expected, shortly after
the ongoing "long" transaction ends on the primary.

This solves the bug – we don't get SubtransSLRU waits on standbys when
subtransactions are not actually used on the primary.

[1]
https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful
[2]
https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/
[3]
https://www.postgresql.org/message-id/flat/494C5E7F-E410-48FA-A93E-F7723D859561%40yandex-team.ru#18c79477bf7fc44a3ac3d1ce55e4c169
[4]
https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/21


Re: lastOverflowedXid does not handle transaction ID wraparound

2021-11-01 Thread Nikolay Samokhvalov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

The fix is trivial and works as expected, solving the problem

Tested, described details of the testing in the email thread.

Re: lastOverflowedXid does not handle transaction ID wraparound

2021-11-02 Thread Nikolay Samokhvalov
On Mon, Nov 1, 2021 at 11:55 PM Nikolay Samokhvalov 
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed


Please ignore this – I didn't understand the UI.


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Nikolay Samokhvalov
On Tue, Nov 2, 2021 at 8:55 AM Robert Haas  wrote:

> I think shipping with log_checkpoints=on and
> log_autovacuum_min_duration=10m or so would be one of the best things
> we could possibly do to allow ex-post-facto troubleshooting of
> system-wide performance issues.
>

Fully agree, it would be really great. Glad that you added autovacuum
logging into the picture.

Maybe 1m would be better – I recently observed a system with high TPS
where autoANALYZE took very long, causing a huge slowdown on standbys
that started a few minutes after ANALYZE was launched.
Correlation and then causation were confirmed only thanks to
log_autovacuum_min_duration being configured – without it, no one would
have been able to understand what happened.

Back to checkpoint logging. With log_checkpoints = off and high write
activity with a low max_wal_size, we're already "spamming" the logs with
lots of "checkpoints are occurring too frequently" messages – and this
happens very often; any DBA who has run a restore process on Postgres with
the default max_wal_size (1GB, very low for modern DBs) has seen it.

Without details, this definitely looks like "spam" – and it's already
happening here and there. The details provided by log_checkpoints = on
would give something to base the max_wal_size reconfiguration decision on:
data, not guesswork.

+1 for log_checkpoints = on
and +1 for log_autovacuum_min_duration = 1m or so.
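
In the meantime, this is what I'd set manually today (just a sketch,
values as discussed above; both settings only need a reload):

alter system set log_checkpoints = on;
alter system set log_autovacuum_min_duration = '1min';
select pg_reload_conf();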


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Nikolay Samokhvalov
On Tue, Nov 2, 2021 at 11:50 AM Robert Haas  wrote:

> I almost proposed 1m rather than 10m, but then I thought the better of
> it. I think it's unlikely that an autovacuum that takes 1 minute is
> really the cause of some big problem you're having on your system.
> Typical problem cases I see are hours or days long, so even 10 minutes
> is pretty short.
>

I'm talking about the autoANALYZE part, not VACUUM. In my case, it was a
few tables ~100GB-1TB in size, with 1-2 GIN indexes (with fastupdate and
the default pending-list size limit, 4MB), 10 workers with a quite high bar
in terms of throttling, and default_statistics_target = 1000.
The observed autoANALYZE timing reached dozens of minutes, sometimes ~1 hour
for a table.
The problem is that, it seems, ANALYZE (unlike VACUUM) holds a snapshot and
takes an XID – and this leads to issues on standbys if it takes so long.
I'm going to post the findings in a separate thread, but the point is that
an autoANALYZE running for minutes *may* cause big performance issues.
That's why 1m seems a good threshold to me, even if it leads to having
3 log entries per minute from 3 workers. It's quite low log traffic, but
the data there is really useful for retrospective analysis.


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Nikolay Samokhvalov
On Tue, Nov 2, 2021 at 5:02 PM Tom Lane  wrote:

> I'm still of the position that the default ought to be that a
> normally-functioning server generates no ongoing log output.
> Only people who have got Nagios watching their logs, or some
> such setup, are going to want anything different.  And that is
> a minority use-case.  There are going to be way more people
> bitching because their postmaster log overflowed their disk
> than there will be people who are happier because you made
> such output the default.  (Don't forget that our default
> logging setup does not rotate the logs.)
>

Is it known how many new Postgres installations come from
popular packages (that have log rotation enabled) compared
to custom builds managed in their own way?

If people do not use packages these days, they should take
care of these things themselves – including log rotation and, for example,
autostart. The same people who might complain about overflowed
disks should already be complaining about Postgres not surviving
machine restarts, right?

Nik


Re: Commitfest 2021-11 Patch Triage - Part 2

2021-11-12 Thread Nikolay Samokhvalov
On Tue, Nov 9, 2021 at 7:02 AM Tom Lane  wrote:

> Daniel Gustafsson  writes:
> > 2773: libpq compression
> > ===
> > This patch intended to provide libpq connection compression to "replace
> SSL
> > compression" which was doomed when the patch was written, and have since
> been
> > removed altogether.  The initial approach didn't get much traction but
> there
> > was significant discussion and work, which has since fizzled out.  The
> patch
> > has been updated but there hasn't been meaningful review the past
> months, the
> > last comments seem to imply there being a fair amount of questionmarks
> left in
> > here.  Robert, having been very involved in this do you have any
> thoughts on
> > where we are and where to go (if at all IYO)?
>
> I'm not Robert, but I still have an opinion here, and that it's that this
> feature would at best be an attractive nuisance.  If you need compression
> on a database session, it probably means that the connection is over the
> open internet, which means that you need encryption even more.
>

This assumption is very vague. I personally had multiple cases where network
bandwidth for app <--> Postgres communication was the bottleneck, which were
fixed either via upgrades (spending more money), by making application
developers cache more data (optimization), or, usually, both. Public
internet connections were never involved.

Actually, any growing startup experiences such issues sooner or later.
Network compression would be a great tool for many setups, and some other
popular database systems have offered it for a long time.


BUFFERS enabled by default in EXPLAIN (ANALYZE)

2021-11-12 Thread Nikolay Samokhvalov
Re-reading the thread [1] (I cannot answer there – I don't have those emails
in my mailbox anymore), I see that there was strong support for enabling
BUFFERS in EXPLAIN ANALYZE by default. And there were patches. The
commitfest entry [2] was marked Rejected because there were questions about
the GUC-based implementation.

Attached is a simple patch enabling BUFFERS by default, without involving
GUCs.

Why is it important?

In many cases, people forget about the BUFFERS option in EXPLAIN ANALYZE
and share execution plans without it – sending them via regular
communication channels for work or publishing them to visualization
systems. Meanwhile, the option has lower overhead compared to TIMING
(enabled by default for EXPLAIN ANALYZE) and it is extremely useful for
query optimization. This patch doesn't enable BUFFERS for EXPLAIN executed
without ANALYZE.
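
In other words, today one has to remember to type the option explicitly,
while with the patch the plain form would report buffers too (an
illustrative pair of statements):

explain (analyze, buffers) select count(*) from pg_class;  -- today
explain analyze select count(*) from pg_class;  -- would include buffer
                                                -- numbers with the patch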

Open questions:

1. Should BUFFERS be enabled for regular (w/o ANALYZE) EXPLAIN? Now it may
make sense because of the buffer numbers the planner uses. This patch
doesn't do that, but it definitely may make sense because it can help
people understand why planning time is so big, in some cases.

2. How to adjust documentation? Should EXPLAIN ANALYZE examples be adjusted
to use BUFFERS OFF (easier change) or show some example buffer numbers –
like it is done for timing and cost numbers? I tend to think that the
latter makes more sense.

3. How to adjust regression tests? Of course, many tests now fail. Same
question as for documentation: excluding buffer numbers would be an easier
fix, of course – but at least some tests should
check the behavior of BUFFERS (such as both default options – with and
without ANALYZE).
On any given platform, the buffer numbers are pretty stable, so we could
rely on them, but I'm not sure about all possible options being tested and
would appreciate advice here (of course, if the patch makes it through the
discussion in general).


## Links
[1]
https://www.postgresql.org/message-id/flat/b3197ba8-225f-f53c-326d-5b1756c77c3e%40postgresfriends.org
[2] https://commitfest.postgresql.org/28/2567/


001-buffers-in-explain-analyze-enabled-by-default.patch
Description: Binary data


Re: idea - custom menu

2018-03-26 Thread Nikolay Samokhvalov
On Sat, Mar 3, 2018 at 1:08 PM, Pavel Stehule 
wrote:

> Hi
>

Hi Pavel, I don't know how I missed this email, but finally I found it :-)


> I am looking on project https://github.com/NikolayS/postgres_dba
>
> Nice idea, and perfect publicity of typical PostgreSQL maintenance task.
>

Thank you! It's in very early stage, I'm going to polish existing reports
and add more soon.


>
> We can allow to call some external applications on some key press (not
> command) and evaluate their output. mc has F2 for User menu. It can works
> well with pdmenu or some similar. Some simple menu can be done on we lines
> with ncurses.
>

> It can be substitution of some custom commands.
>
> Comments, notes?
>

This is a great idea. Adding more reports means that the menu will be very
long. I considered having two layers of menu, but it would become less
convenient, first of all because of the necessity to press Enter after
typing each command. If the user could just press a single key combination,
it would make working with such a tool much more convenient. I would also
bind some key to allow repeating the last report without going to the main
menu at all. I had a look at pdmenu – well, if it were possible to use it
for postgres_dba, it would be awesome. Does it work on macOS, by the way?

Also (different story): all new Postgres/psql users hate its output for
wide result sets. Making your pspg more "available" for end users would
help a lot here – is it possible to ship it with psql itself, at least as
an optional feature? I don't know anyone who likes the default output of
wide tables in psql. pspg is very underestimated.

And one more thing (also a completely different story): wouldn't it be good
to have the ability to connect psql to a remote Postgres over an SSH tunnel
with just some option in psql? Currently, users need to set up an SSH tunnel
with the "ssh" command first, and only then connect psql. I see that for new
users it's difficult to manage such a thing, and some of them end up opening
Postgres to the world (it was mentioned recently in some article that at
least 700k Postgres instances were found with port 5432 open to the world),
so an SSH tunneling option would be great in psql.

And finally: several people asked me to use colors in reports, but it's not
currently possible in psql – I can use color output for the "\echo" command
now, but I cannot do anything like that for the server's response. For
instance, it would be nice to make some numbers colored/bold if they are
above/below some threshold value, for a specific column in the server's
output. It could be a \filter command specifying the rules of the output's
post-processing. Can ncurses be useful for such a thing?

Regards,
Nikolay


Re: Built-in connection pooling

2018-04-13 Thread Nikolay Samokhvalov
On Fri, Apr 13, 2018 at 2:59 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:
>
> Development in built-in connection pooling will be continued in
> https://github.com/postgrespro/postgresql.builtin_pool.git
> I am not going to send new patches to hackers mailing list any more.
>

Why?


Re: Built-in connection pooling

2018-04-17 Thread Nikolay Samokhvalov
Understood.

One more question. Have you considered creating the pooling tool as a
separate, not built-in, tool shipped with Postgres — like psql, which is
shipped in packages usually called "postgresql-client-XX", making it the
default tool for working in the terminal? I constantly hear the opinion
from various users that Postgres needs a "default"/official pooling tool.

Tue, Apr 17, 2018 at 0:44, Konstantin Knizhnik :

>
>
> On 13.04.2018 19:07, Nikolay Samokhvalov wrote:
>
> On Fri, Apr 13, 2018 at 2:59 AM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>>
>> Development in built-in connection pooling will be continued in
>> https://github.com/postgrespro/postgresql.builtin_pool.git
>> I am not going to send new patches to hackers mailing list any more.
>>
>
> Why?
>
>
> Just do not want to spam hackers with a lot of patches.
> Also since I received few feedbacks in this thread, I consider that this
> topic is not so interesting for community.
>
> Please notice that built-in connection pool is conn_pool branch.
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-19 Thread Nikolay Samokhvalov
+1 for the configuration option. Otherwise, migration is a nightmare -- so
many CTEs were written specifically to use the "optimization fence"
behavior. The lack of such configuration options is now a "migration fence".

On Sat, Oct 19, 2019 at 2:49 AM Colin Watson  wrote:

> On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote:
> > > "Michael" == Michael Paquier  writes:
> >  > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
> >  >> However, an alternative would be to backport the new syntax to some
> >  >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be
> >  >> synonymous with "WITH ... AS" in versions prior to 12; there's no
> >  >> need to support "NOT MATERIALIZED" since that's explicitly
> >  >> requesting the new query-folding feature that only exists in 12.
> >  >> Would something like the attached patch against REL_11_STABLE be
> >  >> acceptable? I'd like to backpatch it at least as far as PostgreSQL
> >  >> 10.
> >
> >  Michael> I am afraid that new features don't gain a backpatch. This is
> >  Michael> a project policy. Back-branches should just include bug fixes.
> >
> > I do think an argument can be made for making an exception in this
> > particular case. This wouldn't be backpatching a feature, just accepting
> > and ignoring some of the new syntax to make upgrading easier.
>
> Right, this is my position too.  I'm explicitly not asking for
> backpatching of the CTE-inlining feature, just trying to cope with the
> fact that we now have to spell some particular queries differently to
> retain the performance characteristics we need for them.
>
> I suppose an alternative would be to add a configuration option to 12
> that allows disabling inlining of CTEs cluster-wide: we could then
> upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant
> queries, and then re-enable inlining.  But I like that less because it
> would end up leaving cruft around in PostgreSQL's configuration code
> somewhat indefinitely for the sake of an edge case in upgrading to a
> particular version.
>
> --
> Colin Watson[cjwat...@canonical.com]
>
>
>


Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-19 Thread Nikolay Samokhvalov
On Sat, Oct 19, 2019 at 8:11 AM Isaac Morland 
wrote:

> That embeds a temporary hack in the application code indefinitely.
>

Or postpone the migration indefinitely. I have seen so many times how
migrations in large companies were postponed because of similar "small"
issues – when the code base is large, it's easier for managers to say
something like "no, we'd better live without cool new features for a couple
more years than put our systems at risk due to lack of testing".

Nobody has invented an excellent way to test production workloads in
non-production environments yet. I know it very well because I have also
been working in this direction for a couple of years. So this feature (a
great one) seems to me a major roadblock for DBAs and developers who would
like to migrate to 12 and get better performance and all the new features.
Ironically, including this one, for new or updated code!

If you need to patch all your code adding "AS MATERIALIZED" (and you will
need to if you want to minimize the risks of performance degradation after
the upgrade), then that is also a temporary hack – but one that is much,
much more expensive to implement in large projects, and sometimes not even
possible.
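
For illustration, this is the kind of per-query edit required (a
hypothetical query; table and column names are made up):

-- before, relying on the pre-12 optimization fence:
with c as (select * from orders where status = 'new')
select * from c where customer_id = 42;

-- after, to keep the same plan shape on 12+:
with c as materialized (select * from orders where status = 'new')
select * from c where customer_id = 42;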

I do think that the lack of this configuration option will prevent some
projects from migrating for a long time.

Correct me if I'm wrong. Maybe somebody has already thought about migration
options here and has good answers? What is the best way to upgrade if you
have dozens of multi-terabyte databases, a lot of legacy code, and workloads
where 1 minute of downtime or even performance degradation would cost a lot
of money and so is not acceptable? What would be the good answers here?


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-03-18 Thread Nikolay Samokhvalov
Hello

On Sat, Mar 16, 2019 at 7:32 AM Robert Haas  wrote:

> Also, I think this is now the third independent request to expose
> query ID in pg_stat_statements.  I think we should give the people
> what they want.
>

Count me as the 4th.

This would be a very important feature for automated query analysis.
pg_stat_statements lacks query examples, and the only way to get them is
from the logs, where we don't have queryid either. So people end up either
doing it manually or writing yet another set of nasty regular expressions.

Routine query analysis is crucial for any large project. If there is a
chance to implement queryid for pg_stat_activity (or anything that will
allow automating query analysis) in Postgres 12 or later – this would be
great news and huge support for engineers.
Same level as the recently implemented sampling for statement logging.
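
For example, automation could then be as simple as something like this
(a sketch only; query_id in pg_stat_activity is the requested column,
which does not exist yet):

select a.pid, a.state, s.calls, s.total_time, s.query
from pg_stat_activity a
join pg_stat_statements s on s.queryid = a.query_id  -- hypothetical column
where a.state <> 'idle';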

By the way, if queryid goes into core someday, I'm sure it is worth
considering using it in logs as well.

Thanks,
Nik


Re: PostgreSQL and Real Application Testing (RAT)

2019-08-27 Thread Nikolay Samokhvalov
On Tue, Aug 27, 2019 at 3:47 AM ROS Didier  wrote:

> Hi
>
>
>
> In my business, one of the things blocking the migration from Oracle to
> PostgreSQL is not having the equivalent of Oracle Real Application Testing .
>
> This product captures a charge in production and replay it in a test
> environment.
>
> this allows to know the impacts of a migration to a newer version, the
> creation of an index..
>
> is there an equivalent in the PostgreSQL community?
>
> if not, do you think it's technically possible to do it ?
>
> who would be interested in this project ?
>

Replaying workload might or might not apply well to your case.

There are several major difficulties if you want to replay workload:

1) How to "record" workload. You need to write all your queries to the
Postgres log. Three problems here:
  1a) pgreplay expects log_statements = 'all' while you might prefer
dealing with log_min_duration_statement instead. This is a minor issue
though, quite easy to solve with preprocessing.
  1b) under heavy load, log_min_duration_statement = 0 (or log_statements =
'all') will lead to performance degradation or even downtime. Possible
solutions are: write to memory, or don't write at all but send over the
network.
  1c) ideally, recoding just queries is not enough. To replay workload "as
is", we need to replay queries with known plans. There is no easy solution
to this problem in the Postgres ecosystem yet.

A couple of additional points regarding items 1b and 1c. In Postgres 12,
there is a cool new capability: sampling for query logging,
implemented by Adrien
Nayrat https://commitfest.postgresql.org/20/1691/  With this, it will be
possible to fully log, say, 5% of all transactions and use them for
replaying. Moreover, with auto_explain, it will be possible to have plans!
Open questions are: (a) how to determine if N% is enough, and (b) how to
replay with the specified plans. [If anyone is interested in working in this
direction – please reach out to me.]

2) Issues with replaying itself. I can highlight at least two problems here:
  2a) pgreplay might not be enough for your workload; it doesn't scale
well. If interested, look at its analog written in Go,
https://github.com/gocardless/pgreplay-go, but this is quite a young
project.
  2b) Postgres logs have millisecond precision (if you switched from %t to
%m in log_line_prefix), and this might not be enough. There is a patch for
microsecond precision from David Fetter
https://www.postgresql.org/message-id/flat/20181023185050.GE6049%40fetter.org,
but that conversation hasn't yet led to a commit.

Another approach you might be interested in – workload simulation. This is
what we (Postgres.ai) now use most of the time when building "lab"
environments for our clients. The idea is as follows:
- carefully analyze the workload using pg_stat_statements (here, our
open-source tool called "postgres-checkup"
https://gitlab.com/postgres-ai/postgres-checkup might be helpful, see the
reports in section K),
- take the most resource-consuming query groups (Top-N ordered by
total_time) – see a query sketch after this list,
- create a set of files with statements with randomly filled parameters
(won't work for most cases, I discuss restrictions below),
- use pgbench, feeding the workload files to it using multiple -f options,
with balancing (-f filename@XX, where XX is to be taken from the
pg_stat_statements analysis, but this time "calls" and their ratio in the
whole workload will be needed – again, postgres-checkup can help here),
- run, analyze, compare behavior.
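
The Top-N step mentioned above could look roughly like this (a sketch;
column names as of PG 12, total_time became total_exec_time in PG 13):

select queryid, calls, total_time,
       round((100 * total_time / sum(total_time) over ())::numeric, 2)
         as share_pct
from pg_stat_statements
order by total_time desc
limit 20;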

Restrictions of this approach are obvious:
- doesn't work well if most of your transactions have multiple statements,
- in many cases, randomization is hard (it's not obvious how to organize it;
a synthetic approach is far from the real data distribution in storage and
workload; etc.),
- the approach requires a significant amount of manual effort.

However, the "workload simulation" approach is an extremely helpful
approach in many cases, helping with change management. It doesn't require
anything that might negatively affect your production workload, it utilizes
pgbench (or any other tool) which is reliable, has great features and
scales well.

You might be interested in looking at our tool that we built to conduct a
huge amount of DB experiments, Nancy CLI
https://gitlab.com/postgres-ai/nancy. It supports both "workload replay"
method (with pgreplay) and "workload simulation" (with pgbench). PM me if
you're interested in discussing details.

Thanks,
Nik


Re: [PATCH] Increase the maximum value track_activity_query_size

2019-12-21 Thread Nikolay Samokhvalov
Here is what ORMs do:

select length('SELECT "column_name_1001", "column_name_1002",
"column_name_1003", "column_name_1004", "column_name_1005",
"column_name_1006", "column_name_1007", "column_name_1008",
"column_name_1009", "column_name_1010", "column_name_1011",
"column_name_1012", "column_name_1013", "column_name_1014",
"column_name_1015", "column_name_1016", "column_name_1017",
"column_name_1018", "column_name_1019", "column_name_1020",
"column_name_1021", "column_name_1022", "column_name_1023",
"column_name_1024", "column_name_1025", "column_name_1026",
"column_name_1027", "column_name_1028", "column_name_1029",
"column_name_1030", "column_name_1031", "column_name_1032",
"column_name_1033", "column_name_1034", "column_name_1035",
"column_name_1036", "column_name_1037", "column_name_1038",
"column_name_1039", "column_name_1040", "column_name_1041",
"column_name_1042", "column_name_1043", "column_name_1044",
"column_name_1045", "column_name_1046", "column_name_1047",
"column_name_1048", "column_name_1049", "column_name_1050" FROM
"some_table";');
 length
--------
   1024
(1 row)

That's it – with default settings, you won't see the WHERE clause or
anything else.

It is not only about analytical workloads. I see it for regular OLTP
workloads in literally *any* large project that uses an ORM. Ruby on Rails'
ActiveRecord does it, Java's Hibernate does, and so on.

As a result, many queries exceed track_activity_query_size, and we
end up having queries trimmed in pg_stat_activity. Why it is bad: it
makes automated analysis involving pg_stat_activity impossible, and
it complicates any manual troubleshooting involving pg_stat_activity.

Changing this parameter in a mission-critical database is difficult
because it requires a restart.

+1 for changing it to 1M, or at least to 100k. If the penalty is significant,
at least to 10k.

What is the overhead here except the memory consumption?
Consumption of, say, 100 * 1M = 100 MiB of RAM is a low price for much
better transparency here. But what about the performance penalty?
Some benchmark would be nice to answer this.

On Fri, Dec 20, 2019 at 6:48 PM Michael Paquier  wrote:

> On Fri, Dec 20, 2019 at 08:57:04AM -0500, Bruce Momjian wrote:
> > I can imagine using larger pgstat_track_activity_query_size values for
> > data warehouse queries, where they are long and there are only a few of
> > them.
>
> Why are those queries that long anyway?  A too long IN clause with an
> insane amount of parameters which could be replaced by an ANY clause
> with an array?
> --
> Michael
>


Using old master as new replica after clean switchover

2018-10-24 Thread Nikolay Samokhvalov
Currently, the documentation explicitly states, that after failover, the
old master must be recreated from scratch, or pg_rewind should be used
(requiring wal_log_hints to be on, which is off by default):

> The former standby is now the primary, but the former primary is down and
might stay down. To return to normal operation, a standby server must be
recreated, either on the former primary system when it comes up, or on a
third, possibly new, system. The pg_rewind utility can be used to speed up
this process on large clusters.

My research shows that some people already rely on the following planned
failover (aka switchover) procedure, doing it in production:

 1) shutdown the current master
 2) ensure that the "master candidate" replica has received all WAL data,
including the shutdown checkpoint, from the old master (see a sketch of
such a check after this list)
 3) promote the master candidate to make it new master
 4) configure recovery.conf on the old master node, while it's inactive
 5) start the old master node as a new replica following the new master.
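
For step 2, the check could look like this on the master candidate, after
the old master has been shut down cleanly (a sketch; the values must match
the "Latest checkpoint location" reported by pg_controldata on the old
master's data directory):

select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();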

It looks to me now that, if no steps are missed in the procedure, this
approach is valid for Postgres versions 9.3+ (for older versions like 9.3,
maybe not always – people who know the details better will correct me
here). Am I right? Or am I missing some risks here?

Two changes were made in 9.3 which allowed this approach in general [1]
[2]. Also, I see from the code [3] that during the shutdown process, the
walsenders are the last to be stopped, which allows replicas to get the
shutdown checkpoint information.

Is this approach considered safe now?

If so, let's add it to the documentation, making it official. The patch is
attached.

Links:
[0] 26.3 Failover
https://www.postgresql.org/docs/current/static/warm-standby-failover.html
[1] Support clean switchover
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=985bd7d49726c9f178558491d31a570d47340459
[2] Allow a streaming replication standby to follow a timeline switch
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=abfd192b1b5ba5216ac4b1f31dcd553106304b19
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/walsender.c;hb=HEAD#l276


Regards,
Nik


failover_doc.patch
Description: Binary data


Re: Using old master as new replica after clean switchover

2018-10-25 Thread Nikolay Samokhvalov
On Thu, Oct 25, 2018 at 6:03 AM Jehan-Guillaume de Rorthais 
wrote:

> What about logging the shutdown checkpoint on the old master?
> On the standby side, we could cross-check it with a function confirming:
> 1/ the very last XLogRecord received was the old master shutdown checkpoint
> 2/ the received shutdown checkpoint has the same LSN
>

Additionally, the new instructions in the doc might include a
recommendation on what to do if we find that the shutdown checkpoint wasn't
received and replayed by the replica-to-promote. From my understanding,
before promotion, we could "manually" transfer the missing WAL data from
the old, inactive master and replay it on the replica-to-promote (of
course, if restore_command is properly configured on it). Right?

By the way, it looks to me that it might be better to write more than
just a few sentences – what if it becomes a new chapter, say "Switchover",
next to "Failover"? It would also give readers a better understanding,
explicitly distinguishing the planned and unplanned processes of
master/replica role changes.

Regards,
Nik


Re: New GUC to sample log queries

2018-11-29 Thread Nikolay Samokhvalov
On Thu, Nov 29, 2018 at 1:49 PM Alvaro Herrera 
wrote:

> Thanks!  I pushed this with two changes -- one was to reword the docs a
> bit more, and the other was to compute in_sample only if it's going to
> be used (when exceeded is true).  I hope this won't upset any compilers ...
>

This is great news – I can imagine how helpful this feature will be for
query analysis and troubleshooting.

At the same time, there is an approach where we use application (client)
code or pgbouncer's connect_query parameter to perform sampled logging
(with log_min_duration_statement = 0) of n% of all *sessions* or
*transactions*.
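
For example, connect_query could run something like this for every new
server connection, enabling full statement logging for roughly 5% of them
(an illustrative sketch only; the numbers are arbitrary and the setting
change requires sufficient privileges):

select set_config('log_min_duration_statement',
                  case when random() < 0.05 then '0' else '-1' end,
                  false);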

If you use single-query transactions only, the new parameter will do an
equivalent job for you, while significantly simplifying your life
(pgbouncer is not required and you don't need to patch application code).
However, if you have multi-statement transactions,
log_statement_sample_rate will not be enough for troubleshooting – logging
just a single statement of a multi-statement transaction won't really help
to troubleshoot in many cases.

That being said, I wonder: does it make sense to think about extending the
functionality just committed with some option to log all statements of n%
of transactions (or sessions)? In other words, allowing one to choose at
which level sampling is performed – statement, transaction, or session?

Nik


Re: UUID v7

2024-01-21 Thread Nikolay Samokhvalov
On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin 
wrote:

>
>
> > On 19 Jan 2024, at 13:25, Andrey Borodin  wrote:
> >
> > Also, I've added some documentation on all functions.
>
> Here's v12. Changes:
> 1. Documentation improvements
> 2. Code comments
> 3. Better commit message and reviews list
>

Thank you, Andrey! I have just checked v12 – it applied cleanly to HEAD, and
the functions work well. I especially like the fact that we keep
uuid_extract_time(..) here – this is a great thing to have for time-based
partitioning, and in many cases we will be able to decide not to have a
creation timestamp column (e.g., "created_at") at all, saving 8 bytes.
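
For example, a sketch of what this enables (using the function names from
the patch, assuming uuid_extract_time() is immutable; table and partition
names are made up):

create table events (
  id uuid not null default uuidv7(),
  payload text
) partition by range ((uuid_extract_time(id)));

create table events_2024_01 partition of events
  for values from ('2024-01-01') to ('2024-02-01');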

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in
pg17. Yes, the RFC is not fully finalized yet, but it's very close. And
many libraries already include implementations of UUIDv7 – here are
some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik


Re: UUID v7

2024-01-21 Thread Nikolay Samokhvalov
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Manually tested uuidv7() and uuid_extract_time() – they work as expected.
The basic docs provided look clear.

I haven't checked the tests and possible edge cases though, so I'm leaving
it as "needs review", waiting for more reviewers.

Re: UUID v7

2024-01-24 Thread Nikolay Samokhvalov
On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <
sergeyprokhore...@yahoo.com.au> wrote:

> That's right! There is no point in waiting for the official approval of
> the new RFC, which obviously will not change anything. I have been a
> contributor to this RFC
> 
> for several years, and I can testify that every aspect imaginable has been
> thoroughly researched and agreed upon. Nothing new will definitely appear
> in the new RFC.
>

From a practical point of view, these two things are extremely important to
have to support partitioning. It is better to implement limitations than
throw them away.

Without them, this functionality will be of a very limited use in
databases. We need to think about large tables – which means partitioning.

Nik


Re: UUID v7

2024-01-24 Thread Nikolay Samokhvalov
On Wed, Jan 24, 2024 at 8:40 PM Nikolay Samokhvalov  wrote:

> On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <
> sergeyprokhore...@yahoo.com.au> wrote:
>
>> That's right! There is no point in waiting for the official approval of
>> the new RFC, which obviously will not change anything. I have been a
>> contributor to this RFC
>> <https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.html#name-acknowledgements>
>> for several years, and I can testify that every aspect imaginable has been
>> thoroughly researched and agreed upon. Nothing new will definitely
>> appear in the new RFC.
>>
>
> From a practical point of view, these two things are extremely important
> to have to support partitioning. It is better to implement limitations than
> throw them away.
>
> Without them, this functionality will be of a very limited use in
> databases. We need to think about large tables – which means partitioning.
>

apologies -- this was a response to another email from you:

> "Other people" think that extracting the timestamp from UUIDv7 in
violation of the new RFC, and generating UUIDv7 from the timestamp were
both terrible and poorly thought out ideas. The authors of the new RFC had
very good reasons to prohibit this. And the problems you face are the best
confirmation of the correctness of the new RFC. It’s better to throw all
this gag out of the official patch. Don't tempt developers to break the new
RFC with these error-producing functions.


Re: [PATCH] Add pretty-printed XML output option

2023-02-21 Thread Nikolay Samokhvalov
On Mon, Feb 20, 2023 at 3:06 PM Jim Jones  wrote:

> As suggested by Peter and Nikolay, v15 now removes the xmlformat
> function from the catalog and adds the [NO] INDENT option to
> xmlserialize, as described in X069.\
>

Great. I'm checking this patch, and it seems indentation stops working if
we have a text node inside:

gitpod=# select xmlserialize(document '13' as text
indent);
  xmlserialize

 +
  +
   13 +
 +

(1 row)

gitpod=# select xmlserialize(document 'text13' as
text indent);
  xmlserialize

 +
 text13+

(1 row)

Worth mentioning, Oracle behaves similarly – indentation doesn't work:
https://dbfiddle.uk/hRz5sXdM.

But is this as expected? Shouldn't it be like this:

  text
  13

?


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Nikolay Samokhvalov
On Wed, Feb 22, 2023 at 9:18 AM Kirk Wolak  wrote:

> Proposal:  Simply add the %T (PROMPT variable) to output the current time
> (HH24:MI:SS) into the prompt.  This has been in sqlplus since I can
> remember, and I find it really useful when I forgot to time something, or
> to review for Time spent on a problem, or for how old my session is...
>

This is a great idea, in my opinion. I usually do something involving ts to
track timestamps when executing something non-trivial via psql in
interactive (see below) or non-interactive mode.

But this is not a well-known thing to use (and ts is not installed by
default on Ubuntu, etc.) – having timestamps in the prompt would be
convenient.

test=> \o | ts
test=> select 1;
test=> Feb 22 09:49:49  ?column?
Feb 22 09:49:49 --
Feb 22 09:49:49 1
Feb 22 09:49:49 (1 row)
Feb 22 09:49:49
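
With the proposed %T, the same convenience would be available right in the
prompt, e.g. (a sketch assuming the patch is applied; the other prompt
escapes already exist):

test=> \set PROMPT1 '%T %/%R%# '
09:49:49 test=>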


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Nikolay Samokhvalov
On Wed, Feb 22, 2023 at 9:55 AM Tom Lane  wrote:

> On the whole I'd rather not eat more of the limited namespace for
> psql prompt codes for this.
>

It depends on personal preferences. When I work on a large screen, I can
afford to spend some characters in prompts if it adds convenience – and
many prompts do (look, for example, at modern tmux/zsh prompts showing git
branch context, etc.).

The default behavior might remain short – it wouldn't make sense to extend
it for everyone.


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-23 Thread Nikolay Samokhvalov
On Thu, Feb 23, 2023 at 9:05 AM Maciek Sakrejda 
wrote:

> I think Heikki's solution is probably more practical since (1) ..


Note that these ideas target two *different* problems:
- what was the duration of the last query
- when was the last query executed

So, having both solved would be ideal.


Re: pg_upgrade and logical replication

2023-02-28 Thread Nikolay Samokhvalov
On Fri, Feb 17, 2023 at 7:35 AM Julien Rouhaud  wrote:
>
>  Any table later added in the
> publication is either already fully replicated until that LSN on the upgraded
> node, so only the delta is needed, or has been created after that LSN.  In the
> latter case, the entirety of the table will be replicated with the logical
> replication as a delta right?


What if we consider a slightly adjusted procedure?

0. Temporarily, forbid running any DDL on the source cluster.
1. On the source, create a publication and a replication slot, and remember
the slot's LSN (see a sketch after this list)
2. Restore the target cluster to that LSN using recovery_target_lsn (PITR)
3. Run pg_upgrade on the target cluster
4. Only now, create subscription to target
5. Wait until logical replication catches up
6. Perform a switchover to the new cluster taking care of lags in sequences, etc
7. Resume DDL when needed
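
A sketch of steps 1 and 4 (object names and connection string are
illustrative only):

-- step 1, on the source:
create publication upgrade_pub for all tables;
select lsn from pg_create_logical_replication_slot('upgrade_slot', 'pgoutput');
-- use the returned LSN as recovery_target_lsn for step 2

-- step 4, on the target, after pg_upgrade:
create subscription upgrade_sub
  connection 'host=source-host dbname=mydb'
  publication upgrade_pub
  with (create_slot = false, slot_name = 'upgrade_slot', copy_data = false);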

Do you see any data loss happening in this approach?




Re: pg_upgrade and logical replication

2023-03-01 Thread Nikolay Samokhvalov
On Tue, Feb 28, 2023 at 4:43 PM Julien Rouhaud  wrote:
>
> On Tue, Feb 28, 2023 at 08:02:13AM -0800, Nikolay Samokhvalov wrote:
> > 0. Temporarily, forbid running any DDL on the source cluster.
>
> This is (at least for me) a non starter, as I want an approach that doesn't
> impact the primary node, at least not too much.
...
> Also, how exactly would you ensure that indeed DDL were forbidden since a long
> enough point in time rather than just "currently" forbidden at the time you do
> some check?

Thanks for your response. I didn't expect that the DDL part would attract
attention; my message was not about DDL... – the DDL part was there
just to show that the recipe I described is possible for any PG
version that supports logical replication.

Usually, people perform upgrades involving logical replication using full
initialization at the logical level – at least all posts and articles I
could find talk about that. Meanwhile, on one hand, for large DBs, logical
copying is hard (slow, holding the xmin horizon, etc.), and on the other
hand, a physical replica can be transformed into a logical one (using the
trick with recovery_target_lsn, syncing the state with the slot's LSN),
and initialization at the physical level works much better for large
databases. But there is a problem with logical replication when we run
pg_upgrade – as discussed in this thread. So I just wanted to mention
that if we change the order of actions and first run pg_upgrade, and
only then create the subscription, there should not be a problem anymore.




Re: Transaction timeout

2023-01-13 Thread Nikolay Samokhvalov
On Thu, Jan 12, 2023 at 11:47 AM Andrey Borodin 
wrote:

> On Thu, Jan 12, 2023 at 11:24 AM Nathan Bossart
>  wrote:
> >
> > On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote:
> > > I've rewritten this part to correctly report all timeouts that did
> > > happen. However there's now a tricky comma-formatting code which was
> > > tested only manually.
>

Testing it again, a couple of questions

1) The current test set has only 2 simple cases – I'd suggest adding one
more (the one that didn't work in v1):

gitpod=# set transaction_timeout to '20ms';
SET
gitpod=# begin; select pg_sleep(.01); select pg_sleep(.01); select
pg_sleep(.01); commit;
BEGIN
 pg_sleep
--

(1 row)

ERROR:  canceling statement due to transaction timeout


gitpod=# set statement_timeout to '20ms'; set transaction_timeout to 0; --
to test value for statement_timeout and see that it doesn't fail
SET
SET
gitpod=# begin; select pg_sleep(.01); select pg_sleep(.01); select
pg_sleep(.01); commit;
BEGIN
 pg_sleep
--

(1 row)

 pg_sleep
--

(1 row)

 pg_sleep
--

(1 row)

COMMIT


2) Testing for a longer transaction (2 min), in a gitpod VM (everything is
local, no network involved)

// not sure what's happening here, maybe some overheads that are not
related to the implementation,
// but the goal was to see how precise the limiting is for longer
transactions

gitpod=# set transaction_timeout to '2min';
SET
gitpod=# begin;
BEGIN
gitpod=*# select now(), clock_timestamp(), pg_sleep(3) \watch 1
Fri 13 Jan 2023 03:49:24 PM UTC (every 1s)

  now  |clock_timestamp| pg_sleep
---+---+--
 2023-01-13 15:49:22.906924+00 | 2023-01-13 15:49:24.088728+00 |
(1 row)

[...]

Fri 13 Jan 2023 03:51:18 PM UTC (every 1s)

  now  |clock_timestamp| pg_sleep
---+---+--
 2023-01-13 15:49:22.906924+00 | 2023-01-13 15:51:18.179579+00 |
(1 row)

ERROR:  canceling statement due to transaction timeout

gitpod=!#
gitpod=!# rollback;
ROLLBACK
gitpod=# select timestamptz '2023-01-13 15:51:18.179579+00' - '2023-01-13
15:49:22.906924+00';
?column?
-
 00:01:55.272655
(1 row)

gitpod=# select interval '2min' - '00:01:55.272655';
?column?
-
 00:00:04.727345
(1 row)

gitpod=# select interval '2min' - '00:01:55.272655' - '4s';
?column?
-
 00:00:00.727345
(1 row)

– it seems we could (should) have one more successful "1s wait, 3s sleep"
iteration here, ~727ms somehow wasted in a loop, quite a lot.


Re: Transaction timeout

2023-01-13 Thread Nikolay Samokhvalov
On Fri, Jan 13, 2023 at 10:16 AM Andrey Borodin 
wrote:

> > – it seems we could (should) have one more successful "1s wait, 3s
> sleep" iteration here, ~727ms somehow wasted in a loop, quite a lot.
>
> I think big chunk from these 727ms were spent between "BEGIN" and
> "select now(), clock_timestamp(), pg_sleep(3) \watch 1".
>

Not really – there was indeed a ~2s delay between BEGIN and the first
pg_sleep query, but those ~727ms are something else.

Here we measure the interval between the beginning of the transaction (as
measured by now()) and the beginning of the last successful pg_sleep()
query:

gitpod=# select timestamptz '2023-01-13 15:51:18.179579+00' - '2023-01-13
15:49:22.906924+00';
?column?
-
 00:01:55.272655
(1 row)

It already includes all delays that we had from the beginning of our
transaction.

The problem with my question was that I didn't take into account that
'2023-01-13 15:51:18.179579+00' is when the last successful query
*started*. So the remainder of our 2-min quota – 00:00:04.727345 – includes
the last successful loop (3s of successful query + 1s of waiting), and then
we failed after ~700ms.

In other words, there are no issues here, all good.

> Many thanks for looking into this!

many thanks for implementing it


Re: mxid_age() and age(xid) appear undocumented

2023-11-13 Thread Nikolay Samokhvalov
On Mon, Nov 13, 2023 at 5:01 PM Peter Geoghegan  wrote:
>
> On Mon, Nov 13, 2023 at 4:43 PM Bruce Momjian  wrote:
> > I looked into this and all the 4-byte xid functions are marked as
> > deprecated for the 8-byte variants.  I don't think documenting 4-byte
> > mxid_age() and age(xid) makes sense anymore, and I don't see their value
> > enough to create 8-byte versions, so I just added C comments that they
> > were undocumented, in the attached patch.
>
> I'm sympathetic to the goal of making 4 byte XIDs an on-disk
> implementation detail that is all but completely hidden from users.
> However, there are practical problems with taking that to its logical
> extreme. At least right now.
>
> These functions are in fact documented -- albeit only partially. There
> are references to both in "Routine Vacuuming". Moreover, those
> references are rather useful; they're the basis of many
> monitoring/alerting queries. If anything, I'd recommend adding more
> documentation for these two functions.
>
> We also continue to show 32-bit XIDs (alongside 32-bit relfrozenxid)
> in the output of VACUUM VERBOSE/autovacuum log messages. (Though that
> issue can be fixed fairly easily.)
>
> The bottom line is that there is only one way to figure out the age of
> a table right now, and it involves 32-bit XIDs/MXIDs, and these two
> functions. And, if we were to change something in this area, we'd
> definitely need to provide for the needs of those monitoring queries I
> mentioned.

Also, the doc page [1] mentions mxid_age(), but doesn't provide a
snippet showing how to use it. The regular XID wraparound section above has
such a snippet. As a consequence, almost nobody monitors for MultiXact
wraparound – I noticed this once again recently, while exploring numerous
blog posts and tools on this topic to write a how-to [2] in my collection.

In other words, maybe there should be not only a reference entry for the
function itself (its absence seems to be an issue for all older versions),
but also a snippet to monitor MultiXact ID wraparound – since it's still a
potential problem, it would be good to have both a function reference doc
and a how-to-use-it doc.
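
For example, something along these lines could work (just a sketch,
mirroring the XID snippet from the same page; the LIMIT is arbitrary):

-- databases closest to MultiXact ID wraparound
select datname, mxid_age(datminmxid) as mxid_age
from pg_database
order by 2 desc;

-- relations with the oldest relminmxid in the current database
select c.oid::regclass as relation, mxid_age(c.relminmxid) as mxid_age
from pg_class as c
where c.relkind in ('r', 'm', 't')
order by 2 desc
limit 20;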

[1] 
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND
[2] 
https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0044_how_to_monitor_transaction_id_wraparound_risks.md




Re: Fix bug with indexes on whole-row expressions

2023-12-14 Thread Nikolay Samokhvalov
On Wed, Dec 13, 2023 at 7:01 AM Tom Lane  wrote:

> ywgrit  writes:
> > I forbid to create indexes on whole-row expression in the following
> patch.
> > I'd like to hear your opinions.
>
> As I said in the previous thread, I don't think this can possibly
> be acceptable.  Surely there are people depending on the capability.
> I'm not worried so much about the exact case of an index column
> being a whole-row Var --- I agree that that's pretty useless ---
> but an index column that is a function on a whole-row Var seems
> quite useful.  (Your patch fails to detect that, BTW, which means
> it does not block the case presented in bug #18244.)
>
> I thought about extending the ALTER TABLE logic to disallow changes
> in composite types that appear in index expressions.  We already have
> find_composite_type_dependencies(), and it turns out that this already
> blocks ALTER for the case you want to forbid, but we concluded that we
> didn't need to prevent it for the bug #18244 case:
>
>  * If objsubid identifies a specific column, refer to that in error
>  * messages.  Otherwise, search to see if there's a user column of
> the
>  * type.  (We assume system columns are never of interesting
> types.)
>  * The search is needed because an index containing an expression
>  * column of the target type will just be recorded as a
> whole-relation
>  * dependency.  If we do not find a column of the type, the
> dependency
>  * must indicate that the type is transiently referenced in an
> index
>  * expression but not stored on disk, which we assume is OK, just
> as
>  * we do for references in views.  (It could also be that the
> target
>  * type is embedded in some container type that is stored in an
> index
>  * column, but the previous recursion should catch such cases.)
>
> Perhaps a reasonable answer would be to issue a WARNING (not error)
> in the case where an index has this kind of dependency.  The index
> might need to be reindexed --- but it might not, too, and in any case
> I doubt that flat-out forbidding the ALTER is a helpful idea.
>
> regards, tom lane
>

A WARNING can easily be overlooked. Users of mobile/web apps don't see
Postgres WARNINGs.

Forbidding ALTER sounds more reasonable.

Do you see any good use cases for whole-row indexes?

And for such cases, wouldn't it be reasonable for users to specify all
columns explicitly? E.g.:

   create index on t using btree(row(c1, c2, c3));


Re: Set log_lock_waits=on by default

2023-12-21 Thread Nikolay Samokhvalov
On Thu, Dec 21, 2023 at 05:29 Laurenz Albe  wrote:

> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.


I think it's a very good idea. On all heavily loaded systems I have
observed so far, we have always turned it on. 1s (the default
deadlock_timeout) is quite a large value for web/mobile apps, meaning that
the default frequency of logging is quite low, so any potential suffering
from the observer effect doesn't happen -- saturation of the active session
count happens much, much earlier, even if you have very slow disk I/O for
logging.

At the same time, I like Robert's idea to separate the logging of lock
waits from the deadlock_timeout logic -- the current implementation is
quite confusing for new users. I have also had cases when people wanted to
log lock waits earlier than deadlock detection. And almost always, lock
wait logging lacks information about the blocking session (its state and
last query, first of all), but that is maybe off topic, worth a separate
improvement effort.

Nik


Re: Transaction timeout

2023-10-10 Thread Nikolay Samokhvalov
On Wed, Sep 6, 2023 at 1:16 AM Fujii Masao  wrote:
> With the v4 patch, I found that timeout errors no longer occur during the 
> idle in
> transaction phase. Instead, they occur when the next statement is executed. 
> Is this
> the intended behavior? I thought some users might want to use the transaction 
> timeout
> feature to prevent prolonged transactions and promptly release resources 
> (e.g., locks)
> in case of a timeout, similar to idle_in_transaction_session_timeout.

I agree – it seems reasonable to interrupt the transaction immediately
when the timeout occurs. This was the idea – to define the maximum
time allowed for any transaction on a server, to avoid long-lasting
locking and a non-progressing xmin horizon.

That being said, I also want to comment on this wording in the docs:

+Setting transaction_timeout in
+postgresql.conf is not recommended
because it would
+affect all sessions.

It was inherited from statement_timeout, where I also find this
wording too one-sided. There are certain situations where we do want
the global setting to be set – actually, any large OLTP case (to be on
the lower-risk side; those users who need a longer timeout can set it when
needed, but by default we do need very restrictive timeouts, usually <
1 minute, like we have in HTTP or application servers). I propose this:
> because it affects all sessions.

Looking at the v4 of the patch, a couple of more comments that might
be helpful for v5 (which is planned, as I understand):

1)  it might be beneficial to add tests for more complex scenarios,
e.g., subtransactions

2) In the error message:

+ errmsg("canceling statement due to %s%s%s%s%s", lock_reason, comma1,
+ stmt_reason, comma2, tx_reason)));

– it seems we can have excessive commas here

3) Perhaps we should say that we cancel the transaction, not the
statement (especially when it happens in the idle-in-transaction
state).

Thanks for working on this feature!




Re: Observability in Postgres

2022-02-14 Thread Nikolay Samokhvalov
On Mon, Feb 14, 2022 at 10:15 Greg Stark  wrote:

>  
> For now my approach is to implement a background worker that listens
> on a new port and is basically its own small web server with shared
> memory access


This reminds me of bg_mon (included in Spilo, the Docker image used by the
Zalando operator for Postgres):

https://github.com/CyberDem0n/bg_mon


Re: Transaction timeout

2022-12-02 Thread Nikolay Samokhvalov
On Fri, Dec 2, 2022 at 9:18 PM Andrey Borodin  wrote:

> Hello,
>
> We have statement_timeout, idle_in_transaction_timeout,
> idle_session_timeout and many more! But we have no
> transaction_timeout. I've skimmed thread [0,1] about existing timeouts
> and found no contraindications to implement transaction_timeout.
>
> Nikolay asked me if I can prototype the feature for testing by him,
> and it seems straightforward. Please find attached. If it's not known
> to be a bad idea - we'll work on it.
>

Thanks!! That was a super quick reaction to my proposal. Honestly, I had
been thinking about it for several years, wondering why it's still not
implemented.

The reasons to have it should be straightforward – here are a couple of
them I can see.

First one. In the OLTP context, we usually have:
- a hard timeout set in the application server
- a hard timeout set in the HTTP server
- users not willing to wait more than several seconds – and almost never
ready to wait for more than 30-60s.

If Postgres allows longer transactions (it does, since we cannot reliably
limit their duration now – it's always virtually unlimited), it might be
doing work that nobody is waiting for or needs anymore, wasting
resources, affecting overall health, etc.

Why can't we limit transaction duration reliably? The existing timeouts
(namely, statement_timeout + idle_in_transaction_session_timeout) don't
protect from transactions consisting of a series of small statements with
short pauses between them. If such behavior happens (e.g., a long series of
fast UPDATEs in a loop), it can be dangerous, affecting general DB health
(bloat issues). This is reason number two – DBAs might want to minimize
the cases of long transactions, setting transaction limits globally (and
allowing them to be overridden locally for particular sessions or for some
users in rare cases).

Speaking of the patch – I just tested it (gitpod:
https://gitpod.io/#https://gitlab.com/NikolayS/postgres/tree/transaction_timeout);
it applies, works as expected for single-statement transactions:

postgres=# set transaction_timeout to '2s';
SET
postgres=# select pg_sleep(3);
ERROR:  canceling transaction due to transaction timeout

But it fails in the "worst" case I've described above – a series of small
statements:

postgres=# set transaction_timeout to '2s';
SET
postgres=# begin; select pg_sleep(1); select pg_sleep(1); select
pg_sleep(1); select pg_sleep(1); select pg_sleep(1); commit;
BEGIN
 pg_sleep
--

(1 row)

 pg_sleep
--

(1 row)

 pg_sleep
--

(1 row)

 pg_sleep
--

(1 row)

 pg_sleep
--

(1 row)

COMMIT
postgres=#


Re: Transaction timeout

2022-12-05 Thread Nikolay Samokhvalov
On Sat, Dec 3, 2022 at 9:41 AM Andrey Borodin  wrote:

> Fixed. Added test for this.
>

Thanks! Tested (gitpod:
https://gitpod.io/#https://gitlab.com/NikolayS/postgres/tree/transaction_timeout-v2
),

works as expected.


Re: Transaction timeout

2022-12-05 Thread Nikolay Samokhvalov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Tested, works as expected;

documentation is not yet added

Prevent accidental whole-table DELETEs and UPDATEs

2023-02-02 Thread Nikolay Samokhvalov
In many cases, a DELETE or UPDATE not having a WHERE clause (or having one
with a condition matching all rows in the table) is a sign of some kind of
mistake, leading to accidental data loss, performance issues, producing a
lot of dead tuples, and so on. Recently, this topic was discussed again [1].

Attached is a patch implemented by Andrey Borodin during today's online
session [2], containing a rough prototype for two new GUCs:

- prevent_unqualified_deletes
- prevent_unqualified_updates

Both are "false" by default; for superusers, they are not applied.

There is also another implementation of this idea, in the form of an
extension [3], but I think having this in the core would be beneficial to
many users.

Looking forward to your feedback.

[1] https://news.ycombinator.com/item?id=34560332
[2] https://www.youtube.com/watch?v=samLkrC5xQA
[3] https://github.com/eradman/pg-safeupdate


0001-Add-GUCs-to-preven-some-accidental-massive-DML.patch
Description: Binary data


Re: Amcheck verification of GiST and GIN

2023-02-02 Thread Nikolay Samokhvalov
On Thu, Feb 2, 2023 at 12:15 PM Peter Geoghegan  wrote:

> On Thu, Feb 2, 2023 at 11:51 AM Peter Geoghegan  wrote:
>
...

> Admittedly there is some value in seeing multiple WARNINGs to true
> experts that are performing some kind of forensic analysis, but that
> doesn't seem worth it to me -- I'm an expert, and I don't think that
> I'd do it this way for any reason other than it being more convenient
> as a way to get information about a system that I don't have access
> to. Even then, I think that I'd probably have serious doubts about
> most of the extra information that I'd get, since it might very well
> be a downstream consequence of the same basic problem.
>
...

I understand your thoughts (I think) and agree with them, but at least one
scenario where I do want to see *all* errors is corruption prevention –
running amcheck in lower environments, not in production, to predict and
prevent issues. For example, not long ago, Ubuntu 16.04 became EOL (in
phases), and people needed to upgrade, with a glibc version change. It was
quite good to use amcheck on production clones (running on the new
OS/glibc) to identify all indexes that needed to be rebuilt. Being able to
see only one of them would be very inconvenient, and rebuilding all indexes
didn't seem like a good idea for large databases.


Re: Amcheck verification of GiST and GIN

2023-02-02 Thread Nikolay Samokhvalov
On Thu, Feb 2, 2023 at 12:43 PM Peter Geoghegan  wrote:

> I agree that this matters at the level of whole indexes.
>

I already realized my mistake – indeed, having multiple errors for 1 index
doesn't seem to be super helpful in practice.


> I think that that problem should be solved at a higher level, in the
> program that runs amcheck. Note that pg_amcheck will already do this
> for B-Tree indexes.
>

That's a great tool, and it's great it supports parallelization, very useful
on large machines.


> We should add a "Tip" to the amcheck documentation on 14+ about this.
> We should clearly advise users that they should probably just use
> pg_amcheck.


and with -j$N, with high $N (unless it's production)
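
e.g., something like this (an illustrative invocation; see pg_amcheck
--help for the options available in your version):

pg_amcheck --all --jobs=16 --heapallindexed --progress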


Re: [PATCH] Add pretty-printed XML output option

2023-02-15 Thread Nikolay Samokhvalov
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> I suggest we call it "xmlformat", which is an established term for this.
>

Some very, very old, rusty memory told me that there was something in the
standard – and indeed, it describes an optional Feature X069,
"XMLSerialize: INDENT", for XMLSERIALIZE. So probably pretty-printing should
go there, to XMLSERIALIZE, to follow the standard?

Oracle also has an option for it in XMLSERIALIZE, although in a slightly
different form, with the ability to specify the number of spaces for
indentation:
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.


Re: [PATCH] Add pretty-printed XML output option

2023-02-17 Thread Nikolay Samokhvalov
On Fri, Feb 17, 2023 at 1:14 AM Jim Jones  wrote:

> After your comment I'm studying the possibility to extend the existing
> xmlserialize function to add the indentation feature. If so, how do you
> think it should look like? An extra parameter? e.g.
>
> SELECT xmlserialize(DOCUMENT '42'::XML AS text,
> true);
>
> .. or more or like Oracle does it
>
> SELECT XMLSERIALIZE(DOCUMENT xmltype('42') AS BLOB
> INDENT)
> FROM dual;
>

My idea was to follow the SQL standard (part 14, SQL/XML); unfortunately,
there is no free version, but there are drafts at
http://www.wiscorp.com/SQLStandards.html.

<XML serialize> ::=
  XMLSERIALIZE <left paren> [ <document or content> ]
      <XML serialize argument>
      AS <data type>
      [ <XML serialize encoding> ]
      [ <XML serialize version> ]
      [ <XML declaration option> ]
      [ <XML serialize indent> ]
      <right paren>

<XML serialize indent> ::=
  [ NO ] INDENT


Oracle's extension SIZE=n also seems interesting (including a special case
SIZE=0, which means using new-line characters without spaces for each line).
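
So, if XMLSERIALIZE followed the standard here, usage could look roughly
like this (a sketch of possible syntax only, not an existing feature):

SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::xml AS text INDENT);
SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::xml AS text NO INDENT);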


Re: Amcheck verification of GiST and GIN

2022-06-22 Thread Nikolay Samokhvalov
On Wed, Jun 22, 2022 at 11:35 AM Andrey Borodin 
wrote:

> > On 30 May 2022, at 12:40, Andrey Borodin  wrote:
> >
> > What do you think?
>
> Hi Andrey!
>

Hi Andrey!

Since you're talking to yourself, just wanted to support you – this is an
important thing, definitely should be very useful for many projects; I hope
to find time to test it in the next few days.

Thanks for working on it.


Re: log_autovacuum in Postgres 14 -- ordering issue

2021-08-25 Thread Nikolay Samokhvalov
On Wed, Aug 25, 2021 at 10:34 AM Peter Geoghegan  wrote:

> It would be a lot clearer if the "buffer usage" line was simply moved
> down. I think that it should appear after the lines that are specific
> to the table's indexes -- just before the "avg read rate" line. That
> way we'd group the buffer usage output with all of the other I/O
> related output that summarizes the VACUUM operation as a whole.
>

The last two lines are also "*** usage" -- shouldn't the buffer numbers be
next to them?


Re: Release 14 Schedule

2021-09-19 Thread Nikolay Samokhvalov
Observability-related improvements are also good and very important for the
future of DBA operations -- compute_query_id, new pg_stat_**, etc.

Things like the new knob idle_session_timeout and the restore_command change
not requiring a restart will be very noticeable too.

On Sun, Sep 19, 2021 at 2:45 PM Jonathan S. Katz 
wrote:

> On 9/19/21 12:32 PM, Justin Pryzby wrote:
> > On Sat, Sep 18, 2021 at 01:37:19PM -0400, Tom Lane wrote:
> >> We don't yet have a list-of-major-features for the v14 release notes.
> >> Anybody care to propose one?
> >
> >   . Allow extended statistics on column expressions;
> >   . Memoize node which can improve speed of nested loop joins;
> >   . Allow use of LZ4 compression for faster access to TOASTed fields;
> >   . JSONB and H-store types may be subscripted, as may be participating
> data types provided by extensions.
> >   . Many improvements to performance of VACUUM;
> >
> > Maybe these??
>
> I would propose a few different ones. I'm looking at the overall breadth
> of user impact as I propose these and the reactions I've seen in the field.
>
> - General performance improvements for databases with multiple
> connections (the MVCC snapshot work).
>
> - The reduction in bloat on frequently updated B-trees; that was a
> longstanding complaint against PostgreSQL that was resolved.
>
> - I agree with the JSON improvements; I'd bucket this in data types and
> include the support of multiranges.
>
> - Logical decoding / replication received some significant performance
> improvements
>
> - Many improvements in query parallelism. One that stands out is how
> parallel queries can be leveraged using FDWs now, in particular the
> postgres_fdw.
>
> - I agree with VACUUM suggestion as well.
>
> I can try proposing some wording on this in a bit; I'm working on the
> overdue draft of the press release, and thought I'd chime in here first.
>
> Thanks,
>
> Jonathan
>
>


Re: Add create and update timestamp to all objects

2021-09-26 Thread Nikolay Samokhvalov
On Sun, Sep 26, 2021 at 1:11 PM Efrain J. Berdecia 
wrote:

> Are there any plans to add a create and last updated time stamp field to
> any and all objects in postgres?
>
> Possibly even adding a updated_by documenting which role created and last
> updated the object.
>
> All done natively and without the need for extra extensions.
>
> Thanks in advance.
>

Why would you need that? Each timestamp[tz] value is 8 bytes.

And setting up values via defaults and triggers is straightforward, isn't
it?
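
For example, something like this covers the row-level case (a generic
sketch; for tracking DDL objects themselves, an event trigger writing into
such a table would work similarly):

create table widgets (
    id         bigint generated always as identity primary key,
    payload    text,
    created_at timestamptz not null default now(),
    created_by text        not null default current_user,
    updated_at timestamptz,
    updated_by text
);

create function track_update() returns trigger as $$
begin
    new.updated_at := now();
    new.updated_by := current_user;
    return new;
end;
$$ language plpgsql;

create trigger widgets_track_update
    before update on widgets
    for each row execute function track_update();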


Re: amcheck verification for GiST and GIN

2021-07-29 Thread Nikolay Samokhvalov
Hello,

First of all, thank you all -- Andrey, Peter, Heikki and others -- for this
work. GIN support in amcheck is *really* needed, especially for OS upgrades
such as from Ubuntu 16.04 (which is EOL now) to 18.04 or 20.04.

I was trying to check a bunch of GINs on a production system after switching
from Ubuntu 16.04 to 18.04 and got many errors. So I decided to check on
16.04 first (which is still used on prod for that DB), without any OS/glibc
changes.

On 16.04, I still saw errors, and that was not really expected, because it
would mean that production is corrupted too. If so, REINDEX should fix it.
But it didn't -- see the output below. I cannot share the data and am
thinking how to create a synthetic demo of this. Any suggestions?

And is this a sign that the tool is wrong rather than that we have real
corruption cases? (I assume if we did, we would see no errors after
REINDEXing -- of course, assuming GIN itself doesn't have bugs.)

Env: Ubuntu 16.04 (so, glibc 2.23), Postgres 12.7, patch from Heikki
slightly adjusted to work with PG12 (
https://gitlab.com/postgres/postgres/-/merge_requests/5) snippet used to
run amcheck:
https://gitlab.com/-/snippets/2001962 (see file #3)

Before reindex:


INFO:  [2021-07-29 17:44:42.525+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 117935; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 65754, offset
232


test=# reindex index index_XXX_trigram;

REINDEX



After REINDEX:


INFO:  [2021-07-29 18:01:23.339+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 70100; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 70048, offset
253



On Thu, Jul 15, 2021 at 00:03 Heikki Linnakangas  wrote:

> On 07/08/2020 00:33, Peter Geoghegan wrote:
> > On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko 
> wrote:
> >> Here is the patch which I (with Andrey as my advisor) built on the top
> of the last patch from this thread:
> https://commitfest.postgresql.org/25/1800/ .
> >> It adds an ability to verify validity  of GIN index. It is not polished
> yet, but it works and we wanted to show it to you so you can give us some
> feedback, and also let you know about this work if you have any plans of
> writing something like that yourselves, so that you do not redo what is
> already done.
> >
> > Can you rebase this patch, please?
> >
> > Also suggest breaking out the series into distinct patch files using
> > "git format-patch master".
>
> I rebased the GIN parts of this patch, see attached. I also ran pgindent
> and made some other tiny cosmetic fixes, but I didn't review the patch,
> only rebased it in the state it was.
>
> I was hoping that this would be useful to track down the bug we're
> discussing here:
>
> https://www.postgresql.org/message-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw%40mail.gmail.com.
>
> But now that I look what checks this performs, I doubt this will catch
> the kind of corruption that's happened there. I suspect it's more subtle
> than an inconsistencies between parent and child pages, because only a
> few rows are affected. But doesn't hurt to try.
>
> - Heikki
>


Re: amcheck verification for GiST and GIN

2021-08-01 Thread Nikolay Samokhvalov
Thank you, v5 didn't find any issues at all. One thing: for my 29 indexes,
the tool generated 3.5 GiB of output. I guess many INFO messages should be
downgraded to something like DEBUG1?

On Fri, Jul 30, 2021 at 2:35 AM Heikki Linnakangas  wrote:

> On 29/07/2021 21:34, Nikolay Samokhvalov wrote:
> > I was trying to check a bunch of GINs on some production after switching
> > from Ubuntu 16.04 to 18.04 and got many errors. So decided to check for
> > 16.04 first (that is still used on prod for that DB), without any
> > OS/glibc changes.
> >
> > On 16.04, I still saw errors and it was not really expected because this
> > should mean that production is corrupted too. So, REINDEX should fix it.
> > But it didn't -- see output below. I cannot give data and thinking how
> > to create a synthetic demo of this. Any suggestions?
> >
> > And is this a sign that the tool is wrong rather that we have a real
> > corruption cases? (I assume if we did, we would see no errors after
> > REINDEXing -- of course, if GIN itself doesn't have bugs).
> >
> > Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
> > slightly adjusted to work with PG12 (
> > https://gitlab.com/postgres/postgres/-/merge_requests/5
> > <https://gitlab.com/postgres/postgres/-/merge_requests/5>) snippet used
> > to run amcheck:
> > https://gitlab.com/-/snippets/2001962
> > <https://gitlab.com/-/snippets/2001962> (see file #3)
>
> Almost certainly the tool is wrong. We went back and forth a few times
> with Pawel, fixing various bugs in the amcheck patch at this thread:
>
> https://www.postgresql.org/message-id/9fdbb584-1e10-6a55-ecc2-9ba8b5dca1cf%40iki.fi.
>
> Can you try again with the latest patch version from that thread,
> please? That's v5-0001-Amcheck-for-GIN-13stable.patch.
>
> - Heikki
>


Re: Subtransactions + a long-running transaction leading to performance degradation on standbys

2021-08-19 Thread Nikolay Samokhvalov
On Thu, Aug 19, 2021 at 10:04 PM Andrey Borodin 
wrote:

> I just want to note, that on your screenshot unpatched version runs 400K
> tps, while patched runs 280K tps. I see the dates are different and this
> effect is not observed in  [0]. Probably, you run tests on different
> machines.


Indeed, I was testing patched Postgres on 2x smaller EC2 instances; this is
documented in the GitLab issue. But I added an additional note here:
https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/21#note_655731979


> While your experiments clearly shows that patch can save DB from
> degradation under pathological workload it would be great to ensure patch
> does not incur penalty on normal workload.
>

Makes sense. I'll try to find time to make a direct comparison.

I documented all the steps in detail in the GitLab issue, so anyone
interested can reproduce it and explore the problem at different angles.


Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-07-10 Thread Nikolay Samokhvalov
On Fri, Jul 7, 2023 at 6:31 AM Stephen Frost  wrote:

> * Nikolay Samokhvalov (n...@postgres.ai) wrote:
> > But this can happen with anyone who follows the procedure from the docs
> as
> > is and doesn't do any additional steps, because in step 9 "Prepare for
> > standby server upgrades":
> >
> > 1) there is no requirement to follow specific order to shut down the
> nodes
> >- "Streaming replication and log-shipping standby servers can remain
> > running until a later step" should probably be changed to a
> > requirement-like "keep them running"
>
> Agreed that it would be good to clarify that the primary should be shut
> down first, to make sure everything written by the primary has been
> replicated to all of the replicas.
>

Thanks!

Here is a patch to fix the existing procedure description.

I agree with Andrey – without it, we don't have any good way to upgrade
large clusters in a short time. The default rsync mode (without
"--size-only") takes a lot of time too, if the load is heavy.

With these adjustments, can "rsync --size-only" remain in the docs as the
*fast* and safe method to upgrade standbys, or are there still some
concerns related to corruption risks?


pg-upgrade-docs-clarify-rsync-size-only.patch
Description: Binary data


Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-07-10 Thread Nikolay Samokhvalov

Well, we have two primary servers – old and new. I tried to clarify it in
the new version.


>
> > + Standbys can be running at this point or fully stopped.
>
> "or be fully stopped." I think.
>
> > + If they
> > + are still running, you can stop, upgrade, and start them one by
> one; this
> > + can be useful to keep the cluster open for read-only transactions.
> > +
>
> Maybe this is clear from the context, but "upgrade" in the above should
> maybe more explicitly refer to the rsync method or else people might
> think one can run pg_upgrade on them after all?
>

Maybe. It will require changes in other parts of this doc.
Thinking (here:
https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)

Meanwhile, attached is v2

thanks for the comments
From: Nikolay Samokhvalov 
Date: Mon, 10 Jul 2023 20:07:18 +
Subject: [PATCH] Refine instructions for major upgrades on standby servers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Recipe involving pg_upgrade -k and rsync --size-only can be dangerous
– clarify what to do to avoid standby corruption.

Discussion: https://www.postgresql.org/message-id/flat/CAM527d8heqkjG5VrvjU3Xjsqxg41ufUyabD9QZccdAxnpbRH-Q%40mail.gmail.com
---
 doc/src/sgml/ref/pgupgrade.sgml | 39 +++--
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c6859..41242bb4200 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -361,10 +361,10 @@ make prefix=/usr/local/pgsql.new install

 

-Stop both servers
+Stop both primary servers
 
 
- Make sure both database servers are stopped using, on Unix, e.g.:
+ Make sure both primary servers are stopped using, on Unix, e.g.:
 
 
 pg_ctl -D /opt/PostgreSQL/9.6 stop
@@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
 
 
 
- Streaming replication and log-shipping standby servers can
+ Streaming replication and log-shipping standby servers must
  remain running until a later step.
 

 
-   
+   
 Prepare for standby server upgrades
 
 
- If you are upgrading standby servers using methods outlined in section , verify that the old standby
- servers are caught up by running pg_controldata
- against the old primary and standby clusters.  Verify that the
- Latest checkpoint location values match in all clusters.
- (There will be a mismatch if old standby servers were shut down
- before the old primary or if the old standby servers are still running.)
+ If you are upgrading standby servers using methods outlined in 
+ , ensure that they were running when 
+ you shut down the primaries in the previous step, so all the latest changes 
+ and the shutdown checkpoint record were received. You can verify this by running 
+ pg_controldata against the old primary and standby 
+ clusters. The Latest checkpoint location values must match in all 
+ nodes. A mismatch might occur if old standby servers were shut down before 
+ the old primary. To fix a mismatch, start all old servers and return to the 
+ previous step; proceeding with mismatched 
+ Latest checkpoint location may lead to standby corruption.
+
+
+
  Also, make sure wal_level is not set to
  minimal in the postgresql.conf file on the
  new primary cluster.
@@ -497,7 +503,6 @@ pg_upgrade.exe
  linkend="warm-standby"/>) standby servers, you can follow these steps to
  quickly upgrade them.  You will not be running pg_upgrade on
  the standby servers, but rather rsync on the primary.
- Do not start any servers yet.
 
 
 
@@ -508,6 +513,16 @@ pg_upgrade.exe
  is running.
 
 
+
+ Before running rsync, to avoid standby corruption, it is absolutely
+ critical to ensure that both old and new primary servers are shut down
+ and standby servers have received the last changes as discussed in
+ . 
+ Standby servers can be running at this point or be fully stopped. If they 
+ are still running, you can stop, upgrade, and start them one by one; this
+ can be useful to keep the cluster open for read-only transactions.
+
+
 
 
  
--



Configurable FP_LOCK_SLOTS_PER_BACKEND

2023-07-12 Thread Nikolay Samokhvalov
We're observing a few cases with lockmanager spikes in a few quite loaded
systems.

These cases are different; queries are different, Postgres versions are 12,
13, and 14.

But in all cases, servers are quite beefy (96-128 vCPUs, ~600-800 GiB),
receiving a lot of TPS (a few dozen thousand). Most queries that
struggle with wait_event=lockmanager involve a substantial number of
tables/indexes, often with partitioning.

FP_LOCK_SLOTS_PER_BACKEND is now hard-coded to 16 in storage/proc.h – and it
is now very easy to hit this threshold in a loaded system, especially, for
example, if a table with a dozen indexes was partitioned. It seems any
system with good growth hits it sooner or later.

I wonder, would it make sense to:
1) either consider increasing this hard-coded value, taking into account
that 16 seems to be very low for modern workloads, schemas, and hardware –
say, it could be 64,
2) or even make it configurable – a new GUC.
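
For reference, the current definition (a paraphrased excerpt; see
src/include/storage/proc.h) and why it matters:

/*
 * Per-backend fast-path lock slots. Once a backend holds locks on more
 * than this many relations (tables + indexes + partitions) at once,
 * additional locks go through the shared lock manager, where contention
 * shows up as wait_event = LockManager.
 */
#define FP_LOCK_SLOTS_PER_BACKEND   16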

Thanks,
Nikolay Samokhvalov
Founder, Postgres.ai


Re: UUID v7

2023-06-22 Thread Nikolay Samokhvalov
On Tue, Feb 14, 2023 at 6:13 AM Kyzer Davis (kydavis)  wrote:
> I am happy to see others interested in the improvements provided by UUIDv7!

Thank you for providing the details!

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
https://commitfest.postgresql.org/43/4388/
- recent post by Ants Aasma, Cybertec about the downsides of
traditional UUID raised a big discussion today on HN:
https://news.ycombinator.com/item?id=36429986




pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-06-29 Thread Nikolay Samokhvalov
Hi!

(posting this to -hackers rather than to -docs since it seems a deeper
problem than just adjusting the docs)

I recently observed a case of standby corruption after upgrading pg12 to
pg14, which presented itself in the form of XX001 errors on the new
cluster's standby nodes, e.g.:
  ERROR: missing chunk number 0 for toast value 3228893903 in
pg_toast_79504413

Comparing the content of the data directory and checking files with md5sum,
I noticed that some files for some TOAST index had different content on the
new standby nodes compared to the new primary – and, interestingly, all
standbys had the same content, just different from the primary's.

We used the "rsync --size-only" snippet from the docs
https://www.postgresql.org/docs/current/pgupgrade.html to upgrade standbys.

With "--size-only", 1 GiB files for tables and indexes obviously cannot be
reliably synchronized. In our case, we perform additional steps involving
logical replication, advancing primary to certain LSN position -- and
during that, we keep standbys down. This explains the increased corruption
risks. But I think these risks are present for those who just follow the
steps in the docs as is, and probably some fixes or improvements are needed
here.

The main question: why do we consider "rsync --size-only" as reliable in
the general case? May standby corruption happen if we use follow steps from
https://www.postgresql.org/docs/current/pgupgrade.html?

Considering several general situations:
1. For streaming replication:
a. if we shut down the primary first, based on the code in walsender.c
defining how the shutdown event is handled, replicas should receive all the
changes?
b. if we shut down standbys first (might be preferred if we run the cluster
under Patroni control, to avoid unnecessary failover), then some changes
from the primary won't be received by standbys – and we do have standby
corruption risks
2. For replication based on WAL shipping, I don't think we can guarantee
that all changes are propagated to standbys.

The docs also have this:

> 9. Prepare for standby server upgrades
> If you are upgrading standby servers using methods outlined in section
Step 11, verify that the old standby servers are caught up by running
pg_controldata against the old primary and standby clusters. Verify that
the “Latest checkpoint location” values match in all clusters. (There will
be a mismatch if old standby servers were shut down before the old primary
or if the old standby servers are still running.) Also, make sure wal_level
is not set to minimal in the postgresql.conf file on the new primary
cluster.

– admitting that there might be a mismatch. But if there is a mismatch,
rsync --size-only is not going to help synchronize properly, right?

I was thinking about how to improve here, some ideas:
- "rsync --checksum" doesn't seem to be a good idea, it's, unfortunately,
very, very slow, though it would be the most reliable approach (but since
it's slow, I guess it's not worth even mentioning, crossing this out)
- we could remove "--size-only" and rely on default rsync behavior –
checking size and modification time; but how reliable would it be in
general case?
- make the step verifying “Latest checkpoint location” *after* shutting
down all nodes as mandatory, with instructions on how to avoid mismatch:
e.g., shut down primary first, disabling automated failover software, if
any, then run pg_controldata on standbys while they are running, and on
primary while it's already shut down (probably, different instructions are
needed for WAL shipping and streaming cases)
- probably, we should always run "rsync --checksum" for pg_wal
- I think, it's time to provide a snippet to run "rsync" in multiple
threads. A lot of installations today have many vCPUs and fast SSDs, and
running single-threaded rsync seems to be very slow (especially if we do
need to move away from "--size-only"). If it makes sense, I could come up
with some patch proposal for the docs
-  it's probably time to implement support for standby upgrade in
pg_upgrade itself, finding some way to take care of standbys and moving
away from the need to run rsync or to rebuild standby nodes? Although, this
is just a raw idea without a proper proposal yet.

Does this make sense, or am I missing something and the current docs
describe a reliable process? (As I said, we have deviated from the process
to involve logical replication, so I'm not 100% sure I'm right to suspect
the original procedure of having standby corruption risks.)

Thanks,
Nikolay Samokhvalov
Founder, Postgres.ai


Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

2023-06-30 Thread Nikolay Samokhvalov
On Fri, Jun 30, 2023 at 14:33 Bruce Momjian  wrote:

> On Fri, Jun 30, 2023 at 04:16:31PM -0400, Robert Haas wrote:
> > I'm not quite clear on how Nikolay got into trouble here. I don't
> > think I understand under exactly what conditions the procedure is
> > reliable and under what conditions it isn't. But there is no way in
> > heck I would ever advise anyone to use this procedure on a database
> > they actually care about. This is a great party trick or something to
> > show off in a lightning talk at PGCon, not something you ought to be
> > doing with valuable data that you actually care about.
>
> Well, it does get used, and if we remove it perhaps we can have it on
> our wiki and point to it from our docs.


In my case, we performed some additional writes on the primary before
running "pg_upgrade -k" and we did it *after* we shut down all the
standbys. So those changes were not replicated and then "rsync --size-only"
ignored them. (By the way, that cluster has wal_log_hints=on to allow
Patroni to run pg_rewind when needed.)

But this can happen with anyone who follows the procedure from the docs as
is and doesn't do any additional steps, because in step 9 "Prepare for
standby server upgrades":

1) there is no requirement to follow a specific order when shutting down
the nodes
   - "Streaming replication and log-shipping standby servers can remain
running until a later step" should probably be changed to a
requirement-like "keep them running"

2) checking the latest checkpoint position with pg_controldata now looks
like a thing that is good to do, but with an unclear purpose -- it does
not seem to be used to support any decision
  - "There will be a mismatch if old standby servers were shut down before
the old primary or if the old standby servers are still running" should
probably be rephrased to say that if there is a mismatch, it's a big problem

So following the steps as is, if some writes on the primary are not
replicated (for whatever reason) before executing pg_upgrade -k +
rsync --size-only, then those writes are going to be silently lost on
standbys.

I wonder: if we ensure that standbys are fully caught up before upgrading
the primary, and if we check the latest checkpoint positions, are we good
to use "rsync --size-only", or are there still some concerns? It seems so
to me, but maybe I'm missing something.
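
The verification itself is cheap, something like this (paths and the
version in them are placeholders):

# on the old primary, after it has been cleanly shut down:
/usr/lib/postgresql/12/bin/pg_controldata /var/lib/postgresql/12/main \
    | grep 'Latest checkpoint location'

# on each old standby (shut down after the primary), the value must match:
/usr/lib/postgresql/12/bin/pg_controldata /var/lib/postgresql/12/main \
    | grep 'Latest checkpoint location'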

> --

Thanks,
Nikolay Samokhvalov
Founder, Postgres.ai


Re: Postgres is not able to handle more than 4k tables!?

2020-07-09 Thread Nikolay Samokhvalov
Hi Konstantin, a silly question: do you consider the workload you have as
well-optimized? Can it be optimized further? Reading this thread I have a
strong feeling that a very basic set of regular optimization actions is
missing here (or not explained): query analysis and optimization based on
pg_stat_statements (and, maybe pg_stat_kcache), some method to analyze the
state of the server in general, resource consumption, etc.

Do you have some monitoring that covers pg_stat_statements?

Before looking under the hood, I would use multiple pg_stat_statements
snapshots (can be analyzed using, say, postgres-checkup or pgCenter) to
understand the workload and identify the heaviest queries -- first of all,
in terms of total_time, calls, shared buffers reads/hits, temporary files
generation. Which query groups are Top-N in each category, have you looked
at it?
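
For example, for the total time dimension, something like this (a sketch;
on PG13+ the column is total_exec_time instead of total_time):

select
    queryid,
    calls,
    round(total_time::numeric, 2) as total_ms,
    shared_blks_hit,
    shared_blks_read,
    temp_blks_written,
    left(query, 80) as query_fragment
from pg_stat_statements
order by total_time desc
limit 20;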

You mentioned some crazy numbers for the planning time, but why not
analyze the picture holistically and see the overall numbers? For those
queries that have increased planning time, what is their share of
total_time in the overall picture, in %? (Unfortunately, we cannot see
Top-N by planning time in pg_stat_statements until PG13, but it doesn't
mean that we cannot get a good understanding of the overall picture today;
it just requires more work.)

If workload analysis & optimization was already done holistically, or is
not possible for some reason — pardon me. But if not, and if your primary
goal is to improve this particular setup ASAP, then the topic could be
started in the -performance mailing list first, discussing the workload and
its aspects, and only after that is done, raised in -hackers. No?

On Thu, Jul 9, 2020 at 8:57 AM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Hi Stephen,
>
> Thank you for supporting an opinion that it is the problems not only of
> client system design (I agree it is not so good
> idea to have thousands tables and thousands active backends) but also of
> Postgres.
>
> We have made more investigation and found out one more problem in
> Postgres causing "invalidation storm".
> There are some log living transactions which prevent autovacuum to do it
> work and  remove dead tuples.
> So autovacuum is started once and once again and each time did no
> progress but updated statistics and so sent invalidation messages.
> autovacuum_naptime was set to 30 seconds, so each 30 seconds autovacuum
> proceed huge number of tables and initiated large number of invalidation
> messages which quite soon cause overflow of validation message buffers
> for backends performing long OLAP queries.
>
> It makes me think about two possible optimizations:
>
> 1. Provide separate invalidation messages for relation metadata and its
> statistic.
> So update of statistic should not invalidate relation cache.
> The main problem with this proposal is that pg_class contains relpages
> and reltuples columns which conceptually are \ part of relation statistic
> but stored in relation cache. If relation statistic is updated, then
> most likely this fields are also changed. So we have to remove this
> relation
> from relation cache in any case.
>
> 2. Remember in relation info XID of oldest active transaction at the
> moment of last autovacuum.
> At next autovacuum iteration we first of all compare this stored XID
> with current oldest active transaction XID
> and bypass vacuuming this relation if XID is not changed.
>
> Thoughts?
>
> >> So, that's really the core of your problem.  We don't promise that
> >> you can run several thousand backends at once.  Usually it's recommended
> >> that you stick a connection pooler in front of a server with (at most)
> >> a few hundred backends.
> > Sure, but that doesn't mean things should completely fall over when we
> > do get up to larger numbers of backends, which is definitely pretty
> > common in larger systems.  I'm pretty sure we all agree that using a
> > connection pooler is recommended, but if there's things we can do to
> > make the system work at least a bit better when folks do use lots of
> > connections, provided we don't materially damage other cases, that's
> > probably worthwhile.
>
> I also think that Postgres performance should degrade gradually with
> increasing number
> of active backends. Actually further investigations of this particular
> case shows that such large number of
> database connections was caused by ... Postgres slowdown.
> During normal workflow number of active backends is few hundreds.
> But "invalidation storm" cause hangout of queries, so user application
> has to initiate more and more new connections to perform required actions.
> Yes, this may be not the best behavior of application in this case. At
> least it should first terminate current connection using
> pg_terminate_backend. I just want to notice that large number of
> backends was not the core of the problem.
>
> >
> > Making them GUCs does seem like it's a few steps too far... but it'd be
> > nice if we could arran

Re: Postgres is not able to handle more than 4k tables!?

2020-07-09 Thread Nikolay Samokhvalov
Great idea.

In addition to this, it would be good to consider another optimization for
the default transaction isolation level: making autovacuum clean dead
tuples in relations that are not currently used in any transaction when
there are no IN_PROGRESS transactions running at the RR or S level (which
is a very common case, because RC is the default level and this is what is
actively used in heavily loaded OLTP systems, which often suffer from
long-running transactions). I don't know the details of how easy it would
be to implement, but I have always wondered why autovacuum has a global XID
"horizon".

With such an optimization, the "hot_standby_feedback=on" mode could also be
implemented more gracefully, reporting "min(xid)" for ongoing
transactions on standbys separately for RC and RR/S levels.

Without this, we cannot have good performance for HTAP use cases in
Postgres – the presence of just a small number of long-running
transactions, indeed, is known to kill the performance of OLTP workloads
quickly. And it leads to much faster bloat growth than necessary.

However, maybe I'm wrong in these considerations, or it's impossible / too
difficult to implement.

On Thu, Jul 9, 2020 at 9:38 AM Alexander Korotkov 
wrote:

> On Thu, Jul 9, 2020 at 6:57 PM Konstantin Knizhnik
>  wrote:
> > 2. Remember in relation info XID of oldest active transaction at the
> > moment of last autovacuum.
> > At next autovacuum iteration we first of all compare this stored XID
> > with current oldest active transaction XID
> > and bypass vacuuming this relation if XID is not changed.
>
>
> This option looks good for me independently of the use case under
> consideration.  Long-running transactions are an old and well-known
> problem.  If we can skip some useless work here, let's do this.
>
> --
> Regards,
> Alexander Korotkov
>
>
>


Re: Default gucs for EXPLAIN

2020-05-23 Thread Nikolay Samokhvalov
This is a very good improvement! Using information about buffers is my favorite 
way to optimize queries.

Not having BUFFERS enabled by default means that in most cases, when asking for 
help, people send execution plans without buffers info.

And it's simply inconvenient to type "(ANALYZE, BUFFERS)" all the time.

So I strongly support this change, thank you, Vik.

On Sat, May 23 2020 at 02:14, Vik Fearing < v...@postgresfriends.org > wrote:

> 
> 
> 
> Here is a patch to provide default gucs for EXPLAIN options.
> 
> 
> 
> I have two goals with this patch. The first is that I personally
> *always* want BUFFERS turned on, so this would allow me to do it without
> typing it every time.
> 
> 
> 
> The second is it would make it easier to change the actual default for
> settings if we choose to do so because users would be able to switch it
> back if they prefer.
> 
> 
> 
> The patch is based off of a995b371ae.
> --
> Vik Fearing
> 
> 
>

Re: Default gucs for EXPLAIN

2020-05-25 Thread Nikolay Samokhvalov
On Mon, May 25, 2020 at 6:36 PM, Bruce Momjian < br...@momjian.us > wrote:

> 
> 
> 
> I am not excited about this new feature. Why do it only for EXPLAIN? That
> is a log of GUCs. I can see this becoming a feature creep disaster.
> 
> 
> 
> 

How about changing the default behavior, making BUFFERS enabled by default? 
Those who don't need it can always say BUFFERS OFF — the same as for TIMING.

Re: New committers announced at PGCon 2018

2018-06-01 Thread Nikolay Samokhvalov
On Sat, Jun 2, 2018 at 1:10, Teodor Sigaev wrote:

>
> > Etsuro Fujita
> > Peter Geoghegan
> > Amit Kapila
> > Alexander Korotkov
> > Thomas Munro
> > Michael Paquier
> > Tomas Vondra
> >
> > Congratulations to all!
>
> +7!


+7,

and +2 to you and Oleg for the progress in building your Postgres team

Nikolay

>


Re: proposal: alternative psql commands quit and exit

2017-12-08 Thread Nikolay Samokhvalov
On Fri, Dec 8, 2017 at 2:10 AM, Michael Paquier 
wrote:
>
> I think that you are going to need better reasons than just being more
> friendly with other database clients ...
>

This is not about other database clients. This is about users considering
migration to Postgres.
People!

I'm definitely +1 for this; I've seen many people struggling because of
this. Being more friendly to new users = more users in the future.

At least "quit" and "exit" might be aliases for "help".


ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2017-12-11 Thread Nikolay Samokhvalov
Very interesting read: https://arxiv.org/abs/1712.01208

HN discussion: https://news.ycombinator.com/item?id=15894896

Some of the comments (from Twitter
https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
at GOOG just released a paper showing how machine-learned indexes can
replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
B-Trees, 10-100x less space. Executes on GPU, which are getting faster
unlike CPU. Amazing."

Can those ideas be applied to Postgres in its current state? Or is it not
really down-to-earth?


vacuumdb --analyze-only (e.g., after a major upgrade) vs. partitioned tables: pg_statistic missing stats for the partitioned table itself

2024-10-24 Thread Nikolay Samokhvalov
I just learned that vacuumdb --analyze-only doesn't update stats for the
partitioned table itself, taking care only of individual partitions:
(DDL doesn't matter here)

# vacuumdb --analyze-only -U postgres test --verbose
...
INFO:  analyzing "public.measurement_2023_01"
INFO:  "measurement_2023_01": scanned 6370 of 6370 pages, containing
1000000 live rows and 0 dead rows; 30000 rows in sample, 1000000 estimated
total rows
INFO:  analyzing "public.measurement_2023_02"
INFO:  "measurement_2023_02": scanned 6257 of 6257 pages, containing 982279
live rows and 0 dead rows; 30000 rows in sample, 982279 estimated total rows
INFO:  analyzing "public.measurement_2023_03"
INFO:  "measurement_2023_03": scanned 6483 of 6483 pages, containing
1017721 live rows and 0 dead rows; 30000 rows in sample, 1017721 estimated
total rows
...

test=# select starelid::regclass, count(*) from pg_statistic where
starelid::regclass::text ~ 'measurement' group by 1 order by 1;
  starelid   | count
-+---
 measurement_2023_01 | 4
 measurement_2023_02 | 4
 measurement_2023_03 | 4
(3 rows)

While for the single-threaded SQL-level ANALYZE:

test=# analyze verbose measurement;
...
test=# select starelid::regclass, count(*) from pg_statistic where
starelid::regclass::text ~ 'measurement' group by 1 order by 1;
  starelid   | count
-+---
 measurement | 4
 measurement_2023_01 | 4
 measurement_2023_02 | 4
 measurement_2023_03 | 4
(4 rows)

This means that if, after running pg_upgrade, we use vacuumdb to update
stats faster, some stats may be missing, potentially leading to suboptimal
performance.

Additionally, it doesn't help that pg_stat_all_tables doesn't show
counters/timestamps for the partitioned table itself, even after SQL-level ANALYZE:

test=# select relname, analyze_count, autoanalyze_count, last_analyze,
last_autoanalyze from pg_stat_user_tables where relname ~ 'measurement';
   relname   | analyze_count | autoanalyze_count |
last_analyze  | last_autoanalyze
-+---+---+---+--
 measurement_2023_01 | 2 | 0 | 2024-10-24
21:25:47.979958+00 |
 measurement_2023_02 | 2 | 0 | 2024-10-24
21:25:48.070355+00 |
 measurement_2023_03 | 2 | 0 | 2024-10-24
21:25:48.154613+00 |
(3 rows)

This is also discussed in
https://www.postgresql.org/message-id/flat/CADofcAXVbD0yGp_EaC9chmzsOoSai3jcfBCnyva3j0RRdRvMVA%40mail.gmail.com

I propose considering 3 fixes:

1) vacuumdb --analyze / --analyze-only to update stats for the partitioned
table, so people using pg_upgrade are not in trouble
2) present the ANALYZE metadata for partitioned tables in pg_stat_all_tables
3) for old versions, either backpatch with fix (1) OR just add to the docs
(and maybe to the final words pg_upgrade prints), suggesting something like
this in addition to vacuumdb --analyze-only:

-- psql snippet
select format(
  'analyze verbose %I.%I;',
  relnamespace::regnamespace,
  relname
) as analyze_command
from pg_class
where relkind = 'p' \gexec

Additionally, I do like the idea of ANALYZE ONLY from the -general
discussion above (though there might be confusion with the logic of
--analyze and --analyze-only in vacuumdb).

Does it make sense?

Nik


Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE

2024-11-05 Thread Nikolay Samokhvalov
On Tue, Nov 5, 2024 at 10:19 AM Robert Haas  wrote:

> On Tue, Nov 5, 2024 at 1:02 PM Nikolay Samokhvalov
>  wrote:
> > hi, I have a proposal, resulted from numerous communications with
> various folks, both very experienced and new Postgres users:
> >
> > 1) EXPLAIN ANALYZE Is sometimes very confusing (because there is
> ANALYZE). Let's rename it to EXPLAIN EXECUTE?
>
> The trouble is that EXPLAIN EXECUTE already means something.
>
> robert.haas=# explain execute foo;
> ERROR:  prepared statement "foo" does not exist
>
> Granted, that would not make it impossible to make EXPLAIN (EXECUTE) a
> synonym for EXPLAIN (ANALYZE), but IMHO it would be pretty confusing
> if EXPLAIN EXECUTE and EXPLAIN (EXECUTE) did different things.
>
> > 2) VERBOSE doesn't include BUFFERS, and doesn't include SETTINGS; it
> might be also confusing sometimes. Let's include them so VERBOSE would be
> really verbose?
>
> I agree that the naming here isn't great, but I think making the
> options non-orthogonal would probably be worse.
>
> > 3) small thing about grammar: allow omitting parentheses, so EXPLAIN
> EXECUTE VERBOSE would work.
>
> Perhaps surprisingly, it turns out that this is not a small change. As
> Tom mentions, this would have a pretty large blast radius. In fact,
> the reason I wrote the patch to introduce parenthesized options for
> EXPLAIN was precisely because the unparenthesized option syntax does
> not scale nicely at all.
>

I appreciate all of your and Tom's very quick comments here!

Item 3 is already solved, as it turned out.

Let's focus on item 2. Is it really impossible to make VERBOSE really
verbose?


Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE

2024-11-05 Thread Nikolay Samokhvalov
hi, I have a proposal, resulted from numerous communications with various
folks, both very experienced and new Postgres users:

1) EXPLAIN ANALYZE Is sometimes very confusing (because there is ANALYZE).
Let's rename it to EXPLAIN EXECUTE?

2) VERBOSE doesn't include BUFFERS, and doesn't include SETTINGS; it might
also be confusing sometimes. Let's include them so VERBOSE would be really
verbose?

3) small thing about grammar: allow omitting parentheses, so EXPLAIN
EXECUTE VERBOSE would work.

if both changes are done, we could use EXPLAIN (EXECUTE, VERBOSE) to collect
all the data needed for analysis.
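
To illustrate (the first form is valid today; the second is only what this
proposal would enable):

explain (analyze, buffers, verbose, settings) select ...;

-- proposed shorthand, not existing syntax:
explain (execute, verbose) select ...;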

have a really nice week,
Nik


Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE

2024-11-05 Thread Nikolay Samokhvalov
On Tue, Nov 5, 2024 at 10:16 AM Tom Lane  wrote:

> Nikolay Samokhvalov  writes:
> > 1) EXPLAIN ANALYZE Is sometimes very confusing (because there is
> ANALYZE).
> > Let's rename it to EXPLAIN EXECUTE?
>
> This has got far too many years of history to be renamed now.
>

This is a really, really strange argument. Postgres keeps receiving new
audiences at larger and larger scale. And they are confused.

It's better late than never. I didn't believe we would have "quit" working
in psql.


>
> > 2) VERBOSE doesn't include BUFFERS, and doesn't include SETTINGS; it
> might
> > be also confusing sometimes. Let's include them so VERBOSE would be
> really
> > verbose?
>
> This is not likely to fly for compatibility reasons.
>

Can you elaborate?


>
> > 3) small thing about grammar: allow omitting parentheses, so EXPLAIN
> > EXECUTE VERBOSE would work.
>
> The reason for the parens is that the other way would require reserving
> all these options as keywords.
>

turns out, EXPLAIN ANALYZE VERBOSE already works (it's just not as
verbose as one might expect):

test=# explain analyze verbose select;
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)


Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE

2024-11-05 Thread Nikolay Samokhvalov
On Tue, Nov 5, 2024 at 10:30 AM Tom Lane  wrote:

> we're not working in a green field here, and all these
> decisions have history.
>

I hear you and understand.

Ready to do legwork here.

1. VERBOSE first appeared in 1997 in 6.3 in 3a02ccfa, with different
meaning:

  > This command [EXPLAIN] outputs details about the supplied query. The
  > default output is the computed query cost.  \f2verbose\f1 displays the
  > full query plan and cost.

2. Support for parentheses was added in d4382c4a (2009, 8.5), with the
"test" option COSTS, and this opened the gates to extending with many
options.

3. BUFFERS was added in d4382c4 (also 2009, 8.5), discussion
https://www.postgresql.org/message-id/flat/4AC12A17.5040305%40timbira.com,
I didn't see any discussion there of including it in VERBOSE.

In my opinion, this option is invaluable: most of the performance
optimization is done by reducing IO so seeing these numbers helps make
decisions much faster. I always use them. When you optimize and, for
example, want to verify an index idea, it's not good to do it on production
– it's better to work with clones. There, we can have weaker hardware,
different buffer state, etc. So timing numbers might be really off. Timing
can be different even on the same server, e.g. after a restart, when the
buffer pool is not warmed up. But BUFFERS never lie – they are not affected
by CPU saturation, lock acquisition waits, etc. Not looking at them means
missing an essential part of the analysis, I strongly believe.

It looks like in 2009, when the BUFFERS option was created, there was not
enough understanding of how useful it is, so including it by default – or at
least, as we discuss here, in VERBOSE – was not considered.

I want to emphasize: BUFFERS is essential in my work, and more and more
people are convinced that during the optimization process, in most cases it
is beneficial to focus on BUFFERS. Notice that explain.depesz.com,
explain.dalibo.com, pgMustard and many other tools recognize this and ask
users to include BUFFERS in the analysis. And see the next item:

4. Making BUFFERS the default behavior for EXPLAIN ANALYZE was raised
several times, for example
https://www.postgresql.org/message-id/flat/CANNMO++=LrJ4upoeydZhbmpd_ZgZjrTLueKSrivn6xmb=yf...@mail.gmail.com
(2021) – my understanding is that it received great support and it was
discussed in detail why it's useful, but several attempts to implement it
were not completed because of technical difficulties (as I remember, a
problem with broken tests and how to fix them).

5. EXPLAIN ALL was proposed in
https://www.postgresql.org/message-id/flat/080fe841-e38d-42a9-ad6d-48cabed16...@endpoint.com
(2016) – I think it was actually a good idea originally, but it didn't
survive questions about mutually exclusive and non-binary options, and then
the discussion stopped after pivoting in the direction of a GUC.

6. Finally, the fresh SERIALIZE option was discussed in
https://www.postgresql.org/message-id/flat/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1%40gmx.de
(2023-2024, 17), and unfortunately again.

I might be missing some discussions – please help me find them; I also
expect that there are many people who support me in thinking that BUFFERS
is very useful and should be the default, or at least included in VERBOSE.
Meanwhile:
- to be able to have all data in hand during analysis, we need to recommend
that users collect plans using EXPLAIN (ANALYZE, BUFFERS, VERBOSE,
SETTINGS), which looks really long
- independently, I now see pgMustard ended up with a similar
recommendation: https://www.pgmustard.com/getting-a-query-plan:
   > For better advice, we recommend using at least: explain (analyze,
format json, buffers, verbose, settings)

My proposal remains: EXPLAIN ANALYZE VERBOSE -- let's consider this, please.


Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE

2024-11-05 Thread Nikolay Samokhvalov
On Tue, Nov 5, 2024 at 2:54 PM Nikolay Samokhvalov 
wrote:

> 6. FInally, the fresh SERIALIZE option was discussed in
> https://www.postgresql.org/message-id/flat/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1%40gmx.de
> (2023-2024, 17), and unfortunately again.
>

(didn't finish the phrase here and hit Send)
...again, I don't see that including the SERIALIZE behavior in VERBOSE was
discussed. I don't use SERIALIZE myself, but during our podcasts, Michael
(CCing him) was wondering why it was so.

Summary: I haven't found explicit discussions of including new options in
VERBOSE when those new options were created. I used Google, the .org
search, and postgres.ai semantic search over the archives (involving
pgvector/HNSW) – I might be missing something, or it was really not
discussed when new options were added.


Re: pg_dump, pg_dumpall, pg_restore: Add --no-policies option

2025-01-10 Thread Nikolay Samokhvalov
Thank you Greg.

002_pg_dump.pl already deals with CREATE POLICY and ALTER TABLE .. ENABLE
ROW LEVEL SECURITY, so I just added "--no-policies" there, to have basic
coverage.

Nik


On Fri, Jan 10, 2025 at 9:44 AM Greg Sabino Mullane 
wrote:

> Looks good to me. Would ideally like to see some tests: should be easy
> enough to add to t/002_pg_dump.pl, but probably not worth it just for a
> simple flag like this? We don't test a lot of other flags, but on the other
> hand, that's what a test suite is supposed to do.
>
> Cheers,
> Greg
>
>
From d66470a0188be437789934e1d03795ca687553e5 Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov 
Date: Thu, 9 Jan 2025 20:21:39 +
Subject: [PATCH] pg_dump, pg_dumpall, pg_restore: Add --no-policies option

Add --no-policies option to control row level security policy handling
in dump and restore operations. When this option is used, both CREATE
POLICY commands and ALTER TABLE ... ENABLE ROW LEVEL SECURITY commands
are excluded from dumps and skipped during restores.

This is useful in scenarios where policies need to be redefined in the
target system or when moving data between environments with different
security requirements.
---
 doc/src/sgml/ref/pg_dump.sgml|  9 +
 doc/src/sgml/ref/pg_dumpall.sgml |  9 +
 doc/src/sgml/ref/pg_restore.sgml | 10 ++
 src/bin/pg_dump/pg_backup.h  |  2 ++
 src/bin/pg_dump/pg_backup_archiver.c |  7 +++
 src/bin/pg_dump/pg_dump.c|  6 ++
 src/bin/pg_dump/pg_dumpall.c |  5 +
 src/bin/pg_dump/pg_restore.c |  4 
 src/bin/pg_dump/t/002_pg_dump.pl |  9 +
 9 files changed, 61 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index d66e901f51b..2f8c7b38048 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1080,6 +1080,15 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-policies
+  
+   
+Do not dump row security policies.
+   
+  
+ 
+
  
   --no-publications
   
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 014f2792589..78bd7c73247 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -422,6 +422,15 @@ exclude database PATTERN
   
  
 
+ 
+  --no-policies
+  
+   
+Do not dump row security policies.
+   
+  
+ 
+
  
   --no-publications
   
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index b8b27e1719e..57802e3c9ef 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -703,6 +703,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-policies
+  
+   
+Do not output commands to restore row security policies, even if
+the archive contains them.
+   
+  
+ 
+
  
   --no-publications
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index f0f19bb0b29..3084f1ec417 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -110,6 +110,7 @@ typedef struct _restoreOptions
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
+	int			no_policies;	/* Skip row security policies */
 	int			no_publications;	/* Skip publication entries */
 	int			no_security_labels; /* Skip security label entries */
 	int			no_subscriptions;	/* Skip subscription entries */
@@ -181,6 +182,7 @@ typedef struct _dumpOptions
 	int			no_comments;
 	int			no_security_labels;
 	int			no_publications;
+	int			no_policies;	/* Skip row security policies */
 	int			no_subscriptions;
 	int			no_toast_compression;
 	int			no_unlogged_table_data;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 707a3fc844c..305e7955c1c 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -183,6 +183,7 @@ dumpOptionsFromRestoreOptions(RestoreOptions *ropt)
 	dopt->disable_dollar_quoting = ropt->disable_dollar_quoting;
 	dopt->dump_inserts = ropt->dump_inserts;
 	dopt->no_comments = ropt->no_comments;
+	dopt->no_policies = ropt->no_policies;
 	dopt->no_publications = ropt->no_publications;
 	dopt->no_security_labels = ropt->no_security_labels;
 	dopt->no_subscriptions = ropt->no_subscriptions;
@@ -2944,6 +2945,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 	if (ropt->no_comments && strcmp(te->desc, "COMMENT") == 0)
 		return 0;
 
+	/* If it's a policy, maybe ignore it */
+	if (ropt->no_policies &&
+		(strcmp(te->desc, "POLICY") == 0 ||
+		 strcmp(te->desc, "ROW SECURITY") == 0))
+		return 0;
+
 	/*
 	 * If it's a publication or a table part of a publication, maybe ignore

pg_dump, pg_dumpall, pg_restore: Add --no-policies option

2025-01-09 Thread Nikolay Samokhvalov
pg_dump[all] and pg_restore have a lot of "--no-XXX" options, but I noticed
there is no "--no-policies"; the attached patch implements it for pg_dump,
pg_dumpall, and pg_restore.

This can be useful in scenarios where policies need to be redefined in the
target system or when moving data between environments with different
security requirements.
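
For example, a sketch of the intended usage once the patch is applied
(appdb / appdb_staging are placeholder database names):

pg_dump --no-policies -Fc appdb -f appdb.dump
pg_restore --no-policies -d appdb_staging appdb.dump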

Looking for feedback.

Nik


0001-pg_dump-pg_dumpall-pg_restore-Add-no-policies-option.patch
Description: Binary data


Re: Built-in Raft replication

2025-04-15 Thread Nikolay Samokhvalov
On Tue, Apr 15, 2025 at 8:08 AM Greg Sabino Mullane 
wrote:

> On Mon, Apr 14, 2025 at 1:15 PM Konstantin Osipov 
> wrote:
>
>> If anyone is working on Raft already I'd be happy to discuss
>> the details. I am fairly new to the PostgreSQL hackers ecosystem
>> so cautious of starting work in isolation/knowing there is no
>> interest in accepting the feature into the trunk.
>>
>
> Putting aside the technical concerns about this specific idea, it's best
> to start by laying out a very detailed plan of what you would want to
> change, and what you see as the costs and benefits. It's also extremely
> helpful to think about developing this as an extension. If you get stuck
> due to extension limitations, propose additional hooks. If the hooks will
> not work, explain why.
>

This is exactly what I wanted to write as well. The idea is great. At the
same time, I think consensus on many decisions will be extremely hard to
reach, so this project has a high risk of taking very long. Unless it's an
extension, at least in the beginning.

Nik


Support specifying compression level in wal_compression

2025-05-27 Thread Nikolay Samokhvalov
I thought it makes sense to extend wal_compression to support compression
levels.

The patch replaces the simple enum-based setting with string-based
'method[:level]' syntax, similar to the --compress option in pg_dump.

What's inside:
- Unified GUC design: wal_compression = 'method[:level]'
- Supported formats: 'off', 'on', 'pglz', 'lz4[:level]', 'zstd[:level]'
- Algorithm-specific defaults: LZ4 defaults to level 1, ZSTD to level 3
when no level is specified.
- Parameter validation is performed at SET time.
- The string is parsed only once during GUC assignment.
- Backward compatibility for common settings: While the GUC type changes
from enum to string, common previous boolean-like string values (e.g.,
'off', 'true', '0') are handled, mapping to 'pglz' or 'off' respectively.
- Includes a docs change proposal, as well as extensive new regression and
TAP tests.

Additionally, it adds LZ4HC support -- for LZ4, if levels 10-12 are
specified, then, when available (checked at build time + runtime), it's
supposed to use LZ4HC for higher compression ratio. This part needs
additional testing.
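
To make the proposed syntax concrete, a few example postgresql.conf values
(a sketch; these only work with the patch applied):

wal_compression = 'zstd:5'   # ZSTD at level 5
wal_compression = 'lz4'      # LZ4, defaults to level 1
wal_compression = 'lz4:12'   # selects LZ4HC when the build supports it
wal_compression = 'off'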


Originally, I considered adding wal_compression_level but eventually
decided to extend wal_compression, for two reasons:
1. with a separate param, a question of defaults needs to be solved, and I
didn't find an elegant solution;
2. "method:level" syntax is already used in another place – pg_dump

An early version of this patch was reviewed by Andrey Borodin off-list, and
it was extremely helpful.

looking forward to seeing feedback

Nik


001_wal_compression_with_levels_and_lz4hc.patch
Description: Binary data


Re: Proposal: Job Scheduler

2025-05-30 Thread Nikolay Samokhvalov
On Fri, May 30, 2025 at 02:22 Andrei Lepikhov  wrote:

> On 5/30/25 03:17, Nikolay Samokhvalov wrote:
> > On Thu, Jun 6, 2024 at 5:31 AM Dmitry Dolgov <9erthali...@gmail.com
> > +1. The PostgreSQL ecosystem is surprisingly fragmented, when it
> comes
> > to quite essential components that happen to be outside of the core.
> But
> > of course it doesn't mean that there should be _one_ component of
> every
> > kind in core, more like it makes sense to have _one_ component
> available
> > out of the box (where the box is whatever form of PostgreSQL that
> gets
> > delivered to users, e.g. a distro package, container, etc.).
> >
> >
> > +1 too.
> >
> > There is a huge reason to have a job scheduler in core – new partition
> > creation.
> >
> > In my opinion, partitioning in Postgres needs more automation, and new
> > partition creation is a big missing piece. And it does require a
> scheduler.
> >
> > I like pg_timetable a lot, but it's written in Go;
> >
> > pg_cron is written in Go, and it's already present in most managed
> > Postgres platforms. Why not to bring it to Postgres core so we could
> > then use it to improve developer experience of dealing with partitioning?
> I would say you should provide a reason why it is too difficult to stay
> outside the core, such as pg_hint_plan or a similar feature.
> In my opinion, the main reason to push an extension into contrib is if
> it has a strong connection with the core API. But the scheduler seems as
> far from the volatile API features as possible.
> That's more, contrib extensions have essential priority to external
> solutions and reduce development impulse in the area.



I'm not proposing to include it as a contrib module -- I propose including
it in the core code base and then using it to implement automated partition
creation.

>


Re: Proposal: Job Scheduler

2025-05-29 Thread Nikolay Samokhvalov
On Thu, Jun 6, 2024 at 5:31 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Thu, Jun 06, 2024 at 12:53:38PM GMT, Alvaro Herrera wrote:
> > On 2024-Jun-06, Dave Page wrote:
> >
> > > It's this kind of choice that means it's unlikely we'd include any one
> > > option in PostgreSQL, much like various other tools such as failover
> > > managers or poolers.
> >
> > TBH I see that more as a bug than as a feature, and I see the fact that
> > there are so many schedulers as a process failure.  If we could have
> > _one_ scheduler in core that encompassed all the important features of
> > all the independent ones we have, with hooks or whatever to allow the
> > user to add any fringe features they need, that would probably lead to
> > less duplicative code and divergent UIs, and would be better for users
> > overall.
> >
> > That's, of course, just my personal opinion.
>
> +1. The PostgreSQL ecosystem is surprisingly fragmented, when it comes
> to quite essential components that happen to be outside of the core. But
> of course it doesn't mean that there should be _one_ component of every
> kind in core, more like it makes sense to have _one_ component available
> out of the box (where the box is whatever form of PostgreSQL that gets
> delivered to users, e.g. a distro package, container, etc.).
>

+1 too.

There is a huge reason to have a job scheduler in core – new partition
creation.

In my opinion, partitioning in Postgres needs more automation, and new
partition creation is a big missing piece. And it does require a scheduler.

I like pg_timetable a lot, but it's written in Go;

pg_cron is written in Go, and it's already present in most managed Postgres
platforms. Why not bring it into Postgres core so we could then use it to
improve the developer experience of dealing with partitioning?
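
To make the partitioning angle concrete, here is roughly how it could look
with pg_cron's existing cron.schedule() API (just a sketch; the table and
schedule are made up, and a real job would compute the next month's
boundaries dynamically):

select cron.schedule(
  'create-next-partition',
  '0 0 25 * *',  -- at 00:00 on the 25th of every month
  $$create table if not exists measurement_2025_07
      partition of measurement
      for values from ('2025-07-01') to ('2025-08-01')$$
);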


Re: Proposal: Job Scheduler

2025-05-29 Thread Nikolay Samokhvalov
On Thu, May 29, 2025 at 6:17 PM Nikolay Samokhvalov  wrote:

> pg_cron is written in Go, and it's already present in most managed
> Postgres platforms. Why not to bring it to Postgres core so we could then
> use it to improve developer experience of dealing with partitioning?
>

I mean, in C, of course.


Add --system-identifier / -s option to pg_resetwal

2025-05-31 Thread Nikolay Samokhvalov
hi hackers,

I was just involved in a DR case where we needed to change the system ID in
the control data, and noticed that there is no such capability, officially –
so I thought it would be good to have it.

The attached patch adds a new -s / --system-identifier option to
pg_resetwal that allows users to change the database cluster's system
identifier; it can be useful when you need to make a restored cluster
distinct from the original, or when cloning for testing.

some aspects about the patch:
- accepts positive 64-bit integers only (zero not allowed)
- requires interactive confirmation or --force flag for safety
- detects non-TTY environments and requires --force in scripts
- included input validation and error handling
- updated docs with clear warnings about compatibility
- added tests, including some edge cases
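
An example invocation could look like this (hypothetical sketch; the
identifier value is made up):

pg_resetwal --system-identifier=7331234567890123456 --force $PGDATA

Without --force, when run on a TTY, it asks for interactive confirmation
first.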

looking forward to hearing feedback!

Nik


0001-pg_resetwal--system-identifier.patch
Description: Binary data


Re: track generic and custom plans in pg_stat_statements

2025-05-31 Thread Nikolay Samokhvalov
On Thu, May 29, 2025 at 11:56 AM Sami Imseih  wrote:

> It turns out that 1722d5eb05d8e reverted 525392d5727f, which
> made CachedPlan available in QueryDesc and thus
> available to pgss_ExecutorEnd.
>
> So now we have to make CachedPlan available to QueryDesc as
> part of this change. The reason the patch was reverted is related
> to a memory leak [0] in the BuildCachedPlan code and is not related
> to the part that made CachedPlan available to QueryDesc.
>
> See v6 for the rebase of the patch and addition of testing for EXPLAIN
> and EXPLAIN ANALYZE which was missing from v5.
>

I reviewed v6:

- applies to master cleanly, builds, tests pass and all works as expected
- overall, the patch looks great and I found no major issues
- tests and docs look good overall
- in docs, one minor comment:
> "Total number of statements executed using a generic plan" vs. what
we already have for `calls`
here, in "Number of times the statement was executed", I see some
inconsistencies:
- the word "total" should be removed, I think
- and maybe we should make wording consistent with the existing
text – "number of times the statement ...")
- Also very minor, the test queries have duplicate `calls` columns:
> SELECT calls, generic_plan_calls, custom_plan_calls, toplevel, calls,
...
-  plan->status is set in GetCachedPlan() but I don't see explicit
initialization in plan creation paths; maybe it's worth having a defensive
initialization for possible edge cases:
> plan->status = PLAN_CACHE_STATUS_UNKNOWN;
(here I'm not 100% sure, as palloc0 in CreateCachedPlan should
zero-initialize it to PLAN_CACHE_STATUS_UNKNOWN anyway)
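
Separately from the review items above, here is the kind of query I expect
to run in production once this lands (a sketch using the new v6 column
names):

select queryid, calls, generic_plan_calls, custom_plan_calls
from pg_stat_statements
order by calls desc
limit 20;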

that's all I could find – and overall it's a great addition,

thank you, looking forward to having these two columns in prod.

Nik


Re: track generic and custom plans in pg_stat_statements

2025-05-31 Thread Nikolay Samokhvalov
On Sat, May 31, 2025 at 5:06 PM Nikolay Samokhvalov  wrote:

> On Thu, May 29, 2025 at 11:56 AM Sami Imseih  wrote:
>
>> It turns out that 1722d5eb05d8e reverted 525392d5727f, which
>> made CachedPlan available in QueryDesc and thus
>> available to pgss_ExecutorEnd.
>>
>> So now we have to make CachedPlan available to QueryDesc as
>> part of this change. The reason the patch was reverted is related
>> to a memory leak [0] in the BuildCachedPlan code and is not related
>> to the part that made CachedPlan available to QueryDesc.
>>
>> See v6 for the rebase of the patch and addition of testing for EXPLAIN
>> and EXPLAIN ANALYZE which was missing from v5.
>>
>
> I reviewed v6:
>
> - applies to master cleanly, builds, tests pass and all works as expected
> - overall, the patch looks great and I found no major issues
> - tests and docs look good overall
> - in docs, one minor comment:
> > "Total number of statements executed using a generic plan" vs. what
> we already have for `calls`
> here, in "Number of times the statement was executed", I see some
> inconsistencies:
> - the word "total" should be removed, I think
> - and maybe we should make wording consistent with the existing
> text – "number of times the statement ...")
> - Also very minor, the test queries have duplicate `calls` columns:
> > SELECT calls, generic_plan_calls, custom_plan_calls, toplevel,
> calls, ...
> -  plan->status is set in GetCachedPlan() but I don't see explicit
> initialization in plan creation paths; maybe it's worth having a defensive
> initialization for possible edge cases:
> > plan->status = PLAN_CACHE_STATUS_UNKNOWN;
> (here I'm not 100% sure, as palloc0 in CreateCachedPlan should
> zero-initialize it to PLAN_CACHE_STATUS_UNKNOWN anyway)
>
> that's all I could find – and overall it's a great addition,
>
> thank you, looking forward to having these two columns in prod.
>

Ah, one more thing: the subject here and in the CommitFest entry, "track
generic and custom plans", slightly confused me at first; I think it's worth
including the words "calls" or "execution" there, and in the commit message,
for clarity. Or just include both column names as is.

Nik