Re: How to validate restore of backup?

2024-08-23 Thread Peter J. Holzer
On 2024-08-23 08:13:40 +0200, Peter J. Holzer wrote:
> On 2024-08-22 16:09:47 +0500, Muhammad Usman Khan wrote:
> > For validation of databases, you can use the following approach
> > 
> > /usr/pgsql-16/bin/pg_dump -d postgres -h localhost -p 5428 | md5sum > 
> > /var/lib/
> > pgsql/db1.txt
> > /usr/pgsql-16/bin/pg_dump -d postgres -h localhost -p 5420 | md5sum > 
> > /var/lib/
> > pgsql/db2.txt
> > diff db1.txt db2.txt
> > 
> > By executing above queries, if diff is null then it means there is no
> > difference between source and destination databases.
> 
> But on the other hand, if the diff is null, it doesn't mean there is a

That should have read "if the diff is NOT null", of course.

> (meaningful) difference between the databases.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Where is my app installed?

2024-08-23 Thread Arbol One

Thanks everyone, the issue was resolved.


--
*/ArbolOne.ca/* Using Fire Fox and Thunderbird. ArbolOne is composed of 
students and volunteers dedicated to providing free services to 
charitable organizations. ArbolOne on Java Development is in progress [ í ]

On a subscriber, why is last_msg_send_time in pg_stat_subscription sometimes null?

2024-08-23 Thread Koen De Groote
I'm running this query:

SELECT subname, received_lsn, latest_end_lsn,
last_msg_send_time, last_msg_receipt_time
FROM pg_catalog.pg_stat_subscription;

And "last_msg_send_time" will sometimes be null. All I can find in the
documentation is that this value represents: "Send time of last message
received from origin WAL sender; NULL for parallel apply workers"

But I have no context for what parallel apply workers are or why they are
NULL sometimes and other times not.

Can anyone explain?

Regards,
Koen De Groote


Re: Is there a way to translate pg_amop.amopstrategy into a description?

2024-08-23 Thread Morris de Oryx
Thanks for the confirmation. And, I'd say that this feature would go under
"nice to have" rather than anything more important. Although, it *would *be
nice.

On Thu, Aug 22, 2024 at 5:42 PM Tom Lane  wrote:

> Morris de Oryx  writes:
> > What I'm hoping for is a function like
> > get_opt_class_strategy_description(optclass, straregy_number)  I've
> > looked at the source a bit, and it seems that there is no such
> > function, and that it might well be difficult to implement. The
> > strategy numbers are, as far as I can see, local to the specific
> > opt_class, which has no requirement to label them in any particular
> > way.
>
> That's correct.  For btree and hash, the meanings of the strategy
> numbers are determined by the index AM; but for (IIRC) all of our
> other index AMs they're determined by the individual opclass.  So
> anything like this would have to be implemented by dedicated code
> in each opclass.  Perhaps that's worth doing, but it'd be a fair
> amount of work.
>
> regards, tom lane
>


Re: On a subscriber, why is last_msg_send_time in pg_stat_subscription sometimes null?

2024-08-23 Thread Adrian Klaver

On 8/23/24 07:33, Koen De Groote wrote:

I'm running this query:

SELECT subname, received_lsn, latest_end_lsn,
last_msg_send_time, last_msg_receipt_time
FROM pg_catalog.pg_stat_subscription;

And "last_msg_send_time" will sometimes be null. All I can find in the 
documentation is that this value represents: "Send time of last message 
received from origin WAL sender; NULL for parallel apply workers"


But I have no context for what parallel apply workers are or why they 
are NULL sometimes and other times not. >

Can anyone explain?


The source has that information:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/applyparallelworker.c;h=e7f7d4c5e4bd3ff63687cb21b84534a05a3962cc;hb=HEAD

The top part explains the process.

As to the time:

/*
 * There are three fields in each message received by the parallel apply
 * worker: start_lsn, end_lsn and send_time. Because we have updated these
 * statistics in the leader apply worker, we can ignore these fields in the
 * parallel apply worker (see function LogicalRepApplyLoop).
 */



Regards,
Koen De Groote


--
Adrian Klaver
adrian.kla...@aklaver.com





dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Matthew Tice
Hi All,

I'm trying to understand why there's a difference between what pgstattuple
reports and pg_stat_user_tables reports (for the number of dead tuples).

As I understand, pgstattuple and pgstattuple_approx return the exact number
of dead tuples (as noted in the documentation) and based on an older Stack
Overflow answer the value returned from pg_stat_user_tables "uses the most
recent data collected by ANALYZE".

Why would it be that even after analyzing a table the n_dead_tup value is
still vastly different than dead_tuple_count?

> SELECT * FROM (SELECT dead_tuple_count from
pgstattuple_approx('oban.oban_jobs'))a, (SELECT
n_dead_tup,last_autovacuum,last_analyze,now(),autovacuum_c
 ount FROM pg_stat_user_tables WHERE relname = 'oban_jobs' and schemaname =
'oban')b;
-[ RECORD 1 ]-
dead_tuple_count | 3736
n_dead_tup   | 1127044
last_autovacuum  | 2024-08-23 16:00:30.983141+00
last_analyze | 2024-08-23 15:33:50.628422+00
now  | 2024-08-23 16:01:19.915893+00
autovacuum_count | 446478
SELECT 1

> vacuum (verbose,analyze) oban.oban_jobs;

vacuuming "oban.oban_jobs"
table "oban_jobs": index scan bypassed: 29341 pages from table (0.79% of
total) have 747 dead item identifiers
launched 2 parallel vacuum workers for index cleanup (planned: 2)
index "oban_jobs_args_index" now contains 18281 row versions in 10232 pages
0 index row versions were removed.
0 index pages were newly deleted.
56 index pages are currently deleted, of which 833 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
index "oban_jobs_meta_index" now contains 18281 row versions in 9698 pages
0 index row versions were removed.
0 index pages were newly deleted.
35 index pages are currently deleted, of which 621 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
table "oban_jobs": found 855 removable, 9661 nonremovable row versions in
29341 out of 3727204 pages
1330 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 3696951 frozen pages.
912 skipped pages using mintxid fork.
CPU: user: 0.12 s, system: 0.08 s, elapsed: 0.22 s.
vacuuming "pg_toast.pg_toast_72454950"
table "pg_toast_72454950": found 0 removable, 0 nonremovable row versions
in 0 out of 0 pages
0 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 skipped pages using mintxid fork.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
analyzing "oban.oban_jobs"
"oban_jobs": scanned 3 of 3727204 pages, containing 75 live rows and
10501 dead rows; 75 rows in sample, 9318 estimated total rows
VACUUM

> SELECT * FROM (SELECT dead_tuple_count from
pgstattuple_approx('oban.oban_jobs'))a, (SELECT
n_dead_tup,last_autovacuum,last_analyze,now(),autovacuum_c
 ount FROM pg_stat_user_tables WHERE relname = 'oban_jobs' and schemaname =
'oban')b;
-[ RECORD 1 ]-
dead_tuple_count | 1701
n_dead_tup   | 1306009
last_autovacuum  | 2024-08-23 16:01:31.034229+00
last_analyze | 2024-08-23 16:01:47.85574+00
now  | 2024-08-23 16:01:55.734589+00
autovacuum_count | 446479

This is a Google Alloy DB instance running:
> select version();
-[ RECORD 1 ]-
version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian clang
version 12.0.1, 64-bit
SELECT 1


Re: dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Adrian Klaver

On 8/23/24 09:14, Matthew Tice wrote:

Hi All,

I'm trying to understand why there's a difference between what 
pgstattuple reports and pg_stat_user_tables reports (for the number of 
dead tuples).


As I understand, pgstattuple and pgstattuple_approx return the exact 
number of dead tuples (as noted in the documentation) and based on an 


https://www.postgresql.org/docs/current/pgstattuple.html

pgstattuple_approx(regclass) returns record

pgstattuple_approx is a faster alternative to pgstattuple that 
returns approximate results.


Not sure how you get exact count out of that?



This is a Google Alloy DB instance running:


https://cloud.google.com/alloydb/docs/overview

"AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible 
database service that's designed for your most demanding workloads, 
including hybrid transactional and analytical processing. AlloyDB pairs 
a Google-built database engine with a cloud-based, multi-node 
architecture to deliver enterprise-grade performance, reliability, and 
availability."


Where the important parts are 'PostgreSQL-compatible' and 'Google-built 
database engine'. You probably need to reach out to Google to see what 
that means for this situation.




 > select version();
-[ RECORD 1 ]-
version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian 
clang version 12.0.1, 64-bit

SELECT 1


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Matthew Tice
On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver 
wrote:

> On 8/23/24 09:14, Matthew Tice wrote:
> > Hi All,
> >
> > I'm trying to understand why there's a difference between what
> > pgstattuple reports and pg_stat_user_tables reports (for the number of
> > dead tuples).
> >
> > As I understand, pgstattuple and pgstattuple_approx return the exact
> > number of dead tuples (as noted in the documentation) and based on an
>
> https://www.postgresql.org/docs/current/pgstattuple.html
>
> pgstattuple_approx(regclass) returns record
>
>  pgstattuple_approx is a faster alternative to pgstattuple that
> returns approximate results.
>
> Not sure how you get exact count out of that?
>

Maybe the wording is a little confusing to me. Under the section
for pgstattuple_approx:
"pgstattuple_approx tries to avoid the full-table scan and returns exact
dead tuple statistics along with an approximation of the number and size of
live tuples and free space."


>
> > This is a Google Alloy DB instance running:
>
> https://cloud.google.com/alloydb/docs/overview
>
> "AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible
> database service that's designed for your most demanding workloads,
> including hybrid transactional and analytical processing. AlloyDB pairs
> a Google-built database engine with a cloud-based, multi-node
> architecture to deliver enterprise-grade performance, reliability, and
> availability."
>
> Where the important parts are 'PostgreSQL-compatible' and 'Google-built
> database engine'. You probably need to reach out to Google to see what
> that means for this situation.
>
> Got it, thanks Adrian.


>
> >  > select version();
> > -[ RECORD 1 ]-
> > version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian
> > clang version 12.0.1, 64-bit
> > SELECT 1
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Adrian Klaver

On 8/23/24 09:33, Matthew Tice wrote:



On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:



https://www.postgresql.org/docs/current/pgstattuple.html


pgstattuple_approx(regclass) returns record

      pgstattuple_approx is a faster alternative to pgstattuple that
returns approximate results.

Not sure how you get exact count out of that?


Maybe the wording is a little confusing to me. Under the section 
for pgstattuple_approx:
"pgstattuple_approx tries to avoid the full-table scan and returns exact 
dead tuple statistics along with an approximation of the number and size 
of live tuples and free space."


Yeah, see what you mean.

The part that bears more investigating for this case is:

"It does this by skipping pages that have only visible tuples according 
to the visibility map (if a page has the corresponding VM bit set, then 
it is assumed to contain no dead tuples).


Wondering if PostgreSQl-compatible covers this?




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Adrian Klaver

On 8/23/24 09:51, Adrian Klaver wrote:

On 8/23/24 09:33, Matthew Tice wrote:



On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:



    https://www.postgresql.org/docs/current/pgstattuple.html
    

    pgstattuple_approx(regclass) returns record

      pgstattuple_approx is a faster alternative to pgstattuple that
    returns approximate results.

    Not sure how you get exact count out of that?


Maybe the wording is a little confusing to me. Under the section 
for pgstattuple_approx:
"pgstattuple_approx tries to avoid the full-table scan and returns 
exact dead tuple statistics along with an approximation of the number 
and size of live tuples and free space."


Yeah, see what you mean.

The part that bears more investigating for this case is:

"It does this by skipping pages that have only visible tuples according 
to the visibility map (if a page has the corresponding VM bit set, then 
it is assumed to contain no dead tuples).


Wondering if PostgreSQl-compatible covers this?


Meant to add:

What happens if you use pgstattuple instead?

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: On a subscriber, why is last_msg_send_time in pg_stat_subscription sometimes null?

2024-08-23 Thread Koen De Groote
Adrian,

Thanks for pointing me at the source code.

Digging a bit, the view seems to eventually get its data on last msg send
time from here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/launcher.c;h=c566d50a072b92bd07f4179100275d0d0b1f4c7c;hb=HEAD#l1288

And in particular, this:

1332 if (worker.last_send_time == 0)
1333 nulls[5] = true;
1334 else
1335 values[5] = TimestampTzGetDatum(worker.last_send_time);
1336 if (worker.last_recv_time == 0)
1337 nulls[6] = true;
1338 else
1339 values[6] = TimestampTzGetDatum(worker.last_recv_time);

I don't have any knowledge of C, or postgres internals, so I may well be
wrong in what follows:

>From the bit of comment you posted my impression is that this means there
are separate workers that each send their update, that is then reflected in
the output of the pg_stat_subscription table... many workers, but only 1
table to show metrics, to show both the update by the leader and the
parallel workers...

And these parallel workers get created on the fly, I assume? So they might
well have a last_send_time of 0 if they haven't done anything yet?

What I would expect to see is a table that tells me how a particular
publishers/subscriber is doing, and the metrics around that process, and
the concept of "when data was last sent" to be persistent on the level of
the publisher/subscriber, not based on the lifespan of ephemeral workers
that each time they get created start at 0 and so now the table claims
"last_send_msg" is NULL because of it.

Am I getting that wrong? Is my understanding mistaken?

Regards,
Koen De Groote





On Fri, Aug 23, 2024 at 5:02 PM Adrian Klaver 
wrote:

> On 8/23/24 07:33, Koen De Groote wrote:
> > I'm running this query:
> >
> > SELECT subname, received_lsn, latest_end_lsn,
> > last_msg_send_time, last_msg_receipt_time
> > FROM pg_catalog.pg_stat_subscription;
> >
> > And "last_msg_send_time" will sometimes be null. All I can find in the
> > documentation is that this value represents: "Send time of last message
> > received from origin WAL sender; NULL for parallel apply workers"
> >
> > But I have no context for what parallel apply workers are or why they
> > are NULL sometimes and other times not. >
> > Can anyone explain?
>
> The source has that information:
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/applyparallelworker.c;h=e7f7d4c5e4bd3ff63687cb21b84534a05a3962cc;hb=HEAD
>
> The top part explains the process.
>
> As to the time:
>
> /*
>   * There are three fields in each message received by the parallel apply
>   * worker: start_lsn, end_lsn and send_time. Because we have updated these
>   * statistics in the leader apply worker, we can ignore these fields in
> the
>   * parallel apply worker (see function LogicalRepApplyLoop).
>   */
>
> >
> > Regards,
> > Koen De Groote
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: On a subscriber, why is last_msg_send_time in pg_stat_subscription sometimes null?

2024-08-23 Thread Adrian Klaver

On 8/23/24 12:31, Koen De Groote wrote:

Adrian,

Thanks for pointing me at the source code.

Digging a bit, the view seems to eventually get its data on last msg 
send time from here: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/logical/launcher.c;h=c566d50a072b92bd07f4179100275d0d0b1f4c7c;hb=HEAD#l1288 


And in particular, this:

1332         if (worker.last_send_time == 0)
1333             nulls[5] = true;
1334         else
1335             values[5] = TimestampTzGetDatum(worker.last_send_time);
1336         if (worker.last_recv_time == 0)
1337             nulls[6] = true;
1338         else
1339             values[6] = TimestampTzGetDatum(worker.last_recv_time);

I don't have any knowledge of C, or postgres internals, so I may well be 
wrong in what follows:


 From the bit of comment you posted my impression is that this means 
there are separate workers that each send their update, that is then 
reflected in the output of the pg_stat_subscription table... many 
workers, but only 1 table to show metrics, to show both the update by 
the leader and the parallel workers...


This is getting out of my depth, but that has not stopped me before so 
onward.


I think it is important to realize parallel workers are optional:

https://www.postgresql.org/docs/current/sql-createsubscription.html

streaming (enum)

Specifies whether to enable streaming of in-progress transactions 
for this subscription. The default value is off, meaning all 
transactions are fully decoded on the publisher and only then sent to 
the subscriber as a whole.


If set to on, the incoming changes are written to temporary files 
and then applied only after the transaction is committed on the 
publisher and received by the subscriber.


If set to parallel, incoming changes are directly applied via one 
of the parallel apply workers, if available. If no parallel apply worker 
is free to handle streaming transactions then the changes are written to 
temporary files and applied after the transaction is committed. Note 
that if an error happens in a parallel apply worker, the finish LSN of 
the remote transaction might not be reported in the server log.






And these parallel workers get created on the fly, I assume? So they 
might well have a last_send_time of 0 if they haven't done anything yet?


What I would expect to see is a table that tells me how a particular 
publishers/subscriber is doing, and the metrics around that process, and 
the concept of "when data was last sent" to be persistent on the level 
of the publisher/subscriber, not based on the lifespan of ephemeral 
workers that each time they get created start at 0 and so now the table 
claims "last_send_msg" is NULL because of it.


Some combination of?:

https://www.postgresql.org/docs/current/view-pg-replication-slots.html

https://www.postgresql.org/docs/current/catalog-pg-subscription-rel.html



Am I getting that wrong? Is my understanding mistaken?

Regards,
Koen De Groote




--
Adrian Klaver
adrian.kla...@aklaver.com