Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?

2023-01-18 Thread Laurenz Albe
On Tue, 2023-01-17 at 15:22 +, HECTOR INGERTO wrote:
> > Another case: a transaction COMMITs, and a slightly later transaction reads 
> > the data
> > and sets a hint bit.  If the snapshot of the file system with the data 
> > directory in it
> > is slightly later than the snapshot of the file system with "pg_wal", the 
> > COMMIT might
> > not be part of the snapshot, but the hint bit could be.
> > 
> > Then these uncommitted data could be visible if you recover from the 
> > snapshot.
>  
> Thank you all. I have it clearer now.
>  
> As a last point. Making the snapshot to the WAL dataset first or last would 
> make any difference?

Imagine you run DROP TABLE.  During the implicit COMMIT at the end of the
statement, the files behind the table are deleted.  If the snapshot of "pg_wal"
is earlier than the snapshot of the data files, you end up with a table that is
not yet dropped, but the files are gone.

I won't try to find an example if you now ask what if no checkpoint ends between
the statements, the snapshot on "pg_wal" is earlier and we don't run DROP TABLE.

Why do you go to all this effort rather than performing a correct backup?
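
For reference, a minimal sketch of such a backup with pg_basebackup (host name,
user and target directory below are placeholders; the role needs the REPLICATION
privilege).  With -X stream, the WAL needed to make the copy consistent is
included in the backup:

    pg_basebackup -h primary.example.com -U backup_user \
        -D /backups/base_$(date +%Y%m%d) -X stream --checkpoint=fast --progress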

Yours,
Laurenz Albe




RE: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?

2023-01-18 Thread HECTOR INGERTO
I wanted to understand the underlying issue.

I use ZFS snapshots instead of a “correct” backup because, with only two
machines, it lets me keep backups on the main machine and on the secondary,
which also acts as a hot spare at the same time.

To accomplish the same otherwise, I would need three nodes: the main, the
hot-spare replica, and the proper backup.




Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?

2023-01-18 Thread Rob Sargent

On 1/18/23 09:38, HECTOR INGERTO wrote:


I wanted to understand the underlying issue.

I use ZFS snapshots instead of a “correct” backup because, with only two
machines, it lets me keep backups on the main machine and on the secondary,
which also acts as a hot spare at the same time.


To accomplish the same otherwise, I would need three nodes: the main, the
hot-spare replica, and the proper backup.




Isn't "proper backup" typically thought of as offsite reloadable file 
capture, hopefully relatively recent?  Disaster recovery, and all that.




Interpreting postgres execution plan along with AND/OR precedence

2023-01-18 Thread Dirschel, Steve
Table definition:

  Table "load.lm_queue"
 Column |  Type  | Collation | Nullable | Default
--------+--------+-----------+----------+--------------------------
guid   | character(33)  |   | not null |
host_name  | character varying(40)  |   |  |
priority   | numeric(11,0)  |   | not null |
request_time   | timestamp(6) without time zone |   | not null |
collection_name| character varying(40)  |   | not null |
stage_id   | numeric(11,0)  |   |  |
source_file| character varying(250) |   |  |
lm_id  | numeric(11,0)  |   |  |
start_time | timestamp(6) without time zone |   |  |
status_text| character varying(225) |   | not null | NULL::character varying
load_data_id   | character varying(500) |   |  |
docs_in_load   | numeric(11,0)  |   |  |
client_name| character varying(50)  |   |  |
status_code| numeric(11,0)  |   |  |
email_address  | character varying(2000)|   |  |
hold_flag  | character(1)   |   |  |
process_type   | character varying(40)  |   |  |
cancel_flag| character(1)   |   |  |
file_type  | character varying(6)   |   |  |
lm_data| character varying(4000)|   |  |
ds_request_time| timestamp(6) without time zone |   |  |
ds_id  | numeric(11,0)  |   |  |
ds_start_time  | timestamp(6) without time zone |   |  |
auto_promote_flag  | character(1)   |   |  |
extract_out_file   | character varying(250) |   |  |
last_upd_time  | timestamp(6) without time zone |   |  |
ds_fail_count  | numeric(11,0)  |   |  |
cc_collection  | character varying(40)  |   |  |
cc_environment | character varying(40)  |   |  |
cc_fail_on_db_mismatch | character(1)   |   |  |
cc_tracking_guid   | character varying(33)  |   |  |
cc_numrows | character varying(50)  |   |  |
cc_owner   | character varying(30)  |   |  |
cc_password| character varying(30)  |   |  |
parent_guid| character varying(33)  |   |  |
Indexes:
"xpklm_queue" PRIMARY KEY, btree (guid)
"idx_hn_cn_dsid_sc_dst_dfc" btree (host_name, collection_name, ds_id, 
status_code, ds_start_time, ds_fail_count)
"ix_lm_cc" btree (collection_name, client_name)
"ix_lm_chl" btree (client_name, host_name, lm_id)


Query and plan:

explain
SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, 
PROCESS_TYPE, PARENT_GUID
FROM LOAD.LM_QUEUE lmq1
WHERE CLIENT_NAME='WLCASES'
AND HOLD_FLAG='Y'
AND HOST_NAME='WLLOADB'
AND STATUS_CODE in (1)
AND NOT EXISTS
(SELECT COLLECTION_NAME
   FROM LOAD.LM_QUEUE lmq2
  WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME
AND LM_ID <> 0
AND PROCESS_TYPE NOT IN('EXTRACT'))
OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')
   AND LM_ID = 0
   AND CLIENT_NAME='WLCASES'
   AND HOST_NAME= 'WLLOADB' )
ORDER BY PRIORITY DESC, REQUEST_TIME ASC;


Sort  (cost=1578.99..1579.00 rows=1 width=120)
   Sort Key: lmq1.priority DESC, lmq1.request_time
   ->  Index Scan using ix_lm_chl on lm_queue lmq1  (cost=0.40..1578.98 rows=1 width=120)
         Index Cond: (((client_name)::text = 'WLCASES'::text) AND ((host_name)::text = 'WLLOADB'::text))
         Filter: (((hold_flag = 'Y'::bpchar) AND (status_code = '1'::numeric) AND (NOT (alternatives: SubPlan 1 or hashed SubPlan 2))) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric)))
         SubPlan 1
           ->  Index Scan using ix_lm_cc on lm_queue lmq2  (cost=0.40..177.93 rows=1 width=0)
                 Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
         SubPlan 2
           ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 width=32)
                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?

2023-01-18 Thread Ron

On 1/18/23 10:54, Rob Sargent wrote:

On 1/18/23 09:38, HECTOR INGERTO wrote:


I wanted to understand the underlying issue.

I use ZFS snapshots instead of a “correct” backup because, with only two
machines, it lets me keep backups on the main machine and on the secondary,
which also acts as a hot spare at the same time.


To accomplish the same otherwise, I would need three nodes: the main, the
hot-spare replica, and the proper backup.




Isn't "proper backup" typically thought of as offsite reloadable file 
capture, hopefully relatively recent?  Disaster recovery, and all that.


That is in fact what "proper backup" means.  "Hot spare" means high 
availability, and *HA is not DR*.




--
Born in Arizona, moved to Babylonia.

Re: Interpreting postgres execution plan along with AND/OR precedence

2023-01-18 Thread Tom Lane
"Dirschel, Steve"  writes:
>  Filter: ... (alternatives: SubPlan 1 or hashed SubPlan 2) ...
>  SubPlan 1
>->  Index Scan using ix_lm_cc on lm_queue lmq2  (cost=0.40..177.93 
> rows=1 width=0)
>  Index Cond: ((collection_name)::text = 
> (lmq1.collection_name)::text)
>  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text 
> <> 'EXTRACT'::text))
>  SubPlan 2
>->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 
> width=32)
>  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text 
> <> 'EXTRACT'::text))

> I understand SubPlan 1 above- it is joining into the NOT EXISTS via the
> lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME and then applying the other
> filtering inside the NOT EXISTS.  But I don't understand SubPlan 2.
> Given the filter conditions under SubPlan 2 it is also coming from the
> NOT EXISTS because that is where LM_ID <> 0 and PROCESS_TYPE <> EXTRACT
> exist but I don't understand the scenario where this query would need to
> use SubPlan 2.  Would anyone be able to explain under what condition(s)
> SubPlan 2 would get executed?

The key is the "alternatives:" notation.  SubPlan 1 can be used in a
"retail" fashion by invoking it once per outer row, passing a new
value of lmq1.collection_name each time.  SubPlan 2 is meant to be
invoked just once, and its output (ie, all the relevant values of
lmq2.collection_name) will be loaded into an in-memory hash table
which is then probed for each outer row.  At the point where these
subplans are made, we don't have enough information about how many
outer rows there will be to decide which way is better, so we create
both subplans and postpone the decision till execution.  That's all
just related to the EXISTS clause, though.

(Since v14 we don't do it like that anymore, so that this confusing
EXPLAIN notation is gone.)

> I'm trying to understand the precedence of AND/OR operations when
> everything is not tied together with ()'s.

The OR is lower priority than all the ANDs, so yeah moving some
clauses to be after the OR would change the semantics.  I think
you probably need some more parentheses here; it's not clear
exactly what semantics you are after.
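
Spelled out with the implicit grouping made explicit (AND binds tighter than
OR), the planner reads the original query like this - purely an illustration of
the parsing, not a suggested rewrite:

SELECT guid, collection_name, priority, start_time, request_time,
       client_name, process_type, parent_guid
FROM load.lm_queue lmq1
WHERE (client_name = 'WLCASES'
       AND hold_flag = 'Y'
       AND host_name = 'WLLOADB'
       AND status_code IN (1)
       AND NOT EXISTS (SELECT collection_name
                       FROM load.lm_queue lmq2
                       WHERE lmq1.collection_name = lmq2.collection_name
                         AND lm_id <> 0
                         AND process_type NOT IN ('EXTRACT')))
   OR (process_type IN ('UNLOCK', 'UNLOCK RERUN')
       AND lm_id = 0
       AND client_name = 'WLCASES'
       AND host_name = 'WLLOADB')
ORDER BY priority DESC, request_time ASC;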

regards, tom lane




Re: Interpreting postgres execution plan along with AND/OR precedence

2023-01-18 Thread Erik Wienhold
> On 18/01/2023 18:56 CET Tom Lane  wrote:
>
> "Dirschel, Steve"  writes:
> >  Filter: ... (alternatives: SubPlan 1 or hashed SubPlan 2) ...
> >  SubPlan 1
> >->  Index Scan using ix_lm_cc on lm_queue lmq2  
> > (cost=0.40..177.93 rows=1 width=0)
> >  Index Cond: ((collection_name)::text = 
> > (lmq1.collection_name)::text)
> >  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text 
> > <> 'EXTRACT'::text))
> >  SubPlan 2
> >->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 
> > width=32)
> >  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text 
> > <> 'EXTRACT'::text))
> 
> > I understand SubPlan 1 above- it is joining into the NOT EXISTS via the
> > lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME and then applying the other
> > filtering inside the NOT EXISTS.  But I don't understand SubPlan 2.
> > Given the filter conditions under SubPlan 2 it is also coming from the
> > NOT EXISTS because that is where LM_ID <> 0 and PROCESS_TYPE <> EXTRACT
> > exist but I don't understand the scenario where this query would need to
> > use SubPlan 2.  Would anyone be able to explain under what condition(s)
> > SubPlan 2 would get executed?
> 
> The key is the "alternatives:" notation.  SubPlan 1 can be used in a
> "retail" fashion by invoking it once per outer row, passing a new
> value of lmq1.collection_name each time.  SubPlan 2 is meant to be
> invoked just once, and its output (ie, all the relevant values of
> lmq2.collection_name) will be loaded into an in-memory hash table
> which is then probed for each outer row.  At the point where these
> subplans are made, we don't have enough information about how many
> outer rows there will be to decide which way is better, so we create
> both subplans and postpone the decision till execution.  That's all
> just related to the EXISTS clause, though.
>
> (Since v14 we don't do it like that anymore, so that this confusing
> EXPLAIN notation is gone.)

EXPLAIN ANALYZE shows which subplan gets executed.  Look for "never executed".

--
Erik




Re: Interpreting postgres execution plan along with AND/OR precedence

2023-01-18 Thread Erik Wienhold
> On 18/01/2023 18:03 CET Dirschel, Steve wrote:
>
> The plan changes:
>
> Sort  (cost=9382.94..9382.97 rows=12 width=169)
>   Sort Key: lmq1.priority DESC, lmq1.request_time
>   ->  Bitmap Heap Scan on lm_queue lmq1  (cost=4572.59..9382.73 rows=12 width=169)
>         Recheck Cond: ((client_name)::text = 'WLCASES'::text)
>         Filter: ((NOT (alternatives: SubPlan 1 or hashed SubPlan 2)) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric) AND ((host_name)::text = 'WLLOADB'::text) AND (hold_flag = 'Y'::bpchar) AND ((host_name)::text = 'WLLOADB'::text) AND (status_code = '1'::numeric)))
>         ->  Bitmap Index Scan on ix_lm_chl  (cost=0.00..4572.58 rows=25 width=0)
>               Index Cond: ((client_name)::text = 'WLCASES'::text)
>         SubPlan 1
>           ->  Bitmap Heap Scan on lm_queue lmq2  (cost=164.44..188.42 rows=1 width=0)
>                 Recheck Cond: ((lmq1.collection_name)::text = (collection_name)::text)
>                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
>                 ->  Bitmap Index Scan on ix_lm_cc  (cost=0.00..164.44 rows=6 width=0)
>                       Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
>         SubPlan 2
>           ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.49 rows=25 width=32)
>                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
>
> In the original plan above I believe the query drives off index ix_lm_chl
> applying both client_name = WLCASES and host_name = WLLOADB to the index cond.
> In the plan directly above I believe it also drives off index ix_lm_chl but it
> is only applying client_name = WLCASES to the index cond.
>
> If AND’s are applied first then why wouldn’t the modified query apply both
> client_name = WLCASES and host_name = WLLOADB to the index cond? Is it because
> those are moved below the OR condition?

Yes, because those two conditions are combined with OR, and that's handled by
the bitmap scans in your second execution plan.  See also:
https://www.postgresql.org/docs/14/indexes-bitmap-scans.html

--
Erik




Tools for moving normalized data around

2023-01-18 Thread Peter
Hi,

  imagine I have a database containing normalized data: a whole bunch
of tables all related via foreign keys (i.e. the thing one should
usually have ;) ).

So there is a dependency graph: all records relate to others in
some tree-like fashion (has-many, belongs-to, etc.)

Now I want to grab some part of the data, on a certain condition
(let's say all records belonging to user 'Bob', if there is a "user"
table somewhere at the tree-bottom), and move it to another database
with the very same layout - which is already populated with data
and runs a different instance of the same application.

Grabbing the intended records is just some dead-simple Selects. But
then inserting them into the other database is not fun, because some
primary keys will likely collide.
And if we start to update the primary keys to new unique values, we
must consequently update ALL the foreign keys throughout the entire
tree of tables.

How to do this with two tables connected via one foreign key, that is
explained a dozen times in Stackoverflow. But what if the tree is 50
tables and 120 foreign key columns?
It can be done. But probably not manually.

So, as this seems a very usual use-case for normalized data, is there
any tooling available? Anywhere? (I searched, I didn't find.)

Also, it seems the provided commands are not very supportive.  Try to
do an INSERT that *always* creates a new PK from the sequence and
returns both the old and the new PK for the inserted row.  It seems this
does not work without either naming all the other columns explicitly in
the INSERT (impossible, they are always different) or establishing
some trigger functions...
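
One workaround sketch for exactly that old-PK/new-PK bookkeeping (all table and
column names below are made up, and it assumes the target PKs come from an
identity/sequence default): carry the old key through a scratch column, build a
mapping table, and rewrite the foreign keys of the child tables through the map.

-- scratch column carries the source PK through the insert
ALTER TABLE users ADD COLUMN _src_id bigint;

INSERT INTO users (_src_id, name, email)
SELECT id, name, email
FROM src.users
WHERE name = 'Bob';

-- old-to-new key map
CREATE TEMP TABLE id_map AS
SELECT _src_id AS old_id, id AS new_id
FROM users
WHERE _src_id IS NOT NULL;

ALTER TABLE users DROP COLUMN _src_id;

-- child table: remap the FK while inserting
INSERT INTO orders (user_id, note)
SELECT m.new_id, o.note
FROM src.orders o
JOIN id_map m ON m.old_id = o.user_id;

The same dance has to be repeated for every table, walking the tree from parents
to children, which is exactly the part that calls for tooling.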


Different example, same general problem: Imagine I have such a tree
of normalized tables, and I want to grab a part of it and roll only
that part back in time, to some state it was X weeks ago.
(That's possible if we store records of changes in an extra column
within the rows themselves.  It might even be possible with some
postgres-internal data, which has knowledge of history - but there
seems to be no API access to that.)

But I want to have this working at the push of a button, independent
of the table layouts, only with a basic graph of the dependency tree
and with an unlimited number of involved tables, i.e.: maintainable.


I have searched for solutions (or building blocks to solutions) for
these and similar problems, and didn't find much.  I don't understand
that - SQL is old, normalization is old (even postgres has already
become quite old), and these are genuine issues.
It can't be that I'm the only one thinking about such things.

So maybe I'm looking at the wrong place? Any ideas welcome.

Cheers,
PMc




Re: Tools for moving normalized data around

2023-01-18 Thread Jeremy Smith
>
>
> How to do this with two tables connected via one foreign key, that is
> explained a dozen times in Stackoverflow. But what if the tree is 50
> tables and 120 foreign key columns?
> It can be done. But probably not manually.
>
> So, as this seems a very usual use-case for normalized data, is there
> any tooling available? Anywhere? (I searched, I didn't find.)
>
>
I haven't used it, but this seems like the problem that Jailer is trying to
solve: https://github.com/Wisser/Jailer


Re: Tools for moving normalized data around

2023-01-18 Thread Dominique Devienne
On Wed, Jan 18, 2023 at 9:03 PM Jeremy Smith  wrote:
>> How to do this with two tables connected via one foreign key, that is
>> explained a dozen times in Stackoverflow. But what if the tree is 50
>> tables and 120 foreign key columns?
>> It can be done. But probably not manually.
>>
>> So, as this seems a very usual use-case for normalized data, is there
>> any tooling available? Anywhere? (I searched, I didn't find.)
>
> I haven't used it, but this seems like the problem that Jailer is trying to 
> solve: https://github.com/Wisser/Jailer

Seems to do the subsetting alright, but it doesn't mention
conflict resolution for inserting into an already populated target
schema.

We've done something similar, merging many same-schema DBs into a
single local SQLite DB (with conflicts, thus no constraints),
iteratively resolving the conflicts (SK/PK and NK) per table in
topological order, and propagating PK changes to FKs in child tables
later.
Then we loaded the result into a fully constrained PostgreSQL DB.
Cleverer minds can probably do it all in SQL, but we did it with a
combination of imperative code and SQL.  Not exactly your use case,
Peter, but close enough I think.  I don't think a tool could have done
what we did; it's too ad-hoc and specific to our use case.  Took a
while, and required lots of testing (unit tests and functional QA
tests).  FWIW.
--DD




Re: Tools for moving normalized data around

2023-01-18 Thread Gavan Schneider
On 19 Jan 2023, at 6:47, Peter wrote:

> Now I want to grab some part of the data, on a certain condition
> (let's say all records belonging to user 'Bob', if there is a "user"
> table somewhere at the tree-bottom), and move it to another database
> with the very same layout - which is already populated with data
> and runs a different instance of the same application.
>
> Grabbing the intended records is just some dead-simple Selects. But
> then inserting them into the other database is not fun, because some
> primary keys will likely collide.
>
A very small sliver in this problem:
The key collision problem could be avoided if the unique, arbitrary keys
were UUIDs.
Many of the other keys should be related to their respective “table of truth”,
so the migration is dependent on these being the same across the locations.
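
For example (hypothetical table; gen_random_uuid() is in core since PostgreSQL
13, older releases can use the pgcrypto extension):

CREATE TABLE users (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text NOT NULL
);
-- rows created independently on either database can later be copied across
-- without any realistic risk of key collisions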

The rest TBA down thread :)

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920




Re: Tools for moving normalized data around

2023-01-18 Thread Rob Sargent

On 1/18/23 13:15, Gavan Schneider wrote:

On 19 Jan 2023, at 6:47, Peter wrote:


Now I want to grab some part of the data, on a certain condition
(let's say all records belonging to user 'Bob', if there is a "user"
table somewhere at the tree-bottom), and move it to another database
with the very same layout - which is already populated with data
and runs a different instance of the same application.

Grabbing the intended records is just some dead-simple Selects. But
then inserting them into the other database is not fun, because some
primary keys will likely collide.


A very small sliver in this problem:
The key collision problem could be avoided if the unique, arbitrary keys
were UUIDs.
Many of the other keys should be related to their respective “table of truth”,
so the migration is dependent on these being the same across the locations.

The rest TBA down thread :)

Regards

Gavan Schneider


You'll have to "know" the tree and order inserts accordingly. Starting 
with the independent tables, then their dependants, then /their/ 
dependants etc.
And if the ids are currently integers, any chance you can negate the in 
transit?  You'll have to disable any sequence that's in play.
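
A rough sketch of that negate-in-transit idea with made-up tables (it only
addresses the key collisions; the insert order and the FK rewiring still apply):

-- rows extracted from the other database sit in incoming_* staging tables;
-- negating the ids moves them into a key space the local sequences never use
INSERT INTO users (id, name)
SELECT -id, name FROM incoming_users;

INSERT INTO orders (id, user_id, note)
SELECT -id, -user_id, note FROM incoming_orders;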


Database size different on Primary and Standby?

2023-01-18 Thread Hilbert, Karin
I manage some PostgreSQL clusters on Linux.  We have a Primary & two Standby 
servers & for Production, there is also a DR server.  We use repmgr for our HA 
solution & the Standbys are cloned from the Primary using the repmgr standby 
clone command.

My manager asked for a report of all the user databases & their sizes for each 
server in the cluster.
I used the psql "\l+" command & then extracted the database name & the size 
from the output.
I expected the databases to be the same size on the Standbys as on the Primary, 
but I found that some of the databases were smaller on the Standby servers than 
on the Primary.
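
The sizes "\l+" reports come from pg_database_size(), so the same numbers can
also be pulled with a query, e.g.:

SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY datname;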

For example, the output on the Primary for one of the user databases showed as: 
 8997 kB, but on the Standbys, it was 8849 kB.

I even dropped the database on the Primary & then restored it from a backup.  
Then checked the sizes again & they still showed the difference.

I also found that the template1 database on the Primary was 7821 kB, but on the 
Standbys, it was 7673 kB.
Is this normal?  Why would the sizes be different?

Thanks,
Karin Hilbert
Database Administration
Pennsylvania State University



Re: Database size different on Primary and Standby?

2023-01-18 Thread Erik Wienhold
> On 19/01/2023 00:09 CET Hilbert, Karin  wrote:
>
> I manage some PostgreSQL clusters on Linux. We have a Primary & two Standby
> servers & for Production, there is also a DR server. We use repmgr for our HA
> solution & the Standbys are cloned from the Primary using the repmgr standby
> clone command.
>
> My manager asked for a report of all the user databases & their sizes for each
> server in the cluster. I used the psql "\l+" command & then extracted the
> database name & the size from the output. I expected the databases to be the
> same size on the Standbys as on the Primary, but I found that some of the
> databases were smaller on the Standby servers than on the Primary.
>
> For example, the output on the Primary for one of the user databases showed
> as: 8997 kB, but on the Standbys, it was 8849 kB.

The standbys could be missing some indexes because schema changes are not
replicated and must be applied manually.

> I even dropped the database on the Primary & then restored it from a backup.
> Then checked the sizes again & they still showed the difference.
>
> I also found that the template1 database on the Primary was 7821 kB, but on
> the Standbys, it was 7673 kB. Is this normal? Why would the sizes be 
> different?

Is template1 identical (schema and data) on primary and standby?

Could also be different page sizes. But that's a compilation option. What does
SHOW block_size say on those systems?

--
Erik




Re: Database size different on Primary and Standby?

2023-01-18 Thread Ian Lawrence Barwick
On Thu, 19 Jan 2023 at 8:50, Erik Wienhold wrote:
>
> > On 19/01/2023 00:09 CET Hilbert, Karin  wrote:
> >
> > I manage some PostgreSQL clusters on Linux. We have a Primary & two Standby
> > servers & for Production, there is also a DR server. We use repmgr for our 
> > HA
> > solution & the Standbys are cloned from the Primary using the repmgr standby
> > clone command.
> >
> > My manager asked for a report of all the user databases & their sizes for 
> > each
> > server in the cluster. I used the psql "\l+" command & then extracted the
> > database name & the size from the output. I expected the databases to be the
> > same size on the Standbys as on the Primary, but I found that some of the
> > databases were smaller on the Standby servers than on the Primary.
> >
> > For example, the output on the Primary for one of the user databases showed
> > as: 8997 kB, but on the Standbys, it was 8849 kB.
>
> The standbys could be missing some indexes because schema changes are not
> replicated and must be applied manually.

This is incorrect; with streaming replication all changes applied on the primary
are applied on the standby.

(...)
> Could also be different page sizes. But that's a compilation option. What does
> SHOW block_size say on those systems?

It is impossible to start a standby using binaries built with a different
block size to the primary.

Regards

Ian Barwick




Re: Database size different on Primary and Standby?

2023-01-18 Thread Erik Wienhold
> On 19/01/2023 01:23 CET Ian Lawrence Barwick  wrote:
>
> On Thu, 19 Jan 2023 at 8:50, Erik Wienhold wrote:
> >
> > > On 19/01/2023 00:09 CET Hilbert, Karin  wrote:
> > >
> > > I manage some PostgreSQL clusters on Linux. We have a Primary & two 
> > > Standby
> > > servers & for Production, there is also a DR server. We use repmgr for 
> > > our HA
> > > solution & the Standbys are cloned from the Primary using the repmgr 
> > > standby
> > > clone command.
> > >
> > > My manager asked for a report of all the user databases & their sizes for 
> > > each
> > > server in the cluster. I used the psql "\l+" command & then extracted the
> > > database name & the size from the output. I expected the databases to be 
> > > the
> > > same size on the Standbys as on the Primary, but I found that some of the
> > > databases were smaller on the Standby servers than on the Primary.
> > >
> > > For example, the output on the Primary for one of the user databases 
> > > showed
> > > as: 8997 kB, but on the Standbys, it was 8849 kB.
> >
> > The standbys could be missing some indexes because schema changes are not
> > replicated and must be applied manually.
>
> This is incorrect; with streaming replication all changes applied on the 
> primary
> are applied on the standby.

Thanks. I was thinking about logical replication.

> (...)
> > Could also be different page sizes. But that's a compilation option. What 
> > does
> > SHOW block_size say on those systems?
>
> It is impossible to start a standby using binaries built with a
> different block size to
> the primary.

Makes sense for streaming replication.

--
Erik




Re: Database size different on Primary and Standby?

2023-01-18 Thread Tom Lane
Erik Wienhold  writes:
> On 19/01/2023 01:23 CET Ian Lawrence Barwick  wrote:
>> This is incorrect; with streaming replication all changes applied on the 
>> primary
>> are applied on the standby.

> Thanks. I was thinking about logical replication.

It's not entirely clear whether the OP is talking about physical or
logical replication --- the discrepancy would sure be a lot easier
to explain if it is logical rep.  In physical rep the databases
*should* be pretty much physically identical.  I think though that
the amount of WAL that's being kept around could be different.
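
One way to compare how much WAL each node is currently keeping (the function is
restricted to superusers and members of pg_monitor by default):

SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS total_wal
FROM pg_ls_waldir();

Note that the sizes "\l+" reports come from pg_database_size() and never include
pg_wal, so this is a separate check rather than an explanation of the numbers
upthread.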

regards, tom lane




Re: Database size different on Primary and Standby?

2023-01-18 Thread Ron

On 1/18/23 17:09, Hilbert, Karin wrote:
I manage some PostgreSQL clusters on Linux.  We have a Primary & two 
Standby servers & for Production, there is also a DR server.  We use 
repmgr for our HA solution & the Standbys are cloned from the Primary 
using the *repmgr standby clone* command.


My manager asked for a report of all the user databases & their sizes for 
each server in the cluster.
I used the psql *"\l+"* command & then extracted the database name & the 
size from the output.
I expected the databases to be the same size on the Standbys as on the 
Primary, but I found that some of the databases were smaller on the 
Standby servers than on the Primary.


For example, the output on the Primary for one of the user databases
showed as: *8997 kB*, but on the Standbys, it was *8849 kB*.


I even dropped the database on the Primary & then restored it from a 
backup.  Then checked the sizes again & they still showed the difference.


I also found that the template1 database on the Primary was *7821 kB*, but 
on the Standbys, it was *7673 kB*.

*Is this normal?  Why would the sizes be different?*


Maybe it is.  8997 - 8849 = 7821 - 7673 = 148.  That's right: both databases
are exactly 148 kB larger on the Primary.


This will tell you where the differences are: $ du -kc $PGDATA

--
Born in Arizona, moved to Babylonia.