PG 13.6 : Data corruption error message not being sent to syslog

2022-05-25 Thread Abhishek Bhola
This morning I noticed this error in my PG CSV log file.

```
2022-05-25 23:59:17.776 JST,,,117110,,628cf2c5.1c976,1,,2022-05-25 23:59:17
JST,9/611296,0,ERROR,XX001,"uncommitted xmin 16395356 from before xid
cutoff 144683296 needs to be frozen","while scanning block 19267 of
relation ""relation_name""
```

This was a data corruption error that I resolved by truncating the table
and reloading it. However, I was curious why this message was not sent to
my syslog.

My logging configuration in the `postgresql.conf` is as follows:
```
# Logging settings
# 
log_destination = 'csvlog,syslog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql_%Y%m%d.log'
log_truncate_on_rotation = off
log_rotation_age = 1h
log_rotation_size = 0

log_timezone = 'Japan'
log_line_prefix = '%t [%p]: [%l-1] %h:%u@%d:[PG]:CODE:%e '

log_statement = 'all'
log_min_messages = info # DEBUG5
log_min_error_statement = info  # DEBUG5
log_error_verbosity = default
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0
log_connections = on
log_disconnections = on
log_duration = off
log_min_duration_statement = 1000
log_autovacuum_min_duration = 3000ms
```

The OS info for the server on which this DB is running is as follows:
```
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.9.2009 (Core)
Release:        7.9.2009
Codename:       Core
```

And there is a filter defined for the rsyslog messages as follows:
```
## Added by DataConsulting Team for syslog filter
if $programname == 'postgres' and \
($msg contains 'CODE:28000'or \
$msg contains 'CODE:28P01' or \
$msg contains 'CODE:3D000' or \
$msg contains 'CODE:08006' or \
$msg contains 'CODE:42501'\
) then ///xxx/pg_log/postgres_filter.log

if $programname == 'postgres' and \
( \
not ($msg contains '[PG]') or \
$msg contains 'CODE:0' or \
$msg contains 'CODE:28000' or \
$msg contains 'CODE:28P01' or \
$msg contains 'CODE:3D000' or \
$msg contains 'CODE:42501' or \
$msg contains 'CODE:42601' or \
$msg contains 'CODE:42P01' or \
$msg contains 'CODE:42P02' or \
$msg contains 'CODE:08006' or \
$msg contains  'CODE:42703'   \
) then stop
```

`syslog_ident` and `syslog_facility` are as follows:
```
postgres=# show syslog_ident ;
 syslog_ident
--------------
 postgres
(1 row)

postgres=# show syslog_facility ;
 syslog_facility
-----------------
 local0
(1 row)
```

I was expecting such an error message to be sent to `/var/log/messages`, from
where it would be picked up, but this wasn't the case. Could someone explain
why this is happening and what I can change so that these messages are sent
to syslog?
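
For reference, my understanding is that a plain rsyslog rule like the
following sketch (not our actual config) would be enough to write everything
PostgreSQL emits on the local0 facility to its own file:
```
# sketch, not our actual config: write everything PostgreSQL sends on the
# local0 facility to a dedicated file
local0.*    /var/log/postgresql-syslog.log
```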



Re: PG 13.6 : Data corruption error message not being sent to syslog

2022-05-25 Thread Abhishek Bhola
But the error doesn't contain any of the above codes, so why is it being
skipped?

On Wed, 25 May 2022, 23:11 Tom Lane,  wrote:

> Abhishek Bhola  writes:
> > This morning I noticed this error in my PG CSV log file.
> > 2022-05-25 23:59:17.776 JST,,,117110,,628cf2c5.1c976,1,,2022-05-25
> 23:59:17
> > JST,9/611296,0,ERROR,XX001,"uncommitted xmin 16395356 from before xid
> > cutoff 144683296 needs to be frozen","while scanning block 19267 of
> > relation ""relation_name""
>
> > This was a data corruption error that I resolved by truncating the table
> > and reloading it. However, I was curious why this message was not sent to
> > my syslog.
>
> Seems to me this filter explains that well enough:
>
> > ## Added by DataConsulting Team for syslog filter
> > if $programname == 'postgres' and \
> > ($msg contains 'CODE:28000'or \
> > $msg contains 'CODE:28P01' or \
> > $msg contains 'CODE:3D000' or \
> > $msg contains 'CODE:08006' or \
> > $msg contains 'CODE:42501'\
> > ) then ///xxx/pg_log/postgres_filter.log
>
> You should realize of course that syslog is not a 100% reliable logging
> mechanism -- at least most implementations are capable of dropping
> messages under load.  But we needn't guess about reasons for missed
> messages here.
>
> regards, tom lane
>



Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Abhishek Bhola
I recently set up a Postgres15 master-slave (Primary-secondary) cluster on
2 nodes. At the time of starting, I ensured that data files are exactly the
same on both the nodes. The size of the DB is 1.5TB.
The directory structure on both nodes looks as follows:
```
  - /PROD/datadg/
   |
   |-> /PROD/datadg/tablespace
   |   |-> /PROD/datadg/tablespace/tablespace1
   |   |-> /PROD/datadg/tablespace/tablespace2
   |
   |-> /PROD/datadg/data
   |   |-> /PROD/datadg/data/pg_tblspc
   |   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
/PROD/datadg/tablespace/tablespace1
   |   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
/PROD/datadg/tablespace/tablespace2

```
Almost a week later now, I see almost a 2GB size difference in the
tablespace folders on the 2 nodes. I also see some file count difference on
both the nodes.

`autovacuum` is on on both the nodes and there aren't any `idle in
transaction` queries on the slave node. Also there hasn't been any
disruption on the streaming replication. I did not get any error like `WAL
segment already removed` or so on the slave node. `pg_stat_replication` on
the master node also doesn't show anything out of the ordinary and the
`sent_lsn`, `write_lsn` and `flush_lsn` are regularly updated. I can not
see a difference in counts of most tables either, haven't verified for all
of them.
So my **first question** is:

* Why is there a difference in the files in the tablespace folder? I can
understand the difference in the modification timestamps, but some files
are just missing on the slave node.

Now if I were to run `vacuumdb` on the master node, there are chances that
the slave node will break and give an error like this
```
PANIC,XX000,"WAL contains references to invalid pages","WAL redo at
875E/21A70BD0 for
 Heap2/VISIBLE: cutoff xid 60350476 flags 0x01; blkref #0: rel
16405/16419/533716933, fork 2, blk 26; blkref #1: rel
16405/16419/533716933, blk 853758""","startup
```

In the case when the slave node breaks, these are the steps I usually follow
to bring it back:
1. Run `pg_backup_start('backup')` on the master node
2. rsync the files from master to slave by running the following on the
slave node:
```
rsync -av --delete master_node:/PROD/datadg/data/ /PROD/datadg/data
--exclude 'pg_log' --exclude 'pg_replslot'
```
3. Run `pg_backup_stop()` on the master node
4. Start the slave node again, and it usually works, even though the
tablespace files might still not be the same.


**Second question**:

* What is the best way to bring the slave node back? Is the `rsync` between
tablespaces required? And if so, what is the best method for very large
databases, maybe as big as 30TB or more? I don't want to `rsync` every file
just because its timestamp is different. So is a command like the one below
safe, or should an option like `--checksum` be used?
```
rsync -av --delete master_node:/PROD/datadg/tablespace/
/PROD/datadg/tablespace --size-only
```

**Third question:**

* Is it advised to run `vacuumdb` before or after bringing the slave node
back again?



Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Abhishek Bhola
Hi Stephen

Thank you for your reply.

> Basically, this is not a valid way to perform a backup/restore of PG.

Is it not valid only for PG 15 or even for earlier versions? I have always
referred to this https://www.postgresql.org/docs/8.1/backup-online.html


> better, use an existing well maintained backup/restore tool for this

Is there any tool you could recommend? A tool to bring back the standby
node when it was down for a day or so for some OS/firmware upgrade.
I have tried pgBackRest, but it requires a control node. So anything that
doesn't involve a 3rd server for this situation would be helpful.

> this also doesn't grab and restore the absolutely required
> backup_label file that's returned from pg_backup_stop()
>
I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
but I did not see any backup_label_file being created.
psql (15.1)
Type "help" for help.

postgres=# select pg_backup_start('backup');
 pg_backup_start
-----------------
 68/3228
(1 row)

postgres=# select pg_backup_stop();
NOTICE:  all required WAL segments have been archived
 pg_backup_stop
----------------
 (68/32000100,"START WAL LOCATION: 68/3228 (file
000100680032)+
 CHECKPOINT LOCATION: 68/3260
  +
 BACKUP METHOD: streamed
   +
 BACKUP FROM: primary
  +
 START TIME: 2023-07-26 08:51:28 JST
   +
 LABEL: backup
   +
 START TIMELINE: 1
   +
 ","16724 /PGDATA/datadg/tbs1
  +
 16725 /PGDATA/datadg/tbs2
   +
 ")
(1 row)

I read the documentation on this page
https://www.postgresql.org/docs/current/functions-admin.html

> The desired contents of the backup label file and the tablespace map file
> are returned as part of the result of the function and must be written to
> files in the backup area.

I don't understand, "*must be written to files in the backup area*". Does
it mean we need to manually create a file first on the master node and then
rsync it to the backup node?
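
Just to make sure I am reading it right - is the idea something like this
sketch, where the labelfile column is written out and then placed as
`backup_label` in the secondary's data directory before starting it? (Just a
sketch, not what I actually ran.)
```
-- psql on the primary, in the same session that ran pg_backup_start()
\t on
\a
\o /tmp/backup_label
SELECT labelfile FROM pg_backup_stop();
\o
```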

Thanks

On Tue, Jul 25, 2023 at 9:22 PM Stephen Frost  wrote:

> Greetings,
>
> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> > I recently set up a Postgres15 master-slave (Primary-secondary) cluster
> on
> > 2 nodes. At the time of starting, I ensured that data files are exactly
> the
> > same on both the nodes. The size of the DB is 1.5TB.
> > The directory structure on both nodes looks as follows:
> > ```
> >   - /PROD/datadg/
> >|
> >|-> /PROD/datadg/tablespace
> >|   |-> /PROD/datadg/tablespace/tablespace1
> >|   |-> /PROD/datadg/tablespace/tablespace2
> >|
> >|-> /PROD/datadg/data
> >|   |-> /PROD/datadg/data/pg_tblspc
> >|   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
> > /PROD/datadg/tablespace/tablespace1
> >|   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
> > /PROD/datadg/tablespace/tablespace2
> >
> > ```
> > Almost a week later now, I see almost a 2GB size difference in the
> > tablespace folders on the 2 nodes. I also see some file count difference
> on
> > both the nodes.
>
> Not really enough to go on here.
>
> > `autovacuum` is on on both the nodes and there aren't any `idle in
> > transaction` queries on the slave node. Also there hasn't been any
> > disruption on the streaming replication. I did not get any error like
> `WAL
> > segment already removed` or so on the slave node. `pg_stat_replication`
> on
> > the master node also doesn't show anything out of the ordinary and the
> > `sent_lsn`, `write_lsn` and `flush_lsn` are regularly updated. I can not
> > see a difference in counts of most tables either, haven't verified for
> all
> > of them.
> > So my **first question** is:
> >
> > * Why is there a difference in the files in the tablespace folder? I can
> > understand the difference in the modification timestamps, but some files
> > are just missing on the slave node.
>
> Unlogged tables would certainly be a pretty easy explanation of size
> differences between the two.  There's also temporary files that might be
> created on one system but not the other for in-progress queries.  These
> aren't the only possibilities but just a couple of likely candidates.
>
> > Now if I were to run `vacuumdb` on the master node, there are chances
> that
> > the slave node will break and give an error like this
> > ```
> > PANIC,XX000,"WAL contains references to invalid pages","WAL redo at
> > 875E/21A70BD0 for
> >  Heap2/VISIBLE: cutoff xid 60350476 flags 0x01; blkref #0: rel
> > 16405/16419/533716

Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-25 Thread Abhishek Bhola
Hi Stephen

I found the latest documentation
<https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP>
and understood that I was supposed to run
`select * from pg_backup_stop();` and store the labelfile output in a file
in the data directory of the secondary node.

I also understood that this low-level API backup is not really a recommended
way and that I need to move to another method.
However, until I fully set up a tool like pgBackRest (of which you are one
of the contributors - very impressive!) for PROD, can you please tell me:
1) Are there any other options that are both safe and fast? As you said,
`rsync --size-only` would miss a lot of changes, but I would also like to
avoid syncing every data file whose timestamp differs even though its
contents (including the checksum) are exactly the same.
2) While rsyncing from the master node, do I need to exclude the `pg_wal`
folder?
3) The replica/standby node should have access to the WAL segments archived
between the start and the stop of the backup, right?
4) What kind of data corruption could have been introduced by the absence of
the labelfile, and is there a way to identify that corruption (see the sketch
below)? For example, if I am able to reindex all the tables, select all the
data and vacuum the DB, do I still need to pg_dump and pg_restore those DBs?
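
For the "identify the corruption" part, the kind of verification pass I have
in mind is roughly this sketch (assuming the amcheck extension is available;
it only checks btree indexes and does not prove the heap itself is intact):
```
-- sketch, assuming the amcheck extension; permanent, valid btree indexes only
CREATE EXTENSION IF NOT EXISTS amcheck;

SELECT c.relname, bt_index_check(index => c.oid)
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am am ON am.oid = c.relam
WHERE am.amname = 'btree'
  AND i.indisvalid
  AND c.relpersistence = 'p';
```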

Thanks

On Wed, Jul 26, 2023 at 11:00 AM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> Hi Stephen
>
> Thank you for your reply.
>
> Basically, this is not a valid way to perform a backup/restore of PG.
>
> Is it not valid only for PG 15 or even for earlier versions? I have always
> referred to this https://www.postgresql.org/docs/8.1/backup-online.html
>
>
>> better, use an existing well maintained backup/restore tool for this
>
> Is there any tool you could recommend? A tool to bring back the standby
> node when it was down for a day or so for some OS/firmware upgrade.
> I have tried pgBackRest, but it requires a control node. So anything that
> doesn't involve a 3rd server for this situation would be helpful.
>
>  this also doesn't grab and restore the absolutely required
>> backup_label file that's returned from pg_backup_stop()
>>
> I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
> but I did not see any backup_label_file being created.
> psql (15.1)
> Type "help" for help.
>
> postgres=# select pg_backup_start('backup');
>  pg_backup_start
> -
>  68/3228
> (1 row)
>
> postgres=# select pg_backup_stop();
> NOTICE:  all required WAL segments have been archived
> pg_backup_stop
>
> ---
>  (68/32000100,"START WAL LOCATION: 68/3228 (file
> 000100680032)+
>  CHECKPOINT LOCATION: 68/3260
> +
>  BACKUP METHOD: streamed
>+
>  BACKUP FROM: primary
> +
>  START TIME: 2023-07-26 08:51:28 JST
>+
>  LABEL: backup
>+
>  START TIMELINE: 1
>+
>  ","16724 /PGDATA/datadg/tbs1
> +
>  16725 /PGDATA/datadg/tbs2
>+
>  ")
> (1 row)
>
> I read the documentation on this page
> https://www.postgresql.org/docs/current/functions-admin.html
>
>> The desired contents of the backup label file and the tablespace map file
>> are returned as part of the result of the function and must be written to
>> files in the backup area.
>
> I don't understand, "*must be written to files in the backup area*". Does
> it mean we need to manually create a file first on the master node and then
> rsync it to the backup node?
>
> Thanks
>
> On Tue, Jul 25, 2023 at 9:22 PM Stephen Frost  wrote:
>
>> Greetings,
>>
>> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
>> > I recently set up a Postgres15 master-slave (Primary-secondary) cluster
>> on
>> > 2 nodes. At the time of starting, I ensured that data files are exactly
>> the
>> > same on both the nodes. The size of the DB is 1.5TB.
>> > The directory structure on both nodes looks as follows:
>> > ```
>> >   - /PROD/datadg/
>> >|
>> >|-> /PROD/datadg/tablespace
>> >|   |-> /PROD/datadg/tablespace/tablespace1
>> >|   |-> /PROD/datadg/tablespace/tablespace2
>> >|
>> >|-> /PROD/datadg/data
>> >|   |-> /PROD/datadg/data/pg_tblspc
>> >|   |   | -> /PROD/datadg/data/pg_tblspc/16432 ->
>> > /PROD/datadg/tablespace/tablespace1
>> >|   |   | -> /PROD/datadg/data/pg_tblspc/16433 ->
>> > /PROD/datadg/

Re: Difference in the tablespace folders on primary and secondary nodes

2023-07-26 Thread Abhishek Bhola
Hi Stephen

Thank you for the very detailed reply. I tried the above method and it
works. I'm still setting up pgBackRest.

I have one last question if you can answer that too please.

> A delta restore will only restore those
> files on the replica which are different from what was in the backup and
> that can be done using timestamp/file-size because pgBackRest tracks
> that information and will set it on restore.

As per my understanding, the primary and standby nodes can have exactly the
same data, with no data corruption, but still have different timestamps on
the data files in the tablespace folder. Please correct me if I am wrong.
If that's the case, will an incremental/delta backup taken from the primary
node with pgBackRest restore files on the secondary node just because they
have a different timestamp? Or does pgBackRest have some mechanism to detect
this and skip those files? Please assume for this case that we ran out of
capacity to store the WAL segments from the primary while the secondary node
was down.
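
(For context, what I am planning on the standby is simply a delta restore
along these lines - just a sketch, with a placeholder stanza name:)
```
pgbackrest --stanza=mystanza --delta restore
```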

Thanks

On Wed, 26 Jul 2023, 22:55 Stephen Frost,  wrote:

> Greetings,
>
> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> > > Basically, this is not a valid way to perform a backup/restore of PG.
> >
> > Is it not valid only for PG 15 or even for earlier versions? I have
> always
> > referred to this https://www.postgresql.org/docs/8.1/backup-online.html
>
> For earlier versions too.
>
> > > better, use an existing well maintained backup/restore tool for this
> >
> > Is there any tool you could recommend? A tool to bring back the standby
> > node when it was down for a day or so for some OS/firmware upgrade.
> > I have tried pgBackRest, but it requires a control node. So anything that
> > doesn't involve a 3rd server for this situation would be helpful.
>
> pgBackRest doesn't actually require a control node, though it's
> certainly recommended to have one.
>
> Also, if you use physical replication slots, you can have the primary
> hold on to the WAL necessary for the replica to catch back up until the
> replica is back and then it can just fetch the WAL from the primary and
> catch up without any kind of re-sync being necessary.
>
> If you're worried about running out of space on the primary for this
> (which is an entirely valid concern...) then you could ship the WAL
> files to an archive system or location using pgBackRest or possibly some
> other tool and then have the replica fetch the WAL from there once it's
> back online.
>
> pgBackRest also has the ability to do incremental backups and delta
> restores.  An incremental backup will only store the data that's changed
> since the prior backup, minimizing the storage space in the repo and the
> time required for the backup.  A delta restore will only restore those
> files on the replica which are different from what was in the backup and
> that can be done using timestamp/file-size because pgBackRest tracks
> that information and will set it on restore.  pgBackRest does also have
> an option to do checksum-based restores, which it will automatically use
> if anything looks odd regarding the timestamps.
>
> > > this also doesn't grab and restore the absolutely required
> > > backup_label file that's returned from pg_backup_stop()
> >
> > I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
> > but I did not see any backup_label_file being created.
> > psql (15.1)
> > Type "help" for help.
> >
> > postgres=# select pg_backup_start('backup');
> >  pg_backup_start
> > -
> >  68/3228
> > (1 row)
> >
> > postgres=# select pg_backup_stop();
> > NOTICE:  all required WAL segments have been archived
> > pg_backup_stop
> >
> ---
> >  (68/32000100,"START WAL LOCATION: 68/3228 (file
> > 000100680032)+
> >  CHECKPOINT LOCATION: 68/3260
> >   +
> >  BACKUP METHOD: streamed
> >+
> >  BACKUP FROM: primary
> >   +
> >  START TIME: 2023-07-26 08:51:28 JST
> >+
> >  LABEL: backup
> >+
> >  START TIMELINE: 1
> >+
> >  ","16724 /PGDATA/datadg/tbs1
> >   +
> >  16725 /PGDATA/datadg/tbs2
> >+
> >  ")
> > (1 row)
> >
> > I read the documentation on this page
> > https://www.postgresql.org/docs/current/functions-admin.html
>
> What's returned from pg_backup_stop() is the backup_label that needs to
> be stored with

pgBackRest for a 50 TB database

2023-08-27 Thread Abhishek Bhola
Hi

I am trying to use pgBackRest for all my Postgres servers. I have tested it
on a sample database and it works fine. But my concern is for some of the
bigger DB clusters, the largest one being 50TB and growing by about
200-300GB a day.

I plan to mount NAS storage on my DB server to store my backup. The server
with 50 TB data is using DELL Storage underneath to store this data and has
36 18-core CPUs.

As I understand, pgBackRest recommends having 2 full backups and then
having incremental or differential backups as per requirement. Does anyone
have any reference numbers on how much time a backup for such a DB would
usually take, just for reference. If I take a full backup every Sunday and
then incremental backups for the rest of the week, I believe the
incremental backups should not be a problem, but the full backup every
Sunday might not finish in time.

I think converting a diff/incr backup to a full backup has been discussed
here <https://github.com/pgbackrest/pgbackrest/issues/644>, but not yet
implemented. If there is a workaround, please let me know. Or if someone is
implemented. If there is a workaround, please let me know. Or if someone is
simply using pgBackRest for a bigger DB (comparable to 50TB), please share
your experience with the exact numbers and config/schedule of backups. I
know the easiest way would be to use it myself and find out, but since it
is a PROD DB, I wanted to get some ideas before starting.
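
For concreteness, the starting config I have in mind looks something like
this (a sketch only; the stanza name, paths, and values are placeholders):
```
# pgbackrest.conf sketch - stanza name, paths, and values are placeholders
[global]
repo1-path=/mnt/nas/pgbackrest
repo1-retention-full=2
process-max=16
compress-type=zst
start-fast=y

[prod50tb]
pg1-path=/PROD/datadg/data
```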

Thanks
Abhishek



Re: pgBackRest for a 50 TB database

2023-08-27 Thread Abhishek Bhola
Hi Stephen

Thank you for the prompt response.
Hearing it from you makes me more confident about rolling it to PROD.
I will have a discussion with the network team about it, hear what they have
to say, and make an estimate accordingly.

If you happen to know anyone using it with that size and having published
their numbers, that would be great, but if not, I will post them once I set
it up.

Thanks for your help.

Cheers,
Abhishek

On Mon, Aug 28, 2023 at 12:22 AM Stephen Frost  wrote:

> Greetings,
>
> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
> > I am trying to use pgBackRest for all my Postgres servers. I have tested
> it
> > on a sample database and it works fine. But my concern is for some of the
> > bigger DB clusters, the largest one being 50TB and growing by about
> > 200-300GB a day.
>
> Glad pgBackRest has been working well for you.
>
> > I plan to mount NAS storage on my DB server to store my backup. The
> server
> > with 50 TB data is using DELL Storage underneath to store this data and
> has
> > 36 18-core CPUs.
>
> How much free CPU capacity does the system have?
>
> > As I understand, pgBackRest recommends having 2 full backups and then
> > having incremental or differential backups as per requirement. Does
> anyone
> > have any reference numbers on how much time a backup for such a DB would
> > usually take, just for reference. If I take a full backup every Sunday
> and
> > then incremental backups for the rest of the week, I believe the
> > incremental backups should not be a problem, but the full backup every
> > Sunday might not finish in time.
>
> pgBackRest scales extremely well- what's going to matter here is how
> much you can give it in terms of resources.  The primary bottle necks
> will be CPU time for compression, network bandwidth for the NAS, and
> storage bandwidth of the NAS and the DB filesystems.  Typically, CPU
> time dominates due to the compression, though if you're able to give
> pgBackRest a lot of those CPUs then you might get to the point of
> running out of network bandwidth or storage bandwidth on your NAS.
> We've certainly seen folks pushing upwards of 3TB/hr, so a 50TB backup
> should be able to complete in less than a day.  Strongly recommend
> taking an incremental backup more-or-less immediately after the full
> backup to minimize the amount of WAL you'd have to replay on a restore.
> Also strongly recommend actually doing serious restore tests of this
> system to make sure you understand the process, have an idea how long
> it'll take to restore the actual files with pgBackRest and then how long
> PG will take to come up and replay the WAL generated during the backup.
>
> > I think converting a diff/incr backup to a full backup has been discussed
> > here <https://github.com/pgbackrest/pgbackrest/issues/644>, but not yet
> > implemented. If there is a workaround, please let me know. Or if someone
> is
> > simply using pgBackRest for a bigger DB (comparable to 50TB), please
> share
> > your experience with the exact numbers and config/schedule of backups. I
> > know the easiest way would be to use it myself and find out, but since it
> > is a PROD DB, I wanted to get some ideas before starting.
>
> No, we haven't implemented that yet.  It's starting to come up higher in
> our list of things we want to work on though.  There are risks to doing
> such conversions though that have to be considered- it creates long
> dependencies on things all working because if there's a PG or pgBackRest
> bug or some way that corruption slipped in then that ends up getting
> propagated down.  If you feel really confident that your restore testing
> is good (full restore w/ PG replaying WAL, running amcheck across the
> entire restored system, then pg_dump'ing everything and restoring it
> into a new PG cluster to re-validate all constraints, doing additional
> app-level review and testing...) then that can certainly help with
> mitigation of the risks mentioned above.
>
> Overall though, yes, people certainly use pgBackRest for 50TB+ PG
> clusters.
>
> Thanks,
>
> Stephen
>



Re: pgBackRest for a 50 TB database

2023-10-02 Thread Abhishek Bhola
Hello,

As said above, I tested pgBackRest on my bigger DB and here are the results.
Server on which this is running has the following config:
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):36
On-line CPU(s) list:   0-35
Thread(s) per core:1
Core(s) per socket:18
Socket(s): 2
NUMA node(s):  2

Data folder size: 52 TB (has some duplicate files since it is restored from
tapes)
Backup is being written on to DELL Storage, mounted on the server.

pgbackrest.conf with following options enabled
repo1-block=y
repo1-bundle=y
start-fast=y
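
The full backups below were run with a command along these lines (a sketch;
the stanza name is a placeholder):
```
pgbackrest --stanza=mystanza --type=full backup
```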


1. *Using process-max: 30, Time taken: ~26 hours*
full backup: 20230926-092555F
timestamp start/stop: 2023-09-26 09:25:55+09 / 2023-09-27
11:07:18+09
wal start/stop: 00010001AC0E0044 /
00010001AC0E0044
database size: 38248.9GB, database backup size: 38248.9GB
repo1: backup size: 6222.0GB

2. *Using process-max: 10, Time taken: ~37 hours*
 full backup: 20230930-190002F
timestamp start/stop: 2023-09-30 19:00:02+09 / 2023-10-02
08:01:20+09
wal start/stop: 00010001AC0E004E /
00010001AC0E004E
database size: 38248.9GB, database backup size: 38248.9GB
repo1: backup size: 6222.0GB

Hope it helps someone to use these numbers as some reference.

Thanks


On Mon, Aug 28, 2023 at 12:30 AM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> Hi Stephen
>
> Thank you for the prompt response.
> Hearing it from you makes me more confident about rolling it to PROD.
> I will have a discussion with the network team once about and hear what
> they have to say and make an estimate accordingly.
>
> If you happen to know anyone using it with that size and having published
> their numbers, that would be great, but if not, I will post them once I set
> it up.
>
> Thanks for your help.
>
> Cheers,
> Abhishek
>
> On Mon, Aug 28, 2023 at 12:22 AM Stephen Frost  wrote:
>
>> Greetings,
>>
>> * Abhishek Bhola (abhishek.bh...@japannext.co.jp) wrote:
>> > I am trying to use pgBackRest for all my Postgres servers. I have
>> tested it
>> > on a sample database and it works fine. But my concern is for some of
>> the
>> > bigger DB clusters, the largest one being 50TB and growing by about
>> > 200-300GB a day.
>>
>> Glad pgBackRest has been working well for you.
>>
>> > I plan to mount NAS storage on my DB server to store my backup. The
>> server
>> > with 50 TB data is using DELL Storage underneath to store this data and
>> has
>> > 36 18-core CPUs.
>>
>> How much free CPU capacity does the system have?
>>
>> > As I understand, pgBackRest recommends having 2 full backups and then
>> > having incremental or differential backups as per requirement. Does
>> anyone
>> > have any reference numbers on how much time a backup for such a DB would
>> > usually take, just for reference. If I take a full backup every Sunday
>> and
>> > then incremental backups for the rest of the week, I believe the
>> > incremental backups should not be a problem, but the full backup every
>> > Sunday might not finish in time.
>>
>> pgBackRest scales extremely well- what's going to matter here is how
>> much you can give it in terms of resources.  The primary bottle necks
>> will be CPU time for compression, network bandwidth for the NAS, and
>> storage bandwidth of the NAS and the DB filesystems.  Typically, CPU
>> time dominates due to the compression, though if you're able to give
>> pgBackRest a lot of those CPUs then you might get to the point of
>> running out of network bandwidth or storage bandwidth on your NAS.
>> We've certainly seen folks pushing upwards of 3TB/hr, so a 50TB backup
>> should be able to complete in less than a day.  Strongly recommend
>> taking an incremental backup more-or-less immediately after the full
>> backup to minimize the amount of WAL you'd have to replay on a restore.
>> Also strongly recommend actually doing serious restore tests of this
>> system to make sure you understand the process, have an idea how long
>> it'll take to restore the actual files with pgBackRest and then how long
>> PG will take to come up and replay the WAL generated during the backup.
>>
>> > I think converting a diff/incr backup to a full backup has been
>> discussed
>> > here <https://github.com/pgbackrest/pgbackrest/issues/644>, but not yet
>> > implemented. If there is a workaround, please let me know. Or if
>> someone is
>> > simpl

Re: pgBackRest for a 50 TB database

2023-10-03 Thread Abhishek Bhola
Hi Stephen

No, I did not try that. Let me try that now and report the numbers here,
both in terms of size and time taken.
Thanks for the suggestion.


On Tue, Oct 3, 2023 at 10:39 PM Stephen Frost  wrote:

> Greetings,
>
> On Mon, Oct 2, 2023 at 20:08 Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> As said above, I tested pgBackRest on my bigger DB and here are the
>> results.
>> Server on which this is running has the following config:
>> Architecture:  x86_64
>> CPU op-mode(s):32-bit, 64-bit
>> Byte Order:Little Endian
>> CPU(s):36
>> On-line CPU(s) list:   0-35
>> Thread(s) per core:1
>> Core(s) per socket:18
>> Socket(s): 2
>> NUMA node(s):  2
>>
>> Data folder size: 52 TB (has some duplicate files since it is restored
>> from tapes)
>> Backup is being written on to DELL Storage, mounted on the server.
>>
>> pgbackrest.conf with following options enabled
>> repo1-block=y
>> repo1-bundle=y
>> start-fast=y
>>
>
> Thanks for sharing!  Did you perhaps consider using zstd for the
> compression..?  You might find that you get similar compression in less
> time.
>
> Thanks.
>
> Stephen
>



Re: pgBackRest for a 50 TB database

2023-10-05 Thread Abhishek Bhola
Hi Stephen

Here is the update with compress-type=zst in the config file.
Process-max is still 30. *But it took longer than before, around 27 hours 50
mins.*

full backup: 20231004-130621F
timestamp start/stop: 2023-10-04 13:06:21+09 / 2023-10-05
15:56:03+09
wal start/stop: 00010001AC0E0054 /
00010001AC0E0054
database size: 38249.0GB, database backup size: 38249.0GB
repo1: backup size: 5799.8GB

Do you think I could be missing something?

@Krishane

Let me try to answer the questions as best I can.
1. The connectivity protocol for the DB is FC.
I cannot pinpoint the exact reason why it takes 26 hours; if I knew exactly,
I would have improved it myself.
I don't think 10 hours is even realistic, although if you can improve on this
number, please let us know.

2. Yes, it is a dedicated DB server.

3. You're right, it is NAS

Thanks

On Wed, Oct 4, 2023 at 2:37 PM KK CHN  wrote:

> Greetings,
> Happy to hear you successfully performed pgBackRest for a 50TB DB. Out of
> curiosity I would like to know your infrastructure settings.
>
> 1. The  connectivity protocoal and bandwidth you used for your backend
> storage ?  Is it iSCSI, FC FCoE or GbE ? what's the exact reason for
> the 26 Hours it took in the best case ? What factors may reduce 26 Hours to
> much less time say 10 Hour or so for a 50 TB DB to  backup destination ??
> What to  fine tune or deploy  for a better performance?
>
> 2. It has been said that  you are running the DB on a 2 slot 18 core
> processor = 36 Physical cores ..  Is it a dedicated Server H/W entirely
> dedicated for a 50 TB database alone ?
> Why I asked, nowadays mostly we may run the DB servers on VMs in
> virtualized environments..  So I would like to know  all 36 Physical cores
> and associated RAM are all utilized by your 50 TB Database server ? or any
> vacant CPU cores/Free RAM on those server machines?
>
> 3.  What kind of connectivity/bandwidth between DB server and Storage
> backend you established ( I Want to know the server NIC card details,
> Connectivity Channel protocol/bandwidth and Connecting Switch spec from DB
> Server to Storage backend( NAS in this case right ?)
>
> Could you share the recommendations / details as in your case , Becoz I'm
> also in need to perform such a pgBackRest trial from a  production DB  to
> a  suitable Storage Device( Mostly Unified storage  DELL Unity)
>
> Any inputs are most welcome.
>
> Thanks,
> Krishane
>
> On Tue, Oct 3, 2023 at 12:14 PM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> Hello,
>>
>> As said above, I tested pgBackRest on my bigger DB and here are the
>> results.
>> Server on which this is running has the following config:
>> Architecture:  x86_64
>> CPU op-mode(s):32-bit, 64-bit
>> Byte Order:Little Endian
>> CPU(s):36
>> On-line CPU(s) list:   0-35
>> Thread(s) per core:1
>> Core(s) per socket:18
>> Socket(s): 2
>> NUMA node(s):  2
>>
>> Data folder size: 52 TB (has some duplicate files since it is restored
>> from tapes)
>> Backup is being written on to DELL Storage, mounted on the server.
>>
>> pgbackrest.conf with following options enabled
>> repo1-block=y
>> repo1-bundle=y
>> start-fast=y
>>
>>
>> 1. *Using process-max: 30, Time taken: ~26 hours*
>> full backup: 20230926-092555F
>> timestamp start/stop: 2023-09-26 09:25:55+09 / 2023-09-27
>> 11:07:18+09
>> wal start/stop: 00010001AC0E0044 /
>> 00010001AC0E0044
>> database size: 38248.9GB, database backup size: 38248.9GB
>> repo1: backup size: 6222.0GB
>>
>> 2. *Using process-max: 10, Time taken: ~37 hours*
>>  full backup: 20230930-190002F
>> timestamp start/stop: 2023-09-30 19:00:02+09 / 2023-10-02
>> 08:01:20+09
>> wal start/stop: 00010001AC0E004E /
>> 00010001AC0E004E
>> database size: 38248.9GB, database backup size: 38248.9GB
>> repo1: backup size: 6222.0GB
>>
>> Hope it helps someone to use these numbers as some reference.
>>
>> Thanks
>>
>>
>> On Mon, Aug 28, 2023 at 12:30 AM Abhishek Bhola <
>> abhishek.bh...@japannext.co.jp> wrote:
>>
>>> Hi Stephen
>>>
>>> Thank you for the prompt response.
>>> Hearing it from you makes me more confident about rolling it to PROD.
>>> I will have a discussion with the network team once about and hear what
>>> they have to say and make an estimate accordingly.
>

Change error code severity for syslog?

2023-10-11 Thread Abhishek Bhola
Hi

The Postgres errors when sent to syslog have their severity levels
translated as follows:
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

I have a system which raises an alert every time syslog receives an ERROR
(or higher severity level).

For most Postgres errors, translating them to WARNING level in syslog works
fine. However, I want *ERROR codes XX000/XX001/XX002* in Postgres to be
translated to ERROR in syslog as well, so that they trigger the alert system
and I can be notified.

Is there any way to change the severity level for these specific error
codes in Postgres?
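
For reference, our existing rsyslog filters already match on the CODE: tag
from log_line_prefix, so one fallback I can think of is routing these codes
to their own file with a rule like this sketch (same RainerScript style; the
file path is a placeholder), but I would still prefer the severity to be
correct at the source:
```
if $programname == 'postgres' and \
( $msg contains 'CODE:XX000' or \
  $msg contains 'CODE:XX001' or \
  $msg contains 'CODE:XX002' \
) then /var/log/pg_internal_errors.log
```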

Thanks



Re: pgBackRest for a 50 TB database

2023-12-19 Thread Abhishek Bhola
Hello Stephen

Just an update on this. After we deployed it on our PROD system, the results
were far better than in testing.
The time taken is only around 4-5 hours, and that has been the case for the
last 3 months or so.
full backup: 20231209-150002F
timestamp start/stop: 2023-12-09 15:00:02+09 / 2023-12-09
19:33:56+09
wal start/stop: 00010001DCC3008E /
00010001DCC300A6
database size: 32834.8GB, database backup size: 32834.8GB
repo1: backup size: 5096.4GB

Now a question: I restored this big DB and it all works fine. However, I was
wondering if there is a way to disable the subscription while restoring the
data using pgBackRest.
So for example, I have been taking backups of this DB, which has an active
subscription.
When I restore the DB for test purposes, I don't want the subscription to be
there. Is there any option to ignore the subscription?
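
The only workaround I can think of is to neutralize the subscription right
after the restore, roughly like this sketch (the subscription name is a
placeholder), but a restore-time option would be nicer:
```
-- sketch, run on the restored test copy; "my_sub" is a placeholder name.
-- Detaching the slot before dropping avoids touching the publisher's
-- replication slot.
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub SET (slot_name = NONE);
DROP SUBSCRIPTION my_sub;
```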

Thanks

On Thu, Oct 5, 2023 at 10:19 PM Stephen Frost  wrote:

> Greetings,
>
> On Thu, Oct 5, 2023 at 03:10 Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> Here is the update with compress-type=zst in the config file
>> Process-max is still 30. *But it longer than before, around 27 hours 50
>> mins*
>>
>> full backup: 20231004-130621F
>> timestamp start/stop: 2023-10-04 13:06:21+09 / 2023-10-05
>> 15:56:03+09
>> wal start/stop: 00010001AC0E0054 /
>> 00010001AC0E0054
>> database size: 38249.0GB, database backup size: 38249.0GB
>> repo1: backup size: 5799.8GB
>>
>> Do you think I could be missing something?
>>
>
> Sounds like there’s something else which is the bottleneck once you have
> process-max at 30. I suspect you could reduce that process-max value and
> have around the same time still with zstd.  Ultimately if you want it to be
> faster then you’ll need to figure out what the bottleneck is (seemingly not
> CPU, unlikely to be memory, so that leaves network or storage) and address
> that.
>
> We’ve seen numbers approaching 10TB/hr with lots of processes and zstd and
> fast storage on high end physical hardware.
>
> Thanks,
>
> Stephen
>



Postgresql 11: terminating walsender process due to replication timeout

2021-09-08 Thread Abhishek Bhola
I have found some questions about the same error, but didn't find any of
them answering my problem.

The setup is that I have two Postgres11 clusters (A and B) and they are
making use of publication and subscription features to copy data from A to
B.

A (source DB- publication) --> B (target DB - subscription)

This works fine, but often (not always) when the data volume being inserted
on a table in node A increases, it gives the following error.

"terminating walsender process due to replication timeout"

The data volume at the moment being entered is about 30K rows per second
continuously for hours through COPY command.

Earlier the wal_sender_timeout was set to 5 sec and I would see this error
much more often. I then increased it to 1 min and the frequency of this error
reduced. But I don't want to keep increasing it without understanding what
is causing it. I looked at the code of walsender.c and know the exact lines
where it's coming from.

But I am still not clear which parameter is making the sender assume that
the receiver node is inactive and therefore it should stop the wal_sender.

Can anyone please suggest what changes I should make to remove this error?

sourcedb=# show wal_sender_timeout;
 wal_sender_timeout
--------------------
 1min
(1 row)

sourcedb=# select * from pg_replication_slots;
   slot_name   |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn   | confirmed_flush_lsn
---------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+----------------+---------------------
 sub_target_DB | pgoutput | logical   |  16501 | sourcedb | f         | t      |      68229 |      |     98839088 | 116D0/C36886F8 | 116D0/C3E5D370



targetdb=# show wal_receiver_timeout;
 wal_receiver_timeout
----------------------
 1min
(1 row)


targetdb=# show wal_retrieve_retry_interval ;
 wal_retrieve_retry_interval
-----------------------------
 5s
(1 row)

targetdb=# show wal_receiver_status_interval;
 wal_receiver_status_interval
------------------------------
 2s
(1 row)

targetdb=# select * from pg_stat_subscription;
    subid    |    subname    |  pid  | relid |  received_lsn  |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
-------------+---------------+-------+-------+----------------+-------------------------------+-------------------------------+----------------+-------------------------------
  2378695757 | sub_target_DB | 62371 |       | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 | 2021-08-20 09:05:15.398471+09 | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09

I then increased the wal_sender_timeout to 5 mins and the error started
appearing more frequently instead. Not only that, it even killed the
active subscription and stopped replicating data; I had to restart it.
So clearly, just increasing the wal_sender_timeout hasn't helped.
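
(A query along these lines, run on the publisher, is what I use to see how
far the subscriber's replay falls behind - just a sketch using
pg_stat_replication columns:)
```
-- sketch run on the publisher: bytes by which the subscriber's replay
-- position lags behind the current WAL insert position
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```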



Re: Postgresql 11: terminating walsender process due to replication timeout

2021-09-09 Thread Abhishek Bhola
sourcedb:~$ postgres --version
postgres (PostgreSQL) 11.6

Sorry for missing this information.
But looks like this fix is already included in the version I am running.

Regards,
Abhishek Bhola

On Thu, Sep 9, 2021 at 3:56 PM Kyotaro Horiguchi 
wrote:

> At Thu, 9 Sep 2021 14:52:25 +0900, Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote in
> > I have found some questions about the same error, but didn't find any of
> > them answering my problem.
> >
> > The setup is that I have two Postgres11 clusters (A and B) and they are
> > making use of publication and subscription features to copy data from A
> to
> > B.
> >
> > A (source DB- publication) --> B (target DB - subscription)
> >
> > This works fine, but often (not always) when the data volume being
> inserted
> > on a table in node A increases, it gives the following error.
> >
> > "terminating walsender process due to replication timeout"
> >
> > The data volume at the moment being entered is about 30K rows per second
> > continuously for hours through COPY command.
> >
> > Earlier the wal_sender_timeout was set to 5 sec and I would see this
> error
> > much often. I then increased it to 1 min and the frequency of this error
> > reduced. But I don't want to keep increasing it without understanding
> what
> > is causing it. I looked at the code of walsender.c and know the exact
> lines
> > where it's coming from.
> >
> > But I am still not clear which parameter is making the sender assume that
> > the receiver node is inactive and therefore it should stop the
> wal_sender.
> >
> > Can anyone please suggest what changes I should make to remove this
> error?
>
> What minor-version is the Postgres server mentioned? PostgreSQL 11
> have gotten the following fix at 11.6, which could be related to the
> trouble.
>
> https://www.postgresql.org/docs/11/release-11-6.html
>
> > Fix timeout handling in logical replication walreceiver processes
> > (Julien Rouhaud)
> >
> > Erroneous logic prevented wal_receiver_timeout from working in
> > logical replication deployments.
>
> The details of the fix is here.
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f60f690fac1bf375b92cf2f8682e8fe8f69098
> > Fix timeout handling in logical replication worker
> >
> > The timestamp tracking the last moment a message is received in a
> > logical replication worker was initialized in each loop checking if a
> > message was received or not, causing wal_receiver_timeout to be ignored
> > in basically any logical replication deployments.  This also broke the
> > ping sent to the server when reaching half of wal_receiver_timeout.
>
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>



Re: Postgresql 11: terminating walsender process due to replication timeout

2021-09-10 Thread Abhishek Bhola
So is there any solution to this issue?
I did try to increase the wal_sender_timeout and it broke the pub/sub.
I increased the wal_receiver_timeout and it wouldn't attempt to restart the
subscription until that time elapsed.
Due to that, the WAL segments got removed by the time it came up again and
it stopped working.
So given that the publisher is publishing at a higher rate than the
subscriber is subscribing, what can be done?

On Fri, Sep 10, 2021 at 9:26 AM Kyotaro Horiguchi 
wrote:

> At Thu, 9 Sep 2021 16:06:25 +0900, Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote in
> > sourcedb:~$ postgres --version
> > postgres (PostgreSQL) 11.6
> >
> > Sorry for missing this information.
> > But looks like this fix is already included in the version I am running.
>
> Ok. I'm not sure but there may be a case where too-busy (or too poor
> relative to the publisher) subscriber cannot send a response for a
> long time. Usually keep-alive packets sent from publisher causes
> subscriber response even while busy time but it seems that if
> subscriber applies changes more than two times slower than the
> publisher sends, subscriber doesn't send a response in the timeout
> window.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>



[Major version upgrade] pg_upgrade fails despite passing check mode

2021-10-22 Thread Abhishek Bhola
While upgrading from Postgres 11 to 13, pg_upgrade passed the check mode:
postgres@dxxxpgs03:/D1/datadg/data13$ /usr/pgsql-13/bin/pg_upgrade -k --check \
  --new-datadir='/D1/datadg/data13' --old-datadir='/D1/datadg/data' \
  --new-bindir='/usr/pgsql-13/bin' --old-bindir='/usr/pgsql-11/bin'
Performing Consistency Checks
-----------------------------
Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   ok
Checking for prepared transactions  ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch   ok
Checking for tables WITH OIDS   ok
Checking for invalid "sql_identifier" user columns  ok
Checking for presence of required libraries ok
Checking database user is the install user  ok
Checking for prepared transactions  ok
Checking for new cluster tablespace directories ok

*Clusters are compatible*

But then while *performing the actual upgrade, it failed* saying the "*owner
of a FDW should be a super user*"

So I started the old Postgres 11 again, changed the owner of that FDW to a
super user.

Now when I try to run the upgrade (even in check mode) again, it says "*new
cluster tablespace directory already exists*"

Is it safe to delete the new tablespace directories?

What are my options to safely start/upgrade the DB without any chances of
data corruption?

Thanks



Re: [Major version upgrade] pg_upgrade fails despite passing check mode

2021-10-24 Thread Abhishek Bhola
Thanks, let me try that out.

But is there a way to know for sure whether pg_upgrade will run into any such
problems?
Other than the FDW's owner, is there anything else I should check before
upgrading other clusters?

Regards
Abhishek Bhola

On Sat, Oct 23, 2021 at 12:20 AM Laurenz Albe 
wrote:

> On Fri, 2021-10-22 at 17:11 +0900, Abhishek Bhola wrote:
> > postgres@dxxxpgs03:/D1/datadg/data13$ /usr/pgsql-13/bin/pg_upgrade
> -k --check
> >   --new-datadir='/D1/datadg/data13'
> --old-datadir='/D1/datadg/data'
> >   --new-bindir='/usr/pgsql-13/bin' --old-bindir='/usr/pgsql-11/bin'
> >
> > *Clusters are compatible*
> >
> > But then while performing the actual upgrade, it failed saying the
> "owner of a FDW should be a super user"
> >
> > So I started the old Postgres 11 again, changed the owner of that FDW to
> a super user.
> >
> > Now when I try to run the upgrade (even in check mode) again, it says
> "new cluster tablespace directory already exists"
> >
> > Is it safe to delete the new tablespace directories?
>
> Don't remove the tablespace directories, because they contain the original
> data from the
> old cluster.  But there should be directories called PG_13_202007201, and
> you have to
> remove those.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
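
Following the advice above, one cautious way to locate (and only then remove) the new cluster's catalog-version subdirectories inside each tablespace would be something like this (a sketch, assuming the old cluster is running again; verify the listing before deleting anything):

```
psql -At -c "SELECT pg_tablespace_location(oid) FROM pg_tablespace
             WHERE spcname NOT IN ('pg_default', 'pg_global');" |
while read -r tsdir; do
    ls -d "$tsdir"/PG_13_*      # e.g. PG_13_202007201, as mentioned above
    # rm -rf "$tsdir"/PG_13_*   # uncomment only after verifying the listing
done
```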



Subscription stuck at initialize state

2022-02-01 Thread Abhishek Bhola
I have 2 sets of publication/subscription between my 2 DBs.
One of them is working fine and the other one is stuck at initializing
state for all the tables.

sourcedb=# select * from pg_publication;
  oid  |        pubname         | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot
-------+------------------------+----------+--------------+-----------+-----------+-----------+-------------+------------
 19585 | omx_archive_big_tables |    16420 | f            | t         | t         | t         | t           | f
 19584 | omx_archive            |    16420 | f            | t         | t         | t         | t           | f


targetdb=# select * from pg_subscription_rel ;
  srsubid   |  srrelid   | srsubstate |   srsublsn
------------+------------+------------+---------------
 3615804367 | 3322052690 | i          |
 3615804367 | 3322052570 | i          |
 3615756798 | 3322051793 | r          | 9E7E/BF5F82D8
 3615804367 | 3322052133 | i          |
 3615804367 | 3322054214 | i          |
 3615756798 | 3322051802 | r          | 9E7E/C149BBD8
 3615804367 | 3322051757 | i          |

targetdb=# select * from pg_subscription;
-[ RECORD 1
]---+--
oid | 3615756798
subdbid | 16589
subname | sub_omx_archive_big_tables_tci
subowner| 16420
subenabled  | t
subconninfo | xxx
subslotname | sub_omx_archive_big_tables_tci
subsynccommit   | off
subpublications | {omx_archive_big_tables}
-[ RECORD 2
]---+--
oid | 3615804367
subdbid | 16589
subname | sub_omx_archive_tci
subowner| 16420
subenabled  | t
subconninfo | 
subslotname | sub_omx_archive_tci
subsynccommit   | off
subpublications | {omx_archive}

I have dropped the subscription, recreated and refreshed it many times, but
it won't move from the initializing phase.

Any suggestions on how to start copying the data again, other than dropping
the publication and re-creating it?
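
For reference, the per-table sync progress can be watched with the standard catalogs while the subscription is enabled (a sketch; srsubstate 'i' = initialize, 'd' = data copy in progress, 's' = synchronized, 'r' = ready):

```
-- Tables that have not reached the 'ready' state yet.
SELECT s.subname, sr.srrelid::regclass AS tbl, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r'
ORDER BY 1, 2;

-- Running workers; a row with relid set is a tablesync worker.
SELECT subname, pid, relid::regclass AS syncing_table, received_lsn
FROM pg_stat_subscription;
```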



Re: Subscription stuck at initialize state

2022-02-01 Thread Abhishek Bhola
The only message I see in the target DB log is:
"logical replication apply worker for subscription ""sub_omx_archive_tci"" has started"

I don't see a logical replication table synchronization worker being started
for any of the tables in this subscription, as I do for the other one.

Is there anything in particular that I should be looking for in the log
files?

On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin 
wrote:

> Hi Abishek,
>
> Have you checked the subscriber and publisher database log files to see if
> there is a problem blocking the subscription? For example, a subscribed
> table missing a column that exists in the publisher.
>
> Cheers,
>
> Steve
>
> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> I have 2 sets of publication/subscription between my 2 DBs.
>> One of them is working fine and the other one is stuck at initializing
>> state for all the tables.
>>
>> sourcedb=# select * from pg_publication;
>>   oid  |pubname | pubowner | puballtables | pubinsert |
>> pubupdate | pubdelete | pubtruncate | pubviaroot
>>
>> ---++--+--+---+---+---+-+
>>  19585 | omx_archive_big_tables |16420 | f| t | t
>> | t | t   | f
>>  19584 | omx_archive|16420 | f| t
>> | t | t | t   | f
>>
>>
>> targetdb=# select * from pg_subscription_rel ;
>>   srsubid   |  srrelid   | srsubstate |   srsublsn
>> +++---
>>  3615804367 | 3322052690 | i  |
>>  3615804367 | 3322052570 | i  |
>>  3615756798 | 3322051793 | r  | 9E7E/BF5F82D8
>>  3615804367 | 3322052133 | i  |
>>  3615804367 | 3322054214 | i  |
>>  3615756798 | 3322051802 | r  | 9E7E/C149BBD8
>>  3615804367 | 3322051757 | i  |
>>
>> targetdb=# select * from pg_subscription;
>> -[ RECORD 1
>> ]---+--
>> oid | 3615756798
>> subdbid | 16589
>> subname | sub_omx_archive_big_tables_tci
>> subowner| 16420
>> subenabled  | t
>> subconninfo | xxx
>> subslotname | sub_omx_archive_big_tables_tci
>> subsynccommit   | off
>> subpublications | {omx_archive_big_tables}
>> -[ RECORD 2
>> ]---+--
>> oid | 3615804367
>> subdbid | 16589
>> subname | sub_omx_archive_tci
>> subowner| 16420
>> subenabled  | t
>> subconninfo | 
>> subslotname | sub_omx_archive_tci
>> subsynccommit   | off
>> subpublications | {omx_archive}
>>
>> I have dropped the subscription, recreated and refreshed it many times,
>> but it won't move from the initializing phase.
>>
>> Any suggestions on how to start copying the data again, other than
>> dropping the publication and re-creating it?
>>
>>
>
>



Re: Subscription stuck at initialize state

2022-02-02 Thread Abhishek Bhola
Update: I tried dropping and recreating the publication on the source DB, but
the subscription still won't move ahead.
Not sure what I am missing.

On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> The only statement I see on the target DB log is
> "logical replication apply worker for subscription ""sub_omx_archive_tci""
> has started",""
>
> I don't see the logical replication table synchronization worker started
> for any of the tables in this subscription as I see in the other one.
>
> Is there anything in particular that I should be looking for in the log
> files?
>
> On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin 
> wrote:
>
>> Hi Abishek,
>>
>> Have you checked the subscriber and publisher database log files to see
>> if there is a problem blocking the subscription? For example, a subscribed
>> table missing a column that exists in the publisher.
>>
>> Cheers,
>>
>> Steve
>>
>> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola <
>> abhishek.bh...@japannext.co.jp> wrote:
>>
>>> I have 2 sets of publication/subscription between my 2 DBs.
>>> One of them is working fine and the other one is stuck at initializing
>>> state for all the tables.
>>>
>>> sourcedb=# select * from pg_publication;
>>>   oid  |pubname | pubowner | puballtables | pubinsert |
>>> pubupdate | pubdelete | pubtruncate | pubviaroot
>>>
>>> ---++--+--+---+---+---+-+
>>>  19585 | omx_archive_big_tables |16420 | f| t |
>>> t | t | t   | f
>>>  19584 | omx_archive|16420 | f| t
>>>   | t | t | t   | f
>>>
>>>
>>> targetdb=# select * from pg_subscription_rel ;
>>>   srsubid   |  srrelid   | srsubstate |   srsublsn
>>> +++---
>>>  3615804367 | 3322052690 | i  |
>>>  3615804367 | 3322052570 | i  |
>>>  3615756798 | 3322051793 | r  | 9E7E/BF5F82D8
>>>  3615804367 | 3322052133 | i  |
>>>  3615804367 | 3322054214 | i  |
>>>  3615756798 | 3322051802 | r  | 9E7E/C149BBD8
>>>  3615804367 | 3322051757 | i  |
>>>
>>> targetdb=# select * from pg_subscription;
>>> -[ RECORD 1
>>> ]---+--
>>> oid | 3615756798
>>> subdbid | 16589
>>> subname | sub_omx_archive_big_tables_tci
>>> subowner| 16420
>>> subenabled  | t
>>> subconninfo | xxx
>>> subslotname | sub_omx_archive_big_tables_tci
>>> subsynccommit   | off
>>> subpublications | {omx_archive_big_tables}
>>> -[ RECORD 2
>>> ]---+--
>>> oid | 3615804367
>>> subdbid | 16589
>>> subname | sub_omx_archive_tci
>>> subowner| 16420
>>> subenabled  | t
>>> subconninfo | 
>>> subslotname | sub_omx_archive_tci
>>> subsynccommit   | off
>>> subpublications | {omx_archive}
>>>
>>> I have dropped the subscription, recreated and refreshed it many times,
>>> but it won't move from the initializing phase.
>>>
>>> Any suggestions on how to start copying the data again, other than
>>> dropping the publication and re-creating it?
>>>
>>>

Re: Subscription stuck at initialize state

2022-02-02 Thread Abhishek Bhola
So far I have figured out that the problem is on the subscriber side.
The same publication, when subscribed to from another DB, works fine.
I also noticed that the remote_lsn value on the target DB is still 0/0.

targetdb=# select * from pg_replication_origin_status ;
 local_id |  external_id  |   remote_lsn   |   local_lsn
--+---++
1 | pg_3615756798 | 9E96/37152C80  | 1518C/9014BD98
2 | pg_3616584803 | 0/0   | 0/0
(2 rows)

Would really appreciate it if someone could help me with this.



On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola <
abhishek.bh...@japannext.co.jp> wrote:

> Update: Tried dropping and recreating publication on the source DB and
> subscription still won't move ahead.
> Not sure what I am missing.
>
> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> The only statement I see on the target DB log is
>> "logical replication apply worker for subscription
>> ""sub_omx_archive_tci"" has started",""
>>
>> I don't see the logical replication table synchronization worker started
>> for any of the tables in this subscription as I see in the other one.
>>
>> Is there anything in particular that I should be looking for in the log
>> files?
>>
>> On Wed, Feb 2, 2022 at 11:31 AM Steve Baldwin 
>> wrote:
>>
>>> Hi Abishek,
>>>
>>> Have you checked the subscriber and publisher database log files to see
>>> if there is a problem blocking the subscription? For example, a subscribed
>>> table missing a column that exists in the publisher.
>>>
>>> Cheers,
>>>
>>> Steve
>>>
>>> On Wed, Feb 2, 2022 at 1:26 PM Abhishek Bhola <
>>> abhishek.bh...@japannext.co.jp> wrote:
>>>
>>>> I have 2 sets of publication/subscription between my 2 DBs.
>>>> One of them is working fine and the other one is stuck at initializing
>>>> state for all the tables.
>>>>
>>>> sourcedb=# select * from pg_publication;
>>>>   oid  |pubname | pubowner | puballtables | pubinsert |
>>>> pubupdate | pubdelete | pubtruncate | pubviaroot
>>>>
>>>> ---++--+--+---+---+---+-+
>>>>  19585 | omx_archive_big_tables |16420 | f| t |
>>>> t | t | t   | f
>>>>  19584 | omx_archive|16420 | f| t
>>>>   | t | t | t   | f
>>>>
>>>>
>>>> targetdb=# select * from pg_subscription_rel ;
>>>>   srsubid   |  srrelid   | srsubstate |   srsublsn
>>>> +++---
>>>>  3615804367 | 3322052690 | i  |
>>>>  3615804367 | 3322052570 | i  |
>>>>  3615756798 | 3322051793 | r  | 9E7E/BF5F82D8
>>>>  3615804367 | 3322052133 | i  |
>>>>  3615804367 | 3322054214 | i  |
>>>>  3615756798 | 3322051802 | r  | 9E7E/C149BBD8
>>>>  3615804367 | 3322051757 | i  |
>>>>
>>>> targetdb=# select * from pg_subscription;
>>>> -[ RECORD 1
>>>> ]---+--
>>>> oid | 3615756798
>>>> subdbid | 16589
>>>> subname | sub_omx_archive_big_tables_tci
>>>> subowner| 16420
>>>> subenabled  | t
>>>> subconninfo | xxx
>>>> subslotname | sub_omx_archive_big_tables_tci
>>>> subsynccommit   | off
>>>> subpublications | {omx_archive_big_tables}
>>>> -[ RECORD 2
>>>> ]---+--
>>>> oid | 3615804367
>>>> subdbid | 16589
>>>> subname | sub_omx_archive_tci
>>>> subowner| 16420
>>>> subenabled  | t
>>>> subconninfo | 
>>>> subslotname | sub_omx_archive_tci
>>>> subsynccommit   | off
>>>> subpublications | {omx_archive}
>>>>
>>>> I have dropped the subscription, recreated and refreshed it many times,
>>>> but it won't move from the initializing phase.
>>>>
>>

Re: Subscription stuck at initialize state

2022-02-02 Thread Abhishek Bhola
Hi Vijaykumar,

I checked pg_subscription_rel, and all the tables in that subscription are in
state 'i' (initialize).
I also tried creating a new publication on the source DB with just one table
and subscribing to it, but that doesn't work either.
However, when I subscribe to it from a DB other than the one mentioned above,
it works.
From this I deduce that the publication and the source DB are fine; the
problem is on the target DB and its subscription.
Maybe I will have to restart the DB as a last resort, but I am not sure that
will solve the problem either.


On Thu, Feb 3, 2022 at 3:33 PM Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

>
>
> On Thu, Feb 3, 2022, 10:32 AM Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> So far I figured out that the problem is on the subscriber side.
>> The same publication, when subscribed to on another DB, works fine.
>> Also noticed that the remote_lsn value on the target DB is still 0/0.
>>
>> targetdb=# select * from pg_replication_origin_status ;
>>  local_id |  external_id  |   remote_lsn   |   local_lsn
>> --+---++
>> 1 | pg_3615756798 | 9E96/37152C80  | 1518C/9014BD98
>> 2 | pg_3616584803 | 0/0   | 0/0
>> (2 rows)
>>
>> Would really appreciate it if someone could help me with this.
>>
>>
>>
>> On Thu, Feb 3, 2022 at 9:53 AM Abhishek Bhola <
>> abhishek.bh...@japannext.co.jp> wrote:
>>
>>> Update: Tried dropping and recreating publication on the source DB and
>>> subscription still won't move ahead.
>>> Not sure what I am missing.
>>>
>>> On Wed, Feb 2, 2022 at 1:20 PM Abhishek Bhola <
>>> abhishek.bh...@japannext.co.jp> wrote:
>>>
>>>> The only statement I see on the target DB log is
>>>> "logical replication apply worker for subscription
>>>> ""sub_omx_archive_tci"" has started",""
>>>>
>>>> I don't see the logical replication table synchronization worker
>>>> started for any of the tables in this subscription as I see in the other
>>>> one.
>>>>
>>>
>
> This might help you track subscription state along with
> pg_stat_subscription.
> https://www.postgresql.org/docs/10/catalog-pg-subscription-rel.html
>
> It might be a huge table being 'copy' ed and writes would be slow due to
> too many indexes etc.
>
> https://postgresteam.slack.com/files/UQMFAU01W/F02V69YK59P/untitled.sql
>
> also if you see nothing is moving, you may want to login to the server and
> strace the worker processes pid to see if you see any movement at all or it
> keeps looping on some errors.
>
>
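
To follow up on the strace suggestion, a minimal way to do that would be (a sketch; 12345 is a placeholder PID):

```
# Find the apply/tablesync worker PIDs for the stuck subscription.
psql -At -d targetdb -c \
  "SELECT pid FROM pg_stat_subscription WHERE subname = 'sub_omx_archive_tci';"

# Attach to the worker on the database host and watch its system calls.
strace -f -tt -p 12345
```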



Re: Subscription stuck at initialize state

2022-02-03 Thread Abhishek Bhola
/BF5F82D8
 3615756798 | 3322051802 | r | 9E7E/C149BBD8
 *3616584803* | 3322052690 | *i* |
 *3616584803* | 3322052570 | *i* |
 3616584803 | 3322052133 | *i* |


*pg_stat_subscription*
targetdb=# select * from pg_stat_subscription ;
   subid    |            subname             |  pid   | relid | received_lsn  |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
------------+--------------------------------+--------+-------+---------------+-------------------------------+-------------------------------+----------------+-------------------------------
 3615756798 | sub_omx_archive_big_tables_tci | 178901 |       | 9E98/E3E42DB0 | 2022-02-03 18:23:57.904696+09 | 2022-02-03 18:23:57.904768+09 | 9E98/E3E42DB0  | 2022-02-03 18:23:57.904696+09
 3616584803 | sub_omx_archive_tci            | 168277 |       | 9E98/E3E42DB0 | 2022-02-03 18:23:57.904693+09 | 2022-02-03 18:23:57.904754+09 | 9E98/E3E42DB0  | 2022-02-03 18:23:57.904693+09

Sorry for the long message.


On Thu, Feb 3, 2022 at 5:25 PM Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

>
>
> On Thu, 3 Feb 2022 at 12:44, Abhishek Bhola <
> abhishek.bh...@japannext.co.jp> wrote:
>
>> Hi Vijaykumar,
>>
>> I checked the pg_subscription_rel and all the tables in that subscription
>> are in the state - i (initialize).
>> I also tried creating a new publication on the source DB with just one
>> table and tried to subscribe it, it doesn't work either.
>> However, when I try to subscribe it on some other DB than the one
>> mentioned above, it works.
>> By which I am deducing that publication and the source DB are okay, the
>> problem is on the target DB and it's subscription.
>> Maybe I will have to restart the DB as a last resort, but I am not sure
>> if that will solve the problem either.
>>
>>
> its a very verbose mail, so if it noisy, kindly ignore.
>
> else,
>
> can you check basic connectivity from the subscriber to publisher using
> psql  and run a simple query ?
> can you share your "create publication" and "create subscription"
> commands/statements too please?
>
> i am attaching a general logical replication setup on a single server and
> put some scenarios where replication breaks and how to monitor and how to
> resume.
> and how that is monitored.
>
>
> postgres@controller:~$ tail db1/postgresql.conf db2/postgresql.conf
> ==> db1/postgresql.conf <==
>
> # Add settings for extensions here
> wal_level=logical
> archive_mode = on
> archive_command = '/bin/true'
> max_wal_size = 48MB
> min_wal_size = 32MB
> shared_buffers = 32MB
> port = 5001
> max_logical_replication_workers = 10
>
> ==> db2/postgresql.conf <==
> # Add settings for extensions here
> wal_level=logical
> archive_mode = on
> archive_command = '/bin/true'
> max_wal_size = 48MB
> min_wal_size = 32MB
> shared_buffers = 32MB
> port = 5002
> max_logical_replication_workers = 10
>
> postgres@controller:~$ pg_ctl -D db1 -l db1.log start
> waiting for server to start done
> server started
> postgres@controller:~$ pg_ctl -D db2 -l db2.log start
> waiting for server to start done
> server started
> postgres@controller:~$ psql -p 5001
> psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
> Type "help" for help.
>
> postgres=# \x
> Expanded display is on.
> postgres=# create table t1(id int primary key);
> CREATE TABLE
> postgres=# create table t2(id int); -- this will throw error on delete, no
> replica identity
> CREATE TABLE
> postgres=# insert into t1 select x from generate_series(1, 100) x;
> INSERT 0 100
> postgres=# insert into t2 select x from generate_series(1, 100) x;
> INSERT 0 100
> postgres=# checkpoint;
> CHECKPOINT
> postgres=# \q
> postgres@controller:~$ psql -p 5002
> psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
> Type "help" for help.
>
> postgres=#  create table t1(id int primary key);
> CREATE TABLE
> postgres=# create table t2(id int);
> CREATE TABLE
> postgres=# \q
> postgres@controller:~$ ps aux | grep -i postgres:
> postgres1116  0.0  0.4 113632  8232 ?Ss   13:24   0:00
> postgres: checkpointer
> postgres1117  0.0  0.2 113496  5868 ?Ss   13:24   0:00
> postgres: background writer
> postgres1118  0.0  0.3 113496  6964 ?Ss   13:24   0:00
> postgres: walwriter
> postgres1119  0.0  0.4 114032  8432 ?Ss   13:24   0:00
> postgres: autovacuum launcher
> postgres1120  0.0  0.2 113496  4132 ?Ss   13:24   0:00
> postgres: archiver
> postgres1121  0.0  0.2  72112  4896 ?Ss   13:24   0:00
> post

Re: Subscription stuck at initialize state

2022-02-03 Thread Abhishek Bhola
Hi Vijaykumar

Thank you so much for taking the time to recreate the issue.
I checked max_logical_replication_workers on both nodes and it is currently
set to 4.
The reason it is failing is that there are already 3 existing replications on
the target node, so when I create this 4th one, it cannot start the temporary
replication (tablesync) worker needed to copy the initial data, and therefore
it never moves past that state.
Since these are production databases, I can't restart them on weekdays, so I
will try to change the setting over the weekend.
But I did find the warning message in the target DB logs.

"WARNING,53400,"out of logical replication worker slots",,"You might need
to increase max_logical_replication_workers.",,,"","logical replication
worker"
So I am sure this will work.

THANK YOU SO MUCH.
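
For reference, a sketch of the check and the change described above (the value 10 is illustrative; max_logical_replication_workers only takes effect after a restart, and the workers come out of the max_worker_processes pool):

```
-- Workers currently running on the subscriber versus the configured limits.
SELECT count(*) AS logical_rep_workers
FROM pg_stat_activity
WHERE backend_type = 'logical replication worker';
SHOW max_logical_replication_workers;
SHOW max_worker_processes;

-- Raise the limit, then restart the subscriber cluster.
ALTER SYSTEM SET max_logical_replication_workers = 10;
```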


On Fri, Feb 4, 2022 at 3:04 AM Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

> trimming the email, to avoid noise.
>
> I spent a lot of time trying multiple options/combinations and finally
> managed to replicate your debug output.
> this is when i have ,
>
> postgres=# show max_logical_replication_workers;
> * max_logical_replication_workers*
> -
> * 2*
> (1 row)
>
>
> on publisher
> postgres@controller:~$ psql -p 5001
> psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
> Type "help" for help.
>
> postgres=# create table t_bytea(id int primary key, somebytea bytea);
> CREATE TABLE
> postgres=# create table t1(id int);
> CREATE TABLE
> postgres=# create table t2(id int);
> CREATE TABLE
> postgres=# create table t3(id int);
> CREATE TABLE
> postgres=# create publication mypub1 for table t1;
> CREATE PUBLICATION
> postgres=# create publication mypub2 for table t2;
> CREATE PUBLICATION
> postgres=# create publication mypub3 for table t3;
> CREATE PUBLICATION
> postgres=# create publication mypub4 for table t3;
> CREATE PUBLICATION
> postgres=# create publication mypub5 for table t_bytea;
> CREATE PUBLICATION
> postgres=# insert into t_bytea  select x,repeat(repeat('xxx', 100),
> 1000)::bytea from generate_series(1, 1000) x;
> INSERT 0 1000
>
>
>
> on subscriber
> postgres@controller:~$ psql -p 5002
> psql (14.1 (Ubuntu 14.1-2.pgdg20.04+1))
> Type "help" for help.
>
> postgres=# create table t_bytea(id int primary key, somebytea bytea);
> CREATE TABLE
> postgres=#  create table t1(id int);
> CREATE TABLE
> postgres=#  create table t2(id int);
> CREATE TABLE
> postgres=#  create table t3(id int);
> CREATE TABLE
> postgres=#  create table t4(id int);
> CREATE TABLE
> postgres=#  create table t5(id int);
> CREATE TABLE
> postgres=# create subscription mysub1 connection 'port=5001' publication
> mypub1;
> NOTICE:  created replication slot "mysub1" on publisher
> CREATE SUBSCRIPTION
> postgres=# create subscription mysub2 connection 'port=5001' publication
> mypub2;
> NOTICE:  created replication slot "mysub2" on publisher
> CREATE SUBSCRIPTION
> postgres=# create subscription mysub3 connection 'port=5001' publication
> mypub3;
> NOTICE:  created replication slot "mysub3" on publisher
> CREATE SUBSCRIPTION
> postgres=# create subscription mysub4 connection 'port=5001' publication
> mypub4;
> NOTICE:  created replication slot "mysub4" on publisher
> CREATE SUBSCRIPTION
> postgres=# create subscription mysub5 connection 'port=5001' publication
> mypub5;
> NOTICE:  created replication slot "mysub5" on publisher
> CREATE SUBSCRIPTION
> *postgres=# select count(1) from t_bytea;*
> * count*
> *---*
> * 0*
> *(1 row)*
>
> postgres=# table pg_subscription_rel;
>  srsubid | srrelid | srsubstate | srsublsn
> -+-++---
>16406 |   16391 | r  | 0/1722838
>  *  16407 |   16394 | i  |*
> *   16408 |   16397 | i  |*
> *   16409 |   16397 | i  |*
> *   16410 |   16384 | i  |*
> (5 rows)
>
> # as expected no data in t_bytea (as it could not get any worker)
> postgres=# select count(1) from t_bytea;
>  count
> ---
>  0
> (1 row)
>
>
>
> but logs clearly state the problem
> 2022-02-03 23:18:31.107 IST [3430] LOG:  logical replication table
> synchronization worker for subscription "mysub1", table "t1" has started
> 2022-02-03 23:18:31.138 IST [3430] LOG:  logical replication table
> synchronization worker for subscription "mysub1", table "t1" has finished
> 2022-02-03 23:18:40.730 IST [3433] LOG:  logical replication apply worker
> for subscription "mysub2" has started
> *2022-02-03 23:18:40.737 IST [3433] WARNING:  out of logical replication
> worker slots*
> *2022-02-03 23:18:40.737 IST [3433] HINT:  You might need to increase
> max_logical_replication_workers.*
> *2022-02-03 23:18:45.865 IST [3433] WARNING:  out of logical replication
> worker slots*
>
>
> *#publisher logs*
> 2022-02-03 23:18:31.096 IST [3427] STATEMENT:  CREATE_REPLICATION_SLOT
> "mysub1" LOGICAL pgoutput NOEXPORT_SNAPSHOT
> 2022-02-03 23:18:31.106 IST [3429] LOG:  starting logical decoding 

Why are data files getting modified?

2022-02-09 Thread Abhishek Bhola
I have a table with monthly partitions, starting from 2011.
Each partition has about 2-3 billion rows, and the table is about 40 TB of
data in total.
I am running a select query to \copy the data into CSV files for each month.

I see that an *AUTOVACUUM* job has started on all the partitions that have
been read (i.e., on which the select has run). I am assuming the reason why
Postgres is autovacuuming these large partitions is this:

"This implies that if a table is not otherwise vacuumed, autovacuum will be
invoked on it approximately once every autovacuum_freeze_max_age minus
vacuum_freeze_min_age transactions. For tables that are regularly vacuumed
for space reclamation purposes, this is of little importance. However, for
static tables (including tables that receive inserts, but no updates or
deletes), there is no need to vacuum for space reclamation, so it can be
useful to try to maximize the interval between forced autovacuums on very
large static tables. Obviously one can do this either by increasing
autovacuum_freeze_max_age or decreasing vacuum_freeze_min_age."

https://www.postgresql.org/docs/13/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
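
A rough way to see how close each partition is to the forced-autovacuum threshold described in that passage (a sketch; the LIKE pattern follows the partition naming used here):

```
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint
         - age(c.relfrozenxid) AS xids_until_forced_autovacuum
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relname LIKE 'hist_omx_orderaudit%'
ORDER BY xid_age DESC
LIMIT 10;
```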

Please note that these partition tables are static, i.e., they received their
data many years ago and have never been updated.

db=# SELECT c.relname as table_name, c.relkind as type, age(c.relfrozenxid)
as age, c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <>
2147483647 ORDER BY 3 DESC LIMIT 20;
        table_name         | type |    age    | relfrozenxid
---------------------------+------+-----------+--------------
 hist_omx_orderaudit201506 | r    | 201463508 |       319891
 hist_omx_orderaudit201509 | r    | 201418497 |       364902
 hist_omx_orderaudit201205 | r    | 201142042 |       641357

autovacuum_freeze_max_age
---
 2
(1 row)

Am I wrong in my understanding of why this autovacuum is getting triggered?

However, my main question is: why do the data files in the tablespace
corresponding to these partition tables (the ones that have been read or
selected recently, or autovacuumed recently) have a recent modification
timestamp?

There were no inserts or updates on these partitions, so why do I see an
updated timestamp? Does autovacuum change something in the data files?

Thanks
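
One way to tie a partition to its data file and check the file's modification time without leaving SQL (a sketch; pg_stat_file() needs superuser or an explicit grant, and the path only covers the first 1 GB segment of the relation):

```
SELECT pg_relation_filepath('hist_omx_orderaudit201506');
SELECT modification
FROM pg_stat_file(pg_relation_filepath('hist_omx_orderaudit201506'));
```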



Re: Why are data files getting modified?

2022-02-09 Thread Abhishek Bhola
I see, but then why is the autovacuum getting triggered after the select
command?

On Thu, Feb 10, 2022, 12:14 AM Laurenz Albe 
wrote:

> On Wed, 2022-02-09 at 17:29 +0900, Abhishek Bhola wrote:
> > I have a table with monthly partitions, starting from 2011.
> > Each partition has about 2-3 billion rows and is about 40TB of data in
> total.
> > I am running a select query to \copy the data into csv files for each
> month.
> >
> > I see there is an AUTOVACUUM job started on all the partitions that have
> been
> > read (on which select has run). I am assuming the reason why Postgres is
> > autovacuuming these large partitions is this:
> >
> > db=# SELECT c.relname as table_name, c.relkind as type,
> age(c.relfrozenxid) as age,
> > c.relfrozenxid FROM pg_class AS c WHERE age(c.relfrozenxid) <>
> 2147483647 ORDER BY 3 DESC LIMIT 20;
> > table_name | type |age| relfrozenxid
> > ---+--+---+--
> >  hist_omx_orderaudit201506 | r| 201463508 |   319891
> >  hist_omx_orderaudit201509 | r| 201418497 |   364902
> >  hist_omx_orderaudit201205 | r| 201142042 |   641357
> >
> > autovacuum_freeze_max_age
> > ---
> >  2
> > (1 row)
> >
> > Am I wrong in my understanding why this autovacuum is getting triggered?
> >
> > However, my main question is that why do the data files in the
> tablespace, corresponding
> > to these partition tables (that have been read or selected recently), or
> the ones
> > autovacuumed recently have a latest timestamp?
> >
> > There was no insert or update on these partitions, so why do I see an
> updated timestamp?
> > Does autovacuum change something on the datafiles?
>
> Yes, autovacuum will freeze old rows in the table, which modifies the data
> files.
>
> To keep this to a minimum, you can run VACUUM (FREEZE) on a partition as
> soon
> as you know that its data won't get modified any more.
> Still, I wouldn't be sure that the file modification timestamp won't change
> any more after that.  But that should be pretty irrelevant; at any rate, no
> substantial data modifications will occur after that.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>
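
Following the suggestion above, a sketch of what that would look like on one of the static partitions named earlier:

```
-- Freeze a partition once its data is final, so later anti-wraparound
-- autovacuums find nothing left to rewrite.
VACUUM (FREEZE, VERBOSE) hist_omx_orderaudit201205;

-- Afterwards the partition's XID age should be close to zero.
SELECT relname, age(relfrozenxid)
FROM pg_class
WHERE relname = 'hist_omx_orderaudit201205';
```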
