Logical Replication Delay on Remote Server
Hi Team,

I am facing replication lag in PostgreSQL 16 at times and am not able to find the reason.

Please find the configuration:
1) Two replication slots for two servers (one in the same data center, the other remote).
2) Once or twice a week we see lag (around 2 GB in a 1k TPS environment) on the remote replication slot, while the slot for the server in the same data center shows 0 lag.

My observations:
1) pg_stat_replication_slots total_txns drops from ~1k to 5 or 6 for the remote slot, while the other slot stays at ~1k TPS.
2) The lag appears only on the remote slot; the same-data-center slot is fine.
3) Most importantly, there is plenty of bandwidth available: 2 GB of the 4 GB network capacity is still free.
4) There are no I/O issues on the servers.

I am not able to prove whether this is due to the network. Can you help me figure out how to proceed? At the very least, how does logical decoding count total_txns in the pg_stat_replication_slots view?

--
Thanks & Regards,
Ram.
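One way to approach the total_txns question: the counters in pg_stat_replication_slots are cumulative per slot (total_txns is, as far as I understand it, the number of top-level transactions handed to the decoding output plugin since the slot's stats were last reset), and the spill_* columns show transactions that exceeded logical_decoding_work_mem and were written to disk. Reading them together with the confirmed position of each slot usually shows whether the remote slot is decoding slowly or simply not being consumed fast enough. A minimal monitoring sketch:

-- Per-slot logical decoding counters (PostgreSQL 14+).
SELECT slot_name, total_txns, total_bytes,
       spill_txns, spill_count, pg_size_pretty(spill_bytes) AS spilled,
       stream_txns, stream_count
FROM   pg_stat_replication_slots;

-- WAL still to be confirmed by each logical subscriber, in bytes.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      confirmed_flush_lsn)) AS confirmed_lag
FROM   pg_replication_slots
WHERE  slot_type = 'logical';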
Re: Logical Replication Delay on Remote Server
Hi Team,

Adding one more point here:

5. The pg_stat_replication_slots total_txns count drops when the lag occurs. Does decoding stop in case of a logical replication delay?

--
Thanks & Regards,
Ram.
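To separate publisher-side decoding delay from network or subscriber apply delay, the walsender view on the publisher can help: logical walsenders also appear in pg_stat_replication, and the gap between the current WAL position and sent_lsn versus the gap between sent_lsn and the subscriber-confirmed position points at different bottlenecks. The lag columns can be NULL between feedback messages, so treat this as a rough sketch:

-- On the publisher: one row per walsender (including logical ones).
SELECT application_name, state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS not_yet_sent,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn))           AS sent_not_applied,
       write_lag, flush_lag, replay_lag
FROM   pg_stat_replication;

If not_yet_sent keeps growing, the delay is on the decoding/sending side; if sent_not_applied grows instead, the time is being spent on the network or in the subscriber's apply worker.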
Logical Replication Delay
Hi Team,

We have configured bidirectional replication (but traffic can only flow in one direction) between two data centers (distance: 1000 km, maximum network latency: 100 ms) with an application TPS (transactions per second) of at most 700.

We are fine handling up to 500 TPS without observing any lag between the two data centers. However, when TPS increases, we notice a WAL backlog of over 100 GB (initially it was 1 TB, but after tuning it was reduced to 100 GB). During peak times, WAL is generated at a rate of 4 GB per minute.

All transactions (Tx) take less than 200 ms, with a maximum of 1 second at times (no long-running transactions).

*Here are the configured parameters and resources:*

   - *OS*: Ubuntu
   - *RAM*: 376 GB
   - *CPU*: 64 cores
   - *Swap*: 32 GB
   - *PostgreSQL Version*: 16.4 (each side has 3 nodes with Patroni and etcd configured)
   - *DB Size*: 15 TB

*Parameters configured on both sides:*

Name                                          Setting     Unit
log_replication_commands                      off
logical_decoding_work_mem                     524288      kB
max_logical_replication_workers               16
max_parallel_apply_workers_per_subscription   2
max_replication_slots                         20
max_sync_workers_per_subscription             2
max_wal_senders                               20
max_worker_processes                          40
wal_level                                     logical
wal_receiver_timeout                          60          ms
wal_segment_size                              1073741824  B
wal_sender_timeout                            60          ms

*Optimizations applied:*

   1. Vacuum freeze is managed during off-hours; no aggressive vacuum is triggered during business hours.
   2. Converted a few tables to unlogged.
   3. Removed unwanted tables from the publication.
   4. Partitioned all large tables.

*Pending:*

   1. Turning off/tuning autovacuum parameters to avoid triggering during business hours.

*Not possible:* We are running all tables in a single publication, and it is not possible to separate them.

I would greatly appreciate any suggestions you may have to help avoid logical replication delays, whether through tuning database or operating system parameters, or any other recommendations.

--
Thanks & Regards,
Ram.
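Since logical_decoding_work_mem and max_parallel_apply_workers_per_subscription are already set, it may be worth verifying whether large transactions are being spilled to disk on the publisher and whether the subscription actually has streaming enabled: in PostgreSQL 16 the parallel apply workers are only used when the subscription is created or altered with streaming = parallel. A sketch, with sub_dc2 used purely as a placeholder subscription name:

-- Publisher: transactions that exceeded logical_decoding_work_mem are either
-- spilled to disk (spill_*) or streamed before commit (stream_*).
SELECT slot_name, total_txns,
       spill_txns,  pg_size_pretty(spill_bytes)  AS spilled,
       stream_txns, pg_size_pretty(stream_bytes) AS streamed
FROM   pg_stat_replication_slots;

-- Subscriber: substream is 'p' when parallel streaming is enabled.
SELECT subname, substream, subbinary FROM pg_subscription;

-- Enable parallel apply of large in-progress transactions (placeholder name).
ALTER SUBSCRIPTION sub_dc2 SET (streaming = parallel);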
Re: Logical Replication Delay
Hi Justin,

Thank you for your suggestions and detailed insights.

Due to certain business constraints, we are unable to split the tables into parallel publications. All of the tables involved have primary keys, which adds further complexity to separating them into multiple publications.

That said, we truly appreciate your recommendations regarding the use of binary mode and reviewing the use of REPLICA IDENTITY FULL. We will ensure that the subscriber is operating in binary mode and will recheck the identity setup to minimize WAL size and overhead.

Regards,
Ram.

On Sun, 22 Sept 2024 at 01:32, Justin wrote:
> Hi Ramakrishna,
>
> 4 GB of WAL generated per minute is a lot. I would expect the replay on the
> subscriber to lag behind because it is a single process. PostgreSQL 16 can
> create parallel workers for large transactions; however, if there is a flood
> of small transactions touching many tables, the single LR worker is going to
> fall behind.
>
> The only option is to split the LR into multiple publications and
> subscriptions, as a single worker can't keep up. What is the justification
> for not splitting the tables across multiple publications and subscriptions?
>
> Additional items to check:
>
> Make sure the subscriber is using binary mode; this avoids an encoding step.
> https://www.postgresql.org/docs/current/sql-createsubscription.html
>
> Avoid the use of REPLICA IDENTITY FULL on the publisher. If you do use
> IDENTITY FULL, make sure the subscriber table identity is set to a qualifying
> unique index. In previous versions of PG the publisher and subscriber
> identities had to match...
>
> REPLICA IDENTITY FULL increases the size of the WAL and the work the
> publisher and subscriber have to do.
>
> Hope this helps.
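For reference, both of Justin's points can be checked from the catalogs; sub_dc2 is again only a placeholder subscription name:

-- Subscriber: subbinary = true means column data is sent in binary form.
SELECT subname, subbinary FROM pg_subscription;

-- Switch an existing subscription to binary transfer (placeholder name).
ALTER SUBSCRIPTION sub_dc2 SET (binary = true);

-- Publisher: tables whose replica identity is FULL ('f'), which puts the whole
-- old row into the WAL for every UPDATE and DELETE on them.
SELECT n.nspname, c.relname, c.relreplident
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  c.relkind IN ('r', 'p')
  AND  c.relreplident = 'f';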
Re: Logical Replication Delay
Hi Justin and Greg,

Thank you for your input and recommendations. We understand your point regarding separating the tables into different publications and subscriptions. However, due to certain business constraints, we are unable to implement this approach at the moment.

*We are planning to set up logical replication from a standby to another server. When the primary goes down, there is no issue, as the standby becomes the primary and the logical slots are already present. However, when the standby goes down, these slots are not copied to the third node or the primary by Patroni. Is there an option available to handle this scenario?*

Regards,
Ram.

On Wed, 25 Sept 2024 at 20:12, Greg Sabino Mullane wrote:
> On Sat, Sep 21, 2024 at 3:08 PM Ramakrishna m wrote:
>> I would greatly appreciate any suggestions you may have to help avoid
>> logical replication delays, whether through tuning database or operating
>> system parameters, or any other recommendations
>
> In addition to the things already answered:
>
> * What is the use case for logical replication? I assume your local replicas
> are able to keep up just fine.
>
> * Check the nature of the work for problems, e.g. ORM doing
> unnecessary/redundant updates, maintaining indexes that are not really needed.
>
> * Looks like your wal_segment_size was boosted to 1 GB. What drove that change?
>
> * Yes, autovacuum could affect things - make sure log_autovacuum_min_duration
> is set.
>
> Cheers,
> Greg
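As far as I know, PostgreSQL 16 itself has no built-in mechanism to carry a logical slot and its confirmed position over to another standby, so the most that can be done on the database side is to check which slots exist on each node and recreate a missing one; whether Patroni can manage permanent logical slots for you depends on its version and its slots: configuration, which is worth verifying against the Patroni documentation. A sketch, with sub_dc2 as a placeholder slot name:

-- Run on each node: which logical slots exist here, and how far are they confirmed?
SELECT slot_name, plugin, active, restart_lsn, confirmed_flush_lsn
FROM   pg_replication_slots
WHERE  slot_type = 'logical';

-- On a node where the slot is missing it can be recreated, but a new slot only
-- decodes WAL written after its creation, so changes made while the slot did not
-- exist are not replayed and the subscriber may need to be resynchronized.
SELECT pg_create_logical_replication_slot('sub_dc2', 'pgoutput');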
Performance Issue with Hash Partition Query Execution in PostgreSQL 16
Hi Team,

One of our queries, which retrieves a single record from a table with 16 hash partitions, is taking more than 10 seconds to execute. In contrast, when we run the same query manually, it completes within milliseconds. This issue is exhausting the application connection pools. Are there any known bugs with hash partitioning in PostgreSQL 16?

Please find the attached log, table definition, and execution plan.

Size of each partition : 300 GB
Index size             : 12 GB
Postgres version       : 16.x
shared_buffers         : 75 GB
effective_cache_size   : 175 GB
work_mem               : 4 MB
max_connections        : 3000
OS                     : Ubuntu 22.04
RAM                    : 384 GB
CPU                    : 64

Please let us know if you need any further information or if there are additional details required.

Regards,
Ram.

Postgres_hash_partion_issue.sql
Description: Binary data
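Without the attached plan this is only a guess, but one pattern that matches "fast when run manually, slow from the application" is a prepared statement whose cached generic plan cannot prune the 16 hash partitions at plan time, so every execution touches all partitions. A sketch to check for it; the table, column, and parameter are placeholders:

-- Mimic what a pooled application does: after several executions the planner may
-- switch to a generic plan in which partition pruning happens only at run time.
PREPARE fetch_one (bigint) AS
    SELECT * FROM orders WHERE order_id = $1;   -- placeholder table/column

EXPLAIN (ANALYZE, BUFFERS) EXECUTE fetch_one (12345);

-- If the generic plan is the culprit, forcing custom plans restores plan-time
-- pruning; this can be set per session, per role, or in postgresql.conf.
SET plan_cache_mode = force_custom_plan;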