On 9/12/22 1:23 AM, vignesh C wrote:
On Fri, 9 Sept 2022 at 11:12, Amit Kapila <amit.kapil...@gmail.com> wrote:

On Thu, Sep 8, 2022 at 9:32 AM vignesh C <vignes...@gmail.com> wrote:


The attached patch has the changes to handle the same.


Pushed. I am not completely sure whether we want the remaining
documentation patch in this thread in its current form or by modifying
it. Johnathan has shown some interest in it. I feel you can start a
separate thread for it to see if there is any interest in the same and
close the CF entry for this work.

Thanks for pushing the patch. I have closed this entry in commitfest.
I will wait for some more time and see the response regarding the
documentation patch and then start a new thread if required.

I've been testing this patch in advancing of working on the documentation and came across a behavior I wanted to note. Specifically, I am hitting a deadlock while trying to synchronous replicate between the two instances at any `synchronous_commit` level above `local`.

Here is my set up. I have two instances, "A" and "B".

On A and B, run:

  CREATE TABLE sync (id int PRIMARY KEY, info float);
  CREATE PUBLICATION sync FOR TABLE sync;

On A, run:

  CREATE SUBSCRIPTION sync
  CONNECTION 'connstr-to-B'
  PUBLICATION sync
  WITH (
    streaming=true, copy_data=false,
    origin=none, synchronous_commit='on');

On B, run:

  CREATE SUBSCRIPTION sync
  CONNECTION 'connstr-to-A'
  PUBLICATION sync
  WITH (
    streaming=true, copy_data=false,
    origin=none, synchronous_commit='on');

On A and B, run:

  ALTER SYSTEM SET synchronous_standby_names TO 'sync';
  SELECT pg_reload_conf();

Verify on A and B that pg_stat_replication.sync_state is set to "sync"

  SELECT application_name, sync_state = 'sync' AS is_sync
  FROM pg_stat_replication
  WHERE application_name = 'sync';

The next to commands should be run simultaneously on A and B:

-- run this on A
INSERT INTO sync
SELECT x, random() FROM generate_series(1,2000000, 2) x;

-- run this on B
INSERT INTO sync
SELECT x, random() FROM generate_series(2,2000000, 2) x;

This consistently created the deadlock in my testing.

Discussing with Masahiko off-list, this is due to a deadlock from 4 processes: the walsenders on A and B, and the apply workers on A and B. The walsenders are waiting for feedback from the apply workers, and the apply workers are waiting for the walsenders to synchronize (I may be oversimplifying).

He suggested I try the above example instead with `synchronous_commit` set to `local`. In this case, I verified that there is no more deadlock, but he informed me that we would not be able to use cascading synchronous replication when "origin=none".

If we decide that this is a documentation issue, I'd suggest we improve the guidance around using `synchronous_commit`[1] on the CREATE SUBSCRIPTION page, as the GUC page[2] warns against using `local`:

"The setting local causes commits to wait for local flush to disk, but not for replication. This is usually not desirable when synchronous replication is in use, but is provided for completeness."

Thanks,

Jonathan

[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] https://www.postgresql.org/docs/devel/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to