On 2016-01-09 19:30, Steve Singer wrote:\
I am going to send my comments/issues out in batches as I find them
instead of waiting till I look over everything.


Thanks for looking at this! Yes going in batches/steps makes sense, this is huge patch.


I find this part of the documentation a bit unclear


+Once the provider node is setup, subscribers can be subscribed to it.
First the
+subscriber node must be created:
+
+    SELECT pglogical.create_node(
+        node_name := 'subscriber1',
+        dsn := 'host=thishost port=5432 dbname=db'
+    );
+

My initial reading was that I should execute this on the provider node.
Perhaps instead
-----------------
Once the provider node is setup you can then create subscriber nodes.
Create the subscriber nodes and
then execute the following commands on each subscriber node

create extension pglogical

select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost
dbname=db port=5432');

-------------------

Makes sense I guess, this is probably relic of how this internally evolved (we used to have providers and subscribers before we merged them into nodes).


Also the documentation for create_subscription talks about

+  - `synchronize_structure` - specifies if to synchronize structure from
+    provider to the subscriber, default true


Not sure what's your comment on this.



I did the following

test2=# select pglogical.create_subscription(subscription_name:='default
sub',provider_dsn:='host=localhost dbname=test1 port=5436');
  create_subscription
---------------------
            247109879


Which then resulted in the following showing up in my  PG log

LOG:  worker process: pglogical apply 16542:247109879 (PID 4079) exited
with exit code 1
ERROR:  replication slot name "pgl_test2_test1_default sub" contains
invalid character
HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.
FATAL:  could not send replication command "CREATE_REPLICATION_SLOT
"pgl_test2_test1_default sub" LOGICAL pglogical_output": status
PGRES_FATAL_ERROR: ERROR:  replication slot name
"pgl_test2_test1_default sub" contains invalid character
     HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.


The create_subscription command should check if the subscription name is
valid (meets the rules that will be applied against the slot command).


Yes, fixed. Also added some other sensitization code since we also use dbname in slot name and that can contain whatever.

I wondered how I could fix my mistake.

The docs say

+- `pglogical.pglogical_drop_subscription(subscription_name name,
ifexists bool)`
+  Disconnects the subscription and removes it from the catalog.
+

test2=# select pglogical.pglogical_drop_subscription('default sub', true);
ERROR:  function pglogical.pglogical_drop_subscription(unknown, boolean)
does not exist


The command is actually called pglogical.drop_subscription the docs
should be fixed to show the actual command name


Yep, got this from other people as well, fixed.


I then wanted to add a second table to my database. ('b').

select pglogical.replication_set_add_table('default','public.b',true);
  replication_set_add_table
---------------------------
  t
(1 row)

In my pglog I then got

LOG:  starting sync of table public.b for subscriber defaultsub
ERROR:  replication slot name "pgl_test2_test1_defaultsub_public.b"
contains invalid character
HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.
FATAL:  could not send replication command "CREATE_REPLICATION_SLOT
"pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status
PGRES_FATAL_ERROR: ERROR:  replication slot name
"pgl_test2_test1_defaultsub_public.b" contains invalid character
     HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.


Right, needed the sensitization as well (I am actually using the hash now as there is only 8 chars left anyway).


I then did

test1=# select
pglogical.replication_set_remove_table('default','public.b');
  replication_set_remove_table
------------------------------
  t
(1 row)


but my log still keep repeating the error, so I tried connecting to the
replica and did the same

test2=# select
pglogical.replication_set_remove_table('default','public.b');
ERROR:  replication set mapping -303842815:16726 not found

Is there any way to recover from this situation?


Not really, there is no api yet to remove table from synchronization process so you'd have to manually delete row from pglogical.local_sync_status on subscriber, kill the sync process and remove the slot. I will think about what would be good api to solve this.

The documenation says I can drop a replication set, maybe that will let
replication continue.

+- `pglogical.delete_replication_set(set_name text)`
+  Removes the replication set.
+

select pglogical.delete_replication_set('default');
ERROR:  function pglogical.delete_replication_set(unknown) does not exist
LINE 1: select pglogical.delete_replication_set('default');
                ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.

The function is actually pglogical.drop_replication_set , the docs
should be updated.
(note that didn't fix my problem either but then dropping the
subscription did seem to work).

Yeah it doesn't as the problem is on subscriber, replication sets only affect provider. And fixed the docs.




I then re-added the default set to the origin and resubscribed my replica



test2=#  select
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
dbname=test1 port=5436');
  create_subscription
---------------------
           2974019075


I then saw a bunch of
LOG:  worker process: pglogical apply 16542:2974019075 (PID 26778)
exited with exit code 1
ERROR:  subscriber defaultsub initialization failed during
nonrecoverable step (s), please try the setup again
LOG:  worker process: pglogical apply 16542:2974019075 (PID 26779)
exited with exit code 1

in the log but then those stopped and I see

test2=# select  pglogical.show_subscription_status();
show_subscription_status

--------------------------------------------------------------------------------

--------------------------------------------------
  (defaultsub,down,test1,"host=localhost dbname=test1
port=5436",pgl_test2_test1_
defaultsub,"{default,default_insert_only}",{all})
(1 row)


I'm not really sure what to do to 'recover' my cluster at this point so
I'll send this off and rebuild my cluster and start over.


I think the problem here is that you resubscribed with syncrhonize_structure := true while the conflicting structure already existed, that option only works correctly when there is no conflicting structure (we don't try to make diffs or anything, just dump/restore). Recovering should be drop the uninitialized subscription and create new one where you don't synchronize structure.


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to