On 2016-01-09 19:30, Steve Singer wrote:\
I am going to send my comments/issues out in batches as I find them instead of waiting till I look over everything.
Thanks for looking at this! Yes going in batches/steps makes sense, this is huge patch.
I find this part of the documentation a bit unclear +Once the provider node is setup, subscribers can be subscribed to it. First the +subscriber node must be created: + + SELECT pglogical.create_node( + node_name := 'subscriber1', + dsn := 'host=thishost port=5432 dbname=db' + ); + My initial reading was that I should execute this on the provider node. Perhaps instead ----------------- Once the provider node is setup you can then create subscriber nodes. Create the subscriber nodes and then execute the following commands on each subscriber node create extension pglogical select pglogical.create_node(node_name:='subsriberX',dsn:='host=thishost dbname=db port=5432'); -------------------
Makes sense I guess, this is probably relic of how this internally evolved (we used to have providers and subscribers before we merged them into nodes).
Also the documentation for create_subscription talks about + - `synchronize_structure` - specifies if to synchronize structure from + provider to the subscriber, default true
Not sure what's your comment on this.
I did the following test2=# select pglogical.create_subscription(subscription_name:='default sub',provider_dsn:='host=localhost dbname=test1 port=5436'); create_subscription --------------------- 247109879 Which then resulted in the following showing up in my PG log LOG: worker process: pglogical apply 16542:247109879 (PID 4079) exited with exit code 1 ERROR: replication slot name "pgl_test2_test1_default sub" contains invalid character HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character. FATAL: could not send replication command "CREATE_REPLICATION_SLOT "pgl_test2_test1_default sub" LOGICAL pglogical_output": status PGRES_FATAL_ERROR: ERROR: replication slot name "pgl_test2_test1_default sub" contains invalid character HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character. The create_subscription command should check if the subscription name is valid (meets the rules that will be applied against the slot command).
Yes, fixed. Also added some other sensitization code since we also use dbname in slot name and that can contain whatever.
I wondered how I could fix my mistake. The docs say +- `pglogical.pglogical_drop_subscription(subscription_name name, ifexists bool)` + Disconnects the subscription and removes it from the catalog. + test2=# select pglogical.pglogical_drop_subscription('default sub', true); ERROR: function pglogical.pglogical_drop_subscription(unknown, boolean) does not exist The command is actually called pglogical.drop_subscription the docs should be fixed to show the actual command name
Yep, got this from other people as well, fixed.
I then wanted to add a second table to my database. ('b'). select pglogical.replication_set_add_table('default','public.b',true); replication_set_add_table --------------------------- t (1 row) In my pglog I then got LOG: starting sync of table public.b for subscriber defaultsub ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b" contains invalid character HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character. FATAL: could not send replication command "CREATE_REPLICATION_SLOT "pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status PGRES_FATAL_ERROR: ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b" contains invalid character HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.
Right, needed the sensitization as well (I am actually using the hash now as there is only 8 chars left anyway).
I then did test1=# select pglogical.replication_set_remove_table('default','public.b'); replication_set_remove_table ------------------------------ t (1 row) but my log still keep repeating the error, so I tried connecting to the replica and did the same test2=# select pglogical.replication_set_remove_table('default','public.b'); ERROR: replication set mapping -303842815:16726 not found Is there any way to recover from this situation?
Not really, there is no api yet to remove table from synchronization process so you'd have to manually delete row from pglogical.local_sync_status on subscriber, kill the sync process and remove the slot. I will think about what would be good api to solve this.
The documenation says I can drop a replication set, maybe that will let replication continue. +- `pglogical.delete_replication_set(set_name text)` + Removes the replication set. + select pglogical.delete_replication_set('default'); ERROR: function pglogical.delete_replication_set(unknown) does not exist LINE 1: select pglogical.delete_replication_set('default'); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. The function is actually pglogical.drop_replication_set , the docs should be updated. (note that didn't fix my problem either but then dropping the subscription did seem to work).
Yeah it doesn't as the problem is on subscriber, replication sets only affect provider. And fixed the docs.
I then re-added the default set to the origin and resubscribed my replica test2=# select pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost dbname=test1 port=5436'); create_subscription --------------------- 2974019075 I then saw a bunch of LOG: worker process: pglogical apply 16542:2974019075 (PID 26778) exited with exit code 1 ERROR: subscriber defaultsub initialization failed during nonrecoverable step (s), please try the setup again LOG: worker process: pglogical apply 16542:2974019075 (PID 26779) exited with exit code 1 in the log but then those stopped and I see test2=# select pglogical.show_subscription_status(); show_subscription_status -------------------------------------------------------------------------------- -------------------------------------------------- (defaultsub,down,test1,"host=localhost dbname=test1 port=5436",pgl_test2_test1_ defaultsub,"{default,default_insert_only}",{all}) (1 row) I'm not really sure what to do to 'recover' my cluster at this point so I'll send this off and rebuild my cluster and start over.
I think the problem here is that you resubscribed with syncrhonize_structure := true while the conflicting structure already existed, that option only works correctly when there is no conflicting structure (we don't try to make diffs or anything, just dump/restore). Recovering should be drop the uninitialized subscription and create new one where you don't synchronize structure.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers