[GENERAL] bdr_init_copy fails when starting 2nd BDR node

2014-12-30 Thread John Casey
I've been having issues while attempting to begin BDR replication. If I set
up the main node, then use bdr_init_copy, it always fails on second node, as
shown below.

 

postgres$ rm -Rf $PGDATA

postgres$ echo db_password | pg_basebackup -X stream -h main_node_ip -p 5432
-U username -D $PGDATA

postgres$ cp $HOME/backup/postgresql.conf $PGDATA

postgres$ bdr_init_copy -U username -D $PGDATA

bdr_init_copy: starting...

Assigning new system identifier: 6098464173726284030...

Creating primary replication slots...

Creating restore point...

Could not connect to the remote server: could not connect to server: No such
file or directory

Is the server running locally and accepting

connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

 

If I start both servers simply by using pg_ctl using conf set up for
replication, I get the following error on the main node:.

 

LOG:  starting background worker process "bdr
(6098483684958107256,1,16384,): dr: apply"

CONTEXT:  slot "bdr_16384_6098483684958107256_1_16384__", output plugin
"bdr", in the startup callback

ERROR:  data stream ended

LOG:  worker process: bdr (6098483684958107256,1,16384,): dr: apply (PID
6294) exited with exit code 1

 

. and, I get the following error on the second node:

 

ERROR:  bdr output plugin: slot creation rejected, bdr.bdr_nodes entry for
local node (sysid=6098483778037269710, timelineid=1, dboid=16384):
status='i', bdr still starting up: applying initial dump of remote node

HINT:  Monitor pg_stat_activity and the logs, wait until the node has caught
up

CONTEXT:  slot "bdr_16384_6098483684958107256_1_16384__", output plugin
"bdr", in the startup callback

LOG:  could not receive data from client: Connection reset by peer

 

It will keep cycling these errors indefinitely.

 

I have gotten this working off and on; but, I keep running into this issue.
I am on CentOS 6.5.  Both servers can execute psql against the databases on
other nodes when not configured for replication, so it is not a connectivity
or firewall issue. I have installed using the beta2 rpm as well as built it
from source for rc1 (bdr stable).

 

Any ideas?

 



Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

2014-12-30 Thread John Casey

> What was your bdr config at this point? The error message indicates that
it tries to
> connect to port 5432 on localhost - but the copy was taken from
'main_node_ip'.
> Perhaps you forgot to specify the ehost in the config?

# Here is my conf on the DR server (where I am running bdr_init_copy)
bdr.connections = 'primary'
bdr.primary_dsn = 'dbname=my_db host=primary_ip user=my_username  port=5432'
bdr.primary_init_replica = on
bdr.primary_replica_local_dsn = 'dbname=my_db user=my_username port=5432'

# For reference, here is the conf on my Primary server:
bdr.connections = 'dr'
bdr.dr_dsn = 'dbname=my_db host=dr_ip user=my_username  port=5432'

> What does 'git describe --tags' return?

bdr-pg/REL9_4beta3-1-120-ga2725dd

-Original Message-
From: Andres Freund [mailto:and...@2ndquadrant.com] 
Sent: Tuesday, December 30, 2014 12:57 PM
To: John Casey
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

Hi,

On 2014-12-29 23:51:05 -0500, John Casey wrote:
> I've been having issues while attempting to begin BDR replication. If 
> I set up the main node, then use bdr_init_copy, it always fails on 
> second node, as shown below.
> 
>  
> 
> postgres$ rm -Rf $PGDATA
> 
> postgres$ echo db_password | pg_basebackup -X stream -h main_node_ip 
> -p 5432 -U username -D $PGDATA
> 
> postgres$ cp $HOME/backup/postgresql.conf $PGDATA
> 
> postgres$ bdr_init_copy -U username -D $PGDATA
> 
> bdr_init_copy: starting...
> 
> Assigning new system identifier: 6098464173726284030...
> 
> Creating primary replication slots...
> 
> Creating restore point...
> 
> Could not connect to the remote server: could not connect to server: 
> No such file or directory
> 
> Is the server running locally and accepting
> 
> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

What was your bdr config at this point? The error message indicates that it
tries to connect to port 5432 on localhost - but the copy was taken from
'main_node_ip'. Perhaps you forgot to specify th ehost in the config?

What does 'git describe --tags' return?

Greetings,

Andres Freund



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Loading Data Dumps for Tables Containing BDR Global Sequence

2015-01-05 Thread John Casey
We have been trying to load our existing database that contains local
sequences into a BDR-based database with global sequences. So, we update our
schema in the new database to USING bdr for all sequences. We make certain
all tables have a primary key. Then we dump our data from our existing
database.

 

We dump our existing database, including schema. We update the existing
local sequences to USING bdr and remove the other initialization parameters.
Then we must remove all setval's that occur after the COPY statements;
because, BDR does not like them. We can load this data; but, the global
sequences are all initialized to 1 on our primary database and 15000 on our
DR database.

 

We have went so far as to write a program that selects the max(id) on each
table; then, it calls nextval until it reaches a value greater to or equal
the max(id). This has to be run on all BDR nodes. Needless to say, this is
slow and is a bad solution.

 

There just doesn't seem to be a good way to accomplish this operation, at
least not a documented way I have been able to find. 



Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

2015-01-05 Thread John Casey
I'm still experiencing similar problems. I'm not certain what parameter you
are referring to when you say 'ehost'. Otherwise,  I did want to clarify a
couple of things. I have tried several combinations, each one fails in
various ways. So ...

(1) What is the exact syntax when calling bdr_init_copy from new nodes when
your database name is not 'postgres' and your user name is not 'postgres'.
Please note if you supply local or remote host/port in the command.
(2) Should you do a pg_ctl start on new node before trying to execute
bdr_init_copy. If I don't I get the error I posted earlier.

I've attached the new nodes (dr) postgresql.conf file.

-Original Message-
From: 'Andres Freund' [mailto:and...@2ndquadrant.com] 
Sent: Wednesday, December 31, 2014 5:04 AM
To: John Casey
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] bdr_init_copy fails when starting 2nd BDR node

Hi,

On 2014-12-30 21:12:17 -0500, John Casey wrote:
> 
> > What was your bdr config at this point? The error message indicates 
> > that
> it tries to
> > connect to port 5432 on localhost - but the copy was taken from
> 'main_node_ip'.
> > Perhaps you forgot to specify the ehost in the config?
> 
> # Here is my conf on the DR server (where I am running bdr_init_copy) 
> bdr.connections = 'primary'
> bdr.primary_dsn = 'dbname=my_db host=primary_ip user=my_username
port=5432'
> bdr.primary_init_replica = on
> bdr.primary_replica_local_dsn = 'dbname=my_db user=my_username port=5432'

My guess is that this is the source of the problem - you probably have one
system and one self compiled libpq around or something similar and they
disagree about the location of the unix socket directory. It complains
about:

> > connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

which means given the above configuration it has to be
primary_replica_local_dsn. Could you a) try to explicitly set
unix_socket_directory=/tmp in postgresql.conf and host=/tmp in the above
config?

Also, please attach postgresql.conf.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


postgresql.conf
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] ALTER TABLE to ADD BDR global sequence

2015-01-05 Thread John Casey
While attempting to alter a table to add a global sequence as a primary key
using the following commands:

 

CREATE SEQUENCE my_table_id_seq USING bdr;

ALTER TABLE my_table

ADD COLUMN id integer PRIMARY KEY NOT NULL DEFAULT
nextval('my_table_id_seq'::regclass);

 

I started to notice some issues that concerned me. In order to create the
sequence, I had to have the replication running. To alter the table, I had
to stop replication. The only way I really knew how to do this was to remove
the bdr properties in the postgres configuration file and restart. At that
point, I executed the ALTER TABLE code, when it got to 15000 records,
Postgres informed me that I needed to turn replication back on so the nodes
could agree on additional sequence allocations. When I turned it back on, it
just kind-of wigged out.

 

So, how is this supposed to work? In addition, what happens when you have
very disparate databases that are both updated often and connected
occasionally (which is what we have). Will it quit doing inserts until it is
connected to the other databases again? That would be really bad.



Re: [GENERAL] ALTER TABLE to ADD BDR global sequence

2015-01-07 Thread John Casey
I have been thinking about an alternate means of implementing global sequences 
that I feel would simplify things. 

Rather than chunking out blocks, set an increment value for each sequence equal 
to the number of nodes in the "cluster". Each node has an offset. So, if you 
have 10 nodes, mode 1 has offset 0 and node 10 has offset 9. The first time a 
nextval is requested on a sequence after starting up, it makes certain that its 
value is set where val mod 10 is equal to the nodes offset. If not, it 
increments up to that value. From that point forward, sequences are incremented 
by 10 each time.

This would work even if you added new nodes, and you could add intelligence to 
support setting proper initial sequence values when tables are altered.

I may be overlooking something; but, it seems like a fairly simple solution 
that would work.


On Jan 7, 2015, at 8:25 AM, Craig Ringer  wrote:

> On 01/04/2015 12:33 AM, John Casey wrote:
> While attempting to alter a table to add a global sequence as a primary
> key using the following commands:
> 
> CREATE SEQUENCE my_table_id_seq USING bdr;
> 
> ALTER TABLE my_table
> 
> ADD COLUMN id integer PRIMARY KEY NOT NULL DEFAULT
> nextval('my_table_id_seq'::regclass);

> I started to notice some issues that concerned me. In order to create
> the sequence, I had to have the replication running. To alter the table,
> I had to stop replication. The only way I really knew how to do this was
> to remove the bdr properties in the postgres configuration file and
> restart. At that point, I executed the ALTER TABLE code, when it got to
> 15000 records, Postgres informed me that I needed to turn replication
> back on so the nodes could agree on additional sequence allocations.

In general global sequences don't play well with full table rewrites.
That's why BDR prevents the full table rewrite.

What you need to do is do the ALTER without the NOT NULL DEFAULT. Then
ALTER to add the DEFAULT so new rows get it. Now UPDATE the table in
chunks to allocate IDs where they're null. Finally, once it's fully
populated, ALTER it to add the NOT NULL DEFAULT (...) .

To get rid of the need for this it'd have to be possible to allow
blocking nextval(..) on global sequences, which for internal reasons is
way more complicated than you might expect.

> When I turned it back on, it just kind-of wigged out.

Disabling replication during sequence voting isn't something that's been
specifically tested for. Some details on "wigged out" would be useful,
though.

In general, once BDR is active it's not a great idea to disable it, make
changes, then re-activate it on a database.

> So, how is this supposed to work?

As above - create the sequence, populate IDs in batches, then set the
default and not-null constraint at the end.

> In addition, what happens when you
> have very disparate databases that are both updated often and connected
> occasionally (which is what we have). Will it quit doing inserts until
> it is connected to the other databases again? That would be really bad.

If you're relying on global sequences and your write rates are fairly
high but your databases are only intermittently connected then yes,
you're probably going to have times where you run out of allocated
sequence values.

You may want to use UUID keys instead, or one of the other conventional
approaches.

Down the track some more control over global sequences might be possible
- controlling how early new chunks are allocated, how big the chunks
are, etc. At the moment it's all pretty fixed, and it's really suited to
systems where they're connected most of the time.

-- 
Craig Ringer   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general