hello ...

as my day has worked out quite nicely up to know i thought to f... it up and post a new concept which has been requested by a customer. the goal is to collect some feedback, ideas and so on (not to be mixed up with "flames"). we have funding for this and we are trying to sort out how to do it the best way. comments are welcome ...
note, this is a first draft i want to refine based on some comments.
here we go ...


Partial WAL Replication for PostgreSQL:
---------------------------------------

As of now the PostgreSQL community has provided patches and functionalities
which allow full WAL-based replication as well as hot-standby. To extend this
functionality and to make PostgreSQL even more suitable for "enterprise"
computing than it is today, we have the commitment of a sponsor to fund partial
replication for PostgreSQL 8.5 / 8.6.

This is the first draft of a proposal to make partial WAL-based replication work and to provide and additional set of fancy features to the community which has
been waiting for real in-core replication for a decade or more.


Why partial replication?
------------------------

In some cases people have master servers which contain enormous amounts of data (XX TB or so). If more than just one replica of this data is needed it might happen that different slaves are used for different purposes. This implies that
not all data will be used by all maschines.
An example: Consider a server at a phone company collecting phone calls, billing data, and maybe network routing data. Data is used by different department and
one maschine is not enough to serve all three departments. With the new
functionality proposed here we could make 3 replicas each holding just a group of tables for specific tasks thus allowing people to buy cheaper hardware for
slaves and use more maschines instead.


Current status:
---------------

Hot-standy and streaming replication have been a huge leap step forward for the community and what is proposed here will be an extension to those patches and functionalities. This concept is NOT aimed to replace anything - it is mainly an
addon.


Nodes and replication filters:
------------------------------

As of 8.4 standby systems are done by creating an archive_command along with a
base backup. Although it is easy to do some users still reported some
difficulties due to a total misunderstanding of PITR.

The idea is to add a functionality to add slaves like this:

CREATE REPLICA node_name
   CONNECT FROM SLAVE 'connect_string'
   TRANSFER COMMAND 'command'
   [ USING replication_filter ];

'command' would be any shell script copying data from the local master to the new database node called node_name. Replication filters can be used to make X
replicas contain the same tables. Filtersets can be created like this:

CREATE REPLICATION FILTER filter_name
   [ EMPTY | FULL ] [ INCLUDE | EXCLUDE CHANGES ];

Replication filters can be modified ...

ALTER REPLICATION FILTER filter_name RENAME TO new_filtername;
ALTER REPLICATION FILTER filter_name
   { ADD | REMOVE } { TABLE | INDEX | SEQUENCE } object;

Filter sets can be dropped like this ...

DROP REPLICATION FILTER filter_name;

Internally CREATE REPLICA would initiate a base backup to the new slave server just like we would do it manually otherwise. The server would automatically use the user defined 'command' to copy one file after the other to the slave box. The idea is basically stolen from archive_command and friends. At this stage we either copy the entire instance as we would do it with a normal base backup or
just what is needed (defined by the replication filter). Users would
automatically only copy data to a slave which is really needed there and which matches their filter config. If the copy is done, we can register the new node
inside a system table and commit the transaction. Also, we can automatically
create a useful recovery.conf setup - we know how to connect from the slave to
the master (we can use ' CONNECT FROM SLAVE [ USING ] ' to write a proper
recovery.conf file).

Tables can easily be added or removed from a replication filter with ALTER
REPLICATION FILTER.

Replicas can be removed easily:

DROP REPLICA node_name;

Why SQL to add a node? We are convinced that this is the most simplistic way of
doing things.  It is the most intuitive way of doing things.  We believe it
gives users a real feeling of simplicity. The current way of doing base backups should stay in place as it is - it has proven to be nice for countless tasks. However, it is not suitable for managing 10 or more replicas easily. Especially
not when they are not full blown copies of the master.


Technical ideas:
----------------

System tables:

We suggest to always replicate the entire system catalog. It woulde be a total disaster to try some other implementation. The same applies for other tables - we
always replicate entire tables; no WHERE-clauses allowed when it comes to
replicating any table.
How can a query on the slave figure out if a table is around? The slave just to know "who it is". Then it can lookup easily from the replication filter it is using if a table is actually physically in place or not. If a table is not in
place, we can easily error out.


Remove a table from the slave:

This is not too hard; the master received the command to kill a table the slave. We will send a request to remove all storage files related to a table and adjust the replication filter to make sure that the slave will not replay content of
this table anymore.


Add a table to a slave:

This is slightly more tricky. We start collecting WAL for a table, stop shipping WAL, use the TRANSFER COMMAND to copy the files related to the table added and
resume recovery / sending once the storage file is on the slave.


Addition stuff:

Of course there are many more consistency considerations here. We cannot
replicate an index if the table is not present, etc.


   many thanks,

      hans


--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to