[HACKERS] draft RFC: concept for partial, wal-based replication

Hans-Juergen Schoenig -- PostgreSQL Fri, 30 Oct 2009 09:19:47 -0700

hello ...

as my day has worked out quite nicely up to now i thought to f... it up
and post a new concept which has been requested by a customer. the goal
is to collect some feedback, ideas and so on (not to be mixed up with
"flames"). we have funding for this and we are trying to sort out how to
do it the best way. comments are welcome ...

note, this is a first draft i want to refine based on some comments.
here we go ...



Partial WAL Replication for PostgreSQL:
---------------------------------------

As of now the PostgreSQL community has provided patches and functionalities
which allow full WAL-based replication as well as hot-standby. To extend
this functionality and to make PostgreSQL even more suitable for
"enterprise" computing than it is today, we have the commitment of a sponsor
to fund partial replication for PostgreSQL 8.5 / 8.6.

This is the first draft of a proposal to make partial WAL-based replication
work and to provide an additional set of fancy features to the community,
which has been waiting for real in-core replication for a decade or more.


Why partial replication?
------------------------

In some cases people have master servers which contain enormous amounts of
data (XX TB or so). If more than just one replica of this data is needed it
might happen that different slaves are used for different purposes. This
implies that not all data will be used by all machines.

An example: Consider a server at a phone company collecting phone calls,
billing data, and maybe network routing data. The data is used by different
departments and one machine is not enough to serve all three departments.
With the new functionality proposed here we could make 3 replicas, each
holding just a group of tables for specific tasks, thus allowing people to
buy cheaper hardware for slaves and use more machines instead.


Current status:
---------------

Hot-standby and streaming replication have been a huge leap forward for the
community and what is proposed here will be an extension to those patches
and functionalities. This concept is NOT aimed to replace anything - it is
mainly an addon.


Nodes and replication filters:
------------------------------

As of 8.4 standby systems are set up by creating an archive_command along
with a base backup. Although this is easy to do, some users still reported
difficulties due to a total misunderstanding of PITR.

The idea is to add functionality to add slaves like this:

CREATE REPLICA node_name
   CONNECT FROM SLAVE 'connect_string'
   TRANSFER COMMAND 'command'
   [ USING replication_filter ];

'command' would be any shell script copying data from the local master to
the new database node called node_name. Replication filters can be used to
make X replicas contain the same tables. Filter sets can be created like
this:

CREATE REPLICATION FILTER filter_name
   [ EMPTY | FULL ] [ INCLUDE | EXCLUDE CHANGES ];

Replication filters can be modified ...

ALTER REPLICATION FILTER filter_name RENAME TO new_filtername;
ALTER REPLICATION FILTER filter_name
   { ADD | REMOVE } { TABLE | INDEX | SEQUENCE } object;

Filter sets can be dropped like this ...

DROP REPLICATION FILTER filter_name;

Internally CREATE REPLICA would initiate a base backup to the new slave
server just like we would do it manually otherwise. The server would
automatically use the user defined 'command' to copy one file after the
other to the slave box. The idea is basically stolen from archive_command
and friends. At this stage we either copy the entire instance as we would do
it with a normal base backup or just what is needed (defined by the
replication filter). Users would automatically only copy data to a slave
which is really needed there and which matches their filter config. Once the
copy is done, we can register the new node inside a system table and commit
the transaction. Also, we can automatically create a useful recovery.conf
setup - we know how to connect from the slave to the master (we can use
' CONNECT FROM SLAVE [ USING ] ' to write a proper recovery.conf file).
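
To make the two mechanical pieces above a bit more concrete, here is a
minimal Python sketch (not part of the proposal itself). It assumes the
TRANSFER COMMAND would use the same %p / %f placeholders as archive_command,
and it assumes recovery.conf settings named standby_mode and
primary_conninfo; both are assumptions made for illustration, as are the
host name and paths in the usage lines.

```python
def expand_transfer_command(template, path, filename):
    """Expand %p, %f and %% in a TRANSFER COMMAND template.

    Assumption: the placeholders mirror archive_command, where %p is the
    path of the file to copy and %f is its bare file name.
    """
    replacements = {"p": path, "f": filename, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in replacements:
            out.append(replacements[template[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def render_recovery_conf(connect_string, restore_command=None):
    """Build recovery.conf text for the new slave (setting names assumed)."""
    def quote(value):
        # Values are single-quoted; embedded single quotes are doubled.
        return "'" + value.replace("'", "''") + "'"

    lines = ["standby_mode = 'on'",
             "primary_conninfo = " + quote(connect_string)]
    if restore_command is not None:
        lines.append("restore_command = " + quote(restore_command))
    return "\n".join(lines) + "\n"


# Hypothetical usage: one storage file copied to a slave called slave1.
print(expand_transfer_command("scp %p slave1:/pgdata/%p",
                              "base/16384/16385", "16385"))
print(render_recovery_conf("host=master port=5432 user=repl"))
```

The server would run the expanded command once per file selected by the
replication filter and write the generated file on the slave before
recovery starts.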

Tables can easily be added or removed from a replication filter with ALTER
REPLICATION FILTER.

Replicas can be removed easily:

DROP REPLICA node_name;

Why SQL to add a node? We are convinced that this is the simplest and most
intuitive way of doing things. We believe it gives users a real feeling of
simplicity. The current way of doing base backups should stay in place as it
is - it has proven to be nice for countless tasks. However, it is not
suitable for managing 10 or more replicas easily, especially not when they
are not full blown copies of the master.


Technical ideas:
----------------

System tables:

We suggest to always replicate the entire system catalog. It would be a
total disaster to try some other implementation. The same applies for other
tables - we always replicate entire tables; no WHERE-clauses allowed when it
comes to replicating any table.

How can a query on the slave figure out if a table is around? The slave just
has to know "who it is". Then it can easily look up from the replication
filter it is using whether a table is actually physically in place or not.
If a table is not in place, we can easily error out.
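
The lookup described here can be sketched in a few lines of Python. This is
only an illustration under assumptions: the class and function names are
made up, a filter is modelled as an EMPTY or FULL starting point plus the
objects added or removed later, and the caller is assumed to know whether a
relation belongs to the system catalog (which is always replicated).

```python
from dataclasses import dataclass, field


class RelationNotReplicated(LookupError):
    """Raised when a query touches a table that is not on this slave."""


@dataclass
class ReplicationFilter:
    name: str
    base: str = "EMPTY"                       # "EMPTY" or "FULL"
    added: set = field(default_factory=set)   # ALTER ... ADD object
    removed: set = field(default_factory=set) # ALTER ... REMOVE object

    def add(self, obj):
        self.added.add(obj)
        self.removed.discard(obj)

    def remove(self, obj):
        self.removed.add(obj)
        self.added.discard(obj)

    def contains(self, relname, is_system_catalog=False):
        if is_system_catalog:
            return True            # the entire catalog is always replicated
        if relname in self.added:
            return True
        if relname in self.removed:
            return False
        return self.base == "FULL"


def check_relation(repl_filter, relname, is_system_catalog=False):
    """Error out if the relation is not physically present on this slave."""
    if not repl_filter.contains(relname, is_system_catalog):
        raise RelationNotReplicated(
            "relation %r is not part of replication filter %r"
            % (relname, repl_filter.name))


# Hypothetical usage for the billing department's slave.
billing = ReplicationFilter("billing_filter")
billing.add("billing_data")
check_relation(billing, "billing_data")       # present, no error
```

Because the catalog is complete on every slave, the planner can still
resolve the table name; only the storage is missing, so the check has to
happen before the relation is opened.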


Remove a table from the slave:

This is not too hard; the master receives the command to remove a table from
the slave. We will send a request to remove all storage files related to the
table and adjust the replication filter to make sure that the slave will not
replay content of this table anymore.


Add a table to a slave:

This is slightly more tricky. We start collecting WAL for the table, stop
shipping WAL, use the TRANSFER COMMAND to copy the files related to the
added table and resume recovery / sending once the storage files are on the
slave.
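
The ordering of these steps matters, so here is a small Python sketch of the
sequence. It is a toy model under assumptions, not the server code: the
WalShipper class, the set of filtered tables and the transfer callback are
all hypothetical stand-ins, and rolling the filter back on a failed copy is
an addition the text above does not spell out.

```python
class WalShipper:
    """Toy stand-in for WAL shipping to one slave."""

    def __init__(self):
        self.paused = False
        self.pending = []   # WAL records held back while shipping is stopped
        self.sent = []      # WAL records already shipped to the slave

    def ship(self, record):
        (self.pending if self.paused else self.sent).append(record)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        self.sent.extend(self.pending)   # ship what piled up, in order
        self.pending.clear()


def add_table_to_slave(table, storage_files, filtered_tables, shipper, transfer):
    """Add one table to a slave: collect WAL, stop, copy files, resume."""
    filtered_tables.add(table)       # 1. start collecting WAL for the table
    shipper.pause()                  # 2. stop shipping WAL
    try:
        for path in storage_files:   # 3. copy the storage files
            transfer(path)
    except Exception:
        filtered_tables.discard(table)   # assumed: undo on a failed copy
        raise
    finally:
        shipper.resume()             # 4. resume recovery / sending


# Hypothetical usage with a transfer callback that only records the path.
copied = []
shipper = WalShipper()
tables = set()
add_table_to_slave("calls", ["base/16384/16390"], tables, shipper, copied.append)
```

WAL generated while the copy runs is queued rather than dropped, so the
slave replays it only after the storage files exist.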


Additional stuff:

Of course there are many more consistency considerations here. We cannot
replicate an index if the table is not present, etc.
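
As one example of such a check, a filter could be validated before it is
used. The Python sketch below assumes a hypothetical mapping from index
names to their tables (in a real server this would come from the system
catalog) and reports every index whose table is missing from the filter.

```python
def find_orphan_indexes(filter_objects, index_to_table):
    """Return indexes in the filter whose table is not in the filter.

    filter_objects: set of object names listed in a replication filter.
    index_to_table: assumed mapping of index name -> owning table name.
    """
    return sorted(
        obj for obj in filter_objects
        if obj in index_to_table and index_to_table[obj] not in filter_objects
    )


# Hypothetical usage: billing_idx is listed, but its table is not.
objects = {"calls", "calls_pkey", "billing_idx"}
owners = {"calls_pkey": "calls", "billing_idx": "billing_data"}
orphans = find_orphan_indexes(objects, owners)
```

ALTER REPLICATION FILTER could run a check like this and reject a change
that would leave an index without its table.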


   many thanks,

      hans


--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
