Andres, nice job on the writeup.
I think one aspect you are missing is that there must be some way for
the multi-masters to re-stabilize their data sets and quantify any data
loss. You cannot do this without some replication intelligence in each
row of each table, so that no matter how disastrous the hardware or
network failure in the cloud, the system can HEAL itself and keep going
with no human beings involved.
I am laying down a standard design pattern of columns for each row:

MKEY  - primary key guaranteed unique across ALL nodes in the CLOUD,
        with NODE information IN THE KEY (A876543 vs B876543, or
        whatever), so keys can be generated whether the network link
        is UP or DOWN
CSTP  - creation timestamp (Unix time)
USTP  - last-update timestamp (Unix time)
UNODE - node that last updated this record
Many applications already need the above information, so we might as
well standardize it so that external replication logic can self-heal;
a rough sketch of such a table follows below.
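
To make the pattern concrete, here is a minimal sketch of what that
column set could look like as PostgreSQL DDL. The table and the
non-replication columns are purely illustrative, and the one-letter
node prefix is just an assumption:

    -- Hypothetical example table carrying the proposed replication columns.
    CREATE TABLE customer (
        mkey    text PRIMARY KEY,   -- node-prefixed key, e.g. 'A876543'
        cstp    bigint NOT NULL,    -- creation timestamp, Unix epoch seconds
        ustp    bigint NOT NULL,    -- last-update timestamp, Unix epoch seconds
        unode   char(1) NOT NULL,   -- node that last updated this row
        name    text,
        balance numeric
    );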
PostgreSQL tables have optional 32-bit int OIDs; you may want to
consider a replication version of that, an ROID (replication object
ID), and then externalize the primary key generation into a loadable
UDF.
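
As a rough sketch only, the externalized key generator could start out
as something like the SQL function below, assuming each node is
configured with its own one-letter prefix; a production version would
more likely be a loadable C UDF:

    -- Per-node sequence feeding the node-prefixed keys.
    CREATE SEQUENCE mkey_seq;

    -- Prepend the node prefix to the next sequence value, e.g.
    -- next_mkey('A') returns 'A876543'.  Passing the prefix as an argument
    -- is an assumption; it could just as well come from a per-node setting.
    CREATE FUNCTION next_mkey(text) RETURNS text AS $$
        SELECT $1 || nextval('mkey_seq')::text;
    $$ LANGUAGE sql;

    -- Usage:
    --   INSERT INTO customer (mkey, cstp, ustp, unode, name)
    --   VALUES (next_mkey('A'), extract(epoch from now())::bigint,
    --           extract(epoch from now())::bigint, 'A', 'test');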
Of course, ALL the nodes must be in contact with each other and must
not allow significant drift between their clocks while operating (NTP
is a starting point).
I just do not know of any other way to add self-healing without the
above information, regardless of whether you hold up transactions for
synchronous replication or let them pass through asynchronously, and
regardless of whether you get your replication data from the WAL
stream or through the client libraries.
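
Just to illustrate the kind of self-heal those columns enable (not a
claim about how your patch would do it): with the remote copy of a
table visible locally, say as remote_customer, a last-update-wins
repair pass boils down to comparing USTP values:

    -- Pull in remote rows whose last update is newer than the local one.
    UPDATE customer AS l
    SET    name    = r.name,
           balance = r.balance,
           ustp    = r.ustp,
           unode   = r.unode
    FROM   remote_customer AS r
    WHERE  l.mkey = r.mkey
      AND  r.ustp > l.ustp;

    -- Rows present only on one side get handled in a separate insert pass
    -- keyed on mkey, and comparing the mkey sets quantifies anything lost.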
Also, your replication model does not really discuss replication
operations across a busted link; where is the intelligence for that in
the operation diagram?
Every time you package replication up into the core, someone has to
tear into that pile to add some extra functionality, so definitely
think about providing sensible hooks so that extra bit of
customization can override the base behavior.
Cheers,
marco
On 9/22/2012 11:00 AM, Andres Freund wrote:
This time I really attached both...