On Wed, Jun 13, 2012 at 9:40 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Hi Merlin,
>
> On Wednesday, June 13, 2012 04:21:12 PM Merlin Moncure wrote:
>> On Wed, Jun 13, 2012 at 6:28 AM, Andres Freund <and...@2ndquadrant.com> wrote:
>> > +synchronized catalog at the decoding site. That adds some complexity to use
>> > +cases like replicating into a different database or cross-version
>> > +replication. For those it is relatively straight-forward to develop a proxy pg
>> > +instance that only contains the catalog and does the transformation to textual
>> > +changes.
>> wow. Anyways, could you elaborate a little on how this proxy
>> instance concept would work?
> To do the decoding into another form you need an up-to-date catalog + correct
> binaries. So the idea would be to have a minimal instance which is just a copy
> of the database with all the tables with an oid < FirstNormalObjectId, i.e.
> only the catalog tables. Then you can apply all xlog changes on system tables
> using the existing infrastructure for HS (or use the command-trigger
> equivalent we need to build for BDR) and decode everything else into the
> ApplyCache just as done in the patch. Then you would fill out the callbacks
> for the ApplyCache (see patches 14/16 and 15/16 for an example) to do whatever
> you want with the data, i.e. generate plain SQL statements or run some
> transform procedure.
>
>> Let's take the case where I have N small-ish schema-identical database
>> shards that I want to aggregate into a single warehouse -- something that
>> HS/SR currently can't do.
>> There's a lot of ways to do that obviously, but assuming the warehouse
>> would have to have a unique schema, could it be done in your
>> architecture?
> Not sure what you mean by the warehouse having a unique schema? It has the
> same schema as the OLTP counterparts? That would obviously be the easy case if
> you take care and guarantee uniqueness of keys upfront. That basically would
> be trivial ;)
by "unique" I meant 'not the same as the shards' -- presumably this would mean
one of: a) each shard's data would live in a private schema, or b) you'd have
one set of tables decorated with an extra shard-identifying column that would
need to be present in all keys to get around uniqueness issues.

> It gets a bit more complex if you need to transform the data for the
> warehouse. I don't plan to put in work to make that possible without some C
> coding (filling out the callbacks and doing the work in there). It shouldn't
> need much though.
>
> Does that answer your question?

yes. Do you envision it would be possible to wrap the ApplyCache callbacks in
a library that could be exposed as an extension? For example, a library that
would stick the replication data into a queue that a userland (non-C) process
could walk, transform, etc.? I know that's vague -- my general thrust here is
that I find the transformation features particularly interesting and I'm
wondering how much C coding would be needed to access them in the long term.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers