Re: [HACKERS] Support for N synchronous standby servers - take 2

Masahiko Sawada Fri, 11 Dec 2015 09:06:09 -0800

On Wed, Dec 9, 2015 at 8:59 PM, Masahiko Sawada <[email protected]> wrote:
> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <[email protected]> 
> wrote:
>> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
>> <[email protected]> wrote:
>>> Oops.
>>>
>>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>>> <[email protected]> wrote in 
>>> <[email protected]>
>>>> Hello,
>>>>
>>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada 
>>>> <[email protected]> wrote in 
>>>> <CAD21AoC=an+dkynwsjp6coz-6qmhxxuenxvpisxgpxcuxmp...@mail.gmail.com>
>>>> > >> One question is that what is different between the leading "n" in
>>>> > >> s_s_names and the leading "n" of "n-priority"?
>>>> > >
>>>> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names
>>>> > > representing an arbitrary integer number and that in "n-priority"
>>>> > > is literally an "n", meaning "a format with any number of
>>>> > > priority hosts" as a whole. As an instance,
>>>> > >
>>>> > > synchronous_replication_method = "n-priority"
>>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>>> > >
>>>> > > I added "n-" of "n-priority" to distinguish with "1-priority" so
>>>> > > if we won't provide "1-priority" for backward compatibility,
>>>> > > "priority" would be enough to represent the type.
>>>> > >
>>>> > > By the way, s_r_method is not essentially necessary but it would
>>>> > > be important to avoid complexity of autodetection of formats
>>>> > > including currently undefined ones.
>>>> >
>>>> > Than you for your explanation, I understood that.
>>>> >
>>>> > It means that the format of s_s_names will be changed, which would be 
>>>> > not good.
>>>>
>>>> I believe that the format of definition of "replication set"(?)
>>>> is not fixed and it would be more complex format to support
>>>> nested definition. This should be in very different format from
>>>> the current simple list of names. This is a selection among three
>>>> or possiblly more disigns in order to be tolerable for future
>>>> changes, I suppose.
>>>>
>>>> 1. Additional formats of definition in future will be stored in
>>>>    elsewhere of s_s_names.
>>>>
>>>> 2. Additional format will be stored in s_s_names, the format will
>>>>    be automatically detected.
>>>>
>>>> 3. (ditto), the format is designated by s_r_method.
>>>>
>>>> 4. Any other way?
>>>>
>>>> I choosed the third way. What do you think about future expansion
>>>> of the format?
>>>>
>>
>> I agree with #3 way and the s_s_name format you suggested.
>> I think that It's extensible and is tolerable for future changes.
>> I'm going to implement the patch based on this idea if other hackers
>> agree with this design.
>>
>
> Please find the attached draft patch which supports multi sync replication.
> This patch adds a GUC parameter synchronous_replication_method, which
> represent the method of synchronous replication.
>
> [Design of replication method]
> synchronous_replication_method has two values; 'priority' and
> '1-priority' for now.
> We can expand the kind of its value (e.g, 'quorum', 'json' etc) in the future.
>
> * s_r_method = '1-priority'
> This method is for backward compatibility, so the syntax of s_s_names
> is same as today.
> The behavior is same as well.
>
> * s_r_method = 'priority'
> This method is for multiple synchronous replication using priority method.
> The syntax of s_s_names is,
>    <number of sync standbys>, <standby name> [, ...]
>
> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
> node3' means that the master waits for  acknowledge from at least 2
> lowest priority servers.
> If 4 standbys(node1 - node4) are available, the master server waits
> acknowledge from 'node1' and 'node2.
> The each status of wal senders are;
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1            | sync
> node2            | sync
> node3            | potential
> node4            | async
> (4 rows)
>
> After 'node2' crashed, the master will wait for acknowledge from
> 'node1' and 'node3'.
> The each status of wal senders are;
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1            | sync
> node3            | sync
> node4            | async
> (3 rows)
>
> [Changing replication method]
> When we want to change the replication method, we have to change the
> s_r_method  at first, and then do pg_reload_conf().
> After changing replication method, we can change the s_s_names.
>
> [Expanding replication method]
> If we want to expand new replication method additionally, we need to
> implement two functions for each replication method:
> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
>   This function obtains the list of standbys considered as synchronous
> at that time, and return its length.
> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>   This function obtains LSNs(write, flush) considered as synced.
>
> Also, this patch debug code is remain yet, you can debug this behavior
> using by enable DEBUG_REPLICATION macro.
>
> Please give me feedbacks.
>


I've attached updated patch.
Please give me feedbacks.

Regards,

--
Masahiko Sawada

000_multi_sync_replication_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for N synchronous standby servers - take 2

Reply via email to