On Wed, Dec 9, 2015 at 8:59 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
>> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>>> Oops.
>>>
>>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
>>> <horiguchi.kyot...@lab.ntt.co.jp> wrote in
>>> <20151117.194010.17198448.horiguchi.kyot...@lab.ntt.co.jp>
>>>> Hello,
>>>>
>>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada
>>>> <sawada.m...@gmail.com> wrote in
>>>> <CAD21AoC=an+dkynwsjp6coz-6qmhxxuenxvpisxgpxcuxmp...@mail.gmail.com>
>>>> > >> One question: what is the difference between the leading "n" in
>>>> > >> s_s_names and the leading "n" of "n-priority"?
>>>> > >
>>>> > > Ah. Sorry for the ambiguous description. The 'n' in s_s_names
>>>> > > represents an arbitrary integer, and the one in "n-priority"
>>>> > > is literally an "n", meaning "a format with any number of
>>>> > > priority hosts" as a whole. For instance,
>>>> > >
>>>> > > synchronous_replication_method = "n-priority"
>>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>>> > >
>>>> > > I added the "n-" of "n-priority" to distinguish it from "1-priority",
>>>> > > so if we don't provide "1-priority" for backward compatibility,
>>>> > > "priority" would be enough to represent the type.
>>>> > >
>>>> > > By the way, s_r_method is not strictly necessary, but it would
>>>> > > be important for avoiding the complexity of autodetecting formats,
>>>> > > including currently undefined ones.
>>>> >
>>>> > Thank you for your explanation, I understood that.
>>>> >
>>>> > It means that the format of s_s_names will be changed, which would
>>>> > not be good.
>>>>
>>>> I believe that the format for defining a "replication set"(?)
>>>> is not fixed yet, and it would need to be a more complex format to
>>>> support nested definitions. That would be a very different format from
>>>> the current simple list of names. So this is a choice among three
>>>> or possibly more designs that can tolerate future changes, I suppose.
>>>>
>>>> 1. Additional definition formats will in future be stored
>>>> somewhere other than s_s_names.
>>>>
>>>> 2. Additional formats will be stored in s_s_names, and the format
>>>> will be detected automatically.
>>>>
>>>> 3. (ditto), and the format is designated by s_r_method.
>>>>
>>>> 4. Any other way?
>>>>
>>>> I chose the third way. What do you think about future expansion
>>>> of the format?
>>>>
>>
>> I agree with option #3 and the s_s_names format you suggested.
>> I think that it's extensible and tolerant of future changes.
>> I'm going to implement the patch based on this idea if other hackers
>> agree with this design.
>>
>
> Please find attached a draft patch which supports multi sync replication.
> This patch adds a GUC parameter, synchronous_replication_method, which
> represents the method of synchronous replication.
>
> [Design of replication method]
> synchronous_replication_method has two values for now: 'priority' and
> '1-priority'.
> We can add more values (e.g. 'quorum', 'json', etc.) in the future.
>
> * s_r_method = '1-priority'
> This method is for backward compatibility, so the syntax of s_s_names
> is the same as today.
> The behavior is the same as well.
>
> * s_r_method = 'priority'
> This method is for multiple synchronous replication using the priority
> method.
> The syntax of s_s_names is,
>    <number of sync standbys>, <standby name> [, ...]
>
> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
> node3' means that the master waits for acknowledgement from the 2
> highest-priority standbys (those listed first).
> If 4 standbys (node1 - node4) are available, the master server waits for
> acknowledgement from 'node1' and 'node2'.
> The status of each WAL sender is:
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
>  application_name | sync_state
> ------------------+------------
>  node1            | sync
>  node2            | sync
>  node3            | potential
>  node4            | async
> (4 rows)
>
> After 'node2' crashes, the master will wait for acknowledgement from
> 'node1' and 'node3'.
> The status of each WAL sender is:
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
>  application_name | sync_state
> ------------------+------------
>  node1            | sync
>  node3            | sync
>  node4            | async
> (3 rows)
>
> [Changing replication method]
> When we want to change the replication method, we have to change
> s_r_method first, and then call pg_reload_conf().
> After changing the replication method, we can change s_s_names.
>
> [Expanding replication method]
> To add a new replication method, we need to implement two functions
> for each replication method:
> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
>   This function obtains the list of standbys considered synchronous
>   at that time, and returns its length.
> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>   This function obtains the LSNs (write, flush) considered as synced.
>
> Also, the debug code still remains in this patch; you can debug this
> behavior by enabling the DEBUG_REPLICATION macro.
>
> Please give me feedback.
>
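
As a rough illustration of the 'priority' method described in the quoted design, here is a small Python model of how the sync set could be derived from s_s_names and the set of connected standbys. This is only a sketch and not code from the patch: the function names parse_s_s_names and sync_states are hypothetical, and treating standbys that are not listed in s_s_names as 'async' is an assumption inferred from the example output above.

```python
def parse_s_s_names(value):
    """Split '<number of sync standbys>, <standby name> [, ...]' into (n, names).

    Hypothetical helper; the patch's real parser lives in C and may differ.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) < 2 or not parts[0].isdigit():
        raise ValueError("expected '<number>, <standby name> [, ...]'")
    return int(parts[0]), parts[1:]


def sync_states(s_s_names, connected):
    """Map each connected standby to 'sync', 'potential' or 'async'."""
    num_sync, names = parse_s_s_names(s_s_names)
    # Listed standbys that are currently connected, in list (priority) order.
    candidates = [name for name in names if name in connected]
    chosen = candidates[:num_sync]
    states = {}
    for name in connected:
        if name in chosen:
            states[name] = "sync"        # among the first num_sync candidates
        elif name in candidates:
            states[name] = "potential"   # listed, but not currently needed
        else:
            states[name] = "async"       # assumption: unlisted means async
    return states


if __name__ == "__main__":
    setting = "2, node1, node2, node3"
    print(sync_states(setting, ["node1", "node2", "node3", "node4"]))
    # Same setting after node2 has disconnected.
    print(sync_states(setting, ["node1", "node3", "node4"]))
```

With s_s_names = '2, node1, node2, node3', the two calls give the same sync_state values as the two pg_stat_replication listings quoted above: node3 moves from 'potential' to 'sync' once node2 is gone.
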
I've attached an updated patch.
Please give me feedback.

Regards,

--
Masahiko Sawada
000_multi_sync_replication_v2.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers