On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> Oops.
>>
>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote in
>> <20151117.194010.17198448.horiguchi.kyot...@lab.ntt.co.jp>
>>> Hello,
>>>
>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada.m...@gmail.com>
>>> wrote in
>>> <CAD21AoC=an+dkynwsjp6coz-6qmhxxuenxvpisxgpxcuxmp...@mail.gmail.com>
>>> > >> One question is: what is the difference between the leading "n" in
>>> > >> s_s_names and the leading "n" of "n-priority"?
>>> > >
>>> > > Ah. Sorry for the ambiguous description. The 'n' in s_s_names
>>> > > represents an arbitrary integer, and the one in "n-priority"
>>> > > is literally an "n", meaning "a format with any number of
>>> > > priority hosts" as a whole. As an instance,
>>> > >
>>> > > synchronous_replication_method = "n-priority"
>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>> > >
>>> > > I added the "n-" of "n-priority" to distinguish it from "1-priority", so
>>> > > if we won't provide "1-priority" for backward compatibility,
>>> > > "priority" would be enough to represent the type.
>>> > >
>>> > > By the way, s_r_method is not essentially necessary, but it would
>>> > > be important for avoiding the complexity of autodetecting formats,
>>> > > including currently undefined ones.
>>> >
>>> > Thank you for your explanation, I understood that.
>>> >
>>> > It means that the format of s_s_names will be changed, which would not be
>>> > good.
>>>
>>> I believe that the format of the definition of a "replication set"(?)
>>> is not fixed, and it would need a more complex format to support
>>> nested definitions. That should be a very different format from
>>> the current simple list of names. This is a selection among three
>>> or possibly more designs in order to be tolerant of future
>>> changes, I suppose.
>>>
>>> 1. Additional formats of definition in future will be stored
>>>    somewhere other than s_s_names.
>>>
>>> 2. Additional formats will be stored in s_s_names, and the format will
>>>    be automatically detected.
>>>
>>> 3. (ditto), but the format is designated by s_r_method.
>>>
>>> 4. Any other way?
>>>
>>> I chose the third way. What do you think about future expansion
>>> of the format?
>>>
>
> I agree with the #3 way and the s_s_names format you suggested.
> I think that it's extensible and tolerant of future changes.
> I'm going to implement the patch based on this idea if other hackers
> agree with this design.
>

Please find the attached draft patch, which supports multiple synchronous
replication.
This patch adds a GUC parameter, synchronous_replication_method, which
represents the method of synchronous replication.

[Design of replication method]
synchronous_replication_method has two values for now: 'priority' and
'1-priority'.
We can add more values (e.g. 'quorum', 'json', etc.) in the future.

* s_r_method = '1-priority'
This method is for backward compatibility, so the syntax of s_s_names
is the same as today.
The behavior is the same as well.

* s_r_method = 'priority'
This method is for multiple synchronous replication using the priority
method.
The syntax of s_s_names is,
   <number of sync standbys>, <standby name> [, ...]

For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
node3' means that the master waits for acknowledgement from the 2
connected standbys with the highest priority, i.e. the ones listed
earliest in s_s_names.
If 4 standbys (node1 - node4) are available, the master server waits for
acknowledgement from 'node1' and 'node2'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order
by application_name;
 application_name | sync_state
------------------+------------
 node1            | sync
 node2            | sync
 node3            | potential
 node4            | async
(4 rows)

After 'node2' crashes, the master will wait for acknowledgement from
'node1' and 'node3'.
The status of each WAL sender is:

=# select application_name, sync_state from pg_stat_replication order
by application_name;
 application_name | sync_state
------------------+------------
 node1            | sync
 node3            | sync
 node4            | async
(3 rows)

[Changing replication method]
To change the replication method, we have to change s_r_method first
and then run pg_reload_conf().
Only after the replication method has been changed can we change
s_s_names.

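To make the 'priority' example above easier to check, here is a rough sketch in Python. It is an illustration only, not code from the patch (which is written in C): the function names parse_s_s_names and sync_states are made up for this sketch, and it assumes that each standby is identified by a unique application_name and that priority follows the order of names in s_s_names.

```python
def parse_s_s_names(value):
    """Split '<number of sync standbys>, <standby name> [, ...]'
    into (count, [names]). Hypothetical helper, not part of the patch."""
    fields = [f.strip() for f in value.split(",")]
    if len(fields) < 2 or not fields[0].isdigit():
        raise ValueError(
            "expected '<number of sync standbys>, <standby name> [, ...]'")
    return int(fields[0]), fields[1:]


def sync_states(s_s_names, connected):
    """Return {application_name: sync_state} for the connected standbys.

    Assumption for this sketch: priority is the position of the name
    in s_s_names, earlier meaning higher priority.
    """
    num_sync, names = parse_s_s_names(s_s_names)
    # Listed standbys that are currently connected, in priority order.
    candidates = [name for name in names if name in connected]
    chosen = candidates[:num_sync]
    states = {}
    for name in connected:
        if name in chosen:
            states[name] = "sync"        # master waits for this standby
        elif name in candidates:
            states[name] = "potential"   # listed, but not needed right now
        else:
            states[name] = "async"       # not listed in s_s_names
    return states


if __name__ == "__main__":
    setting = "2, node1, node2, node3"
    print(sync_states(setting, ["node1", "node2", "node3", "node4"]))
    # The same setting after node2 has gone away.
    print(sync_states(setting, ["node1", "node3", "node4"]))
```

With all four standbys connected, node3 stays 'potential'; once node2 is gone, node3 is promoted to 'sync', which matches the two pg_stat_replication outputs shown above.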
[Expanding replication method]
To add a new replication method, we need to implement two functions for
that method:

* int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
This function obtains the list of standbys considered synchronous at
that time, and returns its length.

* bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
This function obtains the LSNs (write, flush) considered synced.

Also, debug code still remains in this patch; you can debug this
behavior by enabling the DEBUG_REPLICATION macro.

Feedback is welcome.

Regards,

--
Masahiko Sawada

000_multi_sync_replication_v1.patch

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers