On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.m...@gmail.com> >> wrote: >>> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier >>> <michael.paqu...@gmail.com> wrote: >>>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmh...@gmail.com> wrote: >>>>> You could do that, but first I would code up the simplest, cleanest >>>>> algorithm you can think of and see if it even shows up in a 'perf' >>>>> profile. Microbenchmarking is probably overkill here unless a problem >>>>> is visible on macrobenchmarks. >>>> >>>> This is what I would go for! The current code is doing a simple thing: >>>> select the Nth element using qsort() after scanning each WAL sender's >>>> values. And I think that Sawada-san got it right. Even running on my >>>> laptop a pgbench run with 10 sync standbys using a data set that fits >>>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >>>> using perf top on a non-assert, non-debug build. Hash tables and >>>> allocations get a far larger share. Using the patch, >>>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >>>> nodes. Let's kick the ball for now. An extra patch could make things >>>> better later on if that's worth it. >>> >>> Yeah, since the both K and N could be not large these algorithm takes >>> almost the same time. And current patch does simple thing. When we >>> need over 100 or 1000 replication node the optimization could be >>> required. >>> Attached latest v9 patch. >>> >> >> Few comments: > > Thank you for reviewing. > >> + * In 10.0 we support two synchronization methods, priority and >> + * quorum. The number of synchronous standbys that transactions >> + * must wait for replies from and synchronization method are specified >> + * in synchronous_standby_names. This parameter also specifies a list >> + * of standby names, which determines the priority of each standby for >> + * being chosen as a synchronous standby. In priority method, the standbys >> + * whose names appear earlier in the list are given higher priority >> + * and will be considered as synchronous. Other standby servers appearing >> + * later in this list represent potential synchronous standbys. If any of >> + * the current synchronous standbys disconnects for whatever reason, >> + * it will be replaced immediately with the next-highest-priority standby. >> + * In quorum method, the all standbys appearing in the list are >> + * considered as a candidate for quorum commit. >> >> In the above description, is priority method represented by FIRST and >> quorum method by ANY in the synchronous_standby_names syntax? If so, >> it might be better to write about it explicitly. > > Added description. > >> >> 2. >> --- a/src/backend/utils/sort/tuplesort.c >> +++ b/src/backend/utils/sort/tuplesort.c >> @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) >> } >> >> /* >> - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple >> - * from each input tape. >> - */ >> - state->memtupsize = numInputTapes; >> - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); >> - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); >> - >> - /* >> - * Use all the remaining memory we have available for read buffers among >> - * the input tapes. >> + * Use all the spare memory we have available for read buffers among the >> + * input tapes. >> >> This doesn't belong to this patch. > > Oops, fixed. > >> 3. >> + * Return the list of sync standbys using according to synchronous method, >> >> In above sentence, "using according" seems to either incomplete or wrong >> usage. > > Fixed. > >> 4. >> + else >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + "invalid synchronization method is specified \"%d\"", >> + SyncRepConfig->sync_method)); >> >> Here, the error message doesn't seem to aligned and you might want to >> use errmsg for the same. >> > > Fixed. > >> 5. >> + * In priority method, we need the oldest these positions among sync >> + * standbys. In quorum method, we need the newest these positions >> + * specified by SyncRepConfig->num_sync. >> >> /oldest these/oldest of these >> /newest these positions specified/newest of these positions as specified > > Fixed. > >> Instead of newest, can we consider to use latest? > > Yeah, I changed it so. > > >> 6. >> + int sync_method; /* synchronization method */ >> /* member_names contains nmembers consecutive nul-terminated C strings */ >> char member_names[FLEXIBLE_ARRAY_MEMBER]; >> } SyncRepConfigData; >> >> Can't we use 1 or 2 bytes to store sync_method information? > > I changed it to uint8. > > Attached latest v10 patch incorporated the review comments so far. > Please review it.
Thanks for updating the patch! Do we need to update postgresql.conf.sample? +{any_ident} { + yylval.str = pstrdup(yytext); + return ANY; + } +{first_ident} { + yylval.str = pstrdup(yytext); + return FIRST; + } Why is pstrdup(yytext) necessary here? + standby_list { $$ = create_syncrep_config("1", $1, SYNC_REP_PRIORITY); } + | NUM '(' standby_list ')' { $$ = create_syncrep_config($1, $3, SYNC_REP_QUORUM); } + | ANY NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_QUORUM); } + | FIRST NUM '(' standby_list ')' { $$ = create_syncrep_config($2, $4, SYNC_REP_PRIORITY); } Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works differently from curent version while "list" works in the same way as current one) very confusing? I prefer to either of 1. break the backward-compatibility, i.e., treat the first syntax of "standby_list" as quorum commit 2. keep the backward-compatibility, i.e., treat the second syntax of "NUM (standby_list)" as sync rep with the priority + <literal>pg_stat_replication</></link> view). If the keyword + <literal>FIRST</> is specified, other standby servers appearing + later in this list represent potential synchronous standbys. + If any of the current synchronous standbys disconnects for + whatever reason, it will be replaced immediately with the + next-highest-priority standby. Specifying more than one standby + name can allow very high availability. It seems strange to explain the behavior of FIRST before explaining the syntax of synchronous_standby_names and FIRST. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers