On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: >> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmh...@gmail.com> wrote: >>> You could do that, but first I would code up the simplest, cleanest >>> algorithm you can think of and see if it even shows up in a 'perf' >>> profile. Microbenchmarking is probably overkill here unless a problem >>> is visible on macrobenchmarks. >> >> This is what I would go for! The current code is doing a simple thing: >> select the Nth element using qsort() after scanning each WAL sender's >> values. And I think that Sawada-san got it right. Even running on my >> laptop a pgbench run with 10 sync standbys using a data set that fits >> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >> using perf top on a non-assert, non-debug build. Hash tables and >> allocations get a far larger share. Using the patch, >> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >> nodes. Let's kick the ball for now. An extra patch could make things >> better later on if that's worth it. > > Yeah, since the both K and N could be not large these algorithm takes > almost the same time. And current patch does simple thing. When we > need over 100 or 1000 replication node the optimization could be > required. > Attached latest v9 patch. >
Few comments: + * In 10.0 we support two synchronization methods, priority and + * quorum. The number of synchronous standbys that transactions + * must wait for replies from and synchronization method are specified + * in synchronous_standby_names. This parameter also specifies a list + * of standby names, which determines the priority of each standby for + * being chosen as a synchronous standby. In priority method, the standbys + * whose names appear earlier in the list are given higher priority + * and will be considered as synchronous. Other standby servers appearing + * later in this list represent potential synchronous standbys. If any of + * the current synchronous standbys disconnects for whatever reason, + * it will be replaced immediately with the next-highest-priority standby. + * In quorum method, the all standbys appearing in the list are + * considered as a candidate for quorum commit. In the above description, is priority method represented by FIRST and quorum method by ANY in the synchronous_standby_names syntax? If so, it might be better to write about it explicitly. 2. --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) } /* - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple - * from each input tape. - */ - state->memtupsize = numInputTapes; - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); - - /* - * Use all the remaining memory we have available for read buffers among - * the input tapes. + * Use all the spare memory we have available for read buffers among the + * input tapes. This doesn't belong to this patch. 3. + * Return the list of sync standbys using according to synchronous method, In above sentence, "using according" seems to either incomplete or wrong usage. 4. + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + "invalid synchronization method is specified \"%d\"", + SyncRepConfig->sync_method)); Here, the error message doesn't seem to aligned and you might want to use errmsg for the same. 5. + * In priority method, we need the oldest these positions among sync + * standbys. In quorum method, we need the newest these positions + * specified by SyncRepConfig->num_sync. /oldest these/oldest of these /newest these positions specified/newest of these positions as specified Instead of newest, can we consider to use latest? 6. + int sync_method; /* synchronization method */ /* member_names contains nmembers consecutive nul-terminated C strings */ char member_names[FLEXIBLE_ARRAY_MEMBER]; } SyncRepConfigData; Can't we use 1 or 2 bytes to store sync_method information? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers