Incorrect OpenSSL type reference in code comment
The comment which refers to the OpenSSL PEM password callback type has a small typo: the type is called pem_password_cb and not pem_passwd_cb (which is an easy typo to make since, confusingly enough, the functions in OpenSSL are called SSL_*_passwd_cb). PFA patch to fix this. cheers ./daniel pem_password_cb.patch Description: Binary data
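For reference, pem_password_cb is declared in <openssl/pem.h> with the shape shown below; the callback itself is a minimal, hypothetical sketch (the fixed pass phrase is only a placeholder, not part of the patch):

    /*
     * The type the comment should refer to, from <openssl/pem.h>:
     *     typedef int pem_password_cb(char *buf, int size, int rwflag, void *userdata);
     * The callback below is a hypothetical illustration: it copies a fixed
     * pass phrase into OpenSSL's buffer and returns its length.
     */
    #include <string.h>

    static int
    example_pem_password_cb(char *buf, int size, int rwflag, void *userdata)
    {
        const char *secret = "example-pass-phrase";   /* placeholder only */
        int         len = (int) strlen(secret);

        if (len >= size)
            return 0;           /* buffer too small: report failure */
        memcpy(buf, secret, len);
        return len;             /* OpenSSL expects the pass phrase length */
    }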
Re: PG compilation error with Visual Studio 2015/2017/2019
On Wed, May 13, 2020 at 7:34 PM Tom Lane wrote: > > Amit Kapila writes: > > Now that branches are tagged, I would like to commit and backpatch > > this patch tomorrow unless there are any more comments/objections. > > The "quiet period" is over as soon as the tags appear in git. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: PG compilation error with Visual Studio 2015/2017/2019
On Thu, May 14, 2020 at 05:49, Amit Kapila wrote: > On Wed, May 13, 2020 at 7:34 PM Tom Lane wrote: > > > > Amit Kapila writes: > > > Now that branches are tagged, I would like to commit and backpatch > > > this patch tomorrow unless there are any more comments/objections. > > > > The "quiet period" is over as soon as the tags appear in git. > > > > Pushed. > Thank you for the commit. regards, Ranier Vilela
Re: PG compilation error with Visual Studio 2015/2017/2019
On Thu, May 14, 2020 at 11:07 AM Ranier Vilela wrote: > Em qui., 14 de mai. de 2020 às 05:49, Amit Kapila > escreveu: > >> On Wed, May 13, 2020 at 7:34 PM Tom Lane wrote: >> > >> > Amit Kapila writes: >> > > Now that branches are tagged, I would like to commit and backpatch >> > > this patch tomorrow unless there are any more comments/objections. >> > >> > The "quiet period" is over as soon as the tags appear in git. >> > >> >> Pushed. >> > Thank you for the commit. > Great. Thanks to everyone involved.
Re: COPY, lock release and MVCC
On Thu, May 14, 2020 at 2:06 AM Laurenz Albe wrote: > > > I wonder why you have removed the (rel != NULL) check? > > That was just inexcusable sloppiness, nothing more. > LGTM. I have slightly modified the commit message, see if that looks fine to you. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com 0001-Make-COPY-TO-keep-locks-until-the-transaction-end.v3.patch Description: Binary data
Re: pgsql: Show opclass and opfamily related information in psql
Hi! On Tue, May 12, 2020 at 12:09 AM Alvaro Herrera wrote: > On 2020-Mar-08, Alexander Korotkov wrote: > > > Show opclass and opfamily related information in psql > > > > This commit provides psql commands for listing operator classes, operator > > families and its contents in psql. New commands will be useful for > > exploring > > capabilities of both builtin opclasses/opfamilies as well as > > opclasses/opfamilies defined in extensions. > > I had chance to use these new commands this morning. Great, thank you! > Note how operator for strategy 1 are all together, then strategy 2, and > so on. But I think we'd prefer the operators to be grouped together for > the same types (just like \dAp already works); so I would change the clause > from: > ORDER BY 1, 2, o.amopstrategy, 3; > to: > ORDER BY 1, 2, pg_catalog.format_type(o.amoplefttype, NULL), > pg_catalog.format_type(o.amoprighttype, NULL), o.amopstrategy; +1 > Also, while I'm going about this, ISTM it'd make sense to > list same-class operators first, followed by cross-class operators. > That requires to add "o.amoplefttype = o.amoprighttype DESC," after > "ORDER BY 1, 2,". For brin's integer_minmax_ops, the resulting list > would have first (bigint,bigint) then (integer,integer) then > (smallint,smallint), then all the rest: +1 Nikita, what do you think? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgsql: Show opclass and opfamily related information in psql
On 14.05.2020 12:52, Alexander Korotkov wrote: Nikita, what do you think? I agree that this patch is an improvement. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgsql: Show opclass and opfamily related information in psql
On Thu, May 14, 2020 at 1:30 PM Nikita Glukhov wrote: > I agree that this patch is an improvement. OK, I'm going to push this patch if no objections. (Sergey doesn't seem to continue involvement in PostgreSQL development, so it doesn't look like we should wait for him) -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Does TupleQueueReaderNext() really need to copy its result?
On Tue, Aug 27, 2019 at 6:35 AM Andres Freund wrote: > On 2019-08-26 14:09:45 -0400, Robert Haas wrote: > > There's a comment in htup.h which says: > > > > * * Separately allocated tuple: t_data points to a palloc'd chunk that > > * is not adjacent to the HeapTupleData. (This case is deprecated > > since > > * it's difficult to tell apart from case #1. It should be used only > > in > > * limited contexts where the code knows that case #1 will never > > apply.) > > > > I got scared and ran away. > > Perhaps this'd could be sidestepped by funneling through MinimalTuples > instead of HeapTuples. Afaict that should always be sufficient, because > all system column accesses ought to happen below (including being > projected into a separate column, if needed above). With the added > benefit of needing less space, of course. I tried that out (attached). That makes various simple tests like this to go 10%+ faster on my development machine: create table s as select generate_series(1, 5000)::int i, 'hello world' a, 'this is a message' b, 42 c; select pg_prewarm('s'); set force_parallel_mode = on; explain analyze select * from s; PS It looks like the following load of mq_ring_size might be running a little hot due to false sharing with the atomic counters: if (mqh->mqh_consume_pending > mq->mq_ring_size / 4) From 0ba21ee67c9e8f2be404fa7e16d30a815310c52a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 14 May 2020 19:08:50 +1200 Subject: [PATCH] Use MinimalTuple for tuple queues. This saves 8 bytes per tuple in the queue, and avoids the need to copy and free tuples on the receiving side. Gather can emit the returned MinimalTuple directly, but GatherMerge now needs to make an explicit copy because it buffers multiple tuples at a time. Discussion: https://postgr.es/m/CA%2BhUKG%2B8T_ggoUTAE-U%3DA%2BOcPc4%3DB0nPPHcSfffuQhvXXjML6w%40mail.gmail.com --- src/backend/executor/nodeGather.c | 16 +-- src/backend/executor/nodeGatherMerge.c | 40 ++ src/backend/executor/tqueue.c | 30 +-- src/include/executor/tqueue.h | 4 +-- 4 files changed, 46 insertions(+), 44 deletions(-) diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c index 6b8ed867d5..a01b46af14 100644 --- a/src/backend/executor/nodeGather.c +++ b/src/backend/executor/nodeGather.c @@ -46,7 +46,7 @@ static TupleTableSlot *ExecGather(PlanState *pstate); static TupleTableSlot *gather_getnext(GatherState *gatherstate); -static HeapTuple gather_readnext(GatherState *gatherstate); +static MinimalTuple gather_readnext(GatherState *gatherstate); static void ExecShutdownGatherWorkers(GatherState *node); @@ -120,7 +120,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags) * Initialize funnel slot to same tuple descriptor as outer plan. 
*/ gatherstate->funnel_slot = ExecInitExtraTupleSlot(estate, tupDesc, - &TTSOpsHeapTuple); + &TTSOpsMinimalTuple); /* * Gather doesn't support checking a qual (it's always more efficient to @@ -266,7 +266,7 @@ gather_getnext(GatherState *gatherstate) PlanState *outerPlan = outerPlanState(gatherstate); TupleTableSlot *outerTupleSlot; TupleTableSlot *fslot = gatherstate->funnel_slot; - HeapTuple tup; + MinimalTuple tup; while (gatherstate->nreaders > 0 || gatherstate->need_to_scan_locally) { @@ -278,9 +278,9 @@ gather_getnext(GatherState *gatherstate) if (HeapTupleIsValid(tup)) { -ExecStoreHeapTuple(tup, /* tuple to store */ - fslot, /* slot to store the tuple */ - true); /* pfree tuple when done with it */ +ExecStoreMinimalTuple(tup, /* tuple to store */ + fslot, /* slot to store the tuple */ + false); /* don't pfree tuple */ return fslot; } } @@ -308,7 +308,7 @@ gather_getnext(GatherState *gatherstate) /* * Attempt to read a tuple from one of our parallel workers. */ -static HeapTuple +static MinimalTuple gather_readnext(GatherState *gatherstate) { int nvisited = 0; @@ -316,7 +316,7 @@ gather_readnext(GatherState *gatherstate) for (;;) { TupleQueueReader *reader; - HeapTuple tup; + MinimalTuple tup; bool readerdone; /* Check for async events, particularly messages from workers. */ diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c index 317ddb4ae2..47129344f3 100644 --- a/src/backend/executor/nodeGatherMerge.c +++ b/src/backend/executor/nodeGatherMerge.c @@ -45,7 +45,7 @@ */ typedef struct GMReaderTupleBuffer { - HeapTuple *tuple; /* array of length MAX_TUPLE_STORE */ + MinimalTuple *tuple; /* array of length MAX_TUPLE_STORE */ int nTuples; /* number of tuples currently stored */ int readCounter; /* index of next tuple to extract */ bool done; /* true if reader is known
Re: Incorrect OpenSSL type reference in code comment
On 14/05/2020 11:07, Daniel Gustafsson wrote: The comment which refers to the OpenSSL PEM password callback type has a small typo: the type is called pem_password_cb and not pem_passwd_cb (which is an easy typo to make since, confusingly enough, the functions in OpenSSL are called SSL_*_passwd_cb). PFA patch to fix this. Applied, thanks! - Heikki
Re: ldap tls test fails in some environments
Re: Thomas Munro > > 17:28:59 Data directory: > > /<>/build/src/test/ldap/tmp_check/t_001_auth_node_data/pgdata > > I know nothing about Debian package building so I could be missing > something about how this works, but I wonder if our script variable > handling is hygienic enough for paths like that. That's from the sbuild log filter, the special chars are not the problem: 17:06:41 I: NOTICE: Log filtering will replace 'build/postgresql-13-fdfISX/postgresql-13-13~~devel~20200513.1505' with '<>' 17:06:41 I: NOTICE: Log filtering will replace 'build/postgresql-13-fdfISX' with '<>' > > It consistently fails on the build server, but works on my notebook. > > Maybe that simply means slapd is crashing, but there's no slapd > > output. Would it be possible to start slapd with "-d 255", even if > > that means it doesn't background itself? > > That'd require more scripting to put it in the background... Maybe adding "&" is enough, provided it still writes the pid file for shutting down... > This leads me to suspect that something in your build server's > environment that comes later in that list is overridding the > TLS_REQCERT setting. If that's the explanation, perhaps we should do > this, which seems to work OK on my system, since it comes last in the > list: It's not messing with environment variables in that area. I'll see if I can catch a shell in the environment where it fails. Christoph
Re: effective_io_concurrency's steampunk spindle maths
On Wed, May 13, 2020 at 6:58 AM Peter Geoghegan wrote: > Shouldn't you close out the "Should we rename > effective_io_concurrency?" Postgres 13 open item now? Yeah, that doesn't really seem worth the churn. I'll move it to the resolved list in a day or two if no one shows up to argue for a rename.
Re: making update/delete of inheritance trees scale better
On Wed, May 13, 2020 at 9:21 AM Amit Langote wrote: > > Maybe I am misunderstanding you, but the more the rows to update, the > more overhead we will be paying with the new approach. Yes, that's right. How much is that compared to the current planning overhead. How many rows it takes for that overhead to be comparable to the current planning overhead. But let's not sweat on that point much right now. > > So, we will need to do 2 things: > > 1. Implicitly apply an ORDER BY tableoid clause > 2. Add result relation RTIs to ModifyTable.resultRelations in the > order of their RTE's relid. > > Maybe we can do that as a separate patch. Also, I am not sure if it > will get in the way of someone wanting to have ORDER BY LIMIT for > updates. It won't. But may be David's idea is better. > > > > > * Tuple re-routing during UPDATE. For now it's disabled so your design > > > > should work. But we shouldn't design this feature in such a way that > > > > it comes in the way to enable tuple re-routing in future :). > > > > > > Sorry, what is tuple re-routing and why does this new approach get in its > > > way? > > > > An UPDATE causing a tuple to move to a different partition. It would > > get in its way since the tuple will be located based on tableoid, > > which will be the oid of the old partition. But I think this approach > > has higher chance of being able to solve that problem eventually > > rather than the current approach. > > Again, I don't think I understand. We do currently (as of v11) > re-route tuples when UPDATE causes them to move to a different > partition, which, gladly, continues to work with my patch. Ah! Ok. I missed that part then. > > So how it works is like this: for a given "new" tuple, ExecUpdate() > checks if the tuple would violate the partition constraint of the > result relation that was passed along with the tuple. If it does, the > new tuple will be moved, by calling ExecDelete() to delete it from the > current relation, followed by ExecInsert() to find the new home for > the tuple. The only thing that changes with the new approach is how > ExecModifyTable() chooses a result relation to pass to ExecUpdate() > for a given "new" tuple it has fetched from the plan, which is quite > independent from the tuple re-routing mechanism proper. > Thanks for the explanation. -- Best Wishes, Ashutosh Bapat
Potentially misleading name of libpq pass phrase hook
Reviewing TLS changes for v13, I came across one change which I think might be better off with a library-qualified name. The libpq frontend sslpassword hook added in 4dc63552109f65 is OpenSSL specific, but it has a very generic name: PQsetSSLKeyPassHook(PQsslKeyPassHook_type hook); This IMO has potential for confusion if we ever commit another TLS backend, since the above hook won't work for any other library (except maybe OpenSSL derivatives like LibreSSL et al.). The backends will always have differently named hooks, as the signatures will be different, but having one with a generic name and another with a library-qualified name doesn't seem too friendly to anyone implementing with libpq. As a point of reference: in the backend we added a TLS init hook in commit 896fcdb230e72 which is also specific to OpenSSL, but the name is library qualified, making the purpose and use case perfectly clear: openssl_tls_init_hook. Since we haven't shipped this, there is still time to rename, which IMO is the right way forward. PQsslKeyPassHook__type would be one option, but perhaps there are better alternatives? Thoughts? cheers ./daniel
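For context, here is a rough sketch of how an application would install this hook under its current, generic name; the callback signature (buf, size, conn) and the pass phrase handling are assumptions made only for illustration:

    /*
     * Hypothetical libpq application code, only to illustrate the naming
     * question above.  The callback signature is assumed for this sketch; a
     * library-qualified setter (say, PQsetSSLKeyPassHook_OpenSSL) would make
     * the OpenSSL dependency explicit at the call site.
     */
    #include <string.h>
    #include <libpq-fe.h>

    static int
    my_ssl_key_pass(char *buf, int size, PGconn *conn)
    {
        const char *phrase = "example-pass-phrase";    /* placeholder only */

        if ((int) strlen(phrase) >= size)
            return 0;
        strcpy(buf, phrase);
        return (int) strlen(phrase);
    }

    void
    install_ssl_key_pass_hook(void)
    {
        PQsetSSLKeyPassHook(my_ssl_key_pass);   /* the generic name in question */
    }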
Re: COPY, lock release and MVCC
On Thu, 2020-05-14 at 15:11 +0530, Amit Kapila wrote: > LGTM. I have slightly modified the commit message, see if that looks > fine to you. Fine with me, thanks. > This breaks the cases where a REPEATABLE READ transaction could see an > empty table if it repeats a COPY statement and somebody truncated the > table in the meantime. I would use "case" rather than "cases" here. Yours, Laurenz Albe
Re: new heapcheck contrib module
> On May 13, 2020, at 5:36 PM, Peter Geoghegan wrote: > > On Wed, May 13, 2020 at 5:18 PM Mark Dilger > wrote: >> I am not removing any assertions. I do not propose to remove any >> assertions. When I talk about "hardening against assertions", that is not in >> any way a proposal to remove assertions from the code. > > I'm sorry if I seemed to suggest that you wanted to remove assertions Not a problem at all. As always, I appreciate your involvement in this code and design review. >> I think this is a misrepresentation of the tests that I've been running. > > I didn't actually mean it that way, but I can see how my words could > reasonably be interpreted that way. I apologize. Again, no worries. >> There are two kinds of tests that I have done: >> >> First, there is the regression tests, t/004_verify_heapam.pl, which is >> obviously contrived. That was included in the regression test suite because >> it needed to be something other developers could read, verify, "yeah, I can >> see why that would be corruption, and would give an error message of the >> sort the test expects", and then could be run to verify that indeed that >> expected error message was generated. > > I still don't think that this is necessary. It could work for one type > of corruption, that happens to not have any of the problems, but just > testing that one type of corruption seems rather arbitrary to me. As discussed with Robert off list, this probably doesn't matter. The patch can be committed with or without this particular TAP test. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: PG 13 release notes, first draft
On Thu, May 14, 2020 at 07:23:05AM +0200, Fabien COELHO wrote: > > Hello Bruce, > > > > > > * 34a0a81bfb > > > > > > > > We already have: > > > > > > > > Reformat tables containing function information for better > > > > clarity (Tom Lane) > > > > > > > > so it seems it is covered as part of this. > > > > > > AFAICR this one is not by the same author, and although the point was > > > about > > > better clarity, it was not about formating but rather about restructuring > > > text vs binary string function documentations. Then Tom reformatted the > > > result. > > > > Well, we were not even clear we should document changes in the functions > > section, so going into details of all the changes seems unwise. > > The restructuring was a significant change, and ISTM that another function > of the release note is also to implicitely thank contributors (their name is > appended, which does not bring any useful information about the feature from > a release note perspective) hence my suggestion to include this one, > the author of which is not Tom Lane. We list people's names next to items. We don't list items to list people's names, as far as I know of the policy. If you want to change that, you will need to start a new thread and get agreement. > > > > > * e829337d42 > > > > > > > > Uh, this is a doc link formatting addition. I think this falls into the > > > > error message logic, where it is nice when people want it, but they > > > > don't need to know about it ahead of time. > > > > > > [...] > > > > I don't see it. > > While reading again the sequence, ISTM that I did not understand your first > answer, so my answer was kind-of off topic, sorry. This is indeed "link > formatting addition", which helps making the libpq doc more usable. > Probably you do not need to know about it in advance, but I do not think > that it is a good reason not to include it: with the same argument, a > performance improvement would not need to be advertise, you'll see it when > you need it. The same holds for all non-functional improvements, and there > are many which are listed. Peformance items are listed only if they will produce a visible change in performance, or enable new workloads that were too slow in the past. > > > Possibly, but as the "THIS WAS NOT DOCUMENTED BEFORE?" question seemed to > > > still be in the release notes, I gathered that the information had not > > > reached its destination, hence the possible repetition. But maybe the > > > issue > > > is that this answer is not satisfactory. Sorry for the inconvenience. > > > > I removed it already based on feedback from someone else. > > Good. I looked at the online version which is off the latest commits by a > few hours. > > I'd consider moving "Upgrade to use DocBook 4.5 (Peter Eisentraut)" to the > doc section, maybe. Agreed, done. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Strange decreasing value of pg_last_wal_receive_lsn()
(please, the list policy is bottom-posting to keep history clean, thanks). On Thu, 14 May 2020 07:18:33 +0500 godjan • wrote: > -> Why do you kill -9 your standby? > Hi, it’s Jepsen test for our HA solution. It checks that we don’t lose data > in such situation. OK. This test is highly useful to stress data high availability and durability, of course. However, how useful is this test in a context of auto failover for **service** high availability? If all your nodes are killed in the same disaster, how/why an automatic cluster manager should take care of starting all nodes again and pick the right node to promote? > So, now we update logic as Michael said. All ha alive standbys now waiting > for replaying all WAL that they have and after we use pg_last_replay_lsn() to > choose which standby will be promoted in failover. > > It fixed out trouble, but there is one another. Now we should wait when all > ha alive hosts finish replaying WAL to failover. It might take a while(for > example WAL contains wal_record about splitting b-tree). Indeed, this is the concern I wrote about yesterday in a second mail on this thread. > We are looking for options that will allow us to find a standby that contains > all data and replay all WAL only for this standby before failover. Note that when you promote a node, it first replays available WALs before acting as a primary. So you can safely signal the promotion to the node and wait for it to finish the replay and promote. > Maybe you have ideas on how to keep the last actual value of > pg_last_wal_receive_lsn()? Nope, no clean and elegant idea. One your instances are killed, maybe you can force flush the system cache (secure in-memory-only data) and read the latest received WAL using pg_waldump? But, what if some more data are available from archives, but not received from streaming rep because of a high lag? > As I understand WAL receiver doesn’t write to disk walrcv->flushedUpto. I'm not sure to understand what you mean here. pg_last_wal_receive_lsn() reports the actual value of walrcv->flushedUpto. walrcv->flushedUpto reports the latest LSN force-flushed to disk. > > On 13 May 2020, at 19:52, Jehan-Guillaume de Rorthais > > wrote: > > > > > > (too bad the history has been removed to keep context) > > > > On Fri, 8 May 2020 15:02:26 +0500 > > godjan • wrote: > > > >> I got it, thank you. > >> Can you recommend what to use to determine which quorum standby should be > >> promoted in such case? We planned to use pg_last_wal_receive_lsn() to > >> determine which has fresh data but if it returns the beginning of the > >> segment on both replicas we can’t determine which standby confirmed that > >> write transaction to disk. > > > > Wait, pg_last_wal_receive_lsn() only decrease because you killed your > > standby. > > > > pg_last_wal_receive_lsn() returns the value of walrcv->flushedUpto. The > > later is set to the beginning of the segment requested only during the first > > walreceiver startup or a timeline fork: > > > > /* > > * If this is the first startup of walreceiver (on this timeline), > > * initialize flushedUpto and latestChunkStart to the starting > > point. */ > > if (walrcv->receiveStart == 0 || walrcv->receivedTLI != tli) > > { > > walrcv->flushedUpto = recptr; > > walrcv->receivedTLI = tli; > > walrcv->latestChunkStart = recptr; > > } > > walrcv->receiveStart = recptr; > > walrcv->receiveStartTLI = tli; > > > > After a primary loss, as far as the standby are up and running, it is fine > > to use pg_last_wal_receive_lsn(). 
> > > > Why do you kill -9 your standby? Whay am I missing? Could you explain the > > usecase you are working on to justify this? > > > > Regards, > > > -- Jehan-Guillaume de Rorthais Dalibo
[PATCH] Fix outside scope t_ctid (ItemPointerData)
Hi, ItemPointerData, contrary to what the name suggests, is not a pointer to a structure but a structure in fact. When assigning the name of the structure variable to a pointer, it may even appear to work, but it is not the right thing to do, and it becomes a nightmare to discover that it is the cause of some other error. So: 1. In some cases, there may be a misunderstanding in the use of ItemPointerData. 2. When using the variable name in an assignment, the variable's address is used. 3. While this works for a structure, it shouldn't be the right thing to do. 4. If we have a local variable, its scope is limited, and when it loses its scope, the memory is certainly garbage. 5. While this may be working for heapam.c, I believe it is being abused and should be made compliant with the Postgres API, using the functions that were created for this. The patch is primarily intended to correct the use of ItemPointerData, but it also changes the style by reducing the scope of some variables. If reducing the scope is not acceptable and someone has objections, I can change the patch to focus only on the use of ItemPointerData. But as style changes are rare, if possible, it would be good to seize the opportunity. regards, Ranier Vilela 001_fix_outside_scope_t_ctid.patch Description: Binary data
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
> On May 14, 2020, at 10:40 AM, Ranier Vilela wrote: > > Hi, > ItemPointerData, on the contrary, from what the name says, > it is not a pointer to a structure, but a structure in fact. The project frequently uses the pattern typedef struct FooData { ... } FooData; typedef FooData *Foo; where, in this example, "Foo" = "ItemPointer". So the "Data" part of "ItemPointerData" clues the reader into the fact that ItemPointerData is a struct, not a pointer. Granted, the "Pointer" part of that name may confuse some readers, but the struct itself does contain what is essentially a 48-bit pointer, so that name is not nuts. > When assigning the name of the structure variable to a pointer, it may even > work, > but, it is not the right thing to do and it becomes a nightmare, > to discover that any other error they have is at cause. Can you give a code example of the type of assigment you mean? > So: > 1. In some cases, there may be a misunderstanding in the use of > ItemPointerData. > 2. When using the variable name in an assignment, the variable's address is > used. > 3. While this works for a structure, it shouldn't be the right thing to do. > 4. If we have a local variable, its scope is limited and when it loses its > scope, memory is certainly garbage. > 5. While this may be working for heapam.c, I believe it is being abused and > should be compliant with > the Postgres API and use the functions that were created for this. > > The patch is primarily intended to correct the use of ItemPointerData. > But it is also changing the style, reducing the scope of some variables. > If that was not acceptable, reduce the scope and someone has objections, > I can change the patch, to focus only on the use of ItemPointerData. > But as style changes are rare, if possible, it would be good to seize the > opportunity. I would like to see a version of the patch that only addresses your concerns about ItemPointerData, not because other aspects of the patch are unacceptable (I'm not ready to even contemplate that yet), but because I'm not sure what your complaint is about. Can you restrict the patch to just address that one issue? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
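To make the pattern concrete, here is a small stand-alone example; FooData and Foo are hypothetical names, not taken from the PostgreSQL tree, and the example also shows what a plain struct assignment does:

    /*
     * Stand-alone illustration of the FooData/Foo naming pattern described
     * above.  The assignment "copy = local" copies the members of the
     * struct; no pointer to the local variable is retained.
     */
    #include <stdio.h>

    typedef struct FooData
    {
        int     blk;
        int     off;
    } FooData;

    typedef FooData *Foo;       /* "Foo" is the pointer type, "FooData" the struct */

    int
    main(void)
    {
        FooData local = {42, 7};    /* a struct value, not a pointer */
        FooData copy;
        Foo     ptr = &local;       /* this, by contrast, really is a pointer */

        copy = local;               /* member-wise copy */
        local.blk = 0;              /* does not affect the copy... */
        printf("%d %d %d\n", copy.blk, copy.off, ptr->blk);   /* prints "42 7 0" */
        return 0;
    }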
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Ranier Vilela writes: > The patch is primarily intended to correct the use of ItemPointerData. What do you think is being "corrected" here? It looks to me like just some random code rearrangements that aren't even clearly bug-free, let alone being stylistic improvements. regards, tom lane
Re: Transactions involving multiple postgres foreign servers, take 2
On Tue, May 12, 2020 at 11:45 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > On Thu, 30 Apr 2020 at 20:43, Masahiko Sawada > wrote: > > > > On Tue, 28 Apr 2020 at 19:37, Muhammad Usama wrote: > > > > > > > > > > > > On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada < > masahiko.saw...@2ndquadrant.com> wrote: > > >> > > >> On Fri, 27 Mar 2020 at 22:06, Muhammad Usama > wrote: > > >> > > > >> > Hi Sawada San, > > >> > > > >> > I have been further reviewing and testing the transaction involving > multiple server patches. > > >> > Overall the patches are working as expected bar a few important > exceptions. > > >> > So as discussed over the call I have fixed the issues I found > during the testing > > >> > and also rebased the patches with the current head of the master > branch. > > >> > So can you please have a look at the attached updated patches. > > >> > > >> Thank you for reviewing and updating the patch! > > >> > > >> > > > >> > Below is the list of changes I have made on top of V18 patches. > > >> > > > >> > 1- In register_fdwxact(), As we are just storing the callback > function pointers from > > >> > FdwRoutine in fdw_part structure, So I think we can avoid calling > > >> > GetFdwRoutineByServerId() in TopMemoryContext. > > >> > So I have moved the MemoryContextSwitch to TopMemoryContext after > the > > >> > GetFdwRoutineByServerId() call. > > >> > > >> Agreed. > > >> > > >> > > > >> > > > >> > 2- If PrepareForeignTransaction functionality is not present in > some FDW then > > >> > during the registration process we should only set the > XACT_FLAGS_FDWNOPREPARE > > >> > transaction flag if the modified flag is also set for that server. > As for the server that has > > >> > not done any data modification within the transaction we do not do > two-phase commit anyway. > > >> > > >> Agreed. > > >> > > >> > > > >> > 3- I have moved the foreign_twophase_commit in sample file after > > >> > max_foreign_transaction_resolvers because the default value of > max_foreign_transaction_resolvers > > >> > is 0 and enabling the foreign_twophase_commit produces an error > with default > > >> > configuration parameter positioning in postgresql.conf > > >> > Also, foreign_twophase_commit configuration was missing the comments > > >> > about allowed values in the sample config file. > > >> > > >> Sounds good. Agreed. > > >> > > >> > > > >> > 4- Setting ForeignTwophaseCommitIsRequired in > is_foreign_twophase_commit_required() > > >> > function does not seem to be the correct place. The reason being, > even when > > >> > is_foreign_twophase_commit_required() returns true after setting > ForeignTwophaseCommitIsRequired > > >> > to true, we could still end up not using the two-phase commit in > the case when some server does > > >> > not support two-phase commit and foreign_twophase_commit is set to > FOREIGN_TWOPHASE_COMMIT_PREFER > > >> > mode. So I have moved the ForeignTwophaseCommitIsRequired > assignment to PreCommit_FdwXacts() > > >> > function after doing the prepare transaction. > > >> > > >> Agreed. > > >> > > >> > > > >> > 6- In prefer mode, we commit the transaction in single-phase if the > server does not support > > >> > the two-phase commit. But instead of doing the single-phase commit > right away, > > >> > IMHO the better way is to wait until all the two-phase transactions > are successfully prepared > > >> > on servers that support the two-phase. 
Since an error during a > "PREPARE" stage would > > >> > rollback the transaction and in that case, we would end up with > committed transactions on > > >> > the server that lacks the support of the two-phase commit. > > >> > > >> When an error occurred before the local commit, a 2pc-unsupported > > >> server could be rolled back or committed depending on the error > > >> timing. On the other hand all 2pc-supported servers are always rolled > > >> back when an error occurred before the local commit. Therefore even if > > >> we change the order of COMMIT and PREPARE it is still possible that we > > >> will end up committing the part of 2pc-unsupported servers while > > >> rolling back others including 2pc-supported servers. > > >> > > >> I guess the motivation of your change is that since errors are likely > > >> to happen during executing PREPARE on foreign servers, we can minimize > > >> the possibility of rolling back 2pc-unsupported servers by deferring > > >> the commit of 2pc-unsupported server as much as possible. Is that > > >> right? > > > > > > > > > Yes, that is correct. The idea of doing the COMMIT on > NON-2pc-supported servers > > > after all the PREPAREs are successful is to minimize the chances of > partial commits. > > > And as you mentioned there will still be chances of getting a partial > commit even with > > > this approach but the probability of that would be less than what it > is with the > > > current sequence. > > > > > > > > >> > > >> > > >> > So I have modified the flow a little bit and instead of doing a > one-phase co
Re: Our naming of wait events is a disaster.
On Tue, May 12, 2020 at 10:54 PM Tom Lane wrote: > I don't actually understand why the LWLock tranche mechanism is designed > the way it is. It seems to be intended to support different backends > having different sets of LWLocks, but I fail to see why that's a good idea, > or even useful at all. In any case, dynamically-created LWLocks are > clearly out of scope for the documentation. The problem that I'm trying > to deal with right now is that even LWLocks that are hard-wired into the > backend code are difficult to enumerate. That wasn't a problem before > we decided we needed to expose them all to user view; but now it is. If you are using parallel query, your backend might have a DSM segment that contains LWLocks. Anyone who is not attached to that DSM segment won't know about them, though possibly they have a different DSM segment containing a tranche with the same name. Also, if you are using extensions that use LWLocks, a particular extension may be loaded into backend A but not backend B. Suppose backend A has an extension loaded but backend B does not. Then suppose that A waits for an LWLock and B meanwhile examines pg_stat_activity. It's hard to come up with totally satisfying solutions to problems like these, but I think the important thing to remember is that, at least in the extension case, this is really an obscure corner case. Normally the same extensions will be loaded everywhere and anything known to one backend will be known also the others. However, it's not guaranteed. I tend to prefer that modules register their own tranches rather than having a central table someplace, because I like the idea that the things that a particular module knows about are contained within its own source files and not spread all over the code base. I think that we've done rather badly with this in a number of places, lwlock.c among them, and I don't like it much. It tends to lead to layering violations, and it also tends to create interfaces that extensions can't really use. I admit that it's not ideal as far as this particular problem is concerned, but I don't think that makes it a bad idea in general. In some sense, the lack of naming consistency here is a manifestation of an underlying chaos in the code: we've created many different ways of waiting for things with many different characteristics and little thought to consistency, and this mechanism has exposed that underlying problem. The wait state interface is surely not to blame for the fact that LWLock names and heavyweight lock types are capitalized inconsistently. In fact, there seem to be few things that PostgreSQL hackers love more than inconsistent capitalization, and this is just one of a whole lot of instances of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
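As a concrete illustration of the decentralized registration described above, a rough sketch of the usual extension pattern follows; the extension and tranche name "my_ext" are hypothetical, and in this era the request is made from _PG_init under shared_preload_libraries with the lock fetched later from a shmem startup hook:

    /*
     * Hedged sketch of an extension registering its own named LWLock tranche;
     * "my_ext" is a hypothetical name.  A backend that never loads this
     * library will not know the tranche name, which is the pg_stat_activity
     * wrinkle mentioned above.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "storage/lwlock.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);

    static LWLock *my_ext_lock = NULL;

    void
    _PG_init(void)
    {
        /* Ask for one LWLock in a tranche named after the extension. */
        RequestNamedLWLockTranche("my_ext", 1);
    }

    /* Typically installed as a shmem_startup_hook. */
    static void
    my_ext_shmem_startup(void)
    {
        my_ext_lock = &(GetNamedLWLockTranche("my_ext")->lock);
    }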
Re: new heapcheck contrib module
On Wed, May 13, 2020 at 5:33 PM Peter Geoghegan wrote: > Do you recall seeing corruption resulting in segfaults in production? I have seen that, I believe. I think it's more common to fail with errors about not being able to palloc>1GB, not being able to look up an xid or mxid, etc. but I am pretty sure I've seen multiple cases involving seg faults, too. Unfortunately for my credibility, I can't remember the details right now. > I personally don't recall seeing that. If it happened, the segfaults > themselves probably wouldn't be the main concern. I don't really agree. Hypothetically speaking, suppose you corrupt your only copy of a critical table in such a way that every time you select from it, the system seg faults. A user in this situation might ask questions like: 1. How did my table get corrupted? 2. Why do I only have one copy of it? 3. How do I retrieve the non-corrupted portion of my data from that table and get back up and running? In the grand scheme of things, #1 and #2 are the most important questions, but when something like this actually happens, #3 tends to be the most urgent question, and it's a lot harder to get the uncorrupted data out if the system keeps crashing. Also, a seg fault tends to lead customers to think that the database has a bug, rather than that the database is corrupted. Slightly off-topic here, but I think our error reporting in this area is pretty lame. I've learned over the years that when a customer reports that they get a complaint about a too-large memory allocation every time they access a table, they've probably got a corrupted varlena header. However, that's extremely non-obvious to a typical user. We should try to report errors indicative of corruption in a way that gives the user some clue that corruption has happened. Peter made a stab at improving things there by adding errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of users will never see the error code, only the message, and a lot of corruption produces still produces errors that weren't changed by that commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
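To make the last point concrete, here is a hedged sketch of the reporting style being advocated: tag the error with ERRCODE_DATA_CORRUPTED and name the relation and block, rather than surfacing only a generic allocation failure. The helper function, message wording, and call sites are invented for illustration:

    /*
     * Hypothetical helper showing the corruption-reporting style discussed
     * above; not actual PostgreSQL code.
     */
    #include "postgres.h"
    #include "storage/block.h"
    #include "storage/off.h"
    #include "utils/rel.h"

    static void
    report_corrupt_tuple(Relation rel, BlockNumber blkno, OffsetNumber offnum)
    {
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("corrupted tuple header in relation \"%s\", block %u, offset %u",
                        RelationGetRelationName(rel), blkno, offnum),
                 errhint("This usually indicates on-disk corruption; consider restoring from a backup.")));
    }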
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Em qui., 14 de mai. de 2020 às 15:03, Mark Dilger < mark.dil...@enterprisedb.com> escreveu: > > > > On May 14, 2020, at 10:40 AM, Ranier Vilela wrote: > > > > Hi, > > ItemPointerData, on the contrary, from what the name says, > > it is not a pointer to a structure, but a structure in fact. > > The project frequently uses the pattern > > typedef struct FooData { > ... > } FooData; > > typedef FooData *Foo; > > where, in this example, "Foo" = "ItemPointer". > > So the "Data" part of "ItemPointerData" clues the reader into the fact > that ItemPointerData is a struct, not a pointer. Granted, the "Pointer" > part of that name may confuse some readers, but the struct itself does > contain what is essentially a 48-bit pointer, so that name is not nuts. > > > > When assigning the name of the structure variable to a pointer, it may > even work, > > but, it is not the right thing to do and it becomes a nightmare, > > to discover that any other error they have is at cause. > > Can you give a code example of the type of assigment you mean? > htup->t_ctid = target_tid; htup->t_ctid = newtid; Both target_tid and newtid are local variable, whe loss scope, memory is garbage. > > > So: > > 1. In some cases, there may be a misunderstanding in the use of > ItemPointerData. > > 2. When using the variable name in an assignment, the variable's address > is used. > > 3. While this works for a structure, it shouldn't be the right thing to > do. > > 4. If we have a local variable, its scope is limited and when it loses > its scope, memory is certainly garbage. > > 5. While this may be working for heapam.c, I believe it is being abused > and should be compliant with > > the Postgres API and use the functions that were created for this. > > > > The patch is primarily intended to correct the use of ItemPointerData. > > But it is also changing the style, reducing the scope of some variables. > > If that was not acceptable, reduce the scope and someone has objections, > > I can change the patch, to focus only on the use of ItemPointerData. > > But as style changes are rare, if possible, it would be good to seize > the opportunity. > > I would like to see a version of the patch that only addresses your > concerns about ItemPointerData, not because other aspects of the patch are > unacceptable (I'm not ready to even contemplate that yet), but because I'm > not sure what your complaint is about. Can you restrict the patch to just > address that one issue? > Certainly. In the same file you can find the appropriate use of the API. ItemPointerSet(&heapTuple->t_self, blkno, offnum); regards, Ranier Vilela 001_fix_outside_scope_t_cid_v2.patch Description: Binary data
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
On Thu, May 14, 2020 at 15:07, Tom Lane wrote: > Ranier Vilela writes: > > The patch is primarily intended to correct the use of ItemPointerData. > > What do you think is being "corrected" here? It looks to me like > just some random code rearrangements that aren't even clearly > bug-free, let alone being stylistic improvements. > It is certainly working, but trusting that the memory of a local variable will not change when it loses its scope is a risk that can certainly cause bugs elsewhere, and it is very difficult to discover their primary cause. regards, Ranier Vilela
Re: new heapcheck contrib module
On Wed, May 13, 2020 at 7:32 PM Alvaro Herrera wrote: > I agree that this (a test tool that exercises our code against > arbitrarily corrupted data pages) is not going to work as a test that > all buildfarm members run -- it seems something for specialized > buildfarm members to run, or even something that's run outside of the > buildfarm, like sqlsmith. Obviously such a tool would not be able to > run against an assertion-enabled build, and we shouldn't even try. I have a question about what you mean here by "arbitrarily." If you mean that we shouldn't have the buildfarm run the proposed heap corruption checker against heap pages full of randomly-generated garbage, I tend to agree. Such a test wouldn't be very stable and might fail in lots of low-probability ways that could require unreasonable effort to find and fix. If you mean that we shouldn't have the buildfarm run the proposed heap corruption checker against any corrupted heap pages at all, I tend to disagree. If we did that, then we'd basically be releasing a heap corruption checker with very limited test coverage. Like, we shouldn't only have negative test cases, where the absence of corruption produces no results. We should also have positive test cases, where the thing finds some problem... At least, that's what I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SLRU statistics
On Thu, May 14, 2020 at 2:27 AM Fujii Masao wrote: > Therefore what we can do right now seems to make checkpointer report the SLRU > stats while it's running. Other issues need more time to investigate... > Thought? I'm confused by why SLRU statistics are reported by messages sent to the stats collector rather than by just directly updating shared memory. For database or table statistics there can be any number of objects and we can't know in advance how many there will be, so we can't set aside shared memory for the stats in advance. For SLRUs, there's no such problem. Just having the individual backends periodically merge their accumulated backend-local counters into the shared counters seems like it would be way simpler and more performant. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
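A minimal sketch of the shared-memory alternative being described is below; the struct and function names are hypothetical (not from any actual patch), and only the pattern follows the suggestion: each backend accumulates counts locally and periodically folds them into fixed-size shared counters with atomic adds.

    /*
     * Hypothetical sketch of backend-local SLRU counters merged into shared
     * memory.  Because the number of SLRUs is fixed, the shared struct can
     * be sized at startup, unlike per-table statistics.
     */
    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct SlruSharedCounters
    {
        pg_atomic_uint64 blocks_read;
        pg_atomic_uint64 blocks_written;
    } SlruSharedCounters;

    static SlruSharedCounters *shared_counters;    /* set up in shared memory at startup */
    static uint64 local_blocks_read = 0;
    static uint64 local_blocks_written = 0;

    static void
    slru_flush_local_counters(void)
    {
        /* Atomic adds let concurrent backends update without taking a lock. */
        pg_atomic_fetch_add_u64(&shared_counters->blocks_read, local_blocks_read);
        pg_atomic_fetch_add_u64(&shared_counters->blocks_written, local_blocks_written);
        local_blocks_read = 0;
        local_blocks_written = 0;
    }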
Re: Our naming of wait events is a disaster.
Robert Haas writes: > I tend to prefer that modules register their own tranches rather than > having a central table someplace, because I like the idea that the > things that a particular module knows about are contained within its > own source files and not spread all over the code base. I think that > we've done rather badly with this in a number of places, lwlock.c > among them, and I don't like it much. Well, we could solve this problem very easily by ripping out everything having to do with wait-state monitoring ... and personally I'd be a lot in favor of that, because I haven't seen anything about either the design or documentation of the feature that I thought was very well done. However, if you'd like to have wait-state monitoring, and you'd like the documentation for it to be more useful than "go read the code", then I don't see any way around the conclusion that there are going to be centralized lists of the possible wait states. That being the case, refusing to use a centralized list in the implementation seems rather pointless; and having some aspects of the implementation use centralized lists (see the enums in lwlock.h and elsewhere) while other aspects don't is just schizophrenic. > In some sense, the lack of naming consistency here is > a manifestation of an underlying chaos in the code: we've created many > different ways of waiting for things with many different > characteristics and little thought to consistency, and this mechanism > has exposed that underlying problem. Yeah, agreed. Nonetheless, now we have a problem. regards, tom lane
Re: new heapcheck contrib module
On Thu, May 14, 2020 at 11:33 AM Robert Haas wrote: > I have seen that, I believe. I think it's more common to fail with > errors about not being able to palloc>1GB, not being able to look up > an xid or mxid, etc. but I am pretty sure I've seen multiple cases > involving seg faults, too. Unfortunately for my credibility, I can't > remember the details right now. I believe you, both in general, and also because what you're saying here is plausible, even if it doesn't fit my own experience. Corruption is by its very nature exceptional. At least, if that isn't true then something must be seriously wrong, so the idea that it will be different in some way each time seems like a good working assumption. Your exceptional cases are not necessarily the same as mine, especially where hardware problems are concerned. On the other hand, it's also possible for corruption that originates from very different sources to exhibit the same basic inconsistencies and symptoms. I've noticed that SLRU corruption is often a leading indicator of general storage problems. The inconsistencies between certain SLRU state and the heap happens to be far easier to notice in practice, particularly when VACUUM runs. But it's not fundamentally different to inconsistencies from pages within one single main fork of some heap relation. > > I personally don't recall seeing that. If it happened, the segfaults > > themselves probably wouldn't be the main concern. > > I don't really agree. Hypothetically speaking, suppose you corrupt > your only copy of a critical table in such a way that every time you > select from it, the system seg faults. A user in this situation might > ask questions like: I agree that that could be a problem. But that's not what I've seen happen in production systems myself. Maybe there is some low hanging fruit here. Perhaps we can make the real PageGetItemId() a little closer to PageGetItemIdCareful() without noticeable overhead, as I suggested already. Are there any real generalizations that we can make about why backends segfault with corrupt data? Maybe there is. That seems important. > Slightly off-topic here, but I think our error reporting in this area > is pretty lame. I've learned over the years that when a customer > reports that they get a complaint about a too-large memory allocation > every time they access a table, they've probably got a corrupted > varlena header. I certainlt learned the same lesson in the same way. > However, that's extremely non-obvious to a typical > user. We should try to report errors indicative of corruption in a way > that gives the user some clue that corruption has happened. Peter made > a stab at improving things there by adding > errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of > users will never see the error code, only the message, and a lot of > corruption produces still produces errors that weren't changed by that > commit. The theory is that "can't happen" errors having an errcode that should be considered similar to or equivalent to ERRCODE_DATA_CORRUPTED. I doubt that it works out that way in practice, though. -- Peter Geoghegan
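For readers unfamiliar with the amcheck routine mentioned, here is a rough sketch of the kind of extra validation a "careful" line-pointer fetch performs before the pointer is trusted; the function name and the exact checks are illustrative, not the actual amcheck code:

    /*
     * Hedged sketch, loosely in the spirit of amcheck's PageGetItemIdCareful():
     * validate a line pointer before using it, so that corruption surfaces as
     * an error rather than a wild pointer dereference.  Not the real code.
     */
    #include "postgres.h"
    #include "storage/bufpage.h"

    static ItemId
    page_get_item_id_careful(Page page, OffsetNumber offnum)
    {
        ItemId      itemid;

        if (offnum < FirstOffsetNumber || offnum > PageGetMaxOffsetNumber(page))
            elog(ERROR, "offset number %u out of range", offnum);

        itemid = PageGetItemId(page, offnum);

        /* A line pointer with storage must point past the header and stay on the page. */
        if (ItemIdHasStorage(itemid) &&
            (ItemIdGetOffset(itemid) < SizeOfPageHeaderData ||
             ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) > BLCKSZ))
            elog(ERROR, "corrupt line pointer at offset %u", offnum);

        return itemid;
    }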
Re: new heapcheck contrib module
On 2020-May-14, Robert Haas wrote: > I have a question about what you mean here by "arbitrarily." > > If you mean that we shouldn't have the buildfarm run the proposed heap > corruption checker against heap pages full of randomly-generated > garbage, I tend to agree. Such a test wouldn't be very stable and > might fail in lots of low-probability ways that could require > unreasonable effort to find and fix. This is what I meant. I was thinking of blocks generated randomly. > If you mean that we shouldn't have the buildfarm run the proposed heap > corruption checker against any corrupted heap pages at all, I tend to > disagree. Yeah, IMV those would not be arbitrarily corrupted -- instead they're crafted to be corrupted in some specific way. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Our naming of wait events is a disaster.
On Thu, May 14, 2020 at 2:54 PM Tom Lane wrote: > Well, we could solve this problem very easily by ripping out everything > having to do with wait-state monitoring ... and personally I'd be a lot > in favor of that, because I haven't seen anything about either the > design or documentation of the feature that I thought was very well > done. Well, I'm going to disagree with that, but opinions can vary. If I'd tried to create naming consistency here when I created this stuff, I would've had to rename things in existing systems rather than just expose what was already there, and that wasn't the goal of the patch, and I don't see a very good reason why it should have been. Providing information is a separate project from cleaning up naming. And, while I don't love the fact that people have added new things without trying very hard to be consistent with existing things all that much, I still don't think inconsistent naming rises to the level of a disaster. > However, if you'd like to have wait-state monitoring, and you'd > like the documentation for it to be more useful than "go read the code", > then I don't see any way around the conclusion that there are going to > be centralized lists of the possible wait states. > > That being the case, refusing to use a centralized list in the > implementation seems rather pointless; and having some aspects of the > implementation use centralized lists (see the enums in lwlock.h and > elsewhere) while other aspects don't is just schizophrenic. There's something to that argument, especially it enable us to auto-generate the documentation tables. That being said, my view of this system is that it's good to document the wait events that we have, but also that there are almost certainly going to be cases where we can't say a whole lot more than "go read the code," or at least not without an awful lot of work. I think there's a reasonable chance that someone who sees a lot of ClientRead or DataFileWrite wait events will have some idea what kind of problem is indicated, even without consulting the documentation and even moreso if we have some good documentation which they can consult. But I don't know what anybody's going to do if they see a lot of OldSerXidLock or AddinShmemInitLock contention. Presumably those are cases that the developers thought were unlikely, or they'd have chosen a different locking regimen. If they were wrong, I think it's a good thing for users to have a relatively easy way to find that out, but I'm not sure what anybody's going to do be able to do about it without patching the code, or at least looking at it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SLRU statistics
Robert Haas writes: > I'm confused by why SLRU statistics are reported by messages sent to > the stats collector rather than by just directly updating shared > memory. It would be better to consider that as an aspect of the WIP stats collector redesign, rather than inventing a bespoke mechanism for SLRU stats that's outside the stats collector (and, no doubt, would have its own set of bugs). We don't need to invent even more pathways for this sort of data. regards, tom lane
Re: new heapcheck contrib module
Alvaro Herrera writes: > On 2020-May-14, Robert Haas wrote: >> If you mean that we shouldn't have the buildfarm run the proposed heap >> corruption checker against heap pages full of randomly-generated >> garbage, I tend to agree. Such a test wouldn't be very stable and >> might fail in lots of low-probability ways that could require >> unreasonable effort to find and fix. > This is what I meant. I was thinking of blocks generated randomly. Yeah, -1 for using random data --- when it fails, how you gonna reproduce the problem? >> If you mean that we shouldn't have the buildfarm run the proposed heap >> corruption checker against any corrupted heap pages at all, I tend to >> disagree. > Yeah, IMV those would not be arbitrarily corrupted -- instead they're > crafted to be corrupted in some specific way. I think there's definitely value in corrupting data in some predictable (reproducible) way and verifying that the check code catches it and responds as expected. Sure, this will not be 100% coverage, but it'll be a lot better than 0% coverage. regards, tom lane
Re: Our naming of wait events is a disaster.
Robert Haas writes: > That being said, my view of this system is that it's good to document > the wait events that we have, but also that there are almost certainly > going to be cases where we can't say a whole lot more than "go read > the code," or at least not without an awful lot of work. Can't disagree with that. > I think > there's a reasonable chance that someone who sees a lot of ClientRead > or DataFileWrite wait events will have some idea what kind of problem > is indicated, even without consulting the documentation and even > moreso if we have some good documentation which they can consult. But > I don't know what anybody's going to do if they see a lot of > OldSerXidLock or AddinShmemInitLock contention. I submit that at least part of the problem is precisely one of crappy naming. I didn't know what OldSerXidLock did either, until yesterday when I dug into the code to find out. If it's named something like "SerialSLRULock", then at least somebody who has heard of SLRUs will have an idea of what is indicated. And we are exposing the notion of SLRUs pretty prominently in the monitoring docs as of v13, so that's not an unreasonable presumption. regards, tom lane
Re: new heapcheck contrib module
On 2020-05-11 19:21, Mark Dilger wrote: 1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database. Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index. The options for selecting/excluding tables and schemas is patterned on pg_dump, on the assumption that interface is already familiar to users. Why is this useful over just using the extension's functions via psql? I suppose you could make an argument for a command-line wrapper around almost every admin-focused contrib module (pageinspect, pg_prewarm, pgstattuple, ...), but that doesn't seem very sensible. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Our naming of wait events is a disaster.
On Thu, May 14, 2020 at 3:58 PM Tom Lane wrote: > I submit that at least part of the problem is precisely one of crappy > naming. I didn't know what OldSerXidLock did either, until yesterday > when I dug into the code to find out. If it's named something like > "SerialSLRULock", then at least somebody who has heard of SLRUs will > have an idea of what is indicated. And we are exposing the notion > of SLRUs pretty prominently in the monitoring docs as of v13, so that's > not an unreasonable presumption. To the extent that exposing some of this information causes us to think more carefully about the naming, I think that's all to the good. I don't expect such measures to solve all of our problems in this area, but the idea that we can choose names with no consideration to whether anybody can understand what they mean is wrong even when the audience is only developers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel copy
On Thu, May 14, 2020 at 2:18 AM Amit Kapila wrote: > To support that, we need to consider a few things. > a. Currently, we increment the command counter each time we take a key > share lock on a tuple during trigger execution. I am really not sure > if this is required during Copy command execution or we can just > increment it once for the copy. If we need to increment the command > counter just once for copy command then for the parallel copy we can > ensure that we do it just once at the end of the parallel copy but if > not then we might need some special handling. My sense is that it would be a lot more sensible to do it at the *beginning* of the parallel operation. Once we do it once, we shouldn't ever do it again; that's how it works now. Deferring it until later seems much more likely to break things. > b. Another point is that after inserting rows we record CTIDs of the > tuples in the event queue and then once all tuples are processed we > call FK trigger for each CTID. Now, with parallelism, the FK checks > will be processed once the worker processed one chunk. I don't see > any problem with it but still, this will be a bit different from what > we do in serial case. Do you see any problem with this? I think there could be some problems here. For instance, suppose that there are two entries for different workers for the same CTID. If the leader were trying to do all the work, they'd be handled consecutively. If they were from completely unrelated processes, locking would serialize them. But group locking won't, so there you have an issue, I think. Also, it's not ideal from a work-distribution perspective: one worker could finish early and be unable to help the others. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Our naming of wait events is a disaster.
I wrote: > Digging through the existing callers of SimpleLruInit, we have > namecontrol locksubdir > "async" AsyncCtlLock"pg_notify" > "clog" CLogControlLock "pg_xact" > "commit_timestamp" CommitTsControlLock "pg_commit_ts" > "multixact_member" MultiXactMemberControlLock "pg_multixact/members" > "multixact_offset" MultiXactOffsetControlLock "pg_multixact/offsets" > "oldserxid" OldSerXidLock "pg_serial" > "subtrans" SubtransControlLock "pg_subtrans" > After studying that list for awhile, it seems to me that we could > do worse than to name the SLRUs to match their on-disk subdirectories, > which are names that are already user-visible. So I propose these > base names for the SLRUs: > Notify > Xact > CommitTs > MultiXactMember (or MultiXactMembers) > MultiXactOffset (or MultiXactOffsets) > Serial > Subtrans As POC for this, here's a draft patch to rename the "async" SLRU and associated locks. If people are good with this then I'll go through and do similarly for the other SLRUs. A case could be made for doing s/async/notify/ more widely in async.c; for instance it's odd that the struct protected by NotifyQueueLock didn't get renamed to NotifyQueueControl. But that seems a bit out of scope for the immediate problem, and anyway I'm not sure how far to take it. I don't really want to rename async.c's externally-visible functions, for instance. For the moment I just renamed symbols used in the SimpleLruInit() call. regards, tom lane diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 87502a4..27e7591 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1754,12 +1754,12 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting to manage space allocation in shared memory. - AsyncCtlLock - Waiting to read or update shared notification state. + NotifySLRULock + Waiting to read or update state of the Notify SLRU cache. - AsyncQueueLock - Waiting to read or update notification messages. + NotifyQueueLock + Waiting to read or update NOTIFY messages. AutoFileLock @@ -1941,8 +1941,8 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Waiting to allocate or assign a transaction id. - async - Waiting for I/O on an async (notify) buffer. + NotifyBuffer + Waiting for I/O on a Notify SLRU buffer. buffer_content @@ -4484,7 +4484,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Resets statistics to zero for a single SLRU cache, or for all SLRUs in the cluster. If the argument is NULL, all counters shown in the pg_stat_slru view for all SLRU caches are -reset. The argument can be one of async, +reset. The argument can be one of Notify, clog, commit_timestamp, multixact_offset, multixact_member, oldserxid, or diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 4613bd1..a3ba88d 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -107,7 +107,7 @@ * frontend during startup.) The above design guarantees that notifies from * other backends will never be missed by ignoring self-notifies. * - * The amount of shared memory used for notify management (NUM_ASYNC_BUFFERS) + * The amount of shared memory used for notify management (NUM_NOTIFY_BUFFERS) * can be varied without affecting anything but performance. The maximum * amount of notification data that can be queued at one time is determined * by slru.c's wraparound limit; see QUEUE_MAX_PAGE below. 
@@ -225,7 +225,7 @@ typedef struct QueuePosition * * Resist the temptation to make this really large. While that would save * work in some places, it would add cost in others. In particular, this - * should likely be less than NUM_ASYNC_BUFFERS, to ensure that backends + * should likely be less than NUM_NOTIFY_BUFFERS, to ensure that backends * catch up before the pages they'll need to read fall out of SLRU cache. */ #define QUEUE_CLEANUP_DELAY 4 @@ -244,7 +244,7 @@ typedef struct QueueBackendStatus /* * Shared memory state for LISTEN/NOTIFY (excluding its SLRU stuff) * - * The AsyncQueueControl structure is protected by the AsyncQueueLock. + * The AsyncQueueControl structure is protected by the NotifyQueueLock. * * When holding the lock in SHARED mode, backends may only inspect their own * entries as well as the head and tail pointers. Consequently we can allow a @@ -254,9 +254,9 @@ typedef struct QueueBackendStatus * When holding the lock in EXCLUSIVE mode, backends can inspect the e
Re: Our naming of wait events is a disaster.
On 2020-May-14, Tom Lane wrote: > A case could be made for doing s/async/notify/ more widely in async.c; > for instance it's odd that the struct protected by NotifyQueueLock > didn't get renamed to NotifyQueueControl. But that seems a bit out > of scope for the immediate problem, and anyway I'm not sure how far to > take it. I don't really want to rename async.c's externally-visible > functions, for instance. For the moment I just renamed symbols used > in the SimpleLruInit() call. That approach seems fine -- we'd only rename those things if and when we modified them for other reasons; and the file itself, probably not at all. Much like our renaming of XLOG to WAL, we changed the user-visible term all at once, but the code kept the original names until changed. Maybe in N years, when the SCM tooling is much better (so that it doesn't get confused by us having renamed the file in the newer branches and back-patching to an older branch), we can rename xlog.c to wal.c and async.c to notify.c. Or maybe not. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 13 release notes, first draft
On 2020-05-12 02:41, Justin Pryzby wrote:
> I'm not opposed to including it, but I think it's still true that the user doesn't need to know in advance that the error message will be additionally helpful in the event of corruption. If we were to include more "error" items, we might also include these:
> 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error reporting
This is actually a legitimate user-visible feature and should be listed somewhere. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 13 release notes, first draft
On Thu, May 14, 2020 at 2:02 PM Peter Eisentraut wrote: > On 2020-05-12 02:41, Justin Pryzby wrote: > > I'm not opposed to including it, but I think it's still true that the user > > doesn't need to know in advance that the error message will be additionally > > helpful in the event of corruption. If we were to include more "error" > > items, > > we might also include these: > > > > 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error > > reporting > > This is actually a legitimate user-visible feature and should be listed > somewhere. +1 -- it's very handy. Plus it has user-facing knobs. -- Peter Geoghegan
Re: PG 13 release notes, first draft
On Thu, May 14, 2020 at 11:01:51PM +0200, Peter Eisentraut wrote: > On 2020-05-12 02:41, Justin Pryzby wrote: > > I'm not opposed to including it, but I think it's still true that the user > > doesn't need to know in advance that the error message will be additionally > > helpful in the event of corruption. If we were to include more "error" > > items, > > we might also include these: > > > > 71a8a4f6e36547bb060dbcc961ea9b57420f7190 Add backtrace support for error > > reporting > > This is actually a legitimate user-visible feature and should be listed > somewhere. On Thu, May 14, 2020 at 02:02:52PM -0700, Peter Geoghegan wrote: > +1 -- it's very handy. Plus it has user-facing knobs. That's already included: | Allow function call backtraces of errors to be logged (Peter Eisentraut, Álvaro Herrera) | Server variable backtrace_functions specifies which C functions should generate backtraces on error. I 1) failed to double check my list; and 2) intended for that to be interpreted as items which could be moved to a separate "error reporting" section of the release notes. -- Justin
Re: new heapcheck contrib module
> On May 14, 2020, at 1:02 PM, Peter Eisentraut > wrote: > > On 2020-05-11 19:21, Mark Dilger wrote: >> 1) A new module, pg_amcheck, which includes a command line client for >> checking a database or subset of a database. Internally it functions by >> querying the database for a list of tables which are appropriate given the >> command line switches, and then calls amcheck's functions to validate each >> table and/or index. The options for selecting/excluding tables and schemas >> is patterned on pg_dump, on the assumption that interface is already >> familiar to users. > > Why is this useful over just using the extension's functions via psql? The tool doesn't hold a single snapshot or transaction for the lifetime of checking the entire database. A future improvement to the tool might add parallelism. Users could do all of this in scripts, but having a single tool with the most commonly useful options avoids duplication of effort. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
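For comparison, driving amcheck's existing functions directly from psql (the alternative raised above) looks roughly like the sketch below for B-tree indexes; the query is illustrative only, adapted from the kind of example in the amcheck documentation, and is not part of the proposed pg_amcheck patch.

CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verify every ordinary btree index in the current database from one session;
-- pg_amcheck would automate this kind of loop, add table-level checks, and
-- could eventually parallelize the work across connections.
SELECT c.relname AS index_name,
       bt_index_check(index => c.oid)
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  JOIN pg_am a ON a.oid = c.relam
 WHERE a.amname = 'btree'
   AND c.relpersistence <> 't';  -- skip other sessions' temp indexes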
Re: SEQUENCE values (duplicated) in some corner cases when crash happens
On Wed, May 6, 2020 at 1:52 PM Jeremy Schneider wrote: > The behavior we're observing is that a nextval() call in a committed > transaction is not crash-safe. This was discovered because some > applications were using nextval() to get a guaranteed unique sequence > number [or so they thought], then the application did some processing > with the value and later stored it in a relation of the same database. > > The nextval() number was not used until the transaction was committed - I don't know what this line means. You said it was stored in a relation, surely that needs to have happened through some command which preceded the commit chronologically, though formally they may have happened atomically. > but then the fact of a value being generated, returned and committed was > lost on crash. The nextval() call used in isolation did not seem to > provide durability. > Are you clarifying the original complaint, or is this a new, different complaint? Vini's test cases don't include any insertions. Do you have test cases that can reproduce your complaint? Cheers, Jeff
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
> On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > htup->t_ctid = target_tid; > htup->t_ctid = newtid; > Both target_tid and newtid are local variable, whe loss scope, memory is > garbage. Ok, thanks for the concrete example of what is bothering you. In htup_details, I see that struct HeapTupleHeaderData has a field named t_ctid of type struct ItemPointerData. I also see in heapam that target_tid is of type ItemPointerData. The htup->t_ctid = target_tid copies the contents of target_tid. By the time target_tid goes out of scope, the contents are already copied. I would share your concern if t_ctid were of type ItemPointer (aka ItemPointerData *) and the code said htup->t_ctid = &target_tid but it doesn't say that, so I don't see the issue. Also in heapam, I see that newtid is likewise of type ItemPointerData, so the same logic applies. By the time newtid goes out of scope, its contents have already been copied into t_ctid, so there is no problem. But maybe you know all that and are just complaining that the name "ItemPointerData" sounds like a pointer rather than a struct? I'm still unclear whether you believe this is a bug, or whether you just don't like the naming that is used. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SEQUENCE values (duplicated) in some corner cases when crash happens
On 2020-May-14, Jeremy Schneider wrote: > "Later stored it in the table" - I'd have to double check with the other > team, but IIUC it was application pseudo-code like this: > > * execute SQL "select nextval()" and store result in > my_local_variable_unique_id > * commit Yes, simply inserting the sequence value in a (logged!) dummy table before this commit, as you suggest, should fix this problem. The insert ensures that the transaction commit is flushed to WAL. The table need not have indexes, making the insert faster. Just make sure to truncate the table every now and then. +1 to documenting this. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
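Concretely, the workaround amounts to something like the following sketch, where seq_durability is a hypothetical plain (logged) table and my_seq a hypothetical sequence; with synchronous_commit enabled, the INSERT forces the commit record, and therefore the consumed sequence value, to be flushed to WAL.

-- One-time setup: an ordinary logged table used only as a durability sink.
CREATE TABLE seq_durability (val bigint);

BEGIN;
SELECT nextval('my_seq');                -- suppose this returns 42
INSERT INTO seq_durability VALUES (42);  -- transaction now writes a logged table
COMMIT;                                  -- the commit, and thus the nextval, survives a crash

-- Every now and then, as suggested above:
TRUNCATE seq_durability;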
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
> On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > Certainly. > In the same file you can find the appropriate use of the API. > ItemPointerSet(&heapTuple->t_self, blkno, offnum); It took a couple reads through your patch to figure out what you were trying to accomplish, and I think you are uncomfortable with assigning one ItemPointerData variable from another. ItemPointerData is just a struct with three int16 variables. To make a standalone program that has the same structure without depending on any postgres headers, I'm using "short int" instead of "int16" and structs "TwoData" and "ThreeData" that are analogous to BlockIdData and OffsetNumber. #include <stdio.h> typedef struct TwoData { short int a; short int b; } TwoData; typedef struct ThreeData { TwoData left; short int right; } ThreeData; int main(int argc, char **argv) { ThreeData x = { { 5, 10 }, 15 }; ThreeData y = x; x.left.a = 0; x.left.b = 1; x.right = 2; printf("y = { { %d, %d }, %d }\n", y.left.a, y.left.b, y.right); return 0; } If you compile and run this, you'll notice it outputs: y = { { 5, 10 }, 15 } and not the { { 0, 1}, 2 } that you would expect if y were merely pointing at x. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Em qui., 14 de mai. de 2020 às 19:23, Mark Dilger < mark.dil...@enterprisedb.com> escreveu: > > > > On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > > > htup->t_ctid = target_tid; > > htup->t_ctid = newtid; > > Both target_tid and newtid are local variable, whe loss scope, memory is > garbage. > > Ok, thanks for the concrete example of what is bothering you. > > In htup_details, I see that struct HeapTupleHeaderData has a field named > t_ctid of type struct ItemPointerData. I also see in heapam that > target_tid is of type ItemPointerData. The > > htup->t_ctid = target_tid > > copies the contents of target_tid. By the time target_tid goes out of > scope, the contents are already copied. I would share your concern if > t_ctid were of type ItemPointer (aka ItemPointerData *) and the code said > > htup->t_ctid = &target_tid > > but it doesn't say that, so I don't see the issue. > Even if the patch simplifies and uses the API to make the assignments. Really, the central problem does not exist, my fault. Perhaps because it has never made such use, structure assignment. And I failed to do research on the subject before. Sorry. > > Also in heapam, I see that newtid is likewise of type ItemPointerData, so > the same logic applies. By the time newtid goes out of scope, its contents > have already been copied into t_ctid, so there is no problem. > > But maybe you know all that and are just complaining that the name > "ItemPointerData" sounds like a pointer rather than a struct? I'm still > unclear whether you believe this is a bug, or whether you just don't like > the naming that is used. > My concerns were about whether attribution really would copy the structure's content and not just its address. The name makes it difficult, but that was not the point. The tool warned about uninitialized variable, which I mistakenly deduced for loss of scope. Thank you very much for the clarifications and your time. We never stopped learning, and using structure assignment was a new learning experience. regards Ranier Vilela
Re: [PATCH] Fix outside scope t_ctid (ItemPointerData)
Em qui., 14 de mai. de 2020 às 19:49, Mark Dilger < mark.dil...@enterprisedb.com> escreveu: > > > > On May 14, 2020, at 11:34 AM, Ranier Vilela wrote: > > > > Certainly. > > In the same file you can find the appropriate use of the API. > > ItemPointerSet(&heapTuple->t_self, blkno, offnum); > > It took a couple reads through your patch to figure out what you were > trying to accomplish, and I think you are uncomfortable with assigning one > ItemPointerData variable from another.ItemPointerData is just a struct > with three int16 variables. To make a standalone program that has the same > structure without depending on any postgres headers, I'm using "short int" > instead of "int16" and structs "TwoData" and "ThreeData" that are analogous > to BlockIdData and OffsetNumber. > > #include > > typedef struct TwoData { > short int a; > short int b; > } TwoData; > > typedef struct ThreeData { > TwoData left; > short int right; > } ThreeData; > > int main(int argc, char **argv) > { > ThreeData x = { { 5, 10 }, 15 }; > ThreeData y = x; > x.left.a = 0; > x.left.b = 1; > x.right = 2; > > printf("y = { { %d, %d }, %d }\n", > y.left.a, y.left.b, y.right); > > return 0; > } > > If you compile and run this, you'll notice it outputs: > > y = { { 5, 10 }, 15 } > > and not the { { 0, 1}, 2 } that you would expect if y were merely pointing > at x. > Thanks for the example. But what I wanted to test was struct1 = struct2; Both being of the same type of structure. What I wrongly deduced was that the address of struct2 was saved and not its content. Again, thanks for your time and clarification. regards, Ranier Vilela
Re: Event trigger code comment duplication
On Wed, May 13, 2020 at 04:48:59PM +0900, Michael Paquier wrote: > Still not sure that's worth bothering. So, let's wait a couple of > days first to see if anybody has any comments, though I'd like to just > go with the simplest solution at hand and remove only the duplicated > comment about the standalone business with event triggers. I have gone through this one again this morning, and applied the simple version as merging the checks on the current event trigger state would make us lose some context about why each event gets skipped. -- Michael signature.asc Description: PGP signature
Re: Event trigger code comment duplication
On Thu, May 14, 2020 at 4:25 PM Michael Paquier wrote: > On Wed, May 13, 2020 at 04:48:59PM +0900, Michael Paquier wrote: > > Still not sure that's worth bothering. So, let's wait a couple of > > days first to see if anybody has any comments, though I'd like to just > > go with the simplest solution at hand and remove only the duplicated > > comment about the standalone business with event triggers. > > I have gone through this one again this morning, and applied the > simple version as merging the checks on the current event trigger > state would make us lose some context about why each event gets > skipped. > > Ok. Thanks! David J.
Re: POC: GROUP BY optimization
Hi, I wonder if anyone has plans to try again with this optimization in v14 cycle? The patches no longer apply thanks to the incremental sort patch, but I suppose fixing that should not be extremely hard. The 2020-07 CF is still a couple weeks away, but it'd be good to know if there are any plans to revive this. I'm willing to spend some time on reviewing / testing this, etc. I've only quickly skimmed the old thread, but IIRC there were two main challenges in getting the optimization right: 1) deciding which orderings are interesting / worth additional work I think we need to consider these orderings, in addition to the one specified in GROUP BY: 1) as specified in ORDER BY (if different from 1) 2) one with cheapest sort on unsorted input (depending on number of distinct values, cost of comparisons, etc.) 3) one with cheapest sort on partially sorted input (essentially what we do with the incremental sort paths, but matching the pathkeys in a more elaborate way) 2) costing the alternative orderings I think we've already discussed various ways to leverage as much available info as possible (extended stats, MCVs, ...) but I think the patch only does some of it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
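As a concrete illustration of the kind of query this optimization targets (the table and columns below are made up): grouping is order-insensitive, so the planner could reorder the GROUP BY keys to let one sort serve both the aggregation and the ORDER BY, or to match an existing (partial) ordering of the input.

-- Written with GROUP BY product, region, but sorting on (region, product)
-- would satisfy the grouping just as well and also satisfy the ORDER BY,
-- avoiding a second sort; an input already sorted on region could instead
-- feed an incremental sort.
SELECT region, product, sum(amount)
  FROM sales
 GROUP BY product, region
 ORDER BY region, product;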
Re: Add A Glossary
Thanks everybody. I have compiled together all the suggestions and the result is in the attached patch. Some of it is of my own devising. * I changed "instance", and made "cluster" be mostly a synonym of that. * I removed "global SQL object" and made "SQL object" explain it. * Added definitions for ACID, sequence, bloat, fork, FSM, VM, data page, transaction ID, epoch. * Changed "a SQL" to "an sql" everywhere. * Sorted alphabetically. * Removed caps in term names. I think I should get this pushed, and if there are further suggestions, they're welcome. Dim Fontaine and others suggested a number of terms that could be included; see https://twitter.com/alvherre/status/1246192786287865856 -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml index f638665dc9..b05c065546 100644 --- a/doc/src/sgml/acronyms.sgml +++ b/doc/src/sgml/acronyms.sgml @@ -766,7 +766,7 @@ XID - Transaction Identifier + Transaction identifier diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index 8c6cb6e942..d4255215aa 100644 --- a/doc/src/sgml/glossary.sgml +++ b/doc/src/sgml/glossary.sgml @@ -7,8 +7,22 @@ + + ACID + + + Atomicity, + consistency, + isolation, and + durability. + A set of properties of database transactions intended to guarantee validity + in concurrent operation and even in event of errors, power failures, etc. + + + + - Aggregate Function + Aggregate function A function that @@ -35,11 +49,15 @@ to make decisions about how to execute queries. + + (Don't confuse this term with the ANALYZE option + to the command.) + - Analytic Function + Analytic function @@ -106,8 +124,8 @@ Process of an instance - which act on behalf of client sessions - and handle their requests. + which acts on behalf of a client session + and handles its requests. (Don't confuse this term with the similar terms @@ -118,7 +136,7 @@ - Background Worker (process) + Background worker (process) Process within an instance, @@ -138,10 +156,11 @@ - Background Writer (process) + Background writer (process) - A process that continuously writes dirty pages from + A process that continuously writes dirty + data pages from shared memory to the file system. It wakes up periodically, but works only for a short period in order to distribute its expensive I/O @@ -155,6 +174,16 @@ + + Bloat + + + Space in data pages which does not contain relevant data, + such as unused (free) space or outdated row versions. + + + + Cast @@ -190,7 +219,7 @@ - Check Constraint + Check constraint A type of constraint @@ -208,15 +237,6 @@ - - Checkpointer (process) - - - A specialized process responsible for executing checkpoints. - - - - Checkpoint @@ -244,6 +264,15 @@ + + Checkpointer (process) + + + A specialized process responsible for executing checkpoints. + + + + Class (archaic) @@ -262,27 +291,6 @@ - - Cluster - - - A group of databases plus their - global SQL objects. The - cluster is managed by exactly one - instance. A newly created - Cluster will have three databases created automatically. They are - template0, template1, and - postgres. It is expected that an application will - create one or more additional database aside from these three. - - - (Don't confuse the PostgreSQL-specific term - Cluster with the SQL - command CLUSTER). - - - - Column @@ -363,7 +371,10 @@ A restriction on the values of data allowed within a - Table. + table, + or in attributes of a + + domain. 
For more information, see @@ -373,18 +384,18 @@ - Data Area + Data area - Data Directory + Data directory The base directory on the filesystem of a server that contains all data files and subdirectories associated with a - cluster with the + instance with the exception of tablespaces. The environment variable PGDATA is commonly used to refer to the @@ -416,15 +427,31 @@ - Database Server + Database server + + Data page + + + The basic structure used to store relation data. + All pages are of the same size. + Data pages are typically stored on disk, each in a specific file, + and can be read to shared buffers + where they can be modified, be
Re: MultiXact\SLRU buffers configuration
At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" wrote in > > GetMultiXactIdMembers believes that 4 is successfully done if 2 > > returned valid offset, but actually that is not obvious. > > > > If we add a single giant lock just to isolate ,say, > > GetMultiXactIdMember and RecordNewMultiXact, it reduces concurrency > > unnecessarily. Perhaps we need finer-grained locking-key for standby > > that works similary to buffer lock on primary, that doesn't cause > > confilicts between irrelevant mxids. > > > We can just replay members before offsets. If offset is already there - > members are there too. > But I'd be happy if we could mitigate those 1000us too - with a hint about > last maixd state in a shared MX state, for example. Generally in such cases, condition variables would work. In the attached PoC, the reader side gets no penalty in the "likely" code path. The writer side always calls ConditionVariableBroadcast but the waiter list is empty in almost all cases. But I couldn't cause the situation where the sleep 1000u is reached. > Actually, if we read empty mxid array instead of something that is replayed > just yet - it's not a problem of inconsistency, because transaction in this > mxid could not commit before we started. ISTM. > So instead of fix, we, probably, can just add a comment. If this reasoning is > correct. The step 4 of the reader side reads the members of the target mxid. It is already written if the offset of the *next* mxid is filled-in. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index e2aa5c9ce4..9db8f6cddd 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -82,6 +82,7 @@ #include "lib/ilist.h" #include "miscadmin.h" #include "pg_trace.h" +#include "pgstat.h" #include "postmaster/autovacuum.h" #include "storage/lmgr.h" #include "storage/pmsignal.h" @@ -233,6 +234,7 @@ typedef struct MultiXactStateData /* support for members anti-wraparound measures */ MultiXactOffset offsetStopLimit; /* known if oldestOffsetKnown */ + ConditionVariable nextoff_cv; /* * Per-backend data starts here. We have two arrays stored in the area * immediately following the MultiXactStateData struct. Each is indexed by @@ -873,6 +875,14 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset, /* Exchange our lock */ LWLockRelease(MultiXactOffsetControlLock); + /* + * Let everybody know the offset of this mxid is recorded now. The waiters + * are waiting for the offset of the mxid next of the target to know the + * number of members of the target mxid, so we don't need to wait for + * members of this mxid are recorded. + */ + ConditionVariableBroadcast(&MultiXactState->nextoff_cv); + LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE); prev_pageno = -1; @@ -1367,9 +1377,19 @@ retry: if (nextMXOffset == 0) { /* Corner case 2: next multixact is still being filled in */ + + /* + * The recorder of the next mxid is just before writing the offset. + * Wait for the offset to be written. 
+ */ + ConditionVariablePrepareToSleep(&MultiXactState->nextoff_cv); + LWLockRelease(MultiXactOffsetControlLock); CHECK_FOR_INTERRUPTS(); - pg_usleep(1000L); + + ConditionVariableSleep(&MultiXactState->nextoff_cv, + WAIT_EVENT_WAIT_NEXT_MXMEMBERS); + ConditionVariableCancelSleep(); goto retry; } @@ -1847,6 +1867,7 @@ MultiXactShmemInit(void) /* Make sure we zero out the per-backend state */ MemSet(MultiXactState, 0, SHARED_MULTIXACT_STATE_SIZE); + ConditionVariableInit(&MultiXactState->nextoff_cv); } else Assert(found); diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 0ecd29a1d9..1ac6b37188 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3879,6 +3879,9 @@ pgstat_get_wait_ipc(WaitEventIPC w) case WAIT_EVENT_SYNC_REP: event_name = "SyncRep"; break; + case WAIT_EVENT_WAIT_NEXT_MXMEMBERS: + event_name = "Mact/WaitNextXactMembers"; + break; /* no default case, so that compiler will warn */ } diff --git a/src/include/pgstat.h b/src/include/pgstat.h index ae9a39573c..e79bba0bef 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -886,7 +886,8 @@ typedef enum WAIT_EVENT_REPLICATION_ORIGIN_DROP, WAIT_EVENT_REPLICATION_SLOT_DROP, WAIT_EVENT_SAFE_SNAPSHOT, - WAIT_EVENT_SYNC_REP + WAIT_EVENT_SYNC_REP, + WAIT_EVENT_WAIT_NEXT_MXMEMBERS } WaitEventIPC; /* --
Re: Add "-Wimplicit-fallthrough" to default flags
On Wed, May 13, 2020 at 10:02 PM Tom Lane wrote: > Andy Fan writes: > >> FWIW, I got a warning for jsonpath_gram.c. > > Ugh. Confirmed here on Fedora 30 (bison 3.0.5). > > > I just found this just several minutes ago. Upgrading your bison to the > > latest version (3.6) is ok. I'd like us to have a better way to share this > > knowledge, though. I spent ~30 minutes troubleshooting this issue. > > I fear that is going to mean that we revert this patch. > We are *NOT* moving the minimum bison requirement for this, > especially not to a bleeding-edge bison version. Yes, I didn't mean to revert the patch, but I was thinking of moving the minimum bison version. But since going down to warning level 3 also resolved the issue, that looks like a better way to do it. (On the other hand, if you have an old bison, > you likely also have an old gcc that doesn't know this warning > switch, so maybe it'd be all right in practice?) > > I just use an old bison and a newer gcc :( and I used "echo "COPT=-Wall -Werror" > src/Makefile.custom" which is the same as our cfbot system. Thank you all for the quick fix! Best Regards Andy Fan
Re: Add explanations which are influenced by track_io_timing
Thanks for reviewing! On Wed, May 13, 2020 at 11:27 PM Fujii Masao wrote: > What about the attached patch based on yours? It looks better. Regards, -- Atsushi Torikoshi
Re: Add A Glossary
On Thu, May 14, 2020 at 08:00:17PM -0400, Alvaro Herrera wrote: > + ACID > + > + > + Atomicity, > + consistency, > + isolation, and > + durability. > + A set of properties of database transactions intended to guarantee > validity > + in concurrent operation and even in event of errors, power failures, > etc. I would capitalize Consistency, Isolation, Durability, and say "These four properties" or "This set of four properties" (althought that makes this sounds more like a fun game of DBA jeopardy). > + Background writer (process) > > > - A process that continuously writes dirty pages from > + A process that continuously writes dirty I don't like "continuously" > + data pages from > > + > + Bloat > + > + > + Space in data pages which does not contain relevant data, > + such as unused (free) space or outdated row versions. "current row versions" instead of relevant ? > + > + Data page > + > + > + The basic structure used to store relation data. > + All pages are of the same size. > + Data pages are typically stored on disk, each in a specific file, > + and can be read to shared > buffers > + where they can be modified, becoming > + dirty. They get clean by being written down say "They become clean when written to disk" > + to disk. New pages, which initially exist in memory only, are also > + dirty until written. > + > + Fork > + > + > + Each of the separate segmented file sets that a relation stores its > + data in. There exist a main fork and two > secondary "in which a relation's data is stored" > + forks: the free space map > + visibility map. missing "and" ? > + > + Free space map (fork) > + > + > + A storage structure that keeps metadata about each data page in a > table's > + main storage space. s/in/of/ just say "main fork"? > The free space map entry for each space stores the for each page ? > + amount of free space that's available for future tuples, and is > structured > + so it is efficient to search for available space for a new tuple of a > given > + size. ..to be efficiently searched to find free space.. > The heap is realized within > - segment files. > + segmented files > + in the relation's main > fork. Hm, the files aren't segmented. Say "one or more file segments per relation" > + There also exist local objects that do not belong to schemas; some > examples are > + extensions, > + data type casts, and > + foreign data > wrappers. Don't extensions have schemas ? > + > + Transaction ID > + > + > + The numerical, unique, sequentially-assigned identifier that each > + transaction receives when it first causes a database modification. > + Frequently abbreviated xid. abbreviated *as* xid > + approximately four billion write transactions IDs can be generated; > + to permit the system to run for longer than that would allow, remove "would allow" > > The process of removing outdated linkend="glossary-tuple">tuple > versions from tables, and other closely related actually tables or materialized views.. > + > + Visibility map (fork) > + > + > + A storage structure that keeps metadata about each data page > + in a table's main storage space. The visibility map entry for s/in/of/ main fork? -- Justin
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, 15 May 2020 at 03:08, Muhammad Usama wrote: > > > Hi Sawada, > > I have just done some review and testing of the patches and have > a couple of comments. Thank you for reviewing! > > 1- IMHO the PREPARE TRANSACTION should always use 2PC even > when the transaction has operated on a single foreign server regardless > of foreign_twophase_commit setting, and throw an error otherwise when > 2PC is not available on any of the data-modified servers. > > For example, consider the case > > BEGIN; > INSERT INTO ft_2pc_1 VALUES(1); > PREPARE TRANSACTION 'global_x1'; > > Here since we are preparing the local transaction so we should also prepare > the transaction on the foreign server even if the transaction has modified > only > one foreign table. > > What do you think? Good catch and I agree with you. The transaction should fail if it opened a transaction on a 2pc-no-support server regardless of foreign_twophase_commit. And I think we should prepare a transaction on a foreign server even if it didn't modify any data on that. > > Also without this change, the above test case produces an assertion failure > with your patches. > > 2- when deciding if the two-phase commit is required or not in > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use > 2PC when we have at least one server capable of doing that. > > i.e > > For FOREIGN_TWOPHASE_COMMIT_PREFER case in > checkForeignTwophaseCommitRequired() function I think > the condition should be > > need_twophase_commit = (nserverstwophase >= 1); > instead of > need_twophase_commit = (nserverstwophase >= 2); > Hmm I might be missing your point but it seems to me that you want to use two-phase commit even in the case where a transaction modified data on only one server. Can't we commit distributed transaction atomically even using one-phase commit in that case? Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_regress cleans up tablespace twice.
On Mon, May 11, 2020 at 05:13:54PM +0900, Kyotaro Horiguchi wrote: > Tablespace directory cleanup is not done for all testing > targets. Actually it is not done for the tools under bin/ except > pg_upgrade. Let's first take one problem at a time, as I can see that your patch 0002 is modifying a portion of what you added in 0001, and so let's try to remove this WIN32 stuff from pg_regress.c. +sub CleanupTablespaceDirectory +{ + my $tablespace = 'testtablespace'; + + rmtree($tablespace) if (-e $tablespace); + mkdir($tablespace); +} This check should use "-d" and not "-e" as it would be true for a file as well. Also, in pg_regress.c, we remove the existing tablespace test directory in --outputdir, which is "." by default but it can be a custom one. Shouldn't you do the same logic in this new routine? So we should have an optional argument for the output directory that defaults to `pwd` if not defined, no? This means passing down the argument only for upgradecheck() in vcregress.pl. sub isolationcheck { chdir "../isolation"; + CleanupTablespaceDirectory(); copy("../../../$Config/isolationtester/isolationtester.exe", "../../../$Config/pg_isolation_regress"); my @args = ( [...] print "\n"; print "Checking $module\n"; + CleanupTablespaceDirectory(); my @args = ( "$topdir/$Config/pg_regress/pg_regress", "--bindir=${topdir}/${Config}/psql", I would put that just before the system() calls for consistency with the rest. -- Michael signature.asc Description: PGP signature
Re: pg_regress cleans up tablespace twice.
On Fri, May 15, 2020 at 11:58:55AM +0900, Michael Paquier wrote: > Let's first take one problem at a time, as I can see that your patch > 0002 is modifying a portion of what you added in 0001, and so let's > try to remove this WIN32 stuff from pg_regress.c. (Please note that this is not v13 material.) -- Michael signature.asc Description: PGP signature
Fix a typo in slot.c
Hi, I've attached the patch for $subject. The old comment seems to be borrowed from WalSndShmemInit(). Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services fix_typo.patch Description: Binary data
Re: Parallel copy
On Fri, May 15, 2020 at 1:51 AM Robert Haas wrote: > > On Thu, May 14, 2020 at 2:18 AM Amit Kapila wrote: > > To support that, we need to consider a few things. > > a. Currently, we increment the command counter each time we take a key > > share lock on a tuple during trigger execution. I am really not sure > > if this is required during Copy command execution or we can just > > increment it once for the copy. If we need to increment the command > > counter just once for copy command then for the parallel copy we can > > ensure that we do it just once at the end of the parallel copy but if > > not then we might need some special handling. > > My sense is that it would be a lot more sensible to do it at the > *beginning* of the parallel operation. Once we do it once, we > shouldn't ever do it again; that's how it works now. Deferring it > until later seems much more likely to break things. > AFAIU, we always increment the command counter after executing the command. Why do we want to do it differently here? > > b. Another point is that after inserting rows we record CTIDs of the > > tuples in the event queue and then once all tuples are processed we > > call FK trigger for each CTID. Now, with parallelism, the FK checks > > will be processed once the worker processed one chunk. I don't see > > any problem with it but still, this will be a bit different from what > > we do in serial case. Do you see any problem with this? > > I think there could be some problems here. For instance, suppose that > there are two entries for different workers for the same CTID. > First, let me clarify the CTID I have used in my email are for the table in which insertion is happening which means FK table. So, in such a case, we can't have the same CTIDs queued for different workers. Basically, we use CTID to fetch the row from FK table later and form a query to lock (in KEY SHARE mode) the corresponding tuple in PK table. Now, it is possible that two different workers try to lock the same row of PK table. I am not clear what problem group locking can have in this case because these are non-conflicting locks. Can you please elaborate a bit more? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
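For reference, the row lock mentioned above is taken by a statement that the RI trigger machinery builds internally for each queued FK event; it has roughly the shape below (table and column names are illustrative, and $1 is the parameter filled in from the queued tuple). These KEY SHARE locks on PK rows are the non-conflicting locks being discussed.

-- Approximate form of the internal query issued by ri_triggers.c for an
-- INSERT/UPDATE on the referencing (FK) table: it fetches nothing, it only
-- locks the referenced PK row so its key cannot change concurrently.
SELECT 1 FROM ONLY public.pk_table x
  WHERE x.id = $1
  FOR KEY SHARE OF x;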
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > On Fri, 15 May 2020 at 03:08, Muhammad Usama wrote: > > > > > > Hi Sawada, > > > > I have just done some review and testing of the patches and have > > a couple of comments. > > Thank you for reviewing! > > > > > 1- IMHO the PREPARE TRANSACTION should always use 2PC even > > when the transaction has operated on a single foreign server regardless > > of foreign_twophase_commit setting, and throw an error otherwise when > > 2PC is not available on any of the data-modified servers. > > > > For example, consider the case > > > > BEGIN; > > INSERT INTO ft_2pc_1 VALUES(1); > > PREPARE TRANSACTION 'global_x1'; > > > > Here since we are preparing the local transaction so we should also > prepare > > the transaction on the foreign server even if the transaction has > modified only > > one foreign table. > > > > What do you think? > > Good catch and I agree with you. The transaction should fail if it > opened a transaction on a 2pc-no-support server regardless of > foreign_twophase_commit. And I think we should prepare a transaction > on a foreign server even if it didn't modify any data on that. > > > > > Also without this change, the above test case produces an assertion > failure > > with your patches. > > > > 2- when deciding if the two-phase commit is required or not in > > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use > > 2PC when we have at least one server capable of doing that. > > > > i.e > > > > For FOREIGN_TWOPHASE_COMMIT_PREFER case in > > checkForeignTwophaseCommitRequired() function I think > > the condition should be > > > > need_twophase_commit = (nserverstwophase >= 1); > > instead of > > need_twophase_commit = (nserverstwophase >= 2); > > > > Hmm I might be missing your point but it seems to me that you want to > use two-phase commit even in the case where a transaction modified > data on only one server. Can't we commit distributed transaction > atomically even using one-phase commit in that case? > > I think you are confusing between nserverstwophase and nserverswritten. need_twophase_commit = (nserverstwophase >= 1) would mean use two-phase commit if at least one server exists in the list that is capable of doing 2PC For the case when the transaction modified data on only one server we already exits the function indicating no two-phase required if (nserverswritten <= 1) return false; > Regards, > > -- > Masahiko Sawadahttp://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > Regards, ... Muhammad Usama Highgo Software (Canada/China/Pakistan) URL : http://www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC
Re: Fix a typo in slot.c
On Fri, May 15, 2020 at 9:16 AM Masahiko Sawada wrote: > > Hi, > > I've attached the patch for $subject. The old comment seems to be > borrowed from WalSndShmemInit(). > /* - * Allocate and initialize walsender-related shared memory. + * Allocate and initialize replication slots' shared memory. */ How about changing it to "Allocate and initialize shared memory for replication slots"? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Fix a typo in slot.c
On Fri, 15 May 2020 at 13:26, Amit Kapila wrote: > > On Fri, May 15, 2020 at 9:16 AM Masahiko Sawada > wrote: > > > > Hi, > > > > I've attached the patch for $subject. The old comment seems to be > > borrowed from WalSndShmemInit(). > > > > /* > - * Allocate and initialize walsender-related shared memory. > + * Allocate and initialize replication slots' shared memory. > */ > > How about changing it to "Allocate and initialize shared memory for > replication slots"? > Agreed. Attached the updated patch. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services fix_typo_v2.patch Description: Binary data
Re: COPY, lock release and MVCC
On Thu, May 14, 2020 at 7:10 PM Laurenz Albe wrote: > > On Thu, 2020-05-14 at 15:11 +0530, Amit Kapila wrote: > > LGTM. I have slightly modified the commit message, see if that looks > > fine to you. > > Fine with me, thanks. > > > This breaks the cases where a REPEATABLE READ transaction could see an > > empty table if it repeats a COPY statement and somebody truncated the > > table in the meantime. > > I would use "case" rather than "cases" here. > Okay, changed, and pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Transactions involving multiple postgres foreign servers, take 2
On Fri, 15 May 2020 at 13:26, Muhammad Usama wrote: > > > > On Fri, May 15, 2020 at 7:20 AM Masahiko Sawada > wrote: >> >> On Fri, 15 May 2020 at 03:08, Muhammad Usama wrote: >> > >> > >> > Hi Sawada, >> > >> > I have just done some review and testing of the patches and have >> > a couple of comments. >> >> Thank you for reviewing! >> >> > >> > 1- IMHO the PREPARE TRANSACTION should always use 2PC even >> > when the transaction has operated on a single foreign server regardless >> > of foreign_twophase_commit setting, and throw an error otherwise when >> > 2PC is not available on any of the data-modified servers. >> > >> > For example, consider the case >> > >> > BEGIN; >> > INSERT INTO ft_2pc_1 VALUES(1); >> > PREPARE TRANSACTION 'global_x1'; >> > >> > Here since we are preparing the local transaction so we should also prepare >> > the transaction on the foreign server even if the transaction has modified >> > only >> > one foreign table. >> > >> > What do you think? >> >> Good catch and I agree with you. The transaction should fail if it >> opened a transaction on a 2pc-no-support server regardless of >> foreign_twophase_commit. And I think we should prepare a transaction >> on a foreign server even if it didn't modify any data on that. >> >> > >> > Also without this change, the above test case produces an assertion failure >> > with your patches. >> > >> > 2- when deciding if the two-phase commit is required or not in >> > FOREIGN_TWOPHASE_COMMIT_PREFER mode we should use >> > 2PC when we have at least one server capable of doing that. >> > >> > i.e >> > >> > For FOREIGN_TWOPHASE_COMMIT_PREFER case in >> > checkForeignTwophaseCommitRequired() function I think >> > the condition should be >> > >> > need_twophase_commit = (nserverstwophase >= 1); >> > instead of >> > need_twophase_commit = (nserverstwophase >= 2); >> > >> >> Hmm I might be missing your point but it seems to me that you want to >> use two-phase commit even in the case where a transaction modified >> data on only one server. Can't we commit distributed transaction >> atomically even using one-phase commit in that case? >> > > I think you are confusing between nserverstwophase and nserverswritten. > > need_twophase_commit = (nserverstwophase >= 1) would mean > use two-phase commit if at least one server exists in the list that is > capable of doing 2PC > > For the case when the transaction modified data on only one server we > already exits the function indicating no two-phase required > > if (nserverswritten <= 1) > return false; > Thank you for your explanation. If the transaction modified two servers that don't' support 2pc and one server that supports 2pc I think we don't want to use 2pc even in 'prefer' case. Because even if we use 2pc in that case, it's still possible to have the atomic commit problem. For example, if we failed to commit a transaction after committing other transactions on the server that doesn't support 2pc we cannot rollback the already-committed transaction. On the other hand, in 'prefer' case, if the transaction also modified the local data, we need to use 2pc even if it modified data on only one foreign server that supports 2pc. But the current code doesn't work fine in that case for now. Probably we also need the following change: @@ -540,7 +540,10 @@ checkForeignTwophaseCommitRequired(void) /* Did we modify the local non-temporary data? 
*/ if ((MyXactFlags & XACT_FLAGS_WROTENONTEMPREL) != 0) + { nserverswritten++; + nserverstwophase++; + } Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: COPY, lock release and MVCC
On Fri, 2020-05-15 at 10:11 +0530, Amit Kapila wrote: > Okay, changed, and pushed. Thank you! Yours, Laurenz Albe
Re: Problem with logical replication
On Tue, May 12, 2020 at 09:45:45PM -0300, Euler Taveira wrote: >> In any case, it seems to me that the comment of >> build_replindex_scan_key needs to be updated. >> >> * This is not generic routine, it expects the idxrel to be replication >> * identity of a rel and meet all limitations associated with that. > > It is implicit that a primary key can be a replica identity so I think this > comment is fine. Agreed. I don't think either that we need to update this comment. I was playing with this patch and what you have here looks fine by me. Two nits: the extra parenthesis in the assert are not necessary, and the indentation had some diffs. Tom has just reindented the whole tree, so let's keep things clean. -- Michael diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index 1418746eb8..8f474faed0 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -57,7 +57,8 @@ build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel, int2vector *indkey = &idxrel->rd_index->indkey; bool hasnulls = false; - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel)); + Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) || + RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel)); indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple, Anum_pg_index_indclass, &isnull); diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl index 6da7f71ca3..8c3dee8ffc 100644 --- a/src/test/subscription/t/001_rep_changes.pl +++ b/src/test/subscription/t/001_rep_changes.pl @@ -34,6 +34,8 @@ $node_publisher->safe_psql('postgres', $node_publisher->safe_psql('postgres', "CREATE TABLE tab_include (a int, b text, CONSTRAINT covering PRIMARY KEY(a) INCLUDE(b))" ); +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab_full_pk (a int primary key, b text)"); # Let this table with REPLICA IDENTITY NOTHING, allowing only INSERT changes. 
$node_publisher->safe_psql('postgres', "CREATE TABLE tab_nothing (a int)"); $node_publisher->safe_psql('postgres', @@ -46,6 +48,10 @@ $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full (a int)"); $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_full2 (x text)"); $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_rep (a int primary key)"); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab_full_pk (a int primary key, b text)"); +$node_subscriber->safe_psql('postgres', + "ALTER TABLE tab_full_pk REPLICA IDENTITY FULL"); $node_subscriber->safe_psql('postgres', "CREATE TABLE tab_nothing (a int)"); # different column count and order than on publisher @@ -64,7 +70,7 @@ $node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub"); $node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub_ins_only WITH (publish = insert)"); $node_publisher->safe_psql('postgres', - "ALTER PUBLICATION tap_pub ADD TABLE tab_rep, tab_full, tab_full2, tab_mixed, tab_include, tab_nothing" + "ALTER PUBLICATION tap_pub ADD TABLE tab_rep, tab_full, tab_full2, tab_mixed, tab_include, tab_nothing, tab_full_pk" ); $node_publisher->safe_psql('postgres', "ALTER PUBLICATION tap_pub_ins_only ADD TABLE tab_ins"); @@ -102,6 +108,9 @@ $node_publisher->safe_psql('postgres', "UPDATE tab_rep SET a = -a"); $node_publisher->safe_psql('postgres', "INSERT INTO tab_mixed VALUES (2, 'bar', 2.2)"); +$node_publisher->safe_psql('postgres', + "INSERT INTO tab_full_pk VALUES (1, 'foo')"); + $node_publisher->safe_psql('postgres', "INSERT INTO tab_nothing VALUES (generate_series(1,20))"); @@ -159,6 +168,8 @@ $node_publisher->safe_psql('postgres', "UPDATE tab_full2 SET x = 'bb' WHERE x = 'b'"); $node_publisher->safe_psql('postgres', "UPDATE tab_mixed SET b = 'baz' WHERE a = 1"); +$node_publisher->safe_psql('postgres', + "UPDATE tab_full_pk SET b = 'bar' WHERE a = 1"); $node_publisher->wait_for_catchup('tap_sub'); signature.asc Description: PGP signature
pg_bsd_indent and -Wimplicit-fallthrough
Hi, I have just noticed that pg_bsd_indent complains since -Wimplicit-fallthrough=3 has been added to the default set of switches if available. Something like the attached is fine to take care of those warnings, but what's our current patching policy for this tool? Thanks, -- Michael diff --git a/indent.c b/indent.c index 9faf57a..5da3401 100644 --- a/indent.c +++ b/indent.c @@ -917,6 +917,7 @@ check_type: case structure: if (ps.p_l_follow > 0) goto copy_id; + /* FALLTHROUGH */ case decl: /* we have a declaration type (int, etc.) */ parse(decl); /* let parser worry about indentation */ if (ps.last_token == rparen && ps.tos <= 1) { diff --git a/parse.c b/parse.c index a51eb8b..bf6b169 100644 --- a/parse.c +++ b/parse.c @@ -100,6 +100,7 @@ parse(int tk) /* tk: the code for the construct scanned */ */ ps.i_l_follow = ps.il[ps.tos--]; /* the rest is the same as for dolit and forstmt */ + /* FALLTHROUGH */ case dolit: /* 'do' */ case forstmt: /* for (...) */ ps.p_stack[++ps.tos] = tk; signature.asc Description: PGP signature
Re: documenting the backup manifest file format
On 2020/04/15 5:33, Andrew Dunstan wrote: On 4/14/20 4:09 PM, Alvaro Herrera wrote: On 2020-Apr-14, Andrew Dunstan wrote: OK, but I think if we're putting a timestamp string in ISO-8601 format in the manifest it should be in UTC / Zulu time, precisely to avoid these issues. If that's too much trouble then yes an epoch time will probably do. The timestamp is always specified and always UTC (except the code calls it GMT). + /* +* Convert last modification time to a string and append it to the +* manifest. Since it's not clear what time zone to use and since time +* zone definitions can change, possibly causing confusion, use GMT +* always. +*/ + appendStringInfoString(&buf, "\"Last-Modified\": \""); + enlargeStringInfo(&buf, 128); + buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%d %H:%M:%S %Z", + pg_gmtime(&mtime)); + appendStringInfoString(&buf, "\""); I was merely saying that it's trivial to make this iso-8601 compliant as buf.len += pg_strftime(&buf.data[buf.len], 128, "%Y-%m-%dT%H:%M:%SZ", ie. omit the "GMT" string and replace it with a literal Z, and remove the space and replace it with a T. I have one question related to this; Why don't we use log_timezone, like backup_label? log_timezone is used for "START TIME" field in backup_label. Sorry if this was already discussed. /* Use the log timezone here, not the session timezone */ stamp_time = (pg_time_t) time(NULL); pg_strftime(strfbuf, sizeof(strfbuf), "%Y-%m-%d %H:%M:%S %Z", pg_localtime(&stamp_time, log_timezone)); OTOH, *if* we want to use the same timezone for backup-related files because backup can be used in different environements and timezone setting may be different there or for other reasons, backup_label also should use GMT or something for the sake of consistency? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
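For readers comparing the two formats under discussion, the difference in output is only cosmetic; the sketch below shows both renderings using to_char (the manifest code itself formats at the C level with pg_strftime, as quoted above).

-- "%Y-%m-%d %H:%M:%S %Z" over pg_gmtime()  produces e.g.  2020-05-15 03:33:19 GMT
-- "%Y-%m-%dT%H:%M:%SZ"   over pg_gmtime()  produces e.g.  2020-05-15T03:33:19Z
SELECT to_char(now() AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS "GMT"') AS current_manifest_style,
       to_char(now() AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')  AS iso_8601_style;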
Re: pg_bsd_indent and -Wimplicit-fallthrough
On Fri, May 15, 2020 at 8:03 AM Michael Paquier wrote: > > Hi, > > I have just noticed that pg_bsd_indent complains since > -Wimplicit-fallthrough=3 has been added to the default set of switches > if available. Oh Indeed. > Something like the attached is fine to take care of those warnings, > but what's our current patching policy for this tool? The patch looks good to me. It looks like we already have custom patches, so +1 to applying it.
Re: PG 13 release notes, first draft
On 2020/05/05 12:16, Bruce Momjian wrote: I have committed the first draft of the PG 13 release notes. You can see them here: https://momjian.us/pgsql_docs/release-13.html It still needs markup, word wrap, and indenting. The community doc build should happen in a few hours. Many thanks for working on this! When I did "make html", I got the following message. Link element has no content and no Endterm. Nothing to show in the link to sepgsql "Allow to control access to the" in release-13.sgml seems to have caused this. Also I found it's converted into "Allow ??? to control access to the", i.e., ??? was used. - Allow to control access to the + Allow sepgsql to control access to the Shouldn't we change that as the above? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION