Re: vacuumdb and new VACUUM options
On Wed, May 08, 2019 at 09:26:35AM +0900, Masahiko Sawada wrote:
> I think it's a good idea to add new options of these parameters for
> vacuumdb. While making INDEX_CLEANUP option patch I also attached the
> patch for INDEX_CLEANUP parameter before[1], although it adds
> --disable-index-cleanup option instead.

I have added an open item for that. I think that we should add these
options.
--
Michael
signature.asc Description: PGP signature
Re: copy-past-o comment in lock.h
On Tue, May 07, 2019 at 04:12:31PM +0800, John Naylor wrote: > That's probably better. Would you like to send an updated patch? Perhaps you have a better idea? -- Michael signature.asc Description: PGP signature
Inconsistency between table am callback and table function names
The general theme for table function names seems to be "table_". For example, table_scan_getnextslot() and its corresponding callback scan_getnextslot(). Most of the table functions and callbacks follow that convention, except the following ones:

table_beginscan
table_endscan
table_rescan
table_fetch_row_version
table_get_latest_tid
table_insert
table_insert_speculative
table_complete_speculative
table_delete
table_update
table_lock_tuple

The corresponding callback names for them are:

scan_begin
scan_end
scan_rescan
tuple_fetch_row_version
tuple_get_latest_tid
tuple_insert
tuple_insert_speculative
tuple_delete
tuple_update
tuple_lock

This is confusing while browsing through the code, so I would like to propose that we make them consistent. Either fix the callback names or the table function names, but all should follow the same convention; that makes them easy to browse and to refer to. Personally, I would say fix the table function names, as the callback names seem fine. So, for example, make it table_scan_begin().

Also, some of these table function names read a little oddly:

table_relation_set_new_filenode
table_relation_nontransactional_truncate
table_relation_copy_data
table_relation_copy_for_cluster
table_relation_vacuum
table_relation_estimate_size

Can we drop the word "relation" from the callback names and, as a result, from these function names as well? That is, just have callback names like set_new_filenode, copy_data, estimate_size?

Also, a question about comments. Currently, redundant comments are written above the callback functions as well as above the table functions. They sometimes differ a little in their descriptions, but the majority of the content is the same. Should we settle on a single place for these comments, to keep them in sync and to know which one to refer to?

Plus, the file name amapi.h now seems very broad; if possible it should be renamed to indexamapi.h or indexam.h to follow tableam.h. No idea what our policy is around header file renames.
Re: Inconsistent error message wording for REINDEX CONCURRENTLY
On Tue, May 07, 2019 at 05:19:38PM -0400, Tom Lane wrote:
> That's nothing but a hack, and the reason it's necessary is that
> index_create will throw error if IsCatalogRelation is true, which
> it will be for information_schema TOAST tables --- but not for their
> parent tables that are being examined here.

Oh. Good catch. That's indeed crazy now that I look closer at that.

> After looking around, it seems to me that the correct definition for
> IsCatalogRelation is just "is the OID less than FirstBootstrapObjectId?".
> Currently we could actually restrict it to "less than
> FirstGenbkiObjectId", because all the catalogs, indexes, and TOAST tables
> have hand-assigned OIDs --- but perhaps someday we'll let genbki.pl
> assign some of those OIDs, so I prefer the weaker constraint. In any
> case, this gives us a correct separation between objects that are
> traceable to the bootstrap data and those that are created by plain SQL
> later in initdb.
>
> With this, the Form_pg_class argument to IsCatalogClass becomes
> vestigial. I'm tempted to get rid of that function altogether in
> favor of direct calls to IsCatalogRelationOid, but haven't done so
> in the attached.

I think that removing IsCatalogClass() entirely is just better: if any extension uses this routine, it could potentially simplify its code, because needing a Form_pg_class usually means opening a Relation, and that could then be avoided. With IsCatalogClass() removed, the only dependency on Form_pg_class comes from IsToastClass(), which is not used at all except in IsSystemClass(). Wouldn't it be better to remove IsToastClass() entirely and switch IsSystemClass() to use a namespace OID instead of a Form_pg_class?

With your patch, ReindexRelationConcurrently() does not complain about REINDEX TABLE CONCURRENTLY on a catalog table and would trigger the error from index_create(), which is at the origin of this thread. The check with IsSharedRelation() for REINDEX INDEX CONCURRENTLY is useless, and the error message generated for IsCatalogRelationOid() still needs to be improved. Would you prefer to include those changes in your patch? Or should I work on top of what you are proposing (your patch does not include negative tests for toast indexes and tables on catalogs either)?
--
Michael
signature.asc Description: PGP signature
Re: copy-past-o comment in lock.h
On Wed, May 8, 2019 at 3:10 PM Michael Paquier wrote:
> > On Tue, May 07, 2019 at 04:12:31PM +0800, John Naylor wrote:
> > That's probably better.
>
> Would you like to send an updated patch? Perhaps you have a better
> idea?
> --
> Michael

In the attached, I've used your language, and also moved the comments closer to the code they are describing. That seems more logical and future proof.
--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
locktag-comment-v2.patch Description: Binary data
Re: copy-past-o comment in lock.h
On Wed, May 08, 2019 at 03:59:36PM +0800, John Naylor wrote:
> In the attached, I've used your language, and also moved the comments
> closer to the code they are describing. That seems more logical and
> future proof.

Good idea to move the comments, so what you propose looks fine to me. Are there any objections?
--
Michael
signature.asc Description: PGP signature
Re: Regression test PANICs with master-standby setup on same machine
At Tue, 7 May 2019 10:55:06 +0900, Michael Paquier wrote in <20190507015506.gc1...@paquier.xyz>
> On Tue, May 07, 2019 at 10:16:54AM +0900, Kyotaro HORIGUCHI wrote:
> > The fake symlinks need correction after the data directory and
> > tablespace directory are moved. Maybe needs to call
> > CorrectSymlink() or something at startup... Or relative
> > tablespaces should be rejected on Windows?
>
> It took enough sweat and tears to have an implementation with junction
> points done correctly on Windows and we know that it works, so I am

Indeed. It is very ill documented and complex.

> not sure that we need an actual wrapper for readlink() and such for
> the backend code to replace junction points. The issue with Windows
> is that perl's symlink() is not directly available on Windows.

Ugh. If we want to run tablespace-related tests involving replication on Windows, we need to make the tests use absolute tablespace paths. Period...?

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: vacuumdb and new VACUUM options
On Wed, May 8, 2019 at 9:06 AM Michael Paquier wrote:
>
> On Wed, May 08, 2019 at 09:26:35AM +0900, Masahiko Sawada wrote:
> > I think it's a good idea to add new options of these parameters for
> > vacuumdb. While making INDEX_CLEANUP option patch I also attached the
> > patch for INDEX_CLEANUP parameter before[1], although it adds
> > --disable-index-cleanup option instead.
>
> I have added an open item for that. I think that we should add
> these options.

+1, and thanks for adding the open item!
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
On Mon, May 6, 2019 at 3:30 PM Thomas Munro wrote: > I put the second sentence back and tweaked it thus: s/fails/might > fail/. Maybe I'm being too pedantic here, but binding to 127.0.0.2 > works on other OSes too, as long as you've configured an interface or > alias for it (and it's not terribly uncommon to do so). Here's a > version with a commit message. Please let me know if you want to > tweak the comments further. Pushed, with a further adjustment to the comment. -- Thomas Munro https://enterprisedb.com
Re: Statistical aggregate functions are not working with PARTIAL aggregation
On Wed, May 8, 2019 at 12:09 AM Kyotaro HORIGUCHI wrote: > By the way, as mentioned above, this issue exists since 11 but > harms at 12. Is this an open item, or older bug? Sounds more like an open item to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Inconsistent error message wording for REINDEX CONCURRENTLY
Michael Paquier writes: > On Tue, May 07, 2019 at 05:19:38PM -0400, Tom Lane wrote: >> With this, the Form_pg_class argument to IsCatalogClass becomes >> vestigial. I'm tempted to get rid of that function altogether in >> favor of direct calls to IsCatalogRelationOid, but haven't done so >> in the attached. > I think that removing entirely IsCatalogClass() is just better as if > any extension uses this routine, then it could potentially simplify > its code because needing Form_pg_class means usually opening a > Relation, and this can be removed. Yeah, it's clearly easier to use without the extra argument. > With IsCatalogClass() removed, the only dependency with Form_pg_class > comes from IsToastClass() which is not used at all except in > IsSystemClass(). Wouldn't it be better to remove entirely > IsToastClass() and switch IsSystemClass() to use a namespace OID > instead of Form_pg_class? Not sure. The way it's defined has the advantage of being more independent of exactly what the implementation of the "is a toast table" check is. Also, I looked around to see if any callers could really be simplified if they only had to pass the table OID, and didn't find much; almost all of them are looking at the pg_class tuple themselves, typically to check the relkind too. So we'd not make any net savings in syscache lookups by changing IsSystemClass's API. I'm kind of inclined to leave it alone. > With your patch, ReindexRelationConcurrently() does not complain for > REINDEX TABLE CONCURRENTLY for a catalog table and would trigger the > error from index_create(), which is at the origin of this thread. The > check with IsSharedRelation() for REINDEX INDEX CONCURRENTLY is > useless and the error message generated for IsCatalogRelationOid() > still needs to be improved. Would you prefer to include those changes > in your patch? Or should I work on top of what you are proposing > (your patch does not include negative tests for toast index and > tables on catalogs either). Yes, we still need to do your patch on top of this one (or really either order would do). I think keeping them separate is good. BTW, when I was looking at this I got dissatisfied about another aspect of the wording of the relevant error messages: a lot of them are like, for example errmsg("cannot reindex concurrently this type of relation"))); While that matches the command syntax we're using, it's just horrid English grammar. Better would be errmsg("cannot reindex this type of relation concurrently"))); Can we change that while we're at it? regards, tom lane
Re: Heap lock levels for REINDEX INDEX CONCURRENTLY not quite right?
On 2019-05-07 05:07, Michael Paquier wrote: > On Sat, May 04, 2019 at 09:59:20PM +0900, Michael Paquier wrote: >> The result should be no deadlocks happening in the two sessions >> running the reindex. I can see the deadlock easily with three psql >> sessions, running manually the queries. > > +* If the OID isn't valid, it means the index as concurrently dropped, > +* which is not a problem for us; just return normally. > Typo here s/as/is/. > > I have looked closer at the patch and the change proposed looks good > to me. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Inconsistent error message wording for REINDEX CONCURRENTLY
On Wed, May 08, 2019 at 08:31:54AM -0400, Tom Lane wrote: > Michael Paquier writes: >> With IsCatalogClass() removed, the only dependency with Form_pg_class >> comes from IsToastClass() which is not used at all except in >> IsSystemClass(). Wouldn't it be better to remove entirely >> IsToastClass() and switch IsSystemClass() to use a namespace OID >> instead of Form_pg_class? > > Not sure. The way it's defined has the advantage of being more > independent of exactly what the implementation of the "is a toast table" > check is. Also, I looked around to see if any callers could really be > simplified if they only had to pass the table OID, and didn't find much; > almost all of them are looking at the pg_class tuple themselves, typically > to check the relkind too. So we'd not make any net savings in syscache > lookups by changing IsSystemClass's API. I'm kind of inclined to leave > it alone. Hmm. Okay. It would have been nice to remove completely the dependency to Form_pg_class from this set of APIs, but I can see your point. > Yes, we still need to do your patch on top of this one (or really > either order would do). I think keeping them separate is good. Okay, glad to hear. That's what I wanted to do. > While that matches the command syntax we're using, it's just horrid > English grammar. Better would be > > errmsg("cannot reindex this type of relation > concurrently"))); > > Can we change that while we're at it? No problem to do that. I'll brush up all that once you commit the first piece you have come up with, and reuse the new API of catalog.c you are introducing based on the table OID. -- Michael signature.asc Description: PGP signature
Re: accounting for memory used for BufFile during hash joins
On Tue, May 07, 2019 at 05:43:56PM -0700, Melanie Plageman wrote:

On Tue, May 7, 2019 at 6:59 AM Tomas Vondra wrote:

On Tue, May 07, 2019 at 04:28:36PM +1200, Thomas Munro wrote:
>On Tue, May 7, 2019 at 3:15 PM Tomas Vondra > wrote:
>> On Tue, May 07, 2019 at 01:48:40PM +1200, Thomas Munro wrote:
>> Switching to some other algorithm during execution moves the goal posts
>> to the next galaxy, I'm afraid.
>
>The main problem I'm aware of with sort-merge join is: not all that is
>hashable is sortable. So BNL is actually the only solution I'm aware
>of for problem B that doesn't involve changing a fundamental thing
>about PostgreSQL's data type requirements.
>

Sure, each of those algorithms has limitations. But I think that's mostly irrelevant to the main issue - switching between algorithms mid-execution. At that point some of the tuples might have already been sent to the other nodes, and I have no idea how to "resume" the tuple stream short of buffering everything locally until the join completes. And that would be rather terrible, I guess.

What if you switched to NLJ on a batch-by-batch basis and did it before starting execution of the join but after building the inner side of the hash table? That way, no tuples will have been sent to other nodes yet.

Interesting idea! I think you're right that doing it on a per-batch basis would solve that problem.

Essentially, if all (or >95%) of the tuples have the same hash value, we could switch to a special "degraded" mode doing something like a NL. At that point the hash table benefits are lost anyway, because all the tuples are in a single chain, so it's not going to be much slower.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Tue, May 7, 2019 at 2:10 AM Masahiko Sawada wrote: > > > That better not be true. If you have a design where reading the WAL > > > lets you get *any* encryption key, you have a bad design, I think. > > How does the startup process decrypt WAL during recovery without > getting any encryption key if we encrypt user data in WAL by multiple > encryption keys? The keys have to be supplied from someplace outside of the database system. I am imagining a command that gets run with the key ID as an argument and is expected to print the key out on standard output for the server to read. I am not an encryption expert, but it's hard for me to imagine this working any other way. I mean, if you store the keys that you need for decryption inside the database, isn't that the same as storing your house key in your house, or your car key in your car? If you store your car key in the car, then either the car is locked from the outside, and the key is useless to you, or the car is unlocked from the outside, and the key is just as available to a thief as it is to you. Either way, it provides no security. What you do is keep your car key in your pocket or purse; if you try to start the car, it "requests" the key from you as proof that you are entitled to start it. I think the database has to work similarly, except that rather than protecting the act of "starting" the database, each key is requested the first time it's needed, when it's discovered that we need to decrypt some data encrypted with that key. > > > Well, what threat are you trying to protect against? > > Data theft bypassing PostgreSQL's ACL, for example a malicious user > thefts storage devices and reads datbase files directly. > > I'm thinking that only users who have an access privilege of the > database object can get encryption key for the object. Therefore, when > a malicious user stole an encryption key by breaking the access > control system if we suppose data at rest encryption to serve as a yet > another access control layer we have to use the same encryption key > for WAL as that we used for database file. But I thought that we > should rather protect data from that situation by access control > system and managing encryption keys more robustly. I don't really follow that logic. If the encryption keys are managed robustly enough that they cannot be stolen, then we only need one. If there is still enough risk of key theft that we care to protect against it, we can't use a dedicated key for the WAL without increasing the risk. > > > > FWIW, binary log encryption of MySQL uses different encryption key > > > > from a key used for table[1]. The key is encrypted by the master key > > > > for binary log encryption and is stored in each file headers. > > > > > > So, if you steal the master key for binary log encryption, you can > > > decrypt everything, it sounds like. > > Yes, I think so. I am not keen to copy that design. It sounds like having multiple keys in this design adds a lot of complexity without adding much security. > > > Data other than table and index data seems like it is not very > > > security-sensitive. I'm not sure we need to encrypt it at all. If we > > > do, using one key seems fine. > > Agreed. But it seems not to satisfy some user who require to encrypt > everything, which we discussed before. Agreed. I'm thinking possibly we need two different facilities. Facility #1 could be whole-database encryption: everything is encrypted with one key on a block level. 
And facility #2 could be per-table encryption: blocks for specific tables (and the related TOAST tables, indexes, and relation forks) are encrypted with specific keys and, in addition, the WAL records for those tables (and the related TOAST tables, indexes, and relation forks) are encrypted with the same key, but on a per-WAL-record level; the original WAL record would get "wrapped" by a new WAL record that just says "I am an encrypted WAL record, key ID %d, encrypted contents: %s" and you have to get the key to decrypt the contents and decrypt the real WAL record inside of it. Then you process that interior record as normal. I guess if you had both things, you'd want tables for which facility #2 was enabled to bypass facility #1, so that no relation data blocks were doubly-encrypted, to avoid the overhead. But a WAL record would be doubly-encrypted when both facilities are in use: the record would get encrypted with the per-table key, and then the blocks it got stored into would be encrypted with the cluster-wide key. > I wanted to say that if we encrypt whole database cluster by single > encryption key we would need to rebuilt the database cluster when > re-encrypt data. But if we encrypt data in tablespaces by per > tablespace encryption keys we can re-encrypt data by moving > tablespaces, without rebuilt it. Interesting. I suppose that would also be true of per-table keys. CREATE TABLE newthunk ENCRYPT WITH 'hoge' AS SELECT * FROM thunk; or something o
any suggestions to detect memory corruption
I can get the following log randomly, and I am not sure which commit caused it. I spent one day on it but failed to find the cause.

2019-05-08 21:37:46.692 CST [60110] WARNING: problem in alloc set index info: req size > alloc size for chunk 0x2a33a78 in block 0x2a33a18
2019-05-08 21:37:46.692 CST [60110] WARNING: idx: 2 problem in alloc set index info: bad single-chunk 0x2a33a78 in block 0x2a33a18, chsize: 1408, chunkLimit: 1024, chunkHeaderSize: 24, block_used: 768 request size: 2481
2019-05-08 21:37:46.692 CST [60110] WARNING: problem in alloc set index info: found inconsistent memory block 0x2a33a18

It looks like the memory managed by the "index info" memory context is being overwritten by some other, incorrect code. I didn't change any AllocSetXXX-related code, so I think I am just using it wrong in some way.

Thanks
Re: [HACKERS] Detrimental performance impact of ringbuffers on performance
On Tue, May 7, 2019 at 4:16 PM Andres Freund wrote: > Just to attach some numbers for this. On my laptop, with a pretty fast > disk (as in ~550MB/s read + write, limited by SATA, not the disk), I get > these results. > > [ results showing ring buffers massively hurting performance ] Links to some previous discussions: http://postgr.es/m/8737e9bddb82501da1134f021bf49...@postgrespro.ru http://postgr.es/m/CAMkU=1yV=zq8shviv5nwajv5wowovzb7bx45rgdvtxs4p6w...@mail.gmail.com > We probably can't remove the ringbuffer concept from these places, but I > think we should allow users to disable them. Forcing bulk-loads, vacuum, > analytics queries to go to the OS/disk, just because of a heuristic that > can't be disabled, yielding massive slowdowns, really sucks. The discussions to which I linked above seem to suggest that one of the big issues is that the ring buffer must be large enough that WAL flush for a buffer can complete before we go all the way around the ring and get back to the same buffer. It doesn't seem unlikely that the size necessary for that to be true has changed over the years, or even that it's different on different hardware. When I did some benchmarking in this area many years ago, I found that there as you increase the ring buffer size, performance improves for a while and then more or less levels off at a certain point. And at that point performance is not much worse than it would be with no ring buffer, but you maintain some protection against cache-trashing. Your scenario assumes that the system has no concurrent activity which will suffer as a result of blowing out the cache, but in general that's probably not true. It seems to me that it might be time to bite the bullet and add GUCs for the ring buffer sizes. Then, we could make the default sizes big enough that on normal-ish hardware the performance penalty is not too severe (like, it's measured as a percentage rather than a multiple), and we could make a 0 value disable the ring buffer altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: make \d pg_toast.foo show its indices
On Tue, May 7, 2019 at 6:03 PM Stephen Frost wrote: > Alright, maybe I'm not the best representation of our user base, but I > sure type 'select oid,* from pg_class where relname = ...' with some > regularity, mostly to get the oid to then go do something else. Having > the relfilenode would be nice too, now that I think about it, and > reltuples. There's ways to get *nearly* everything that's in pg_class > and friends out of various \d incantations, but not quite everything, > which seems unfortunate. > > In any case, I can understand an argument that the code it requires is > too much to maintain for a relatively minor feature (though it hardly > seems like it would be...) or that it would be confusing or unhelpful to > users (aka "noise") much of the time, so I'll leave it to others to > comment on if they think any of these ideas be a useful addition or not. > > I just don't think we should be voting down a feature because it'd take > a bit of extra effort to make our regression tests work with it, which > is all I was intending to get at here. I think it's unjustifiable to show this in \d output. But maybe in \d+ output it could be justified, or perhaps in the \d++ which I seem to recall Alvaro proposing someplace recently. I think if we're going to show it, it should be on its own line, with a clear label, not just in the table header as you proposed. Otherwise, people won't know what it is. I suppose the work we'd need to make it work with the regression tests is no worse than the hide_tableam crock which Andres recently added. That is certainly a crock, but I can testify that it's a very useful crock for zheap development. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
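For context, the by-hand catalog lookup described above looks roughly like this; the relation name foo comes from the thread subject, and reltoastrelid is included only to show how the associated TOAST table can be found:

    SELECT oid, relfilenode, reltuples, reltoastrelid::regclass
    FROM pg_class
    WHERE relname = 'foo';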
Re: make \d pg_toast.foo show its indices
Robert Haas writes: > I think it's unjustifiable to show this in \d output. But maybe in > \d+ output it could be justified, or perhaps in the \d++ which I seem > to recall Alvaro proposing someplace recently. Yeah, if we're going to do that (show a table's toast table) I would want to bury it in \d++ or some other not-currently-used notation. regards, tom lane
Re: any suggestions to detect memory corruption
Alex writes: > I can get the following log randomly and I am not which commit caused it. > 2019-05-08 21:37:46.692 CST [60110] WARNING: problem in alloc set index > info: req size > alloc size for chunk 0x2a33a78 in block 0x2a33a18 I've had success in finding memory stomp causes fairly quickly by setting a hardware watchpoint in gdb on the affected location. Then you just let it run to see when the value changes, and check whether that's a "legit" or "not legit" modification point. The hard part of that, of course, is to know in advance where the affected location is. You may be able to make things sufficiently repeatable by doing the problem query in a fresh session each time. regards, tom lane
Re: Identity columns should own only one sequence
On Tue, 2019-05-07 at 13:06 +0900, Michael Paquier wrote: > On Fri, May 03, 2019 at 08:14:35AM +0200, Laurenz Albe wrote: > > On Thu, 2019-05-02 at 22:43 +0200, Peter Eisentraut wrote: > >> I think the proper way to address this would be to create some kind of > >> dependency between the sequence and the default. > > > > That is certainly true. But that's hard to retrofit into existing > > databases, > > so it would probably be a modification that is not backpatchable. > > And this is basically already the dependency which exists between the > sequence and the relation created with the serial column. So what's > the advantage of adding more dependencies if we already have what we > need? I still think that we should be more careful to drop the > dependency between the sequence and the relation's column if dropping > the default using it. If a DDL defines first a sequence, and then a > default expression using nextval() on a column, then no serial-related I believe we should have both: - Identity columns should only use sequences with an INTERNAL dependency, as in Peter's patch. - When a column default is dropped, remove all dependencies between the column and sequences. In the spirit of moving this along, I have attached a patch which is Peter's patch from above with a regression test. Yours, Laurenz Albe From 4280a5251320d2051ada7aa1888ba20a610efa94 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Wed, 8 May 2019 16:42:10 +0200 Subject: [PATCH] XXX Draft with regression tests XXX --- src/backend/catalog/pg_depend.c| 26 +++--- src/backend/commands/tablecmds.c | 2 +- src/backend/parser/parse_utilcmd.c | 12 +--- src/backend/rewrite/rewriteHandler.c | 2 +- src/include/catalog/dependency.h | 2 +- src/test/regress/expected/identity.out | 5 + src/test/regress/sql/identity.sql | 7 +++ 7 files changed, 39 insertions(+), 17 deletions(-) diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c index f7caedcc02..63c94cfbe6 100644 --- a/src/backend/catalog/pg_depend.c +++ b/src/backend/catalog/pg_depend.c @@ -707,8 +707,8 @@ sequenceIsOwned(Oid seqId, char deptype, Oid *tableId, int32 *colId) * Collect a list of OIDs of all sequences owned by the specified relation, * and column if specified. */ -List * -getOwnedSequences(Oid relid, AttrNumber attnum) +static List * +getOwnedSequences_internal(Oid relid, AttrNumber attnum, char deptype) { List *result = NIL; Relation depRel; @@ -750,7 +750,8 @@ getOwnedSequences(Oid relid, AttrNumber attnum) (deprec->deptype == DEPENDENCY_AUTO || deprec->deptype == DEPENDENCY_INTERNAL) && get_rel_relkind(deprec->objid) == RELKIND_SEQUENCE) { - result = lappend_oid(result, deprec->objid); + if (!deptype || deprec->deptype == deptype) +result = lappend_oid(result, deprec->objid); } } @@ -761,18 +762,29 @@ getOwnedSequences(Oid relid, AttrNumber attnum) return result; } +List * +getOwnedSequences(Oid relid, AttrNumber attnum) +{ + return getOwnedSequences_internal(relid, attnum, 0); +} + /* - * Get owned sequence, error if not exactly one. + * Get owned identity sequence, error if not exactly one. 
*/ Oid -getOwnedSequence(Oid relid, AttrNumber attnum) +getIdentitySequence(Oid relid, AttrNumber attnum, bool missing_ok) { - List *seqlist = getOwnedSequences(relid, attnum); + List *seqlist = getOwnedSequences_internal(relid, attnum, DEPENDENCY_INTERNAL); if (list_length(seqlist) > 1) elog(ERROR, "more than one owned sequence found"); else if (list_length(seqlist) < 1) - elog(ERROR, "no owned sequence found"); + { + if (missing_ok) + return InvalidOid; + else + elog(ERROR, "no owned sequence found"); + } return linitial_oid(seqlist); } diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 8e4743d110..8240fec578 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -6607,7 +6607,7 @@ ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE table_close(attrelation, RowExclusiveLock); /* drop the internal sequence */ - seqid = getOwnedSequence(RelationGetRelid(rel), attnum); + seqid = getIdentitySequence(RelationGetRelid(rel), attnum, false); deleteDependencyRecordsForClass(RelationRelationId, seqid, RelationRelationId, DEPENDENCY_INTERNAL); CommandCounterIncrement(); diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index 4564c0ae81..a3c5b005d5 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -1083,7 +1083,7 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla * find sequence owned by old column; extract sequence parameters; * build new create sequence command */ - seq_relid = getOwnedSequence(RelationGetRelid(relation), attribute->attnum); + seq_relid = getIdentit
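For illustration, a command sequence of the shape discussed in this thread, which can leave one column owning two sequences; this is only a sketch based on the discussion above, not the regression test from the attached patch:

    CREATE TABLE t (id serial);
    ALTER TABLE t ALTER COLUMN id DROP DEFAULT;
    -- the serial sequence is still owned by t.id at this point
    ALTER TABLE t ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;
    -- t.id now owns the old serial sequence plus the new identity sequence,
    -- which is the ambiguity getIdentitySequence() has to resolve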
Re: accounting for memory used for BufFile during hash joins
On Tue, May 07, 2019 at 05:30:27PM -0700, Melanie Plageman wrote: On Mon, May 6, 2019 at 8:15 PM Tomas Vondra wrote: Nope, that's not how it works. It's the array of batches that gets sliced, not the batches themselves. It does slightly increase the amount of data we need to shuffle between the temp files, because we can't write the data directly to batches in "future" slices. But that amplification is capped to ~2.2x (compared to the ~1.4x in master) - I've shared some measurements in [1]. [1] https://www.postgresql.org/message-id/20190428141901.5dsbge2ka3rxmpk6%40development Cool, I misunderstood. I looked at the code again today, and, at the email thread where you measured "amplification". Oh! I hope you're not too disgusted by the code in that PoC patch ;-) In terms of how many times you write each tuple, is it accurate to say that a tuple can now be spilled three times (in the worst case) whereas, before, it could be spilled only twice? 1 - when building the inner side hashtable, tuple is spilled to a "slice" file 2 - (assuming the number of batches was increased) during execution, when a tuple belonging to a later slice's spill file is found, it is re-spilled to that slice's spill file 3 - during execution, when reading from its slice file, it is re-spilled (again) to its batch's spill file Yes, that's mostly accurate understanding. Essentially this might add one extra step of "reshuffling" from the per-slice to per-batch files. Is it correct that the max number of BufFile structs you will have is equal to the number of slices + number of batches in a slice because that is the max number of open BufFiles you would have at a time? Yes. With the caveat that we need twice that number of BufFile structs, because we need them on both sides of the join. By the way, applying v4 patch on master, in an assert build, I am tripping some asserts -- starting with Assert(!file->readOnly); in BufFileWrite Whps :-/ One thing I was a little confused by was the nbatch_inmemory member of the hashtable. The comment in ExecChooseHashTableSize says that it is determining the number of batches we can fit in memory. I thought that the problem was the amount of space taken up by the BufFile data structure itself--which is related to the number of open BufFiles you need at a time. This comment in ExecChooseHashTableSize makes it sound like you are talking about fitting more than one batch of tuples into memory at a time. I was under the impression that you could only fit one batch of tuples in memory at a time. I suppose you mean this chunk: /* * See how many batches we can fit into memory (driven mostly by size * of BufFile, with PGAlignedBlock being the largest part of that). * We need one BufFile for inner and outer side, so we count it twice * for each batch, and we stop once we exceed (work_mem/2). */ while ((nbatch_inmemory * 2) * sizeof(PGAlignedBlock) * 2 <= (work_mem * 1024L / 2)) nbatch_inmemory *= 2; Yeah, that comment is a bit confusing. What the code actually does is computing the largest "slice" of batches for which we can keep the BufFile structs in memory, without exceeding work_mem/2. Maybe the nbatch_inmemory should be renamed to nbatch_slice, not sure. So, I was stepping through the code with work_mem set to the lower bound, and in ExecHashIncreaseNumBatches, I got confused. 
hashtable->nbatch_inmemory was 2 for me, thus, nbatch_tmp was 2 so, I didn't meet this condition if (nbatch_tmp > hashtable->nbatch_inmemory) since I just set nbatch_tmp using hashtable->nbatch_inmemory So, I didn't increase the number of slices, which is what I was expecting. What happens when hashtable->nbatch_inmemory is equal to nbatch_tmp? Ah, good catch. The condition you're refering to if (nbatch_tmp > hashtable->nbatch_inmemory) should actually be if (nbatch > hashtable->nbatch_inmemory) because the point is to initialize BufFile structs for the overflow files, and we need to do that once we cross nbatch_inmemory. And it turns out this actually causes the assert failures in regression tests, you reported earlier. It failed to initialize the overflow files in some cases, so the readOnly flag seemed to be set. Attached is an updated patch, fixing this. I tried to clarify some of the comments too, and I fixed another bug I found while running the regression tests. It's still very much a crappy PoC code, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index a6c6de78f1..105989baee 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2689,6 +2689,8 @@ show_hash_info(HashState *hashstate, ExplainState *es)
Re: [HACKERS] Detrimental performance impact of ringbuffers on performance
On Wed, May 08, 2019 at 10:08:03AM -0400, Robert Haas wrote: On Tue, May 7, 2019 at 4:16 PM Andres Freund wrote: Just to attach some numbers for this. On my laptop, with a pretty fast disk (as in ~550MB/s read + write, limited by SATA, not the disk), I get these results. [ results showing ring buffers massively hurting performance ] Links to some previous discussions: http://postgr.es/m/8737e9bddb82501da1134f021bf49...@postgrespro.ru http://postgr.es/m/CAMkU=1yV=zq8shviv5nwajv5wowovzb7bx45rgdvtxs4p6w...@mail.gmail.com We probably can't remove the ringbuffer concept from these places, but I think we should allow users to disable them. Forcing bulk-loads, vacuum, analytics queries to go to the OS/disk, just because of a heuristic that can't be disabled, yielding massive slowdowns, really sucks. The discussions to which I linked above seem to suggest that one of the big issues is that the ring buffer must be large enough that WAL flush for a buffer can complete before we go all the way around the ring and get back to the same buffer. It doesn't seem unlikely that the size necessary for that to be true has changed over the years, or even that it's different on different hardware. When I did some benchmarking in this area many years ago, I found that there as you increase the ring buffer size, performance improves for a while and then more or less levels off at a certain point. And at that point performance is not much worse than it would be with no ring buffer, but you maintain some protection against cache-trashing. Your scenario assumes that the system has no concurrent activity which will suffer as a result of blowing out the cache, but in general that's probably not true. It seems to me that it might be time to bite the bullet and add GUCs for the ring buffer sizes. Then, we could make the default sizes big enough that on normal-ish hardware the performance penalty is not too severe (like, it's measured as a percentage rather than a multiple), and we could make a 0 value disable the ring buffer altogether. IMO adding such GUC would be useful for testing, which is something we should probably do anyway, and then based on the results we could either keep the GUC, modify the default somehow, or do nothing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Statistical aggregate functions are not working with PARTIAL aggregation
On Wed, May 08, 2019 at 01:09:23PM +0900, Kyotaro HORIGUCHI wrote: At Wed, 08 May 2019 13:06:36 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190508.130636.184826233.horiguchi.kyot...@lab.ntt.co.jp> At Tue, 07 May 2019 20:47:28 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190507.204728.233299873.horiguchi.kyot...@lab.ntt.co.jp> > Hello. > > At Tue, 7 May 2019 14:39:55 +0530, Rajkumar Raghuwanshi wrote in > > Hi, > > As this issue is reproducible without partition-wise aggregate also, > > changing email subject from "Statistical aggregate functions are not > > working with partitionwise aggregate " to "Statistical aggregate functions > > are not working with PARTIAL aggregation". > > > > original reported test case and discussion can be found at below link. > > https://www.postgresql.org/message-id/flat/CAKcux6%3DuZEyWyLw0N7HtR9OBc-sWEFeByEZC7t-KDf15FKxVew%40mail.gmail.com > > The immediate reason for the behavior seems that > EEOP_AGG_STRICT_INPUT_CHECK_ARGS considers non existent second > argument as null, which is out of arguments list in > trans_fcinfo->args[]. > > The invalid deserialfn_oid case in ExecBuildAggTrans, it > initializes args[1] using the second argument of the functoin > (int8pl() in the case) so the correct numTransInputs here is 1, > not 2. > > I don't understand this fully but at least the attached patch > makes the test case work correctly and this seems to be the only > case of this issue. This behavior is introduced by 69c3936a14 (in v11). At that time FunctionCallInfoData is pallioc0'ed and has fixed length members arg[6] and argnull[7]. So nulls[1] is always false even if nargs = 1 so the issue had not been revealed. After introducing a9c35cf85c (in v12) the same check is done on FunctionCallInfoData that has NullableDatum args[] of required number of elements. In that case args[1] is out of palloc'ed memory so this issue has been revealed. In a second look, I seems to me that the right thing to do here is setting numInputs instaed of numArguments to numTransInputs in combining step. By the way, as mentioned above, this issue exists since 11 but harms at 12. Is this an open item, or older bug? It is an open item - there's a section for older bugs, but considering it's harmless in 11 (at least that's my understanding from the above discussion) I've added it as a regular open item. I've linked it to a9c35cf85c, which seems to be the culprit commit. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Detrimental performance impact of ringbuffers on performance
> On 8 May 2019, at 1:16, Andres Freund wrote:
>
> We probably can't remove the ringbuffer concept from these places, but I
> think we should allow users to disable them. Forcing bulk-loads, vacuum,
> analytics queries to go to the OS/disk, just because of a heuristic that
> can't be disabled, yielding massive slowdowns, really sucks.

If we have a scan-resistant shared buffers eviction strategy [0], we will not need ring buffers at all. Are there any other reasons to have these rings?

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/89A121E3-B593-4D65-98D9-BBC210B87268%40yandex-team.ru
Re: Statistical aggregate functions are not working with PARTIAL aggregation
Don't we have a build farm animal that runs under valgrind that would have caught this?
Re: Patch: doc for pg_logical_emit_message()
On Fri, Apr 26, 2019 at 10:52 AM Matsumura, Ryo wrote: > > On Wed. Apr. 24, 2019 at 11:40 PM Masao, Fujii > wrote: > > Thank you for the comment. > I understand about REPLICATION privilege and notice my unecessary words. > I update the patch. Thanks! Pushed. Regards, -- Fujii Masao
Re: vacuumdb and new VACUUM options
On Wed, May 8, 2019 at 9:32 AM Masahiko Sawada wrote:
>
> On Wed, May 8, 2019 at 2:41 AM Fujii Masao wrote:
> >
> > Hi,
> >
> > vacuumdb command supports the corresponding options to
> > any VACUUM parameters except INDEX_CLEANUP and TRUNCATE
> > that were added recently. Should vacuumdb also support those
> > new parameters, i.e., add --index-cleanup and --truncate options
> > to the command?
>
> I think it's a good idea to add new options of these parameters for
> vacuumdb. While making INDEX_CLEANUP option patch I also attached the
> patch for INDEX_CLEANUP parameter before[1], although it adds
> --disable-index-cleanup option instead.

Regarding INDEX_CLEANUP, VACUUM now has three modes:

(1) VACUUM (INDEX_CLEANUP on) does index cleanup regardless of the vacuum_index_cleanup reloption.
(2) VACUUM (INDEX_CLEANUP off) skips index cleanup regardless of the vacuum_index_cleanup reloption.
(3) Plain VACUUM decides whether to do index cleanup according to the vacuum_index_cleanup reloption.

If no option for index cleanup is specified, the vacuumdb command should work in mode (3). IMO this is intuitive. The question is: should vacuumdb support an option for (1), i.e., add something like an --index-cleanup option? Or for (2), i.e., add something like a --disable-index-cleanup option, as your patch does? Or for both?

Regards,
--
Fujii Masao
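To make the three modes concrete, a minimal SQL sketch (the table name t is made up):

    -- (1) force index cleanup, overriding the reloption
    VACUUM (INDEX_CLEANUP on) t;

    -- (2) skip index cleanup, overriding the reloption
    VACUUM (INDEX_CLEANUP off) t;

    -- (3) plain VACUUM follows the per-table reloption
    ALTER TABLE t SET (vacuum_index_cleanup = off);
    VACUUM t;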
Re: any suggestions to detect memory corruption
On Wed, May 8, 2019 at 10:34 AM Tom Lane wrote: > Alex writes: > > I can get the following log randomly and I am not which commit caused it. > > > 2019-05-08 21:37:46.692 CST [60110] WARNING: problem in alloc set index > > info: req size > alloc size for chunk 0x2a33a78 in block 0x2a33a18 > > I've had success in finding memory stomp causes fairly quickly by setting > a hardware watchpoint in gdb on the affected location. Then you just let > it run to see when the value changes, and check whether that's a "legit" > or "not legit" modification point. > > The hard part of that, of course, is to know in advance where the affected > location is. You may be able to make things sufficiently repeatable by > doing the problem query in a fresh session each time. valgrind might also be a possibility, although that has a lot of overhead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Detrimental performance impact of ringbuffers on performance
Hi, On 2019-05-08 10:08:03 -0400, Robert Haas wrote: > On Tue, May 7, 2019 at 4:16 PM Andres Freund wrote: > > Just to attach some numbers for this. On my laptop, with a pretty fast > > disk (as in ~550MB/s read + write, limited by SATA, not the disk), I get > > these results. > > > > [ results showing ring buffers massively hurting performance ] > > Links to some previous discussions: > > http://postgr.es/m/8737e9bddb82501da1134f021bf49...@postgrespro.ru > http://postgr.es/m/CAMkU=1yV=zq8shviv5nwajv5wowovzb7bx45rgdvtxs4p6w...@mail.gmail.com > > > We probably can't remove the ringbuffer concept from these places, but I > > think we should allow users to disable them. Forcing bulk-loads, vacuum, > > analytics queries to go to the OS/disk, just because of a heuristic that > > can't be disabled, yielding massive slowdowns, really sucks. > > The discussions to which I linked above seem to suggest that one of > the big issues is that the ring buffer must be large enough that WAL > flush for a buffer can complete before we go all the way around the > ring and get back to the same buffer. That is some of the problem, true. But even on unlogged tables the ringbuffers cause quite the massive performance deterioration. Without the ringbuffers we write twice the size of the releation (once with zeroes for the file extension, once with with the actual data). With the ringbuffer we do so two or three additional times (hint bits + normal vacuum, then freezing). On a test-cluster that replaced the smgrextend() for heap with posix_fallocate() (to avoid the unnecesary write), I measured the performance of CTAS UNLOGGED SELECT * FROM pgbench_accounts_scale_1000 with and without ringbuffers: With ringbuffers: CREATE UNLOGGED TABLE AS: Time: 67808.643 ms (01:07.809) VACUUM: Time: 53020.848 ms (00:53.021) VACUUM FREEZE: Time: 55809.247 ms (00:55.809) Without ringbuffers: CREATE UNLOGGED TABLE AS: Time: 45981.237 ms (00:45.981) VACUUM: Time: 23386.818 ms (00:23.387) VACUUM FREEZE: Time: 5892.204 ms (00:05.892) > It doesn't seem unlikely that > the size necessary for that to be true has changed over the years, or > even that it's different on different hardware. When I did some > benchmarking in this area many years ago, I found that there as you > increase the ring buffer size, performance improves for a while and > then more or less levels off at a certain point. And at that point > performance is not much worse than it would be with no ring buffer, > but you maintain some protection against cache-trashing. Your > scenario assumes that the system has no concurrent activity which will > suffer as a result of blowing out the cache, but in general that's > probably not true. Well, I noted that I'm not proposing to actually just rip out the ringbuffers. But I also don't think it's just a question of concurrent activity. It's a question of having concurrent activity *and* workloads that are smaller than shared buffers. Given current memory sizes a *lot* of workloads fit entirely in shared buffers - but for vacuum, seqscans (including copy), it's basically impossible to ever take advantage of that memory, unless your workload otherwise forces it into s_b entirely (or you manually load the data into s_b). > It seems to me that it might be time to bite the bullet and add GUCs > for the ring buffer sizes. 
Then, we could make the default sizes big > enough that on normal-ish hardware the performance penalty is not too > severe (like, it's measured as a percentage rather than a multiple), > and we could make a 0 value disable the ring buffer altogether. Yea, it'd be considerably better than today. It'd importantly allow us to more easily benchmark a lot of this. Think it might make sense to have a VACUUM option for disabling the ringbuffer too, especially for cases where VACUUMING is urgent. I think what we ought to do to fix this issue in a bit more principled manner (afterwards) is: 1) For ringbuffer'ed scans, if there are unused buffers, use them, instead of recycling a buffer from the ring. If so, replace the previous member of the ring with the previously unused one. When doing so, just reduce the usagecount by one (unless already zero), so it readily can be replaced. I think we should do so even when the to-be-replaced ringbuffer entry is currently dirty. But even if we couldn't agree on that, it'd already be a significant improvement if we only did this for clean buffers. That'd fix a good chunk of the "my shared buffers is never actually used" type issues. I personally think it's indefensible that we don't do that today. 2) When a valid buffer in the ringbuffer is dirty when about to be replaced, instead of doing the FlushBuffer ourselves (and thus waiting for an XLogFlush in many cases), put into a separate ringbuffer/qeueue that's processed by bgwriter. And have that then invalidate the buffer and put it on the freelist (unless usagecount was bumped since,
Re: [HACKERS] Detrimental performance impact of ringbuffers on performance
Hi, On 2019-05-08 21:35:06 +0500, Andrey Borodin wrote: > > 8 мая 2019 г., в 1:16, Andres Freund написал(а): > > > > We probably can't remove the ringbuffer concept from these places, but I > > think we should allow users to disable them. Forcing bulk-loads, vacuum, > > analytics queries to go to the OS/disk, just because of a heuristic that > > can't be disabled, yielding massive slowdowns, really sucks. > > If we will have scan-resistant shared buffers eviction strategy [0] - > we will not need ring buffers unconditionally. For me that's a fairly big if, fwiw. But it'd be cool. > Are there any other reasons to have these rings? Currently they also limit the amount of dirty data added to the system. I don't think that's a generally good property (e.g. because it'll cause a lot of writes that'll again happen later), but e.g. for initial data loads with COPY FREEZE it's helpful. It slows down the backend(s) causing the work (i.e. doing COPY), rather than other backends (e.g. because they need to evict the buffers, therefore first needing to clean them). Greetings, Andres Freund
Unexpected "shared memory block is still in use"
Just now, while running a parallel check-world on HEAD according to the same script I've been using for quite some time, one of the TAP tests died during initdb: selecting dynamic shared memory implementation ... posix selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting default timezone ... America/New_York creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT [18351] FATAL: pre-existing shared memory block (key 5440004, ID 1734475802) is still in use 2019-05-08 13:59:19.963 EDT [18351] HINT: Terminate any old server processes associated with data directory "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata". child process exited with exit code 1 initdb: removing data directory "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata" Bail out! system initdb failed I have never seen this happen before in the TAP tests. I think the odds are very high that this implies something wrong with commit c09850992. My immediate guess after eyeballing that patch quickly is that it was not a good idea to redefine the rules used by bootstrap/standalone backends. In particular, it seems somewhat plausible that the bootstrap process hadn't yet completely died when the standalone backend for the post-bootstrap phase came along and decided there was a conflict (which it never would have before). regards, tom lane
Re: Statistical aggregate functions are not working with PARTIAL aggregation
On 5/8/19 12:41 PM, Greg Stark wrote:
> Don't we have a build farm animal that runs under valgrind that would
> have caught this?

There are two animals running under valgrind: lousyjack and skink.

cheers

andrew
--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Cleanup/remove/update references to OID column
I found what appears to be a dangling reference to old "hidden" OID behavior. Justin >From 1c6712c0ade949648dbc415dfd7ea80312360ef7 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 8 May 2019 13:57:12 -0500 Subject: [PATCH v1] Cleanup/remove/update references to OID column... ..in wake of 578b229718e8f. See also 93507e67c9ca54026019ebec3026de35d30370f9 1464755fc490a9911214817fe83077a3689250ab f6b39171f3d65155b9390c2c69bc5b3469f923a8 Author: Justin Pryzby --- doc/src/sgml/catalogs.sgml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index d544e60..0f9c6f2 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -610,7 +610,7 @@ oid oid - Row identifier (hidden attribute; must be explicitly selected) + Row identifier -- 2.7.4
Re: New EXPLAIN option: ALL
On Tue, May 7, 2019 at 6:25 PM Tom Lane wrote: > Meh --- I don't especially care for non-orthogonal behaviors like that. > If you wanted JSON but *not* all of the additional info, how would you > specify that? (The implementation I had in mind would make VERBOSE OFF > more or less a no-op, so that wouldn't get you there.) +1. Assuming we know which information the user wants on the basis of their choice of output format seems like a bad idea. I mean, suppose we introduced a new option that gathered lots of additional detail but made the query run 3x slower. Would everyone want that enabled all the time any time they chose a non-text format? Probably not. If people want BUFFERS turned on essentially all the time, then let's just flip the default for that, so that EXPLAIN ANALYZE does the equivalent of what EXPLAIN (ANALYZE, BUFFERS) currently does, and make people say EXPLAIN (ANALYZE, BUFFERS OFF) if they don't want all that detail. I think that's more or less what Andres was suggesting upthread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: New EXPLAIN option: ALL
On Tue, May 7, 2019 at 12:31 PM David Fetter wrote: > If you're tuning a query interactively, it's a lot simpler to prepend, > for example, > > EXPLAIN (ALL, FORMAT JSON) > > to it than to prepend something along the lines of > > EXPLAIN(ANALYZE, VERBOSE, COSTS, BUFFERS, SETTINGS, TIMING, SUMMARY, > PARTRIDGE_IN_A_PEAR_TREE, FORMAT JSON) > > to it. This is something of an exaggeration of what could ever be necessary, because COSTS and TIMING default to TRUE and SUMMARY defaults to TRUE when ANALYZE is specified, and the PARTRIDGE_IN_A_PEAR_TREE option seems not to have made it into the tree this cycle. But you could need EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS, FORMAT JSON), which is not quite so long, but admittedly still somewhat long. Flipping some of the defaults seems like it might be the way to go. I think turning SETTINGS and BUFFERS on by default would be pretty sensible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
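As a concrete illustration of the verbosity being discussed (the query and table name are made up), the fully spelled-out form today is:

    EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS, FORMAT JSON)
        SELECT count(*) FROM pgbench_accounts;

With BUFFERS and SETTINGS on by default, as proposed, the common case would shrink to EXPLAIN (ANALYZE, FORMAT JSON), and opting out would use the usual boolean spelling:

    EXPLAIN (ANALYZE, BUFFERS OFF)
        SELECT count(*) FROM pgbench_accounts;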
pg12 release notes
I noticed you added release notes at bdf595adbca195fa54a909c74a5233ebc30641a1, thanks for writing them. I reviewed notes; find proposed changes attached+included. I think these should also be mentioned? f7cb284 Add plan_cache_mode setting a6da004 Add index_get_partition convenience function 387a5cf Add pg_dump --on-conflict-do-nothing option. 17f206f Set pg_class.relhassubclass for partitioned indexes diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml index 88bdcbd..ab4d1b3 100644 --- a/doc/src/sgml/release-12.sgml +++ b/doc/src/sgml/release-12.sgml @@ -61,8 +61,8 @@ Remove the special behavior of OID columns (Andres Freund, John Naylor) Previously, a normally-invisible OID column could be specified during table creation using WITH OIDS; that ability has been removed. Columns can still be explicitly -specified as type OID. pg_dump and pg_upgrade operations on databases using WITH OIDS will need adjustment. Many system tables now have an 'oid' column that will be -expanded with SELECT * by default. +specified as type OID. pg_dump and pg_upgrade operations on databases using WITH OIDS will need adjustment. The 'oid' column of many system tables will be +shown by default with SELECT *. @@ -115,7 +115,7 @@ Do not allow multiple different recovery_target* specifications (Peter Eisentrau -Previously multiple different recovery_target* variables could be specified, and last one specified was honored. Now, only one can be specified, though the same one can +Previously multiple different recovery_target* variables could be specified, and the last one specified was honored. Now, only one can be specified, though the same one can be specified multiple times and the last specification is honored. @@ -405,7 +405,7 @@ Author: Robert Haas --> -Allow ATTACH PARTITION to be performed with reduced locking requirements (Robert Haas) +ATTACH PARTITION is performed with lower locking requirement (Robert Haas) @@ -617,7 +617,7 @@ Have new btree indexes sort duplicate index entries in heap-storage order (Peter -Btree indexes pg_upgraded from previous releases will not have this ordering. This does slightly reduce the maximum length of indexed values. +Btree indexes pg_upgraded from previous releases will not have this ordering. This slightly reduces the maximum permitted length of indexed values. @@ -676,7 +676,7 @@ Allow CREATE STATISTICS to create most-common-value statistics for multiple colu -This improve optimization for columns with non-uniform distributions that often appear in WHERE clauses. +This improves query plans for columns with non-uniform distributions that often appear in WHERE clauses. @@ -954,21 +954,6 @@ This dramatically speeds up processing of floating-point values. Users who wish - - -Avoid creation of the free space map files for small table (John Naylor, Amit Kapila) - - - -Such files are not useful. - - - - - -Allow more comparisons with information_schema text columns use indexes (Tom Lane) +Allow more comparisons with information_schema text columns to use indexes (Tom Lane) @@ -1310,7 +1295,7 @@ Author: Thomas Munro --> -Allow discovery of the LDAP server using DNS (Thomas Munro) +Allow discovery of the LDAP server using DNS SRV records (Thomas Munro) @@ -1446,7 +1431,7 @@ Add wal_recycle and wal_init_zero server variables to avoid WAL file recycling ( -This can be beneficial on copy-on-write file systems like ZFS. +This can be beneficial on copy-on-write filesystems like ZFS. 
@@ -1502,7 +1487,7 @@ Add server variable to control the type of shared memory to use (Andres Freund) -The variable is shared_memory_type. It purpose is to allow selection of System V shared memory, if desired. +The variable is shared_memory_type. Its purpose is to allow selection of System V shared memory, if desired. >From a2d7fccc8cd8035898a4459e4b7176ef1168f713 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 8 May 2019 15:22:36 -0500 Subject: [PATCH v1] review pg12 release notes --- doc/src/sgml/release-12.sgml | 37 +++-- 1 file changed, 11 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/release-12.sgml b/doc/src/sgml/release-12.sgml index 88bdcbd..ab4d1b3 100644 --- a/doc/src/sgml/release-12.sgml +++ b/doc/src/sgml/release-12.sgml @@ -61,8 +61,8 @@ Remove the special behavior of OID columns (Andres Freund, John Naylor) Previously, a normally-invisible OID column could be specified during table creation using WITH OIDS; that ability has been removed. Columns can still be explicitly -specified as type OID. pg_dump and pg_upgrade operations on databases using WITH OIDS will need adjustment. Many system tables now have an 'oid' column that will be -expanded with SELECT * by default. +specified as type OID. pg_dump and pg_upgrade operations on databases using WITH OIDS will need adjustment. The
Re: pg12 release notes
On 2019-May-08, Justin Pryzby wrote: > I noticed you added release notes at bdf595adbca195fa54a909c74a5233ebc30641a1, > thanks for writing them. > > I reviewed notes; find proposed changes attached+included. > > I think these should also be mentioned? > > a6da004 Add index_get_partition convenience function I don't disagree with the other three you suggest, but this one is a C function and I think it's below the level of usefulness that we publish in relnotes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Patch to document base64 encoding
Er, ping. Nobody has reviewed the latest patchs. They still apply to master... I am re-attaching the patches. See descriptions below. On Mon, 11 Mar 2019 15:32:14 -0500 "Karl O. Pinc" wrote: > On Sun, 10 Mar 2019 08:15:35 +0100 (CET) > Fabien COELHO What's causing problems here is that the encode and decode > functions are listed in both the string functions section > and the binary functions section. A related but not-relevant > problem is that there are functions listed in the string > function section which take binary input. > > I asked about this on IRC and the brief reply was > unflattering to the existing documentation. > > So I'm going to fix this also. 3 patches attached: > > doc_base64_part1_v9.patch > > This moves functions taking bytea and other non-string > input into the binary string section, and vice versa. > Eliminates duplicate encode() and decode() documentation. > > Affects: convert(bytea, name, name) >convert_from(bytea, name) >encode(bytea, text) >length(bytea, name) >quote_nullable(anytype) >to_hex(int or bigint) >decode(text, text) > > Only moves, eliminates duplicates, and adjusts indentation. > > > doc_base64_part2_v9.patch > > Cleanup wording after moving functions between sections. > > > doc_base64_part3_v9.patch > > Documents base64, hex, and escape encode() and decode() > formats. > > > >> "The string and the binary encode and decode functions..." > > >> sentence looks strange to me, especially with the English > > >> article that I do not really master, so maybe it is ok. I'd have > > >> written something more straightforward, eg: "Functions encode > > >> and decode support the following encodings:", > > > > > > It is an atypical construction because I want to draw attention > > > that this is documentation not only for the encode() and decode() > > > in section 9.4. String Functions and Operators but also for the > > > encode() and decode in section 9.5. Binary String Functions and > > > Operators. Although I can't think of a better approach it makes me > > > uncomfortable that documentation written in one section applies > > > equally to functions in a different section. > > > > People coming from the binary doc would have no reason to look at > > the string paragraph anyway. > > > > > Do you think it would be useful to hyperlink the word "binary" > > > to section 9.5? > > > > Hmmm... I think that the link is needed in the other direction. > > I'm not sure what you mean here or if it's still relevant. > > > I'd suggest (1) to use a simpler and direct sentence in the string > > section, (2) to simplify/shorten the in cell description in the > > binary section, and (3) to add an hyperlink from the binary section > > which would point to the expanded explanation in the string section. > > > > > The idiomatic phrasing would be "Both the string and the binary > > > encode and decode functions..." but the word "both" adds > > > no information. Shorter is better. > > > > Possibly, although "Both" would insist on the fact that it applies > > to the two variants, which was your intention. > > I think this is no longer relevant. Although I'm not sure what > you mean by 3. The format names already hyperlink back to the > string docs. > > > >> and also I'd use a direct "Function > > >> <...>decode ..." rather than "The > > >> decode function ..." (twice). > > > > > > The straightforward English would be "Decode accepts...". The > > > problem is that this begins the sentence with the name of a > > > function. 
This does not work very well when the function name is > > > all lower case, and can have other problems where clarity is lost > > > depending on documentation output formatting. > > > > Yep. > > > > > I don't see a better approach. > > > > I suggested "Function <>decode ...", which is the kind of thing > > we do in academic writing to improve precision, because I thought it > > could be better:-) > > "Function <>decode ..." just does not work in English. > > > >> Maybe I'd use the exact same grammatical structure for all 3 > > >> cases, starting with "The <>whatever encoding converts bla > > >> bla bla" instead of varying the sentences. > > > > > > Agreed. Good idea. The first paragraph of each term has to > > > do with encoding and the second with decoding. > > > > > > > Uniformity in starting the second paragraphs helps make > > > this clear, even though the first paragraphs are not uniform. > > > With this I am not concerned that the first paragraphs > > > do not have a common phrasing that's very explicit about > > > being about encoding. > > > > > > Adjusted. > > > > Cannot see it fully in the v8 patch: > > > > - The base64 encoding is > > - hex represents > > - escape converts > > I did only the decode paras. I guess no reason not to make > the first paras uniform as well. Done
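Since the thread is about documenting these formats, a few concrete calls may help readers following along (the functions already exist; outputs are shown as printed with the default bytea_output = 'hex'):

    SELECT encode('\x1234'::bytea, 'base64');   -- EjQ=
    SELECT decode('EjQ=', 'base64');            -- \x1234
    SELECT encode('\x1234'::bytea, 'hex');      -- 1234
    SELECT decode('1234', 'hex');               -- \x1234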
Re: vacuumdb and new VACUUM options
On Wed, May 8, 2019 at 14:19, Fujii Masao wrote: > > The question is; we should support vacuumdb option for (1), i.e.,, > something like --index-cleanup option is added? > Or for (2), i.e., something like --disable-index-cleanup option is added > as your patch does? Or for both? > --index-cleanup=BOOL -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consulting, Development, 24x7 Support and Training
Re: New EXPLAIN option: ALL
On 07/05/2019 09:30, David Fetter wrote: > Folks, > > It can get a little tedious turning on (or off) all the boolean > options to EXPLAIN, so please find attached a shortcut. I would rather have a set of gucs such as default_explain_buffers, default_explain_summary, and default_explain_format. Of course if you default BUFFERS to on(*) and don't do ANALYZE, that should not result in an error. (*) Defaulting BUFFERS to on is something I want regardless of anything else we do. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL: Expertise, Training and Support
Re: Pluggable Storage - Andres's take
Hi, On 2019-04-29 16:17:41 -0700, Ashwin Agrawal wrote: > On Thu, Apr 25, 2019 at 3:43 PM Andres Freund wrote: > > Hm. I think some of those changes would be a bit bigger than I initially > > though. Attached is a more minimal fix that'd route > > RelationGetNumberOfBlocksForFork() through tableam if necessary. I > > think it's definitely the right answer for 1), probably the pragmatic > > answer to 2), but certainly not for 3). > > > > I've for now made the AM return the size in bytes, and then convert that > > into blocks in RelationGetNumberOfBlocksForFork(). Most postgres callers > > are going to continue to want it internally as pages (otherwise there's > > going to be way too much churn, without a benefit I can see). So I > > think that's OK. > > I will provide my inputs, Heikki please correct me or add your inputs. > > I am not sure how much gain this practically provides, if rest of the > system continues to use the value returned in-terms of blocks. I > understand things being block based (and not just really block based > but all the blocks of relation are storing data and full tuple) is > engraved in the system. So, breaking out of it is yes much larger > change and not just limited to table AM API. I don't think it's that ingrained in all that many parts of the system. Outside of the places I listed upthread, and the one index case that stashes extra info, which places are that "block based"? > I feel most of the issues discussed here should be faced by zheap as > well, as not all blocks/pages contain data like TPD pages should be > excluded from sampling and TID scans, etc... It's not a problem so far, and zheap works on tableam. You can just skip such blocks during sampling / analyze, and return nothing for tidscans. > > > 2) commands/analyze.c, computing pg_class.relpages > > > > > >This should imo be moved to the tableam callback. It's currently done > > >a bit weirdly imo, with fdws computing relpages the callback, but > > >then also returning the acquirefunc. Seems like it should entirely be > > >computed as part of calling acquirefunc. > > > > Here I'm not sure routing RelationGetNumberOfBlocksForFork() through > > tableam wouldn't be the right minimal approach too. It has the > > disadvantage of implying certain values for the > > RelationGetNumberOfBlocksForFork(MAIN) return value. The alternative > > would be to return the desire sampling range in > > table_beginscan_analyze() - but that'd require some duplication because > > currently that just uses the generic scan_begin() callback. > > Yes, just routing relation size via AM layer and using its returned > value in terms of blocks still and performing sampling based on blocks > based on it, doesn't feel resolves the issue. Maybe need to delegate > sampling completely to AM layer. Code duplication can be avoided by > similar AMs (heap and zheap) possible using some common utility > functions to achieve intended result. I don't know what this is actually proposing. > > I suspect - as previously mentioned- that we're going to have to extend > > statistics collection beyond the current approach at some point, but I > > don't think that's now. At least to me it's not clear how to best > > represent the stats, and how to best use them, if the underlying storage > > is fundamentally not block best. Nor how we'd avoid code duplication... > > Yes, will have to give more thoughts into this. 
> > > > > > 3) nodeTidscan, skipping over too large tids > > >I think this should just be moved into the AMs, there's no need to > > >have this in nodeTidscan.c > > > > I think here it's *not* actually correct at all to use the relation > > size. It's currently doing: > > > > /* > > * We silently discard any TIDs that are out of range at the time > > of scan > > * start. (Since we hold at least AccessShareLock on the table, it > > won't > > * be possible for someone to truncate away the blocks we intend to > > * visit.) > > */ > > nblocks = > > RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation); > > > > which is fine (except for a certain abstraction leakage) for an AM like > > heap or zheap, but I suspect strongly that that's not ok for Ashwin & > > Heikki's approach where tid isn't tied to physical representation. > > Agree, its not nice to have that optimization being performed based on > number of block in generic layer. I feel its not efficient either for > zheap too due to TPD pages as mentioned above, as number of blocks > returned will be higher compared to actually data blocks. I don't think there's a problem for zheap. The blocks are just interspersed. Having pondered this a lot more, I think this is the way to go for 12. Then we can improve this for v13, to be nice. > > The proper fix seems to be to introduce a new scan variant > > (e.g. table_beginscan_tid()), and then have table_fetch_row_version take > > a scan as a paramete
Re: Pluggable Storage - Andres's take
Hi, On 2019-05-07 23:18:39 -0700, Ashwin Agrawal wrote: > On Mon, May 6, 2019 at 1:39 PM Ashwin Agrawal wrote: > > Also wish to point out, while working on Zedstore, we realized that > > TupleDesc from Relation object can be trusted at AM layer for > > scan_begin() API. As for ALTER TABLE rewrite case (ATRewriteTables()), > > catalog is updated first and hence the relation object passed to AM > > layer reflects new TupleDesc. For heapam its fine as it doesn't use > > the TupleDesc today during scans in AM layer for scan_getnextslot(). > > Only TupleDesc which can trusted and matches the on-disk layout of the > > tuple for scans hence is from TupleTableSlot. Which is little > > unfortunate as TupleTableSlot is only available in scan_getnextslot(), > > and not in scan_begin(). Means if AM wishes to do some initialization > > based on TupleDesc for scans can't be done in scan_begin() and forced > > to delay till has access to TupleTableSlot. We should at least add > > comment for scan_begin() to strongly clarify not to trust Relation > > object TupleDesc. Or maybe other alternative would be have separate > > API for rewrite case. > > Just to correct my typo, I wish to say, TupleDesc from Relation object can't > be trusted at AM layer for scan_begin() API. > > Andres, any thoughts on above. I see you had proposed "change the > table_beginscan* API so it > provides a slot" in [1], but seems received no response/comments that time. > [1] > https://www.postgresql.org/message-id/20181211021340.mqaown4njtcgrjvr%40alap3.anarazel.de I don't think passing a slot at beginscan time is a good idea. There's several places that want to use different slots for the same scan, and we probably want to increase that over time (e.g. for batching), not decrease it. What kind of initialization do you want to do based on the tuple desc at beginscan time? Greetings, Andres Freund
Re: Inconsistency between table am callback and table function names
Hi, On 2019-05-08 00:32:22 -0700, Ashwin Agrawal wrote: > The general theme for table function names seem to be > "table_". For example table_scan_getnextslot() and its > corresponding callback scan_getnextslot(). Most of the table functions and > callbacks follow mentioned convention except following ones > > table_beginscan > table_endscan > table_rescan > table_fetch_row_version > table_get_latest_tid > table_insert > table_insert_speculative > table_complete_speculative > table_delete > table_update > table_lock_tuple > > the corresponding callback names for them are > > scan_begin > scan_end > scan_rescan The mismatch here is just due of backward compat with the existing function names. > tuple_fetch_row_version > tuple_get_latest_tid Hm, I'd not object to adding a tuple_ to the wrapper. > tuple_insert > tuple_insert_speculative > tuple_delete > tuple_update > tuple_lock That again is to keep the naming similar to the existing functions. > Also, some of these table function names read little odd > > table_relation_set_new_filenode > table_relation_nontransactional_truncate > table_relation_copy_data > table_relation_copy_for_cluster > table_relation_vacuum > table_relation_estimate_size > > Can we drop relation word from callback names and as a result from these > function names as well? Just have callback names as set_new_filenode, > copy_data, estimate_size? I'm strongly against that. These all work on a full relation size, rather than on individual tuples, and that seems worth pointing out. > Also, a question about comments. Currently, redundant comments are written > above callback functions as also above table functions. They differ > sometimes a little bit on descriptions but majority content still being the > same. Should we have only one place finalized to have the comments to keep > them in sync and also know which one to refer to? Note that the non-differing comments usually just refer to the other place. And there's legitimate differences in most of the ones that are both at the callback and the external functions - since the audience of both are difference, that IMO makes sense. > Plus, file name amapi.h now seems very broad, if possible should be renamed > to indexamapi.h or indexam.h to follow tableam.h. No idea what's our policy > around header file renames. We probably should rename it, but not in 12... Greetings, Andres Freund
Problems with pg_upgrade and extensions referencing catalog tables/views
pgTap has a view that references pg_proc; to support introspection of functions and aggregates. That view references proisagg in versions < 11, and prokind in 11+. pgtap's make process understands how to handle this; modifying the creation scripts as necessary. It actually has to do this for several functions as well. The problem is that pg_dump --binary-upgrade intentionally does not simply issue a `CREATE EXTENSION` command the way a normal dump does, so that it can control the OIDs that are assigned to objects[1]. That means that attempting to pg_upgrade a database with pgtap installed to version 11+ fails trying to create the view that references pg_proc.proisagg[2]. For pgtap, we should be able to work around this by removing the offending column from the view and embedding the knowledge in a function. This would be more difficult in other types of extensions though, especially any that are attempting to provide more user-friendly views of catalog tables. I don’t recall why pg_upgrade wants to control OIDs… don’t we re-create all catalog entries for user objects from scratch? 1: https://www.postgresql.org/message-id/AANLkTimm1c64=xkdpz5ji7q-rh69zd3cmewmrpkh0...@mail.gmail.com 2: https://github.com/theory/pgtap/issues/201
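To make the underlying problem concrete, this is the general shape of a version-dependent catalog view (a sketch with a placeholder view name, not pgTap's actual definition):

    -- On PostgreSQL 10 and earlier, pg_proc has a boolean proisagg column:
    CREATE VIEW funcs AS
        SELECT p.oid, p.proname, p.proisagg FROM pg_proc p;

    -- On PostgreSQL 11 and later, proisagg was replaced by prokind:
    CREATE VIEW funcs AS
        SELECT p.oid, p.proname, (p.prokind = 'a') AS proisagg FROM pg_proc p;

A dump taken from the older definition cannot be restored as-is on the newer server, which is what the --binary-upgrade dump runs into when it recreates the extension's objects individually instead of issuing CREATE EXTENSION.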
Re: New EXPLAIN option: ALL
> On May 8, 2019, at 4:22 PM, Vik Fearing wrote: > > On 07/05/2019 09:30, David Fetter wrote: >> Folks, >> >> It can get a little tedious turning on (or off) all the boolean >> options to EXPLAIN, so please find attached a shortcut. > > I would rather have a set of gucs such as default_explain_buffers, > default_explain_summary, and default_explain_format. > > Of course if you default BUFFERS to on(*) and don't do ANALYZE, that > should not result in an error. > > (*) Defaulting BUFFERS to on is something I want regardless of anything > else we do. I think this, plus Tom’s suggestion of changing what VERBOSE does, is the best way to handle this. Especially since VERBOSE is IMHO pretty useless... I’m +1 on trying to move away from ANALYZE as well, though I think it’s mostly orthogonal...
Re: _bt_split(), and the risk of OOM before its critical section
On Tue, May 7, 2019 at 6:15 PM Peter Geoghegan wrote: > I suppose I'm biased, but I prefer the new approach anyway. Adding the > left high key first, and then the right high key seems simpler and > more logical. It emphasizes the similarities and differences between > leftpage and rightpage. I came up with a better way of doing it in the attached revision. Now, _bt_split() calls _bt_findsplitloc() directly. This makes it possible to significantly simplify the signature of _bt_split(). It makes perfect sense for _bt_split() to call _bt_findsplitloc() directly, since _bt_findsplitloc() is already aware of almost every _bt_split() implementation detail, whereas those same details are not of interest anywhere else. _bt_findsplitloc() also knows all about suffix truncation. It's also nice that the actual _bt_truncate() call is closely tied to the _bt_findsplitloc() call. -- Peter Geoghegan v2-0001-Don-t-leave-behind-junk-nbtree-pages-during-split.patch Description: Binary data
Re: Problems with pg_upgrade and extensions referencing catalog tables/views
"Nasby, Jim" writes: > The problem is that pg_dump --binary-upgrade intentionally does not > simply issue a `CREATE EXTENSION` command the way a normal dump does, so > that it can control the OIDs that are assigned to objects[1]. That's not the only reason. The original concerns were about not breaking the extension, in case the destination server had a different version of the extension available. CREATE EXTENSION doesn't normally guarantee that you get an exactly compatible extension version, which is a good thing for regular pg_dump and restore but a bad thing for binary upgrade. I'm not really sure how to improve the situation you describe, but "issue CREATE EXTENSION and pray" doesn't sound like a solution. regards, tom lane
Re: Inconsistency between table am callback and table function names
On Wed, May 8, 2019 at 2:51 PM Andres Freund wrote: > > Hi, > > On 2019-05-08 00:32:22 -0700, Ashwin Agrawal wrote: > > The general theme for table function names seem to be > > "table_". For example table_scan_getnextslot() and its > > corresponding callback scan_getnextslot(). Most of the table functions and > > callbacks follow mentioned convention except following ones > > > > table_beginscan > > table_endscan > > table_rescan > > table_fetch_row_version > > table_get_latest_tid > > table_insert > > table_insert_speculative > > table_complete_speculative > > table_delete > > table_update > > table_lock_tuple > > > > the corresponding callback names for them are > > > > scan_begin > > scan_end > > scan_rescan > > The mismatch here is just due of backward compat with the existing > function names. I am missing something here, would like to know more. table_ seem all new fresh naming. Hence IMO having consistency with surrounding and related code carries more weight as I don't know backward compat serving what purpose. Heap function names can continue to call with same old names for backward compat if required. > > Also, a question about comments. Currently, redundant comments are written > > above callback functions as also above table functions. They differ > > sometimes a little bit on descriptions but majority content still being the > > same. Should we have only one place finalized to have the comments to keep > > them in sync and also know which one to refer to? > > Note that the non-differing comments usually just refer to the other > place. And there's legitimate differences in most of the ones that are > both at the callback and the external functions - since the audience of > both are difference, that IMO makes sense. > Not having consistency is the main aspect I wish to bring to attention. Like for some callback functions the comment is /* see table_insert() for reference about parameters */ void(*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid, int options, struct BulkInsertStateData *bistate); /* see table_insert_speculative() for reference about parameters */ void(*tuple_insert_speculative) (Relation rel, TupleTableSlot *slot, CommandId cid, int options, struct BulkInsertStateData *bistate, uint32 specToken); Whereas for some other callbacks the parameter explanation exist in both the places. Seems we should be consistent. I feel in long run becomes pain to keep them in sync as comments evolve. Like for example /* * Estimate the size of shared memory needed for a parallel scan of this * relation. The snapshot does not need to be accounted for. */ Size(*parallelscan_estimate) (Relation rel); parallescan_estimate is not having snapshot argument passed to it, but table_parallescan_estimate does. So, this way chances are high they going out of sync and missing to modify in both the places. Agree though on audience being different for both. So, seems going with the refer XXX for parameters seems fine approach for all the callbacks and then only specific things to flag out for the AM layer to be aware can live above the callback function. > > Plus, file name amapi.h now seems very broad, if possible should be renamed > > to indexamapi.h or indexam.h to follow tableam.h. No idea what's our policy > > around header file renames. > > We probably should rename it, but not in 12... Okay good to know.
RE: Patch: doc for pg_logical_emit_message()
On Thu. May. 9, 2019 at 01:48 AM Masao, Fujii wrote: > Thanks! Pushed. Thank you. Regards Ryo Matsumura
Re: vacuumdb and new VACUUM options
On Wed, May 08, 2019 at 06:21:09PM -0300, Euler Taveira wrote: > On Wed, May 8, 2019 at 14:19, Fujii Masao > wrote: >> The question is; we should support vacuumdb option for (1), i.e.,, >> something like --index-cleanup option is added? >> Or for (2), i.e., something like --disable-index-cleanup option is added >> as your patch does? Or for both? > > --index-cleanup=BOOL I agree with Euler's suggestion to have a 1-1 mapping between the option of vacuumdb and the VACUUM parameter, because that's more intuitive: - --index-cleanup=false => VACUUM (INDEX_CLEANUP=false) - --index-cleanup=true => VACUUM (INDEX_CLEANUP=true) - no --index-cleanup means to rely on the reloption. -- Michael signature.asc Description: PGP signature
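Spelled out as SQL, that mapping corresponds to the following commands (the vacuumdb switch itself is still only a proposal here; "mytab" is a placeholder table name):

    VACUUM (INDEX_CLEANUP FALSE) mytab;   -- vacuumdb --index-cleanup=false
    VACUUM (INDEX_CLEANUP TRUE) mytab;    -- vacuumdb --index-cleanup=true
    VACUUM mytab;                         -- no switch: the vacuum_index_cleanup reloption decides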
Re: Wrong return code in vacuumdb when multiple jobs are used
On Sat, May 04, 2019 at 11:48:53AM -0400, Tom Lane wrote: > +1, waiting till after the minor releases are tagged seems wisest. > We can still push it before 12beta1, so it will get tested in the beta > period. The new minor releases have been tagged, so committed. -- Michael signature.asc Description: PGP signature
Re: Statistical aggregate functions are not working with PARTIAL aggregation
Hello. There is an unfortunate story on this issue. At Wed, 8 May 2019 14:56:25 -0400, Andrew Dunstan wrote in <7969b496-096a-bf9b-2a03-4706baa4c...@2ndquadrant.com> > > On 5/8/19 12:41 PM, Greg Stark wrote: > > Don't we have a build farm animal that runs under valgrind that would > > have caught this? > > > > > > There are two animals running under valgrind: lousyjack and skink. Valgrind doesn't detect the overruning read since the block doesn't has 'MEMNOACCESS' region, since the requested size is just 64 bytes. Thus the attached patch let valgrind detect the overrun. ==00:00:00:22.959 20254== VALGRINDERROR-BEGIN ==00:00:00:22.959 20254== Conditional jump or move depends on uninitialised value(s) ==00:00:00:22.959 20254==at 0x88A838: ExecInterpExpr (execExprInterp.c:1553) ==00:00:00:22.959 20254==by 0x88AFD5: ExecInterpExprStillValid (execExprInterp.c:1769) ==00:00:00:22.959 20254==by 0x8C3503: ExecEvalExprSwitchContext (executor.h:307) ==00:00:00:22.959 20254==by 0x8C4653: advance_aggregates (nodeAgg.c:679) regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index d01fc4f52e..7c6eab6d94 100644 --- a/src/backend/executor/nodeAgg.c +++ b/src/backend/executor/nodeAgg.c @@ -2935,7 +2935,7 @@ build_pertrans_for_aggref(AggStatePerTrans pertrans, fmgr_info_set_expr((Node *) combinefnexpr, &pertrans->transfn); pertrans->transfn_fcinfo = - (FunctionCallInfo) palloc(SizeForFunctionCallInfo(2)); + (FunctionCallInfo) palloc(SizeForFunctionCallInfo(2) + 1); InitFunctionCallInfoData(*pertrans->transfn_fcinfo, &pertrans->transfn, 2,
Re: Fuzzy thinking in is_publishable_class
I wrote: > is_publishable_class has a test "relid >= FirstNormalObjectId", > which I think we should drop, for two reasons: > ... > So what is the motivation for this test? If there's an important > reason for it, we need to find a less fragile way to express it. I tried removing the FirstNormalObjectId check, and found that the reason for it seems to be "the subscription/t/004_sync.pl test falls over without it". That's because that test supposes that the *only* entry in pg_subscription_rel will be for the test table that it creates. Without the FirstNormalObjectId check, the information_schema relations also show up in pg_subscription_rel, confusing the script's simplistic status check. I'm of two minds what to do about that. One approach is to just define a "FOR ALL TABLES" publication as including the information_schema tables, in which case 004_sync.pl is wrong and we should fix it by adding a suitable WHERE restriction to its pg_subscription_rel check. However, possibly that would break some applications that are likewise assuming that no built-in tables appear in pg_subscription_rel. But, if what we want is the definition that "information_schema is excluded from publishable tables", I'm not satisfied with this implementation of that rule. Dropping/recreating information_schema would cause the behavior to change. We could, at the cost of an additional syscache lookup, check the name of the schema that a potentially publishable table belongs to and exclude information_schema by name. I don't have much idea about how performance-critical is_publishable_class is, so I don't know how acceptable that seems. regards, tom lane
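For illustration, a WHERE restriction of the kind mentioned above could look something like this (a sketch only, not the actual 004_sync.pl query):

    SELECT count(*)
    FROM pg_subscription_rel r
         JOIN pg_class c ON c.oid = r.srrelid
         JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname <> 'information_schema';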
Re: Inconsistent error message wording for REINDEX CONCURRENTLY
Michael Paquier writes: > No problem to do that. I'll brush up all that once you commit the > first piece you have come up with, and reuse the new API of catalog.c > you are introducing based on the table OID. Pushed my stuff, have at it. regards, tom lane
Re: Inconsistent error message wording for REINDEX CONCURRENTLY
On Wed, May 08, 2019 at 11:28:35PM -0400, Tom Lane wrote: > Michael Paquier writes: >> No problem to do that. I'll brush up all that once you commit the >> first piece you have come up with, and reuse the new API of catalog.c >> you are introducing based on the table OID. > > Pushed my stuff, have at it. Thanks. Attached is what I get to after scanning the error messages in indexcmds.c and index.c. Perhaps you have more comments about it? -- Michael diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index dab06e20a8..7e7c03ef12 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -2743,6 +2743,12 @@ ReindexRelationConcurrently(Oid relationOid, int options) MemoryContextSwitchTo(oldcontext); +/* A system catalog cannot be reindexed concurrently */ +if (IsCatalogRelationOid(relationOid)) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot reindex a system catalog concurrently"))); + /* Open relation to get its indexes */ heapRelation = table_open(relationOid, ShareUpdateExclusiveLock); @@ -2756,13 +2762,13 @@ ReindexRelationConcurrently(Oid relationOid, int options) if (!indexRelation->rd_index->indisvalid) ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping", + errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping", get_namespace_name(get_rel_namespace(cellOid)), get_rel_name(cellOid; else if (indexRelation->rd_index->indisexclusion) ereport(WARNING, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot reindex concurrently exclusion constraint index \"%s.%s\", skipping", + errmsg("cannot reindex exclusion constraint index \"%s.%s\" concurrently, skipping", get_namespace_name(get_rel_namespace(cellOid)), get_rel_name(cellOid; else @@ -2802,7 +2808,7 @@ ReindexRelationConcurrently(Oid relationOid, int options) if (!indexRelation->rd_index->indisvalid) ereport(WARNING, (errcode(ERRCODE_INDEX_CORRUPTED), - errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping", + errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping", get_namespace_name(get_rel_namespace(cellOid)), get_rel_name(cellOid; else @@ -2831,17 +2837,11 @@ ReindexRelationConcurrently(Oid relationOid, int options) { Oid heapId = IndexGetRelation(relationOid, false); -/* A shared relation cannot be reindexed concurrently */ -if (IsSharedRelation(heapId)) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("concurrent reindex is not supported for shared relations"))); - /* A system catalog cannot be reindexed concurrently */ if (IsCatalogRelationOid(heapId)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("concurrent reindex is not supported for catalog relations"))); + errmsg("cannot reindex a system catalog concurrently"))); /* Save the list of relation OIDs in private context */ oldcontext = MemoryContextSwitchTo(private_context); @@ -2869,7 +2869,7 @@ ReindexRelationConcurrently(Oid relationOid, int options) /* Return error if type of relation is not supported */ ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("cannot reindex concurrently this type of relation"))); + errmsg("cannot reindex this type of relation concurrently"))); break; } diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 326dc44177..c8bc6be061 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ 
-1967,7 +1967,7 @@ INSERT INTO concur_reindex_tab3 VALUES (3, '[1,2]'); REINDEX INDEX CONCURRENTLY concur_reindex_tab3_c2_excl; -- error ERROR: concurrent index creation for exclusion constraints is not supported REINDEX TABLE CONCURRENTLY concur_reindex_tab3; -- succeeds with warning -WARNING: cannot reindex concurrently exclusion constraint index "public.concur_reindex_tab3_c2_excl", skipping +WARNING: cannot reindex exclusion constraint index "public.concur_reindex_tab3_c2_excl" concurrently, skipping INSERT INTO concur_reindex_tab3 VALUES (4, '[2,4]'); ERROR: conflicting key value violates exclusion constraint "concur_reindex_tab3_c2_excl" DETAIL: Key (c2)=([2,5)) conflicts with existing key (c2)=([1,3)). @@ -2092,10 +2092,15 @@ BEGIN; REINDEX TABLE CONCURRENTLY concur_reindex_tab; ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block COMMIT; -REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation -ERROR: concurrent index creation on system catalog tables is not supported -REINDEX TABLE CONCURRENTLY pg
Re: Implicit timezone conversion replicating from timestamp to timestamptz?
On Friday, 25 January 2019 04:57:15 UTC+8, Jeremy Finzel wrote: > > We are working to migrate several large tables from the timestamp to the > timestamptz data type by using logical replication (so as to avoid long > downtime for type conversions). We are using pglogical but curious if what > I share below applies to native logical replication as well. > > Both source and destination dbs are at localtime, which is > 'America/Chicago' time zone. > > The source system has a timestamp stored "at time zone UTC", like this for > 6:00pm Chicago time: > 2019-01-24 20:00:00.00 > > I was *very surprised* to find that replicating above timestamp to > timestamptz actually does so correctly, showing this value in my psql > client on the subscriber: > 2019-01-24 14:00:00.00-06 > > How does it know/why does it assume it knows that the time zone of the > timestamp data type is UTC on the provider given that my clusters are at > America/Chicago? I would have actually expected an incorrect conversion of > the data unless I set the timezone to UTC on the way in on the subscriber > via a trigger. > > That is, I was expecting to see this: > 2019-01-24 20:00:00.00-06 > > Which is obviously wrong. So why does it do this and is there some > assumption being made somewhere in the code base that a timestamp is > actually saved "at time zone UTC"? > > pglogical is replicating the timestamp text, which is converted on both output and input.
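The "converted on input" part is easy to see with plain casts: the same timestamp text maps to different instants depending on the session's TimeZone (illustrative values):

    SET timezone = 'UTC';
    SELECT '2019-01-24 20:00:00'::timestamptz;   -- 2019-01-24 20:00:00+00
    SET timezone = 'America/Chicago';
    SELECT '2019-01-24 20:00:00'::timestamptz;   -- 2019-01-24 20:00:00-06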
Re: [HACKERS] proposal: schema variables
Hi rebased patch Regards Pavel schema-variables-20190509.patch.gz Description: application/gzip
Re: Patch to document base64 encoding
Er, ping. Nobody has reviewed the latest patches. Next CF is in July, two months away. You might consider reviewing other people's patches; that is expected to make the overall process work. There are several documentation or comment patches in the queue. -- Fabien.
integrate Postgres Users Authentication with our own LDAP Server
Hi all, We would need to integrate Postgres Users Authentication with our own LDAP Server. Basically as of now we are able to log in to the Postgres DB with a user/password credential. [screenshot: roles "pg_signal_backend" and "postgres"] These user objects are part of the Postgres DB server. Now we want these users to be authenticated by the LDAP server. We would want the authentication to be done with LDAP, so basically the user credentials should be stored in the LDAP server. Can you mention the prescribed steps in Postgres needed for this integration with the LDAP Server? Regards Tarkeshwar
Re: range_agg
On Mon, May 6, 2019 at 4:21 PM Paul Jungwirth wrote: > I need to write some docs and do > some cleanup and I'll have a CF entry. Here is an initial patch. I'd love to have some feedback! :-) One challenge was handling polymorphism, since I want to have this: anyrange[] range_agg(anyrange, bool, bool) But there is no anyrange[]. I asked about this back when I wrote my extension too: https://www.postgresql.org/message-id/CA%2BrenyVOjb4xQZGjdCnA54-1nzVSY%2B47-h4nkM-EP5J%3D1z%3Db9w%40mail.gmail.com Like then, I handled it by overloading the function with separate signatures for each built-in range type: int4range[] range_agg(int4range, bool, bool); int8range[] range_agg(int8range, bool, bool); numrange[] range_agg(numrange, bool, bool); tsrange[] range_agg(tsrange, bool, bool); tstzrange[] range_agg(tstzrange, bool, bool); daterange[] range_agg(daterange, bool, bool); (And users can still define a range_agg for their own custom range types using similar instructions to those in my extension's README.) The problem was the opr_sanity regression test, which rejects functions that share the same C-function implementation (roughly). I took a few stabs at changing my code to pass that test, e.g. defining separate wrapper functions for everything like this: Datum int4range_agg_transfn(PG_FUNCTION_ARGS) { return range_agg_transfn(fcinfo); } But that felt like it was getting ridiculous (8 range types * transfn+finalfn * 1-bool and 2-bool variants), so in the end I just added my functions to the "permitted" output in opr_sanity. Let me know if I should avoid that though. Also I would still appreciate some bikeshedding over the functions' UI since I don't feel great about it. If the overall approach seems okay, I'm still open to adding David's suggestions for weighted_range_agg and covering_range_agg. Thanks! Paul range_agg_v1.patch Description: Binary data
Re: Statistical aggregate functions are not working with PARTIAL aggregation
At Thu, 09 May 2019 11:17:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190509.111746.217492977.horiguchi.kyot...@lab.ntt.co.jp> > Valgrind doesn't detect the overruning read since the block > doesn't has 'MEMNOACCESS' region, since the requested size is > just 64 bytes. > > Thus the attached patch let valgrind detect the overrun. So the attached patch makes palloc always attach the MEMNOACCESS region and sentinel byte. The issue under discussion is detected with this patch either. (But in return memory usage gets larger.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 5c25b69822fdd86d05871f61cf30e47f514853ae Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 9 May 2019 13:18:33 +0900 Subject: [PATCH] Always put sentinel in allocated memory blocks. Currently allocated memory blocks doesn't have sentinel byte and valgrind NOACCESS region if requested size is equal to chunk size. The behavior largily diminishes the chance of overrun dection. This patch let allocated memory blocks are always armed with the stuff when MEMORY_CONTEXT_CHECKING is defined. --- src/backend/utils/mmgr/aset.c | 51 + src/backend/utils/mmgr/generation.c | 35 +++-- src/backend/utils/mmgr/slab.c | 23 ++--- 3 files changed, 64 insertions(+), 45 deletions(-) diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 08aff333a4..15ef5c5afa 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -297,6 +297,13 @@ static const MemoryContextMethods AllocSetMethods = { #endif }; +/* to keep room for sentinel in allocated chunks */ +#ifdef MEMORY_CONTEXT_CHECKING +#define REQUIRED_SIZE(x) ((x) + 1) +#else +#define REQUIRED_SIZE(x) (x) +#endif + /* * Table for AllocSetFreeIndex */ @@ -726,9 +733,9 @@ AllocSetAlloc(MemoryContext context, Size size) * If requested size exceeds maximum for chunks, allocate an entire block * for this request. */ - if (size > set->allocChunkLimit) + if (REQUIRED_SIZE(size) > set->allocChunkLimit) { - chunk_size = MAXALIGN(size); + chunk_size = MAXALIGN(REQUIRED_SIZE(size)); blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ; block = (AllocBlock) malloc(blksize); if (block == NULL) @@ -742,8 +749,8 @@ AllocSetAlloc(MemoryContext context, Size size) #ifdef MEMORY_CONTEXT_CHECKING chunk->requested_size = size; /* set mark to catch clobber of "unused" space */ - if (size < chunk_size) - set_sentinel(AllocChunkGetPointer(chunk), size); + Assert (size < chunk_size); + set_sentinel(AllocChunkGetPointer(chunk), size); #endif #ifdef RANDOMIZE_ALLOCATED_MEMORY /* fill the allocated space with junk */ @@ -787,7 +794,7 @@ AllocSetAlloc(MemoryContext context, Size size) * If one is found, remove it from the free list, make it again a member * of the alloc set and return its data address. */ - fidx = AllocSetFreeIndex(size); + fidx = AllocSetFreeIndex(REQUIRED_SIZE(size)); chunk = set->freelist[fidx]; if (chunk != NULL) { @@ -800,8 +807,8 @@ AllocSetAlloc(MemoryContext context, Size size) #ifdef MEMORY_CONTEXT_CHECKING chunk->requested_size = size; /* set mark to catch clobber of "unused" space */ - if (size < chunk->size) - set_sentinel(AllocChunkGetPointer(chunk), size); + Assert (size < chunk->size); + set_sentinel(AllocChunkGetPointer(chunk), size); #endif #ifdef RANDOMIZE_ALLOCATED_MEMORY /* fill the allocated space with junk */ @@ -824,7 +831,7 @@ AllocSetAlloc(MemoryContext context, Size size) * Choose the actual chunk size to allocate. 
*/ chunk_size = (1 << ALLOC_MINBITS) << fidx; - Assert(chunk_size >= size); + Assert(chunk_size >= REQUIRED_SIZE(size)); /* * If there is enough room in the active allocation block, we will put the @@ -959,8 +966,8 @@ AllocSetAlloc(MemoryContext context, Size size) #ifdef MEMORY_CONTEXT_CHECKING chunk->requested_size = size; /* set mark to catch clobber of "unused" space */ - if (size < chunk->size) - set_sentinel(AllocChunkGetPointer(chunk), size); + Assert (size < chunk->size); + set_sentinel(AllocChunkGetPointer(chunk), size); #endif #ifdef RANDOMIZE_ALLOCATED_MEMORY /* fill the allocated space with junk */ @@ -996,10 +1003,10 @@ AllocSetFree(MemoryContext context, void *pointer) #ifdef MEMORY_CONTEXT_CHECKING /* Test for someone scribbling on unused space in chunk */ - if (chunk->requested_size < chunk->size) - if (!sentinel_ok(pointer, chunk->requested_size)) - elog(WARNING, "detected write past chunk end in %s %p", - set->header.name, chunk); + Assert (chunk->requested_size < chunk->size); + if (!sentinel_ok(pointer, chunk->requested_size)) + elog(WARNING, "detected write past chunk end in %s %p", + set->header.name, chunk); #endif if (chunk->size > set->allocChunkLimit) @@ -1078,10 +1085,10 @@ AllocSetRealloc(MemoryContext context, void *pointer, Size size) #ifdef
Re: New vacuum option to do only freezing
On Thu, May 2, 2019 at 1:39 AM Robert Haas wrote: > > On Wed, May 1, 2019 at 12:14 PM Andres Freund wrote: > > On 2019-04-06 16:13:53 +0900, Michael Paquier wrote: > > > On Sat, Apr 06, 2019 at 11:31:31AM +0900, Masahiko Sawada wrote: > > > > Yes, but Fujii-san pointed out that this option doesn't support toast > > > > tables and I think there is not specific reason why not supporting > > > > them. So it might be good to add toast.vacuum_index_cleanup. Attached > > > > patch. > > > > > > Being able to control that option at toast level sounds sensible. I > > > have added an open item about that. > > > > Robert, what is your stance on this open item? It's been an open item > > for about three weeks, without any progress. > > The actual bug in this patch needs to be fixed, but I see we have > another open item for that. This open item, as I understand it, is > all about whether we should add another reloption so that you can > control this behavior separately for TOAST tables. In my opinion, > that's not a critical change and the open item should be dropped, but > others might see it differently. I agree that this item is neither critical nor a bug. But it is an oversight on my part, the patch is small, and I see no specific reason not to include it in 12. So if this patch can get enough review I think we can still have it in 12. Since the previous patch conflicts with current HEAD I've attached a rebased version. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center toast_vacuum_index_cleanup_v2.patch Description: Binary data
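For reference, this is what the existing table-level reloption looks like, and what the proposed TOAST-level variant would look like if the attached patch goes in ("mytab" is a placeholder table name):

    ALTER TABLE mytab SET (vacuum_index_cleanup = false);
    -- only with the attached patch applied:
    ALTER TABLE mytab SET (toast.vacuum_index_cleanup = false);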
Re: Fwd: Add tablespace tap test to pg_rewind
On Fri, Mar 22, 2019 at 04:25:53PM +0800, Shaoqi Bai wrote: > Deleted the test for group permissions in updated patch. Well, there are a couple of things I am not really happy about in this patch: - There is not much point to have has_tablespace_mapping as it is not extensible. Instead I'd rather use a single "extra" parameter which can define a range of options. An example of that is within PostgresNode::init for "extra" and "auth_extra". - CREATE TABLESPACE is run once on the primary *before* promoting the standby, which causes the tablespace paths to map between both of them. This is not correct. Creating a tablespace on the primary before creating the standby, and use --tablespace-map would be the way to go... However per the next point... - standby_afterpromotion is created on the promoted standby, and then immediately dropped. pg_rewind is able to handle this case when working on different hosts. But with this test we finish by having the same problem as pg_basebackup: the source and the target server finish by eating each other. I think that this could actually be an interesting feature for pg_rewind. - A comment at the end refers to databases, and not tablespaces. You could work out the first problem with the backup by changing the backup()/init_from_backup() in RewindTest::create_standby by a pure call to pg_basebackup, but you still have the second problem, which we should still be able to test, and this requires more facility in pg_rewind so as it is basically possible to hijack create_target_symlink() to create a symlink to a different path than the initial one. > Checking the RecursiveCopy::copypath being called, only _backup_fs and > init_from_backup called it. > After runing cmd make -C src/bin check in updated patch, seeing no failure. Yes, I can see that. The issue is that even if we do a backup with --tablespace-mapping then we still need a tweak to allow the copy of symlinks. I am not sure that this is completely what we are looking for either, as it means that any test setting a primary with a tablespace and two standbys initialized from the same base backup would fail. That's not really portable. -- Michael signature.asc Description: PGP signature
Re: any suggestions to detect memory corruption
Thank you Tom and Robert! I tried valgrind, and it looks like it helped me fix the issue. Someone added some code during backend init which used palloc, but at that time the CurrentMemoryContext is PostmasterContext. At the end of backend initialization the PostmasterContext is deleted, and then the error happens. The reason it happens randomly is that before the palloc there are some other if clauses which may skip the palloc. I still can't explain why PostmasterContext could sometimes have an impact on the "index info" MemoryContext, but now I just can't reproduce it (before the fix, it happened in about 30% of cases). On Thu, May 9, 2019 at 1:21 AM Robert Haas wrote: > On Wed, May 8, 2019 at 10:34 AM Tom Lane wrote: > > Alex writes: > > > I can get the following log randomly and I am not which commit caused > it. > > > > > 2019-05-08 21:37:46.692 CST [60110] WARNING: problem in alloc set > index > > > info: req size > alloc size for chunk 0x2a33a78 in block 0x2a33a18 > > > > I've had success in finding memory stomp causes fairly quickly by setting > > a hardware watchpoint in gdb on the affected location. Then you just let > > it run to see when the value changes, and check whether that's a "legit" > > or "not legit" modification point. > > > > The hard part of that, of course, is to know in advance where the > affected > > location is. You may be able to make things sufficiently repeatable by > > doing the problem query in a fresh session each time. > > valgrind might also be a possibility, although that has a lot of overhead. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: Add tablespace tap test to pg_rewind
On Tue, May 07, 2019 at 10:31:59AM +0900, Kyotaro HORIGUCHI wrote: > The patch seems to be using the tablespace directory in backups > directly from standbys. In other words, multiple standbys created > from A backup shares the tablespace directory in the backup. Yes, I noticed that, and I am not happy about that either. I'd like to think that what we are looking for is an equivalent of the tablespace mapping of pg_basebackup, but for init_from_backup(). At least that sounds like a plausible solution. -- Michael signature.asc Description: PGP signature
Re: Unexpected "shared memory block is still in use"
On Wed, May 08, 2019 at 02:32:46PM -0400, Tom Lane wrote: > Just now, while running a parallel check-world on HEAD according to the > same script I've been using for quite some time, one of the TAP tests > died during initdb: > > selecting dynamic shared memory implementation ... posix > selecting default max_connections ... 100 > selecting default shared_buffers ... 128MB > selecting default timezone ... America/New_York > creating configuration files ... ok > running bootstrap script ... ok > performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT > [18351] FATAL: pre-existing shared memory block (key 5440004, ID 1734475802) > is still in use > 2019-05-08 13:59:19.963 EDT [18351] HINT: Terminate any old server processes > associated with data directory > "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata". > child process exited with exit code 1 > initdb: removing data directory > "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata" > Bail out! system initdb failed > > I have never seen this happen before in the TAP tests. > > I think the odds are very high that this implies something wrong with > commit c09850992. The odds are very high that you would not have gotten that error before that commit. But if the cause matches your guess, it's not something wrong with the commit ... > My immediate guess after eyeballing that patch quickly is that it was > not a good idea to redefine the rules used by bootstrap/standalone > backends. In particular, it seems somewhat plausible that the bootstrap > process hadn't yet completely died when the standalone backend for the > post-bootstrap phase came along and decided there was a conflict (which > it never would have before). If so, I would sure try to fix the initdb sequence to not let that happen. I would not trust such a conflict to be harmless. What OS, OS version, and filesystem?
Re: relcache reference leak with pglogical replication to insert-only partitioned table?
On Sunday, 27 January 2019 11:20:03 UTC+8, Jeremy Finzel wrote: > > I understand it's not fully supported to replicate to a differently > partitioned setup on a subscriber with either pglogical or the native > logical replication, however I also know that INSERT triggers can be fired > in replication mode. I have an insert-only OLTP table that I want > partitioned only on the subscriber system. I have this setup using the > "old style" partitioning as it is a 9.6 system. > > Provider is 9.6.6 pglogical 2.1.1 > Subscriber is 9.6.10 pglogical 2.1.1 > > Everything appears good as far as the data. It is partitioning > correctly. Queries on the data are planning correctly. However, I am now > getting these WARNING messages constantly. How concerned should I be? Is > there a fix for this? Any insight is much appreciated! > > 2019-01-27 03:12:34.150 GMT,,,135600,,5c4d1f44.211b0,6794,,2019-01-27 > 03:02:28 GMT,54/0,1057372660,WARNING,01000,"relcache reference leak: > relation ""foo_pkey"" not closed","apply COMMIT in commit before > 14DB/34DB1B78, xid 1476598649 commited at 2019-01-26 21:12:34.071673-06 > (action #10) from node replorigin 22""pglogical apply 16420:2094659706" > It won't corrupt anything but you can expect resource leaks. I think there are changes to support this in pglogical3. I don't have any new information on when they might become public, but I do know work has been done to add a plugin mechanism etc as part of work toward opening pglogical3.
Re: Wrong return code in vacuumdb when multiple jobs are used
On Thu, May 9, 2019 at 3:32 AM Michael Paquier wrote: > > On Sat, May 04, 2019 at 11:48:53AM -0400, Tom Lane wrote: > > +1, waiting till after the minor releases are tagged seems wisest. > > We can still push it before 12beta1, so it will get tested in the beta > > period. > > The new minor releases have been tagged, so committed. Thanks a lot!
Re: Unexpected "shared memory block is still in use"
At Wed, 8 May 2019 22:54:14 -0700, Noah Misch wrote in <20190509055414.gb1066...@rfd.leadboat.com> > On Wed, May 08, 2019 at 02:32:46PM -0400, Tom Lane wrote: > > Just now, while running a parallel check-world on HEAD according to the > > same script I've been using for quite some time, one of the TAP tests > > died during initdb: > > > > selecting dynamic shared memory implementation ... posix > > selecting default max_connections ... 100 > > selecting default shared_buffers ... 128MB > > selecting default timezone ... America/New_York > > creating configuration files ... ok > > running bootstrap script ... ok > > performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT > > [18351] FATAL: pre-existing shared memory block (key 5440004, ID > > 1734475802) is still in use > > 2019-05-08 13:59:19.963 EDT [18351] HINT: Terminate any old server > > processes associated with data directory > > "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata". > > child process exited with exit code 1 > > initdb: removing data directory > > "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata" > > Bail out! system initdb failed > > > > I have never seen this happen before in the TAP tests. > > > > I think the odds are very high that this implies something wrong with > > commit c09850992. > > The odds are very high that you would not have gotten that error before that > commit. But if the cause matches your guess, it's not something wrong with > the commit ... > > > My immediate guess after eyeballing that patch quickly is that it was > > not a good idea to redefine the rules used by bootstrap/standalone > > backends. In particular, it seems somewhat plausible that the bootstrap > > process hadn't yet completely died when the standalone backend for the > > post-bootstrap phase came along and decided there was a conflict (which > > it never would have before). > > If so, I would sure try to fix the initdb sequence to not let that happen. I > would not trust such a conflict to be harmless. > > What OS, OS version, and filesystem? PGSharedMemoryCreate shows the error in SHMSTATE_ANALYSYS_FAILURE case. PGSharedMemoryAttach returns the code when, for example, shmat failed with ENOMEM. I'm afraid that the message is not shown from SHMSTATE_ATTACHED.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: POC: Cleaning up orphaned files using undo logs
On Mon, May 6, 2019 at 5:43 PM Dilip Kumar wrote: > > Just for tracking, open comments which still needs to be worked on. > > 1. Avoid special case in UndoRecordIsValid. > > Can we instead eliminate the special case? It seems like the if > > (log->oldest_data == InvalidUndoRecPtr) case will be taken very > > rarely, so if it's buggy, we might not notice. I have worked on this comments and added changes in the latest patch. > > 2. While updating the previous transaction header instead of unpacking > complete header and writing it back, we can just unpack main header > and calculate the offset of uur_next and then update it directly. For this as you suggested I am not changing, updated the comments. > > 3. unifying uur_xid and uur_xidepoch into uur_fxid. Still open. I have also added the README. Patches can be applied on top of undo branch [1] commit: (cb777466d008e656f03771cf16ec7ef9d6f2778b) [1] https://github.com/EnterpriseDB/zheap/tree/undo -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com 0001-Update-oldest_data-on-startup_v5.patch Description: Binary data 0003-Test-module-for-undo-api_v5.patch Description: Binary data 0004-undo-page-consistency-checker_v5.patch Description: Binary data 0002-Provide-interfaces-to-store-and-fetch-undo-records_v5.patch Description: Binary data
Re: integrate Postgres Users Authentication with our own LDAP Server
On Thu, 2019-05-09 at 04:51 +, M Tarkeshwar Rao wrote: > We would need to integrate Postgres Users Authentication with our own LDAP > Server. > > Basically as of now we are able to login to Postgress DB with a user/password > credential. > > [roles "pg_signal_backend" and "postgres"] > > These user objects are the part of Postgres DB server. Now we want that these > users should be authenticated by LDAP server. > We would want the authentication to be done with LDAP, so basically the user > credentials should be store in LDAP server > > Can you mention the prescribed steps in Postgres needed for this integration > with LDAP Server? LDAP authentication is well documented: https://www.postgresql.org/docs/current/auth-ldap.html But I don't think you are on the right track. "pg_signal_backend" cannot login, it is a role to which you add a login user to give it certain privileges. So you don't need to authenticate the role. "postgres" is the installation superuser. If security is important for you, you won't set a password for that user and you won't allow remote logins with that user. But for your application users LDAP authentication is a fine thing, and not hard to set up if you know a little bit about LDAP. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
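As a minimal illustration of the role side of this (the pg_hba.conf side is covered by the documentation linked above; "app_user" is a placeholder name), with LDAP authentication the role still has to exist in PostgreSQL, but no password needs to be stored there:

    CREATE ROLE app_user LOGIN;
    -- the password is then verified by the LDAP server according to the
    -- ldap entry configured in pg_hba.conf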
Re: pgsql: Add strict_multi_assignment and too_many_rows plpgsql checks
On 2018-07-25 01:47, Tomas Vondra wrote: > Add strict_multi_assignment and too_many_rows plpgsql checks Could you clarify this error message: /* translator: %s represents a name of an extra check */ errdetail("%s check of %s is active.", "strict_multi_assignment", strict_multiassignment_level == ERROR ? "extra_errors" : "extra_warnings"), What does, for example, strict_multi_assignment check of extra_errors is active. mean? Also note that the translator comment is only mentioning one of the %s. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
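For anyone trying to see the message in context, a minimal sketch that should trigger it, assuming the checks behave as the commit describes (variable names are placeholders):

    SET plpgsql.extra_errors = 'strict_multi_assignment';

    DO $$
    DECLARE
      a int;
      b int;
    BEGIN
      -- two target variables but only one source column: with the setting above
      -- this raises an error whose DETAIL is the line quoted in the question,
      -- naming extra_errors as the source of the check
      SELECT 1 INTO a, b;
    END
    $$;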