Re: remove some ancient port hacks
On 2020-08-12 10:18, Marco Atzeri wrote:
> On 12.08.2020 09:12, Peter Eisentraut wrote:
>> There are two ancient hacks in the cygwin and solaris ports that appear
>> to have been solved more than 10 years ago, so I think we can remove
>> them.  See attached patches.
>
> Hi Peter,
> This is really archeology.  The check is for b20.1, which was released
> in 1998.  No problem at all to remove it.

Committed.  Thanks for the feedback.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: remove some ancient port hacks
On 2020-08-13 05:22, Noah Misch wrote:
> On Wed, Aug 12, 2020 at 09:12:07AM +0200, Peter Eisentraut wrote:
>> There are two ancient hacks in the cygwin and solaris ports that appear
>> to have been solved more than 10 years ago, so I think we can remove
>> them.  See attached patches.
>
> +1 for removing these.  >10y age is not sufficient justification by
> itself; if systems that shipped with the defect were not yet EOL, that
> would tend to justify waiting longer.  For these particular hacks,
> though, affected systems are both old and EOL.

done

In this case, the bug was fixed in the stable release track of this OS, so
the only way to still be affected would be if you had never installed any
OS patches in 10 years, which is clearly unreasonable.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: use pg_get_functiondef() in pg_dump
On 2020-08-12 21:54, Robert Haas wrote:
> One problem with this, which I think Tom pointed out before, is that it
> might make it harder to handle some forward-compatibility problems.  In
> other words, if something that the server is generating needs to be
> modified for compatibility with a future release, it's not easy to do
> that.  Like if we needed to quote something we weren't previously
> quoting, for example.

We already use a lot of other pg_get_*def functions in pg_dump.  Does this
one introduce any fundamentally new problems?

A hypothetical change where syntax that we accept now would no longer be
accepted in a (near-)future version would create a lot of upsetness.  I
don't think we'd do it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Aug 13, 2020 at 6:47 PM Amit Kapila wrote:
>
> On Thu, Aug 13, 2020 at 12:08 PM Amit Kapila wrote:
> >
> > On Fri, Aug 7, 2020 at 2:04 PM Dilip Kumar wrote:
> > >
> > > On Thu, Aug 6, 2020 at 2:43 PM Amit Kapila wrote:
> > > >
> > > > ..
> > > > This patch's functionality can be independently verified by SQL APIs
> > >
> > > Your changes look fine to me.
> >
> > I have pushed that patch last week and attached are the remaining
> > patches.  I have made a few changes in the next patch
> > 0001-Extend-the-BufFile-interface.patch and have some comments on it
> > which are as below:
>
> Few more comments on the latest patches:
>
> v48-0002-Add-support-for-streaming-to-built-in-replicatio
> 1. It appears to me that we don't remove the temporary folders created
> by the apply worker.  So, we have folders like
> pgsql_tmp15324.0.sharedfileset in the base/pgsql_tmp directory even when
> the apply worker exits.  I think we can remove these by calling
> PathNameDeleteTemporaryDir in SharedFileSetUnregister while removing
> the fileset from the registered fileset list.

I think we need to call SharedFileSetDeleteAll(input_fileset) from
SharedFileSetUnregister, so that all the directories created for this
fileset are removed.

> 2.
> +typedef struct SubXactInfo
> +{
> + TransactionId xid; /* XID of the subxact */
> + int fileno; /* file number in the buffile */
> + off_t offset; /* offset in the file */
> +} SubXactInfo;
> +
> +static uint32 nsubxacts = 0;
> +static uint32 nsubxacts_max = 0;
> +static SubXactInfo *subxacts = NULL;
> +static TransactionId subxact_last = InvalidTransactionId;
>
> Will it be better if we move all the subxact-related variables (like
> nsubxacts, nsubxacts_max and subxact_last) inside the SubXactInfo struct,
> as all the information anyway is related to sub-transactions?

I have moved them all to a structure.

> 3.
> + /*
> + * If there is no subtransaction then nothing to do, but if already have
> + * subxact file then delete that.
> + */
>
> The extra space before 'but' in the above sentence is not required.

Fixed.

> v48-0001-Extend-the-BufFile-interface
> 4.
> - * SharedFileSets can also be used by backends when the temporary files need
> - * to be opened/closed multiple times and the underlying files need to survive
> + * SharedFileSets can be used by backends when the temporary files need to be
> + * opened/closed multiple times and the underlying files need to survive
>  * across transactions.
>  *
>
> No need of 'also' in the above sentence.

Fixed.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
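For illustration, here is a rough sketch of what folding the quoted
bookkeeping variables into a single structure could look like.  Apart from
SubXactInfo and the fields quoted above, the names (ApplySubXactData,
subxact_data) are assumptions for this sketch, not necessarily what the
final patch uses.

    /* Sketch only: assumes the usual PostgreSQL backend headers. */
    #include "postgres.h"
    #include "access/transam.h"

    typedef struct SubXactInfo
    {
        TransactionId xid;          /* XID of the subxact */
        int           fileno;       /* file number in the buffile */
        off_t         offset;       /* offset in the file */
    } SubXactInfo;

    /* All subxact bookkeeping for the apply worker, kept in one place. */
    typedef struct ApplySubXactData
    {
        uint32        nsubxacts;     /* number of subxacts tracked so far */
        uint32        nsubxacts_max; /* allocated size of the subxacts array */
        TransactionId subxact_last;  /* xid of the last subxact seen */
        SubXactInfo  *subxacts;      /* per-subxact file number and offset */
    } ApplySubXactData;

    static ApplySubXactData subxact_data = {0, 0, InvalidTransactionId, NULL};

Grouping the four formerly file-level variables this way keeps the
subtransaction state in one object, which is the consolidation the review
comment asks for.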
Re: run pgindent on a regular basis / scripted manner
On 2020-08-13 00:34, Andres Freund wrote:
> I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
> most of the hunks were entirely unrelated.  Despite the development
> window for 14 having only relatively recently opened.  Based on my
> experience it tends to get worse over time.

Do we have a sense of why poorly-indented code gets committed?  I think
some of the indentation rules are hard to follow manually.  (pgperltidy is
worse.)

Also, since pgindent gets run eventually anyway, it's not really that
important to get the indentation right the first time.  I suspect the goal
of most authors and committers is to write readable code rather than to
divine the exact pgindent output.

I think as a start, we could just issue a guideline that all committed
code should follow pgindent.  That has never really been a guideline, so
it's not surprising that it's not followed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: use pg_get_functiondef() in pg_dump
Peter Eisentraut writes:
> On 2020-08-12 21:54, Robert Haas wrote:
>> One problem with this, which I think Tom pointed out before, is that
>> it might make it harder to handle some forward-compatibility problems.
>> In other words, if something that the server is generating needs to be
>> modified for compatibility with a future release, it's not easy to do
>> that.  Like if we needed to quote something we weren't previously
>> quoting, for example.

> We already use a lot of other pg_get_*def functions in pg_dump.  Does
> this one introduce any fundamentally new problems?

I wouldn't say that it's *fundamentally* new, but nonetheless it disturbs
me that this proposal has pg_dump assembling CREATE FUNCTION commands in
very different ways depending on the server version.  I'd rather see us
continuing to build the bulk of the command the same as before, and
introduce new behavior only for deparsing the function body.

We've talked before about what a mess it is that some aspects of pg_dump's
output are built on the basis of what pg_dump sees in its stable snapshot
but others are built by ruleutils.c on the basis of up-to-the-minute
catalog contents.  While I don't insist that this patch fix that, I'm
worried that it may be making things worse, or at least getting in the way
of ever fixing that.

Perhaps these concerns are unfounded, but I'd like to see some arguments
why before we go down this path.

			regards, tom lane
Re: use pg_get_functiondef() in pg_dump
I wrote:
> I wouldn't say that it's *fundamentally* new, but nonetheless it disturbs
> me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> very different ways depending on the server version.  I'd rather see us
> continuing to build the bulk of the command the same as before, and
> introduce new behavior only for deparsing the function body.

BTW, a concrete argument for doing it that way is that if you make a
backend function that does the whole CREATE-FUNCTION-building job in
exactly the way pg_dump wants it, that function is nigh useless for any
other client with slightly different requirements.  A trivial example here
is that I don't think we want to become locked into the proposition that
psql's \ef and \sf must print functions exactly the same way that pg_dump
would.

			regards, tom lane
Re: [BUG] Error in BRIN summarization
hyrax's latest report suggests that this patch has issues under
CLOBBER_CACHE_ALWAYS:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-08-13%2005%3A09%3A58

Hard to tell whether there's an actual bug there or just test instability,
but either way it needs to be resolved.

			regards, tom lane
Re: Improving connection scalability: GetSnapshotData()
We have two essentially identical buildfarm failures since these patches
went in:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14

They're both in the same place in the freeze-the-dead isolation test:

TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)",
File: "heapam.c", Line: 6051)
0x9613eb at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x52d586 at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x53bc7e at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x6949bb at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x694532 at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x693d1c at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x8324b3
...
2020-08-14 22:16:41.783 CDT [78410:4] LOG:  server process (PID 80395) was
terminated by signal 6: Abort trap
2020-08-14 22:16:41.783 CDT [78410:5] DETAIL:  Failed process was running:
VACUUM FREEZE tab_freeze;

peripatus has successes since this failure, so it's not fully reproducible
on that machine.  I'm suspicious of a timing problem in computing vacuum's
cutoff_xid.

(I'm also wondering why the failing check is an Assert rather than a real
test-and-elog.  Assert doesn't seem like an appropriate way to check for
plausible data corruption cases.)

			regards, tom lane
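To make the Assert-versus-test-and-elog distinction concrete, here is a
sketch of the failing check in both forms.  This is a fragment, not the
actual heapam.c code, and the error message text in the elog variant is
illustrative only; the variable names follow the quoted trap.

    /* Assert form: compiled out entirely in non-assert (production) builds */
    Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

    /* test-and-elog form: always checked, so corruption is reported in the field */
    if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
        elog(ERROR, "multixact member xid %u precedes freeze cutoff %u",
             members[i].xid, cutoff_xid);

The first form only fires on assert-enabled builds such as the buildfarm's;
the second would also catch the condition on production installations.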
Re: LSM tree for Postgres
Dmitry Dolgov wrote:
>> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>>
>> Then I think about implementing ideas of LSM using standard Postgres
>> nbtree.
>>
>> We need two indexes: one small for fast inserts and another big (main)
>> index.  This top index is small enough to fit in memory, so inserts in
>> this index are very fast.  Periodically we will merge data from the top
>> index into the base index and truncate the top index.  To prevent
>> blocking of inserts in the table while we are merging indexes we can
>> add one more index, which will be used during the merge.
>>
>> So the final architecture of Lsm3 is the following: two top indexes
>> used in a cyclic way and one main index.  When a top index reaches some
>> threshold value we initiate a merge with the main index, done by a
>> bgworker, and switch to the other top index.  Since merging of the
>> indexes is done in the background, it doesn't affect insert speed.
>> Unfortunately the Postgres index AM has no bulk insert operation, so we
>> have to perform normal inserts.  But the inserted data is already
>> sorted by key, which should improve access locality and partly solve
>> the random-reads problem for the base index.
>>
>> Certainly, to perform a search in Lsm3 we have to make lookups in all
>> three indexes and merge the search results.
>
> Thanks for sharing this!  In fact this reminds me more of partitioned
> b-trees [1] (and the older [2]) rather than LSM as such (although it
> could be that the former was influenced by the latter).  What could be
> interesting is that quite often in these and many other whitepapers
> (e.g. [3]), to address the lookup overhead the design includes bloom
> filters in one way or another to avoid searching irrelevant parts of an
> index.  Tomas mentioned them in this thread as well (in a different
> context); probably the design suggested here could also benefit from it?
>
> [1]: Christian Riegger, Tobias Vincon, Ilia Petrov. Write-optimized
> indexing with partitioned b-trees. (2017). 296-300. 10.1145/3151759.3151814.
> [2]: Goetz Graefe. Write-Optimized B-Trees. (2004). 672-683.
> 10.1016/B978-012088469-8/50060-7.
> [3]: Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky,
> Lin Ma, and Rui Shen. Reducing the Storage Overhead of Main-Memory OLTP
> Databases with Hybrid Indexes. (2016). 1567–1581. 10.1145/2882903.2915222.

I found this 2019 paper recently; it might be worth a skim read for some
different ideas.  Too technical for me :)

"Jungle: Towards Dynamically Adjustable Key-Value Store by Combining
LSM-Tree and Copy-On-Write B+-Tree"
https://www.usenix.org/system/files/hotstorage19-paper-ahn.pdf

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
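As a self-contained illustration of the Bloom-filter idea raised in the
reply quoted above (this is not Lsm3 or PostgreSQL code; the sizes, hash
function, and names are arbitrary choices for the sketch), each top index
could carry a small filter that is consulted before the index is probed,
so most lookups skip a top index that cannot contain the key:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOOM_BITS   (1u << 20)  /* bits per filter; illustrative only */
    #define BLOOM_HASHES 4

    typedef struct BloomFilter
    {
        uint8_t bits[BLOOM_BITS / 8];
    } BloomFilter;

    /* splitmix64 finalizer, used here to derive the probe positions */
    static uint64_t
    mix64(uint64_t x)
    {
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27;
        x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    void
    bloom_reset(BloomFilter *bf)
    {
        memset(bf->bits, 0, sizeof(bf->bits));  /* done after each merge */
    }

    void
    bloom_add(BloomFilter *bf, uint64_t key)
    {
        uint64_t h1 = mix64(key);
        uint64_t h2 = mix64(h1);

        for (int i = 0; i < BLOOM_HASHES; i++)
        {
            uint64_t bit = (h1 + (uint64_t) i * h2) % BLOOM_BITS;

            bf->bits[bit / 8] |= (uint8_t) (1u << (bit % 8));
        }
    }

    bool
    bloom_might_contain(const BloomFilter *bf, uint64_t key)
    {
        uint64_t h1 = mix64(key);
        uint64_t h2 = mix64(h1);

        for (int i = 0; i < BLOOM_HASHES; i++)
        {
            uint64_t bit = (h1 + (uint64_t) i * h2) % BLOOM_BITS;

            if ((bf->bits[bit / 8] & (1u << (bit % 8))) == 0)
                return false;   /* key definitely not in this top index */
        }
        return true;            /* maybe present: worth probing the index */
    }

A lookup would probe a top index only when bloom_might_contain() returns
true, while the main index is always searched; the filter is reset and
refilled whenever its top index is truncated after a merge.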
Re: Improving connection scalability: GetSnapshotData()
Hi,

On 2020-08-15 11:10:51 -0400, Tom Lane wrote:
> We have two essentially identical buildfarm failures since these patches
> went in:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14
>
> They're both in the same place in the freeze-the-dead isolation test:
>
> TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)",
> File: "heapam.c", Line: 6051)
> 0x9613eb at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x52d586 at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x53bc7e at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x6949bb at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x694532 at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x693d1c at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x8324b3
> ...
> 2020-08-14 22:16:41.783 CDT [78410:4] LOG:  server process (PID 80395) was
> terminated by signal 6: Abort trap
> 2020-08-14 22:16:41.783 CDT [78410:5] DETAIL:  Failed process was running:
> VACUUM FREEZE tab_freeze;
>
> peripatus has successes since this failure, so it's not fully reproducible
> on that machine.  I'm suspicious of a timing problem in computing vacuum's
> cutoff_xid.

Hm, maybe it's something around what I observed in
https://www.postgresql.org/message-id/20200723181018.neey2jd3u7rfrfrn%40alap3.anarazel.de
I.e. that somehow we end up with hot pruning and freezing coming to a
different determination, and trying to freeze a hot tuple.

I'll try to add a few additional asserts here, and burn some CPU cycles
trying to trigger the issue.  I gotta escape the heat in the house for a
few hours though (no AC here), so I'll not look at the results till later
this afternoon, unless it triggers soon.

> (I'm also wondering why the failing check is an Assert rather than a real
> test-and-elog.  Assert doesn't seem like an appropriate way to check for
> plausible data corruption cases.)

Robert, and to a lesser degree you, gave me quite a bit of grief over
converting nearby asserts to elogs.  I agree it'd be better if it weren't
an assert, but ...

Greetings,

Andres Freund
Re: run pgindent on a regular basis / scripted manner
Hi,

On 2020-08-15 13:47:41 +0200, Peter Eisentraut wrote:
> On 2020-08-13 00:34, Andres Freund wrote:
> > I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
> > most of the hunks were entirely unrelated.  Despite the development
> > window for 14 having only relatively recently opened.  Based on my
> > experience it tends to get worse over time.
>
> Do we have a sense of why poorly-indented code gets committed?  I think
> some of the indentation rules are hard to follow manually.  (pgperltidy
> is worse.)
>
> Also, since pgindent gets run eventually anyway, it's not really that
> important to get the indentation right the first time.  I suspect the
> goal of most authors and committers is to write readable code rather
> than to divine the exact pgindent output.

One thing is that some here are actively against manually adding entries
to typedefs.list.  Which then means that code gets oddly indented if you
use pgindent.  I personally try to make the predictable updates to
typedefs.list, which then at least allows halfway sensibly indenting my
own changes.

> I think as a start, we could just issue a guideline that all committed
> code should follow pgindent.  That has never really been a guideline, so
> it's not surprising that it's not followed.

Without a properly indented baseline that's hard to do, because it'll
cause damage all over.  So I don't think we can easily start just there -
we'd first need to re-indent everything.

Greetings,

Andres Freund
Re: Is it worth accepting multiple CRLs?
Greetings,

* Kyotaro Horiguchi (horikyota@gmail.com) wrote:
> At Mon, 03 Aug 2020 16:20:40 +0900 (JST), Kyotaro Horiguchi wrote in
> > Thanks for the opinion.  I'll continue working on this.
>
> This is it, but..

Thanks!

> Looking closer I realized that certificates are verified in each
> backend, so the CRL cache doesn't work at all for the hashed directory
> method.  Therefore, all CRL files relevant to a certificate to be
> verified are loaded every time a backend starts.
>
> The only advantage of this is avoiding irrelevant CRLs being loaded, in
> exchange for loading the relevant CRLs at every session start.  Session
> startup gets slower when there are many delta CRLs from the same CA.
>
> Seems far from promising.

I agree that it's not ideal, but I don't know that this is a reason to not
move forward with this feature..?  We could certainly have a later patch
which improves this in some way (though exactly how isn't clear... if we
move the CRL loading into the postmaster then we'd have to load *all* of
them, and then we'd still need to check if they've changed since we loaded
them, and presumably have some way to signal the postmaster to update its
set from time to time..), but that can be a future effort.

I took a quick look through the patch and it seemed pretty straightforward
to me and a good improvement.  Would love to hear other thoughts.  I hope
you'll submit this for the September CF and ping me when you do and I'll
see if I can get it committed.

Thanks!

Stephen
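For context on the hashed-directory mechanism being discussed, here is a
generic OpenSSL sketch (not the patch itself; the directory path is a
placeholder) of pointing a certificate store at a directory of hash-named
CRL files and enabling CRL checks for the whole chain:

    #include <openssl/x509.h>
    #include <openssl/x509_vfy.h>

    /* Returns 1 on success, 0 on failure, mirroring the OpenSSL convention. */
    int
    configure_crl_directory(X509_STORE *store)
    {
        /* no single CRL file; load CRLs on demand from the hashed directory */
        if (X509_STORE_load_locations(store, NULL, "/path/to/crl-dir") != 1)
            return 0;

        /* check CRLs for the leaf certificate and the rest of the chain */
        X509_STORE_set_flags(store,
                             X509_V_FLAG_CRL_CHECK | X509_V_FLAG_CRL_CHECK_ALL);
        return 1;
    }

Files in such a directory are looked up by subject-name hash (the naming
produced by openssl rehash / c_rehash), which is the "hashed directory
method" referred to above, and they are loaded lazily as certificates are
verified.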
Re: use pg_get_functiondef() in pg_dump
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I wrote:
> > I wouldn't say that it's *fundamentally* new, but nonetheless it disturbs
> > me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> > very different ways depending on the server version.  I'd rather see us
> > continuing to build the bulk of the command the same as before, and
> > introduce new behavior only for deparsing the function body.
>
> BTW, a concrete argument for doing it that way is that if you make a
> backend function that does the whole CREATE-FUNCTION-building job in
> exactly the way pg_dump wants it, that function is nigh useless for any
> other client with slightly different requirements.  A trivial example
> here is that I don't think we want to become locked into the proposition
> that psql's \ef and \sf must print functions exactly the same way that
> pg_dump would.

The fact that the need that psql has and that which pg_dump has are at
least somewhat similar really argues that we should have put this code
into libpgcommon in the first place and not in the backend, and then had
both psql and pg_dump use that.  I'm sure there's a lot of folks who'd
like to see more of the logic we have in pg_dump for building objects from
the catalog available to more tools through libpgcommon - psql being one
of the absolute first use-cases for exactly that (there's certainly no
shortage of people who've asked how they can get a CREATE TABLE statement
for a table by using psql...).

Thanks,

Stephen
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes:
> One thing is that some here are actively against manually adding entries
> to typedefs.list.

I've been of the opinion that it's pointless to do so under the current
regime where (a) only a few people do that and (b) we only officially
re-indent once a year anyway.  When I want to manually run pgindent, I
always pull down a fresh typedefs.list from the buildfarm, which is
reasonably up-to-date regardless of what people added or didn't add, and
then add any new typedefs from my current patch to that out-of-tree copy.

Now, if we switch to a regime where we're trying to keep the tree in more
nearly correctly-indented shape all the time, it would make sense to
revisit that.  I'm not saying that it's unreasonable to want to have the
in-tree typedefs.list track reality more closely --- only that doing so in
a half-baked way won't be very helpful.

>> I think as a start, we could just issue a guideline that all committed
>> code should follow pgindent.  That has never really been a guideline, so
>> it's not surprising that it's not followed.

> Without a properly indented baseline that's hard to do, because it'll
> cause damage all over.  So I don't think we can easily start just there -
> we'd first need to re-indent everything.

Well, we can certainly do a tree-wide re-indent anytime we're ready.  I
doubt it would be very painful right now, with so little new work since
the last run.

			regards, tom lane