Re: [Patch] Make pg_checksums skip foreign tablespace directories
At Fri, 31 Jan 2020 13:53:52 +0900, Michael Paquier wrote in > On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote: > > The other question is whether it is possible to end up with a > > pg_internal.init.$PID file in a running cluster. E.g. if an instance > > crashes and gets started up again - are those cleaned up during crash > > recovery, or should pg_checksums ignore them? Right now pg_checksums > > only checks against a list of filenames and only skips on exact matches > > not prefixes so that might take a bit of work. > > Indeed, with a bad timing and a crash in the middle of > write_relcache_init_file(), it could be possible to have such a file > left around in the data folder. Having a past tablespace version left > around after an upgrade is a pilot error in my opinion because > pg_upgrade generates a script to cleanup past tablespaces, no? So > your patch does not look like a good idea to me. And now that I look > at it, if we happen to leave behind a temporary file for > pg_internal.init, backups fail with the following error: > 2020-01-31 13:26:18.345 JST [102076] 010_pg_basebackup.pl ERROR: > invalid segment number 0 in file "pg_internal.init.123" Agreed. > So, I think that it would be better to change basebackup.c, > pg_checksums and pg_rewind so as files are excluded if there is a > prefix match with the exclude lists. Please see the attached. Agreed that the tools should ignore such files. Looking at excludeFiles, it seems to me that basebackup makes sure to exclude only files that could cause harm. I'm not sure whether that is intentional, but tablespace_map.old and backup_label.old are not excluded. The patch also excludes harmless files such as "backup_label.20200131" and the two files above. I don't think that is a problem right away, of course. It looks good to me except for the possible excessive exclusion. So, I don't object to it if we don't mind that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [Patch] Make pg_checksums skip foreign tablespace directories
At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi wrote in > I don't think that is a problem right away, of course. It looks good > to me except for the possible excessive exclusion. So, I don't object > to it if we don't mind that. That was put a bit wrongly: the discussion above applies only to excludeFiles. I think we should refrain from letting more files match noChecksumFiles. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: SimpleLruTruncate() mutual exclusion
On Thu, Jan 30, 2020 at 04:34:33PM +0100, Dmitry Dolgov wrote: > > On Sun, Nov 17, 2019 at 10:14:26PM -0800, Noah Misch wrote: > > > I'm probably missing something, so just wanted to clarify. Do I > > > understand correctly, that thread [1] and this one are independent, and > > > it is assumed in the scenario above that we're at the end of XID space, > > > but not necessarily having rounding issues? I'm a bit confused, since > > > the reproduce script does something about cutoffPage, and I'm not sure > > > if it's important or not. > > > > I think the repro recipe contained an early fix for the other thread's bug. > > While they're independent in principle, I likely never reproduced this bug > > without having a fix in place for the other thread's bug. The bug in the > > other thread was easier to hit. > > Just to clarify, since the CF item for this patch was withdrawn > recently. Does it mean that eventually the thread [1] covers this one > too? > > [1]: > https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com I withdrew $SUBJECT because, if someone reviews one of my patches, I want it to be the one you cite in [1]. I plan not to commit [1] without a Ready for Committer, and I plan not to commit $SUBJECT before committing [1]. I would be willing to commit $SUBJECT without getting a review, however.
Re: Hash join not finding which collation to use for string hashing
On Fri, Jan 31, 2020 at 6:15 AM Mark Dilger wrote: > > On Jan 30, 2020, at 12:29 PM, Tom Lane wrote: > > Mark Dilger writes: > >> Would it make sense to catch a qual with unassigned collation > >> somewhere in the planner, where the qual's operator family is > >> established, by checking if the operator family behavior is sensitive > >> to collations? > > > >> I’m not sure how to do that. pg_opfamily doesn’t seem to have a field for > >> that. Can you recommend how I would proceed there? > > > > There's no such information attached to opfamilies, which is more or less > > forced by the fact that individual operators don't expose it either. > > There's not much hope of making that better without incompatible changes > > in the requirements for extensions to define operators and/or operator > > families. > > Thanks, Tom, for confirming this. Just for the record, I will explain why I brought up doing anything with operator families at all. What I was really imagining is putting a hard-coded check somewhere in the middle of equivalence processing that treats a given qual's operator as sensitive to collation based *only* on whether it belongs to a text operator family, such as TEXT_BTREE_FAM_OID, and errors out if the qual expression's inputcollid is 0 (that is, the parser failed to resolve a collation conflict among its arguments). If we do that, maybe we won't need the check_collation_set() that's used in various text operators. Also, erroring out sooner might allow us to produce a more specific error message, which, as I understand it, would help with one of Mark's complaints that the error message is too ambiguous because it is emitted at such a low layer. I thought of the idea after running into a recent commit relevant to non-deterministic collations: commit 2810396312664bdb941e549df7dfa75218d73a1c Author: Tom Lane Date: Sat Sep 21 16:29:17 2019 -0400 Fix up handling of nondeterministic collations with pattern_ops opclasses. text_pattern_ops and its siblings can't be used with nondeterministic collations, because they use the text_eq operator which will not behave as bitwise equality if applied with a nondeterministic collation. The initial implementation of that restriction was to insert a run-time test in the related comparison functions, but that is inefficient, may throw misleading errors, and will throw errors in some cases that would work. It seems sufficient to just prevent the combination during CREATE INDEX, so do that instead. Lacking any better way to identify the opclasses involved, we need to hard-wire tests for them, which requires hand-assigned values for their OIDs, which forces a catversion bump because they previously had OIDs that would be assigned automatically. That's slightly annoying in the v12 branch, but fortunately we're not at rc1 yet, so just do it. Discussion: https://postgr.es/m/22566.1568675...@sss.pgh.pa.us IIUC, the above commit removes a check_collation_set() call from an operator class comparison function in favor of ensuring that an index using that operator class can only be defined with a deterministic collation in the first place. But as the above commit is purportedly only a stop-gap solution due to the lack of operator class infrastructure for considering collation in operator semantics, maybe we shouldn't spread such hacks to other places. Thanks, Amit
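PS: To illustrate the idea, here is a very rough sketch of the kind of check I have in mind. The placement inside equivalence processing, the restriction to TEXT_BTREE_FAM_OID, and the error wording are all hypothetical; this is only meant to show the shape of it, not a concrete proposal:

if (IsA(clause, OpExpr))
{
    OpExpr     *opexpr = (OpExpr *) clause;

    /*
     * Hypothetical test: treat membership in the text btree opfamily as
     * "collation-sensitive", and complain if the parser could not derive
     * an input collation for the expression.
     */
    if (op_in_opfamily(opexpr->opno, TEXT_BTREE_FAM_OID) &&
        !OidIsValid(opexpr->inputcollid))
        ereport(ERROR,
                (errcode(ERRCODE_INDETERMINATE_COLLATION),
                 errmsg("could not determine which collation to use for the comparison"),
                 errhint("Use the COLLATE clause to set the collation explicitly.")));
}

That would report the problem at planning time, with the offending operator at hand, instead of deep inside the sorting or hashing machinery.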
Re: Marking some contrib modules as trusted extensions
On Wed, 29 Jan 2020 at 21:39, Tom Lane wrote: > > >>> pg_stat_statements > > Mmm, I'm not convinced --- the ability to see what statements are being > executed in other sessions (even other databases) is something that > paranoid installations might not be so happy about. Our previous > discussions about what privilege level is needed to look at > pg_stat_statements info were all made against a background assumption > that you needed some extra privilege to set up the view in the first > place. I think that would need another look or two before being > comfortable that we're not shifting the goal posts too far. > > The bigger picture here is that I don't want to get push-back that > we've broken somebody's security posture by marking too many extensions > trusted. So for anything where there's any question about security > implications, we should err in the conservative direction of leaving > it untrusted. > +1 I wonder if the same could be said about pgrowlocks. Regards, Dean
Re: [Patch] Make pg_checksums skip foreign tablespace directories
Hi, Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier: > On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote: > Having a past tablespace version left > around after an upgrade is a pilot error in my opinion because > pg_upgrade generates a script to cleanup past tablespaces, no? So > your patch does not look like a good idea to me. Not sure I agree with it, but if that (i.e. after pg_upgrade in copy mode you have no business using the old cluster as well as the new one) is project policy, fair enough. However, Postgres does not disallow creating tablespaces in the same location from two different versions, so you don't need the pg_upgrade scenario to get into this (pg_checksums checking the wrong cluster's data) problem: postgres@kohn:~$ psql -p 5437 -c "CREATE TABLESPACE bar LOCATION '/var/lib/postgresql/bar'" CREATE TABLESPACE postgres@kohn:~$ psql -p 5444 -c "CREATE TABLESPACE bar LOCATION '/var/lib/postgresql/bar'" CREATE TABLESPACE postgres@kohn:~$ ls bar PG_11_201809051 PG_12_201909212 postgres@kohn:~$ touch bar/PG_11_201809051/pg_internal.init.123 postgres@kohn:~$ pg_ctlcluster 12 main stop sudo systemctl stop postgresql@12-main postgres@kohn:~$ LANG=C /usr/lib/postgresql/12/bin/pg_checksums -D /var/lib/postgresql/12/main pg_checksums: error: invalid segment number 0 in file name "/var/lib/postgresql/12/main/pg_tblspc/16396/PG_11_201809051/pg_internal.init.123" I believe this is in order to allow pg_upgrade to run in the first place. But is this pilot error as well? In any case, it is a situation we allow to happen, so IMO we should fix pg_checksums to skip the foreign tablespaces. As an aside, I would advocate just skipping files which fail the segment number determination step (and maybe logging a warning) rather than bailing out. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
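PS: To make the "skip instead of bail out" suggestion concrete, the per-file loop in pg_checksums' scan_directory() could do something along these lines. This is only a sketch; the variable names are quoted from memory, and today the equivalent branch reports an error and calls exit(1):

segmentpath = strchr(fn, '.');
if (segmentpath != NULL)
{
    *segmentpath++ = '\0';
    segmentno = atoi(segmentpath);
    if (segmentno == 0)
    {
        /* skip the file rather than aborting the whole run */
        pg_log_warning("skipping file with unexpected name \"%s\"", de->d_name);
        continue;
    }
}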
Re: pg_restore crash when there is a failure before all child process is created
I have applied and tested both patches separately and ran the regression tests with both. No new test cases fail with either patch. The issue is fixed by both patches; however, the fix from Tom looks more elegant. I haven't done a detailed code review. On Fri, Jan 31, 2020 at 12:39 AM Tom Lane wrote: > vignesh C writes: > > On Wed, Jan 29, 2020 at 6:54 PM Ahsan Hadi wrote: > >> Can you share a test case or steps that you are using to reproduce this > issue? Are you reproducing this using a debugger? > > > I could reproduce with the following steps: > > Make cluster setup. > > Create few tables. > > Take a dump in directory format using pg_dump. > > Restore the dump generated above using pg_restore with very high > > number for --jobs options around 600. > > I agree this is quite broken. Another way to observe the crash is > to make the fork() call randomly fail, as per booby-trap-fork.patch > below (not intended for commit, obviously). > > I don't especially like the proposed patch, though, as it introduces > a great deal of confusion into what ParallelState.numWorkers means. > I think it's better to leave that as being the allocated array size, > and instead clean up all the fuzzy thinking about whether workers > are actually running or not. Like 0001-fix-worker-status.patch below. > > regards, tom lane > > -- Highgo Software (Canada/China/Pakistan) URL : http://www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC EMAIL: mailto: ahsan.h...@highgo.ca
Re: unsupportable composite type partition keys
On Sun, Dec 22, 2019 at 10:51 PM Tom Lane wrote: > > I wrote: > > Now as far as point 1 goes, I think it's not really that awful to use > > CheckAttributeType() with a dummy attribute name. The attached > > incomplete patch uses "partition key" which causes it to emit errors > > like > > regression=# create table fool (a int, b int) partition by list ((row(a, > > b))); > > ERROR: column "partition key" has pseudo-type record > > I don't think that that's unacceptable. But if we wanted to improve it, > > we could imagine adding another flag, say CHKATYPE_IS_PARTITION_KEY, > > that doesn't affect CheckAttributeType's semantics, just the wording of > > the error messages it throws. > > Here's a fleshed-out patch that does it like that. > > While poking at this, I also started to wonder why CheckAttributeType > wasn't recursing into ranges, since those are our other kind of > container type. And the answer is that it must, because we allow > creation of ranges over composite types: > > regression=# create table foo (f1 int, f2 int); > CREATE TABLE > regression=# create type foorange as range (subtype = foo); > CREATE TYPE > regression=# alter table foo add column r foorange; > ALTER TABLE > > Simple things still work on table foo, but surely this is exactly > what CheckAttributeType is supposed to be preventing. With the > second attached patch you get > > regression=# alter table foo add column r foorange; > ERROR: composite type foo cannot be made a member of itself > > The second patch needs to go back all the way, the first one > only as far as we have partitions. While working on regression tests for index collation versioning [1], I noticed that the 2nd patch apparently broke the ability to create a table using a range over collatable datatype attribute, which we apparently don't test anywhere. Simple example to reproduce: CREATE TYPE myrange_text AS range (subtype = text); CREATE TABLE test_text( meh myrange_text ); ERROR: 42P16: no collation was derived for column "meh" with collatable type text HINT: Use the COLLATE clause to set the collation explicitly. AFAICT, this is only a thinko in CheckAttributeType(), where the range collation should be provided rather than the original tuple desc one, as per attached. I also added a create/drop table in an existing regression test that was already creating range over collatable type. [1] https://www.postgresql.org/message-id/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com fix_collatable_range-v1.diff Description: Binary data
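For readers without the attachment handy, the gist of the fix is that the range branch of CheckAttributeType() should recurse using the collation of the range type itself rather than the column's collation, which is not set in this case. Roughly (reconstructed from memory; the attached diff is the authoritative version, and the surrounding parameter names may differ):

else if (att_typtype == TYPTYPE_RANGE)
{
    /*
     * If the type is a range, recurse into its subtype, but derive the
     * collation from the range type rather than from attcollation.
     */
    CheckAttributeType(attname,
                       get_range_subtype(atttypid),
                       get_range_collation(atttypid),
                       containing_rowtypes,
                       flags);
}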
Re: [Patch] Make pg_checksums skip foreign tablespace directories
Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier: > Indeed, with a bad timing and a crash in the middle of > write_relcache_init_file(), it could be possible to have such a file > left around in the data folder. Having a past tablespace version > left > around after an upgrade is a pilot error in my opinion because > pg_upgrade generates a script to cleanup past tablespaces, no? I'm surprised; why should that be a problem in copy mode? For me this is a fair use case for testing upgrades, e.g. for development purposes, if someone wants to still run application tests against the current old version, for fallback and whatever else. And people might not want to treat such upgrades as a "fire-and-forget" task. We even have the --clone feature now, making this even faster. If our project policy is to never ever touch a pg_upgrade'd PostgreSQL instance again in copy mode, I wasn't aware of it. And to be honest, even PostgreSQL itself allows you to reuse tablespace locations for multiple instances as well, so the described problem is not limited to upgraded clusters. Bernd
Re: pg_restore crash when there is a failure before all child process is created
On Fri, Jan 31, 2020 at 1:09 AM Tom Lane wrote: > > vignesh C writes: > > On Wed, Jan 29, 2020 at 6:54 PM Ahsan Hadi wrote: > >> Can you share a test case or steps that you are using to reproduce this > >> issue? Are you reproducing this using a debugger? > > > I could reproduce with the following steps: > > Make cluster setup. > > Create few tables. > > Take a dump in directory format using pg_dump. > > Restore the dump generated above using pg_restore with very high > > number for --jobs options around 600. > > I agree this is quite broken. Another way to observe the crash is > to make the fork() call randomly fail, as per booby-trap-fork.patch > below (not intended for commit, obviously). > > I don't especially like the proposed patch, though, as it introduces > a great deal of confusion into what ParallelState.numWorkers means. > I think it's better to leave that as being the allocated array size, > and instead clean up all the fuzzy thinking about whether workers > are actually running or not. Like 0001-fix-worker-status.patch below. > The patch looks fine to me. The test is also getting fixed by the patch. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Physical replication slot advance is not persistent
On 30.01.2020 05:19, Michael Paquier wrote: On Wed, Jan 29, 2020 at 05:10:20PM +0900, Kyotaro Horiguchi wrote: Looks perfect. Thanks Horiguchi-san and others. Applied and back-patched down to 11. Great! Thanks for getting this done. -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Re: Duplicated LSN in ReorderBuffer
On Fri, Jan 31, 2020 at 12:35 AM Alvaro Herrera wrote: > > On 2019-Jul-26, Andres Freund wrote: > > > Petr, Simon, see the potential issue related to fast forward at the > > bottom. > > I think we neglected this bit. I looked at the patch Simon submitted > downthread, and while I vaguely understand that we need to process > NEW_CID records during fast-forwarding, > Right, IIUC, that is mainly to mark that the txn has catalog changes (ReorderBufferXidSetCatalogChanges) so that later such a transaction can be used to build historic snapshots (during SnapBuildCommitTxn). Now, if that is true, then won't we need a similar change in DecodeHeapOp for the XLOG_HEAP_INPLACE case as well? Also, I am not sure if SnapBuildProcessNewCid needs to create the Cid map in such a case. > I don't quite understand why we > still can skip XLOG_INVALIDATION messages. I *think* we should process > those too. > I also think so. If you are allowing invalidations to be executed irrespective of fast_forward in DecodeStandbyOp, then why not do the same in DecodeCommit where we add ReorderBufferAddInvalidations? > Here's a patch that also contains that change; I also > reworded Simon's proposed comment. I appreciate reviews. > Even though, in theory, the changes look to be in the right direction, it would be better if we could create a test case to reproduce the problem. I am not sure, but I think we need to generate a few DDLs in the transaction across which we want to fast forward; after moving forward, DMLs depending on that WAL should then run into problems, because we skipped executing the invalidations and skipped adding such a transaction to the historic snapshot. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: allow online change primary_conninfo
Hello Small rebase due to merge conflict of the tests. No functional changes since v7. PS: also it is end of current CF, I will mark patch entry as moved to the next CF.diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 2e2af9e96e..7887391bbb 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4005,9 +4005,15 @@ ANY num_sync ( @@ -4022,9 +4028,13 @@ ANY num_sync ( ). - This parameter can only be set at server start. + This parameter can only be set in the postgresql.conf + file or on the server command line. This setting has no effect if primary_conninfo is not - set. + set or the server is not in standby mode. + + + The WAL receiver is restarted after an update of primary_slot_name. @@ -4142,7 +4152,11 @@ ANY num_sync ( ). The default is on. The only reason to turn this off would be if the remote instance is currently out of available replication slots. This -parameter can only be set at server start. +parameter can only be set in the postgresql.conf +file or on the server command line. + + +The WAL receiver is restarted after an update of wal_receiver_create_temp_slot. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 408b9b489a..c24b93332e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -803,6 +803,12 @@ static XLogSource readSource = 0; /* XLOG_FROM_* code */ static XLogSource currentSource = 0; /* XLOG_FROM_* code */ static bool lastSourceFailed = false; +/* + * Need for restart running WalReceiver due the configuration change. + * Suitable only for XLOG_FROM_STREAM source + */ +static bool pendingWalRcvRestart = false; + typedef struct XLogPageReadPrivate { int emode; @@ -11831,6 +11837,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, for (;;) { int oldSource = currentSource; + bool startWalReceiver = false; /* * First check if we failed to read from the current source, and @@ -11864,54 +11871,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, if (!StandbyMode) return false; - /* - * If primary_conninfo is set, launch walreceiver to try - * to stream the missing WAL. - * - * If fetching_ckpt is true, RecPtr points to the initial - * checkpoint location. In that case, we use RedoStartLSN - * as the streaming start position instead of RecPtr, so - * that when we later jump backwards to start redo at - * RedoStartLSN, we will have the logs streamed already. - */ - if (PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0) - { - XLogRecPtr ptr; - TimeLineID tli; - - if (fetching_ckpt) - { - ptr = RedoStartLSN; - tli = ControlFile->checkPointCopy.ThisTimeLineID; - } - else - { - ptr = RecPtr; - - /* - * Use the record begin position to determine the - * TLI, rather than the position we're reading. - */ - tli = tliOfPointInHistory(tliRecPtr, expectedTLEs); - - if (curFileTLI > 0 && tli < curFileTLI) -elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u", - (uint32) (tliRecPtr >> 32), - (uint32) tliRecPtr, - tli, curFileTLI); - } - curFileTLI = tli; - RequestXLogStreaming(tli, ptr, PrimaryConnInfo, - PrimarySlotName); - receivedUpto = 0; - } - /* * Move to XLOG_FROM_STREAM state in either case. We'll * get immediate failure if we didn't launch walreceiver, * and move on to the next state. 
*/ currentSource = XLOG_FROM_STREAM; + startWalReceiver = true; break; case XLOG_FROM_STREAM: @@ -12057,7 +12023,69 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, Assert(StandbyMode); /* - * Check if WAL receiver is still active. + * shutdown WAL receiver if restart is requested. + */ + if (!startWalReceiver && pendingWalRcvRestart) + { + if (WalRcvRunning()) + ShutdownWalRcv(); + + /* + * Re-scan for possible new timelines if we were + * requested to recover to the latest timeline. + */ + if (recoveryTargetTimeLineGoal == + RECOVERY_TARGET_TIMELINE_LATEST) + rescanLatestTimeLine(); + + startWalReceiver = true; + } + pendingWalRcvRestart = false; + + /* + * Launch walreceiver if needed. + * + * If fetching_ckpt is true, RecPtr points to the initial + * checkpoint location. In that case, we use RedoStartLSN + * as the streaming start position instead of RecPtr, so + * that when we later jump backwards to start redo a
Re: Data race in interfaces/libpq/fe-exec.c
According to folks significantly cleverer than me, this can be a problem: See section 2.4 in https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf tl;dr in a self-fulfilling prophecy kind of way, there are no benign data-races. So the compiler can assume no-one would write a data race. Therefore it can make aggressive optimisations that render what would otherwise have been a benign race actively dangerous. Granted the danger here is mainly theoretical, and the main problem for me is that turning off ThreadSanitizer because of this issue means that other more dangerous issues in my code (rather than the postgres client code) won't be found. But the above is the reason why ThreadSanitizer folks don't want to put in any "you can ignore this race, it's benign" functionality, and told me that the right thing to do was to contact you folks and get a fix in upstream... Mark On Thu, Jan 30, 2020 at 4:46 PM Tom Lane wrote: > Mark Charsley writes: > > This line > > > https://github.com/postgres/postgres/blob/30012a04a6c8127397a8ab71e160d9c7e7fbe874/src/interfaces/libpq/fe-exec.c#L1073 > > triggers data race errors when run under ThreadSanitizer (*) > > > As far as I can tell, the static variable in question is a hack to allow > a > > couple of deprecated functions that are already unsafe to use > > (PQescapeString and PQescapeBytea) to be fractionally less unsafe to use. > > Yup. > > > Would there be any interest in a patch changing the type of > > static_client_coding > > and static_std_strings > > < > https://github.com/postgres/postgres/blob/30012a04a6c8127397a8ab71e160d9c7e7fbe874/src/interfaces/libpq/fe-exec.c#L49 > > > > to > > some atomic equivalent, so the data race goes away? > > I don't see that making those be some other datatype would improve anything > usefully. (1) On just about every platform known to man, int and bool are > going to be atomic anyway. (2) The *actual* hazards here, as opposed to > theoretical ones, are that you're using more than one connection with > different settings for these values, whereupon it's not clear whether > those deprecated functions will see the appropriate settings when they're > used. A different data type won't help that. > > In short: this warning you're getting from ThreadSanitizer is entirely > off-point, so contorting the code to suppress it seems useless. > > regards, tom lane >
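To paraphrase the kind of example the paper gives with a toy snippet (illustrative only, not PostgreSQL code; all names here are made up): once the compiler assumes no other thread writes a plain variable, it may legally re-read it instead of reusing a value it already loaded, and only a concurrent writer makes that transformation visible.

extern void use(int);

static int  racy_flag;          /* written by another thread without locking */

void
demo(const int *table, int n)
{
    int         i = racy_flag;  /* a single racy read, as written */

    if (i >= 0 && i < n)
        use(table[i]);          /* the compiler may re-read racy_flag here
                                 * (e.g. under register pressure) instead of
                                 * reusing i, so the bounds check and the
                                 * index can disagree */
}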
PATCH: Fix wrong size argument to pg_strncasecmp
Hello, Please find a one-liner patch in the attachment. This patch fixes the size parameter of a `pg_strncasecmp` call that compares a variable against the "string" literal: it passed a size of 5 although the "string" literal is 6 bytes long. This issue can be observed with the following query (where 'X' is any character other than 'g' and the null byte): select json_to_tsvector('"abc"'::json, '"strinX"') Before this patch, this query returns the `'abc':1` result instead of failing with the following error: wrong flag in flag array: "strinX" By the way, the `strncasecmp` usages around the fixed line could use `strcasecmp`, which doesn't take the `size_t n` argument. --- Regards, Disconnect3d 0001-Fix-wrong-size-argument-to-pg_strncasecmp.patch Description: Binary data
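For clarity, this is the essence of the change, with the surrounding identifiers simplified for illustration (flagval, flaglen and FLAG_STRING are stand-in names; the attached patch has the exact hunk):

/* before: only 5 of the literal's 6 bytes take part in the comparison,
 * so together with the adjacent length check "strinX" is also accepted */
else if (flaglen == 6 && pg_strncasecmp(flagval, "string", 5) == 0)
    flags |= FLAG_STRING;

/* after: compare the full 6-byte literal */
else if (flaglen == 6 && pg_strncasecmp(flagval, "string", 6) == 0)
    flags |= FLAG_STRING;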
Re: standby apply lag on inactive servers
On 2020/01/31 5:45, Alvaro Herrera wrote: On 2020-Jan-30, Kyotaro Horiguchi wrote: Agreed about backbranches. I'd like to preserve the word "transaction" as it is more familiar to users. How about something like the follows? "transactions are completed up to log time %s" That's a good point. I used the phrase "transaction activity", which seems sufficiently explicit to me. So, the attached is the one for master; in back branches I would use the same (plus minor conflict fixes), except that I would drop the message wording changes. You're thinking to apply this change to the back branches? Sorry if my understanding is not right. But I don't think that back-patch is ok because it changes the documented existing behavior of pg_last_xact_replay_timestamp(). So it looks like the behavior change not a bug fix. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: table partitioning and access privileges
On 2020/01/31 13:38, Amit Langote wrote: On Fri, Jan 31, 2020 at 1:28 AM Fujii Masao wrote: On 2020/01/31 1:02, Tom Lane wrote: Fujii Masao writes: Thanks for updating the patch! Barring any objection, I will commit this fix and backport it to all supported versions. Sorry for not having paid closer attention to this thread, but ... is back-patching this behavioral change really a good idea? It's not that hard to imagine that somebody is expecting the old behavior and will complain that we broke their application's security. So I'd have thought it better to fix only in HEAD, with a compatibility warning in the v13 release notes. I'm afraid it's much more likely that people will complain about making such a change in a minor release than that they will be happy about it. It's particularly risky to be making it in what will be the last 9.4.x release, because we will not have any opportunity to undo it in that branch if there is pushback. Fair enough. I finally did back-patch because the behavior is clearly documented and I heard no opinions objecting to the back-patch. But I should have asked about and discussed such risks more. I'm OK with reverting all those back-patches. Instead, probably the documentation should be updated in the old branches. I could find only this paragraph in the section on inheritance that talks about how access permissions work: 9.4: "Note how table access permissions are handled. Querying a parent table can automatically access data in child tables without further access privilege checking. This preserves the appearance that the data is (also) in the parent table. Accessing the child tables directly is, however, not automatically allowed and would require further privileges to be granted." 9.5-12: "Inherited queries perform access permission checks on the parent table only. Thus, for example, granting UPDATE permission on the cities table implies permission to update rows in the capitals table as well, when they are accessed through cities. This preserves the appearance that the data is (also) in the parent table. But the capitals table could not be updated directly without an additional grant. In a similar way, the parent table's row security policies (see Section 5.7) are applied to rows coming from child tables during an inherited query. A child table's policies, if any, are applied only when it is the table explicitly named in the query; and in that case, any policies attached to its parent(s) are ignored." Do you mean that the TRUNCATE exception should be noted here? Yes, that's what I was thinking. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
[PATCH] Erase the distinctClause if the result is unique by definition
Hi: I wrote a patch to erase the distinctClause if the result is unique by definition. I noticed this because a user who ported an application from Oracle to PG found the performance was bad due to this, so I adapted PG to handle it as well. This patch doesn't help a well-written query, but such a drawback in a query may not be very obvious, and since the cost of the check is pretty low, I think it would be OK to add. Please see the patch for details. Thank you. 0001-Erase-the-distinctClause-if-the-result-is-unique-by-.patch Description: Binary data
Re: pg_restore crash when there is a failure before all child process is created
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: tested, passed Documentation: not tested I have applied and tested both patches separately and ran the regression tests with both. No new test cases fail with either patch. The issue is fixed by both patches; however, the fix from Tom (0001-fix-worker-status.patch) looks more elegant. I haven't done a detailed code review.
Re: Allow to_date() and to_timestamp() to accept localized names
On Tue, Jan 28, 2020 at 9:35 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-01-28 16:47, Juan José Santamaría Flecha wrote: > > This patch targets to do something symmetrical to to_char(), which will > > just return a single value. > > I didn't fully realize while reading this thread that to_char() already > supports localized output and this patch indeed just wants to do the > opposite. > > So I'm withdrawing my concerns with respect to this patch. As long as > it can do a roundtrip conversion with to_char(), it's fine. > > We can avoid issues with non injective case conversion languages with a double conversion, so both strings in the comparison end up in the same state. I propose an upper/lower conversion as in the attached patch. Regards, Juan José Santamaría Flecha diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ceda48e..b1951e5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5968,7 +5968,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); TM prefix -translation mode (print localized day and month names based on +translation mode (use localized day and month names based on ) TMMonth @@ -6000,8 +6000,13 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); TM does not include trailing blanks. + + + + + to_timestamp and to_date ignore - the TM modifier. + the case when receiving names as an input. diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index f58331d..e5b4eb5 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -1059,9 +1059,11 @@ static int from_char_parse_int_len(int *dest, const char **src, const int len, FormatNode *node, bool *have_error); static int from_char_parse_int(int *dest, const char **src, FormatNode *node, bool *have_error); -static int seq_search(const char *name, const char *const *array, int *len); +static int seq_search_ascii(const char *name, const char *const *array, int *len); +static int seq_search_localized(const char *name, char **array, int *len); static int from_char_seq_search(int *dest, const char **src, const char *const *array, + char **localized_array, FormatNode *node, bool *have_error); static void do_to_timestamp(text *date_txt, text *fmt, bool std, struct pg_tm *tm, fsec_t *fsec, int *fprec, @@ -2459,7 +2461,7 @@ from_char_parse_int(int *dest, const char **src, FormatNode *node, bool *have_er * suitable for comparisons to ASCII strings. */ static int -seq_search(const char *name, const char *const *array, int *len) +seq_search_ascii(const char *name, const char *const *array, int *len) { unsigned char firstc; const char *const *a; @@ -2505,8 +2507,74 @@ seq_search(const char *name, const char *const *array, int *len) } /* - * Perform a sequential search in 'array' for an entry matching the first - * character(s) of the 'src' string case-insensitively. + * Sequentially search an array of possibly non-English words for + * a case-insensitive match to the initial character(s) of "name". + * + * This has the same API as seq_search_ascii(), but we use a more general + * downcasing transformation to achieve case-insensitivity. + * + * The array is treated as const, but we don't declare it that way because + * the arrays exported by pg_locale.c aren't const. 
+ */ +static int +seq_search_localized(const char *name, char **array, int *len) +{ + char **a; + char *lower_name; + char *upper_name; + + *len = 0; + + /* empty string can't match anything */ + if (!*name) + return -1; + + /* + * We do an upper/lower conversion to avoid problems with languages + * in which case conversions are not injective. + */ + upper_name = str_toupper(unconstify(char *, name), strlen(name), + DEFAULT_COLLATION_OID); + lower_name = str_tolower(upper_name, strlen(upper_name), + DEFAULT_COLLATION_OID); + pfree(upper_name); + + for (a = array; *a != NULL; a++) + { + char *lower_element; + char *upper_element; + int element_len; + + /* Upper/lower-case array element, assuming it is normalized */ + upper_element = str_toupper(*a, strlen(*a), DEFAULT_COLLATION_OID); + lower_element = str_tolower(upper_element, strlen(upper_element), + DEFAULT_COLLATION_OID); + pfree(upper_element); + element_len = strlen(lower_element); + + /* Match? */ + if (strncmp(lower_name, lower_element, element_len) == 0) + { + *len = element_len; + pfree(lower_element); + pfree(lower_name); + return a - array; + } + pfree(lower_element); + } + + pfree(lower_name); + return -1; +} + +/* + * Perform a sequential search in 'array' (or 'localized_array', if that's + * not NULL) for an entry matching the first character(s) of the 'src' + * string case-insensitively. + * +
Re: standby apply lag on inactive servers
On 2020-Jan-31, Fujii Masao wrote: > You're thinking to apply this change to the back branches? Sorry > if my understanding is not right. But I don't think that back-patch > is ok because it changes the documented existing behavior > of pg_last_xact_replay_timestamp(). So it looks like the behavior > change not a bug fix. Yeah, I am thinking in backpatching it. The documented behavior is already not what the code does. Do you have a situation where this change would break something? If so, can you please explain what it is? I think (and I said it upthread) a 100% complete fix involves tracking two timestamps rather than one. I was thinking that that would be too invasive because it changes XLogCtlData shmem struct ... but that struct is private to xlog.c, so I think it's fine to change the struct. The problem though is that the user-visible change that I want to achieve is pg_last_xact_replay_timestamp(), and it would be obviously wrong to use the new XLogCtlData field rather than the existing one, as that would be a behavior change in the same sense that you're now complaining about. So I would achieve nothing. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Missing break in RelationFindReplTupleSeq
We eventually found out that logical replication in the current version of Postgres works significantly more slowly on a table with replica identity full than the old pglogical implementation did. The comment on RelationFindReplTupleSeq says: Note that this stops on the first matching tuple. But actually this function continues the traversal until the end of the table even if a tuple was found. I wonder if a break; should be added to the end of the for loop. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Missing break in RelationFindReplTupleSeq
On 2020-Jan-31, Konstantin Knizhnik wrote: > Eventually we find out that logical replication in the current version of > Postgres works significantly slower on table with replica identity full than > old pglogical implementation. > > The comment to RelationFindReplTupleSeq says: > > Note that this stops on the first matching tuple. > > But actually this function continue traversal until end of the table even if > tuple was found. > I wonder if break; should be added to the end of for loop. Wow, you're right, and the "break" is missing there. I propose it like this. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index 582b0cb017..30cba89da7 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -327,6 +327,9 @@ retry: XactLockTableWait(xwait, NULL, NULL, XLTW_None); goto retry; } + + /* Found our tuple and it's not locked */ + break; } /* Found tuple, try to lock it in the lockmode. */
Re: Do we need to handle orphaned prepared transactions in the server?
On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer wrote: > On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar wrote: > > > > So having seen the feedback on this thread, and I tend to agree with > most of what has been said here, I also agree that the server core isn't > really the ideal place to handle the orphan prepared transactions. > > > > Ideally, these must be handled by a transaction manager, however, I do > believe that we cannot let database suffer for failing of an external > software, and we did a similar change through introduction of idle in > transaction timeout behavior. > > The difference, IMO, is that idle-in-transaction aborts don't affect > anything we've promised to be durable. > > Once you PREPARE TRANSACTION the DB has made a promise that that txn > is durable. We don't have any consistent feedback channel to back to > applications and say "Hey, if you're not going to finish this up we > need to get rid of it soon, ok?". If a distributed transaction manager > gets consensus for commit and goes to COMMIT PREPARED a previously > prepared txn only to find that it has vanished, that's a major > problem, and one that may bring the entire DTM to a halt until the > admin can intervene. > > This isn't like idle-in-transaction aborts. It's closer to something > like uncommitting a previously committed transaction. > > I do think it'd make sense to ensure that the documentation clearly > highlights the impact of abandoned prepared xacts on server resource > retention and performance, preferably with pointers to appropriate > views. I haven't reviewed the docs to see how clear that is already. > Having seen the documentation, IMHO the document does contain enough information for users to understand what issues can be caused by these orphaned prepared transactions. > > I can also see an argument for a periodic log message (maybe from > vacuum?) warning when old prepared xacts hold xmin down. Including one > sent to the client application when an explicit VACUUM is executed. > (In fact, it'd make sense to generalise that for all xmin-retention). > I think that opens up the debate on what we really mean by "old" and whether that requires a syntax change when creating a prepared transactions as Thomas Kellerer suggested earlier? I agree that vacuum should periodically throw warnings for any prepared xacts that are holding xmin down. Generalising it for all xmin-retention is a fair idea IMHO, though that does increase the overall scope here. A vacuum process should (ideally) periodically throw out warnings for anything that is preventing it (including orphaned prepared transactions) from doing its routine work so that somebody can take necessary actions. > But I'm really not a fan of aborting such txns. If you operate with > some kind of broken global transaction manager that can forget or > abandon prepared xacts, then fix it, or adopt site-local periodic > cleanup tasks that understand your site's needs. > > -- > Craig Ringer http://www.2ndQuadrant.com/ > 2ndQuadrant - PostgreSQL Solutions for the Enterprise > -- Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC CELL:+923335449950 EMAIL: mailto:hamid.akh...@highgo.ca SKYPE: engineeredvirus
Re: standby apply lag on inactive servers
On 2020/01/31 22:40, Alvaro Herrera wrote: On 2020-Jan-31, Fujii Masao wrote: You're thinking to apply this change to the back branches? Sorry if my understanding is not right. But I don't think that back-patch is ok because it changes the documented existing behavior of pg_last_xact_replay_timestamp(). So it looks like the behavior change not a bug fix. Yeah, I am thinking in backpatching it. The documented behavior is already not what the code does. Maybe you thought this because getRecordTimestamp() extracts the timestamp from even WAL record of a restore point? That is, you're concerned about that pg_last_xact_replay_timestamp() returns the timestamp of not only commit/abort record but also restore point one. Right? As far as I read the code, this problem doesn't occur because SetLatestXTime() is called only for commit/abort records, in recoveryStopsAfter(). No? Do you have a situation where this change would break something? If so, can you please explain what it is? For example, use the return value of pg_last_xact_replay_timestamp() (and also the timestamp in the log message output at the end of recovery) as a HINT when setting recovery_target_time later. Use it to compare with the timestamp retrieved from the master server, in order to monitor the replication delay. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: standby apply lag on inactive servers
On 2020-Jan-31, Fujii Masao wrote: > On 2020/01/31 22:40, Alvaro Herrera wrote: > > On 2020-Jan-31, Fujii Masao wrote: > > > > > You're thinking to apply this change to the back branches? Sorry > > > if my understanding is not right. But I don't think that back-patch > > > is ok because it changes the documented existing behavior > > > of pg_last_xact_replay_timestamp(). So it looks like the behavior > > > change not a bug fix. > > > > Yeah, I am thinking in backpatching it. The documented behavior is > > already not what the code does. > > Maybe you thought this because getRecordTimestamp() extracts the > timestamp from even WAL record of a restore point? That is, you're > concerned about that pg_last_xact_replay_timestamp() returns the > timestamp of not only commit/abort record but also restore point one. > Right? right. > As far as I read the code, this problem doesn't occur because > SetLatestXTime() is called only for commit/abort records, in > recoveryStopsAfter(). No? ... uh, wow, you're right about that too. IMO this is extremely fragile, easy to break, and under-documented. But you're right, there's no bug there at present. > > Do you have a situation where this > > change would break something? If so, can you please explain what it is? > > For example, use the return value of pg_last_xact_replay_timestamp() > (and also the timestamp in the log message output at the end of > recovery) as a HINT when setting recovery_target_time later. Hmm. I'm not sure how you would use it in that way. I mean, I understand how it *can* be used that way, but it seems too fragile to be done in practice, in a scenario that's not just laboratory games. > Use it to compare with the timestamp retrieved from the master server, > in order to monitor the replication delay. That's precisely the use case that I'm aiming at. The timestamp currently is not useful because this usage breaks when the primary is inactive (no COMMIT records occur). During such periods of inactivity, CHECKPOINT records would keep the "last xtime" current. This has actually happened in a production setting, it's not a thought experiment. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Marking some contrib modules as trusted extensions
Dean Rasheed writes: > On Wed, 29 Jan 2020 at 21:39, Tom Lane wrote: >> The bigger picture here is that I don't want to get push-back that >> we've broken somebody's security posture by marking too many extensions >> trusted. So for anything where there's any question about security >> implications, we should err in the conservative direction of leaving >> it untrusted. > I wonder if the same could be said about pgrowlocks. Good point. I had figured it was probably OK given that it's analogous to the pg_locks view (which is unrestricted AFAIR), and that it already has some restrictions on what you can see. I'd have no hesitation about dropping it off this list though, since it's probably not used that much and it could also be seen as exposing internals. regards, tom lane
Re: Use compiler intrinsics for bit ops in hash
On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > 1. Is ceil_log2_64 dead code? > > > > Let's call it nascent code. I suspect there are places it could go, if > > I look for them. Also, it seemed silly to have one without the other. > > > > While not absolutely required, I'd like us to find at least one > place and start using it. (Clang also nags at me when we have > unused functions). Done in the expanded patches attached. > > On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote: > > > 4. It seems like you *really* would like an operation like LZCNT in x86 > > > (first appearing in Haswell) that is well defined on zero input. ISTM > > > the alternatives are: > > > > > >a) Special case 1. That seems straightforward, but the branching cost > > >on a seemingly unlikely condition seems to be a lot of performance > > >loss > > > > > >b) Use architecture specific intrinsic (and possibly with CPUID > > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ > > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to > > >instructions that are well defined on zero in most ISA's other than > > >x86, so maybe we can get away with special-casing x86? > > i. We can detect LZCNT instruction by checking one of the > "extended feature" (EAX=8001) bits using CPUID. Unlike the > "basic features" (EAX=1), extended feature flags have been more > vendor-specific, but fortunately it seems that the feature bit > for LZCNT is the same [1][2]. > > ii. We'll most likely still need to provide a fallback > implementation for processors that don't have LZCNT (either > because they are from a different vendor, or an older Intel/AMD > processor). I wonder if simply checking for 1 is "good enough". > Maybe a micro benchmark is in order? I'm not sure how I'd run one on the architectures we support. What I've done here is generalize our implementation to be basically like LZCNT and TZCNT at the cost of a brief branch that might go away at runtime. > 2. (My least favorite) use inline asm (a la our popcount > implementation). Yeah, I'd like to fix that, but I kept the scope of this one relatively narrow. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate >From 5fcaa74146206e4de05ca8cbd863aca20bba94bf Mon Sep 17 00:00:00 2001 From: David Fetter Date: Wed, 29 Jan 2020 02:09:59 -0800 Subject: [PATCH v4 1/2] de-long-ify To: hackers MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="2.24.1" This is a multi-part message in MIME format. 
--2.24.1 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c index 4b562d8d3f..482a569814 100644 --- a/src/backend/access/gist/gistbuildbuffers.c +++ b/src/backend/access/gist/gistbuildbuffers.c @@ -34,11 +34,11 @@ static void gistPlaceItupToPage(GISTNodeBufferPage *pageBuffer, IndexTuple item); static void gistGetItupFromPage(GISTNodeBufferPage *pageBuffer, IndexTuple *item); -static long gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); -static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum); +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb); +static void gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum); -static void ReadTempFileBlock(BufFile *file, long blknum, void *ptr); -static void WriteTempFileBlock(BufFile *file, long blknum, void *ptr); +static void ReadTempFileBlock(BufFile *file, uint64 blknum, void *ptr); +static void WriteTempFileBlock(BufFile *file, uint64 blknum, void *ptr); /* @@ -64,7 +64,7 @@ gistInitBuildBuffers(int pagesPerBuffer, int levelStep, int maxLevel) /* Initialize free page management. */ gfbb->nFreeBlocks = 0; gfbb->freeBlocksLen = 32; - gfbb->freeBlocks = (long *) palloc(gfbb->freeBlocksLen * sizeof(long)); + gfbb->freeBlocks = (int64 *) palloc(gfbb->freeBlocksLen * sizeof(int64)); /* * Current memory context will be used for all in-memory data structures @@ -469,7 +469,7 @@ gistPopItupFromNodeBuffer(GISTBuildBuffers *gfbb, GISTNodeBuffer *nodeBuffer, /* * Select a currently unused block for writing to. */ -static long +static uint64 gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) { /* @@ -487,7 +487,7 @@ gistBuffersGetFreeBlock(GISTBuildBuffers *gfbb) * Return a block# to the freelist. */ static void -gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum) +gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, uint64 blocknum) { int ndx; @@ -495,9 +495,9 @@ gistBuffersReleaseBlock(GISTBuildBuffers *gfbb, long blocknum)
Re: standby apply lag on inactive servers
On 2020/01/31 23:47, Alvaro Herrera wrote: On 2020-Jan-31, Fujii Masao wrote: On 2020/01/31 22:40, Alvaro Herrera wrote: On 2020-Jan-31, Fujii Masao wrote: You're thinking to apply this change to the back branches? Sorry if my understanding is not right. But I don't think that back-patch is ok because it changes the documented existing behavior of pg_last_xact_replay_timestamp(). So it looks like the behavior change not a bug fix. Yeah, I am thinking in backpatching it. The documented behavior is already not what the code does. Maybe you thought this because getRecordTimestamp() extracts the timestamp from even WAL record of a restore point? That is, you're concerned about that pg_last_xact_replay_timestamp() returns the timestamp of not only commit/abort record but also restore point one. Right? right. As far as I read the code, this problem doesn't occur because SetLatestXTime() is called only for commit/abort records, in recoveryStopsAfter(). No? ... uh, wow, you're right about that too. IMO this is extremely fragile, easy to break, and under-documented. Yeah, it's worth improving the code. But you're right, there's no bug there at present. Do you have a situation where this change would break something? If so, can you please explain what it is? For example, use the return value of pg_last_xact_replay_timestamp() (and also the timestamp in the log message output at the end of recovery) as a HINT when setting recovery_target_time later. Hmm. I'm not sure how you would use it in that way. I mean, I understand how it *can* be used that way, but it seems too fragile to be done in practice, in a scenario that's not just laboratory games. Use it to compare with the timestamp retrieved from the master server, in order to monitor the replication delay. That's precisely the use case that I'm aiming at. The timestamp currently is not useful because this usage breaks when the primary is inactive (no COMMIT records occur). During such periods of inactivity, CHECKPOINT records would keep the "last xtime" current. This has actually happened in a production setting, it's not a thought experiment. I've heard that someone periodically generates dummy tiny transactions (say, every minute), as a band-aid solution, to avoid inactive primary. Of course, this is not a perfect solution. The idea that I proposed previously was to introduce pg_last_xact_insert_timestamp() [1] into core. This function returns the timestamp of commit / abort records in *primary* side. So we can retrieve that timestamp from the primary (e.g., by using dblink) and compare its result with pg_last_xact_replay_timestamp() to calculate the delay in the standby. Another idea is to include the commit / abort timestamp in primary-keepalive-message that periodially sent from the primary to the standby. Then if we introduce the function returning that timestamp, in the standby side, we can easily compare the commit / abort timestamps taken from both primary and standby, in the standby. [1] https://www.postgresql.org/message-id/cahgqgwf3zjfunej5ka683ku5rqubtswtqfq7g1x0g34o+jx...@mail.gmail.com Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: Unix-domain socket support on Windows
On 2020-01-30 19:41, Tom Lane wrote: The code looks fine (and a big +1 for not having knowledge of DEFAULT_PGSOCKET_DIR wired into UNIXSOCK_PATH). I wonder though whether any user-facing documentation needs to be adjusted. There are no user-facing changes in this patch yet. That will come with subsequent patches. This patch has now been committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Extracting only the columns needed for a query
I'm bumping this to the next commitfest because I haven't had a chance to address the questions posed by Dmitry. I'm sure I'll get to it in the next few weeks. > I believe it would be beneficial to add this potential API extension patch into > the thread (as an example of an interface defining how scanCols could be used) > and review them together. As for including some code that uses the scanCols, after discussing off-list with a few folks, there are three options I would like to pursue for doing this. One option I will pursue is using the scanCols to inform the columns needed to be spilled for memory-bounded hashagg (mentioned by Jeff here [1]). The second is potentially using the scanCols in the context of FDW. Corey, would you happen to have any specific examples handy of when this might be useful? The third is exercising it with a test only but providing an example of how a table AM API user like Zedstore uses the columns during planning. [1] https://www.postgresql.org/message-id/e5566f7def33a9e9fdff337cca32d07155d7b635.camel%40j-davis.com -- Melanie Plageman
Re: Extracting only the columns needed for a query
On Wed, Jan 15, 2020 at 7:54 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > For UPDATE, we need all of the columns in the table because of the > > table_lock() API's current expectation that the slot has all of the > > columns populated. If we want UPDATE to only need to insert the column > > values which have changed, table_tuple_lock() will have to change. > > Can you please elaborate on this part? I'm probably missing something, > since I don't see immediately where are those expectations expressed. > The table_tuple_lock() function has a TupleTableSlot as an output argument. The comment for the function states "*slot: contains the target tuple". Callers of table_tuple_lock() in places like ExecUpdate() and GetTupleForTrigger() use the returned tuple for EvalPlanQual. Also, within table_tuple_lock() it is not known what the caller expects and what will be done with the returned slot/tuple. Hence, it becomes tricky for any AM to guess, and it needs to return the full tuple, since the API doesn't state which columns the caller is interested in, or whether it just wishes to take the lock and needs nothing back in the slot.
Re: Shared memory leak on DSM slot exhaustion
On Thu, Jan 30, 2020 at 4:54 AM Thomas Munro wrote: > As reported over on pgsql-general[1], we leak shared memory when we > run out of DSM slots. To see this, add the random-run-out-of-slots > hack I showed in that thread, create and analyze a table t(i) with a > million integers, run with dynamic_shared_memory_type=mmap, and try > SELECT COUNT(*) FROM t t1 JOIN t t2 USING (i) a few times and you'll > see that pgbase/pg_dynshmem fills up with leaked memory segments each > time an out-of-slots errors is raised. (It happens with all DSM > types, but then the way to list the segments varies or there isn't > one, depending on type and OS.) Here's a draft patch to fix that. Whoops. The patch looks OK to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
get a relations DDL server-side
I would like to introduce the ability to get object DDL (server-side) by introducing a new function with roughly the following prototype: get_ddl(regclass) RETURNS text LANGUAGE C STRICT PARALLEL SAFE; A previous conversation seemed to encourage the development of this feature https://www.postgresql.org/message-id/CADkLM=fxfsrhaskk_by_a4uomj1te5mfggd_rwwqfv8wp68...@mail.gmail.com I would like to start work on this patch but wanted acceptance on the function signature. Thank you!
Re: [Proposal] Global temporary tables
On Thu, Jan 30, 2020 at 4:33 AM Konstantin Knizhnik wrote: > On 29.01.2020 21:16, Robert Haas wrote: > > On Wed, Jan 29, 2020 at 10:30 AM Konstantin Knizhnik > > wrote: > > > > I think that the idea of calling ambuild() on the fly is not going to > > work, because, again, I don't think that calling that from random > > places in the code is safe. > > It is not a random place in the code. > Actually it is just one place - _bt_getbuf > Why it can be unsafe if it affects only private backends data? Because, as I already said, not every operation is safe at every point in the code. This is true even when there's no concurrency involved. For example, executing user-defined code is not safe while holding a buffer lock, because the user-defined code might try to do something that locks the same buffer, which would cause an undetected, uninterruptible deadlock. > But GTT case is different. Heap and indexes can be easily initialized by > backend using existed functions. That would be nice if we could make it work. Someone would need to show, however, that it's safe. > You say that it not safe. But you have not explained why it is unsafe. > Yes, I agree that it is my responsibility to prove that it is safe. > And as I already wrote, I can not provide such proof now. I will be > pleased if you or anybody else can help to convince that this approach > is safe or demonstrate problems with this approach. That's fair, but nobody's obliged to spend time on that. > But I really like to receive more constructive critics rather than "this > approach is wrong because it is unsafe". I'm sure, and that's probably valid. Equally, however, I'd like to receive more analysis of why it is safe than "I don't see anything wrong with it so it's probably fine." And I think that's pretty valid, too. > Actually index is not created on the fly. > Index is created is usual way, by executing "create index" command. > So all components of the Postgres (planner, executor,...) treat GTT > indexes in the same way as regular indexes. > Locking and invalidations policies are exactly the same for them. > The only difference is that content of GTT index is constructed on > demand using private backend data. > Is it safe or not? We are just reading data from local buffers/files and > writing them here. > May be I missed something but I do not see any unsafety here. > There are issues with updating statistic but them can be solved. But that's not all you are doing. To build the index, you'll have to sort the data. To sort the data, you'll have to call btree support functions. Those functions can be user-defined, and can do complex operations like catalog access that depend on a good transaction state, no buffer locks being held, and nothing already in progress within this backend that can get confused as a result of this operation. Just as a quick test, I tried doing this in _bt_getbuf: +if (InterruptHoldoffCount != 0) +elog(WARNING, "INTERRUPTS ARE HELD OFF"); That causes 103 out of 196 regression tests to fail, which means that it's pretty common to arrive in _bt_getbuf() with interrupts held off. At the very least, that means that the index build would be uninterruptible, which already seems unacceptable. Probably, it means that the calling code is already holding an LWLock, because that's normally what causes HOLD_INTERRUPTS() to happen. 
And as I have already explained, that is super-problematic, because of deadlock risks, and because it risks putting other backends into non-interruptible waits if they should happen to need the LWLock we're holding here. I really don't understand why the basic point here remains obscure. In general, it tends to be unsafe to call high-level code from low-level code, not just in PostgreSQL but in pretty much any program. Do you think that we can safely add a GUC that executes a user-defined SQL query every time an LWLock is acquired? If you do, why don't you try adding code to do that to LWLockAcquire and testing it out a little bit? Try making the SQL query do something like query pg_class, find a table name that's not in use, and create a table by that name. Then run the regression tests with the GUC set to run that query and see how it goes. I always hate to say that things are "obvious," because what's obvious to me may not be obvious to somebody else, but it is clear to me, at least, that this has no chance of working. Even though I can't say exactly what will break, or what will break first, I'm very sure that a lot of things will break and that most of them are unfixable. Now, your idea is not quite as crazy as that, but it has the same basic problem: you can't insert code into a low-level facility that uses a high level facility which may in turn use and depend on that very same low-level facility to not be in the middle of an operation. If you do, it's going to break somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_restore crash when there is a failure before all child process is created
ahsan hadi writes: > I have applied tested both patches separately and ran regression with both. > No new test cases are failing with both patches. > The issues is fixed by both patches however the fix from Tom > (0001-fix-worker-status.patch) looks more elegant. I haven't done a detailed > code review. Pushed, thanks for looking! regards, tom lane
Re: get a relations DDL server-side
"Jordan Deitch" writes: > I would like to introduce the ability to get object DDL (server-side) by > introducing a new function with roughly the following prototype: > get_ddl(regclass) > RETURNS text > LANGUAGE C STRICT PARALLEL SAFE; Umm ... "regclass" would only be appropriate for relations. If you actually want to support more than one type of object with a single function, you'll need two OIDs. Catalog's OID and object's OID are the usual choices, per pg_describe_object() and similar functions. I don't think "get_ddl" is a particularly apt function name, either. It ignores the precedent of existing functions with essentially this same functionality, such as pg_get_triggerdef(), pg_get_constraintdef(), etc. One wonders why duplicate that existing functionality, so maybe you should think about adding per-object-type functions instead of trying to make one function to rule them all. The larger reason why this doesn't exist already, BTW, is that we've tended to find that it's not all that useful to get back a monolithic chunk of DDL text for complicated objects such as tables. You should provide a little more clarity as to what use-case you foresee, because otherwise there are just a *whole* lot of things that aren't clear. Some examples: * Should the output include a CREATE COMMENT if the object has a comment? * What about ownership and ACL (grants)? * On tables, are foreign keys part of the table, or are they distinct objects? (Hint: both answers can be correct depending on use-case) * How about indexes, and do you want to treat constraint indexes differently from other ones? (Constraint indexes *could* be made part of the table's DDL, but other indexes need a separate CREATE) * Do you need options, such as whether to pretty-print expressions? You might also find it instructive to dig through the archives for past discussions about moving more of pg_dump's logic into the server; that's the area where this has come up over and over. regards, tom lane
widen vacuum buffer counters
We recently noticed that vacuum buffer counters wraparound in extreme cases, with ridiculous results. Example:

2020-01-06 16:38:38.010 EST [45625-1] app= LOG:  automatic vacuum of table "somtab.sf.foobar": index scans: 17
        pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
        tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
        buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
        avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
        system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec

That's to be expected, as tables exist that are large enough for 4 billion buffer accesses to be a possibility. Let's widen the counters, as in the attached patch.

I propose to backpatch this.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce501151e..049ec2703b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -614,7 +614,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 							 vacrelstats->new_dead_tuples, OldestXmin);
 			appendStringInfo(&buf,
-							 _("buffer usage: %d hits, %d misses, %d dirtied\n"),
+							 _("buffer usage: %zd hits, %zd misses, %zd dirtied\n"),
 							 VacuumPageHit,
 							 VacuumPageMiss,
 							 VacuumPageDirty);
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b1f6291b99..eb19644419 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -140,9 +140,9 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int			VacuumPageHit = 0;
-int			VacuumPageMiss = 0;
-int			VacuumPageDirty = 0;
+int64		VacuumPageHit = 0;
+int64		VacuumPageMiss = 0;
+int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 62d64aa0a1..f985453ec3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -252,9 +252,9 @@ extern int	VacuumCostPageDirty;
 extern int	VacuumCostLimit;
 extern double VacuumCostDelay;
 
-extern int	VacuumPageHit;
-extern int	VacuumPageMiss;
-extern int	VacuumPageDirty;
+extern int64 VacuumPageHit;
+extern int64 VacuumPageMiss;
+extern int64 VacuumPageDirty;
 
 extern int	VacuumCostBalance;
 extern bool VacuumCostActive;
Re: widen vacuum buffer counters
On Fri, Jan 31, 2020 at 9:59 PM Alvaro Herrera wrote: > > We recently noticed that vacuum buffer counters wraparound in extreme > cases, with ridiculous results. Example: > > 2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table > "somtab.sf.foobar": index scans: 17 > pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 > skipped frozen > tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but > not yet removable > buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied > avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s > system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec > > That's to be expected, as tables exist that are large enough for 4 billion > buffer accesses to be a possibility. Let's widen the counters, as in the > attached patch. > > I propose to backpatch this. +1, and patch LGTM.
Re: unsupportable composite type partition keys
Julien Rouhaud writes: > On Sun, Dec 22, 2019 at 10:51 PM Tom Lane wrote: >> While poking at this, I also started to wonder why CheckAttributeType >> wasn't recursing into ranges, since those are our other kind of >> container type. And the answer is that it must, because we allow >> creation of ranges over composite types: > While working on regression tests for index collation versioning [1], > I noticed that the 2nd patch apparently broke the ability to create a > table using a range over collatable datatype attribute, which we > apparently don't test anywhere. Ugh. > AFAICT, this is only a thinko in CheckAttributeType(), where the range > collation should be provided rather than the original tuple desc one, > as per attached. I also added a create/drop table in an existing > regression test that was already creating range over collatable type. Looks good, although I think maybe we'd better test the case a little harder than this. Will tweak that and push -- thanks! regards, tom lane
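For the archives, the case being discussed is essentially a table column whose type is a range over a collatable subtype; a minimal reproduction along these lines (the type and table names are made up):

CREATE TYPE textrange AS RANGE (subtype = text, collation = "C");
CREATE TABLE textrange_tbl (tr textrange);   -- this failed before the fix
DROP TABLE textrange_tbl;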
Re: WIP: Aggregation push-down
Hello,

Thank you for this great feature! I hope this will be reviewed/validated soon ;o)

Just a comment: "set enable_agg_pushdown to true;" isn't displayed in EXPLAIN (SETTINGS) syntax. The following modification seems to fix that:

src/backend/utils/misc/guc.c

	{"enable_agg_pushdown", PGC_USERSET, QUERY_TUNING_METHOD,
		gettext_noop("Enables aggregation push-down."),
		NULL,
		GUC_EXPLAIN          --- line Added ---
	},
	&enable_agg_pushdown,
	false,
	NULL, NULL, NULL
	},

then

postgres=# set enable_agg_pushdown = true;
SET
postgres=# explain (settings) select 1;
                 QUERY PLAN
--------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
 Settings: enable_agg_pushdown = 'on'
(2 rows)

Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: widen vacuum buffer counters
Alvaro Herrera writes: > We recently noticed that vacuum buffer counters wraparound in extreme > cases, with ridiculous results. Ugh. > I propose to backpatch this. +1 for widening these counters, but since they're global variables, -0.2 or so for back-patching. I don't know of any reason that an extension would be touching these, but I feel like the problem isn't severe enough to justify taking an ABI-break risk. Also, %zd is the wrong format code for int64. Recommended practice these days is to use "%lld" with an explicit cast of the printf argument to long long (just to be sure). That doesn't work safely before v12, and if you did insist on back-patching further, you'd need to jump through hoops to avoid having platform-specific format codes in a translatable string. (The side-effects for translation seem like an independent argument against back-patching.) regards, tom lane
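A minimal sketch of what that recommendation would look like applied to the vacuumlazy.c hunk in the proposed patch (v12 and later; an illustration of the suggested style, not the committed wording):

	appendStringInfo(&buf,
					 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
					 (long long) VacuumPageHit,
					 (long long) VacuumPageMiss,
					 (long long) VacuumPageDirty);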
Re: PATCH: Fix wrong size argument to pg_strncasecmp
Dominik Czarnota writes: > This patch fixes a size parameter of `pg_strncasecmp` which compared a > "string" literal with a variable by passing a size of 5 while the "string" > literal has 6 bytes. Pushed, thanks for the report! > By the way, the `strncasecmp` usages around the fixed line could use > `strcasecmp` which doesn't accept the `size_t n` argument. Maybe. It's not clear to me that it'd be okay to assume that the variable input string is null-terminated. regards, tom lane
Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
On Fri, Jan 10, 2020 at 09:22, Mark Lorenz wrote:
> Updated the chg_to_date_wwd.patch with additional tests (because it
> works not only for 'D' pattern but also for all day patterns like 'Day'
> or 'DY'). Added the necessary documentation change.
>
> (The fix_to_char_wwd.patch from
> f4e740a8de3ad1e762a28f6ff253ea4f%40four-two.de is still up-to-date)

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

Hi Mark, this is a review of the patch: chg_to_date_wwd.patch

There hasn't been any problem, at least that I've been able to find. This one applies cleanly. The entire compilation went without error as well.

# Without patch #

postgres=# SELECT to_date('2019-1-1', 'YYYY-WW-D');
  to_date
------------
 2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-2', 'YYYY-WW-D');
  to_date
------------
 2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-9', 'YYYY-WW-D');
  to_date
------------
 2019-01-01
(1 row)

# With patch #

postgres=# SELECT to_date('2019-1-1', 'YYYY-WW-D');
  to_date
------------
 2018-12-30
(1 row)

postgres=# SELECT to_date('2019-1-2', 'YYYY-WW-D');
  to_date
------------
 2018-12-31
(1 row)

postgres=# SELECT to_date('2019-1-9', 'YYYY-WW-D');
  to_date
------------
 2019-01-07
(1 row)

+1 for committer review

--
Cleysson Lima
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On 2020-01-24 08:50, Michael Paquier wrote: On Wed, Jan 22, 2020 at 12:55:30AM +0300, Alexander Korotkov wrote: On Sun, Jan 19, 2020 at 1:24 PM Michael Paquier wrote: +static int +RestoreArchivedWALFile(const char *path, const char *xlogfname, + off_t expectedSize, const char *restoreCommand) Not a fan of putting that to pg_rewind/parsexlog.c. It has nothing to do with WAL parsing, and it seems to me that we could have an argument for making this available as a frontend-only API in src/common/. I've put it into src/common/fe_archive.c. This split makes sense. You have forgotten to update @pgcommonfrontendfiles in Mkvcbuild.pm for the MSVC build. Still, I can see that we have a lot of duplication between the code of the frontend and the backend, which is annoying.. The execution part is tricky to split because the backend has pre- and post- callbacks, the interruption handling is not the same and there is the problem of elog() calls, with elevel that can be changed depending on the backend configuration. However, I can see one portion of the code which is fully duplicated and could be refactored out: the construction of the command to execute, by having in input the restore_command string and the arguments that we expect to replace in the '%' markers which are part of the command. OK, I agree that duplication of this part directly is annoying. I did a refactoring and moved this restore_command building code into common/archive. Separate file was needed, since fe_archive is frontend-only, so I cannot put universal code there. I do not know is there any better place for it. If NULL is specified for one of those values, then the construction routine returns NULL, synonym of failure. And the result of the routine is the built command. I think that you would need an extra argument to give space for the error message generated in the event of an error when building the command. I did it in a bit different way, but the overall logic is exactly as you proposed, I hope. Also I have checked win/linux build in the similar to cfbot CI setup and now everything runs well. +++ b/src/include/common/fe_archive.h +#ifndef ARCHIVE_H +#define ARCHIVE_H This should be FE_ARCHIVE_H. Changed. - while ((c = getopt_long(argc, argv, "D:nNPR", long_options, &option_index)) != -1) + while ((c = getopt_long(argc, argv, "D:nNPRC:c", long_options, &option_index)) != -1) This still includes 'C', but that should not be the case. + pg_rewind with the -c or + -C option to automatically retrieve them from the WAL [...] + -C restore_command + --target-restore-command=restore_command [...] + available, try re-running pg_rewind with + the -c or -C option to search + for the missing files in the WAL archive. Three places in the docs need to be cleaned up. I have fixed all the above, thanks. Do we really need to test the new "archive" mode as well for 003_extrafiles and 002_databases? I would imagine that only 001_basic is enough. We do not check any unique code paths with it, so I do it only for 001_basic now. I have left it as a test mode, since it will be easy to turn this on for any other test in the future if needed and it fits well RewindTest.pm module. +# Moves WAL files to the temporary location and returns restore_command +# to get them back. +sub move_wal +{ + my ($tmp_dir, $master_pgdata) = @_; + my $wal_archive_path = "$tmp_dir/master_wal_archive"; + my $wal_path = "$master_pgdata/pg_wal"; + my $wal_dir; + my $restore_command; I think that this routine is wrongly designed. 
First, in order to copy the contents from one path to another, we have RecursiveCopy::copy_path, and there is a long history about making it safe for the TAP tests. Second, there is in PostgresNode::enable_restoring already a code path doing the decision-making on how building restore_command. We should keep this code path unique. Yeah, thank you for pointing me to the RecursiveCopy::copypath and especially PostgresNode::enable_restoring, I have modified test to use them instead. The copypath does not work with existing destination directories and does not preserve initial permissions, so I had to do some dirty work to achieve what we need in the test and keep it simple in the same time. However, I think that using these unified routines is much better for a future support. New version is attached. Do you have any other comments or objections? Regards -- Alexey From 7af0b3642f6218c764eee361e013f24bfb43ffbe Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Fri, 31 Jan 2020 20:08:13 +0300 Subject: [PATCH v14] pg_rewind: Add options to restore WAL files from archive Currently, pg_rewind fails when it could not find required WAL files in the target data directory. One have to manually figure out which WAL files are required and copy them back from archive. This commit implements two new pg_rew
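As a side note on the '%' markers mentioned in the review, the shared routine essentially has to perform a substitution along the following lines. This is only a rough sketch of the general technique (the function name is made up, and a full implementation would also need the other markers and error handling):

#include "lib/stringinfo.h"

/*
 * Rough sketch: expand %f (WAL file name) and %p (target path) in a
 * restore_command string.  Other markers and error handling are
 * intentionally left out.
 */
static char *
build_restore_command_sketch(const char *restore_command,
							 const char *xlogfname,
							 const char *xlogpath)
{
	StringInfoData buf;
	const char *sp;

	initStringInfo(&buf);
	for (sp = restore_command; *sp != '\0'; sp++)
	{
		if (sp[0] == '%' && sp[1] == 'f')
		{
			appendStringInfoString(&buf, xlogfname);
			sp++;				/* skip the marker's second character */
		}
		else if (sp[0] == '%' && sp[1] == 'p')
		{
			appendStringInfoString(&buf, xlogpath);
			sp++;
		}
		else
			appendStringInfoChar(&buf, *sp);
	}
	return buf.data;
}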
Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
Cleysson Lima writes:
> this is a review of the patch: chg_to_date_wwd.patch
> There hasn't been any problem, at least that I've been able to find.

AFAICS, the point of this patch is to make to_date symmetrical with the definition of WW that the other patch wants for to_char. But the other patch is wrong, for the reasons I explained upthread, so I doubt that we want this one either.

I still think that it'd be necessary to invent at least one new format field code in order to get to a sane version of this feature. As they stand, 'WW' and 'D' do not agree on what a week is, and changing the behavior of either one in order to make them agree is just not going to happen.

BTW, I went to check on what Oracle thinks about this, since these functions are allegedly Oracle-compatible. On PG, I get this for the WW and D values for the next few days:

select to_char(current_date+n, 'YYYY-MM-DD YYYY-WW-D Day')
  from generate_series(0,10) n;
            to_char
--------------------------------
 2020-01-31 2020-05-6 Friday
 2020-02-01 2020-05-7 Saturday
 2020-02-02 2020-05-1 Sunday
 2020-02-03 2020-05-2 Monday
 2020-02-04 2020-05-3 Tuesday
 2020-02-05 2020-06-4 Wednesday
 2020-02-06 2020-06-5 Thursday
 2020-02-07 2020-06-6 Friday
 2020-02-08 2020-06-7 Saturday
 2020-02-09 2020-06-1 Sunday
 2020-02-10 2020-06-2 Monday
(11 rows)

I did the same calculations using Oracle 11g R2 on sqlfiddle.com and got the same results. Interestingly, though, I also tried it on https://rextester.com/l/oracle_online_compiler and here's what I get there:

2020-01-31 2020-05-5 Freitag
2020-02-01 2020-05-6 Samstag
2020-02-02 2020-05-7 Sonntag
2020-02-03 2020-05-1 Montag
2020-02-04 2020-05-2 Dienstag
2020-02-05 2020-06-3 Mittwoch
2020-02-06 2020-06-4 Donnerstag
2020-02-07 2020-06-5 Freitag
2020-02-08 2020-06-6 Samstag
2020-02-09 2020-06-7 Sonntag
2020-02-10 2020-06-1 Montag

(I don't know how to switch locales on these sites, so I don't have any way to check what happens in other locales.)

So we agree with Oracle on what WW means, but they count D as 1-7 starting on either Sunday or Monday according to locale. I wonder whether we should change to match that? Maybe "TMD" should act that way? It's already the case that their "Day" acts like our "TMDay", evidently.

Either way, though, the WW weeks don't line up with the D weeks, and we're not likely to make them do so. So I think an acceptable version of this feature has to involve defining at least one new format code and maybe as many as three, to produce year, week and day values that agree on whichever definition of "a week" you want to use, and then to_date has to enforce that input uses matching year/week/day field types, very much like it already does for ISO versus Gregorian dates.

I also notice that neither patch touches the documentation. A minimum requirement here is defining what you think the underlying "week" is, if it's neither ISO nor the existing WW definition. As I said before, it'd also be a good idea to provide some evidence that there are other people using that same week definition.

regards, tom lane
Re: Shared memory leak on DSM slot exhaustion
On Sat, Feb 1, 2020 at 7:37 AM Robert Haas wrote: > Whoops. The patch looks OK to me. Pushed.
What's difference among insert/write/flush lsn?
Hi,

pg_current_wal_flush_lsn()    pg_lsn    Get current write-ahead log flush location
pg_current_wal_insert_lsn()   pg_lsn    Get current write-ahead log insert location
pg_current_wal_lsn()          pg_lsn    Get current write-ahead log write location

I guess "write" is about how many bytes are written to the shared cache, and "flush" is a flush to file, which makes it persistent. Can anybody give an official explanation? Thanks.

Regards,
Jinhua Luo
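For what it's worth, the three values can be read side by side on a primary; the flush location can never be ahead of the write location, which in turn can never be ahead of the insert location:

SELECT pg_current_wal_insert_lsn() AS insert_lsn,
       pg_current_wal_lsn()        AS write_lsn,
       pg_current_wal_flush_lsn()  AS flush_lsn;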
Re: dropdb --force
On Sat, Nov 30, 2019 at 7:46 AM Michael Paquier wrote: > > On Fri, Nov 29, 2019 at 03:31:21PM +0530, Amit Kapila wrote: > > It might be better to move it to next CF as the discussion is still active. > > OK, just did that. > I have marked this as committed in CF. This was committed some time back[1][2]. I was just waiting for the conclusion on what we want to do about Windows behavior related to socket closure which we discovered while testing this feature. It is not very clear whether we want to do anything about it, see discussion on thread [3], so I closed this. [1] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1379fd537f9fc7941c8acff8c879ce3636dbdb77 [2] - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=80e05a088e4edd421c9c0374d54d787c8a4c0d86 [3] - https://www.postgresql.org/message-id/CALDaNm2tEvr_Kum7SyvFn0%3D6H3P0P-Zkhnd%3DdkkX%2BQ%3DwKutZ%3DA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Don't try fetching future segment of a TLI.
On 1/28/20 8:02 PM, Kyotaro Horiguchi wrote: > At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky >> Regading influence: issue is not about the large amount of WALs to apply >> but in searching for the non-existing WALs on the remote storage, each such >> search can take 5-10 seconds while obtaining existing WAL takes >> milliseconds. > > Wow. I didn't know of a file system that takes that much seconds to > trying non-existent files. Although I still think this is not a bug, > but avoiding that actually leads to a big win on such systems. I have not tested this case but I can imagine it would be slow in practice. It's axiomatic that it is hard to prove a negative. With multi-region replication it might well take some time to be sure that the file *really* doesn't exist and hasn't just been lost in a single region. > After a thought, I think it's safe and effectively doable to let > XLogFileReadAnyTLI() refrain from trying WAL segments of too-high > TLIs. Some garbage archive files out of the range of a timeline might > be seen, for example, after reusing archive directory without clearing > files. However, fetching such garbages just to fail doesn't > contribute durability or reliablity at all, I think. The patch seems sane; the trick will be testing it. Pavel, do you have an environment where you can ensure this is a performance benefit? Regards, -- -David da...@pgmasters.net
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Jan 30, 2020 at 6:10 PM Dilip Kumar wrote: > > Other comments look fine to me so I will reply to them along with the > next version of the patch. > This still needs more work, so I have moved this to the next CF. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
On Mon, Nov 25, 2019 at 1:17 PM Michael Paquier wrote: > > On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote: > > The only difference is in the last line where for me it gives > > assertion failure when trying to do ReleaseAuxProcessResources. Below > > is the callstack: > > No need for Windows on this one and I have reproduced easily the same > trace as Amit. The patch has been moved to next CF. Chengchao, could > you provide an update please? > I have marked this patch as "Returned with feedback" as it's been long since the author has responded. Feel free to provide a new patch for the next CF. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [Proposal] Add accumulated statistics for wait event
Hi

On Wed, Jan 15, 2020 at 14:15, Imai Yoshikazu wrote:
> On 2020/01/13 4:11, Pavel Stehule wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world: tested, passed
> > Implements feature: tested, passed
> > Spec compliant: not tested
> > Documentation: tested, passed
> >
> > I like this patch, because I used similar functionality some years ago
> very successfully. The implementation is almost simple, and the result
> should be valid by used method.
>
> Thanks for your review!
>
> > The potential problem is performance impact. Very early test show impact
> cca 3% worst case, but I'll try to repeat these tests.
>
> Yes, performance impact is the main concern. I want to know how it
> affects performance in various test cases or on various environments.
>
> > There are some ending whitespaces and useless tabs.
> >
> > The new status of this patch is: Waiting on Author
> I attach v4 patches removing those extra whitespaces of the end of lines
> and useless tabs.
>

Today I ran 120 five-minute pgbench tests to measure the impact of this patch. The result is attached.

test

PSQL="/usr/local/pgsql/bin/psql"
PGBENCH="/usr/local/pgsql/bin/pgbench"

export PGDATABASE=postgres

echo "*** START ***" > ~/result.txt

for i in 1 5 10 50 100
do
  echo "scale factor $i" >> ~/result.txt
  $PSQL -c "create database bench$i"
  $PGBENCH -i -s $i "bench$i"
  for c in 1 5 10 50
  do
    $PGBENCH -c $c -T 300 "bench$i" >> ~/result.txt
  done
  $PSQL -c "vacuum full" "bench$i"
  $PSQL -c "vacuum analyze" "bench$i"
  for c in 1 5 10 50
  do
    $PGBENCH -S -c $c -T 300 "bench$i" >> ~/result.txt
  done
  $PSQL -c "drop database bench$i"
done

Tested on a computer with 4 CPUs and 8GB RAM - configuration: shared buffers 2GB, work mem 20MB.

The result is interesting - when I ran pgbench in R/W mode, I got +/- 1% changes in performance. It doesn't matter whether tracking time is active or not (tested on Linux). In this mode the new code is not on the critical path.

More interesting results come from the read-only tests (some larger differences are visible there): for scale 5 and 50 users, tracking time increases performance by about 12% (I got the same result for scale/users 10/50); in the other direction, the patched build without tracking time decreases performance by about 10% for 50/50 (with and without tracking time) and 100/5.

It looks like for scales higher than 5 and 50 users the results are not very stable (for the read-only tests - I repeated them), and the overhead there is about 10%, from 60K tps to 55K tps - maybe I hit hw limits (it was running with 4 CPUs).

Thanks to Tomas Vondra and 2ndq for the hw for testing.

Regards

Pavel

 wait_event_type |      wait_event      |   calls    |    times
-----------------+----------------------+------------+--------------
 Client          | ClientRead           | 1489681408 | 221616362961
 Lock            | transactionid        |  103113369 |  71918794185
 LWLock          | WALWriteLock         |  104781468 |  20865855903
 Lock            | tuple                |   21323744 |  15800875242
 IO              | WALSync              |   50862170 |   8666988491
 LWLock          | lock_manager         |   18415423 |    575308266
 IO              | WALWrite             |   51482764 |    205775892
 LWLock          | buffer_content       |   15385387 |    168446128
 LWLock          | wal_insert           |    1502092 |     90019731
 IPC             | ProcArrayGroupUpdate |     178238 |     46527364
 LWLock          | ProcArrayLock        |     587356 |     13298246
 IO              | DataFileExtend       |    2715557 |     11615216
 IPC             | ClogGroupUpdate      |      54319 |     10622013
 IO              | DataFileRead         |    5805298 |      9596545
 IO              | SLRURead             |    9518930 |      7166217
 LWLock          | CLogControlLock      |     746759 |      6783602

> --
> Yoshikazu Imai
>

pgbench.ods
Description: application/vnd.oasis.opendocument.spreadsheet
Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)
On Thu, Jan 30, 2020 at 1:49 PM Kyotaro Horiguchi wrote:
> At Thu, 30 Jan 2020 13:30:56 +0900, Kasahara Tatsuhito wrote in
> > > TID scan : yes , seq_scan, ,
> > Here is wrong, because TID scan came to have SO_TYPE_SEQSCAN flags
> > from commit 147e3722f7.
>
> It is reflectings the discussion below, which means TID scan doesn't
> have corresponding SO_TYPE_ value. Currently it is setting
> SO_TYPE_SEQSCAN by accedent.

Ah, sorry, I misunderstood.

Upon further investigation, the SO_TYPE_SEQSCAN flag was also used at heap_beginscan() to determine whether a predicate lock was taken on the entire relation.

    if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
    {
        /*
         * Ensure a missing snapshot is noticed reliably, even if the
         * isolation mode means predicate locking isn't performed (and
         * therefore the snapshot isn't used here).
         */
        Assert(snapshot);
        PredicateLockRelation(relation, snapshot);
    }

Therefore, we cannot simply remove the SO_TYPE_SEQSCAN flag from a TID scan. To keep the old behavior, I think it would be better to add a new SO_TYPE_TIDSCAN flag and take a predicate lock on the entire relation.

Attached is the v2 patch, which changes the status to the following (same as v11 but with the new SO_TYPE_TIDSCAN flag):

                              increments
scan type   table_beginscan?, per scan, per tuple, SO_TYPE flags
=================================================================
TID scan  : yes             ,         ,          , SO_TYPE_TIDSCAN

Is this an acceptable change for HEAD and v12?

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

fix_tidscan_increments_seqscan_num_v2.patch
Description: Binary data
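A sketch of what the heap_beginscan() test quoted above might become under that proposal (SO_TYPE_TIDSCAN is the flag being proposed here; it does not exist yet):

    if (scan->rs_base.rs_flags &
        (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN | SO_TYPE_TIDSCAN))
    {
        /*
         * Ensure a missing snapshot is noticed reliably, even if the
         * isolation mode means predicate locking isn't performed (and
         * therefore the snapshot isn't used here).
         */
        Assert(snapshot);
        PredicateLockRelation(relation, snapshot);
    }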
Re: unsupportable composite type partition keys
On Fri, Jan 31, 2020 at 04:20:36PM -0500, Tom Lane wrote: > Julien Rouhaud writes: > > On Sun, Dec 22, 2019 at 10:51 PM Tom Lane wrote: > >> While poking at this, I also started to wonder why CheckAttributeType > >> wasn't recursing into ranges, since those are our other kind of > >> container type. And the answer is that it must, because we allow > >> creation of ranges over composite types: > > > While working on regression tests for index collation versioning [1], > > I noticed that the 2nd patch apparently broke the ability to create a > > table using a range over collatable datatype attribute, which we > > apparently don't test anywhere. > > Ugh. > > > AFAICT, this is only a thinko in CheckAttributeType(), where the range > > collation should be provided rather than the original tuple desc one, > > as per attached. I also added a create/drop table in an existing > > regression test that was already creating range over collatable type. > > Looks good, although I think maybe we'd better test the case a little > harder than this. Will tweak that and push -- thanks! Ah, I wasn't sure that additional tests on a table would be worthwhile enough. Thanks for tweaking and pushing!