Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Robert Haas wrote:

> On Tue, May 28, 2019 at 11:27 AM Antonin Houska wrote:
> > We thought that it's more convenient for administrator to enter password. To
> > achieve that, we can instead tell the admin how to use the key derivation tool
> > (pg_keysetup) and send its output to postgres. So yes, it's possible to always
> > receive the key from the "encryption key command". I'll change this.
>
> Sounds good. I'm not quite sure how this is going to work. Ideally
> you'd like the encryption key command to fetch the key from something
> like ssh-agent, or maybe pop up a window on the user's terminal with a
> key prompt. Just reading from stdin and writing to stdout is not
> going to be very convenient...

When I mentioned writing to stdout in my previous email, I viewed it from the
perspective of postmaster: whichever way the command gets the password from
the DBA, it'll always write it to stdout and postmaster will read it from
there. This was to address your objection that the executables do not
actually "return" strings.

> > > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with
> > > +the --link option.)
> > >
> > > That seems pretty unfortunate. Any way to do better?
> >
> > Currently I'm thinking of making the IV less dependent on RelFileNode
> > (e.g. generate a random number for which RelFileNode serves as the seed) and
> > storing it in pg_class as a storage parameter.
>
> I don't think it's a very good idea. For one thing, you can't store
> the IV for pg_class in pg_class; that has a circularity problem. For
> another thing, I'm not sure we can or should try to do catalog access
> from every place where an IV might be needed. In fact, I'd go so far
> as to say that sounds like a recipe for a lot of pain and fragility.
>
> One potential approach is to modify pg_upgrade to preserve the dbnode
> and relfilenode. I don't know of any fundamental reason why such a
> change shouldn't be possible, although it is probably a bunch of work,
> and there may be problems with which I am not acquainted.

Actually it doesn't seem to be that tricky. In the --binary-upgrade mode,
pg_dump does record pg_class.oid:

    -- For binary upgrade, must preserve pg_class oids
    SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('57465'::pg_catalog.oid);

The header comment in pg_upgrade.c indicates that this is because of TOAST.
So I think that dbnode and relfilenode are not preserved just because there
was no strong reason to do so by now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
RE: Why does not subquery pruning conditions inherit to parent query?
Monday, May 27, 2019 7:56 PM Tom Lane wrote:
> No, what is happening there is that the subquery gets inlined into the
> outer query. That can't happen in your previous example because of the
> aggregation/GROUP BY --- but subqueries that are just scan/join queries
> generally get merged into the parent.

Thank you for your reply, and sorry for the late response.

Ok, I understand. Is it possible to improve the planner so that subquery
quals are pulled up into the outer query? Oracle appears to do that.

Regards,
Kato Sho

> -----Original Message-----
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Monday, May 27, 2019 7:56 PM
> To: Kato, Sho/加藤 翔
> Cc: 'David Rowley' ; pgsql-hack...@postgresql.org
> Subject: Re: Why does not subquery pruning conditions inherit to parent query?
>
> "Kato, Sho" writes:
> > Friday, May 24, 2019 5:10 PM, David Rowley wrote:
> >> The planner can only push quals down into a subquery, it cannot pull
> >> quals from a subquery into the outer query.
>
> > However, following query looks like the subquery qual is pushed down
> > into the outer query.
> > postgres=# explain select * from jta, (select a from jtb where a = 1) c1 where jta.a = c1.a;
> >                            QUERY PLAN
> > --------------------------------------------------------------------
> >  Nested Loop  (cost=0.00..81.94 rows=143 width=8)
> >    ->  Seq Scan on jta0  (cost=0.00..41.88 rows=13 width=4)
> >          Filter: (a = 1)
> >    ->  Materialize  (cost=0.00..38.30 rows=11 width=4)
> >          ->  Seq Scan on jtb0  (cost=0.00..38.25 rows=11 width=4)
> >                Filter: (a = 1)
>
> No, what is happening there is that the subquery gets inlined into the
> outer query. That can't happen in your previous example because of the
> aggregation/GROUP BY --- but subqueries that are just scan/join queries
> generally get merged into the parent.
>
> 			regards, tom lane
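[Editor's note: as an illustration of the pull-up limitation being discussed, a sketch reusing the tables from the quoted example; the GROUP BY variant below is hypothetical. Because an aggregated subquery is not inlined, its qual stays hidden from the outer query, and today the workaround is to repeat the qual by hand:]

    -- the planner cannot deduce jta.a = 1 from inside the aggregated subquery
    explain select * from jta,
        (select a, count(*) from jtb where a = 1 group by a) c1
    where jta.a = c1.a;

    -- manual workaround: repeat the qual on the outer relation so it can also
    -- be used for scans/pruning on jta
    explain select * from jta,
        (select a, count(*) from jtb where a = 1 group by a) c1
    where jta.a = c1.a and jta.a = 1;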
Comment typo in tableam.h
Please see the diff attached.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index bf3a472018..e3619b8e7e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -1331,7 +1331,7 @@ table_finish_bulk_insert(Relation rel, int options)
  */
 
 /*
- * Create storage for `rel` in `newrode`, with persistence set to
+ * Create storage for `rel` in `newrnode`, with persistence set to
  * `persistence`.
  *
  * This is used both during relation creation and various DDL operations to
Re: Minimal logical decoding on standbys
On Fri, 31 May 2019 at 11:08, Amit Khandekar wrote:
>
> On Thu, 30 May 2019 at 20:13, Andres Freund wrote:
> >
> > Hi,
> >
> > On 2019-05-30 19:46:26 +0530, Amit Khandekar wrote:
> > > @@ -1042,7 +1042,8 @@ ReplicationSlotReserveWal(void)
> > >     if (!RecoveryInProgress() && SlotIsLogical(slot))
> > >     {
> > >     }
> > >     else
> > >     {
> > > -           restart_lsn = GetRedoRecPtr();
> > > +           restart_lsn = SlotIsLogical(slot) ?
> > > +                   GetXLogReplayRecPtr(&ThisTimeLineID) : GetRedoRecPtr();
> > >
> > > But then when I do pg_create_logical_replication_slot(), it hangs in
> > > DecodingContextFindStartpoint(), waiting to find new records
> > > (XLogReadRecord).
> >
> > But just till the primary has logged the necessary WAL records? If you
> > just do CHECKPOINT; or such on the primary, it should succeed quickly?
>
> Yes, it waits until there is a commit record, or (just tried) until a
> checkpoint command.

Is the XLOG_RUNNING_XACTS record essential for the logical decoding to build a
consistent snapshot? Since the restart_lsn is now the ReplayRecPtr, there is
no XLOG_RUNNING_XACTS record, and so the snapshot state is not yet
SNAPBUILD_CONSISTENT. And so DecodingContextFindStartpoint() =>
DecodingContextReady() never returns true, and hence
DecodingContextFindStartpoint() goes into an infinite loop, until it gets
XLOG_RUNNING_XACTS.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
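[Editor's note: a minimal sketch of the blocking behaviour described above, assuming a primary/standby pair with this patch applied and the test_decoding plugin available; the slot name is made up.]

    -- On the standby: with restart_lsn coming from the replay position, this
    -- can block in DecodingContextFindStartpoint() until a running-xacts
    -- record is decoded and the snapshot builder becomes consistent.
    SELECT * FROM pg_create_logical_replication_slot('standby_slot', 'test_decoding');

    -- On the primary: a checkpoint logs a standby snapshot (XLOG_RUNNING_XACTS),
    -- which is what lets the blocked call above finish.
    CHECKPOINT;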
Re: New committer: David Rowley
On 5/30/19 11:39 AM, Magnus Hagander wrote:
> Congratulations to David, may the buildfarm be gentle to him, and his first
> revert far away!

Congrats!

Best regards,
Jesper
Re: New committer: David Rowley
On Thu, 30 May 2019 at 11:39, Magnus Hagander wrote: > Congratulations to David, may the buildfarm be gentle to him, and his first > revert far away! Thank you, all. I will do my best not to anger the build gods and turn the farm red ;-) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Why does not subquery pruning conditions inherit to parent query?
On Fri, 31 May 2019 at 03:18, Kato, Sho wrote: > Is it possible to improve a subquery quals to pull up into outer query? Sure, it's possible, but it would require writing code. When it can and cannot/should not be done would need to be determined. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: tableam.h fails cpluspluscheck
Andres Freund writes:
> On 2019-05-30 14:01:00 -0400, Tom Lane wrote:
>> Kinda looks like you can't get away with using "struct" on a forward
>> declaration of something that is not actually a struct type.

> Ugh. Odd that only C++ compilers complain. I just removed the typedef,
> it's not needed anymore (it used to be necessary before moving
> IndexBuildCallback's definition to tableam.h - but was wrong then too,
> just cpluspluscheck didn't notice).

Cool, thanks.

			regards, tom lane
Re: cpluspluscheck vs vpath
Andres Freund writes:
> Attached is a small patch allowing cpluspluscheck to run from different
> directories. It needs the src and build directories for that,
> unsurprisingly.

No objection to changing this, but you could reduce the surprise
factor for existing workflows with a couple of defaults for the
arguments --- allow srcdir to default to "." and builddir to default
to the same as srcdir.

			regards, tom lane
Time range
Hi,

I was wondering why there is no range type for time without time zone. I think it could be useful for someone. Would it be good if I opened a PR?

Sorry if I've written in the wrong place.
Re: Time range
On Fri, May 31, 2019 at 08:35:31AM +0200, Donald Shtjefni wrote:
> Hi,
>
> I was wondering why there is no range type for time without time zone. I
> think it could be useful for someone. Would it be good if I opened a PR?
>
> Sorry if I've written in the wrong place.

Doesn't tsrange already do that? That's a timestamp without timezone range
type.

regards

-- 
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
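[Editor's note: for illustration, a quick sketch of tsrange covering timestamp without time zone; the values are made up. Note it is a range over timestamp, not over time of day.]

    SELECT '[2019-06-01 09:00, 2019-06-01 17:00)'::tsrange
               @> '2019-06-01 12:00'::timestamp AS contained;
    -- returns true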
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On 2018-06-01 14:22:26, Alexander Korotkov wrote:
> I'd like to note, that I'm going to provide revised version of this patch
> to the next commitfest. After some conversations at PGCon regarding this
> patch, I got more confident that providing separate paths for incremental
> sorts is right. In the next revision of this patch, incremental sort paths
> would be provided in more cases.

Alexander, I'm currently rebasing the patch, and if I get some time I'm going
to look into working on it.

Would you be able to provide a bit more detail on the changes you were hoping
to make after the conversations you'd had with others? I'm hoping for any
pointers/context you have from those conversations as to what you felt was
necessary to get this change committed.

Thanks,
James Coleman
Re: Time range
Donald Shtjefni schrieb am 31.05.2019 um 13:35:
> I was wondering why there is no range type for time without time zone. I
> think it could be useful for someone. Would it be good if I opened a PR?

you can easily create one:

   create type timerange as range (subtype = time);

Thomas
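[Editor's note: a brief usage sketch of the type created above, with made-up times; the range constructor function of the same name is created automatically along with the type.]

    create type timerange as range (subtype = time);

    select '12:30'::time <@ timerange('09:00', '17:00') as in_business_hours;
    -- returns true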
Re: incorrect xlog.c coverage report
On 2019-May-30, Michael Paquier wrote:

> On Wed, May 29, 2019 at 12:09:08PM -0400, Alvaro Herrera wrote:
> > Are there objections to doing that now on the master branch?
>
> Adding the flush call just on HEAD is fine for me. Not sure that
> there is an actual reason to back-patch that.

Okay ... I did that (patch attached), and while my new __gcov_flush() shows as
covered after I run the src/test/recovery tests, the function I mentioned
earlier (validateRecoveryParameters) is not any more covered after the patch
than it was before. So I'm not sure how useful this really is. Maybe someone
can point to something that this patch is doing wrong ... or maybe I'm just
looking at the wrong report, or this is not the function that we wanted to add
coverage for?

(On a whim I named the symbol USE_GCOV_COVERAGE because we could theoretically
have coverage reports using some other symbol. I suppose it could very well be
just USE_COVERAGE instead.)

gcov after patch:

        -: 5379:static void
      100: 5380:validateRecoveryParameters(void)
        -: 5381:{
      100: 5382:	if (!ArchiveRecoveryRequested)
       81: 5383:		return;
        -: 5384:
        -: 5385:	/*
        -: 5386:	 * Check for compulsory parameters
        -: 5387:	 */
       19: 5388:	if (StandbyModeRequested)
        -: 5389:	{
       19: 5390:		if ((PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) &&
        #: 5391:			(recoveryRestoreCommand == NULL || strcmp(recoveryRestoreCommand, "") == 0))
        #: 5392:			ereport(WARNING,
        -: 5393:					(errmsg("specified neither primary_conninfo nor restore_command"),
        -: 5394:					 errhint("The database server will regularly poll the pg_wal subdirectory to check for files placed there.")));
        -: 5395:	}
        -: 5396:	else
        -: 5397:	{
        #: 5398:		if (recoveryRestoreCommand == NULL ||
        #: 5399:			strcmp(recoveryRestoreCommand, "") == 0)
        #: 5400:			ereport(FATAL,
        -: 5401:					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
        -: 5402:					 errmsg("must specify restore_command when standby mode is not enabled")));
        -: 5403:	}
        -: 5404:
        -: 5405:	/*
        -: 5406:	 * Override any inconsistent requests. Note that this is a change of
        -: 5407:	 * behaviour in 9.5; prior to this we simply ignored a request to pause if
        -: 5408:	 * hot_standby = off, which was surprising behaviour.
        -: 5409:	 */
       38: 5410:	if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PAUSE &&
       19: 5411:		!EnableHotStandby)
        #: 5412:		recoveryTargetAction = RECOVERY_TARGET_ACTION_SHUTDOWN;
        -: 5413:
        -: 5414:	/*
        -: 5415:	 * If user specified recovery_target_timeline, validate it or compute the
        -: 5416:	 * "latest" value. We can't do this until after we've gotten the restore
        -: 5417:	 * command and set InArchiveRecovery, because we need to fetch timeline
        -: 5418:	 * history files from the archive.
        -: 5419:	 */
       19: 5420:	if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_NUMERIC)
        -: 5421:	{
        #: 5422:		TimeLineID	rtli = recoveryTargetTLIRequested;
        -: 5423:
        -: 5424:		/* Timeline 1 does not have a history file, all else should */
        #: 5425:		if (rtli != 1 && !existsTimeLineHistory(rtli))
        #: 5426:			ereport(FATAL,
        -: 5427:					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
        -: 5428:					 errmsg("recovery target timeline %u does not exist",
        -: 5429:							rtli)));
        #: 5430:		recoveryTargetTLI = rtli;
        -: 5431:	}
       19: 5432:	else if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
        -: 5433:	{
        -: 5434:		/* We start the "latest" search from pg_control's timeline */
       19: 5435:		recoveryTargetTLI = findNewestTimeLine(recoveryTargetTLI);
        -: 5436:	}
        -: 5437:	else
        -: 5438:	{
        -: 5439:		/*
        -: 5440:		 * else we just use the recoveryTargetTLI as already read from
        -: 5441:		 * ControlFile
        -: 5442:		 */
        -: 5443:		Assert(recoveryTargetTimeLineGoal == RECOVERY_TAR
Re: incorrect xlog.c coverage report
On 2019-May-31, Alvaro Herrera wrote:

> On 2019-May-30, Michael Paquier wrote:
>
> > On Wed, May 29, 2019 at 12:09:08PM -0400, Alvaro Herrera wrote:
> > > Are there objections to doing that now on the master branch?
> >
> > Adding the flush call just on HEAD is fine for me. Not sure that
> > there is an actual reason to back-patch that.
>
> Okay ... I did that (patch attached), and while my new __gcov_flush()
> shows as covered after I run the src/test/recovery tests, the function I
> mentioned earlier (validateRecoveryParameters) is not any more covered
> after the patch than it was before.

I forgot to mention that this patch produces a new warning:

/pgsql/source/master/src/backend/tcop/postgres.c: In function 'quickdie':
/pgsql/source/master/src/backend/tcop/postgres.c:2737:2: warning: implicit declaration of function '__gcov_flush'; did you mean 'pq_flush'? [-Wimplicit-function-declaration]
  __gcov_flush();
  ^~~~
  pq_flush

I couldn't find a way to squelch that. gcc devs in their infinite wisdom
don't provide a prototype for it, apparently.

Another thing I thought about was adding a configure test for the function,
but a) apparently the function is very old so it's not necessary, and b) it
fails anyway apparently because of that warning.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: incorrect xlog.c coverage report
Alvaro Herrera writes:
> I forgot to mention that this patch produces a new warning:
>
> /pgsql/source/master/src/backend/tcop/postgres.c: In function 'quickdie':
> /pgsql/source/master/src/backend/tcop/postgres.c:2737:2: warning: implicit declaration of function '__gcov_flush'; did you mean 'pq_flush'? [-Wimplicit-function-declaration]
>   __gcov_flush();
>   ^~~~
>   pq_flush
>
> I couldn't find a way to squelch that. gcc devs in their infinite
> wisdom don't provide a prototype for it, apparently.

Ugh. So let's supply our own prototype, presumably it's just

	extern void __gcov_flush(void);

			regards, tom lane
Re: Comment typo in tableam.h
On Fri, 31 May 2019 at 05:02, Antonin Houska wrote: > Please see the diff attached. Pushed. Thanks. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Time range
timetzrange is also missing. In my database I have:

CREATE TYPE timerange AS RANGE (SUBTYPE = time);
COMMENT ON TYPE timerange IS 'range of times without time zone';
GRANT USAGE ON TYPE timerange TO PUBLIC;

CREATE TYPE timetzrange AS RANGE (SUBTYPE = timetz);
COMMENT ON TYPE timetzrange IS 'range of times with time zone';
GRANT USAGE ON TYPE timetzrange TO PUBLIC;

The intent is that these range types are the same as if they were built in.
I don't believe I have ever used timetzrange but I did it for completeness.

Given that other built-in types have built-in range types, I think that the
time and timetz types should also have built-in range types.

On Fri, 31 May 2019 at 11:40, Thomas Kellerer wrote:
>
> Donald Shtjefni schrieb am 31.05.2019 um 13:35:
> > I was wondering why there is no range type for time without time zone.
> > I think it could be useful for someone. Would it be good if I opened a PR?
>
> you can easily create one:
>
>    create type timerange as range (subtype = time);
>
> Thomas
Re: Time range
Isaac Morland writes:
> Given that other built-in types have built-in range types, I think that the
> time and timetz types should also have built-in range types.

There's only a very small number of built-in range types:

postgres=# select typname from pg_type where typtype = 'r' order by 1;
  typname
-----------
 daterange
 int4range
 int8range
 numrange
 tsrange
 tstzrange
(6 rows)

I don't think there's any appetite for creating built-in range types
across-the-board. The time and timetz types are pretty little used (with
good reason), so leaving them out of this list seems fine to me.

			regards, tom lane
Re: using index or check in ALTER TABLE SET NOT NULL
On Fri, 3 May 2019 at 19:06, David Rowley wrote: > FWIW, I'm in favour of changing ATTACH PARTITION's INFO messages to > DEBUG1 for PG12 to align it to what bbb96c3704c did. > > Anyone else? Does anyone think we shouldn't change the INFO message in ATTACH PARTITION to a DEBUG1 in PG12? I think it makes sense make this change so that it matches the behaviour of the output of ALTER TABLE .. ALTER COLUMN .. SET NOT NULL when it uses a CHECK constraint. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
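[Editor's note: for context, a sketch of the INFO message in question, with hypothetical tables; the exact message wording is approximate.]

    CREATE TABLE parted (a int) PARTITION BY LIST (a);
    CREATE TABLE part1 (a int NOT NULL CHECK (a = 1));

    -- the CHECK constraint already implies the partition constraint, so no
    -- validation scan is needed and an INFO-level message reports that
    ALTER TABLE parted ATTACH PARTITION part1 FOR VALUES IN (1);
    -- INFO:  partition constraint for table "part1" is implied by existing constraints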
Re: coverage increase for worker_spi
On 2019-May-30, Alvaro Herrera wrote:

> One thing I noticed while writing it, though, is that worker_spi uses
> the postgres database, instead of the contrib_regression database that
> was created for it. And we create a schema and a table there. This is
> going to get some eyebrows raised, I think, so I'll look into fixing
> that as a bugfix before getting this commit in.

Another thing I noticed when fixing *this*, in turn, is that if you load
worker_spi in shared_preload_libraries then the contrib_regression database
doesn't exist by the point that runs, so those workers fail to start. The
dynamic one does start in the configured database. I guess we could just
ignore the failures and just rely on the dynamic worker.

I ended up with these two patches. I'm not sure about pushing separately.
It seems pointless to backport the "fix" to back branches anyway.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index c1878dd694..bc9ef64a50 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -55,6 +55,7 @@ static volatile sig_atomic_t got_sigterm = false;
 
 /* GUC variables */
 static int	worker_spi_naptime = 10;
 static int	worker_spi_total_workers = 2;
+static char *worker_spi_database = NULL;
 
 typedef struct worktable
@@ -179,7 +180,7 @@ worker_spi_main(Datum main_arg)
 	BackgroundWorkerUnblockSignals();
 
 	/* Connect to our database */
-	BackgroundWorkerInitializeConnection("postgres", NULL, 0);
+	BackgroundWorkerInitializeConnection(worker_spi_database, NULL, 0);
 
 	elog(LOG, "%s initialized with %s.%s",
 		 MyBgworkerEntry->bgw_name, table->schema, table->name);
@@ -339,6 +340,15 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomStringVariable("worker_spi.database",
+							   "Database to connect to.",
+							   NULL,
+							   &worker_spi_database,
+							   "postgres",
+							   PGC_POSTMASTER,
+							   0,
+							   NULL, NULL, NULL);
+
 	/* set up common data for all our workers */
 	memset(&worker, 0, sizeof(worker));
 	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
-- 
2.17.1

diff --git a/src/test/modules/worker_spi/Makefile b/src/test/modules/worker_spi/Makefile
index 7cdb33c9df..cbf9b2e37f 100644
--- a/src/test/modules/worker_spi/Makefile
+++ b/src/test/modules/worker_spi/Makefile
@@ -6,6 +6,14 @@ EXTENSION = worker_spi
 DATA = worker_spi--1.0.sql
 PGFILEDESC = "worker_spi - background worker example"
 
+REGRESS = worker_spi
+
+# enable our module in shared_preload_libraries for dynamic bgworkers
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
+
+# Disable installcheck to ensure we cover dynamic bgworkers.
+NO_INSTALLCHECK = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/worker_spi/dynamic.conf b/src/test/modules/worker_spi/dynamic.conf
new file mode 100644
index 00..bfe015f664
--- /dev/null
+++ b/src/test/modules/worker_spi/dynamic.conf
@@ -0,0 +1,2 @@
+shared_preload_libraries = worker_spi
+worker_spi.database = contrib_regression
diff --git a/src/test/modules/worker_spi/expected/worker_spi.out b/src/test/modules/worker_spi/expected/worker_spi.out
new file mode 100644
index 00..dc0a79bf75
--- /dev/null
+++ b/src/test/modules/worker_spi/expected/worker_spi.out
@@ -0,0 +1,50 @@
+CREATE EXTENSION worker_spi;
+SELECT worker_spi_launch(4) IS NOT NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- wait until the worker completes its initialization
+DO $$
+DECLARE
+	visible bool;
+	loops int := 0;
+BEGIN
+	LOOP
+		visible := table_name IS NOT NULL
+			FROM information_schema.tables
+			WHERE table_schema = 'schema4' AND table_name = 'counted';
+		IF visible OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+INSERT INTO schema4.counted VALUES ('total', 0), ('delta', 1);
+SELECT pg_reload_conf();
+ pg_reload_conf 
+----------------
+ t
+(1 row)
+
+-- wait until the worker has processed the tuple we just inserted
+DO $$
+DECLARE
+	count int;
+	loops int := 0;
+BEGIN
+	LOOP
+		count := count(*) FROM schema4.counted WHERE type = 'delta';
+		IF count = 0 OR loops > 120 * 10 THEN EXIT; END IF;
+		PERFORM pg_sleep(0.1);
+		loops := loops + 1;
+	END LOOP;
+END
+$$;
+SELECT * FROM schema4.counted;
+ type  | value 
+-------+-------
+ total | 1
+(1 row)
+
diff --git a/src/test/modules/worker_spi/sql/worker_spi.sql b/src/test/modules/worker_spi/sql/worker_spi.sql
new file mode 100644
index 00..4683523b29
--- /dev/null
+++ b/src/test/modules/worker_spi/sql/worker_spi.sql
@@ -0,0 +1,35 @@
+CREATE EXTENSION worker_spi;
+SELECT worker_spi_launch(4) IS NOT NULL;
+-- wait until the worker completes its initialization
+DO $$
+DECLARE
+	visible bool;
+	loops int := 0;
+BEGIN
+	LOOP
+
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
I've rebased the patch on master and confirmed make check-world passes.

incremental-sort-27.patch
Description: Binary data
Re: PostgreSQL vs SQL/XML Standards
On 2018-Oct-24, Chapman Flack wrote:

> Inspired by the wiki page on PostgreSQL vs SQL Standard in general,
> I have made another wiki page specifically about $subject. I hope
> this was not presumptuous, and invite review / comment. I have not
> linked to it from any other page yet.

In the SQL Standards session at the Unconference, I found out that the
committee produced this technical report in 2011:
https://www.iso.org/standard/41528.html "XQuery Regular Expression Support in
SQL"; it bears on our lack of XQuery support in our SQL/XML implementation.
That content is probably relevant for this topic, even if we cannot do much
about it.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: cpluspluscheck vs vpath
Hi,

On 2019-05-31 09:56:45 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Attached is a small patch allowing cpluspluscheck to run from different
> > directories. It needs the src and build directories for that,
> > unsurprisingly.
>
> No objection to changing this, but you could reduce the surprise
> factor for existing workflows with a couple of defaults for the
> arguments --- allow srcdir to default to "." and builddir to default
> to the same as srcdir.

Pushed, with that modification.

Would be kinda nice to do the check in parallel...

- Andres
Re: compiling PL/pgSQL plugin with C++
I wrote: > There are various other minor issues, but they generally look > fixable with little consequence. I've now pushed your patch and additional minor fixes, and we've expanded cpluspluscheck's coverage so we don't miss such issues in future. regards, tom lane
Docs for refresh materialized view concurrently
Speaking with Robert today at pgcon, I happily discovered that REFRESH MATERIALIZED VIEW CONCURRENTLY actually only updates rows that have changed since the last refresh, rather than rewriting every row.

In my curiosity, I went to the docs, and found that this detail is not mentioned anywhere. This is a great feature that is being undersold, and it should be made clear in the docs. In my experience, there can be tons of WAL generated from large materialized views and the normal REFRESH (without CONCURRENTLY). I had assumed the only benefit of CONCURRENTLY was to allow concurrent access to the table. But actually the incremental refresh is a much bigger win for us in reducing WAL overhead drastically.

I've not submitted a patch before, and have a few suggestions I'd like feedback on before I write one (for the docs only).

1. First, even this summary looks untrue:

   REFRESH MATERIALIZED VIEW — replace the contents of a materialized view.

"replace" is not really accurate with the CONCURRENTLY option, because in fact it only updates changed rows. Perhaps instead of "replace":

   - "replace or incrementally update the contents of a materialized view".

Also, the Description part has the same inaccuracy: "completely replaces the contents of a materialized view. The old contents are discarded." That is not true with CONCURRENTLY, correct? Only the old contents *which have changed* are discarded.

2. Lastly, I would suggest adding something like the following to the first paragraph under CONCURRENTLY:

   - With this option, only actual changed rows are updated in the materialized view, which can significantly reduce the amount of write churn and WAL traffic from a refresh if only a small number of rows will change with each refresh. It is recommended to have a unique index on the materialized view if possible, which will improve the performance of a concurrent refresh.

Please correct me if my understanding of this is not right.

3. On a different note, none of the documentation on materialized views notes that they can only be LOGGED. This should be noted, or at least it should be noted that one cannot create an UNLOGGED materialized view in the same place it says that one cannot create a temporary one (under Description in CREATE MATERIALIZED VIEW).

Thanks!
Jeremy Finzel
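[Editor's note: for concreteness, a minimal sketch with made-up names (prices_summary and its base table are hypothetical). Note that CONCURRENTLY requires at least one unique index on the materialized view that covers all rows.]

    CREATE MATERIALIZED VIEW prices_summary AS
        SELECT feedcode, max(updated_at) AS last_update
        FROM prices
        GROUP BY feedcode;

    -- required for REFRESH ... CONCURRENTLY
    CREATE UNIQUE INDEX ON prices_summary (feedcode);

    -- refreshes without locking out readers, writing only the rows that changed
    REFRESH MATERIALIZED VIEW CONCURRENTLY prices_summary;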
Re: Index Skip Scan
After some talks with Jesper at PGCon about the Index Skip Scan, I started testing this patch, because it seems to have great potential in speeding up many of our queries (great conference by the way, really enjoyed my first time being there!). I haven't looked in depth at the code itself, but I focused on some testing with real data that we have.

Let me start by sketching our primary use case for this, as it is similar, but slightly different than what was discussed earlier in this thread. I think this use case is something a lot of people who handle timeseries data have.

Our database has many tables with timeseries data. We don't update rows, but just insert new rows each time. One example of this would be a table with prices for instruments. Instruments are identified by a column called feedcode. Prices of an instrument update with a certain frequency. Each time it updates we insert a new row with the new value and the timestamp at that time. So in the simplest form, you could see it as a table like this:

create table prices (feedcode text, updated_at timestamptz, value float8); -- there'll be some other columns as well, this is just an example
create unique index on prices (feedcode, updated_at desc);

This table perfectly fits the criteria for the index skip scan as there are relatively few distinct feedcodes, but each of them has quite a lot of price inserts (imagine 1000 distinct feedcodes, each of them having one price per second). We normally partition our data by a certain time interval, so let's say we're just looking at one day of prices here. We have other cases with higher update frequencies and/or more distinct values though.

Typical queries on this table involve querying the price at a certain point in time, or simply querying the latest update. If we know the feedcode, this is easy:

select * from prices where feedcode='A' and updated_at <= '2019-06-01 12:00' order by feedcode, updated_at desc limit 1

Unfortunately, this gets hard if you want to know the price of everything at a certain point in time. The query then becomes:

select distinct on (feedcode) * from prices where updated_at <= '2019-06-01 12:00' order by feedcode, updated_at desc

Up until now (even with this patch) this uses a regular index scan + a unique node which scans the full index, which is terribly slow and is also not constant - as the table grows it becomes slower and slower.

Obviously there are currently already ways to speed this up using the recursive loose index scan, but I think everybody agrees that those are pretty unreadable. However, since they're also several orders of magnitude faster, we actually use them everywhere. Eg.

-- certain point in time
-- first query *all* distinct feedcodes (disregarding time), then do an index scan for every feedcode found to see if it has an update in the time window we're interested in
-- this essentially means we're doing 2*N index scans
with recursive t as (
   select feedcode from prices order by feedcode, updated_at desc limit 1
   union all
   select n.feedcode from t cross join lateral (select feedcode from prices where feedcode > t.feedcode order by feedcode, updated_at desc limit 1) n
)
select n.* from t cross join lateral (select * from prices where feedcode=t.feedcode and updated_at <= '2019-06-01 12:00' order by feedcode, updated_at desc limit 1) n

-- just latest value
-- if we're interested in just the latest value, it can actually be optimized to just N index scans like this.
-- to make it even more confusing - there's a tradeoff here.. if you're querying a timestamp close to the latest available timestamp, it is often faster to use this method anyway and just put the filter for updated_at inside this query. This avoids the overhead of 2*N index scans, at the expense that the LIMIT 1 may have to scan several tuples before finding one that matches the timestamp criteria. With the 2*N method above we're guaranteed that the first tuple it sees is the correct tuple, but we're doing many more scans...
with recursive t as (
   select * from prices order by feedcode, updated_at desc limit 1
   union all
   select n.* from t cross join lateral (select * from prices where feedcode > t.feedcode order by feedcode, updated_at desc limit 1) _
)
select * from t

I hope this makes our current situation clear. Please do ask if I need to elaborate on something here.

So what changes with this patch? The great thing is that the recursive CTE is not required anymore! This is a big win for readability and it helps performance as well. It makes everything much better. I am really happy with these results. If the index skip scan triggers, it is easily over 100x faster than the naive 'distinct on' query in earlier versions of Postgres. It is also quite a bit faster than the recursive CTE version of the query.

I have a few remarks though. I tested some of our queries with the patch and found that the following
Re: Index Skip Scan
Actually I'd like to add something to this. I think I've found a bug in the current implementation. Would someone be able to check?

Given a table definition of (market text, feedcode text, updated_at timestamptz, value float8) and an index on (market, feedcode, updated_at desc) (note that this table slightly deviates from what I described in my previous mail) and filling it with data.

The following query uses an index skip scan and returns just 1 row (incorrect!):

select distinct on (market, feedcode) market, feedcode
from streams.base_price
where market='TEST'

The following query still uses the regular index scan and returns many more rows (correct):

select distinct on (market, feedcode) *
from streams.base_price
where market='TEST'

It seems that partially filtering on one of the distinct columns triggers incorrect behavior where too many rows in the index are skipped.

-Floris
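[Editor's note: for anyone wanting to reproduce this, a setup sketch matching the description above; the streams schema and the synthetic data are assumptions.]

    create schema if not exists streams;
    create table streams.base_price (market text, feedcode text, updated_at timestamptz, value float8);
    create index on streams.base_price (market, feedcode, updated_at desc);

    -- synthetic data: two markets, many feedcodes, many updates per feedcode
    insert into streams.base_price
    select m, 'FC' || f, now() - (u || ' seconds')::interval, random()
    from unnest(array['TEST', 'OTHER']) m,
         generate_series(1, 100) f,
         generate_series(1, 100) u;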
Re: coverage additions
On 2019-May-30, Michael Paquier wrote:

> On Thu, May 30, 2019 at 01:52:20PM -0400, Alvaro Herrera wrote:
> > If there are other obvious improvements to be had, please let me know.
> > (We have PG_TEST_EXTRA="ssl ldap" currently, do we have any more extra
> > tests now?)
>
> You can add kerberos to this list, to give:
> PG_TEST_EXTRA='ssl ldap kerberos'

Ah, now I remember that I tried this before, but it requires some extra
packages installed in the machine I think, and those create running services.

Did you note that src/backend/libpq does not even list the gssapi file?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services