Re: C testing for Postgres
On Mon, Jul 1, 2019 at 11:26 PM Michael Paquier wrote: > On Fri, Jun 28, 2019 at 09:42:54AM -0400, Adam Berlin wrote: > > If we were to use this tool, would the community want to vendor the > > framework in the Postgres repository, or keep it in a separate repository > > that produces a versioned shared library? > > Well, my take is that having a base infrastructure for a fault > injection framework is something that would prove to be helpful, and > that I am not against having something in core. While working on > various issues, I have found myself many times doing crazy stat() > calls on an on-disk file to enforce an elog(ERROR) or elog(FATAL), and > in my experience fault points are very *hard* to place correctly > because they should not be single-purpose things. > > Now, we don't want to end up with an infinite number of fault points in the > tree, but being able to enforce a failure at a point added for a patch > using a SQL command can make the integration of tests in a patch > easier for reviewers, for example isolation tests with elog(ERROR) > (like what has been discussed for b4721f3). Just to clarify: what Adam is proposing in this thread is *not* a fault injection framework.
Re: Replacing the EDH SKIP primes
On 2019-06-18 13:05, Daniel Gustafsson wrote: > This was touched upon, but never really discussed AFAICT, back when the EDH > parameters were reworked a few years ago. Instead of replacing with custom > ones, as suggested in [1], we might as well replace with standardized ones as > this is a fallback. Custom ones won't make it more secure, just add more work > for the project. The attached patch replaces the SKIP prime with the 2048-bit > MODP group from RFC 3526, which is the same change that OpenSSL did a few years > back [2]. It appears that we have consensus to go ahead with this. I was wondering whether the provided binary blob contained any checksums or other internal checks. How would we know whether it contains transposed characters or replaces a 1 with an I or an l? If I just randomly edit the blob, the ssl tests still pass. (The relevant load_dh_buffer() call does get called by the tests.) How can we make sure we actually got a good copy? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add parallelism and glibc dependent only options to reindexdb
On 2019-07-01 22:46, Alvaro Herrera wrote: > On 2019-Jul-02, Thomas Munro wrote: >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: >>> Even if that's just me being delusional, I'd still prefer Alvaro's >>> approach to have distinct switches for each collation system. >> >> Makes sense. But why use the name "glibc" in the code and user >> interface? The name of the collation provider in PostgreSQL is "libc" >> (for example in the CREATE COLLATION command), and the problem applies >> no matter who makes your libc. > > Makes sense. "If your libc is glibc and you go across an upgrade over > version X, please use --include-rule=libc-collation" I think it might be better to put the logic of what indexes are collation affected etc. into the backend REINDEX command. We are likely to enhance the collation version and dependency tracking over time, possibly soon, possibly multiple times, and it would be very cumbersome to have to keep updating reindexdb with this. Moreover, since for performance you likely want to reindex by table, implementing a logic of "reindex all collation-affected indexes on this table" would be much easier to do in the backend. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
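To make the trade-off concrete, the kind of catalog query a client-side filter implies looks roughly like this (a sketch only, not taken from the patch); note that it can only see explicitly recorded collation dependencies and therefore misses indexes that rely on the database default collation, which is one more reason to let the backend own this logic:

    SELECT DISTINCT d.objid::regclass AS index_name
    FROM pg_depend d
         JOIN pg_collation c ON d.refclassid = 'pg_collation'::regclass
                            AND d.refobjid = c.oid
    WHERE d.classid = 'pg_class'::regclass
      AND d.objid IN (SELECT indexrelid FROM pg_index)
      AND c.collprovider = 'c';   -- 'c' = libc, 'i' = ICU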
Re: [PATCH] Implement uuid_version()
On 2019-06-30 14:50, Fabien COELHO wrote: > I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if > it is available in core? That would probably require an extension version update dance in pgcrypto. I'm not sure if it's worth that. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Cache lookup errors with functions manipulation object addresses
On Thu, Feb 21, 2019 at 04:40:13PM +0900, Michael Paquier wrote: > Rebased version fixing some conflicts with HEAD. And rebased version for this stuff on HEAD (66c5bd3), giving visibly v16. -- Michael From 76ef961e02e0276af5516bdce6e0e543526db5b2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 2 Jul 2019 16:07:06 +0900 Subject: [PATCH v16 1/3] Add flag to format_type_extended to enforce NULL-ness If a type scanned is undefined, type format routines have two behaviors depending on if FORMAT_TYPE_ALLOW_INVALID is defined by the caller: - Generate an error - Return undefined type name "???" or "-". The current interface is unhelpful for callers willing to format properly a type name, but still make sure that the type is defined as there could be types matching the undefined strings. In order to counter that, add a new flag called FORMAT_TYPE_FORCE_NULL which returns a NULL result instead of "??? or "-" which does not generate an error. They will be used for future patches to improve the SQL interface for object addresses. --- src/backend/utils/adt/format_type.c | 20 src/include/utils/builtins.h| 1 + 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c index 6ae2a31345..8ee0e10c29 100644 --- a/src/backend/utils/adt/format_type.c +++ b/src/backend/utils/adt/format_type.c @@ -96,6 +96,9 @@ format_type(PG_FUNCTION_ARGS) * - FORMAT_TYPE_ALLOW_INVALID * if the type OID is invalid or unknown, return ??? or such instead * of failing + * - FORMAT_TYPE_FORCE_NULL + * if the type OID is invalid or unknown, return NULL instead of ??? + * or such * - FORMAT_TYPE_FORCE_QUALIFY * always schema-qualify type names, regardless of search_path * @@ -114,13 +117,20 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags) char *buf; bool with_typemod; - if (type_oid == InvalidOid && (flags & FORMAT_TYPE_ALLOW_INVALID) != 0) - return pstrdup("-"); + if (type_oid == InvalidOid) + { + if ((flags & FORMAT_TYPE_FORCE_NULL) != 0) + return NULL; + else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0) + return pstrdup("-"); + } tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(type_oid)); if (!HeapTupleIsValid(tuple)) { - if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0) + if ((flags & FORMAT_TYPE_FORCE_NULL) != 0) + return NULL; + else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0) return pstrdup("???"); else elog(ERROR, "cache lookup failed for type %u", type_oid); @@ -143,7 +153,9 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags) tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_base_type)); if (!HeapTupleIsValid(tuple)) { - if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0) + if ((flags & FORMAT_TYPE_FORCE_NULL) != 0) +return NULL; + else if ((flags & FORMAT_TYPE_ALLOW_INVALID) != 0) return pstrdup("???[]"); else elog(ERROR, "cache lookup failed for type %u", type_oid); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index a3cd7d26fa..a1e418cee7 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -109,6 +109,7 @@ extern Datum numeric_float8_no_overflow(PG_FUNCTION_ARGS); #define FORMAT_TYPE_TYPEMOD_GIVEN 0x01 /* typemod defined by caller */ #define FORMAT_TYPE_ALLOW_INVALID 0x02 /* allow invalid types */ #define FORMAT_TYPE_FORCE_QUALIFY 0x04 /* force qualification of type */ +#define FORMAT_TYPE_FORCE_NULL 0x08 /* force NULL if undefined */ extern char *format_type_extended(Oid type_oid, int32 typemod, bits16 flags); extern char *format_type_be(Oid 
type_oid); -- 2.20.1 From ad8f7f57f91b243926740d5a224357f54295ee34 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 2 Jul 2019 16:07:29 +0900 Subject: [PATCH v16 2/3] Refactor format procedure and operator APIs to be more modular This introduce a new set of extended routines for procedure and operator lookups, with a flags bitmask argument that can modify the default behavior: - Force schema qualification - Force NULL as result instead of a numeric OID for an undefined object. --- src/backend/utils/adt/regproc.c | 61 +++-- src/include/utils/regproc.h | 10 ++ 2 files changed, 52 insertions(+), 19 deletions(-) diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c index 17a7f6c9d8..8d6d73039b 100644 --- a/src/backend/utils/adt/regproc.c +++ b/src/backend/utils/adt/regproc.c @@ -40,8 +40,6 @@ #include "utils/regproc.h" #include "utils/varlena.h" -static char *format_operator_internal(Oid operator_oid, bool force_qualify); -static char *format_procedure_internal(Oid procedure_oid, bool force_qualify); static void parseNameAndArgTypes(const char *string, bool allowNone, List **names, int *nargs, Oid *argtypes); @@ -322,24 +320,27 @@ to_regprocedure(PG_FUNCTION_ARGS) char * format_procedure(Oid p
Re: POC: converting Lists into arrays
On Tue, Jul 2, 2019 at 1:27 AM Tom Lane wrote: > > So I think this is a win, and attached is v7. > Not related to the diff v6..v7, but shouldn't we throw additionally a memset() with '\0' before calling pfree(): +ListCell *newelements; + +newelements = (ListCell *) +MemoryContextAlloc(GetMemoryChunkContext(list), + new_max_len * sizeof(ListCell)); +memcpy(newelements, list->elements, + list->length * sizeof(ListCell)); +pfree(list->elements); +list->elements = newelements; Or is this somehow ensured by debug pfree() implementation or does it work differently together with Valgrind? Otherwise it seems that the calling code can still be hanging onto a list element from a freed chunk and (rather) happily accessing it, as opposed to almost ensured crash if that is zeroed before returning from enlarge_list(). Cheers, -- Alex
Re: Replacing the EDH SKIP primes
On Tue, Jul 02, 2019 at 08:14:25AM +0100, Peter Eisentraut wrote: > It appears that we have consensus to go ahead with this. Yeah, I was planning to look at that one next. Or perhaps you would like to take care of it, Peter? > > I was wondering whether the provided binary blob contained any checksums > or other internal checks. How would we know whether it contains > transposed characters or replaces a 1 with an I or an l? If I just randomly > edit the blob, the ssl tests still pass. (The relevant load_dh_buffer() > call does get called by the tests.) How can we make sure we actually > got a good copy? PEM_read_bio_DHparams() runs some checks on the Diffie-Hellman parameters, but it is up to the caller to make sure that what it provides is actually a prime, which is what makes cracking harder, no? RFC 3526 gives a small formula for this prime, which we can use to double-check the patch. -- Michael
Re: Refactoring base64 encoding and decoding into a safer interface
> On 2 Jul 2019, at 07:41, Michael Paquier wrote: >> In the below passage, we leave the input buffer with a non-complete >> encoded string. Should we memset the buffer to zero to avoid the >> risk that code which fails to check the return value believes it has >> an encoded string? > > Hmm. Good point. I have not thought of that, and your suggestion > makes sense. > > Another question is if we'd want to actually use explicit_bzero() > here, but that could be a discussion on this other thread, except if > the patch discussed there is merged first: > https://www.postgresql.org/message-id/42d26bde-5d5b-c90d-87ae-6cab875f7...@2ndquadrant.com I’m not sure we need to go to that length, but I don’t have overly strong opinions. I think of this more like a case of “we’ve changed the API with new error cases that we didn’t handle before, so we’re being a little defensive to help you avoid subtle bugs”. > Attached is an updated patch. Looks good, passes tests, provides value to the code. Bumping this to ready for committer as I have no more comments to add. cheers ./daniel
Re: [PATCH] Speedup truncates of relation forks
On Mon, Jun 17, 2019 at 5:01 PM Jamison, Kirk wrote: > > Hi all, > > Attached is the v2 of the patch. I added the optimization that Sawada-san > suggested for DropRelFileNodeBuffers, although I did not acquire the lock > when comparing the minBlock and target block. > > There's actually a comment written in the source code that we could > pre-check buffer tag for forkNum and blockNum, but given that FSM and VM > blocks are small compared to main fork's, the additional benefit of doing so > would be small. > > >* We could check forkNum and blockNum as well as the rnode, but the > >* incremental win from doing so seems small. > > I personally think it's alright not to include the suggested pre-checking. > If that's the case, we can just follow the patch v1 version. > > Thoughts? > > Comments and reviews from other parts of the patch are also very much welcome. > Thank you for updating the patch. Here is the review comments for v2 patch. --- - * visibilitymap_truncate - truncate the visibility map + * visibilitymap_mark_truncate - mark the about-to-be-truncated VM + * + * Formerly, this function truncates VM relation forks. Instead, this just + * marks the dirty buffers. * * The caller must hold AccessExclusiveLock on the relation, to ensure that * other backends receive the smgr invalidation event that this function sends * before they access the VM again. * I don't think we should describe about the previous behavior here. Rather we need to describe what visibilitymap_mark_truncate does and what it returns to the caller. I'm not sure that visibilitymap_mark_truncate function name is appropriate here since it actually truncate map bits, not only marking. Perhaps we can still use visibilitymap_truncate or change to visibilitymap_truncate_prepare, or something? Anyway, this function truncates only tail bits in the last remaining map page and we can have a rule that the caller must call smgrtruncate() later to actually truncate pages. The comment of second paragraph is now out of date since this function no longer sends smgr invalidation message. Is it worth to leave the current visibilitymap_truncate function as a shortcut function, instead of replacing? That way we don't need to change pg_truncate_visibility_map function. The same comments are true for MarkFreeSpaceMapTruncateRel. --- + ForkNumber forks[MAX_FORKNUM]; + BlockNumber blocks[MAX_FORKNUM]; + BlockNumber new_nfsmblocks = InvalidBlockNumber;/* FSM blocks */ + BlockNumber newnblocks = InvalidBlockNumber;/* VM blocks */ + int nforks = 0; I think that we can have new_nfsmblocks and new_nvmblocks and wipe out the comments. --- - /* Truncate the FSM first if it exists */ + /* +* We used to truncate FSM and VM forks here. Now we only mark the +* dirty buffers of all forks about-to-be-truncated if they exist. +*/ + Again, I think we need the description of current behavior rather than the history, except the case where the history is important. --- - /* -* Make an XLOG entry reporting the file truncation. -*/ + /* Make an XLOG entry reporting the file truncation */ Unnecessary change. --- + /* +* We might as well update the local smgr_fsm_nblocks and smgr_vm_nblocks +* setting. smgrtruncate sent an smgr cache inval message, which will cause +* other backends to invalidate their copy of smgr_fsm_nblocks and +* smgr_vm_nblocks, and this one too at the next command boundary. But this +* ensures it isn't outright wrong until then. 
+*/ + if (rel->rd_smgr) + { + rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks; + rel->rd_smgr->smgr_vm_nblocks = newnblocks; + } new_nfsmblocks and newnblocks could be InvalidBlockNumber when the forks are already small enough. So the above code sets incorrect values for smgr_{fsm,vm}_nblocks. Also, I wonder if we can move the above code into smgrtruncate. Otherwise we always need to set smgr_{fsm,vm}_nblocks after smgrtruncate, which is inconvenient. --- + /* Update the local smgr_fsm_nblocks and smgr_vm_nblocks setting */ + if (rel->rd_smgr) + { + rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks; + rel->rd_smgr->smgr_vm_nblocks = newnblocks; + } The same as above. And do we need to set smgr_{fsm,vm}_nblocks even though the fake relcache entry is freed soon after? --- + /* Get the lower bound of target block number we're interested in */ + for (i = 0; i < nforks; i++) + { + if (!BlockNumberIsValid(minBlock) || + minBlock > firstDelBlock[i]) + minBlock = firstDelBlock[i]; + } Maybe we can declare 'i' in the for statement (i.e. for (int i = 0; ...)) at every outer loop in this
Re: Increasing default value for effective_io_concurrency?
On Mon, Jul 01, 2019 at 04:32:15PM -0700, Andres Freund wrote: Hi, On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote: I think we should consider changing the effective_io_concurrency default value, i.e. the guc that determines how many pages we try to prefetch in a couple of places (the most important being Bitmap Heap Scan). Maybe we need to improve the way it's used / implemented instead - it seems just too hard to determine the correct setting as currently implemented. Sure, if we can improve those bits, that'd be nice. It's definitely hard to decide what value is appropriate for a given storage system. But I'm not sure it's something we can do easily, considering how opaque the hardware is for us ... I wonder.

In some cases it helps a bit, but a bit higher value (4 or 8) performs significantly better. Consider for example this "sequential" data set from the 6xSSD RAID system (x-axis shows e_i_c values, pct means what fraction of pages matches the query): I assume that the y axis is the time of the query? The y-axis is the fraction of the table matched by the query. The values in the contingency table are query durations (average of 3 runs, but the numbers were very close). How much data is this compared to memory available for the kernel to do caching? Multiple of RAM, in all cases. The queries were hitting random subsets of the data, and the page cache was dropped after each test, to eliminate cross-query caching.

    pct        0        1        4       16       64      128
    ------------------------------------------------------------
      1    25990    18624     3269     2219     2189     2171
      5    88116    60242    14002     8663     8560     8726
     10   120556    99364    29856    17117    16590    17383
     25   101080   184327    79212    47884    46846    46855
     50   130709   309857   163614   103001    94267    94809
     75   126516   435653   248281   156586   139500   140087

Compared to the e_i_c=0 case, it looks like this:

    pct        1        4       16       64      128
    ---------------------------------------------------
      1      72%      13%       9%       8%       8%
      5      68%      16%      10%      10%      10%
     10      82%      25%      14%      14%      14%
     25     182%      78%      47%      46%      46%
     50     237%     125%      79%      72%      73%
     75     344%     196%     124%     110%     111%

So for 1% of the table, e_i_c=1 is faster by about ~30%, but with e_i_c=4 (or more) it's ~10x faster. This is a fairly common pattern, not just on this storage system. The e_i_c=1 case can perform pretty poorly, especially when the query matches a large fraction of the table - for example here it's 2-3x slower compared to no prefetching, and higher e_i_c values limit the damage quite a bit. I'm surprised the slowdown for small e_i_c values is that big - it's not obvious to me why that is. Which os / os version / filesystem / io scheduler / io scheduler settings were used? This is the system with NVMe storage, and SATA RAID: Linux bench2 4.19.26 #1 SMP Sat Mar 2 19:50:14 CET 2019 x86_64 Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz GenuineIntel GNU/Linux /dev/nvme0n1p1 on /mnt/data type ext4 (rw,relatime) /dev/md0 on /mnt/raid type ext4 (rw,relatime,stripe=48) The other system looks pretty much the same (same kernel, ext4). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
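For completeness, the knob under discussion can already be overridden per tablespace rather than only globally, which is often the more practical way to handle mixed storage (a sketch; the tablespace name is made up):

    -- per-tablespace override for a fast SSD array
    ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 8);
    -- or change the global default and reload
    ALTER SYSTEM SET effective_io_concurrency = 4;
    SELECT pg_reload_conf();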
Attached partition not considering altered column properties of root partition.
Hi, In the below test case, when I changed the storage option for the root partition, the newly attached partition does not include the changed storage option. Is this expected behavior?

postgres=# CREATE TABLE tab1 (c1 INT, c2 text) PARTITION BY RANGE(c1);
CREATE TABLE
postgres=# create table tt_p1 as select * from tab1 where 1=2;
SELECT 0
postgres=# alter table tab1 alter COLUMN c2 set storage main;
ALTER TABLE
postgres=# alter table tab1 attach partition tt_p1 for values from (20) to (30);
ALTER TABLE
postgres=# \d+ tab1
                             Partitioned table "public.tab1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
 c2     | text    |           |          |         | main    |              |
Partition key: RANGE (c1)
Partitions: tt_p1 FOR VALUES FROM (20) TO (30)

postgres=# \d+ tt_p1
                                  Table "public.tt_p1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 c1     | integer |           |          |         | plain    |              |
 c2     | text    |           |          |         | extended |              |
Partition of: tab1 FOR VALUES FROM (20) TO (30)
Partition constraint: ((c1 IS NOT NULL) AND (c1 >= 20) AND (c1 < 30))
Access method: heap

-- With Regards, Prabhat Kumar Sahu
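Assuming the current behavior is intended (ATTACH PARTITION does not copy per-column storage settings from the parent), the workaround is to set the storage option on the partition explicitly before or after attaching it, e.g.:

    alter table tt_p1 alter COLUMN c2 set storage main;  -- align with the parent
    alter table tab1 attach partition tt_p1 for values from (20) to (30);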
Re: Add parallelism and glibc dependent only options to reindexdb
On Mon, Jul 1, 2019 at 11:21 PM Peter Geoghegan wrote: > > On Mon, Jul 1, 2019 at 1:34 PM Julien Rouhaud wrote: > > I have a vague recollection that ICU was providing some backward > > compatibility so that even if you upgrade your lib you can still get > > the sort order that was active when you built your indexes, though > > maybe for a limited number of versions. > > That isn't built in. Another database system that uses ICU handles > this by linking to multiple versions of ICU, each with its own UCA > version and associated collations. I don't think that we want to go > there, so it makes sense to make an upgrade that crosses ICU or glibc > versions as painless as possible. > > Note that ICU does at least provide a standard way to use multiple > versions at once; the symbol names have the ICU version baked in. > You're actually calling the functions using the versioned symbol names > without realizing it, because there is macro trickery involved. Ah, thanks for the clarification!
Re: Refactoring base64 encoding and decoding into a safer interface
On Tue, Jul 02, 2019 at 09:56:03AM +0200, Daniel Gustafsson wrote: > I’m not sure we need to go to that length, but I don’t have overly strong > opinions. I think of this more like a case of “we’ve changed the API with new > error cases that we didn’t handle before, so we’re being a little defensive to > help you avoid subtle bugs”. I quite like this suggestion. > Looks good, passes tests, provides value to the code. Bumping this to ready > for committer as I have no more comments to add. Thanks. I'll look at that again in a couple of days, let's see if others have any input to offer. -- Michael
Re: Add parallelism and glibc dependent only options to reindexdb
On Sun, Jun 30, 2019 at 11:45:47AM +0200, Julien Rouhaud wrote: Hi, With glibc 2.28 coming, all users will have to reindex almost every index after a glibc upgrade to guarantee the lack of corruption. Unfortunately, reindexdb is not ideal for that as it's processing everything using a single connection and isn't able to discard indexes that don't depend on a glibc collation. PFA a patchset to add parallelism to reindexdb (reusing the infrastructure in vacuumdb with some additions) and an option to discard indexes that don't depend on glibc (without any specific collation filtering or glibc version detection), with updated regression tests. Note that this should be applied on top of the existing reindexdb cleanup & refactoring patch (https://commitfest.postgresql.org/23/2115/). This was sponsored by VMware, and has been discussed internally with Kevin and Michael, in Cc. I wonder why this is necessary: pg_log_error("cannot reindex glibc dependent objects and a subset of objects"); What's the reasoning behind that? It seems like a valid use case to me - imagine you have a bug database, but only a couple of tables are used by the application regularly (the rest may be archive tables, for example). Why not allow rebuilding glibc-dependent indexes on the used tables, so that the database can be opened for users sooner. BTW now that we allow rebuilding only some of the indexes, it'd be great to have a dry-run mode, where we just print which indexes will be rebuilt without actually rebuilding them. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add parallelism and glibc dependent only options to reindexdb
On Tue, Jul 2, 2019 at 9:19 AM Peter Eisentraut wrote: > > On 2019-07-01 22:46, Alvaro Herrera wrote: > > On 2019-Jul-02, Thomas Munro wrote: > >> On Tue, Jul 2, 2019 at 8:34 AM Julien Rouhaud wrote: > >>> Even if that's just me being delusional, I'd still prefer Alvaro's > >>> approach to have distinct switches for each collation system. > >> > >> Makes sense. But why use the name "glibc" in the code and user > >> interface? The name of the collation provider in PostgreSQL is "libc" > >> (for example in the CREATE COLLATION command), and the problem applies > >> no matter who makes your libc. > > > > Makes sense. "If your libc is glibc and you go across an upgrade over > > version X, please use --include-rule=libc-collation" > > I think it might be better to put the logic of what indexes are > collation affected etc. into the backend REINDEX command. We are likely > to enhance the collation version and dependency tracking over time, > possibly soon, possibly multiple times, and it would be very cumbersome > to have to keep updating reindexdb with this. Moreover, since for > performance you likely want to reindex by table, implementing a logic of > "reindex all collation-affected indexes on this table" would be much > easier to do in the backend. That's a great idea, and would make the parallelism in reindexdb much simpler. There's however a downside, as users won't have a way to benefit from index filtering until they upgrade to this version. OTOH glibc 2.28 is already there, and a hypothetical fancy reindexdb is far from being released.
Re: [PATCH] Implement uuid_version()
On 2/7/19 9:26, Peter Eisentraut wrote: On 2019-06-30 14:50, Fabien COELHO wrote: I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if it is available in core? That would probably require an extension version update dance in pgcrypto. I'm not sure if it's worth that. Thoughts? What I have devised for my upcoming patch series is to use a compatibility "shim" that calls the corresponding core code when the expected usage does not match the new names/signatures... This way we wouldn't even need to version bump pgcrypto (full backwards compatibility -> no bump needed). Another matter is whether this should raise some "deprecation warning" or the like; I don't think we have any such mechanisms available yet. FWIW, I'm implementing an "alias" functionality for extensions, too, in order to achieve transparent (for the user) extension renames. HTH Thanks, / J.L.
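For reference, the "extension version update dance" mentioned upthread would boil down to something like this hypothetical upgrade-script fragment (function names are assumed, not taken from any posted patch):

    -- hypothetical pgcrypto--1.3--1.4.sql fragment: keep the old entry point
    -- as a thin wrapper over the (assumed) core function
    CREATE OR REPLACE FUNCTION gen_random_uuid()
    RETURNS uuid
    LANGUAGE sql PARALLEL SAFE
    AS 'SELECT pg_random_uuid()';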
Re: mcvstats serialization code is still shy of a load
On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote: Tomas Vondra writes: Attached is a slightly improved version of the serialization patch. I reviewed this patch, and tested it on hppa and ppc. I found one serious bug: in the deserialization varlena case, you need - dataptr += MAXALIGN(len); + dataptr += MAXALIGN(len + VARHDRSZ); (approx. line 1100 in mcv.c). Without this, the output data is corrupt, plus the Assert a few lines further down about dataptr having been advanced by the correct amount fires. (On one machine I tested on, that happened during the core regression tests. The other machine got through regression, but trying to do "select * from pg_stats_ext;" afterwards exhibited the crash. I didn't investigate closely, but I suspect the difference has to do with different MAXALIGN values, 4 and 8 respectively.) The attached patch (a delta atop your v2) corrects that plus some cosmetic issues. Thanks. If we're going to push this, it would be considerably less complicated to do so before v12 gets branched --- not long after that, there will be catversion differences to cope with. I'm planning to make the branch tomorrow (Monday), probably ~1500 UTC. Just sayin'. Unfortunately, I was travelling on Sunday and was quite busy on Monday, so I've been unable to push this before the branching :-( I'll push by the end of this week, once I get home. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add parallelism and glibc dependent only options to reindexdb
On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra wrote: > > I wonder why this is necessary: > > pg_log_error("cannot reindex glibc dependent objects and a subset of > objects"); > > What's the reasoning behind that? It seems like a valid use case to me - > imagine you have a bug database, but only a couple of tables are used by > the application regularly (the rest may be archive tables, for example). > Why not to allow rebuilding glibc-dependent indexes on the used tables, so > that the database can be opened for users sooner. It just seemed wrong to me to allow a partial processing for something that's aimed to prevent corruption. I'd think that if users are knowledgeable enough to only reindex a subset of indexes/tables in such cases, they can also discard indexes that don't get affected by a collation lib upgrade. I'm not strongly opposed to supporting if though, as there indeed can be valid use cases. > BTW now that we allow rebuilding only some of the indexes, it'd be great > to have a dry-run mode, were we just print which indexes will be rebuilt > without actually rebuilding them. +1. If we end up doing the filter in the backend, we'd have to add such option in the REINDEX command, and actually issue all the orders to retrieve the list.
Re: Index Skip Scan
On Fri, Jun 21, 2019 at 1:20 AM Jesper Pedersen wrote: > Attached is v20, since the last patch should have been v19. I took this for a quick spin today. The DISTINCT ON support is nice and I think it will be very useful. I've signed up to review it and will have more to say later. But today I had a couple of thoughts after looking into how src/backend/optimizer/plan/planagg.c works and wondering how to do some more skipping tricks with the existing machinery. 1. SELECT COUNT(DISTINCT i) FROM t could benefit from this. (Or AVG(DISTINCT ...) or any other aggregate). Right now you get a seq scan, with the sort/unique logic inside the Aggregate node. If you write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get a skip scan that is much faster in good cases. I suppose you could have a process_distinct_aggregates() in planagg.c that recognises queries of the right form and generates extra paths a bit like build_minmax_path() does. I think it's probably better to consider that in the grouping planner proper instead. I'm not sure. 2. SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if you're allowed to go forwards. Same for SELECT i, MAX(j) FROM t GROUP BY i if you're allowed to go backwards. Those queries are equivalent to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC] (though as Floris noted, the backwards version gives the wrong answers with v20). That does seem like a much more specific thing applicable only to MIN and MAX, and I think preprocess_minmax_aggregates() could be taught to handle that sort of query, building an index only scan path with skip scan in build_minmax_path(). -- Thomas Munro https://enterprisedb.com
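A minimal way to try the first rewrite above against the patch (sketch; the table, index, and data are assumed):

    CREATE TABLE t (i int, j int);
    CREATE INDEX ON t (i, j);
    INSERT INTO t SELECT g % 10, g FROM generate_series(1, 100000) g;
    ANALYZE t;
    EXPLAIN SELECT COUNT(DISTINCT i) FROM t;                     -- seq scan + aggregate today
    EXPLAIN SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss;  -- can use the skip scan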
d25ea01275 and partitionwise join
Hi Tom, I think an assumption of d25ea01275 breaks partitionwise join. Sorry it took me a while to report it. In https://postgr.es/m/8168.1560446...@sss.pgh.pa.us, Tom wrote: > I poked into this and found the cause. For the sample query, we have > an EquivalenceClass containing the expression > COALESCE(COALESCE(Var_1_1, Var_2_1), Var_3_1) > where each of the Vars belongs to an appendrel parent. > add_child_rel_equivalences() needs to add expressions representing the > transform of that to each child relation. That is, if the children > of table 1 are A1 and A2, of table 2 are B1 and B2, and of table 3 > are C1 and C2, what we'd like to add are the expressions > COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_3_1) > COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_3_1) > COALESCE(COALESCE(Var_1_1, Var_B1_1), Var_3_1) > COALESCE(COALESCE(Var_1_1, Var_B2_1), Var_3_1) > COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C1_1) > COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C2_1) > However, what it's actually producing is additional combinations for > each appendrel after the first, because each call also mutates the > previously-added child expressions. So in this example we also get > COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_3_1) > COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_3_1) > COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_C1_1) > COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_C2_1) > COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_C1_1) > COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_C2_1) > With two appendrels involved, that's O(N^2) expressions; with > three appendrels, more like O(N^3). > > This is by no means specific to FULL JOINs; you could get the same > behavior with join clauses like "WHERE t1.a + t2.b + t3.c = t4.d". > > These extra expressions don't have any use, since we're not > going to join the children directly to each other. ...unless partition wise join thinks they can be joined. Partition wise join can't handle 3-way full joins today, but only because it's broken itself when trying to match a full join clause to the partition key due to one side being a COALESCE expression. 
Consider this example query: -- p is defined as: -- create table p (a int) partition by list (a); -- create table p1 partition of p for values in (1); -- create table p2 partition of p for values in (2); explain select * from p t1 full outer join p t2 using (a) full outer join p t3 using (a) full outer join p t4 using (a) order by 1; QUERY PLAN ─ Sort (cost=16416733.32..16628145.85 rows=84565012 width=4) Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a)) -> Merge Full Join (cost=536957.40..1813748.77 rows=84565012 width=4) Merge Cond: (t4.a = (COALESCE(COALESCE(t1.a, t2.a), t3.a))) -> Sort (cost=410.57..423.32 rows=5100 width=4) Sort Key: t4.a -> Append (cost=0.00..96.50 rows=5100 width=4) -> Seq Scan on p1 t4 (cost=0.00..35.50 rows=2550 width=4) -> Seq Scan on p2 t4_1 (cost=0.00..35.50 rows=2550 width=4) -> Materialize (cost=536546.83..553128.21 rows=3316275 width=12) -> Sort (cost=536546.83..544837.52 rows=3316275 width=12) Sort Key: (COALESCE(COALESCE(t1.a, t2.a), t3.a)) -> Merge Full Join (cost=14254.85..64024.48 rows=3316275 width=12) Merge Cond: (t3.a = (COALESCE(t1.a, t2.a))) -> Sort (cost=410.57..423.32 rows=5100 width=4) Sort Key: t3.a -> Append (cost=0.00..96.50 rows=5100 width=4) -> Seq Scan on p1 t3 (cost=0.00..35.50 rows=2550 width=4) -> Seq Scan on p2 t3_1 (cost=0.00..35.50 rows=2550 width=4) -> Sort (cost=13844.29..14169.41 rows=130050 width=8) Sort Key: (COALESCE(t1.a, t2.a)) -> Merge Full Join (cost=821.13..2797.38 rows=130050 width=8) Merge Cond: (t1.a = t2.a) -> Sort (cost=410.57..423.32 rows=5100 width=4) Sort Key: t1.a -> Append (cost=0.00..96.50 rows=5100 width=4) -> Seq Scan on p1 t1 (cost=0.00..35.50 rows=2550 width=4) -> Seq Scan on p2 t1_1 (cost=0.00..35.50 rows=2550 width=4) -> Sort (cost=410.57..423.32 rows=5100 width=4)
Re: Add parallelism and glibc dependent only options to reindexdb
On Tue, Jul 02, 2019 at 10:45:44AM +0200, Julien Rouhaud wrote: On Tue, Jul 2, 2019 at 10:28 AM Tomas Vondra wrote: I wonder why this is necessary: pg_log_error("cannot reindex glibc dependent objects and a subset of objects"); What's the reasoning behind that? It seems like a valid use case to me - imagine you have a bug database, but only a couple of tables are used by the application regularly (the rest may be archive tables, for example). Why not to allow rebuilding glibc-dependent indexes on the used tables, so that the database can be opened for users sooner. It just seemed wrong to me to allow a partial processing for something that's aimed to prevent corruption. I'd think that if users are knowledgeable enough to only reindex a subset of indexes/tables in such cases, they can also discard indexes that don't get affected by a collation lib upgrade. I'm not strongly opposed to supporting if though, as there indeed can be valid use cases. I don't know, it just seems like an unnecessary limitation. BTW now that we allow rebuilding only some of the indexes, it'd be great to have a dry-run mode, were we just print which indexes will be rebuilt without actually rebuilding them. +1. If we end up doing the filter in the backend, we'd have to add such option in the REINDEX command, and actually issue all the orders to retrieve the list. Hmmm, yeah. FWIW I'm not requesting v0 to have that feature, but it'd be good to design the feature in a way that allows adding it later. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
On Fri, Apr 26, 2019 at 5:38 AM Michael Paquier wrote: > > On Thu, Apr 25, 2019 at 03:08:36PM -0400, Alvaro Herrera wrote: > > On 2019-Apr-26, Fujii Masao wrote: > >> I did that arrangement because the format of PREPARE TRANSACTION record, > >> i.e., that struct, also needs to be accessed in backend and frontend. > >> But, of course, if there is smarter way, I'm happy to adopt that! > > > > I don't know. I spent some time staring at the code involved, and it > > seems it'd be possible to improve just a little bit on cleanliness > > grounds, with a lot of effort, but not enough practical value. > > Describing those records is something we should do. There are other > parsing routines in xactdesc.c for commit and abort records, so having > that extra routine for prepare at the same place does not sound > strange to me. > > +typedef xl_xact_prepare TwoPhaseFileHeader; > I find this mapping implementation a bit lazy, and your > newly-introduced xl_xact_prepare does not count for all the contents > of the actual WAL record for PREPARE TRANSACTION. Wouldn't it be > better to put all the contents of the record in the same structure, > and not only the 2PC header information? This patch doesn't apply anymore, could you send a rebase?
Re: Introduce MIN/MAX aggregate functions to pg_lsn
Hi, Here are some review comments: - any numeric, string, date/time, network, or enum type, + any numeric, string, date/time, network, lsn, or enum type, or arrays of these types same as argument type In the documentation it is referred to as the pg_lsn type rather than lsn alone. +Datum +pg_lsn_larger(PG_FUNCTION_ARGS) +{ + XLogRecPtr lsn1 = PG_GETARG_LSN(0); + XLogRecPtr lsn2 = PG_GETARG_LSN(1); + XLogRecPtr result; + + result = ((lsn1 > lsn2) ? lsn1 : lsn2); + + PG_RETURN_LSN(result); +} Rather than using an additional variable, it's more readable and effective to return the argument itself, like we do in the date data type and other places. regards Surafel
Re: Choosing values for multivariate MCV lists
On Mon, Jul 01, 2019 at 12:02:28PM +0100, Dean Rasheed wrote: On Sat, 29 Jun 2019 at 14:01, Tomas Vondra wrote: >However, it looks like the problem is with mcv_list_items()'s use >of %f to convert to text, which is pretty ugly. >>>There's one issue with the signature, though - currently the function >>>returns null flags as bool array, but values are returned as simple >>>text value (formatted in array-like way, but still just a text). >>> >>IMO fixing this to return a text array is worth doing, even though it >>means a catversion bump. >> Attached is a cleaned-up version of that patch. The main difference is that instead of using construct_md_array() this uses ArrayBuildState to construct the arrays, which is much easier. The docs don't need any update because those were already using text[] for the parameter, the code was inconsistent with it. Cool, this looks a lot neater and fixes the issues discussed with both floating point values no longer being converted to text, and proper text arrays for values. One minor nitpick -- it doesn't seem to be necessary to build the arrays *outfuncs and *fmgrinfo. You may as well just fetch the individual column output function information at the point where it's used (in the "if (!item->isnull[i])" block) rather than building those arrays. OK, thanks for the feedback. I'll clean-up the function lookup. This does require catversion bump, but as annoying as it is, I think it's worth it (and there's also the thread discussing the serialization issues). Barring objections, I'll get it committed later next week, once I get back from PostgresLondon. As I mentioned before, if we don't want any additional catversion bumps, it's possible to pass the arrays through output functions - that would allow us keeping the text output (but correct, unlike what we have now). I think this is a clear bug fix, so I'd vote for fixing it properly now, with a catversion bump. I agree. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
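For anyone who wants to poke at the function being discussed, this is roughly how it is exercised (a sketch; the output columns are exactly what this patch changes, so they may differ):

    CREATE TABLE t2 (a int, b int);
    INSERT INTO t2 SELECT mod(i, 50), mod(i, 50) FROM generate_series(1, 10000) i;
    CREATE STATISTICS s2 (mcv) ON a, b FROM t2;
    ANALYZE t2;
    SELECT m.*
    FROM pg_statistic_ext s
         JOIN pg_statistic_ext_data d ON d.stxoid = s.oid,
         pg_mcv_list_items(d.stxdmcv) m
    WHERE s.stxname = 's2';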
Re: Index Skip Scan
On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: > I took this for a quick spin today. The DISTINCT ON support is nice > and I think it will be very useful. I've signed up to review it and > will have more to say later. But today I had a couple of thoughts > after looking into how src/backend/optimizer/plan/planagg.c works and > wondering how to do some more skipping tricks with the existing > machinery. > > 1. SELECT COUNT(DISTINCT i) FROM t could benefit from this. (Or > AVG(DISTINCT ...) or any other aggregate). Right now you get a seq > scan, with the sort/unique logic inside the Aggregate node. If you > write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get > a skip scan that is much faster in good cases. I suppose you could > have a process_distinct_aggregates() in planagg.c that recognises > queries of the right form and generates extra paths a bit like > build_minmax_path() does. I think it's probably better to consider > that in the grouping planner proper instead. I'm not sure. I think to make that happen we'd need to do a bit of an overhaul in nodeAgg.c to allow it to make use of presorted results instead of having the code blindly sort rows for each aggregate that has a DISTINCT or ORDER BY. The planner would also then need to start requesting paths with pathkeys that suit the aggregate and also probably dictate the order the AggRefs should be evaluated to allow all AggRefs to be evaluated that can be for each sort order. Once that part is done then the aggregates could then also request paths with certain "UniqueKeys" (a feature I mentioned in [1]), however we'd need to be pretty careful with that one since there may be other parts of the query that require that all rows are fed in, not just 1 row per value of "i", e.g SELECT COUNT(DISTINCT i) FROM t WHERE z > 0; can't just feed through 1 row for each "i" value, since we need only the ones that have "z > 0". Getting the first part of this solved is much more important than making skip scans work here, I'd say. I think we need to be able to walk before we can run with DISTINCT / ORDER BY aggs. > 2. SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if > you're allowed to go forwards. Same for SELECT i, MAX(j) FROM t GROUP > BY i if you're allowed to go backwards. Those queries are equivalent > to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC] > (though as Floris noted, the backwards version gives the wrong answers > with v20). That does seem like a much more specific thing applicable > only to MIN and MAX, and I think preprocess_minmax_aggregates() could > be taught to handle that sort of query, building an index only scan > path with skip scan in build_minmax_path(). For the MIN query you just need a path with Pathkeys: { i ASC, j ASC }, UniqueKeys: { i, j }, doing the MAX query you just need j DESC. The more I think about these UniqueKeys, the more I think they need to be a separate concept to PathKeys. For example, UniqueKeys: { x, y } should be equivalent to { y, x }, but with PathKeys, that's not the case, since the order of each key matters. UniqueKeys equivalent version of pathkeys_contained_in() would not care about the order of individual keys, it would say things like, { a, b, c } is contained in { b, a }, since if the path is unique on columns { b, a } then it must also be unique on { a, b, c }. 
[1] https://www.postgresql.org/message-id/cakjs1f86fgoduunhiq25rkeues4qtqenxm1qbqjwrbozxvg...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Custom compression methods
On Mon, 1 Jul 2019 at 17:28, Alexander Korotkov wrote: > On Mon, Jul 1, 2019 at 5:51 PM Alvaro Herrera > wrote: > > On 2019-Jul-01, Alexander Korotkov wrote: > > > > > As I get it, we currently need to make a high-level decision of whether > > > we need this [1]. I was going to bring this topic up at last PGCon, > > > but I didn't manage to attend. Is it worth bothering Ildus with > > > continuous rebasing assuming we don't have this high-level decision > > > yet? > > > > I agree that having to constantly rebase a patch that doesn't get acted > > upon is a bit pointless. I see a bit of a process problem here: if the > > patch doesn't apply, it gets punted out of commitfest and reviewers > > don't look at it. This means the discussion goes unseen and no > > decisions are made. My immediate suggestion is to rebase even if other > > changes are needed. > > OK, let's do this assuming Ildus didn't give up yet :) No, I still didn't give up :) I'm going to post a rebased version in a few days. I found that there are new conflicts with slice decompression; I'm not sure yet how to resolve them. Also, I was thinking there may be a point in adding the possibility to compress any data that goes into a column regardless of the TOAST threshold size. In our company we have types that could benefit from compression even on the smallest blocks. Since pluggable storage was committed, I think I should note that compression methods can also be used by table access methods and are not supposed to work only with TOAST tables. Basically it's just an interface to call compression functions which are associated with some column. Best regards, Ildus Kurbangaliev
Re: progress report for ANALYZE
On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera wrote: > > Here's a patch that implements progress reporting for ANALYZE. Patch applies; code and docs compile cleanly. I have a few comments: @@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params, if (numrows > 0) { MemoryContext col_context, - old_context; + old_context; + const int index[] = { + PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_TOTAL_BLOCKS, + PROGRESS_ANALYZE_BLOCKS_DONE + }; + const int64 val[] = { + PROGRESS_ANALYZE_PHASE_ANALYSIS, + 0, 0 + }; + + pgstat_progress_update_multi_param(3, index, val); [...] } + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, +PROGRESS_ANALYZE_PHASE_COMPLETE); + If there weren't any rows but multiple blocks were scanned, PROGRESS_ANALYZE_PHASE_COMPLETE will still show the information about the blocks that were scanned. I'm not sure if we should stay consistent here. diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 05240bfd14..98b01e54fa 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS) /* Translate command name into command type code. */ if (pg_strcasecmp(cmd, "VACUUM") == 0) cmdtype = PROGRESS_COMMAND_VACUUM; + if (pg_strcasecmp(cmd, "ANALYZE") == 0) + cmdtype = PROGRESS_COMMAND_ANALYZE; else if (pg_strcasecmp(cmd, "CLUSTER") == 0) cmdtype = PROGRESS_COMMAND_CLUSTER; else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0) it should be an "else if" here. Everything else LGTM.
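While testing, the raw progress counters can be inspected through the generic progress function even without relying on whatever view the patch defines on top of it (sketch; the mapping of param1..paramN to phase and block counters is whatever the patch assigns):

    -- run a long ANALYZE in one session, then in another session:
    SELECT pid, relid::regclass, param1, param2, param3
    FROM pg_stat_get_progress_info('ANALYZE');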
Re: New EXPLAIN option: ALL
On 2019-05-18 19:39, David Fetter wrote: > On Wed, May 15, 2019 at 09:32:31AM -0400, Tom Lane wrote: >> David Fetter writes: >>> I hope the patch is a little easier to digest as now attached. >> >> To be blunt, I find 500K worth of changes in the regression test >> outputs to be absolutely unacceptable, especially when said changes >> are basically worthless from a diagnostic standpoint. There are at >> least two reasons why this won't fly: > > Here's a patch set with a much smaller change. Will that fly? This appears to be the patch of record for this commit fest. I don't sense much enthusiasm for this change. What is the exact rationale for this proposal? I think using a new keyword EXEC that is similar to an existing one EXECUTE will likely just introduce a new class of confusion. (ANALYZE EXEC EXECUTE ...?) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: C testing for Postgres
> Just to clarify: what Adam is proposing in this thread is *not* a fault > injection framework. Yes, thanks for clarifying, Ashwin. Sorry Michael, this testing framework is more like these other frameworks: Java with JUnit + Hamcrest: http://hamcrest.org/JavaHamcrest/tutorial Ruby with RSpec: https://rspec.info/documentation/3.8/rspec-expectations/#Built-in_matchers JavaScript with Jasmine: https://jasmine.github.io/
Re: Introduce MIN/MAX aggregate functions to pg_lsn
On Tue, Jul 2, 2019 at 7:22 AM Surafel Temesgen wrote: > > Hi, > Here are same review comment Thanks for your review. > - any numeric, string, date/time, network, or enum type, > + any numeric, string, date/time, network, lsn, or enum type, > or arrays of these types >same as argument type > In the documentation it refereed as pg_lsn type rather than lsn alone Fixed. > +Datum > +pg_lsn_larger(PG_FUNCTION_ARGS) > +{ > + XLogRecPtr lsn1 = PG_GETARG_LSN(0); > + XLogRecPtr lsn2 = PG_GETARG_LSN(1); > + XLogRecPtr result; > + > + result = ((lsn1 > lsn2) ? lsn1 : lsn2); > + > + PG_RETURN_LSN(result); > +} > > rather than using additional variable its more readable and effective to return the argument > itself like we do in date data type and other place > Fixed. New version attached. Regards, -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 3a8581d..b7f746b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14804,7 +14804,7 @@ NULL baz(3 rows) max(expression) - any numeric, string, date/time, network, or enum type, + any numeric, string, date/time, network, pg_lsn, or enum type, or arrays of these types same as argument type Yes @@ -14822,7 +14822,7 @@ NULL baz(3 rows) min(expression) - any numeric, string, date/time, network, or enum type, + any numeric, string, date/time, network, pg_lsn, or enum type, or arrays of these types same as argument type Yes diff --git a/src/backend/utils/adt/pg_lsn.c b/src/backend/utils/adt/pg_lsn.c index 4c329a8..b4c6c23 100644 --- a/src/backend/utils/adt/pg_lsn.c +++ b/src/backend/utils/adt/pg_lsn.c @@ -171,6 +171,24 @@ pg_lsn_ge(PG_FUNCTION_ARGS) PG_RETURN_BOOL(lsn1 >= lsn2); } +Datum +pg_lsn_larger(PG_FUNCTION_ARGS) +{ + XLogRecPtr lsn1 = PG_GETARG_LSN(0); + XLogRecPtr lsn2 = PG_GETARG_LSN(1); + + PG_RETURN_LSN((lsn1 > lsn2) ? lsn1 : lsn2); +} + +Datum +pg_lsn_smaller(PG_FUNCTION_ARGS) +{ + XLogRecPtr lsn1 = PG_GETARG_LSN(0); + XLogRecPtr lsn2 = PG_GETARG_LSN(1); + + PG_RETURN_LSN((lsn1 < lsn2) ? 
lsn1 : lsn2); +} + /* btree index opclass support */ Datum pg_lsn_cmp(PG_FUNCTION_ARGS) diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat index 044695a..242d843 100644 --- a/src/include/catalog/pg_aggregate.dat +++ b/src/include/catalog/pg_aggregate.dat @@ -146,6 +146,9 @@ { aggfnoid => 'max(inet)', aggtransfn => 'network_larger', aggcombinefn => 'network_larger', aggsortop => '>(inet,inet)', aggtranstype => 'inet' }, +{ aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger', + aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)', + aggtranstype => 'pg_lsn' }, # min { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller', @@ -208,6 +211,9 @@ { aggfnoid => 'min(inet)', aggtransfn => 'network_smaller', aggcombinefn => 'network_smaller', aggsortop => '<(inet,inet)', aggtranstype => 'inet' }, +{ aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller', + aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)', + aggtranstype => 'pg_lsn' }, # count { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any', diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 8733524..aa8674c 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6219,6 +6219,9 @@ { oid => '3564', descr => 'maximum value of all inet input values', proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet', proargtypes => 'inet', prosrc => 'aggregate_dummy' }, +{ oid => '8125', descr => 'maximum value of all pg_lsn input values', + proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn', + proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' }, { oid => '2131', descr => 'minimum value of all bigint input values', proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8', @@ -6283,6 +6286,9 @@ { oid => '3565', descr => 'minimum value of all inet input values', proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet', proargtypes => 'inet', prosrc => 'aggregate_dummy' }, +{ oid => '8126', descr => 'minimum value of all pg_lsn input values', + proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn', + proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' }, # count has two forms: count(any) and count(*) { oid => '2147', @@ -8385,6 +8391,12 @@ { oid => '3413', descr => 'hash', proname => 'pg_lsn_hash_extended', prorettype => 'int8', proargtypes => 'pg_lsn int8', prosrc => 'pg_lsn_hash_extended' }, +{ oid => '8123', descr => 'larger of two', + proname => 'pg_lsn_larger', prorettype => 'pg_lsn', + proargtyp
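With the patch applied, aggregates over pg_lsn columns in the standard monitoring views become possible, for example (usage sketch):

    SELECT min(restart_lsn) FROM pg_replication_slots;
    SELECT max(sent_lsn) - min(replay_lsn) AS worst_span FROM pg_stat_replication;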
TopoSort() fix
Hi Hackers, I think I found an issue in the TopoSort() function. As the comments say: /* ... If there are any other processes * in the same lock group on the queue, set their number of * beforeConstraints to -1 to indicate that they should be emitted * with their groupmates rather than considered separately. */ If the line "break;" exists, there is no chance to set beforeConstraints to -1 for the other processes in the same lock group. So I think we need to delete the "break;" line. See the patch. I just took a look, and all of the following versions have this line: postgresql-12beta2, postgresql-12beta1, postgresql-11.4, postgresql-11.3, postgresql-11.0, postgresql-10.9, postgresql-10.5, postgresql-10.0. Thanks, Ruihai
Re: Optimize partial TOAST decompression
On Mon, Jul 1, 2019 at 6:46 AM Binguo Bao wrote: > > Andrey Borodin 于2019年6月29日周六 下午9:48写道: >> I've took a look into the code. >> I think we should extract function for computation of max_compressed_size >> and put it somewhere along with pglz code. Just in case something will >> change something about pglz so that they would not forget about compression >> algorithm assumption. >> >> Also I suggest just using 64 bit computation to avoid overflows. And I think >> it worth to check if max_compressed_size is whole data and use min of >> (max_compressed_size, uncompressed_data_size). >> >> Also you declared needsize and max_compressed_size too far from use. But >> this will be solved by function extraction anyway. >> > Thanks for the suggestion. > I've extracted function for computation for max_compressed_size and put the > function into pg_lzcompress.c. This looks good to me. A little commentary around why pglz_maximum_compressed_size() returns a universally correct answer (there's no way the compressed size can ever be larger than this because...) would be nice for peasants like myself. If you're looking to continue down this code line in your next patch, the next TODO item is a little more involved: a user-land (ala PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow the optimization of searching of large objects like JSONB types, and so on, where the thing you are looking for is not at a known location in the object. So, things like looking for a particular substring in a string, or looking for a particular key in a JSONB. "Iterate until you find the thing." would allow optimization of some code lines that currently require full decompression of the objects. P.
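For context, the class of queries that benefits is the kind where only a prefix of a large compressed datum is needed (sketch; table and data are made up):

    CREATE TABLE toast_slice_test (id int, t text);
    INSERT INTO toast_slice_test SELECT 1, repeat('0123456789', 200000);
    -- only a leading slice of the compressed datum has to be decompressed here
    SELECT substr(t, 1, 20) FROM toast_slice_test;
    SELECT left(t, 20) FROM toast_slice_test;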
Re: progress report for ANALYZE
Hi, In monitoring.sgml, "a" is missing in "row for ech backend that is currently running that command[...]". Anthony On Tuesday, July 2, 2019, Julien Rouhaud wrote: > On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera wrote: >> >> Here's a patch that implements progress reporting for ANALYZE. > > Patch applies, code and doc and compiles cleanly. I have few comments: > > @@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params, > if (numrows > 0) > { > MemoryContext col_context, > - old_context; > + old_context; > + const int index[] = { > + PROGRESS_ANALYZE_PHASE, > + PROGRESS_ANALYZE_TOTAL_BLOCKS, > + PROGRESS_ANALYZE_BLOCKS_DONE > + }; > + const int64 val[] = { > + PROGRESS_ANALYZE_PHASE_ANALYSIS, > + 0, 0 > + }; > + > + pgstat_progress_update_multi_param(3, index, val); > [...] > } > + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, > +PROGRESS_ANALYZE_PHASE_COMPLETE); > + > If there wasn't any row but multiple blocks were scanned, the > PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about > the blocks that were scanned. I'm not sure if we should stay > consistent here. > > diff --git a/src/backend/utils/adt/pgstatfuncs.c > b/src/backend/utils/adt/pgstatfuncs.c > index 05240bfd14..98b01e54fa 100644 > --- a/src/backend/utils/adt/pgstatfuncs.c > +++ b/src/backend/utils/adt/pgstatfuncs.c > @@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS) > /* Translate command name into command type code. */ > if (pg_strcasecmp(cmd, "VACUUM") == 0) > cmdtype = PROGRESS_COMMAND_VACUUM; > + if (pg_strcasecmp(cmd, "ANALYZE") == 0) > + cmdtype = PROGRESS_COMMAND_ANALYZE; > else if (pg_strcasecmp(cmd, "CLUSTER") == 0) > cmdtype = PROGRESS_COMMAND_CLUSTER; > else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0) > > it should be an "else if" here. > > Everything else LGTM. > > >
Re: Hash join explain is broken
I wrote: > Andres Freund writes: >> Tom, any comments? Otherwise I'll go ahead, and commit after a round or >> two of polishing. > Sorry for not getting to this sooner --- I'll try to look tomorrow. I took a look, and I think this is going in the right direction. We definitely need a test case corresponding to the live bug, and I think the comments could use more work, and there are some cosmetic things (I wouldn't add the new struct Hash field at the end, for instance). regards, tom lane
Re: [PATCH] Implement uuid_version()
Peter Eisentraut writes: > On 2019-06-30 14:50, Fabien COELHO wrote: >> I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if >> it is available in core? > That would probably require an extension version update dance in > pgcrypto. I'm not sure if it's worth that. Thoughts? We have some previous experience with this type of thing when we migrated contrib/tsearch2 stuff into core. I'm too caffeine-deprived to remember exactly what we did or how well it worked. But it seems advisable to go study that history, because we could easily make things a mess for users if we fail to consider their upgrade experience. regards, tom lane
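For readers unfamiliar with the "extension version update dance": it would mean bumping pgcrypto's default version and shipping an upgrade script whose only job is to hand the function over to core. A rough sketch, using pgcrypto's gen_random_uuid() as the example function; the version numbers and the exact script contents are hypothetical:

-- pgcrypto--1.3--1.4.sql (hypothetical): remove the extension's own copy so
-- that an in-core implementation can take over (or redefine it as a wrapper).
ALTER EXTENSION pgcrypto DROP FUNCTION gen_random_uuid();
DROP FUNCTION gen_random_uuid();

-- Existing installations would then run:
ALTER EXTENSION pgcrypto UPDATE TO '1.4';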
Re: POC: converting Lists into arrays
Oleksandr Shulgin writes: > Not related to the diff v6..v7, but shouldn't we throw additionally a > memset() with '\0' before calling pfree(): I don't see the point of that. In debug builds CLOBBER_FREED_MEMORY will take care of it, and in non-debug builds I don't see why we'd expend the cycles. regards, tom lane
Re: TopoSort() fix
Rui Hai Jiang writes: > I think I found an issue in the TopoSort() function. This indeed seems like a live bug introduced by a1c1af2a. Robert? regards, tom lane
Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Robert Haas writes: > Long story short, I agree with you that most people probably don't > care about this very much, but I also agree with Andrew that some of > the current choices we're making are pretty strange, and I'm not > convinced as you are that it's impossible to make a principled choice > between alternatives in all cases. The upstream data appears to > contain some information about intent; it's not just a jumble of > exactly-equally-preferred alternatives. I agree that if there were an easy way to discount the IANA "backward compatibility" zone names, that'd likely be a reasonable thing to do. The problem is that those names aren't distinguished from others in the representation we have available to us (ie, the actual /usr/share/zoneinfo file tree). I'm dubious that relying on zone[1970].tab would improve matters substantially; it would fix some cases, but I don't think it would fix all of them. Resolving all ambiguous zone-name choices is not the charter of those files. regards, tom lane
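To see the problem from the SQL level: the backward-compatibility link names show up in pg_timezone_names looking exactly like first-class zones, with nothing marking them as aliases. Which of these names are present depends on the installed tzdata, so treat this as illustrative only:

SELECT name, abbrev, utc_offset, is_dst
FROM pg_timezone_names
WHERE name IN ('UTC', 'UCT', 'Etc/UTC', 'Etc/UCT', 'Universal', 'Zulu')
ORDER BY name;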
Re: Conflict handling for COPY FROM
On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund wrote:

Or even just return it as a row. CopyBoth is relatively widely supported these days.

I think generating a warning also sufficiently meets the purpose of notifying the user about skipped records with the existing logging facility, and we use it for a similar purpose in other places too. The difference I see is the number of warnings that can be generated.

Warnings seem useless for this purpose. I'm with Andres: returning rows would make this a fine feature. If the user wants the rows in a table as Andrew suggests, she can wrap the whole thing in an insert.

I agree with the previous commentators that returning rows will make this feature more versatile. Still, having the possibility to simply skip conflicting/malformed rows is worth doing from my perspective. However, pushing every single skipped row to the client as a separate WARNING would be too much for a bulk import, so maybe overall statistics about the number of skipped rows would be enough?

Also, I would prefer having an option to ignore all errors, e.g. with ERROR_LIMIT set to -1, because it is rather difficult to estimate the number of future errors when you are playing with badly structured data, and always setting the limit to 100500k looks ugly.

Anyway, below are some issues with the existing code after a brief review of the patch:

1) Calculation of processed rows isn't correct (I've checked). You do it in two places, and

- processed++;
+ if (!cstate->error_limit)
+ processed++;

is never incremented if ERROR_LIMIT is specified and no errors occurred/no constraints exist, so the result will always be 0. However, if a primary column with constraints exists, then processed is calculated correctly, since another code path is used:

+ if (specConflict)
+ {
+ ...
+ }
+ else
+ processed++;

I would prefer this calculation in a single place (as it was before the patch), for simplicity and to avoid such problems.

2) This ExecInsertIndexTuples call is now only executed if ERROR_LIMIT is specified and was exceeded, which doesn't seem to be correct, does it?

- if (resultRelInfo->ri_NumIndices > 0)
+ if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
  recheckIndexes = ExecInsertIndexTuples(myslot,

3) Trailing whitespace is added to error messages and tests for some reason:

+ ereport(WARNING,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("skipping \"%s\" --- missing data for column \"%s\" ",

+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\" ",

-ERROR: missing data for column "e"
+ERROR: missing data for column "e"
 CONTEXT: COPY x, line 1: "2000 230 23 23"
-ERROR: missing data for column "e"
+ERROR: missing data for column "e"
 CONTEXT: COPY x, line 1: "2001 231 \N \N"

Otherwise, the patch applies/compiles cleanly and the regression tests pass.

Regards

--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
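For context, the user-facing shape of the feature under review is roughly the following. The option name ERROR_LIMIT and the -1 convention are taken from the thread, but the exact spelling is whatever the patch finally settles on, and the table name and file path are placeholders:

-- Skip up to 100 bad rows, then fail:
COPY target_table FROM '/tmp/import.csv' WITH (FORMAT csv, ERROR_LIMIT 100);

-- Alexey's suggestion: a sentinel meaning "skip any number of bad rows":
COPY target_table FROM '/tmp/import.csv' WITH (FORMAT csv, ERROR_LIMIT -1);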
Re: New EXPLAIN option: ALL
On Tue, Jul 02, 2019 at 03:06:52PM +0100, Peter Eisentraut wrote: > On 2019-05-18 19:39, David Fetter wrote: > > On Wed, May 15, 2019 at 09:32:31AM -0400, Tom Lane wrote: > >> David Fetter writes: > >>> I hope the patch is a little easier to digest as now attached. > >> > >> To be blunt, I find 500K worth of changes in the regression test > >> outputs to be absolutely unacceptable, especially when said changes > >> are basically worthless from a diagnostic standpoint. There are at > >> least two reasons why this won't fly: > > > > Here's a patch set with a much smaller change. Will that fly? > > This appears to be the patch of record for this commit fest. > > I don't sense much enthusiasm for this change. Neither do I, so withdrawn. I do hope we can go with EXPLAIN and PROFILE, as opposed to EXPLAIN/EXPLAIN ANALYZE. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
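For anyone who hasn't followed the earlier discussion, the naming being hoped for would look something like this; it is purely illustrative, no PROFILE command exists today, and the table name is arbitrary:

-- Today:
EXPLAIN SELECT count(*) FROM some_table;           -- plan only
EXPLAIN ANALYZE SELECT count(*) FROM some_table;   -- plan plus execution statistics

-- The hoped-for spelling:
EXPLAIN SELECT count(*) FROM some_table;           -- plan only, unchanged
PROFILE SELECT count(*) FROM some_table;           -- would run the query and report timing,
                                                   -- like EXPLAIN ANALYZE does now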
Re: Add missing operator <->(box, point)
Attached 2nd version of the patches. On 20.04.2019 16:41, Fabien COELHO wrote: About the test, I'd suggest to name the result columns, eg "pt to box dist" and "box to pt dist", otherwise why all is repeated is unclear. Fixed. On 02.07.2019 7:01, Tom Lane wrote: [ warning, drive-by comment ahead ] Fabien COELHO writes: I notice that other distance tests do not test for commutativity. Are they also not implemented, or just not tested? If not implemented, I'd suggest to add them in the same batch. Yeah ... just looking at operators named <->, I see regression=# select oid, oid::regoperator, oprcom, oprcode from pg_operator where oprname = '<->'; oid | oid | oprcom | oprcode --+--++--- 517 | <->(point,point) |517 | point_distance 613 | <->(point,line) | 0 | dist_pl 614 | <->(point,lseg) | 0 | dist_ps 615 | <->(point,box) | 0 | dist_pb 616 | <->(lseg,line) | 0 | dist_sl 617 | <->(lseg,box)| 0 | dist_sb 618 | <->(point,path) | 0 | dist_ppath 706 | <->(box,box) |706 | box_distance 707 | <->(path,path) |707 | path_distance 708 | <->(line,line) |708 | line_distance 709 | <->(lseg,lseg) |709 | lseg_distance 712 | <->(polygon,polygon) |712 | poly_distance 1520 | <->(circle,circle) | 1520 | circle_distance 1522 | <->(point,circle)| 3291 | dist_pc 3291 | <->(circle,point)| 1522 | dist_cpoint 3276 | <->(point,polygon) | 3289 | dist_ppoly 3289 | <->(polygon,point) | 3276 | dist_polyp 1523 | <->(circle,polygon) | 0 | dist_cpoly 1524 | <->(line,box)| 0 | dist_lb 5005 | <->(tsquery,tsquery) | 0 | pg_catalog.tsquery_phrase (20 rows) It's not clear to me why to be particularly more excited about <->(box, point) than about the other missing cases here. regards, tom lane The original goal was to add support of ordering by distance to point to all geometric opclasses. As you can see, GiST and SP-GiST box_ops has no distance operator while more complex circle_ops and poly_ops have it: SELECT amname, opcname, amopopr::regoperator AS dist_opr FROM pg_opclass LEFT JOIN pg_amop ON amopfamily = opcfamily AND amoppurpose = 'o', pg_am, pg_type WHERE opcmethod = pg_am.oid AND opcintype = pg_type.oid AND typcategory = 'G' ORDER BY 1, 2; amname | opcname | dist_opr +---+ brin | box_inclusion_ops | gist | box_ops | gist | circle_ops| <->(circle,point) gist | point_ops | <->(point,point) gist | poly_ops | <->(polygon,point) spgist | box_ops | spgist | kd_point_ops | <->(point,point) spgist | poly_ops | <->(polygon,point) spgist | quad_point_ops| <->(point,point) (9 rows) We could use commuted "const <-> var" operators for kNN searches, but the current implementation requires the existence of "var <-> const" operators, and order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op() at /src/backend/optimizer/path/indxpath.c). 
-- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From cff9fbdcd9d3633a28665c0636e6d3a5a0e8512b Mon Sep 17 00:00:00 2001 From: Nikita Glukhov Date: Thu, 7 Mar 2019 20:22:49 +0300 Subject: [PATCH 1/3] Add operator <->(box, point) --- src/backend/utils/adt/geo_ops.c| 12 + src/include/catalog/pg_operator.dat| 5 +- src/include/catalog/pg_proc.dat| 3 ++ src/test/regress/expected/geometry.out | 96 +- src/test/regress/sql/geometry.sql | 2 +- 5 files changed, 68 insertions(+), 50 deletions(-) diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index 373784f..d8ecfe3 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -2419,6 +2419,18 @@ dist_pb(PG_FUNCTION_ARGS) } /* + * Distance from a box to a point + */ +Datum +dist_bp(PG_FUNCTION_ARGS) +{ + BOX *box = PG_GETARG_BOX_P(0); + Point *pt = PG_GETARG_POINT_P(1); + + PG_RETURN_FLOAT8(box_closept_point(NULL, box, pt)); +} + +/* * Distance from a lseg to a line */ Datum diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat index bacafa5..ae5ed1e 100644 --- a/src/include/catalog/pg_operator.dat +++ b/src/include/catalog/pg_operator.dat @@ -666,7 +666,10 @@ oprresult => 'float8', oprcode => 'dist_ps' }, { oid => '615', descr => 'distance between', oprname => '<->', oprleft => 'point', oprright => 'box', - oprresult => 'float8', oprcode => 'dist_pb' }, + oprresult => 'float8', oprcom => '<->(box,point)', oprcode => 'dist_pb' }, +{ oid => '606', descr => 'distance between', + oprname => '<->', oprleft => 'box', oprright => 'po
Re: Add missing operator <->(box, point)
On Tue, Jul 2, 2019 at 9:19 PM Nikita Glukhov wrote: > We could use commuted "const <-> var" operators for kNN searches, but the > current implementation requires the existence of "var <-> const" operators, > and > order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op() > at /src/backend/optimizer/path/indxpath.c). But probably it's still worth to just add commutator for every <-> operator and close this question. Otherwise, it may arise again once we want to add some more kNN support to opclasses or something. On the other hand, are we already going to limit oid consumption? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH v5] Show detailed table persistence in \dt+
David Fetter writes: > [ v7-0001-Show-detailed-relation-persistence-in-dt.patch ] I looked this over and had a few suggestions, as per attached v8: * The persistence description values ought to be translatable, as is the usual practice in describe.c. This is slightly painful because it requires tweaking the translate_columns[] values in a non-constant way, but it's not that bad. * I dropped the "ELSE 'unknown'" bit in favor of just emitting NULL if the persistence isn't recognized. This is the same way that the table-type CASE just above does it, and I see no reason to be different. Moreover, there are message-style-guidelines issues with what to print if you do want to print something; "unknown" doesn't cut it. * I also dropped the logic for pre-9.1 servers, because the existing precedent in describeOneTableDetails() is that we only consider relpersistence for >= 9.1, and I don't see a real good reason to deviate from that. 9.0 and before are long out of support anyway. If there aren't objections, I think v8 is committable. regards, tom lane diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 1c770b4..0e0af5f 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -3632,7 +3632,8 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys PQExpBufferData buf; PGresult *res; printQueryOpt myopt = pset.popt; - static const bool translate_columns[] = {false, false, true, false, false, false, false}; + int cols_so_far; + bool translate_columns[] = {false, false, true, false, false, false, false, false}; /* If tabtypes is empty, we default to \dtvmsE (but see also command.c) */ if (!(showTables || showIndexes || showViews || showMatViews || showSeq || showForeign)) @@ -3672,15 +3673,40 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys gettext_noop("partitioned index"), gettext_noop("Type"), gettext_noop("Owner")); + cols_so_far = 4; if (showIndexes) + { appendPQExpBuffer(&buf, - ",\n c2.relname as \"%s\"", + ",\n c2.relname as \"%s\"", gettext_noop("Table")); + cols_so_far++; + } if (verbose) { /* + * Show whether a relation is permanent, temporary, or unlogged. Like + * describeOneTableDetails(), we consider that persistence emerged in + * v9.1, even though related concepts existed before. + */ + if (pset.sversion >= 90100) + { + appendPQExpBuffer(&buf, + ",\n CASE c.relpersistence WHEN 'p' THEN '%s' WHEN 't' THEN '%s' WHEN 'u' THEN '%s' END as \"%s\"", + gettext_noop("permanent"), + gettext_noop("temporary"), + gettext_noop("unlogged"), + gettext_noop("Persistence")); + translate_columns[cols_so_far] = true; + } + + /* + * We don't bother to count cols_so_far below here, as there's no need + * to; this might change with future additions to the output columns. + */ + + /* * As of PostgreSQL 9.0, use pg_table_size() to show a more accurate * size of a table, including FSM, VM and TOAST tables. */
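A quick way to exercise the new column, assuming the v8 patch is applied (table names are arbitrary):

CREATE TABLE perm_tab (a int);
CREATE UNLOGGED TABLE unlogged_tab (a int);
CREATE TEMP TABLE temp_tab (a int);

-- \dt+ should now report "permanent", "unlogged" and "temporary"
-- respectively in the new Persistence column.
\dt+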
Re: Add missing operator <->(box, point)
Alexander Korotkov writes: > On Tue, Jul 2, 2019 at 9:19 PM Nikita Glukhov wrote: >> We could use commuted "const <-> var" operators for kNN searches, but the >> current implementation requires the existence of "var <-> const" operators, >> and >> order-by-op clauses are rebuilt using them (see match_clause_to_ordering_op() >> at /src/backend/optimizer/path/indxpath.c). > But probably it's still worth to just add commutator for every <-> > operator and close this question. Yeah, I was just thinking that it was weird not to have the commutator operators, independently of indexing considerations. Seems like a usability thing. regards, tom lane
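At the SQL level, the missing piece amounts to declaring the two operators as each other's commutator, roughly as sketched below. The actual patch does this through pg_operator.dat entries rather than DDL, and dist_bp is the new support function it adds:

-- Sketch only: built-in operators are created from pg_operator.dat, not by DDL.
CREATE OPERATOR <-> (
    LEFTARG = box,
    RIGHTARG = point,
    PROCEDURE = dist_bp,
    COMMUTATOR = <->     -- i.e. the existing <->(point, box) operator
);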
Re: [PATCH v5] Show detailed table persistence in \dt+
On 2019-Jul-02, Tom Lane wrote: > * The persistence description values ought to be translatable, as > is the usual practice in describe.c. This is slightly painful > because it requires tweaking the translate_columns[] values in a > non-constant way, but it's not that bad. LGTM. I only fear that the cols_so_far thing is easy to break, and the breakage will be easy to miss. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH v4] Add \warn to psql
David Fetter writes: > [ v7-0001-Add-warn-to-psql.patch ] I took a look at this. I have no quibble with the proposed feature, and the implementation is certainly simple enough. But I'm unconvinced about the proposed test scaffolding. Spinning up a new PG instance is a *hell* of a lot of overhead to pay for testing something that could be tested as per attached. Admittedly, the attached doesn't positively prove which pipe each output string went down, but that does not strike me as a concern large enough to justify adding a TAP test for. I'd be happier about adding TAP infrastructure if it looked like it'd be usable to test some of the psql areas that are unreachable by the existing test methodology, particularly tab-complete.c and prompt.c. But I don't see anything here that looks like it'll work for that. I don't like what you did to command_checks_all, either --- it could hardly say "bolted on after the fact" more clearly if you'd written that in text. If we need an input-stream argument, let's just add it in a rational place and adjust the callers. There aren't that many of 'em, nor has the subroutine been around all that long. regards, tom lane diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out index 4bcf0cc..9b4060f 100644 --- a/src/test/regress/expected/psql.out +++ b/src/test/regress/expected/psql.out @@ -4180,6 +4180,29 @@ drop table psql_serial_tab; \pset format aligned \pset expanded off \pset border 1 +-- \echo and allied features +\echo this is a test +this is a test +\echo -n this is a test without newline +this is a test without newline\echo more test +more test +\set foo bar +\echo foo = :foo +foo = bar +\qecho this is a test +this is a test +\qecho -n this is a test without newline +this is a test without newline\qecho more test +more test +\qecho foo = :foo +foo = bar +\warn this is a test +this is a test +\warn -n this is a test without newline +this is a test without newline\warn more test +more test +\warn foo = :foo +foo = bar -- tests for \if ... \endif \if true select 'okay'; diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql index 26f436a..2bae6c5 100644 --- a/src/test/regress/sql/psql.sql +++ b/src/test/regress/sql/psql.sql @@ -771,6 +771,24 @@ drop table psql_serial_tab; \pset expanded off \pset border 1 +-- \echo and allied features + +\echo this is a test +\echo -n this is a test without newline +\echo more test +\set foo bar +\echo foo = :foo + +\qecho this is a test +\qecho -n this is a test without newline +\qecho more test +\qecho foo = :foo + +\warn this is a test +\warn -n this is a test without newline +\warn more test +\warn foo = :foo + -- tests for \if ... \endif \if true
Re: [PATCH v5] Show detailed table persistence in \dt+
Alvaro Herrera writes: > On 2019-Jul-02, Tom Lane wrote: >> * The persistence description values ought to be translatable, as >> is the usual practice in describe.c. This is slightly painful >> because it requires tweaking the translate_columns[] values in a >> non-constant way, but it's not that bad. > LGTM. I only fear that the cols_so_far thing is easy to break, and the > breakage will be easy to miss. Yeah, but that's pretty true of all the translatability stuff in describe.c. I wonder if there's any way to set up tests for that. The fact that the .po files lag so far behind the source code seems like an impediment --- even if we made a test case that presumed --enable-nls and tried to exercise this, the lack of translations for the new words would get in the way for a long while. regards, tom lane
Re: [PATCH v5] Show detailed table persistence in \dt+
> On 2 Jul 2019, at 22:16, Tom Lane wrote: > even if we made a test case that presumed > --enable-nls and tried to exercise this, the lack of translations > for the new words would get in the way for a long while. For testing though, couldn’t we have an autogenerated .po which has a unique and predictable dummy value translation for every string (the string backwards or something), which can be used for testing? This is all hand-wavy since I haven’t tried actually doing it, but it seems a better option than waiting for .po files to be available. Or am I missing the point of the value of the discussed test? cheers ./daniel
Re: [PATCH v5] Show detailed table persistence in \dt+
On 2019-Jul-02, Daniel Gustafsson wrote: > > On 2 Jul 2019, at 22:16, Tom Lane wrote: > > > even if we made a test case that presumed > > --enable-nls and tried to exercise this, the lack of translations > > for the new words would get in the way for a long while. > > For testing though, couldn’t we have an autogenerated .po which has a unique > and predictable dummy value translation for every string (the string backwards > or something), which can be used for testing? This is all hand-wavy since I > haven’t tried actually doing it, but it seems a better option than waiting for > .po files to be available. Or am I missing the point of the value of the > discussed test? Hmm, no, I think that's precisely it, and that sounds like a pretty good starter idea ... but I wouldn't want to be the one to have to set this up -- it seems pretty laborious. Anyway I'm not objecting to the patch -- I agree that we're already not testing translatability and that this patch shouldn't be forced to start doing it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH v5] Show detailed table persistence in \dt+
> On 2 Jul 2019, at 22:35, Alvaro Herrera wrote: > Anyway I'm not objecting to the patch -- I agree that we're already not > testing translatability and that this patch shouldn't be forced to start > doing it. I forgot to add that to my previous email, the patch as it stands in v8 looks good to me. I’ve missed having this on many occasions. cheers ./daniel
Re: "long" type is not appropriate for counting tuples
Peter Eisentraut writes: > Attached is a patch to implement this in a handful of cases that are > particularly verbose right now. I think those are easy wins. > (Also a second patch that makes use of %zu for size_t where this was not > yet done.) I took a look through these and see nothing objectionable. There are probably more places that can be improved, but we need not insist on getting every such place in one go. Per Robert's position that variables ought to have well-defined widths, there might be something to be said for not touching the variable declarations that you changed from int64 to long long, and instead casting them to long long in the sprintf calls. But I'm not really convinced that that's better than what you've done. Marked CF entry as ready-for-committer. regards, tom lane
Re: benchmarking Flex practices
John Naylor writes: > 0001 is a small patch to remove some unneeded generality from the > current rules. This lowers the number of elements in the yy_transition > array from 37045 to 36201. I don't particularly like 0001. The two bits like this -whitespace ({space}+|{comment}) +whitespace ({space}|{comment}) seem likely to create performance problems for runs of whitespace, in that the lexer will now have to execute the associated action once per space character not just once for the whole run. Those actions are empty, but I don't think flex optimizes for that, and it's really flex's per-action overhead that I'm worried about. Note the comment in the "Performance" section of the flex manual: Another area where the user can increase a scanner's performance (and one that's easier to implement) arises from the fact that the longer the tokens matched, the faster the scanner will run. This is because with long tokens the processing of most input characters takes place in the (short) inner scanning loop, and does not often have to go through the additional work of setting up the scanning environment (e.g., `yytext') for the action. There are a bunch of higher-order productions that use "{whitespace}*", which is surely a bit redundant given the contents of {whitespace}. But maybe we could address that by replacing "{whitespace}*" with "{opt_whitespace}" defined as opt_whitespace ({space}*|{comment}) Not sure what impact if any that'd have on table size, but I'm quite sure that {whitespace} was defined with an eye to avoiding unnecessary lexer action cycles. As for the other two bits that are like -. { - /* This is only needed for \ just before EOF */ +\\ { my recollection is that those productions are defined that way to avoid a flex warning about not all possible input characters being accounted for in the (resp. ) state. Maybe that warning is flex-version-dependent, or maybe this was just a worry and not something that actually produced a warning ... but I'm hesitant to change it. If we ever did get to flex's default action, that action is to echo the current input character to stdout, which would be Very Bad. As far as I can see, the point of 0002 is to have just one set of flex rules for the various variants of quotecontinue processing. That sounds OK, though I'm a bit surprised it makes this much difference in the table size. I would suggest that "state_before" needs a less generic name (maybe "state_before_xqs"?) and more than no comment. Possibly more to the point, it's not okay to have static state variables in the core scanner, so that variable needs to be kept in yyextra. (Don't remember offhand whether it's any more acceptable in the other scanners.) regards, tom lane
Re: Fix two issues after moving to unified logging system for command-line utils
On 2019-07-01 16:18, Alexey Kondratov wrote: > I have found two minor issues with unified logging system for > command-line programs (commited by Peter cc8d415117), while was rebasing > my pg_rewind patch: Fixed, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Adversarial case for "many duplicates" nbtree split strategy in v12
The single largest benefit of the v12 nbtree enhancement was its adaptability with indexes where a portion of the key space contains many duplicates. Successive page splits choose split points in a way that leaves duplicates together on their own page, and eventually pack pages full of duplicates. I have thought up a specific case where the logic can be fooled into consistently doing the wrong thing, leading to very poor space utilization: drop table if exists descnums; create table descnums(nums int4); create index bloat_desc on descnums (nums desc nulls first); -- Fill first leaf page (leaf root page) with NULL values, to the point where it almost splits: insert into descnums select null from generate_series(0,400); -- Insert integers, which will be treated as descending insertions within index: insert into descnums select i from generate_series(0,1) i; -- Observe if we had 50:50 page splits here: create extension if not exists pgstattuple; select avg_leaf_density, leaf_fragmentation from pgstatindex('bloat_desc'); The final output looks like this: avg_leaf_density | leaf_fragmentation --+ 1.83 | 99.88 (1 row) Although the case is contrived, it is clearly not okay that this is possible -- avg_leaf_density should be about 50 here, which is what you'll see on v11. You'll also see an avg_leaf_density that's at least 50 if you vary any of the details. For example, if you remove "nulls first" then you'll get an avg_leaf_density of ~50. Or, if you make the index ASC then the avg_leaf_density is almost exactly 90 for the usual reason (the NULLs won't "get in the way" of consistently getting rightmost splits that way). Note that I've deliberately arranged for the page splits to be as ineffective as possible by almost filling a leaf page with NULLs, leaving a tiny gap for all future non-NULL integer insertions. This is another case where a bimodal distribution causes trouble when combined with auto-incrementing insertions -- it is slightly similar to the TPC-C issue that the v12 work fixed IMV. You could say that the real root of the problem here is one of two things, depending on your perspective: 1. Arguably, nbtsplitloc.c is already doing the right thing here, and the root of the issue is that _bt_truncate() lacks any way of generating a new high key that is "mid way between" the value NULL in the lastleft tuple and the integer in the firstright tuple during the first split. If _bt_truncate() created a "mid point value" of around INT_MAX/2 for the new high key during the first split, then everything would work out -- we wouldn't keep splitting the same leftmost page again and again. The behavior would stabilize in the same way as it does in the ASC + NULLS LAST case, without hurting any other case that already works well. This isn't an academic point; we might actually need to do that in order to be able to pack the leaf page 90% full with DESC insertions, which ought to be a future goal for nbtsplitloc.c. But that's clearly not in scope for v12. 2. The other way you could look at it (which is likely to be the basis of my fix for v12) is that nbtsplitloc.c has been fooled into treating page splits as "many duplicate" splits, when in fact there are not that many duplicates involved -- there just appears to be many duplicates because they're so unevenly distributed on the page. 
It would be fine for it to be wrong if there was some way that successive page splits could course correct (see point 1), but that isn't possible here because of the skew -- we get stuck with the same lousy choice of split point again and again. (There also wouldn't be a problem if the integers values were random, since we'd have just one or two uneven splits at the start.) I've already written a rough patch that fixes the issue by taking this second view of the problem. The patch makes nbtsplitloc.c more skeptical about finishing with the "many duplicates" strategy, avoiding the problem -- it can just fall back on a 50:50 page split when it looks like this is happening (the related "single value" strategy must already so something similar in _bt_strategy()). Currently, it simply considers if the new item on the page has an offset number immediately to the right of the split point indicated by the "many duplicates" strategy. We look for it within ~10 offset positions to the right, since that strongly suggests that there aren't that many duplicates after all. I may make the check more careful still, for example by performing additional comparisons on the page to make sure that there are in fact very few distinct values on the whole page. My draft fix doesn't cause any regressions in any of my test cases -- the fix barely affects the splits chosen for my real-world test data, and TPC test data. As far as I know, I already have a comprehensive fix. I will need to think about it much more carefully before proceeding, though. Thoughts? -- Peter Geoghegan
Re: MSVC Build support with visual studio 2019
On Tue, Jul 02, 2019 at 02:10:11PM +0900, Michael Paquier wrote: > OK, committed to HEAD for now after perltidy'ing the patch. Let's see > what the buildfarm has to say about it first. Once we are sure that > the thing is stable, I'll try to backpatch it. This works on my own > dev machines with VS 2015 and 2019, but who knows what hides in the > shadows... The buildfarm did not have much to say, so backpatched down to 9.4, adjusting things on the way. -- Michael signature.asc Description: PGP signature
Re: Memory-Bounded Hash Aggregation
Hi Jeff, On Mon, Jul 01, 2019 at 12:13:53PM -0700, Jeff Davis wrote: This is for design review. I have a patch (WIP) for Approach 1, and if this discussion starts to converge on that approach I will polish and post it. Thanks for working on this. Let's start at the beginning: why do we have two strategies -- hash and sort -- for aggregating data? The two are more similar than they first appear. A partitioned hash strategy writes randomly among the partitions, and later reads the partitions sequentially; a sort will write sorted runs sequentially, but then read the among the runs randomly during the merge phase. A hash is a convenient small representation of the data that is cheaper to operate on; sort uses abbreviated keys for the same reason. What does "partitioned hash strategy" do? It's probably explained in one of the historical discussions, but I'm not sure which one. I assume it simply hashes the group keys and uses that to partition the data, and then passing it to hash aggregate. Hash offers: * Data is aggregated on-the-fly, effectively "compressing" the amount of data that needs to go to disk. This is particularly important when the data contains skewed groups (see below). * Can output some groups after the first pass of the input data even if other groups spilled. * Some data types only support hashing; not sorting. Sort+Group offers: * Only one group is accumulating at once, so if the transition state grows (like with ARRAY_AGG), it minimizes the memory needed. * The input may already happen to be sorted. * Some data types only support sorting; not hashing. Currently, Hash Aggregation is only chosen if the optimizer believes that all the groups (and their transition states) fit in memory. Unfortunately, if the optimizer is wrong (often the case if the input is not a base table), the hash table will keep growing beyond work_mem, potentially bringing the entire system to OOM. This patch fixes that problem by extending the Hash Aggregation strategy to spill to disk when needed. OK, makes sense. Previous discussions: https://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop https://www.postgresql.org/message-id/1419326161.24895.13.camel%40jeff-desktop https://www.postgresql.org/message-id/87be3bd5-6b13-d76e-5618-6db0a4db584d%40iki.fi A lot was discussed, which I will try to summarize and address here. Digression: Skewed Groups: Imagine the input tuples have the following grouping keys: 0, 1, 0, 2, 0, 3, 0, 4, ..., 0, N-1, 0, N Group 0 is a skew group because it consists of 50% of all tuples in the table, whereas every other group has a single tuple. If the algorithm is able to keep group 0 in memory the whole time until finalized, that means that it doesn't have to spill any group-0 tuples. In this example, that would amount to a 50% savings, and is a major advantage of Hash Aggregation versus Sort+Group. Right. I agree efficiently handling skew is important and may be crucial for achieving good performance. High-level approaches: 1. When the in-memory hash table fills, keep existing entries in the hash table, and spill the raw tuples for all new groups in a partitioned fashion. When all input tuples are read, finalize groups in memory and emit. Now that the in-memory hash table is cleared (and memory context reset), process a spill file the same as the original input, but this time with a fraction of the group cardinality. 2. 
When the in-memory hash table fills, partition the hash space, and evict the groups from all partitions except one by writing out their partial aggregate states to disk. Any input tuples belonging to an evicted partition get spilled to disk. When the input is read entirely, finalize the groups remaining in memory and emit. Now that the in-memory hash table is cleared, process the next partition by loading its partial states into the hash table, and then processing its spilled tuples.

3. Use some kind of hybrid[1][2] of hashing and sorting.

Unfortunately the second link does not work :-(

Evaluation of approaches:

Approach 1 is a nice incremental improvement on today's code. The final patch may be around 1KLOC. It's a single kind of on-disk data (spilled tuples), and a single algorithm (hashing). It also handles skewed groups well because the skewed groups are likely to be encountered before the hash table fills up the first time, and therefore will stay in memory.

I'm not going to block Approach 1, although I'd really like to see something that helps with array_agg.

Approach 2 is nice because it resembles the approach of Hash Join, and it can determine whether a tuple should be spilled without a hash lookup. Unfortunately, those upsides are fairly mild, and it has significant downsides:

* It doesn't handle skew values well because it's likely to evict them.

* If we leave part of the hash table in memory, it's difficult to ensure that we will be able to actually use the space freed by eviction, be
contrib make check-world fail if data have been modified and there's vpath
Hi,

Because the destination data file already exists, the src/makefiles/pgxs.mk line

ln -s $< $@

fails, and make clean doesn't remove these links. ln -sf is an option, but it's not tested in configure; neither is rm -f.

Regards
Didier
Re: Bug: ECPG: Cannot use CREATE AS EXECUTE statemnt
Matsumura-san, > I noticed that CREATE AS EXECUTE with using clause needs a new > implementation that all parameters in using clause must be embedded > into > expr-list of EXECUTE in text-format as the following because there is > no interface of protocol for our purpose. > It spends more time for implementing. Do you have any advice? > ... Unfortunately no, I have no advice. Originally all statements needed this treatment. :) Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [PATCH] memory leak in ecpglib
Hi,

> Memory leaks occur when the ecpg_update_declare_statement() is called
> the second time.
> ...

I'm going to commit this patch to HEAD; this way we can see if it works as advertised. It does not hurt if it gets removed by a rewrite.

Thanks for finding the issue,
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: Attached partition not considering altered column properties of root partition.
Hi Prabhat, On Tue, Jul 2, 2019 at 5:12 PM Prabhat Sahu wrote: > > Hi, > > In below testcase when I changed the staorage option for root partition, > newly attached partition not including the changed staorage option. > Is this an expected behavior? Thanks for the report. This seems like a bug. Documentation claims that the child tables inherit column storage options from the parent table. That's actually enforced in only some cases. 1. If you create the child table as a child to begin with (that is, not attach it as child after the fact): create table parent (a text); create table child () inherits (parent); select attrelid::regclass, attname, attstorage from pg_attribute where attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a'; attrelid │ attname │ attstorage ──┼─┼ parent │ a │ x child│ a │ x (2 rows) 2. If you change the parent's column's storage option, child's column is recursively changed. alter table parent alter a set storage main; select attrelid::regclass, attname, attstorage from pg_attribute where attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a'; attrelid │ attname │ attstorage ──┼─┼ parent │ a │ m child│ a │ m (2 rows) However, we fail to enforce the rule when the child is attached after the fact: create table child2 (a text); alter table child2 inherit parent; select attrelid::regclass, attname, attstorage from pg_attribute where attrelid in ('parent'::regclass, 'child'::regclass, 'child2'::regclass) and attname = 'a'; attrelid │ attname │ attstorage ──┼─┼ parent │ a │ m child│ a │ m child2 │ a │ x (3 rows) To fix this, MergeAttributesIntoExisting() should check that the attribute options of a child don't conflict with the parent, which the attached patch implements. Note that partitioning uses the same code as inheritance, so the fix applies to it too. After the patch: create table p (a int, b text) partition by list (a); create table p1 partition of p for values in (1); select attrelid::regclass, attname, attstorage from pg_attribute where attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b'; attrelid │ attname │ attstorage ──┼─┼ p│ b │ x p1 │ b │ x (2 rows) alter table p alter b set storage main; select attrelid::regclass, attname, attstorage from pg_attribute where attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b'; attrelid │ attname │ attstorage ──┼─┼ p│ b │ m p1 │ b │ m (2 rows) create table p2 (like p); select attrelid::regclass, attname, attstorage from pg_attribute where attrelid in ('p'::regclass, 'p1'::regclass, 'p2'::regclass) and attname = 'b'; attrelid │ attname │ attstorage ──┼─┼ p│ b │ m p1 │ b │ m p2 │ b │ x (3 rows) alter table p attach partition p2 for values in (2); ERROR: child table "p2" has different storage option for column "b" than parent DETAIL: EXTENDED versus MAIN -- ok after changing p2 to match alter table p2 alter b set storage main; alter table p attach partition p2 for values in (2); Thanks, Amit attstorage-inherit-bug.patch Description: Binary data
Re: TopoSort() fix
Could the attached patch fix this issue? Or does anyone else plan to fix it? If people are busy and don't have time, I can go ahead and fix it.

To fix this issue, do we need a patch for each official branch?

Regards,
Ruihai

On Tue, Jul 2, 2019 at 11:23 PM Tom Lane wrote:
> Rui Hai Jiang writes:
> > I think I found an issue in the TopoSort() function.
>
> This indeed seems like a live bug introduced by a1c1af2a.
> Robert?
>
> regards, tom lane
>
Re: progress report for ANALYZE
Hi Alvaro!

On 2019/06/22 3:52, Alvaro Herrera wrote:
Hello
Here's a patch that implements progress reporting for ANALYZE.

Sorry for the late reply. My email address was changed to tatsuro.yamada...@nttcom.co.jp.

I have a question about your patch. My ex-colleague Vinayak created the same patch in 2017 [1], and it couldn't get committed for several reasons, one being that the patch couldn't handle analyzing foreign tables. Therefore, I wonder whether your patch is able to do that or not. That said, I think it's okay even if your patch can't handle foreign tables, because the feature is useful for DBAs anyway.

I'll review your patch this week. :)

[1] ANALYZE command progress checker
https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

Regards,
Tatsuro Yamada
Re: Run-time pruning for ModifyTable
Hi David, On Thu, Jun 27, 2019 at 2:28 PM David Rowley wrote: > Deja vu from this time last year when despite everyone's best efforts > (mostly Alvaro) we missed getting run-time pruning in for MergeAppend > into the PG11 release. This year it was ModifyTable, which is now > possible thanks to Amit and Tom's modifications to the inheritance > planner. > > I've attached what I have so far for this. Thanks for working on this. IIUC, the feature is to skip modifying a given result relation if run-time pruning dictates that none of its existing rows will match some dynamically computable quals. > I think it's mostly okay, > but my brain was overheating a bit at the inheritance_planner changes. I think we need to consider the fact that there is a proposal [1] to get rid of inheritance_planner() as the way of planning UPDATE/DELETEs on inheritance trees. If we go that route, then a given partitioned target table will be expanded at the bottom and so, there's no need for ModifyTable to have its own run-time pruning info, because Append/MergeAppend will have it. Maybe, we will need some code in ExecInitModifyTable() and ExecModifyTable() to handle the case where run-time pruning, during plan tree initialization and plan tree execution respectively, may have rendered modifying a given result relation unnecessary. A cursory look at the patch suggests that most of its changes will be for nothing if [1] materializes. What do you think about that? Thanks, Amit [1] https://www.postgresql.org/message-id/357.1550612935%40sss.pgh.pa.us
Re: Another way to fix inherited UPDATE/DELETE
Hi Tom, On Wed, Feb 20, 2019 at 6:49 AM Tom Lane wrote: > Obviously this'd be a major rewrite with no chance of making it into v12, > but it doesn't sound too big to get done during v13. Are you planning to work on this? Thanks, Amit
Re: Where is SSPI auth username determined for TAP tests?
On Sun, Jun 30, 2019 at 12:09:18PM -0400, Tom Lane wrote: > We could apply the same hack on the source node, but I wonder if it > wouldn't be better to make this less of a kluge. I'm tempted to > propose that "pg_regress --config-auth --user XXX" should understand > XXX as the bootstrap superuser name, and then we could clean up > 010_dump_connstr.pl by using that. I have been reviewing that part, and the part to split the bootstrap user from the set of extra roles created looks fine to me. Now, it seems to me that you can simplify 010_dump_connstr.pl as per the attached because PostgresNode::Init can take care of --auth-config part with the correct options using auth_extra. What do you think about the cleanup attached? -- Michael diff --git a/src/bin/pg_dump/t/010_dump_connstr.pl b/src/bin/pg_dump/t/010_dump_connstr.pl index 048ecf24eb..e9f0ff8211 100644 --- a/src/bin/pg_dump/t/010_dump_connstr.pl +++ b/src/bin/pg_dump/t/010_dump_connstr.pl @@ -177,15 +177,11 @@ my $restore_super = qq{regress_a'b\\c=d\\ne"f}; # dbname/user connection parameters my $envar_node = get_new_node('destination_envar'); -$envar_node->init(extra => - [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]); -$envar_node->run_log( - [ - $ENV{PG_REGRESS}, '--config-auth', - $envar_node->data_dir, '--user', - $dst_bootstrap_super, '--create-role', - $restore_super - ]); +$envar_node->init( + extra => + [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ], + auth_extra => + [ '--user', $dst_bootstrap_super, '--create-role', $restore_super ]); $envar_node->start; # make superuser for restore @@ -210,15 +206,11 @@ is($stderr, '', 'no dump errors'); $restore_super =~ s/"//g if $TestLib::windows_os;# IPC::Run mishandles '"' on Windows my $cmdline_node = get_new_node('destination_cmdline'); -$cmdline_node->init(extra => - [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]); -$cmdline_node->run_log( - [ - $ENV{PG_REGRESS},'--config-auth', - $cmdline_node->data_dir, '--user', - $dst_bootstrap_super,'--create-role', - $restore_super - ]); +$cmdline_node->init( + extra => + [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ], + auth_extra => + [ '--user', $src_bootstrap_super, '--create-role', $restore_super ]); $cmdline_node->start; $cmdline_node->run_log( [ 'createuser', '-U', $dst_bootstrap_super, '-s', $restore_super ]); signature.asc Description: PGP signature
Re: [HACKERS] WIP: Aggregation push-down
On Fri, Mar 1, 2019 at 12:01 AM Antonin Houska wrote: > Tom Lane wrote: > > > Antonin Houska writes: > > > Michael Paquier wrote: > > >> Latest patch set fails to apply, so moved to next CF, waiting on > > >> author. > > > > > Rebased. > > > > This is in need of rebasing again :-(. I went ahead and pushed the 001 > > part, since that seemed fairly uncontroversial. > > ok, thanks. > Another rebase is needed for the patches. Thanks Richard
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Tue, Jul 2, 2019 at 1:47 PM amul sul wrote: > Attached version is rebase atop of the latest master head(c74d49d41c), thanks. Thanks! Will review. Best regards, Etsuro Fujita
Re: POC: converting Lists into arrays
On Tue, 2 Jul 2019 at 11:27, Tom Lane wrote: > My previous patch would have had you replace this with a loop using > an integer list-position index. You can still do that if you like, > but it's less change to convert the loop to a foreach(), drop the > prev/next variables, and replace the list_delete_cell call with > foreach_delete_current: > > foreach(cell, state->enterKeys) > { > TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell); > > if (need to delete) > state->enterKeys = > foreach_delete_current(state->enterKeys, > cell); > } > > So I think this is a win, and attached is v7. It's pretty nice to get rid of those. I also like you've handled the changes in SRFs. I've now read over the entire patch and have noted down the following: 1. MergeAttributes() could do with a comment mention why you're not using foreach() on the outer loop. I almost missed the list_delete_nth_cell() call that's a few branches deep in the outer loop. 2. In expandTupleDesc(), couldn't you just change the following: int i; for (i = 0; i < offset; i++) { if (aliascell) aliascell = lnext(eref->colnames, aliascell); } to: aliascell = offset < list_length(eref->colnames) ? list_nth_cell(eref->colnames, offset) : NULL; 3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell() ? 4. Do you think it would be a good idea to document that the 'pos' arg in list_insert_nth and co must be <= list_length(). I know you've mentioned that in insert_new_cell, but that's static and list_insert_nth is not. I think it would be better not to have to chase down comments of static functions to find out how to use an external function. 5. Why does enlarge_list() return List *? Can't it just return void? I noticed this after looking at add_new_cell_after() and reading your cautionary comment and then looking at lappend_cell(). At first, it seemed that lappend_cell() could end up reallocating List to make way for the new cell, but from looking at enlarge_list() it seems we always maintain the original allocation of the header. So why bother returning List * in that function? 6. Is there a reason to use memmove() in list_concat() rather than just memcpy()? I don't quite believe the comment you've written. As far as I can see you're not overwriting any useful memory so the order of the copy should not matter. 7. The last part of the following comment might not age well. /* * Note: unlike the individual-list-cell deletion functions, we never make * any effort to move the surviving list cells to new storage. This is * because none of them can move in this operation, so it's the same as * the old implementation in terms of what callers may assume. */ The old comment about extending the list seems more fitting. 9. I see your XXX comment in list_qsort(), but wouldn't it be better to just invent list_qsort_internal() as a static function and just have it qsort the list in-place, then make list_qsort just return list_qsort_internal(list_copy(list)); and keep the XXX comment so that the fixup would just remove the list_copy()? That way, when it comes to fixing that inefficiency we can just cut and paste the internal implementation into list_qsort(). It'll be much less churn, especially if you put the internal version directly below the external one. 10. I wonder if we can reduce a bit of pain for extension authors by back patching a macro that wraps up a lnext() call adding a dummy argument for the List. That way they don't have to deal with their own pre-processor version dependent code. 
Downsides are we'd need to keep the macro into the future, however, it's just 1 line of code... I also did some more benchmarking of the patch. Basically, I patched with the attached file (converted to .txt not to upset the CFbot) then ran make installcheck. This was done on an AWS m5d.large instance. The patch runs the planner 10 times then LOGs the average time of those 10 runs. Taking the sum of those averages I see: Master: 0.482479 seconds Patched: 0.471949 seconds Which makes the patched version 2.2% faster than master on that run. I've resisted attaching the spreadsheet since there are almost 22k data points per run. Apart from the 10 points above, I think the patch is good to go. I also agree with keeping the further improvements like getting rid of the needless list_copy() in list_concat() calls as a followup patch. I also agree with Tom about moving quickly with this one. Reviewing it in detail took me a long time, I'd really rather not do it again after leaving it to rot for a while. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 1bcfdd67e0..750320cb3b 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -22,