Re: [Proposal] Global temporary tables
On 12.01.2020 4:51, Tomas Vondra wrote: On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote: On 09.01.2020 19:48, Tomas Vondra wrote: The most complex and challenging task is to support GTT for all kinds of indexes. Unfortunately I cannot propose a good universal solution for it. Just patching all existing index implementations seems to be the only choice. I haven't looked at the indexing issue closely, but IMO we need to ensure that every session sees/uses only indexes on GTT that were defined before the session started using the table. Why? It contradicts the behavior of normal tables. Assume that you have active clients and at some point the DBA recognizes that they are spending too much time scanning some GTT. The DBA can create an index for this GTT, but if existing clients are not able to use this index, do we then have to somehow make these clients restart their sessions? In my patch I have implemented building indexes for GTT on demand: if an accessed index on a GTT is not yet initialized, it is filled with local data. Yes, I know the behavior would be different from behavior for regular tables. And yes, it would not allow fixing slow queries in sessions without interrupting those sessions. I proposed just ignoring those new indexes because it seems much simpler than alternative solutions that I can think of, and it's not like those other solutions don't have other issues. Quite the opposite: restricting sessions to the indexes created before the session first accessed the GTT requires more effort. We need to somehow maintain and check the GTT first-access time. For example, I've looked at the "on demand" building as implemented in global_private_temp-8.patch, and I kinda doubt it's the right approach; adding a bunch of index build calls into various places in index code seems somewhat suspicious. We in any case have to initialize GTT indexes on demand, even if we prohibit use of indexes created after the session's first access to the GTT. So the difference is only in one thing: should we just initialize an empty index or populate it with local data (if the rules for index usability are the same for GTT as for normal tables). From the implementation point of view there is no big difference. Actually, building the index in the standard way is even simpler than constructing an empty index. Originally I implemented the first approach (I just forgot to consider the case when the GTT was already used by a session). Then I rewrote it using the second approach and the patch even became simpler.

* brinbuild is added to brinRevmapInitialize, which is meant to initialize state for scanning. It seems wrong to build the index we're scanning from this function (layering and all that).
* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

As I already mentioned, support of indexes for GTT is one of the most challenging things in my patch. I didn't find a good and universal solution. So I agree that the call of btbuild from _bt_getbuf may be considered suspicious. I will be pleased if you or somebody else can propose a better alternative, and not only for B-Tree but for all other indexes. But as I already wrote above, prohibiting a session from using indexes created after its first access to the GTT doesn't solve the problem. For normal tables (and for local temp tables) indexes are initialized at the time of their creation. With GTT this doesn't work, because each session has its own local data for the GTT. We should either initialize/build the index on demand (when it is first accessed), or initialize indexes for all existing GTTs at the moment of session start.
The latter option seems much worse from my point of view: there may be a huge number of GTTs, and a session may not need to access GTTs at all. ... and so on for other index types. Also, what about custom indexes implemented in extensions? It seems a bit strange each of them has to support this separately. I have already complained about this: my patch supports GTT for all built-in indexes, but custom indexes have to handle it themselves. It looks like providing some generic solution requires extending the index API with two different operations: creation and initialization. But extending the index API is a very critical change... And it also doesn't solve the problem for all existing extensions: they would in any case have to be rewritten to implement the new API version in order to support GTT. IMHO if this really is the right solution, we need to make it work for existing indexes without having to tweak them individually. Why don't we track a flag whether an index on GTT was initialized in a given session, and if it was not then call the build function before calling any other function from the index AM? But let's talk about other issues caused by "on demand" build. Imagine you have 50 sessions, each using the same GTT with a GB of per-session data. Now you create a new index on the GTT, which forces each session to build its "local" index. Those builds will use maintenance_work_mem each
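To make the flag idea above a bit more concrete, here is a minimal sketch of what such a check might look like before entering any index AM routine. The GTT test macro, the per-session "initialized" flag, and the place where this helper would be called are assumptions for illustration only; they are not taken from any of the posted patches, and the index_build() call details are only approximate.

    /*
     * Hypothetical helper: make sure the session-local part of an index on a
     * global temporary table exists before any index AM function touches it.
     * RelationIsGlobalTemp() and rd_gtt_index_built are invented names, not
     * symbols from the actual GTT patches.
     */
    static void
    EnsureGTTIndexIsBuilt(Relation heapRel, Relation indexRel)
    {
        if (RelationIsGlobalTemp(indexRel) && !indexRel->rd_gtt_index_built)
        {
            IndexInfo  *indexInfo = BuildIndexInfo(indexRel);

            /* Populate the session-local index from the session-local heap. */
            index_build(heapRel, indexRel, indexInfo, false, false);
            indexRel->rd_gtt_index_built = true;
        }
    }

The point of such a wrapper would be to keep the "build on first use" logic in one place instead of scattering build calls through the individual AMs, which is exactly the layering concern raised above.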
Re: How to make a OpExpr check compatible among different versions
On 2020-01-13 08:29, Andy Fan wrote: During one of my works for logical rewrite, I want to check if the expr is a given Expr. so the simplest way is: if (expr->opno == 418 && nodeTag(linitial(expr->args)) == T_xxx && nodeTag(lsecond(expr->args)) == T_ ) { .. } if we write code like above, we may have issues if the oid changed in the future version. Generally, you would do this by using a preprocessor symbol. For example, instead of hardcoding the OID of the text type, you would use the symbol TEXTOID instead. Symbols like that exist for many catalog objects that one might reasonably need to hardcode. However, hardcoding an OID reference to an operator looks like a design mistake to me. Operators should normally be looked up via operator classes or similar structures that convey the meaning of the operator. Also, instead of nodeTag() == T_xxx you should use the IsA() macro. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
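As an illustrative aside, a hedged sketch of the style suggested above — a symbolic type OID plus IsA() checks, with the operator looked up through the type cache (utils/typcache.h) rather than hardcoded — might look like this; the variable names are invented for the example:

    /* expr is the Node * being tested; names here are illustrative only */
    if (IsA(expr, OpExpr))
    {
        OpExpr     *opexpr = (OpExpr *) expr;
        TypeCacheEntry *tce;

        /*
         * Look up the equality operator for the text type via the type
         * cache instead of hardcoding its OID.
         */
        tce = lookup_type_cache(TEXTOID, TYPECACHE_EQ_OPR);

        if (opexpr->opno == tce->eq_opr &&
            IsA(linitial(opexpr->args), Var) &&
            IsA(lsecond(opexpr->args), Const))
        {
            /* ... */
        }
    }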
Re: Comment fix in session.h
On Mon, Jan 13, 2020 at 12:51 PM Antonin Houska wrote: > > This diff fixes what I consider a typo. > LGTM. I'll push this in some time. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Question regarding heap_multi_insert documentation
On Mon, Jan 13, 2020 at 12:40:20AM +0100, Daniel Gustafsson wrote: > Thanks for clarifying. PFA tiny patch for this. Thanks, pushed. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Block level parallel vacuum
Hello > I just thought they were concerned > that the variable name skip_index might be confusing because we skip > if skip_index is NOT true. Right. >> > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || >> > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); >> > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || >> > + skip_parallel_vacuum_index(Irel[i], >> > + lps->lvshared)); >> > >> > The above condition is true when the index can *not* do parallel index >> vacuum. Ouch, right. I was wrong. (or the variable name and the comment really confused me) > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; Complex condition... Not sure. > How about changing it to skipped_index and change the comment to something > like “We are interested in only index skipped parallel vacuum”? I prefer this idea. > Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. +1 regards, Sergei
[PATCH] distinct aggregates within a window function WIP
Hi hackers, I want to propose to you an old patch for Postgres 11, off-site developed by Oliver Ford, but I have permission from him to publish it and to continue its development, that allows distinct aggregates, like select sum(distinct nums), within a window function. I have rebased it on the current git master branch and have made the necessary changes for it to work with Postgres 13devel. It's a WIP, because it doesn't have tests yet (I will add them later) and also, it works for int, float, and numeric types, but the distinct check can probably be rewritten to store the distinct elements in a hash table, which should give a performance improvement. If you find the implementation of the patch acceptable from a committer's perspective, I will address all your design and review notes and will try to move ahead with it; I will also add this patch to the March commitfest. As an example usage of the patch: if you have time series data, with current Postgres you will get an error:

postgres=# CREATE TABLE t_demo AS
postgres-#    SELECT ordinality, day, date_part('week', day) AS week
postgres-#    FROM generate_series('2020-01-02', '2020-01-15', '1 day'::interval)
postgres-#         WITH ORDINALITY AS day;
SELECT 14
postgres=# SELECT * FROM t_demo;
 ordinality |          day           | week
------------+------------------------+------
          1 | 2020-01-02 00:00:00+02 |    1
          2 | 2020-01-03 00:00:00+02 |    1
          3 | 2020-01-04 00:00:00+02 |    1
          4 | 2020-01-05 00:00:00+02 |    1
          5 | 2020-01-06 00:00:00+02 |    2
          6 | 2020-01-07 00:00:00+02 |    2
          7 | 2020-01-08 00:00:00+02 |    2
          8 | 2020-01-09 00:00:00+02 |    2
          9 | 2020-01-10 00:00:00+02 |    2
         10 | 2020-01-11 00:00:00+02 |    2
         11 | 2020-01-12 00:00:00+02 |    2
         12 | 2020-01-13 00:00:00+02 |    3
         13 | 2020-01-14 00:00:00+02 |    3
         14 | 2020-01-15 00:00:00+02 |    3
(14 rows)

postgres=# SELECT *,
postgres-#        array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(#        BETWEEN 2 PRECEDING AND 2 FOLLOWING)
postgres-# FROM t_demo;
ERROR:  DISTINCT is not implemented for window functions
LINE 2:        array_agg(DISTINCT week) OVER (ORDER BY day ROWS
               ^

So you will need to write something like this:

postgres=# SELECT *, (SELECT array_agg(DISTINCT unnest) FROM unnest(x)) AS b
postgres-# FROM
postgres-# (
postgres(#    SELECT *,
postgres(#           array_agg(week) OVER (ORDER BY day ROWS
postgres(#           BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS x
postgres(#    FROM t_demo
postgres(# ) AS a;
 ordinality |          day           | week |      x      |   b
------------+------------------------+------+-------------+-------
          1 | 2020-01-02 00:00:00+02 |    1 | {1,1,1}     | {1}
          2 | 2020-01-03 00:00:00+02 |    1 | {1,1,1,1}   | {1}
          3 | 2020-01-04 00:00:00+02 |    1 | {1,1,1,1,2} | {1,2}
          4 | 2020-01-05 00:00:00+02 |    1 | {1,1,1,2,2} | {1,2}
          5 | 2020-01-06 00:00:00+02 |    2 | {1,1,2,2,2} | {1,2}
          6 | 2020-01-07 00:00:00+02 |    2 | {1,2,2,2,2} | {1,2}
          7 | 2020-01-08 00:00:00+02 |    2 | {2,2,2,2,2} | {2}
          8 | 2020-01-09 00:00:00+02 |    2 | {2,2,2,2,2} | {2}
          9 | 2020-01-10 00:00:00+02 |    2 | {2,2,2,2,2} | {2}
         10 | 2020-01-11 00:00:00+02 |    2 | {2,2,2,2,3} | {2,3}
         11 | 2020-01-12 00:00:00+02 |    2 | {2,2,2,3,3} | {2,3}
         12 | 2020-01-13 00:00:00+02 |    3 | {2,2,3,3,3} | {2,3}
         13 | 2020-01-14 00:00:00+02 |    3 | {2,3,3,3}   | {2,3}
         14 | 2020-01-15 00:00:00+02 |    3 | {3,3,3}     | {3}
(14 rows)

With the attached version, you will get the desired results:

postgres=# SELECT *,
postgres-#        array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(#        BETWEEN 2 PRECEDING AND 2 FOLLOWING)
postgres-# FROM t_demo;
 ordinality |          day           | week | array_agg
------------+------------------------+------+-----------
          1 | 2020-01-02 00:00:00+02 |    1 | {1}
          2 | 2020-01-03 00:00:00+02 |    1 | {1}
          3 | 2020-01-04 00:00:00+02 |    1 | {1,2}
          4 | 2020-01-05 00:00:00+02 |    1 | {1,2}
          5 | 2020-01-06 00:00:00+02 |    2 | {1,2}
          6 | 2020-01-07 00:00:00+02 |    2 | {1,2}
          7 | 2020-01-08 00:00:00+02 |    2 | {2}
          8 | 2020-01-09 00:00:00+02 |    2 | {2}
          9 | 2020-01-10 00:00:00+02 |    2 | {2}
         10 | 2020-01-11 00:00:00+02 |    2 | {2,3}
         11 | 2020-01-12 00:00:00+02 |    2 | {2,3}
         12 | 2020-01-13 00:00:00+02 |    3 | {2,3}
         13 | 2020-01-14 00:00:00+02 |    3 | {2,3}
         14 | 2020-01-15 00:00:00+02 |    3 | {3}
(14 rows)

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 4cc7da268d..66
Re: How to make a OpExpr check compatible among different versions
On Mon, Jan 13, 2020 at 4:09 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2020-01-13 08:29, Andy Fan wrote: > > During one of my works for logical rewrite, I want to check if the expr > > is a given Expr. > > > > so the simplest way is: > > if (expr->opno == 418 && nodeTag(linitial(expr->args)) == T_xxx && > > nodeTag(lsecond(expr->args)) == T_ ) > > { > > .. > > } > > > > if we write code like above, we may have issues if the oid changed in > > the future version. > > Generally, you would do this by using a preprocessor symbol. For > example, instead of hardcoding the OID of the text type, you would use > the symbol TEXTOID instead. Symbols like that exist for many catalog > objects that one might reasonably need to hardcode. > > However, hardcoding an OID reference to an operator looks like a design > mistake to me. Operators should normally be looked up via operator > classes or similar structures that convey the meaning of the operator. > Yes, I just realized this. Thanks for your point! > Also, instead of nodeTag() == T_xxx you should use the IsA() macro. > > Thank you for this as well. > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
isTempNamespaceInUse() is incorrect with its handling of MyBackendId
Hi all, While reviewing some code in namespace.c, I have bumped into the following issue introduced by 246a6c8:

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..e70243a008 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,8 +3235,8 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;

The current logic of isTempNamespaceInUse() would cause a session calling the routine to always return false when checking whether its own temporary namespace is in use, but that's incorrect. It is actually safe to remove the check on MyBackendId, as the code falls back on a check equivalent to MyProc->tempNamespaceId a bit further down, as per the attached, so let's fix it. Thoughts? -- Michael

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..e70243a008 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,8 +3235,8 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;
 
 	/* Is the backend alive? */

signature.asc Description: PGP signature
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Thu, Jan 9, 2020 at 12:09 PM Amit Kapila wrote: > > On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar wrote: > > > > On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila wrote: > > > > > > On Wed, Jan 8, 2020 at 1:12 PM Dilip Kumar wrote: > > > > > > > > I have observed one more design issue. > > > > > > > > > > Good observation. > > > > > > > The problem is that when we > > > > get a toasted chunks we remember the changes in the memory(hash table) > > > > but don't stream until we get the actual change on the main table. > > > > Now, the problem is that we might get the change of the toasted table > > > > and the main table in different streams. So basically, in a stream, > > > > if we have only got the toasted tuples then even after > > > > ReorderBufferStreamTXN the memory usage will not be reduced. > > > > > > > > > > I think we can't split such changes in a different stream (unless we > > > design an entirely new solution to send partial changes of toast > > > data), so we need to send them together. We can keep a flag like > > > data_complete in ReorderBufferTxn and mark it complete only when we > > > are able to assemble the entire tuple. Now, whenever, we try to > > > stream the changes once we reach the memory threshold, we can check > > > whether the data_complete flag is true, if so, then only send the > > > changes, otherwise, we can pick the next largest transaction. I think > > > we can retry it for few times and if we get the incomplete data for > > > multiple transactions, then we can decide to spill the transaction or > > > maybe we can directly spill the first largest transaction which has > > > incomplete data. > > > > > Yeah, we might do something on this line. Basically, we need to mark > > the top-transaction as data-incomplete if any of its subtransaction is > > having data-incomplete (it will always be the latest sub-transaction > > of the top transaction). Also, for streaming, we are checking the > > largest top transaction whereas for spilling we just need the larget > > (sub) transaction. So we also need to decide while picking the > > largest top transaction for streaming, if we get a few transactions > > with in-complete data then how we will go for the spill. Do we spill > > all the sub-transactions under this top transaction or we will again > > find the larget (sub) transaction for spilling. > > > > I think it is better to do later as that will lead to the spill of > only required (minimum changes to get the memory below threshold) > changes. I think instead of doing this can't we just spill the changes which are in toast_hash. Basically, at the end of the stream, we have some toast tuple which we could not stream because we did not have the insert for the main table then we can spill only those changes which are in tuple hash. And, in the subsequent stream whenever we get the insert for the main table at that time we can restore those changes and stream. We can also maintain some flag saying data is not complete (with some change LSN number) and after that LSN we can spill any toast change to disk until we get the change for the main table, by this way we can avoid building tuple hash until we get the change for the main table. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
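For what it's worth, a very rough sketch of the "pick the largest transaction whose data is complete" idea discussed above could look like the following. The data_complete flag is purely hypothetical and the size accounting is simplified, so this only illustrates the control flow being discussed, not the actual patch:

    /* Sketch only: find the largest streamable top-level transaction. */
    static ReorderBufferTXN *
    ReorderBufferLargestCompleteTXN(ReorderBuffer *rb)
    {
        ReorderBufferTXN *largest = NULL;
        dlist_iter  iter;

        dlist_foreach(iter, &rb->toplevel_by_lsn)
        {
            ReorderBufferTXN *txn;

            txn = dlist_container(ReorderBufferTXN, node, iter.cur);

            /* Skip it if its toast chunks are not fully assembled yet. */
            if (!txn->data_complete)    /* hypothetical flag */
                continue;

            if (largest == NULL || txn->size > largest->size)
                largest = txn;
        }

        return largest;
    }

If no transaction qualifies, the caller would then fall back to spilling, which is the part of the design still under discussion above.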
Re: ALTER TABLE support for dropping generation expression
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

Thank you! Looks good to me. I have no further comments. I'll mark as ready for committer.

The new status of this patch is: Ready for Committer
Re: [HACKERS] Block level parallel vacuum
On Thu, Jan 9, 2020 at 4:03 PM Amit Kapila wrote: > > On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada > wrote: > > > > On Wed, 8 Jan 2020 at 22:16, Amit Kapila wrote: > > > > > > > > > What do you think of the attached? Sawada-san, kindly verify the > > > changes and let me know your opinion. > > > > I agreed to not include both the FAST option patch and > > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus > > on the main part and we can discuss and add them later if want. > > > > I've looked at the latest version patch you shared. Overall it looks > > good and works fine. I have a few small comments: > > > > I have addressed all your comments and slightly change nearby comments > and ran pgindent. I think we can commit the first two preparatory > patches now unless you or someone else has any more comments on those. > I have pushed the first one (4e514c6) and I am planning to commit the next one (API: v46-0001-Introduce-IndexAM-fields-for-parallel-vacuum) patch on Wednesday. We are still discussing a few things for the main parallel vacuum patch (v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel) which we should reach conclusion soon. In the attached, I have made a few changes in the comments of patch v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v46-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch Description: Binary data v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel.patch Description: Binary data
Re: Comment fix in session.h
On Mon, Jan 13, 2020 at 2:08 PM Amit Kapila wrote: > > On Mon, Jan 13, 2020 at 12:51 PM Antonin Houska wrote: > > > > This diff fixes what I consider a typo. > > > > LGTM. I'll push this in some time. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Fri, Jan 10, 2020 at 9:43 PM Alvaro Herrera wrote: > On 2020-Jan-09, Alvaro Herrera wrote: > > > I looked at this a little while and was bothered by the perl changes; it > > seems out of place to have RecursiveCopy be thinking about tablespaces, > > which is way out of its league. So I rewrote that to use a callback: > > the PostgresNode code passes a callback that's in charge to handle the > > case of a symlink. Things look much more in place with that. I didn't > > verify that all places that should use this are filled. > > > > In 0002 I found adding a new function unnecessary: we can keep backwards > > compat by checking 'ref' of the third argument. With that we don't have > > to add a new function. (POD changes pending.) > > I forgot to add that something in these changes is broken (probably the > symlink handling callback) so the tests fail, but I couldn't stay away > from my daughter's birthday long enough to figure out what or how. I'm > on something else today, so if one of you can research and submit fixed > versions, that'd be great. > > Thanks, > I spent some time on this before getting off work today. With the fix below, the 4th test is now OK, but the 5th (the last one) hangs due to a panic.

(gdb) bt
#0 0x003397e32625 in raise () from /lib64/libc.so.6
#1 0x003397e33e05 in abort () from /lib64/libc.so.6
#2 0x00a90506 in errfinish (dummy=0) at elog.c:590
#3 0x00a92b4b in elog_finish (elevel=22, fmt=0xb2d580 "cannot find directory %s tablespace %d database %d") at elog.c:1465
#4 0x0057aa0a in XLogLogMissingDir (spcNode=16384, dbNode=0, path=0x1885100 "pg_tblspc/16384/PG_13_202001091/16389") at xlogutils.c:104
#5 0x0065e92e in dbase_redo (record=0x1841568) at dbcommands.c:2225
#6 0x0056ac94 in StartupXLOG () at xlog.c:7200

diff --git a/src/include/commands/dbcommands.h b/src/include/commands/dbcommands.h
index b71b400e700..f8f6d5ffd03 100644
--- a/src/include/commands/dbcommands.h
+++ b/src/include/commands/dbcommands.h
@@ -19,8 +19,6 @@
 #include "lib/stringinfo.h"
 #include "nodes/parsenodes.h"
 
-extern void CheckMissingDirs4DbaseRedo(void);
-
 extern Oid createdb(ParseState *pstate, const CreatedbStmt *stmt);
 extern void dropdb(const char *dbname, bool missing_ok, bool force);
 extern void DropDatabase(ParseState *pstate, DropdbStmt *stmt);
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e6e7ea505d9..4eef8bb1985 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -615,11 +615,11 @@ sub _srcsymlink
 	my $srcrealdir = readlink($srcpath);
 
 	opendir(my $dh, $srcrealdir);
-	while (readdir $dh)
+	while (my $entry = (readdir $dh))
 	{
-		next if (/^\.\.?$/);
-		my $spath = "$srcrealdir/$_";
-		my $dpath = "$dstrealdir/$_";
+		next if ($entry eq '.' or $entry eq '..');
+		my $spath = "$srcrealdir/$entry";
+		my $dpath = "$dstrealdir/$entry";
 		RecursiveCopy::copypath($spath, $dpath);
 	}
 	closedir $dh;
Re: benchmarking Flex practices
On Mon, Jan 13, 2020 at 7:57 AM Tom Lane wrote: > > Hmm ... after a bit of research I agree that these functions are not > a portability hazard. They are present at least as far back as flex > 2.5.33 which is as old as we've got in the buildfarm. > > However, I'm less excited about them from a performance standpoint. > The BEGIN() macro expands to (ordinarily) > > yyg->yy_start = integer-constant > > which is surely pretty cheap. However, yy_push_state is substantially > more expensive than that, not least because the first invocation in > a parse cycle will involve a malloc() or palloc(). Likewise yy_pop_state > is multiple times more expensive than plain BEGIN(). > > Now, I agree that this is negligible for ECPG's usage, so if > pushing/popping state is helpful there, let's go for it. But I am > not convinced it's negligible for the backend, and I also don't > see that we actually need to track any nested scanner states there. > So I'd rather stick to using BEGIN in the backend. Not sure about > psql. Okay, removed in v11. The advantage of stack functions in ECPG was to avoid having the two variables state_before_str_start and state_before_str_stop. But if we don't use stack functions in the backend, then consistency wins in my mind. Plus, it was easier for me to revert the stack functions for all 3 scanners. > BTW, while looking through the latest patch it struck me that > "UCONST" is an underspecified and potentially confusing name. > It doesn't indicate what kind of constant we're talking about, > for instance a C programmer could be forgiven for thinking > it means something like "123U". What do you think of "USCONST", > following UIDENT's lead of prefixing U onto whatever the > underlying token type is? Makes perfect sense. Grepping through the source tree, indeed it seems the replication command scanner is using UCONST for digits. Some other cosmetic adjustments in ECPG parser.c: -Previously I had a WIP comment in about 2 functions that are copies from elsewhere. In v11 I just noted that they are copied. -I thought it'd be nicer if ECPG spelled UESCAPE in caps when reconstructing the string. -Corrected copy-paste-o in comment Also: -reverted some spurious whitespace changes -revised scan.l comment about the performance benefits of no backtracking -split the ECPG C-comment scanning cleanup into a separate patch, as I did for v6. I include it here since it's related (merging scanner states), but not relevant to making the core scanner smaller. -wrote draft commit messages -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services v11-0001-Reduce-size-of-backend-scanner-transition-array.patch Description: Binary data v11-0002-Merge-ECPG-scanner-states-regarding-C-comments.patch Description: Binary data
Re: our checks for read-only queries are not great
On 2020-01-10 14:41, Robert Haas wrote: This rule very nearly matches the current behavior: it explains why temp table operations are allowed, and why ALTER SYSTEM is allowed, and why REINDEX etc. are allowed. However, there's a notable exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed in a read-only transaction. Under the "doesn't change pg_dump output" criteria, the first and third ones should be permitted but COMMIT PREPARED should be denied, except maybe if the prepared transaction didn't do any writes (and in that case, why did we bother preparing it?). Despite that, this rule does a way better job explaining the current behavior than anything else suggested so far. I don't follow. Does pg_dump dump prepared transactions? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Allow to_date() and to_timestamp() to accept localized names
On Sat, Jan 11, 2020 at 5:06 PM Tomas Vondra wrote: > > Thanks. I did a quick review of this patch, and I think it's almost RFC. > > Thanks for reviewing. > - In func.sgml, it seems we've lost this bit: > > > TM does not include trailing blanks. > to_timestamp and to_date > ignore > the TM modifier. > > >Does that mean the function no longer ignore the TM modifier? That >would be somewhat problematic (i.e. it might break some user code). >But looking at the code I don't see why the patch would have this >effect, so I suppose it's merely a doc bug. > > It is intentional. This patch uses the TM modifier to identify the usage of localized names as input for to_timestamp() and to_date(). > - I don't think we need to include examples how to_timestmap ignores >case, I'd say just stating the fact is clear enough. But if we want to >have examples, I think we should not inline in the para but use the >established pattern: > > > Some examples: > > ... > > > >which is used elsewhere in the func.sgml file. > I was trying to match the style surrounding the usage notes for date/time formatting [1]. Agreed that it is not worth an example on its own, so dropped. > > - In formatting.c the "common/unicode_norm.h" should be right after >includes from "catalog/" to follow the alphabetic order (unless >there's a reason why that does not work). > Fixed. > > - I rather dislike the "dim" parameter name, because I immediately think >"dimension" which makes little sense. I suggest renaming to "nitems" >or "nelements" or something like that. > Agreed, using "nelements" as a better style matchup. Please, find attached a version addressing the above mentioned. [1] https://www.postgresql.org/docs/current/functions-formatting.html Regards, Juan José Santamaría Flecha > 0001-Allow-localized-month-names-to_date-v5.patch Description: Binary data
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 06:37:03PM +0900, Michael Paquier wrote: > Hi all, > > While reviewing some code in namespace.c, I have bumped into the > following issue introduced by 246a6c8: > diff --git a/src/backend/catalog/namespace.c > b/src/backend/catalog/namespace.c > index c82f9fc4b5..e70243a008 100644 > --- a/src/backend/catalog/namespace.c > +++ b/src/backend/catalog/namespace.c > @@ -3235,8 +3235,8 @@ isTempNamespaceInUse(Oid namespaceId) > > backendId = GetTempNamespaceBackendId(namespaceId); > > - if (backendId == InvalidBackendId || > - backendId == MyBackendId) > + /* No such temporary namespace? */ > + if (backendId == InvalidBackendId) > return false; > > The current logic of isTempNamespaceInUse() would cause a session > calling the routine to return always false if trying to check if its > own temporary session is in use, but that's incorrect. Indeed. > It is actually > safe to remove the check on MyBackendId as the code would fall back on > a check equivalent to MyProc->tempNamespaceId a bit down as per the > attached, so let's fix it. > > Thoughts? But that means an extraneous call to BackendIdGetProc() in that case, it seems better to avoid it if we already have the information.
Re: Add FOREIGN to ALTER TABLE in pg_dump
On Thu, Sep 26, 2019 at 7:17 PM Luis Carril wrote: > > > I don't disagree with adding FOREIGN, though. > > Your patch is failing the pg_dump TAP tests. Please use > configure --enable-tap-tests, fix the problems, then resubmit. > > Fixed, I've attached a new version. Will it be possible to add a test case for this? Can we validate it by adding one test? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: pg_basebackup fails on databases with high OIDs
On 2020-01-11 17:47, Magnus Hagander wrote: On Sat, Jan 11, 2020 at 5:44 PM Julien Rouhaud wrote: On Sat, Jan 11, 2020 at 08:21:11AM +0100, Peter Eisentraut wrote: On 2020-01-06 21:00, Magnus Hagander wrote: +0.5 to avoid calling OidInputFunctionCall() Or just directly using atol() instead of atoi()? Well maybe not directly but in a small wrapper that verifies it's not bigger than an unsigned? Unlike in cases where we use oidin etc, we are dealing with data that is "mostly trusted" here, aren't we? Meaning we could call atol() on it, and throw an error if it overflows, and be done with it? Subdirectories in the data directory aren't exactly "untrusted enduser data"... Yeah, it looks like we are using strtoul() without additional error checking in similar situations, so here is a patch doing it like that. - true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid); + true, isDbDir ? (Oid) strtoul(lastDir + 1, NULL, 10) : InvalidOid); Looking at some other code, I just discovered the atooid() macro that already does the same, maybe it'd be better for consistency to use that instead? +1. Whie it does the same thing, consistency is good! :) committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
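For reference, the atooid() macro mentioned above is essentially a one-line wrapper around the same strtoul() call, roughly:

    #define atooid(x)  ((Oid) strtoul((x), NULL, 10))

so switching to it changes no behavior, only the spelling, which is the consistency point being made here.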
Re: pg_basebackup fails on databases with high OIDs
On Mon, Jan 13, 2020 at 1:49 PM Peter Eisentraut wrote: > > On 2020-01-11 17:47, Magnus Hagander wrote: > > On Sat, Jan 11, 2020 at 5:44 PM Julien Rouhaud wrote: > >> > >> On Sat, Jan 11, 2020 at 08:21:11AM +0100, Peter Eisentraut wrote: > >>> On 2020-01-06 21:00, Magnus Hagander wrote: > > +0.5 to avoid calling OidInputFunctionCall() > > Or just directly using atol() instead of atoi()? Well maybe not > directly but in a small wrapper that verifies it's not bigger than an > unsigned? > > Unlike in cases where we use oidin etc, we are dealing with data that > is "mostly trusted" here, aren't we? Meaning we could call atol() on > it, and throw an error if it overflows, and be done with it? > Subdirectories in the data directory aren't exactly "untrusted enduser > data"... > >>> > >>> Yeah, it looks like we are using strtoul() without additional error > >>> checking > >>> in similar situations, so here is a patch doing it like that. > >> > >>> - true, > >>> isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid); > >>> + true, > >>> isDbDir ? (Oid) strtoul(lastDir + 1, NULL, 10) : InvalidOid); > >> > >> Looking at some other code, I just discovered the atooid() macro that > >> already > >> does the same, maybe it'd be better for consistency to use that instead? > > > > +1. Whie it does the same thing, consistency is good! :) > > committed Thanks!
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 01:09:01PM +0100, Julien Rouhaud wrote: > But that means an extraneous call to BackendIdGetProc() in that > case, it seems better to avoid it if we already have the information. Note that you cannot make a direct comparison of the result from GetTempNamespaceBackendId() with MyBackendId, because it is critical to be able to handle the case of a session which has not yet created an object in its own temp namespace (this way any temp objects from previous connections which used the same backend slot can be marked as orphaned and discarded by autovacuum, the whole point of 246a6c8). So in order to get a fast-exit path we could do the following:
- Add a routine GetTempNamespace which returns myTempNamespace (the result can be InvalidOid).
- Add an assertion at the beginning of isTempNamespaceInUse() to make sure that it never gets called with InvalidOid as input argument.
- Return true as a first step of GetTempNamespaceBackendId() if namespaceId matches GetTempNamespace().
What do you think? -- Michael signature.asc Description: PGP signature
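Purely as an illustration of the fast-exit idea listed above, the function could end up looking roughly like this; the new GetTempNamespace() routine is part of the proposal rather than existing code, and the tail of the function is only paraphrased from the current implementation:

    bool
    isTempNamespaceInUse(Oid namespaceId)
    {
        PGPROC     *proc;
        int         backendId;

        Assert(OidIsValid(namespaceId));

        /* Fast path: our own active temporary namespace is in use. */
        if (namespaceId == GetTempNamespace())
            return true;

        backendId = GetTempNamespaceBackendId(namespaceId);

        /* No such temporary namespace? */
        if (backendId == InvalidBackendId)
            return false;

        /* Is the backend alive and still owning this namespace? */
        proc = BackendIdGetProc(backendId);
        if (proc == NULL || proc->databaseId != MyDatabaseId ||
            proc->tempNamespaceId != namespaceId)
            return false;

        return true;
    }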
Re: Add pg_file_sync() to adminpack
On Sat, Jan 11, 2020 at 02:12:15AM +0900, Fujii Masao wrote: > I'm not sure if returning false with WARNING only in some error cases > is really good idea or not. At least for me, it's more intuitive to > return true on success and emit an ERROR otherwise. I'd like to hear > more opinions about this. Also if returning true on success is rather > confusing, we can change its return type to void. An advantage of not issuing an ERROR is that when working on a list of files (for example a WITH RECURSIVE on the whole data directory?), you can then know which files could not be synced, instead of seeing one ERROR about one file while being unsure about the state of the others. > Could you elaborate why? But if it's not good to sync the existing directory > in the regression test, we may need to give up testing the sync of directory. > Another idea is to add another function like pg_mkdir() into adminpack > and use the directory that we newly created by using that function, > for the test. Or better idea? We should avoid potentially costly tests in any regression scenario if we have a way to do so. I like your idea of having a pg_mkdir(); that feels more natural to have, as there is already pg_file_write(). -- Michael signature.asc Description: PGP signature
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 10:14:52PM +0900, Michael Paquier wrote: > On Mon, Jan 13, 2020 at 01:09:01PM +0100, Julien Rouhaud wrote: > > But that means an extraneous call to BackendIdGetProc() in that > > case, it seems better to avoid it if we already have the information. > > Note that you cannot make a direct comparison of the result from > GetTempNamespaceBackendId() with MyBackendId, because it is critical > to be able to handle the case of a session which has not yet created > an object in its own temp namespace (this way any temp objects from > previous connections which used the same backend slot can be marked as > orphaned and discarded by autovacuum, the whole point of 246a6c8). Oh right. > So in order to get a fast-exit path we could do the following: > - Add a routine GetTempNamespace which returns myTempNamespace (the > result can be InvalidOid). > - Add an assertion at the beginning of isTempNamespaceInUse() to make > sure that it never gets called with InvalidOid as input argument. > - Return true as a first step of GetTempNamespaceBackendId() if > namespaceId matches GetTempNamespace(). > > What do you think? Well, since isTempNamespaceInUse is for now only called by autovacuum, and shouldn't change soon, this really feels like premature optimization, so your original approach looks better.
Re: [PATCH] distinct aggregates within a window function WIP
Krasiyan Andreev writes: > I want to propose to you an old patch for Postgres 11, off-site developed > by Oliver Ford, > but I have permission from him to publish it and to continue it's > development, > that allow distinct aggregates, like select sum(distinct nums) within a > window function. I started to respond by asking whether that's well-defined, but reading down further I see that that's not actually what the feature is: what it is is attaching DISTINCT to a window function itself. I'd still ask whether it's well-defined though, or even minimally sensible. Window functions are generally supposed to produce one row per input row --- how does that square with the implicit row merging of DISTINCT? They're also typically row-order-sensitive --- how does that work with DISTINCT? Also, to the extent that this is sensible, can't you get the same results already today with appropriate use of window framing options? > It's a WIP, because it doesn't have tests yet (I will add them later) and > also, it works for a int, float, and numeric types, As a rule of thumb, operations like this should not be coded to be datatype-specific. We threw out some features in the original window function patch until they could be rewritten to not be limited to a hard-coded set of data types (cf commit 0a459cec9), and I don't see why we'd apply a lesser standard here. Certainly DISTINCT for aggregates has no such limitation. regards, tom lane
Re: Add FOREIGN to ALTER TABLE in pg_dump
vignesh C writes: > On Thu, Sep 26, 2019 at 7:17 PM Luis Carril wrote: >>> Your patch is failing the pg_dump TAP tests. Please use >>> configure --enable-tap-tests, fix the problems, then resubmit. >> Fixed, I've attached a new version. > Will it be possible to add a test case for this, can we validate by > adding one test? Isn't the change in the TAP test output sufficient? regards, tom lane
Re: Add pg_file_sync() to adminpack
On Mon, Jan 13, 2020 at 2:46 PM Michael Paquier wrote: > > On Sat, Jan 11, 2020 at 02:12:15AM +0900, Fujii Masao wrote: > > I'm not sure if returning false with WARNING only in some error cases > > is really good idea or not. At least for me, it's more intuitive to > > return true on success and emit an ERROR otherwise. I'd like to hear > > more opinions about this. Also if returning true on success is rather > > confusing, we can change its return type to void. > > An advantage of not issuing an ERROR is that when working on a list of > files (for example a WITH RECURSIVE on the whole data directory?), you > can then know which files could not be synced instead of seeing one > ERROR about one file, while being unsure about the state of the > others. Actually, can't it create a security hazard, for instance if you call pg_file_sync() on a heap file and the call errors out, since it's bypassing data_sync_retry?
Re: [PATCH] distinct aggregates within a window function WIP
Tom Lane schrieb am 13.01.2020 um 15:19: > what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? I find the example using array_agg() and cumulative window functions a bit confusing as well, but I think there are situations where having this is really helpful, e.g.: count(distinct some_column) over (partition by something) I know it's not an argument, but Oracle supports this and porting queries like that from Oracle to Postgres isn't really fun. Thomas
Re: How to retain lesser paths at add_path()?
The v2 patch is attached. This adds two dedicated lists to the RelOptInfo to preserve lesser paths when an extension asks to retain a path-node that would otherwise be removed in the usual manner. These lesser paths are kept in the separate lists, so they never expand the length of pathlist and partial_pathlist. That was the arguable point in the discussion last October. The new hook is called just before the path-node removal operation and gives the extension a chance to make an extra decision. If the extension considers that the path-node to be removed can still be used in the upper path construction stage, it can return 'true' as a signal to preserve this lesser path-node. In case the same kind of path-node already exists in the preserved_pathlist and the supplied lesser path-node is cheaper than the old one, the extension can remove the worse one arbitrarily to keep the length of preserved_pathlist under control. (E.g., PG-Strom may need one GpuJoin path-node in either pathlist or preserved_pathlist for a further opportunity of combined usage with a GpuPreAgg path-node. It just needs "the best GpuJoin path-node" somewhere, not two or more.) Because the PostgreSQL core has no information about which preserved path-nodes can be removed, extensions that use path_removal_decision_hook() have the responsibility to keep the length of preserved_(partial_)pathlist reasonable. BTW, add_path() now removes the lesser path-node with pfree(), not only detaching it from the path list (IndexPath is an exception). Does that really make sense? It only releases the path-node itself, so it may not release the entire object, and the efficiency of memory usage is therefore limited. And ForeignScan / CustomScan may reference the path-node to be removed. It seems to me there is no guarantee that lesser path-nodes other than IndexPath nodes are safe to release. Best regards, On Sat, Jan 11, 2020 at 21:27 Tomas Vondra wrote: > > On Sat, Jan 11, 2020 at 05:07:11PM +0900, Kohei KaiGai wrote: > >Hi, > > > >The proposition I posted at 10th-Oct proposed to have a separate list to > >retain > >lesser paths not to expand the path_list length, but here are no comments by > >others at that time. > >Indeed, the latest patch has not been updated yet. > >Please wait for a few days. I'll refresh the patch again. > > > > OK, thanks for the update. I've marked the patch as "waiting on author". > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei pgsql13-path_removal_decision_hook.v2.patch Description: Binary data
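To make the mechanism above a bit more tangible, here is a rough sketch of how an extension might hook in. The hook signature and the preserved_pathlist name are assumptions based only on the description in this mail, not the actual definitions from the attached patch:

    /* Sketch of an extension-side callback for the proposed hook. */
    static bool
    my_path_removal_decision(RelOptInfo *rel, Path *old_path)
    {
        /*
         * Ask the core to move our custom join path into the preserved
         * list instead of freeing it, so it can still be combined with
         * other custom paths at an upper planning stage.
         */
        if (IsA(old_path, CustomPath))
            return true;        /* keep it in rel->preserved_pathlist */

        return false;           /* discard as add_path() normally would */
    }

    void
    _PG_init(void)
    {
        path_removal_decision_hook = my_path_removal_decision;
    }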
Re: [PATCH] distinct aggregates within a window function WIP
I understand your note about datatype-specific operations, so I need to think about it more generically. About your additional note, I think that it is not easily possible to get the same result with appropriate use of window framing options, because "exclude ties" will not exclude the "current row" itself, only its peers. So that is the only difference, and the reason for the DISTINCT aggregate. Maybe, if we could specify "exclude ties" and "exclude current row" at the same time, there would be no need for DISTINCT, but right now I think that neither "exclude ties" nor "exclude groups" nor "exclude current row" can express it, because they can't be nested or used at the same time. On Mon, Jan 13, 2020 at 16:19 Tom Lane wrote: > Krasiyan Andreev writes: > > I want to propose to you an old patch for Postgres 11, off-site developed > > by Oliver Ford, > > but I have permission from him to publish it and to continue it's > > development, > > that allow distinct aggregates, like select sum(distinct nums) within a > > window function. > > I started to respond by asking whether that's well-defined, but > reading down further I see that that's not actually what the feature > is: what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? > > > It's a WIP, because it doesn't have tests yet (I will add them later) and > > also, it works for a int, float, and numeric types, > > As a rule of thumb, operations like this should not be coded to be > datatype-specific. We threw out some features in the original window > function patch until they could be rewritten to not be limited to a > hard-coded set of data types (cf commit 0a459cec9), and I don't see > why we'd apply a lesser standard here. Certainly DISTINCT for > aggregates has no such limitation. > > regards, tom lane >
Re: [PATCH] distinct aggregates within a window function WIP
On 13/01/2020 15:19, Tom Lane wrote: > Krasiyan Andreev writes: >> I want to propose to you an old patch for Postgres 11, off-site developed >> by Oliver Ford, >> but I have permission from him to publish it and to continue it's >> development, >> that allow distinct aggregates, like select sum(distinct nums) within a >> window function. > I started to respond by asking whether that's well-defined, but > reading down further I see that that's not actually what the feature > is: what it is is attaching DISTINCT to a window function itself. > I'd still ask whether it's well-defined though, or even minimally > sensible. Window functions are generally supposed to produce one > row per input row --- how does that square with the implicit row > merging of DISTINCT? They're also typically row-order-sensitive > --- how does that work with DISTINCT? It's a little strange because the spec says: If the window ordering clause or the window framing clause of the window structure descriptor that describes the is present, then no simply contained in shall specify DISTINCT or . So it seems to be well defined if all you have is a partition. But then it also says: DENSE_RANK() OVER WNS is equivalent to the : COUNT (DISTINCT ROW ( VE 1 , ..., VE N ) ) OVER (WNS1 RANGE UNBOUNDED PRECEDING) And that kind of looks like a framing clause there. > Also, to the extent that > this is sensible, can't you get the same results already today > with appropriate use of window framing options? I don't see how. I have sometimes wanted this feature so I am +1 on us getting at least a minimal form of it. -- Vik
Re: [Proposal] Global temporary tables
On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: On 12.01.2020 4:51, Tomas Vondra wrote: On Fri, Jan 10, 2020 at 11:47:42AM +0300, Konstantin Knizhnik wrote: On 09.01.2020 19:48, Tomas Vondra wrote: The most complex and challenging task is to support GTT for all kinds of indexes. Unfortunately I cannot propose a good universal solution for it. Just patching all existing index implementations seems to be the only choice. I haven't looked at the indexing issue closely, but IMO we need to ensure that every session sees/uses only indexes on GTT that were defined before the session started using the table. Why? It contradicts the behavior of normal tables. Assume that you have active clients and at some point the DBA recognizes that they are spending too much time scanning some GTT. The DBA can create an index for this GTT, but if existing clients are not able to use this index, do we then have to somehow make these clients restart their sessions? In my patch I have implemented building indexes for GTT on demand: if an accessed index on a GTT is not yet initialized, it is filled with local data. Yes, I know the behavior would be different from behavior for regular tables. And yes, it would not allow fixing slow queries in sessions without interrupting those sessions. I proposed just ignoring those new indexes because it seems much simpler than alternative solutions that I can think of, and it's not like those other solutions don't have other issues. Quite the opposite: restricting sessions to the indexes created before the session first accessed the GTT requires more effort. We need to somehow maintain and check the GTT first-access time.

Hmmm, OK. I'd expect such a check to be much simpler than the on-demand index building, but I admit I haven't tried implementing either of those options.

For example, I've looked at the "on demand" building as implemented in global_private_temp-8.patch, and I kinda doubt it's the right approach; adding a bunch of index build calls into various places in index code seems somewhat suspicious. We in any case have to initialize GTT indexes on demand, even if we prohibit use of indexes created after the session's first access to the GTT. So the difference is only in one thing: should we just initialize an empty index or populate it with local data (if the rules for index usability are the same for GTT as for normal tables). From the implementation point of view there is no big difference. Actually, building the index in the standard way is even simpler than constructing an empty index. Originally I implemented the first approach (I just forgot to consider the case when the GTT was already used by a session). Then I rewrote it using the second approach and the patch even became simpler.

* brinbuild is added to brinRevmapInitialize, which is meant to initialize state for scanning. It seems wrong to build the index we're scanning from this function (layering and all that).
* btbuild is called from _bt_getbuf. That seems a bit ... suspicious?

As I already mentioned, support of indexes for GTT is one of the most challenging things in my patch. I didn't find a good and universal solution. So I agree that the call of btbuild from _bt_getbuf may be considered suspicious. I will be pleased if you or somebody else can propose a better alternative, and not only for B-Tree but for all other indexes. But as I already wrote above, prohibiting a session from using indexes created after its first access to the GTT doesn't solve the problem. For normal tables (and for local temp tables) indexes are initialized at the time of their creation.
With GTT this doesn't work, because each session has its own local data for the GTT. We should either initialize/build the index on demand (when it is first accessed), or initialize indexes for all existing GTTs at the moment of session start. The latter option seems much worse from my point of view: there may be a huge number of GTTs, and a session may not need to access GTTs at all. ... and so on for other index types. Also, what about custom indexes implemented in extensions? It seems a bit strange each of them has to support this separately. I have already complained about this: my patch supports GTT for all built-in indexes, but custom indexes have to handle it themselves. It looks like providing some generic solution requires extending the index API with two different operations: creation and initialization. But extending the index API is a very critical change... And it also doesn't solve the problem for all existing extensions: they would in any case have to be rewritten to implement the new API version in order to support GTT.

Why not allow creating only indexes implementing this new API method (on GTT)?

IMHO if this really is the right solution, we need to make it work for existing indexes without having to tweak them individually. Why don't we track a flag whether an index on GTT was initialized in a given session, and if it was not then call the build function before calling any other function from the index AM?
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
On 2020-01-06 07:06, Tom Lane wrote: Hence, the attached revision only forces quoting in a SQL COPY command, or if the user already typed a quote. Yes, that seems better. Users tend to not like if tab completion messes with what they have already typed unless strictly necessary. The file name completion portion of this patch seems to work quite well now. I have found a weird behavior with identifier quoting, which is not the subject of this patch, but it appears to be affected by it. The good thing is that the new code will behave sensibly with select * from "pg_cl which the old code didn't offer anything on. The problem is that if you have create table "test""1" (a int); then the new code responds to select * from "te by making that select * from "te" whereas the old code curiously handled that perfectly. Neither the old nor the new code will produce anything from select * from te -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Block level parallel vacuum
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov wrote: > > Hi > Thank you for update! I looked again > > (vacuum_indexes_leader) > + /* Skip the indexes that can be processed by parallel workers > */ > + if (!skip_index) > + continue; > > Does the variable name skip_index not confuse here? Maybe rename to something > like can_parallel? > I looked into the code again and thought that if we can add a boolean flag (can_parallel) to the IndexBulkDeleteResult structure to identify whether an index supports parallel vacuum or not, then it will be easy to skip those indexes, and we will not call skip_parallel_vacuum_index multiple times (from vacuum_indexes_leader and parallel_vacuum_index). We can have a linked list of indexes that don't support parallel vacuum, which we can then pass directly to vacuum_indexes_leader. Ex: suppose we have 5 indexes on a table. Before launching parallel workers, we can add the boolean flag (can_parallel) to the IndexBulkDeleteResult structure to identify whether each index supports parallel vacuum or not. Say indexes 1 and 4 do not support parallel vacuum; then we already have the info in a linked list that 1->4 do not support parallel vacuum, so parallel_vacuum_index will process these indexes and the rest will be processed by parallel workers. If a parallel worker finds that can_parallel is false, it will skip that index. As per my understanding, if we implement this, then we can avoid calling skip_parallel_vacuum_index multiple times, and if there is no index that can't perform parallel vacuum, then we will not call vacuum_indexes_leader at all, as the head of the list will be pointing to NULL (we can save the unnecessary call of vacuum_indexes_leader). Thoughts? -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Re: Removing pg_pltemplate and creating "trustable" extensions
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > To be clear, I was advocating for a NEW DB-level privilege ('INSTALL' or > > 'CREATE EXTENSION' if we could make that work), so that we have it be > > distinct from CREATE (which, today, really means 'CREATE SCHEMA'). > > I still say this is wrong, or at least pointless, because it'd be a > right that any DB owner could grant to himself. Yes, of course it is, that the DB owner would have this privilege was something you agreed to in the prior email- I'd rather not just have a "if (DBOwner())" check, I'd rather use our actual privilege system and have this be a right that the DB owner has but can then GRANT out to others if they wish to. I'm certainly not suggesting that such a privilege wouldn't be controlled by the DB owner. Forcing it to only be allowed for the DB owner and not be something that the DB owner can GRANT out isn't much better than "if (superuser())"-style checks. > If we're to have any > meaningful access control on extension installation, the privilege > would have to be attached to some other object ... and there's no clear > candidate for what. Extensions are installed at the DB level, not at any other level, and therefore that's the appropriate place to attach them, which is exactly what I'm suggesting we do here. > As someone noted awhile back, if we could somehow > attach ACLs to potentially-installable extensions, that might be an > interesting avenue to pursue. That's well beyond what I'm willing > to pursue for v13, though. Sure, having some catalog of installable extensions where someone (in my thinking, the DB owner) could GRANT out access to install certain extensions to others might be interesting, but it's not what I'm suggesting here. > In the meantime, though, this idea as stated doesn't do anything except > let a DB owner grant install privileges to someone else. I'm not even > convinced that we want that, or that anyone needs it (I can recall zero > such requests related to PLs in the past). And for sure it does not > belong in a minimal implementation of this feature. Yes, that's what this approach would do. I suppose an alternative would be to lump it in with "CREATE" rights on the DB, but I've advocated and will continue to advocate for splitting up of such broad rights. DB-level CREATE rights currently cover both schemas and publications, for example, even though the two have rather little to do with each other. If the only agreeable option is a if (DBOwner())-type check, or lumping the privilege to CREATE (trusted) EXTENSION in with other DB-level CREATE rights, then I'll go along with one of those. I'll be happy enough with that, since it avoids having an additional default role that has to be GRANT'd by a superuser. Ideally, down the road, we'll split out the CREATE privilege (both at DB and at schema level) to be more fine grained, but that can certainly be done later. Thanks, Stephen signature.asc Description: PGP signature
Re: our checks for read-only queries are not great
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > On Fri, 2020-01-10 at 09:29 -0500, Tom Lane wrote: > > > ALTER SYSTEM is read only in my mind. > > > > I'm still having trouble with this conclusion. I think it can only > > be justified by a very narrow reading of "reflected in pg_dump" > > that relies on the specific factorization we've chosen for upgrade > > operations, ie that postgresql.conf mods have to be carried across > > by hand. But that's mostly historical baggage, rather than a sane > > basis for defining "read only". If somebody comes up with a patch > > that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for > > whatever is in postgresql.auto.conf (not an unreasonable idea BTW), > > will you then decide that ALTER SYSTEM SET is no longer read-only? > > I think that having ALTER SYSTEM commands in pg_dumpall output > would be a problem. It would cause all kinds of problems whenever > parameters change. Thinking of the transition "checkpoint_segments" > -> "max_wal_size", you'd have to build some translation magic into pg_dump. > Besides, such a feature would make it harder to restore a dump taken > with version x into version x + n for n > 0. pg_dump already specifically has an understanding of how to deal with old options in other things when constructing a dump for a given version- and we already have issues where a dump taken with pg_dump X has a good chance of not being able to be restored into a PG X+1; that's why it's recommended to use the pg_dump for the version of PG you're intending to restore into, so I don't particularly agree with any of the arguments presented above. > > Or, perhaps, reject such a patch on the grounds that it breaks this > > arbitrary definition of read-only-ness? > > I agree with Robert that such a patch should be rejected on other > grounds. > > Concerning the topic of the thread, I personally have come to think > that changing GUCs is *not* writing to the database. But that is based > on the fact that you can change GUCs on streaming replication standbys, > and it may be surprising to a newcomer. > > Perhaps it would be good to consider this question: > Do we call something "read-only" if it changes nothing, or do we call it > "read-only" if it is allowed on a streaming replication standby? > The first would be more correct, but the second may be more convenient. The two are distinct from each other and one doesn't imply the other. I don't think we need to, or really want to, force them to be the same. When we're talking about a "read-only" transaction that the user has specifically asked be "read-only" then, imv anyway, we should be pretty stringent regarding what that's allowed to do and shouldn't be allowing that to change state in the system which other processes will see after the transaction is over. A transaction (on a primary or a replica) doesn't need to be started as explicitly "read-only" and perhaps we should change the language when we are starting up to say "database is ready to accept replica connections" or something instead of "read-only" connections to clarify that. Thanks, Stephen signature.asc Description: PGP signature
Re: our checks for read-only queries are not great
On Mon, Jan 13, 2020 at 5:57 AM Peter Eisentraut wrote: > On 2020-01-10 14:41, Robert Haas wrote: > > This rule very nearly matches the current behavior: it explains why > > temp table operations are allowed, and why ALTER SYSTEM is allowed, > > and why REINDEX etc. are allowed. However, there's a notable > > exception: PREPARE, COMMIT PREPARED, and ROLLBACK PREPARED are allowed > > in a read-only transaction. Under the "doesn't change pg_dump output" > > criteria, the first and third ones should be permitted but COMMIT > > PREPARED should be denied, except maybe if the prepared transaction > > didn't do any writes (and in that case, why did we bother preparing > > it?). Despite that, this rule does a way better job explaining the > > current behavior than anything else suggested so far. > > I don't follow. Does pg_dump dump prepared transactions? No, but committing one changes the database contents as seen by a subsequent pg_dump. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: our checks for read-only queries are not great
On Mon, 2020-01-13 at 13:56 -0500, Stephen Frost wrote: > > I think that having ALTER SYSTEM commands in pg_dumpall output > > would be a problem. It would cause all kinds of problems whenever > > parameters change. Thinking of the transition "checkpoint_segments" > > -> "max_wal_size", you'd have to build some translation magic into pg_dump. > > Besides, such a feature would make it harder to restore a dump taken > > with version x into version x + n for n > 0. > > pg_dump already specifically has understanding of how to deal with old > options in other things when constructing a dump for a given version- > and we already have issues that a dump taken with pg_dump X has a good > chance of now being able to be restoreding into a PG X+1, that's why > it's recommended to use the pg_dump for the version of PG you're > intending to restore into, so I don't particularly agree with any of the > arguments presented above. Right. But increasing the difficulty of restoring a version x pg_dump with version x + 1 is still not a thing we should do lightly. Note that the docs currently say "it is recommended to use pg_dumpall from the newer version". They don't say "it is not supported". Yours, Laurenz Albe
Re: our checks for read-only queries are not great
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > On Mon, 2020-01-13 at 13:56 -0500, Stephen Frost wrote: > > > I think that having ALTER SYSTEM commands in pg_dumpall output > > > would be a problem. It would cause all kinds of problems whenever > > > parameters change. Thinking of the transition "checkpoint_segments" > > > -> "max_wal_size", you'd have to build some translation magic into > > > pg_dump. > > > Besides, such a feature would make it harder to restore a dump taken > > > with version x into version x + n for n > 0. > > > > pg_dump already specifically has understanding of how to deal with old > > options in other things when constructing a dump for a given version- > > and we already have issues that a dump taken with pg_dump X has a good > > chance of now being able to be restoreding into a PG X+1, that's why > > it's recommended to use the pg_dump for the version of PG you're > > intending to restore into, so I don't particularly agree with any of the > > arguments presented above. > > Right. > But increasing the difficulty of restoring a version x pg_dump with > version x + 1 is still not a thing we should lightly do. I've never heard that and I don't agree with it being a justification for blocking sensible progress. > Not that the docs currently say "it is recommended to use pg_dumpall > from the newer version". They don't say "is is not supported". It's recommended due to exactly the reasons presented and no one is saying that such isn't supported- but we don't and aren't going to guarantee that it's going to work. We absolutely know of cases where it just won't work, today. If that's justification for saying it's not supported, then fine, let's change the language to say that. Thanks, Stephen signature.asc Description: PGP signature
Re: [Proposal] Global temporary tables
On Mon, Jan 13, 2020 at 05:32:53PM +0100, Tomas Vondra wrote: > On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: > > > >"if any code tried to access the statistics directly from the table, > >rather than via the caches". > > > >Currently optimizer is accessing statistic though caches. So this > >approach works. If somebody will rewrite optimizer or provide own > >custom optimizer in extension which access statistic directly > >then it we really be a problem. But I wonder why bypassing catalog > >cache may be needed. > > > > I don't know, but it seems extensions like hypopg do it. AFAIR, hypopg only opens pg_statistic to use its tupledesc when creating statistics on hypothetical partitions, but it should otherwise never reads or need plain pg_statistic rows.
Re: benchmarking Flex practices
John Naylor writes: > [ v11 patch ] I pushed this with some small cosmetic adjustments. One non-cosmetic adjustment I experimented with was to change str_udeescape() to overwrite the source string in-place, since we know that's modifiable storage and de-escaping can't make the string longer. I reasoned that saving a palloc() might help reduce the extra cost of UESCAPE processing. It didn't seem to move the needle much though, so I didn't commit it that way. A positive reason to keep the API as it stands is that if we do something about the idea of allowing Unicode strings in non-UTF8 backend encodings, that'd likely break the assumption about how the string can't get longer. I'm about to go off and look at the non-UTF8 idea, btw. regards, tom lane
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
Peter Eisentraut writes: > The file name completion portion of this patch seems to work quite well now. Thanks for testing! > I have found a weird behavior with identifier quoting, which is not the > subject of this patch, but it appears to be affected by it. > The good thing is that the new code will behave sensibly with > select * from "pg_cl > which the old code didn't offer anything on. Check. > The problem is that if you have > create table "test""1" (a int); > then the new code responds to > select * from "te > by making that > select * from "te" > whereas the old code curiously handled that perfectly. Right. The underlying cause of both these changes seems to be that what readline passes to psql_completion is just the contents of the string, now that we've told it that double-quote is a string quoting character. So that works fine with 'pg_cl' which didn't need quoting anyway. In your second example, because we generate possible matches that are already quoted-if-necessary, we fail to find anything that bare 'te' is a prefix of, where before we were considering '"te' and it worked. I'll think about how to improve this. Given that we're dequoting filenames explicitly anyway, maybe we don't need to tell readline that double-quote is a quoting character. Another idea is that maybe we ought to refactor things so that identifier dequoting and requoting is handled explicitly, similarly to the way filenames work now. That would make the patch a lot bigger though. regards, tom lane
Re: Removing pg_pltemplate and creating "trustable" extensions
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> In the meantime, though, this idea as stated doesn't do anything except >> let a DB owner grant install privileges to someone else. I'm not even >> convinced that we want that, or that anyone needs it (I can recall zero >> such requests related to PLs in the past). And for sure it does not >> belong in a minimal implementation of this feature. > Yes, that's what this approach would do. I suppose an alternative would > be to lump it in with "CREATE" rights on the DB, but I've advocated and > will continue to advocate for splitting up of such broad rights. > DB-level CREATE rights currently cover both schemas and publications, > for example, even though the two have rather little to do with each > other. The patch as I'm proposing it has nothing to do with "CREATE" rights. You're attacking something different from what I actually want to do. regards, tom lane
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
On 31.12.2019 01:40, Peter Geoghegan wrote: On Mon, Dec 30, 2019 at 9:45 AM Robert Haas wrote: For example, float and numeric types are "never bitwise equal", while array, text, and other container types are "maybe bitwise equal". An array of integers or text with C collation can be treated as bitwise equal attributes, and it would be too harsh to restrict them from deduplication. We might as well support container types (like array) in the first Postgres version that has nbtree deduplication, I suppose. Even still, I don't think that it actually matters much to users. B-Tree indexes on arrays are probably very rare. Note that I don't consider text to be a container type here -- obviously btree/text_ops is a very important opclass for the deduplication feature. It may be the most important opclass overall. Recursively invoking a support function for the "contained" data type in the btree/array_ops support function seems like it might be messy. Not sure about that, though. What bothers me is that this option will unlikely be helpful on its own and we should also provide some kind of recheck function along with opclass, which complicates this idea even further and doesn't seem very clear. It seems like the simplest thing might be to forget about the 'char' column and just have a support function which can be used to assess whether a given opclass's notion of equality is bitwise. I like the idea of relying only on a support function. In attachment you can find the WIP patch that adds support function for btree opclasses. Before continuing, I want to ensure that I understood the discussion above correctly. Current version of the patch adds: 1) new syntax, which allow to provide support function: CREATE OPERATOR CLASS int4_ops_test FOR TYPE int4 USING btree AS OPERATOR 1 =(int4, int4), FUNCTION 1 btint4cmp(int4, int4), SUPPORT datum_image_eqisbitwise; We probably can add more words to specify the purpose of the support function. Do you have any other objections about the place of this new element in CreateOplcass syntax structure? 2) trivial support function that always returns true 'datum_image_eqisbitwise'. It is named after 'datum_image_eq', because we define this support function via its behavior. If this prototype is fine, I will continue this work and add support functions for other opclasses, update pg_dump and documentation. Thoughts? 
commit 9eb37de922c699208e52349494653241ddeb0116 Author: anastasia Date: Mon Jan 13 23:28:20 2020 +0300 v5-WIP-Opclass-bitwise-equality-0001.patch diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c index e2c6de457c..143ecd1e13 100644 --- a/src/backend/commands/opclasscmds.c +++ b/src/backend/commands/opclasscmds.c @@ -335,7 +335,8 @@ DefineOpClass(CreateOpClassStmt *stmt) storageoid, /* storage datatype oid, if any */ namespaceoid, /* namespace to create opclass in */ opfamilyoid, /* oid of containing opfamily */ -opclassoid; /* oid of opclass we create */ +opclassoid, /* oid of opclass we create */ +eqisbitwise_procoid; /* oid of opclass support function */ int maxOpNumber, /* amstrategies value */ maxProcNumber; /* amsupport value */ bool amstorage; /* amstorage flag */ @@ -564,6 +565,9 @@ DefineOpClass(CreateOpClassStmt *stmt) aclcheck_error_type(ACLCHECK_NOT_OWNER, storageoid); #endif break; + case OPCLASS_ITEM_SUPPORT_FUNCTION: +eqisbitwise_procoid = LookupFuncName(item->name->objname, 0, false, false); +break; default: elog(ERROR, "unrecognized item type: %d", item->itemtype); break; @@ -653,6 +657,7 @@ DefineOpClass(CreateOpClassStmt *stmt) values[Anum_pg_opclass_opcintype - 1] = ObjectIdGetDatum(typeoid); values[Anum_pg_opclass_opcdefault - 1] = BoolGetDatum(stmt->isDefault); values[Anum_pg_opclass_opckeytype - 1] = ObjectIdGetDatum(storageoid); + values[Anum_pg_opclass_opceqisbitwise - 1] = ObjectIdGetDatum(eqisbitwise_procoid); tup = heap_form_tuple(rel->rd_att, values, nulls); @@ -707,6 +712,15 @@ DefineOpClass(CreateOpClassStmt *stmt) recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL); } + /* dependency on support function */ + if (OidIsValid(eqisbitwise_procoid)) + { + referenced.classId = ProcedureRelationId; + referenced.objectId = eqisbitwise_procoid; + referenced.objectSubId = 0; + recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL); + } + /* dependency on owner */ recordDependencyOnOwner(OperatorClassRelationId, opclassoid, GetUserId()); diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 54ad62bb7f..b0cae4e0aa 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -3793,6 +3793,7 @@ _copyCreateOpClassStmt(const CreateOpClassStmt *from) COPY_NODE_FIELD(datatype); COPY_NODE_FIELD(items); COPY_SCALAR_FIELD(isDefault); + COPY_STRING_FIELD(eqisbitwise_fn); return newnode; } d
Re: Removing pg_pltemplate and creating "trustable" extensions
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> In the meantime, though, this idea as stated doesn't do anything except > >> let a DB owner grant install privileges to someone else. I'm not even > >> convinced that we want that, or that anyone needs it (I can recall zero > >> such requests related to PLs in the past). And for sure it does not > >> belong in a minimal implementation of this feature. > > > Yes, that's what this approach would do. I suppose an alternative would > > be to lump it in with "CREATE" rights on the DB, but I've advocated and > > will continue to advocate for splitting up of such broad rights. > > DB-level CREATE rights currently cover both schemas and publications, > > for example, even though the two have rather little to do with each > > other. > > The patch as I'm proposing it has nothing to do with "CREATE" rights. > You're attacking something different from what I actually want to do. Yes, as an aside, I'm argueing that we should split up the general CREATE privileges, but I also said that's not required for this. You're asking "what's the best way to add this privilege to PG?". I'm saying that it should be done through the privilege system, similar to publications. I'd prefer it not be lumped into CREATE, but that at least makes sense to me- adding a default role for this doesn't. I suppose making it akin to ALTER DATABASE and having it be limited to the DB owner is also alright (as I said in my last email) but it means that someone has to be given DB ownership rights in order to install extensions. I don't really see CREATE EXTENSION as being like ALTER DATABASE from a privilege perspective, but having it be DB owner still makes more sense than a default role for this. Thanks, Stephen signature.asc Description: PGP signature
Re: [Proposal] Global temporary tables
On Mon, Jan 13, 2020 at 09:12:38PM +0100, Julien Rouhaud wrote: On Mon, Jan 13, 2020 at 05:32:53PM +0100, Tomas Vondra wrote: On Mon, Jan 13, 2020 at 11:08:40AM +0300, Konstantin Knizhnik wrote: > >"if any code tried to access the statistics directly from the table, >rather than via the caches". > >Currently optimizer is accessing statistic though caches. So this >approach works. If somebody will rewrite optimizer or provide own >custom optimizer in extension which access statistic directly >then it we really be a problem. But I wonder why bypassing catalog >cache may be needed. > I don't know, but it seems extensions like hypopg do it. AFAIR, hypopg only opens pg_statistic to use its tupledesc when creating statistics on hypothetical partitions, but it should otherwise never reads or need plain pg_statistic rows. Ah, OK! Thanks for the clarification. I knew it does something with the catalog, didn't realize it only gets the descriptor. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add FOREIGN to ALTER TABLE in pg_dump
On 2020-Jan-11, Tomas Vondra wrote: > Hi, > > On Thu, Sep 26, 2019 at 01:47:28PM +, Luis Carril wrote: > > > > I don't disagree with adding FOREIGN, though. > > > > Your patch is failing the pg_dump TAP tests. Please use > > configure --enable-tap-tests, fix the problems, then resubmit. > > > > Fixed, I've attached a new version. > > This seems like a fairly small and non-controversial patch (I agree with > Alvaro that having the optional FOREIGN seems won't hurt). So barring > objections I'll polish it a bit and push sometime next week. If we're messing with that code, we may as well reduce cognitive load a little bit and unify all those multiple consecutive appendStringInfo calls into one. (My guess is that this was previously not possible because there were multiple fmtId() calls in the argument list, but that's no longer the case.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e369a559b2f96e3e73cf1bc4a1fefc9890243a7c Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 13 Jan 2020 12:33:21 -0300 Subject: [PATCH v3] pg_dump: Add FOREIGN to ALTER statements, if appropriate Author: Luis Carril Discussion: https://postgr.es/m/lejpr01mb0185a19b2e7c98e5e2a031f5e7...@lejpr01mb0185.deuprd01.prod.outlook.de --- src/bin/pg_dump/pg_dump.c | 90 --- 1 file changed, 47 insertions(+), 43 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 799b6988b7..2866c55e96 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -15583,6 +15583,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) { char *ftoptions = NULL; char *srvname = NULL; + char *foreign = ""; switch (tbinfo->relkind) { @@ -15616,6 +15617,8 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) ftoptions = pg_strdup(PQgetvalue(res, 0, i_ftoptions)); PQclear(res); destroyPQExpBuffer(query); + + foreign = "FOREIGN "; break; } case RELKIND_MATVIEW: @@ -15957,11 +15960,10 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) continue; appendPQExpBufferStr(q, "\n-- For binary upgrade, set up inherited constraint.\n"); -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, " ADD CONSTRAINT %s ", - fmtId(constr->dobj.name)); -appendPQExpBuffer(q, "%s;\n", constr->condef); +appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ADD CONSTRAINT %s %s;\n", + foreign, qualrelname, + fmtId(constr->dobj.name), + constr->condef); appendPQExpBufferStr(q, "UPDATE pg_catalog.pg_constraint\n" "SET conislocal = false\n" "WHERE contype = 'c' AND conname = "); @@ -15978,7 +15980,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) { TableInfo *parentRel = parents[k]; - appendPQExpBuffer(q, "ALTER TABLE ONLY %s INHERIT %s;\n", + appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s INHERIT %s;\n", foreign, qualrelname, fmtQualifiedDumpable(parentRel)); } @@ -16084,9 +16086,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) if (!shouldPrintColumn(dopt, tbinfo, j) && tbinfo->notnull[j] && !tbinfo->inhNotNull[j]) { -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, "ALTER COLUMN %s SET NOT NULL;\n", +appendPQExpBuffer(q, + "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET NOT NULL;\n", + foreign, qualrelname, fmtId(tbinfo->attnames[j])); } @@ -16097,11 +16099,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) */ if (tbinfo->attstattarget[j] >= 0) { -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, "ALTER COLUMN %s ", - fmtId(tbinfo->attnames[j])); 
-appendPQExpBuffer(q, "SET STATISTICS %d;\n", +appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET STATISTICS %d;\n", + foreign, qualrelname, + fmtId(tbinfo->attnames[j]), tbinfo->attstattarget[j]); } @@ -16134,11 +16134,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) */ if (storage != NULL) { - appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); - appendPQExpBuffer(q, "ALTER COLUMN %s ", - fmtId(tbinfo->attnames[j])); - appendPQExpBuffer(q, "SET STORAGE %s;\n", + appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s ALTER COLUMN %s SET STORAGE %s;\n", + foreign, qualrelname, + fmtId(tbinfo->attnames[j]), storage); } } @@ -16148,11 +16146,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) */ if (tbinfo->attoptions[j][0] != '\0') { -appendPQExpBuffer(q, "ALTER TABLE ONLY %s ", - qualrelname); -appendPQExpBuffer(q, "ALTER COLUMN %s ", - fmtId(tbinfo->attnam
Re: isTempNamespaceInUse() is incorrect with its handling of MyBackendId
On Mon, Jan 13, 2020 at 02:56:13PM +0100, Julien Rouhaud wrote: > Well, since isTempNamespaceInUse is for now only called by autovacuum, and > shouldn't change soon, this really feels premature optimzation, so your > original approach looks better. Yes, I'd rather keep this routine in its simplest shape for now. If the optimization makes sense, though in most cases it won't because it just helps sessions to detect faster their own temp schema, then let's do it. I'll let this patch aside for a couple of days to let others comment on it, and if there are no objections, I'll commit the fix. Thanks for the lookup! -- Michael signature.asc Description: PGP signature
Re: DROP OWNED CASCADE vs Temp tables
On 2020-Jan-07, Mithun Cy wrote: > I have a test where a user creates a temp table and then disconnect, > concurrently we try to do DROP OWNED BY CASCADE on the same user. Seems > this causes race condition between temptable deletion during disconnection > (@RemoveTempRelations(myTempNamespace)) and DROP OWNED BY CASCADE operation > which will try to remove same temp table when they find them as part of > pg_shdepend. Cute. This seems fiddly to handle better; maybe you'd have to have a new PERFORM_DELETION_* flag that says to ignore "missing" objects; so when you go from shdepDropOwned, you pass that flag all the way down to doDeletion(), so the objtype-specific function is called with "missing_ok", and ignore if the object has already gone away. That's tedious because none of the Remove* functions have the concept of missing_ok. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
On Mon, Jan 13, 2020 at 12:49 PM Anastasia Lubennikova wrote: > In attachment you can find the WIP patch that adds support function for > btree opclasses. Cool. Thanks! > Current version of the patch adds: > > 1) new syntax, which allow to provide support function: > > CREATE OPERATOR CLASS int4_ops_test > FOR TYPE int4 USING btree AS > OPERATOR 1 =(int4, int4), > FUNCTION 1 btint4cmp(int4, int4), > SUPPORT datum_image_eqisbitwise; Hmm. Do we really need these grammar changes? If so, why? I think that you wanted to make this something that could work with any opclass of any index access method, but I don't see that as a useful goal. (If it was useful, it could be considered later, on a case by case basis.) I imagined that this infrastructure would consist of inventing a new variety of B-Tree opclass support function -- something like sortsupport. You could generalize from the example of commit c6e3ac11b60, which added SortSupport functions (also known as "B-Tree support function 2" functions). You might also take a look at the much more recent commit 0a459cec96d, which added in_range functions (also known as "B-Tree support function 3"). Note that neither of those two commits had grammar changes for CREATE OPERATOR CLASS, or anything like that. What I have in mind is a "B-Tree support function 4", obviously. You should probably add a C function that's similar to PrepareSortSupportFromIndexRel()/FinishSortSupportFunction() that will be called from the B-Tree code. This will give a simple yes/no answer to the question: "Is it safe to apply deduplication to this Relation"? This C function will know to return false for an opclass that doesn't have any support function 4 set for any single attribute. It can provide a general overview of what we're telling the caller about the opclass here, etc. Another patch could add a similar function that works with a plain operator, a bit like PrepareSortSupportFromOrderingOp() -- but that isn't necessary now. I suppose that this approach requires something a bit like a struct SortSupportData, with filled-out collation information, etc. The nbtree code expects a simple yes/no answer based on all columns in the index, so it will be necessary to serialize that information to send it across the SQL function interface -- the pg_proc support function will have one argument of type "internal". And, I suppose that you'll also need some basic btvalidate() validation code. > We probably can add more words to specify the purpose of the support > function. Right -- some documentation is needed in btree.sgml, alongside the existing stuff for support functions 1, 2, and 3. > 2) trivial support function that always returns true > 'datum_image_eqisbitwise'. > It is named after 'datum_image_eq', because we define this support > function via its behavior. I like the idea of a generic, trivial SQL-callable function that all simple scalar types can use -- one that just returns true. Maybe we should call this general class of function an "image_equal" function, and refer to "image equality" in the btree.sgml docs. I don't think that using the term "bitwise" is helpful, since it sounds very precise but is actually slightly inaccurate (since TOASTable datums can be "image equal", but not bitwise equal according to datumIsEqual()). How does everyone feel about "image equality" as a name? As I said before, it seems like a good idea to tie this new infrastructure to existing infrastructure used for things like REFRESH MATERIALIZED VIEW CONCURRENTLY. 
> If this prototype is fine, I will continue this work and add support > functions for other opclasses, update pg_dump and documentation. If this work is structured as a new support function, then it isn't really a user-visible feature -- you won't need pg_dump support, psql support, etc. Most of the documentation will be for operator class authors rather than regular users. We can document the specific opclasses that have support for deduplication later, if at all. I think it would be fine if the deduplication docs (not the docs for this infrastructure) just pointed out specific cases that we *cannot* support -- there are not many exceptions (numeric, text with a nondeterministic collation, a few others like that). -- Peter Geoghegan
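To make that concrete, registering such a routine would look like registering any other numbered B-Tree support function, with no new CREATE OPERATOR CLASS syntax. A hypothetical sketch, reusing the trivial function from the posted patch and the single argument of type "internal" suggested above (neither the support number nor this signature exists yet):

ALTER OPERATOR FAMILY integer_ops USING btree
    ADD FUNCTION 4 (int4) datum_image_eqisbitwise(internal);

-- i.e. an "image equality" function would sit alongside the comparison (1),
-- sortsupport (2) and in_range (3) support functions.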
Additional improvements to extended statistics
Hi, Now that I've committed [1] which allows us to use multiple extended statistics per table, I'd like to start a thread discussing a couple of additional improvements for extended statistics. I've considered starting a separate patch for each, but that would be messy as those changes will touch roughly the same places. So I've organized it into a single patch series, with the simpler parts at the beginning. There are three main improvements: 1) improve estimates of OR clauses Until now, OR clauses pretty much ignored extended statistics, based on the experience that they're less vulnerable to misestimates. But it's a bit weird that AND clauses are handled while OR clauses are not, so this extends the logic to OR clauses. Status: I think this is fairly OK. 2) support estimating clauses (Var op Var) Currently, we only support clauses with a single Var, i.e. clauses like - Var op Const - Var IS [NOT] NULL - [NOT] Var - ... and AND/OR clauses built from those simple ones. This patch adds support for clauses of the form (Var op Var), of course assuming both Vars come from the same relation. Status: This works, but it feels a bit hackish. Needs more work. 3) support extended statistics on expressions Currently we only allow simple references to columns in extended stats, so we can do CREATE STATISTICS s ON a, b, c FROM t; but not CREATE STATISTICS s ON (a+b), (c + 1) FROM t; This patch aims to allow this. At the moment it's a WIP - it does most of the catalog changes and stats building, but with some hacks/bugs. And it does not even try to use those statistics during estimation. The first question is how to extend the current pg_statistic_ext catalog to support expressions. I've been planning to do it the way we support expressions for indexes, i.e. have two catalog fields - one for keys, one for expressions. One difference is that for statistics we don't care about order of the keys, so that we don't need to bother with storing 0 keys in place for expressions - we can simply assume keys are first, then expressions. And this is what the patch does now. I'm however wondering whether to keep this split - why not to just treat everything as expressions, and be done with it? A key just represents a Var expression, after all. And it would massively simplify a lot of code that now has to care about both keys and expressions. Of course, expressions are a bit more expensive, but I wonder how noticeable that would be. Opinions? 
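To make the three improvements a bit more tangible, here is roughly the kind of query each one targets (a sketch only; the expression syntax in (3) is of course hypothetical until that part of the series lands):

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 100000) i;
CREATE STATISTICS s (mcv) ON a, b FROM t;
ANALYZE t;

-- (1) apply the MCV list to OR clauses, not just AND clauses
EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;

-- (2) estimate (Var op Var) clauses using the same statistics
EXPLAIN SELECT * FROM t WHERE a = b;

-- (3) hypothetical, once statistics on expressions are supported
-- CREATE STATISTICS s2 ON (a + b), (a % 10) FROM t;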
ragards [1] https://commitfest.postgresql.org/26/2320/ -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e8714d7edbfbafd3203623680e290d00ec3f1f8c Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 2 Dec 2019 23:02:17 +0100 Subject: [PATCH 1/3] Support using extended stats for parts of OR clauses --- src/backend/optimizer/path/clausesel.c| 88 +++ src/backend/statistics/extended_stats.c | 56 +--- src/backend/statistics/mcv.c | 5 +- .../statistics/extended_stats_internal.h | 3 +- src/include/statistics/statistics.h | 3 +- 5 files changed, 120 insertions(+), 35 deletions(-) diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index a3ebe10592..8ff756bb31 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -92,7 +92,7 @@ clauselist_selectivity(PlannerInfo *root, */ s1 *= statext_clauselist_selectivity(root, clauses, varRelid, jointype, sjinfo, rel, - &estimatedclauses); + &estimatedclauses, false); } /* @@ -104,6 +104,68 @@ clauselist_selectivity(PlannerInfo *root, estimatedclauses); } +static Selectivity +clauselist_selectivity_or(PlannerInfo *root, + List *clauses, + int varRelid, + JoinType jointype, + SpecialJoinInfo *sjinfo) +{ + ListCell *lc; + Selectivity s1 = 0.0; + RelOptInfo *rel; + Bitmapset *estimatedclauses = NULL; + int idx; + + /* +* Determine if these clauses reference a single relation. If so, and if +* it has extended statistics, try to apply those. +*/ + rel = find_single_rel_for_clauses(root, clauses); + if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL) + { +
Re: Why is pq_begintypsend so slow?
Hi, On 2020-01-11 22:32:45 -0500, Tom Lane wrote: > I wrote: > > I saw at this point that the remaining top spots were > > enlargeStringInfo and appendBinaryStringInfo, so I experimented > > with inlining them (again, see patch below). That *did* move > > the needle: down to 72691 ms, or 20% better than HEAD. > > Oh ... marking the test in the inline part of enlargeStringInfo() > as unlikely() helps quite a bit more: 66100 ms, a further 9% gain. > Might be over-optimizing for this particular case, perhaps > > (But ... I'm not finding these numbers to be super reproducible > across different ASLR layouts. So take it with a grain of salt.) FWIW, I've also observed, in another thread (the node func generation thing [1]), that inlining enlargeStringInfo() helps a lot, especially when inlining some of its callers. Moving e.g. appendStringInfo() inline allows the compiler to sometimes optimize away the strlen. But if e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo() unconditionally, successive appends cannot optimize away memory accesses for ->len/->data. For the case of send functions, we really ought to have at least pq_begintypsend(), enlargeStringInfo() and pq_endtypsend() inline. That way the compiler ought to be able to avoid repeatedly loading/storing ->len, after the initial initStringInfo() call. Might even make sense to also have initStringInfo() inline, because then the compiler would probably never actually materialize the StringInfoData (and would automatically have good aliasing information too). The commit referenced above is obviously quite WIP-ey, and contains things that should be split into separate commits. But I think it might be worth moving more functions into the header, like I've done in that commit. The commit also adds appendStringInfoU?Int(32,64) operations - I've unsuprisingly found these to be *considerably* faster than going through appendStringInfoString(). > but I think that's a reasonable marking given that we overallocate > the stringinfo buffer for most uses. Wonder if it's worth providing a function to initialize the stringinfo differently for the many cases where we have at least a very good idea of how long the string will be. It's sad to allocate 1kb just for e.g. int4send to send an integer plus length. Greetings, Andres Freund [1] https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=commit;h=127e860cf65f50434e0bb97acbba4b0ea6f38cfd
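That last idea might amount to something like the sketch below; initStringInfoSized() is a made-up name for illustration, not an existing or proposed API:

/*
 * Sketch only: initialize a StringInfo with a caller-chosen initial
 * allocation instead of the hard-wired 1024 bytes, for callers (such as
 * the typsend functions) that know the result will be tiny.
 */
static inline void
initStringInfoSized(StringInfo str, int initsize)
{
    Assert(initsize > 0);
    str->data = (char *) palloc(initsize);
    str->maxlen = initsize;
    resetStringInfo(str);
}

int4send() could then start from, say, a 16-byte buffer instead of 1 kB, while enlargeStringInfo() would still cover the (unexpected) case of needing more.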
Unicode escapes with any backend encoding
I threatened to do this in another thread [1], so here it is. This patch removes the restriction that the server encoding must be UTF-8 in order to write any Unicode escape with a value outside the ASCII range. Instead, we'll allow the notation and convert to the server encoding if that's possible. (If it isn't, of course you get an encoding conversion failure.) In the cases that were already supported, namely ASCII characters or UTF-8 server encoding, this should be only immeasurably slower than before. Otherwise, it calls the appropriate encoding conversion procedure, which of course will take a little time. But that's better than failing, surely. One way in which this is slightly less good than before is that you no longer get a syntax error cursor pointing at the problematic escape when conversion fails. If we were really excited about that, something could be done with setting up an errcontext stack entry. But that would add a few cycles, so I wasn't sure whether to do it. Grepping for other direct uses of unicode_to_utf8(), I notice that there are a couple of places in the JSON code where we have a similar restriction that you can only write a Unicode escape in UTF8 server encoding. I'm not sure whether these same semantics could be applied there, so I didn't touch that. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw%40mail.gmail.com diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index c908e0b..e134877 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -189,6 +189,23 @@ UPDATE "my_table" SET "a" = 5; ampersands. The length limitation still applies. + +Quoting an identifier also makes it case-sensitive, whereas +unquoted names are always folded to lower case. For example, the +identifiers FOO, foo, and +"foo" are considered the same by +PostgreSQL, but +"Foo" and "FOO" are +different from these three and each other. (The folding of +unquoted names to lower case in PostgreSQL is +incompatible with the SQL standard, which says that unquoted names +should be folded to upper case. Thus, foo +should be equivalent to "FOO" not +"foo" according to the standard. If you want +to write portable applications you are advised to always quote a +particular name or never quote it.) + + Unicode escape in identifiers @@ -230,7 +247,8 @@ U&"d!0061t!+61" UESCAPE '!' The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character. Note that the escape character is -written in single quotes, not double quotes. +written in single quotes, not double quotes, +after UESCAPE. @@ -239,32 +257,18 @@ U&"d!0061t!+61" UESCAPE '!' -The Unicode escape syntax works only when the server encoding is -UTF8. When other server encodings are used, only code -points in the ASCII range (up to \007F) can be -specified. Both the 4-digit and the 6-digit form can be used to +Either the 4-digit or the 6-digit escape form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+, although the availability of the 6-digit form technically makes this unnecessary. (Surrogate -pairs are not stored directly, but combined into a single -code point that is then encoded in UTF-8.) +pairs are not stored directly, but are combined into a single +code point.) -Quoting an identifier also makes it case-sensitive, whereas -unquoted names are always folded to lower case. 
For example, the -identifiers FOO, foo, and -"foo" are considered the same by -PostgreSQL, but -"Foo" and "FOO" are -different from these three and each other. (The folding of -unquoted names to lower case in PostgreSQL is -incompatible with the SQL standard, which says that unquoted names -should be folded to upper case. Thus, foo -should be equivalent to "FOO" not -"foo" according to the standard. If you want -to write portable applications you are advised to always quote a -particular name or never quote it.) +If the server encoding is not UTF-8, the Unicode code point identified +by one of these escape sequences is converted to the actual server +encoding; an error is reported if that's not possible. @@ -427,25 +431,11 @@ SELECT 'foo' 'bar'; It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose - valid characters in the server character set encoding. When the - server encoding is UTF-8, then the Unicode escapes or the + valid characters in the server character set encoding. + A useful alternative
Re: Amcheck: do rightlink verification with lock coupling
On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra wrote: > Understood. Is that a reason to not commit of this patch now, though? It could use some polishing. Are you interested in committing it? -- Peter Geoghegan
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
On Fri, Jan 10, 2020 at 1:36 PM Peter Geoghegan wrote: > * HEIKKI: Do we only generate one posting list in one WAL record? I > would assume it's better to deduplicate everything on the page, since > we're modifying it anyway. Still thinking about this one. > * HEIKKI: Does xl_btree_vacuum WAL record store a whole copy of the > remaining posting list on an updated tuple? Wouldn't it be simpler and > more space-efficient to store just the deleted TIDs? This probably makes sense. The btreevacuumposting() code that generates "updated" index tuples (tuples that VACUUM uses to replace existing ones when some but not all of the TIDs need to be removed) was derived from GIN's ginVacuumItemPointers(). That approach works well enough, but we can do better now. It shouldn't be that hard. My preferred approach is a little different to your suggested approach of storing the deleted TIDs directly. I would like to make it work by storing an array of uint16 offsets into a posting list, one array per "updated" tuple (with one offset per deleted TID within each array). These arrays (which must include an array size indicator at the start) can appear in the xl_btree_vacuum record, at the same place the patch currently puts a raw IndexTuple. They'd be equivalent to a raw IndexTuple -- the REDO routine would reconstruct the same raw posting list tuple on its own. This approach seems simpler, and is clearly very space efficient. This approach is similar to the approach used by REDO routines to handle posting list splits. Posting list splits must call _bt_swap_posting() on the primary, while the corresponding REDO routines also call _bt_swap_posting(). For space efficient "updates", we'd have to invent a sibling utility function -- we could call it _bt_delete_posting(), and put it next to _bt_swap_posting() within nbtdedup.c. How do you feel about that approach? (And how do you feel about the existing "REDO routines call _bt_swap_posting()" business that it's based on?) > * HEIKKI: Would it be more clear to have a separate struct for the > posting list split case? (i.e. don't reuse xl_btree_insert) I've concluded that this one probably isn't worthwhile. We'd have to carry a totally separate record on the stack within _bt_insertonpg(). If you feel strongly about it, I will reconsider. -- Peter Geoghegan
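A rough sketch of what that xl_btree_vacuum payload could look like; the struct and macro names here are made up for illustration and may not match what the patch ends up doing:

/*
 * Sketch only: per-"updated"-tuple metadata in the xl_btree_vacuum record.
 * One of these, followed by ndeletedtids uint16 offsets into the existing
 * posting list, would replace the full replacement IndexTuple currently
 * logged; the REDO routine would rebuild the smaller posting list itself,
 * e.g. via a _bt_delete_posting() helper next to _bt_swap_posting().
 */
typedef struct xl_btree_update
{
    uint16      ndeletedtids;   /* number of uint16 offsets that follow */

    /* DELETED TID OFFSETS INTO POSTING LIST FOLLOW */
} xl_btree_update;

#define SizeOfBtreeUpdate   (offsetof(xl_btree_update, ndeletedtids) + sizeof(uint16))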
Re: DROP OWNED CASCADE vs Temp tables
On Mon, Jan 13, 2020 at 07:45:06PM -0300, Alvaro Herrera wrote: > This seems fiddly to handle better; maybe you'd have to have a new > PERFORM_DELETION_* flag that says to ignore "missing" objects; so when > you go from shdepDropOwned, you pass that flag all the way down to > doDeletion(), so the objtype-specific function is called with > "missing_ok", and ignore if the object has already gone away. That's > tedious because none of the Remove* functions have the concept of > missing_ok. Yes, that would be invasive and I'd rather not backpatch such a change but I don't see a better or cleaner way to handle that correctly either than the way you are describing. Looking at all the subroutines removing the objects by OID, a patch among those lines is repetitive, though not complicated to do. -- Michael signature.asc Description: PGP signature
Re: DROP OWNED CASCADE vs Temp tables
Alvaro Herrera writes: > On 2020-Jan-07, Mithun Cy wrote: >> I have a test where a user creates a temp table and then disconnect, >> concurrently we try to do DROP OWNED BY CASCADE on the same user. Seems >> this causes race condition between temptable deletion during disconnection >> (@RemoveTempRelations(myTempNamespace)) and DROP OWNED BY CASCADE operation >> which will try to remove same temp table when they find them as part of >> pg_shdepend. > Cute. Is this really any worse than any other attempt to issue two DROPs against the same object concurrently? Maybe we can just call it pilot error. > This seems fiddly to handle better; maybe you'd have to have a new > PERFORM_DELETION_* flag that says to ignore "missing" objects; so when > you go from shdepDropOwned, you pass that flag all the way down to > doDeletion(), so the objtype-specific function is called with > "missing_ok", and ignore if the object has already gone away. That's > tedious because none of the Remove* functions have the concept of > missing_ok. That seems fundamentally wrong. By the time we've queued an object for deletion in dependency.c, we have a lock on it, and we've verified that the object is still there (cf. systable_recheck_tuple calls). If shdepDropOwned is doing it differently, I'd say shdepDropOwned is doing it wrong. regards, tom lane
Re: Add FOREIGN to ALTER TABLE in pg_dump
On Mon, Jan 13, 2020 at 7:52 PM Tom Lane wrote: > > vignesh C writes: > > On Thu, Sep 26, 2019 at 7:17 PM Luis Carril wrote: > >>> Your patch is failing the pg_dump TAP tests. Please use > >>> configure --enable-tap-tests, fix the problems, then resubmit. > > >> Fixed, I've attached a new version. > > > Will it be possible to add a test case for this, can we validate by > > adding one test? > > Isn't the change in the TAP test output sufficient? > I could not see any expected file output changes in the patch. Should we modify the existing test to validate this? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Option to dump foreign data in pg_dump
On Fri, Nov 29, 2019 at 2:10 PM Luis Carril wrote: > > Luis, > > It seems you've got enough support for this concept, so let's move > forward with this patch. There are some comments from Tom about the > patch; would you like to send an updated version perhaps? > > Thanks > > Hi, > > I've attached a new version (v6) removing the superfluous JOIN that Tom > identified, and not collecting the oids (avoiding the query) if the option is > not used at all. > > About the testing issues that Tom mentioned: > I do not see how can we have a pure SQL dummy FDW that tests the > functionality. Because the only way to identify if the data of a foreign > table for the chosen server is dumped is if the COPY statement appears in the > output, but if the C callbacks of the FDW are not implemented, then the > SELECT that dumps the data to generate the COPY cannot be executed. > Also, to test that the include option chooses only the data of the specified > foreign servers we would need some negative testing, i.e. that the COPY > statement for the non-desired table does not appear. But I do not find these > kind of tests in the test suite, even for other selective options like > --table or --exclude-schema. > Can you have a look at dump with parallel option. Parallel option will take a lock on table while invoking lockTableForWorker. May be this is not required for foreign tables. Thoughts? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: benchmarking Flex practices
On Tue, Jan 14, 2020 at 4:12 AM Tom Lane wrote: > > John Naylor writes: > > [ v11 patch ] > > I pushed this with some small cosmetic adjustments. Thanks for your help hacking on the token filter. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Amcheck: do rightlink verification with lock coupling
On Mon, Jan 13, 2020 at 03:49:40PM -0800, Peter Geoghegan wrote: On Sat, Jan 11, 2020 at 4:25 AM Tomas Vondra wrote: Understood. Is that a reason to not commit of this patch now, though? It could use some polishing. Are you interested in committing it? Not really - as a CFM I was trying to revive patches that seem in good shape but not moving. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
I wrote: > Peter Eisentraut writes: >> I have found a weird behavior with identifier quoting, which is not the >> subject of this patch, but it appears to be affected by it. > I'll think about how to improve this. Given that we're dequoting > filenames explicitly anyway, maybe we don't need to tell readline that > double-quote is a quoting character. Another idea is that maybe > we ought to refactor things so that identifier dequoting and requoting > is handled explicitly, similarly to the way filenames work now. > That would make the patch a lot bigger though. On reflection, it seems like the best bet for the moment is to remove double-quote from the rl_completer_quote_characters string, which should restore all behavior around double-quoted strings to what it was before. (We have to keep single-quote in that string, though, or quoted file names fail entirely.) The only impact this has on filename completion is that we're no longer bright enough to convert a double-quoted filename to single-quoted for you, which would be a nice-to-have but it's hardly core functionality. At some point it'd be good to undo that and upgrade quoted-identifier processing, but I don't really want to include such changes in this patch. I'm about burned out on tab completion for right now. regards, tom lane diff --git a/config/programs.m4 b/config/programs.m4 index 90ff944..68ab823 100644 --- a/config/programs.m4 +++ b/config/programs.m4 @@ -209,17 +209,20 @@ fi -# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER -# --- +# PGAC_READLINE_VARIABLES +# --- # Readline versions < 2.1 don't have rl_completion_append_character +# Libedit lacks rl_filename_quote_characters and rl_filename_quoting_function -AC_DEFUN([PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER], +AC_DEFUN([PGAC_READLINE_VARIABLES], [AC_CACHE_CHECK([for rl_completion_append_character], pgac_cv_var_rl_completion_append_character, [AC_LINK_IFELSE([AC_LANG_PROGRAM([#include -#ifdef HAVE_READLINE_READLINE_H -# include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include #elif defined(HAVE_READLINE_H) -# include +#include #endif ], [rl_completion_append_character = 'x';])], @@ -228,7 +231,42 @@ AC_DEFUN([PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER], if test x"$pgac_cv_var_rl_completion_append_character" = x"yes"; then AC_DEFINE(HAVE_RL_COMPLETION_APPEND_CHARACTER, 1, [Define to 1 if you have the global variable 'rl_completion_append_character'.]) -fi])# PGAC_VAR_RL_COMPLETION_APPEND_CHARACTER +fi +AC_CACHE_CHECK([for rl_filename_quote_characters], pgac_cv_var_rl_filename_quote_characters, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include +#elif defined(HAVE_READLINE_H) +#include +#endif +], +[rl_filename_quote_characters = "x";])], +[pgac_cv_var_rl_filename_quote_characters=yes], +[pgac_cv_var_rl_filename_quote_characters=no])]) +if test x"$pgac_cv_var_rl_filename_quote_characters" = x"yes"; then +AC_DEFINE(HAVE_RL_FILENAME_QUOTE_CHARACTERS, 1, + [Define to 1 if you have the global variable 'rl_filename_quote_characters'.]) +fi +AC_CACHE_CHECK([for rl_filename_quoting_function], pgac_cv_var_rl_filename_quoting_function, +[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include +#elif defined(HAVE_READLINE_H) +#include +#endif +], +[rl_filename_quoting_function = 0;])], +[pgac_cv_var_rl_filename_quoting_function=yes], 
+[pgac_cv_var_rl_filename_quoting_function=no])]) +if test x"$pgac_cv_var_rl_filename_quoting_function" = x"yes"; then +AC_DEFINE(HAVE_RL_FILENAME_QUOTING_FUNCTION, 1, + [Define to 1 if you have the global variable 'rl_filename_quoting_function'.]) +fi +])# PGAC_READLINE_VARIABLES diff --git a/configure b/configure index 25cfbcb..a46ba40 100755 --- a/configure +++ b/configure @@ -16316,10 +16316,12 @@ else cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include -#ifdef HAVE_READLINE_READLINE_H -# include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include #elif defined(HAVE_READLINE_H) -# include +#include #endif int @@ -16345,6 +16347,85 @@ if test x"$pgac_cv_var_rl_completion_append_character" = x"yes"; then $as_echo "#define HAVE_RL_COMPLETION_APPEND_CHARACTER 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for rl_filename_quote_characters" >&5 +$as_echo_n "checking for rl_filename_quote_characters... " >&6; } +if ${pgac_cv_var_rl_filename_quote_characters+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include +#if defined(HAVE_READLINE_READLINE_H) +#include +#elif defined(HAVE_EDITLINE_READLINE_H) +#include +#e
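The immediate adjustment described above boils down to something like this in psql's readline initialization (a sketch of the intent, not the exact hunk from the patch):

/*
 * Tell readline that only single quotes delimit quoted strings (still
 * needed so quoted file names complete properly); leaving double quotes
 * out restores the old behavior for quoted identifiers.
 */
rl_completer_quote_characters = "'";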
Re: Unicode escapes with any backend encoding
On Tue, Jan 14, 2020 at 10:02 AM Tom Lane wrote: > > > Grepping for other direct uses of unicode_to_utf8(), I notice that > there are a couple of places in the JSON code where we have a similar > restriction that you can only write a Unicode escape in UTF8 server > encoding. I'm not sure whether these same semantics could be > applied there, so I didn't touch that. > Off the cuff I'd be inclined to say we should keep the text escape rules the same. We've already extended the JSON standard by allowing non-UTF8 encodings. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Unicode escapes with any backend encoding
Andrew Dunstan writes: > On Tue, Jan 14, 2020 at 10:02 AM Tom Lane wrote: >> Grepping for other direct uses of unicode_to_utf8(), I notice that >> there are a couple of places in the JSON code where we have a similar >> restriction that you can only write a Unicode escape in UTF8 server >> encoding. I'm not sure whether these same semantics could be >> applied there, so I didn't touch that. > Off the cuff I'd be inclined to say we should keep the text escape > rules the same. We've already extended the JSON standard y allowing > non-UTF8 encodings. Right. I'm just thinking though that if you can write "é" literally in a JSON string, even though you're using LATIN1 not UTF8, then why not allow writing that as "\u00E9" instead? The latter is arguably truer to spec. However, if JSONB collapses "\u00E9" to LATIN1 "é", that would be bad, unless we have a way to undo it on printout. So there might be some more moving parts here than I thought. regards, tom lane
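Concretely, the JSON cases in question look like this, assuming a LATIN1 database:

-- Accepted in any server encoding today, because json merely validates
-- the escape without decoding it:
SELECT '"\u00e9"'::json;

-- Currently rejected unless the server encoding is UTF8; the open question
-- is whether it should instead collapse to the same value as '"é"', and
-- how that would be undone on printout:
SELECT '"\u00e9"'::jsonb;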
Re: Making psql error out on output failures
Hi Daniel, I agree with you that if psql's output doesn't indicate any error when the disk is full, then it is obviously not nice. In some situations, people may end up losing data permanently. However, after quickly applying your idea/patch to "commit bf65f3c8871bcc95a3b4d5bcb5409d3df05c8273 (HEAD -> REL_12_STABLE, origin/REL_12_STABLE)", I found the behaviours/results are different. Here are the steps and output, $ sudo mkdir -p /mnt/ramdisk $ sudo mount -t tmpfs -o rw,size=1M tmpfs /mnt/ramdisk Test-1: delete the "file", and run the psql command from a terminal directly, $ rm /mnt/ramdisk/file $ psql -d postgres -At -c "select repeat('111', 100)" > /mnt/ramdisk/file Error printing tuples then dump the file, $ rm /mnt/ramdisk/file $ hexdump -C /mnt/ramdisk/file 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 || * 0010 Test-2: delete the "file", run the command within the psql console, $ rm /mnt/ramdisk/file $ psql -d postgres psql (12.1) Type "help" for help. postgres=# select repeat('111', 100) \g /mnt/ramdisk/file Error printing tuples postgres=# Then dump the file again, $ hexdump -C /mnt/ramdisk/file 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 || * 0010 As you can see, the contents are different after applying the patch. David The new status of this patch is: Waiting on Author
Re: Avoid full GIN index scan when possible
Updated patch is attached. It contains more comments as well as a commit message.

On Sun, Jan 12, 2020 at 12:10 AM Alexander Korotkov wrote:
> On Sat, Jan 11, 2020 at 3:19 PM Julien Rouhaud wrote:
> > While at it, IIUC only excludeOnly key can have nrequired == 0 (if
> > that's the case, this could be explicitly said in startScanKey
> > relevant comment), so it'd be more consistent to also use excludeOnly
> > rather than nrequired in this assert?
>
> Makes sense. I'll adjust the assert as well as the comment.

The changes to this assertion are not actually needed. I just accidentally forgot to revert them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v12.patch
Description: Binary data
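To make the target of the patch a bit more concrete, here is a hedged sketch of the kind of query involved (table and column names are made up; whether and when the optimization kicks in depends on the final form of the patch):

  CREATE TABLE docs (body tsvector);
  CREATE INDEX docs_body_gin ON docs USING gin (body);

  -- The negated term produces an "exclude only" scan key with no required
  -- entries; without the patch such keys could degenerate into a full scan
  -- of the GIN index.
  SELECT count(*) FROM docs WHERE body @@ to_tsquery('foo & !bar');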
Re: Protect syscache from bloating with negative cache entries
This is a new complete workable patch after a long time of struggling with benchmarking.

At Tue, 19 Nov 2019 19:48:10 +0900 (JST), Kyotaro Horiguchi wrote in
> I ran the run2.sh script attached, which runs catcachebench2(), which
> asks SearchSysCache3() for cached entries (almost) 24 times per
> run. The number of each output line is the mean of 3 times runs, and
> stddev. Lines are in "time" order and edited to fit here. "gen_tbl.pl
> | psql" creates a database for the benchmark. catcachebench2() runs
> the shortest path in the three in the attached benchmark program.
>
> (pg_ctl start)
> $ perl gen_tbl.pl | psql
> ...
> (pg_ctl stop)

I wonder why I took the average of the times instead of choosing the fastest one. This benchmark is extremely CPU intensive, so the fastest run reliably represents the performance. I changed the benchmark so that it shows the time of the fastest run (run4.sh).

Based on the latest result, I used pattern 3 (SearchSysCacheN indirection, wrongly referred to as 1 in the last mail) in the latest version. I took the fastest time among 3 iterations of 5 runs of both the master and patched -O2 binaries.

 version | min
---------+---------
 master  | 7986.65
 patched | 7984.47   = 'indirect' below

I would say this version doesn't get degraded by indirect calls. So, I applied the other parts of the catcache expiration patch as the succeeding patches. After that I got a somewhat strange but very stable result: just adding struct members accelerates the benchmark. The numbers are the fastest time of 20 runs of the benchmark over 10 iterations.

                   ms
master     7980.79  # the master with the benchmark extension (0001)
base       7340.96  # add only struct members and a GUC variable. (0002)
indirect   7998.68  # call SearchCatCacheN indirectly (0003)
expire-off 7422.30  # CatCache expiration (0004)
                    #   (catalog_cache_prune_min_age = -1)
expire-on  7861.13  # CatCache expiration (catalog_cache_prune_min_age = 0)

The patch accelerates CatCacheSearch for uncertain reasons. I'm not sure what makes the difference between about 8000 ms and about 7400 ms, though. Building all versions and running the benchmark several times gave me results with the same tendency. I will stop this work at this point and continue later.

The following files are attached.

0001-catcache-benchmark-extension.patch:
  The benchmark extension used for the benchmarking here. The test tables
  are generated using gentbl2.pl attached. (perl gentbl2.pl | psql)

0002-base_change.patch:
  Preliminarily adds some struct members and a GUC variable to see if they
  cause any extent of degradation.

0003-Make-CatCacheSearchN-indirect-functions.patch:
  Rewrite to change the CatCacheSearchN functions to be called indirectly.

0004-CatCache-expiration-feature.patch:
  Add the CatCache expiration feature.

gentbl2.pl: A script that emits SQL statements to generate test tables.
run4.sh   : The test script I used for benchmarking here.
build2.sh : A script I used to build the four types of binaries used here.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From dacf4a2ac9eb49099e744ee24066b94e9f78aa61 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 14 Nov 2019 19:24:36 +0900
Subject: [PATCH 1/4] catcache benchmark extension

Provides the function catcachebench(bench_no int), which runs a CPU-intensive benchmark on catcache search. The test table is created by a script separately provided.

catcachebench(0): prewarm catcache with provided test tables.
catcachebench(1): fetches all attribute stats of all tables.
This benchmark loads a vast number of unique entries. Expiration doesn't work since it runs in a transaction.

catcachebench(2): fetches all attribute stats of a table many times.
This benchmark repeatedly accesses already-loaded entries. Expiration doesn't work since it runs in a transaction.

catcachebench(3): fetches all attribute stats of all tables four times.
Different from the other modes, this runs expiration by forcibly updating the reference clock variable every 1000 entries.

At this point, the variables needed for the expiration feature are not added, so SetCatCacheClock is a dummy macro that just replaces it with its parameter.
---
 contrib/catcachebench/Makefile               |  17 +
 contrib/catcachebench/catcachebench--0.0.sql |  14 +
 contrib/catcachebench/catcachebench.c        | 330 +++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  35 ++
 src/backend/utils/cache/syscache.c           |   2 +-
 src/include/utils/catcache.h                 |   3 +
 7 files changed, 406 insertions(+), 1 deletion(-)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644 contrib/catcachebench/catcachebench.control
dif
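For reference, once the extension and the generated test tables are in place, the benchmark is driven from SQL; a minimal session might look like the following (timings are obviously machine-dependent):

  -- generate the test tables beforehand with: perl gentbl2.pl | psql
  CREATE EXTENSION catcachebench;
  SELECT catcachebench(0);   -- prewarm the catcache with the test tables
  SELECT catcachebench(2);   -- hammer already-loaded entries via SearchSysCache3()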
Improve errors when setting incorrect bounds for SSL protocols
Hi all, (Daniel G. in CC.) As discussed on the thread to be able to set the min/max SSL protocols with libpq, when mixing incorrect bounds the user experience is not that good: https://www.postgresql.org/message-id/9cfa34ee-f670-419d-b92c-cb7943a27...@yesql.se It happens that the error generated with incorrect combinations depends solely on what OpenSSL thinks is fine, and that's the following: psql: error: could not connect to server: SSL error: tlsv1 alert internal error It is hard for users to understand what such an error means and how to act on it. Please note that OpenSSL 1.1.0 has added two routines to be able to get the min/max protocols set in a context, called SSL_CTX_get_min/max_proto_version. Thinking about older versions of OpenSSL I think that it is better to use ssl_protocol_version_to_openssl to do the parsing work. I also found that it is easier to check for compatible versions after setting both bounds in the SSL context, so as there is no need to worry about invalid values depending on the build of OpenSSL used. So attached is a patch to improve the detection of incorrect combinations. Once applied, we get a complain about an incorrect version at server startup (FATAL) or reload (LOG). The patch includes new regression tests. Thoughts? -- Michael diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 62f1fcab2b..75fdb2e91b 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -67,8 +67,7 @@ static bool SSL_initialized = false; static bool dummy_ssl_passwd_cb_called = false; static bool ssl_is_server_start; -static int ssl_protocol_version_to_openssl(int v, const char *guc_name, - int loglevel); +static int ssl_protocol_version_to_openssl(int v); #ifndef SSL_CTX_set_min_proto_version static int SSL_CTX_set_min_proto_version(SSL_CTX *ctx, int version); static int SSL_CTX_set_max_proto_version(SSL_CTX *ctx, int version); @@ -84,6 +83,8 @@ be_tls_init(bool isServerStart) { STACK_OF(X509_NAME) *root_cert_list = NULL; SSL_CTX*context; + int ssl_ver_min = -1; + int ssl_ver_max = -1; /* This stuff need be done only once. */ if (!SSL_initialized) @@ -192,13 +193,19 @@ be_tls_init(bool isServerStart) if (ssl_min_protocol_version) { - int ssl_ver = ssl_protocol_version_to_openssl(ssl_min_protocol_version, - "ssl_min_protocol_version", - isServerStart ? FATAL : LOG); + ssl_ver_min = ssl_protocol_version_to_openssl(ssl_min_protocol_version); - if (ssl_ver == -1) + if (ssl_ver_min == -1) + { + ereport(isServerStart ? FATAL : LOG, + (errmsg("%s setting %s not supported by this build", + "ssl_min_protocol_version", + GetConfigOption("ssl_min_protocol_version", + false, false; goto error; - if (!SSL_CTX_set_min_proto_version(context, ssl_ver)) + } + + if (!SSL_CTX_set_min_proto_version(context, ssl_ver_min)) { ereport(isServerStart ? FATAL : LOG, (errmsg("could not set minimum SSL protocol version"))); @@ -208,13 +215,19 @@ be_tls_init(bool isServerStart) if (ssl_max_protocol_version) { - int ssl_ver = ssl_protocol_version_to_openssl(ssl_max_protocol_version, - "ssl_max_protocol_version", - isServerStart ? FATAL : LOG); + ssl_ver_max = ssl_protocol_version_to_openssl(ssl_max_protocol_version); - if (ssl_ver == -1) + if (ssl_ver_max == -1) + { + ereport(isServerStart ? 
FATAL : LOG, + (errmsg("%s setting %s not supported by this build", + "ssl_max_protocol_version", + GetConfigOption("ssl_max_protocol_version", + false, false; goto error; - if (!SSL_CTX_set_max_proto_version(context, ssl_ver)) + } + + if (!SSL_CTX_set_max_proto_version(context, ssl_ver_max)) { ereport(isServerStart ? FATAL : LOG, (errmsg("could not set maximum SSL protocol version"))); @@ -222,6 +235,20 @@ be_tls_init(bool isServerStart) } } + /* Check compatibility of min/max protocols */ + if (ssl_min_protocol_version && + ssl_max_protocol_version) + { + /* + * No need to check for invalid values (-1) for each protocol number + * as the code above would have already generated an error. + */ + if (ssl_ver_min > ssl_ver_max) + ereport(isServerStart ? FATAL : LOG, + (errmsg("incompatible min/max SSL protocol versions set"))); + goto error; + } + /* disallow SSL session tickets */ SSL_CTX_set_options(context, SSL_OP_NO_TICKET); @@ -1275,12 +1302,11 @@ X509_NAME_to_cstring(X509_NAME *name) * guc.c independent of OpenSSL availability and version. * * If a version is passed that is not supported by the current OpenSSL - * version, then we log with the given loglevel and return (if we return) -1. - * If a nonnegative value is returned, subsequent code can assume it's working - * with a supported version. + * version, then we return -1. If a nonnegative value is returned, + * subsequen
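To illustrate the intended user-visible effect (a sketch; the exact message wording is whatever the patch finally uses), setting inverted bounds would now be reported by the server itself instead of surfacing as an opaque TLS alert on the client side:

  ALTER SYSTEM SET ssl_min_protocol_version = 'TLSv1.2';
  ALTER SYSTEM SET ssl_max_protocol_version = 'TLSv1.1';
  SELECT pg_reload_conf();
  -- server log on reload: LOG:  incompatible min/max SSL protocol versions set
  -- at server startup the same condition would be reported as FATAL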
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Sun, Jan 12, 2020 at 8:18 AM Amit Kapila wrote: > > On Sat, Jan 11, 2020 at 11:16 AM Tom Lane wrote: > > > > > * The seeming bug in v10 suggests that we aren't testing large enough > > logical-decoding cases, or at least aren't noticing leaks in that > > area. I'm not sure what a good design is for testing that. I'm not > > thrilled with just using a larger (and slower) test case, but it's > > not clear to me how else to attack it. > > > > It is not clear to me either at this stage, but I think we can decide > that after chasing the issue in v10. My current plan is to revert > this test and make a note of the memory leak problem found (probably > track in Older Bugs section of PostgreSQL 12 Open Items). > Pushed the revert and added an open item in the 'Older Bugs' section of PostgreSQL 12 Open Items [1]. [1] - https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Block level parallel vacuum
On Mon, 13 Jan 2020 at 12:50, Amit Kapila wrote: > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada > wrote: > > > > On Sat, 11 Jan 2020 at 13:18, Amit Kapila wrote: > > > > > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada > > > wrote: > > > > > > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor > > > > wrote: > > > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov wrote: > > > > > > > > > > > > Hi > > > > > > Thank you for update! I looked again > > > > > > > > > > > > (vacuum_indexes_leader) > > > > > > + /* Skip the indexes that can be processed by > > > > > > parallel workers */ > > > > > > + if (!skip_index) > > > > > > + continue; > > > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to > > > > > > something like can_parallel? > > > > > > > > > > I also agree with your point. > > > > > > > > I don't think the change is a good idea. > > > > > > > > - boolskip_index = > > > > (get_indstats(lps->lvshared, i) == NULL || > > > > - > > > > skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > > > > + boolcan_parallel = > > > > (get_indstats(lps->lvshared, i) == NULL || > > > > + > > > > skip_parallel_vacuum_index(Irel[i], > > > > + > > > >lps->lvshared)); > > > > > > > > The above condition is true when the index can *not* do parallel index > > > > vacuum. How about changing it to skipped_index and change the comment > > > > to something like “We are interested in only index skipped parallel > > > > vacuum”? > > > > > > > > > > Hmm, I find the current code and comment better than what you or > > > Sergei are proposing. I am not sure what is the point of confusion in > > > the current code? > > > > Yeah the current code is also good. I just thought they were concerned > > that the variable name skip_index might be confusing because we skip > > if skip_index is NOT true. > > > > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; Make sense to me. > ... > > > > > > > > > > > > > > > > > > > > > Another question about behavior on temporary tables. Use case: the > > > > > > user commands just "vacuum;" to vacuum entire database (and has > > > > > > enough maintenance workers). Vacuum starts fine in parallel, but on > > > > > > first temporary table we hit: > > > > > > > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= > > > > > > 0) > > > > > > + { > > > > > > + ereport(WARNING, > > > > > > + (errmsg("disabling parallel option > > > > > > of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", > > > > > > + > > > > > > RelationGetRelationName(onerel; > > > > > > + params->nworkers = -1; > > > > > > + } > > > > > > > > > > > > And therefore we turn off the parallel vacuum for the remaining > > > > > > tables... Can we improve this case? > > > > > > > > > > Good point. > > > > > Yes, we should improve this. I tried to fix this. > > > > > > > > +1 > > > > > > > > > > Yeah, we can improve the situation here. I think we don't need to > > > change the value of params->nworkers at first place if allow > > > lazy_scan_heap to take care of this. Also, I think we shouldn't > > > display warning unless the user has explicitly asked for parallel > > > option. See the fix in the attached patch. > > > > Agreed. 
But with the updated patch the PARALLEL option without the > > parallel degree doesn't display warning because params->nworkers = 0 > > in that case. So how about restoring params->nworkers at the end of > > vacuum_rel()? > > > > I had also thought on those lines, but I was not entirely sure about > this resetting of workers. Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. This means > Vacuum (parallel), Vacuum (parallel) , etc. will give an > error "parallel degree must be specified". This idea has merit as now > we are supporting a parallel vacuum by default, so a 'parallel' option > without a parallel degree doesn't have any meaning. If we do that, > then we don't need to do anything additional about the handling of > temp tables (other than what patch is already doing) as well. What do > you think? > Good point! Agreed. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.
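A sketch of the user-visible behavior being proposed here (the table name is arbitrary and the exact error wording is still open):

  VACUUM (PARALLEL 2) test_tbl;  -- explicit degree: index vacuum may use 2 workers
  VACUUM (PARALLEL) test_tbl;    -- would now fail: parallel degree must be specified
  VACUUM test_tbl;               -- parallel vacuum is chosen by default where it applies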
Re: [HACKERS] Block level parallel vacuum
On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor wrote: > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov wrote: > > > > Hi > > Thank you for update! I looked again > > > > (vacuum_indexes_leader) > > + /* Skip the indexes that can be processed by parallel > > workers */ > > + if (!skip_index) > > + continue; > > > > Does the variable name skip_index not confuse here? Maybe rename to > > something like can_parallel? > > > > Again I looked into code and thought that somehow if we can add a > boolean flag(can_parallel) in IndexBulkDeleteResult structure to > identify that this index is supporting parallel vacuum or not, then it > will be easy to skip those indexes and multiple time we will not call > skip_parallel_vacuum_index (from vacuum_indexes_leader and > parallel_vacuum_index) > We can have a linked list of non-parallel supported indexes, then > directly we can pass to vacuum_indexes_leader. > > Ex: let suppose we have 5 indexes into a table. If before launching > parallel workers, if we can add boolean flag(can_parallel) > IndexBulkDeleteResult structure to identify that this index is > supporting parallel vacuum or not. > Let index 1, 4 are not supporting parallel vacuum so we already have > info in a linked list that 1->4 are not supporting parallel vacuum, so > parallel_vacuum_index will process these indexes and rest will be > processed by parallel workers. If parallel worker found that > can_parallel is false, then it will skip that index. > > As per my understanding, if we implement this, then we can avoid > multiple function calling of skip_parallel_vacuum_index and if there > is no index which can't performe parallel vacuum, then we will not > call vacuum_indexes_leader as head of list pointing to null. (we can > save unnecessary calling of vacuum_indexes_leader) > > Thoughts? > We skip not only indexes that don't support parallel index vacuum but also indexes supporting it depending on vacuum phase. That is, we could skip different indexes at different vacuum phase. Therefore with your idea, we would need to have at least three linked lists for each possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is that right? I think we can check if there are indexes that should be processed by the leader process before entering the loop in vacuum_indexes_leader by comparing nindexes_parallel_XXX of LVParallelState to the number of indexes but I'm not sure it's effective since the number of indexes on a table should be small. Regards, -- Masahiko Sawadahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Amcheck: do rightlink verification with lock coupling
Hi Peter! Sorry for taking so long to answer.

> On 11 Jan 2020, at 7:49, Peter Geoghegan wrote:
>
> I'm curious why Andrey's corruption problems were not detected by the
> cross-page amcheck test, though. We compare the first non-pivot tuple
> on the right sibling leaf page with the last one on the target page,
> towards the end of bt_target_page_check() -- isn't that almost as good
> as what you have here in practice? I probably would have added
> something like this myself earlier, if I had reason to think that
> verification would be a lot more effective that way.

We were dealing with corruption caused by a lost page update. Consider two pages: A->B. If A is split into A` and C we get: A`->C->B. But if the update of A is lost we still have A->B, while B's backward pointer points to C. B's smallest key is bigger than the high key of A`, but this does not violate the cross-page invariant.

Page updates may be lost due to a bug in backup software with incremental backups, a bug in the storage layer of an Aurora-style system, a bug in the page cache, incorrect fsync error handling, a bug in SSD firmware, etc. And our data checksums do not detect this kind of corruption. BTW, I think that if our checksums were not stored on the page itself, they could detect this kind of fault.

We were able to detect all those problems on primaries in our testing environment in a timely manner, but found out much later that some standbys were corrupted; the problem appeared only when they were promoted. Also, in a nearby thread Grygory Rylov (g0djan) is trying to enable one more invariant check on standbys.

Thanks!

Best regards, Andrey Borodin.
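For context, a rough sketch of how these checks are typically invoked (the index name is hypothetical; which checks are allowed on a standby depends on the outcome of this and the nearby thread):

  CREATE EXTENSION IF NOT EXISTS amcheck;
  -- Lightweight check taking only an AccessShareLock; the proposed
  -- rightlink/lock-coupling verification would extend this path.
  SELECT bt_index_check('my_btree_index'::regclass);
  -- Stronger parent/child verification, requiring a heavier lock.
  SELECT bt_index_parent_check('my_btree_index'::regclass);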
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jan 13, 2020 at 3:18 PM Dilip Kumar wrote: > > On Thu, Jan 9, 2020 at 12:09 PM Amit Kapila wrote: > > > > On Thu, Jan 9, 2020 at 10:30 AM Dilip Kumar wrote: > > > > > > On Thu, Jan 9, 2020 at 9:35 AM Amit Kapila > > > wrote: > > > > > > > > > The problem is that when we > > > > > get a toasted chunks we remember the changes in the memory(hash table) > > > > > but don't stream until we get the actual change on the main table. > > > > > Now, the problem is that we might get the change of the toasted table > > > > > and the main table in different streams. So basically, in a stream, > > > > > if we have only got the toasted tuples then even after > > > > > ReorderBufferStreamTXN the memory usage will not be reduced. > > > > > > > > > > > > > I think we can't split such changes in a different stream (unless we > > > > design an entirely new solution to send partial changes of toast > > > > data), so we need to send them together. We can keep a flag like > > > > data_complete in ReorderBufferTxn and mark it complete only when we > > > > are able to assemble the entire tuple. Now, whenever, we try to > > > > stream the changes once we reach the memory threshold, we can check > > > > whether the data_complete flag is true Here, we can also consider streaming the changes when data_complete is false, but some additional changes have been added to the same txn as the new changes might complete the tuple. > > > > , if so, then only send the > > > > changes, otherwise, we can pick the next largest transaction. I think > > > > we can retry it for few times and if we get the incomplete data for > > > > multiple transactions, then we can decide to spill the transaction or > > > > maybe we can directly spill the first largest transaction which has > > > > incomplete data. > > > > > > > Yeah, we might do something on this line. Basically, we need to mark > > > the top-transaction as data-incomplete if any of its subtransaction is > > > having data-incomplete (it will always be the latest sub-transaction > > > of the top transaction). Also, for streaming, we are checking the > > > largest top transaction whereas for spilling we just need the larget > > > (sub) transaction. So we also need to decide while picking the > > > largest top transaction for streaming, if we get a few transactions > > > with in-complete data then how we will go for the spill. Do we spill > > > all the sub-transactions under this top transaction or we will again > > > find the larget (sub) transaction for spilling. > > > > > > > I think it is better to do later as that will lead to the spill of > > only required (minimum changes to get the memory below threshold) > > changes. > I think instead of doing this can't we just spill the changes which > are in toast_hash. Basically, at the end of the stream, we have some > toast tuple which we could not stream because we did not have the > insert for the main table then we can spill only those changes which > are in tuple hash. > Hmm, I think this can turn out to be inefficient because we can easily end up spilling the data even when we don't need to so. Consider cases, where part of the streamed changes are for toast, and remaining are the changes which we would have streamed and hence can be removed. In such cases, we could have easily consumed remaining changes for toast without spilling. 
Also, I am not sure if spilling changes from the hash table is a good idea, as they are no longer in the same order as they were in ReorderBuffer, which means the order in which we would normally serialize the changes would change, and that might have some impact. So we would need some more study if we want to pursue this idea.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
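For concreteness, the scenario is the one produced by a transaction like the following (hypothetical table): the TOAST chunk insertions are decoded as separate changes that only become usable once the change on the main table arrives, so a stream boundary can fall between them.

  CREATE TABLE t (id int PRIMARY KEY, payload text);
  BEGIN;
  -- Wide enough to be TOASTed; its chunks enter the reorder buffer
  -- before the INSERT on the main table itself.
  INSERT INTO t VALUES (1, repeat('x', 1000000));
  COMMIT;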
Re: Add pg_file_sync() to adminpack
On Mon, Jan 13, 2020 at 03:39:32PM +0100, Julien Rouhaud wrote:
> Actually, can't it create a security hazard, for instance if you call
> pg_file_sync() on a heap file and the call errors out, since it's
> bypassing data_sync_retry?

Are you mistaking security for durability here? By default, the function proposed is only executable by a superuser, so that's not really a security concern. But I agree that failing to PANIC on an fsync failure for a sensitive Postgres file could lead to corruption. That's why we PANIC these days.

--
Michael
Re: Add pg_file_sync() to adminpack
On Tue, Jan 14, 2020 at 7:18 AM Michael Paquier wrote:
>
> On Mon, Jan 13, 2020 at 03:39:32PM +0100, Julien Rouhaud wrote:
> > Actually, can't it create a security hazard, for instance if you call
> > pg_file_sync() on a heap file and the call errors out, since it's
> > bypassing data_sync_retry?
>
> Are you mistaking security for durability here?

Yes, data durability, sorry.

> By default, the
> function proposed is only executable by a superuser, so that's not
> really a security concern. But I agree that failing to PANIC on an
> fsync failure for a sensitive Postgres file could lead to corruption.
> That's why we PANIC these days.

Exactly. My concern is that some superusers may not be aware that pg_file_sync could actually corrupt data, so there should be a big red warning explaining that.
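For reference, usage would look roughly like this (a sketch: the path is made up, and the exact signature and error semantics are whatever the adminpack patch finally defines):

  CREATE EXTENSION IF NOT EXISTS adminpack;
  -- Superuser-only by default; flushes the given file (or directory) to disk.
  SELECT pg_file_sync('postgresql.auto.conf');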
Re: base backup client as auxiliary backend process
On Sat, 11 Jan 2020 at 18:52, Peter Eisentraut wrote:
>
> On 2020-01-10 04:32, Masahiko Sawada wrote:
> > I agreed that these patches are useful on its own and 0001 patch and
>
> committed 0001
>
> > 0002 patch look good to me. For 0003 patch,
> >
> > + linkend="guc-primary-slot-name"/>. Otherwise, the WAL receiver may
> > use
> > + a temporary replication slot (determined by
> > + linkend="guc-wal-receiver-create-temp-slot"/>), but these are not
> > shown
> > + here.
> >
> > I think it's better to show the temporary slot name on
> > pg_stat_wal_receiver view. Otherwise user would have no idea about
> > what wal receiver is using the temporary slot.
>
> Makes sense. It makes the code a bit more fiddly, but it seems worth
> it. New patches attached.

Thank you for updating the patch!

-     Replication slot name used by this WAL receiver
+
+     Replication slot name used by this WAL receiver. This is only set if a
+     permanent replication slot is set using <xref linkend="guc-primary-slot-name"/>.
+     Otherwise, the WAL receiver may use a temporary replication slot
+     (determined by <xref linkend="guc-wal-receiver-create-temp-slot"/>),
+     but these are not shown here.
+

Now that the slot name is shown even if it's a temporary slot, the above documentation change needs to be updated. Other changes look good to me.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
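A quick way to see the effect of the change on a standby once the patch is applied (a sketch; only a subset of the view's columns is shown):

  -- slot_name is now populated even when the WAL receiver is using a
  -- temporary slot rather than one named by primary_slot_name.
  SELECT pid, status, slot_name, sender_host, sender_port
    FROM pg_stat_wal_receiver;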
Re: Implementing Incremental View Maintenance
On Sat, 11 Jan 2020 09:27:58 +0900 nuko yokohama wrote: > LIMIT clause without ORDER BY should be prohibited when creating > incremental materialized views. > > In SQL, the result of a LIMIT clause without ORDER BY is undefined. > If the LIMIT clause is allowed when creating an incremental materialized > view, incorrect results will be obtained when the view is updated after > updating the source table. Thank you for your advice. It's just as you said. LIMIT/OFFSET clause should is prohibited. We will add this to next patch. Best Regards, Takuma Hoshiai > > ``` > [ec2-user@ip-10-0-1-10 ivm]$ psql --version > psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql > DROP TABLE IF EXISTS test CASCADE; > psql:limit-problem.sql:1: NOTICE: drop cascades to materialized view > test_imv > DROP TABLE > CREATE TABLE test (id int primary key, data text); > CREATE TABLE > INSERT INTO test VALUES (generate_series(1, 10), 'foo'); > INSERT 0 10 > CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1; > SELECT 1 >Materialized view "public.test_imv" > Column | Type | Collation | Nullable | Default | Storage | > Stats target | Description > ---+-+---+--+-+--+--+- > id| integer | | | | plain| > | > data | text| | | | extended | > | > __ivm_count__ | bigint | | | | plain| > | > View definition: > SELECT test.id, > test.data >FROM test > LIMIT 1; > Access method: heap > Incremental view maintenance: yes > > SELECT * FROM test LIMIT 1; > id | data > +-- > 1 | foo > (1 row) > > TABLE test_imv; > id | data > +-- > 1 | foo > (1 row) > > UPDATE test SET data = 'bar' WHERE id = 1; > UPDATE 1 > SELECT * FROM test LIMIT 1; > id | data > +-- > 2 | foo > (1 row) > > TABLE test_imv; > id | data > +-- > 1 | bar > (1 row) > > DELETE FROM test WHERE id = 1; > DELETE 1 > SELECT * FROM test LIMIT 1; > id | data > +-- > 2 | foo > (1 row) > > TABLE test_imv; > id | data > +-- > (0 rows) > ``` > > ORDER BY clause is not allowed when executing CREATE INCREMENTAL > MATELIARIZED VIEW. > We propose not to allow LIMIT clauses as well. > > > 2018年12月27日(木) 21:57 Yugo Nagata : > > > Hi, > > > > I would like to implement Incremental View Maintenance (IVM) on > > PostgreSQL. > > IVM is a technique to maintain materialized views which computes and > > applies > > only the incremental changes to the materialized views rather than > > recomputate the contents as the current REFRESH command does. > > > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > > [1]. > > Our implementation uses row OIDs to compute deltas for materialized > > views. > > The basic idea is that if we have information about which rows in base > > tables > > are contributing to generate a certain row in a matview then we can > > identify > > the affected rows when a base table is updated. This is based on an idea of > > Dr. Masunaga [2] who is a member of our group and inspired from ID-based > > approach[3]. > > > > In our implementation, the mapping of the row OIDs of the materialized view > > and the base tables are stored in "OID map". When a base relation is > > modified, > > AFTER trigger is executed and the delta is recorded in delta tables using > > the transition table feature. The accual udpate of the matview is triggerd > > by REFRESH command with INCREMENTALLY option. > > > > However, we realize problems of our implementation. First, WITH OIDS will > > be removed since PG12, so OIDs are no longer available. 
Besides this, it > > would > > be hard to implement this since it needs many changes of executor nodes to > > collect base tables's OIDs during execuing a query. Also, the cost of > > maintaining > > OID map would be high. > > > > For these reasons, we started to think to implement IVM without relying on > > OIDs > > and made a bit more surveys. > > > > We also looked at Kevin Grittner's discussion [4] on incremental matview > > maintenance. In this discussion, Kevin proposed to use counting algorithm > > [5] > > to handle projection views (using DISTNICT) properly. This algorithm need > > an > > additional system column, count_t, in materialized views and delta tables > > of > > base tables. > > > > However, the discussion about IVM is now stoped, so we would like to > > restart and > > progress this. > > > > > > Through our PoC inplementation and surveys, I think we need to think at > > least > > the followings for implementing IVM. > > > > 1. How to extract changes on base tables > > > > I think there would be at least two approaches for it. > > > > - Using transition table in AFTER trigge