Re: Proposal: Global Index
> On Jan 7, 2021, at 22:16, Bruce Momjian wrote:
>
> On Thu, Jan 7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
>> I've been following this topic for a long time. It's been a year since the
>> last response.
>> It was clear that our customers wanted this feature as well, and a large
>> number of them mentioned it.
>>
>> So, I wish the whole feature to mature as soon as possible.
>> I summarized the scheme mentioned in the email and completed the POC
>> patch(base on PG_13).
>
> I think you need to address the items mentioned in this blog, and the
> email link it mentions:
>
> https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020

Thank you for your reply. I read your blog and it helped me a lot.

The blog mentions a specific problem: "A large global index might also
reintroduce problems that prompted the creation of partitioning in the first
place." I don't quite understand, could you give some specific information?

In addition you mentioned: "It is still unclear if these use-cases justify
the architectural changes needed to enable global indexes."
Please also describe the problems you see, I will confirm each specific issue
one by one.

Thanks

Wenjing

> I am not clear this is a feature we will want. Yes, people ask for it,
> but if the experience will be bad for them and they will regret using
> it, I am not sure we want it. Of course, if you code it up and we get
> a good user experience, we would want it --- I am just saying it is not
> clear right now.
>
> --
> Bruce Momjian        https://momjian.us
> EnterpriseDB         https://enterprisedb.com
>
> The usefulness of a cup is in its emptiness, Bruce Lee
Re: Why latestRemovedXid|cuteoff_xid are always sent?
Hello, Peter. Thanks for your explanation. One of the reasons I was asking - is an idea to use the same technique in the "LP_DEAD index hint bits on standby" WIP patch to reduce the amount of additional WAL. Now I am sure such optimization should work correctly. Thanks, Michail.
Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
On Thu, 07 Jan 2021 at 17:53, Bharath Rupireddy wrote:
> On Mon, Dec 28, 2020 at 5:56 PM Bharath Rupireddy wrote:
>>
>> On Tue, Dec 22, 2020 at 7:01 PM Bharath Rupireddy wrote:
>> > Currently, $subject is not allowed. We do plan the mat view query
>> > before every refresh. I propose to show the explain/explain analyze of
>> > the select part of the mat view in case of Refresh Mat View(RMV). It
>> > will be useful for the user to know what exactly is being planned and
>> > executed as part of RMV. Please note that we already have
>> > explain/explain analyze CTAS/Create Mat View(CMV), where we show the
>> > explain/explain analyze of the select part. This proposal will do the
>> > same thing.
>> >
>> > The behaviour can be like this:
>> > EXPLAIN REFRESH MATERIALIZED VIEW mv1; --> will not refresh the mat
>> > view, but shows the select part's plan of mat view.
>> > EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW mv1; --> will refresh the
>> > mat view and shows the select part's plan of mat view.
>> >
>> > Thoughts? If okay, I will post a patch later.
>>
>> Attaching below patches:
>>
>> 0001 - Rearrange Refresh Mat View Code - Currently, the function
>> ExecRefreshMatView in matview.c is having many lines of code which is
>> not at all good from readability and maintainability perspectives.
>> This patch adds a few functions and moves the code from
>> ExecRefreshMatView to them making the code look better.
>>
>> 0002 - EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW support and tests.
>>
>> If this proposal is useful, I have few open points - 1) In the patch I
>> have added a new mat view info parameter to ExplainOneQuery(), do we
>> also need to add it to ExplainOneQuery_hook_type? 2) Do we document
>> (under respective command pages or somewhere else) that we allow
>> explain/explain analyze for a command?
>>
>> Thoughts?
>
> Attaching v2 patch set reabsed on the latest master f7a1a805cb. And
> also added an entry for upcoming commitfest -
> https://commitfest.postgresql.org/32/2928/
>
> Please consider the v2 patches for further review.

Thanks for updating the patch!

+   /* Get the data generating query. */
+   dataQuery = get_matview_query(stmt, &matviewRel, &matviewOid);

-   /*
-    * Check for active uses of the relation in the current transaction, such
-    * as open scans.
-    *
-    * NB: We count on this to protect us against problems with refreshing the
-    * data using TABLE_INSERT_FROZEN.
-    */
-   CheckTableNotInUse(matviewRel, "REFRESH MATERIALIZED VIEW");
+   relowner = matviewRel->rd_rel->relowner;

After applying the patch, there is a duplicate

    relowner = matviewRel->rd_rel->relowner;

+   else if(matviewInfo)
+       dest = CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap);

If `matviewInfo->OIDNewHeap` is invalid, IMO we don't need to create the
DestReceiver, do we? And we should add a space after `if`.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.Ltd.
Re: Proposal: Global Index
On Fri, Jan 8, 2021 at 4:02 PM 曾文旌 wrote: > > > 2021年1月7日 22:16,Bruce Momjian 写道: > > > > On Thu, Jan 7, 2021 at 05:44:01PM +0800, 曾文旌 wrote: > >> I've been following this topic for a long time. It's been a year since the > >> last response. > >> It was clear that our customers wanted this feature as well, and a large > >> number of them mentioned it. > >> > >> So, I wish the whole feature to mature as soon as possible. > >> I summarized the scheme mentioned in the email and completed the POC > >> patch(base on PG_13). > > > > I think you need to address the items mentioned in this blog, and the > > email link it mentions: > > > > https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020 > > Thank you for your reply. > I read your blog and it helped me a lot. > > The blog mentions a specific problem: "A large global index might also > reintroduce problems that prompted the creation of partitioning in the first > place. " > I don't quite understand, could you give some specific information? > > In addition you mentioned: "It is still unclear if these use-cases justify > the architectural changes needed to enable global indexes." > Please also describe the problems you see, I will confirm each specific issue > one by one. One example is date partitioning. People frequently need to store only the most recent data. For instance doing a monthly partitioning and dropping the oldest partition every month once you hit the wanted retention is very efficient for that use case, as it should be almost instant (provided that you can acquire the necessary locks immediately). But if you have a global index, you basically lose the advantage of partitioning as it'll require heavy changes on that index.
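To make the retention use case above concrete, here is a minimal sketch
(table, index, and partition names are invented for illustration). With
today's partition-local indexes, dropping the oldest partition also drops
its index, which is cheap; with a hypothetical global index, the same DROP
would additionally have to remove that partition's entries from the shared
index structure:

CREATE TABLE measurements (
    ts     timestamptz NOT NULL,
    device int         NOT NULL,
    value  double precision
) PARTITION BY RANGE (ts);

CREATE TABLE measurements_2020_12 PARTITION OF measurements
    FOR VALUES FROM ('2020-12-01') TO ('2021-01-01');
CREATE TABLE measurements_2021_01 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');

-- Creates a local index on each partition (no global index exists today).
CREATE INDEX measurements_device_ts_idx ON measurements (device, ts);

-- Monthly retention: nearly instant with local indexes, since the
-- partition's heap and index files are simply removed once the locks
-- are acquired.
DROP TABLE measurements_2020_12;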
Re: In-placre persistance change of a relation
At Fri, 08 Jan 2021 14:47:05 +0900 (JST), Kyotaro Horiguchi wrote in
> This version RelationChangePersistence() is changed not to choose the
> in-place method for indexes other than btree. It seems to be usable
> with all kinds of indexes other than GiST, but at the moment it applies
> only to btrees.
>
> 1:
> https://www.postgresql.org/message-id/ca+tgmozez5rons49c7mepjhjndqmqtvrz_lcqukprwdmrev...@mail.gmail.com

Hmm. This is not working correctly. I'll repost after fixing that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: a misbehavior of partition row movement (?)
Hi, On Mon, Dec 28, 2020 at 10:01 PM Arne Roland wrote: > While I'd agree that simply deleting with "on delete cascade" on tuple > rerouting is a strong enough violation of the spirit of partitioning changing > that behavior, I am surprised by the idea to make a differentiation between > fks and other triggers. The way user defined triggers work, is a violation to > the same degree I get complaints about that on a monthly (if not weekly) > basis. It's easy to point towards the docs, but the docs offer no solution to > archive the behavior, that makes partitioning somewhat transparent. Most > people I know don't partition because they like to complicate things, but > just because they have to much data. > > It isn't just a thing with after triggers. Imo before triggers are even > worse: If we move a row between partitions we fire all three triggers at once > (at different leaf pages obviously). It's easy to point towards the docs. > Heart bleed was well documented, but I am happy that it was fixed. I don't > want to imply this totally unrelated security issue has anything to do with > our weird behavior. I just want to raise the question whether that's a good > thing, because frankly I haven't met anyone thus far, who thought it is. Just to be clear, are you only dissatisfied with the firing of triggers during a row-moving UPDATEs on partitioned tables or all firing behaviors of triggers defined on partitioned tables? Can you give a specific example of the odd behavior in that case? > To me it seems the RI triggers are just a specific victim of the way we made > triggers work in general. That's true. > What I tried to express, albeit I apparently failed: I care about having > triggers, which make partitioning transparent, on the long run. > > > because what can happen > > today with CASCADE triggers does not seem like a useful behavior by > > any measure. > > I wholeheartedly agree. I just want to point out, that you could state the > same about triggers in general. > > Backwards compatibility is usually a pretty compelling argument. I would > still want to raise the question, whether this should change, because I > frankly think this is a bad idea. > > You might ask: Why am I raising this question in this thread? > > In case we can not (instantly) agree on the fact that this behavior should > change, I'd still prefer to think about making that behavior possible for > other triggers (possibly later) as well. One possibility would be having an > entry in the catalog to mark when the trigger should fire. Sorry, I don't understand what new setting for triggers you are thinking of here. > I don't want to stall your definitely useful RI-Trigger fix, but I sincerely > believe, that we have to do better with triggers in general. > > If we would make the condition when to fire or not dependent something > besides that fact whether the trigger is internal, we could at a later date > choose to make both ways available, if anyone makes a good case for this. > Even though I still think it's not worth it. > > >I don't quite recall if the decision to implement it like this was > > based on assuming that this is what users would like to see happen in > > this case or the perceived difficulty of implementing it the other way > > around, that is, of firing AFTER UPDATE triggers in this case. > > I tried to look that up, but I couldn't find any discussion about this. Do > you have any ideas in which thread that was handled? 
It was discussed here: https://www.postgresql.org/message-id/flat/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com It's a huge discussion, so you'll have to ctrl+f "trigger" to spot relevant emails. You might notice that the developers who participated in that discussion gave various opinions and what we have today got there as a result of a majority of them voting for the current approach. Someone also said this during the discussion: "Regarding the trigger issue, I can't claim to have a terribly strong opinion on this. I think that practically anything we do here might upset somebody, but probably any halfway-reasonable thing we choose to do will be OK for most people." So what we've got is that "halfway-reasonable" thing, YMMV. :) -- Amit Langote EDB: http://www.enterprisedb.com
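For a concrete picture of the row-movement trigger behavior Arne describes
above, here is a small sketch (object names invented) that logs which
row-level triggers fire when an UPDATE moves a row between partitions:

CREATE TABLE t (a int, b text) PARTITION BY LIST (a);
CREATE TABLE t1 PARTITION OF t FOR VALUES IN (1);
CREATE TABLE t2 PARTITION OF t FOR VALUES IN (2);

CREATE FUNCTION log_trig() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE '% % on %', TG_WHEN, TG_OP, TG_TABLE_NAME;
    RETURN COALESCE(NEW, OLD);  -- NEW is NULL for DELETE; OLD is NULL for INSERT
END $$;

CREATE TRIGGER t1_row BEFORE INSERT OR UPDATE OR DELETE ON t1
    FOR EACH ROW EXECUTE FUNCTION log_trig();
CREATE TRIGGER t2_row BEFORE INSERT OR UPDATE OR DELETE ON t2
    FOR EACH ROW EXECUTE FUNCTION log_trig();

INSERT INTO t VALUES (1, 'x');
UPDATE t SET a = 2 WHERE a = 1;  -- row moves from t1 to t2

-- The UPDATE's NOTICEs should show, roughly: BEFORE UPDATE on t1, then
-- BEFORE DELETE on t1, then BEFORE INSERT on t2; no UPDATE trigger fires
-- on the destination partition.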
Re: Single transaction in the tablesync worker?
Hi Amit. PSA the v13 patch for the Tablesync Solution1. Differences from v12: + Fixed whitespace errors of v12-0001 + Modify TCOPYDONE state comment (houzj feedback) + WIP fix for AlterSubscripion_refresh (Amit feedback) Features: * The tablesync slot is now permanent instead of temporary. The tablesync slot name is no longer tied to the Subscription slot na * The tablesync slot cleanup (drop) code is added for DropSubscription and for process_syncing_tables_for_sync functions * The tablesync worker is now allowing multiple tx instead of single tx * A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart. * If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then it will bypass the initial copy_table phase. * Now tablesync sets up replication origin tracking in LogicalRepSyncTableStart (similar as done for the apply worker). The origin is advanced when first created. * The tablesync replication origin tracking is cleaned up during DropSubscription and/or process_syncing_tables_for_apply. * The DropSubscription cleanup code was enhanced (v7+) to take care of any crashed tablesync workers. * Updates to PG docs TODO / Known Issues: * Address review comments * ALTER PUBLICATION DROP TABLE can mean knowledge of tablesyncs gets lost causing resource cleanup to be missed. There is a WIP fix for this in the AlterSubscription_refresh, however it is not entirely correct; there are known race conditions. See FIXME comments. --- Kind Regards, Peter Smith. Fujitsu Australia On Thu, Jan 7, 2021 at 6:52 PM Peter Smith wrote: > > Hi Amit. > > PSA the v12 patch for the Tablesync Solution1. > > Differences from v11: > + Added PG docs to mention the tablesync slot > + Refactored tablesync slot drop (done by > DropSubscription/process_syncing_tables_for_sync) > + Fixed PG docs mentioning wrong state code > + Fixed wrong code comment describing TCOPYDONE state > > > > Features: > > * The tablesync slot is now permanent instead of temporary. The > tablesync slot name is no longer tied to the Subscription slot na > > * The tablesync slot cleanup (drop) code is added for DropSubscription > and for process_syncing_tables_for_sync functions > > * The tablesync worker is now allowing multiple tx instead of single tx > > * A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful > copy_table in LogicalRepSyncTableStart. > > * If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then > it will bypass the initial copy_table phase. > > * Now tablesync sets up replication origin tracking in > LogicalRepSyncTableStart (similar as done for the apply worker). The > origin is advanced when first created. > > * The tablesync replication origin tracking is cleaned up during > DropSubscription and/or process_syncing_tables_for_apply. > > * The DropSubscription cleanup code was enhanced (v7+) to take care of > any crashed tablesync workers. > > * Updates to PG docs > > TODO / Known Issues: > > * Address review comments > > * Patch applies with whitespace warning > > --- > > Kind Regards, > Peter Smith. > Fujitsu Australia v13-0002-Tablesync-extra-logging.patch Description: Binary data v13-0001-Tablesync-Solution1.patch Description: Binary data
Re: proposal: enhancing plpgsql debug API - returns text value of variable content
Hi rebase Regards Pavel diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 3a9349b724..208ba181fc 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -4123,6 +4123,8 @@ plpgsql_estate_setup(PLpgSQL_execstate *estate, { (*plpgsql_plugin_ptr)->error_callback = plpgsql_exec_error_callback; (*plpgsql_plugin_ptr)->assign_expr = exec_assign_expr; + (*plpgsql_plugin_ptr)->eval_datum = exec_eval_datum; + (*plpgsql_plugin_ptr)->cast_value = do_cast_value; if ((*plpgsql_plugin_ptr)->func_setup) ((*plpgsql_plugin_ptr)->func_setup) (estate, func); diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y index 0b0ff4e5de..82a703639e 100644 --- a/src/pl/plpgsql/src/pl_gram.y +++ b/src/pl/plpgsql/src/pl_gram.y @@ -417,6 +417,7 @@ pl_block : decl_sect K_BEGIN proc_sect exception_sect K_END opt_label new->cmd_type = PLPGSQL_STMT_BLOCK; new->lineno = plpgsql_location_to_lineno(@2); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->label = $1.label; new->n_initvars = $1.n_initvars; new->initvarnos = $1.initvarnos; @@ -911,7 +912,8 @@ stmt_perform : K_PERFORM new = palloc0(sizeof(PLpgSQL_stmt_perform)); new->cmd_type = PLPGSQL_STMT_PERFORM; new->lineno = plpgsql_location_to_lineno(@1); - new->stmtid = ++plpgsql_curr_compile->nstatements; + new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); plpgsql_push_back_token(K_PERFORM); /* @@ -947,6 +949,7 @@ stmt_call : K_CALL new->cmd_type = PLPGSQL_STMT_CALL; new->lineno = plpgsql_location_to_lineno(@1); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); plpgsql_push_back_token(K_CALL); new->expr = read_sql_stmt(); new->is_call = true; @@ -963,6 +966,7 @@ stmt_call : K_CALL new->cmd_type = PLPGSQL_STMT_CALL; new->lineno = plpgsql_location_to_lineno(@1); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); plpgsql_push_back_token(K_DO); new->expr = read_sql_stmt(); new->is_call = false; @@ -998,7 +1002,8 @@ stmt_assign : T_DATUM new = palloc0(sizeof(PLpgSQL_stmt_assign)); new->cmd_type = PLPGSQL_STMT_ASSIGN; new->lineno = plpgsql_location_to_lineno(@1); - new->stmtid = ++plpgsql_curr_compile->nstatements; + new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->varno = $1.datum->dno; /* Push back the head name to include it in the stmt */ plpgsql_push_back_token(T_DATUM); @@ -1020,6 +1025,7 @@ stmt_getdiag : K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';' new->cmd_type = PLPGSQL_STMT_GETDIAG; new->lineno = plpgsql_location_to_lineno(@1); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->is_stacked = $2; new->diag_items = $4; @@ -1192,6 +1198,7 @@ stmt_if : K_IF expr_until_then proc_sect stmt_elsifs stmt_else K_END K_IF ';' new->cmd_type = PLPGSQL_STMT_IF; new->lineno = plpgsql_location_to_lineno(@1); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->cond = $2; new->then_body = $3; new->elsif_list = $4; @@ -1297,6 +1304,7 @@ stmt_loop : opt_loop_label K_LOOP loop_body new->cmd_type = PLPGSQL_STMT_LOOP; new->lineno = plpgsql_location_to_lineno(@2); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->label = $1; new->body = $3.stmts; @@ -1315,6 +1323,7 @@ stmt_while : opt_loop_label K_WHILE expr_until_loop loop_body new->cmd_type = PLPGSQL_STMT_WHILE; new->lineno = plpgsql_location_to_lineno(@2); new->stmtid = 
++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->label = $1; new->cond = $3; new->body = $4.stmts; @@ -1336,6 +1345,7 @@ stmt_for : opt_loop_label K_FOR for_control loop_body new = (PLpgSQL_stmt_fori *) $3; new->lineno = plpgsql_location_to_lineno(@2); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->label = $1; new->body = $4.stmts; $$ = (PLpgSQL_stmt *) new; @@ -1351,6 +1361,7 @@ stmt_for : opt_loop_label K_FOR for_control loop_body new = (PLpgSQL_stmt_forq *) $3; new->lineno = plpgsql_location_to_lineno(@2); new->stmtid = ++plpgsql_curr_compile->nstatements; + new->ns = plpgsql_ns_top(); new->label = $1; new->body = $4.stmts; $$ = (PLpgSQL_stmt *) new; @@ -1381,6 +1392,7 @@ for_control : for_variable K_IN new = palloc0(sizeof(PLpgSQL_stmt_dynfors)
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Jan 6, 2021 at 2:09 PM Antonin Houska wrote: > > Greg Nancarrow wrote: > > > Posting an updated set of patches to address recent feedback: > > Following is my review. > .. > > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch > --- > > @@ -1021,12 +1039,15 @@ IsInParallelMode(void) > * Prepare for entering parallel mode, based on command-type. > */ > void > -PrepareParallelMode(CmdType commandType) > +PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader) > { > Assert(!IsInParallelMode() || force_parallel_mode != > FORCE_PARALLEL_OFF); > > if (IsModifySupportedInParallelMode(commandType)) > { > + if (isParallelModifyLeader) > + (void) GetCurrentCommandId(true); > > I miss a comment here. I suppose this is to set currentCommandIdUsed, so that > the leader process gets a new commandId for the following statements in the > same transaction, and thus it can see the rows inserted by the parallel > workers? > oh no, leader backend and worker backends must use the same commandId. I am also not sure if we need this because for Insert statements we already call GetCurrentCommandId(true) is standard_ExecutorStart. We don't want the rows visibility behavior for parallel-inserts any different than non-parallel ones. > If my understanding is correct, I think that the leader should not participate > in the execution of the Insert node, else it would use higher commandId than > the workers. That would be weird, although probably not data corruption. > Yeah, exactly this is the reason both leader and backends must use the same commandId. > I > wonder if parallel_leader_participation should be considered false for the > "Gather -> Insert -> ..." plans. > If what I said above is correct then this is moot. > > > @@ -208,7 +236,7 @@ ExecGather(PlanState *pstate) > } > > /* Run plan locally if no workers or enabled and not > single-copy. */ > - node->need_to_scan_locally = (node->nreaders == 0) > + node->need_to_scan_locally = (node->nworkers_launched <= 0) > || (!gather->single_copy && > parallel_leader_participation); > node->initialized = true; > } > > Is this change needed? The code just before this test indicates that nreaders > should be equal to nworkers_launched. > This change is required because we don't need to set up readers for parallel-insert unless there is a returning clause. See the below check a few lines before this change: - if (pcxt->nworkers_launched > 0) + if (pcxt->nworkers_launched > 0 && !(isParallelModifyLeader && !isParallelModifyWithReturning)) { I think this check could be simplified to if (pcxt->nworkers_launched > 0 && isParallelModifyWithReturning) or something like that. > > In grouping_planner(), this branch > > + /* Consider a supported parallel table-modification command */ > + if (IsModifySupportedInParallelMode(parse->commandType) && > + !inheritance_update && > + final_rel->consider_parallel && > + parse->rowMarks == NIL) > + { > > is very similar to creation of the non-parallel ModifyTablePaths - perhaps an > opportunity to move the common code into a new function. > +1. > > @@ -2401,6 +2494,13 @@ grouping_planner(PlannerInfo *root, bool > inheritance_update, > } > } > > + if (parallel_modify_partial_path_count > 0) > + { > + final_rel->rows = current_rel->rows;/* ??? why hasn't > this been > + >* set above somewhere */ > + generate_useful_gather_paths(root, final_rel, false); > + } > + > extra.limit_needed = limit_needed(parse); > extra.limit_tuples = limit_tuples; > extra.count_est = count_est; > > A boolean variable (e.g. 
have_parallel_modify_paths) would suffice, there's no > need to count the paths using parallel_modify_partial_path_count. > Sounds sensible. > > @@ -252,6 +252,7 @@ set_plan_references(PlannerInfo *root, Plan *plan) > PlannerGlobal *glob = root->glob; > int rtoffset = list_length(glob->finalrtable); > ListCell *lc; > + Plan *finalPlan; > > /* > * Add all the query's RTEs to the flattened rangetable. The live > ones > @@ -302,7 +303,17 @@ set_plan_references(PlannerInfo *root, Plan *plan) > } > > /* Now fix the Plan tree */ > - return set_plan_refs(root, plan, rtoffset); > + finalPlan = set_plan_refs(root, plan, rtoffset); > + if (finalPlan != NULL && IsA(finalPlan, Gather)) > + { > + Plan *subplan = outerPlan(finalPlan); > + > + if (IsA(subplan, ModifyTable) && castNo
Re: Improper use about DatumGetInt32
On 2020-12-25 08:45, Michael Paquier wrote:
> If we really think that we ought to differentiate, then we could do what
> pg_stat_statement does, and have a separate C function that's called with
> the obsolete signature (pg_stat_statements_1_8 et al).
>
> With the 1.8 flavor, it is possible to pass down a negative number and it
> may not fail depending on the number of blocks of the relation, so I think
> that you had better have a compatibility layer if a user has the new
> binaries but is still on 1.8. And that's surely a safe move.

I think on 64-bit systems it's actually safe, but on 32-bit systems (with
USE_FLOAT8_BYVAL), if you use the new binaries with the old SQL-level
definitions, you'd get the int4 that is passed in interpreted as a pointer,
which would lead to very bad things. So I think we need to create new
functions with a different C symbol. I'll work on that.
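As a rough illustration of that approach (all function and symbol names here
are hypothetical, not the actual ones from this thread): the extension's
upgrade script re-points the SQL-level function at a new C symbol that takes
the wider argument type, while the old C symbol stays in the shared library
so that users still running the old SQL definitions against new binaries
keep working:

-- installed by the hypothetical 1.9 upgrade script
CREATE OR REPLACE FUNCTION myext_page_stats(rel regclass, blkno bigint)
RETURNS bigint
AS 'MODULE_PATHNAME', 'myext_page_stats_1_9'  -- new C symbol, int8 argument
LANGUAGE C STRICT;

-- The 1.8 SQL definition keeps pointing at the old C symbol
-- ('myext_page_stats', int4 argument), which the new .so still exports.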
Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
On Fri, Jan 8, 2021 at 1:50 PM japin wrote: > Thanks for updating the patch! > > + /* Get the data generating query. */ > + dataQuery = get_matview_query(stmt, &matviewRel, &matviewOid); > > - /* > -* Check for active uses of the relation in the current transaction, > such > -* as open scans. > -* > -* NB: We count on this to protect us against problems with > refreshing the > -* data using TABLE_INSERT_FROZEN. > -*/ > - CheckTableNotInUse(matviewRel, "REFRESH MATERIALIZED VIEW"); > + relowner = matviewRel->rd_rel->relowner; > > After apply the patch, there is a duplicate > > relowner = matviewRel->rd_rel->relowner; Corrected that. > + else if(matviewInfo) > + dest = > CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap); > > If the `matviewInfo->OIDNewHeap` is invalid, IMO we don't need create > DestReceiver, isn't it? And we should add a space after `if`. Yes, we can skip creating the dest receiver when OIDNewHeap is invalid, this can happen for plain explain refresh mat view case. if (explainInfo && !explainInfo->es->analyze) OIDNewHeap = InvalidOid; else OIDNewHeap = get_new_heap_oid(stmt, matviewRel, matviewOid, &relpersistence); Since we don't call ExecutorRun for plain explain, we can skip the dest receiver creation. I modified the code as below in explain.c. if (into) dest = CreateIntoRelDestReceiver(into); else if (matviewInfo && OidIsValid(matviewInfo->OIDNewHeap)) dest = CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap); else dest = None_Receiver; Thanks for taking a look at the patches. Attaching v3 patches, please consider these for further review. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From f6f1760e6b460d3265ee343d5b1d53f82f45fee6 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 8 Jan 2021 14:04:26 +0530 Subject: [PATCH v3 1/2] Rearrange Refresh Mat View Code Currently, the function ExecRefreshMatView in matview.c is having many lines of code which is not at all good from readability and maintainability perspectives. This patch adds few functions and moves the code from ExecRefreshMatView to them making the code look better. 
--- src/backend/commands/matview.c | 452 - 1 file changed, 272 insertions(+), 180 deletions(-) diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index c5c25ce11d..06bd5629a8 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -64,7 +64,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self); static void transientrel_shutdown(DestReceiver *self); static void transientrel_destroy(DestReceiver *self); -static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query, +static uint64 refresh_matview_datafill(Oid OIDNewHeap, Query *query, const char *queryString); static char *make_temptable_name_n(char *tempname, int n); static void refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner, @@ -73,6 +73,16 @@ static void refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersist static bool is_usable_unique_index(Relation indexRel); static void OpenMatViewIncrementalMaintenance(void); static void CloseMatViewIncrementalMaintenance(void); +static Query *get_matview_query(RefreshMatViewStmt *stmt, Relation *rel, +Oid *objectId); +static Query *rewrite_refresh_matview_query(Query *dataQuery); +static Oid get_new_heap_oid(RefreshMatViewStmt *stmt, Relation matviewRel, + Oid matviewOid, char *relpersistence); +static void match_matview_with_new_data(RefreshMatViewStmt *stmt, + Relation matviewRel, Oid matviewOid, + Oid OIDNewHeap, Oid relowner, + int save_sec_context, + char relpersistence, uint64 processed); /* * SetMatViewPopulatedState @@ -140,127 +150,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, { Oid matviewOid; Relation matviewRel; - RewriteRule *rule; - List *actions; Query *dataQuery; - Oid tableSpace; - Oid relowner; Oid OIDNewHeap; - DestReceiver *dest; uint64 processed = 0; - bool concurrent; - LOCKMODE lockmode; + Oid relowner; char relpersistence; Oid save_userid; int save_sec_context; int save_nestlevel; ObjectAddress address; - /* Determine strength of lock needed. */ - concurrent = stmt->concurrent; - lockmode = concurrent ? ExclusiveLock : AccessExclusiveLock; - - /* - * Get a lock until end of transaction. - */ - matviewOid = RangeVarGetRelidExtended(stmt->relation, - lockmode, 0, - RangeVarCallbackOwnsTable, NULL); - matviewRel = table_open(matviewOid, NoLock); - - /* Make sure it is a materialized view.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow wrote: > > Posting an updated set of patches to address recent feedback: > > - Removed conditional-locking code used in parallel-safety checking > code (Tsunakawa-san feedback). It turns out that for the problem test > case, no parallel-safety checking should be occurring that locks > relations because those inserts are specifying VALUES, not an > underlying SELECT, so the parallel-safety checking code was updated to > bail out early if no underlying SELECT is specified for the INSERT. No > change to the test code was required. > - Added comment to better explain the reason for treating "INSERT ... > ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip) > - Added assertion to heap_prepare_insert() (Amit) > - Updated error handling for NULL index_expr_item case (Vignesh) Thanks Greg for fixing and posting a new patch. Few comments: +-- Test INSERT with underlying query. +-- (should create plan with parallel SELECT, Gather parent node) +-- +explain(costs off) insert into para_insert_p1 select unique1, stringu1 from tenk1; + QUERY PLAN + + Insert on para_insert_p1 + -> Gather + Workers Planned: 4 + -> Parallel Seq Scan on tenk1 +(4 rows) + +insert into para_insert_p1 select unique1, stringu1 from tenk1; +select count(*), sum(unique1) from para_insert_p1; + count | sum +---+-- + 1 | 49995000 +(1 row) + For one of the test you can validate that the same transaction has been used by all the parallel workers, you could use something like below to validate: SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM para_insert_p1) as dt; Few includes are not required: #include "executor/nodeGather.h" +#include "executor/nodeModifyTable.h" #include "executor/nodeSubplan.h" #include "executor/tqueue.h" #include "miscadmin.h" @@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags) GatherState *gatherstate; Plan *outerNode; TupleDesc tupDesc; + Index varno; This include is not required in nodeModifyTable.c +#include "catalog/index.h" +#include "catalog/indexing.h" @@ -43,7 +49,11 @@ #include "parser/parse_agg.h" #include "parser/parse_coerce.h" #include "parser/parse_func.h" +#include "parser/parsetree.h" +#include "partitioning/partdesc.h" +#include "rewrite/rewriteHandler.h" #include "rewrite/rewriteManip.h" +#include "storage/lmgr.h" #include "tcop/tcopprot.h" The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c There are few typos: +table and populate it can use a parallel plan. Another exeption is the command +INSERT INTO ... SELECT ... which can use a parallel plan for +the underlying SELECT part of the query. exeption should be exception + /* + * For the number of workers to use for a parallel + * INSERT/UPDATE/DELETE, it seems resonable to use the same number + * of workers as estimated for the underlying query. + */ + parallelModifyWorkers = path->parallel_workers; resonable should be reasonable Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Single transaction in the tablesync worker?
On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie wrote: > > > PSA the v12 patch for the Tablesync Solution1. > > > > Differences from v11: > > + Added PG docs to mention the tablesync slot > > + Refactored tablesync slot drop (done by > > DropSubscription/process_syncing_tables_for_sync) > > + Fixed PG docs mentioning wrong state code > > + Fixed wrong code comment describing TCOPYDONE state > > > Hi > > I look into the new patch and have some comments. > > 1. > + /* Setup replication origin tracking. */ > ①+ originid = replorigin_by_name(originname, true); > + if (!OidIsValid(originid)) > + { > > ②+ originid = replorigin_by_name(originname, true); > + if (originid != InvalidRepOriginId) > + { > > There are two different style code which check whether originid is valid. > Both are fine, but do you think it’s better to have a same style here? Yes. I think the 1st style is better, so I used the OidIsValid for all the new code of the patch. But the check in DropSubscription is an exception; there I used 2nd style but ONLY to be consistent with another originid check which already existed in that same function. > > > 2. > * state to SYNCDONE. There might be zero changes applied > between > * CATCHUP and SYNCDONE, because the sync worker might be ahead > of the > * apply worker. > + *- The sync worker has a intermediary state TCOPYDONE which comes > after > + * DATASYNC and before SYNCWAIT. This state indicates that the > initial > > This comment about TCOPYDONE is better to be placed at [1]*, where is between > DATASYNC and SYNCWAIT. > > * - Tablesync worker starts; changes table state from INIT to > DATASYNC while > * copying. > [1]* > * - Tablesync worker finishes the copy and sets table state to > SYNCWAIT; > * waits for state change. > Agreed. I have moved the comment per your suggestion (and I also re-worded it again). Fixed in latest patch [v13] > 3. > + /* > +* To build a slot name for the sync work, we are limited to > NAMEDATALEN - > +* 1 characters. > +* > +* The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). > (It's > +* actually the NAMEDATALEN on the remote that matters, but this > scheme > +* will also work reasonably if that is different.) > +*/ > + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* > for sanity */ > + > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > The comments says syncslotname is limit to NAMEDATALEN - 1 characters. > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30,which seems not > NAMEDATALEN - 1. > Should we change the comment here? > The comment wording is a remnant from older code which had a differently format slot name. I think the comment is still valid, albeit maybe unnecessary since in the current code the tablesync slot name length is fixed. But I left the older comment here as a safety reminder in case some future change would want to modify the slot name. What do you think? [v13] = https://www.postgresql.org/message-id/CAHut%2BPvby4zg6kM1RoGd_j-xs9OtPqZPPVhbiC53gCCRWdNSrw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
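For anyone trying the patch, a quick way to see the tablesync slots it
creates on the publisher is to filter pg_replication_slots by the
pg_%u_sync_%u naming scheme discussed above (this assumes the v13 naming;
adjust the pattern if the scheme changes):

SELECT slot_name, slot_type, temporary, active
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_%\_sync\_%';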
Re: Add session statistics to pg_stat_database
On Fri, 2021-01-08 at 12:00 +0900, Masahiro Ikeda wrote: > 2. monitoring.sgml > > > > IIUC, "active_time" includes the time executes a fast-path function > > > and > > > "idle in transaction" includes "idle in transaction(aborted)" time. > > > Why don't you reference pg_stat_activity's "state" column and > > > "active_time" is the total time when the state is "active" and "fast > > > path"? > > > "idle in transaction" is as same too. > > > > Good idea; I have expanded the documentation like that. > > BTW, is there any reason to merge the above statistics? > IIUC, to separate statistics' cons is that two columns increase, and > there is no performance penalty. So, I wonder that there is a way to > separate them > corresponding to the state column of pg_stat_activity. Sure, that could be done. I decided to do it like this because I thought that few people would be interested in "time spend doing fast-path function calls"; my guess was that the more interesting value is "time where the database was busy calculating results". I tried to keep the balance between providing reasonable detail while not creating more additional columns to "pg_stat_database" than necessary. This is of course a matter of taste, and it is good to hear different opinions. If more people share your opinion, I'll change the code. > There are some following codes in pgstatfuncs.c. > int64 result = 0.0; > > But, I think the following is better. > int64 result = 0; You are right. That was a silly copy-and-paste error. Fixed. > Although now pg_stat_get_db_session_time is initialize "result" to zero > when it is declared, > another pg_stat_XXX function didn't initialize. Is it better to change > it? I looked at other similar functions, and the ones I saw returned NULL if there were no data. In that case, it makes sense to write char *result; if ((result = get_stats_data()) == NULL) PG_RETURN_NULL(); PG_RETURN_TEXT_P(cstring_to_text(result)); But I want to return 0 for the session time if there are no data yet, so I think initializing the result to 0 in the declaration makes sense. There are some functions that do it like this: int32 result; result = 0; for (...) { if (...) result++; } PG_RETURN_INT32(result); Again, it is a matter of taste, and I didn't detect a clear pattern in the existing code that I feel I should follow in this question. Version 12 of the patch is attached. Yours, Laurenz Albe From 324847353f5d9e5b2899dd93d43fb345df1dcdb8 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Thu, 7 Jan 2021 16:33:45 +0100 Subject: [PATCH] Add session statistics to pg_stat_database If "track_counts" is active, track the following per database: - total number of connections - number of sessions that ended by loss of network connection, fatal errors and operator intervention - total time spent in database sessions - total time spent executing queries - total idle in transaction time This is useful to check if connection pooling is working. It also helps to estimate the size of the connection pool required to keep the database busy, which depends on the percentage of the transaction time that is spent idling. 
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- doc/src/sgml/monitoring.sgml | 77 +++ src/backend/catalog/system_views.sql | 7 ++ src/backend/postmaster/pgstat.c | 134 ++- src/backend/tcop/postgres.c | 11 ++- src/backend/utils/adt/pgstatfuncs.c | 94 +++ src/backend/utils/error/elog.c | 8 ++ src/include/catalog/pg_proc.dat | 32 +++ src/include/pgstat.h | 39 src/test/regress/expected/rules.out | 7 ++ 9 files changed, 404 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 43fe8ae383..59622173da 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3737,6 +3737,83 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + session_time double precision + + + Time spent by database sessions in this database, in milliseconds + (note that statistics are only updated when the state of a session + changes, so if sessions have been idle for a long time, this idle time + won't be included) + + + + + + active_time double precision + + + Time spent executing SQL statements in this database, in milliseconds + (this corresponds to the states active and + fastpath function call in + + pg_stat_activity) + + + + + + idle_in_transact
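To illustrate the intended use described in the commit message above, a
rough monitoring query over the proposed columns might look like this
(column names and the millisecond unit are taken from the patch's
documentation changes; this is a sketch against the proposed catalog, not
columns that exist today):

SELECT datname,
       session_time,
       active_time,
       round((active_time * 100.0 / nullif(session_time, 0))::numeric, 1)
           AS pct_active
FROM pg_stat_database
ORDER BY session_time DESC;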
Re: proposal - psql - use pager for \watch command
ne 19. 4. 2020 v 19:27 odesílatel Pavel Stehule napsal: > Hi, > > last week I finished pspg 3.0 https://github.com/okbob/pspg . pspg now > supports pipes, named pipes very well. Today the pspg can be used as pager > for output of \watch command. Sure, psql needs attached patch. > > I propose new psql environment variable PSQL_WATCH_PAGER. When this > variable is not empty, then \watch command starts specified pager, and > redirect output to related pipe. When pipe is closed - by pager, then > \watch cycle is leaved. > > If you want to test proposed feature, you need a pspg with > cb4114f98318344d162a84b895a3b7f8badec241 > commit. > > Then you can set your env > > export PSQL_WATCH_PAGER="pspg --stream" > psql > > SELECT * FROM pg_stat_database; > \watch 1 > > Comments, notes? > > Regards > > Pavel > rebase diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 303e7c3ad8..7ab5aed3fa 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -173,6 +173,7 @@ static char *pset_value_string(const char *param, printQueryOpt *popt); static void checkWin32Codepage(void); #endif +static bool sigpipe_received = false; /*-- @@ -4749,6 +4750,17 @@ do_shell(const char *command) return true; } +/* + * We want to detect sigpipe to break from watch cycle faster, + * before waiting 1 sec in sleep time, and unsuccess write to + * pipe. + */ +static void +pagerpipe_sigpipe_handler(int signum) +{ + sigpipe_received = true; +} + /* * do_watch -- handler for \watch * @@ -4763,6 +4775,8 @@ do_watch(PQExpBuffer query_buf, double sleep) const char *strftime_fmt; const char *user_title; char *title; + const char *pagerprog = NULL; + FILE *pagerpipe = NULL; int title_len; int res = 0; @@ -4772,6 +4786,21 @@ do_watch(PQExpBuffer query_buf, double sleep) return false; } + pagerprog = getenv("PSQL_WATCH_PAGER"); + if (pagerprog) + { + sigpipe_received = false; + +#ifndef WIN32 + pqsignal(SIGPIPE, pagerpipe_sigpipe_handler); +#endif + + pagerpipe = popen(pagerprog, "w"); + if (!pagerpipe) + /*silently proceed without pager */ + restore_sigpipe_trap(); + } + /* * Choose format for timestamps. We might eventually make this a \pset * option. In the meantime, using a variable for the format suppresses @@ -4783,7 +4812,8 @@ do_watch(PQExpBuffer query_buf, double sleep) * Set up rendering options, in particular, disable the pager, because * nobody wants to be prompted while watching the output of 'watch'. */ - myopt.topt.pager = 0; + if (!pagerpipe) + myopt.topt.pager = 0; /* * If there's a title in the user configuration, make sure we have room @@ -4817,7 +4847,7 @@ do_watch(PQExpBuffer query_buf, double sleep) myopt.title = title; /* Run the query and print out the results */ - res = PSQLexecWatch(query_buf->data, &myopt); + res = PSQLexecWatch(query_buf->data, &myopt, pagerpipe); /* * PSQLexecWatch handles the case where we can no longer repeat the @@ -4826,6 +4856,9 @@ do_watch(PQExpBuffer query_buf, double sleep) if (res <= 0) break; + if (pagerpipe && ferror(pagerpipe)) + break; + /* * Set up cancellation of 'watch' via SIGINT. We redo this each time * through the loop since it's conceivable something inside @@ -4843,14 +4876,27 @@ do_watch(PQExpBuffer query_buf, double sleep) i = sleep_ms; while (i > 0) { - long s = Min(i, 1000L); + long s = Min(i, pagerpipe ? 
100L : 1000L); pg_usleep(s * 1000L); if (cancel_pressed) break; + + if (sigpipe_received) +break; + i -= s; } sigint_interrupt_enabled = false; + + if (sigpipe_received) + break; + } + + if (pagerpipe) + { + fclose(pagerpipe); + restore_sigpipe_trap(); } pg_free(title); diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 6f507104f4..08bb046462 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -592,12 +592,13 @@ PSQLexec(const char *query) * e.g., because of the interrupt, -1 on error. */ int -PSQLexecWatch(const char *query, const printQueryOpt *opt) +PSQLexecWatch(const char *query, const printQueryOpt *opt, FILE *printQueryFout) { PGresult *res; double elapsed_msec = 0; instr_time before; instr_time after; + FILE *fout; if (!pset.db) { @@ -638,14 +639,16 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt) return 0; } + fout = printQueryFout ? printQueryFout : pset.queryFout; + switch (PQresultStatus(res)) { case PGRES_TUPLES_OK: - printQuery(res, opt, pset.queryFout, false, pset.logfile); + printQuery(res, opt, fout, false, pset.logfile); break; case PGRES_COMMAND_OK: - fprintf(pset.queryFout, "%s\n%s\n\n", opt->title, PQcmdStatus(res)); + fprintf(fout, "%s\n%s\n\n", opt->title, PQcmdStatus(res)); break; case PGRES_EMPTY_QUERY: @@ -668,7 +671,7 @@ PSQLexecWatch(const char *query, const printQueryOpt *opt) PQclear(res); - fflush(pset.queryFout); + fflush(f
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote: > I expected there'd be some disagreement on this, but I do continue to > feel that it's sensible to enable checksums by default. +1 I think the problem here (apart from the original line of argumentation) is that there are two kinds of PostgreSQL installations: - installations done on dubious hardware with minimal tuning (the "cheap crowd") - installations done on good hardware, where people make an effort to properly configure the database (the "serious crowd") I am aware that this is an oversimplification for the sake of the argument. The voices against checksums on by default are probably thinking of the serious crowd. If checksums were enabled by default, the cheap crowd would benefit from the early warnings that something has gone wrong. The serious crowd are more likely to choose a non-default setting to avoid paying the price for a feature that they don't need. Yours, Laurenz Albe
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Jan 8, 2021 at 12:21 PM Antonin Houska wrote: > > Amit Kapila wrote: > > > > As an alternative, have you considered allocation of the XID even in > > > parallel > > > mode? I imagine that the first parallel worker that needs the XID for > > > insertions allocates it and shares it with the other workers as well as > > > with > > > the leader process. > > > > > > > As a matter of this patch > > (v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch), we > > never need to allocate xids by workers because Insert is always > > performed by leader backend. > > When writing this comment, I was actually thinking of > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch rather > than v11-0001, see below. On the other hand, if we allowed XID allocation in > the parallel mode (as a separate patch), even the 0001 patch would get a bit > simpler. > > > Even, if we want to do what you are suggesting it would be tricky because > > currently, we don't have such an infrastructure where we can pass > > information among workers. > > How about barriers (storage/ipc/barrier.c)? What I imagine is that all the > workers "meet" at the barrier when they want to insert the first tuple. Then > one of them allocates the XID, makes it available to others (via shared > memory) and all the workers can continue. > Even if want to do this I am not sure if we need barriers because there is no intrinsic need for all workers to stop before allocating XID. After allocation of XID, we just need some way for other workers to use it, maybe something like all workers currently synchronizes for getting the block number to process in parallel sequence scans. But the question is it really worth because in many cases it would be already allocated by the time parallel DML operation is started and we do share it in the beginning? I think if we really want to allow allocation of xid in parallel-mode then we should also think to allow it for subtransactions xid not only for main transactions and that will open up some other use cases. I feel it is better to tackle that problem separately. -- With Regards, Amit Kapila.
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Hi, On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: > On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote: > > I expected there'd be some disagreement on this, but I do continue to > > feel that it's sensible to enable checksums by default. > > +1 I don't disagree with this in principle, but if you want that you need to work on making checksum overhead far smaller. That's doable. Afterwards it makes sense to have this discussion. > I think the problem here (apart from the original line of argumentation) > is that there are two kinds of PostgreSQL installations: > > - installations done on dubious hardware with minimal tuning > (the "cheap crowd") > > - installations done on good hardware, where people make an effort to > properly configure the database (the "serious crowd") > > I am aware that this is an oversimplification for the sake of the argument. > > The voices against checksums on by default are probably thinking of > the serious crowd. > > If checksums were enabled by default, the cheap crowd would benefit > from the early warnings that something has gone wrong. > > The serious crowd are more likely to choose a non-default setting > to avoid paying the price for a feature that they don't need. I don't really buy this argument. That way we're going to have an ever growing set of things that need to be tuned to have a database that's usable in an even halfway busy setup. That's unavoidable in some cases, but it's a significant cost across use cases. Increasing the overhead in the default config from one version to the next isn't great - it makes people more hesitant to upgrade. It's also not a cost you're going to find all that quickly, and it's a really hard to pin down cost. Andres
Re: Disable WAL logging to speed up data loading
On Fri, Jan 8, 2021 at 2:12 PM tsunakawa.ta...@fujitsu.com wrote: > > From: Robert Haas > > Were the issues that I mentioned regarding GIST (and maybe other AMs) > > in the last paragraph of > > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW > > dmrev...@mail.gmail.com > > addressed in some way? That seems like a pretty hard engineering > > problem to me, and I don't see that there's been any discussion of it. > > Those are correctness concerns separate from any wal_level tracking we > > might want to do to avoid accidental mistakes. > > Thank you very much for reminding me of this. I forgot I replied as follows: > > > -- > Unlogged GiST indexes use fake LSNs that are instance-wide. Unlogged > temporary GiST indexes use backend-local sequence values. Other unlogged and > temporary relations don't set LSNs on pages. So, I think it's enough to call > GetFakeLSNForUnloggedRel() when wal_level = none as well. > -- > > > But this is not correct. We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) > WAL records to be emitted (by tweaking the filter in XLogInsert()), just like > those WAL records are emitted when wal_level = minimal and and other WAL > records are not emitted. I think it's better to have index AM (and perhaps table AM) control it instead of filtering in XLogInsert(). Because otherwise third-party access methods that use LSN like gist indexes cannot write WAL records at all when wal_level = none even if they want. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Added schema level support for publication.
On Thu, Jan 7, 2021 at 10:03 PM vignesh C wrote:
>
> This feature adds schema option while creating publication. Users will
> be able to specify one or more schemas while creating publication,
> when the user specifies schema option, then the data changes for the
> tables present in the schema specified by the user will be replicated
> to the subscriber. Few examples have been listed below:
>
> Create a publication that publishes all changes for all the tables
> present in production schema:
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
>
> Create a publication that publishes all changes for all the tables
> present in marketing and sales schemas:
> CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
>
> Add some schemas to the publication:
> ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
>
> Drop some schema from the publication:
> ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
>
> Attached is a POC patch for the same. I felt this feature would be quite useful.

What do we do if the user drops the schema? Do we automatically remove it
from the publication?

I see some use for such a feature, but you haven't described the use case
or how you arrived at the conclusion that it would be quite useful.

--
With Regards,
Amit Kapila.
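Spelling out the scenario behind that question, using the syntax from the
quoted proposal (what happens at the last step is exactly the open point;
the POC's behavior there is not described in this thread):

CREATE SCHEMA marketing;
CREATE TABLE marketing.leads (id int PRIMARY KEY, note text);
CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing;

DROP SCHEMA marketing CASCADE;
-- Open question from above: is the schema automatically removed from
-- sales_publication, does a dangling reference remain, or should the
-- DROP be restricted?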
Re: plpgsql variable assignment with union is broken
On Thu, Jan 7, 2021 at 10:55 AM Pavel Stehule wrote: >> Again, if this is narrowly confined to assignment into set query >> operations, maybe this is not so bad. But is it? > > PLpgSQL_Expr: opt_target_list > <--><--><-->from_clause where_clause > <--><--><-->group_clause having_clause window_clause > <--><--><-->opt_sort_clause opt_select_limit opt_for_locking_clause > <--><--><--><-->{ > > So SELECT INTO and assignment are not fully compatible now. OK. Well, I went ahead and checked the code base for any suspicious assignments that might fall out of the tightened syntax. It was cursory, but didn't turn up anything obvious. So I'm going to change my position on this. My feedback would be to take backwards compatibility very seriously relating to language changes in the future, and to rely less on documentation arguments as it relates to change justification. The current behavior has been in place for decades and is therefore a de facto standard. Change freedom ought to be asserted in scenarios where behavior is reserved as undefined or is non standard rather than assumed. Rereading the development thread, it seems a fairly short timeframe between idea origination and commit, and hypothetical impacts to existing code bases was not rigorously assessed. I guess it's possible this may ultimately cause some heartburn but it's hard to say without strong data to justify that position. Having said all of that, I'm very excited about the array assignment improvements and investment in this space is very much appreciated. . merlin
Re: Global snapshots
On 2021/01/01 12:14, tsunakawa.ta...@fujitsu.com wrote:
> Hello,
>
> Fujii-san and I discussed how to move the scale-out development forward.
> We are both worried that Clock-SI is (highly?) likely to infringe the said
> Microsoft patent. So we agreed we are going to investigate Clock-SI and
> the patent, and if we have to conclude that we cannot embrace Clock-SI,
> we will explore other possibilities.

Yes. IMO, it seems that Clock-SI overlaps with the patent and we can't use it.

> First, looking back at how to interpret the patent document, the patent
> "claims" are what we should pay our greatest attention to. According to
> the following citation from the IP guide by the Software Freedom Law
> Center (SFLC) [1], software infringes a patent if it implements everything
> of any claim, not all claims.
>
> --
> 4.2 Patent Infringement
>
> To prove that you infringe a patent, the patent holder must show that you
> make, use, offer to sell, or sell the invention as it is defined in at
> least one claim of the patent. For software to infringe a patent, the
> software essentially must implement everything recited in one of the
> patent's claims. It is crucial to recognize that infringement is based
> directly on the claims of the patent, and not on what is stated or
> described in other parts of the patent document.
> --
>
> And Clock-SI implements at least claims 11 and 20 cited below. It doesn't
> matter whether Clock-SI uses a physical clock or a logical one.

Thanks for sharing the result of your investigation! Regarding at least
claim 11, I reached the same conclusion. As far as I understand, Clock-SI
actually does the method described in claim 11 when determining the commit
time and doing the commit on each node.

I don't intend to offend Clock-SI or any activities based on it. OTOH, I'm
now wondering if it's worth considering another approach for global
transaction support, while I'm still interested in Clock-SI technically.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: Add session statistics to pg_stat_database
On 2021-01-08 18:34, Laurenz Albe wrote: On Fri, 2021-01-08 at 12:00 +0900, Masahiro Ikeda wrote: 2. monitoring.sgml > > IIUC, "active_time" includes the time executes a fast-path function > > and > > "idle in transaction" includes "idle in transaction(aborted)" time. > > Why don't you reference pg_stat_activity's "state" column and > > "active_time" is the total time when the state is "active" and "fast > > path"? > > "idle in transaction" is as same too. > > Good idea; I have expanded the documentation like that. BTW, is there any reason to merge the above statistics? IIUC, to separate statistics' cons is that two columns increase, and there is no performance penalty. So, I wonder that there is a way to separate them corresponding to the state column of pg_stat_activity. Sure, that could be done. I decided to do it like this because I thought that few people would be interested in "time spend doing fast-path function calls"; my guess was that the more interesting value is "time where the database was busy calculating results". I tried to keep the balance between providing reasonable detail while not creating more additional columns to "pg_stat_database" than necessary. This is of course a matter of taste, and it is good to hear different opinions. If more people share your opinion, I'll change the code. OK, I understood. I don't have any strong opinions to add them. There are some following codes in pgstatfuncs.c. int64 result = 0.0; But, I think the following is better. int64 result = 0; You are right. That was a silly copy-and-paste error. Fixed. Thanks. Although now pg_stat_get_db_session_time is initialize "result" to zero when it is declared, another pg_stat_XXX function didn't initialize. Is it better to change it? I looked at other similar functions, and the ones I saw returned NULL if there were no data. In that case, it makes sense to write char *result; if ((result = get_stats_data()) == NULL) PG_RETURN_NULL(); PG_RETURN_TEXT_P(cstring_to_text(result)); But I want to return 0 for the session time if there are no data yet, so I think initializing the result to 0 in the declaration makes sense. There are some functions that do it like this: int32 result; result = 0; for (...) { if (...) result++; } PG_RETURN_INT32(result); Again, it is a matter of taste, and I didn't detect a clear pattern in the existing code that I feel I should follow in this question. Thanks, I understood. I checked my comments are fixed. This patch looks good to me for monitoring session statistics. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Re: [PATCH] Simple progress reporting for COPY command
On Thu, 7 Jan 2021 at 23:00, Josef Šimánek wrote: > > čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra > napsal: > > > > I'm not particularly attached to the "lines" naming, it just seemed OK > > to me. So if there's consensus to rename this somehow, I'm OK with it. > > The problem I do see here is that it depends on the "way" of COPY. If > you're copying from a CSV file to a table, those are actually lines (since > 1 line = 1 tuple). But copying from DB to file is copying tuples (though > 1 tuple = 1 file line). "Line" works better here for me personally. > > Once I fix the problem with triggers (and also other cases if > found), I think we can consider them lines. It will represent the number of > lines processed from the file on COPY FROM and the number of lines written to the > file on COPY TO (at least in CSV format). I'm not sure how BINARY > format works, I'll check. A counterexample showing that 1 tuple need not be 1 line, in csv/binary:

/*
 * create a table with one tuple containing 1 text field, which consists of
 * 10 newline characters.
 * If you want windows-style lines, replace '\x0A' (\n) with '\x0D0A' (\r\n).
 */
# CREATE TABLE ttab (val) AS SELECT * FROM (values ( repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text )) as v;

# -- indeed, one unix-style line, according to $ wc -l copy.txt
# COPY ttab TO 'copy.txt' (format text);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text);
COPY 1

# -- 11 lines
# COPY ttab TO 'copy.csv' (format csv);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv);
COPY 1

# -- 13 lines
# COPY ttab TO 'copy.bin' (format binary);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary);
COPY 1

All of the above copy statements would only report 'lines_processed = 1' in the progress reporting, while the csv/binary line counts are definitively inconsistent with what the progress reporting shows, because progress reporting counts tuples / table rows, not the number of lines in the external file.
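While such a COPY is running, its progress can be watched from another session through the view the patch adds; the view name below is taken from the patch as posted, and the exact columns may still change given the "lines" vs "tuples" debate above:

SELECT * FROM pg_stat_progress_copy;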
Re: WIP: System Versioned Temporal Table
On 1/8/21 7:33 AM, Simon Riggs wrote: > > * What happens if you ask for a future time? > It will give an inconsistent result as it scans, so we should refuse a > query for time > current_timestamp. That seems like a significant limitation. Can we fix it instead of refusing the query? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On 1/8/21 5:03 AM, Andres Freund wrote: On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: The serious crowd are more likely to choose a non-default setting to avoid paying the price for a feature that they don't need. I don't really buy this argument. That way we're going to have an ever growing set of things that need to be tuned to have a database that's usable in an even halfway busy setup. That's unavoidable in some cases, but it's a significant cost across use cases. Increasing the overhead in the default config from one version to the next isn't great - it makes people more hesitant to upgrade. It's also not a cost you're going to find all that quickly, and it's a really hard to pin down cost. I'm +1 for enabling checksums by default, even with the performance penalties. As far as people upgrading, one advantage is existing pg_upgrade'd databases would not be affected. Only newly init'd clusters would get this setting. Regards, -- -David da...@pgmasters.net
Re: Proposal: Global Index
On Fri, Jan 8, 2021 at 11:26:48AM +0800, 曾文旌 wrote: > > On Thu, Jan 7, 2021 at 05:44:01PM +0800, 曾文旌 wrote: > >> I've been following this topic for a long time. It's been a year since the > >> last response. > >> It was clear that our customers wanted this feature as well, and a large > >> number of them mentioned it. > >> > >> So, I wish the whole feature to mature as soon as possible. > >> I summarized the scheme mentioned in the email and completed the POC > >> patch(base on PG_13). > > > > I think you need to address the items mentioned in this blog, and the > > email link it mentions: > > > > https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020 > > Thank you for your reply. > I read your blog and it helped me a lot. > > The blog mentions a specific problem: "A large global index might also > reintroduce problems that prompted the creation of partitioning in the first > place. " > I don't quite understand, could you give some specific information? Well, if you created partitions, you probably did so because:

1. heap files are smaller, allowing for more targeted sequential scans
2. smaller indexes
3. the ability to easily drop tables/indexes that are too old

If you have global indexes, #1 is the same, but #2 is no longer true, and for #3, you can drop the heap but the index entries still exist in the global index and must be removed. So, if you created partitions for one of the three reasons, once you have global indexes, some of those advantages of partitioning no longer hold. I am sure there are some workloads where the advantages of partitioning, minus the advantages lost when using global indexes, are useful, but are there enough of them to make the feature useful? I don't know. > In addition you mentioned: "It is still unclear if these use-cases justify > the architectural changes needed to enable global indexes." > Please also describe the problems you see, I will confirm each specific issue > one by one. Well, the email thread I linked to has a lot of them, but the fundamental issue is that you have to break the logic that a single index serves a single heap file. Considering what I said above, is there enough usefulness to warrant such an architectural change? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
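To make point #3 concrete, here is a small illustration (table and column names are only examples): with today's per-partition indexes, retiring old data is a cheap catalog-only operation, which is exactly what a global index would complicate, since the dropped partition's entries would still have to be cleaned out of the shared index.

CREATE TABLE measurements (
    logdate   date NOT NULL,
    device_id int  NOT NULL,
    reading   numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2020 PARTITION OF measurements
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE measurements_2021 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

-- creates one local index per partition, not one big index (point #2)
CREATE INDEX ON measurements (device_id);

-- point #3: retiring old data is just a DROP (or DETACH) of the partition
DROP TABLE measurements_2020;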
Re: Improper use about DatumGetInt32
On 2021-01-08 10:21, Peter Eisentraut wrote: I think on 64-bit systems it's actually safe, but on 32-bit systems (with USE_FLOAT8_BYVAL), if you use the new binaries with the old SQL-level definitions, you'd get the int4 that is passed in interpreted as a pointer, which would lead to very bad things. So I think we need to create new functions with a different C symbol. I'll work on that. Updated patch that does that. >From d0b802478ef2f5fc4200175c56b1f9b2f712e9ed Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 8 Jan 2021 16:49:07 +0100 Subject: [PATCH v3] pageinspect: Change block number arguments to bigint Block numbers are 32-bit unsigned integers. Therefore, the smallest SQL integer type that they can fit in is bigint. However, in the pageinspect module, most input and output parameters dealing with block numbers were declared as int. The behavior with block numbers larger than a signed 32-bit integer was therefore dubious. Change these arguments to type bigint and add some more explicit error checking on the block range. (Other contrib modules appear to do this correctly already.) Since we are changing argument types of existing functions, in order to not misbehave if the binary is updated before the extension is updated, we need to create new C symbols for the entry points, similar to how it's done in other extensions as well. Reported-by: Ashutosh Bapat Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/flat/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local --- contrib/pageinspect/Makefile | 3 +- contrib/pageinspect/brinfuncs.c | 13 ++- contrib/pageinspect/btreefuncs.c | 84 ++ contrib/pageinspect/hashfuncs.c | 11 ++- contrib/pageinspect/pageinspect--1.8--1.9.sql | 81 + contrib/pageinspect/pageinspect.control | 2 +- contrib/pageinspect/pageinspect.h | 9 ++ contrib/pageinspect/rawpage.c | 87 ++- doc/src/sgml/pageinspect.sgml | 12 +-- 9 files changed, 270 insertions(+), 32 deletions(-) create mode 100644 contrib/pageinspect/pageinspect--1.8--1.9.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index d9d8177116..3fcbfea293 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -12,7 +12,8 @@ OBJS = \ rawpage.o EXTENSION = pageinspect -DATA = pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \ +DATA = pageinspect--1.8--1.9.sql \ + pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \ pageinspect--1.5.sql pageinspect--1.5--1.6.sql \ pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c index e872f65f01..0e3c2deb66 100644 --- a/contrib/pageinspect/brinfuncs.c +++ b/contrib/pageinspect/brinfuncs.c @@ -252,7 +252,18 @@ brin_page_items(PG_FUNCTION_ARGS) int att = attno - 1; values[0] = UInt16GetDatum(offset); - values[1] = UInt32GetDatum(dtup->bt_blkno); + switch (TupleDescAttr(tupdesc, 1)->atttypid) + { + case INT8OID: + values[1] = Int64GetDatum((int64) dtup->bt_blkno); + break; + case INT4OID: + /* support for old extension version */ + values[1] = UInt32GetDatum(dtup->bt_blkno); + break; + default: + elog(ERROR, "incorrect output types"); + } values[2] = UInt16GetDatum(attno); values[3] = BoolGetDatum(dtup->bt_columns[att].bv_allnulls); values[4] = BoolGetDatum(dtup->bt_columns[att].bv_hasnulls); diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c index 445605db58..a123955aae 100644 --- a/contrib/pageinspect/btreefuncs.c 
+++ b/contrib/pageinspect/btreefuncs.c @@ -41,8 +41,10 @@ #include "utils/varlena.h" PG_FUNCTION_INFO_V1(bt_metap); +PG_FUNCTION_INFO_V1(bt_page_items_1_9); PG_FUNCTION_INFO_V1(bt_page_items); PG_FUNCTION_INFO_V1(bt_page_items_bytea); +PG_FUNCTION_INFO_V1(bt_page_stats_1_9); PG_FUNCTION_INFO_V1(bt_page_stats); #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX) @@ -160,11 +162,11 @@ GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat) * Usage: SELECT * FROM bt_page_stats('t1_pkey', 1); * --- */ -Datum -bt_page_stats(PG_FUNCTION_ARGS) +static Datum +bt_page_stats_inter
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Fri, Jan 8, 2021 at 10:53:35AM +0100, Laurenz Albe wrote: > On Thu, 2021-01-07 at 16:14 -0500, Stephen Frost wrote: > > I expected there'd be some disagreement on this, but I do continue to > > feel that it's sensible to enable checksums by default. > > +1 > > I think the problem here (apart from the original line of argumentation) > is that there are two kinds of PostgreSQL installations: > > - installations done on dubious hardware with minimal tuning > (the "cheap crowd") > > - installations done on good hardware, where people make an effort to > properly configure the database (the "serious crowd") > > I am aware that this is an oversimplification for the sake of the argument. > > The voices against checksums on by default are probably thinking of > the serious crowd. > > If checksums were enabled by default, the cheap crowd would benefit > from the early warnings that something has gone wrong. > > The serious crowd are more likely to choose a non-default setting > to avoid paying the price for a feature that they don't need. I think you have captured the major issue here --- it explains a lot. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Fri, Jan 8, 2021 at 09:46:58AM -0500, David Steele wrote: > On 1/8/21 5:03 AM, Andres Freund wrote: > > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: > > > > > > The serious crowd are more likely to choose a non-default setting > > > to avoid paying the price for a feature that they don't need. > > > > I don't really buy this argument. That way we're going to have an ever > > growing set of things that need to be tuned to have a database that's > > usable in an even halfway busy setup. That's unavoidable in some cases, but > > it's a significant cost across use cases. > > > > Increasing the overhead in the default config from one version to the next > > isn't great - it makes people more hesitant to upgrade. It's also not a > > cost you're going to find all that quickly, and it's a really hard to pin > > down cost. > > I'm +1 for enabling checksums by default, even with the performance > penalties. > > As far as people upgrading, one advantage is existing pg_upgrade'd databases > would not be affected. Only newly init'd clusters would get this setting. I think once we have better online enabling of checksums people can more easily test the overhead on their workloads. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Fri, Jan 8, 2021 at 4:57 PM Bruce Momjian wrote: > > On Fri, Jan 8, 2021 at 09:46:58AM -0500, David Steele wrote: > > On 1/8/21 5:03 AM, Andres Freund wrote: > > > On Fri, Jan 8, 2021, at 01:53, Laurenz Albe wrote: > > > > > > > > The serious crowd are more likely to choose a non-default setting > > > > to avoid paying the price for a feature that they don't need. > > > > > > I don't really buy this argument. That way we're going to have an ever > > > growing set of things that need to be tuned to have a database that's > > > usable in an even halfway busy setup. That's unavoidable in some cases, > > > but it's a significant cost across use cases. > > > > > > Increasing the overhead in the default config from one version to the > > > next isn't great - it makes people more hesitant to upgrade. It's also > > > not a cost you're going to find all that quickly, and it's a really hard > > > to pin down cost. > > > > I'm +1 for enabling checksums by default, even with the performance > > penalties. > > > > As far as people upgrading, one advantage is existing pg_upgrade'd databases > > would not be affected. Only newly init'd clusters would get this setting. > > I think once we have better online enabling of checksums people can more > easily test the overhead on their workloads. Yeah, definitely. If they have equivalent hardware they can easily do it now -- create a replica, turn off checksums on replica, compare. That is, assuming we turn them on by default :) But being able to turn them both on and off without a large downtime is obviously going to make experimentation a lot more reasonable. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
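For anyone wanting a ballpark number before that exists, a rough way to compare is to run the same pgbench workload against two scratch clusters, one initialized with checksums and one without; the paths, ports, scale and durations below are arbitrary examples:

initdb -D /tmp/pgdata_nocksum
initdb -D /tmp/pgdata_cksum --data-checksums

pg_ctl -D /tmp/pgdata_nocksum -o "-p 5501" -l /tmp/nocksum.log start
pg_ctl -D /tmp/pgdata_cksum -o "-p 5502" -l /tmp/cksum.log start

pgbench -p 5501 -i -s 100 postgres
pgbench -p 5502 -i -s 100 postgres

# read/write, then read-only; compare the reported tps
pgbench -p 5501 -c 16 -j 4 -T 600 postgres
pgbench -p 5502 -c 16 -j 4 -T 600 postgres
pgbench -p 5501 -c 16 -j 4 -T 600 -S postgres
pgbench -p 5502 -c 16 -j 4 -T 600 -S postgres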
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
On Fri, Jan 8, 2021 at 05:06:24PM +0100, Magnus Hagander wrote: > On Fri, Jan 8, 2021 at 4:57 PM Bruce Momjian wrote: > > I think once we have better online enabling of checksums people can more > > easily test the overhead on their workloads. > > Yeah, definitely. > > If they have equivalent hardware they can easily do it now -- create a > replica, turn off checksums on replica, compare. That is, assuming we > turn them on by default :) But being able to turn them both on and off > without a large downtime is obviously going to make experimentation a > lot more reasonable. Can someone compute overhead on a real workload for us now? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
On Fri, 08 Jan 2021 at 17:24, Bharath Rupireddy wrote: > On Fri, Jan 8, 2021 at 1:50 PM japin wrote: >> Thanks for updating the patch! >> >> + /* Get the data generating query. */ >> + dataQuery = get_matview_query(stmt, &matviewRel, &matviewOid); >> >> - /* >> -* Check for active uses of the relation in the current transaction, >> such >> -* as open scans. >> -* >> -* NB: We count on this to protect us against problems with >> refreshing the >> -* data using TABLE_INSERT_FROZEN. >> -*/ >> - CheckTableNotInUse(matviewRel, "REFRESH MATERIALIZED VIEW"); >> + relowner = matviewRel->rd_rel->relowner; >> >> After applying the patch, there is a duplicate >> >> relowner = matviewRel->rd_rel->relowner; > > Corrected that. > >> + else if(matviewInfo) >> + dest = >> CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap); >> >> If the `matviewInfo->OIDNewHeap` is invalid, IMO we don't need to create a >> DestReceiver, do we? And we should add a space after `if`. > > Yes, we can skip creating the dest receiver when OIDNewHeap is > invalid; this can happen in the plain EXPLAIN REFRESH MATERIALIZED VIEW case. > > if (explainInfo && !explainInfo->es->analyze) > OIDNewHeap = InvalidOid; > else > OIDNewHeap = get_new_heap_oid(stmt, matviewRel, matviewOid, > &relpersistence); > > Since we don't call ExecutorRun for plain explain, we can skip the > dest receiver creation. I modified the code as below in explain.c. > > if (into) > dest = CreateIntoRelDestReceiver(into); > else if (matviewInfo && OidIsValid(matviewInfo->OIDNewHeap)) > dest = CreateTransientRelDestReceiver(matviewInfo->OIDNewHeap); > else > dest = None_Receiver; > > Thanks for taking a look at the patches. > Thanks! > Attaching v3 patches, please consider these for further review. > I find that both the declaration and definition of match_matview_with_new_data() have a tab between type and variable. We can use pgindent to fix it. What do you think? static void match_matview_with_new_data(RefreshMatViewStmt *stmt, Relation matviewRel, ^ Oid matviewOid, Oid OIDNewHeap, Oid relowner, int save_sec_context, char relpersistence, uint64 processed) ^ -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: WIP: System Versioned Temporal Table
On Fri, Jan 8, 2021 at 5:34 AM Simon Riggs wrote: > On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs > wrote: > > > > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs > wrote: > > > > > I've minimally rebased the patch to current head so that it compiles > > > and passes current make check. > > > > Full version attached > > New version attached with improved error messages, some additional > docs and a review of tests. > > The v10 patch fails to make on the current master branch (15b824da). Error:

make[2]: Entering directory '/var/lib/postgresql/git/postgresql/src/backend/parser'
'/usr/bin/perl' ./check_keywords.pl gram.y ../../../src/include/parser/kwlist.h
/usr/bin/bison -Wno-deprecated -d -o gram.c gram.y
gram.y:3685.55-56: error: $4 of ‘ColConstraintElem’ has no declared type n->contype = ($4)->contype; ^^
gram.y:3687.56-57: error: $4 of ‘ColConstraintElem’ has no declared type n->raw_expr = ($4)->raw_expr; ^^
gram.y:3734.41-42: error: $$ of ‘generated_type’ has no declared type $$ = n; ^^
gram.y:3741.41-42: error: $$ of ‘generated_type’ has no declared type $$ = n; ^^
gram.y:3748.41-42: error: $$ of ‘generated_type’ has no declared type $$ = n; ^^
../../../src/Makefile.global:750: recipe for target 'gram.c' failed
make[2]: *** [gram.c] Error 1
make[2]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend/parser'
Makefile:137: recipe for target 'parser/gram.h' failed
make[1]: *** [parser/gram.h] Error 2
make[1]: Leaving directory '/var/lib/postgresql/git/postgresql/src/backend'
src/Makefile.global:389: recipe for target 'submake-generated-headers' failed
make: *** [submake-generated-headers] Error 2

* UPDATE doesn't set EndTime correctly, so err... the patch doesn't > work on this aspect. > Everything else does actually work, AFAICS, so we "just" need a way to > update the END_TIME column in place... > So input from other Hackers/Committers needed on this point to see > what is acceptable. > > * Anomalies around use of CURRENT_TIMESTAMP are not discussed or resolved > > * No discussion, comments or tests around freezing and whether that > causes issues here > > * What happens if you ask for a future time? > It will give an inconsistent result as it scans, so we should refuse a > query for time > current_timestamp. * ALTER TABLE needs some work, it's a bit klugey at the moment and > needs extra tests. > Should refuse DROP COLUMN on a system time column > > * Do StartTime and EndTime show in SELECT *? Currently, yes. Would > guess we wouldn't want them to, not sure what the standard says. > > I prefer to have them hidden by default. This was mentioned up-thread with no decision; it seems the standard is ambiguous. MS SQL appears to have flip-flopped on this decision [1]. > SELECT * shows the timestamp columns, don't we need to hide the period > > timestamp columns from this query? > > > > > The SQL standard doesn't dictate hiding the column, but I agree hiding it by default is a good thing because these columns are used by the system to classify the data and are not frequently needed on the user side. I can change it to that if we have consensus > * The syntax changes in gram.y probably need some corralling > > Overall, it's a pretty good patch and worth working on more. I will > consider a recommendation on what to do with this. > > -- > Simon Riggs    http://www.EnterpriseDB.com/ I am increasingly interested in this feature and have heard others asking for this type of functionality. I'll do my best to continue reviewing and testing.
Thanks, Ryan Lambert [1] https://bornsql.ca/blog/temporal-tables-hidden-columns/
Re: proposal: schema variables
On 2021-01-08 07:20, Pavel Stehule wrote: Hi just rebase [schema-variables-20200108.patch] Hey Pavel, My gcc 8.3.0 compile says: (on debian 10/Buster) utility.c: In function ‘CreateCommandTag’: utility.c:2332:8: warning: this statement may fall through [-Wimplicit-fallthrough=] tag = CMDTAG_SELECT; ^~~ utility.c:2334:3: note: here case T_LetStmt: ^~~~ compile, check, check-world, runs without further problem. I also changed a few typos/improvements in the documentation, see attached. One thing I wasn't sure of: I have assumed that ON TRANSACTIONAL END RESET should be ON TRANSACTION END RESET and changed it accordingly, please double-check. Erik Rijkers --- doc/src/sgml/ref/create_variable.sgml.orig 2021-01-08 17:40:20.181823036 +0100 +++ doc/src/sgml/ref/create_variable.sgml 2021-01-08 17:59:46.976127524 +0100 @@ -16,7 +16,7 @@ CREATE VARIABLE - define a new permissioned typed schema variable + define a schema variable @@ -29,24 +29,24 @@ Description - The CREATE VARIABLE command creates a new schema variable. + The CREATE VARIABLE command creates a schema variable. Schema variables, like relations, exist within a schema and their access is controlled via GRANT and REVOKE commands. - Their changes are non-transactional by default. + Changing a schema variable is non-transactional by default. The value of a schema variable is local to the current session. Retrieving a variable's value returns either a NULL or a default value, unless its value is set to something else in the current session with a LET command. By default, - the content of variable is not transactional, alike regular variables in PL languages. + the content of a variable is not transactional. This is the same as in regular variables in PL languages. - Schema variables are retrieved by the regular SELECT SQL command. - Their value can be set with the LET SQL command. - Notably, while schema variables share properties with tables, they cannot be updated - with UPDATE commands. + Schema variables are retrieved by the SELECT SQL command. + Their value is set with the LET SQL command. + While schema variables share properties with tables, their value cannot be updated + with an UPDATE command. @@ -76,7 +76,7 @@ name - The name (optionally schema-qualified) of the variable to create. + The name, optionally schema-qualified, of the variable. @@ -85,7 +85,7 @@ data_type - The name (optionally schema-qualified) of the data type of the variable to be created. + The name, optionally schema-qualified, of the data type of the variable. @@ -105,7 +105,7 @@ NOT NULL - The NOT NULL clause forbid to set the variable to + The NOT NULL clause forbids to set the variable to a null value. A variable created as NOT NULL and without an explicitly declared default value cannot be read until it is initialized by a LET command. This obliges the user to explicitly initialize the variable @@ -118,22 +118,22 @@ DEFAULT default_expr - The DEFAULT clause assigns a default data to - schema variable. + The DEFAULT clause can be used to assign a default value to + a schema variable. -ON COMMIT DROP, ON TRANSACTIONAL END RESET +ON COMMIT DROP, ON TRANSACTION END RESET The ON COMMIT DROP clause specifies the behaviour - of temporary schema variable at transaction commit. With this clause the + of a temporary schema variable at transaction commit. With this clause the      variable is dropped at commit time. The clause is only allowed -      for temporary variables. The ON TRANSACTIONAL END RESET +      for temporary variables. 
The ON TRANSACTION END RESET clause enforces the variable to be reset to its default value when - the transaction is either commited or rolled back. + the transaction is committed or rolled back. @@ -145,7 +145,7 @@ Notes - Use DROP VARIABLE command to remove a variable. + Use the DROP VARIABLE command to remove a variable. --- doc/src/sgml/ref/discard.sgml.orig 2021-01-08 18:02:25.837531779 +0100 +++ doc/src/sgml/ref/discard.sgml 2021-01-08 18:40:09.973630164 +0100 @@ -104,6 +104,7 @@ DISCARD PLANS; DISCARD TEMP; DISCARD SEQUENCES; +DISCARD VARIABLES; --- doc/src/sgml/ref/drop_variable.sgml.orig 2021-01-08 18:05:28.643147771 +0100 +++ doc/src/sgml/ref/drop_variable.sgml 2021-01-08 18:07:17.876113523 +0100 @@ -16,7 +16,7 @@ DROP VARIABLE - removes a schema variable + remove a schema variable @@ -52,7 +52,7 @@ name - The name (optionally schema-qualified) of a schema variable. + The name, optionally schema-qualified, of a schema v
Re: pgbench: option delaying queries till connections establishment?
On Sun, Jan 3, 2021 at 9:49 AM Fabien COELHO wrote: > > Just for fun, the attached 0002 patch is a quick prototype of a > > replacement for that stuff that seems to work OK on a Mac here. (I'm > > not sure if the Windows part makes sense or works.) > > Thanks! That will definitely help because I do not have a Mac. I'll do > some cleanup. I think the main things to clean up are:

1. pthread_barrier_init() should check for errors from pthread_cond_init() and pthread_mutex_init(), and return -1.
2. pthread_barrier_destroy() should call pthread_cond_destroy() and pthread_mutex_destroy().
3. Decide if it's sane for the Windows-based emulation to be in here too, or if it should stay in pgbench.c. Or alternatively, if we're emulating pthread stuff on Windows, why not also put the other pthread emulation stuff from pgbench.c into a "ports" file; that seems premature and overkill for your project. I dunno.
4. cfbot shows that it's not building on Windows because HAVE_PTHREAD_BARRIER_WAIT is missing from Solution.pm.

As far as I know, it's OK and common practice to ignore the return code from e.g. pthread_mutex_lock() and pthread_cond_wait(), with rationale along the lines that there'd have to be a programming error for them to fail in simple cases. Unfortunately, cfbot can only tell us that it's building OK on a Mac, but doesn't actually run the pgbench code to reach this stuff. It's not running check-world on that platform yet for the following asinine reason: connection to database failed: Unix-domain socket path "/private/var/folders/3y/l0z1x3693dl_8n0qybp4dqwhgn/T/cirrus-ci-build/src/bin/pg_upgrade/.s.PGSQL.58080" is too long (maximum 103 bytes)
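For illustration, a minimal sketch of what items 1 and 2 above could look like, assuming the emulated barrier wraps a mutex, a condition variable and a pair of counters as in the prototype; the type and field names here are made up, not taken from the patch:

#include <pthread.h>

typedef struct pg_barrier
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             count;      /* number of threads the barrier waits for */
    int             arrived;    /* threads currently waiting */
} pg_barrier;

static int
pg_barrier_init(pg_barrier *b, int count)
{
    b->count = count;
    b->arrived = 0;
    if (pthread_mutex_init(&b->mutex, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->cond, NULL) != 0)
    {
        pthread_mutex_destroy(&b->mutex);
        return -1;
    }
    return 0;
}

static int
pg_barrier_destroy(pg_barrier *b)
{
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->mutex);
    return 0;
}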
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 2020-Dec-01, Alvaro Herrera wrote: > On 2020-Nov-30, Justin Pryzby wrote: > Thanks for all the comments. I'll incorporate everything and submit an > updated version later. Here's a rebased version 5, with the typos fixed. More comments below. > > The attname "detached" is a stretch of what's intuitive (it's more like > > "detachING" or half-detached). But I think psql should for sure show > > something > > more obvious to users. Esp. seeing as psql output isn't documented. Let's > > figure out what to show to users and then maybe rename the column that, too. > > OK. I agree that "being detached" is the state we want users to see, or > maybe "detach pending", or "unfinisheddetach" (ugh). I'm not sure that > pg_inherits.inhbeingdetached" is a great column name. Opinions welcome. I haven't changed this yet; I can't make up my mind about what I like best. Partition of: parent FOR VALUES IN (1) UNFINISHED DETACH Partition of: parent FOR VALUES IN (1) UNDER DETACH Partition of: parent FOR VALUES IN (1) BEING DETACHED > > ATExecDetachPartition: > > Doesn't this need to lock the table before testing for default partition ? > > Correct, it does. I failed to point out that by the time ATExecDetachPartition is called, the relation has already been locked by the invoking ALTER TABLE support code. > > I ended up with apparently broken constraint when running multiple loops > > around > > a concurrent detach / attach: > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; > > do :; done& > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; > > do :; done& > > > > "p1_check" CHECK (true) > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2) > > Not good. Haven't had time to investigate this problem yet. -- Álvaro Herrera >From d21acb0e99ed3d296db70fed2d5d036d721d611b Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 13 Jul 2020 20:15:30 -0400 Subject: [PATCH v5 1/2] Let ALTER TABLE exec routines deal with the relation This means that ATExecCmd relies on AlteredRelationInfo->rel instead of keeping the relation as a local variable; this is useful if the subcommand needs to modify the relation internally. For example, a subcommand that internally commits its transaction needs this. 
--- src/backend/commands/tablecmds.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 993da56d43..a8528a3423 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -156,6 +156,8 @@ typedef struct AlteredTableInfo Oid relid; /* Relation to work on */ char relkind; /* Its relkind */ TupleDesc oldDesc; /* Pre-modification tuple descriptor */ + /* Transiently set during Phase 2, normally set to NULL */ + Relation rel; /* Information saved by Phase 1 for Phase 2: */ List *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */ /* Information saved by Phases 1/2 for Phase 3: */ @@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd, AlterTableUtilityContext *context); static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode, AlterTableUtilityContext *context); -static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel, +static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass, AlterTableUtilityContext *context); static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab, @@ -4405,7 +4407,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode, { AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab); List *subcmds = tab->subcmds[pass]; - Relation rel; ListCell *lcmd; if (subcmds == NIL) @@ -4414,10 +4415,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode, /* * Appropriate lock was obtained by phase 1, needn't get it again */ - rel = relation_open(tab->relid, NoLock); + tab->rel = relation_open(tab->relid, NoLock); foreach(lcmd, subcmds) -ATExecCmd(wqueue, tab, rel, +ATExecCmd(wqueue, tab, castNode(AlterTableCmd, lfirst(lcmd)), lockmode, pass, context); @@ -4429,7 +4430,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode, if (pass == AT_PASS_ALTER_TYPE) ATPostAlterTypeCleanup(wqueue, tab, lockmode); - relation_close(rel, NoLock); + if (tab->rel) + { +relation_close(tab->rel, NoLock); +tab->rel = NULL; + } } } @@ -4455,11 +4460,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode, * ATExecCmd: dispatch a subcommand to appropriate execution routine */ static void -ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel, +ATExecCmd(Lis
Re: A failure of standby to follow timeline switch
Masao-san: Are you intending to act as committer for these? Since the bug is mine I can look into it, but since you already did all the reviewing work, I'm good with you giving it the final push. 0001 looks good to me; let's get that one committed quickly so that we can focus on the interesting stuff. While the implementation of find_in_log is quite dumb (not this patch's fault), it seems sufficient to deal with small log files. We can improve the implementation later, if needed, but we have to get the API right on the first try. 0003: The fix looks good to me. I verified that the test fails without the fix, and it passes with the fix. The test added in 0002 is a bit optimistic regarding timing, as well as potentially slow; it loops 1000 times and sleeps 100 milliseconds each time. In a very slow server (valgrind or clobber_cache animals) this might not be sufficient time, while on fast servers it may end up waiting longer than needed. Maybe we can do something like this:

for (my $i = 0; $i < 1000; $i++)
{
    my $current_log_size = determine_current_log_size();

    if ($node_standby_3->find_in_log(
        "requested WAL segment [0-9A-F]+ has already been removed",
        $logstart))
    {
        last;
    }
    elsif ($node_standby_3->find_in_log(
        "End of WAL reached on timeline",
        $logstart))
    {
        $success = 1;
        last;
    }

    $logstart = $current_log_size;
    while (determine_current_log_size() == $current_log_size)
    {
        usleep(10_000);    # with a retry count?
    }
}

With test patch, make check PROVE_FLAGS="--timer" PROVE_TESTS=t/001_stream_rep.pl
ok 6386 ms ( 0.00 usr 0.00 sys + 1.14 cusr 0.93 csys = 2.07 CPU)
ok 6352 ms ( 0.00 usr 0.00 sys + 1.10 cusr 0.94 csys = 2.04 CPU)
ok 6255 ms ( 0.01 usr 0.00 sys + 0.99 cusr 0.97 csys = 1.97 CPU)
without test patch:
ok 4954 ms ( 0.00 usr 0.00 sys + 0.71 cusr 0.64 csys = 1.35 CPU)
ok 5033 ms ( 0.01 usr 0.00 sys + 0.71 cusr 0.73 csys = 1.45 CPU)
ok 4991 ms ( 0.01 usr 0.00 sys + 0.73 cusr 0.59 csys = 1.33 CPU)
-- Álvaro Herrera
Re: Key management with tests
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Thu, Jan 7, 2021 at 04:08:49PM -0300, Álvaro Herrera wrote: > > On 2021-Jan-07, Bruce Momjian wrote: > > > > > All the tests pass now. The current src/test directory is 19MB, and > > > adding these tests takes it to 23MB, or a 20% increase. That seems like > > > a lot. It is testing 128-bit and 256-bit keys --- should we do fewer > > > tests, or just test 256, or use gzip to compress the tests by 50%? > > > (Does every platform have gzip?) > > > > So the tests are about 95% of the patch ... do we really need that many > > tests? > > No, I don't think so. Stephen imported the entire NIST test suite. It > was so comprehensive, it detected several OpenSSL bugs for zero-length > strings, which I already reported, but we would never be encrypting > zero-length strings, so there wasn't a lot of value to it. I ran the entire test suite locally to ensure everything worked, but I didn't actually include all of it in the PR which you merged- I had already reduced it quite a bit by removing all 'additional authenticated data' test cases (which the tests will automatically skip and which we haven't implemented support for in the common library wrappers) and by removing the 192-bit cases. This reduced the overall test set by about two-thirds or so, as I recall. > Anyway, I think we need to figure out how to trim. The first part would > be to figure out whether we need 128 _and_ 256-bit tests, and then see > what items are really useful. Stephen, do you have any ideas on that? > We currently have 10296 tests, and I think we could get away with 100. Yeah, it's probably still too much, but I don't have any particularly justifiable suggestions as to exactly what we should remove or what we should keep. Perhaps it'd make sense to try and cover the cases that are more likely to be issues between our wrapper functions and OpenSSL, and not stress too much about constantly testing cases that should really be up to OpenSSL. As such, I'd propose:

- Add back in some 192-bit tests, so we cover all three bit lengths.
- Add back in some additional authenticated test cases, just to make sure that, until/unless we implement support, the test code properly skips over those.
- Keep tests for various length plaintext/ciphertext (including 0-byte cases, so we make sure those work, since they really should).
- Keep at least one test for each length of tag that's included in the test suite.

I'm not sure how many tests we'd end up with from that, but my swag / gut feeling is that it'd probably be on the order of 100ish and a small enough set that it won't dwarf the rest of the patch. Would be nice if we had a way for some buildfarm animal or something to pull in the entire suite and test it, imv. If anyone wants to volunteer, I'd be happy to explain how to make that happen (it's not hard though- download/unzip the files, drop them in the directory, update the test script to add all the files into the array). Thanks, Stephen signature.asc Description: PGP signature
Re: Key management with tests
Greetings Bruce, * Bruce Momjian (br...@momjian.us) wrote: > On Fri, Jan 1, 2021 at 01:07:50AM -0500, Bruce Momjian wrote: > > On Thu, Dec 31, 2020 at 11:50:47PM -0500, Bruce Momjian wrote: > > > I have completed the key management patch with tests created by Stephen > > > Frost. Original patch by Masahiko Sawada. It requires the hex > > > reorganization patch first. The key patch is now 2.1MB because of the > > > tests, so attaching it here seems unwise: > > > > > > https://github.com/postgres/postgres/compare/master...bmomjian:hex.diff > > > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff > > > > > > I will add it to the commitfest. I think we need to figure out how much > > > of the tests we want to add. > > > > I am getting regression test errors using OpenSSL 1.1.1d 10 Sep 2019 > > with zero-length input data (no -p), while Stephen is able for those > > tests to pass. This needs more research, plus I think higher-level > > tests. > > I have found the cause of the failure, which I added as a C comment: > > /* > * OpenSSL 1.1.1d and earlier crashes on some zero-length plaintext > * and ciphertext strings. It crashes on an encryption call to > * EVP_EncryptFinal_ex(() in GCM mode of zero-length strings if > * plaintext is NULL, even though plaintext_len is zero. Setting > * plaintext to non-NULL allows it to work. In KW/KWP mode, > * zero-length strings fail if plaintext_len = 0 and plaintext is > * non-NULL (the opposite). OpenSSL 1.1.1e+ is fine with all options. > */ > else if (cipher == PG_CIPHER_AES_GCM) > { > plaintext_len = 0; > plaintext = pg_malloc0(1); > } > > All the tests pass now. The current src/test directory is 19MB, and > adding these tests takes it to 23MB, or a 20% increase. That seems like > a lot. It is testing 128-bit and 256-bit keys --- should we do fewer > tests, or just test 256, or use gzip to compress the tests by 50%? > (Does every platform have gzip?) Thanks a lot for working on this and figuring out what the issue was and fixing it! That's great that we got all those cases passing for you too. Thanks again, Stephen signature.asc Description: PGP signature
Re: Key management with tests
On Fri, Jan 8, 2021 at 03:33:44PM -0500, Stephen Frost wrote: > > No, I don't think so. Stephen imported the entire NIST test suite. It > > was so comprehensive, it detected several OpenSSL bugs for zero-length > > strings, which I already reported, but we would never be encrypting > > zero-length strings, so there wasn't a lot of value to it. > > I ran the entire test suite locally to ensure everything worked, but I > didn't actually include all of it in the PR which you merged- I had > already reduced it quite a bit by removing all 'additional > authenticated data' test cases (which the tests will automatically skip > and which we haven't implemented support for in the common library > wrappers) and by removing the 192-bit cases. This reduced the overall > test set by about 2/3rd's or so, as I recall. Wow, so that was reduced! > > Anyway, I think we need to figure out how to trim. The first part would > > be to figure out whether we need 128 _and_ 256-bit tests, and then see > > what items are really useful. Stephen, do you have any ideas on that? > > We currently have 10296 tests, and I think we could get away with 100. > > Yeah, it's probably still too much, but I don't have any particularly > justifiable suggestions as to exactly what we should remove or what we > should keep. > > Perhaps it'd make sense to try and cover the cases that are more likely > to be issues between our wrapper functions and OpenSSL, and not stress > too much about constantly testing cases that should really be up to > OpenSSL. As such, I'd propose: > > - Add back in some 192-bit tests, so we cover all three bit lengths. > - Add back in some additional authenticated test cases, just to make > sure that, until/unless we implement support, the test code properly > skips over those. > - Keep tests for various length plaintext/ciphertext (including 0-byte > cases, so we make sure those work, since they really should). > - Keep at least one test for each length of tag that's included in the > test suite. Makes sense. I did a simplistic trim-down to 90 tests but it was still 40% of the patch; attached. The hex strings are very long. > I'm not sure how many tests we'd end up with from that, but my swag / > gut feeling is that it'd probably be on the order of 100ish and a small > enough set that it won't dwarf the rest of the patch. > > Would be nice if we had a way for some buildfarm animal or something to > pull in the entire suite and test it, imv.. If anyone wants to > volunteer, I'd be happy to explain how to make that happen (it's not > hard though- download/unzip the files, drop them in the directory, > update the test script to add all the files into the array). Yes, do we have a place to store more comprehensive tests outside of our git tree? Has this been done before? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee cryptotest.tgz Description: application/gtar-compressed
Re: Key management with tests
On Fri, Jan 8, 2021 at 03:34:23PM -0500, Stephen Frost wrote: > > All the tests pass now. The current src/test directory is 19MB, and > > adding these tests takes it to 23MB, or a 20% increase. That seems like > > a lot. It is testing 128-bit and 256-bit keys --- should we do fewer > > tests, or just test 256, or use gzip to compress the tests by 50%? > > (Does every platform have gzip?) > > Thanks a lot for working on this and figuring out what the issue was and > fixing it! That's great that we got all those cases passing for you > too. Yes, I was relieved. The pattern of when zero-length strings fail in which modes is still very odd, but at least it reports an error, so it isn't returning incorrect data. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: WIP: System Versioned Temporal Table
On Fri, Jan 8, 2021 at 11:38 AM Simon Riggs wrote: > On Fri, Jan 8, 2021 at 4:50 PM Ryan Lambert > wrote: > > > > On Fri, Jan 8, 2021 at 5:34 AM Simon Riggs > wrote: > >> > >> On Fri, Jan 8, 2021 at 7:34 AM Simon Riggs < > simon.ri...@enterprisedb.com> wrote: > >> > > >> > On Fri, Jan 8, 2021 at 7:13 AM Simon Riggs < > simon.ri...@enterprisedb.com> wrote: > >> > > >> > > I've minimally rebased the patch to current head so that it compiles > >> > > and passes current make check. > >> > > >> > Full version attached > >> > >> New version attached with improved error messages, some additional > >> docs and a review of tests. > >> > > > > The v10 patch fails to make on the current master branch (15b824da). > Error: > > Updated v11 with additional docs and some rewording of messages/tests > to use "system versioning" correctly. > > No changes on the points previously raised. > > -- > Simon Riggs    http://www.EnterpriseDB.com/ Thank you! The v11 applies and installs. I tried a simple test; unfortunately, it appears the versioning is not working. The initial value is not preserved through an update and a new row does not appear to be created.

CREATE TABLE t (
    id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    v BIGINT NOT NULL
) WITH SYSTEM VERSIONING;

Verify the start/end time columns were created.

t=# \d t
                                       Table "public.t"
  Column    |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+-----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 v          | bigint                   |           | not null |
 StartTime  | timestamp with time zone |           | not null | generated always as row start
 EndTime    | timestamp with time zone |           | not null | generated always as row end
Indexes:
    "t_pkey" PRIMARY KEY, btree (id, "EndTime")

Add a row and check that the timestamps are set as expected.

INSERT INTO t (v) VALUES (1);
SELECT * FROM t;

 id | v |           StartTime           | EndTime
----+---+-------------------------------+----------
  1 | 1 | 2021-01-08 20:56:20.848097+00 | infinity

Update the row.

UPDATE t SET v = -1;

The value for v was updated but StartTime is the same.

SELECT * FROM t;

 id | v  |           StartTime           | EndTime
----+----+-------------------------------+----------
  1 | -1 | 2021-01-08 20:56:20.848097+00 | infinity

Querying the table for all versions only returns the single updated row (v = -1) with the original row StartTime. The original value has disappeared entirely, it seems.

SELECT * FROM t FOR SYSTEM_TIME FROM '-infinity' TO 'infinity';

I also created a non-versioned table and later added the columns using ALTER TABLE and encountered the same behavior. Ryan Lambert
Re: Key management with tests
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Fri, Jan 8, 2021 at 03:33:44PM -0500, Stephen Frost wrote: > > > Anyway, I think we need to figure out how to trim. The first part would > > > be to figure out whether we need 128 _and_ 256-bit tests, and then see > > > what items are really useful. Stephen, do you have any ideas on that? > > > We currently have 10296 tests, and I think we could get away with 100. > > > > Yeah, it's probably still too much, but I don't have any particularly > > justifiable suggestions as to exactly what we should remove or what we > > should keep. > > > > Perhaps it'd make sense to try and cover the cases that are more likely > > to be issues between our wrapper functions and OpenSSL, and not stress > > too much about constantly testing cases that should really be up to > > OpenSSL. As such, I'd propose: > > > > - Add back in some 192-bit tests, so we cover all three bit lengths. > > - Add back in some additional authenticated test cases, just to make > > sure that, until/unless we implement support, the test code properly > > skips over those. > > - Keep tests for various length plaintext/ciphertext (including 0-byte > > cases, so we make sure those work, since they really should). > > - Keep at least one test for each length of tag that's included in the > > test suite. > > Makes sense. I did a simplistic trim-down to 90 tests but it still was > 40% of the patch; attached. The hex strings are very long. I don't think we actually need to stress over the size of the test data relative to the size of the patch- it's not like it's all that much perl code. I can appreciate that we don't want to add megabytes worth of test data to the git repo though. > > I'm not sure how many tests we'd end up with from that, but my swag / > > gut feeling is that it'd probably be on the order of 100ish and a small > > enough set that it won't dwarf the rest of the patch. > > > > Would be nice if we had a way for some buildfarm animal or something to > > pull in the entire suite and test it, imv.. If anyone wants to > > volunteer, I'd be happy to explain how to make that happen (it's not > > hard though- download/unzip the files, drop them in the directory, > > update the test script to add all the files into the array). > > Yes, do we have a place to store more comprehensive tests outside of our > git tree? Has this been done before? Not that I'm aware of. Thanks, Stephen signature.asc Description: PGP signature
Re: Enhance traceability of wal_level changes for backup management
Greetings, * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote: > On Thursday, January 7, 2021 2:40 AM Stephen Frost wrote: > > * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote: > > > You said > > > > The use case I imagined is that the user temporarily changes > > > > wal_level to 'none' from 'replica' or 'logical' to speed up loading > > > > and changes back to the normal. In this case, the backups taken > > > > before the wal_level change cannot be used to restore the database > > > > to the point after the wal_level change. So I think backup > > > > management tools would want to recognize the time or LSN when/where > > > > wal_level is changed to ‘none’ in order to do some actions such as > > > > invalidating older backups, re-calculating backup redundancy etc. > > > > Actually the same is true when the user changes to ‘minimal’. The > > > > tools would need to recognize the time or LSN in this case too. > > > > Since temporarily changing wal_level has been an uncommon use case > > > > some tools perhaps are not aware of that yet. But since introducing > > > > wal_level = ’none’ could make the change common, I think we need to > > provide a way for the tools. > > > > I continue to be against the idea of introducing another wal_level. If > > there's > > additional things we can do to reduce WAL traffic while we continue to use > > it to > > keep the system in a consistent state then we should implement those for the > > 'minimal' case and be done with it. > The new wal_level=none patch clearly achieves something that cannot be done > or implemented with wal_level='minimal'. I disagree. > Did you have a look at the performance evaluation result that I conducted in > [1] ? > It proved that data loading with 'none' is much faster than with 'minimal'. That test claims to have generated 10G worth of WAL, which makes it seem pretty likely that you didn't use UNLOGGED tables, and didn't create the table in the same transaction as you performed the data loading for that table, so, naturally, you got the full amount of WAL. > > Adding another wal_level is just going to be confusing and increase > > complexity, > > and the chances that someone will end up in a bad state. > Even if we committed another idea, > that is "ALTER TABLE tbl SET UNLOGGED/LOGGED without copying relation data", > the complexity, like taking a full backup before bulk data loading, doesn't > change, > and if any accident happened while WAL logging was skipped for a specific table with > that improvement, > the user would need to start from the backup again. This looks the same to me. Yes, there is still the issue that running with wal_level = minimal for a period of time means that you can't perform PITR from before the change to minimal through to the time after it's been changed back to something higher than minimal. That's no different, I agree. That isn't an argument to introduce another WAL level though. > Additionally, the patch itself in that thread is big and more complicated. This isn't a reason to avoid introducing another wal_level. > Is the complexity you meant the wal_level's impact on backup management tools > or something else? The complexity of having another wal_level and having to document it and explain how it behaves to users, particularly when, effectively, it shouldn't actually be any different from wal_level = minimal, assuming we've found and implemented the interesting optimizations regarding wal_level = minimal.
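For context, the existing optimization referred to here works along these lines when wal_level = minimal (table and file names below are only examples): data loaded into a table that was created or truncated in the same transaction does not need to be WAL-logged; the relation is instead synced at commit (or, if small enough per wal_skip_threshold, logged wholesale).

-- with wal_level = minimal in postgresql.conf
BEGIN;
CREATE TABLE bulk_load_target (id bigint, payload text);
COPY bulk_load_target FROM '/path/to/data.csv' (FORMAT csv);
COMMIT;

-- or skip WAL for the table's data regardless of wal_level:
CREATE UNLOGGED TABLE staging_data (id bigint, payload text);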
> > > I wondered, couldn't backup management tools utilize the information > > > in the backup that needs to be made when wal_level is changed to > > > "none" > > for example ? > > > > What information is that, exactly? If there's a way to detect that the > > wal_level > > has been flipped to 'minimal' and then back to 'replica', other than > > scanning the > > WAL, we'd certainly like to hear of it, so we can implement logic in > > pgbackrest > > to detect that happening. I'll admit that I've not gone hunting for such > > of late, > > but I don't recall seeing anything that would change that either. > Excuse me for the confusion. > I was thinking about the control file in the backup as that information. > > I'm not familiar with the internals of those backup management tools, > but do they monitor the control file and its values of the running server at > short intervals ? > And, if they don't do so and we want an accurate time or LSN that indicates > the wal_level change, > I thought we could pick up the exact information from the control file of the cluster > directory or its backup > while the server is stopped (because changing wal_level requires a server stop). > That's all. Sorry for the noise. If it's in the control file then pg_controldata should be able to pull it out and all of the backup tools should be able to manage that, even if they can't read it directly like tools which are also written in C, so having it in the control file seems alright to
Re: list of extended statistics on psql
On 1/8/21 1:14 AM, Tomas Vondra wrote: > > > On 1/8/21 12:52 AM, Tatsuro Yamada wrote: >> Hi, >> >> On 2021/01/08 0:56, Tomas Vondra wrote: >>> On 1/7/21 3:47 PM, Alvaro Herrera wrote: On 2021-Jan-07, Tomas Vondra wrote: > On 1/7/21 1:46 AM, Tatsuro Yamada wrote: >> I overlooked the check for MCV in the logic building query >> because I created the patch as a new feature on PG14. >> I'm not sure whether we should do back patch or not. However, I'll >> add the check on the next patch because it is useful if you decide to >> do the back patch on PG10, 11, 12, and 13. > > BTW perhaps a quick look at the other \d commands would show if > there are > precedents. I didn't have time for that. Yes, we do promise that new psql works with older servers. >>> >>> Yeah, makes sense. That means we need add the check for 12 / MCV. >> >> >> Ah, I got it. >> I fixed the patch to work with older servers to add the checking >> versions. And I tested \dX command on older servers (PG10 - 13). >> These results look fine. >> >> 0001: >> Added the check code to handle pre-PG12. It has not MCV and >> pg_statistic_ext_data. >> 0002: >> This patch is the same as the previous patch (not changed). >> >> Please find the attached files. >> > > OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to > squash the patches into a single commit. > Attached is a patch I plan to commit - 0001 is the last submitted version with a couple minor tweaks, mostly in docs/comments, and small rework of branching to be more like the other functions in describe.c. While working on that, I realized that 'defined' might be a bit ambiguous, I initially thought it means 'NOT NULL' (which it does not). I propose to change it to 'requested' instead. Tatsuro, do you agree, or do you think 'defined' is better? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 3d2f4ef2ecba9fd7987df665237add6fc4ec03c1 Mon Sep 17 00:00:00 2001 From: Tatsuro Yamada Date: Thu, 7 Jan 2021 14:28:20 +0900 Subject: [PATCH 1/2] psql \dX: list extended statistics objects The new command lists extended statistics objects, possibly with their sizes. All past releases with extended statistics are supported. Author: Tatsuro Yamada Reviewed-by: Julien Rouhaud, Alvaro Herrera, Tomas Vondra Discussion: https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1 --- doc/src/sgml/ref/psql-ref.sgml | 14 +++ src/bin/psql/command.c | 3 + src/bin/psql/describe.c | 150 src/bin/psql/describe.h | 3 + src/bin/psql/help.c | 1 + src/bin/psql/tab-complete.c | 4 +- src/test/regress/expected/stats_ext.out | 94 +++ src/test/regress/sql/stats_ext.sql | 31 + 8 files changed, 299 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 221a967bfe..d01acc92b8 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1918,6 +1918,20 @@ testdb=> + + +\dX [ pattern ] + + +Lists extended statistics. +If pattern +is specified, only those extended statistics whose names match the +pattern are listed. +If + is appended to the command name, each extended +statistics is listed with its size. 
+ + + \dy[+] [ pattern ] diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 303e7c3ad8..c5ebc1c3f4 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -928,6 +928,9 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) else success = listExtensions(pattern); break; + case 'X': /* Extended Statistics */ +success = listExtendedStats(pattern, show_verbose); +break; case 'y': /* Event Triggers */ success = listEventTriggers(pattern, show_verbose); break; diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index caf97563f4..46f54199fb 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -4392,6 +4392,156 @@ listEventTriggers(const char *pattern, bool verbose) return true; } +/* + * \dX + * + * Describes extended statistics. + */ +bool +listExtendedStats(const char *pattern, bool verbose) +{ + PQExpBufferData buf; + PGresult *res; + printQueryOpt myopt = pset.popt; + + if (pset.sversion < 10) + { + char sverbuf[32]; + + pg_log_error("The server (version %s) does not support extended statistics.", + formatPGVersionNumber(pset.sversion, false, + sverbuf, sizeof(sverbuf))); + return true; + } + + initPQExpBuffer(&buf); + printfPQExpBuffer(&buf, + "SELECT \n" + "es.stxnamespace::pg_catalog.regnamespace::t
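For anyone wanting to try the new command, a minimal usage sketch (the exact output columns, and whether the per-statistics status reads "defined" or "requested", are still being settled above):

CREATE TABLE t1 (a int, b int);
CREATE STATISTICS stts_1 (ndistinct, dependencies) ON a, b FROM t1;
ANALYZE t1;
\dX
\dX+   -- also shows the size of each statistics object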
Re: PoC/WIP: Extended statistics on expressions
On 1/8/21 3:35 AM, Justin Pryzby wrote: > On Fri, Jan 08, 2021 at 01:57:29AM +0100, Tomas Vondra wrote: >> Attached is a patch fixing most of the issues. There are a couple >> exceptions: > > In the docs: > > ... > Thanks! Checking the docs and comments is tedious work; I appreciate you going through all that. I'll fix that in the next version. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Add table access method as an option to pgbench
tablespace is an extraneous word ? Thanks a lot for pointing this out. I will fix it in the next patch once all issues are clarified. On Sun, Dec 27, 2020 at 09:14:53AM -0400, Fabien COELHO wrote: src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler; ... src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2; Do you suggest adding a new positive test case using the table access method *heap2* created with the existing heap_tableam_handler? If you found pre-existing inconsistencies/errors/deficiencies, I'd suggest to fix them in a separate patch. Typically those are small, and you can make them 0001 patch. Errors might be worth backpatching, so in addition to making the "feature" patches small, it's good if they're separate. Like writing NAME vs TABLESPACE. Or escape vs escapes. I will provide a separate small patch when fixing the *tablespace* typo mentioned above. Can you help clarify which releases or branches need to be back-patched? Or maybe using SET default_tablespace instead of modifying the CREATE sql. That'd need to be done separately for indexes, and RESET after creating the tables, to avoid accidentally affecting indexes, too. Why should it not affect indexes? I rarely use pgbench, and probably never looked at its source before, but I saw that it has a separate --tablespace and --index-tablespace, and that --tablespace is *not* the default for indexes. So if we changed it to use SET default_tablespace instead of amending the DDL sql, we'd need to make sure the SET applied only to the intended CREATE command, and not all following create commands. In the case that --index-tablespace is not specified, it would be buggy to do this: SET default_tablespace=foo; CREATE TABLE ... CREATE INDEX ... CREATE TABLE ... CREATE INDEX ... ... If this `tablespace` issue also applies to `table-access-method`, yes, I agree it could be buggy too. But the purpose of this patch is to provide an option for end users to test a newly created table access method. If the *new* table access method hasn't fully implemented all the interfaces yet, then regardless of whether the end user uses the option provided by this patch or the GUC parameter, the results will be the same. -- David Software Engineer Highgo Software Inc. (Canada) www.highgo.ca
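For what it's worth, appending the clause to each CREATE TABLE statement (instead of relying on a session-level SET) avoids the leakage problem described above, because the option only ever affects the statement it was built for. Below is a rough libpq sketch of that approach; the function name, the hard-coded DDL, and the missing identifier quoting are simplifications for illustration, not pgbench's actual init code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /*
     * Sketch only: build one CREATE TABLE with optional USING and TABLESPACE
     * clauses, so neither option can leak into later statements the way a
     * session-level SET default_tablespace / default_table_access_method
     * could.  Real code should quote the identifiers (e.g. with
     * PQescapeIdentifier).
     */
    static void
    create_table_with_options(PGconn *con, const char *base_ddl,
                              const char *table_am, const char *tablespace)
    {
        char        sql[1024];
        int         len;
        PGresult   *res;

        len = snprintf(sql, sizeof(sql), "%s", base_ddl);
        if (table_am != NULL)
            len += snprintf(sql + len, sizeof(sql) - len, " USING %s", table_am);
        if (tablespace != NULL)
            len += snprintf(sql + len, sizeof(sql) - len, " TABLESPACE %s", tablespace);

        res = PQexec(con, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "command failed: %s\n%s", sql, PQerrorMessage(con));
            PQclear(res);
            exit(1);
        }
        PQclear(res);
    }

    /*
     * e.g. create_table_with_options(con,
     *          "CREATE TABLE pgbench_branches (bid int not null, bbalance int, filler char(88))",
     *          "heap2", NULL);
     */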
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Fri, Jan 8, 2021 at 9:55 AM Bharath Rupireddy wrote: > I will make the changes and post a new patch set soon. Attaching v9 patch set that has addressed the review comments on the disconnect function returning setof records, documentation changes, and postgres_fdw--1.0-1.1.sql changes. Please consider the v9 patch set for further review. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v9-0001-postgres_fdw-function-to-discard-cached-connectio.patch Description: Binary data v9-0002-postgres_fdw-add-keep_connections-GUC-to-not-cach.patch Description: Binary data v9-0003-postgres_fdw-server-level-option-keep_connection-.patch Description: Binary data
Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
On Fri, Jan 8, 2021 at 9:50 PM japin wrote: > > Attaching v3 patches, please consider these for further review. > > > > I find that both the declaration and definition of > match_matview_with_new_data() > have a tab between type and variable. We can use pgindent to fix it. > What do you think? > > > static void > match_matview_with_new_data(RefreshMatViewStmt *stmt, Relation matviewRel, > ^ > Oid matviewOid, Oid OIDNewHeap, Oid relowner, > int save_sec_context, char relpersistence, > uint64 processed) > ^ I ran pgindent on 0001 patch to fix the above. 0002 patch has no changes. If I'm correct, pgindent will be run periodically on master. Attaching v4 patch set for further review. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v4-0001-Rearrange-Refresh-Mat-View-Code.patch Description: Binary data v4-0002-EXPLAIN-EXPLAIN-ANALYZE-REFRESH-MATERIALIZED-VIEW.patch Description: Binary data
Re: Improper use about DatumGetInt32
On Fri, Jan 08, 2021 at 04:54:47PM +0100, Peter Eisentraut wrote: > Updated patch that does that. Thanks. Looks sane seen from here. +/* LCOV_EXCL_START */ Does it really make sense to add those markers here? It seems to me that we would ignore any new coverage if regression tests based on older versions are added (we may really want to have such tests for more in-core extensions to be able to verify the portability of an extension, but that's not the job of this patch, of course). - elog(ERROR, "block 0 is a meta page"); + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("block 0 is a meta page"))); [...] +errmsg("block number %llu is out of range for relation \"%s\"", This does not follow the usual style for error reports, which should not be written as full sentences. Maybe something like "invalid block number %u referring to meta page" and "block number out of range for relation %s: %llu"? -- Michael signature.asc Description: PGP signature
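In case it helps, here is roughly how the two checks could read with that terser wording. This is a sketch with a hypothetical helper name and the usual backend includes assumed, not the actual patch.

    /*
     * Sketch only: range checks using messages that are not full sentences.
     * Assumes the usual backend headers (postgres.h, utils/rel.h, etc.).
     */
    static void
    check_block_argument(Relation rel, int64 blkno, BlockNumber nblocks)
    {
        if (blkno == 0)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("invalid block number %llu referring to meta page",
                            (unsigned long long) blkno)));

        if (blkno < 0 || blkno >= (int64) nblocks)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("block number out of range for relation \"%s\": %llu",
                            RelationGetRelationName(rel),
                            (unsigned long long) blkno)));
    }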
Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
On Sat, 09 Jan 2021 at 09:38, Bharath Rupireddy wrote: > On Fri, Jan 8, 2021 at 9:50 PM japin wrote: >> > Attaching v3 patches, please consider these for further review. >> > >> >> I find that both the declaration and definition of >> match_matview_with_new_data() >> have a tab between type and variable. We can use pgindent to fix it. >> What do you think? >> >> >> static void >> match_matview_with_new_data(RefreshMatViewStmt *stmt, Relation matviewRel, >> ^ >> Oid matviewOid, Oid OIDNewHeap, Oid relowner, >> int save_sec_context, char relpersistence, >> uint64 processed) >> ^ > > I ran pgindent on 0001 patch to fix the above. 0002 patch has no > changes. If I'm correct, pgindent will be run periodically on master. > Thanks for pointing that out. I didn't know that before. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: [PATCH] Simple progress reporting for COPY command
On Fri, Jan 8, 2021 at 7:00 PM Matthias van de Meent wrote: > On Thu, 7 Jan 2021 at 23:00, Josef Šimánek wrote: > > > > čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra > > napsal: > > > > > > I'm not particularly attached to the "lines" naming, it just seemed OK > > > to me. So if there's consensus to rename this somehow, I'm OK with it. > > > > The problem I do see here is it depends on the "way" of COPY. If > > you're copying from CSV file to table, those are actually lines (since > > 1 line = 1 tuple). But copying from DB to file is copying tuples (but > > 1 tuple = 1 file line). Line works better here for me personally. > > > > Once I'll fix the problem with triggers (and also another cases if > > found), I think we can consider it lines. It will represent amount of > > lines processed from file on COPY FROM and amount of lines written to > > file in COPY TO form (at least in CSV format). I'm not sure how BINARY > > format works, I'll check. > > Counterexample that 1 tuple need not be 1 line, in csv/binary: Yes, in a binary format file, 1 tuple is definitely not 1 line. > /* > * create a table with one tuple containing 1 text field, which consists of > * 10 newline characters. > * If you want windows-style lines, replace '\x0A' (\n) with '\x0D0A' (\r\n). > */ > # CREATE TABLE ttab (val) AS > SELECT * FROM (values ( > repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text > )) as v; > > # -- indeed, one unix-style line, according to $ wc -l copy.txt > # COPY ttab TO 'copy.txt' (format text); > COPY 1 > # TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text); > COPY 1 > > # -- 11 lines > # COPY ttab TO 'copy.csv' (format csv); > COPY 1 > # TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv); > COPY 1 > > # -- 13 lines > # COPY ttab TO 'copy.bin' (format binary); > COPY 1 > # TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary); > COPY 1 > > All of the above copy statements would only report 'lines_processed = 1', > in the progress reporting, while csv/binary line counts are definatively > inconsistent with what the progress reporting shows, because progress > reporting counts tuples / table rows, not the amount of lines in the > external file. I agree with that. +1 to change the name of the 'lines_processed' column to 'tuples_processed'. So the meaning of 'tuples_processed' would be: for COPY FROM, the number of tuples written to the target table; for COPY TO, the number of tuples from the source table or source plan written out to the file/stdout. IMHO, it's good to have at least the following info: the type of command (COPY TO or COPY FROM), the format (CSV, BINARY, TEXT), and where the input comes from or the output goes to (FILE, STDIN, STDOUT, or PROGRAM). Otherwise users will have to join the pg_stat_progress_copy view with other views to get the type of copy command or the entire query, and it's sometimes cumbersome to figure out the other view's name and write join queries. Another point related to documentation: I see that the pg_stat_progress_copy view details are added to monitoring.sgml, but I didn't find any mention of it in copy.sgml. For instance, we have pg_stat_progress_basebackup documentation both in monitoring.sgml and a mention of it and link to it in pg_basebackup.sgml. Usually, users will look at the COPY documentation to find any information related to it, so it would be better to have it mentioned there as well. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: logical replication worker accesses catalogs in error context callback
On Thu, Jan 7, 2021 at 5:47 AM Masahiko Sawada wrote: > > On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada wrote: > > > > On Wed, Jan 6, 2021 at 11:02 AM Andres Freund wrote: > > > > > > Hi, > > > > > > Due to a debug ereport I just noticed that worker.c's > > > slot_store_error_callback is doing something quite dangerous: > > > > > > static void > > > slot_store_error_callback(void *arg) > > > { > > > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; > > > LogicalRepRelMapEntry *rel; > > > char *remotetypname; > > > Oid remotetypoid, > > > localtypoid; > > > > > > /* Nothing to do if remote attribute number is not set */ > > > if (errarg->remote_attnum < 0) > > > return; > > > > > > rel = errarg->rel; > > > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; > > > > > > /* Fetch remote type name from the LogicalRepTypMap cache */ > > > remotetypname = logicalrep_typmap_gettypname(remotetypoid); > > > > > > /* Fetch local type OID from the local sys cache */ > > > localtypoid = get_atttype(rel->localreloid, errarg->local_attnum > > > + 1); > > > > > > errcontext("processing remote data for replication target > > > relation \"%s.%s\" column \"%s\", " > > >"remote type %s, local type %s", > > >rel->remoterel.nspname, rel->remoterel.relname, > > >rel->remoterel.attnames[errarg->remote_attnum], > > >remotetypname, > > >format_type_be(localtypoid)); > > > } > > > > > > > > > that's not code that can run in an error context callback. It's > > > theoretically possible (but unlikely) that > > > logicalrep_typmap_gettypname() is safe to run in an error context > > > callback. But get_atttype() definitely isn't. > > > > > > get_attype() may do catalog accesses. That definitely can't happen > > > inside an error context callback - the entire transaction might be > > > borked at this point! > > > > You're right. Perhaps calling to format_type_be() is also dangerous > > since it does catalog access. We should have added the local type > > names to SlotErrCallbackArg so we avoid catalog access in the error > > context. > > > > I'll try to fix this. > > Attached the patch that fixes this issue. > > Since logicalrep_typmap_gettypname() could search the sys cache by > calling to format_type_be(), I stored both local and remote type names > to SlotErrCallbackArg so that we can just set the names in the error > callback without sys cache lookup. > > Please review it. Patch looks good, except one minor comment - can we store rel->remoterel.atttyps[remoteattnum] into a local variable and use it in logicalrep_typmap_gettypname, just to save on indentation? I quickly searched in places where error callbacks are being used, I think we need a similar kind of fix for conversion_error_callback in postgres_fdw.c, because get_attname and get_rel_name are being used which do catalogue lookups. ISTM that all the other error callbacks are good in the sense that they are not doing sys catalogue lookups. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
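To illustrate the pattern being suggested for a callback like conversion_error_callback: resolve any relation/attribute names up front, keep them in the callback argument, and let the callback only format strings that are already in memory. A rough sketch with hypothetical struct and field names (not the postgres_fdw code itself):

    /*
     * Sketch only: an error context callback that performs no syscache or
     * catalog access.  The names are looked up (e.g. with get_rel_name() /
     * get_attname()) and stored in the argument before the risky work starts,
     * so the callback merely formats pre-resolved strings.
     */
    typedef struct ConvErrCallbackArg
    {
        const char *relname;        /* resolved before conversion starts */
        const char *attname;        /* resolved before conversion starts */
        int         attnum;         /* -1 when no column is being converted */
    } ConvErrCallbackArg;

    static void
    conversion_error_callback_sketch(void *arg)
    {
        ConvErrCallbackArg *errarg = (ConvErrCallbackArg *) arg;

        if (errarg->attnum < 0)
            errcontext("processing data for foreign table \"%s\"",
                       errarg->relname);
        else
            errcontext("processing data for foreign table \"%s\", column \"%s\"",
                       errarg->relname,
                       errarg->attname ? errarg->attname : "(unknown)");
    }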
Re: Single transaction in the tablesync worker?
On Fri, Jan 8, 2021 at 2:55 PM Peter Smith wrote: > > On Fri, Jan 8, 2021 at 1:02 PM Hou, Zhijie wrote: > > > > > 3. > > + /* > > +* To build a slot name for the sync work, we are limited to > > NAMEDATALEN - > > +* 1 characters. > > +* > > +* The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + > > '\0'). (It's > > +* actually the NAMEDATALEN on the remote that matters, but this > > scheme > > +* will also work reasonably if that is different.) > > +*/ > > + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* > > for sanity */ > > + > > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); > > > > The comments says syncslotname is limit to NAMEDATALEN - 1 characters. > > But the actual size of it is (3 + 10 + 6 + 10 + '\0') = 30,which seems not > > NAMEDATALEN - 1. > > Should we change the comment here? > > > > The comment wording is a remnant from older code which had a > differently format slot name. > I think the comment is still valid, albeit maybe unnecessary since in > the current code the tablesync slot > name length is fixed. But I left the older comment here as a safety reminder > in case some future change would want to modify the slot name. What do > you think? > I find it quite confusing. The comments should reflect the latest code. You can probably say in some form that the length of slotname shouldn't exceed NAMEDATALEN because of remote node constraints on slot name length. Also, probably the StaticAssert on NAMEDATALEN is not required. 1. + +Additional table synchronization slots are normally transient, created +internally and dropped automatically when they are no longer needed. +These table synchronization slots have generated names: + pg_%u_sync_%u (parameters: Subscription oid, Table relid) + The last line seems too long. I think we are not strict for 80 char limit in docs but it is good to be close to that, however, this appears quite long. 2. + /* + * Cleanup any remaining tablesync resources. + */ + { + char originname[NAMEDATALEN]; + RepOriginId originid; + char state; + XLogRecPtr statelsn; I have already mentioned previously that let's not use this new style of code (start using { to localize the scope of variables). I don't know about others but I find it difficult to read such a code. You might want to consider moving this whole block to a separate function. 3. /* + * XXX - Should optimize this to avoid multiple + * connect/disconnect. + */ + wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err); I think it is better to avoid multiple connect/disconnect here. In this same function, we have connected to the publisher, we should be able to use the same connection. 4. process_syncing_tables_for_sync() { .. + /* + * Cleanup the tablesync slot. + */ + syncslotname = ReplicationSlotNameForTablesync( +MySubscription->oid, +MyLogicalRepWorker->relid); + PG_TRY(); + { + elog(DEBUG1, "process_syncing_tables_for_sync: dropping the tablesync slot \"%s\".", syncslotname); + ReplicationSlotDropAtPubNode(wrconn, syncslotname); + } + PG_FINALLY(); + { + pfree(syncslotname); + } + PG_END_TRY(); .. } Both here and in DropSubscription(), it seems we are using PG_TRY..PG_FINALLY just to free the memory even though ReplicationSlotDropAtPubNode already has try..finally. Can we arrange code to move allocation of syncslotname inside ReplicationSlotDropAtPubNode to avoid additional try..finaly? 
BTW, if the usage of try..finally here is only to free the memory, I am not sure it is required, because I think we will reset the memory context where this memory is allocated as part of error handling anyway. 5. #define SUBREL_STATE_DATASYNC 'd' /* data is being synchronized (sublsn * NULL) */ +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed + * (sublsn NULL) */ #define SUBREL_STATE_SYNCDONE 's' /* synchronization finished in front of * apply (sublsn set) */ I am not very happy with the new state name SUBREL_STATE_TCOPYDONE as it is quite different from the other adjoining state names and somehow does not go well with the code. How about SUBREL_STATE_ENDCOPY 'e' or SUBREL_STATE_FINISHEDCOPY 'f'? -- With Regards, Amit Kapila.
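On point 4, one way to avoid the extra PG_TRY/PG_FINALLY is to have the drop path build (and free) the slot name itself, along the lines of the sketch below (hypothetical helper name; the slot name format is the one quoted above). Even if the drop errors out, the allocation is reclaimed when the surrounding memory context is reset during error cleanup.

    /*
     * Sketch only: build the tablesync slot name inside the helper so the
     * caller needs no PG_TRY/PG_FINALLY just to pfree() it.  If the drop
     * fails, error cleanup resets the memory context and reclaims the name.
     */
    static void
    DropTablesyncSlotAtPubNode(WalReceiverConn *wrconn, Oid suboid, Oid relid)
    {
        char       *syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);

        elog(DEBUG1, "dropping the tablesync slot \"%s\"", syncslotname);
        ReplicationSlotDropAtPubNode(wrconn, syncslotname);
        pfree(syncslotname);
    }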
Re: [HACKERS] logical decoding of two-phase transactions
On Tue, Jan 5, 2021 at 4:26 PM Amit Kapila wrote: > > On Tue, Jan 5, 2021 at 2:11 PM Ajin Cherian wrote: > > > > > > I've addressed the above comments and the patch is attached. I've > > called it v36-0007. > > > > Thanks, I have pushed this after minor wordsmithing. > The test case is failing on one of the build farm machines. See email from Tom Lane [1]. The symptom clearly shows that we are decoding empty xacts which can happen due to background activity by autovacuum. I think we need a fix similar to what we have done in https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=82a0ba7707e010a29f5fe1a0020d963c82b8f1cb. I'll try to reproduce and provide a fix for this later today or tomorrow. [1] - https://www.postgresql.org/message-id/363512.1610171267%40sss.pgh.pa.us -- With Regards, Amit Kapila.