Re: Typo in pgbench messages.
On Thu, 24 Feb 2022 14:44:05 +0100 Daniel Gustafsson wrote:
> > On 24 Feb 2022, at 13:58, Fabien COELHO wrote:
> >
> >> One argument against a backpatch is that this would be disruptive with
> >> tools that parse and analyze the output generated by pgbench. Fabien,
> >> don't you have some tools and/or wrappers doing exactly that?
> >
> > Yep, I like to "| cut -d' ' -fx" and other "line.split(' ')" or whatever.
> >
> > I think that the break of typographical rules is intentional to allow such
> > simplistic low-level stream handling through pipes or scripts. I'd prefer
> > that the format is not changed. Maybe a comment could be added to explain
> > the reason behind it.
>
> That doesn't sound like an overwhelmingly convincing argument to print some
> messages with 'X %' and others with 'X%'.

I agree with Daniel. If inserting a space in front of % was intentional for
handling the result in such tools, we should fix other places where 'X%' is
used in the pgbench output.

Regards,
Yugo Nagata

-- 
Yugo NAGATA
pipeline mode and commands not allowed in a transaction block
Hi,

I found that in pipeline mode we can execute commands that are not allowed in
a transaction block, for example CREATE DATABASE, in the same transaction as
other commands.

In the extended query protocol, a transaction starts when a Parse, Bind,
Execute, or Describe message is received, and is closed when a Sync message is
received, unless COMMIT, ROLLBACK, or END has been sent. In pipeline mode, the
Sync message is sent at the end of the pipeline instead of after each query.
Therefore, multiple queries can run in the same transaction without an
explicit transaction block. This is similar to the implicit transaction block
that starts when multiple statements are sent in the simple query protocol,
but the server doesn't regard it as an implicit transaction block. Therefore,
problems that would not occur in implicit transactions can occur in
transactions started in pipeline mode. For example, CREATE DATABASE or DROP
DATABASE can be executed in the same transaction as other commands, and when
the transaction fails, this causes an inconsistency between the system catalog
and the base directory.

Do you think we should prevent such problems on the server side, or is it the
user's responsibility to avoid such problematic use of pipelines or protocol
messages?

If we want to handle it on the server side, I have a few ideas:

1. If the server receives more than one Execute message before receiving Sync,
start an implicit transaction block. If the first Execute message is for a
command not allowed in a transaction block (CREATE DATABASE etc.), explicitly
close the transaction after the command so that it does not share the
transaction with other commands.

2. When a pipeline is started by calling PQenterPipelineMode in libpq, start
an implicit transaction on the server. For this purpose, we would need to add
a new protocol message signalling the start of pipeline mode. It would remain
the user's responsibility to avoid the problematic protocol use when libpq is
not used.

What do you think about it?
Regards, Yugo Nagata -- Yugo NAGATA
Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error
Hi Horiguchi-san, On Thu, 27 Jan 2022 17:50:25 +0900 (JST) Kyotaro Horiguchi wrote: > At Tue, 16 Nov 2021 02:26:43 +0900, Yugo NAGATA wrote > in > > Thank you for pointing it out! > > I attached the updated patch. > > I think we want more elabolative comment for the new place of > preparing as you mentioned in the first mail. Thank you for your suggestion. I added comments on the prepareCommands() call as in the updated patch. Regards, Yugo Nagata Yugo NAGATA diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index f166a77e3a..c5893eabe4 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -2832,6 +2832,30 @@ chooseScript(TState *thread) return i - 1; } +/* Prepare SQL commands in the chosen script */ +static void +prepareCommands(CState *st) +{ + int j; + Command **commands = sql_script[st->use_file].commands; + + for (j = 0; commands[j] != NULL; j++) + { + PGresult *res; + char name[MAX_PREPARE_NAME]; + + if (commands[j]->type != SQL_COMMAND) + continue; + preparedStatementName(name, st->use_file, j); + res = PQprepare(st->con, name, + commands[j]->argv[0], commands[j]->argc - 1, NULL); + if (PQresultStatus(res) != PGRES_COMMAND_OK) + pg_log_error("%s", PQerrorMessage(st->con)); + PQclear(res); + } + st->prepared[st->use_file] = true; +} + /* Send a SQL command, using the chosen querymode */ static bool sendCommand(CState *st, Command *command) @@ -2865,42 +2889,6 @@ sendCommand(CState *st, Command *command) char name[MAX_PREPARE_NAME]; const char *params[MAX_ARGS]; - if (!st->prepared[st->use_file]) - { - int j; - Command **commands = sql_script[st->use_file].commands; - - for (j = 0; commands[j] != NULL; j++) - { -PGresult *res; -char name[MAX_PREPARE_NAME]; - -if (commands[j]->type != SQL_COMMAND) - continue; -preparedStatementName(name, st->use_file, j); -if (PQpipelineStatus(st->con) == PQ_PIPELINE_OFF) -{ - res = PQprepare(st->con, name, - commands[j]->argv[0], commands[j]->argc - 1, NULL); - if (PQresultStatus(res) != 
PGRES_COMMAND_OK) - pg_log_error("%s", PQerrorMessage(st->con)); - PQclear(res); -} -else -{ - /* - * In pipeline mode, we use asynchronous functions. If a - * server-side error occurs, it will be processed later - * among the other results. - */ - if (!PQsendPrepare(st->con, name, - commands[j]->argv[0], commands[j]->argc - 1, NULL)) - pg_log_error("%s", PQerrorMessage(st->con)); -} - } - st->prepared[st->use_file] = true; - } - getQueryParams(st, command, params); preparedStatementName(name, st->use_file, st->command); @@ -3172,6 +3160,20 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) memset(st->prepared, 0, sizeof(st->prepared)); } +/* + * Prepare SQL commands if needed. + * + * We should send Parse messages before executing meta commands + * especially /startpipeline. If a Parse message is sent in + * pipeline mode, a transaction starts before BEGIN is sent, and + * it could be a problem. For example, "BEGIN ISOLATION LEVEL + * SERIALIZABLE" is sent after a transaction starts, the error + * "ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query" + * occurs. + */ +if (querymode == QUERY_PREPARED && !st->prepared[st->use_file]) + prepareCommands(st); + /* record transaction start time */ st->txn_begin = now; diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index f1341092fe..7e4ec728e8 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -837,6 +837,23 @@ select 1 \gset f } }); +# Working \startpipeline in prepared query mode with serializable +$node->pgbench( + '-t 1 -n -M prepared', + 0, + [ qr{type: .*/001_pgbench_pipeline_serializable}, qr{actually processed: 1/1} ], + [], + 'working \startpipeline with serializable', + { + '001_pgbench_pipeline_serializable' => q{ +-- test startpipeline with serializable +\startpipeline +BEGIN ISOLATION LEVEL SERIALIZABLE; +} . "select 1;\n" x 10 . 
q{ +END; +\endpipeline +} + }); # trigger many expression errors my @errors = (
Re: Implementing Incremental View Maintenance
On Wed, 16 Feb 2022 22:34:18 +0800 huyajun wrote:
> Hi, Nagata-san
> I am very interested in IMMV and read your patch but have some comments in
> v25-0007-Add-Incremental-View-Maintenance-support.patch and want to discuss
> with you.

Thank you for your review!

> + /* For IMMV, we need to rewrite matview query */
> + query = rewriteQueryForIMMV(query, into->colNames);
> + query_immv = copyObject(query);
>
> /* Create triggers on incremental maintainable materialized view */
> + Assert(query_immv != NULL);
> + CreateIvmTriggersOnBaseTables(query_immv, matviewOid, true);
>
> 1. Do we need copy query? Is it okay that CreateIvmTriggersOnBaseTables
> directly use (Query *) into->viewQuery instead of query_immv like
> CreateIndexOnIMMV? It seems only planner may change query, but it shouldn't
> affect us finding the correct base table in CreateIvmTriggersOnBaseTables.

The copy to query_immv was necessary for supporting sub-queries in the view
definition. However, we excluded the feature from the current patch to reduce
the patch size, so it would be unnecessary. I'll fix it.

> +void
> +CreateIndexOnIMMV(Query *query, Relation matviewRel)
> +{
> + Query *qry = (Query *) copyObject(query);
>
> 2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only
> read query in CreateIndexOnIMMV.

This was also necessary for supporting CTEs, but it is unnecessary in the
current patch, so I'll fix it, too.

Regards,
Yugo Nagata

-- 
Yugo NAGATA
Re: Commitfest 2022-03 Patch Triage Part 1a.i
> > Tom's feedback seems to have been acted on last November. And the last
> > update in January was that it was passing CI now. Is this ready to
> > commit now?
> >
> > > 2138: Incremental Materialized View Maintenance
> > > ===
> > > The size of the patchset and the length of the thread make it hard to
> > > gauge just how far away it is, maybe the author or a reviewer can
> > > summarize the current state and outline what is left for it to be
> > > committable.
> >
> > There is an updated patch set as of February but I have the same
> > difficulty wrapping my head around the amount of info here.
> >
> > Is this one really likely to be commitable in 15? If not I think we
> > should move this to 16 now and concentrate on patches that will be
> > commitable in this release.

I think this patch set needs more reviews to be committable in 15, so I
returned the target version to blank. I'll change it to 16 later.

> > > 2218: Implement INSERT SET syntax
> > > =
> > > The author has kept this patch updated, and has seemingly updated
> > > according to the review comments. Tom: do you have any opinions on
> > > whether the updated version addresses your concerns wrt the SELECT
> > > rewrite?
> >
> > I don't see any discussion implying that Tom's concerns were met. I'm
> > not exactly clear why Tom's concerns are real problems though --
> > wouldn't it be a *good* thing if we have a more expressive syntax? But
> > that's definitely what needs to be resolved before it can move ahead.
> >
> > So unless there's objections I'm going to update this to "waiting on
> > author".
> >
> > --
> > greg
>
> --
> greg

-- 
Yugo NAGATA
Re: Implementing Incremental View Maintenance
Hello Zhihong Yu,

I already replied to your comments before, but I forgot to include the list in
CC, so I resend the same again. Sorry for the duplicate emails.

On Thu, 3 Feb 2022 09:51:52 -0800 Zhihong Yu wrote:
> For CreateIndexOnIMMV():
>
> + ereport(NOTICE,
> + (errmsg("could not create an index on materialized view \"%s\" automatically",
> ...
> + return;
> + }
>
> Should the return type be changed to bool so that the caller knows whether
> the index creation succeeds?
> If index creation is unsuccessful, should the call
> to CreateIvmTriggersOnBaseTables() be skipped?

CreateIvmTriggersOnBaseTables() has to be called regardless of whether an
index is created successfully or not, so I think CreateIndexOnIMMV() doesn't
have to return the result for now.

> For check_ivm_restriction_walker():
>
> + break;
> + expression_tree_walker(node, check_ivm_restriction_walker, NULL);
> + break;
>
> Something is missing between the break and expression_tree_walker().

Yes, it's my mistake made while creating the patch set. I fixed it in the
updated patch I attached in the other post.

Regards,
Yugo Nagata

-- 
Yugo NAGATA
Tab completion for ALTER MATERIALIZED VIEW ... SET ACCESS METHOD
Hello hackers,

SET ACCESS METHOD has been supported in ALTER TABLE since commit b0483263dd.
Since then, SET ACCESS METHOD has also been allowed in ALTER MATERIALIZED
VIEW. Although it is not documented, it works.

I cannot find any reason to prohibit SET ACCESS METHOD in ALTER MATERIALIZED
VIEW, so I think it is better to support it in psql tab-completion and to
document it. I attached a patch fixing the tab-completion and the
documentation for this syntax. Also, I added a description of the SET
TABLESPACE syntax, which seems to have been overlooked.

Regards,
Yugo Nagata

-- 
Yugo NAGATA

diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index 7011a0e7da..cae135c27a 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -43,6 +43,8 @@ ALTER MATERIALIZED VIEW ALL IN TABLESPACE name
 column_name SET COMPRESSION compression_method
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
+SET ACCESS METHOD new_access_method
+SET TABLESPACE new_tablespace
 SET ( storage_parameter [= value] [, ... ] )
 RESET ( storage_parameter [, ...
] ) OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER } diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 6957567264..daf4206caa 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -2124,7 +2124,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("TO"); /* ALTER MATERIALIZED VIEW xxx SET */ else if (Matches("ALTER", "MATERIALIZED", "VIEW", MatchAny, "SET")) - COMPLETE_WITH("(", "SCHEMA", "TABLESPACE", "WITHOUT CLUSTER"); + COMPLETE_WITH("(", "ACCESS METHOD", "SCHEMA", "TABLESPACE", "WITHOUT CLUSTER"); /* ALTER POLICY */ else if (Matches("ALTER", "POLICY")) COMPLETE_WITH_QUERY(Query_for_list_of_policies); @@ -2485,6 +2485,13 @@ psql_completion(const char *text, int start, int end) else if (Matches("ALTER", "TYPE", MatchAny, "RENAME", "VALUE")) COMPLETE_WITH_ENUM_VALUE(prev3_wd); + /* + * If we have ALTER MATERIALIZED VIEW SET ACCESS METHOD provide a list of table + * AMs. + */ + else if (Matches("ALTER", "MATERIALIZED", "VIEW", MatchAny, "SET", "ACCESS", "METHOD")) + COMPLETE_WITH_QUERY(Query_for_list_of_table_access_methods); + /* * ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] * ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]
Re: Tab completion for ALTER MATERIALIZED VIEW ... SET ACCESS METHOD
Hi Michael-san,

On Wed, 16 Mar 2022 16:18:09 +0900 Michael Paquier wrote:
> Hi Nagata-san,
>
> On Wed, Mar 16, 2022 at 01:33:37PM +0900, Yugo NAGATA wrote:
> > SET ACCESS METHOD is supported in ALTER TABLE since the commit
> > b0483263dd. Since that time, this also has be allowed SET ACCESS
> > METHOD in ALTER MATERIALIZED VIEW. Although it is not documented,
> > this works.
>
> Yes, that's an oversight. I see no reason to not authorize that, and
> the rewrite path in tablecmds.c is the same as for plain tables.
>
> > I cannot found any reasons to prohibit SET ACCESS METHOD in ALTER
> > MATERIALIZED VIEW, so I think it is better to support this in psql
> > tab-completion and be documented.
>
> I think that we should have some regression tests about those command
> flavors. How about adding a couple of queries to create_am.sql for
> SET ACCESS METHOD and to tablespace.sql for SET TABLESPACE?

Thank you for your review! I added some queries in the regression test.
Attached is the updated patch.

Regards,
Yugo Nagata

-- 
Yugo NAGATA

diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index 7011a0e7da..cae135c27a 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -43,6 +43,8 @@ ALTER MATERIALIZED VIEW ALL IN TABLESPACE name
 column_name SET COMPRESSION compression_method
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
+SET ACCESS METHOD new_access_method
+SET TABLESPACE new_tablespace
 SET ( storage_parameter [= value] [, ... ] )
 RESET ( storage_parameter [, ...
] ) OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER } diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 380cbc0b1f..c3fddcd0af 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -2124,7 +2124,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("TO"); /* ALTER MATERIALIZED VIEW xxx SET */ else if (Matches("ALTER", "MATERIALIZED", "VIEW", MatchAny, "SET")) - COMPLETE_WITH("(", "SCHEMA", "TABLESPACE", "WITHOUT CLUSTER"); + COMPLETE_WITH("(", "ACCESS METHOD", "SCHEMA", "TABLESPACE", "WITHOUT CLUSTER"); /* ALTER POLICY */ else if (Matches("ALTER", "POLICY")) COMPLETE_WITH_QUERY(Query_for_list_of_policies); @@ -2485,6 +2485,13 @@ psql_completion(const char *text, int start, int end) else if (Matches("ALTER", "TYPE", MatchAny, "RENAME", "VALUE")) COMPLETE_WITH_ENUM_VALUE(prev3_wd); + /* + * If we have ALTER MATERIALIZED VIEW SET ACCESS METHOD provide a list of table + * AMs. + */ + else if (Matches("ALTER", "MATERIALIZED", "VIEW", MatchAny, "SET", "ACCESS", "METHOD")) + COMPLETE_WITH_QUERY(Query_for_list_of_table_access_methods); + /* * ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] * ANALYZE [ VERBOSE ] [ table_and_columns [, ...] 
] diff --git a/src/test/regress/expected/create_am.out b/src/test/regress/expected/create_am.out index 32b7134080..cf83c75ab3 100644 --- a/src/test/regress/expected/create_am.out +++ b/src/test/regress/expected/create_am.out @@ -254,9 +254,33 @@ SELECT COUNT(a), COUNT(1) FILTER(WHERE a=1) FROM heaptable; 9 | 1 (1 row) +-- ALTER MATERIALIZED VIEW SET ACCESS METHOD +CREATE MATERIALIZED VIEW heapmv USING heap AS SELECt * FROM heaptable; +SELECT amname FROM pg_class c, pg_am am + WHERE c.relam = am.oid AND c.oid = 'heapmv'::regclass; + amname + + heap +(1 row) + +ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2; +SELECT amname FROM pg_class c, pg_am am + WHERE c.relam = am.oid AND c.oid = 'heapmv'::regclass; + amname + + heap2 +(1 row) + +SELECT COUNT(a), COUNT(1) FILTER(WHERE a=1) FROM heapmv; + count | count +---+--- + 9 | 1 +(1 row) + -- No support for multiple subcommands ALTER TABLE heaptable SET ACCESS METHOD heap, SET ACCESS METHOD heap2; ERROR: cannot have multiple SET ACCESS METHOD subcommands +DROP MATERIALIZED VIEW heapmv; DROP TABLE heaptable; -- No support for partitioned tables. CREATE TABLE am_partitioned(x INT, y INT) diff --git a/src/test/regress/expected/tablespace.out b/src/test/regress/expected/tablespace.out index 473fe8c28e..c52cf1cfcf 100644 --- a/src/test/regress/expected/tablespace.out +++ b/src/test/regress/expected/tablespace.out @@ -905,6 +905,16 @@ SELECT COUNT(*) FROM testschema.atable; -- checks heap 3 (1 row) +-- let's try moving a materialized view from one place to another +CREATE MATERIALIZED VIEW testschema.amv AS SELECT * FROM testschema.atable; +ALTER MATERIALIZED VIEW
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hello Fabien,

On Sat, 12 Mar 2022 15:54:54 +0100 (CET) Fabien COELHO wrote:
> Hello Yugo-san,
>
> About Pgbench error handling v16:

Thank you for your review! I attached the updated patches.

> This patch set needs a minor rebase because of 506035b0. Otherwise, patch
> compiles, global and local "make check" are ok. Doc generation is ok.

I rebased it.

> ## About v16-2
> English: "he will be aborted" -> "it will be aborted".

Fixed.

> I'm still not sure I like the "failure detailed" option, ISTM that the report
> could be always detailed. That would remove some complexity and I do not think
> that people executing a bench with error handling would mind having the
> details. No big deal.

I didn't change it because I think those who don't expect any failures using a
well-designed script may not need details of failures. I think reporting such
details will be required only for benchmarks where failures are expected.

> printVerboseErrorMessages: I'd make the buffer static and initialized only
> once so that there is no significant malloc/free cycle involved when calling
> the function.

OK. I fixed printVerboseErrorMessages to use a static variable.

> advanceConnectionState: I'd really prefer not to add new variables (res,
> status) in the loop scope, and only declare them when actually needed in the
> state branches, so as to avoid any unwanted interaction between states.

I fixed it to declare the variables in the case statement blocks.

> typo: "fullowing" -> "following"

Fixed.

> Pipeline cleaning: the advance function is already so long, I'd put that in a
> separate function and call it.

Ok. I made a new function "discardUntilSync" for the pipeline cleaning.

> I think that the report should not remove data when they are 0, otherwise it
> makes it harder to script around it (in failures_detailed on line 6284).

I fixed it to always report both serialization and deadlock failures, even
when they are 0.
Regards, Yugo Nagata -- Yugo NAGATA >From b4360d3c03013c86e1e62247f1c3c1378aacc38d Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Wed, 26 May 2021 16:58:36 +0900 Subject: [PATCH v17 1/2] Pgbench errors: use the Variables structure for client variables This is most important when it is used to reset client variables during the repeating of transactions after serialization/deadlock failures. Don't allocate Variable structs one by one. Instead, add a constant margin each time it overflows. --- src/bin/pgbench/pgbench.c | 163 +++--- 1 file changed, 100 insertions(+), 63 deletions(-) diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 000ffc4a5c..ab2c5dfc5f 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -289,6 +289,12 @@ const char *progname; volatile bool timer_exceeded = false; /* flag from signal handler */ +/* + * We don't want to allocate variables one by one; for efficiency, add a + * constant margin each time it overflows. + */ +#define VARIABLES_ALLOC_MARGIN 8 + /* * Variable definitions. * @@ -306,6 +312,24 @@ typedef struct PgBenchValue value; /* actual variable's value */ } Variable; +/* + * Data structure for client variables. + */ +typedef struct +{ + Variable *vars; /* array of variable definitions */ + int nvars; /* number of variables */ + + /* + * The maximum number of variables that we can currently store in 'vars' + * without having to reallocate more space. We must always have max_vars >= + * nvars. + */ + int max_vars; + + bool vars_sorted; /* are variables sorted by name? */ +} Variables; + #define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */ #define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */ @@ -460,9 +484,7 @@ typedef struct int command; /* command number in script */ /* client variables */ - Variable *variables; /* array of variable definitions */ - int nvariables; /* number of variables */ - bool vars_sorted; /* are variables sorted by name? 
*/ + Variables variables; /* various times about current transaction in microseconds */ pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */ @@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2) /* Locate a variable by name; returns NULL if unknown */ static Variable * -lookupVariable(CState *st, char *name) +lookupVariable(Variables *variables, char *name) { Variable key; /* On some versions of Solaris, bsearch of zero items dumps core */ - if (st->nvariables <= 0) + if (variables->nvars <= 0) return NULL; /* Sort if we have to */ - if (!st->vars_sorted) + if (!variables->vars_sorted) { - qsort((void *) st->variables, st
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Hi Ishii-san,

On Sun, 20 Mar 2022 09:52:06 +0900 (JST) Tatsuo Ishii wrote:
> Hi Yugo,
>
> I have looked into the patch and I noticed that <xref linkend=... endterm=...>
> is used in pgbench.sgml.
>
> AFAIK this is the only place where "endterm" is used. In other places
> "link" tag is used instead:

Thank you for pointing it out. I've checked other places referring to it, and
found that "xreflabel"s are used in such tags. So, I'll fix it in this style.

Regards,
Yugo Nagata

-- 
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Sun, 20 Mar 2022 16:11:43 +0900 (JST) Tatsuo Ishii wrote:
> > Hi Yugo,
> >
> > I tested with serialization error scenario by setting:
> > default_transaction_isolation = 'repeatable read'
> > The result was:
> >
> > $ pgbench -t 10 -c 10 --max-tries=10 test
> > transaction type:
> > scaling factor: 10
> > query mode: simple
> > number of clients: 10
> > number of threads: 1
> > maximum number of tries: 10
> > number of transactions per client: 10
> > number of transactions actually processed: 100/100
> > number of failed transactions: 0 (0.000%)
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> > latency average = 5.306 ms
> > initial connection time = 15.575 ms
> > tps = 1884.516810 (without initial connection time)
> >
> > I had hard time to understand what those numbers mean:
> > number of transactions retried: 35 (35.000%)
> > total number of retries: 74
> >
> > It seems "total number of retries" matches with the number of ERRORs
> > reported in PostgreSQL. Good. What I am not sure is "number of
> > transactions retried". What does this mean?
>
> Oh, ok. I see it now. It turned out that "number of transactions
> retried" does not actually mean the number of transactions
> retried. Suppose pgbench executes following in a session:
>
> BEGIN;	-- transaction A starts
> :
> (ERROR)
> ROLLBACK;	-- transaction A aborts
>
> (retry)
>
> BEGIN;	-- transaction B starts
> :
> (ERROR)
> ROLLBACK;	-- transaction B aborts
>
> (retry)
>
> BEGIN;	-- transaction C starts
> :
> END;	-- finally succeeds
>
> In this case "total number of retries:" = 2 and "number of
> transactions retried:" = 1. In this patch transactions A, B and C are
> regarded as "same" transaction, so the retried transaction count
> becomes 1. But it's confusing to use the language "transaction" here
> because A, B and C are different transactions. I would think it's
> better to use different language instead of "transaction", something
> like "cycle"? i.e.
> > number of cycles retried: 35 (35.000%)

In the original patch by Marina Polyakova it was "number of retried", but I
changed it to "number of transactions retried" because I felt it was easily
confused with "number of retries". I chose the word "transaction" because a
transaction ends in one of successful commit, skipped, or failed, after
possible retries.

Well, I agree that it is somewhat confusing wording. If we can find a nice
word that resolves the confusion, I don't mind changing it. Maybe we can use
"executions" as well as "cycles". However, I am not sure the situation is
improved by such a word, because what it exactly means would still be unclear
to users.

Another idea is to instead report only "the number of successfully retried
transactions", which does not include "failed transactions", that is,
transactions that failed after retries, like this:

number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of successfully retried transactions: 35 (35.000%)
total number of retries: 74

The meaning is clear and there seems to be no confusion.

Regards,
Yugo Nagata

-- 
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Tue, 22 Mar 2022 09:08:15 +0900 Yugo NAGATA wrote: > Hi Ishii-san, > > On Sun, 20 Mar 2022 09:52:06 +0900 (JST) > Tatsuo Ishii wrote: > > > Hi Yugo, > > > > I have looked into the patch and I noticed that > linkend=... endterm=...> is used in pgbench.sgml. e.g. > > > > > > > > AFAIK this is the only place where "endterm" is used. In other places > > "link" tag is used instead: > > Thank you for pointing out it. > > I've checked other places using referring to , and found > that "xreflabel"s are used in such tags. So, I'll fix it > in this style. I attached the updated patch. I also fixed the following paragraph which I had forgotten to fix in the previous patch. The first seven lines report some of the most important parameter settings. The sixth line reports the maximum number of tries for transactions with serialization or deadlock errors Regards, Yugo Nagata -- Yugo NAGATA >From b9f993e81836c4379478809da0e690023c319038 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Wed, 26 May 2021 16:58:36 +0900 Subject: [PATCH v18 1/2] Pgbench errors: use the Variables structure for client variables This is most important when it is used to reset client variables during the repeating of transactions after serialization/deadlock failures. Don't allocate Variable structs one by one. Instead, add a constant margin each time it overflows. --- src/bin/pgbench/pgbench.c | 163 +++--- 1 file changed, 100 insertions(+), 63 deletions(-) diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 000ffc4a5c..ab2c5dfc5f 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -289,6 +289,12 @@ const char *progname; volatile bool timer_exceeded = false; /* flag from signal handler */ +/* + * We don't want to allocate variables one by one; for efficiency, add a + * constant margin each time it overflows. + */ +#define VARIABLES_ALLOC_MARGIN 8 + /* * Variable definitions. 
* @@ -306,6 +312,24 @@ typedef struct PgBenchValue value; /* actual variable's value */ } Variable; +/* + * Data structure for client variables. + */ +typedef struct +{ + Variable *vars; /* array of variable definitions */ + int nvars; /* number of variables */ + + /* + * The maximum number of variables that we can currently store in 'vars' + * without having to reallocate more space. We must always have max_vars >= + * nvars. + */ + int max_vars; + + bool vars_sorted; /* are variables sorted by name? */ +} Variables; + #define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */ #define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */ @@ -460,9 +484,7 @@ typedef struct int command; /* command number in script */ /* client variables */ - Variable *variables; /* array of variable definitions */ - int nvariables; /* number of variables */ - bool vars_sorted; /* are variables sorted by name? */ + Variables variables; /* various times about current transaction in microseconds */ pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */ @@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2) /* Locate a variable by name; returns NULL if unknown */ static Variable * -lookupVariable(CState *st, char *name) +lookupVariable(Variables *variables, char *name) { Variable key; /* On some versions of Solaris, bsearch of zero items dumps core */ - if (st->nvariables <= 0) + if (variables->nvars <= 0) return NULL; /* Sort if we have to */ - if (!st->vars_sorted) + if (!variables->vars_sorted) { - qsort((void *) st->variables, st->nvariables, sizeof(Variable), + qsort((void *) variables->vars, variables->nvars, sizeof(Variable), compareVariableNames); - st->vars_sorted = true; + variables->vars_sorted = true; } /* Now we can search */ key.name = name; return (Variable *) bsearch((void *) &key, -(void *) st->variables, -st->nvariables, +(void *) variables->vars, +variables->nvars, sizeof(Variable), compareVariableNames); } /* 
Get the value of a variable, in string form; returns NULL if unknown */ static char * -getVariable(CState *st, char *name) +getVariable(Variables *variables, char *name) { Variable *var; char stringform[64]; - var = lookupVariable(st, name); + var = lookupVariable(variables, name); if (var == NULL) return NULL; /* not found */ @@ -1562,21 +1584,37 @@ valid_variable_name(const char *name) return true; } +/* + * Make sure there is enough space for 'needed' more variable in the variables + * array. + */ +static void +enlargeVariables(Variables *variables, int needed) +{ + /* total number of variables required no
Re: Tab completion for ALTER MATERIALIZED VIEW ... SET ACCESS METHOD
On Sat, 19 Mar 2022 19:31:59 +0900 Michael Paquier wrote:
> On Fri, Mar 18, 2022 at 03:13:05PM +0900, Michael Paquier wrote:
> > Thanks. This looks rather sane to me. I'll split things into 3
> > commits in total, as of the psql completion, SET TABLESPACE and SET
> > ACCESS METHOD. The first and third patches are only for HEAD, while
> > the documentation hole with SET TABLESPACE should go down to v10.
> > Backpatching the tests of ALTER MATERIALIZED VIEW ALL IN TABLESPACE
> > would not hurt, either, as there is zero coverage for that now.
>
> I have applied the set, after splitting things mostly as mentioned
> upthread:
> - The doc change for SET TABLESPACE on ALTER MATVIEW has been
> backpatched.
> - The regression tests for SET TABLESPACE have been applied on HEAD,
> as these are independent of the rest, good on their own.
> - All the remaining parts for SET ACCESS METHOD (psql tab completion,
> tests and docs) have been merged together on HEAD. I could not
> understand why the completion done after SET ACCESS METHOD was not
> grouped with the other parts for ALTER MATVIEW, so I have moved the
> new entry there, and I have added a test checking after an error for
> multiple subcommands, while on it.
>
> Thanks, Nagata-san!

Thank you!

-- 
Yugo NAGATA
Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
On Wed, 23 Mar 2022 14:26:54 -0400 Tom Lane wrote: > Tatsuo Ishii writes: > > The patch Pushed. Thank you! > > My hoary animal prairiedog doesn't like this [1]: > > # Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) > got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize > access due to concurrent update\\b.*\\g1)/' > # at t/001_pgbench_with_server.pl line 1229. > # 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: > 2 nxacts: 1 dbName: postgres > ... > # pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: > could not serialize access due to concurrent update > ... > # ' > # doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) > of script 0; ERROR: could not serialize access due to concurrent > update\\b.*\\g1)' > # Looks like you failed 1 test of 425. > > I'm not sure what the "\\b.*\\g1" part of this regex is meant to > accomplish, but it seems to be assuming more than it should > about the output format of TAP messages. I edited the test code from the original patch by mistake, but I did not realize it because the test somehow passes on my machine without any errors. I attached a patch that restores the test as it was in the original patch, where backreferences are used to check that the same query is retried. Regards, Yugo Nagata -- Yugo NAGATA

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index d173ceae7a..3eb5905e5a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1222,7 +1222,8 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
 # Check that we have a serialization error and the same random value of the
 # delta variable in the next try
 my $err_pattern =
-	"client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+	"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
+	. "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
 	. "ERROR: could not serialize access due to concurrent update\\b.*"
 	. "\\g1";
Re: Implementing Incremental View Maintenance
Hi, Attached is the revised patch (v20) to add support for Incremental Materialized View Maintenance (IVM). In accordance with Konstantin's suggestion, I made a few optimizations. 1. Creating an index on the matview automatically When creating an incrementally maintainable materialized view (IMMV), a unique index on the IMMV is created automatically if possible. If the view definition query has a GROUP BY clause, the index is created on the columns of the GROUP BY expressions. Otherwise, if the view contains all primary key attributes of its base tables in the target list, the index is created on these attributes. Also, if the view has DISTINCT, a unique index is created on all columns in the target list. In other cases, no index is created. In all cases, a NOTICE message is output to inform users that an index is created or that an appropriate index is necessary for efficient IVM. 2. Use a weaker lock on the matview if possible If the view has only one base table in its query, RowExclusiveLock is held on the view instead of AccessExclusiveLock, because we don't need to wait for other concurrent transactions' results in order to maintain the view in this case. When the same row in the view is affected by concurrent maintenances, a row-level lock will protect it. On Tue, 24 Nov 2020 12:46:57 +0300 Konstantin Knizhnik wrote: > The most obvious optimization is not to use exclusive table lock if view > depends just on one table (contains no joins). > Looks like there are no any anomalies in this case, are there? I confirmed the effect of these optimizations. First, when I performed pgbench (SF=100) without any materialized views, the result is: pgbench test4 -T 300 -c 8 -j 4 latency average = 6.493 ms tps = 1232.146229 (including connections establishing) Next, after creating a view as below, I performed the same pgbench.
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS SELECT bid, count(abalance), sum(abalance), avg(abalance) FROM pgbench_accounts GROUP BY bid; The result is here: [the previous version (v19 with exclusive table lock)] - latency average = 77.677 ms - tps = 102.990159 (including connections establishing) [In the latest version (v20 with weaker lock)] - latency average = 17.576 ms - tps = 455.159644 (including connections establishing) There is still substantial overhead, but we can see the effect of the optimization. Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v20.tar.gz Description: application/gzip
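As an aside for readers following the thread, incremental maintenance of such an aggregate view can be illustrated with a toy model (my sketch, not the patch's code): count and sum are maintained per group, and avg is derived from them, so both inserts and deletes can be applied as deltas.

```python
# Toy model of maintaining count/sum/avg per bid incrementally: keep
# count and sum in the view state and derive avg on read.
view = {}  # bid -> [count, sum]

def apply_delta(bid, abalance, inserted=True):
    sign = 1 if inserted else -1
    cnt, total = view.get(bid, [0, 0])
    cnt += sign
    total += sign * abalance
    if cnt == 0:
        view.pop(bid, None)     # the group vanishes with its last row
    else:
        view[bid] = [cnt, total]

def read_view():
    return {bid: (cnt, total, total / cnt)
            for bid, (cnt, total) in view.items()}

apply_delta(1, 100)
apply_delta(1, 300)
apply_delta(2, 50)
apply_delta(1, 100, inserted=False)   # a row of bid 1 is deleted
# read_view() == {1: (1, 300, 300.0), 2: (1, 50, 50.0)}
```

Keeping count alongside sum is what makes avg (and deletion of the last row of a group) maintainable without rescanning the base table.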
Re: Implementing Incremental View Maintenance
Hi hackers, I heard the opinion that this patch is too big and hard to review. So, I wonder whether we should downsize the patch by eliminating some features and leaving the other basic features. If there are more opinions that this would make it easier for reviewers to look at this patch, I would like to do it. If so, we plan to support only selection, projection, inner-join, and some aggregates in the first release and leave sub-queries, outer-join, and CTE supports to the next release. Regards, Yugo Nagata On Tue, 22 Dec 2020 21:51:36 +0900 Yugo NAGATA wrote: > Hi, > > Attached is the revised patch (v20) to add support for Incremental > Materialized View Maintenance (IVM). > > In according with Konstantin's suggestion, I made a few optimizations. > > 1. Creating an index on the matview automatically > > When creating incremental maintainable materialized view (IMMV)s, > a unique index on IMMV is created automatically if possible. > > If the view definition query has a GROUP BY clause, the index is created > on the columns of GROUP BY expressions. Otherwise, if the view contains > all primary key attributes of its base tables in the target list, the index > is created on these attributes. Also, if the view has DISTINCT, > a unique index is created on all columns in the target list. > In other cases, no index is created. > > In all cases, a NOTICE message is output to inform users that an index is > created or that an appropriate index is necessary for efficient IVM. > > 2. Use a weaker lock on the matview if possible > > If the view has only one base table in this query, RowExclusiveLock is > held on the view instead of AccessExclusiveLock, because we don't > need to wait other concurrent transaction's result in order to > maintain the view in this case. When the same row in the view is > affected due to concurrent maintenances, a row level lock will > protect it.
> > On Tue, 24 Nov 2020 12:46:57 +0300 > Konstantin Knizhnik wrote: > > > The most obvious optimization is not to use exclusive table lock if view > > depends just on one table (contains no joins). > > Looks like there are no any anomalies in this case, are there? > > I confirmed the effect of this optimizations. > > First, when I performed pgbench (SF=100) without any materialized views, > the results is : > > pgbench test4 -T 300 -c 8 -j 4 > latency average = 6.493 ms > tps = 1232.146229 (including connections establishing) > > Next, created a view as below, I performed the same pgbench. > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS > SELECT bid, count(abalance), sum(abalance), avg(abalance) > FROM pgbench_accounts GROUP BY bid; > > The result is here: > > [the previous version (v19 with exclusive table lock)] > - latency average = 77.677 ms > - tps = 102.990159 (including connections establishing) > > [In the latest version (v20 with weaker lock)] > - latency average = 17.576 ms > - tps = 455.159644 (including connections establishing) > > There is still substantial overhead, but we can see that the effect > of the optimization. > > Regards, > Yugo Nagata > > -- > Yugo NAGATA -- Yugo NAGATA
Re: Is Recovery actually paused?
On Thu, 11 Feb 2021 16:36:55 +0530 Dilip Kumar wrote: > On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy > wrote: > > > > On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar wrote: > > > > > > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar > > > wrote: > > > > > > > > I don't find any problem with this approach as well, but I personally > > > > feel that the other approach where we don't wait in any API and just > > > > return the recovery pause state is much simpler and more flexible. So > > > > I will make the pending changes in that patch and let's see what are > > > > the other opinion and based on that we can conclude. Thanks for the > > > > patch. I don't think that we need to include the waiting approach in the pg_get_wal_replay_pause_state patch. However, Horiguchi-san's patch may be useful for some users who want pg_wal_replay_pause to wait until recovery gets paused instead of polling the state from applications. So, I think we could discuss this patch in another thread as another commitfest entry, independent from pg_get_wal_replay_pause_state. > > > Here is an updated version of the patch which fixes the last two open > > > problems > > > 1. In RecoveryRequiresIntParameter set the recovery pause state in the > > > loop so that if recovery resumed and pause requested again we can set > > > to pause again. > > > 2. If the recovery state is already 'paused' then don't set it back to > > > the 'pause requested'. > > > > > > One more point is that in 'pg_wal_replay_pause' even if we don't > > > change the state because it was already set to the 'paused' then also > > > we call the WakeupRecovery. But I don't think there is any problem > > > with that, if we think that this should be changed then we can make > > > SetRecoveryPause return a bool such that if it doesn't do state change > > > then it returns false and in that case we can avoid calling > > > WakeupRecovery, but I felt that is unnecessary. Any other thoughts on > > > this?
> > > > IMO, that WakeupRecovery should not be a problem, because even now, if > > we issue a simple select pg_reload_conf(); (without even changing any > > config parameter), WakeupRecovery gets called. > > > > Thanks for the patch. I tested the new function and it works as > > expected. I have no further comments on the v13 patch. > > Thanks for the review and testing. I have no further comments on the v13 patch either. Also, I agree with Robert Haas's suggestions. Regards, Yugo Nagata -- Yugo NAGATA
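For readers following this thread, the three-state design under discussion ('not paused' / 'pause requested' / 'paused') can be modeled roughly as below; the class and method names are my simplification, not the server code:

```python
# Simplified model of the recovery-pause states: a pause request only
# becomes "paused" when the startup process reaches a point where it
# checks for the request (e.g. between applying WAL records), which is
# why a state-reporting function can see "pause requested" in between.
class Recovery:
    def __init__(self):
        self.state = "not paused"

    def request_pause(self):        # analogue of pg_wal_replay_pause()
        if self.state == "not paused":
            self.state = "pause requested"

    def absorb_pause(self):         # startup process, between WAL records
        if self.state == "pause requested":
            self.state = "paused"

    def resume(self):               # analogue of pg_wal_replay_resume()
        self.state = "not paused"

r = Recovery()
r.request_pause()
# Right after the request, replay may still be applying a record, so the
# state is "pause requested", not yet "paused".
r.absorb_pause()
# Only now is r.state == "paused".
```

This gap between the request and its absorption is exactly why a boolean "is paused" answer was misleading and a three-valued state function was proposed.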
Re: Implementing Incremental View Maintenance
Hi, Attached is a rebased patch (v22a). Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v22a.tar.gz Description: application/gzip
Re: Implementing Incremental View Maintenance
On Thu, 18 Feb 2021 19:38:44 +0800 Andy Fan wrote: > On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA wrote: > > > Hi, > > > > Attached is a rebased patch (v22a). > > > > Thanks for the patch. Will you think posting a patch with the latest commit > at that > time is helpful? If so, when others want to review it, they know which > commit to > apply the patch without asking for a new rebase usually. I rebased the patch because cfbot failed. http://cfbot.cputube.org/ Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Mon, 8 Mar 2021 15:42:00 -0500 Andrew Dunstan wrote: > > On 2/18/21 9:01 PM, Yugo NAGATA wrote: > > On Thu, 18 Feb 2021 19:38:44 +0800 > > Andy Fan wrote: > > > >> On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA wrote: > >> > >>> Hi, > >>> > >>> Attached is a rebased patch (v22a). > >>> > >> Thanks for the patch. Will you think posting a patch with the latest commit > >> at that > >> time is helpful? If so, when others want to review it, they know which > >> commit to > >> apply the patch without asking for a new rebase usually. > > I rebased the patch because cfbot failed. > > http://cfbot.cputube.org/ > > > > It's bitrotted a bit more dues to commits bb437f995d and 25936fd46c Thank you for letting me know. I'll rebase it soon. > > > (A useful feature of the cfbot might be to notify the authors and > reviewers when it detects bitrot for a previously passing entry.) +1. The feature of notifying the authors seems nice to me. Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Tue, 9 Mar 2021 09:20:49 +0900 Yugo NAGATA wrote: > On Mon, 8 Mar 2021 15:42:00 -0500 > Andrew Dunstan wrote: > > > > > On 2/18/21 9:01 PM, Yugo NAGATA wrote: > > > On Thu, 18 Feb 2021 19:38:44 +0800 > > > Andy Fan wrote: > > > > > >> On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA wrote: > > >> > > >>> Hi, > > >>> > > >>> Attached is a rebased patch (v22a). > > >>> > > >> Thanks for the patch. Will you think posting a patch with the latest > > >> commit > > >> at that > > >> time is helpful? If so, when others want to review it, they know which > > >> commit to > > >> apply the patch without asking for a new rebase usually. > > > I rebased the patch because cfbot failed. > > > http://cfbot.cputube.org/ > > > > > > > It's bitrotted a bit more dues to commits bb437f995d and 25936fd46c > > Thank you for letting me konw. I'll rebase it soon. Done. Attached is a rebased patch set. Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v22b.tar.gz Description: application/gzip
Re: Implementing Incremental View Maintenance
Hi Anastasia Lubennikova, I am writing this to you because I would like to ask the commitfest manager something. The status of the patch was changed to "Waiting on Author" from "Ready for Committer" at the beginning of this month for the reason that a rebase was necessary. Now I have updated the patch, so can I change the status back to "Ready for Committer"? Regards, Yugo Nagata On Mon, 5 Oct 2020 18:16:18 +0900 Yugo NAGATA wrote: > Hi, > > Attached is the rebased patch (v18) to add support for Incremental > Materialized View Maintenance (IVM). It is able to be applied to > current latest master branch. -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Tue, 27 Oct 2020 12:14:52 -0400 Adam Brusselback wrote: > That was a good bit more work to get ready than I expected. It's broken > into two scripts, one to create the schema, the other to load data and > containing a couple check queries to ensure things are working properly > (checking the materialized tables against a regular view for accuracy). Thank you very much! I am really grateful. > The first test case is to give us a definitive result on what "agreed > pricing" is in effect at a point in time based on a product hierarchy > our customers setup, and allow pricing to be set on nodes in that > hierarchy, as well as specific products (with an order of precedence). > The second test case maintains some aggregated amounts / counts / boolean > logic at an "invoice" level for all the detail lines which make up that > invoice. > > Both of these are real-world use cases which were simplified a bit to make > them easier to understand. We have other use cases as well, but with how > much time this took to prepare i'll keep it at this for now. > If you need anything clarified or have any issues, just let me know. Although I have not looked into it in detail yet, in my understanding, it seems that materialized views are used to show "pricing" or "invoice" information before the order is confirmed, that is, before the transaction is committed. Definitely, these will be use cases where immediate view maintenance is useful. I am happy because I found concrete use cases of immediate IVM. However, unfortunately, the view definitions in your cases are complex, and the current implementation of the patch doesn't support them. We would like to improve the feature in the future so that more complex views could benefit from IVM.
Regards, Yugo Nagata > On Fri, Oct 23, 2020 at 3:58 AM Yugo NAGATA wrote: > > > Hi Adam, > > > > On Thu, 22 Oct 2020 10:07:29 -0400 > > Adam Brusselback wrote: > > > > > Hey there Yugo, > > > I've asked a coworker to prepare a self contained example that > > encapsulates > > > our multiple use cases. > > > > Thank you very much! > > > > > The immediate/eager approach is exactly what we need, as within the same > > > transaction we have statements that can cause one of those "materialized > > > tables" to be updated, and then sometimes have the need to query that > > > "materialized table" in a subsequent statement and need to see the > > changes > > > reflected. > > > > The proposed patch provides the exact this feature and I think this will > > meet > > your needs. > > > > > As soon as my coworker gets that example built up I'll send a followup > > with > > > it attached. > > > > Great! We are looking forward to it. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA > > -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Wed, 28 Oct 2020 12:01:58 +0300 Anastasia Lubennikova wrote: > ср, 28 окт. 2020 г. в 08:02, Yugo NAGATA : > > > Hi Anastasia Lubennikova, > > > > I am writing this to you because I would like to ask the commitfest > > manager something. > > > > The status of the patch was changed to "Waiting on Author" from > > "Ready for Committer" at the beginning of this montfor the reason > > that rebase was necessary. Now I updated the patch, so can I change > > the status back to "Ready for Committer"? > > > > Regards, > > Yugo Nagata > > > > > Yes, go ahead. As far as I see, the patch is in a good shape and there are > no unanswered questions from reviewers. > Feel free to change the status of CF entries, when it seems reasonable to > you. Thank you for your response! I get it. > P.S. Please, avoid top-posting, It makes it harder to follow the > discussion, in-line replies are customary in pgsql mailing lists. > See https://en.wikipedia.org/wiki/Posting_style#Top-posting for details. I understand it. Regards, Yugo Nagata > -- > Best regards, > Lubennikova Anastasia -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Thu, 5 Nov 2020 22:58:25 -0600 Justin Pryzby wrote: > On Mon, Oct 05, 2020 at 06:16:18PM +0900, Yugo NAGATA wrote: > This needs to be rebased again - the last version doesn't apply anymore. > http://cfbot.cputube.org/yugo-nagata.html I attached the rebased patch (v19). > I looked though it a bit and attach some fixes to the user-facing docs. Thank you for pointing out a lot of typos and making a patch to fix them! Your fixes are included in the latest patch. > There's some more typos in the source that I didn't fix: > constrains > materliazied > cluase > immediaite > clumn > Temrs > migth > recalculaetd > speified > secuirty > > commit message: comletion > > psql and pg_dump say 13 but should say 14 now: > pset.sversion >= 13 These were also fixed. > # bag union > big union? "Bag union" is the union operation on bags (multisets), which does not eliminate duplicate tuples. > + relisivm bool > + > + > + True if materialized view enables incremental view maintenance > > This isn't clear, but I think it should say "True for materialized views which > are enabled for incremental view maintenance (IVM)." Yes, you are right. I also fixed it in this way. Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v19.tar.gz Description: application/gzip
Re: Implementing Incremental View Maintenance
On Wed, 11 Nov 2020 19:10:35 +0300 Konstantin Knizhnik wrote: Thank you for reviewing this patch! > > The patch is not applied to the current master because makeFuncCall > prototype is changed, > I fixed it by adding COAERCE_CALL_EXPLICIT. The rebased patch was submitted. > Ooops! Now TPS are much lower: > > tps = 141.767347 (including connections establishing) > > Speed of updates is reduced more than 70 times! > Looks like we loose parallelism because almost the same result I get > with just one connection. As you and Ishii-san mentioned in other posts, I think the reason would be a table lock on the materialized view that is acquired during view maintenance. I will explain a bit more in another post. > 4. Finally let's create one more view (it is reasonable to expect that > analytics will run many different queries and so need multiple views). > > create incremental materialized view teller_avgs as select > t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on > a.bid=t.bid group by t.tid; > > It is great that not only simple aggregates like SUM are supported, but > also AVG. > But insertion speed now is reduced twice - 72TPS. Yes, the current implementation takes twice the time to update a table when a new incrementally maintainable materialized view is defined on the table, because view maintenance is performed for each view. > > So good news is that incremental materialized views really work. > And bad news is that maintenance overhead is too large which > significantly restrict applicability of this approach. > Certainly in case of dominated read-only workload such materialized > views can significantly improve performance. > But unfortunately my dream that them allow to combine OLAP+OLPT is not > currently realized. As you concluded, there is a large overhead on updating base tables in the current implementation because it is immediate maintenance, in which the view is updated in the same statement where its base table is modified. Therefore, this is not suitable for OLTP workloads where there are frequent updates of tables. To suppress the maintenance overhead in such workloads, we have to implement "deferred maintenance", which collects table change logs and updates the view in another transaction afterward. Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
non-incremental). > Time of creation of non-incremental materialized view is about 18 seconds: > > postgres=# create materialized view teller_avgs as select > t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on > a.bid=t.bid group by t.tid; > SELECT 1000 > Time: 17795.395 ms (00:17.795) > > But refresh of such view takes 55 seconds: > > postgres=# refresh materialized view teller_avgs; > REFRESH MATERIALIZED VIEW > Time: 55500.381 ms (00:55.500) Hmm, interesting... I would like to investigate this issue, too. Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Tue, 24 Nov 2020 12:46:57 +0300 Konstantin Knizhnik wrote: > > > On 24.11.2020 12:21, Yugo NAGATA wrote: > > > >> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 > >> connections. > >> It is still about 7 times slower than performance without incremental view. > >> But now the gap is not so dramatic. And it seems to be clear that this > >> exclusive lock on matview is real show stopper for concurrent updates. > >> I do not know which race conditions and anomalies we can get if replace > >> table-level lock with row-level lock here. > > I explained it here: > > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > > > For example, suppose there is a view V = R*S that joins tables R and S, > > and there are two concurrent transactions T1 which changes table R to R' > > and T2 which changes S to S'. Without any lock, in READ COMMITTED mode, > > V would be updated to R'*S in T1, and R*S' in T2, so it would cause > > inconsistency. By locking the view V, transactions T1, T2 are processed > > serially and this inconsistency can be avoided. > > > > Especially, suppose that tuple dR is inserted into R in T1, and dS is > > inserted into S in T2, where dR and dS will be joined in according to > > the view definition. In this situation, without any lock, the change of V is > > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not > > be included in the results. This inconsistency could not be resolved by > > row-level lock. > > > >> But I think that this problem should be addressed in any case: single > >> client update mode is very rare scenario. > > This behavior is explained in rules.sgml like this: > > > > + > > +Concurrent Transactions > > + > > +Suppose an IMMV is defined on two base tables and > > each > > +table was modified in different a concurrent transaction > > simultaneously. 
> > +In the transaction which was committed first, IMMV > > can > > +be updated considering only the change which happened in this > > transaction. > > +On the other hand, in order to update the view correctly in the > > transaction > > +which was committed later, we need to know the changes occurred in > > +both transactions. For this reason, ExclusiveLock > > +is held on an IMMV immediately after a base table is > > +modified in READ COMMITTED mode to make sure that > > +the IMMV is updated in the latter transaction after > > +the former transaction is committed. In REPEATABLE > > READ > > +or SERIALIZABLE mode, an error is raised immediately > > +if lock acquisition fails because any changes which occurred in > > +other transactions are not be visible in these modes and > > +IMMV cannot be updated correctly in such situations. > > + > > + > > > > Hoever, should we describe explicitly its impact on performance here? > > > > Sorry, I didn't think much about this problem. > But I think that it is very important to try to find some solution of > the problem. > The most obvious optimization is not to use exclusive table lock if view > depends just on one table (contains no joins). > Looks like there are no any anomalies in this case, are there? Thank you for your suggestion! That makes sense. > Yes, most analytic queries contain joins (just two queries among 22 > TPC-H have no joins). > So may be this optimization will not help much. Yes, but if a user wants to incrementally maintain only aggregate views on a large table, like TPC-H Q1, it will be helpful. For this optimization, we only have to check the number of RTEs in the rtable list, and that would be cheap. > I wonder if it is possible to somehow use predicate locking mechanism of > Postgres to avoid this anomalies without global lock? You mean that, instead of using any table lock, if any possibility of the anomaly is detected using the predicate lock mechanism, then the transaction is aborted? I don't have a concrete idea of how to implement it, nor do I know whether it is possible yet, but I think it is worth considering. Thanks. Regards, Yugo Nagata -- Yugo NAGATA
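The join anomaly quoted in this exchange (T1 computes dV = dR*S and T2 computes dV = R*dS, so dR*dS is missed) can be reproduced with plain sets standing in for the relations; this is an illustrative model, not patch code:

```python
# Relations as sets of (key, payload); the view V is the join of R and S.
def join(r, s):
    return {(kr, a, b) for (kr, a) in r for (ks, b) in s if kr == ks}

R = {(1, "r1")}
S = {(2, "s2")}
dR = {(3, "r3")}   # row inserted by transaction T1
dS = {(3, "s3")}   # row inserted concurrently by T2; joins with dR

# If the two maintenances are not serialized, each computes its delta
# against a snapshot that lacks the other transaction's insert:
dV_T1 = join(dR, S)                    # empty
dV_T2 = join(R, dS)                    # empty
maintained = join(R, S) | dV_T1 | dV_T2

correct = join(R | dR, S | dS)         # contains (3, "r3", "s3")
# maintained != correct: the dR-join-dS tuple is lost, which is why the
# patch serializes concurrent maintenances with a lock on the view.
```

Note that no row-level lock helps here: neither transaction ever touches a view row for the missing tuple, so only serializing the maintenances (or detecting the conflict and aborting) closes the gap.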
Re: Implementing Incremental View Maintenance
On Wed, 25 Nov 2020 15:16:05 +0300 Konstantin Knizhnik wrote: > > > On 24.11.2020 13:11, Yugo NAGATA wrote: > > > >> I wonder if it is possible to somehow use predicate locking mechanism of > >> Postgres to avoid this anomalies without global lock? > > You mean that, ,instead of using any table lock, if any possibility of the > > anomaly is detected using predlock mechanism then abort the transaction? > > Yes. If both transactions are using serializable isolation level, then > lock is not needed, isn't it? > So at least you can add yet another simple optimization: if transaction > has serializable isolation level, > then exclusive lock is not required. As long as we use the trigger approach, we can't handle concurrent view maintenance in either repeatable read or serializable isolation level. It is because one transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS) in such cases, and we cannot get the incremental change on the view (dV=dR*dS). Therefore, in the current implementation, the transaction is aborted when the concurrent view maintenance happens in repeatable read or serializable. > But I wonder if we can go further so that even if transaction is using > read-committed or repeatable-read isolation level, > we still can replace exclusive table lock with predicate locks. > > The main problem with this approach (from my point of view) is the > predicate locks are able to detect conflict but not able to prevent it. > I.e. if such conflict is detected then transaction has to be aborted. > And it is not always desirable, especially because user doesn't expect > it: how can insertion of single record with unique keys in a table cause > transaction conflict? > And this is what will happen in your example with transactions T1 and T2 > inserting records in R and S tables. Yes. 
I suppose that either aborting the transaction or waiting on locks is unavoidable when a view is incrementally updated concurrently (at least in immediate maintenance, where a view is updated in the same transaction that updates the base table). > And what do you think about backrgound update of materialized view? > On update/insert trigger will just add record to some "delta" table and > then some background worker will update view. > Certainly in this case we loose synchronization between main table and > materialized view (last one may contain slightly deteriorated data). > But in this case no exclusive lock is needed, isn't it? Of course, we are considering this type of view maintenance. This is deferred maintenance, where a view is updated after the transaction that updates the base tables is committed. Views can be updated in the background at an appropriate time, or in response to a user command. To implement this, we need a mechanism to maintain change logs which record the changes of the base tables. We think that implementing this infrastructure is not trivial work, so, in the first patch proposal, we decided to start from the immediate approach, which needs less code. -- Yugo NAGATA
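The deferred approach described here can be sketched with a toy count view (a hypothetical structure, not the planned implementation): triggers only append to a change log, and the log is folded into the view later, outside the updating transaction.

```python
# Toy deferred maintenance: the per-row trigger work is just an append to
# a change log; a later maintenance step folds the log into the view, so
# no lock on the view is needed while the base table is being updated.
change_log = []
view = {}           # group -> row count

def on_change(op, group):               # runs in the updating transaction
    change_log.append((op, group))      # cheap: no view access here

def apply_log():                        # runs later (background/command)
    while change_log:
        op, group = change_log.pop(0)
        view[group] = view.get(group, 0) + (1 if op == "INSERT" else -1)
        if view[group] == 0:
            del view[group]

on_change("INSERT", "a")
on_change("INSERT", "a")
on_change("DELETE", "a")
on_change("INSERT", "b")
apply_log()
# view == {"a": 1, "b": 1}
```

The trade-off is exactly the one stated above: the view may lag behind the base tables between maintenance runs, in exchange for cheap updates and no exclusive lock.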
Re: Implementing Incremental View Maintenance
On Wed, 25 Nov 2020 18:00:16 +0300 Konstantin Knizhnik wrote: > > > On 25.11.2020 16:06, Yugo NAGATA wrote: > > On Wed, 25 Nov 2020 15:16:05 +0300 > > Konstantin Knizhnik wrote: > > > >> > >> On 24.11.2020 13:11, Yugo NAGATA wrote: > >>>> I wonder if it is possible to somehow use predicate locking mechanism of > >>>> Postgres to avoid this anomalies without global lock? > >>> You mean that, ,instead of using any table lock, if any possibility of the > >>> anomaly is detected using predlock mechanism then abort the transaction? > >> Yes. If both transactions are using serializable isolation level, then > >> lock is not needed, isn't it? > >> So at least you can add yet another simple optimization: if transaction > >> has serializable isolation level, > >> then exclusive lock is not required. > > As long as we use the trigger approach, we can't handle concurrent view > > maintenance > > in either repeatable read or serializable isolation level. It is because > > one > > transaction (R= R+dR) cannot see changes occurred in another transaction > > (S'= S+dS) > > in such cases, and we cannot get the incremental change on the view > > (dV=dR*dS). > > Therefore, in the current implementation, the transaction is aborted when > > the > > concurrent view maintenance happens in repeatable read or serializable. > > Sorry, may be I do not correctly understand you or you do not understand me. 
> Lets consider two serializable transactions (I do not use view or > triggers, but perform correspondent updates manually): > > > > create table t(pk integer, val int); > create table mat_view(gby_key integer primary key, total bigint); > insert into t values (1,0),(2,0); > insert into mat_view values (1,0),(2,0); > > Session 1: Session 2: > > begin isolation level serializable; > begin isolation level serializable; > insert into t values (1,200); insert into t > values (1,300); > update mat_view set total=total+200 where gby_key=1; > update mat_view set total=total+300 where gby_key=1; > > commit; > ERROR: could not serialize access due to concurrent update > > So both transactions are aborted. > It is expected behavior for serializable transactions. > But if transactions updating different records of mat_view, then them > can be executed concurrently: > > Session 1: Session 2: > > begin isolation level serializable; > begin isolation level serializable; > insert into t values (1,200); insert into t > values (2,300); > update mat_view set total=total+200 where gby_key=1; > update mat_view set total=total+300 where gby_key=2; > commit; commit; > > So, if transactions are using serializable isolation level, then we can > update mat view without exclusive lock > and if there is not conflict, this transaction can be executed concurrently. > > Please notice, that exclusive lock doesn't prevent conflict in first case: > > Session 1: Session 2: > > begin isolation level serializable; > begin isolation level serializable; > insert into t values (1,200); insert into t > values (1,300); > lock table mat_view; > update mat_view set total=total+200 where gby_key=1; > lock table mat_view; > > commit; > update mat_view set total=total+300 where gby_key=1; > commit; > ERROR: could not serialize access due to concurrent update > > > So do you agree that there are no reasons for using explicit lock for > serializable transactions? Yes, I agree. 
I said an anomaly could occur in both repeatable read and serializable isolation levels, but that was wrong. In serializable, the transaction will be aborted in problematic cases due to predicate locks, and we don't need the lock. However, in repeatable read, the anomaly could still occur when the view is defined on more than one base table, even if we lock the view. To prevent it, the only way I have found for now is to forcibly abort the transaction in such cases. Regards, Yugo Nagata -- Yugo NAGATA
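To make the repeatable read anomaly above concrete, here is a sketch in the style of the earlier session examples. The table and view names are invented for illustration; this is not taken from the patch. With a join view over two base tables, each transaction computes its delta against a snapshot that misses the other's change, so neither accounts for the joint contribution:

```sql
-- Hypothetical illustration (names invented). Suppose V is an
-- incrementally maintained join view: SELECT r.i FROM r JOIN s ON r.i = s.i;
CREATE TABLE r(i int);
CREATE TABLE s(i int);

-- Session 1:                            Session 2:
-- BEGIN ISOLATION LEVEL REPEATABLE READ;
--                                       BEGIN ISOLATION LEVEL REPEATABLE READ;
-- INSERT INTO r VALUES (1);             INSERT INTO s VALUES (1);
-- (computes dV = dR JOIN S, but its     (computes dV = R JOIN dS, but its
--  snapshot of S lacks the row (1))      snapshot of R lacks the row (1))
-- COMMIT;                               COMMIT;

-- Neither transaction derives the joint delta dR JOIN dS = {(1)},
-- so the row (1) is missing from V even though r JOIN s now contains it.
```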
Re: Is Recovery actually paused?
On Thu, 22 Oct 2020 20:36:48 +0530 Dilip Kumar wrote: > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar wrote: > > > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi > > wrote: > > > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas > > > wrote in > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar > > > > wrote: > > > > > One idea could be, if the recovery process is waiting for WAL and a > > > > > recovery pause is requested then we can assume that the recovery is > > > > > paused because before processing the next wal it will always check > > > > > whether the recovery pause is requested or not. > > > .. > > > > However, it might be better to implement this by having the system > > > > absorb the pause immediately when it's in this state, rather than > > > > trying to detect this state and treat it specially. > > > > > > The paused state is shown in pg_stat_activity.wait_event and it is > > > strange that pg_is_wal_replay_paused() is inconsistent with the > > > column. > > > > Right > > > > To make them consistent, we need to call recoveryPausesHere() > > > at the end of WaitForWALToBecomeAvailable() and let > > > pg_wal_replay_pause() call WakeupRecovery(). > > > > > > I think we don't need a separate function to find the state. > > > > The idea makes sense to me. I will try to change the patch as per the > > suggestion. > > Here is the patch based on this idea. I reviewed this patch. First, I made a recovery conflict situation using a table lock. Standby: #= begin; #= select * from t; Primary: #= begin; #= lock t in ; After this, the WAL for the table lock cannot be replayed because of the lock acquired on the standby. Second, during the delay, I executed pg_wal_replay_pause() and pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until max_standby_streaming_delay expired, and eventually returned true. I also saw the same behaviour when setting recovery_min_apply_delay. 
So, pg_is_wal_replay_paused waits for recovery to get paused, and this works as expected. However, I suspect users don't expect pg_is_wal_replay_paused to wait. Especially, if max_standby_streaming_delay is -1, this will block forever, although that setting may not be usual. In addition, some users may set recovery_min_apply_delay to a large value. If such users call pg_is_wal_replay_paused, it could wait for a long time. At least, I think we need some description in the documentation to explain that pg_is_wal_replay_paused could wait for a while. Also, how about adding a new boolean argument to pg_is_wal_replay_paused to control whether it waits for recovery to get paused or not? By setting its default value to true or false, users can keep the old calling format and backward compatibility can be maintained. As another comment, while pg_is_wal_replay_paused is blocking, I cannot cancel the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. + errhint("Recovery control functions can only be executed during recovery."))); There are a few tabs at the end of this line. Regards, Yugo Nagata -- Yugo NAGATA
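For reference, the reproduction described in the review above can be written out as sessions. The lock mode was truncated in the mail; ACCESS EXCLUSIVE is an assumed choice here (any mode conflicting with the standby reader's ACCESS SHARE lock should work):

```sql
-- On the standby: hold a lock that conflicts with replay of the lock record.
BEGIN;
SELECT * FROM t;

-- On the primary: take a conflicting lock (ACCESS EXCLUSIVE assumed).
BEGIN;
LOCK t IN ACCESS EXCLUSIVE MODE;

-- Back on the standby: replay of that record is now delayed, so with the
-- reviewed patch applied, the second call blocks until
-- max_standby_streaming_delay expires, then returns true.
SELECT pg_wal_replay_pause();
SELECT pg_is_wal_replay_paused();
```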
Re: Implementing Incremental View Maintenance
Hi, Attached is the revised patch (v21) to add support for Incremental Materialized View Maintenance (IVM). In addition to fixing some typos from the previous version, this revision fixes a check to prevent a view from containing an expression that includes aggregates, like sum(x)/sum(y). Regards, Yugo Nagata On Tue, 22 Dec 2020 22:24:22 +0900 Yugo NAGATA wrote: > Hi hackers, > > I heard the opinion that this patch is too big and hard to review. > So, I wonder if we should downsize the patch by eliminating some > features and leaving other basic features. > > If there are more opinions this makes it easier for reviewers to look > at this patch, I would like to do it. If so, we plan to support only > selection, projection, inner-join, and some aggregates in the first > release and leave sub-queries, outer-join, and CTE supports to the > next release. > > Regards, > Yugo Nagata > > On Tue, 22 Dec 2020 21:51:36 +0900 > Yugo NAGATA wrote: > > Hi, > > > > Attached is the revised patch (v20) to add support for Incremental > > Materialized View Maintenance (IVM). > > > > In accordance with Konstantin's suggestion, I made a few optimizations. > > > > 1. Creating an index on the matview automatically > > > > When creating incremental maintainable materialized view (IMMV)s, > > a unique index on IMMV is created automatically if possible. > > > > If the view definition query has a GROUP BY clause, the index is created > > on the columns of GROUP BY expressions. Otherwise, if the view contains > > all primary key attributes of its base tables in the target list, the index > > is created on these attributes. Also, if the view has DISTINCT, > > a unique index is created on all columns in the target list. > > In other cases, no index is created. > > > > In all cases, a NOTICE message is output to inform users that an index is > > created or that an appropriate index is necessary for efficient IVM. > > > > 2. 
Use a weaker lock on the matview if possible > > > > If the view has only one base table in this query, RowExclusiveLock is > > held on the view instead of AccessExclusiveLock, because we don't > > need to wait other concurrent transaction's result in order to > > maintain the view in this case. When the same row in the view is > > affected due to concurrent maintenances, a row level lock will > > protect it. > > > > On Tue, 24 Nov 2020 12:46:57 +0300 > > Konstantin Knizhnik wrote: > > > > > The most obvious optimization is not to use exclusive table lock if view > > > depends just on one table (contains no joins). > > > Looks like there are no any anomalies in this case, are there? > > > > I confirmed the effect of this optimizations. > > > > First, when I performed pgbench (SF=100) without any materialized views, > > the results is : > > > > pgbench test4 -T 300 -c 8 -j 4 > > latency average = 6.493 ms > > tps = 1232.146229 (including connections establishing) > > > > Next, created a view as below, I performed the same pgbench. > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS > > SELECT bid, count(abalance), sum(abalance), avg(abalance) > > FROM pgbench_accounts GROUP BY bid; > > > > The result is here: > > > > [the previous version (v19 with exclusive table lock)] > > - latency average = 77.677 ms > > - tps = 102.990159 (including connections establishing) > > > > [In the latest version (v20 with weaker lock)] > > - latency average = 17.576 ms > > - tps = 455.159644 (including connections establishing) > > > > There is still substantial overhead, but we can see that the effect > > of the optimization. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA > > > -- > Yugo NAGATA > > -- Yugo NAGATA IVM_patches_v21.tar.gz Description: application/gzip
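A sketch of the two optimizations described above, using the patch's CREATE INCREMENTAL MATERIALIZED VIEW syntax as quoted in the mail. The NOTICE text is illustrative, not the patch's exact wording:

```sql
-- With a GROUP BY view, the patch auto-creates a unique index on the
-- grouping keys (point 1). Illustrative NOTICE, not verbatim:
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
  SELECT bid, count(abalance), sum(abalance), avg(abalance)
  FROM pgbench_accounts GROUP BY bid;
-- NOTICE: created unique index on materialized view "mv_ivm2" (bid)

-- Because this view has a single base table, maintenance takes only
-- RowExclusiveLock on the view instead of AccessExclusiveLock (point 2),
-- which is what produced the tps improvement reported above.
```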
Re: Is Recovery actually paused?
On Thu, 10 Dec 2020 11:25:23 +0530 Dilip Kumar wrote: > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > Especially, if max_standby_streaming_delay is -1, this will be blocked > > > forever, > > > although this setting may not be usual. In addition, some users may set > > > recovery_min_apply_delay for a large. If such users call > > > pg_is_wal_replay_paused, > > > it could wait for a long time. > > > > > > At least, I think we need some descriptions on document to explain > > > pg_is_wal_replay_paused could wait while a time. > > > > Ok > > Fixed this, added some comments in .sgml as well as in function header Thank you for fixing this. Also, is it better to fix the description of pg_wal_replay_pause from "Pauses recovery." to "Request to pause recovery." in accordance with pg_is_wal_replay_paused? > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused > > > to > > > control whether this waits for recovery to get paused or not? By setting > > > its > > > default value to true or false, users can use the old format for calling > > > this > > > and the backward compatibility can be maintained. > > > > So basically, if the wait_recovery_pause flag is false then we will > > immediately return true if the pause is requested? I agree that it is > > good to have an API to know whether the recovery pause is requested or > > not but I am not sure is it good idea to make this API serve both the > > purpose? Anyone else have any thoughts on this? > > I think the current pg_is_wal_replay_paused() already has another purpose; this waits for recovery to actually get paused. If we want to limit this API's purpose only to returning the pause state, it seems better to fix this to return the actual state at the cost of breaking backward compatibility. If we want to know whether a pause is requested, we may add a new API like pg_is_wal_replay_pause_requested(). 
Also, if we want to wait for recovery to actually get paused, we may add an option to pg_wal_replay_pause() for this purpose. However, this might be bikeshedding. If nobody cares that pg_is_wal_replay_paused() can make users wait for a long time, I don't care either. > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not > > > cancel > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting > > > loop. How about this fix? I think users may want to cancel pg_is_wal_replay_paused() while it is blocking. Regards, Yugo Nagata -- Yugo NAGATA
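To make the alternatives under discussion concrete, the call shapes would look roughly like this. All three signatures are proposals sketched from the mail, not existing functions:

```sql
-- Proposal A: optional argument; the default preserves the old behavior.
SELECT pg_is_wal_replay_paused();        -- waits until recovery is actually paused
SELECT pg_is_wal_replay_paused(false);   -- returns the state immediately

-- Proposal B: a separate function just for the "requested" state.
SELECT pg_is_wal_replay_pause_requested();

-- Proposal C: let the pause call itself do the waiting.
SELECT pg_wal_replay_pause(true);        -- returns once replay is really paused
```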
Re: Is Recovery actually paused?
On Wed, 13 Jan 2021 17:49:43 +0530 Dilip Kumar wrote: > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > Dilip Kumar wrote: > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > wait. > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > blocked forever, > > > > > > although this setting may not be usual. In addition, some users may > > > > > > set > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > pg_is_wal_replay_paused, > > > > > > it could wait for a long time. > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > Ok > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > Thank you for fixing this. > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > "Pauses recovery." to "Request to pause recovery." in according with > > > pg_is_wal_replay_paused? > > > > Okay > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > pg_is_wal_replay_paused to > > > > > > control whether this waits for recovery to get paused or not? By > > > > > > setting its > > > > > > default value to true or false, users can use the old format for > > > > > > calling this > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > immediately return true if the pause is requested? I agree that it is > > > > > good to have an API to know whether the recovery pause is requested or > > > > > not but I am not sure is it good idea to make this API serve both the > > > > > purpose? Anyone else have any thoughts on this? 
> > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > > this waits recovery to actually get paused. If we want to limit this API's > > > purpose only to return the pause state, it seems better to fix this to > > > return > > > the actual state at the cost of lacking the backward compatibility. If we > > > want > > > to know whether pause is requested, we may add a new API like > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery > > > to actually > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > purpose. > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > care either. > > > > I don't think that it will be blocked ever, because > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > recovery process will not be stuck on waiting for the WAL. Yes, it does not get stuck waiting for WAL. However, it can get stuck while resolving a recovery conflict. The process could wait for max_standby_streaming_delay or max_standby_archive_delay at most before recovery gets completely paused. Also, it could wait for recovery_min_apply_delay if it has a valid value. It is possible that a user sets this parameter to a large value, so it could wait for a long time. However, this can be avoided by calling recoveryPausesHere() or CheckAndSetRecoveryPause() in recoveryApplyDelay(). > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I > > > > > > can not cancel > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > waiting loop. > > > > > > How about this fix? I think users may want to cancel > > > pg_is_wal_replay_paused() during > > > this is blocking. > > > > Yeah, we can do this. I will send the updated patch after putting > > some more thought into these comments. Thanks again for the feedback. 
> > > > Please find the updated patch. Thanks. I confirmed that I can cancel pg_is_wal_replay_paused() while it is stuck. Although it is a very trivial comment, I think that the new line before HandleStartupProcInterrupts() is unnecessary. @@ -6052,12 +6062,20 @@ recoveryPausesHere(bool endOfRecovery) (errmsg("recovery has paused"), errhint("Execute pg_wal_replay_resume() to continue."))); - while (RecoveryIsPaused()) + while (RecoveryPauseRequested()) { + HandleStartupProcInterrupts(); Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Sun, 17 Jan 2021 11:33:52 +0530 Dilip Kumar wrote: > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: > > > > On Wed, 13 Jan 2021 17:49:43 +0530 > > Dilip Kumar wrote: > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > > > wait. > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > > > blocked forever, > > > > > > > > although this setting may not be usual. In addition, some users > > > > > > > > may set > > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > > pg_is_wal_replay_paused, > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to > > > > > > > > explain > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > Ok > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function > > > > > > header > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > > pg_is_wal_replay_paused? > > > > > > > > Okay > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > > pg_is_wal_replay_paused to > > > > > > > > control whether this waits for recovery to get paused or not? > > > > > > > > By setting its > > > > > > > > default value to true or false, users can use the old format > > > > > > > > for calling this > > > > > > > > and the backward compatibility can be maintained. 
> > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we > > > > > > > will > > > > > > > immediately return true if the pause is requested? I agree that > > > > > > > it is > > > > > > > good to have an API to know whether the recovery pause is > > > > > > > requested or > > > > > > > not but I am not sure is it good idea to make this API serve both > > > > > > > the > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > purpose; > > > > > this waits recovery to actually get paused. If we want to limit this > > > > > API's > > > > > purpose only to return the pause state, it seems better to fix this > > > > > to return > > > > > the actual state at the cost of lacking the backward compatibility. > > > > > If we want > > > > > to know whether pause is requested, we may add a new API like > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > recovery to actually > > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > > purpose. > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > > > care either. > > > > > > > > I don't think that it will be blocked ever, because > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > recovery process will not be stuck on waiting for the WAL. > > > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck > > during r
Re: Columns correlation and adaptive query optimization
Hello, On Thu, 26 Mar 2020 18:49:51 +0300 Konstantin Knizhnik wrote: > Attached please find new version of the patch with more comments and > descriptions added. Adaptive query optimization is a very interesting feature for me, so I looked into this patch. Here are some comments and questions. (1) This patch needs a rebase because clauselist_selectivity was modified to improve estimation of OR clauses. (2) If I understand correctly, your proposal consists of the following two features. 1. Add a feature to auto_explain that automatically creates an extended statistic if the error in the estimated row count is large. 2. Improve row count estimation of join results by considering functional dependencies between vars in the join qual if the qual has more than one clause, and also functional dependencies between a var in the join qual and vars in quals of the inner/outer relation. As you said, these two parts are independent of each other, so each feature will work even without the other. I wonder if it would be better to split the patch again and register them in the commitfest separately. (3) + DefineCustomBoolVariable("auto_explain.suggest_only", +"Do not create statistic but just record in WAL suggested create statistics statement.", +NULL, + &auto_explain_suggest_on To imply that this parameter is related to add_statistics_threshold, it seems better to me to use a more related name like add_statistics_suggest_only. Also, additional documentation for the new parameters is required. (4) + /* +* Prevent concurrent access to extended statistic table +*/ + stat_rel = table_open(StatisticExtRelationId, AccessExclusiveLock); + slot = table_slot_create(stat_rel, NULL); + scan = table_beginscan_catalog(stat_rel, 2, entry); (snip) + table_close(stat_rel, AccessExclusiveLock); + } When I tested the auto_explain part, I got the following WARNING. 
WARNING: buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, flags=0x8300, refcount=1 2) WARNING: buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, flags=0x8300, refcount=1 1) WARNING: relcache reference leak: relation "pg_statistic_ext" not closed WARNING: TupleDesc reference leak: TupleDesc 0x7fa439266338 (12029,-1) still referenced WARNING: Snapshot reference leak: Snapshot 0x55c332c10418 still referenced To suppress this, I think we need table_endscan(scan) and ExecDropSingleTupleTableSlot(slot) before finishing this function. (6) + elog(NOTICE, "Auto_explain suggestion: CREATE STATISTICS %s %s FROM %s", stat_name, create_stat_stmt, rel_name); We should use ereport instead of elog for log messages. (7) + double dep = find_var_dependency(root, innerRelid, var, clauses_attnums); + if (dep != 0.0) + { + s1 *= dep + (1 - dep) * s2; + continue; + } I found the following comment in clauselist_apply_dependencies(): * we actually combine selectivities using the formula * * P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b) so shouldn't the same formula be used in this patch? That is, s1 *= dep + (1-dep) * s2 (if s1 <= s2) s1 *= dep * (s2/s1) + (1-dep) * s2 (otherwise) . (8) +/* + * Try to find dependency between variables. + * var: varaibles which dependencies are considered + * join_vars: list of variables used in other clauses + * This functions return strongest dependency and some subset of variables from the same relation + * or 0.0 if no dependency was found. + */ +static double +var_depends_on(PlannerInfo *root, Var* var, List* clause_vars) +{ The comment mentions join_vars but the actual argument name is clause_vars, so they need to be unified. (9) Currently, it only considers functional dependency statistics. Can we also consider the multivariate MCV list, and would it be useful? (10) To achieve adaptive query optimization (AQO) in PostgreSQL, this patch proposes to use auto_explain for getting feedback from actual results. 
So, could auto_explain be an infrastructure for AQO in the future? Or do you have any plan or idea to build an in-core infrastructure for AQO? Regards, Yugo Nagata -- Yugo NAGATA
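For readers following the review, the statement the patched auto_explain would suggest (item 6) corresponds to the existing CREATE STATISTICS syntax. The object and column names below are made up for illustration; the catalog query assumes PostgreSQL 12 or later, where extended-statistics data lives in pg_statistic_ext_data:

```sql
-- Manually creating the extended statistics auto_explain would suggest
-- (names illustrative). All supported kinds can be requested explicitly:
CREATE STATISTICS s_t1_ab (dependencies, ndistinct, mcv)
  ON a, b FROM t1;
ANALYZE t1;  -- the statistics are only populated by ANALYZE

-- Inspect the collected functional dependencies:
SELECT stxname, stxddependencies
  FROM pg_statistic_ext s
  JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
 WHERE stxname = 's_t1_ab';
```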
Re: Is Recovery actually paused?
On Tue, 19 Jan 2021 21:32:31 +0530 Dilip Kumar wrote: > On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar wrote: > > > > On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA wrote: > >> > >> On Sun, 17 Jan 2021 11:33:52 +0530 > >> Dilip Kumar wrote: > >> > >> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: > >> > > > >> > > On Wed, 13 Jan 2021 17:49:43 +0530 > >> > > Dilip Kumar wrote: > >> > > > >> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar > >> > > > wrote: > >> > > > > > >> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA > >> > > > > wrote: > >> > > > > > > >> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > >> > > > > > Dilip Kumar wrote: > >> > > > > > > >> > > > > > > > > However, I wonder users don't expect > >> > > > > > > > > pg_is_wal_replay_paused to wait. > >> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this > >> > > > > > > > > will be blocked forever, > >> > > > > > > > > although this setting may not be usual. In addition, some > >> > > > > > > > > users may set > >> > > > > > > > > recovery_min_apply_delay for a large. If such users call > >> > > > > > > > > pg_is_wal_replay_paused, > >> > > > > > > > > it could wait for a long time. > >> > > > > > > > > > >> > > > > > > > > At least, I think we need some descriptions on document to > >> > > > > > > > > explain > >> > > > > > > > > pg_is_wal_replay_paused could wait while a time. > >> > > > > > > > > >> > > > > > > > Ok > >> > > > > > > > >> > > > > > > Fixed this, added some comments in .sgml as well as in > >> > > > > > > function header > >> > > > > > > >> > > > > > Thank you for fixing this. > >> > > > > > > >> > > > > > Also, is it better to fix the description of pg_wal_replay_pause > >> > > > > > from > >> > > > > > "Pauses recovery." to "Request to pause recovery." in according > >> > > > > > with > >> > > > > > pg_is_wal_replay_paused? 
> >> > > > > > >> > > > > Okay > >> > > > > > >> > > > > > > >> > > > > > > > > Also, how about adding a new boolean argument to > >> > > > > > > > > pg_is_wal_replay_paused to > >> > > > > > > > > control whether this waits for recovery to get paused or > >> > > > > > > > > not? By setting its > >> > > > > > > > > default value to true or false, users can use the old > >> > > > > > > > > format for calling this > >> > > > > > > > > and the backward compatibility can be maintained. > >> > > > > > > > > >> > > > > > > > So basically, if the wait_recovery_pause flag is false then > >> > > > > > > > we will > >> > > > > > > > immediately return true if the pause is requested? I agree > >> > > > > > > > that it is > >> > > > > > > > good to have an API to know whether the recovery pause is > >> > > > > > > > requested or > >> > > > > > > > not but I am not sure is it good idea to make this API serve > >> > > > > > > > both the > >> > > > > > > > purpose? Anyone else have any thoughts on this? > >> > > > > > > > > >> > > > > > > >> > > > > > I think the current pg_is_wal_replay_paused() already has > >> > > > > > another purpose; > >> > > > > > this waits recovery to actually get paused. If we want to limit > >> > > > &g
Re: Implementing Incremental View Maintenance
Hi, Attached is a revised patch (v22) rebased for the latest master head. Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v22.tar.gz Description: application/gzip
Re: Is Recovery actually paused?
On Mon, 25 Jan 2021 14:53:18 +0530 Dilip Kumar wrote: > I have changed as per other functions for consistency. Thank you for updating the patch. Here are a few comments: (1) - SetRecoveryPause(true); + SetRecoveryPause(RECOVERY_PAUSE_REQUESTED); ereport(LOG (errmsg("recovery has paused"), errdetail("If recovery is unpaused, the server will shut down."), errhint("You can then restart the server after making the necessary configuration changes."))); - while (RecoveryIsPaused()) + while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { HandleStartupProcInterrupts(); This fix would be required for code added by the following commit. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=15251c0a60be76eedee74ac0e94b433f9acca5af Due to this, the recovery could get paused after the configuration change in the primary. However, after applying this patch, pg_is_wal_replay_paused returns "pause requested" although it should return "paused". To fix this, we must pass RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED. Or, we can call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(), but this seems redundant. (2) - while (RecoveryIsPaused()) + while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { + HandleStartupProcInterrupts(); Though it is trivial, I think the new line after the brace is unnecessary. Regards, Yugo Nagata -- Yugo NAGATA
Re: Columns correlation and adaptive query optimization
cs are created only when the execution time exceeds log_min_duration, if these behaviors are intentional. In addition, additional statistics are created only if #rows is over-estimated and not if it is under-estimated. Although this seems good as a criterion for creating multicolumn statistics, since extended statistics are usually useful for fixing over-estimation, I am not sure that the under-estimation case can be ignored entirely. > > (9) > > Currently, it only consider functional dependencies statistics. Can we also > > consider multivariate MCV list, and is it useful? > > > Right now auto_explain create statistic without explicit specification > of statistic kind. > According to the documentation all supported statistics kinds should be > created in this case: Yes, auto_explain creates all kinds of extended statistics. However, IIUC, the clausesel patch uses only functional dependency statistics for improving joins, so my question was about the possibility of considering MCV in the clausesel patch. > > (10) > > To achieve adaptive query optimization (AQO) in PostgreSQL, this patch > > proposes > > to use auto_explain for getting feedback from actual results. So, could > > auto_explain > > be a infrastructure of AQO in future? Or, do you have any plan or idea to > > make > > built-in infrastructure for AQO? > Sorry, I do not have answer for this question. > I just patched auto_explain extension because it is doing half of the > required work (analyze expensive statements). > It can be certainly moved to separate extension. In this case it will > party duplicate existed functionality and > settings of auto_explain (like statement execution time threshold). I am > not sure that it is good. > But from the other side, this my patch makes auto_explain extension to > do some unexpected work... I think that auto_explain is an extension originally aimed at detecting and logging plans that take a long time, so it doesn't seem so unnatural to me to use it for improving such plans. 
In particular, the feature to find tunable points in executed plans seems useful. > Actually task of adaptive query optimization is much bigger. > We have separate AQO extension which tries to use machine learning to > correctly adjust estimations. > This my patch is much simpler and use existed mechanism (extended > statistics) to improve estimations. Well, this patch provides a kind of AQO as an auto_explain feature, but it is independent of the AQO extension. Is that right? Anyway, I'm interested in the AQO extension, so I'll look into it, too. Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Wed, 27 Jan 2021 13:29:23 +0530 Dilip Kumar wrote: > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > wrote: > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote: > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > wrote: > > > > +1 to just show the recovery pause state in the output of > > > > pg_is_wal_replay_paused. But, should the function name > > > > "pg_is_wal_replay_paused" be something like > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > > > in a function, I expect a boolean output. Others may have better > > > > thoughts. > > > > > > Maybe we should leave the existing function pg_is_wal_replay_paused() > > > alone and add a new one with the name you suggest that returns text. > > > That would create less burden for tool authors. > > > > +1 > > > > Yeah, we can do that, I will send an updated patch soon. This means pg_is_wal_replay_paused is left unchanged and returns whether a pause is requested or not? If so, it seems good to modify the documentation of this function to note that it may not reflect the actual pause state. Regards, Yugo Nagata -- Yugo NAGATA
Re: [PATCH] Add extra statistics to explain for Nested Loop
Max(dst->max_tuples, add->max_tuples); 3) There are garbage lines and I could not apply this patch. diff --git a/src/test/regress/expected/timetz.out b/src/test/regress/expected/timetz.out index 038bb5fa094..5294179aa45 100644 Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Thu, 28 Jan 2021 09:55:42 +0530 Dilip Kumar wrote: > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > Dilip Kumar wrote: > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > wrote: > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > wrote: > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > wrote: > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" > > > > > > > exists > > > > > > > in a function, I expect a boolean output. Others may have better > > > > > > > thoughts. > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > pg_is_wal_replay_paused() > > > > > > alone and add a new one with the name you suggest that returns text. > > > > > > That would create less burden for tool authors. > > > > > > > > > > +1 > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > This means pg_is_wal_replay_paused is left without any change and this > > > returns whether pause is requested or not? If so, it seems good to modify > > > the documentation of this function in order to note that this could not > > > return the actual pause state. > > > > Yes, we can say that it will return true if the replay pause is > > requested. I am changing that in my new patch. 
> > I have modified the patch, changes > > - I have added a new interface pg_get_wal_replay_pause_state to get > the pause request state > - Now, we are not waiting for the recovery to actually get paused so I > think it doesn't make sense to put a lot of checkpoints to check the > pause requested so I have removed that check from the > recoveryApplyDelay but I think it better we still keep that check in > the WaitForWalToBecomeAvailable because it can wait forever before the > next wal get available. I think the check in WaitForWalToBecomeAvailable is basically independent of the feature of pg_get_wal_replay_pause_state, that is, reporting the actual pause state. This function could just return 'pause requested' if a pause is requested while waiting for WAL. However, I agree with the change that allows recovery to transition to 'paused' while waiting for WAL, because 'paused' carries more useful information for users than 'pause requested'. Returning 'paused' lets users know clearly that no more WAL is applied until recovery is resumed. On the other hand, when 'pause requested' is returned, users can't tell from this information whether the next WAL will be applied or not. For the same reason, I think it is also useful to call recoveryPausesHere in recoveryApplyDelay. In addition, in RecoveryRequiresIntParameter, recovery should get paused if a parameter value has a problem. However, pg_get_wal_replay_pause_state will return 'pause requested' in this case. So, I think we should pass RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED, or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(). Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Fri, 29 Jan 2021 16:33:32 +0530 Dilip Kumar wrote: > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > > > On Thu, 28 Jan 2021 09:55:42 +0530 > > Dilip Kumar wrote: > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > > wrote: > > > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > > wrote: > > > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when > > > > > > > > > "is" exists > > > > > > > > > in a function, I expect a boolean output. Others may have > > > > > > > > > better > > > > > > > > > thoughts. > > > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > > pg_is_wal_replay_paused() > > > > > > > > alone and add a new one with the name you suggest that returns > > > > > > > > text. > > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > > > This means pg_is_wal_replay_paused is left without any change and this > > > > > returns whether pause is requested or not? If so, it seems good to > > > > > modify > > > > > the documentation of this function in order to note that this could > > > > > not > > > > > return the actual pause state. > > > > > > > > Yes, we can say that it will return true if the replay pause is > > > > requested. I am changing that in my new patch. 
> > > > > > I have modified the patch, changes > > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > > the pause request state > > > - Now, we are not waiting for the recovery to actually get paused so I > > > think it doesn't make sense to put a lot of checkpoints to check the > > > pause requested so I have removed that check from the > > > recoveryApplyDelay but I think it better we still keep that check in > > > the WaitForWalToBecomeAvailable because it can wait forever before the > > > next wal get available. > > > > I think basically the check in WaitForWalToBecomeAvailable is independent > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the > > actual pause state. This function could just return 'pause requested' > > if a pause is requested during waiting for WAL. > > > > However, I agree the change to allow recovery to transit the state to > > 'paused' during WAL waiting because 'paused' has more useful information > > for users than 'pause requested'. Returning 'paused' lets users know > > clearly that no more WAL are applied until recovery is resumed. On the > > other hand, when 'pause requested' is returned, user can't say whether > > the next WAL wiill be applied or not from this information. > > > > For the same reason, I think it is also useful to call recoveryPausesHere > > in recoveryApplyDelay. > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get > available so it can not be controlled by user so it is good to put a > check for the recovery pause, however recoveryApplyDelay wait for the > apply delay which is configured by user and it is predictable value by > the user. I don't have much objection to putting that check in the > recoveryApplyDelay as well but I feel it is not necessary. Any other > thoughts on this? 
I'm not sure the user can easily figure out that the reason pg_get_wal_replay_pause_state returns 'pause requested' is recovery_min_apply_delay, because that would need knowledge of the internal mechanism of recovery. However, if there are no other opinions on it, I don't mind that recoveryApplyDelay is left as is, because such a check and state transition is independent of the goal of pg_get_wal_replay_pause_state itself, as I mentioned above. Regards, Yugo Nagata -- Yugo NAGATA
Re: [PATCH] Add extra statistics to explain for Nested Loop
On Mon, 1 Feb 2021 13:28:45 +0800 Julien Rouhaud wrote: > On Thu, Jan 28, 2021 at 8:38 PM Yugo NAGATA wrote: > > > > postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j; > > > > QUERY PLAN > > --- > > Nested Loop (cost=0.00..2752.00 rows=991 width=8) (actual > > time=0.021..17.651 rows=991 loops=1) > >Output: a.i, b.j > >Join Filter: (a.i = b.j) > >Rows Removed by Join Filter: 99009 > >-> Seq Scan on public.b (cost=0.00..2.00 rows=100 width=4) (actual > > time=0.009..0.023 rows=100 loops=1) > > Output: b.j > >-> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) (actual > > time=0.005..0.091 min_time=0.065 max_time=0.163 min_rows=1000 rows=1000 > > max_rows=1000 loops=100) > > Output: a.i > > Planning Time: 0.066 ms > > Execution Time: 17.719 ms > > (10 rows) > > > > I don't like this format where the extra statistics appear in the same > > line of existing information because the output format differs depended > > on whether the plan node's loops > 1 or not. This makes the length of a > > line too long. Also, other information reported by VERBOSE doesn't change > > the exiting row format and just add extra rows for new information. > > > > Instead, it seems good for me to add extra rows for the new statistics > > without changint the existing row format as other VERBOSE information, > > like below. > > > >-> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) (actual > > time=0.005..0.091 rows=1000 loops=100) > > Output: a.i > > Min Time: 0.065 ms > > Max Time: 0.163 ms > > Min Rows: 1000 > > Max Rows: 1000 > > > > or, like Buffers, > > > >-> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) (actual > > time=0.005..0.091 rows=1000 loops=100) > > Output: a.i > > Loops: min_time=0.065 max_time=0.163 min_rows=1000 max_rows=1000 > > > > and so on. What do you think about it? > > It's true that the current output is a bit long, which isn't really > convenient to read. 
> Using one of those alternative format would also > have the advantage of not breaking compatibility with tools that > process those entries. I personally prefer the 2nd option with the > extra "Loops:" line. For non text format, should we keep the current > format? For non-text formats, I think "Max/Min Rows" and "Max/Min Times" are a bit too simple and their meaning is unclear. Instead, similar to the style of "Buffers", does it make sense to use "Max/Min Rows in Loops" and "Max/Min Times in Loops"? Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
Hi, On Sun, 7 Feb 2021 19:27:02 +0530 Dilip Kumar wrote: > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > wrote: > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > wrote: > > > > We can not do that, basically, under one lock we need to check the > > > > state and set it to pause. Because by the time you release the lock > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set > > > > it to RECOVERY_PAUSED. > > > > > > Got it. Thanks. > > > > Hi Dilip, I have one more question: > > > > +/* test for recovery pause, if user has requested the pause */ > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > +RECOVERY_PAUSE_REQUESTED) > > +recoveryPausesHere(false); > > + > > +now = GetCurrentTimestamp(); > > + > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > whenever the variable now is used within the for loop in > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > used within case XLOG_FROM_STREAM: > > > > Am I missing something? > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > by mistake. Thanks for observing this. I also have a question: @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue currValue, minValue))); - SetRecoveryPause(true); + SetRecoveryPause(RECOVERY_PAUSED); ereport(LOG, (errmsg("recovery has paused"), errdetail("If recovery is unpaused, the server will shut down."), errhint("You can then restart the server after making the necessary configuration changes."))); - while (RecoveryIsPaused()) + while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { HandleStartupProcInterrupts(); If a user calls pg_wal_replay_pause while waiting in RecoveryRequiresIntParameter, the state becomes 'pause requested' and it never returns to 'paused'. Should we check recoveryPauseState in this loop as in recoveryPausesHere? Regards, Yugo Nagata -- Yugo NAGATA
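The question above — whether the wait loop should promote 'pause requested' to 'paused' — can be sketched with a minimal standalone state machine. This is a hypothetical mock, not the server code: the lower-case names, the global variable, and the missing spinlock and sleep are stand-ins for SetRecoveryPause(), CheckAndSetRecoveryPause(), and the real loop body.

```c
#include <assert.h>

/* Three-state pause flag, mirroring the enum in the quoted patch. */
typedef enum
{
    RECOVERY_NOT_PAUSED,
    RECOVERY_PAUSE_REQUESTED,
    RECOVERY_PAUSED
} RecoveryPauseState;

/* In the server this lives in shared memory (XLogCtl) under a spinlock;
 * a plain global is enough for this sketch. */
static RecoveryPauseState pause_state = RECOVERY_NOT_PAUSED;

/* Stand-in for SetRecoveryPause(). */
static void
set_recovery_pause(RecoveryPauseState state)
{
    pause_state = state;
}

/* The suggested check inside the wait loop: if a user requested a pause
 * while we were already waiting, promote the state to RECOVERY_PAUSED so
 * that pg_get_wal_replay_pause_state would report 'paused'. */
static void
check_and_set_recovery_pause(void)
{
    if (pause_state == RECOVERY_PAUSE_REQUESTED)
        set_recovery_pause(RECOVERY_PAUSED);
}

/* One iteration of a RecoveryRequiresIntParameter-style wait loop. */
static void
wait_loop_iteration(void)
{
    /* HandleStartupProcInterrupts() would run here in the server. */
    check_and_set_recovery_pause();
    /* pg_usleep(...) would run here in the server. */
}
```

With this check in place, a pause request arriving mid-loop is promoted to 'paused' on the next iteration instead of being reported as 'pause requested' forever.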
Re: Is Recovery actually paused?
On Mon, 8 Feb 2021 07:51:22 +0530 Dilip Kumar wrote: > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA wrote: > > > Hi, > > > > On Sun, 7 Feb 2021 19:27:02 +0530 > > Dilip Kumar wrote: > > > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > > > wrote: > > > > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > > > wrote: > > > > > > We can not do that, basically, under one lock we need to check the > > > > > > state and set it to pause. Because by the time you release the > > lock > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to > > set > > > > > > it to RECOVERY_PAUSED. > > > > > > > > > > Got it. Thanks. > > > > > > > > Hi Dilip, I have one more question: > > > > > > > > +/* test for recovery pause, if user has requested the pause */ > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > > > +RECOVERY_PAUSE_REQUESTED) > > > > +recoveryPausesHere(false); > > > > + > > > > +now = GetCurrentTimestamp(); > > > > + > > > > > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > > > whenever the variable now is used within the for loop in > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > > > used within case XLOG_FROM_STREAM: > > > > > > > > Am I missing something? > > > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > > > by mistake. Thanks for observing this. 
> > I also have a question: > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char > > *param_name, int currValue, int minValue > >currValue, > >minValue))); > > > > - SetRecoveryPause(true); > > + SetRecoveryPause(RECOVERY_PAUSED); > > > > ereport(LOG, > > (errmsg("recovery has paused"), > > errdetail("If recovery is > > unpaused, the server will shut down."), > > errhint("You can then restart the > > server after making the necessary configuration changes."))); > > > > - while (RecoveryIsPaused()) > > + while (GetRecoveryPauseState() != > > RECOVERY_NOT_PAUSED) > > { > > HandleStartupProcInterrupts(); > > > > > > > > If a user call pg_wal_replay_pause while waiting in > > RecoveryRequiresIntParameter, > > the state become 'pause requested' and this never returns to 'paused'. > > Should we check recoveryPauseState in this loop as in > > > I think the right fix should be that the state should never go from > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > care of that. It makes sense to take care of this in pg_wal_replay_pause, but I wonder whether it can handle the case where a user resumes and pauses again during a sleep. -- Yugo NAGATA
Re: Is Recovery actually paused?
On Mon, 8 Feb 2021 09:35:00 +0530 Dilip Kumar wrote: > On Mon, Feb 8, 2021 at 8:18 AM Yugo NAGATA wrote: > > > > On Mon, 8 Feb 2021 07:51:22 +0530 > > Dilip Kumar wrote: > > > > > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA wrote: > > > > > > > Hi, > > > > > > > > On Sun, 7 Feb 2021 19:27:02 +0530 > > > > Dilip Kumar wrote: > > > > > > > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > > > > > wrote: > > > > > > > > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > > > > > wrote: > > > > > > > > We can not do that, basically, under one lock we need to check > > > > > > > > the > > > > > > > > state and set it to pause. Because by the time you release the > > > > lock > > > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want > > > > > > > > to > > > > set > > > > > > > > it to RECOVERY_PAUSED. > > > > > > > > > > > > > > Got it. Thanks. > > > > > > > > > > > > Hi Dilip, I have one more question: > > > > > > > > > > > > +/* test for recovery pause, if user has requested the > > > > > > pause */ > > > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState > > > > > > == > > > > > > +RECOVERY_PAUSE_REQUESTED) > > > > > > +recoveryPausesHere(false); > > > > > > + > > > > > > +now = GetCurrentTimestamp(); > > > > > > + > > > > > > > > > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > > > > > whenever the variable now is used within the for loop in > > > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > > > > > used within case XLOG_FROM_STREAM: > > > > > > > > > > > > Am I missing something? > > > > > > > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > > > > > by mistake. Thanks for observing this. 
> > > > > > > > I also have a question: > > > > > > > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char > > > > *param_name, int currValue, int minValue > > > >currValue, > > > >minValue))); > > > > > > > > - SetRecoveryPause(true); > > > > + SetRecoveryPause(RECOVERY_PAUSED); > > > > > > > > ereport(LOG, > > > > (errmsg("recovery has paused"), > > > > errdetail("If recovery is > > > > unpaused, the server will shut down."), > > > > errhint("You can then restart > > > > the > > > > server after making the necessary configuration changes."))); > > > > > > > > - while (RecoveryIsPaused()) > > > > + while (GetRecoveryPauseState() != > > > > RECOVERY_NOT_PAUSED) > > > > { > > > > HandleStartupProcInterrupts(); > > > > > > > > > > > > > > > > If a user call pg_wal_replay_pause while waiting in > > > > RecoveryRequiresIntParameter, > > > > the state become 'pause requested' and this never returns to 'paused'. > > > > Should we check recoveryPauseState in this loop as in > > > > > > > > > I think the right fix should be that the state should never go from > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > > > care of that. > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > it can not handle the case that a user resume and pause again while a sleep. > > Right, we will have to check and set in the loop. But we should not > allow the state to go from paused to pause requested irrespective of > this. I agree with you. -- Yugo NAGATA
Re: Is Recovery actually paused?
On Mon, 08 Feb 2021 17:32:46 +0900 (JST) Kyotaro Horiguchi wrote: > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA wrote in > > > > > I think the right fix should be that the state should never go from > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should > > > > > take > > > > > care of that. > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > > > it can not handle the case that a user resume and pause again while a > > > > sleep. > > > > > > Right, we will have to check and set in the loop. But we should not > > > allow the state to go from paused to pause requested irrespective of > > > this. > > > > I agree with you. > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > immediately change the state to PAUSE always we see REQUESTED in the > waiting loop, despite that we allow change the state from PAUSE to > REQUESTED via NOT_PAUSED between two successive loop condition checks? If a user calls pg_wal_replay_pause while recovery is paused, users can observe 'pause requested' during a sleep, although the time window is short. It seems a bit odd that pg_wal_replay_pause changes the state like this, because this state means that recovery may not be 'paused'. -- Yugo NAGATA
Re: Is Recovery actually paused?
On Tue, 09 Feb 2021 10:58:04 +0900 (JST) Kyotaro Horiguchi wrote: > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar wrote > in > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > Kyotaro Horiguchi wrote: > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > wrote in > > > > > > > > I think the right fix should be that the state should never go > > > > > > > > from > > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause > > > > > > > > should take > > > > > > > > care of that. > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I > > > > > > > wonder > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > while a sleep. > > > > > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > > > allow the state to go from paused to pause requested irrespective of > > > > > > this. > > > > > > > > > > I agree with you. > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > observe 'pause requested' during a sleep alghough the time window is > > > short. > > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > > because This state meeans that recovery may not be 'paused'. > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > requested'. 
the logical state transition should always be as below > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > request and then paused but there is nothing wrong with going to > > paused) > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get > > paused) > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > PAUSE_REQUESTED without going to NOT PAUSED) > > I didn't asked about the internal logical correctness, but asked about > *actual harm* revealed to users. I don't see any actual harm in the > "wrong" transition because: Actually, the incorrect state transition is not so harmful, except that users can observe unnecessary state changes. However, I don't see any actual harm in prohibiting the incorrect state transition either. So, I think we can do it. > If we are going to introduce that complexity, I'd like to re-propose > to introduce interlocking between the recovery side and the > pause-requestor side instead of introducing the intermediate state, > which is the cause of the complexity. > > The attached PoC patch adds: > > - A solid checkpoint just before calling rm_redo. It doesn't add a > info_lck since the check is done in the existing lock section. > > - Interlocking between the above and SetRecoveryPause without adding a > shared variable. > (This is what I called "synchronous" before.) I think waiting in pg_wal_replay_pause is a possible option, but this would also introduce other complexity into the code, such as the possibility of waiting for a long time or forever. For example, waiting in SetRecoveryPause as in your PoC patch appears to make recovery get stuck in RecoveryRequiresIntParameter. By the way, attaching another patch to a thread without the original patch will confuse the commitfest and cfbot apps... Regards, Yugo Nagata -- Yugo NAGATA
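The transition table quoted above can be captured in a few lines of standalone C. This is a hypothetical sketch (not the actual patch): it shows that the only transition to forbid is PAUSED -> PAUSE REQUESTED, and how a pg_wal_replay_pause-style setter would respect that rule.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    RECOVERY_NOT_PAUSED,
    RECOVERY_PAUSE_REQUESTED,
    RECOVERY_PAUSED
} RecoveryPauseState;

/* Encode the transition table: everything is allowed except going from
 * PAUSED back to PAUSE_REQUESTED, which must pass through NOT_PAUSED. */
static bool
is_valid_transition(RecoveryPauseState from, RecoveryPauseState to)
{
    if (from == RECOVERY_PAUSED && to == RECOVERY_PAUSE_REQUESTED)
        return false;
    return true;
}

/* A pg_wal_replay_pause-style setter: if recovery is already paused,
 * keep it paused instead of downgrading to 'pause requested'.  In the
 * server this check-and-set would happen under one spinlock. */
static RecoveryPauseState
request_pause(RecoveryPauseState current)
{
    if (current == RECOVERY_PAUSED)
        return RECOVERY_PAUSED;     /* already paused: leave it alone */
    return RECOVERY_PAUSE_REQUESTED;
}
```

Because the check and the assignment are a single atomic step (under the lock in the real code), a user who resumes and pauses again during a sleep can only ever move NOT_PAUSED -> PAUSE_REQUESTED, never PAUSED -> PAUSE_REQUESTED.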
Re: Commitfest 2021-11 Patch Triage - Part 1
On Wed, 1 Dec 2021 09:14:31 -0300 Marcos Pegoraro wrote: > > > > I think the reason why we can't update a materialized view directly is > > because > > it is basically a "view" and it should not contains any data irrelevant to > > its > > definition and underlying tables. If we would have a feature to update a > > materialized view direcly, maybe, it should behave as updatable-view as > > well > > as normal (virtual) views, although I am not sure > > > > Well, I didn´t find any place where is detailed why those tables are not > updatable. > And would be fine to be updated through triggers or cron jobs until IVM is > available. > CheckValidRowMarkRel just gives an exception "cannot lock rows in > materialized view ...", but why ? > What are the differences between Materialized Views and tables ? I think the difference is that a materialized view is tied to its definition query and is expected to have contents consistent with it, at least at some point in time. In order to allow triggers to update materialized views, we'd need to make OpenMatViewIncrementalMaintenance and CloseMatViewIncrementalMaintenance public, since these are static functions in matview.c. However, there is a concern that this would make the contents of a materialized view completely unreliable [1]. Therefore, if we do it, we would need privilege management in some way. [1] https://www.postgresql.org/message-id/flat/CACjxUsP8J6bA4RKxbmwujTVMwMZrgR3AZ7yP5F2XkB-f9w7K7Q%40mail.gmail.com#efbee336d7651ce39bc5ff9574f92002 Regards, Yugo Nagata -- Yugo NAGATA
Allow DELETE to use ORDER BY and LIMIT/OFFSET
Hello hackers, We cannot use ORDER BY or LIMIT/OFFSET in the current DELETE statement syntax, so all the rows matching the WHERE condition are deleted. However, the tuple retrieving process of DELETE is basically the same as that of a SELECT statement, so I think that we can also allow DELETE to use ORDER BY and LIMIT/OFFSET. Attached is the concept patch. This enables the following operations: postgres=# select * from t order by i; i 1 2 2 2 2 5 10 20 33 35 53 (11 rows) postgres=# delete from t where i = 2 limit 2; DELETE 2 postgres=# select * from t order by i; i 1 2 2 5 10 20 33 35 53 (9 rows) postgres=# delete from t order by i offset 3 limit 3; DELETE 3 postgres=# select * from t order by i; i 1 2 2 33 35 53 (6 rows) Although we can do similar operations using ctid and a subquery such as DELETE FROM t WHERE ctid IN (SELECT ctid FROM t WHERE ... ORDER BY ... LIMIT ...), it is more user friendly and intuitive to allow it in the DELETE syntax, because ctid is a system column and most users may not be familiar with it. Although this is not allowed in the SQL standard, it is supported in MySQL [1]. DB2 also supports it, although the syntax is somewhat strange [2]. Also, there seem to be some use cases. For example, - when you want to delete a specified number of rows from a table that doesn't have a primary key and contains duplicated tuples. - when you want to delete the bottom 10 items with bad scores (without using the rank() window function). - when you want to delete only some of the rows because it takes time to delete all of them. [1] https://dev.mysql.com/doc/refman/8.0/en/delete.html [2] https://www.dba-db2.com/2015/04/delete-first-1000-rows-in-a-db2-table-using-fetch-first.html What do you think about it? 
-- Yugo NAGATA diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 297b6ee715..c596bd80ea 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -3241,6 +3241,11 @@ _copyDeleteStmt(const DeleteStmt *from) COPY_NODE_FIELD(returningList); COPY_NODE_FIELD(withClause); + COPY_NODE_FIELD(sortClause); + COPY_NODE_FIELD(limitOffset); + COPY_NODE_FIELD(limitCount); + COPY_SCALAR_FIELD(limitOption); + return newnode; } diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index f537d3eb96..84f509b57b 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -1050,6 +1050,11 @@ _equalDeleteStmt(const DeleteStmt *a, const DeleteStmt *b) COMPARE_NODE_FIELD(returningList); COMPARE_NODE_FIELD(withClause); + COMPARE_NODE_FIELD(sortClause); + COMPARE_NODE_FIELD(limitOffset); + COMPARE_NODE_FIELD(limitCount); + COMPARE_SCALAR_FIELD(limitOption); + return true; } diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index e276264882..8a201749bf 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -3668,6 +3668,12 @@ raw_expression_tree_walker(Node *node, return true; if (walker(stmt->withClause, context)) return true; +if (walker(stmt->sortClause, context)) + return true; +if (walker(stmt->limitOffset, context)) + return true; +if (walker(stmt->limitCount, context)) + return true; } break; case T_UpdateStmt: diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 146ee8dd1e..015e879f7a 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -471,6 +471,12 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt) qual = transformWhereClause(pstate, stmt->whereClause, EXPR_KIND_WHERE, "WHERE"); + qry->sortClause = transformSortClause(pstate, + stmt->sortClause, + &qry->targetList, + EXPR_KIND_ORDER_BY, + false /* allow SQL92 rules */ ); + qry->returningList = 
transformReturningList(pstate, stmt->returningList); /* done building the range table and jointree */ @@ -482,6 +488,15 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt) qry->hasTargetSRFs = pstate->p_hasTargetSRFs; qry->hasAggs = pstate->p_hasAggs; + /* transform LIMIT */ + qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset, + EXPR_KIND_OFFSET, "OFFSET", + stmt->limitOption); + qry->limitCount = transformLimitClause(pstate, stmt->limitCount, + EXPR_KIND_LIMIT, "LIMIT", + stmt->limitOption); + qry->limitOption = stmt->limitOption; + assign_query_collations(pstate, qry); /* this must be done after collations, for reliable comparison of exprs */ diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 86ce33bd97..0c6d11c23c 100644
Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET
On Fri, 17 Dec 2021 09:47:18 +0900 Yugo NAGATA wrote: > Hello hackers, > > We cannot use ORDER BY or LIMIT/OFFSET in the current > DELETE statement syntax, so all the row matching the > WHERE condition are deleted. However, the tuple retrieving > process of DELETE is basically same as SELECT statement, > so I think that we can also allow DELETE to use ORDER BY > and LIMIT/OFFSET. > > Attached is the concept patch. This enables the following > operations: After post this, I noticed that there are several similar proposals in past: https://www.postgresql.org/message-id/flat/AANLkTi%3D6fBZh9yZT7f7kKh%2BzmQngAyHgZWBPM3eiEMj1%40mail.gmail.com https://www.postgresql.org/message-id/flat/1393112801.59251.YahooMailNeo%40web163006.mail.bf1.yahoo.com https://www.postgresql.org/message-id/flat/CADB9FDf-Vh6RnKAMZ4Rrg_YP9p3THdPbji8qe4qkxRuiOwm%3Dmg%40mail.gmail.com https://www.postgresql.org/message-id/flat/CALAY4q9fcrscybax7fg_uojFwjw_Wg0UMuSrf-FvN68SeSAPAA%40mail.gmail.com Anyway, I'll review these threads before progressing it. > > > postgres=# select * from t order by i; > i > > 1 > 2 > 2 > 2 > 2 > 5 > 10 > 20 > 33 > 35 > 53 > (11 rows) > > postgres=# delete from t where i = 2 limit 2; > DELETE 2 > postgres=# select * from t order by i; > i > > 1 > 2 > 2 > 5 > 10 > 20 > 33 > 35 > 53 > (9 rows) > > postgres=# delete from t order by i offset 3 limit 3; > DELETE 3 > postgres=# select * from t order by i; > i > > 1 > 2 > 2 > 33 > 35 > 53 > (6 rows) > > > Although we can do the similar operations using ctid and a subquery > such as > > DELETE FROM t WHERE ctid IN (SELECT ctid FROM t WHERE ... ORDER BY ... LIMIT > ...), > > it is more user friendly and intuitive to allow it in the DELETE syntax > because ctid is a system column and most users may not be familiar with it. > > Although this is not allowed in the SQL standard, it is supported > in MySQL[1]. DB2 also supports it although the syntax is somewhat > strange.[2] > > Also, here seem to be some use cases. 
For example, > - when you want to delete the specified number of rows from a table > that doesn't have a primary key and contains tuple duplicated. > - when you want to delete the bottom 10 items with bad scores > (without using rank() window function). > - when you want to delete only some of rows because it takes time > to delete all of them. > > [1] https://dev.mysql.com/doc/refman/8.0/en/delete.html > [2] > https://www.dba-db2.com/2015/04/delete-first-1000-rows-in-a-db2-table-using-fetch-first.html > > How do you think it? > > -- > Yugo NAGATA -- Yugo NAGATA
Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET
On Thu, 16 Dec 2021 22:17:58 -0500 Tom Lane wrote: > Yugo NAGATA writes: > > We cannot use ORDER BY or LIMIT/OFFSET in the current > > DELETE statement syntax, so all the row matching the > > WHERE condition are deleted. However, the tuple retrieving > > process of DELETE is basically same as SELECT statement, > > so I think that we can also allow DELETE to use ORDER BY > > and LIMIT/OFFSET. > > Indeed, this is technically possible, but we've rejected the idea > before and I'm not aware of any reason to change our minds. > The problem is that a partial DELETE is not very deterministic > about which rows are deleted, and that does not seem like a > great property for a data-updating command. (The same applies > to UPDATE, which is why we don't allow these options in that > command either.) The core issues are: > > * If the sort order is underspecified, or you omit ORDER BY > entirely, then it's not clear which rows will be operated on. > The LIMIT might stop after just some of the rows in a peer > group, and you can't predict which ones. > > * UPDATE/DELETE necessarily involve the equivalent of SELECT > FOR UPDATE, which may cause the rows to be ordered more > surprisingly than you expected, ie the sort happens *before* > rows are replaced by their latest versions, which might have > different sort keys. > > We live with this amount of indeterminism in SELECT, but that > doesn't make it a brilliant idea to allow it in UPDATE/DELETE. Thank you for explaining it! Now I understand why this idea is not good and has been rejected. Regards, Yugo Nagata -- Yugo NAGATA
Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET
On Thu, 16 Dec 2021 20:56:59 -0700 "David G. Johnston" wrote: > On Thursday, December 16, 2021, Yugo NAGATA wrote: > > > > > > Also, here seem to be some use cases. For example, > > - when you want to delete the specified number of rows from a table > > that doesn't have a primary key and contains tuple duplicated. > > > Not our problem…use the tools correctly; there is always the hack > work-around for the few who didn’t. > > > > - when you want to delete the bottom 10 items with bad scores > > (without using rank() window function). > > > This one doesn’t make sense to me. > > - when you want to delete only some of rows because it takes time > > to delete all of them. > > > > > This seems potentially compelling though I’d be more concerned about the > memory aspects than simply taking a long amount of time. If this is a > problem then maybe discuss it without having a solution-in-hand? But given > the intense I/O cost that would happen spreading this out over time seems > acceptable and it should be an infrequent thing to do. Expecting users to > plan and execute some custom code for their specific need seems reasonable. > > So even if Tom’s technical concerns aren’t enough working on this based > upon these uses cases doesn’t seem of high enough benefit. Thank you for your comments. OK. I agree that the use cases are not strong enough. Regards, Yugo Nagata -- Yugo NAGATA
Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET
Hello Greg, On Fri, 17 Dec 2021 01:40:45 -0500 Greg Stark wrote: > On Thu, 16 Dec 2021 at 22:18, Tom Lane wrote: > > > > * If the sort order is underspecified, or you omit ORDER BY > > entirely, then it's not clear which rows will be operated on. > > The LIMIT might stop after just some of the rows in a peer > > group, and you can't predict which ones. > > Meh, that never seemed very compelling to me. I think that's on the > user and there are *lots* of cases where the user can easily know > enough extra context to know that's what she wants. In particular I've > often wanted to delete one of two identical records and it would have > been really easy to just do a DELETE LIMIT 1. I know there are ways to > do it but it's always seemed like unnecessary hassle when there was a > natural way to express it. Out of curiosity, could you please tell me the concrete situations where you wanted to delete one of two identical records? > > * UPDATE/DELETE necessarily involve the equivalent of SELECT > > FOR UPDATE, which may cause the rows to be ordered more > > surprisingly than you expected, ie the sort happens *before* > > rows are replaced by their latest versions, which might have > > different sort keys. > > This... is a real issue. Or is it? Just to be clear I think what > you're referring to is a case like: > > INSERT INTO t values (1),(2) > > In another session: BEGIN; UPDATE t set c=0 where c=2 > > DELETE FROM t ORDER BY c ASC LIMIT 1 > > > In other session: COMMIT > > Which row was deleted? In this case it was the row where c=1. Even > though the UPDATE reported success (i.e. 1 row updated) so presumably > it happened "before" the delete. The delete saw the ordering from > before it was blocked and saw the ordering with c=1, c=2 so apparently > it happened "before" the update. When I tried it using my patch, the DELETE deleted the row where c=1, the same as above, but it did not block. Is that the result of an experiment using my patch, or of another RDBMS such as MySQL? 
> There are plenty of other ways to see the same surprising > serialization failure. If instead of a DELETE you did an UPDATE there > are any number of functions or other features that have been added > which can expose the order in which the updates happened. > > The way to solve those cases would be to use serializable isolation > (or even just repeatable read in this case). Wouldn't that also solve > the DELETE serialization failure too? Do you mean that such serialization failures would be avoided in SERIALIZABLE or REPEATABLE READ by aborting the transaction, while they may be accepted in READ COMMITTED? Regards, Yugo Nagata -- Yugo NAGATA
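To make the isolation-level question concrete, REPEATABLE READ already reports a serialization failure when a DELETE targets a row that a concurrent transaction updated after the snapshot was taken; a sketch with a hypothetical table t(c int):

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t;           -- takes the transaction snapshot

-- Session 2 (autocommit)
UPDATE t SET c = 0 WHERE c = 2;   -- commits immediately

-- Session 1
DELETE FROM t WHERE c = 2;        -- row changed since our snapshot:
-- ERROR:  could not serialize access due to concurrent update
```

In READ COMMITTED, the same DELETE would instead re-evaluate the updated row via EvalPlanQual (and here skip it, since c is no longer 2) and proceed without an error.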
Re: Columns correlation and adaptive query optimization
On Wed, 10 Mar 2021 03:00:25 +0100 Tomas Vondra wrote: > What is being proposed here - an extension suggesting which statistics > to create (and possibly creating them automatically) is certainly > useful, but I'm not sure I'd call it "adaptive query optimization". I > think "adaptive" means the extension directly modifies the estimates > based on past executions. So I propose calling it maybe "statistics > advisor" or something like that. I also agree with the idea of implementing this feature as a new statistics advisor extension. > BTW Why is "qual" in > > static void > AddMultiColumnStatisticsForQual(void* qual, ExplainState *es) > > declared as "void *"? Shouldn't that be "List *"? When I tested this extension using TPC-H queries, it raised a segmentation fault in this function. I suspect the cause is around this argument. Regards, Yugo Nagata -- Yugo NAGATA
Re: Columns correlation and adaptive query optimization
On Fri, 19 Mar 2021 19:58:27 +0300 Konstantin Knizhnik wrote: > > > On 19.03.2021 12:17, Yugo NAGATA wrote: > > On Wed, 10 Mar 2021 03:00:25 +0100 > > Tomas Vondra wrote: > > > >> What is being proposed here - an extension suggesting which statistics > >> to create (and possibly creating them automatically) is certainly > >> useful, but I'm not sure I'd call it "adaptive query optimization". I > >> think "adaptive" means the extension directly modifies the estimates > >> based on past executions. So I propose calling it maybe "statistics > >> advisor" or something like that. > > I am also agree with the idea to implement this feature as a new > > extension for statistics advisor. > > > >> BTW Why is "qual" in > >> > >>static void > >>AddMultiColumnStatisticsForQual(void* qual, ExplainState *es) > >> > >> declared as "void *"? Shouldn't that be "List *"? > > When I tested this extension using TPC-H queries, it raised segmentation > > fault in this function. I think the cause would be around this argument. > > > > Regards, > > Yugo Nagata > > > Attached please find new version of the patch with > AddMultiColumnStatisticsForQual parameter type fix and one more fix > related with handling synthetic attributes. > I can not reproduce the crash on TPC-H queries, so if the problem > persists, can you please send me stack trace and may be some other > information helping to understand the reason of SIGSEGV? I also could not reproduce the segfault. I don't know why I observed it; it may be because I made a mistake during installation. Sorry for the noise. Instead, I observed "ERROR: cache lookup failed for attribute 6 of relation " with the v8 patch, but this was fixed in the v9 patch. Regards, Yugo Nagata -- Yugo NAGATA
Re: Columns correlation and adaptive query optimization
Hello Konstantin, I tested this patch as a statistics advisor using TPC-H queries. The parameters used are: auto_explain.add_statistics_suggest_only = on auto_explain.add_statistics_threshold = 0.1 auto_explain.log_analyze = on auto_explain.log_min_duration = 0 auto_explain suggested creating a few extended statistics for some queries, but I could not find a performance improvement with the "join selectivity estimation using extended statistics" patch. I think this is because the TPC-H dataset has no correlations that the extended statistics can help with, though. During this test, I came up with some additional comments. 1) As seen in the TPC-H test, suggested extended statistics may not be useful for improving performance. Therefore, to decide whether to adopt them or not, it would be useful if we could get information about "why we require it" or "why this is suggested" as DETAIL or HINT. For example, we may show a line or snippet of the EXPLAIN result as the reason for the suggestion. 2) For Q21 of TPC-H, the following extended statistics were suggested. NOTICE: Auto_explain suggestion: CREATE STATISTICS lineitem_l_commitdate_l_receiptdate ON l_commitdate, l_receiptdate FROM lineitem NOTICE: Auto_explain suggestion: CREATE STATISTICS lineitem_l_commitdate_l_receiptdate_l_suppkey ON l_commitdate, l_receiptdate, l_suppkey FROM lineitem The latter's target columns include the former's, so I am not sure we need both of them. (Which one to adopt may be up to the administrator, though.) 3) For Q22 of TPC-H, the same extended statistics were suggested twice. 
NOTICE: Auto_explain suggestion: CREATE STATISTICS customer_c_acctbal_c_phone ON c_acctbal, c_phone FROM customer NOTICE: Auto_explain suggestion: CREATE STATISTICS customer_c_acctbal_c_phone ON c_acctbal, c_phone FROM customer So, when we set add_statistics_suggest_only to off, we get the following error: ERROR: duplicate key value violates unique constraint "pg_statistic_ext_name_index" DETAIL: Key (stxname, stxnamespace)=(customer_c_acctbal_c_phone, 2200) already exists. Regards, Yugo Nagata On Sat, 20 Mar 2021 12:41:44 +0300 Konstantin Knizhnik wrote: > > > On 19.03.2021 20:32, Zhihong Yu wrote: > > Hi, > > In AddMultiColumnStatisticsForQual(), > > > > + /* Loop until we considered all vars */ > > + while (vars != NULL) > > ... > > + /* Contruct list of unique vars */ > > + foreach (cell, vars) > > > > What if some cell / node, gets into the else block: > > > > + else > > + { > > + continue; > > > > and being left in vars. Is there a chance for infinite loop ? > > It seems there should be a bool variable indicating whether any cell > > gets to the following: > > > > + vars = foreach_delete_current(vars, cell); > > > > If no cell gets removed in the current iteration, the outer while loop > > should exit. > > Each iteration of outer loop (while (vars != NULL)) > process variables belonging to one relation. > We take "else continue" branch only if variable belongs to some other > relation. > At first iteration of foreach (cell, vars) > variable "cols" is NULL and we always take first branch of the if. > In other words, at each iteration of outer loop we always make some > progress in processing "vars" list and remove some elements > from this list. So infinite loop can never happen. 
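One way to keep the duplicate suggestion from failing when add_statistics_suggest_only is off would be to emit the command with IF NOT EXISTS, which CREATE STATISTICS already supports — a sketch of a possible fix, not what the patch currently does:

```sql
CREATE STATISTICS IF NOT EXISTS customer_c_acctbal_c_phone
    ON c_acctbal, c_phone FROM customer;
-- A second execution raises a NOTICE instead of the
-- unique-constraint violation on pg_statistic_ext_name_index.
```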
> > > > > Cheers > > > > On Fri, Mar 19, 2021 at 9:58 AM Konstantin Knizhnik > > mailto:k.knizh...@postgrespro.ru>> wrote: > > > > > > > > On 19.03.2021 12:17, Yugo NAGATA wrote: > > > On Wed, 10 Mar 2021 03:00:25 +0100 > > > Tomas Vondra > <mailto:tomas.von...@enterprisedb.com>> wrote: > > > > > >> What is being proposed here - an extension suggesting which > > statistics > > >> to create (and possibly creating them automatically) is certainly > > >> useful, but I'm not sure I'd call it "adaptive query > > optimization". I > > >> think "adaptive" means the extension directly modifies the > > estimates > > >> based on past executions. So I propose calling it maybe "statistics > > >> advisor" or something like that. > > > I am also agree with the idea to implement this feature as a new > > > extension for statistics advisor. > > > > > >> BTW Why is "qual" in > > >> > > >> static void > > >> AddMultiColumnStatist
Re: Implementing Incremental View Maintenance
Hi, I rebased the patch because the cfbot failed. Regards, Yugo Nagata On Tue, 9 Mar 2021 17:27:50 +0900 Yugo NAGATA wrote: > On Tue, 9 Mar 2021 09:20:49 +0900 > Yugo NAGATA wrote: > > > On Mon, 8 Mar 2021 15:42:00 -0500 > > Andrew Dunstan wrote: > > > > > > > > On 2/18/21 9:01 PM, Yugo NAGATA wrote: > > > > On Thu, 18 Feb 2021 19:38:44 +0800 > > > > Andy Fan wrote: > > > > > > > >> On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA > > > >> wrote: > > > >> > > > >>> Hi, > > > >>> > > > >>> Attached is a rebased patch (v22a). > > > >>> > > > >> Thanks for the patch. Will you think posting a patch with the latest > > > >> commit > > > >> at that > > > >> time is helpful? If so, when others want to review it, they know which > > > >> commit to > > > >> apply the patch without asking for a new rebase usually. > > > > I rebased the patch because cfbot failed. > > > > http://cfbot.cputube.org/ > > > > > > > > > > It's bitrotted a bit more dues to commits bb437f995d and 25936fd46c > > > > Thank you for letting me konw. I'll rebase it soon. > > Done. Attached is a rebased patch set. > > Regards, > Yugo Nagata > > -- > Yugo NAGATA -- Yugo NAGATA IVM_patches_v22c.tar.gz Description: application/gzip
Re: Implementing Incremental View Maintenance
Hi, Attached is the rebased patch (v16) to add support for Incremental Materialized View Maintenance (IVM). It can be applied to the current master branch. This also includes the following small fixes: - Add a query check for expressions containing aggregates - [doc] Add a description of the situations in which IVM is or is not effective - Improve hints in log messages - Reorganize include directives in the code Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v16.tar.gz Description: application/gzip
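To illustrate the syntax this patch set adds, a sketch with made-up table and view names:

```sql
-- A hypothetical base table and an incrementally maintained aggregate view
CREATE TABLE sales (product_id int, amount numeric);

CREATE INCREMENTAL MATERIALIZED VIEW sales_total AS
    SELECT product_id, sum(amount) AS total
    FROM sales
    GROUP BY product_id;

-- With IVM, the view is maintained immediately by AFTER triggers on the
-- base table; no REFRESH MATERIALIZED VIEW is needed for this row to
-- appear in sales_total.
INSERT INTO sales VALUES (1, 100);
```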
Re: Implementing Incremental View Maintenance
On Wed, 19 Aug 2020 10:02:42 +0900 (JST) Tatsuo Ishii wrote: > I have looked into this. Thank you for your review! > - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: > This one needs a comment to describe what the function does etc. > > +void > +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > +{ I added a comment for this function and related places. +/* + * SetTransitionTablePreserved + * + * Prolong lifespan of transition tables corresponding specified relid and + * command type to the end of the outmost query instead of each nested query. + * This enables to use nested AFTER trigger's transition tables from outer + * query's triggers. Currently, only immediate incremental view maintenance + * uses this. + */ +void +SetTransitionTablePreserved(Oid relid, CmdType cmdType) Also, I removed related unnecessary code which was left in accidentally. > - 0007-Add-aggregates-support-in-IVM.patch > "Check if the given aggregate function is supporting" shouldn't be > "Check if the given aggregate function is supporting IVM"? Yes, you are right. I fixed this, too. > > + * check_aggregate_supports_ivm > + * > + * Check if the given aggregate function is supporting Regards, Yugo Nagata -- Yugo NAGATA IVM_patches_v17.tar.gz Description: application/gzip
Re: Implementing Incremental View Maintenance
Hi, I updated the wiki page. https://wiki.postgresql.org/wiki/Incremental_View_Maintenance On Fri, 21 Aug 2020 21:40:50 +0900 (JST) Tatsuo Ishii wrote: > From: Yugo NAGATA > Subject: Re: Implementing Incremental View Maintenance > Date: Fri, 21 Aug 2020 17:23:20 +0900 > Message-ID: <20200821172320.a2506577d5244b6066f69...@sraoss.co.jp> > > > On Wed, 19 Aug 2020 10:02:42 +0900 (JST) > > Tatsuo Ishii wrote: > > > >> I have looked into this. > > > > Thank you for your reviewing! > > > >> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch: > >> This one needs a comment to describe what the function does etc. > >> > >> +void > >> +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > >> +{ > > > > I added a comment for this function and related places. > > > > +/* > > + * SetTransitionTablePreserved > > + * > > + * Prolong lifespan of transition tables corresponding specified relid and > > + * command type to the end of the outmost query instead of each nested > > query. > > + * This enables to use nested AFTER trigger's transition tables from outer > > + * query's triggers. Currently, only immediate incremental view > > maintenance > > + * uses this. > > + */ > > +void > > +SetTransitionTablePreserved(Oid relid, CmdType cmdType) > > > > Also, I removed releted unnecessary code which was left accidentally. > > > > > >> - 0007-Add-aggregates-support-in-IVM.patch > >> "Check if the given aggregate function is supporting" shouldn't be > >> "Check if the given aggregate function is supporting IVM"? > > > > Yes, you are right. I fixed this, too. > > > >> > >> + * check_aggregate_supports_ivm > >> + * > >> + * Check if the given aggregate function is supporting > > Thanks for the fixes. I have changed the commit fest status to "Ready > for Committer". > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp > > -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
Hi Thomas, Thank you for your comment! On Sat, 5 Sep 2020 17:56:18 +1200 Thomas Munro wrote: > + /* > +* Wait for concurrent transactions which update this materialized view at > +* READ COMMITED. This is needed to see changes committed in other > +* transactions. No wait and raise an error at REPEATABLE READ or > +* SERIALIZABLE to prevent update anomalies of matviews. > +* XXX: dead-lock is possible here. > +*/ > + if (!IsolationUsesXactSnapshot()) > + LockRelationOid(matviewOid, ExclusiveLock); > + else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock)) > > Could you please say a bit more about your plans for concurrency control? > > Simple hand-crafted "rollup" triggers typically conflict only when > modifying the same output rows due to update/insert conflicts, or > perhaps some explicit row level locking if they're doing something > complex (unfortunately, they also very often have concurrency > bugs...). In some initial reading about MV maintenance I did today in > the hope of understanding some more context for this very impressive > but rather intimidating patch set, I gained the impression that > aggregate-row locking granularity is assumed as a baseline for eager > incremental aggregate maintenance. I understand that our > MVCC/snapshot scheme introduces extra problems, but I'm wondering if > these problems can be solved using the usual update semantics (the > EvalPlanQual mechanism), and perhaps also some UPSERT logic. Why is > it not sufficient to have locked all the base table rows that you have > modified, captured the before-and-after values generated by those > updates, and also locked all the IMV aggregate rows you will read, and > in the process acquired a view of the latest committed state of the > IMV aggregate rows you will modify (possibly having waited first)? In > other words, what other data do you look at, while computing the > incremental update, that might suffer from anomalies because of > snapshots and concurrency? 
For one thing, I am aware that unique > indexes for groups would probably be necessary; perhaps some subtle > problems of the sort usually solved with predicate locks lurk there? I decided to lock a matview with views joining tables in mind. For example, let V = R*S be an incrementally maintainable materialized view which joins tables R and S. Suppose there are two concurrent transactions: T1 which changes table R to R' and T2 which changes S to S'. Without any lock, in READ COMMITTED mode, V would be updated to represent V=R'*S in T1 and V=R*S' in T2, which would cause an inconsistency. By locking the view V, transactions T1 and T2 are processed serially and this inconsistency can be avoided. I also thought it might be resolved using tuple locks and EvalPlanQual instead of a table-level lock, but there is still an unavoidable case. For example, suppose that tuple dR is inserted into R in T1, and dS is inserted into S in T2. Also, suppose that dR and dS will be joined according to the view definition. In this situation, without any lock, the change of V is computed as dV=dR*S in T1 and dV=R*dS in T2, respectively, and dR*dS would not be included in the results. This causes an inconsistency. I don't think this could be resolved even if we used tuple locks. As for an aggregate view without joins, however, we might be able to use a finer-grained lock as you said, because if rows belonging to a group in a table are changed, we just update (or delete) the corresponding rows in the view. Even if there are concurrent transactions updating the same table, we would be able to make one of them wait using a tuple lock. If concurrent transactions try to insert a tuple into the same table, we might need to use a unique index and UPSERT to avoid inserting multiple rows with the same group key into the view. 
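The dR*dS case above can be sketched as a concrete interleaving (hypothetical tables; suppose the IMMV is defined as SELECT * FROM R JOIN S USING (k), with R and S initially empty):

```sql
-- T1                                   -- T2
BEGIN;                                  BEGIN;
INSERT INTO R VALUES (1, 'r');
                                        INSERT INTO S VALUES (1, 's');
-- T1 computes dV = dR * S; S is        -- T2 computes dV = R * dS; R is
-- empty in T1's snapshot, so dV = {}.  -- empty in T2's snapshot, so dV = {}.
COMMIT;                                 COMMIT;

-- After both commits, R JOIN S contains the row with k = 1, but neither
-- transaction added it to the view: without the table-level lock on the
-- view, it is silently stale.
```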
Therefore, the usual update semantics (tuple locks and EvalPlanQual) and UPSERT can be used as an optimization for some classes of views, but we don't have any better idea than using a table lock for views joining tables. We would appreciate it if you could suggest a better solution. > (Newer papers describe locking schemes that avoid even aggregate-row > level conflicts, by taking advantage of the associativity and > commutativity of aggregates like SUM and COUNT. You can allow N > writers to update the aggregate concurrently, and if any transaction > has to roll back it subtracts what it added, not necessarily restoring > the original value, so that nobody conflicts with anyone else, or > something like that... Contemplating an MVCC, no-rollbacks version of > that sort of thing leads to ideas like, I dunno, update chains > containing differential update trees to be compacted later... egad!) I am interested in the papers you mentioned! Are they from the literature on incremental view maintenance? Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Wed, 9 Sep 2020 14:22:28 +1200 Thomas Munro wrote: > > Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT > > can be used for optimization for some classes of view, but we don't have any > > other better idea than using table lock for views joining tables. We would > > appreciate it if you could suggest better solution. > > I have nothing, I'm just reading starter papers and trying to learn a > bit more about the concepts at this stage. I was thinking of > reviewing some of the more mechanical parts of the patch set, though, > like perhaps the transition table lifetime management, since I have > worked on that area before. Thank you for your interest. It would be greatly appreciated if you could review the patch. > > > (Newer papers describe locking schemes that avoid even aggregate-row > > > level conflicts, by taking advantage of the associativity and > > > commutativity of aggregates like SUM and COUNT. You can allow N > > > writers to update the aggregate concurrently, and if any transaction > > > has to roll back it subtracts what it added, not necessarily restoring > > > the original value, so that nobody conflicts with anyone else, or > > > something like that... Contemplating an MVCC, no-rollbacks version of > > > that sort of thing leads to ideas like, I dunno, update chains > > > containing differential update trees to be compacted later... egad!) > > > > I am interested in papers you mentioned! Are they literatures in context of > > incremental view maintenance? > > Yeah. I was skim-reading some parts of [1] including section 2.5.1 > "Concurrency Control", which opens with some comments about > aggregates, locking and pointers to "V-locking" [2] for high > concurrency aggregates. There is also a pointer to G. Graefe and M. > J. 
Zwilling, "Transaction support for indexed views," which I haven't > located; apparently indexed views are Graefe's name for MVs, and > apparently this paper has a section on MVCC systems which sounds > interesting for us. > > [1] https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf > [2] http://pages.cs.wisc.edu/~gangluo/latch.pdf Thanks for the information! I will also check references regarding IVM and concurrency control. Regards, Yugo Nagata -- Yugo NAGATA
Re: pgbench bug candidate: negative "initial connection time"
On Tue, 2 Nov 2021 23:11:39 +0900 Fujii Masao wrote: > > > On 2021/11/01 23:01, Fujii Masao wrote: > >> The remainings are the changes of handling of initial connection or > >> logfile open failures. I agree to push them at least for the master. > >> But I'm not sure if they should be back-patched. Without these changes, > >> even when those failures happen, pgbench proceeds the benchmark and > >> reports the result. But with the changes, pgbench exits immediately in > >> that case. I'm not sure if there are people who expect this behavior, > >> but if there are, maybe we should not change it at least at stable > >> branches. > >> Thought? > > > > The current behavior should be improved, but not a bug. > > So I don't think that the patch needs to be back-patched. > > Barring any objection, I will push the attached patch to the master. > > Pushed. Thanks! Thanks! > > I also pushed the typo-fix patch that I proposed upthread. > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION > > -- Yugo NAGATA
Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error
Hello Daniel Gustafsson, On Mon, 15 Nov 2021 14:13:32 +0100 Daniel Gustafsson wrote: > > On 21 Jul 2021, at 03:49, Yugo NAGATA wrote: > > > I attached the updated patch v2, which includes a comment fix and a TAP > > test. > > This patch fails the TAP test for pgbench: Thank you for pointing it out! I attached the updated patch. Regards, Yugo Nagata > # Tests were run but no plan was declared and done_testing() was not seen. > # Looks like your test exited with 25 just after 224. > t/001_pgbench_with_server.pl .. > Dubious, test returned 25 (wstat 6400, 0x1900) > All 224 subtests passed > t/002_pgbench_no_server.pl ok > Test Summary Report > --- > t/001_pgbench_with_server.pl (Wstat: 6400 Tests: 224 Failed: 0) > Non-zero exit status: 25 > Parse errors: No plan found in TAP output > Files=2, Tests=426, 3 wallclock secs ( 0.04 usr 0.00 sys + 1.20 cusr 0.36 > csys = 1.60 CPU) > Result: FAIL > > -- > Daniel Gustafsson https://vmware.com/ > -- Yugo NAGATA diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index c12b6f0615..c7c3963600 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -2859,6 +2859,30 @@ chooseScript(TState *thread) return i - 1; } +/* Prepare SQL commands in the chosen script */ +static void +prepareCommands(CState *st) +{ + int j; + Command **commands = sql_script[st->use_file].commands; + + for (j = 0; commands[j] != NULL; j++) + { + PGresult *res; + char name[MAX_PREPARE_NAME]; + + if (commands[j]->type != SQL_COMMAND) + continue; + preparedStatementName(name, st->use_file, j); + res = PQprepare(st->con, name, + commands[j]->argv[0], commands[j]->argc - 1, NULL); + if (PQresultStatus(res) != PGRES_COMMAND_OK) + pg_log_error("%s", PQerrorMessage(st->con)); + PQclear(res); + } + st->prepared[st->use_file] = true; +} + /* Send a SQL command, using the chosen querymode */ static bool sendCommand(CState *st, Command *command) @@ -2892,42 +2916,6 @@ sendCommand(CState *st, Command *command) char name[MAX_PREPARE_NAME]; 
const char *params[MAX_ARGS]; - if (!st->prepared[st->use_file]) - { - int j; - Command **commands = sql_script[st->use_file].commands; - - for (j = 0; commands[j] != NULL; j++) - { -PGresult *res; -char name[MAX_PREPARE_NAME]; - -if (commands[j]->type != SQL_COMMAND) - continue; -preparedStatementName(name, st->use_file, j); -if (PQpipelineStatus(st->con) == PQ_PIPELINE_OFF) -{ - res = PQprepare(st->con, name, - commands[j]->argv[0], commands[j]->argc - 1, NULL); - if (PQresultStatus(res) != PGRES_COMMAND_OK) - pg_log_error("%s", PQerrorMessage(st->con)); - PQclear(res); -} -else -{ - /* - * In pipeline mode, we use asynchronous functions. If a - * server-side error occurs, it will be processed later - * among the other results. - */ - if (!PQsendPrepare(st->con, name, - commands[j]->argv[0], commands[j]->argc - 1, NULL)) - pg_log_error("%s", PQerrorMessage(st->con)); -} - } - st->prepared[st->use_file] = true; - } - getQueryParams(st, command, params); preparedStatementName(name, st->use_file, st->command); @@ -3199,6 +3187,11 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) memset(st->prepared, 0, sizeof(st->prepared)); } + +/* Prepare SQL commands if needed */ +if (querymode == QUERY_PREPARED && !st->prepared[st->use_file]) + prepareCommands(st); + /* record transaction start time */ st->txn_begin = now; diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl index 69ffa595dd..7b7a8518d1 100644 --- a/src/bin/pgbench/t/001_pgbench_with_server.pl +++ b/src/bin/pgbench/t/001_pgbench_with_server.pl @@ -830,6 +830,23 @@ select 1 \gset f } }); +# Working \startpipeline in prepared query mode with serializable +$node->pgbench( + '-t 1 -n -M prepared', + 0, + [ qr{type: .*/001_pgbench_pipeline_serializable}, qr{actually processed: 1/1} ], + [], + 'working \startpipeline with serializable', + { + '001_pgbench_pipeline_serializable' => q{ +-- test startpipeline with serializable +\startpipeline +BEGIN 
ISOLATION LEVEL SERIALIZABLE; +} . "select 1;\n" x 10 . q{ +END; +\endpipeline +} + }); # trigger many expression errors my @errors = (
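For context, the failure being fixed can be exercised with a pgbench script along the lines of the one added to the TAP test above (run with something like pgbench -n -M prepared -t 1 -f pipeline.sql):

```sql
-- pipeline.sql: a prepared multi-statement BEGIN inside a pipeline.
-- Before this fix, pgbench prepared statements lazily while the pipeline
-- was already open, which could fail; the patch prepares all of a
-- script's statements up front, before the pipeline starts.
\startpipeline
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1;
END;
\endpipeline
```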
Re: Implementing Incremental View Maintenance
Hello Takahashi-san, On Wed, 24 Nov 2021 04:27:13 + "r.takahash...@fujitsu.com" wrote: > Hi Nagata-san, > > > Sorry for late reply. > > > > However, even if we create triggers recursively on the parents or children, > > we would still > > need more consideration. This is because we will have to convert the format > > of tuple of > > modified table to the format of the table specified in the view for cases > > that the parent > > and some children have different format. > > > > I think supporting partitioned tables can be left for the next release. > > OK. I understand. > In the v24-patch, creating IVM on partions or partition table is prohibited. > It is OK but it should be documented. > > Perhaps, the following statement describe this. > If so, I think the definition of "simple base table" is ambiguous for some > users. > > + IMMVs must be based on simple base tables. It's not supported to > + create them on top of views or materialized views. Oh, I forgot to fix the documentation. I'll fix it. Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
On Wed, 24 Nov 2021 04:31:25 + "r.takahash...@fujitsu.com" wrote: > > > ivm=# create table t (c1 int, c2 int); > > > CREATE TABLE > > > ivm=# create incremental materialized view ivm_t as select distinct c1 > > > from t; > > > NOTICE: created index "ivm_t_index" on materialized view "ivm_t" > > > SELECT 0 > > > > > > Then I executed pg_dump. > > > > > > In the dump, the following SQLs appear. > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS > > > SELECT DISTINCT t.c1 > > >FROM public.t > > > WITH NO DATA; > > > > > > ALTER TABLE ONLY public.ivm_t > > > ADD CONSTRAINT ivm_t_index UNIQUE (c1); > > > > > > If I execute psql with the result of pg_dump, following error occurs. > > > > > > ERROR: ALTER action ADD CONSTRAINT cannot be performed on relation > > "ivm_t" > > > DETAIL: This operation is not supported for materialized views. > > > > Good catch! It was my mistake creating unique constraints on IMMV in spite > > of > > we cannot defined them via SQL. I'll fix it to use unique indexes instead of > > constraints. > > I checked the same procedure on v24 patch. > But following error occurs instead of the original error. > > ERROR: relation "ivm_t_index" already exists Thank you for pointing it out! Hmmm, an index is created when the IMMV is defined, so the CREATE INDEX issued after this would fail... Maybe we should not create any index automatically if the IMMV is created WITH NO DATA. I'll fix it after some investigation. Regards, Yugo Nagata -- Yugo NAGATA
Re: Implementing Incremental View Maintenance
base tables in the target list. It is necessary for efficient view maintenance. This feature is based on a suggestion from Konstantin Knizhnik [12]. [12] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru ** Others There are some other changes in core for the IVM implementation. There have been no objections to any of them so far. - syntax The command to create an incrementally maintainable materialized view (IMMV) is "CREATE INCREMENTAL MATERIALIZED VIEW". The new keyword "INCREMENTAL" is added. - pg_class A new attribute "relisivm" is added to pg_class to indicate that the relation is an IMMV. - deptype DEPENDENCY_IMMV(m) was added to pg_depend as a new deptype. This is necessary to make clear that a certain trigger is related to IVM, especially when we drop IVM triggers from the view when REFRESH ... WITH NO DATA is executed [13]. [13] https://www.postgresql.org/message-id/20210922185343.548883e81b8baef14a0193c5%40sraoss.co.jp --- Regards, Yugo Nagata -- Yugo NAGATA
Re: Commitfest 2021-11 Patch Triage - Part 1
On Fri, 5 Nov 2021 22:15:49 +0100 Daniel Gustafsson wrote: > 2138: Incremental Materialized View Maintenance > === > There seems to be concensus on the thread that this is a feature that we want, > and after initial design discussions there seems to be no disagreements with > the approach taken. The patch was marked ready for committer almost a year > ago, but have since been needs review (which seems correct). The size of the > patchset and the length of the thread make it hard to gauge just far away it > is, maybe the author or a review can summarize the current state and outline > what is left for it to be committable. Thanks for taking care of this. I've responded to it in the following thread, and described the recent discussions, current status, summary of IVM feature and design, and past discussions. https://www.postgresql.org/message-id/20211129144826.c9d42c1f61495c6983d8b6b1%40sraoss.co.jp Regards, Yugo Nagata -- Yugo NAGATA
Re: Commitfest 2021-11 Patch Triage - Part 1
On Mon, 29 Nov 2021 09:03:06 -0300 Marcos Pegoraro wrote: > > > > > 2138: Incremental Materialized View Maintenance > > > > I've responded to it in the following thread, and described the recent > > discussions, > > current status, summary of IVM feature and design, and past discussions. > > > > IVM is a wonderful feature, but some features were omitted because of > its complexity, so maybe IVM will spend more time to completely solve what > it wants to do. > I did not understand, and didn't find on docs, if a Materialized View is a > table, why I cannot change a specific record ? > Because if I have a way to create and refresh the entire view and update a > single record it would give me all power of IVM is trying to. I think the reason we can't update a materialized view directly is that it is basically a "view": it should not contain any data irrelevant to its definition and underlying tables. If we were to add a feature to update a materialized view directly, it should probably behave as an updatable view, just as normal (virtual) views do, although I am not sure. -- Yugo NAGATA
Re: [HACKERS] [PATCH] Lockable views
On Tue, 6 Feb 2018 11:12:37 -0500 Robert Haas wrote: > On Tue, Feb 6, 2018 at 1:28 AM, Tatsuo Ishii wrote: > >> But what does that have to do with locking? > > > > Well, if the view is not updatable, I think there will be less point > > to allow to lock the base tables in the view because locking is > > typically used in a case when updates are required. > > > > Of course we could add special triggers to allow to update views that > > are not automatically updatable but that kind of views are tend to > > complex and IMO there's less need the automatic view locking feature. > > Hmm.  Well, I see now why you've designed the feature in the way that > you have, but I guess it still seems somewhat arbitrary to me.  If you > ignore the deadlock consideration, then there's no reason not to > define the feature as locking every table mentioned anywhere in the > query, including subqueries, and it can work for all views whether > updatable or not.  If the deadlock consideration is controlling, then > I guess we can't do better than what you have, but I'm not sure how > future-proof it is.  If in the future somebody makes views updateable > that involve a join, say from the primary key of one table to a unique > key of another so that no duplicate rows can be introduced, then > they'll either have to write code to make this feature identify and > lock the "main" table, which I'm not sure would be strong enough in > all cases, or lock them all, which reintroduces the deadlock problem. > > Personally, I would be inclined to view the deadlock problem as not > very important.  I just don't see how that is going to come up very I agree that the deadlock won't occur very often, and this is not so important. I have updated the lockable-view patch to v8. This new version doesn't consider the deadlock problem, and all tables and views appearing in the view definition query are locked recursively. Also, this allows all kinds of views to be locked, even ones that are not auto-updatable. 
> often. What I do think will be an issue is that if you start locking > lots of tables, you might prevent the system from getting much work > done, whether or not you also cause any deadlocks. But I don't see > what we can do about that, really. If users want full control over > which tables get locked, then they have to name them explicitly. Or > alternatively, maybe they should avoid the need for full-table locks > by using SSI, gaining the benefits of (1) optimistic rather than > pessimistic concurrency control, (2) finer-grained locking, and (3) > not needing to issue explicit LOCK commands. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index b2c7203..96d477c 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,6 +46,11 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] + When a view is specified to be locked, all relations appearing in the view + definition query are also locked recursively with the same lock mode. + + + When acquiring locks automatically for commands that reference tables, PostgreSQL always uses the least restrictive lock mode possible. 
LOCK TABLE diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 6479dcb..daf3d81 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -23,11 +23,15 @@ #include "utils/acl.h" #include "utils/lsyscache.h" #include "utils/syscache.h" +#include "rewrite/rewriteHandler.h" +#include "access/heapam.h" +#include "nodes/nodeFuncs.h" -static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait); -static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode); +static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid); +static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); +static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait); /* * LOCK TABLE @@ -62,8 +66,10 @@ LockTableCommand(LockStmt *lockstmt) RangeVarCallbackForLockTable, (void *) &lockstmt->mode); - if (recurse) - LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait); + if (get_rel_relkind(reloid) == RELKIND_VIEW) + LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait); + else if (recurse) + LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId()); } } @@ -86,15 +92,17 @@ RangeVarCallbackForLockT
Re: [HACKERS] [PATCH] Lockable views
On Tue, 27 Mar 2018 23:28:04 +0900 Yugo Nagata wrote: I found the previous patch was broken and couldn't handle views that have subqueries, as below: CREATE VIEW lock_view6 AS SELECT * from (select * from lock_tbl1) sub; I fixed this and attached the updated version, including additional tests. Regards, > On Tue, 6 Feb 2018 11:12:37 -0500 > Robert Haas wrote: > > > On Tue, Feb 6, 2018 at 1:28 AM, Tatsuo Ishii wrote: > > >> But what does that have to do with locking? > > > > > > Well, if the view is not updatable, I think there will be less point > > > to allow to lock the base tables in the view because locking is > > > typically used in a case when updates are required. > > > > > > Of course we could add special triggers to allow to update views that > > > are not automatically updatable but that kind of views are tend to > > > complex and IMO there's less need the automatic view locking feature. > > > > Hmm.  Well, I see now why you've designed the feature in the way that > > you have, but I guess it still seems somewhat arbitrary to me.  If you > > ignore the deadlock consideration, then there's no reason not to > > define the feature as locking every table mentioned anywhere in the > > query, including subqueries, and it can work for all views whether > > updatable or not.  If the deadlock consideration is controlling, then > > I guess we can't do better than what you have, but I'm not sure how > > future-proof it is.  If in the future somebody makes views updateable > > that involve a join, say from the primary key of one table to a unique > > key of another so that no duplicate rows can be introduced, then > > they'll either have to write code to make this feature identify and > > lock the "main" table, which I'm not sure would be strong enough in > > all cases, or lock them all, which reintroduces the deadlock problem. > > > > Personally, I would be inclined to view the deadlock problem as not > > very important. 
I just don't see how that is going to come up very > > I agree that the deadlock won't occur very often and this is not > so important. > > I have updated the lockable-view patch to v8. > > This new version doen't consider the deadlock problem, and all tables > or views appearing in the view definition query are locked recursively. > Also, this allows all kinds of views to be locked even if it is not > auto-updatable view. > > > > often. What I do think will be an issue is that if you start locking > > lots of tables, you might prevent the system from getting much work > > done, whether or not you also cause any deadlocks. But I don't see > > what we can do about that, really. If users want full control over > > which tables get locked, then they have to name them explicitly. Or > > alternatively, maybe they should avoid the need for full-table locks > > by using SSI, gaining the benefits of (1) optimistic rather than > > pessimistic concurrency control, (2) finer-grained locking, and (3) > > not needing to issue explicit LOCK commands. > > > > > -- > > Robert Haas > > EnterpriseDB: http://www.enterprisedb.com > > The Enterprise PostgreSQL Company > > > -- > Yugo Nagata -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index b2c7203..96d477c 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,6 +46,11 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] + When a view is specified to be locked, all relations appearing in the view + definition query are also locked recursively with the same lock mode. + + + When acquiring locks automatically for commands that reference tables, PostgreSQL always uses the least restrictive lock mode possible. 
LOCK TABLE diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 6479dcb..e15d87d 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -23,11 +23,15 @@ #include "utils/acl.h" #include "utils/lsyscache.h" #include "utils/syscache.h" +#include "rewrite/rewriteHandler.h" +#include "access/heapam.h" +#include "nodes/nodeFuncs.h" -static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait); -static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode); +static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid); +static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldre
Re: [HACKERS] [PATCH] Lockable views
On Wed, 28 Mar 2018 15:45:09 +0900 (JST) Tatsuo Ishii wrote: > >> I found the previous patch was broken and this can't handle > >> views that has subqueries as bellow; > >> > >> CREATE VIEW lock_view6 AS SELECT * from (select * from lock_tbl1) sub; > >> > >> I fixed this and attached the updated version including additional tests. > > > > This patch gives a warning while compiling: > > > > lockcmds.c:186:1: warning: no semicolon at end of struct or union > > } LockViewRecurse_context; > > ^ > > Also I get a regression test failure: Thank you for reviewing my patch. I've attached the updated patch, v10. Regards, > > *** > /usr/local/src/pgsql/current/postgresql/src/test/regress/expected/lock.out > 2018-03-28 15:24:13.805314577 +0900 > --- /usr/local/src/pgsql/current/postgresql/src/test/regress/results/lock.out > 2018-03-28 15:42:07.975592594 +0900 > *** > *** 118,124 > >lock_tbl1 >lock_view6 > ! (2 rows) > > ROLLBACK; > -- Verify that we can lock a table with inheritance children. > --- 118,125 > >lock_tbl1 >lock_view6 > ! mvtest_tm > ! (3 rows) > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index b2c7203..96d477c 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,6 +46,11 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] + When a view is specified to be locked, all relations appearing in the view + definition query are also locked recursively with the same lock mode. + + + When acquiring locks automatically for commands that reference tables, PostgreSQL always uses the least restrictive lock mode possible. 
LOCK TABLE diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 6479dcb..e8a8877 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -23,11 +23,15 @@ #include "utils/acl.h" #include "utils/lsyscache.h" #include "utils/syscache.h" +#include "rewrite/rewriteHandler.h" +#include "access/heapam.h" +#include "nodes/nodeFuncs.h" -static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait); -static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode); +static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid); +static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); +static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait); /* * LOCK TABLE @@ -62,8 +66,10 @@ LockTableCommand(LockStmt *lockstmt) RangeVarCallbackForLockTable, (void *) &lockstmt->mode); - if (recurse) - LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait); + if (get_rel_relkind(reloid) == RELKIND_VIEW) + LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait); + else if (recurse) + LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId()); } } @@ -86,15 +92,17 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, return; /* woops, concurrently dropped; no permissions * check */ - /* Currently, we only allow plain tables to be locked */ - if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) + + /* Currently, we only allow plain tables or views to be locked */ + if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE && + relkind != RELKIND_VIEW) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("\"%s\" is not a table", + errmsg("\"%s\" is not a table or view", rv->relname))); /* Check permissions. 
*/ - aclresult = LockTableAclCheck(relid, lockmode); + aclresult = LockTableAclCheck(relid, lockmode, GetUserId()); if (aclresult != ACLCHECK_OK) aclcheck_error(aclresult, get_relkind_objtype(get_rel_relkind(relid)), rv->relname); } @@ -107,7 +115,7 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, * multiply-inheriting children more than once, but that's no problem. */ static void -LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) +LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid) { List *children; ListCell *lc; @@ -120,7 +128,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) AclResult aclresult; /* Check per
Re: [HACKERS] [PATCH] Lockable views
On Thu, 29 Mar 2018 17:26:36 -0700 Andres Freund wrote: Thank you for your comments. I attach a patch to fix issues you've pointed out. > Hi, > > On 2018-03-28 20:26:48 +0900, Yugo Nagata wrote: > > diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml > > index b2c7203..96d477c 100644 > > --- a/doc/src/sgml/ref/lock.sgml > > +++ b/doc/src/sgml/ref/lock.sgml > > @@ -46,6 +46,11 @@ LOCK [ TABLE ] [ ONLY ] > class="parameter">name [ * ] > > > > > > > > + When a view is specified to be locked, all relations appearing in the > > view > > + definition query are also locked recursively with the same lock mode. > > + > > Trailing space added. I'd remove "specified to be" from the sentence. Fixed. > > I think mentioning how this interacts with permissions would be a good > idea too. Given how operations use the view's owner to recurse, that's > not obvious. Should also explain what permissions are required to do the > operation on the view. Added a description about permissions into the documentation. > > > > @@ -86,15 +92,17 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid > > relid, Oid oldrelid, > > return; /* woops, concurrently > > dropped; no permissions > > * check */ > > > > - /* Currently, we only allow plain tables to be locked */ > > - if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) > > + > > This newline looks spurious to me. Removed. > > > > /* > > + * Apply LOCK TABLE recursively over a view > > + * > > + * All tables and views appearing in the view definition query are locked > > + * recursively with the same lock mode. > > + */ > > + > > +typedef struct > > +{ > > + Oid root_reloid; > > + LOCKMODE lockmode; > > + bool nowait; > > + Oid viewowner; > > + Oid viewoid; > > +} LockViewRecurse_context; > > Probably wouldn't hurt to pgindent the larger changes in the patch. 
> > > > +static bool > > +LockViewRecurse_walker(Node *node, LockViewRecurse_context *context) > > +{ > > + if (node == NULL) > > + return false; > > + > > + if (IsA(node, Query)) > > + { > > + Query *query = (Query *) node; > > + ListCell*rtable; > > + > > + foreach(rtable, query->rtable) > > + { > > + RangeTblEntry *rte = lfirst(rtable); > > + AclResultaclresult; > > + > > + Oid relid = rte->relid; > > + char relkind = rte->relkind; > > + char *relname = get_rel_name(relid); > > + > > + /* The OLD and NEW placeholder entries in the view's > > rtable are skipped. */ > > + if (relid == context->viewoid && > > + (!strcmp(rte->eref->aliasname, "old") || > > !strcmp(rte->eref->aliasname, "new"))) > > + continue; > > + > > + /* Currently, we only allow plain tables or views to be > > locked. */ > > + if (relkind != RELKIND_RELATION && relkind != > > RELKIND_PARTITIONED_TABLE && > > + relkind != RELKIND_VIEW) > > + continue; > > + > > + /* Check infinite recursion in the view definition. */ > > + if (relid == context->root_reloid) > > + ereport(ERROR, > > + > > (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > > + errmsg("infinite recursion > > detected in rules for relation \"%s\"", > > + > > get_rel_name(context->root_reloid; > > Hm, how can that happen? And if it can happen, why can it only happen > with the root relation? For example, the following queries cause the infinite recursion of views. This is detected and the error is raised. create table t (i int); create view v1 as select 1; create view v2 as select * from v1; create or replace view v1 as select * from v2; begin; lock v1; abort; However, I found that the previous patch could not handle the following situation
Re: [HACKERS] [PATCH] Lockable views
On Mon, 2 Apr 2018 18:32:53 +0900 Yugo Nagata wrote: > On Thu, 29 Mar 2018 17:26:36 -0700 > Andres Freund wrote: > > Thank you for your comments. I attach a patch to fix issues > you've pointed out. I found a typo in the documentation and attach the updated patch. Regards, > > > Hi, > > > > On 2018-03-28 20:26:48 +0900, Yugo Nagata wrote: > > > diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml > > > index b2c7203..96d477c 100644 > > > --- a/doc/src/sgml/ref/lock.sgml > > > +++ b/doc/src/sgml/ref/lock.sgml > > > @@ -46,6 +46,11 @@ LOCK [ TABLE ] [ ONLY ] > > class="parameter">name [ * ] > > > > > > > > > > > > + When a view is specified to be locked, all relations appearing in the > > > view > > > + definition query are also locked recursively with the same lock mode. > > > + > > > > Trailing space added. I'd remove "specified to be" from the sentence. > > Fixed. > > > > > I think mentioning how this interacts with permissions would be a good > > idea too. Given how operations use the view's owner to recurse, that's > > not obvious. Should also explain what permissions are required to do the > > operation on the view. > > Added a description about permissions into the documentation. > > > > > > > > @@ -86,15 +92,17 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid > > > relid, Oid oldrelid, > > > return; /* woops, concurrently > > > dropped; no permissions > > >* check */ > > > > > > - /* Currently, we only allow plain tables to be locked */ > > > - if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) > > > + > > > > This newline looks spurious to me. > > Removed. > > > > > > > > /* > > > + * Apply LOCK TABLE recursively over a view > > > + * > > > + * All tables and views appearing in the view definition query are locked > > > + * recursively with the same lock mode. 
> > > + */ > > > + > > > +typedef struct > > > +{ > > > + Oid root_reloid; > > > + LOCKMODE lockmode; > > > + bool nowait; > > > + Oid viewowner; > > > + Oid viewoid; > > > +} LockViewRecurse_context; > > > > Probably wouldn't hurt to pgindent the larger changes in the patch. > > > > > > > +static bool > > > +LockViewRecurse_walker(Node *node, LockViewRecurse_context *context) > > > +{ > > > + if (node == NULL) > > > + return false; > > > + > > > + if (IsA(node, Query)) > > > + { > > > + Query *query = (Query *) node; > > > + ListCell*rtable; > > > + > > > + foreach(rtable, query->rtable) > > > + { > > > + RangeTblEntry *rte = lfirst(rtable); > > > + AclResultaclresult; > > > + > > > + Oid relid = rte->relid; > > > + char relkind = rte->relkind; > > > + char *relname = get_rel_name(relid); > > > + > > > + /* The OLD and NEW placeholder entries in the view's > > > rtable are skipped. */ > > > + if (relid == context->viewoid && > > > + (!strcmp(rte->eref->aliasname, "old") || > > > !strcmp(rte->eref->aliasname, "new"))) > > > + continue; > > > + > > > + /* Currently, we only allow plain tables or views to be > > > locked. */ > > > + if (relkind != RELKIND_RELATION && relkind != > > > RELKIND_PARTITIONED_TABLE && > > > + relkind != RELKIND_VIEW) > > > + continue; > > > + > > > + /* Check infinite recursion in the view definition. */ > > > + if (relid == context->root_reloid) > > > + ereport(ERROR, > > > + > > > (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > > > + errmsg("infinite recursion > > > detected in rules f
Re: [HACKERS] [PATCH] Lockable views
On Thu, 05 Apr 2018 07:53:42 +0900 (JST) Tatsuo Ishii wrote: I update the patch to fix the lockable view issues. > >> > > +typedef struct > >> > > +{ > >> > > + Oid root_reloid; > >> > > + LOCKMODE lockmode; > >> > > + bool nowait; > >> > > + Oid viewowner; > >> > > + Oid viewoid; > >> > > +} LockViewRecurse_context; > >> > > >> > Probably wouldn't hurt to pgindent the larger changes in the patch. > > Yeah. Also, each struct member needs a comment. I applied pgindent and added comments to struct members. > > >> > Hm, how can that happen? And if it can happen, why can it only happen > >> > with the root relation? > >> > >> For example, the following queries cause the infinite recursion of views. > >> This is detected and the error is raised. > >> > >> create table t (i int); > >> create view v1 as select 1; > >> create view v2 as select * from v1; > >> create or replace view v1 as select * from v2; > >> begin; > >> lock v1; > >> abort; > >> > >> However, I found that the previous patch could not handle the following > >> situation in which the root relation itself doesn't have infinite > >> recursion. > >> > >> create view v3 as select * from v1; > >> begin; > >> lock v3; > >> abort; > > Shouldn't they be in the regression test? Added tests for the infinite recursion detection. Regards, > > It's shame that create_view test does not include the cases, but it's > another story. > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index 96d477c..a225cea 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,8 +46,8 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] - When a view is specified to be locked, all relations appearing in the view - definition query are also locked recursively with the same lock mode. 
+ When a view is locked, all relations appearing in the view definition + query are also locked recursively with the same lock mode. @@ -174,6 +174,13 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] +The user performing the lock on the view must have the corresponding privilege +on the view. In addition the view's owner must have the relevant privileges on +the underlying base relations, but the user performing the lock does +not need any permissions on the underlying base relations. + + + LOCK TABLE is useless outside a transaction block: the lock would remain held only to the completion of the statement. Therefore PostgreSQL reports an error if LOCK diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index b247c0f..5e0ef48 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -31,7 +31,7 @@ static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid use static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); -static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait); +static void LockViewRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, List *ancestor_views); /* * LOCK TABLE @@ -67,7 +67,7 @@ LockTableCommand(LockStmt *lockstmt) (void *) &lockstmt->mode); if (get_rel_relkind(reloid) == RELKIND_VIEW) - LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait); + LockViewRecurse(reloid, lockstmt->mode, lockstmt->nowait, NIL); else if (recurse) LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId()); } @@ -92,7 +92,6 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, return; /* woops, concurrently dropped; no permissions * check */ - /* Currently, we only allow plain tables or views to be locked */ if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE && relkind != RELKIND_VIEW) @@ 
-178,11 +177,11 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid) typedef struct { - Oid root_reloid; - LOCKMODE lockmode; - bool nowait; - Oid viewowner; - Oid viewoid; + LOCKMODE lockmode; /* lock mode to use */ + bool nowait; /* no wait mode */ + Oid viewowner; /* view owner for checking the privilege */ + Oid viewoid; /* OID of the view to be locked *
Re: Implementing Incremental View Maintenance
Hi Paul, Thank you for your suggestion. On Sun, 15 Sep 2019 11:52:22 -0600 Paul Draper wrote: > As I understand it, the current patch performs immediate IVM using AFTER > STATEMENT trigger transition tables. > > However, multiple tables can be modified *before* AFTER STATEMENT triggers > are fired. > > CREATE TABLE example1 (a int); > CREATE TABLE example2 (a int); > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS > SELECT example1.a, example2.a > FROM example1 JOIN example2 ON a; > > WITH > insert1 AS (INSERT INTO example1 VALUES (1)), > insert2 AS (INSERT INTO example2 VALUES (1)) > SELECT NULL; > > Changes to example1 are visible in an AFTER STATEMENT trigger on example2, > and vice versa. Would this not result in the (1, 1) tuple being > "double-counted"? > > IVM needs to either: > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers) > > (2) Have simultaneous access to multiple deltas: > delta_mv = example1 x delta_example2 + example2 x delta_example1 - > delta_example1 x delta_example2 > > This latter method is the "logged" approach that has been discussed for > deferred evaluation. > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like > implementation anyway. You are right, the latest patch doesn't support the situation where multiple tables are modified in a query. I noticed this when working on self-joins, which also effectively need to handle multiple table modifications. I am now working on this issue, and the next patch will be able to handle this situation. I plan to submit the patch during this month. Roughly speaking, in the new implementation, AFTER STATEMENT triggers are used to collect information about each modified table and its changes (= transition tables), and then only the last trigger updates the view. This avoids the double-counting. I think this implementation would also be a base for a future deferred-approach implementation in which "logs" are used instead of transition tables. Regards, Yugo Nagata -- Yugo Nagata
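The double-counting described above can be made concrete with a toy multiset model (an illustrative Python sketch only, not PostgreSQL code; the `bag_join` helper and the Counter representation are my own simplifications). Tables are modeled as bags of join-key values, and a natural join multiplies multiplicities per key:

```python
from collections import Counter

def bag_join(a: Counter, b: Counter) -> Counter:
    # natural join of two bags keyed by the join column:
    # the multiplicity of k in the result is count_a(k) * count_b(k)
    return Counter({k: a[k] * b[k] for k in a.keys() & b.keys() if a[k] * b[k]})

# WITH insert1 AS (INSERT INTO example1 VALUES (1)),
#      insert2 AS (INSERT INTO example2 VALUES (1)) SELECT NULL;
d1, d2 = Counter({1: 1}), Counter({1: 1})          # per-table deltas
e1_new, e2_new = Counter({1: 1}), Counter({1: 1})  # post-statement states

# Naive: each AFTER STATEMENT trigger joins its delta against the other
# table's post-update state, so the (1, 1) row is counted twice.
naive = bag_join(d1, e2_new) + bag_join(e1_new, d2)
assert naive[1] == 2  # double-counted

# Corrected view delta, per the formula quoted above:
#   d1 x e2_new + e1_new x d2 - d1 x d2
correct = bag_join(d1, e2_new) + bag_join(e1_new, d2)
correct.subtract(bag_join(d1, d2))
assert correct[1] == 1  # matches the true view content
```

Deferring the actual view update to the last trigger works for the same reason: that trigger can see all the collected deltas at once and so can compute a single, non-overlapping view delta.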
Re: Implementing Incremental View Maintenance
On Tue, 17 Sep 2019 12:03:20 -0600 Paul Draper wrote: > Have you had any thoughts for more than two joined tables? > > Either there needs to be an quadratic number of joins, or intermediate join > results need to be stored and reused. I don't think we need to store intermediate join results. Suppose that we have a view V joining tables R, S, and T, and new tuples dR, dS, and dT are inserted into each table respectively. V = R*S*T R_new = R + dR S_new = S + dS T_new = T + dT In this situation, we can calculate the new view state as below. V_new = R_new * S_new * T_new = (R + dR) * (S + dS) * (T + dT) = R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT Although the full expansion has 2^3 (= 8) terms, if we can use both the pre-update state (e.g. R) and the post-update state (e.g. R + dR), we need only three joins. Actually, the post-update state is available in the AFTER trigger, and the pre-update state can be calculated by using the delta tables (transition tables) and the cmin/xmin system columns (or a snapshot). This is the approach my implementation adopts. > > On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata wrote: > > > Hi Paul, > > > > Thank you for your suggestion. > > > > On Sun, 15 Sep 2019 11:52:22 -0600 > > Paul Draper wrote: > > > > > As I understand it, the current patch performs immediate IVM using AFTER > > > STATEMENT trigger transition tables. > > > > > > However, multiple tables can be modified *before* AFTER STATEMENT > > triggers > > > are fired. > > > > > > CREATE TABLE example1 (a int); > > > CREATE TABLE example2 (a int); > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS > > > SELECT example1.a, example2.a > > > FROM example1 JOIN example2 ON a; > > > > > > WITH > > > insert1 AS (INSERT INTO example1 VALUES (1)), > > > insert2 AS (INSERT INTO example2 VALUES (1)) > > > SELECT NULL; > > > > > > Changes to example1 are visible in an AFTER STATEMENT trigger on > > example2, > > > and vice versa. 
Would this not result in the (1, 1) tuple being > > > "double-counted"? > > > > > > IVM needs to either: > > > > > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers) > > > > > > (2) Have simultaneous access to multiple deltas: > > > delta_mv = example1 x delta_example2 + example2 x delta_example1 - > > > delta_example1 x delta_example2 > > > > > > This latter method is the "logged" approach that has been discussed for > > > deferred evaluation. > > > > > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like > > > implementation anyway. > > > > You are right, the latest patch doesn't support the situation where > > multiple tables are modified in a query. I noticed this when working > > on self-join, which also virtually need to handle multiple table > > modification. > > > > I am now working on this issue and the next patch will enable to handle > > this situation. I plan to submit the patch during this month. Roughly > > speaking, in the new implementation, AFTER STATEMENT triggers are used to > > collect information of modified table and its changes (= transition > > tables), > > and then the only last trigger updates the view. This will avoid the > > double-counting. I think this implementation also would be a base of > > deferred approach implementation in future where "logs" are used instead > > of transition tables. > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo Nagata > > -- Yugo Nagata
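The three-table identity above can be verified mechanically with a toy bag-semantics model (an illustrative Python sketch under simplified assumptions — a single join column and multiset semantics — not the patch's implementation). Tables are bags of join-key values; a natural join multiplies per-key multiplicities:

```python
from collections import Counter
from functools import reduce

def bag_join(*bags: Counter) -> Counter:
    # natural join of bags keyed by the join column: multiplicities multiply
    keys = reduce(lambda x, y: x & y, (set(b) for b in bags))
    return Counter({k: reduce(lambda n, b: n * b[k], bags, 1) for k in keys})

# Pre-update states and deltas (arbitrary small example)
R = Counter({1: 1, 2: 1}); dR = Counter({2: 1, 3: 1})
S = Counter({1: 1, 2: 2}); dS = Counter({3: 1})
T = Counter({1: 1, 3: 1}); dT = Counter({2: 1})
R_new, S_new, T_new = R + dR, S + dS, T + dT

# Recomputing the view from scratch ...
full = bag_join(R_new, S_new, T_new)

# ... equals the incremental form with only three delta joins:
#   V + dR*S_new*T_new + R*dS*T_new + R*S*dT
incr = (bag_join(R, S, T)
        + bag_join(dR, S_new, T_new)
        + bag_join(R, dS, T_new)
        + bag_join(R, S, dT))
assert incr == full
```

Each delta term mixes post-update states (to its "left") with pre-update states (to its "right"), which is exactly why only three joins are needed instead of the eight terms of the full expansion.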
Re: Implementing Incremental View Maintenance
Hi, Attached is the latest patch for supporting self-join views. This also includes the following fix mentioned by Tatsuo Ishii. > > On 2019-Aug-06, Tatsuo Ishii wrote: > > > >> It's not mentioned below but some bugs including seg fault when > >> --enable-casser is enabled was also fixed in this patch. > >> > >> BTW, I found a bug with min/max support in this patch and I believe > >> Yugo is working on it. Details: > >> https://github.com/sraoss/pgsql-ivm/issues/20 This patch adds support for self-join views, simultaneous updates of more than one base table, and also multiple updates of the same base table. I first tried to support just self-joins, but I found that this is essentially the same as supporting simultaneous table updates, so I decided to support them in the same commit. I think this will be a base for implementing deferred maintenance in the future. In the new implementation, AFTER triggers are used to collect tuplestores containing transition table contents. When multiple tables are changed, multiple AFTER triggers are invoked, and then the final AFTER trigger performs the actual update of the matview. In addition to the AFTER triggers, a BEFORE trigger is used to handle global information for view maintenance. For example, suppose that we have a view V joining tables R, S, and T, and new tuples dR, dS, and dT are inserted into each table respectively. V = R*S*T R_new = R + dR S_new = S + dS T_new = T + dT In this situation, we can calculate the new view state as below. V_new = R_new * S_new * T_new = (R + dR) * (S + dS) * (T + dT) = R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT = V + (dR *S_new*T_new) + (R*dS*T_new) + (R*S*dT) To calculate view deltas, we need both the pre-state (R, S, and T) and the post-state (R_new, S_new, and T_new) of the base tables. 
Post-update states are available in the AFTER trigger, and we calculate pre-update states by filtering inserted tuples using the cmin/xmin system columns and appending deleted tuples which are contained in an old transition table.

In the original core implementation, tuplestores of transition tables were freed at each query depth. However, we want to prolong their lifetime because we have to preserve them for a whole query, assuming some base tables are changed in other trigger functions, so I added a hack to trigger.c.

Regression tests are also added for self-join views, multiple changes on the same table, simultaneous changes of two tables, and foreign reference constraints.

Here are behavior examples:

1. Table definition
- t: for self-join
- r, s: for 2-way join

CREATE TABLE r (i int, v int);
CREATE TABLE
CREATE TABLE s (i int, v int);
CREATE TABLE
CREATE TABLE t (i int, v int);
CREATE TABLE

2. Initial data

INSERT INTO r VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3
INSERT INTO s VALUES (1, 100), (2, 200), (3, 300);
INSERT 0 3
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3

3. View definition

3.1. Self-join (mv_self, v_self)

CREATE INCREMENTAL MATERIALIZED VIEW mv_self(v1, v2) AS SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
SELECT 3
CREATE VIEW v_self(v1, v2) AS SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
CREATE VIEW

3.2. 2-way join (mv, v)

CREATE INCREMENTAL MATERIALIZED VIEW mv(v1, v2) AS SELECT r.v, s.v FROM r JOIN s USING(i);
SELECT 3
CREATE VIEW v(v1, v2) AS SELECT r.v, s.v FROM r JOIN s USING(i);
CREATE VIEW

3.3. Initial contents

SELECT * FROM mv_self ORDER BY v1;
 v1 | v2
----+----
 10 | 10
 20 | 20
 30 | 30
(3 rows)

SELECT * FROM mv ORDER BY v1;
 v1 | v2
----+-----
 10 | 100
 20 | 200
 30 | 300
(3 rows)

4. Update a base table for the self-join view

INSERT INTO t VALUES (4,40);
INSERT 0 1
DELETE FROM t WHERE i = 1;
DELETE 1
UPDATE t SET v = v*10 WHERE i=2;
UPDATE 1

4.1.
Results - Comparison with the normal view

SELECT * FROM mv_self ORDER BY v1;
 v1  | v2
-----+-----
  30 |  30
  40 |  40
 200 | 200
(3 rows)

SELECT * FROM v_self ORDER BY v1;
 v1  | v2
-----+-----
  30 |  30
  40 |  40
 200 | 200
(3 rows)

5. Update a base table for the 2-way join view

WITH ins_r AS (INSERT INTO r VALUES (1,11) RETURNING 1),
     ins_r2 AS (INSERT INTO r VALUES (3,33) RETURNING 1),
     ins_s AS (INSERT INTO s VALUES (2,222) RETURNING 1),
     upd_r AS (UPDATE r SET v = v + 1000 WHERE i = 2 RETURNING 1),
     dlt_s AS (DELETE FROM s WHERE i = 3 RETURNING 1)
SELECT NULL;
 ?column?
----------

(1 row)

5.1. Results - Comparison with the normal view

SELECT * FROM mv ORDER BY v1;
  v1  | v2
------+-----
   10 | 100
   11 | 100
 1020 | 200
 1020 | 222
(4 rows)

SELECT * FROM v ORDER BY v1;
  v1  | v2
------+-----
   10 | 100
   11 | 100
 1020 | 200
 1020 | 222
(4 rows)

Best Regards,
Yugo Nagata

--
Yugo Nagata

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 5e71a2e865..a288bddcda 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgm
Re: Proposal to suppress errors thrown by to_reg*()
Hi Takuma, On Thu, 14 Mar 2019 13:37:00 +0900 Takuma Hoshiai wrote: > Hi, hackers, > > According to the document, "to_reg* functions return null rather than > throwing an error if the name is not found", but this is not the case > if the arguments to those functions are schema qualified and the > caller does not have access permission of the schema even if the table > (or other object) does exist -- we get an error. > > For example, to_regclass() throws an error if its argument is > 'schema_name.table_name' (i.e. contains a schema name) and the caller's > role doesn't have access permission of the schema. The same can be > said of the other functions. > > I get complaints from Pgpool-II users because it uses to_regclass() > internally to confirm a table's existence, but in the case above it's > not useful because the error aborts the user's transaction. > > To be more consistent with the doc and to make those functions more > useful, I propose to change the current implementation so that they do not > throw an error if the namespace cannot be accessed by the caller. > > The attached patch implements this. Comments and suggestions are welcome.

I reviewed the patch. Here are some comments:

/*
+ * RangeVarCheckNamespaceAccessNoError
+ *     Returns true if given relation's namespace can be accessable by the
+ *     current user. If no namespace is given in the relation, just returns
+ *     true.
+ */
+bool
+RangeVarCheckNamespaceAccessNoError(RangeVar *relation)

Although it might be trivial, the new function's name 'RangeVar...' seems a bit weird to me because this is used not only for to_regclass but also for to_regproc, to_regtype, and so on; that is, the argument "relation" is not always a relation.
This function is always used together with makeRangeVarFromNameList(), as in

if (!RangeVarCheckNamespaceAccessNoError(makeRangeVarFromNameList(names)))

, so how about merging the two functions as below, for example:

/*
 * CheckNamespaceAccessNoError
 *     Returns true if the namespace in the given qualified name is accessible
 *     by the current user. If no namespace is given in the names, just returns
 *     true.
 */
bool CheckNamespaceAccessNoError(List *names);

BTW, this patch also suppresses the "Cross-database references is not allowed" error in addition to the namespace ACL error. Is this an intentional change? If this error can be allowed, you can use DeconstructQualifiedName() instead of makeRangeVarFromNameList(). In the regression test, you are using the \gset and \connect psql meta-commands to test the user's privilege on a namespace, but I think we can make this simpler by using SET SESSION AUTHORIZATION and RESET AUTHORIZATION. Regards, -- Yugo Nagata
Improvement of installation document
Hi, One of our clients suggested that the installation document[1] lacks a description of the requirements for installing *-devel packages. For example, postgresqlxx-devel is required for using --with-pgsql, openssl-devel for --with-openssl, and so on, but these are not documented. [1] http://www.pgpool.net/docs/pgpool-II-3.7.4/en/html/install-pgpool.html I know the documentation of PostgreSQL[2] also lacks descriptions of openssl-devel, kerberos-devel, etc. (except for readline-devel). However, it would be convenient for users who want to install Pgpool-II from source code if the packages required for installation were described in the document explicitly. [2] https://www.postgresql.org/docs/current/install-requirements.html Isn't this worth considering? BTW, the Pgpool-II doc[1] says: --with-memcached=path Pgpool-II binaries will use memcached for in memory query cache. You have to install libmemcached. , but maybe libmemcached-devel is correct instead of libmemcached? Regards, -- Yugo Nagata
Re: Improvement of installation document
Hi, I apologize that I accidentally sent the following email to this list. Please disregard it. I am sorry for making a lot of noise. Regards, On Tue, 26 Mar 2019 20:38:31 +0900 Yugo Nagata wrote: > Hi, > > One of our clients suggested that the installation document[1] lacks a > description > about the requirements for installing *-devel packages. For example, > postgresqlxx-devel > is required for using --with-pgsql, and openssl-devel for --with-openssl, and > so on, > but these are not documented. > > [1] http://www.pgpool.net/docs/pgpool-II-3.7.4/en/html/install-pgpool.html > > I know the document of PostgreSQL[2] also lacks the description about > openssl-devel, > kerberos-devel, etc. (except for readline-devel). However, it would be > convenient > for users who want to install Pgpool-II from source code if the required > packages > for installation are described in the document explicitly. > > [2] https://www.postgresql.org/docs/current/install-requirements.html > > Is it not worth considering this? > > > BTW, the Pgpool-II doc says: > > --with-memcached=path > Pgpool-II binaries will use memcached for in memory query cache. You have > to install libmemcached. > > , but maybe libmemcached-devel is correct instead of libmemcached? > > Regards, > > -- > Yugo Nagata -- Yugo Nagata
Re: Improvement of installation document
On Tue, 26 Mar 2019 22:18:53 +0900 (JST) Tatsuo Ishii wrote: > > One of our clients suggested that the installation document[1] lacks a > > description > > about the requirements for installing *-devel packages. For example, > > postgresqlxx-devel > > is required for using --with-pgsql, and openssl-devel for --with-openssl, > > and so on, > > but these are not documented. > > > > [1] http://www.pgpool.net/docs/pgpool-II-3.7.4/en/html/install-pgpool.html > > > > I know the document of PostgreSQL[2] also lacks the description about > > openssl-devel, > > kerberos-devel, etc. (except for readline-devel). However, it would be > > convenient > > for users who want to install Pgpool-II from source code if the required > > packages > > for installation are described in the document explicitly. > > > > [2] https://www.postgresql.org/docs/current/install-requirements.html > > > > Is it not worth considering this? > > I am against the idea. > > Development package names could differ according to > distributions/OS. For example, the development package of OpenSSL is > "openssl-dev", not "openssl-devel", in Debian or Debian-derived > systems. > > Another reason is, a user who is installing software from source code > should be familiar enough with the fact that each software requires > development libraries. > > In summary, adding a not-so-complete list of development package names to > our document will give incorrect information to novice users, and will > be annoying for skilled users. OK, I agree. # From this viewpoint, it would not be so good that the PostgreSQL doc[2] # mentions readline-devel, but this is not a topic here. > > > BTW, the Pgpool-II doc[2] says: > > > > --with-memcached=path > > Pgpool-II binaries will use memcached for in memory query cache. You > > have to install libmemcached. > > > > , but maybe libmemcached-devel is correct instead of libmemcached? > > I don't think so. "libmemcached-devel" is just a package name in a > certain Linux distribution.
"libmemcached" is a more geneal and non > distribution dependent term. Thanks for your explaination. I understood it. > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata
Re: Implementing Incremental View Maintenance
On Thu, 27 Dec 2018 21:57:26 +0900 Yugo Nagata wrote: > Hi, > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. I am now working on an initial patch for implementing IVM on PostgreSQL. This enables materialized views to be updated incrementally after one of their base tables is modified.

In the first patch, I want to start with very simple features. Firstly, it will handle views whose definitions include only selection, projection, and join. Standard aggregates (count, sum, avg, min, max) are not planned for the first patch, but since they are commonly used in materialized views, I'll implement them later on. Views which include sub-queries, outer joins, CTEs, and window functions are also out of scope of the first patch. Views including self-joins or views including other views in their definition are not well considered yet, either. I need more investigation on these types of views, although I found some papers explaining how to handle sub-queries and outer joins.

Next, the patch will handle materialized views with no duplicates in their tuples. I am thinking of implementing an algorithm called the "counting algorithm" to handle duplicates afterward, but I'll start from this no-duplicates assumption in the first patch for simplicity.

In the first patch, I will implement only "immediate maintenance", that is, materialized views are updated immediately in the transaction where a base table is modified. On the other hand, in "deferred maintenance", materialized views are updated after the transaction, for example, by a user command like REFRESH. Although I plan to implement both eventually, I'll start with "immediate" because it seems to need less code than "deferred". For implementing "deferred", a mechanism to maintain logs recording changes and an algorithm to compute the delta to be applied to materialized views are necessary.
I plan to implement the immediate maintenance using AFTER triggers created automatically on a materialized view's base tables. In an AFTER trigger using the transition table feature, changes that occur on base tables are recorded in ephemeral relations. We can compute the delta to be applied to materialized views by using these ephemeral relations and the view definition query, and then update the view by applying this delta. -- Yugo Nagata
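The "counting algorithm" mentioned above can be sketched roughly as follows. This is a hypothetical Python model of the idea (per-tuple multiplicities kept alongside the view), not the planned PostgreSQL implementation; the class and method names are made up:

```python
from collections import Counter

class CountingView:
    """Materialized view with a hidden multiplicity per tuple, so that
    deleting one of several duplicate source rows removes only one
    occurrence from the view instead of the whole tuple."""

    def __init__(self):
        self.rows = Counter()          # tuple -> multiplicity

    def apply_delta(self, inserted=(), deleted=()):
        for t in inserted:
            self.rows[t] += 1
        for t in deleted:
            self.rows[t] -= 1
            if self.rows[t] <= 0:      # drop tuples whose count reaches zero
                del self.rows[t]

    def contents(self):
        # expand multiplicities back into a sorted bag of tuples
        return sorted(self.rows.elements())

v = CountingView()
v.apply_delta(inserted=[(1, 'a'), (1, 'a'), (2, 'b')])
v.apply_delta(deleted=[(1, 'a')])      # removes only one duplicate
assert v.contents() == [(1, 'a'), (2, 'b')]
```

Without the multiplicity count, the view could not tell whether deleting (1, 'a') should remove the tuple entirely, which is why the no-duplicates assumption is needed until this algorithm is implemented.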
Re: Implementing Incremental View Maintenance
Hi, I've updated the wiki page of Incremental View Maintenance. https://wiki.postgresql.org/wiki/Incremental_View_Maintenance On Thu, 11 Jul 2019 13:28:04 +0900 Yugo Nagata wrote: > Hi Thomas, > > Thank you for your review and discussion on this patch! > > > > 2019年7月8日(月) 15:32 Thomas Munro : > > > > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata > > > > wrote: > > > > > Attached is a WIP patch of IVM which supports some aggregate > > > > > functions. > > > > > > > > Hi Nagata-san and Hoshiai-san, > > > > > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > > > added Kevin Grittner just in case he missed this thread; he has talked > > > > often about implementing the counting algorithm, and he wrote the > > > > "trigger transition tables" feature to support exactly this. While > > > > integrating trigger transition tables with the new partition features, > > > > we had to make a number of decisions about how that should work, and > > > > we tried to come up with answers that would work for IMV, and I hope > > > > we made the right choices! > > Transition tables is a great feature. I am now using this in my implementation > of IVM, but the first time I used this feature was when I implemented a PoC > for extending view updatability of PostgreSQL[1]. At that time, I didn't know > that this feature is made originally aiming to support IVM. > > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html > > I think transition tables is a good choice to implement a statement level > immediate view maintenance where materialized views are refreshed in a > statement > level after trigger. However, when implementing a transaction level immediate > view maintenance where views are refreshed per transaction, or deferred view > maintenance, we can't update views in a after trigger, and we will need an > infrastructure to manage change logs of base tables. 
Transition tables can be > used to collect these logs, but using logical decoding of WAL is another > candidate. > In any way, if these logs can be collected in a tuplestore, we might be able to > convert this to an "ephemeral named relation (ENR)" and use this to calculate > diff > sets for views. > > > > > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > > > > > > > Nagata-san has been studying this. Nagata-san, any comment? > > At the SERIALIZABLE or REPEATABLE READ level, table changes that occurred in other > transactions are not visible, so views cannot be maintained correctly in AFTER > triggers. If a view is defined on two tables and each table is modified in > different concurrent transactions respectively, the result of view maintenance > done in trigger functions can be incorrect due to the race condition. This is the > reason why such transactions are aborted immediately in that case in my > current > implementation. > > One idea to resolve this is performing view maintenance in two phases. > Firstly, > views are updated using only table changes visible in this transaction. Then, > just after this transaction is committed, views have to be updated > additionally > using changes that happened in other transactions to keep consistency. This is > just an > idea, but to implement this idea, I think, we will need to keep and > maintain change logs. > > > > A couple of superficial review comments: > > > > > > > +const char *aggname = get_func_name(aggref->aggfnoid); > > > > ... > > > > +else if (!strcmp(aggname, "sum")) > > > > > > > > I guess you need a more robust way to detect the supported aggregates > > > > than their name, or I guess some way for aggregates themselves to > > > > specify that they support this and somehow supply the extra logic. > > > > Perhaps I just said what Greg Stark already said, except not as well. > > Yes.
Using names is not robust because users can create same-name aggregates like > sum(text) (although I am not sure this makes sense). We can use OIDs instead > of names, but it would be nice to extend pg_aggregate and add new attributes > for indicating that an aggregate supports IVM and for providing functions for the IVM > logic. > > > > As for the question of how > > > > to reserve a namespace for system columns that won't clash with user > > > > columns, according to our manual the SQL stan
Re: Implementing Incremental View Maintenance
Hi, Attached is the latest patch for supporting the min and max aggregate functions. When new tuples are inserted into base tables, if the new values are smaller (for min) or larger (for max), the matview just has to be updated with these new values. Otherwise, the old values just remain. However, in the case of deletion, this is more complicated. If a deleted value exists in the matview as the current min or max, we have to recompute the new min or max values from the base tables for the affected groups, and the matview has to be updated with these recomputed values. Regression tests for min/max are also added. In addition, the incremental update algorithm for avg aggregate values is a bit improved. If an avg result in a materialized view is updated incrementally by using the old avg value, numerical errors in the avg values accumulate and the values eventually become wrong. To prevent this, both sum and count values are kept in the view as hidden columns and used to calculate the new avg value instead of using the old avg value. Regards, On Fri, 26 Jul 2019 11:35:53 +0900 Yugo Nagata wrote: > Hi, > > I've updated the wiki page of Incremental View Maintenance. > > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance > > On Thu, 11 Jul 2019 13:28:04 +0900 > Yugo Nagata wrote: > > > Hi Thomas, > > > > Thank you for your review and discussion on this patch! > > > > > > On Mon, 8 Jul 2019 at 15:32, Thomas Munro wrote: > > > > > > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata > > > > > wrote: > > > > > > Attached is a WIP patch of IVM which supports some aggregate > > > > > > functions. > > > > > > > > > > Hi Nagata-san and Hoshiai-san, > > > > > > > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > > > > added Kevin Grittner just in case he missed this thread; he has talked > > > > > often about implementing the counting algorithm, and he wrote the > > > > > "trigger transition tables" feature to support exactly this.
While > > > > > integrating trigger transition tables with the new partition features, > > > > > we had to make a number of decisions about how that should work, and > > > > > we tried to come up with answers that would work for IMV, and I hope > > > > > we made the right choices! > > > > Transition tables is a great feature. I am now using this in my > > implementation > > of IVM, but the first time I used this feature was when I implemented a PoC > > for extending view updatability of PostgreSQL[1]. At that time, I didn't > > know > > that this feature is made originally aiming to support IVM. > > > > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html > > > > I think transition tables is a good choice to implement a statement level > > immediate view maintenance where materialized views are refreshed in a > > statement > > level after trigger. However, when implementing a transaction level > > immediate > > view maintenance where views are refreshed per transaction, or deferred view > > maintenance, we can't update views in a after trigger, and we will need an > > infrastructure to manage change logs of base tables. Transition tables can > > be > > used to collect these logs, but using logical decoding of WAL is another > > candidate. > > In any way, if these logs can be collected in a tuplestore, we might able to > > convert this to "ephemeral named relation (ENR)" and use this to calculate > > diff > > sets for views. > > > > > > > > > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > > > > > > > > > > Nagata-san has been studying this. Nagata-san, any comment? > > > > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other > > ransactions are not visible, so views can not be maintained correctly in > > AFTER > > triggers. 
If a view is defined on two tables and each table is modified in > > different concurrent transactions respectively, the result of view > > maintenance > > done in trigger functions can be incorrect due to the race condition. This > > is the > > reason why such transactions are aborted immediately in that case in my > > current > > implementation. > > > > One idea to resolve this is performing view maintenance in two phases. > > Firstly, > > views are updated using only table changes visible in this transaction.
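The sum/count approach to avg described at the top of this mail can be sketched as follows. This is a hypothetical Python model, not the patch's SQL/C implementation; the class and attribute names are made up:

```python
class AvgColumn:
    """avg aggregate maintained incrementally via hidden sum and count
    columns; the visible avg is derived only when read."""

    def __init__(self):
        self.sum = 0.0     # hidden column 1
        self.count = 0     # hidden column 2

    def apply(self, inserted=(), deleted=()):
        for v in inserted:
            self.sum += v
            self.count += 1
        for v in deleted:
            self.sum -= v
            self.count -= 1

    @property
    def avg(self):
        return self.sum / self.count if self.count else None

a = AvgColumn()
a.apply(inserted=[10, 20, 30])
a.apply(deleted=[30])
assert a.avg == 15.0
```

Deriving avg from the hidden sum and count at read time avoids feeding a rounded avg back into the next update, which is exactly the error-accumulation problem described above.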
Re: [PATCH] GET DIAGNOSTICS FUNCTION_NAME
On Sun, 31 Dec 2017 11:57:02 -0500 Tom Lane wrote: > Yugo Nagata writes: > > Attached is a patch to implement a feature to get the current function > > name by GET DIAGNOSTICS in PL/pgSQL function. > > While this is certainly not a very large patch, it's still code that > we'd have to maintain forever, so I think it's appropriate to ask > some harder questions before accepting it. > > 1. I'm having a hard time visualizing an actual concrete use case for > this --- exactly when would a function not know its own name? Neither > your "our client wanted it" justification nor the cited stackoverflow > question provide anything close to an adequate rationale. I can think of > concrete uses for an operation like "give me the name of my immediate > caller", but that's not what this is. Our client's use case was mainly to output debug messages at the beginning and end of functions using the same code. In addition, the names of cursors declared in the function were based on the function name, and they wanted to get the function name to handle the cursors. However, I don't insist on this patch, so if no one else needs this feature, I'll withdraw it. Regards, > > 2. The specific semantics you've chosen --- in effect, regprocedureout > results --- seem to be more because that was already available than > anything else. I can imagine wanting just the bare name, or the > schema-qualified name, or even the numeric OID (if we're in the > business of introspection, being able to look up the function's own > pg_proc entry might be useful). I'm not proposing that we offer > all those variants, certainly, but without concrete use cases it's > pretty hard to be sure we picked the most useful behavior. > > 3. In connection with #2, I'm dubious that FUNCTION_NAME is le mot > juste, because that would seem to imply that it is just the name, > which it isn't. If we stick with the regprocedureout semantics > I'd be inclined to propose FUNCTION_SIGNATURE. > > regards, tom lane > -- Yugo Nagata
Re: [HACKERS] [PATCH] Lockable views
On Thu, 25 Jan 2018 20:51:41 +1300 Thomas Munro wrote: > On Sun, Dec 31, 2017 at 11:57 PM, Yugo Nagata wrote: > > On Fri, 29 Dec 2017 23:39:39 +0900 (JST) > > Tatsuo Ishii wrote: > >> Your addition to the doc: > >> + Automatically updatable views (see ) > >> + that do not have INSTEAD OF triggers or INSTEAD > >> + rules are also lockable. When a view is locked, its base relations are > >> + also locked recursively with the same lock mode. > > > > I added this point to the documentation. > > + Automatically updatable views (see ) > + that do not have INSTEAD OF triggers or INSTEAD Thank you for pointing this out. Also, I've noticed this patch cannot be applied to HEAD, so I'll fix both together and update the patch. > > The documentation doesn't build: you now need to say > instead of , and you need to say . > > -- > Thomas Munro > http://www.enterprisedb.com > -- Yugo Nagata
Re: [HACKERS] [PATCH] Lockable views
On Fri, 26 Jan 2018 21:30:49 +0900 Yugo Nagata wrote: > On Thu, 25 Jan 2018 20:51:41 +1300 > Thomas Munro wrote: > > > On Sun, Dec 31, 2017 at 11:57 PM, Yugo Nagata wrote: > > > On Fri, 29 Dec 2017 23:39:39 +0900 (JST) > > > Tatsuo Ishii wrote: > > >> Your addition to the doc: > > >> + Automatically updatable views (see ) > > >> + that do not have INSTEAD OF triggers or > > >> INSTEAD > > >> + rules are also lockable. When a view is locked, its base relations > > >> are > > >> + also locked recursively with the same lock mode. > > > > > > I added this point to the documentation. > > > > + Automatically updatable views (see ) > > + that do not have INSTEAD OF triggers or INSTEAD > > Thank you for pointing out this. > > Also, I've noticed this patch cannot be applied to the HEAD, so I'll fix > them together and update the patch. Attached is the updated patch v5 including fixing SGML and rebase to HEAD. Regards, > > > > > Tthe documentation doesn't build: you now need to say > > instead of , and you need to say . > > > > -- > > Thomas Munro > > http://www.enterprisedb.com > > > > > -- > Yugo Nagata > -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index b2c7203..6ddd128 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,6 +46,14 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] + Automatically updatable views (see ) + that do not have INSTEAD OF triggers or + INSTEAD rules are also lockable. When a view is + locked, its base relations are also locked recursively with the same + lock mode. Tables appearing in a subquery are ignored and not locked. + + + When acquiring locks automatically for commands that reference tables, PostgreSQL always uses the least restrictive lock mode possible. 
LOCK TABLE diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 6479dcb..7ae3a84 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -19,15 +19,20 @@ #include "commands/lockcmds.h" #include "miscadmin.h" #include "parser/parse_clause.h" +#include "parser/parsetree.h" #include "storage/lmgr.h" #include "utils/acl.h" #include "utils/lsyscache.h" #include "utils/syscache.h" +#include "rewrite/rewriteHandler.h" +#include "access/heapam.h" -static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait); -static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode); +static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid); +static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); +static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait); +static void LockViewCheck(Relation view, Query *viewquery); /* * LOCK TABLE @@ -62,8 +67,10 @@ LockTableCommand(LockStmt *lockstmt) RangeVarCallbackForLockTable, (void *) &lockstmt->mode); - if (recurse) - LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait); + if (get_rel_relkind(reloid) == RELKIND_VIEW) + LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait); + else if (recurse) + LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId()); } } @@ -86,15 +93,17 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, return; /* woops, concurrently dropped; no permissions * check */ - /* Currently, we only allow plain tables to be locked */ - if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) + + /* Currently, we only allow plain tables or auto-updatable views to be locked */ + if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE && + relkind != RELKIND_VIEW) ereport(ERROR, 
(errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("\"%s\" is not a table", + errmsg("\"%s\" is not a table or view", rv->relname))); /* Check permissions. */ - aclresult = LockTableAclCheck(relid, lockmode); + aclresult = LockTableAclCheck(relid, lockmode, GetUserId()); if (aclresult != ACLCHECK_OK) aclcheck_error(aclresult, get_relkind_objtype(get_rel_relkind(relid)), rv->relname); } @@ -107,7 +116,7 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, * multiply-inheriting children more than once, but that's no problem. */ static void -LockTableRecurse(Oid reloid
Re: [HACKERS] [PATCH] Lockable views
On Tue, 30 Jan 2018 13:58:30 +0900 (JST) Tatsuo Ishii wrote: > > Attached is the updated patch v5 including the SGML fixes and a rebase to HEAD. > > You need to DROP VIEW lock_view4 and lock_view5 in the regression > test as well. Thank you for reviewing the patch. I fixed this and attached an updated patch v6. Regards, > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index b2c7203..6ddd128 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,6 +46,14 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] + Automatically updatable views (see ) + that do not have INSTEAD OF triggers or + INSTEAD rules are also lockable. When a view is + locked, its base relations are also locked recursively with the same + lock mode. Tables appearing in a subquery are ignored and not locked. + + + When acquiring locks automatically for commands that reference tables, PostgreSQL always uses the least restrictive lock mode possible.
LOCK TABLE diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 6479dcb..7ae3a84 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -19,15 +19,20 @@ #include "commands/lockcmds.h" #include "miscadmin.h" #include "parser/parse_clause.h" +#include "parser/parsetree.h" #include "storage/lmgr.h" #include "utils/acl.h" #include "utils/lsyscache.h" #include "utils/syscache.h" +#include "rewrite/rewriteHandler.h" +#include "access/heapam.h" -static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait); -static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode); +static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid); +static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); +static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait); +static void LockViewCheck(Relation view, Query *viewquery); /* * LOCK TABLE @@ -62,8 +67,10 @@ LockTableCommand(LockStmt *lockstmt) RangeVarCallbackForLockTable, (void *) &lockstmt->mode); - if (recurse) - LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait); + if (get_rel_relkind(reloid) == RELKIND_VIEW) + LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait); + else if (recurse) + LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId()); } } @@ -86,15 +93,17 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, return; /* woops, concurrently dropped; no permissions * check */ - /* Currently, we only allow plain tables to be locked */ - if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) + + /* Currently, we only allow plain tables or auto-updatable views to be locked */ + if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE && + relkind != RELKIND_VIEW) ereport(ERROR, 
(errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("\"%s\" is not a table", + errmsg("\"%s\" is not a table or view", rv->relname))); /* Check permissions. */ - aclresult = LockTableAclCheck(relid, lockmode); + aclresult = LockTableAclCheck(relid, lockmode, GetUserId()); if (aclresult != ACLCHECK_OK) aclcheck_error(aclresult, get_relkind_objtype(get_rel_relkind(relid)), rv->relname); } @@ -107,7 +116,7 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, * multiply-inheriting children more than once, but that's no problem. */ static void -LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) +LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid) { List *children; ListCell *lc; @@ -120,7 +129,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) AclResult aclresult; /* Check permissions before acquiring the lock. */ - aclresult = LockTableAclCheck(childreloid, lockmode); + aclresult = LockTableAclCheck(childreloid, lockmode, userid); if (aclresult != ACLCHECK_OK) { char *relname = get_rel_name(childreloid); @@ -157,15 +166,82 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait) continue; } - LockTableRecurse(childreloid, lockmode, nowait); + LockTableRecurse(childreloid, lockmode, nowait, userid); } } /* + * Apply LOCK TABLE recursively over a view + */ +static void +LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait) +{ + Rela
Re: [HACKERS] [PATCH] Lockable views
On Tue, 30 Jan 2018 19:21:04 +1300 Thomas Munro wrote: > On Tue, Jan 30, 2018 at 6:48 PM, Tatsuo Ishii wrote: > >>> You need to DROP VIEW lock_view4 and lock_view5 in the regression > >>> test as well. > >> > >> Thank you for reviewing the patch. > >> > >> I fixed this and attached a updated patch v6. > > > > Looks good to me. If there's no objection, especially from Thomas > > Munro, I will mark this as "ready for committer". > > About the idea: it makes some kind of sense to me that we should lock > the underlying table, in all the same cases that you could do DML on > the view automatically. I wonder if this is a problem for the > soundness: "Tables appearing in a subquery are ignored and not > locked." I can imagine using this for making backwards-compatible > schema changes, in which case the LOCK-based transaction isolation > techniques might benefit from this behaviour. I'd be interested to > hear about the ideal use case you have in mind. I think the use case is much the same as that of auto-updatable views. There are several purposes for using views, for example: 1) insulating applications from schema changes, 2) protecting the underlying table from users without proper privileges, and 3) providing a shorthand for a query with a complex WHERE condition. When such views are updatable and users need transaction isolation while updating them, I think the lockable views feature is beneficial because users don't need to refer to the underlying table. Users might not know the underlying table, or might not even have the privilege to lock it. > About the patch: I didn't study it in detail. It builds, has > documentation and passes all tests. Would it be a good idea to add an > isolation test to show that the underlying relation is actually > locked? Whether the underlying relation is actually locked is confirmed in the regression test using pg_locks, so I don't believe we need to add an isolation test. 
> Typo: > > + /* Check permissions with the view owner's priviledge. */ > > s/priviledge/privilege/ > > Grammar: > > +/* > + * Check whether the view is lockable. > + * > + * Currently, only auto-updatable views can be locked, that is, > + * views whose definition are simple and that doesn't have > + * instead of rules or triggers are lockable. > > s/definition are simple and that doesn't/definition is simple and that don't/ > s/instead of/INSTEAD OF/ ? Thank you for pointing out these. I attached the fixed patch. Regards, > -- > Thomas Munro > http://www.enterprisedb.com -- Yugo Nagata diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml index b2c7203..6ddd128 100644 --- a/doc/src/sgml/ref/lock.sgml +++ b/doc/src/sgml/ref/lock.sgml @@ -46,6 +46,14 @@ LOCK [ TABLE ] [ ONLY ] name [ * ] + Automatically updatable views (see ) + that do not have INSTEAD OF triggers or + INSTEAD rules are also lockable. When a view is + locked, its base relations are also locked recursively with the same + lock mode. Tables appearing in a subquery are ignored and not locked. + + + When acquiring locks automatically for commands that reference tables, PostgreSQL always uses the least restrictive lock mode possible. 
LOCK TABLE diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c index 6479dcb..f6b5962 100644 --- a/src/backend/commands/lockcmds.c +++ b/src/backend/commands/lockcmds.c @@ -19,15 +19,20 @@ #include "commands/lockcmds.h" #include "miscadmin.h" #include "parser/parse_clause.h" +#include "parser/parsetree.h" #include "storage/lmgr.h" #include "utils/acl.h" #include "utils/lsyscache.h" #include "utils/syscache.h" +#include "rewrite/rewriteHandler.h" +#include "access/heapam.h" -static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait); -static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode); +static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid); +static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid); static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); +static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait); +static void LockViewCheck(Relation view, Query *viewquery); /* * LOCK TABLE @@ -62,8 +67,10 @@ LockTableCommand(LockStmt *lockstmt) RangeVarCallbackForLockTable, (void *) &lockstmt->mode); - if (recurse) - LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait); + if (get_rel_relkind(reloid) == RELKIND_VIEW) + Lock
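To illustrate the behavior the patch aims for, here is a hypothetical session (the object names are made up, not taken from the patch's regression tests): locking an auto-updatable view is expected to also lock its base relation with the same lock mode, which can be observed through pg_locks.

```sql
-- Hypothetical sketch of the intended lockable-views behavior.
CREATE TABLE base_tbl (i int);
CREATE VIEW updatable_view AS SELECT i FROM base_tbl WHERE i > 0;

BEGIN;
LOCK TABLE updatable_view IN ACCESS EXCLUSIVE MODE;

-- Both updatable_view and base_tbl are expected to appear here
-- with AccessExclusiveLock.
SELECT relation::regclass AS rel, mode
FROM pg_locks
WHERE locktype = 'relation' AND pid = pg_backend_pid();

COMMIT;
```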
CURRENT OF causes an error when IndexOnlyScan is used
Hi, I found that updating through a cursor using CURRENT OF causes the following error when the query is executed by IndexOnlyScan. ERROR: cannot extract system attribute from virtual tuple IndexOnlyScan returns a virtual tuple that doesn't have system columns, so we cannot get the ctid in the same way as with other plans. However, the error message is not informative, and users would not understand why the error occurs. Attached is a patch to fix this. With this fix, execCurrentOf gets the ctid from IndexScanDesc->xs_ctup.t_self when the plan is IndexOnlyScan, and the update works successfully without errors. Here is an example of the error: === postgres=# create table test (i int primary key); CREATE TABLE postgres=# insert into test values(1); INSERT 0 1 postgres=# set enable_seqscan to off; SET postgres=# explain select * from test where i = 1; QUERY PLAN --- Index Only Scan using test_pkey on test (cost=0.15..8.17 rows=1 width=4) Index Cond: (i = 1) (2 rows) postgres=# begin; BEGIN postgres=# declare c cursor for select * from test where i = 1; DECLARE CURSOR postgres=# fetch from c; i --- 1 (1 row) postgres=# update test set i=i+1 where current of c; ERROR: cannot extract system attribute from virtual tuple === The patch fixes the error and allows this update to succeed. 
Regards, -- Yugo Nagata diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c index ce7d4ac..b37cf3d 100644 --- a/src/backend/executor/execCurrent.c +++ b/src/backend/executor/execCurrent.c @@ -19,6 +19,7 @@ #include "utils/lsyscache.h" #include "utils/portal.h" #include "utils/rel.h" +#include "access/relscan.h" static char *fetch_cursor_param_value(ExprContext *econtext, int paramId); @@ -183,21 +184,35 @@ execCurrentOf(CurrentOfExpr *cexpr, if (TupIsNull(scanstate->ss_ScanTupleSlot)) return false; - /* Use slot_getattr to catch any possible mistakes */ - tuple_tableoid = - DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot, - TableOidAttributeNumber, - &lisnull)); - Assert(!lisnull); - tuple_tid = (ItemPointer) - DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot, - SelfItemPointerAttributeNumber, - &lisnull)); - Assert(!lisnull); - - Assert(tuple_tableoid == table_oid); - - *current_tid = *tuple_tid; + /* In IndexOnlyScan case, the tuple stored in ss_ScanTupleSlot is a + * virtual tuple that does not have ctid column, so we have to get TID + * from xs_ctup.t_self. */ + if (IsA(scanstate, IndexOnlyScanState)) + { + IndexScanDesc scan = ((IndexOnlyScanState *)scanstate)->ScanDesc; + + Assert(RelationGetRelid(scan.heapRelation) == table_oid); + + *current_tid = scan->xs_ctup.t_self; + } + else + { + /* Use slot_getattr to catch any possible mistakes */ + tuple_tableoid = +DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot, + TableOidAttributeNumber, + &lisnull)); + Assert(!lisnull); + tuple_tid = (ItemPointer) +DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot, + SelfItemPointerAttributeNumber, + &lisnull)); + Assert(!lisnull); + + Assert(tuple_tableoid == table_oid); + + *current_tid = *tuple_tid; + } return true; }
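With the fix applied, the session from the report above is expected to continue successfully; a sketch of the continuation (not an actual test from the patch):

```sql
-- Continuing the cursor session from the report: under the patch, the
-- ctid is taken from xs_ctup.t_self, so the update should now succeed.
UPDATE test SET i = i + 1 WHERE CURRENT OF c;
SELECT * FROM test;   -- expected to show i = 2
COMMIT;
```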
Re: CURRENT OF causes an error when IndexOnlyScan is used
On Thu, 1 Feb 2018 01:33:49 +0900 Yugo Nagata wrote: I'm sorry, the patch attached in the previous mail is broken and raises a compile error. I attached the fixed patch. Regards, > Hi, > > I found that updating a cursor by using CURRENT OF causes the > following error when the query is executed by IndexOnlyScan. > > ERROR: cannot extract system attribute from virtual tuple > > IndexOnlyScan returns a virtual tuple that doesn't have system > column, so we can not get ctid in the same way of other plans. > However, the error message is not convinient and users would > not understand why the error occurs. > > Attached is a patch to fix this. By this fix, execCurrentOf > get ctid from IndexScanDesc->xs_ctup.t_self when the plan is > IndexOnlyScan, and it works sucessfully without errors. > > > Here is the example of the error: > > === > postgres=# create table test (i int primary key); > CREATE TABLE > postgres=# insert into test values(1); > INSERT 0 1 > postgres=# set enable_seqscan to off; > SET > > postgres=# explain select * from test where i = 1; > QUERY PLAN > --- > Index Only Scan using test_pkey on test (cost=0.15..8.17 rows=1 width=4) >Index Cond: (i = 1) > (2 rows) > > postgres=# begin; > BEGIN > postgres=# declare c cursor for select * from test where i = 1; > DECLARE CURSOR > postgres=# fetch from c; > i > --- > 1 > (1 row) > > postgres=# update test set i=i+1 where current of c; > ERROR: cannot extract system attribute from virtual tuple > === > > The patch fixes the error and allows this update successfully. 
> > Regards, > > -- > Yugo Nagata -- Yugo Nagata diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c index ce7d4ac..aa2da16 100644 --- a/src/backend/executor/execCurrent.c +++ b/src/backend/executor/execCurrent.c @@ -19,6 +19,7 @@ #include "utils/lsyscache.h" #include "utils/portal.h" #include "utils/rel.h" +#include "access/relscan.h" static char *fetch_cursor_param_value(ExprContext *econtext, int paramId); @@ -183,21 +184,35 @@ execCurrentOf(CurrentOfExpr *cexpr, if (TupIsNull(scanstate->ss_ScanTupleSlot)) return false; - /* Use slot_getattr to catch any possible mistakes */ - tuple_tableoid = - DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot, - TableOidAttributeNumber, - &lisnull)); - Assert(!lisnull); - tuple_tid = (ItemPointer) - DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot, - SelfItemPointerAttributeNumber, - &lisnull)); - Assert(!lisnull); - - Assert(tuple_tableoid == table_oid); - - *current_tid = *tuple_tid; + /* In IndexOnlyScan case, the tuple stored in ss_ScanTupleSlot is a + * virtual tuple that does not have ctid column, so we have to get TID + * from xs_ctup.t_self. */ + if (IsA(scanstate, IndexOnlyScanState)) + { + IndexScanDesc scan = ((IndexOnlyScanState *)scanstate)->ioss_ScanDesc; + + Assert(RelationGetRelid(scan.heapRelation) == table_oid); + + *current_tid = scan->xs_ctup.t_self; + } + else + { + /* Use slot_getattr to catch any possible mistakes */ + tuple_tableoid = +DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot, + TableOidAttributeNumber, + &lisnull)); + Assert(!lisnull); + tuple_tid = (ItemPointer) +DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot, + SelfItemPointerAttributeNumber, + &lisnull)); + Assert(!lisnull); + + Assert(tuple_tableoid == table_oid); + + *current_tid = *tuple_tid; + } return true; }
Re: [HACKERS] [PATCH] Lockable views
On Thu, 01 Feb 2018 09:48:49 +0900 (JST) Tatsuo Ishii wrote: > > On Tue, Jan 30, 2018 at 6:48 PM, Tatsuo Ishii wrote: > >> Looks good to me. If there's no objection, especially from Thomas > >> Munro, I will mark this as "ready for committer". > > > > No objection from me. > > I marked this as "Ready for Committer". Thank you for reviewing the patch! Regards, > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata
Re: CURRENT OF causes an error when IndexOnlyScan is used
On Wed, 31 Jan 2018 21:12:51 -0500 Tom Lane wrote: > Yugo Nagata writes: > > I'm sorry the patch attached in the previous mail is broken and > > not raises a compile error. I attached the fixed patch. > > This patch is almost certainly wrong: you can't assume that the scan-level > state matches the tuple we are currently processing at top level. Any > sort of delaying action, for instance a sort or materialize node in > between, would break it. In execCurrentOf(), when FOR UPDATE is not used, search_plan_tree() searches the PlanState tree for a scan node, and if a sort or materialize node (for example) is found, it fails with the following error: ERROR: cursor xxx is not a simply updatable scan of table yyy So, I think the problem you are concerned about would not occur with this patch, just as it does not with the original code. However, I may be missing something; if so, could you explain further? > > We need to either fix this aspect: > > >> IndexOnlyScan returns a virtual tuple that doesn't have system > >> column, so we can not get ctid in the same way of other plans. > > or else disallow using IndexOnlyScan when the ctid is needed. CURRENT OF is used after the scan is executed and a tuple is fetched, so we can't know in advance whether the ctid is needed in this case. We could raise a more helpful error message when CURRENT OF is used with an IndexOnlyScan plan, though. Regards, > > regards, tom lane -- Yugo Nagata
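The search_plan_tree() rejection described above can be sketched with a cursor whose plan is not a simple scan (a hypothetical session, not taken from the thread):

```sql
BEGIN;
-- Ordering by an expression forces a sort node above the scan, so the
-- cursor is no longer a simply updatable scan and CURRENT OF is rejected.
DECLARE c2 CURSOR FOR SELECT * FROM test ORDER BY i + 1;
FETCH FROM c2;
UPDATE test SET i = i + 1 WHERE CURRENT OF c2;
-- ERROR:  cursor "c2" is not a simply updatable scan of table "test"
ROLLBACK;
```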
Re: Implementing Incremental View Maintenance
Hi, Attached is the latest patch (v8) to add support for Incremental View Maintenance (IVM). This patch adds OUTER join support in addition to the patch (v7) submitted last week in the following post. On Fri, 22 Nov 2019 15:29:45 +0900 (JST) Tatsuo Ishii wrote: > Up to now, IVM supports materialized views using: > > - Inner joins > - Some aggregate functions (count, sum, min, max, avg) > - GROUP BY > - Self joins > > With the latest patch now IVM supports subqueries in addition to > above. > > Known limitations are listed here: > > https://github.com/sraoss/pgsql-ivm/issues > > See more details at: > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance * About outer join support: In the outer-join case, when a table is modified, in addition to the deltas that occur in the inner-join case, we also need to delete or insert dangling tuples, that is, null-extended tuples generated when a join condition isn't met. [Example] - -- Create base tables and an outer join view CREATE TABLE r(i int); CREATE TABLE s(j int); INSERT INTO r VALUES (1); CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT * FROM r LEFT JOIN s ON r.i=s.j; SELECT * FROM mv; i | j ---+--- (1 row) -- After an insertion to a base table ... INSERT INTO s VALUES (1); -- (1,1) is inserted and (1,null) is deleted from the view. SELECT * FROM mv; i | j ---+--- 1 | 1 (1 row) - Our implementation is basically based on the algorithm of Larson & Zhou (2007) [1]. Before view maintenance, the view definition query's jointree is analysed to build a "view maintenance graph". This graph represents which tuples in the view are affected when a base table is modified. Specifically, tuples which are not null-extended on the modified table (that is, tuples generated by joins with the modified table) are directly affected. The delta of such effects is calculated similarly to the inner-join case. On the other hand, dangling tuples generated by anti-joins with directly affected tuples can be indirectly affected. 
This means that we may need to delete dangling tuples when tuples are inserted into a table, as well as insert dangling tuples when tuples are deleted from a table. [1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007) https://ieeexplore.ieee.org/document/4221654 Although the original paper assumes that every base table and view has a unique key and tuple duplicates are disallowed, we allow duplicates. If a view has tuple duplicates, we have to determine the number of each dangling tuple to be inserted into the view when tuples in a table are deleted. For this purpose, we count the number of each tuple that constitutes a deleted tuple. These counts are stored as a JSONB object in the delta table, and we use this information to maintain outer-join views. Also, we support outer self-joins, which are not assumed in the original paper. * Restrictions Currently, we have the following restrictions: - outer join view's targetlist must contain attributes used in join conditions - outer join view's targetlist cannot contain non-strict functions - outer join supports only simple equijoin - outer join view's WHERE clause cannot contain non null-rejecting predicates - aggregate is not supported with outer join - subquery (including EXISTS) is not supported with outer join Regression tests for all patterns of 3-way outer joins are added. I also reordered IVM-related functions in matview.c so that related functions are located close together, and added more comments. Regards, Yugo Nagata -- Yugo Nagata IVM_v8.patch.gz Description: application/gzip
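Continuing the LEFT JOIN example from above, the delete direction and the duplicate handling can be sketched as follows (the expected results are based on the described algorithm, not verbatim output from the patch):

```sql
-- Deleting the only matching row in s is expected to remove (1,1)
-- and re-insert the dangling tuple (1,NULL) into the view.
DELETE FROM s WHERE j = 1;
SELECT * FROM mv;    -- expected: one row (1, NULL)

-- With duplicates, the per-tuple counts stored as JSONB in the delta
-- table determine the multiplicity of dangling tuples:
INSERT INTO r VALUES (1);    -- r now holds two copies of (1)
INSERT INTO s VALUES (1);    -- view: (1,1) with multiplicity two
DELETE FROM s WHERE j = 1;   -- view: (1,NULL) with multiplicity two
```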
Re: Implementing Incremental View Maintenance
On Thu, 28 Nov 2019 11:26:40 +0900 (JST) Tatsuo Ishii wrote: > > Note that this is the last patch in the series of IVM patches: now we > > would like focus on blushing up the patches, rather than adding new > > SQL support to IVM, so that the patch is merged into PostgreSQL 13 > > (hopefully). We are very welcome reviews, comments on the patch. > > > > BTW, the SGML docs in the patch is very poor at this point. I am going > > to add more descriptions to the doc. > > As promised, I have created the doc (CREATE MATERIALIZED VIEW manual) > patch. - because the triggers will be invoked. + because the triggers will be invoked. We call this form of materialized + view as "Incremantal materialized View Maintenance" (IVM). This part seems incorrect to me. Incremental (materialized) View Maintenance (IVM) is a way to maintain materialized views and is not a term that refers to the views being maintained. However, it would be useful to have a term for views that can be maintained using IVM. Off the top of my head, we could call these Incrementally Maintainable Views (IMVs), but this might be confusable with IVM, so I'll think about it a little more. Regards, Yugo Nagata > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata
Re: Implementing Incremental View Maintenance
On Fri, 29 Nov 2019 09:50:49 +0900 (JST) Tatsuo Ishii wrote: > > Hi, > > > > Attached is the latest patch (v8) to add support for Incremental View > > Maintenance (IVM). This patch adds OUTER join support in addition > > to the patch (v7) submitted last week in the following post. > > There's a compiler warning: > > matview.c: In function ‘getRteListCell’: > matview.c:2685:9: warning: ‘rte_lc’ may be used uninitialized in this > function [-Wmaybe-uninitialized] > return rte_lc; > ^~ Thanks! I'll fix this. > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp -- Yugo Nagata