Hi, I've taken a quick look at this on the way back from pgconf.eu, and it seems like a nice COPY improvement in a fairly good shape.
Firstly, I think it's a valuable because it allows efficiently importing a subset of data. Currently, we either have to create an intermediate table, copy all the data into that, do the filtering, and then insert the subset into the actual target table. Another common approach that I see in practice is using file_fdw to do this, but it's not particularly straight-forward (having to create the FDW servers etc. first) not efficient (particularly for large amounts of data). This feature allows skipping this extra step (at least in simpler ETL cases). So, I like the idea and I think it makes sense. A couple of comments regarding the code/docs: 1) I think this deserves at least some regression tests. Plenty of tests already use COPY, but there's no coverage for the new piece. So let's add a new test suite, or maybe add a couple of tests into copy2.sql. 2) In copy.sqml, the new item is defined like this <term><literal>WHEN Clause</literal></term> I suggest we use just <term><literal>WHEN</literal></term>, that's what the other items do (see ENCODING). The other thing is that this does not say what expressions are allowed in the WHEN clause. It seems pretty close to WHEN clause for triggers, which e.g. mentions that subselects are not allowed. I'm pretty sure that's true here too, because it fails like this (118 = T_SubLink): test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10); ERROR: unrecognized node type: 118 So, the patch needs to detect this, produce a reasonable error message and document the limitations in copy.sqml, just like we do for CREATE TRIGGER. 3) For COPY TO, the WHEN clause is accepted but ignored, leading to confusing cases like this: test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10); COPY 151690 So, it contains subselect, but unlike COPY FROM it does not fail (because we never execute it). The fun part is that the expression is logically false, so a user might expect it to filter rows, yet we copy everything. IMHO we need to either error-out in these cases, complaining about WHEN not being supported for COPY TO, or make it work (effectively treating it as a simpler alternative to COPY (subselect) TO). 4) There are some minor code style issues in copy.c - the variable is misnamed as when_cluase, there are no spaces after 'if' etc. See the attached patch fixing this. AFAICS we could just get rid of the extra when_cluase variable and mess with the cstate->whenClause directly, depending on how (3) gets fixed. 5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it requires it to be 'WHEN (expr)'. I suggest we do the same thing here, requiring the parentheses. 6) The skip logic in CopyFrom() seems to be slightly wrong. It does work, but the next_record label is defined after CHECK_FOR_INTERRUPTS() so a COPY will not respond to Ctrl-C unless it finds a row matching the WHEN condition. If you have a highly selective condition, that's a bit inconvenient. It also skips MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); so I wonder what the heap_form_tuple() right after the next_record label will use for tuples right after a skipped one. I'd bet it'll use the oldcontext (essentially the long-lived context), essentially making it a memory leak. So I suggest to get rid of the next_record label, and use 'continue' instead of the 'goto next_record'. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From fcf4b43dae7062f4786536fb262a665ad0b05b26 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <to...@2ndquadrant.com> Date: Sun, 28 Oct 2018 16:09:37 +0100 Subject: [PATCH 2/2] review --- doc/src/sgml/ref/copy.sgml | 2 +- src/backend/commands/copy.c | 22 +++++++++++----------- src/backend/parser/gram.y | 2 +- src/include/nodes/parsenodes.h | 2 +- 4 files changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 8088e779b6..17bedf95cc 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -366,7 +366,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </varlistentry> <varlistentry> - <term><literal>WHEN Clause</literal></term> + <term><literal>WHEN</literal></term> <listitem> <para> The optional <literal>WHEN</literal> clause has the general form diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 192a1c576b..8e87605135 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -803,7 +803,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, Relation rel; Oid relid; RawStmt *query = NULL; - Node *when_cluase= NULL; + Node *whenClause = NULL; /* * Disallow COPY to/from file or program except to users with the @@ -857,19 +857,19 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, NULL, false, false); rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT); - if(stmt->whenClause) + if (stmt->whenClause) { /* add rte to column namespace */ addRTEtoQuery(pstate, rte, false, false, true); /* Transform the raw expression tree */ - when_cluase = transformExpr(pstate, stmt->whenClause, EXPR_KIND_OTHER); + whenClause = transformExpr(pstate, stmt->whenClause, EXPR_KIND_OTHER); /* Make sure it yields a boolean result. */ - when_cluase = coerce_to_boolean(pstate, when_cluase, "WHEN"); + whenClause = coerce_to_boolean(pstate, whenClause, "WHEN"); - when_cluase = (Node *) canonicalize_qual((Expr *) when_cluase, false); - when_cluase = (Node *) make_ands_implicit((Expr *) when_cluase); + whenClause = (Node *) canonicalize_qual((Expr *) whenClause, false); + whenClause = (Node *) make_ands_implicit((Expr *) whenClause); } tupDesc = RelationGetDescr(rel); @@ -1020,7 +1020,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program, NULL, stmt->attlist, stmt->options); - cstate->whenClause=when_cluase; + cstate->whenClause = whenClause; *processed = CopyFrom(cstate); /* copy from file to database */ EndCopyFrom(cstate); } @@ -2559,9 +2559,9 @@ CopyFrom(CopyState cstate) ExecSetupChildParentMapForLeaf(proute); } - if(cstate->whenClause ) + if (cstate->whenClause) cstate->qualexpr = ExecInitQual(castNode(List, cstate->whenClause), - &mtstate->ps); + &mtstate->ps); /* * It's more efficient to prepare a bunch of tuples for insertion, and @@ -2717,11 +2717,11 @@ next_record: slot = myslot; ExecStoreHeapTuple(tuple, slot, false); - if(cstate->whenClause) + if (cstate->whenClause) { econtext->ecxt_scantuple = myslot; if (!ExecQual(cstate->qualexpr, econtext)) - goto next_record; + goto next_record; } /* Determine the partition to heap_insert the tuple into */ diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 0f316846b6..6063644850 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -3180,7 +3180,7 @@ copy_generic_opt_arg_list: ; opt_when_clause: - WHEN a_expr { $$ = $2; } + WHEN '(' a_expr ')' { $$ = $3; } | /*EMPTY*/ { $$ = NULL; } ; diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 4d3a97f9bc..d9d41b5119 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1972,7 +1972,7 @@ typedef struct CopyStmt bool is_program; /* is 'filename' a program to popen? */ char *filename; /* filename, or NULL for STDIN/STDOUT */ List *options; /* List of DefElem nodes */ - Node *whenClause; /* qualifications */ + Node *whenClause; /* WHEN condition (or NULL) */ } CopyStmt; /* ---------------------- -- 2.13.6
>From 3f87624019d0506920cee98dc5cba2e447b3c488 Mon Sep 17 00:00:00 2001 From: Tomas Vondra <to...@2ndquadrant.com> Date: Sun, 28 Oct 2018 15:04:33 +0100 Subject: [PATCH 1/2] rebased patch --- doc/src/sgml/ref/copy.sgml | 19 ++++++++++++++++++ src/backend/commands/copy.c | 35 +++++++++++++++++++++++++++++++++ src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c | 1 + src/backend/parser/gram.y | 9 ++++++++- src/include/nodes/parsenodes.h | 1 + src/interfaces/ecpg/preproc/ecpg.addons | 2 +- 7 files changed, 66 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 13a8b68d95..8088e779b6 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -25,6 +25,7 @@ PostgreSQL documentation COPY <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] FROM { '<replaceable class="parameter">filename</replaceable>' | PROGRAM '<replaceable class="parameter">command</replaceable>' | STDIN } [ [ WITH ] ( <replaceable class="parameter">option</replaceable> [, ...] ) ] + [ WHEN <replaceable class="parameter">condition</replaceable> ] COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] | ( <replaceable class="parameter">query</replaceable> ) } TO { '<replaceable class="parameter">filename</replaceable>' | PROGRAM '<replaceable class="parameter">command</replaceable>' | STDOUT } @@ -364,6 +365,24 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </listitem> </varlistentry> + <varlistentry> + <term><literal>WHEN Clause</literal></term> + <listitem> + <para> + The optional <literal>WHEN</literal> clause has the general form +<synopsis> +WHEN <replaceable class="parameter">condition</replaceable> +</synopsis> + where <replaceable class="parameter">condition</replaceable> is + any expression that evaluates to a result of type + <type>boolean</type>. Any row that does not satisfy this + condition will not be inserted to the table. A row satisfies the + condition if it returns true when the actual row values are + substituted for any variable references. + </para> + </listitem> + </varlistentry> + </variablelist> </refsect1> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index b58a74f4e3..192a1c576b 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -39,7 +39,10 @@ #include "miscadmin.h" #include "optimizer/clauses.h" #include "optimizer/planner.h" +#include "optimizer/prep.h" #include "nodes/makefuncs.h" +#include "parser/parse_coerce.h" +#include "parser/parse_expr.h" #include "parser/parse_relation.h" #include "port/pg_bswap.h" #include "rewrite/rewriteHandler.h" @@ -148,6 +151,7 @@ typedef struct CopyStateData bool convert_selectively; /* do selective binary conversion? */ List *convert_select; /* list of column names (can be NIL) */ bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ + Node *whenClause; /* qualifications */ /* these are just for error messages, see CopyFromErrorCallback */ const char *cur_relname; /* table name for error messages */ @@ -179,6 +183,7 @@ typedef struct CopyStateData ExprState **defexprs; /* array of default att expressions */ bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; + ExprState *qualexpr; TransitionCaptureState *transition_capture; @@ -798,6 +803,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, Relation rel; Oid relid; RawStmt *query = NULL; + Node *when_cluase= NULL; /* * Disallow COPY to/from file or program except to users with the @@ -851,6 +857,21 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, NULL, false, false); rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT); + if(stmt->whenClause) + { + /* add rte to column namespace */ + addRTEtoQuery(pstate, rte, false, false, true); + + /* Transform the raw expression tree */ + when_cluase = transformExpr(pstate, stmt->whenClause, EXPR_KIND_OTHER); + + /* Make sure it yields a boolean result. */ + when_cluase = coerce_to_boolean(pstate, when_cluase, "WHEN"); + + when_cluase = (Node *) canonicalize_qual((Expr *) when_cluase, false); + when_cluase = (Node *) make_ands_implicit((Expr *) when_cluase); + } + tupDesc = RelationGetDescr(rel); attnums = CopyGetAttnums(tupDesc, rel, stmt->attlist); foreach(cur, attnums) @@ -999,6 +1020,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program, NULL, stmt->attlist, stmt->options); + cstate->whenClause=when_cluase; *processed = CopyFrom(cstate); /* copy from file to database */ EndCopyFrom(cstate); } @@ -2537,6 +2559,10 @@ CopyFrom(CopyState cstate) ExecSetupChildParentMapForLeaf(proute); } + if(cstate->whenClause ) + cstate->qualexpr = ExecInitQual(castNode(List, cstate->whenClause), + &mtstate->ps); + /* * It's more efficient to prepare a bunch of tuples for insertion, and * insert them in one heap_multi_insert() call, than call heap_insert() @@ -2667,6 +2693,8 @@ CopyFrom(CopyState cstate) /* Switch into its memory context */ MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); +next_record: + if (!NextCopyFrom(cstate, econtext, values, nulls, &loaded_oid)) break; @@ -2689,6 +2717,13 @@ CopyFrom(CopyState cstate) slot = myslot; ExecStoreHeapTuple(tuple, slot, false); + if(cstate->whenClause) + { + econtext->ecxt_scantuple = myslot; + if (!ExecQual(cstate->qualexpr, econtext)) + goto next_record; + } + /* Determine the partition to heap_insert the tuple into */ if (proute) { diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index e8ea59e34a..6fa13de960 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -3310,6 +3310,7 @@ _copyCopyStmt(const CopyStmt *from) COPY_SCALAR_FIELD(is_program); COPY_STRING_FIELD(filename); COPY_NODE_FIELD(options); + COPY_NODE_FIELD(whenClause); return newnode; } diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 3bb91c9595..18456ffdf4 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -1221,6 +1221,7 @@ _equalCopyStmt(const CopyStmt *a, const CopyStmt *b) COMPARE_SCALAR_FIELD(is_program); COMPARE_STRING_FIELD(filename); COMPARE_NODE_FIELD(options); + COMPARE_NODE_FIELD(whenClause); return true; } diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 6d23bfb0b3..0f316846b6 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -509,6 +509,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <defelt> copy_generic_opt_elem %type <list> copy_generic_opt_list copy_generic_opt_arg_list %type <list> copy_options +%type <node> opt_when_clause %type <typnam> Typename SimpleTypename ConstTypename GenericType Numeric opt_float @@ -2969,7 +2970,7 @@ ClosePortalStmt: *****************************************************************************/ CopyStmt: COPY opt_binary qualified_name opt_column_list opt_oids - copy_from opt_program copy_file_name copy_delimiter opt_with copy_options + copy_from opt_program copy_file_name copy_delimiter opt_with copy_options opt_when_clause { CopyStmt *n = makeNode(CopyStmt); n->relation = $3; @@ -2978,6 +2979,7 @@ CopyStmt: COPY opt_binary qualified_name opt_column_list opt_oids n->is_from = $6; n->is_program = $7; n->filename = $8; + n->whenClause = $12; if (n->is_program && n->filename == NULL) ereport(ERROR, @@ -3177,6 +3179,11 @@ copy_generic_opt_arg_list: } ; +opt_when_clause: + WHEN a_expr { $$ = $2; } + | /*EMPTY*/ { $$ = NULL; } + ; + /* beware of emitting non-string list elements here; see commands/define.c */ copy_generic_opt_arg_list_item: opt_boolean_or_string { $$ = (Node *) makeString($1); } diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index aa4a0dba2a..4d3a97f9bc 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1972,6 +1972,7 @@ typedef struct CopyStmt bool is_program; /* is 'filename' a program to popen? */ char *filename; /* filename, or NULL for STDIN/STDOUT */ List *options; /* List of DefElem nodes */ + Node *whenClause; /* qualifications */ } CopyStmt; /* ---------------------- diff --git a/src/interfaces/ecpg/preproc/ecpg.addons b/src/interfaces/ecpg/preproc/ecpg.addons index ca3efadc48..707d47b385 100644 --- a/src/interfaces/ecpg/preproc/ecpg.addons +++ b/src/interfaces/ecpg/preproc/ecpg.addons @@ -192,7 +192,7 @@ ECPG: where_or_current_clauseWHERECURRENT_POFcursor_name block char *cursor_marker = $4[0] == ':' ? mm_strdup("$0") : $4; $$ = cat_str(2,mm_strdup("where current of"), cursor_marker); } -ECPG: CopyStmtCOPYopt_binaryqualified_nameopt_column_listopt_oidscopy_fromopt_programcopy_file_namecopy_delimiteropt_withcopy_options addon +ECPG: CopyStmtCOPYopt_binaryqualified_nameopt_column_listopt_oidscopy_fromopt_programcopy_file_namecopy_delimiteropt_withcopy_optionsopt_when_clause addon if (strcmp($6, "from") == 0 && (strcmp($7, "stdin") == 0 || strcmp($7, "stdout") == 0)) mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not implemented"); -- 2.13.6