Hi, On 2024-02-06 15:11:05 +0900, Michael Paquier wrote: > On Mon, Feb 05, 2024 at 09:46:42PM -0800, Andres Freund wrote: > > No - what I mean is that it doesn't make sense to have copy_attribute_out(), > > as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it > > can directly call the right function. That does require splitting a bit more > > between csv and text output, but I think that can be done without much > > duplication. > > I am not sure to understand here. In what is that different from > reverting 2889fd23be56 then mark CopyAttributeOutCSV and > CopyAttributeOutText as static inline?
Well, you can't just do that, because there's only one caller, namely CopyToTextOneRow(). What I am trying to suggest is something like the attached, just a quick hacky POC. Namely to split out CSV support from CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code duplication by moving the code into a new CopyToTextLikeOneRow(). I named it CopyToTextLike* here, because it seems confusing that some Text* are used for both CSV and text and others are actually just for text. But if were to go for that, we should go further. To test the performnce effects I chose to remove the pointless encoding "check" we're discussing in the other thread, as it makes it harder to see the time differences due to the per-attribute code. I did three runs of pgbench -t of [1] and chose the fastest result for each. With turbo mode and power saving disabled: Avg Time HEAD 995.349 Remove Encoding Check 870.793 v13-0001 869.678 Remove out callback 839.508 Greetings, Andres Freund [1] COPY (SELECT 1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2, generate_series(1, 1000000::int4)) TO '/dev/null';
>From 28be7510a62984eb09b41aabe8c345a07b2b0e97 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Tue, 6 Feb 2024 14:57:20 -0800 Subject: [PATCH v13a 1/3] WIP: COPY TO: remove unnecessary and ineffective encoding "checks" Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- src/backend/commands/copyto.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 52b42f8a522..ea317b5b3d5 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -637,9 +637,10 @@ BeginCopyTo(ParseState *pstate, * are the same, we must apply pg_any_to_server() to validate data in * multibyte encodings. */ - cstate->need_transcoding = - (cstate->file_encoding != GetDatabaseEncoding() || - pg_database_encoding_max_length() > 1); + cstate->need_transcoding = ( + cstate->file_encoding != GetDatabaseEncoding() + /* || pg_database_encoding_max_length() > 1 */ + ); /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); -- 2.38.0
>From 176442831183baa7648fee87ab4aa016d9c4d8e4 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Tue, 6 Feb 2024 08:40:17 +0900 Subject: [PATCH v13a 2/3] Extract COPY FROM/TO format implementations This doesn't change the current behavior. This just introduces a set of copy routines called CopyFromRoutine and CopyToRoutine, which just has function pointers of format implementation like TupleTableSlotOps, and use it for existing "text", "csv" and "binary" format implementations. There are plans to extend that more in the future with custom formats. This improves performance when a relation has many attributes as this eliminates format-based if branches. The more the attributes, the better the performance gain. Blah add some numbers. --- src/include/commands/copyapi.h | 100 ++++++ src/include/commands/copyfrom_internal.h | 10 + src/backend/commands/copyfrom.c | 209 ++++++++--- src/backend/commands/copyfromparse.c | 362 ++++++++++--------- src/backend/commands/copyto.c | 438 +++++++++++++++-------- src/tools/pgindent/typedefs.list | 2 + 6 files changed, 770 insertions(+), 351 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 00000000000..87e0629c2f7 --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,100 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO/FROM handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* These are private in commands/copy[from|to].c */ +typedef struct CopyFromStateData *CopyFromState; +typedef struct CopyToStateData *CopyToState; + +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Called when COPY FROM is started to set up the input functions + * associated to the relation's attributes writing to. `finfo` can be + * optionally filled to provide the catalog information of the input + * function. `typioparam` can be optionally filled to define the OID of + * the type to pass to the input function. `atttypid` is the OID of data + * type used by the relation's attribute. + */ + void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); + + /* + * Called when COPY FROM is started. + * + * `tupDesc` is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to copy. + */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* Called when COPY FROM has ended. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Called when COPY TO is started to set up the output functions + * associated to the relation's attributes reading from. `finfo` can be + * optionally filled. `atttypid` is the OID of data type used by the + * relation's attribute. + */ + void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, + FmgrInfo *finfo); + + /* + * Called when COPY TO is started. + * + * `tupDesc` is the tuple descriptor of the relation from where the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Copy one row for COPY TO. + * + * `slot` is the tuple slot where the data is emitted. + */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO has ended */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 759f8e3d090..5fb52dc6298 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,6 +15,7 @@ #define COPYFROM_INTERNAL_H #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -65,6 +66,9 @@ typedef int (*CopyReadAttributes) (CopyFromState cstate); */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ @@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate); extern int CopyReadAttributesCSV(CopyFromState cstate); extern int CopyReadAttributesText(CopyFromState cstate); +/* Callbacks for CopyFromRoutine->CopyFromOneRow */ +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 41f6bc43e49..a90b7189b5d 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -108,6 +108,160 @@ static char *limit_printout_length(const char *str); static void ClosePipeFromProgram(CopyFromState cstate); + +/* + * CopyFromRoutine implementations for text and CSV. + */ + +/* + * CopyFromTextInFunc + * + * Assign input function data for a relation's attribute in text/CSV format. + */ +static void +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromTextStart + * + * Start of COPY FROM for text/CSV format. + */ +static void +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* create workspace for CopyReadAttributes results */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); + + /* Set read attribute callback */ + if (cstate->opts.csv_mode) + cstate->copy_read_attributes = CopyReadAttributesCSV; + else + cstate->copy_read_attributes = CopyReadAttributesText; +} + +/* + * CopyFromTextEnd + * + * End of COPY FROM for text/CSV format. + */ +static void +CopyFromTextEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * CopyFromRoutine implementation for "binary". + */ + +/* + * CopyFromBinaryInFunc + * + * Assign input function data for a relation's attribute in binary format. + */ +static void +CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeBinaryInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromBinaryStart + * + * Start of COPY FROM for binary format. + */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +/* + * CopyFromBinaryEnd + * + * End of COPY FROM for binary format. + */ +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * Routines assigned to each format. ++ + * CSV and text share the same implementation, at the exception of the + * copy_read_attributes callback. + */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextInFunc, + .CopyFromStart = CopyFromTextStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextEnd, +}; + +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextInFunc, + .CopyFromStart = CopyFromTextStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextEnd, +}; + +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* + * Define the COPY FROM routines to use for a format. + */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; + + /* default is text */ + return &CopyFromRoutineText; +} + + /* * error context callback for COPY FROM * @@ -1386,7 +1540,6 @@ BeginCopyFrom(ParseState *pstate, num_defaults; FmgrInfo *in_functions; Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1418,6 +1571,9 @@ BeginCopyFrom(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyFromGetRoutine(cstate->opts); + /* Process the target relation */ cstate->rel = rel; @@ -1571,25 +1727,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1622,13 +1759,9 @@ BeginCopyFrom(ParseState *pstate, continue; /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1763,25 +1896,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - - if (cstate->opts.csv_mode) - cstate->copy_read_attributes = CopyReadAttributesCSV; - else - cstate->copy_read_attributes = CopyReadAttributesText; - } + cstate->routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1794,6 +1909,8 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. */ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 906756362e9..c45f9ae134e 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -832,6 +832,200 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +/* + * CopyFromTextOneRow + * + * Copy one row to a set of `values` and `nulls` for the text and CSV + * formats. + */ +bool +CopyFromTextOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. */ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + if (cstate->opts.csv_mode) + { + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert + * it to the NULL string. + */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL + * string. It must have been quoted, or otherwise the string + * would already have been set to NULL. Convert it to NULL as + * specified. + */ + string = NULL; + } + } + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. + */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + cstate->num_errors++; + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + +/* + * CopyFromBinaryOneRow + * + * Copy one row to a set of `values` and `nulls` for the binary format. + */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. + */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * @@ -841,7 +1035,8 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) * read from the file, and DEFAULT option is unset. * * 'values' and 'nulls' arrays must be the same length as columns of the - * relation passed to BeginCopyFrom. This function fills the arrays. + * relation passed to BeginCopyFrom. This function calls the format routine + * that should fill the arrays. */ bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, @@ -849,181 +1044,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. - */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - cstate->num_errors++; - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. - */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, + nulls)) + return false; /* * Now compute and insert any defaults available for the columns not diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index ea317b5b3d5..8923627d9ad 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -24,6 +24,7 @@ #include "access/xact.h" #include "access/xlog.h" #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -79,6 +80,9 @@ typedef void (*CopyAttributeOut) (CopyToState cstate, const char *string, */ typedef struct CopyToStateData { + /* format routine */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -143,6 +147,291 @@ static void CopySendEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * CopyToRoutine implementations. + */ + +/* + * CopyToTextSendEndOfRow + * + * Apply line terminations for a line sent in text or CSV format depending + * on the destination, then send the end of a row. + */ +static inline void +CopyToTextSendEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + +/* + * CopyToTextStart + * + * Start of COPY TO for text and CSV format. + */ +static void +CopyToTextStart(CopyToState cstate, TupleDesc tupDesc) +{ + ListCell *cur; + + /* Set output representation callback */ + if (cstate->opts.csv_mode) + cstate->copy_attribute_out = CopyAttributeOutCSV; + else + cstate->copy_attribute_out = CopyAttributeOutText; + + /* + * For non-binary copy, we need to convert null_print to file encoding, + * because it will be sent directly with CopySendString. + */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + /* Ignore quotes */ + cstate->copy_attribute_out(cstate, colname, false); + } + + CopyToTextSendEndOfRow(cstate); + } +} + +/* + * CopyToTextOutFunc + * + * Assign output function data for a relation's attribute in text/CSV format. + */ +static void +CopyToTextOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* + * CopyToTextOneRow + * + * Process one row for text/CSV format. + */ +static void +CopyToTextOneRow(CopyToState cstate, + TupleTableSlot *slot) +{ + bool need_delim = false; + FmgrInfo *out_functions = cstate->out_functions; + ListCell *cur; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + + if (isnull) + { + CopySendString(cstate, cstate->opts.null_print_client); + } + else + { + char *string; + + string = OutputFunctionCall(&out_functions[attnum - 1], value); + + cstate->copy_attribute_out(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + } + } + + CopyToTextSendEndOfRow(cstate); +} + +/* + * CopyToTextEnd + * + * End of COPY TO for text/CSV format. + */ +static void +CopyToTextEnd(CopyToState cstate) +{ + /* Nothing to do here */ +} + +/* + * CopyToRoutine implementation for "binary". + */ + +/* + * CopyToBinaryStart + * + * Start of COPY TO for binary format. + */ +static void +CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* Generate header for a binary copy */ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); +} + +/* + * CopyToBinaryOutFunc + * + * Assign output function data for a relation's attribute in binary format. + */ +static void +CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* + * CopyToBinaryOneRow + * + * Process one row for binary format. + */ +static void +CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + ListCell *cur; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + { + CopySendInt32(cstate, -1); + } + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +/* + * CopyToBinaryEnd + * + * End of COPY TO for binary format. + */ +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} + +/* + * CSV and text share the same implementation, at the exception of the + * output representation function. + */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextStart, + .CopyToOutFunc = CopyToTextOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextEnd, +}; + +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextStart, + .CopyToOutFunc = CopyToTextOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextEnd, +}; + +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* + * Define the COPY TO routines to use for a format. This should be called + * after options are parsed. + */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} /* * Send copy start/stop messages for frontend copies. These have changed @@ -210,16 +499,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -254,10 +533,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -445,14 +720,8 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); - /* Set output representation callback */ - if (!cstate->opts.binary) - { - if (cstate->opts.csv_mode) - cstate->copy_attribute_out = CopyAttributeOutCSV; - else - cstate->copy_attribute_out = CopyAttributeOutText; - } + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); /* Process the source/target relation or query */ if (rel) @@ -792,19 +1061,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -817,54 +1077,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - /* Ignore quotes */ - cstate->copy_attribute_out(cstate, colname, false); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -903,13 +1116,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -925,68 +1132,15 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - bool need_delim = false; - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; - ListCell *cur; - char *string; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - - if (!cstate->opts.binary) - { - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - } - - if (isnull) - { - if (!cstate->opts.binary) - CopySendString(cstate, cstate->opts.null_print_client); - else - CopySendInt32(cstate, -1); - } - else - { - if (!cstate->opts.binary) - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - - cstate->copy_attribute_out(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - } - else - { - bytea *outputbytes; - - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 91433d439b7..d02a7773e37 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -473,6 +473,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice @@ -482,6 +483,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.38.0
>From 6f505a0b94564ad2dfa628f61ddee7c9178c4df0 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Tue, 6 Feb 2024 14:50:46 -0800 Subject: [PATCH v13a 3/3] WIP: Remove out callback --- src/backend/commands/copyto.c | 66 ++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 8923627d9ad..8cbcc399ea0 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -55,14 +55,6 @@ typedef enum CopyDest COPY_CALLBACK, /* to callback function */ } CopyDest; -/* - * Per-format callback to send output representation of one attribute for - * a `string`. `use_quote` tracks if quotes are required in the output - * representation. - */ -typedef void (*CopyAttributeOut) (CopyToState cstate, const char *string, - bool use_quote); - /* * This struct contains all the state variables used throughout a COPY TO * operation. @@ -109,7 +101,6 @@ typedef struct CopyToStateData MemoryContext copycontext; /* per-copy execution context */ FmgrInfo *out_functions; /* lookup info for output functions */ - CopyAttributeOut copy_attribute_out; /* output representation callback */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ } CopyToStateData; @@ -131,9 +122,8 @@ static void EndCopy(CopyToState cstate); static void ClosePipeToProgram(CopyToState cstate); static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot); -/* Callbacks for copy_attribute_out */ -static void CopyAttributeOutText(CopyToState cstate, const char *string, - bool use_quote); +/* Helpers for for CopyToTextStart / CopyToTextLikeOneRow */ +static void CopyAttributeOutText(CopyToState cstate, const char *string); static void CopyAttributeOutCSV(CopyToState cstate, const char *string, bool use_quote); @@ -192,12 +182,6 @@ CopyToTextStart(CopyToState cstate, TupleDesc tupDesc) { ListCell *cur; - /* Set output representation callback */ - if (cstate->opts.csv_mode) - cstate->copy_attribute_out = CopyAttributeOutCSV; - else - cstate->copy_attribute_out = CopyAttributeOutText; - /* * For non-binary copy, we need to convert null_print to file encoding, * because it will be sent directly with CopySendString. @@ -224,7 +208,10 @@ CopyToTextStart(CopyToState cstate, TupleDesc tupDesc) colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); /* Ignore quotes */ - cstate->copy_attribute_out(cstate, colname, false); + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, colname, false); + else + CopyAttributeOutText(cstate, colname); } CopyToTextSendEndOfRow(cstate); @@ -248,13 +235,16 @@ CopyToTextOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) } /* - * CopyToTextOneRow + * CopyToTextLikeOneRow * - * Process one row for text/CSV format. + * Process one row for text/CSV format. This is intended to be used by callers + * that pass in a constant value for is_csv, allowing the compiler to optimize + * relevant branches away. */ -static void -CopyToTextOneRow(CopyToState cstate, - TupleTableSlot *slot) +static pg_attribute_always_inline void +CopyToTextLikeOneRow(CopyToState cstate, + TupleTableSlot *slot, + bool is_csv) { bool need_delim = false; FmgrInfo *out_functions = cstate->out_functions; @@ -280,14 +270,33 @@ CopyToTextOneRow(CopyToState cstate, string = OutputFunctionCall(&out_functions[attnum - 1], value); - cstate->copy_attribute_out(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); + /* optimized away by compiler, argument is constant at caller */ + if (is_csv) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); } } CopyToTextSendEndOfRow(cstate); } +static void +CopyToTextOneRow(CopyToState cstate, + TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +static void +CopyToCSVOneRow(CopyToState cstate, + TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} + + /* * CopyToTextEnd * @@ -406,7 +415,7 @@ static const CopyToRoutine CopyToRoutineText = { static const CopyToRoutine CopyToRoutineCSV = { .CopyToStart = CopyToTextStart, .CopyToOutFunc = CopyToTextOutFunc, - .CopyToOneRow = CopyToTextOneRow, + .CopyToOneRow = CopyToCSVOneRow, .CopyToEnd = CopyToTextEnd, }; @@ -1155,8 +1164,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) } while (0) static void -CopyAttributeOutText(CopyToState cstate, const char *string, - bool use_quote) +CopyAttributeOutText(CopyToState cstate, const char *string) { const char *ptr; const char *start; -- 2.38.0