On Mon, Nov 18, 2024 at 8:44 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Mon, Nov 18, 2024 at 5:31 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> > Hi,
> >
> > In <CAD21AoC=DX5QQVb27C6UdpPfY-F=-PGnQ1u6rWo69DV=4et...@mail.gmail.com>
> > "Re: Make COPY format extendable: Extract COPY TO format implementations"
> > on Mon, 18 Nov 2024 17:02:41 -0800,
> > Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > > I have a question about v22. We use pg_attribute_always_inline for
> > > some functions to avoid function call overheads. Applying it to
> > > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() are legitimate as
> > > we've discussed. But there are more function where the patch applied
> > > it to:
> > >
> > > -bool
> > > -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> > > +static pg_attribute_always_inline bool
> > > +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int
> > > *nfields, bool is_csv)
> > >
> > > -static bool
> > > -CopyReadLineText(CopyFromState cstate)
> > > +static pg_attribute_always_inline bool
> > > +CopyReadLineText(CopyFromState cstate, bool is_csv)
> > >
> > > +static pg_attribute_always_inline void
> > > +CopyToTextLikeSendEndOfRow(CopyToState cstate)
> > >
> > > I think it's out of scope of this patch even if these changes are
> > > legitimate. Is there any reason for these changes?
> >
> > Yes for NextCopyFromRawFields() and CopyReadLineText().
> > No for CopyToTextLikeSendEndOfRow().
> >
> > NextCopyFromRawFields() and CopyReadLineText() have "bool
> > is_csv". So I think that we should use
> > pg_attribute_always_inline (or inline) like
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(). I think
> > that it's not out of scope of this patch because it's a part
> > of CopyToTextLikeOneRow() and CopyFromTextLikeOneRow()
> > optimization.
> >
> > Note: The optimization is based on "bool is_csv" parameter
> > and constant "true"/"false" argument function call. If we
> > can inline this function call, all "if (is_csv)" checks in
> > the function are removed.
>
> Understood, thank you for pointing this out.
>
> >
> > pg_attribute_always_inline (or inline) for
> > CopyToTextLikeSendEndOfRow() is out of scope of this
> > patch. You're right.
> >
> > I think that inlining CopyToTextLikeSendEndOfRow() is better
> > because it's called per row. But it's not related to the
> > optimization.
> >
> >
> > Should I create a new patch set without
> > pg_attribute_always_inline/inline for
> > CopyToTextLikeSendEndOfRow()? Or could you remove it when
> > you push?
>
> Since I'm reviewing the patch and the patch organization I'll include it.
>
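
For readers following the inlining argument quoted above, here is a minimal standalone sketch of the idea. It is not PostgreSQL code: every demo_* name is hypothetical, and demo_always_inline is only an assumed stand-in for pg_attribute_always_inline on GCC/Clang. A workhorse takes a bool is_csv, and two thin wrappers call it with a constant argument, so that once the call is inlined the compiler can in principle drop the "if (is_csv)" checks from each wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_attribute_always_inline (assumption). */
#if defined(__GNUC__)
#define demo_always_inline inline __attribute__((always_inline))
#else
#define demo_always_inline inline
#endif

/*
 * Workhorse: copies one field into 'out', wrapping it in double quotes
 * when is_csv is true.  Returns the number of bytes written, not counting
 * the terminating NUL.  'out' must be large enough for the result.
 */
static demo_always_inline size_t
demo_emit_field(char *out, const char *field, bool is_csv)
{
	size_t		len = strlen(field);
	size_t		n = 0;

	assert(out != NULL);

	if (is_csv)
		out[n++] = '"';
	memcpy(out + n, field, len);
	n += len;
	if (is_csv)
		out[n++] = '"';
	out[n] = '\0';

	return n;
}

/* Wrapper for text: is_csv is the constant false at the call site. */
static size_t
demo_emit_text(char *out, const char *field)
{
	return demo_emit_field(out, field, false);
}

/* Wrapper for CSV: is_csv is the constant true at the call site. */
static size_t
demo_emit_csv(char *out, const char *field)
{
	return demo_emit_field(out, field, true);
}
```

Whether the branches really disappear depends on the compiler and optimization level, so this only illustrates the shape of the pattern; the benchmark numbers in the thread are the actual evidence.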
I've extracted the changes that refactor COPY TO/FROM to use the format
callback routines from the v23 patch set, which seems to me to be a better
patch split. I've also reviewed these changes and made some changes on
top of them. The attached patches are:

0001: make COPY TO use CopyToRoutine.
0002: minor changes to the 0001 patch; will be squashed into it.
0003: make COPY FROM use CopyFromRoutine.
0004: minor changes to the 0003 patch; will be squashed into it.

I've confirmed that v24 has a similar performance improvement to v23.
Please check these extractions and the minor change suggestions.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
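
As a rough orientation for the attachments, the dispatch pattern that 0001 and 0003 introduce can be sketched as below. This is a simplified standalone illustration under stated assumptions, not the patch itself: DemoFormatOptions, DemoCopyRoutine, and the demo_* functions are hypothetical, and the single callback shown here merely stands in for the real start/outfunc/one-row/end callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the COPY format options. */
typedef struct DemoFormatOptions
{
	bool		csv_mode;
	bool		binary;
} DemoFormatOptions;

/* Hypothetical routine table: one callback instead of the real four. */
typedef struct DemoCopyRoutine
{
	const char *name;
	/* returns the default field delimiter, or '\0' if the format has none */
	char		(*default_delimiter) (void);
} DemoCopyRoutine;

static char demo_text_delimiter(void)   { return '\t'; }
static char demo_csv_delimiter(void)    { return ','; }
static char demo_binary_delimiter(void) { return '\0'; }

static const DemoCopyRoutine DemoRoutineText = {
	.name = "text",
	.default_delimiter = demo_text_delimiter,
};

static const DemoCopyRoutine DemoRoutineCSV = {
	.name = "csv",
	.default_delimiter = demo_csv_delimiter,
};

static const DemoCopyRoutine DemoRoutineBinary = {
	.name = "binary",
	.default_delimiter = demo_binary_delimiter,
};

/* Pick the routine table once, after the options have been parsed. */
static const DemoCopyRoutine *
demo_get_routine(const DemoFormatOptions *opts)
{
	assert(opts != NULL);

	if (opts->csv_mode)
		return &DemoRoutineCSV;
	else if (opts->binary)
		return &DemoRoutineBinary;

	/* default is text */
	return &DemoRoutineText;
}
```

The point of the design is that the format is decided once per COPY command, and the per-row path then goes through a function pointer instead of re-checking the format options for every row.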
From 257a284447e64753277f7bc08b387e901bcab8bb Mon Sep 17 00:00:00 2001 From: Masahiko Sawada <sawada.mshk@gmail.com> Date: Tue, 19 Nov 2024 11:52:33 -0800 Subject: [PATCH v24 2/4] fixup: minor updates for COPY TO refactoring. includes: - reorder function definitions. - cleanup comments. --- src/backend/commands/copyto.c | 242 +++++++++++++++------------------ src/include/commands/copyapi.h | 23 ++-- 2 files changed, 121 insertions(+), 144 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 46f3507a8b..73b9ca4457 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -65,7 +65,7 @@ typedef enum CopyDest */ typedef struct CopyToStateData { - /* format routine */ + /* format-specific routines */ const CopyToRoutine *routine; /* low-level state data */ @@ -118,6 +118,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string); static void CopyAttributeOutCSV(CopyToState cstate, const char *string, bool use_quote); +/* built-in format-specific routines */ +static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc); +static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo); +static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot, + bool is_csv); +static void CopyToTextLikeEnd(CopyToState cstate); +static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc); +static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo); +static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToBinaryEnd(CopyToState cstate); + /* Low-level communications functions */ static void SendCopyBegin(CopyToState cstate); static void SendCopyEnd(CopyToState cstate); @@ -125,49 +138,55 @@ static void CopySendData(CopyToState cstate, const
void *databuf, int datasize); static void CopySendString(CopyToState cstate, const char *str); static void CopySendChar(CopyToState cstate, char c); static void CopySendEndOfRow(CopyToState cstate); +static void CopySendTextLikeEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); /* - * CopyToRoutine implementations. - */ - -/* - * CopyToTextLikeSendEndOfRow + * COPY TO routines for built-in formats. * - * Apply line terminations for a line sent in text or CSV format depending - * on the destination, then send the end of a row. + * CSV and text formats share the same TextLike routines except for the + * one-row callback. */ -static pg_attribute_always_inline void -CopyToTextLikeSendEndOfRow(CopyToState cstate) + +/* TEXT format */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +/* CSV format */ +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToCSVOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +/* BINARY format */ +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* Return COPY TO routines for the given option */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) { - switch (cstate->copy_dest) - { - case COPY_FILE: - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - break; - case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - CopySendChar(cstate, '\n'); - break; - default: - break; - } + if (opts.csv_mode) + return 
&CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; - /* Now take the actions related to the end of a row */ - CopySendEndOfRow(cstate); + /* default is text */ + return &CopyToRoutineText; } -/* - * CopyToTextLikeStart - * - * Start of COPY TO for text and CSV format. - */ +/* Implementation of the start callback for text and CSV formats */ static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) { @@ -203,14 +222,13 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) CopyAttributeOutText(cstate, colname); } - CopyToTextLikeSendEndOfRow(cstate); + CopySendTextLikeEndOfRow(cstate); } } /* - * CopyToTextLikeOutFunc - * - * Assign output function data for a relation's attribute in text/CSV format. + * Implementation of the outfunc callback for text and CSV formats. Assign + * the output function data to the given *finfo. */ static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) @@ -223,13 +241,24 @@ CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) fmgr_info(func_oid, finfo); } +/* Implementation of the per-row callback for text format */ +static void +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +/* Implementation of the per-row callback for CSV format */ +static void +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} /* - * CopyToTextLikeOneRow - * - * Process one row for text/CSV format. - * * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow(). + * + * We use pg_attribute_always_inline to reduce function call overheads. */ static pg_attribute_always_inline void CopyToTextLikeOneRow(CopyToState cstate, @@ -271,36 +300,10 @@ CopyToTextLikeOneRow(CopyToState cstate, } } - CopyToTextLikeSendEndOfRow(cstate); + CopySendTextLikeEndOfRow(cstate); } -/* - * CopyToTextOneRow - * - * Per-row callback for COPY TO with text format. 
- */ -static void -CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) -{ - CopyToTextLikeOneRow(cstate, slot, false); -} - -/* - * CopyToTextOneRow - * - * Per-row callback for COPY TO with CSV format. - */ -static void -CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) -{ - CopyToTextLikeOneRow(cstate, slot, true); -} - -/* - * CopyToTextLikeEnd - * - * End of COPY TO for text/CSV format. - */ +/* Implementation of the end callback for text and CSV formats */ static void CopyToTextLikeEnd(CopyToState cstate) { @@ -308,18 +311,12 @@ CopyToTextLikeEnd(CopyToState cstate) } /* - * CopyToRoutine implementation for "binary". - */ - -/* - * CopyToBinaryStart - * - * Start of COPY TO for binary format. + * Implementation of the start callback for binary format. Send a header + * for a binary copy. */ static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) { - /* Generate header for a binary copy */ int32 tmp; /* Signature */ @@ -333,9 +330,8 @@ CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) } /* - * CopyToBinaryOutFunc - * - * Assign output function data for a relation's attribute in binary format. + * Implementation of the outfunc callback for binary format. Assign + * the binary output function to the given *finfo. */ static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) @@ -348,11 +344,7 @@ CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) fmgr_info(func_oid, finfo); } -/* - * CopyToBinaryOneRow - * - * Process one row for binary format. - */ +/* Implementation of the per-row callback for binary format */ static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) { @@ -385,11 +377,7 @@ CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) CopySendEndOfRow(cstate); } -/* - * CopyToBinaryEnd - * - * End of COPY TO for binary format. 
- */ +/* Implementation of the end callback for binary format */ static void CopyToBinaryEnd(CopyToState cstate) { @@ -399,47 +387,6 @@ CopyToBinaryEnd(CopyToState cstate) CopySendEndOfRow(cstate); } -/* - * CSV and text share the same implementation, at the exception of the - * output representation and per-row callbacks. - */ -static const CopyToRoutine CopyToRoutineText = { - .CopyToStart = CopyToTextLikeStart, - .CopyToOutFunc = CopyToTextLikeOutFunc, - .CopyToOneRow = CopyToTextOneRow, - .CopyToEnd = CopyToTextLikeEnd, -}; - -static const CopyToRoutine CopyToRoutineCSV = { - .CopyToStart = CopyToTextLikeStart, - .CopyToOutFunc = CopyToTextLikeOutFunc, - .CopyToOneRow = CopyToCSVOneRow, - .CopyToEnd = CopyToTextLikeEnd, -}; - -static const CopyToRoutine CopyToRoutineBinary = { - .CopyToStart = CopyToBinaryStart, - .CopyToOutFunc = CopyToBinaryOutFunc, - .CopyToOneRow = CopyToBinaryOneRow, - .CopyToEnd = CopyToBinaryEnd, -}; - -/* - * Define the COPY TO routines to use for a format. This should be called - * after options are parsed. - */ -static const CopyToRoutine * -CopyToGetRoutine(CopyFormatOptions opts) -{ - if (opts.csv_mode) - return &CopyToRoutineCSV; - else if (opts.binary) - return &CopyToRoutineBinary; - - /* default is text */ - return &CopyToRoutineText; -} - /* * Send copy start/stop messages for frontend copies. These have changed * in past protocol redesigns. @@ -555,6 +502,35 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the + * line termination and then does the common end-of-row processing.
+ */ +static inline void +CopySendTextLikeEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + /* * These functions do apply some data conversion */ @@ -1143,7 +1119,7 @@ DoCopyTo(CopyToState cstate) /* * Emit one row during DoCopyTo(). */ -static void +static inline void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { MemoryContext oldcontext; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 5ce24f195d..99981b1579 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -27,31 +27,32 @@ typedef struct CopyToStateData *CopyToState; typedef struct CopyToRoutine { /* - * Called when COPY TO is started to set up the output functions - * associated with the relation's attributes reading from. `finfo` can be - * optionally filled to provide the catalog information of the output - * function. `atttypid` is the OID of data type used by the relation's - * attribute. + * Set output function information. This callback is called once at the + * beginning of COPY TO. + * + * 'finfo' can be optionally filled to provide the catalog information of + * the output function. + * + * 'atttypid' is the OID of data type used by the relation's attribute. */ void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, FmgrInfo *finfo); /* - * Called when COPY TO is started. + * Start a COPY TO. This callback is called once at the beginning of COPY + * TO.
* - * `tupDesc` is the tuple descriptor of the relation from where the data + * 'tupDesc' is the tuple descriptor of the relation from where the data * is read. */ void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); /* - * Copy one row for COPY TO. - * - * `slot` is the tuple slot where the data is emitted. + * Write one row to the 'slot'. */ void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); - /* Called when COPY TO has ended */ + /* End a COPY TO. This callback is called once at the end of COPY FROM */ void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; -- 2.43.5
From b6b5c0409eed0558320e39bd642b2be17f17f590 Mon Sep 17 00:00:00 2001 From: Masahiko Sawada <sawada.mshk@gmail.com> Date: Tue, 19 Nov 2024 13:46:06 -0800 Subject: [PATCH v24 4/4] fixup: minor updates for COPY FROM refactoring. includes: - cleanup comments. - reorder function definitions. --- src/backend/commands/copyfrom.c | 161 +++++++++++------------ src/backend/commands/copyfromparse.c | 78 +++++------ src/include/commands/copyapi.h | 26 ++-- src/include/commands/copyfrom_internal.h | 2 +- 4 files changed, 121 insertions(+), 146 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index e77986f9a9..7f1de8a42b 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,31 +106,65 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); - /* - * CopyFromRoutine implementations for text and CSV. + * built-in format-specific routines. One-row callbacks are defined in + * copyfromparse.c */ +static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, + Oid *typioparam); +static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc); +static void CopyFromTextLikeEnd(CopyFromState cstate); +static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); +static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc); +static void CopyFromBinaryEnd(CopyFromState cstate); + /* - * CopyFromTextLikeInFunc - * - * Assign input function data for a relation's attribute in text/CSV format. + * COPY FROM routines for built-in formats. ++ + * CSV and text formats share the same TextLike routines except for the + * one-row callback. 
*/ -static void -CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, - FmgrInfo *finfo, Oid *typioparam) + +/* TEXT format */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +/* CSV format */ +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +/* BINARY format */ +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* Return COPY FROM routines for the given option */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) { - Oid func_oid; + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; - getTypeInputInfo(atttypid, &func_oid, typioparam); - fmgr_info(func_oid, finfo); + /* default is text */ + return &CopyFromRoutineText; } -/* - * CopyFromTextLikeStart - * - * Start of COPY FROM for text/CSV format. - */ +/* Implementation of the start callback for text and CSV formats */ static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc) { @@ -162,24 +196,37 @@ CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc) } /* - * CopyFromTextLikeEnd - * - * End of COPY FROM for text/CSV format. + * Implementation of the infunc callback for text and CSV formats. Assign + * the input function data to the given *finfo. 
*/ static void +CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, + Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the end callback for text and CSV formats */ +static void CopyFromTextLikeEnd(CopyFromState cstate) { /* nothing to do */ } -/* - * CopyFromRoutine implementation for "binary". - */ +/* Implementation of the start callback for binary format */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} /* - * CopyFromBinaryInFunc - * - * Assign input function data for a relation's attribute in binary format. + * Implementation of the infunc callback for binary format. Assign + * the binary input function to the given *finfo. */ static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, @@ -191,72 +238,13 @@ CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, fmgr_info(func_oid, finfo); } -/* - * CopyFromBinaryStart - * - * Start of COPY FROM for binary format. - */ -static void -CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) -{ - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); -} - -/* - * CopyFromBinaryEnd - * - * End of COPY FROM for binary format. - */ +/* Implementation of the end callback for binary format */ static void CopyFromBinaryEnd(CopyFromState cstate) { /* nothing to do */ } -/* - * Routines assigned to each format. -+ - * CSV and text share the same implementation, at the exception of the - * per-row callback. 
- */ -static const CopyFromRoutine CopyFromRoutineText = { - .CopyFromInFunc = CopyFromTextLikeInFunc, - .CopyFromStart = CopyFromTextLikeStart, - .CopyFromOneRow = CopyFromTextOneRow, - .CopyFromEnd = CopyFromTextLikeEnd, -}; - -static const CopyFromRoutine CopyFromRoutineCSV = { - .CopyFromInFunc = CopyFromTextLikeInFunc, - .CopyFromStart = CopyFromTextLikeStart, - .CopyFromOneRow = CopyFromCSVOneRow, - .CopyFromEnd = CopyFromTextLikeEnd, -}; - -static const CopyFromRoutine CopyFromRoutineBinary = { - .CopyFromInFunc = CopyFromBinaryInFunc, - .CopyFromStart = CopyFromBinaryStart, - .CopyFromOneRow = CopyFromBinaryOneRow, - .CopyFromEnd = CopyFromBinaryEnd, -}; - -/* - * Define the COPY FROM routines to use for a format. - */ -static const CopyFromRoutine * -CopyFromGetRoutine(CopyFormatOptions opts) -{ - if (opts.csv_mode) - return &CopyFromRoutineCSV; - else if (opts.binary) - return &CopyFromRoutineBinary; - - /* default is text */ - return &CopyFromRoutineText; -} - - /* * error context callback for COPY FROM * @@ -1578,7 +1566,7 @@ BeginCopyFrom(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); - /* Set format routine */ + /* Set the format routine */ cstate->routine = CopyFromGetRoutine(cstate->opts); /* Process the target relation */ @@ -1918,6 +1906,7 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + /* Invoke the end callback */ cstate->routine->CopyFromEnd(cstate); /* No COPY FROM related resources except memory. 
*/ diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 0447c4df7e..5416583e94 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -141,12 +141,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; /* non-export function prototypes */ static bool CopyReadLine(CopyFromState cstate, bool is_csv); -static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv); +static bool CopyReadLineText(CopyFromState cstate, bool is_csv); static int CopyReadAttributesText(CopyFromState cstate); static int CopyReadAttributesCSV(CopyFromState cstate); static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull); +static bool CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls, bool is_csv); /* Low-level communications functions */ @@ -740,6 +742,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) * in the relation. * * NOTE: force_not_null option are not applied to the returned fields. + * + * We use pg_attribute_always_inline to reduce function call overheads. 
*/ static pg_attribute_always_inline bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) @@ -839,20 +843,30 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i return true; } +/* Implementation of the per-row callback for text format */ +bool +CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false); +} + +/* Implementation of the per-row callback for CSV format */ +bool +CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); +} + /* - * CopyFromTextLikeOneRow - * - * Copy one row to a set of `values` and `nulls` for the text and CSV - * formats. - * * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). + * + * We use pg_attribute_always_inline to reduce function call overheads. */ static pg_attribute_always_inline bool -CopyFromTextLikeOneRow(CopyFromState cstate, - ExprContext *econtext, - Datum *values, - bool *nulls, - bool is_csv) +CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls, bool is_csv) { TupleDesc tupDesc; AttrNumber attr_count; @@ -1001,43 +1015,10 @@ CopyFromTextLikeOneRow(CopyFromState cstate, return true; } - -/* - * CopyFromTextOneRow - * - * Per-row callback for COPY FROM with text format. - */ -bool -CopyFromTextOneRow(CopyFromState cstate, - ExprContext *econtext, - Datum *values, - bool *nulls) -{ - return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false); -} - -/* - * CopyFromCSVOneRow - * - * Per-row callback for COPY FROM with CSV format. 
- */ -bool -CopyFromCSVOneRow(CopyFromState cstate, - ExprContext *econtext, - Datum *values, - bool *nulls) -{ - return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); -} - -/* - * CopyFromBinaryOneRow - * - * Copy one row to a set of `values` and `nulls` for the binary format. - */ +/* Implementation of the per-row callback for binary format */ bool -CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, - Datum *values, bool *nulls) +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, + bool *nulls) { TupleDesc tupDesc; AttrNumber attr_count; @@ -1130,6 +1111,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); + /* Get one row from source */ if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) return false; @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static pg_attribute_always_inline bool +static bool CopyReadLineText(CopyFromState cstate, bool is_csv) { char *copy_input_buf; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 224fda172e..ff269def9d 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -63,38 +63,42 @@ typedef struct CopyToRoutine typedef struct CopyFromRoutine { /* - * Called when COPY FROM is started to set up the input functions - * associated with the relation's attributes writing to. `finfo` can be - * optionally filled to provide the catalog information of the input - * function. `typioparam` can be optionally filled to define the OID of - * the type to pass to the input function. `atttypid` is the OID of data - * type used by the relation's attribute. + * Set input function information. This callback is called once at the + * beginning of COPY FROM. 
+ * + * 'finfo' can be optionally filled to provide the catalog information of + * the input function. + * + * 'typioparam' can be optionally filled to define the OID of the type to + * pass to the input function.'atttypid' is the OID of data type used by + * the relation's attribute. */ void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, Oid *typioparam); /* - * Called when COPY FROM is started. + * Start a COPY FROM. This callback is called once at the beginning of + * COPY FROM. * - * `tupDesc` is the tuple descriptor of the relation where the data needs + * 'tupDesc' is the tuple descriptor of the relation where the data needs * to be copied. This can be used for any initialization steps required * by a format. */ void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); /* - * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * Read one row from the source and fill *values and *nulls. * * 'econtext' is used to evaluate default expression for each column that * is either not read from the file or is using the DEFAULT option of COPY * FROM. It is NULL if no default values are used. * - * Returns false if there are no more tuples to copy. + * Returns false if there are no more tuples to read. */ bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); - /* Called when COPY FROM has ended. */ + /* End a COPY FROM. 
This callback is called once at the end of COPY FROM */ void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c11b5ff3cc..55fe24d728 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -187,7 +187,7 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); -/* Callbacks for CopyFromRoutine->CopyFromOneRow */ +/* One-row callbacks for built-in formats defined in copyfromparse.c */ extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, -- 2.43.5
From e1f2e5f906443487229b4c6aa664bfa9e3c7fbdc Mon Sep 17 00:00:00 2001 From: Masahiko Sawada <sawada.mshk@gmail.com> Date: Mon, 18 Nov 2024 16:32:43 -0800 Subject: [PATCH v24 3/4] Refactor COPY FROM to use format callback functions. This commit introduces a new CopyFromRoutine struct, which is a set of callback routines to read tuples in a specific format. It also makes COPY FROM with the existing formats (text, CSV, and binary) utilize these format callbacks. This change is a preliminary step towards making the COPY FROM command extensible in terms of input formats. Similar to XXXX, this refactoring contributes to a performance improvement by reducing the number of "if" branches that need to be checked on a per-row basis when reading field representations in text or CSV mode. The performance benchmark results showed up to a 10% performance gain in text or CSV mode. Author: Sutou Kouhei Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com --- src/backend/commands/copyfrom.c | 201 ++++++++-- src/backend/commands/copyfromparse.c | 487 +++++++++++++---------- src/include/commands/copy.h | 2 - src/include/commands/copyapi.h | 44 +- src/include/commands/copyfrom_internal.h | 12 + src/tools/pgindent/typedefs.list | 1 + 6 files changed, 501 insertions(+), 246 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 754cb49616..e77986f9a9 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,6 +106,157 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); + +/* + * CopyFromRoutine implementations for text and CSV. + */ + +/* + * CopyFromTextLikeInFunc + * + * Assign input function data for a relation's attribute in text/CSV format.
+ */ +static void +CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromTextLikeStart + * + * Start of COPY FROM for text/CSV format. + */ +static void +CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* + * Create workspace for CopyReadAttributes results; used by CSV and text + * format. + */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); +} + +/* + * CopyFromTextLikeEnd + * + * End of COPY FROM for text/CSV format. + */ +static void +CopyFromTextLikeEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * CopyFromRoutine implementation for "binary". + */ + +/* + * CopyFromBinaryInFunc + * + * Assign input function data for a relation's attribute in binary format. + */ +static void +CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeBinaryInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromBinaryStart + * + * Start of COPY FROM for binary format. + */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +/* + * CopyFromBinaryEnd + * + * End of COPY FROM for binary format. 
+ */ +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * Routines assigned to each format. + * + * CSV and text share the same implementation, at the exception of the + * per-row callback. + */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* + * Define the COPY FROM routines to use for a format. + */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; + + /* default is text */ + return &CopyFromRoutineText; +} + + /* * error context callback for COPY FROM * @@ -1396,7 +1547,6 @@ BeginCopyFrom(ParseState *pstate, num_defaults; FmgrInfo *in_functions; Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1428,6 +1578,9 @@ BeginCopyFrom(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyFromGetRoutine(cstate->opts); + /* Process the target relation */ cstate->rel = rel; @@ -1583,25 +1736,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another 
buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1634,13 +1768,9 @@ BeginCopyFrom(ParseState *pstate, continue; /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1775,20 +1905,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - } + cstate->routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1801,6 +1918,8 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. 
*/ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index d1d43b53d8..0447c4df7e 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -140,8 +140,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; /* non-export function prototypes */ -static bool CopyReadLine(CopyFromState cstate); -static bool CopyReadLineText(CopyFromState cstate); +static bool CopyReadLine(CopyFromState cstate, bool is_csv); +static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv); static int CopyReadAttributesText(CopyFromState cstate); static int CopyReadAttributesCSV(CopyFromState cstate); static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo, @@ -741,8 +741,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) * * NOTE: force_not_null option are not applied to the returned fields. */ -bool -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +static pg_attribute_always_inline bool +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) { int fldct; bool done; @@ -759,13 +759,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) tupDesc = RelationGetDescr(cstate->rel); cstate->cur_lineno++; - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); if (cstate->opts.header_line == COPY_HEADER_MATCH) { int fldnum; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. + */ + if (is_csv) fldct = CopyReadAttributesCSV(cstate); else fldct = CopyReadAttributesText(cstate); @@ -809,7 +813,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) cstate->cur_lineno++; /* Actually read the line into memory here */ - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); /* * EOF at start of line means we're done. 
If we see EOF after some @@ -819,8 +823,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) if (done && cstate->line_buf.len == 0) return false; - /* Parse the line into de-escaped field values */ - if (cstate->opts.csv_mode) + /* + * Parse the line into de-escaped field values + * + * is_csv will be optimized away by compiler, as argument is constant at + * caller. + */ + if (is_csv) fldct = CopyReadAttributesCSV(cstate); else fldct = CopyReadAttributesText(cstate); @@ -831,233 +840,299 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) } /* - * Read next tuple from file for COPY FROM. Return false if no more tuples. + * CopyFromTextLikeOneRow * - * 'econtext' is used to evaluate default expression for each column that is - * either not read from the file or is using the DEFAULT option of COPY FROM. - * It can be NULL when no default values are used, i.e. when all columns are - * read from the file, and DEFAULT option is unset. + * Copy one row to a set of `values` and `nulls` for the text and CSV + * formats. * - * 'values' and 'nulls' arrays must be the same length as columns of the - * relation passed to BeginCopyFrom. This function fills the arrays. + * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). 
*/ -bool -NextCopyFrom(CopyFromState cstate, ExprContext *econtext, - Datum *values, bool *nulls) +static pg_attribute_always_inline bool +CopyFromTextLikeOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls, + bool is_csv) { TupleDesc tupDesc; - AttrNumber num_phys_attrs, - attr_count, - num_defaults = cstate->num_defaults; + AttrNumber attr_count; FmgrInfo *in_functions = cstate->in_functions; Oid *typioparams = cstate->typioparams; - int i; - int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; tupDesc = RelationGetDescr(cstate->rel); - num_phys_attrs = tupDesc->natts; attr_count = list_length(cstate->attnumlist); - /* Initialize all values for row to NULL */ - MemSet(values, 0, num_phys_attrs * sizeof(Datum)); - MemSet(nulls, true, num_phys_attrs * sizeof(bool)); - MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv)) + return false; - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; + fieldno = 0; + + /* Loop to read the user attributes on the line. 
*/ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) + if (fieldno >= fldct) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; - /* Loop to read the user attributes on the line. */ - foreach(cur, cstate->attnumlist) + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } + /* ignore input field, leaving column as NULL */ + continue; + } - if (cstate->opts.csv_mode) + if (is_csv) + { + if (string == NULL && + cstate->opts.force_notnull_flags[m]) { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } + /* + * FORCE_NOT_NULL option is set and column is NULL - convert + * it to the NULL string. 
+ */ + string = cstate->opts.null_print; } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) { /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. + * FORCE_NULL option is set and column matches the NULL + * string. It must have been quoted, or otherwise the string + * would already have been set to NULL. Convert it to NULL as + * specified. */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + string = NULL; } + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - cstate->num_errors++; + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information - * other than the relation name. 
- */ - Assert(!cstate->relname_only); - cstate->relname_only = true; + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below + * notice message, we suppress error context information other + * than the relation name. 
+ */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); - return true; + /* reset relname_only */ + cstate->relname_only = false; } - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; + return true; } - Assert(fieldno == attr_count); + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - cstate->cur_lineno++; + Assert(fieldno == attr_count); - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } + return true; +} - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. - */ - char dummy; - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } +/* + * CopyFromTextOneRow + * + * Per-row callback for COPY FROM with text format. 
+ */ +bool +CopyFromTextOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false); +} + +/* + * CopyFromCSVOneRow + * + * Per-row callback for COPY FROM with CSV format. + */ +bool +CopyFromCSVOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); +} + +/* + * CopyFromBinaryOneRow + * + * Copy one row to a set of `values` and `nulls` for the binary format. + */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; - if (fld_count != attr_count) + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. 
+ */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); + errmsg("received copy data after EOF marker"))); + return false; + } - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; } + return true; +} + +/* + * Read next tuple from file for COPY FROM. Return false if no more tuples. + * + * 'econtext' is used to evaluate default expression for each column that is + * either not read from the file or is using the DEFAULT option of COPY FROM. + * It can be NULL when no default values are used, i.e. when all columns are + * read from the file, and DEFAULT option is unset. + * + * 'values' and 'nulls' arrays must be the same length as columns of the + * relation passed to BeginCopyFrom. This function fills the arrays. 
+ */ +bool +NextCopyFrom(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber num_phys_attrs, + num_defaults = cstate->num_defaults; + int i; + int *defmap = cstate->defmap; + ExprState **defexprs = cstate->defexprs; + + tupDesc = RelationGetDescr(cstate->rel); + num_phys_attrs = tupDesc->natts; + + /* Initialize all values for row to NULL */ + MemSet(values, 0, num_phys_attrs * sizeof(Datum)); + MemSet(nulls, true, num_phys_attrs * sizeof(bool)); + MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); + + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) + return false; + /* * Now compute and insert any defaults available for the columns not * provided by the input data. Anything not processed here or above will @@ -1087,7 +1162,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, * in the final value of line_buf. */ static bool -CopyReadLine(CopyFromState cstate) +CopyReadLine(CopyFromState cstate, bool is_csv) { bool result; @@ -1095,7 +1170,7 @@ CopyReadLine(CopyFromState cstate) cstate->line_buf_valid = false; /* Parse data and transfer into line_buf */ - result = CopyReadLineText(cstate); + result = CopyReadLineText(cstate, is_csv); if (result) { @@ -1162,8 +1237,8 @@ CopyReadLine(CopyFromState cstate) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static bool -CopyReadLineText(CopyFromState cstate) +static pg_attribute_always_inline bool +CopyReadLineText(CopyFromState cstate, bool is_csv) { char *copy_input_buf; int input_buf_ptr; @@ -1178,7 +1253,11 @@ CopyReadLineText(CopyFromState cstate) char quotec = '\0'; char escapec = '\0'; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant at + * caller. 
+ */ + if (is_csv) { quotec = cstate->opts.quote[0]; escapec = cstate->opts.escape[0]; @@ -1255,7 +1334,11 @@ CopyReadLineText(CopyFromState cstate) prev_raw_ptr = input_buf_ptr; c = copy_input_buf[input_buf_ptr++]; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant + * at caller. + */ + if (is_csv) { /* * If character is '\r', we may need to look ahead below. Force @@ -1294,7 +1377,7 @@ CopyReadLineText(CopyFromState cstate) } /* Process \r */ - if (c == '\r' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\r' && (!is_csv || !in_quote)) { /* Check for \r\n on first line, _and_ handle \r\n. */ if (cstate->eol_type == EOL_UNKNOWN || @@ -1322,10 +1405,10 @@ CopyReadLineText(CopyFromState cstate) if (cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); @@ -1339,10 +1422,10 @@ CopyReadLineText(CopyFromState cstate) else if (cstate->eol_type == EOL_NL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); /* If reach here, we have found the line terminator */ @@ -1350,15 +1433,15 @@ CopyReadLineText(CopyFromState cstate) } /* Process \n */ - if (c == '\n' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\n' && (!is_csv || !in_quote)) { if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? 
+ !is_csv ? errmsg("literal newline found in data") : errmsg("unquoted newline found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\n\" to represent newline.") : errhint("Use quoted CSV field to represent newline."))); cstate->eol_type = EOL_NL; /* in case not set yet */ @@ -1370,7 +1453,7 @@ CopyReadLineText(CopyFromState cstate) * Process backslash, except in CSV mode where backslash is a normal * character. */ - if (c == '\\' && !cstate->opts.csv_mode) + if (c == '\\' && !is_csv) { char c2; diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 4002a7f538..f2409013fb 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where extern void EndCopyFrom(CopyFromState cstate); extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); -extern bool NextCopyFromRawFields(CopyFromState cstate, - char ***fields, int *nfields); extern void CopyFromErrorCallback(void *arg); extern char *CopyLimitPrintoutLength(const char *str); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 99981b1579..224fda172e 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -1,7 +1,7 @@ /*------------------------------------------------------------------------- * * copyapi.h - * API for COPY TO handlers + * API for COPY TO/FROM handlers * * * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group @@ -56,4 +56,46 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Called when COPY FROM is started to set up the input functions + * associated with the relation's attributes writing to. 
`finfo` can be + * optionally filled to provide the catalog information of the input + * function. `typioparam` can be optionally filled to define the OID of + * the type to pass to the input function. `atttypid` is the OID of data + * type used by the relation's attribute. + */ + void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); + + /* + * Called when COPY FROM is started. + * + * `tupDesc` is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to copy. + */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* Called when COPY FROM has ended. 
*/ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc78..c11b5ff3cc 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,6 +15,7 @@ #define COPYFROM_INTERNAL_H #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -58,6 +59,9 @@ typedef enum CopyInsertMethod */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ @@ -183,4 +187,12 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); +/* Callbacks for CopyFromRoutine->CopyFromOneRow */ +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index e3334d9485..7fab5c479e 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -492,6 +492,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice -- 2.43.5
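For readers following the is_csv discussion above, here is a minimal standalone sketch of the pattern, not code from the patch set. All demo_* names, the DemoRoutine struct, and the field-counting logic are hypothetical and exist only for illustration; the macro is a stand-in that assumes a GCC-compatible compiler, in the spirit of pg_attribute_always_inline. The idea shown: one always-inline worker takes "bool is_csv", two thin wrappers call it with constant true/false, and a callback table selects the wrapper once, so a compiler that inlines the worker can drop the per-character "if (is_csv)" checks in each specialized copy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline (assumes GCC/Clang). */
#if defined(__GNUC__)
#define demo_always_inline __attribute__((always_inline)) inline
#else
#define demo_always_inline inline
#endif

/* Hypothetical callback table, loosely modeled on the idea of CopyFromRoutine. */
typedef struct DemoRoutine
{
	int			(*count_fields) (const char *line);
} DemoRoutine;

/*
 * Shared worker for the text-like and CSV-like modes.  Because each caller
 * passes a constant is_csv, the compiler can remove the branches on it once
 * this function is inlined.
 */
static demo_always_inline int
demo_count_fields_like(const char *line, bool is_csv)
{
	int			nfields = 1;
	bool		in_quote = false;

	for (const char *p = line; *p != '\0'; p++)
	{
		if (is_csv && *p == '"')
			in_quote = !in_quote;	/* toggle quoted state in CSV mode only */
		else if (*p == ',' && (!is_csv || !in_quote))
			nfields++;			/* delimiter outside of any quoted field */
	}
	return nfields;
}

/* Per-format wrappers: the constant argument enables the specialization. */
static int
demo_count_fields_text(const char *line)
{
	return demo_count_fields_like(line, false);
}

static int
demo_count_fields_csv(const char *line)
{
	return demo_count_fields_like(line, true);
}

static const DemoRoutine DemoRoutineText = {
	.count_fields = demo_count_fields_text,
};

static const DemoRoutine DemoRoutineCSV = {
	.count_fields = demo_count_fields_csv,
};

/* Pick the routine once, up front, instead of testing the mode per row. */
static const DemoRoutine *
demo_get_routine(bool csv_mode)
{
	return csv_mode ? &DemoRoutineCSV : &DemoRoutineText;
}
```

A caller would fetch the routine once with demo_get_routine() and then invoke routine->count_fields() per line. Whether the branches are actually eliminated depends on the compiler and optimization level, so any gain should be confirmed by benchmarking rather than assumed.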
From f75c34d7420ed7fc47be30e4ebfbff855d0cb2ff Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sat, 28 Sep 2024 23:24:49 +0900 Subject: [PATCH v24 1/4] Refactor COPY TO to use format callback functions. This commit introduces a new CopyToRoutine struct, which is a set of callback routines to copy tuples in a specific format. It also makes the existing formats (text, CSV, and binary) utilize these format callbacks. This change is a preliminary step towards making the COPY TO command extensible in terms of output formats. Additionally, this refactoring contributes to a performance improvement by reducing the number of "if" branches that need to be checked on a per-row basis when sending field representations in text or CSV mode. The performance benchmark results showed up to a 5% performance gain in text or CSV mode. Author: Sutou Kouhei Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com --- src/backend/commands/copyto.c | 462 +++++++++++++++++++++---------- src/include/commands/copyapi.h | 58 ++++ src/tools/pgindent/typedefs.list | 1 + 3 files changed, 382 insertions(+), 139 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index f55e6d9675..46f3507a8b 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -20,6 +20,7 @@ #include "access/tableam.h" #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -64,6 +65,9 @@ typedef enum CopyDest */ typedef struct CopyToStateData { + /* format routine */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -124,6 +128,317 @@ static void 
CopySendEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * CopyToRoutine implementations. + */ + +/* + * CopyToTextLikeSendEndOfRow + * + * Apply line terminations for a line sent in text or CSV format depending + * on the destination, then send the end of a row. + */ +static pg_attribute_always_inline void +CopyToTextLikeSendEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + +/* + * CopyToTextLikeStart + * + * Start of COPY TO for text and CSV format. + */ +static void +CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* + * For non-binary copy, we need to convert null_print to file encoding, + * because it will be sent directly with CopySendString. 
+ */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + ListCell *cur; + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, colname, false); + else + CopyAttributeOutText(cstate, colname); + } + + CopyToTextLikeSendEndOfRow(cstate); + } +} + +/* + * CopyToTextLikeOutFunc + * + * Assign output function data for a relation's attribute in text/CSV format. + */ +static void +CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + + +/* + * CopyToTextLikeOneRow + * + * Process one row for text/CSV format. + * + * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow(). + */ +static pg_attribute_always_inline void +CopyToTextLikeOneRow(CopyToState cstate, + TupleTableSlot *slot, + bool is_csv) +{ + bool need_delim = false; + FmgrInfo *out_functions = cstate->out_functions; + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + + if (isnull) + { + CopySendString(cstate, cstate->opts.null_print_client); + } + else + { + char *string; + + string = OutputFunctionCall(&out_functions[attnum - 1], + value); + + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. 
+ */ + if (is_csv) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); + } + } + + CopyToTextLikeSendEndOfRow(cstate); +} + +/* + * CopyToTextOneRow + * + * Per-row callback for COPY TO with text format. + */ +static void +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +/* + * CopyToCSVOneRow + * + * Per-row callback for COPY TO with CSV format. + */ +static void +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} + +/* + * CopyToTextLikeEnd + * + * End of COPY TO for text/CSV format. + */ +static void +CopyToTextLikeEnd(CopyToState cstate) +{ + /* Nothing to do here */ +} + +/* + * CopyToRoutine implementation for "binary". + */ + +/* + * CopyToBinaryStart + * + * Start of COPY TO for binary format. + */ +static void +CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* Generate header for a binary copy */ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); +} + +/* + * CopyToBinaryOutFunc + * + * Assign output function data for a relation's attribute in binary format. + */ +static void +CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* + * CopyToBinaryOneRow + * + * Process one row for binary format. 
+ */ +static void +CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + { + CopySendInt32(cstate, -1); + } + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], + value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +/* + * CopyToBinaryEnd + * + * End of COPY TO for binary format. + */ +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} + +/* + * CSV and text share the same implementation, with the exception of the + * output representation and per-row callbacks. + */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToCSVOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* + * Define the COPY TO routines to use for a format. This should be called + * after options are parsed. 
+ */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} /* * Send copy start/stop messages for frontend copies. These have changed @@ -191,16 +506,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -235,10 +540,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -426,6 +727,9 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); + /* Process the source/target relation or query */ if (rel) { @@ -771,19 +1075,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -796,56 +1091,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", 
ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -884,13 +1130,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -906,71 +1146,15 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, 
list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - if (!cstate->opts.binary) - { - bool need_delim = false; - - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - char *string; - - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - - if (isnull) - CopySendString(cstate, cstate->opts.null_print_client); - else - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - else - CopyAttributeOutText(cstate, string); - } - } - } - else - { - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - bytea *outputbytes; - - if (isnull) - CopySendInt32(cstate, -1); - else - { - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 0000000000..5ce24f195d --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,58 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include 
"nodes/execnodes.h" + +/* This is private in commands/copyto.c */ +typedef struct CopyToStateData *CopyToState; + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Called when COPY TO is started to set up the output functions + * associated with the attributes of the relation being read. `finfo` can + * be optionally filled to provide the catalog information of the output + * function. `atttypid` is the OID of data type used by the relation's + * attribute. + */ + void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, + FmgrInfo *finfo); + + /* + * Called when COPY TO is started. + * + * `tupDesc` is the tuple descriptor of the relation from which the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Copy one row for COPY TO. + * + * `slot` is the tuple slot holding the row to be emitted. + */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO has ended */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 08521d51a9..e3334d9485 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -503,6 +503,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.43.5
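For readers who want to see the two ideas in the patch in isolation, here is a minimal standalone sketch. It is not PostgreSQL code: Sink, sink_put, emit_row_textlike, FormatRoutine, get_routine and the sketch_always_inline macro are all hypothetical names made up for illustration, and the CSV branch is deliberately simplified (it always quotes and does no escaping). It only tries to show (1) a shared always-inline workhorse whose "bool is_csv" argument is a literal at each call site, so the compiler can drop the untaken branch in each wrapper, and (2) static const routine tables that are selected once and then called through a function pointer per row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: a small fixed-size output buffer. */
typedef struct Sink
{
	char		buf[256];
	size_t		len;
} Sink;

/* Append a NUL-terminated string; asserts instead of growing the buffer. */
static void
sink_put(Sink *s, const char *str)
{
	size_t		n = strlen(str);

	assert(s->len + n < sizeof(s->buf));
	memcpy(s->buf + s->len, str, n);
	s->len += n;
	s->buf[s->len] = '\0';
}

/* Assumption: GCC/Clang attribute spelling; plain inline elsewhere. */
#if defined(__GNUC__)
#define sketch_always_inline inline __attribute__((always_inline))
#else
#define sketch_always_inline inline
#endif

/*
 * Shared workhorse.  Each wrapper below passes a literal for is_csv, so
 * once this is inlined the "if (is_csv)" tests can be folded away.
 */
static sketch_always_inline void
emit_row_textlike(Sink *s, const char *const *fields, int nfields,
				  bool is_csv)
{
	for (int i = 0; i < nfields; i++)
	{
		if (i > 0)
			sink_put(s, is_csv ? "," : "\t");

		if (is_csv)
		{
			/* Simplified: always quote, no escaping of embedded quotes */
			sink_put(s, "\"");
			sink_put(s, fields[i]);
			sink_put(s, "\"");
		}
		else
			sink_put(s, fields[i]);
	}
	sink_put(s, "\n");
}

/* Per-format wrappers: the constant argument is what enables folding. */
static void
emit_row_text(Sink *s, const char *const *fields, int nfields)
{
	emit_row_textlike(s, fields, nfields, false);
}

static void
emit_row_csv(Sink *s, const char *const *fields, int nfields)
{
	emit_row_textlike(s, fields, nfields, true);
}

/* Hypothetical one-callback analogue of a per-format routine table. */
typedef struct FormatRoutine
{
	void		(*one_row) (Sink *s, const char *const *fields, int nfields);
} FormatRoutine;

static const FormatRoutine routine_text = {.one_row = emit_row_text};
static const FormatRoutine routine_csv = {.one_row = emit_row_csv};

/* Pick the table once, after options are known; callers keep the pointer. */
static const FormatRoutine *
get_routine(bool csv_mode)
{
	return csv_mode ? &routine_csv : &routine_text;
}
```

A caller would look up the table once with get_routine(csv_mode) and then invoke routine->one_row(&sink, fields, nfields) for every row, so the per-row path contains one indirect call and no format checks. Whether the branch is actually eliminated depends on the compiler and optimization level; the sketch only shows the shape that makes it possible.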