On Mon, Nov 18, 2024 at 8:44 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Mon, Nov 18, 2024 at 5:31 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> > Hi,
> >
> > In <CAD21AoC=DX5QQVb27C6UdpPfY-F=-PGnQ1u6rWo69DV=4et...@mail.gmail.com>
> >   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> > on Mon, 18 Nov 2024 17:02:41 -0800,
> >   Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > > I have a question about v22. We use pg_attribute_always_inline for
> > > some functions to avoid function call overhead. Applying it to
> > > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() is legitimate, as
> > > we've discussed. But the patch also applies it to more
> > > functions:
> > >
> > > -bool
> > > -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> > > +static pg_attribute_always_inline bool
> > > +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int
> > > *nfields, bool is_csv)
> > >
> > > -static bool
> > > -CopyReadLineText(CopyFromState cstate)
> > > +static pg_attribute_always_inline bool
> > > +CopyReadLineText(CopyFromState cstate, bool is_csv)
> > >
> > > +static pg_attribute_always_inline void
> > > +CopyToTextLikeSendEndOfRow(CopyToState cstate)
> > >
> > > I think it's out of scope of this patch even if these changes are
> > > legitimate. Is there any reason for these changes?
> >
> > Yes for NextCopyFromRawFields() and CopyReadLineText().
> > No for CopyToTextLikeSendEndOfRow().
> >
> > NextCopyFromRawFields() and CopyReadLineText() have "bool
> > is_csv". So I think that we should use
> > pg_attribute_always_inline (or inline) like
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(). I think
> > that it's not out of scope of this patch because it's a part
> > of CopyToTextLikeOneRow() and CopyFromTextLikeOneRow()
> > optimization.
> >
> > Note: The optimization relies on the "bool is_csv" parameter
> > being passed as a constant "true"/"false" argument. If the
> > compiler inlines the call, all "if (is_csv)" checks in
> > the function are resolved at compile time and removed.
>
> Understood, thank you for pointing this out.
>
> >
> > pg_attribute_always_inline (or inline) for
> > CopyToTextLikeSendEndOfRow() is out of scope of this
> > patch. You're right.
> >
> > I think that inlining CopyToTextLikeSendEndOfRow() is better
> > because it's called per row. But it's not related to the
> > optimization.
> >
> >
> > Should I create a new patch set without
> > pg_attribute_always_inline/inline for
> > CopyToTextLikeSendEndOfRow()? Or could you remove it when
> > you push?
>
> Since I'm reviewing the patch and the patch organization I'll include it.
>

I've extracted the changes that refactor COPY TO/FROM to use the format
callback routines from the v23 patch set; this seems like a better patch
split to me. I've also reviewed these changes and made some adjustments
on top of them. The attached patches are:

0001: make COPY TO use CopyToRoutine.
0002: minor changes to the 0001 patch; will be squashed into it.
0003: make COPY FROM use CopyFromRoutine.
0004: minor changes to the 0003 patch; will be squashed into it.
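
For anyone skimming the thread, the callback-table idea behind 0001 and 0003 can be sketched roughly as below. This is a standalone illustration, not code from the patches: SketchOptions, SketchRoutine, and every sketch_* function are hypothetical names, and the "row" is just two integers instead of a tuple slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the csv_mode/binary fields of CopyFormatOptions. */
typedef struct SketchOptions
{
	bool		csv_mode;
	bool		binary;
} SketchOptions;

/* Hypothetical, much reduced analogue of CopyToRoutine. */
typedef struct SketchRoutine
{
	/* Format one two-column row into dst; returns the number of bytes. */
	int			(*one_row) (char *dst, size_t dstlen, int a, int b);
} SketchRoutine;

/* Text format: tab-separated columns. */
static int
sketch_text_one_row(char *dst, size_t dstlen, int a, int b)
{
	assert(dst != NULL);
	return snprintf(dst, dstlen, "%d\t%d\n", a, b);
}

/* CSV format: comma-separated columns. */
static int
sketch_csv_one_row(char *dst, size_t dstlen, int a, int b)
{
	assert(dst != NULL);
	return snprintf(dst, dstlen, "%d,%d\n", a, b);
}

/* Binary format: raw copies of both ints, or -1 if dst is too small. */
static int
sketch_binary_one_row(char *dst, size_t dstlen, int a, int b)
{
	assert(dst != NULL);
	if (dstlen < 2 * sizeof(int))
		return -1;
	memcpy(dst, &a, sizeof(int));
	memcpy(dst + sizeof(int), &b, sizeof(int));
	return (int) (2 * sizeof(int));
}

static const SketchRoutine SketchRoutineText = {
	.one_row = sketch_text_one_row,
};

static const SketchRoutine SketchRoutineCSV = {
	.one_row = sketch_csv_one_row,
};

static const SketchRoutine SketchRoutineBinary = {
	.one_row = sketch_binary_one_row,
};

/* Pick the routine table once, after the options are known. */
static const SketchRoutine *
sketch_get_routine(SketchOptions opts)
{
	if (opts.csv_mode)
		return &SketchRoutineCSV;
	else if (opts.binary)
		return &SketchRoutineBinary;

	/* default is text */
	return &SketchRoutineText;
}
```

The point of the shape is that the format is chosen once up front, so the per-row path is a single indirect call with no per-row format checks.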

I've confirmed that v24 has a similar performance improvement to v23.
Please check these extractions and minor change suggestions.
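
To make the is_csv inlining point from upthread concrete, here is a minimal standalone sketch. It assumes a GCC/Clang-style always_inline attribute; sketch_always_inline and the count_fields_* functions are hypothetical and only mimic the pattern used by CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(), they are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline. */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_always_inline __attribute__((always_inline)) inline
#else
#define sketch_always_inline inline
#endif

/*
 * Shared workhorse: count the fields in one line.  With is_csv the
 * delimiter is ',' and commas inside double quotes are ignored; otherwise
 * the delimiter is a tab.
 */
static sketch_always_inline int
count_fields_like(const char *line, bool is_csv)
{
	int			nfields = 1;
	bool		in_quote = false;

	assert(line != NULL);

	for (const char *p = line; *p != '\0'; p++)
	{
		if (is_csv)
		{
			if (*p == '"')
				in_quote = !in_quote;
			else if (*p == ',' && !in_quote)
				nfields++;
		}
		else if (*p == '\t')
			nfields++;
	}

	return nfields;
}

/* Thin wrappers that pass is_csv as a compile-time constant. */
static int
count_fields_text(const char *line)
{
	return count_fields_like(line, false);
}

static int
count_fields_csv(const char *line)
{
	return count_fields_like(line, true);
}
```

Because each wrapper passes a constant, an optimizing compiler that inlines the workhorse can drop the "if (is_csv)" test from each copy's loop. Whether that happens in practice depends on the compiler and flags, so it is worth checking the generated code or a benchmark.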

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From 257a284447e64753277f7bc08b387e901bcab8bb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 Nov 2024 11:52:33 -0800
Subject: [PATCH v24 2/4] fixup: minor updates for COPY TO refactoring.

includes:

- reorder function definitions.
- cleanup comments.
---
 src/backend/commands/copyto.c  | 242 +++++++++++++++------------------
 src/include/commands/copyapi.h |  23 ++--
 2 files changed, 121 insertions(+), 144 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 46f3507a8b..73b9ca4457 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -65,7 +65,7 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
-	/* format routine */
+	/* format-specific routines */
 	const CopyToRoutine *routine;
 
 	/* low-level state data */
@@ -118,6 +118,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+								 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
@@ -125,49 +138,55 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
 static void CopySendEndOfRow(CopyToState cstate);
+static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
 /*
- * CopyToRoutine implementations.
- */
-
-/*
- * CopyToTextLikeSendEndOfRow
+ * COPY TO routines for built-in formats.
  *
- * Apply line terminations for a line sent in text or CSV format depending
- * on the destination, then send the end of a row.
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
  */
-static pg_attribute_always_inline void
-CopyToTextLikeSendEndOfRow(CopyToState cstate)
+
+/* TEXT format */
+static const CopyToRoutine CopyToRoutineText = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToTextOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyToRoutine CopyToRoutineCSV = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToCSVOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* BINARY format */
+static const CopyToRoutine CopyToRoutineBinary = {
+	.CopyToStart = CopyToBinaryStart,
+	.CopyToOutFunc = CopyToBinaryOutFunc,
+	.CopyToOneRow = CopyToBinaryOneRow,
+	.CopyToEnd = CopyToBinaryEnd,
+};
+
+/* Return COPY TO routines for the given option */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
 {
-	switch (cstate->copy_dest)
-	{
-		case COPY_FILE:
-			/* Default line termination depends on platform */
-#ifndef WIN32
-			CopySendChar(cstate, '\n');
-#else
-			CopySendString(cstate, "\r\n");
-#endif
-			break;
-		case COPY_FRONTEND:
-			/* The FE/BE protocol uses \n as newline for all platforms */
-			CopySendChar(cstate, '\n');
-			break;
-		default:
-			break;
-	}
+	if (opts.csv_mode)
+		return &CopyToRoutineCSV;
+	else if (opts.binary)
+		return &CopyToRoutineBinary;
 
-	/* Now take the actions related to the end of a row */
-	CopySendEndOfRow(cstate);
+	/* default is text */
+	return &CopyToRoutineText;
 }
 
-/*
- * CopyToTextLikeStart
- *
- * Start of COPY TO for text and CSV format.
- */
+/* Implementation of the start callback for text and CSV formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
@@ -203,14 +222,13 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 				CopyAttributeOutText(cstate, colname);
 		}
 
-		CopyToTextLikeSendEndOfRow(cstate);
+		CopySendTextLikeEndOfRow(cstate);
 	}
 }
 
 /*
- * CopyToTextLikeOutFunc
- *
- * Assign output function data for a relation's attribute in text/CSV format.
+ * Implementation of the outfunc callback for text and CSV formats. Assign
+ * the output function data to the given *finfo.
  */
 static void
 CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
@@ -223,13 +241,24 @@ CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 	fmgr_info(func_oid, finfo);
 }
 
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextLikeOneRow(cstate, slot, true);
+}
 
 /*
- * CopyToTextLikeOneRow
- *
- * Process one row for text/CSV format.
- *
  * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
 static pg_attribute_always_inline void
 CopyToTextLikeOneRow(CopyToState cstate,
@@ -271,36 +300,10 @@ CopyToTextLikeOneRow(CopyToState cstate,
 		}
 	}
 
-	CopyToTextLikeSendEndOfRow(cstate);
+	CopySendTextLikeEndOfRow(cstate);
 }
 
-/*
- * CopyToTextOneRow
- *
- * Per-row callback for COPY TO with text format.
- */
-static void
-CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
-{
-	CopyToTextLikeOneRow(cstate, slot, false);
-}
-
-/*
- * CopyToTextOneRow
- *
- * Per-row callback for COPY TO with CSV format.
- */
-static void
-CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
-{
-	CopyToTextLikeOneRow(cstate, slot, true);
-}
-
-/*
- * CopyToTextLikeEnd
- *
- * End of COPY TO for text/CSV format.
- */
+/* Implementation of the end callback for text and CSV formats */
 static void
 CopyToTextLikeEnd(CopyToState cstate)
 {
@@ -308,18 +311,12 @@ CopyToTextLikeEnd(CopyToState cstate)
 }
 
 /*
- * CopyToRoutine implementation for "binary".
- */
-
-/*
- * CopyToBinaryStart
- *
- * Start of COPY TO for binary format.
+ * Implementation of the start callback for binary format. Send a header
+ * for a binary copy.
  */
 static void
 CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
 {
-	/* Generate header for a binary copy */
 	int32		tmp;
 
 	/* Signature */
@@ -333,9 +330,8 @@ CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
 }
 
 /*
- * CopyToBinaryOutFunc
- *
- * Assign output function data for a relation's attribute in binary format.
+ * Implementation of the outfunc callback for binary format. Assign
+ * the binary output function to the given *finfo.
  */
 static void
 CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
@@ -348,11 +344,7 @@ CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 	fmgr_info(func_oid, finfo);
 }
 
-/*
- * CopyToBinaryOneRow
- *
- * Process one row for binary format.
- */
+/* Implementation of the per-row callback for binary format */
 static void
 CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
 {
@@ -385,11 +377,7 @@ CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
 	CopySendEndOfRow(cstate);
 }
 
-/*
- * CopyToBinaryEnd
- *
- * End of COPY TO for binary format.
- */
+/* Implementation of the end callback for binary format */
 static void
 CopyToBinaryEnd(CopyToState cstate)
 {
@@ -399,47 +387,6 @@ CopyToBinaryEnd(CopyToState cstate)
 	CopySendEndOfRow(cstate);
 }
 
-/*
- * CSV and text share the same implementation, at the exception of the
- * output representation and per-row callbacks.
- */
-static const CopyToRoutine CopyToRoutineText = {
-	.CopyToStart = CopyToTextLikeStart,
-	.CopyToOutFunc = CopyToTextLikeOutFunc,
-	.CopyToOneRow = CopyToTextOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
-};
-
-static const CopyToRoutine CopyToRoutineCSV = {
-	.CopyToStart = CopyToTextLikeStart,
-	.CopyToOutFunc = CopyToTextLikeOutFunc,
-	.CopyToOneRow = CopyToCSVOneRow,
-	.CopyToEnd = CopyToTextLikeEnd,
-};
-
-static const CopyToRoutine CopyToRoutineBinary = {
-	.CopyToStart = CopyToBinaryStart,
-	.CopyToOutFunc = CopyToBinaryOutFunc,
-	.CopyToOneRow = CopyToBinaryOneRow,
-	.CopyToEnd = CopyToBinaryEnd,
-};
-
-/*
- * Define the COPY TO routines to use for a format.  This should be called
- * after options are parsed.
- */
-static const CopyToRoutine *
-CopyToGetRoutine(CopyFormatOptions opts)
-{
-	if (opts.csv_mode)
-		return &CopyToRoutineCSV;
-	else if (opts.binary)
-		return &CopyToRoutineBinary;
-
-	/* default is text */
-	return &CopyToRoutineText;
-}
-
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
  * in past protocol redesigns.
@@ -555,6 +502,35 @@ CopySendEndOfRow(CopyToState cstate)
 	resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Wrapper around CopySendEndOfRow() for text and CSV formats. Sends the
+ * line termination and then does the common end-of-row processing.
+ */
+static inline void
+CopySendTextLikeEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+
+	/* Now take the actions related to the end of a row */
+	CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
@@ -1143,7 +1119,7 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
 	MemoryContext oldcontext;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 5ce24f195d..99981b1579 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -27,31 +27,32 @@ typedef struct CopyToStateData *CopyToState;
 typedef struct CopyToRoutine
 {
 	/*
-	 * Called when COPY TO is started to set up the output functions
-	 * associated with the relation's attributes reading from.  `finfo` can be
-	 * optionally filled to provide the catalog information of the output
-	 * function.  `atttypid` is the OID of data type used by the relation's
-	 * attribute.
+	 * Set output function information. This callback is called once at the
+	 * beginning of COPY TO.
+	 *
+	 * 'finfo' can be optionally filled to provide the catalog information of
+	 * the output function.
+	 *
+	 * 'atttypid' is the OID of data type used by the relation's attribute.
 	 */
 	void		(*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
 								  FmgrInfo *finfo);
 
 	/*
-	 * Called when COPY TO is started.
+	 * Start a COPY TO. This callback is called once at the beginning of COPY
+	 * TO.
 	 *
-	 * `tupDesc` is the tuple descriptor of the relation from where the data
+	 * 'tupDesc' is the tuple descriptor of the relation from where the data
 	 * is read.
 	 */
 	void		(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
 
 	/*
-	 * Copy one row for COPY TO.
-	 *
-	 * `slot` is the tuple slot where the data is emitted.
+	 * Write one row stored in 'slot' to the destination.
 	 */
 	void		(*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
 
-	/* Called when COPY TO has ended */
+	/* End a COPY TO. This callback is called once at the end of COPY TO */
 	void		(*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
-- 
2.43.5

From b6b5c0409eed0558320e39bd642b2be17f17f590 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 19 Nov 2024 13:46:06 -0800
Subject: [PATCH v24 4/4] fixup: minor updates for COPY FROM refactoring.

includes:

- cleanup comments.
- reorder function definitions.
---
 src/backend/commands/copyfrom.c          | 161 +++++++++++------------
 src/backend/commands/copyfromparse.c     |  78 +++++------
 src/include/commands/copyapi.h           |  26 ++--
 src/include/commands/copyfrom_internal.h |   2 +-
 4 files changed, 121 insertions(+), 146 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e77986f9a9..7f1de8a42b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,31 +106,65 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
-
 /*
- * CopyFromRoutine implementations for text and CSV.
+ * Built-in format-specific routines. One-row callbacks are defined in
+ * copyfromparse.c.
  */
+static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+								   Oid *typioparam);
+static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromTextLikeEnd(CopyFromState cstate);
+static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+								 FmgrInfo *finfo, Oid *typioparam);
+static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromBinaryEnd(CopyFromState cstate);
+
 
 /*
- * CopyFromTextLikeInFunc
- *
- * Assign input function data for a relation's attribute in text/CSV format.
+ * COPY FROM routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
  */
-static void
-CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
-					   FmgrInfo *finfo, Oid *typioparam)
+
+/* TEXT format */
+static const CopyFromRoutine CopyFromRoutineText = {
+	.CopyFromInFunc = CopyFromTextLikeInFunc,
+	.CopyFromStart = CopyFromTextLikeStart,
+	.CopyFromOneRow = CopyFromTextOneRow,
+	.CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+	.CopyFromInFunc = CopyFromTextLikeInFunc,
+	.CopyFromStart = CopyFromTextLikeStart,
+	.CopyFromOneRow = CopyFromCSVOneRow,
+	.CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* BINARY format */
+static const CopyFromRoutine CopyFromRoutineBinary = {
+	.CopyFromInFunc = CopyFromBinaryInFunc,
+	.CopyFromStart = CopyFromBinaryStart,
+	.CopyFromOneRow = CopyFromBinaryOneRow,
+	.CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/* Return COPY FROM routines for the given option */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
 {
-	Oid			func_oid;
+	if (opts.csv_mode)
+		return &CopyFromRoutineCSV;
+	else if (opts.binary)
+		return &CopyFromRoutineBinary;
 
-	getTypeInputInfo(atttypid, &func_oid, typioparam);
-	fmgr_info(func_oid, finfo);
+	/* default is text */
+	return &CopyFromRoutineText;
 }
 
-/*
- * CopyFromTextLikeStart
- *
- * Start of COPY FROM for text/CSV format.
- */
+/* Implementation of the start callback for text and CSV formats */
 static void
 CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
 {
@@ -162,24 +196,37 @@ CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
 }
 
 /*
- * CopyFromTextLikeEnd
- *
- * End of COPY FROM for text/CSV format.
+ * Implementation of the infunc callback for text and CSV formats. Assign
+ * the input function data to the given *finfo.
  */
 static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+					   Oid *typioparam)
+{
+	Oid			func_oid;
+
+	getTypeInputInfo(atttypid, &func_oid, typioparam);
+	fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
 CopyFromTextLikeEnd(CopyFromState cstate)
 {
 	/* nothing to do */
 }
 
-/*
- * CopyFromRoutine implementation for "binary".
- */
+/* Implementation of the start callback for binary format */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	/* Read and verify binary header */
+	ReceiveCopyBinaryHeader(cstate);
+}
 
 /*
- * CopyFromBinaryInFunc
- *
- * Assign input function data for a relation's attribute in binary format.
+ * Implementation of the infunc callback for binary format. Assign
+ * the binary input function to the given *finfo.
  */
 static void
 CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
@@ -191,72 +238,13 @@ CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
 	fmgr_info(func_oid, finfo);
 }
 
-/*
- * CopyFromBinaryStart
- *
- * Start of COPY FROM for binary format.
- */
-static void
-CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
-{
-	/* Read and verify binary header */
-	ReceiveCopyBinaryHeader(cstate);
-}
-
-/*
- * CopyFromBinaryEnd
- *
- * End of COPY FROM for binary format.
- */
+/* Implementation of the end callback for binary format */
 static void
 CopyFromBinaryEnd(CopyFromState cstate)
 {
 	/* nothing to do */
 }
 
-/*
- * Routines assigned to each format.
-+
- * CSV and text share the same implementation, at the exception of the
- * per-row callback.
- */
-static const CopyFromRoutine CopyFromRoutineText = {
-	.CopyFromInFunc = CopyFromTextLikeInFunc,
-	.CopyFromStart = CopyFromTextLikeStart,
-	.CopyFromOneRow = CopyFromTextOneRow,
-	.CopyFromEnd = CopyFromTextLikeEnd,
-};
-
-static const CopyFromRoutine CopyFromRoutineCSV = {
-	.CopyFromInFunc = CopyFromTextLikeInFunc,
-	.CopyFromStart = CopyFromTextLikeStart,
-	.CopyFromOneRow = CopyFromCSVOneRow,
-	.CopyFromEnd = CopyFromTextLikeEnd,
-};
-
-static const CopyFromRoutine CopyFromRoutineBinary = {
-	.CopyFromInFunc = CopyFromBinaryInFunc,
-	.CopyFromStart = CopyFromBinaryStart,
-	.CopyFromOneRow = CopyFromBinaryOneRow,
-	.CopyFromEnd = CopyFromBinaryEnd,
-};
-
-/*
- * Define the COPY FROM routines to use for a format.
- */
-static const CopyFromRoutine *
-CopyFromGetRoutine(CopyFormatOptions opts)
-{
-	if (opts.csv_mode)
-		return &CopyFromRoutineCSV;
-	else if (opts.binary)
-		return &CopyFromRoutineBinary;
-
-	/* default is text */
-	return &CopyFromRoutineText;
-}
-
-
 /*
  * error context callback for COPY FROM
  *
@@ -1578,7 +1566,7 @@ BeginCopyFrom(ParseState *pstate,
 	/* Extract options from the statement node tree */
 	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
-	/* Set format routine */
+	/* Set the format routine */
 	cstate->routine = CopyFromGetRoutine(cstate->opts);
 
 	/* Process the target relation */
@@ -1918,6 +1906,7 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	/* Invoke the end callback */
 	cstate->routine->CopyFromEnd(cstate);
 
 	/* No COPY FROM related resources except memory. */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0447c4df7e..5416583e94 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -141,12 +141,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 									 Oid typioparam, int32 typmod,
 									 bool *isnull);
+static bool CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+								   Datum *values, bool *nulls, bool is_csv);
 
 
 /* Low-level communications functions */
@@ -740,6 +742,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
 static pg_attribute_always_inline bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
@@ -839,20 +843,30 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i
 	return true;
 }
 
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+				   bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+				  bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
 /*
- * CopyFromTextLikeOneRow
- *
- * Copy one row to a set of `values` and `nulls` for the text and CSV
- * formats.
- *
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
 static pg_attribute_always_inline bool
-CopyFromTextLikeOneRow(CopyFromState cstate,
-					   ExprContext *econtext,
-					   Datum *values,
-					   bool *nulls,
-					   bool is_csv)
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+					   Datum *values, bool *nulls, bool is_csv)
 {
 	TupleDesc	tupDesc;
 	AttrNumber	attr_count;
@@ -1001,43 +1015,10 @@ CopyFromTextLikeOneRow(CopyFromState cstate,
 	return true;
 }
 
-
-/*
- * CopyFromTextOneRow
- *
- * Per-row callback for COPY FROM with text format.
- */
-bool
-CopyFromTextOneRow(CopyFromState cstate,
-				   ExprContext *econtext,
-				   Datum *values,
-				   bool *nulls)
-{
-	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
-}
-
-/*
- * CopyFromCSVOneRow
- *
- * Per-row callback for COPY FROM with CSV format.
- */
-bool
-CopyFromCSVOneRow(CopyFromState cstate,
-				  ExprContext *econtext,
-				  Datum *values,
-				  bool *nulls)
-{
-	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
-}
-
-/*
- * CopyFromBinaryOneRow
- *
- * Copy one row to a set of `values` and `nulls` for the binary format.
- */
+/* Implementation of the per-row callback for binary format */
 bool
-CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
-					 Datum *values, bool *nulls)
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+					 bool *nulls)
 {
 	TupleDesc	tupDesc;
 	AttrNumber	attr_count;
@@ -1130,6 +1111,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
+	/* Get one row from source */
 	if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
 		return false;
 
@@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static pg_attribute_always_inline bool
+static bool
 CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
 	char	   *copy_input_buf;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 224fda172e..ff269def9d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -63,38 +63,42 @@ typedef struct CopyToRoutine
 typedef struct CopyFromRoutine
 {
 	/*
-	 * Called when COPY FROM is started to set up the input functions
-	 * associated with the relation's attributes writing to.  `finfo` can be
-	 * optionally filled to provide the catalog information of the input
-	 * function.  `typioparam` can be optionally filled to define the OID of
-	 * the type to pass to the input function.	`atttypid` is the OID of data
-	 * type used by the relation's attribute.
+	 * Set input function information. This callback is called once at the
+	 * beginning of COPY FROM.
+	 *
+	 * 'finfo' can be optionally filled to provide the catalog information of
+	 * the input function.
+	 *
+	 * 'typioparam' can be optionally filled to define the OID of the type to
+	 * pass to the input function. 'atttypid' is the OID of data type used by
+	 * the relation's attribute.
 	 */
 	void		(*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
 								   FmgrInfo *finfo, Oid *typioparam);
 
 	/*
-	 * Called when COPY FROM is started.
+	 * Start a COPY FROM. This callback is called once at the beginning of
+	 * COPY FROM.
 	 *
-	 * `tupDesc` is the tuple descriptor of the relation where the data needs
+	 * 'tupDesc' is the tuple descriptor of the relation where the data needs
 	 * to be copied.  This can be used for any initialization steps required
 	 * by a format.
 	 */
 	void		(*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
 
 	/*
-	 * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+	 * Read one row from the source and fill *values and *nulls.
 	 *
 	 * 'econtext' is used to evaluate default expression for each column that
 	 * is either not read from the file or is using the DEFAULT option of COPY
 	 * FROM.  It is NULL if no default values are used.
 	 *
-	 * Returns false if there are no more tuples to copy.
+	 * Returns false if there are no more tuples to read.
 	 */
 	bool		(*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
 								   Datum *values, bool *nulls);
 
-	/* Called when COPY FROM has ended. */
+	/* End a COPY FROM. This callback is called once at the end of COPY FROM */
 	void		(*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c11b5ff3cc..55fe24d728 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -187,7 +187,7 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
 extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
 							   Datum *values, bool *nulls);
 extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
-- 
2.43.5

From e1f2e5f906443487229b4c6aa664bfa9e3c7fbdc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v24 3/4] Refactor COPY FROM to use format callback functions.

This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.

This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.

Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when reading field representations in text
or CSV mode. The performance benchmark results showed up to a 10%
performance gain in text or CSV mode.

Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyfrom.c          | 201 ++++++++--
 src/backend/commands/copyfromparse.c     | 487 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           |  44 +-
 src/include/commands/copyfrom_internal.h |  12 +
 src/tools/pgindent/typedefs.list         |   1 +
 6 files changed, 501 insertions(+), 246 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 754cb49616..e77986f9a9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,157 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations for text and CSV.
+ */
+
+/*
+ * CopyFromTextLikeInFunc
+ *
+ * Assign input function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
+					   FmgrInfo *finfo, Oid *typioparam)
+{
+	Oid			func_oid;
+
+	getTypeInputInfo(atttypid, &func_oid, typioparam);
+	fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromTextLikeStart
+ *
+ * Start of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	AttrNumber	attr_count;
+
+	/*
+	 * If encoding conversion is needed, we need another buffer to hold the
+	 * converted input data.  Otherwise, we can just point input_buf to the
+	 * same buffer as raw_buf.
+	 */
+	if (cstate->need_transcoding)
+	{
+		cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+		cstate->input_buf_index = cstate->input_buf_len = 0;
+	}
+	else
+		cstate->input_buf = cstate->raw_buf;
+	cstate->input_reached_eof = false;
+
+	initStringInfo(&cstate->line_buf);
+
+	/*
+	 * Create workspace for CopyReadAttributes results; used by CSV and text
+	 * format.
+	 */
+	attr_count = list_length(cstate->attnumlist);
+	cstate->max_fields = attr_count;
+	cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * CopyFromTextLikeEnd
+ *
+ * End of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+	/* nothing to do */
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * CopyFromBinaryInFunc
+ *
+ * Assign input function data for a relation's attribute in binary format.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+					 FmgrInfo *finfo, Oid *typioparam)
+{
+	Oid			func_oid;
+
+	getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+	fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	/* Read and verify binary header */
+	ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+	/* nothing to do */
+}
+
+/*
+ * Routines assigned to each format.
+ *
+ * CSV and text share the same implementation, with the exception of the
+ * per-row callback.
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+	.CopyFromInFunc = CopyFromTextLikeInFunc,
+	.CopyFromStart = CopyFromTextLikeStart,
+	.CopyFromOneRow = CopyFromTextOneRow,
+	.CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineCSV = {
+	.CopyFromInFunc = CopyFromTextLikeInFunc,
+	.CopyFromStart = CopyFromTextLikeStart,
+	.CopyFromOneRow = CopyFromCSVOneRow,
+	.CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+	.CopyFromInFunc = CopyFromBinaryInFunc,
+	.CopyFromStart = CopyFromBinaryStart,
+	.CopyFromOneRow = CopyFromBinaryOneRow,
+	.CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Define the COPY FROM routines to use for a format.
+ */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+	if (opts.csv_mode)
+		return &CopyFromRoutineCSV;
+	else if (opts.binary)
+		return &CopyFromRoutineBinary;
+
+	/* default is text */
+	return &CopyFromRoutineText;
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1547,6 @@ BeginCopyFrom(ParseState *pstate,
 				num_defaults;
 	FmgrInfo   *in_functions;
 	Oid		   *typioparams;
-	Oid			in_func_oid;
 	int		   *defmap;
 	ExprState **defexprs;
 	MemoryContext oldcontext;
@@ -1428,6 +1578,9 @@ BeginCopyFrom(ParseState *pstate,
 	/* Extract options from the statement node tree */
 	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+	/* Set format routine */
+	cstate->routine = CopyFromGetRoutine(cstate->opts);
+
 	/* Process the target relation */
 	cstate->rel = rel;
 
@@ -1583,25 +1736,6 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
-	{
-		/*
-		 * If encoding conversion is needed, we need another buffer to hold
-		 * the converted input data.  Otherwise, we can just point input_buf
-		 * to the same buffer as raw_buf.
-		 */
-		if (cstate->need_transcoding)
-		{
-			cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-			cstate->input_buf_index = cstate->input_buf_len = 0;
-		}
-		else
-			cstate->input_buf = cstate->raw_buf;
-		cstate->input_reached_eof = false;
-
-		initStringInfo(&cstate->line_buf);
-	}
-
 	initStringInfo(&cstate->attribute_buf);
 
 	/* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1768,9 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
-			getTypeBinaryInputInfo(att->atttypid,
-								   &in_func_oid, &typioparams[attnum - 1]);
-		else
-			getTypeInputInfo(att->atttypid,
-							 &in_func_oid, &typioparams[attnum - 1]);
-		fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+		cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+										&in_functions[attnum - 1],
+										&typioparams[attnum - 1]);
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1775,20 +1905,7 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
-	{
-		/* Read and verify binary header */
-		ReceiveCopyBinaryHeader(cstate);
-	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
-	{
-		AttrNumber	attr_count = list_length(cstate->attnumlist);
-
-		cstate->max_fields = attr_count;
-		cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-	}
+	cstate->routine->CopyFromStart(cstate, tupDesc);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1918,8 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	cstate->routine->CopyFromEnd(cstate);
+
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
 	{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d8..0447c4df7e 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -140,8 +140,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -741,8 +741,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
 	int			fldct;
 	bool		done;
@@ -759,13 +759,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		tupDesc = RelationGetDescr(cstate->rel);
 
 		cstate->cur_lineno++;
-		done = CopyReadLine(cstate);
+		done = CopyReadLine(cstate, is_csv);
 
 		if (cstate->opts.header_line == COPY_HEADER_MATCH)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			/*
+			 * is_csv will be optimized away by compiler, as argument is
+			 * constant at caller.
+			 */
+			if (is_csv)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -809,7 +813,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	cstate->cur_lineno++;
 
 	/* Actually read the line into memory here */
-	done = CopyReadLine(cstate);
+	done = CopyReadLine(cstate, is_csv);
 
 	/*
 	 * EOF at start of line means we're done.  If we see EOF after some
@@ -819,8 +823,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;
 
-	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	/*
+	 * Parse the line into de-escaped field values
+	 *
+	 * is_csv will be optimized away by compiler, as argument is constant at
+	 * caller.
+	 */
+	if (is_csv)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -831,233 +840,299 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 }
 
 /*
- * Read next tuple from file for COPY FROM. Return false if no more tuples.
+ * CopyFromTextLikeOneRow
  *
- * 'econtext' is used to evaluate default expression for each column that is
- * either not read from the file or is using the DEFAULT option of COPY FROM.
- * It can be NULL when no default values are used, i.e. when all columns are
- * read from the file, and DEFAULT option is unset.
+ * Copy one row to a set of `values` and `nulls` for the text and CSV
+ * formats.
  *
- * 'values' and 'nulls' arrays must be the same length as columns of the
- * relation passed to BeginCopyFrom. This function fills the arrays.
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  */
-bool
-NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
-			 Datum *values, bool *nulls)
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate,
+					   ExprContext *econtext,
+					   Datum *values,
+					   bool *nulls,
+					   bool is_csv)
 {
 	TupleDesc	tupDesc;
-	AttrNumber	num_phys_attrs,
-				attr_count,
-				num_defaults = cstate->num_defaults;
+	AttrNumber	attr_count;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
-	int			i;
-	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
+	char	  **field_strings;
+	ListCell   *cur;
+	int			fldct;
+	int			fieldno;
+	char	   *string;
 
 	tupDesc = RelationGetDescr(cstate->rel);
-	num_phys_attrs = tupDesc->natts;
 	attr_count = list_length(cstate->attnumlist);
 
-	/* Initialize all values for row to NULL */
-	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
-	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
-	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
+	/* read raw fields in the next line */
+	if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+		return false;
 
-	if (!cstate->opts.binary)
-	{
-		char	  **field_strings;
-		ListCell   *cur;
-		int			fldct;
-		int			fieldno;
-		char	   *string;
+	/* check for overflowing fields */
+	if (attr_count > 0 && fldct > attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("extra data after last expected column")));
 
-		/* read raw fields in the next line */
-		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-			return false;
+	fieldno = 0;
+
+	/* Loop to read the user attributes on the line. */
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		int			m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
 
-		/* check for overflowing fields */
-		if (attr_count > 0 && fldct > attr_count)
+		if (fieldno >= fldct)
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
-
-		fieldno = 0;
+					 errmsg("missing data for column \"%s\"",
+							NameStr(att->attname))));
+		string = field_strings[fieldno++];
 
-		/* Loop to read the user attributes on the line. */
-		foreach(cur, cstate->attnumlist)
+		if (cstate->convert_select_flags &&
+			!cstate->convert_select_flags[m])
 		{
-			int			attnum = lfirst_int(cur);
-			int			m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
-			string = field_strings[fieldno++];
-
-			if (cstate->convert_select_flags &&
-				!cstate->convert_select_flags[m])
-			{
-				/* ignore input field, leaving column as NULL */
-				continue;
-			}
+			/* ignore input field, leaving column as NULL */
+			continue;
+		}
 
-			if (cstate->opts.csv_mode)
+		if (is_csv)
+		{
+			if (string == NULL &&
+				cstate->opts.force_notnull_flags[m])
 			{
-				if (string == NULL &&
-					cstate->opts.force_notnull_flags[m])
-				{
-					/*
-					 * FORCE_NOT_NULL option is set and column is NULL -
-					 * convert it to the NULL string.
-					 */
-					string = cstate->opts.null_print;
-				}
-				else if (string != NULL && cstate->opts.force_null_flags[m]
-						 && strcmp(string, cstate->opts.null_print) == 0)
-				{
-					/*
-					 * FORCE_NULL option is set and column matches the NULL
-					 * string. It must have been quoted, or otherwise the
-					 * string would already have been set to NULL. Convert it
-					 * to NULL as specified.
-					 */
-					string = NULL;
-				}
+				/*
+				 * FORCE_NOT_NULL option is set and column is NULL - convert
+				 * it to the NULL string.
+				 */
+				string = cstate->opts.null_print;
 			}
-
-			cstate->cur_attname = NameStr(att->attname);
-			cstate->cur_attval = string;
-
-			if (string != NULL)
-				nulls[m] = false;
-
-			if (cstate->defaults[m])
+			else if (string != NULL && cstate->opts.force_null_flags[m]
+					 && strcmp(string, cstate->opts.null_print) == 0)
 			{
 				/*
-				 * The caller must supply econtext and have switched into the
-				 * per-tuple memory context in it.
+				 * FORCE_NULL option is set and column matches the NULL
+				 * string. It must have been quoted, or otherwise the string
+				 * would already have been set to NULL. Convert it to NULL as
+				 * specified.
 				 */
-				Assert(econtext != NULL);
-				Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-				values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+				string = NULL;
 			}
+		}
+
+		cstate->cur_attname = NameStr(att->attname);
+		cstate->cur_attval = string;
 
+		if (string != NULL)
+			nulls[m] = false;
+
+		if (cstate->defaults[m])
+		{
 			/*
-			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
+			 * The caller must supply econtext and have switched into the
+			 * per-tuple memory context in it.
 			 */
-			else if (!InputFunctionCallSafe(&in_functions[m],
-											string,
-											typioparams[m],
-											att->atttypmod,
-											(Node *) cstate->escontext,
-											&values[m]))
-			{
-				Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+			Assert(econtext != NULL);
+			Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
 
-				cstate->num_errors++;
+			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+		}
 
-				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-				{
-					/*
-					 * Since we emit line number and column info in the below
-					 * notice message, we suppress error context information
-					 * other than the relation name.
-					 */
-					Assert(!cstate->relname_only);
-					cstate->relname_only = true;
+		/*
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 */
+		else if (!InputFunctionCallSafe(&in_functions[m],
+										string,
+										typioparams[m],
+										att->atttypmod,
+										(Node *) cstate->escontext,
+										&values[m]))
+		{
+			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
-					if (cstate->cur_attval)
-					{
-						char	   *attval;
-
-						attval = CopyLimitPrintoutLength(cstate->cur_attval);
-						ereport(NOTICE,
-								errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-									   (unsigned long long) cstate->cur_lineno,
-									   cstate->cur_attname,
-									   attval));
-						pfree(attval);
-					}
-					else
-						ereport(NOTICE,
-								errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-									   (unsigned long long) cstate->cur_lineno,
-									   cstate->cur_attname));
-
-					/* reset relname_only */
-					cstate->relname_only = false;
+			cstate->num_errors++;
+
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			{
+				/*
+				 * Since we emit line number and column info in the below
+				 * notice message, we suppress error context information other
+				 * than the relation name.
+				 */
+				Assert(!cstate->relname_only);
+				cstate->relname_only = true;
+
+				if (cstate->cur_attval)
+				{
+					char	   *attval;
+
+					attval = CopyLimitPrintoutLength(cstate->cur_attval);
+					ereport(NOTICE,
+							errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+								   (unsigned long long) cstate->cur_lineno,
+								   cstate->cur_attname,
+								   attval));
+					pfree(attval);
 				}
+				else
+					ereport(NOTICE,
+							errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+								   (unsigned long long) cstate->cur_lineno,
+								   cstate->cur_attname));
 
-				return true;
+				/* reset relname_only */
+				cstate->relname_only = false;
 			}
 
-			cstate->cur_attname = NULL;
-			cstate->cur_attval = NULL;
+			return true;
 		}
 
-		Assert(fieldno == attr_count);
+		cstate->cur_attname = NULL;
+		cstate->cur_attval = NULL;
 	}
-	else
-	{
-		/* binary */
-		int16		fld_count;
-		ListCell   *cur;
 
-		cstate->cur_lineno++;
+	Assert(fieldno == attr_count);
 
-		if (!CopyGetInt16(cstate, &fld_count))
-		{
-			/* EOF detected (end of file, or protocol-level EOF) */
-			return false;
-		}
+	return true;
+}
 
-		if (fld_count == -1)
-		{
-			/*
-			 * Received EOF marker.  Wait for the protocol-level EOF, and
-			 * complain if it doesn't come immediately.  In COPY FROM STDIN,
-			 * this ensures that we correctly handle CopyFail, if client
-			 * chooses to send that now.  When copying from file, we could
-			 * ignore the rest of the file like in text mode, but we choose to
-			 * be consistent with the COPY FROM STDIN case.
-			 */
-			char		dummy;
 
-			if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("received copy data after EOF marker")));
-			return false;
-		}
+/*
+ * CopyFromTextOneRow
+ *
+ * Per-row callback for COPY FROM with text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+				   ExprContext *econtext,
+				   Datum *values,
+				   bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Per-row callback for COPY FROM with CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+				  ExprContext *econtext,
+				  Datum *values,
+				  bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/*
+ * CopyFromBinaryOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the binary format.
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+					 Datum *values, bool *nulls)
+{
+	TupleDesc	tupDesc;
+	AttrNumber	attr_count;
+	FmgrInfo   *in_functions = cstate->in_functions;
+	Oid		   *typioparams = cstate->typioparams;
+	int16		fld_count;
+	ListCell   *cur;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	attr_count = list_length(cstate->attnumlist);
+
+	cstate->cur_lineno++;
 
-		if (fld_count != attr_count)
+	if (!CopyGetInt16(cstate, &fld_count))
+	{
+		/* EOF detected (end of file, or protocol-level EOF) */
+		return false;
+	}
+
+	if (fld_count == -1)
+	{
+		/*
+		 * Received EOF marker.  Wait for the protocol-level EOF, and complain
+		 * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+		 * that we correctly handle CopyFail, if client chooses to send that
+		 * now.  When copying from file, we could ignore the rest of the file
+		 * like in text mode, but we choose to be consistent with the COPY
+		 * FROM STDIN case.
+		 */
+		char		dummy;
+
+		if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
 			ereport(ERROR,
 					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
+					 errmsg("received copy data after EOF marker")));
+		return false;
+	}
 
-		foreach(cur, cstate->attnumlist)
-		{
-			int			attnum = lfirst_int(cur);
-			int			m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			cstate->cur_attname = NameStr(att->attname);
-			values[m] = CopyReadBinaryAttribute(cstate,
-												&in_functions[m],
-												typioparams[m],
-												att->atttypmod,
-												&nulls[m]);
-			cstate->cur_attname = NULL;
-		}
+	if (fld_count != attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("row field count is %d, expected %d",
+						(int) fld_count, attr_count)));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		int			m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		cstate->cur_attname = NameStr(att->attname);
+		values[m] = CopyReadBinaryAttribute(cstate,
+											&in_functions[m],
+											typioparams[m],
+											att->atttypmod,
+											&nulls[m]);
+		cstate->cur_attname = NULL;
 	}
 
+	return true;
+}
+
+/*
+ * Read next tuple from file for COPY FROM. Return false if no more tuples.
+ *
+ * 'econtext' is used to evaluate default expression for each column that is
+ * either not read from the file or is using the DEFAULT option of COPY FROM.
+ * It can be NULL when no default values are used, i.e. when all columns are
+ * read from the file, and DEFAULT option is unset.
+ *
+ * 'values' and 'nulls' arrays must be the same length as columns of the
+ * relation passed to BeginCopyFrom. This function fills the arrays.
+ */
+bool
+NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+			 Datum *values, bool *nulls)
+{
+	TupleDesc	tupDesc;
+	AttrNumber	num_phys_attrs,
+				num_defaults = cstate->num_defaults;
+	int			i;
+	int		   *defmap = cstate->defmap;
+	ExprState **defexprs = cstate->defexprs;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	num_phys_attrs = tupDesc->natts;
+
+	/* Initialize all values for row to NULL */
+	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
+	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
+	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
+
+	if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+		return false;
+
 	/*
 	 * Now compute and insert any defaults available for the columns not
 	 * provided by the input data.  Anything not processed here or above will
@@ -1087,7 +1162,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
 	bool		result;
 
@@ -1095,7 +1170,7 @@ CopyReadLine(CopyFromState cstate)
 	cstate->line_buf_valid = false;
 
 	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate);
+	result = CopyReadLineText(cstate, is_csv);
 
 	if (result)
 	{
@@ -1162,8 +1237,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1178,7 +1253,11 @@ CopyReadLineText(CopyFromState cstate)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	/*
+	 * is_csv will be optimized away by compiler, as argument is constant at
+	 * caller.
+	 */
+	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1334,11 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		/*
+		 * is_csv will be optimized away by compiler, as argument is constant
+		 * at caller.
+		 */
+		if (is_csv)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1377,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (!is_csv || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1405,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 !is_csv ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 !is_csv ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1422,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1433,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (!is_csv || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL;	/* in case not set yet */
@@ -1370,7 +1453,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && !is_csv)
 		{
 			char		c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f538..f2409013fb 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 						 Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 99981b1579..224fda172e 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *	  API for COPY TO handlers
+ *	  API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -56,4 +56,46 @@ typedef struct CopyToRoutine
 	void		(*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.	 Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+	/*
+	 * Called when COPY FROM is started to set up the input functions
+	 * associated with the relation's attributes being written to.  `finfo`
+	 * can be optionally filled to provide the catalog information of the
+	 * input function.  `typioparam` can be optionally filled to define the
+	 * OID of the type to pass to the input function.  `atttypid` is the OID
+	 * of the data type used by the relation's attribute.
+	 */
+	void		(*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+								   FmgrInfo *finfo, Oid *typioparam);
+
+	/*
+	 * Called when COPY FROM is started.
+	 *
+	 * `tupDesc` is the tuple descriptor of the relation where the data needs
+	 * to be copied.  This can be used for any initialization steps required
+	 * by a format.
+	 */
+	void		(*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+	/*
+	 * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+	 *
+	 * 'econtext' is used to evaluate default expression for each column that
+	 * is either not read from the file or is using the DEFAULT option of COPY
+	 * FROM.  It is NULL if no default values are used.
+	 *
+	 * Returns false if there are no more tuples to copy.
+	 */
+	bool		(*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+								   Datum *values, bool *nulls);
+
+	/* Called when COPY FROM has ended. */
+	void		(*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif							/* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..c11b5ff3cc 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+	/* format routine */
+	const CopyFromRoutine *routine;
+
 	/* low-level state data */
 	CopySource	copy_src;		/* type of copy source */
 	FILE	   *copy_file;		/* used if copy_src == COPY_FILE */
@@ -183,4 +187,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+							   Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+							  Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+								 Datum *values, bool *nulls);
+
 #endif							/* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3334d9485..7fab5c479e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -492,6 +492,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.43.5

From f75c34d7420ed7fc47be30e4ebfbff855d0cb2ff Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v24 1/4] Refactor COPY TO to use format callback functions.

This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.

This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.

Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed up to a 5%
performance gain in text or CSV mode.

Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyto.c    | 462 +++++++++++++++++++++----------
 src/include/commands/copyapi.h   |  58 ++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 382 insertions(+), 139 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
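
A note on the per-row branch reduction mentioned in the commit message: the standalone sketch below shows the general technique, assuming a GCC/Clang-style always_inline attribute. It is not code from the patch. SketchBuf, sketch_append() and the sketch_* functions are hypothetical names, and the CSV quoting and fixed delimiters are deliberately simplified. The idea is that a shared workhorse takes "bool is_csv", each per-format wrapper passes a literal, and the compiler can then drop the untaken branch in each inlined copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_attribute_always_inline (GCC/Clang form). */
#if defined(__GNUC__)
#define sketch_always_inline inline __attribute__((always_inline))
#else
#define sketch_always_inline inline
#endif

/* Hypothetical output buffer; not a PostgreSQL type. */
typedef struct SketchBuf
{
	char		data[256];
	size_t		len;
} SketchBuf;

/* Append a NUL-terminated string, keeping the buffer NUL-terminated. */
static void
sketch_append(SketchBuf *buf, const char *s)
{
	size_t		n = strlen(s);

	assert(buf->len + n < sizeof(buf->data));
	memcpy(buf->data + buf->len, s, n);
	buf->len += n;
	buf->data[buf->len] = '\0';
}

/*
 * Shared workhorse.  Every caller passes a literal true/false, so once this
 * is inlined the compiler can fold each "is_csv" test into a constant.
 */
static sketch_always_inline void
sketch_one_row(SketchBuf *buf, const char *const *fields, int nfields,
			   bool is_csv)
{
	for (int i = 0; i < nfields; i++)
	{
		if (i > 0)
			sketch_append(buf, is_csv ? "," : "\t");

		if (is_csv)
		{
			/* Simplification: always quote, never escape. */
			sketch_append(buf, "\"");
			sketch_append(buf, fields[i]);
			sketch_append(buf, "\"");
		}
		else
			sketch_append(buf, fields[i]);
	}
	sketch_append(buf, "\n");
}

/* Thin per-format wrappers, in the spirit of CopyToTextOneRow(). */
static void
sketch_text_one_row(SketchBuf *buf, const char *const *fields, int nfields)
{
	sketch_one_row(buf, fields, nfields, false);
}

/* Same workhorse, specialized for the CSV-like path. */
static void
sketch_csv_one_row(SketchBuf *buf, const char *const *fields, int nfields)
{
	sketch_one_row(buf, fields, nfields, true);
}
```

Whether the branch is really eliminated depends on the compiler and optimization level, so any gain should be confirmed with a benchmark rather than assumed.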

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d9675..46f3507a8b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +65,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+	/* format routine */
+	const CopyToRoutine *routine;
+
 	/* low-level state data */
 	CopyDest	copy_dest;		/* type of copy source/destination */
 	FILE	   *copy_file;		/* used if copy_dest == COPY_FILE */
@@ -124,6 +128,317 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToTextLikeSendEndOfRow
+ *
+ * Apply line terminations for a line sent in text or CSV format depending
+ * on the destination, then send the end of a row.
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeSendEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+
+	/* Now take the actions related to the end of a row */
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextLikeStart
+ *
+ * Start of COPY TO for text and CSV format.
+ */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	/*
+	 * For non-binary copy, we need to convert null_print to file encoding,
+	 * because it will be sent directly with CopySendString.
+	 */
+	if (cstate->need_transcoding)
+		cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+														  cstate->opts.null_print_len,
+														  cstate->file_encoding);
+
+	/* if a header has been requested send the line */
+	if (cstate->opts.header_line)
+	{
+		ListCell   *cur;
+		bool		hdr_delim = false;
+
+		foreach(cur, cstate->attnumlist)
+		{
+			int			attnum = lfirst_int(cur);
+			char	   *colname;
+
+			if (hdr_delim)
+				CopySendChar(cstate, cstate->opts.delim[0]);
+			hdr_delim = true;
+
+			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+			if (cstate->opts.csv_mode)
+				CopyAttributeOutCSV(cstate, colname, false);
+			else
+				CopyAttributeOutText(cstate, colname);
+		}
+
+		CopyToTextLikeSendEndOfRow(cstate);
+	}
+}
+
+/*
+ * CopyToTextLikeOutFunc
+ *
+ * Assign output function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+	Oid			func_oid;
+	bool		is_varlena;
+
+	/* Set output function for an attribute */
+	getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+	fmgr_info(func_oid, finfo);
+}
+
+
+/*
+ * CopyToTextLikeOneRow
+ *
+ * Process one row for text/CSV format.
+ *
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+					 TupleTableSlot *slot,
+					 bool is_csv)
+{
+	bool		need_delim = false;
+	FmgrInfo   *out_functions = cstate->out_functions;
+
+	foreach_int(attnum, cstate->attnumlist)
+	{
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (need_delim)
+			CopySendChar(cstate, cstate->opts.delim[0]);
+		need_delim = true;
+
+		if (isnull)
+		{
+			CopySendString(cstate, cstate->opts.null_print_client);
+		}
+		else
+		{
+			char	   *string;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1],
+										value);
+
+			/*
+			 * The is_csv check can be optimized away by the compiler, as
+			 * the argument is a constant at each caller.
+			 */
+			if (is_csv)
+				CopyAttributeOutCSV(cstate, string,
+									cstate->opts.force_quote_flags[attnum - 1]);
+			else
+				CopyAttributeOutText(cstate, string);
+		}
+	}
+
+	CopyToTextLikeSendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextOneRow
+ *
+ * Per-row callback for COPY TO with text format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/*
+ * CopyToCSVOneRow
+ *
+ * Per-row callback for COPY TO with CSV format.
+ */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * CopyToTextLikeEnd
+ *
+ * End of COPY TO for text/CSV format.
+ */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+	/* Nothing to do here */
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * CopyToBinaryStart
+ *
+ * Start of COPY TO for binary format.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	/* Generate header for a binary copy */
+	int32		tmp;
+
+	/* Signature */
+	CopySendData(cstate, BinarySignature, 11);
+	/* Flags field */
+	tmp = 0;
+	CopySendInt32(cstate, tmp);
+	/* No header extension */
+	tmp = 0;
+	CopySendInt32(cstate, tmp);
+}
+
+/*
+ * CopyToBinaryOutFunc
+ *
+ * Assign output function data for a relation's attribute in binary format.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+	Oid			func_oid;
+	bool		is_varlena;
+
+	/* Set output function for an attribute */
+	getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+	fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyToBinaryOneRow
+ *
+ * Process one row for binary format.
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	FmgrInfo   *out_functions = cstate->out_functions;
+
+	/* Binary per-tuple header */
+	CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+	foreach_int(attnum, cstate->attnumlist)
+	{
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (isnull)
+		{
+			CopySendInt32(cstate, -1);
+		}
+		else
+		{
+			bytea	   *outputbytes;
+
+			outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+										   value);
+			CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+			CopySendData(cstate, VARDATA(outputbytes),
+						 VARSIZE(outputbytes) - VARHDRSZ);
+		}
+	}
+
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToBinaryEnd
+ *
+ * End of COPY TO for binary format.
+ */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+	/* Generate trailer for a binary copy */
+	CopySendInt16(cstate, -1);
+	/* Need to flush out the trailer */
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CSV and text share the same implementation, with the exception of the
+ * output representation and per-row callbacks.
+ */
+static const CopyToRoutine CopyToRoutineText = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToTextOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineCSV = {
+	.CopyToStart = CopyToTextLikeStart,
+	.CopyToOutFunc = CopyToTextLikeOutFunc,
+	.CopyToOneRow = CopyToCSVOneRow,
+	.CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+	.CopyToStart = CopyToBinaryStart,
+	.CopyToOutFunc = CopyToBinaryOutFunc,
+	.CopyToOneRow = CopyToBinaryOneRow,
+	.CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Define the COPY TO routines to use for a format.  This should be called
+ * after options are parsed.
+ */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+	if (opts.csv_mode)
+		return &CopyToRoutineCSV;
+	else if (opts.binary)
+		return &CopyToRoutineBinary;
+
+	/* default is text */
+	return &CopyToRoutineText;
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -191,16 +506,6 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
-			{
-				/* Default line termination depends on platform */
-#ifndef WIN32
-				CopySendChar(cstate, '\n');
-#else
-				CopySendString(cstate, "\r\n");
-#endif
-			}
-
 			if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
 					   cstate->copy_file) != 1 ||
 				ferror(cstate->copy_file))
@@ -235,10 +540,6 @@ CopySendEndOfRow(CopyToState cstate)
 			}
 			break;
 		case COPY_FRONTEND:
-			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
-				CopySendChar(cstate, '\n');
-
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
 			break;
@@ -426,6 +727,9 @@ BeginCopyTo(ParseState *pstate,
 	/* Extract options from the statement node tree */
 	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+	/* Set format routine */
+	cstate->routine = CopyToGetRoutine(cstate->opts);
+
 	/* Process the source/target relation or query */
 	if (rel)
 	{
@@ -771,19 +1075,10 @@ DoCopyTo(CopyToState cstate)
 	foreach(cur, cstate->attnumlist)
 	{
 		int			attnum = lfirst_int(cur);
-		Oid			out_func_oid;
-		bool		isvarlena;
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		if (cstate->opts.binary)
-			getTypeBinaryOutputInfo(attr->atttypid,
-									&out_func_oid,
-									&isvarlena);
-		else
-			getTypeOutputInfo(attr->atttypid,
-							  &out_func_oid,
-							  &isvarlena);
-		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+		cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+									   &cstate->out_functions[attnum - 1]);
 	}
 
 	/*
@@ -796,56 +1091,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);
 
-	if (cstate->opts.binary)
-	{
-		/* Generate header for a binary copy */
-		int32		tmp;
-
-		/* Signature */
-		CopySendData(cstate, BinarySignature, 11);
-		/* Flags field */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-		/* No header extension */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-	}
-	else
-	{
-		/*
-		 * For non-binary copy, we need to convert null_print to file
-		 * encoding, because it will be sent directly with CopySendString.
-		 */
-		if (cstate->need_transcoding)
-			cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-															  cstate->opts.null_print_len,
-															  cstate->file_encoding);
-
-		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
-		{
-			bool		hdr_delim = false;
-
-			foreach(cur, cstate->attnumlist)
-			{
-				int			attnum = lfirst_int(cur);
-				char	   *colname;
-
-				if (hdr_delim)
-					CopySendChar(cstate, cstate->opts.delim[0]);
-				hdr_delim = true;
-
-				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, colname, false);
-				else
-					CopyAttributeOutText(cstate, colname);
-			}
-
-			CopySendEndOfRow(cstate);
-		}
-	}
+	cstate->routine->CopyToStart(cstate, tupDesc);
 
 	if (cstate->rel)
 	{
@@ -884,13 +1130,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
+	cstate->routine->CopyToEnd(cstate);
 
 	MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1146,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-	FmgrInfo   *out_functions = cstate->out_functions;
 	MemoryContext oldcontext;
 
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-	if (cstate->opts.binary)
-	{
-		/* Binary per-tuple header */
-		CopySendInt16(cstate, list_length(cstate->attnumlist));
-	}
-
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	if (!cstate->opts.binary)
-	{
-		bool		need_delim = false;
-
-		foreach_int(attnum, cstate->attnumlist)
-		{
-			Datum		value = slot->tts_values[attnum - 1];
-			bool		isnull = slot->tts_isnull[attnum - 1];
-			char	   *string;
-
-			if (need_delim)
-				CopySendChar(cstate, cstate->opts.delim[0]);
-			need_delim = true;
-
-			if (isnull)
-				CopySendString(cstate, cstate->opts.null_print_client);
-			else
-			{
-				string = OutputFunctionCall(&out_functions[attnum - 1],
-											value);
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, string,
-										cstate->opts.force_quote_flags[attnum - 1]);
-				else
-					CopyAttributeOutText(cstate, string);
-			}
-		}
-	}
-	else
-	{
-		foreach_int(attnum, cstate->attnumlist)
-		{
-			Datum		value = slot->tts_values[attnum - 1];
-			bool		isnull = slot->tts_isnull[attnum - 1];
-			bytea	   *outputbytes;
-
-			if (isnull)
-				CopySendInt32(cstate, -1);
-			else
-			{
-				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-											   value);
-				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-				CopySendData(cstate, VARDATA(outputbytes),
-							 VARSIZE(outputbytes) - VARHDRSZ);
-			}
-		}
-	}
-
-	CopySendEndOfRow(cstate);
+	cstate->routine->CopyToOneRow(cstate, slot);
 
 	MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..5ce24f195d
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *	  API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY TO format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+	/*
+	 * Called when COPY TO is started to set up the output function for each
+	 * of the relation's attributes being read.  `finfo` can be optionally
+	 * filled to provide the catalog information of the output function.
+	 * `atttypid` is the OID of the data type used by the relation's
+	 * attribute.
+	 */
+	void		(*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+								  FmgrInfo *finfo);
+
+	/*
+	 * Called when COPY TO is started.
+	 *
+	 * `tupDesc` is the tuple descriptor of the relation from which the data
+	 * is read.
+	 */
+	void		(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+	/*
+	 * Copy one row for COPY TO.
+	 *
+	 * `slot` is the tuple slot holding the row to be emitted.
+	 */
+	void		(*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+	/* Called when COPY TO has ended */
+	void		(*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif							/* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 08521d51a9..e3334d9485 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -503,6 +503,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.43.5
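
For readers following the thread, here is a minimal standalone sketch of the callback-table pattern that CopyToRoutine introduces. It does not use PostgreSQL headers. SketchState, SketchRoutine, the "framed" format and all sketch_* names are hypothetical and exist only for illustration. The driver calls start, one_row and end through a const routine pointer picked once up front, which is roughly how DoCopyTo() uses cstate->routine after this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SketchState SketchState;

/* Hypothetical miniature of CopyToRoutine: one callback per phase. */
typedef struct SketchRoutine
{
	void		(*start) (SketchState *st);
	void		(*one_row) (SketchState *st, const char *value);
	void		(*end) (SketchState *st);
} SketchRoutine;

struct SketchState
{
	const SketchRoutine *routine;	/* chosen once, before any output */
	char		out[128];
	size_t		len;
};

/* Append a NUL-terminated string to the state's output buffer. */
static void
sketch_put(SketchState *st, const char *s)
{
	size_t		n = strlen(s);

	assert(st->len + n < sizeof(st->out));
	memcpy(st->out + st->len, s, n);
	st->len += n;
	st->out[st->len] = '\0';
}

/* "text"-like format: no header or trailer, one value per line. */
static void sketch_text_start(SketchState *st) { (void) st; }
static void sketch_text_row(SketchState *st, const char *v)
{
	sketch_put(st, v);
	sketch_put(st, "\n");
}
static void sketch_text_end(SketchState *st) { (void) st; }

/* Made-up "framed" format: header, bracketed rows, trailer. */
static void sketch_framed_start(SketchState *st) { sketch_put(st, "BEGIN\n"); }
static void sketch_framed_row(SketchState *st, const char *v)
{
	sketch_put(st, "[");
	sketch_put(st, v);
	sketch_put(st, "]\n");
}
static void sketch_framed_end(SketchState *st) { sketch_put(st, "END\n"); }

static const SketchRoutine SketchRoutineText = {
	.start = sketch_text_start,
	.one_row = sketch_text_row,
	.end = sketch_text_end,
};

static const SketchRoutine SketchRoutineFramed = {
	.start = sketch_framed_start,
	.one_row = sketch_framed_row,
	.end = sketch_framed_end,
};

/* Pick the routine once, after "options" are known. */
static const SketchRoutine *
sketch_get_routine(int framed)
{
	return framed ? &SketchRoutineFramed : &SketchRoutineText;
}

/* Driver: no per-row format checks, only indirect calls. */
static void
sketch_copy(SketchState *st, const char *const *rows, int nrows)
{
	st->routine->start(st);
	for (int i = 0; i < nrows; i++)
		st->routine->one_row(st, rows[i]);
	st->routine->end(st);
}
```

Keeping the tables static const matches the "server-lifetime" note in copyapi.h: the state only stores a pointer and never owns the routine.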
