On Tue, Feb 06, 2024 at 02:24:45PM -0800, Andres Freund wrote: > I think the code is just very confusing - there actually *is* verification of > the encoding, it just happens at a different, earlier, layer, namely in > copyfromparse.c: CopyConvertBuf() which says: > /* > * If the file and server encoding are the same, no encoding conversion > is > * required. However, we still need to verify that the input is valid > for > * the encoding. > */ > > And does indeed verify that.
This has been switched by Heikki in f82de5c46bdf, in 2021, that has removed pg_database_encoding_max_length() in the COPY FROM case. (Adding him now in CC, in case he has any comments). > One unfortunate issue: We don't have any tests verifying that COPY FROM > catches encoding issues. Oops. Anyway, I was looking at the copyto.c code because I need to get something on this thread to be able to do something about the pluggable APIs in COPY, and echoing with what you mentioned upthread, what we only need to do is to set need_transcoding only when the client and the server encodings are not the same? Am I missing something? Runtime gets much better in average, around 1260ms on HEAD vs 1023ms with the attached for the example of upthread on a single process. Some profile data from CopyOneRowTo(), if relevant: * HEAD: - 82.78% 10.96% postgres postgres [.] CopyOneRowTo - 71.82% CopyOneRowTo + 30.87% OutputFunctionCall - 13.21% CopyAttributeOutText pg_server_to_any - 9.48% appendBinaryStringInfo 4.93% enlargeStringInfo 3.33% 0xffffa4e1e234 + 3.20% CopySendEndOfRow 2.66% 0xffffa4e1e214 1.02% pgstat_progress_update_param 0.86% memcpy@plt 0.74% 0xffffa4e1cba4 0.72% MemoryContextReset 0.72% 0xffffa4e1cba8 0.59% enlargeStringInfo 0.55% 0xffffa4e1cb40 0.54% 0xffffa4e1cb74 0.52% 0xffffa4e1cb8c + 10.96% _start * patch: - 80.82% 12.25% postgres postgres [.] CopyOneRowTo - 68.57% CopyOneRowTo + 36.55% OutputFunctionCall 11.44% CopyAttributeOutText + 8.87% appendBinaryStringInfo + 3.79% CopySendEndOfRow 1.01% pgstat_progress_update_param 0.79% int2out 0.66% MemoryContextReset 0.63% 0xffffaa624ba8 0.60% memcpy@plt 0.60% enlargeStringInfo 0.53% 0xffffaa624ba4 + 12.25% _start That's a performance-only change, but there may be a good argument for backpatching something, perhaps? -- Michael
From 6ddbcd4d6333bc96c21ca95d632d83f1e1459064 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Thu, 8 Feb 2024 16:03:39 +0900 Subject: [PATCH] Speedup COPY TO --- src/backend/commands/copyto.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index bd4710a79b..c6b45cab53 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -612,13 +612,15 @@ BeginCopyTo(ParseState *pstate, cstate->file_encoding = cstate->opts.file_encoding; /* - * Set up encoding conversion info. Even if the file and server encodings - * are the same, we must apply pg_any_to_server() to validate data in - * multibyte encodings. + * Set up encoding conversion info, validating data if server and + * client encodings are not the same (see also pg_server_to_any). */ - cstate->need_transcoding = - (cstate->file_encoding != GetDatabaseEncoding() || - pg_database_encoding_max_length() > 1); + if (cstate->file_encoding == GetDatabaseEncoding() || + cstate->file_encoding == PG_SQL_ASCII) + cstate->need_transcoding = false; + else + cstate->need_transcoding = true; + /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); -- 2.43.0
signature.asc
Description: PGP signature