The bit/varbit type input functions cause file_fdw to fail to read the
logfile normally.
1. Server conf:
server_encoding = UTF8
locale = zh_CN.UTF-8
2. Create external tables using file_fdw
CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pglog (
log_time timestamp(3) with time zone,
user_name text,
database_name text,
process_id integer,
connection_from text,
session_id text,
session_line_num bigint,
command_tag text,
session_start_time timestamp with time zone,
virtual_transaction_id text,
transaction_id bigint,
error_severity text,
sql_state_code text,
message text,
detail text,
hint text,
internal_query text,
internal_query_pos integer,
context text,
query text,
query_pos integer,
location text,
application_name text
) SERVER pglog
OPTIONS ( filename 'log/postgresql-2020-06-16_213409.csv',
format 'csv');
It's normal to be here.
3. bit/varbit input
select b'Ù';
The foreign table cannot be accessed. SELECT * FROM pglog will get:
invalid byte sequence for encoding "UTF8": 0xc3 0x22
The reason is that the error message in the bit_in / varbit_in function
is output directly using %c. Causes the log file to not be decoded
correctly.
The attachment is a patch.
diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index f0c6a44b84..506cc35446 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -230,8 +230,8 @@ bit_in(PG_FUNCTION_ARGS)
else if (*sp != '0')
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("\"%c\" is not a valid
binary digit",
- *sp)));
+ errmsg("\"0x%02X\" is not a
valid binary digit",
+ (unsigned
char)*sp)));
x >>= 1;
if (x == 0)
@@ -531,8 +531,8 @@ varbit_in(PG_FUNCTION_ARGS)
else if (*sp != '0')
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
- errmsg("\"%c\" is not a valid
binary digit",
- *sp)));
+ errmsg("\"0x%02X\" is not a
valid binary digit",
+ (unsigned
char)*sp)));
x >>= 1;
if (x == 0)