haiqingchen created FLINK-38110:
-----------------------------------
Summary: PostgreSQL connector reads Chinese columns with garbled
characters
Key: FLINK-38110
URL: https://issues.apache.org/jira/browse/FLINK-38110
Project: Flink
Issue Type: Improvement
Components: Flink CDC
Affects Versions: cdc-3.4.0
Reporter: haiqingchen
When a PostgreSQL table has column names in Chinese, the PostgreSQL connector
with the pgoutput plugin decodes them as garbled characters, especially during
incremental capture.
The reason is that, when handling column names and table names,
io.debezium.connector.postgresql.connection.pgoutput.PgOutputMessageDecoder
does not decode the bytes as UTF-8. It casts each byte to a char instead:
{code:java}
private static String readString(ByteBuffer buffer) {
    StringBuilder sb = new StringBuilder();
    boolean var2 = false;

    byte b;
    while((b = buffer.get()) != 0) {
        sb.append((char)b);
    }

    return sb.toString();
}
{code}
In contrast, when it handles a column value, it decodes the bytes as UTF-8:
{code:java}
private static String readColumnValueAsString(ByteBuffer buffer) {
    int length = buffer.getInt();
    byte[] value = new byte[length];
    buffer.get(value, 0, length);
    return new String(value, Charset.forName("UTF-8"));
}
{code}
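One possible direction for a fix, as a minimal sketch rather than the actual Debezium patch: collect the bytes up to the NUL terminator and decode them once as UTF-8, the same way column values are decoded. The class PgOutputStringSketch and the helpers readStringPerByte, readStringUtf8 and nulTerminated are hypothetical names used only for illustration, and the sketch assumes that names arrive as NUL-terminated UTF-8 on the replication connection.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Debezium or Flink CDC.
final class PgOutputStringSketch {

    // Behavior quoted in this issue: every byte is cast to a char on its own,
    // so a multi-byte UTF-8 sequence is split into several unrelated chars.
    static String readStringPerByte(ByteBuffer buffer) {
        StringBuilder sb = new StringBuilder();
        byte b;
        while ((b = buffer.get()) != 0) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Sketch of a fix: gather the raw bytes up to the NUL terminator,
    // then decode the whole sequence as UTF-8 (assumed encoding).
    static String readStringUtf8(ByteBuffer buffer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte b;
        while ((b = buffer.get()) != 0) {
            bytes.write(b);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Test helper: encodes a name as NUL-terminated UTF-8, ready for reading.
    static ByteBuffer nulTerminated(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(utf8.length + 1);
        buffer.put(utf8).put((byte) 0);
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        // A two-character Chinese column name, written with Unicode escapes.
        String column = "\u59d3\u540d";
        // Per-byte casting does not round-trip the name.
        System.out.println(column.equals(readStringPerByte(nulTerminated(column)))); // false
        // Decoding the collected bytes as UTF-8 does.
        System.out.println(column.equals(readStringUtf8(nulTerminated(column)))); // true
    }
}
```

For ASCII-only names both methods return the same string, which would explain why the problem only shows up with non-ASCII column and table names.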
--
This message was sent by Atlassian Jira
(v8.20.10#820010)