haiqingchen created FLINK-38110:
-----------------------------------

             Summary: PostgreSQL connector reads Chinese columns with garbled characters
                 Key: FLINK-38110
                 URL: https://issues.apache.org/jira/browse/FLINK-38110
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.4.0
            Reporter: haiqingchen
When a PostgreSQL table has column names in Chinese, the PostgreSQL connector with the pgoutput plugin decodes them as garbled characters, especially during incremental capture.

The reason is that when io.debezium.connector.postgresql.connection.pgoutput.PgOutputMessageDecoder reads column names and table names, it does not decode the bytes as UTF-8:

{code:java}
private static String readString(ByteBuffer buffer) {
    StringBuilder sb = new StringBuilder();
    byte b;
    // Each byte is cast to a char on its own, so multi-byte UTF-8 sequences are split apart
    while ((b = buffer.get()) != 0) {
        sb.append((char) b);
    }
    return sb.toString();
}
{code}

Because every byte becomes a separate char, any character that takes more than one byte in UTF-8 (a Chinese character typically takes three) turns into several unrelated characters. ASCII names are unaffected, since each of their characters is a single byte.

When it handles column values, however, it does decode the bytes as UTF-8:

{code:java}
private static String readColumnValueAsString(ByteBuffer buffer) {
    int length = buffer.getInt();
    byte[] value = new byte[length];
    buffer.get(value, 0, length);
    // The whole value is decoded at once with an explicit UTF-8 charset
    return new String(value, Charset.forName("UTF-8"));
}
{code}
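The mismatch can be reproduced outside Debezium with a small Java sketch. The class PgOutputStringDemo and the methods readStringPerByte, readStringUtf8 and nullTerminated are hypothetical names for illustration, not Debezium APIs. The sketch assumes, as the report implies, that identifiers arrive in the pgoutput message as UTF-8 bytes followed by a null terminator. The UTF-8 variant is one possible fix under that assumption, not necessarily the change adopted upstream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Debezium implementation: compares per-byte
// char casting with UTF-8 decoding of a null-terminated identifier.
public class PgOutputStringDemo {

    // Same logic as the quoted readString: each byte is cast to a char.
    // Bytes >= 0x80 are negative in Java, so the cast sign-extends them into
    // U+FF80..U+FFFF and multi-byte UTF-8 sequences come out garbled.
    static String readStringPerByte(ByteBuffer buffer) {
        StringBuilder sb = new StringBuilder();
        byte b;
        while ((b = buffer.get()) != 0) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Hypothetical fix: collect the bytes up to the null terminator, then
    // decode them in one go as UTF-8 (the charset readColumnValueAsString uses).
    static String readStringUtf8(ByteBuffer buffer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte b;
        while ((b = buffer.get()) != 0) {
            bytes.write(b); // keeps the low 8 bits of the byte
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Encodes a name as UTF-8 followed by a null terminator (assumed layout).
    static ByteBuffer nullTerminated(String name) {
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(encoded.length + 1);
        buffer.put(encoded).put((byte) 0);
        buffer.flip(); // switch the buffer from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        // Four CJK characters meaning "order number", written as escapes
        // so the source file stays ASCII-only.
        String column = "\u8ba2\u5355\u7f16\u53f7";
        System.out.println(column.equals(readStringPerByte(nullTerminated(column)))); // false
        System.out.println(column.equals(readStringUtf8(nullTerminated(column))));    // true
    }
}
```

For an ASCII-only name such as "order_id", both readers return the same string, which matches the observation that only non-ASCII column and table names are affected.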