haiqingchen created FLINK-38110:
-----------------------------------

             Summary: PostgreSQL connector reads Chinese columns with garbled 
characters
                 Key: FLINK-38110
                 URL: https://issues.apache.org/jira/browse/FLINK-38110
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.4.0
            Reporter: haiqingchen


When a PostgreSQL table has column names in Chinese, the PostgreSQL connector 
with the pgoutput plugin decodes them as garbled characters, especially during 
incremental capture.

The reason is that, when handling column names and table names,

io.debezium.connector.postgresql.connection.pgoutput.PgOutputMessageDecoder

doesn't decode the bytes as UTF-8. It casts each byte to a char instead:
{code:java}

private static String readString(ByteBuffer buffer) {
    StringBuilder sb = new StringBuilder();
    boolean var2 = false;

    byte b;
    while((b = buffer.get()) != 0) {
        sb.append((char)b);
    }

    return sb.toString();
} {code}
In contrast, when it handles column values, it decodes the bytes as UTF-8:
{code:java}
private static String readColumnValueAsString(ByteBuffer buffer) {
    int length = buffer.getInt();
    byte[] value = new byte[length];
    buffer.get(value, 0, length);
    return new String(value, Charset.forName("UTF-8"));
} {code}
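
A possible direction for a fix, as a minimal sketch rather than the actual Debezium patch: collect the raw bytes up to the NUL terminator and decode them as UTF-8 in one step, the same way column values are handled. The class name Utf8CStringReader and the method readStringPerByte are hypothetical names used only for this illustration; readStringPerByte repeats the per-byte cast quoted above so the two behaviours can be compared on a Chinese name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone class; not part of Debezium or Flink CDC.
class Utf8CStringReader {

    // Sketch of a fixed reader: gather the bytes of the NUL-terminated
    // string first, then decode the whole sequence as UTF-8.
    public static String readString(ByteBuffer buffer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte b;
        while ((b = buffer.get()) != 0) {
            bytes.write(b);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // The per-byte cast from the report, kept only for comparison.
    // Each byte of a multi-byte UTF-8 sequence becomes its own char.
    public static String readStringPerByte(ByteBuffer buffer) {
        StringBuilder sb = new StringBuilder();
        byte b;
        while ((b = buffer.get()) != 0) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u5217\u540d" is a two-character Chinese name (6 bytes in UTF-8).
        String name = "\u5217\u540d";
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);

        // Build a NUL-terminated buffer similar to what the decoder reads.
        ByteBuffer buffer = ByteBuffer.allocate(raw.length + 1);
        buffer.put(raw).put((byte) 0).flip();

        System.out.println("utf8 decode matches: " + name.equals(readString(buffer)));

        buffer.rewind();
        System.out.println("per-byte cast matches: " + name.equals(readStringPerByte(buffer)));
    }
}
```

Buffering the bytes before decoding matters because a single Chinese character spans several bytes in UTF-8, so no byte can be turned into a char on its own.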



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
