terrymanu commented on issue #37238:
URL: 
https://github.com/apache/shardingsphere/issues/37238#issuecomment-3677302829

   ##  Problem Understanding
   
     - Using PostgreSQL + ShardingSphere Proxy 5.5.2, querying a bytea column 
shows that OCTET_LENGTH is 10240, but ResultSet#getBytes returns fewer bytes 
and the missing length varies per row; direct DB access is correct, so the 
issue arises on the Proxy return path.
   
   ##  Root Cause
   
     - PostgreSQL’s default result format is TEXT, but in the TEXT path the 
Proxy writes raw binary for bytea (PostgreSQLDataRowPacket#writeTextValue 
byte[] branch) while RowDescription still declares TEXT format 
(Portal#determineValueFormat defaults to TEXT).
     - Per PostgreSQL protocol, TEXT bytea must be text-encoded (hex or 
escape), not raw bytes. Official docs:
         - bytea textual representation: 
https://www.postgresql.org/docs/current/datatype-binary.html
         - Protocol message/format codes: 
https://www.postgresql.org/docs/current/protocol-message-formats.html
     - Because the Proxy outputs raw binary under TEXT, the client decodes as 
UTF-8 then parses as bytea text, causing byte loss/truncation with varying tail 
length. This is a Proxy-side bug, not a usage issue.
   
   ##  Problem Analysis
   
     - Simple query: PostgreSQLComQueryExecutor#getQueryRowPacket always uses 
TEXT, so bytea hits the raw write branch.
     - Extended query: default resultFormats is empty, 
Portal#determineValueFormat returns TEXT, so bytea again uses raw write.
     - BINARY path (BinaryCell + PostgreSQLByteaBinaryProtocolValue) is 
correct, but JDBC default result format is TEXT; current behavior violates the 
protocol.
   
   ##  Conclusion (Design-level Fix Suggestions)
   
     - In TEXT format, bytea must be encoded per PostgreSQL spec; do not emit 
raw bytes. Recommended:
         1. In PostgreSQLDataRowPacket#writeTextValue, for byte[] convert to 
spec-compliant hex text (\\x + lowercase hex), write with UTF-8, and set the 
length field to the encoded text byte length; keep existing behavior for BINARY 
format.
         2. Optionally, in Portal#determineValueFormat, prefer BINARY for bytea 
when the client has not specified formats, but ensure protocol compatibility 
and add tests.
         3. Add unit tests covering both TEXT and BINARY paths: build a 10KB 
byte[] containing non-ASCII bytes, verify written length and content are intact 
with no truncation; use PostgreSQLDataRowPacket with a Netty ByteBuf harness.
     - We warmly invite community contributors to submit a PR with the fix and 
tests—your help is greatly appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to