terrymanu commented on issue #37238:
URL:
https://github.com/apache/shardingsphere/issues/37238#issuecomment-3677302829
## Problem Understanding
- Using PostgreSQL + ShardingSphere Proxy 5.5.2, querying a bytea column
shows that OCTET_LENGTH is 10240, but ResultSet#getBytes returns fewer bytes
and the missing length varies per row; direct DB access is correct, so the
issue arises on the Proxy return path.
## Root Cause
- PostgreSQL’s default result format is TEXT, but in the TEXT path the
Proxy writes raw binary for bytea (PostgreSQLDataRowPacket#writeTextValue
byte[] branch) while RowDescription still declares TEXT format
(Portal#determineValueFormat defaults to TEXT).
- Per PostgreSQL protocol, TEXT bytea must be text-encoded (hex or
escape), not raw bytes. Official docs:
- bytea textual representation:
https://www.postgresql.org/docs/current/datatype-binary.html
- Protocol message/format codes:
https://www.postgresql.org/docs/current/protocol-message-formats.html
- Because the Proxy outputs raw binary under TEXT, the client decodes as
UTF-8 then parses as bytea text, causing byte loss/truncation with varying tail
length. This is a Proxy-side bug, not a usage issue.
## Problem Analysis
- Simple query: PostgreSQLComQueryExecutor#getQueryRowPacket always uses
TEXT, so bytea hits the raw write branch.
- Extended query: default resultFormats is empty,
Portal#determineValueFormat returns TEXT, so bytea again uses raw write.
- BINARY path (BinaryCell + PostgreSQLByteaBinaryProtocolValue) is
correct, but JDBC default result format is TEXT; current behavior violates the
protocol.
## Conclusion (Design-level Fix Suggestions)
- In TEXT format, bytea must be encoded per PostgreSQL spec; do not emit
raw bytes. Recommended:
1. In PostgreSQLDataRowPacket#writeTextValue, for byte[] convert to
spec-compliant hex text (\\x + lowercase hex), write with UTF-8, and set the
length field to the encoded text byte length; keep existing behavior for BINARY
format.
2. Optionally, in Portal#determineValueFormat, prefer BINARY for bytea
when the client has not specified formats, but ensure protocol compatibility
and add tests.
3. Add unit tests covering both TEXT and BINARY paths: build a 10KB
byte[] containing non-ASCII bytes, verify written length and content are intact
with no truncation; use PostgreSQLDataRowPacket with a Netty ByteBuf harness.
- We warmly invite community contributors to submit a PR with the fix and
tests—your help is greatly appreciated!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]