On Apr 28, 2007, at 1:26 PM, Tom Lane wrote:

Theo Schlossnagle <[EMAIL PROTECTED]> writes:
I've found a bug with the way plperl/plperlu handles bytea types.  It
fails to correctly handle bytea binary inputs and outputs.

Define "correctly".  The proposed patch seems to be "let's handle
bytea differently from every other data type", and that sure doesn't
sound like a path I want to tread.

As far as I can tell, bytea is the only datatype now that suffers from data loss, though I could be mistaken. I took my cues from the way postgres handles inputting records: it switches on whether they were received in a binary fashion or not. Since we're inside and have a Datum (or are making one) already, everything is just memory chunks, and some characteristic of the Oid should be used to determine whether the data should be treated as binary. As is clear from the patch, I used "if(Oid == BYTEAOID)" as the characteristic; perhaps there is a more robust way.

If I return a bytea from perl that looks like "hello\0there", postgres sees a 5-byte string "hello". That's data loss, and it makes bytea useless as a datatype, as I cannot return things like images and other binary data.
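To make the truncation concrete, here is a minimal sketch in Python (not the plperl code itself) that models what any NUL-terminated C string consumer does with such a buffer. The helper name as_c_string is hypothetical and exists only for this illustration.

```python
def as_c_string(payload: bytes) -> bytes:
    """Model a NUL-terminated C string reader (hypothetical helper):
    everything from the first zero byte onward is invisible to it."""
    end = payload.find(b"\0")
    return payload if end == -1 else payload[:end]


returned = b"hello\0there"       # the bytes the function hands back
seen = as_c_string(returned)     # what a strlen()-style consumer keeps

print(len(returned), len(seen))  # 11 5
```

A length-carrying representation has no such problem, because the length is stored alongside the data instead of being inferred from a terminator.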

When passing the string E'hello there\015\012' into a bytea-receiving perl function, there is no way for me to get at the actual data passed to me. Instead I get the C string "hello there\\015\\012", which is 19 characters long instead of the 13 bytes of "bytea" data. Worse, E'hello\000there' will be materialized as a 5-byte "bytea" in perl, actually losing the remainder of the data. This makes it impossible to work with bytea data in the plperl language; not hard, impossible.
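The 19-versus-13 discrepancy can be reproduced with a small Python sketch of the escaped text form of bytea as I understand it: a backslash is doubled, and bytes outside printable ASCII become a backslash followed by three octal digits. The function names bytea_escape_out and bytea_escape_in are hypothetical, and this is an approximation for illustration, not the backend's implementation.

```python
def bytea_escape_out(data: bytes) -> str:
    """Render raw bytes in an escaped text form (illustrative sketch)."""
    out = []
    for b in data:
        if b == 0x5C:                    # backslash is doubled
            out.append("\\\\")
        elif b < 0x20 or b > 0x7E:       # non-printable: \ooo in octal
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


def bytea_escape_in(text: str) -> bytes:
    """Invert bytea_escape_out; assumes well-formed ASCII input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1:i + 2] == "\\":  # doubled backslash
            out.append(0x5C)
            i += 2
        else:                            # backslash + three octal digits
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


raw = b"hello there\r\n"             # the 13 bytes actually stored
escaped = bytea_escape_out(raw)      # the text a perl function receives

print(len(raw), len(escaped))        # 13 19
print(bytea_escape_in(escaped) == raw)  # True
```

A function that only ever sees the escaped text has to run the equivalent of bytea_escape_in itself before it can touch the real bytes, which is the step the language handler would be better placed to do.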

In a lot of ways, bytea is different from every other data type: it isn't suitable for character set conversion and doesn't trivially cast to other varying-size data types (like text, varchar, etc.). It is also the only one (of its friends text, varchar, etc.) that suffers from data loss if handled with InputFunctionCall and OutputFunctionCall rather than with ReceiveFunctionCall and SendFunctionCall.

If bytea is instead one of a class of datatypes that represent arbitrary binary data, I'd agree that the patch should be changed to switch on that sort of identifier instead of the BYTEAOID Oid. If you'd clue me in on how one would go about identifying whether a datatype Oid is to be treated as an arbitrary-length octet sequence not subject to character set conversion, then I'd happily revise the patch to be more correct.

Best regards,

Theo

// Theo Schlossnagle
// [EMAIL PROTECTED]: http://omniti.com
// Esoteric Curio: http://www.lethargy.org/~jesus/

