Re: [BUGS] plperl.c patch to correctly support bytea inputs and output to functions and triggers.

2007-04-29 Thread Theo Schlossnagle


On Apr 28, 2007, at 1:26 PM, Tom Lane wrote:


> Theo Schlossnagle <[EMAIL PROTECTED]> writes:
>> I've found a bug with the way plperl/plperlu handles bytea types.  It
>> fails to correctly handle bytea binary inputs and outputs.
>
> Define "correctly".  The proposed patch seems to be "let's handle
> bytea differently from every other data type", and that sure doesn't
> sound like a path I want to tread.


As far as I can tell, bytea is the only datatype now that suffers
from data loss.  In this I could be mistaken.  I took my cues from
the way postgres handles inputting records: it switches on whether
they were received in a binary fashion or not.  Since we're inside
and have a Datum (or are making one) already, everything is just
memory chunks, and some characteristic of the Oid should be used to
determine whether the data should be treated as binary.  As is clear
from the patch, I used "if(Oid == BYTEAOID)" as the characteristic,
and perhaps there is a more robust way.


If I return a bytea from perl that looks like: "hello\0there",
postgres sees a 5 byte string "hello".  That's data loss, and it makes
bytea useless as a datatype because I cannot return things like images
and other binary data.
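
The truncation described above can be reproduced outside of postgres.
The following is a minimal sketch in plain C, not code from plperl.c;
the byte_buf struct and the function names are hypothetical stand-ins
for a length-carrying value and for the two ways of reading it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a value that carries its own length. */
typedef struct {
    size_t len;
    const char *data;
} byte_buf;

/* Length seen by code that treats the buffer as a NUL-terminated C string. */
size_t cstring_view_len(const byte_buf *b)
{
    return strlen(b->data);
}

/* Length seen by code that honors the explicit length field. */
size_t binary_view_len(const byte_buf *b)
{
    return b->len;
}

/* Builds the payload from the example: "hello", a zero byte, then "there". */
byte_buf sample_payload(void)
{
    /* sizeof counts the literal's trailing NUL, so subtract one. */
    static const char raw[] = "hello\0there";
    byte_buf b = { sizeof(raw) - 1, raw };
    return b;
}
```

Calling cstring_view_len() on sample_payload() stops at the embedded
zero byte, while binary_view_len() reports the full payload.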


When passing the string E'hello there\015\012' into a perl function
that receives a bytea, there is no way for me to get at the actual data
passed to me.  Instead I get the Cstring: "hello there\\015\\012"
which is 19 characters long instead of the 13 bytes of "bytea" data.
Worse? E'hello\000there' will be materialized as a 5-byte "bytea" in
perl, actually losing the remainder of the data.  This also makes it
impossible to work with bytea data in the plperl language; not hard,
impossible.
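
The 19-character figure follows from the escaped text form of bytea.
Below is a rough sketch of that escaping in plain C, assuming the
usual escape-format rules (a backslash is doubled, bytes outside
0x20..0x7e become a backslash plus three octal digits).  The function
name bytea_escape is hypothetical and this is not the server's code.

```c
#include <stddef.h>

/*
 * Writes the escaped, NUL-terminated text form of in[0..in_len) to out.
 * Returns the escaped length, or (size_t)-1 if out (cap bytes) is too small.
 */
size_t bytea_escape(const unsigned char *in, size_t in_len,
                    char *out, size_t cap)
{
    size_t o = 0;

    if (cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];

        if (c == '\\') {
            /* A literal backslash is doubled. */
            if (o + 2 >= cap)
                return (size_t)-1;
            out[o++] = '\\';
            out[o++] = '\\';
        } else if (c < 0x20 || c > 0x7e) {
            /* Non-printable bytes become backslash plus three octal digits. */
            if (o + 4 >= cap)
                return (size_t)-1;
            out[o++] = '\\';
            out[o++] = (char)('0' + ((c >> 6) & 3));
            out[o++] = (char)('0' + ((c >> 3) & 7));
            out[o++] = (char)('0' + (c & 7));
        } else {
            /* Printable ASCII passes through unchanged. */
            if (o + 1 >= cap)
                return (size_t)-1;
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    return o;
}
```

Under these rules the 13 input bytes of "hello there" plus CR and LF
grow to 11 + 4 + 4 = 19 characters, and an embedded zero byte survives
as the four characters \000 rather than ending the string.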


In a lot of ways, bytea is different from every other data type: it
is one that isn't suitable for character set conversion and doesn't
trivially cast to other varying size data types (like text, varchar,
etc.).  It also is the only one (of its friends text, varchar, etc.)
that suffers from data loss if used with InputFunctionCall and
OutputFunctionCall and not handled correctly with ReceiveFunctionCall
and SendFunctionCall.


If bytea is instead a class of datatypes that represent arbitrary
binary data, I'd agree that the patch should be changed to switch on
that sort of identifier instead of the BYTEAOID Oid.  If you'd clue
me in to how one would go about identifying whether the datatype Oid
is to be treated as an arbitrary length octet sequence not subject to
character set conversion, then I'd happily revise the patch to be more
correct.


Best regards,

Theo

// Theo Schlossnagle
// [EMAIL PROTECTED]: http://omniti.com
// Esoteric Curio: http://www.lethargy.org/~jesus/




Re: [BUGS] plperl.c patch to correctly support bytea inputs and output to functions and triggers.

2007-04-29 Thread Tom Lane
Theo Schlossnagle <[EMAIL PROTECTED]> writes:
> If I return a bytea from perl that looks like: "hello\0there",
> postgres sees a 5 byte string "hello".

You have failed to pay any attention to the escaping rules for bytea if
you do that.

regards, tom lane
