Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-20 Thread Eric Faulhaber
Martijn van Oosterhout wrote: > On Thu, Jul 20, 2006 at 12:07:54PM -0400, Eric Faulhaber wrote: >>> Well, there's a really nasty workaround: create a cast from bytea to >>> text which doesn't change the value. This will get your data into the >>> database without any encoding checks at all. Ofcours

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-20 Thread Martijn van Oosterhout
On Thu, Jul 20, 2006 at 12:07:54PM -0400, Eric Faulhaber wrote: > > Well, there's a really nasty workaround: create a cast from bytea to > > text which doesn't change the value. This will get your data into the > > database without any encoding checks at all. Of course, you're then > > responsible f

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-20 Thread Eric Faulhaber
Martijn van Oosterhout wrote: > On Wed, Jul 19, 2006 at 06:06:08PM -0400, Eric Faulhaber wrote: >> Martijn van Oosterhout wrote: >>> On Wed, Jul 19, 2006 at 05:24:53PM -0400, Eric Faulhaber wrote: >>>> OK, but now that this "feature" has been removed in 8.1.4, how is this >>>> supposed to be handle

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-20 Thread Martijn van Oosterhout
On Wed, Jul 19, 2006 at 06:06:08PM -0400, Eric Faulhaber wrote: > Martijn van Oosterhout wrote: > > On Wed, Jul 19, 2006 at 05:24:53PM -0400, Eric Faulhaber wrote: > >> OK, but now that this "feature" has been removed in 8.1.4, how is this > >> supposed to be handled, given that we don't control wh

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Eric Faulhaber
Martijn van Oosterhout wrote: > On Wed, Jul 19, 2006 at 05:24:53PM -0400, Eric Faulhaber wrote: >> OK, but now that this "feature" has been removed in 8.1.4, how is this >> supposed to be handled, given that we don't control what string data >> we're handed? How does psql deal with it? > > Well,

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Martijn van Oosterhout
On Wed, Jul 19, 2006 at 05:24:53PM -0400, Eric Faulhaber wrote: > OK, but now that this "feature" has been removed in 8.1.4, how is this > supposed to be handled, given that we don't control what string data > we're handed? How does psql deal with it? Well, bytea handles null like it always has.

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Martijn van Oosterhout
On Wed, Jul 19, 2006 at 05:13:04PM -0400, Tom Lane wrote: > Given the lack of "memcoll", that proposal isn't going to fly ... > at least not until we replace all the locale support code with something > else (that hopefully will be null-clean). Yeah, ICU would give us that, but it won't magically

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Eric Faulhaber
Tom Lane wrote: > Martijn van Oosterhout writes: >> The fact is that if you're using binary format parameters and output >> you can put embedded nulls into strings and get them back out. > > Not any more ;-) > OK, but now that this "feature" has been removed in 8.1.4, how is this supposed to be

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Tom Lane
Martijn van Oosterhout writes: > The fact is that if you're using binary format parameters and output > you can put embedded nulls into strings and get them back out. Not any more ;-) > By changing a > few strcmps to memcmps you can get sane behaviour for sorting and several > other operations. G

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Martijn van Oosterhout
On Wed, Jul 19, 2006 at 10:03:34AM -0400, Tom Lane wrote: > Martijn van Oosterhout writes: > > Looking at the code it doesn't appear that there are too many places > > that are problematic. > > Really? > > The killer problem is that all datatype I/O goes through C strings. > Fixing this therefor

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Tom Lane
Martijn van Oosterhout writes: > Looking at the code it doesn't appear that there are too many places > that are problematic. Really? The killer problem is that all datatype I/O goes through C strings. Fixing this therefore would require breaking every user-defined datatype on the planet.
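
To make the C-string limitation concrete, here is a minimal C sketch. It is not PostgreSQL code: the buffers `value_a` and `value_b` and the helper names are hypothetical, chosen only for illustration. It shows that `strlen` and `strcmp` stop at the first zero byte, so two values that differ only after an embedded null look identical, while a length-aware `memcmp` can tell them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two 7-byte values that differ only AFTER an embedded zero byte.
 * (Hypothetical sample data, not taken from the thread.) */
static const char value_a[] = { 'a', 'b', 'c', '\0', 'x', 'y', 'z' };
static const char value_b[] = { 'a', 'b', 'c', '\0', 'q', 'r', 's' };

/* C-string view: comparison stops at the first zero byte. */
int equal_as_cstrings(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Length-aware view: every byte is compared, including those
 * that follow an embedded zero byte. */
int equal_as_bytes(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* Checks documenting the behaviour described above. */
void demonstrate(void)
{
    /* strlen sees only "abc"; the remaining bytes are invisible. */
    assert(strlen(value_a) == 3);
    assert(sizeof value_a == 7);

    /* As C strings the two values are indistinguishable... */
    assert(equal_as_cstrings(value_a, value_b));

    /* ...but a byte-wise comparison with explicit lengths differs. */
    assert(!equal_as_bytes(value_a, sizeof value_a,
                           value_b, sizeof value_b));
}
```

Calling `demonstrate()` from a `main` function runs the checks. Any interface that hands a value around as a bare `char *` without a separate length has the same blind spot, which is consistent with the point that a fix would have to touch every datatype I/O routine and not just a few comparisons.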

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-19 Thread Martijn van Oosterhout
On Tue, Jul 18, 2006 at 08:03:51PM -0400, Eric Faulhaber wrote: > > It's not a defect ... or at least, it doesn't make sense to change it > > unless you are willing to go through the entire system to make it able > > to store null bytes in text. We've looked at that in the past and > > always conc

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-18 Thread Eric Faulhaber
Tom Lane wrote: > Eric Faulhaber <[EMAIL PROTECTED]> writes: >> OK, but this particular issue is something quite new to the latest >> version. > > Again, PG has never stored such data correctly. > Perhaps not, but it silently tolerated such data until this release, at least at the encoding conve

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-18 Thread Tom Lane
Eric Faulhaber <[EMAIL PROTECTED]> writes: > OK, but this particular issue is something quite new to the latest > version. Again, PG has never stored such data correctly. > Am I stuck at 8.1.3 for the time being? I'd be happy to create a patch > to resolve this for a future version, but if it is

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-18 Thread Eric Faulhaber
Tom Lane wrote: > Eric Faulhaber <[EMAIL PROTECTED]> writes: >> Can anyone help me understand why converting the NULL code point (U+0000) >> from UTF8 to ISO8859_1 is no longer legal in v8.1.4? > > Embedded nulls in text strings have never behaved sanely in PG ... or > hadn't you noticed? You'd hav

Re: [GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-18 Thread Tom Lane
Eric Faulhaber <[EMAIL PROTECTED]> writes: > Can anyone help me understand why converting the NULL code point (U+0000) > from UTF8 to ISO8859_1 is no longer legal in v8.1.4? Embedded nulls in text strings have never behaved sanely in PG ... or hadn't you noticed? You'd have been better off passing

[GENERAL] UTF8 conversion differences from v8.1.3 to v8.1.4

2006-07-17 Thread Eric Faulhaber
Hi, Can anyone help me understand why converting the NULL code point (U+0000) from UTF8 to ISO8859_1 is no longer legal in v8.1.4? The conversion proc (backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c) changed considerably between 8.1.3 and 8.1.4. The utf8_to_iso8859_1 con