[BUGS] BUG #1721: mutiple bytes character string comaprison error

2005-06-19 Thread Chii-Tung Liu

The following bug has been logged online:

Bug reference:  1721
Logged by:  Chii-Tung Liu
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.0.3
Operating system:   Windows XP SP2
Description:mutiple bytes character string comaprison error
Details: 

When comparing two UTF-8 encoded strings that contain Chinese characters,
the result is always TRUE.
1. create a database test with encoding set to unicode
CREATE DATABASE test
  WITH OWNER = postgres
   ENCODING = 'UNICODE'
   TABLESPACE = pg_default;
2. insert data with Chinese words
INSERT into node set title='1 中文'

3. SELECT title from node where title > '1.1 '
would return '1 中文'

4. Both SELECT '1 中文' > '1.1' and  SELECT '1.1' > '1 中文' return
FALSE

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[BUGS] B-tree unique index duplicate key error happens only in SUSE 9.3

2005-06-19 Thread Zhenlei Cai
This bug happens on SUSE 9.3 on both Pentium 4 and AMD64, whether the
binaries come from the postgresql-8.0.1 RPMs on the SUSE 9.3 DVD or are
built from the 8.0.3 source code. However, it does NOT happen on a
Debian box (unstable) running 8.0.3 on x86 (Athlon XP), whether
binary or built from source. The problem is that PostgreSQL claims two
records have the same value for a string attribute that has a unique
constraint, while in fact the two string values are different. To see
this bug, just restore the pg_dump'ed file attached to this
email. Sample steps and the error message follow:

--- begin command ---
createdb -E utf8 pg_bug
psql pg_bug < pg_dup_key_bug.dump
NOTICE:  CREATE TABLE / UNIQUE will create implicit index
"gaocanusers_userid_key" for table "gaocanusers"
CREATE TABLE
ERROR:  duplicate key violates unique constraint "gaocanusers_userid_key"
CONTEXT:  COPY gaocanusers, line 2: "129406 ���ズ� 
[EMAIL PROTECTED] f U\N  \N  \N  \N  --  f  
2002-09-12 00:00:00 \N  \\3031\\3..."

--- end ---


pg_dup_key_bug.dump
Description: Binary data



Re: [BUGS] B-tree unique index duplicate key error happens only in SUSE 9.3

2005-06-19 Thread John Hansen
FYI, this works just fine on Gentoo with the UTF8 and ICU patches.

... John

> This bug happens in SUSE 9.3 on both Pentium 4 and AMD64, 
> whether the binaries are from  postgresql-8.0.1 RPMs on the 
> SUSE 9.3 DVD or are built from 8.0.3 source code. However 
> this bug does NOT happen with a Debian box (unstable) running 
> 8.0.3 on an x86 (Athlon XP, whether binary or built from 
> source). The problem is Postgresql claims two records has the 
> same value for one string attribute that has a unique 
> constraint, while in fact the two string values are 
> different. To  see this bug, just do a restore from the 
> pg_dump'ed file attached to this email. Sample steps and 
> error message follow:




Re: [BUGS] B-tree unique index duplicate key error happens only in SUSE 9.3

2005-06-19 Thread Tom Lane
Zhenlei Cai <[EMAIL PROTECTED]> writes:
> This bug happens in SUSE 9.3 on both Pentium 4 and AMD64, whether the
> binaries are from  postgresql-8.0.1 RPMs on the SUSE 9.3 DVD or are
> built from 8.0.3 source code. However this bug does NOT happen with a
> Debian box (unstable) running 8.0.3 on an x86 (Athlon XP, whether
> binary or built from source). The problem is Postgresql claims two

What makes you think this is a Postgres bug, rather than a bug in the
locale definition you are using on the SUSE box?  Try feeding the two
strings in question to strcoll() and see what happens.

One way that you can get inconsistent results from strcoll() is if you
feed it strings that are invalid according to the character set encoding
that strcoll() thinks you are using, which is to say the encoding
implied by the current LC_CTYPE locale setting.  So it's possible that
the real problem is that you have Postgres' database encoding set to
something that's incompatible with the postmaster's LC_CTYPE locale.
(Try "show lc_ctype" to see what that is exactly.)
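
A minimal sketch of the strcoll() check suggested above, written in C. The
helper name check_collation is made up for illustration; the locale name
passed in is assumed to be whatever "show lc_ctype" reports on the affected
server, and the two strings are assumed to be supplied in exactly the bytes
the database stores.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Reduce a strcoll() result to -1, 0 or 1 so two results can be compared. */
static int sign(int v) { return (v > 0) - (v < 0); }

/*
 * Hypothetical helper: compare a and b in both directions under the
 * named locale.  Returns 1 if the two results agree with each other
 * (a < b implies b > a), 0 if they contradict each other, and -1 if
 * the locale could not be selected.
 */
int check_collation(const char *locale_name, const char *a, const char *b)
{
    /* LC_ALL covers both LC_COLLATE and LC_CTYPE. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    int ab = sign(strcoll(a, b));
    int ba = sign(strcoll(b, a));

    printf("strcoll(a, b) = %d, strcoll(b, a) = %d\n", ab, ba);
    return ab == -ba;
}
```

Called from a small main() with the server's locale name and the two
conflicting userid values, a return value of 0 would mean strcoll() itself
is contradicting its own answers. A printed 0 for two strings whose bytes
differ would mean the locale treats them as equal, which would be consistent
with the duplicate key error reported in this thread.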

regards, tom lane



Re: [BUGS] BUG #1721: mutiple bytes character string comaprison error

2005-06-19 Thread Tom Lane
"Chii-Tung Liu" <[EMAIL PROTECTED]> writes:
> PostgreSQL version: 8.0.3
> Operating system:   Windows XP SP2

> When compare two UTF-8 encoded string that contains Chinese words, the
> result is always TRUE

Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
Use some other database encoding.

regards, tom lane



Re: [BUGS] BUG #1721: mutiple bytes character string comaprison

2005-06-19 Thread Kris Jurka


On Sun, 19 Jun 2005, Tom Lane wrote:

> "Chii-Tung Liu" <[EMAIL PROTECTED]> writes:
> > PostgreSQL version: 8.0.3
> > Operating system:   Windows XP SP2
> 
> > When compare two UTF-8 encoded string that contains Chinese words, the
> > result is always TRUE
> 
> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
> Use some other database encoding.
> 

Shouldn't we forbid its creation then?  At least a strongly worded 
warning?  We see these complaints too often.

Kris Jurka



Re: [BUGS] BUG #1721: mutiple bytes character string comaprison error

2005-06-19 Thread Tom Lane
Kris Jurka <[EMAIL PROTECTED]> writes:
> On Sun, 19 Jun 2005, Tom Lane wrote:
>> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet).
>> Use some other database encoding.

> Shouldn't we forbid its creation then?

There was serious discussion of that before the 8.0 release, but
we decided not to forbid it.  Check the archives; I don't recall
the reasoning at the moment.

> We see these complaints too often.

There are lots of complaints we see way too often ;-) ... but
distressingly, there are still only 24 hours in a day.

regards, tom lane



Re: [BUGS] BUG #1721: mutiple bytes character string comaprison

2005-06-19 Thread Tatsuo Ishii
> The following bug has been logged online:
> 
> Bug reference:  1721
> Logged by:  Chii-Tung Liu
> Email address:  [EMAIL PROTECTED]
> PostgreSQL version: 8.0.3
> Operating system:   Windows XP SP2
> Description:mutiple bytes character string comaprison error
> Details: 
> 
> When compare two UTF-8 encoded string that contains Chinese words, the
> result is always TRUE
> 1. create a database test with encoding set to unicode
> CREATE DATABASE test
>   WITH OWNER = postgres
>    ENCODING = 'UNICODE'
>    TABLESPACE = pg_default;
> 2. insert data with Chinese words
> INSERT into node set title='1 中文'
> 
> 3. SELECT title from node where title > '1.1 '
> would return '1 中文'
> 
> 4. Both SELECT '1 中文' > '1.1' and  SELECT '1.1' > '1 中文' return
> FALSE

I think you need to use C locale.
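
As a rough illustration of why the C locale gives a stable answer here: with
collation set to "C", strcoll() orders strings by their byte values, so the
outcome does not depend on any locale definition for Chinese text. The sketch
below is in C; the function name compare_in_c_locale is hypothetical, and the
string literals in the note that follows spell out the UTF-8 bytes of the
title from the report.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: switch collation to the "C" locale and compare.
 * The result is negative, zero or positive, like strcmp().
 */
int compare_in_c_locale(const char *a, const char *b)
{
    setlocale(LC_COLLATE, "C");
    return strcoll(a, b);
}
```

For example, compare_in_c_locale("1 \xe4\xb8\xad\xe6\x96\x87", "1.1") is
negative, because the second byte of the first string is a space (0x20),
which sorts before the period (0x2E) in the second. Swapping the arguments
gives a positive result, so exactly one of the two comparisons from step 4
of the report is true under byte ordering.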
--
Tatsuo Ishii
