[BUGS] BUG #3638: UTF8 Character encoding does NOT work

2007-09-27 Thread Fil Matthews

The following bug has been logged online:

Bug reference:  3638
Logged by:  Fil Matthews
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8-1, 8-2
Operating system:   Linux  Debian  - Windows XP
Description:UTF8 Character encoding does NOT work
Details: 

Judging from the amount of Google page hits with the exact same problem I am
surprised and mystified by this obvious flaw in Postgres Technology..

Just how is one expected to work with  UTF8 character sets when all and
every attempt at using even Postgres clients produces the SAME problem 
every time ???

 "invalid byte sequence for encoding "UTF8": 0x92"

In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 

Tables test.text ->   (Chararcter varying 10)

In any  Postgres Client ie  psql , dbadmin III

Insert into test values (  chr(146));;


Query returned successfully: 1 rows affected, 32 ms execution time.

copy test to '/tmp/testfile.txt';


Query returned successfully: 1 rows affected, 15 ms execution time.

copy test from '/tmp/testfile.txt';


Come on are you serious?? ..  Just how does one work with completly valid
data that has an ascii 128 +  value ??

Currently this flaw make Postgres an un-useable database technology ..  Or
can some-one please explain this and a possible work around ..  ??

Thank You

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work

2007-09-27 Thread Tatsuo Ishii
Why do you think that an UTF-8 encoded string starting with 0x92 is
valid?

0x92 can appear in the second, third or fourth octet, but should never
appear in the first octet.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> The following bug has been logged online:
> 
> Bug reference:  3638
> Logged by:  Fil Matthews
> Email address:  [EMAIL PROTECTED]
> PostgreSQL version: 8-1, 8-2
> Operating system:   Linux  Debian  - Windows XP
> Description:UTF8 Character encoding does NOT work
> Details: 
> 
> Judging from the amount of Google page hits with the exact same problem I am
> surprised and mystified by this obvious flaw in Postgres Technology..
> 
> Just how is one expected to work with  UTF8 character sets when all and
> every attempt at using even Postgres clients produces the SAME problem 
> every time ???
> 
>  "invalid byte sequence for encoding "UTF8": 0x92"
> 
> In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 
> 
> Tables test.text ->   (Chararcter varying 10)
> 
> In any  Postgres Client ie  psql , dbadmin III
> 
> Insert into test values (  chr(146));;
> 
> 
> Query returned successfully: 1 rows affected, 32 ms execution time.
> 
> copy test to '/tmp/testfile.txt';
> 
> 
> Query returned successfully: 1 rows affected, 15 ms execution time.
> 
> copy test from '/tmp/testfile.txt';
> 
> 
> Come on are you serious?? ..  Just how does one work with completly valid
> data that has an ascii 128 +  value ??
> 
> Currently this flaw make Postgres an un-useable database technology ..  Or
> can some-one please explain this and a possible work around ..  ??
> 
> Thank You
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work

2007-09-27 Thread Heikki Linnakangas
Fil Matthews wrote:
> Come on are you serious?? ..  Just how does one work with completly valid
> data that has an ascii 128 +  value ??

A string with a single byte that has high-bit set (characters 128-255)
is *not* a valid UTF-8 sequence. That's why you get the error. One could
argue that chr(146) should throw an error right away, since 146 doesn't
represent a valid ASCII character either.

In fact, this has been changed in the CVS; in the upcoming 8.3 version,
chr(146) in UTF-8 encoding will return a two byte character
corresponding the unicode code point 146, which seems to be a control
character, marked for "private use only".

> Currently this flaw make Postgres an un-useable database technology ..  Or
> can some-one please explain this and a possible work around ..  ??

That's quite an exaggeration, don't you think? As a work around, don't
put invalid data in your database. Don't use chr-function, it's really
only there for compatibility with other DBMSs that has it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work

2007-09-27 Thread Gregory Stark
"Fil Matthews" <[EMAIL PROTECTED]> writes:

> In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 
...
> Insert into test values (  chr(146));;

Can you explain what you expected to happen here? Did you, for example, expect
the character with Unicode code point 146 to be inserted? Because the single
byte 146 isn't a valid UTF8 character.

In PostgreSQL 8.2 I don't think there's any function to generate an arbitrary
Unicode code point. You'll have to do that on the client end and encode it in
UTF8 before sending. In PostgreSQL 8.3 chr() will in fact be modified to do
this.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[BUGS] BUG #3639: queryplanner degrades to a sequential scan even if there's an index

2007-09-27 Thread Hannu Valtonen

The following bug has been logged online:

Bug reference:  3639
Logged by:  Hannu Valtonen
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.2.4
Operating system:   Debian Linux (4.0), kernel 2.6.20
Description:queryplanner degrades to a sequential scan even if
there's an index
Details: 

Hi,

I'm wondering my query plan degrades to a sequential scan. If I have a :

select * from table1, table2 where table1.table2_id = table2.id and
table2.id = 

it uses the index on table.table2_id and everything's fast but if I change
the query to be:

select * from table1, table2 where table1.table2_id = table2.id and
lower(table2.name) = lower('nicestring')

the table1.table2_id index scan degrades to a sequential scan.. making it
unbelievably slow.
The tables are vacuumed every 6 hours with analyze, there's nothing in the
server logs that looks out of the ordinary. Table 1 has about 3.5 million
rows and table 2 about 3 million.

I tried with hashjoin and seqscans set to false, but the seqscan still
happens.

Any help would be really appreciated.

- Hannu Valtonen

ps.
Here are the relevant queries with explain analyze

explain analyze select * from table1, table2 where table1.table2_id =
table2.id and table2.id = 2841962;
   QUERY PLAN


 Nested Loop  (cost=30.28..2788.35 rows=1124 width=335) (actual
time=0.155..0.164 rows=2 loops=1)
   ->  Index Scan using table2_pkey on table2  (cost=0.00..8.36 rows=1
width=35) (actual time=0.030..0.032 rows=1 loops=1)
 Index Cond: (id = 2841962)
   ->  Bitmap Heap Scan on table1 (cost=30.28..2777.74 rows=1124 width=300)
(actual time=0.033..0.036 rows=2 loops=1)
 Recheck Cond: (2841962 = table2_id)
 ->  Bitmap Index Scan on table1.table2_id_index  (cost=0.00..30.23
rows=723 width=0) (actual time=0.025..0.025 rows=2 loops=1)
   Index Cond: (2841962 = table2_id)
 Total runtime: 0.257 ms
(8 rows)

explain analyze select * from table1, table2 where table1.table2_id =
table2.id and lower(table2.name) = lower('nicestring');



--
 Hash Join  (cost=8.87..176869.89 rows=1 width=335) (actual
time=59725.942..86744.682 rows=2 loops=1)
   Hash Cond: (table1.table2_id = table2.id)
   ->  Seq Scan on table1  (cost=0.00..174217.47 rows=3524735 width=300)
(actual time=0.002..81600.987 rows=3525023 loops=1)
   ->  Hash  (cost=8.87..8.87 rows=1 width=35) (actual time=2.659..2.659
rows=1 loops=1)
 ->  Index Scan using table2_name_lower_index on table2 
(cost=0.00..8.87 rows=1 width=35) (actual time=2.636..2.643 rows=1 loops=1)
   Index Cond: (lower((name)::text) = 'nicestring'::text)
 Total runtime: 86744.726 ms
(7 rows)

Time: 86719,735 ms

TableTable "public.table1"
  Column  |Type |   
Modifiers
--+-+---
---
 id   | integer | not null default
nextval('table1_id_seq'::regclass)
 table2_id| integer |
"table2_id_index" btree (table2_id) WHERE (NOT (table2_id = 1))
"table2_id_exists" FOREIGN KEY (table2_id) REFERENCES table2(id)

And 25 other foreign keys in table1 that point to table 2 which have the
same types of partial indexes and foreign keys.

 Table "public.table2"
  Column   |Type |   Modifiers
---+-+--
--
 id| integer | not null default
nextval('table2_id_seq'::regclass)
 name  | character varying(128)  | not null
 timestamp | timestamp without time zone |
Indexes:
"table2_pkey" primary key, btree (id)
"table2_name_key" unique, btree (name)
"table2_name_lower_index" btree (lower((name)::text))

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #3639: queryplanner degrades to a sequential scan even if there's an index

2007-09-27 Thread Heikki Linnakangas
Hannu Valtonen wrote:
> explain analyze select * from table1, table2 where table1.table2_id =
> table2.id and table2.id = 2841962;

Here the planner is using the partial index table2_id_index on table1,
knowing that table1.table2_id equals 2841962, satisfying the "NOT
(table2_id = 1)" condition that index has.

> explain analyze select * from table1, table2 where table1.table2_id =
> table2.id and lower(table2.name) = lower('nicestring');

But here, table1.table2_id can have any value, including 1, so it can't
use that index.

You can rewrite the query like this:

select * from table1, table2 where table1.table2_id = table2.id and
lower(table2.name) = lower('nicestring') AND NOT (table1.table2_id = 1)
UNION ALL
select * from table1, table2 where table1.table2_id = table2.id and
lower(table2.name) = lower('nicestring') AND (table1.table2_id = 1)

In which case the planner can use the index for the first part, though
not for the second part which might still be slow. I don't know the
schema, but perhaps you're not really interested in rows with table2_id
= 1, so you could just leave out the second part of the union.

Or you can make the index a normal, non-partial index.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #3639: queryplanner degrades to a sequential scan even if there's an index

2007-09-27 Thread Hannu Valtonen

Heikki Linnakangas kirjoitti:

Hannu Valtonen wrote:
  

explain analyze select * from table1, table2 where table1.table2_id =
table2.id and table2.id = 2841962;



Here the planner is using the partial index table2_id_index on table1,
knowing that table1.table2_id equals 2841962, satisfying the "NOT
(table2_id = 1)" condition that index has.

  

explain analyze select * from table1, table2 where table1.table2_id =
table2.id and lower(table2.name) = lower('nicestring');



But here, table1.table2_id can have any value, including 1, so it can't
use that index.

You can rewrite the query like this:

select * from table1, table2 where table1.table2_id = table2.id and
lower(table2.name) = lower('nicestring') AND NOT (table1.table2_id = 1)
UNION ALL
select * from table1, table2 where table1.table2_id = table2.id and
lower(table2.name) = lower('nicestring') AND (table1.table2_id = 1)

In which case the planner can use the index for the first part, though
not for the second part which might still be slow. I don't know the
schema, but perhaps you're not really interested in rows with table2_id
= 1, so you could just leave out the second part of the union.

Or you can make the index a normal, non-partial index.
  
Ah, thank you very much.  I was specifically interested in the != 1 
chunk of table1. And that now works with the


AND NOT (table1.table2_id = 1)

- Hannu

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [BUGS] CREATE USER and createuser not working the same

2007-09-27 Thread Stéphane Schildknecht
Alvaro Herrera a écrit :
> Cédric Villemain wrote:
>   
>> Le jeudi 13 septembre 2007, Tom Lane a écrit :
>> 
>>> =?ISO-8859-1?Q?St=E9phane_Schildknecht?= 
>>>   
>> <[EMAIL PROTECTED]> writes:
>> 
 It seems the shell command createuser and the SQL CREATE USER don't act
 the same way,
 
>>> They aren't really claimed to. 
>>>   
>> But the man say : 
>> " createuser is a wrapper around the SQL command CREATE ROLE 
>> [create_role(7)].  
>> There is no effective difference between creating users via this utility and 
>> via other methods for accessing the server."
>> 
>
> Note that CREATE ROLE behaves slightly differently from CREATE USER.
>
>   
The result is exactly the same, here...

test=# create role testrole superuser;
CREATE ROLE
test=# \du
   List of roles
 Role name | Superuser | Create role | Create DB | Connections | Member of
---+---+-+---+-+---
testrole  | yes   | no  | no| no limit|
test=# alter role testrole nosuperuser;
ALTER ROLE
test=# \du
   List of roles
 Role name | Superuser | Create role | Create DB | Connections | Member of
---+---+-+---+-+---
testrole  | no| no  | no| no limit|


So a superuser which has not been explicitly created with "createuser"
and "create DB" privileges will become a no privilege while losing the
status superuser.

If created with the Unix command createuser, he won't.

Best regards,

SAS

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work

2007-09-27 Thread Tatsuo Ishii
> Tatsuo Ishii wrote:
> > Why do you think that an UTF-8 encoded string starting with 0x92 is
> > valid?
> >
> > 0x92 can appear in the second, third or fourth octet, but should never
> > appear in the first octet.
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >   
> >> The following bug has been logged online:
> >>
> >> Bug reference:  3638
> >> Logged by:  Fil Matthews
> >> Email address:  [EMAIL PROTECTED]
> >> PostgreSQL version: 8-1, 8-2
> >> Operating system:   Linux  Debian  - Windows XP
> >> Description:UTF8 Character encoding does NOT work
> >> Details: 
> >>
> >> Judging from the amount of Google page hits with the exact same problem I 
> >> am
> >> surprised and mystified by this obvious flaw in Postgres Technology..
> >>
> >> Just how is one expected to work with  UTF8 character sets when all and
> >> every attempt at using even Postgres clients produces the SAME problem 
> >> every time ???
> >>
> >>  "invalid byte sequence for encoding "UTF8": 0x92"
> >>
> >> In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 
> >>
> >> Tables test.text ->   (Chararcter varying 10)
> >>
> >> In any  Postgres Client ie  psql , dbadmin III
> >>
> >> Insert into test values (  chr(146));;
> >>
> >>
> >> Query returned successfully: 1 rows affected, 32 ms execution time.
> >>
> >> copy test to '/tmp/testfile.txt';
> >>
> >>
> >> Query returned successfully: 1 rows affected, 15 ms execution time.
> >>
> >> copy test from '/tmp/testfile.txt';
> >>
> >>
> >> Come on are you serious?? ..  Just how does one work with completly valid
> >> data that has an ascii 128 +  value ??
> >>
> >> Currently this flaw make Postgres an un-useable database technology ..  Or
> >> can some-one please explain this and a possible work around ..  ??
> >>
> >> Thank You
> >>
> >> ---(end of broadcast)---
> >> TIP 1: if posting/reading through Usenet, please send an appropriate
> >>subscribe-nomail command to [EMAIL PROTECTED] so that your
> >>message can get through to the mailing list cleanly
> >> 
> >
> >   
> 
> Sorry But I don't agree.. Why can't Postgres store a legitimate 8 bit 
> byte value that is below 255??  and treat it as text ..
> Not being able to do this this makes Postgres unusable.. for storing
> TEXT values..
> 
> I do not know ANY other database technology that doesn't allow some form
> of storing a legitimate 8 bit byte ...
> 
> Even the most simplest  open -source database in the world  (and most
> popular)  can do this..
> 
> The biggest and best  (Thank you Larry) can do this ...
> 
> Postgres can't.
> 
> In other words  You are claiming that UTF8  is  actually UTF7 

No.

> There are 8 bits in a byte.. not 7 ..  If UTF8  can't by definition
> store 8 bits  then what standard can??

UTF-8 does not accept arbitary 8 bit characters. The byte ranges UTF-8
accepts are precisely defined in the standard. If our implementation
is different from it, please let us know.

> The technology is wrong  and it is incorrect...  If one looks at the
> output  of the copy file 
> od -c   then QUITE correctly the 8 bit value  is stored as the value
> given.. 
> 
> What then is the problem in putting this value back in the text field it
> came from ??

PostgreSQL needs to follow the standard. I suggest you double check
the standard.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work

2007-09-27 Thread Tatsuo Ishii
> Then how then does Postgres  expect to be able to store text that goes
> beyond SQL_ASCII..  ???  Because what you are saying in effect is that

If you need to store an arbitrary byte sequences, BYTEA is your
friend.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate