[BUGS] BUG #3638: UTF8 Character encoding does NOT work
The following bug has been logged online: Bug reference: 3638 Logged by: Fil Matthews Email address: [EMAIL PROTECTED] PostgreSQL version: 8-1, 8-2 Operating system: Linux Debian - Windows XP Description:UTF8 Character encoding does NOT work Details: Judging from the amount of Google page hits with the exact same problem I am surprised and mystified by this obvious flaw in Postgres Technology.. Just how is one expected to work with UTF8 character sets when all and every attempt at using even Postgres clients produces the SAME problem every time ??? "invalid byte sequence for encoding "UTF8": 0x92" In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 Tables test.text -> (Chararcter varying 10) In any Postgres Client ie psql , dbadmin III Insert into test values ( chr(146));; Query returned successfully: 1 rows affected, 32 ms execution time. copy test to '/tmp/testfile.txt'; Query returned successfully: 1 rows affected, 15 ms execution time. copy test from '/tmp/testfile.txt'; Come on are you serious?? .. Just how does one work with completly valid data that has an ascii 128 + value ?? Currently this flaw make Postgres an un-useable database technology .. Or can some-one please explain this and a possible work around .. ?? Thank You ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work
Why do you think that an UTF-8 encoded string starting with 0x92 is valid? 0x92 can appear in the second, third or fourth octet, but should never appear in the first octet. -- Tatsuo Ishii SRA OSS, Inc. Japan > The following bug has been logged online: > > Bug reference: 3638 > Logged by: Fil Matthews > Email address: [EMAIL PROTECTED] > PostgreSQL version: 8-1, 8-2 > Operating system: Linux Debian - Windows XP > Description:UTF8 Character encoding does NOT work > Details: > > Judging from the amount of Google page hits with the exact same problem I am > surprised and mystified by this obvious flaw in Postgres Technology.. > > Just how is one expected to work with UTF8 character sets when all and > every attempt at using even Postgres clients produces the SAME problem > every time ??? > > "invalid byte sequence for encoding "UTF8": 0x92" > > In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 > > Tables test.text -> (Chararcter varying 10) > > In any Postgres Client ie psql , dbadmin III > > Insert into test values ( chr(146));; > > > Query returned successfully: 1 rows affected, 32 ms execution time. > > copy test to '/tmp/testfile.txt'; > > > Query returned successfully: 1 rows affected, 15 ms execution time. > > copy test from '/tmp/testfile.txt'; > > > Come on are you serious?? .. Just how does one work with completly valid > data that has an ascii 128 + value ?? > > Currently this flaw make Postgres an un-useable database technology .. Or > can some-one please explain this and a possible work around .. ?? > > Thank You > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work
Fil Matthews wrote: > Come on are you serious?? .. Just how does one work with completly valid > data that has an ascii 128 + value ?? A string with a single byte that has high-bit set (characters 128-255) is *not* a valid UTF-8 sequence. That's why you get the error. One could argue that chr(146) should throw an error right away, since 146 doesn't represent a valid ASCII character either. In fact, this has been changed in the CVS; in the upcoming 8.3 version, chr(146) in UTF-8 encoding will return a two byte character corresponding the unicode code point 146, which seems to be a control character, marked for "private use only". > Currently this flaw make Postgres an un-useable database technology .. Or > can some-one please explain this and a possible work around .. ?? That's quite an exaggeration, don't you think? As a work around, don't put invalid data in your database. Don't use chr-function, it's really only there for compatibility with other DBMSs that has it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work
"Fil Matthews" <[EMAIL PROTECTED]> writes: > In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 ... > Insert into test values ( chr(146));; Can you explain what you expected to happen here? Did you, for example, expect the character with Unicode code point 146 to be inserted? Because the single byte 146 isn't a valid UTF8 character. In PostgreSQL 8.2 I don't think there's any function to generate an arbitrary Unicode code point. You'll have to do that on the client end and encode it in UTF8 before sending. In PostgreSQL 8.3 chr() will in fact be modified to do this. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[BUGS] BUG #3639: queryplanner degrades to a sequential scan even if there's an index
The following bug has been logged online: Bug reference: 3639 Logged by: Hannu Valtonen Email address: [EMAIL PROTECTED] PostgreSQL version: 8.2.4 Operating system: Debian Linux (4.0), kernel 2.6.20 Description:queryplanner degrades to a sequential scan even if there's an index Details: Hi, I'm wondering my query plan degrades to a sequential scan. If I have a : select * from table1, table2 where table1.table2_id = table2.id and table2.id = it uses the index on table.table2_id and everything's fast but if I change the query to be: select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring') the table1.table2_id index scan degrades to a sequential scan.. making it unbelievably slow. The tables are vacuumed every 6 hours with analyze, there's nothing in the server logs that looks out of the ordinary. Table 1 has about 3.5 million rows and table 2 about 3 million. I tried with hashjoin and seqscans set to false, but the seqscan still happens. Any help would be really appreciated. - Hannu Valtonen ps. Here are the relevant queries with explain analyze explain analyze select * from table1, table2 where table1.table2_id = table2.id and table2.id = 2841962; QUERY PLAN Nested Loop (cost=30.28..2788.35 rows=1124 width=335) (actual time=0.155..0.164 rows=2 loops=1) -> Index Scan using table2_pkey on table2 (cost=0.00..8.36 rows=1 width=35) (actual time=0.030..0.032 rows=1 loops=1) Index Cond: (id = 2841962) -> Bitmap Heap Scan on table1 (cost=30.28..2777.74 rows=1124 width=300) (actual time=0.033..0.036 rows=2 loops=1) Recheck Cond: (2841962 = table2_id) -> Bitmap Index Scan on table1.table2_id_index (cost=0.00..30.23 rows=723 width=0) (actual time=0.025..0.025 rows=2 loops=1) Index Cond: (2841962 = table2_id) Total runtime: 0.257 ms (8 rows) explain analyze select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring'); -- Hash Join (cost=8.87..176869.89 rows=1 width=335) (actual time=59725.942..86744.682 rows=2 loops=1) Hash Cond: (table1.table2_id = table2.id) -> Seq Scan on table1 (cost=0.00..174217.47 rows=3524735 width=300) (actual time=0.002..81600.987 rows=3525023 loops=1) -> Hash (cost=8.87..8.87 rows=1 width=35) (actual time=2.659..2.659 rows=1 loops=1) -> Index Scan using table2_name_lower_index on table2 (cost=0.00..8.87 rows=1 width=35) (actual time=2.636..2.643 rows=1 loops=1) Index Cond: (lower((name)::text) = 'nicestring'::text) Total runtime: 86744.726 ms (7 rows) Time: 86719,735 ms TableTable "public.table1" Column |Type | Modifiers --+-+--- --- id | integer | not null default nextval('table1_id_seq'::regclass) table2_id| integer | "table2_id_index" btree (table2_id) WHERE (NOT (table2_id = 1)) "table2_id_exists" FOREIGN KEY (table2_id) REFERENCES table2(id) And 25 other foreign keys in table1 that point to table 2 which have the same types of partial indexes and foreign keys. Table "public.table2" Column |Type | Modifiers ---+-+-- -- id| integer | not null default nextval('table2_id_seq'::regclass) name | character varying(128) | not null timestamp | timestamp without time zone | Indexes: "table2_pkey" primary key, btree (id) "table2_name_key" unique, btree (name) "table2_name_lower_index" btree (lower((name)::text)) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #3639: queryplanner degrades to a sequential scan even if there's an index
Hannu Valtonen wrote: > explain analyze select * from table1, table2 where table1.table2_id = > table2.id and table2.id = 2841962; Here the planner is using the partial index table2_id_index on table1, knowing that table1.table2_id equals 2841962, satisfying the "NOT (table2_id = 1)" condition that index has. > explain analyze select * from table1, table2 where table1.table2_id = > table2.id and lower(table2.name) = lower('nicestring'); But here, table1.table2_id can have any value, including 1, so it can't use that index. You can rewrite the query like this: select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring') AND NOT (table1.table2_id = 1) UNION ALL select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring') AND (table1.table2_id = 1) In which case the planner can use the index for the first part, though not for the second part which might still be slow. I don't know the schema, but perhaps you're not really interested in rows with table2_id = 1, so you could just leave out the second part of the union. Or you can make the index a normal, non-partial index. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #3639: queryplanner degrades to a sequential scan even if there's an index
Heikki Linnakangas kirjoitti: Hannu Valtonen wrote: explain analyze select * from table1, table2 where table1.table2_id = table2.id and table2.id = 2841962; Here the planner is using the partial index table2_id_index on table1, knowing that table1.table2_id equals 2841962, satisfying the "NOT (table2_id = 1)" condition that index has. explain analyze select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring'); But here, table1.table2_id can have any value, including 1, so it can't use that index. You can rewrite the query like this: select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring') AND NOT (table1.table2_id = 1) UNION ALL select * from table1, table2 where table1.table2_id = table2.id and lower(table2.name) = lower('nicestring') AND (table1.table2_id = 1) In which case the planner can use the index for the first part, though not for the second part which might still be slow. I don't know the schema, but perhaps you're not really interested in rows with table2_id = 1, so you could just leave out the second part of the union. Or you can make the index a normal, non-partial index. Ah, thank you very much. I was specifically interested in the != 1 chunk of table1. And that now works with the AND NOT (table1.table2_id = 1) - Hannu ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [BUGS] CREATE USER and createuser not working the same
Alvaro Herrera a écrit : > Cédric Villemain wrote: > >> Le jeudi 13 septembre 2007, Tom Lane a écrit : >> >>> =?ISO-8859-1?Q?St=E9phane_Schildknecht?= >>> >> <[EMAIL PROTECTED]> writes: >> It seems the shell command createuser and the SQL CREATE USER don't act the same way, >>> They aren't really claimed to. >>> >> But the man say : >> " createuser is a wrapper around the SQL command CREATE ROLE >> [create_role(7)]. >> There is no effective difference between creating users via this utility and >> via other methods for accessing the server." >> > > Note that CREATE ROLE behaves slightly differently from CREATE USER. > > The result is exactly the same, here... test=# create role testrole superuser; CREATE ROLE test=# \du List of roles Role name | Superuser | Create role | Create DB | Connections | Member of ---+---+-+---+-+--- testrole | yes | no | no| no limit| test=# alter role testrole nosuperuser; ALTER ROLE test=# \du List of roles Role name | Superuser | Create role | Create DB | Connections | Member of ---+---+-+---+-+--- testrole | no| no | no| no limit| So a superuser which has not been explicitly created with "createuser" and "create DB" privileges will become a no privilege while losing the status superuser. If created with the Unix command createuser, he won't. Best regards, SAS ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work
> Tatsuo Ishii wrote: > > Why do you think that an UTF-8 encoded string starting with 0x92 is > > valid? > > > > 0x92 can appear in the second, third or fourth octet, but should never > > appear in the first octet. > > -- > > Tatsuo Ishii > > SRA OSS, Inc. Japan > > > > > >> The following bug has been logged online: > >> > >> Bug reference: 3638 > >> Logged by: Fil Matthews > >> Email address: [EMAIL PROTECTED] > >> PostgreSQL version: 8-1, 8-2 > >> Operating system: Linux Debian - Windows XP > >> Description:UTF8 Character encoding does NOT work > >> Details: > >> > >> Judging from the amount of Google page hits with the exact same problem I > >> am > >> surprised and mystified by this obvious flaw in Postgres Technology.. > >> > >> Just how is one expected to work with UTF8 character sets when all and > >> every attempt at using even Postgres clients produces the SAME problem > >> every time ??? > >> > >> "invalid byte sequence for encoding "UTF8": 0x92" > >> > >> In Short A Postgres UTF8 database .. PGCLIENENCODING=UTF8 > >> > >> Tables test.text -> (Chararcter varying 10) > >> > >> In any Postgres Client ie psql , dbadmin III > >> > >> Insert into test values ( chr(146));; > >> > >> > >> Query returned successfully: 1 rows affected, 32 ms execution time. > >> > >> copy test to '/tmp/testfile.txt'; > >> > >> > >> Query returned successfully: 1 rows affected, 15 ms execution time. > >> > >> copy test from '/tmp/testfile.txt'; > >> > >> > >> Come on are you serious?? .. Just how does one work with completly valid > >> data that has an ascii 128 + value ?? > >> > >> Currently this flaw make Postgres an un-useable database technology .. Or > >> can some-one please explain this and a possible work around .. ?? > >> > >> Thank You > >> > >> ---(end of broadcast)--- > >> TIP 1: if posting/reading through Usenet, please send an appropriate > >>subscribe-nomail command to [EMAIL PROTECTED] so that your > >>message can get through to the mailing list cleanly > >> > > > > > > Sorry But I don't agree.. Why can't Postgres store a legitimate 8 bit > byte value that is below 255?? and treat it as text .. > Not being able to do this this makes Postgres unusable.. for storing > TEXT values.. > > I do not know ANY other database technology that doesn't allow some form > of storing a legitimate 8 bit byte ... > > Even the most simplest open -source database in the world (and most > popular) can do this.. > > The biggest and best (Thank you Larry) can do this ... > > Postgres can't. > > In other words You are claiming that UTF8 is actually UTF7 No. > There are 8 bits in a byte.. not 7 .. If UTF8 can't by definition > store 8 bits then what standard can?? UTF-8 does not accept arbitary 8 bit characters. The byte ranges UTF-8 accepts are precisely defined in the standard. If our implementation is different from it, please let us know. > The technology is wrong and it is incorrect... If one looks at the > output of the copy file > od -c then QUITE correctly the 8 bit value is stored as the value > given.. > > What then is the problem in putting this value back in the text field it > came from ?? PostgreSQL needs to follow the standard. I suggest you double check the standard. -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #3638: UTF8 Character encoding does NOT work
> Then how then does Postgres expect to be able to store text that goes > beyond SQL_ASCII.. ??? Because what you are saying in effect is that If you need to store an arbitrary byte sequences, BYTEA is your friend. -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate