Re: [BUGS] Out of memory error during large hashagg

2006-09-23 Thread Casey Duncan
I posted that in a subsequent mail, but here it is again:
I'm interested in collecting info on the distribution of data. Can you post:
select tablename, attname, n_distinct from pg_stats where attname = 'st_id';
tablename | attname | n_distinct
--+-+
st

Re: [BUGS] Out of memory error during large hashagg

2006-09-23 Thread Tom Lane
Casey Duncan <[EMAIL PROTECTED]> writes:
> select st_id, min(seed_id) as "initial_seed_id", count(*) as
> "seed_count" from seed group by st_id;
> The query plan and table stats are:
> QUERY PLAN
> --

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:
>> Agreed, but such corruption indicates that there is non-multibyte-safe
>> (octet-wise) case conversion somewhere; at best (with a fully working
>> locale) it will cause case conversion to do nothing instead of actual
>> conversion.
>
> Yours is the first insta

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Tom Lane
Victor Snezhko <[EMAIL PROTECTED]> writes:
> Agreed, but such corruption indicates that there is non-multibyte-safe
> (octet-wise) case conversion somewhere; at best (with a fully working
> locale) it will cause case conversion to do nothing instead of actual
> conversion.
Yours is the first install

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:
>> correct utf-8 byte sequence is 0xd18231, so it looks like we call
>> tolower() somewhere on parts of multibyte characters, and it does the
>> same as isspace() - it interprets its argument as a wide character, and
>> converts it.
>
> Indeed, and I am certainl

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Tom Lane
Victor Snezhko <[EMAIL PROTECTED]> writes:
> correct utf-8 byte sequence is 0xd18231, so it looks like we call
> tolower() somewhere on parts of multibyte characters, and it does the
> same as isspace() - it interprets its argument as a wide character, and
> converts it.
Indeed, and I am certainly

Re: [BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Victor Snezhko <[EMAIL PROTECTED]> writes:
> So, we either don't support utf-8 on BSDs
Hmm, tolower'ing octets of a multibyte string is a bug not only on BSDs but on other architectures as well. But on BSDs it additionally causes corruption of utf-8 data.
-- WBR, Victor V. Snezhko E-mail: [EMAI
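The problem described in this message, running a locale-aware tolower() over the individual octets of a UTF-8 string, can be illustrated with a small sketch that folds only the ASCII letters A-Z and passes every other byte through untouched. The function names below are made up for this illustration, and nothing here is claimed to be the fix the thread eventually adopted; it only shows why a byte-wise conversion restricted to ASCII cannot damage a sequence such as 0xd1 0x82 0x31 (the bytes quoted elsewhere in this thread).

```c
#include <stddef.h>

/*
 * Illustrative only: lower-case a single byte if, and only if, it is an
 * ASCII upper-case letter. Bytes with the high bit set (parts of UTF-8
 * multibyte characters) are returned unchanged, whatever the locale is.
 */
static unsigned char sketch_ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch + ('a' - 'A'));
    return ch;
}

/* Apply the ASCII-only folding to a buffer of len bytes, in place. */
static void sketch_ascii_lower_inplace(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] = (char) sketch_ascii_tolower((unsigned char) s[i]);
}
```

Because the helper never consults the C locale, its result for a given byte string is the same on every platform, at the cost of not case-folding non-ASCII letters at all.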

[BUGS] Corruption of multibyte identifiers on UTF-8 locale

2006-09-23 Thread Victor Snezhko
Hello,
Looks like we have a more serious problem with multibyte identifiers. When I run the following sequence of queries:
CREATE OR REPLACE FUNCTION CreateOrAlterTable() RETURNS int AS $$
BEGIN
  if not EXISTS(SELECT relname FROM pg_class WHERE relname ILIKE 'т1' AND relkind = 'r') then
    CREA

Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-23 Thread Victor Snezhko
Victor Snezhko <[EMAIL PROTECTED]> writes:
> However, in system catalogs (SELECT * FROM pg_tables WHERE
> schemaname='public') there appear to be empty strings instead
> of table names.
>
> This is on patched 8.1.4 (with ILIKE and ctype.h fixes), I'm upgrading
> to HEAD now to see if anything i

Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-23 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:
>> ... I think we need convert_ident to
>> use a plpgsql_isspace() that accepts these and only these as spaces.
>> Any high-bit-set byte is part of an identifier according to scan.l's
>> rules, and convert_ident must have the same behavior regardless of locale.
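The idea quoted above, a whitespace test that accepts a fixed set of characters and behaves the same in every locale, might look roughly like the sketch below. The excerpt does not say which characters "these" refers to; the set used here (space, tab, newline, carriage return, form feed) is an assumption, and the function name is hypothetical rather than the actual plpgsql_isspace().

```c
#include <stdbool.h>

/*
 * Illustrative only: locale-independent whitespace test.
 * ASSUMPTION: the accepted set is ' ', '\t', '\n', '\r', '\f'; the quoted
 * message does not list the characters it has in mind.
 * Any byte with the high bit set is never whitespace here, so it is
 * always treated as part of an identifier.
 */
static bool sketch_ident_isspace(unsigned char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' || ch == '\f';
}
```

Unlike isspace() from ctype.h, this test never consults the current locale, so a byte such as 0xa0 or 0x85 cannot be classified as a space on one platform and as an identifier byte on another.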