Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
Tom Lane <[EMAIL PROTECTED]> writes: > Victor Snezhko <[EMAIL PROTECTED]> writes: >> 2) When I try to create a stored procedure to create table (e.g., for >>creating table only if it doesn't already exist), it fails to >>compile if i use cyrillic letter "х" (unicode id: 0x0445, utf-8 >>representation is D1 85) with the following weird error: > >> ERROR: expected dot between identifiers: х >> CONTEXT: compile of PL/pgSQL function "createoraltertable" near line 2 > >> the following query fails: > >> CREATE OR REPLACE FUNCTION TestFunction() >> RETURNS int >> AS $$ >> BEGIN >> SELECT х FROM test_table; >> RETURN 0; >> END; >> $$ LANGUAGE plpgsql; > > I tried to duplicate this without success on Fedora Core 5. Either your > ru_RU.utf8 locale works differently from Fedora's, or something else is > wrong. > Looking at the code, the only very plausible theory is that > isspace() is doing something different than we expect it to. I wonder > whether you have the database encoding set to something else than what > the postmaster's LC_CTYPE locale expects? No, the database encoding is UTF-8, the cluster was built with ru_RU.UTF-8 locale, and I haven't modified any defaults. It's on FreeBSD 6.1, and it's quite possible that it has something wrong with utf-8 locale, I saw some complaints at least about collation. However, isspace doesn't seem like a culprit to me. Let's ensure that I have done everything right to come to this conclusion. I have applied the following patch to pl_funcs.c: But, if I'm doing everything right, isspace seems not to be the culprit to me. I have run the above query on the server with the following applied patch (the server is 8.1.4, but I don't see any changes to the plpgsql_convert_ident() in revs 1.46-1.54): *** src/pl/plpgsql/src/pl_funcs.c.orig Wed Nov 23 00:23:30 2005 --- src/pl/plpgsql/src/pl_funcs.c Fri Sep 22 14:48:09 2006 *** *** 363,368 --- 363,381 { const char *sstart = s; int identctr = 0; + char*isspacebuf; + int i; + + isspacebuf = (char*)malloc(strlen(s) + 1); + for (i=0; s[i]; ++i) { + if (isspace(s[i])) + isspacebuf[i] = 't'; + else + isspacebuf[i] = 'f'; + } + + ereport(LOG, + (errmsg("plpgsql_convert_ident: ident=%s, isspace array=%s, isspace(0)=%c", s, isspacebuf, isspace(0)?'t':'f'))); /* Outer loop over identifiers */ while (*s) and got the following line in the logfile: LOG: plpgsql_convert_ident: ident=х, isspace array=ff, isspace(0)=f The symbol after "input=" is that 0x0445 (D1 85 in utf-8) letter which I mentioned above. I also double-checked this: got a stack trace (by inserting NULL-pointer assignment instead of reporting "expected dot between identifiers") and saw that the value passed to the plpgsql_convert_ident points to the following bytes: D1 85 00. So, isspace returns false on all three of them. I have also run the same query on another cluster (and the same binaries), initialized with a single-byte encoding: ru_RU.KOI8-R, where everything works. isspace returns the same results: LOG: plpgsql_convert_ident: ident=х, isspace array=f, isspace(0)=f LOG: plpgsql_convert_ident: ident=test_table, isspace array=ff, isspace(0)=f Strange... -- WBR, Victor V. Snezhko E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
Victor Snezhko <[EMAIL PROTECTED]> writes: > LOG: plpgsql_convert_ident: ident=х, isspace array=f, isspace(0)=f > LOG: plpgsql_convert_ident: ident=test_table, isspace array=ff, > isspace(0)=f Hmm, there was something wrong in my utf-8 cluster setup, I re-attached it and now see that isspace indeed fails: LOG: plpgsql_convert_ident: ident=х, isspace array=ft, isspace(0)=f -- WBR, Victor V. Snezhko E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #2642: Connection problems without IPv6
Some more infos: When I start and stop the postgres service, then very rarely it happens that it works, particularly when I start and stop some other services as well, such as the DNS caching client. But it is not reproducible. Maybe a timing issue? Can anybody reproduce this problem on Win XP? When I switched on as much debug info in the log as possible, here is what I got: 2006-09-22 12:39:55 LOG: 0: transaction ID wrap limit is 2147484148, limited by database "postgres" 2006-09-22 12:39:55 LOCATION: SetTransactionIdLimit, varsup.c:234 2006-09-22 12:39:55 DEBUG: 0: proc_exit(0) 2006-09-22 12:39:55 LOCATION: proc_exit, ipc.c:95 2006-09-22 12:39:55 DEBUG: 0: shmem_exit(0) 2006-09-22 12:39:55 LOCATION: shmem_exit, ipc.c:126 2006-09-22 12:39:55 DEBUG: 0: exit(0) 2006-09-22 12:39:55 LOCATION: proc_exit, ipc.c:113 2006-09-22 12:39:55 DEBUG: 0: reaping dead processes 2006-09-22 12:39:55 LOCATION: reaper, postmaster.c:2021 2006-09-22 12:39:55 DEBUG: 0: waiting on 2 children 2006-09-22 12:39:55 LOCATION: win32_waitpid, postmaster.c:4046 2006-09-22 12:39:55 DEBUG: 0: waiting on 3 children 2006-09-22 12:39:55 LOCATION: win32_waitpid, postmaster.c:4046 2006-09-22 12:39:55 DEBUG: 0: proc_exit(0) 2006-09-22 12:39:55 LOCATION: proc_exit, ipc.c:95 2006-09-22 12:39:55 DEBUG: 0: shmem_exit(0) 2006-09-22 12:39:55 LOCATION: shmem_exit, ipc.c:126 2006-09-22 12:39:55 DEBUG: 0: exit(0) 2006-09-22 12:39:55 LOCATION: proc_exit, ipc.c:113 2006-09-22 12:39:55 DEBUG: 0: reaping dead processes 2006-09-22 12:39:55 LOCATION: reaper, postmaster.c:2021 2006-09-22 12:39:55 DEBUG: 0: waiting on 4 children 2006-09-22 12:39:55 LOCATION: win32_waitpid, postmaster.c:4046 2006-09-22 12:39:55 DEBUG: 0: waiting on 3 children 2006-09-22 12:39:55 LOCATION: win32_waitpid, postmaster.c:4046 2006-09-22 12:39:57 DEBUG: 0: forked new backend, pid=3476 socket=1348 2006-09-22 12:39:57 LOCATION: BackendStartup, postmaster.c:2520 2006-09-22 12:39:57 LOG: 0: connection received: host=localhost port=1860 2006-09-22 12:39:57 LOCATION: BackendRun, postmaster.c:2699 2006-09-22 12:39:57 localhost(1860) LOG: XX000: could not receive data from client: An operation was attempted on something that is not a socket. 2006-09-22 12:39:57 localhost(1860) LOCATION: pq_recvbuf, pqcomm.c:712 2006-09-22 12:39:57 localhost(1860) LOG: 08P01: incomplete startup packet 2006-09-22 12:39:57 localhost(1860) LOCATION: ProcessStartupPacket, postmaster.c:1351 2006-09-22 12:39:57 localhost(1860) DEBUG: 0: proc_exit(0) 2006-09-22 12:39:57 localhost(1860) LOCATION: proc_exit, ipc.c:95 2006-09-22 12:39:57 localhost(1860) DEBUG: 0: shmem_exit(0) 2006-09-22 12:39:57 localhost(1860) LOCATION: shmem_exit, ipc.c:126 2006-09-22 12:39:57 localhost(1860) DEBUG: 0: exit(0) 2006-09-22 12:39:57 localhost(1860) LOCATION: proc_exit, ipc.c:113 2006-09-22 12:39:58 DEBUG: 0: reaping dead processes 2006-09-22 12:39:58 LOCATION: reaper, postmaster.c:2021 2006-09-22 12:39:58 DEBUG: 0: waiting on 4 children 2006-09-22 12:39:58 LOCATION: win32_waitpid, postmaster.c:4046 2006-09-22 12:39:58 DEBUG: 0: server process (PID 3476) exited with exit code 0 2006-09-22 12:39:58 LOCATION: LogChildExit, postmaster.c:2419 2006-09-22 12:39:58 DEBUG: 0: waiting on 3 children 2006-09-22 12:39:58 LOCATION: win32_waitpid, postmaster.c:4046 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
Victor Snezhko <[EMAIL PROTECTED]> writes: > Hmm, there was something wrong in my utf-8 cluster setup, I > re-attached it and now see that isspace indeed fails: > LOG: plpgsql_convert_ident: ident=È, isspace array=ft, isspace(0)=f Hm. Given that we've not seen this reported elsewhere, I wonder if we shouldn't conclude that it's a FreeBSD bug. We could fix the problem by not using isspace() but some homegrown equivalent ... but that seems a tad ugly, especially if it's to work around a problem on just one locale on one platform. Can anyone else check the result of isspace(0x85) ? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
Tom Lane <[EMAIL PROTECTED]> writes: >> Hmm, there was something wrong in my utf-8 cluster setup, I >> re-attached it and now see that isspace indeed fails: >> LOG: plpgsql_convert_ident: ident=х, isspace array=ft, isspace(0)=f > > Hm. Given that we've not seen this reported elsewhere, I wonder if we > shouldn't conclude that it's a FreeBSD bug. OK, for 0x85 maybe, but see below. > We could fix the problem by not using isspace() but some homegrown > equivalent ... but that seems a tad ugly, I wonder why simple create table (outside of plpgsql, with the table name equal to the character D1 85) works? Are multibyte strings handled differently in query parser compared to plpgsql? > especially if it's to work around a problem on just one locale on > one platform. Can anyone else check the result of isspace(0x85) ? Not sure about one platform, we need to check this. And here are some news: MacOSX's isspace also returns 1 on 0x85. So, 1.5 platforms. The test I have run is: #include #include #include int main() { setlocale(LC_ALL, ""); printf("%d %d\n", isspace(0x85), isspace(0xA0)); printf("%d %d\n", iswspace(0x85), iswspace(0xA0)); return 0; } 0xA0 is added because is's true space, see below. On FreeBSD & Mac this yields the following output: 1 1 1 1 On recent ALTLinux (Compact 3.0), which uses ru_RU.UTF-8 locale by default: 0 0 0 0 Here is an excerpt from the unicode character database (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt): 0085;;Cc;0;B;N;NEXT LINE (NEL) 00A0;NO-BREAK SPACE;Zs;0;CS; 0020N;NON-BREAKING SPACE 0x85 is the alternative next-line, and 0xA0 is the real space (note the mark Zs - http://www.unicode.org/Public/UNIDATA/UCD.html describes symbols with such marks as "Separator, Space") So, that linux doesn't return true for 0x85 and 0xA0 characters - but not because it's is* functions return false for any character above 0x80, but because it's LC_CTYPE doesn't know anything about these particular characters! (iswspace calls also return false). So linux has working isspace, but broken iswspace for these characters... So, could you run one more test on that Fedora to see if at least isspace() and isalpha() meet out expectations? #include #include #include int main() { int i; setlocale(LC_ALL, ""); for (i=0x80; i <= 0xFF; ++i) { if (isspace(i) != 0) printf("character 0x%x is a space\n", i); if (isalpha(i) != 0) printf("character 0x%x is alphabetical\n", i); if (isdigit(i) != 0) printf("character 0x%x is a digit\n", i); } return 0; } My FreeBSD lists a whole heck of characters: character 0x85 is a space character 0xa0 is a space character 0xaa is alphabetical character 0xb5 is alphabetical character 0xba is alphabetical character 0xc0 is alphabetical ... 0xc1-0xfe is alphabetical character 0xff is alphabetical -- WBR, Victor V. Snezhko E-mail: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
Mac OS X is based on FreeBSD. I'm not sure if this is what you want, because I'm not in a Turkish locale, but ...#include #include int main(){ printf("isspace(0x85) returned %d\n", isspace(0x85));}produces:isspace(0x85) returned 0... on my Mac OS X 10.4.7 PPC box.On Sep 22, 2006, at 12:48 PM, Tom Lane wrote:Victor Snezhko <[EMAIL PROTECTED]> writes: Hmm, there was something wrong in my utf-8 cluster setup, Ire-attached it and now see that isspace indeed fails:LOG: plpgsql_convert_ident: ident=È, isspace array=ft, isspace(0)=f Hm. Given that we've not seen this reported elsewhere, I wonder if weshouldn't conclude that it's a FreeBSD bug.We could fix the problem by not using isspace() but some homegrownequivalent ... but that seems a tad ugly, especially if it's to workaround a problem on just one locale on one platform. Can anyone elsecheck the result of isspace(0x85) ? regards, tom lane---(end of broadcast)---TIP 5: don't forget to increase your free space map settings Douglas Toltzman[EMAIL PROTECTED](910) 526-5938
Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
Victor Snezhko <[EMAIL PROTECTED]> writes: > My FreeBSD lists a whole heck of characters: > character 0x85 is a space > character 0xa0 is a space > character 0xaa is alphabetical > character 0xb5 is alphabetical > character 0xba is alphabetical > character 0xc0 is alphabetical > ... 0xc1-0xfe is alphabetical > character 0xff is alphabetical Hm. I'm still thinking that this behavior is wrong for UTF8 encoding, but it would be reasonable in LATINn and related encodings, so we probably ought to do something about it. After further thought, it's not so much that we can't tolerate locale-dependent behavior of isspace() in general, as that in this particular case we are expecting it to match the scanner's idea of a space: scan.l has space [ \t\n\r\f] which obviously is not locale-aware. I think we need convert_ident to use a plpgsql_isspace() that accepts these and only these as spaces. Any high-bit-set byte is part of an identifier according to scan.l's rules, and convert_ident must have the same behavior regardless of locale. There may be related risks in and around the other flex scanners ... will look. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale
I wrote: > ... I think we need convert_ident to > use a plpgsql_isspace() that accepts these and only these as spaces. > Any high-bit-set byte is part of an identifier according to scan.l's > rules, and convert_ident must have the same behavior regardless of locale. > There may be related risks in and around the other flex scanners > ... will look. I've committed a fix along these lines. ecpg had the identical bug, but I couldn't find any other places where we seemed to be assuming that macros would match the behavior of our flex scanners. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster