Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:

> Victor Snezhko <[EMAIL PROTECTED]> writes:
>> 2) When I try to create a stored procedure to create table (e.g., for
>>creating table only if it doesn't already exist), it fails to
>>compile if i use cyrillic letter "х" (unicode id: 0x0445, utf-8
>>representation is D1 85) with the following weird error:
>
>> ERROR:  expected dot between identifiers: х
>> CONTEXT:  compile of PL/pgSQL function "createoraltertable" near line 2
>
>> the following query fails:
>
>> CREATE OR REPLACE FUNCTION TestFunction()
>> RETURNS int
>> AS $$
>> BEGIN
>>   SELECT х FROM test_table;
>>   RETURN 0;
>> END;
>> $$ LANGUAGE plpgsql;
>
> I tried to duplicate this without success on Fedora Core 5.  Either your
> ru_RU.utf8 locale works differently from Fedora's, or something else is
> wrong.  
> Looking at the code, the only very plausible theory is that
> isspace() is doing something different than we expect it to.  I wonder
> whether you have the database encoding set to something else than what
> the postmaster's LC_CTYPE locale expects?

No, the database encoding is UTF-8, the cluster was built with
ru_RU.UTF-8 locale, and I haven't modified any defaults.

It's on FreeBSD 6.1, and it's quite possible that it has something
wrong with utf-8 locale, I saw some complaints at least about
collation.

However, isspace doesn't seem like a culprit to me. Let's ensure that
I have done everything right to come to this conclusion. I have
applied the following patch to pl_funcs.c:


But, if I'm doing everything right, isspace seems not to be the
culprit to me. I have run the above query on the server with the
following applied patch (the server is 8.1.4, but I don't see any
changes to the plpgsql_convert_ident() in revs 1.46-1.54):

*** src/pl/plpgsql/src/pl_funcs.c.orig  Wed Nov 23 00:23:30 2005
--- src/pl/plpgsql/src/pl_funcs.c   Fri Sep 22 14:48:09 2006
***
*** 363,368 
--- 363,381 
  {
const char *sstart = s;
int identctr = 0;
+   char*isspacebuf;
+   int i;
+
+   isspacebuf = (char*)malloc(strlen(s) + 1);
+   for (i=0; s[i]; ++i) {
+   if (isspace(s[i]))
+   isspacebuf[i] = 't';
+   else
+   isspacebuf[i] = 'f';
+   }
+
+   ereport(LOG,
+   (errmsg("plpgsql_convert_ident: ident=%s, isspace array=%s, 
isspace(0)=%c", s, isspacebuf, isspace(0)?'t':'f')));

/* Outer loop over identifiers */
while (*s)


and got the following line in the logfile:

  LOG:  plpgsql_convert_ident: ident=х, isspace array=ff, isspace(0)=f

The symbol after "input=" is that 0x0445 (D1 85 in utf-8) letter which
I mentioned above. I also double-checked this: got a stack trace (by
inserting NULL-pointer assignment instead of reporting "expected dot
between identifiers") and saw that the value passed to the
plpgsql_convert_ident points to the following bytes: D1 85 00.

So, isspace returns false on all three of them.

I have also run the same query on another cluster (and the same
binaries), initialized with a single-byte encoding: ru_RU.KOI8-R,
where everything works. isspace returns the same results:

LOG:  plpgsql_convert_ident: ident=х, isspace array=f, isspace(0)=f
LOG:  plpgsql_convert_ident: ident=test_table, isspace array=ff, 
isspace(0)=f

Strange...

-- 
WBR, Victor V. Snezhko
E-mail: [EMAIL PROTECTED]



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Victor Snezhko
Victor Snezhko <[EMAIL PROTECTED]> writes:

> LOG:  plpgsql_convert_ident: ident=х, isspace array=f, isspace(0)=f
> LOG:  plpgsql_convert_ident: ident=test_table, isspace array=ff, 
> isspace(0)=f

Hmm, there was something wrong in my utf-8 cluster setup, I
re-attached it and now see that isspace indeed fails:
LOG:  plpgsql_convert_ident: ident=х, isspace array=ft, isspace(0)=f

-- 
WBR, Victor V. Snezhko
E-mail: [EMAIL PROTECTED]



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #2642: Connection problems without IPv6

2006-09-22 Thread Christoph Zwerschke
Some more infos: When I start and stop the postgres service, then very 
rarely it happens that it works, particularly when I start and stop some 
other services as well, such as the DNS caching client. But it is not 
reproducible. Maybe a timing issue?


Can anybody reproduce this problem on Win XP?

When I switched on as much debug info in the log as possible, here is 
what I got:


2006-09-22 12:39:55  LOG:  0: transaction ID wrap limit is 
2147484148, limited by database "postgres"

2006-09-22 12:39:55  LOCATION:  SetTransactionIdLimit, varsup.c:234
2006-09-22 12:39:55  DEBUG:  0: proc_exit(0)
2006-09-22 12:39:55  LOCATION:  proc_exit, ipc.c:95
2006-09-22 12:39:55  DEBUG:  0: shmem_exit(0)
2006-09-22 12:39:55  LOCATION:  shmem_exit, ipc.c:126
2006-09-22 12:39:55  DEBUG:  0: exit(0)
2006-09-22 12:39:55  LOCATION:  proc_exit, ipc.c:113
2006-09-22 12:39:55  DEBUG:  0: reaping dead processes
2006-09-22 12:39:55  LOCATION:  reaper, postmaster.c:2021
2006-09-22 12:39:55  DEBUG:  0: waiting on 2 children
2006-09-22 12:39:55  LOCATION:  win32_waitpid, postmaster.c:4046
2006-09-22 12:39:55  DEBUG:  0: waiting on 3 children
2006-09-22 12:39:55  LOCATION:  win32_waitpid, postmaster.c:4046
2006-09-22 12:39:55  DEBUG:  0: proc_exit(0)
2006-09-22 12:39:55  LOCATION:  proc_exit, ipc.c:95
2006-09-22 12:39:55  DEBUG:  0: shmem_exit(0)
2006-09-22 12:39:55  LOCATION:  shmem_exit, ipc.c:126
2006-09-22 12:39:55  DEBUG:  0: exit(0)
2006-09-22 12:39:55  LOCATION:  proc_exit, ipc.c:113
2006-09-22 12:39:55  DEBUG:  0: reaping dead processes
2006-09-22 12:39:55  LOCATION:  reaper, postmaster.c:2021
2006-09-22 12:39:55  DEBUG:  0: waiting on 4 children
2006-09-22 12:39:55  LOCATION:  win32_waitpid, postmaster.c:4046
2006-09-22 12:39:55  DEBUG:  0: waiting on 3 children
2006-09-22 12:39:55  LOCATION:  win32_waitpid, postmaster.c:4046
2006-09-22 12:39:57  DEBUG:  0: forked new backend, pid=3476 socket=1348
2006-09-22 12:39:57  LOCATION:  BackendStartup, postmaster.c:2520
2006-09-22 12:39:57  LOG:  0: connection received: host=localhost 
port=1860

2006-09-22 12:39:57  LOCATION:  BackendRun, postmaster.c:2699
2006-09-22 12:39:57 localhost(1860) LOG:  XX000: could not receive data 
from client: An operation was attempted on something that is not a socket.

2006-09-22 12:39:57 localhost(1860) LOCATION:  pq_recvbuf, pqcomm.c:712
2006-09-22 12:39:57 localhost(1860) LOG:  08P01: incomplete startup packet
2006-09-22 12:39:57 localhost(1860) LOCATION:  ProcessStartupPacket, 
postmaster.c:1351

2006-09-22 12:39:57 localhost(1860) DEBUG:  0: proc_exit(0)
2006-09-22 12:39:57 localhost(1860) LOCATION:  proc_exit, ipc.c:95
2006-09-22 12:39:57 localhost(1860) DEBUG:  0: shmem_exit(0)
2006-09-22 12:39:57 localhost(1860) LOCATION:  shmem_exit, ipc.c:126
2006-09-22 12:39:57 localhost(1860) DEBUG:  0: exit(0)
2006-09-22 12:39:57 localhost(1860) LOCATION:  proc_exit, ipc.c:113
2006-09-22 12:39:58  DEBUG:  0: reaping dead processes
2006-09-22 12:39:58  LOCATION:  reaper, postmaster.c:2021
2006-09-22 12:39:58  DEBUG:  0: waiting on 4 children
2006-09-22 12:39:58  LOCATION:  win32_waitpid, postmaster.c:4046
2006-09-22 12:39:58  DEBUG:  0: server process (PID 3476) exited 
with exit code 0

2006-09-22 12:39:58  LOCATION:  LogChildExit, postmaster.c:2419
2006-09-22 12:39:58  DEBUG:  0: waiting on 3 children
2006-09-22 12:39:58  LOCATION:  win32_waitpid, postmaster.c:4046

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Tom Lane
Victor Snezhko <[EMAIL PROTECTED]> writes:
> Hmm, there was something wrong in my utf-8 cluster setup, I
> re-attached it and now see that isspace indeed fails:
> LOG:  plpgsql_convert_ident: ident=È, isspace array=ft, isspace(0)=f

Hm.  Given that we've not seen this reported elsewhere, I wonder if we
shouldn't conclude that it's a FreeBSD bug.

We could fix the problem by not using isspace() but some homegrown
equivalent ... but that seems a tad ugly, especially if it's to work
around a problem on just one locale on one platform.  Can anyone else
check the result of isspace(0x85) ?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Victor Snezhko
Tom Lane <[EMAIL PROTECTED]> writes:

>> Hmm, there was something wrong in my utf-8 cluster setup, I
>> re-attached it and now see that isspace indeed fails:
>> LOG:  plpgsql_convert_ident: ident=х, isspace array=ft, isspace(0)=f
>
> Hm.  Given that we've not seen this reported elsewhere, I wonder if we
> shouldn't conclude that it's a FreeBSD bug.

OK, for 0x85 maybe, but see below.

> We could fix the problem by not using isspace() but some homegrown
> equivalent ... but that seems a tad ugly, 

I wonder why simple create table (outside of plpgsql, with the table
name equal to the character D1 85) works? Are multibyte strings
handled differently in query parser compared to plpgsql?

> especially if it's to work around a problem on just one locale on
> one platform.  Can anyone else check the result of isspace(0x85) ?

Not sure about one platform, we need to check this. And here are some
news: MacOSX's isspace also returns 1 on 0x85. So, 1.5 platforms.

The test I have run is:

#include 
#include 
#include 

int main()
{
  setlocale(LC_ALL, "");
  printf("%d %d\n", isspace(0x85), isspace(0xA0));
  printf("%d %d\n", iswspace(0x85), iswspace(0xA0));
  return 0;
}

0xA0 is added because is's true space, see below.

On FreeBSD & Mac this yields the following output:
1 1
1 1

On recent ALTLinux (Compact 3.0), which uses ru_RU.UTF-8 locale by
default:
0 0 
0 0

Here is an excerpt from the unicode character database
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt):

0085;;Cc;0;B;N;NEXT LINE (NEL)
00A0;NO-BREAK SPACE;Zs;0;CS; 0020N;NON-BREAKING SPACE

0x85 is the alternative next-line, and 0xA0 is the real space (note
the mark Zs - http://www.unicode.org/Public/UNIDATA/UCD.html 
describes symbols with such marks as "Separator, Space")

So, that linux doesn't return true for 0x85 and 0xA0 characters - but
not because it's is* functions return false for any character above
0x80, but because it's LC_CTYPE doesn't know anything about these
particular characters! (iswspace calls also return false). So linux
has working isspace, but broken iswspace for these characters...

So, could you run one more test on that Fedora to see if at least
isspace() and isalpha() meet out expectations?

#include 
#include 
#include 

int main()
{
  int i;
  setlocale(LC_ALL, "");
  for (i=0x80; i <= 0xFF; ++i) {
if (isspace(i) != 0) printf("character 0x%x is a space\n", i);
if (isalpha(i) != 0) printf("character 0x%x is alphabetical\n", i);
if (isdigit(i) != 0) printf("character 0x%x is a digit\n", i);
  }
  return 0;
}

My FreeBSD lists a whole heck of characters:

character 0x85 is a space
character 0xa0 is a space
character 0xaa is alphabetical
character 0xb5 is alphabetical
character 0xba is alphabetical
character 0xc0 is alphabetical
... 0xc1-0xfe is alphabetical
character 0xff is alphabetical

-- 
WBR, Victor V. Snezhko
E-mail: [EMAIL PROTECTED]



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Douglas Toltzman
Mac OS X is based on FreeBSD.  I'm not sure if this is what you want, because I'm not in a Turkish locale, but ...#include #include int main(){        printf("isspace(0x85) returned %d\n", isspace(0x85));}produces:isspace(0x85) returned 0... on my Mac OS X 10.4.7 PPC box.On Sep 22, 2006, at 12:48 PM, Tom Lane wrote:Victor Snezhko <[EMAIL PROTECTED]> writes: Hmm, there was something wrong in my utf-8 cluster setup, Ire-attached it and now see that isspace indeed fails:LOG:  plpgsql_convert_ident: ident=È, isspace array=ft, isspace(0)=f Hm.  Given that we've not seen this reported elsewhere, I wonder if weshouldn't conclude that it's a FreeBSD bug.We could fix the problem by not using isspace() but some homegrownequivalent ... but that seems a tad ugly, especially if it's to workaround a problem on just one locale on one platform.  Can anyone elsecheck the result of isspace(0x85) ?			regards, tom lane---(end of broadcast)---TIP 5: don't forget to increase your free space map settings  Douglas Toltzman[EMAIL PROTECTED](910) 526-5938 

Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Tom Lane
Victor Snezhko <[EMAIL PROTECTED]> writes:
> My FreeBSD lists a whole heck of characters:

> character 0x85 is a space
> character 0xa0 is a space
> character 0xaa is alphabetical
> character 0xb5 is alphabetical
> character 0xba is alphabetical
> character 0xc0 is alphabetical
> ... 0xc1-0xfe is alphabetical
> character 0xff is alphabetical

Hm.  I'm still thinking that this behavior is wrong for UTF8 encoding,
but it would be reasonable in LATINn and related encodings, so we
probably ought to do something about it.

After further thought, it's not so much that we can't tolerate
locale-dependent behavior of isspace() in general, as that in this
particular case we are expecting it to match the scanner's idea
of a space: scan.l has

space   [ \t\n\r\f]

which obviously is not locale-aware.  I think we need convert_ident to
use a plpgsql_isspace() that accepts these and only these as spaces.
Any high-bit-set byte is part of an identifier according to scan.l's
rules, and convert_ident must have the same behavior regardless of locale.

There may be related risks in and around the other flex scanners
... will look.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

2006-09-22 Thread Tom Lane
I wrote:
> ... I think we need convert_ident to
> use a plpgsql_isspace() that accepts these and only these as spaces.
> Any high-bit-set byte is part of an identifier according to scan.l's
> rules, and convert_ident must have the same behavior regardless of locale.

> There may be related risks in and around the other flex scanners
> ... will look.

I've committed a fix along these lines.  ecpg had the identical bug, but
I couldn't find any other places where we seemed to be assuming that
 macros would match the behavior of our flex scanners.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster