postgres=# select to_tsvector('test text');
  to_tsvector
---------------
 'test text':1
(1 row)
OK, that's related to the http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h commit. Thomas pointed out that it could be a non-breaking space (0xa0), and that commit assumes that, with the C locale and a multibyte encoding, any character > 0x7f is alpha.
To check this theory, please apply the attached patch.

If that's the case, I'm confused: we cannot assume that 0xa0 is a space character in every multibyte encoding, not even on Windows.



--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2/wordparser/parser.c.orig Wed Mar 21 20:41:23 2007
--- ./contrib/tsearch2/wordparser/parser.c      Wed Mar 21 21:10:39 2007
***************
*** 124,130 ****
--- 124,134 ----
                         * with C-locale is an alpha character
                         */
                        if ( c > 0x7f )
+                       {
+                               if ( c == 0xa0 )
+                                       return 0;
                                return 1;
+                       }
  
                        return isalnum(0xff & c);
                }
***************
*** 157,163 ****
--- 161,171 ----
                         * with C-locale is an alpha character
                         */
                        if ( c > 0x7f )
+                       {
+                               if ( c == 0xa0 )
+                                       return 0;
                                return 1;
+                       }
  
                        return isalpha(0xff & c);
                }