> Alvaro Herrera wrote:
> > Tom Lane wrote:
> >
> >> ISTM that perhaps a more generally useful definition would be
> >>
> >> lword    Only ASCII letters
> >> nlword   Entirely letters per iswalpha(), but not lword
> >> word     Entirely alphanumeric per iswalnum(), but not nlword
> >>          (hence, includes at least one digit)
> > ...
> > I am not sure if there are any western european languages were words can
> > only be formed with non-ascii chars.
>
> There is at least in Swedish: "ö" (island) and å (river). They're both a
> bit special because they're just one letter each.
>
> > lword    Entirely letters per iswalpha, with at least one ASCII
> > nlword   Entirely letters per iswalpha
> > word     Entirely alphanumeric per iswalnum, but not nlword
>
> I don't like this categorization much more than the original. The
> distinction between lword and nlword is useless for most European
> languages.
>
> I suppose that Tom's argument that it's useful to distinguish words made
> of purely ASCII characters in computer-oriented stuff is valid, though I
> can't immediately think of a use case. For things like parsing a
> programming language, that's not really enough, so you'd probably end up
> writing your own parser anyway. I'm also not clear what the use case for
> the distinction between words with digits or not is. I don't think
> there's any natural languages where a word can contain digits, so it
> must be a computer-oriented thing as well.
>
> I like the "aword" name more than "lword", BTW. If we change the meaning
> of the classes, surely we can change the name as well, right?
>
> Note that the default parser is useless for languages like Japanese,
> where words are not separated by whitespace, anyway.

That is true, but it does not necessarily mean that Tsearch is not used
for Japanese at all. I work around the problem with a pre-processing step
that splits Japanese sentences into words separated by whitespace. I wish
I could write a new parser that does this job for 8.4 or later...

Please change the word definitions very carefully.
--
Tatsuo Ishii
SRA OSS, Inc. Japan