Thanks for the quick answer. :-) So can I say that, in general, ArabicAnalyzer can tokenize mixed content containing both Arabic and English? :-)
I am not really familiar with the Arabic language. What do you mean by "change Arabic tokens"? Does Arabic have something like upper/lower case as English does?

On Thu, May 14, 2009 at 10:47 AM, Robert Muir <rcm...@gmail.com> wrote:
> in the case of ArabicAnalyzer it will only change Arabic tokens, and will
> leave english words as-is (it will not convert them to lowercase or
> anything like that)
>
> so if you want to have good Arabic and English behavior you would want to
> create a custom analyzer that looks like the Arabic analyzer but also invokes
> lowercasefilter, perhaps also some english stemmer, etc etc.
>
> On Thu, May 14, 2009 at 10:11 AM, weidong sun <lmcw...@gmail.com> wrote:
>
> > Hello,
> >
> > I am a newbie in the Lucene world. I might ask some obvious questions whose
> > answers I unfortunately don't know. Please help me 'grow'.
> >
> > We have a project that intends to use the Lucene search engine to search some
> > user info stored in our system. The user info might not be in English, even
> > though it will be stored in UTF-8 encoding.
> >
> > My question is: if I use a particular Lucene analyzer for a language other
> > than English (e.g. ChineseAnalyzer or ArabicAnalyzer), can it still handle
> > the content correctly if the user info is mixed with English characters/words?
> >
> > Any answers are really appreciated.
> >
> > :-)
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
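For reference, a combined Arabic/English analyzer along the lines described above might look roughly like the sketch below. This is only a sketch: it assumes the contrib Arabic analysis classes (ArabicLetterTokenizer, ArabicNormalizationFilter, ArabicStemFilter) and the old tokenStream(String, Reader) Analyzer API; exact class names, packages, and constructors vary by Lucene version, and a real analyzer would probably also add a StopFilter.

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.ar.ArabicLetterTokenizer;
    import org.apache.lucene.analysis.ar.ArabicNormalizationFilter;
    import org.apache.lucene.analysis.ar.ArabicStemFilter;

    // Illustrative custom analyzer: Arabic handling plus lowercasing and
    // English stemming, so mixed Arabic/English text is treated sensibly.
    public class ArabicEnglishAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Tokenize the same way the Arabic analyzer does.
            TokenStream stream = new ArabicLetterTokenizer(reader);
            // Lowercase so English terms match case-insensitively
            // (the Arabic-specific filters do not touch Latin letters).
            stream = new LowerCaseFilter(stream);
            // Normalize and stem Arabic tokens; English tokens pass through.
            stream = new ArabicNormalizationFilter(stream);
            stream = new ArabicStemFilter(stream);
            // Optionally stem English tokens as well.
            stream = new PorterStemFilter(stream);
            return stream;
        }
    }

You would then pass an instance of this analyzer to your IndexWriter and to your query parser so that indexing and searching analyze text the same way.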