Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread Robert Muir
I would say in general, yes. When I say 'change Arabic text', I mean the Arabic analyzer will standardize and stem Arabic words, but it won't modify any of your English words. And no, there is no case in Arabic. This is why, if you are handling mixed Arabic/English text, I recommend creating a cust

Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread weidong sun
i.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: weidong sun [mailto:lmcw...@gmail.com] > > Sent: Thursday, May 14, 2009 5:19 PM > > To: java-user@lucene.apache.org > > Subject: Re: Question wrt Lucene analyzer for different language > > &

RE: Question wrt Lucene analyzer for different language

2009-05-14 Thread Uwe Schindler
> Thanks for the quick answer. :-)
>
> So can I say that ArabicAnalyzer can generally tokenize content that mixes
> Arabic and English? :-)
>
> I am not really familiar with the Arabic language. What do you mean by
> "change Arabic tokens"? Does Arabic have something like upper/lower case

RE: Question wrt Lucene analyzer for different language

2009-05-14 Thread Uwe Schindler
o: java-user@lucene.apache.org
> Subject: Re: Question wrt Lucene analyzer for different language
>
> Thanks for the surprisingly quick response. :-)
>
> What I mean by "correctly" here is that the specific analyzer can tokenize a
> text mixed with English and that specific language, fo

Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread weidong sun
Thanks for the quick answer. :-) So can I say that ArabicAnalyzer can generally tokenize content that mixes Arabic and English? :-) I am not really familiar with the Arabic language. What do you mean by "change Arabic tokens"? Does Arabic have something like upper/lower case as English does?

Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread weidong sun
Thanks for the surprisingly quick response. :-) What I mean by "correctly" here is that the specific analyzer can tokenize a text mixed with English and that specific language, for example, "12345 " or "Text???" (where '?' is a character of that specific language and "12345" and "Text" are English

Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread Robert Muir
In the case of ArabicAnalyzer, it will only change Arabic tokens and will leave English words as-is (it will not convert them to lowercase or anything like that). So if you want good Arabic and English behavior, you would want to create a custom analyzer that looks like Arabic analyzer but a
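
The point above (English words pass through unchanged, and Arabic has no case) can be illustrated with a rough sketch in plain Java. This is not Lucene code and uses no Lucene classes: the class name MixedScriptSketch and the method tokenizeAndLowercase are hypothetical, and the whitespace split is only a stand-in for a real tokenizer. It shows only that a lowercasing step normalizes English tokens while leaving Arabic tokens untouched, which is why such a step could be added for mixed text without disturbing the Arabic side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: plain Java, no Lucene classes involved.
public class MixedScriptSketch {

    // Splits on whitespace (an assumed stand-in for a real tokenizer)
    // and lowercases every token. Arabic letters have no upper/lower
    // case, so Arabic tokens come back unchanged.
    static List<String> tokenizeAndLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u0643\u062A\u0627\u0628" is an Arabic word written with
        // Unicode escapes to avoid source-encoding problems.
        String mixed = "Lucene \u0643\u062A\u0627\u0628 TEXT";
        System.out.println(tokenizeAndLowercase(mixed));
    }
}
```

Stemming and Arabic-specific normalization are deliberately left out of this sketch; in a real analyzer those steps would apply to the Arabic tokens only.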

Re: Question wrt Lucene analyzer for different language

2009-05-14 Thread Erick Erickson
No. What is "correctly"? Are you stemming? In that case, using the same analyzer on different languages will not work. This topic has been discussed on the user list frequently, so if you searched that archive (see: http://wiki.apache.org/lucene-java/MailingListArchives) you'd find a wealth of inf

Question wrt Lucene analyzer for different language

2009-05-14 Thread weidong sun
Hello, I am a newbie in the Lucene world, so I may ask some obvious questions that I unfortunately don't know the answers to. Please help me 'grow'. We have a project that intends to use the Lucene search engine to search some user info stored in our system. The user info might not be in English even though it will be sto