Initially I was using StandardAnalyzer, but I switched to SimpleAnalyzer, which I guess does no more than tokenizing (and maybe lowercasing). I think it does not do stemming, which I don't/can't do anyway because I have no stemmer for the languages I'm indexing. For indexing and querying I'm using the same SimpleAnalyzer. So, as you say, I can go with the standard highlighter API that I mentioned in my last mail, and it will handle highlighting for any language. I should start using this one, right?
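To make the "same analysis on both sides" idea concrete, here is a minimal plain-Java sketch. It is not Lucene's Highlighter and uses no Lucene classes; the class name `SimpleHighlightSketch` and the methods `tokenize` and `highlight` are made up for illustration. It assumes SimpleAnalyzer-like behaviour (split on non-letters, lowercase) and shows that when the query and the text go through the same tokenization, matching works without any language-specific stemmer or stop list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not part of Lucene.
public class SimpleHighlightSketch {

    // Splits text into runs of Unicode letters and lowercases them,
    // similar in spirit to what SimpleAnalyzer is assumed to do here.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Wraps every word of the text whose normalized form appears in the
    // query with the given pre/post tags; everything else is copied as-is.
    static String highlight(String text, String query, String pre, String post) {
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!Character.isLetter(cp)) {
                out.appendCodePoint(cp);
                i += Character.charCount(cp);
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.codePointAt(i))) {
                i += Character.charCount(text.codePointAt(i));
            }
            String word = text.substring(start, i);
            // Normalize the word exactly the way the query was normalized.
            String normalized = tokenize(word).get(0);
            if (queryTerms.contains(normalized)) {
                out.append(pre).append(word).append(post);
            } else {
                out.append(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight(
            "Lucene makes hit highlighting easy", "HIGHLIGHTING lucene", "<b>", "</b>"));
    }
}
```

Nothing here is tied to English. The only language-sensitive step is `Character.isLetter`, so scripts whose words contain non-letter code points (combining marks, for example) may get split differently from what one would expect.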
One more thing. I have a single indexer and searcher that I'm using to index pages in many different non-English languages and, as I mentioned earlier, I'm using SimpleAnalyzer. Does that mean that if I index English pages with the same indexer, it will not take care of stemming and stop-word removal? I don't want to have multiple language-specific indexers. Can't we have a single indexer that handles non-English and English equally well? Or are there any other ideas on this?

Thanks,
KK.

On Thu, May 21, 2009 at 6:18 PM, Joel Halbert <j...@su3analytics.com> wrote:

> The highlighter should be language independent, so long as you are
> consistent with your use of Analyzer between
> indexing/query/highlighting.
>
> As for the most appropriate Analyzer to use for your local language,
> this is a separate question - especially if you are using stop word and
> stemming filters.
>
> The StandardAnalyzer is designed for English since it uses the
> StopFilter (English words only).
>
>
> -----Original Message-----
> From: KK <dioxide.softw...@gmail.com>
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: hit highlighting in lucene ?
> Date: Thu, 21 May 2009 17:51:13 +0530
>
> Hi All,
> I was looking for various ways of implementing hit highlighting in Lucene
> and found some standard classes that support highlighting, like this:
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html
>
> But what I believe is that this is only for English; or does it support
> other languages? I actually wanted to support highlighting for some
> non-English languages which I'm able to index and fetch using UTF-8
> encoding.
> So this means that if I want highlighting, I have to
> get the UTF-8 query, look for it in the result, and add the appropriate
> tags wherever required; it essentially boils down to reimplementing the
> standard highlighter. I think the standard highlighter also supports other
> languages.
> Correct me if I'm wrong.
>
> Due to my requirement constraints I'm using just SimpleAnalyzer, and we
> don't have tokenizers for these regional languages. Any other ideas for
> doing the same would be helpful as well.
>
> Thanks,
> KK.