Hi Ahmet,

I am using Lucene.NET with C#, so I can't test this quickly. Will HTMLStripCharFilter maintain the original character offsets, or does it just extract the plain text?
Hans

> You can use org.apache.solr.analysis.HTMLStripCharFilter. It is possible to
> add one or more org.apache.lucene.analysis.CharFilter(s) before tokenizer in
> your analyzer.
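For context on the offset question: as far as I understand it, Lucene's Java CharFilter API exposes a correctOffset method, which the tokenizer uses to map offsets in the filtered text back to positions in the original input. The toy class below is only a sketch of that idea, assuming a naive "drop everything between < and >" rule. It is not the real HTMLStripCharFilter and does not use any Lucene API; the class name TagStripWithOffsets and the method toOriginalSpan are hypothetical. It records, for each character it keeps, that character's index in the original HTML.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy illustration, NOT Lucene's HTMLStripCharFilter.
 * Strips anything between '<' and '>' and remembers, for every kept
 * character, its index in the original input, so a token span found in
 * the stripped text can be mapped back to the original HTML.
 */
public class TagStripWithOffsets {
    private final StringBuilder stripped = new StringBuilder();
    // origOffsets.get(i) = index in the original input of stripped char i
    private final List<Integer> origOffsets = new ArrayList<>();

    public TagStripWithOffsets(String html) {
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') { inTag = true; continue; }          // naive: any '<' opens a tag
            if (c == '>' && inTag) { inTag = false; continue; } // closes the current tag
            if (!inTag) {
                stripped.append(c);
                origOffsets.add(i);
            }
        }
    }

    /** The plain text that a tokenizer would see. */
    public String text() {
        return stripped.toString();
    }

    /**
     * Maps a [start, end) span in the stripped text to a [start, end) span
     * in the original input. The end is derived from the last kept char so
     * that trailing tags are not pulled into the span.
     */
    public int[] toOriginalSpan(int start, int end) {
        if (start < 0 || end > origOffsets.size() || start >= end) {
            throw new IllegalArgumentException("bad span: " + start + ".." + end);
        }
        return new int[] { origOffsets.get(start), origOffsets.get(end - 1) + 1 };
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        TagStripWithOffsets f = new TagStripWithOffsets(html);
        String text = f.text();
        int start = text.indexOf("world");
        int[] span = f.toOriginalSpan(start, start + "world".length());
        System.out.println(text);                              // Hello world
        System.out.println(span[0] + ".." + span[1]);          // 12..17
        System.out.println(html.substring(span[0], span[1]));  // world
    }
}
```

A real char filter has to cope with entities, comments, scripts and malformed markup, and Lucene stores offset corrections more compactly than one entry per character. The point here is only that offsets can survive stripping if the filter keeps such a mapping. Whether Lucene.NET's port behaves the same way would still need checking.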