On Fri, Mar 11, 2011 at 6:27 PM, Erick Erickson [via Lucene] < ml-node+2664607-1236630615-376...@n3.nabble.com> wrote:
> Solr doesn't do it. There exist various tokenizers/filters that just strip > the HTML tags, but there's nothing built into Solr that I know of that > understands HTML, HTML-aware operations are outside Solr's purview. > > This is how Solr achieve it : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripStandardTokenizerFactory -- Regards Shrinath.M -- View this message in context: http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2664717.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.