Earl Hood wrote:
>
> Looks like Jericho does what you want already:
> http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/TextExtractor.html
>
> --ewh
>
I went through their feature list and found that out :)
http://jericho.htmlparser.net/docs/index.html

Thanks Earl :) This is cool :)

--
View this message in context: http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2680665.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.