Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?

shrinath.m Mon, 14 Mar 2011 21:56:34 -0700

Earl Hood wrote:
> 
> Looks like Jericho does what you want already:
> http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/TextExtractor.html
> 
> --ewh
>



I went through their feature list and found that out :) 
http://jericho.htmlparser.net/docs/index.html


Thanks Earl :)
This is cool :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2680665.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

