Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?

shrinath.m Mon, 14 Mar 2011 21:47:27 -0700

I have started trying out your suggestions one by one. Thanks to everyone who
helped.


I tried Jericho and found it extremely simple to get started with.

I just wanted to clarify one thing, though:
is there a tool that extracts text from HTML without building a DOM?
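
To make the question concrete, this is the kind of DOM-free extraction I mean:
a callback-driven (streaming) parser that hands over text chunks as it reads
the markup. The sketch below is only an illustration under stated assumptions.
It uses the HTML parser that ships with the JDK
(javax.swing.text.html.parser.ParserDelegator), which is not one of the
libraries suggested in this thread, and the names StreamingTextExtractor and
extractText are hypothetical. That parser is lenient and fairly old, so it may
not handle modern markup well.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; the class and method names are illustrative only.
final class StreamingTextExtractor {

    // Collects the text chunks reported by the parser callbacks.
    // No document tree is built; each chunk is appended as it arrives.
    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate chunks with a space so words from adjacent
                // elements do not run together.
                text.append(data).append(' ');
            }
        };

        try (Reader reader = new StringReader(html)) {
            // 'true' tells the parser to ignore any charset declared in the markup,
            // since the input here is already decoded into a String.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Collapse runs of whitespace into single spaces.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

The returned string could then be passed to a Lucene field for indexing. For
large files, the same callback could write to a Writer instead of a
StringBuilder, assuming the full text does not need to be held in memory.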


-- 
Regards
Shrinath.M


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2680634.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
