I have started trying out your suggestions one by one. Thanks to everyone
who helped.

I used Jericho and found it extremely simple to get started with.

Just wanted to clarify one thing, though:
is there a tool that extracts text from HTML without creating a DOM?


-- 
Regards
Shrinath.M


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2680634.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
