I started trying out your suggestions one by one; thanks to everyone who helped.
I used Jericho and found it extremely simple to get started with. I just wanted to clarify one thing, though: is there a tool that extracts text from HTML without creating a DOM?

--
Regards
Shrinath.M

--
View this message in context: http://lucene.472066.n3.nabble.com/Which-is-the-best-fast-HTML-parser-tokenizer-that-I-can-use-with-Lucene-for-indexing-HTML-content-to-tp2664316p2680634.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
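
P.S. To make the question concrete, here is a minimal sketch of the kind of DOM-free extraction meant above. It is only an illustration, not a recommendation: it assumes the callback-style HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator) is acceptable, and the class name StreamingTextExtractor and the method extract are made up for this example. The parser reports each text run to a callback as it reads the input, so no document tree is kept in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration: collects text from HTML
// through parser callbacks instead of building a DOM.
public class StreamingTextExtractor {

    public static String extract(Reader html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser calls handleText for every text run it encounters.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate runs with a space so adjacent words do not merge.
                text.append(data).append(' ');
            }
        };

        // The third argument (ignoreCharSet = true) tells the parser not to
        // fail on a charset declared inside the document.
        new ParserDelegator().parse(html, callback, true);
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extract(new StringReader(html)));
    }
}
```

The returned string could then be passed to a Lucene field for indexing. Whitespace between text runs is not preserved exactly in this sketch, which should not matter once an analyzer tokenizes the text.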