On Mon, Mar 14, 2011 at 11:46 PM, shrinath.m <shrinat...@webyog.com> wrote:
> I used Jericho and found it extremely simple to start with ...
>
> Just wanted to clarify one thing though.
> Is there some tool that extracts text from HTML without creating the DOM?

Looks like Jericho already does what you want with its TextExtractor class:
http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/TextExtractor.html
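A minimal sketch of calling it, assuming the Jericho jar is on the classpath. It uses only the Source(CharSequence) constructor and TextExtractor.toString() from the javadoc above; the class name JerichoTextDemo and the helper extractText are hypothetical, made up for illustration.

```java
import net.htmlparser.jericho.Source;
import net.htmlparser.jericho.TextExtractor;

public class JerichoTextDemo {

    /** Hypothetical helper: returns the text content of an HTML string with the markup removed. */
    public static String extractText(String html) {
        // Source wraps the raw HTML; Jericho finds tags in the text itself
        // instead of building a DOM tree first.
        Source source = new Source(html);
        // TextExtractor walks the source and emits only the text content.
        TextExtractor extractor = new TextExtractor(source);
        return extractor.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

TextExtractor also has settings (see the javadoc) for things like attribute values and excluded elements, if the defaults don't suit what you're indexing.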

--ewh

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
