On Mon, Mar 14, 2011 at 11:46 PM, shrinath.m <shrinat...@webyog.com> wrote:
> I used Jericho and found it extremely simple to start with ...
>
> Just wanted to clarify one thing though.
> Is there some tool that does extract text from HTML without creating the DOM
Looks like Jericho already does what you want; see its TextExtractor class:
http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/TextExtractor.html

--ewh
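If you'd rather not add a dependency, the same "no DOM" idea can be sketched with the JDK's own callback-based HTML parser (javax.swing.text.html.parser.ParserDelegator) instead of Jericho. The parser reports each run of text to a callback as it scans, and the callback just appends it to a buffer, so no document tree is kept around. This is a minimal sketch, not Jericho's implementation; the names StreamingTextExtractor and extractText are hypothetical, and it assumes the JDK parser's old HTML 3.2-era DTD copes well enough with your pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: streams HTML through the JDK parser and keeps only text.
public class StreamingTextExtractor {

    /** Collects text runs as the parser reports them; no element tree is built. */
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate adjacent text runs (e.g. across tags) with a space.
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail, but the API declares IOException.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>Lucene</b> users</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

With Jericho itself, the equivalent is roughly `new Source(html).getTextExtractor().toString()`, which also handles modern markup and offers options such as whether to include attribute values.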