I have tried both HtmlParser v1.5 and NekoHTML. With the former, my implementation does not work: for example, it also returns the text inside JavaScript blocks. I followed the hint at http://htmlparser.sourceforge.net/javadoc/org/htmlparser/visitors/TextExtractingVisitor.html
The following is my NOT working implementation relying upon HtmlParser v1.5:

    import org.htmlparser.visitors.TextExtractingVisitor;
    import org.htmlparser.*;
    import org.htmlparser.util.*;

    public class HtmlFilter {

        public static String getText(String html) {
            Parser parser = Parser.createParser(html, "UTF-8");
            TextExtractingVisitor visitor = new TextExtractingVisitor();
            try {
                parser.visitAllNodesWith(visitor);
            } catch (ParserException e) {
                e.printStackTrace();
            }
            String textInPage = visitor.getExtractedText();
            return textInPage;
        }
    }
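One workaround I am considering, as a rough sketch only: strip the script and style elements from the HTML string before it reaches the parser, so the visitor never sees their contents. The class name ScriptStripper and the method stripScriptAndStyle are hypothetical names of my own, not part of HtmlParser, and this uses only java.util.regex. A regular expression is not a real HTML parser, so this assumes each script or style element has an explicit closing tag.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of HtmlParser): removes <script> and <style>
// elements from an HTML string before it is passed to a text extractor.
final class ScriptStripper {

    // Matches a whole <script>...</script> or <style>...</style> element.
    // CASE_INSENSITIVE handles upper-case tags; DOTALL lets '.' span line breaks.
    // The lazy '.*?' stops at the first matching closing tag.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private ScriptStripper() { }

    // Returns the input with every matched script/style element removed.
    static String stripScriptAndStyle(String html) {
        return SCRIPT_OR_STYLE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head><script type=\"text/javascript\">var x = 1;</script></head>"
                + "<body><p>Hello</p></body></html>";
        // The script element is gone; the rest of the markup is untouched.
        System.out.println(stripScriptAndStyle(html));
    }
}
```

With this in place, the call would become HtmlFilter.getText(ScriptStripper.stripScriptAndStyle(html)). I have not verified how this interacts with every page; scripts that are left unclosed would still slip through.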