Hi, I wrote my own HTML parser to do HTML-to-text conversion, and it works well. I can send you my code if it meets your requirements.
-----Original Message-----
From: John Wang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 21, 2006 1:40 PM
To: java-user@lucene.apache.org
Subject: HTML text extraction

Can someone please suggest a HTML text extraction library? In the Lucene book, it recommends Tidy. Seems jtidy is not really being maintained.

Otis, what do you guys use at Simpy?

Thanks

-john