Thanks everyone for your responses! I will try them out.
-John

On 6/20/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
John,

I also wrote about using NekoHTML, I think. I prefer that to JTidy. That also tells you what Simpy.com uses.

Otis

----- Original Message ----
From: John Wang <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, June 21, 2006 1:39:41 AM
Subject: HTML text extraction

Can someone please suggest an HTML text extraction library? The Lucene book recommends Tidy, but it seems JTidy is not really being maintained.

Otis, what do you guys use at Simpy?

Thanks

-john