Thanks everyone for your responses!
I will try them out.

-John

On 6/20/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

John,

I also wrote about using NekoHTML, I think.  I prefer that to JTidy.  That
also tells you what Simpy.com uses.

Otis

----- Original Message ----
From: John Wang <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, June 21, 2006 1:39:41 AM
Subject: HTML text extraction

Can someone please suggest an HTML text extraction library? The Lucene
book recommends Tidy, but JTidy does not seem to be maintained anymore.

Otis, what do you guys use at Simpy?

Thanks

-john





