Can someone please suggest an HTML text extraction library? The Lucene book recommends Tidy, but JTidy does not seem to be actively maintained.
Otis, what do you guys use at Simpy? Thanks -john