On Mon, 12 May 2008, Lukas Vlcek wrote:
I need to find a reliable way to extract content from Word, Excel and PowerPoint files prior to indexing, and I am not sure if POI is the best way to go. Can anybody share experience with POI and/or other [commercial] Java libraries for text extraction from MS formats?
We use POI for text extraction, and it works just fine for us. POI 3.1 should offer a few improvements to text extraction, and POI 3.5 will give you OOXML text extraction too.
You might also like to take a look at Apache Tika <http://incubator.apache.org/tika/>. It wraps POI (and a few other document extraction libraries), giving you a simple, common interface for text extraction.
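To illustrate what a "common interface" over several extractors looks like, here is a minimal Java sketch using only the standard library. It is not Tika's or POI's actual API: the class name TextExtractorFacade, its methods, and the stand-in extractors are all hypothetical. In a real setup, each registered function would delegate to the matching POI text extractor (or you would simply use Tika, which already does this dispatching for you).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical facade: one entry point, many format-specific extractors.
// None of these names come from Tika or POI; they are for illustration only.
final class TextExtractorFacade {
    // Maps a lower-cased file extension to the function that extracts its text.
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    // Register an extractor for one file extension (without the dot).
    TextExtractorFacade register(String extension, Function<byte[], String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
        return this;
    }

    // Pick the extractor by file extension and run it on the raw bytes.
    String extractText(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.apply(content);
    }

    public static void main(String[] args) {
        // Stand-in extractors: real ones would call into a document library.
        TextExtractorFacade facade = new TextExtractorFacade()
                .register("doc", bytes -> "[word] " + new String(bytes, StandardCharsets.UTF_8))
                .register("xls", bytes -> "[excel] " + new String(bytes, StandardCharsets.UTF_8));

        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(facade.extractText("report.doc", sample));
        System.out.println(facade.extractText("budget.XLS", sample));
    }
}
```

The indexing code then only ever calls extractText and never needs to know which library handles which format. Dispatching on the file extension is a simplification made for this sketch.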
Nick