Hi,

Try POI - http://poi.apache.org/ - or Tika (which uses POI) - http://lucene.apache.org/tika
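To give a rough idea of what these libraries do for Word 2007 files: a .docx is a ZIP package, and the main body text sits in the word/document.xml part inside <w:t> elements. Below is a minimal sketch that uses only the JDK, not POI or Tika. The class name DocxTextSketch and the method extractText are made up for illustration. It assumes a plain document and ignores headers, footers, footnotes, text boxes, Excel and PowerPoint, all of which POI and Tika are the better choice for.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI, Tika or Lucene.
final class DocxTextSketch {
    // WordprocessingML namespace used by word/document.xml.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one paragraph per line. */
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // skip styles, media, relationships, etc.
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document dom = factory.newDocumentBuilder().parse(zip);

                StringBuilder out = new StringBuilder();
                NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    // Each run of text in a paragraph is stored in a <w:t> element.
                    NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < texts.getLength(); j++) {
                        out.append(texts.item(j).getTextContent());
                    }
                    out.append('\n');
                }
                return out.toString();
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxTextSketch some-file.docx
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(extractText(in));
        }
    }
}
```

The returned string is what you would then hand to Lucene as the content of an indexed field.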
And you can use code from Lucene in Action to index the text with Lucene - http://manning.com/hatcher2 . The code is free to download.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: "Zhang, Lisheng" <lisheng.zh...@broadvision.com>
> To: java-user@lucene.apache.org
> Sent: Sunday, February 22, 2009 2:27:06 PM
> Subject: Text extraction tool for Microsoft Office 2007
>
> Hi,
>
> What is the best tool (free software) to extract text from
> Microsoft Office 2007:
>
> Word 2007, Excel 2007, Power Point 2007
>
> so that we can index them by lucene?
>
> Thanks very much for helps, Lisheng
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org