WordExtractor works. Thanks Nick.

On Thu, Feb 2, 2012 at 10:50 PM, Nick Burch <[email protected]> wrote:
> On Thu, 2 Feb 2012, Lawrence Tsang wrote:
>
>> As a newbie of Apache POI, I use the "org.apache.poi.hwpf.Word2Forrest"
>> class to extract text in a MS Word 2003 document.
>
> I wouldn't recommend using that class for text extraction, unless you
> really need it to come out in the Forrest format
>
> Instead, you should use one of:
> * org.apache.poi.hwpf.extractor.WordExtractor
> * org.apache.poi.hwpf.converter.WordToTextConverter (or HTML or Fo)
> * Apache Tika
>
> Depending on if you want plain text, clean html, HTML with full document
> stylings etc
>
> Nick
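
For anyone finding this thread later, here is a minimal sketch of the WordExtractor route suggested above. It assumes the POI core and scratchpad (HWPF) jars are on the classpath; the class name WordTextDemo, the helper extractText and the default file name sample.doc are made up for illustration and are not from the original posts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hwpf.extractor.WordExtractor;

// Hypothetical demo class: extracts plain text from a Word 97-2003 (.doc) file.
public class WordTextDemo {

    // Hypothetical helper: reads the file at docFile and returns its text.
    static String extractText(Path docFile) throws IOException {
        // The input stream is closed by try-with-resources. Newer POI
        // releases also let you close the extractor itself; that is left
        // out here so the sketch does not depend on a particular version.
        try (InputStream in = Files.newInputStream(docFile)) {
            WordExtractor extractor = new WordExtractor(in);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // "sample.doc" is only a placeholder default.
        Path docFile = Paths.get(args.length > 0 ? args[0] : "sample.doc");
        System.out.println(extractText(docFile));
    }
}
```

If you need the text paragraph by paragraph rather than as one string, WordExtractor also offers getParagraphText(), which returns a String array.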
