On Thu, 2 Feb 2012, Lawrence Tsang wrote:
> As a newbie to Apache POI, I use the "org.apache.poi.hwpf.Word2Forrest"
> class to extract text from a MS Word 2003 document.
I wouldn't recommend using that class for text extraction unless you
really need the output in the Forrest format.
Instead, you should use one of:
* org.apache.poi.hwpf.extractor.WordExtractor
* org.apache.poi.hwpf.converter.WordToTextConverter (or WordToHtmlConverter / WordToFoConverter)
* Apache Tika
Which one to pick depends on whether you want plain text, clean HTML, or
HTML with the full document styling.
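As a starting point for the first option, here is a minimal sketch that dumps the plain text of a .doc file with WordExtractor. It assumes POI 5.x with the poi-scratchpad module (where HWPF lives) on the classpath; the class name WordTextDump and the helper joinParagraphs are hypothetical, made up for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.extractor.WordExtractor;

/**
 * Sketch: print the plain text of a binary Word 97-2003 (.doc) file.
 * Assumes POI 5.x, where WordExtractor is AutoCloseable.
 */
public class WordTextDump {

    /** Hypothetical helper: trims trailing whitespace, drops empty paragraphs, joins with '\n'. */
    static String joinParagraphs(String[] paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String p : paragraphs) {
            // Word paragraphs usually end in "\r" or "\r\n"; remove that and any trailing spaces
            String trimmed = p.replaceAll("\\s+$", "");
            if (!trimmed.isEmpty()) {
                kept.add(trimmed);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: WordTextDump <file.doc>");
            System.exit(1);
        }
        // Both the stream and the extractor are closed automatically
        try (InputStream in = new FileInputStream(args[0]);
             WordExtractor extractor = new WordExtractor(in)) {
            // getParagraphText() returns one String per paragraph
            System.out.println(joinParagraphs(extractor.getParagraphText()));
        }
    }
}
```

Run it as `java WordTextDump report.doc`. Note that HWPF only handles the binary .doc format; .docx files go through POI's XWPF classes instead.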
Nick