Re: Parsing "hyperlinks", "equations" and "graphs" in MS Word 2003 document

Lawrence Tsang Thu, 02 Feb 2012 17:37:30 -0800

WordExtractor works. Thanks Nick.
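For reference, here is a minimal sketch of plain-text extraction with `WordExtractor`, one of the classes Nick recommends in the quoted reply. It assumes the `poi-scratchpad` jar (which contains HWPF) is on the classpath. It also assumes a POI version recent enough that `WordExtractor` is `Closeable`, so it can sit in a try-with-resources block. The class name `DocTextDump` and the helper `extractText` are hypothetical names chosen for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

// Hypothetical example class: dumps the plain text of a binary .doc file.
public class DocTextDump {

    // Hypothetical helper: opens a Word 97-2003 (.doc) file and returns its text.
    // Assumes a POI version where WordExtractor can be closed via try-with-resources.
    static String extractText(String path) throws IOException {
        try (InputStream in = new FileInputStream(path);
             WordExtractor extractor = new WordExtractor(in)) {
            // getText() returns the document body as a single string
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocTextDump <file.doc>");
            return;
        }
        System.out.println(extractText(args[0]));
    }
}
```

Usage: `java DocTextDump report.doc` prints the document text. If you need HTML rather than plain text, the converter classes or Apache Tika listed below are the better fit.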

On Thu, Feb 2, 2012 at 10:50 PM, Nick Burch <[email protected]> wrote:


> On Thu, 2 Feb 2012, Lawrence Tsang wrote:
>
>> As a newbie to Apache POI, I use the "org.apache.poi.hwpf.Word2Forrest"
>> class to extract text from an MS Word 2003 document.
>>
>
> I wouldn't recommend using that class for text extraction unless you
> really need the output in the Forrest format.
>
> Instead, you should use one of:
>  * org.apache.poi.hwpf.extractor.WordExtractor
>  * org.apache.poi.hwpf.converter.WordToTextConverter (or HTML or Fo)
>  * Apache Tika
>
> depending on whether you want plain text, clean HTML, or HTML with the
> full document styling.
>
> Nick
>
