[ https://issues.apache.org/jira/browse/TIKA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1713:
------------------------------
    Comment: was deleted

(was: Y. Figured as much. 

Got it.

Oh, Symantec EV... That helps. I might be able to find one of those.

Thank you, again, for raising this issue and submitting a mock test rtf file.

The fix is non-trivial so it may take a few weeks, but it will be good to add 
the ability to handle this type of file.  Thank you!)

> RTF parser misses text content 
> -------------------------------
>
>                 Key: TIKA-1713
>                 URL: https://issues.apache.org/jira/browse/TIKA-1713
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Mike Cantrell
>            Assignee: Tim Allison
>         Attachments: no-text.rtf
>
>
> We have a lot of Outlook msg files that have RTF body content. Tika is not 
> finding any text within these messages. It appears to be a mixture of RTF and 
> HTML.
> I've extracted an example RTF body (see attachment) for use with the 
> following test case:
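> {code}
> import static org.junit.Assert.assertTrue
> import org.apache.tika.metadata.Metadata
> import org.apache.tika.parser.ParseContext
> import org.apache.tika.parser.rtf.RTFParser
> import org.apache.tika.sax.BodyContentHandler
> import org.apache.tika.sax.EmbeddedContentHandler
>
> def rtfParser = new RTFParser()
> ByteArrayOutputStream bytes = new ByteArrayOutputStream()
> // no-text.rtf is the file attached to this issue
> rtfParser.parse(
>         this.class.getResourceAsStream("/problems/no-text.rtf"),
>         new EmbeddedContentHandler(new BodyContentHandler(bytes)),
>         new Metadata(), new ParseContext()
> )
> assertTrue("Document is missing required text", bytes.toByteArray().length > 0)
> {code}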
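The reporter notes that the body "appears to be a mixture of RTF and HTML". Outlook often stores HTML mail bodies as RTF that encapsulates the original HTML, and such files are flagged by the `\fromhtml1` control word in the RTF header (see Microsoft's [MS-OXRTFEX] specification). Whether `no-text.rtf` is of this kind is not confirmed here. The following is a minimal Java sketch of a heuristic check for that marker, not Tika's code and not the eventual fix. The class name `EncapsulatedHtmlCheck` and the constant `HEADER_SCAN_LIMIT` are hypothetical. It is also assumed that the marker appears within the first 1024 characters. Because it is a regex over raw text rather than a real RTF tokenizer, it can be fooled by escaped literal text.

```java
public class EncapsulatedHtmlCheck {

    // Hypothetical limit: assumes \fromhtml1 appears near the start of the
    // file, so only this many leading characters are scanned.
    static final int HEADER_SCAN_LIMIT = 1024;

    // Matches the control word "\fromhtml" with numeric parameter 1, and
    // rejects a longer parameter such as "\fromhtml10".
    private static final java.util.regex.Pattern FROM_HTML =
            java.util.regex.Pattern.compile("\\\\fromhtml1(?![0-9])");

    // Heuristic only: true if the scanned prefix contains \fromhtml1.
    static boolean isEncapsulatedHtml(String rtf) {
        String head = rtf.substring(0, Math.min(rtf.length(), HEADER_SCAN_LIMIT));
        return FROM_HTML.matcher(head).find();
    }

    public static void main(String[] args) {
        String htmlRtf = "{\\rtf1\\ansi\\ansicpg1252\\fromhtml1 \\deff0{\\fonttbl{\\f0 Arial;}}}";
        String plainRtf = "{\\rtf1\\ansi\\deff0 Hello}";
        System.out.println(isEncapsulatedHtml(htmlRtf));  // true
        System.out.println(isEncapsulatedHtml(plainRtf)); // false
    }
}
```

A check like this can only tell a caller that the file probably carries encapsulated HTML. Actually extracting the text would still require handling the HTML fragments stored in the RTF.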



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
