[ https://issues.apache.org/jira/browse/TIKA-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arjohn Kampman updated TIKA-781:
--------------------------------
Description: The RTF parser should ignore control words like \par, \line
and \tab when these occur in groups where ignore==true. This greatly improves
the layout and readability of extracted text. The testRTFHyperlink.rtf file
from the test documents is a good example of this. (was: RTFTextExtractor
should ignore control words like \par, \line and \tab when these occur in
groups where ignore==true. This greatly improves the layout and readability of
extracted text. The testRTFHyperlink.rtf file from the test documents is a good
example of this.)
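A minimal Java sketch of the requested behaviour, assuming a hand-rolled tokenizer. The class `IgnoreGroupSketch`, its `extract` method and the `IGNORED_DESTINATIONS` set are hypothetical and are not part of Tika's RTFParser API. Each `{` saves the current ignore flag and each `}` restores it. `\*` or a known non-text destination switches the flag on for the rest of the group, and `\par`, `\line` and `\tab` are emitted as line breaks and tabs only while the flag is off. The set of ignored destinations is an assumed subset, and `\u`, `\bin` and most other RTF features are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Hypothetical sketch (not Tika's RTFParser): drop layout control words in ignore groups. */
public class IgnoreGroupSketch {

    // Assumed subset of destinations whose content is not document text.
    private static final Set<String> IGNORED_DESTINATIONS =
            Set.of("fonttbl", "colortbl", "stylesheet", "info");

    public static String extract(String rtf) {
        StringBuilder out = new StringBuilder();
        Deque<Boolean> stack = new ArrayDeque<>();
        boolean ignore = false;
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '{') {
                stack.push(ignore);                              // child group inherits the flag
                i++;
            } else if (c == '}') {
                ignore = stack.isEmpty() ? false : stack.pop();  // restore the parent's flag
                i++;
            } else if (c == '\\') {
                i++;
                if (i >= rtf.length()) break;
                char next = rtf.charAt(i);
                if (Character.isLetter(next)) {
                    int start = i;
                    while (i < rtf.length() && Character.isLetter(rtf.charAt(i))) i++;
                    String word = rtf.substring(start, i);
                    // Optional signed numeric parameter, e.g. \fs24 or \li-360.
                    if (i < rtf.length() && (rtf.charAt(i) == '-' || Character.isDigit(rtf.charAt(i)))) {
                        i++;
                        while (i < rtf.length() && Character.isDigit(rtf.charAt(i))) i++;
                    }
                    // One space after a control word is a delimiter, not text.
                    if (i < rtf.length() && rtf.charAt(i) == ' ') i++;
                    if (IGNORED_DESTINATIONS.contains(word)) {
                        ignore = true;
                    } else if (!ignore) {
                        // The point of the issue: layout words count only outside ignore groups.
                        if (word.equals("par") || word.equals("line")) out.append('\n');
                        else if (word.equals("tab")) out.append('\t');
                    }
                } else if (next == '*') {
                    ignore = true;                               // \* marks an ignorable destination
                    i++;
                } else if (next == '\'' && i + 2 < rtf.length()) {
                    // \'hh hex escape, decoded as Latin-1 (assumption).
                    int hi = Character.digit(rtf.charAt(i + 1), 16);
                    int lo = Character.digit(rtf.charAt(i + 2), 16);
                    if (!ignore && hi >= 0 && lo >= 0) out.append((char) (hi * 16 + lo));
                    i += 3;
                } else {
                    // Escaped \\ \{ \} are literal; other control symbols are skipped here.
                    if (!ignore && (next == '\\' || next == '{' || next == '}')) out.append(next);
                    i++;
                }
            } else {
                // Raw CR/LF in RTF source are not content.
                if (!ignore && c != '\r' && c != '\n') out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Field structure similar to a hyperlink; the \par inside \fldinst is dropped.
        String rtf = "{\\rtf1{\\fonttbl{\\f0 Arial;}}"
                + "Click {\\field{\\*\\fldinst {HYPERLINK \"http://example.com\"\\par}}"
                + "{\\fldrslt here}}.\\par Done}";
        System.out.print(extract(rtf)); // prints "Click here." then "Done" on the next line
    }
}
```

Without the ignore check, the `\par` inside the `\fldinst` group would add a stray line break before "here", which is the kind of layout noise the issue describes.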
> RTF parser should ignore most control words in ignore groups
> ------------------------------------------------------------
>
> Key: TIKA-781
> URL: https://issues.apache.org/jira/browse/TIKA-781
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.0
> Reporter: Arjohn Kampman
> Priority: Minor
> Attachments: tika781.patch
>
>
> The RTF parser should ignore control words like \par, \line and \tab when
> these occur in groups where ignore==true. This greatly improves the layout
> and readability of extracted text. The testRTFHyperlink.rtf file from the
> test documents is a good example of this.
--
This message is automatically generated by JIRA.