https://issues.apache.org/bugzilla/show_bug.cgi?id=46568
Summary: PPTX text extraction works incorrectly, spaces and line
breaks removed in some cases
Product: POI
Version: 3.5-dev
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: critical
Priority: P2
Component: POI Overall
AssignedTo: [email protected]
ReportedBy: [email protected]
The PPTX issue manifests itself when a document is being decomposed and
searched for a string. For some reason, some whitespace and line breaks
(carriage returns) are being deleted during text extraction.
Because of this, a word such as "Friday" can end up concatenated with a
neighbouring string (such as "otherFriday"), and a simple string match for it
will fail. Note that a regular expression match will work, however. This
behavior has been observed in 3 of 8 randomly selected PPTX files downloaded
from the internet. However, document identification seems to work just fine,
so the only way that someone using the new POI engine would be affected is if
they were decomposing attachments and searching for a simple string in them
(and they would only be affected on PowerPoint 2007 documents). As noted above,
regular expression matching is a workaround that could be employed.
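
To illustrate the symptom without needing one of the affected files, here is a
minimal sketch in plain Java. It does not call POI at all. The class name, the
method names and the sample strings are hypothetical, and it assumes that the
"simple string match" is a whole-word comparison on whitespace-separated
tokens, which may not be exactly how the calling application matches.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of POI.
class WhitespaceLossDemo {

    // Assumed "simple" matcher: split on whitespace and compare whole tokens.
    static boolean containsWord(String text, String word) {
        return Arrays.asList(text.split("\\s+")).contains(word);
    }

    // Regex matcher: finds the pattern anywhere in the text.
    static boolean regexFind(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Made-up sample text: what the slide should yield, and what it
        // yields once the line break before "Friday" has been dropped.
        String expected = "See you some other\nFriday";
        String extracted = "See you some otherFriday";

        System.out.println("word match, expected text:  " + containsWord(expected, "Friday"));
        System.out.println("word match, extracted text: " + containsWord(extracted, "Friday"));
        System.out.println("regex find, extracted text: " + regexFind(extracted, "Friday"));
    }
}
```

Under that assumption the word match succeeds on the expected text and fails
on the extracted text, while the regular expression still finds "Friday"
inside "otherFriday".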