On Tue, 24 Apr 2012, Gabriel Valencia wrote:
We have been using Tika to parse iWork files, but have found many
limitations and potentially bugs.
At the moment, there aren't any suitable iWork file format libraries in
Java for us to use. So, what we do have has come from community
contributio
Hi Gabriel,
Thanks for bringing these issues to light.
We would appreciate you filing issues in our JIRA issue tracker:
http://issues.apache.org/jira/browse/TIKA
If you are able to attach sample files that we can use to reproduce
what you are seeing, that would be great. The Tika version yo
Hi all,
We have been using Tika to parse iWork files, but have found many
limitations and potentially bugs. Here is a sampling:
* Things like header and footer text and embedded text boxes are not
parsed.
* Pages docs created in Layout mode are not parsed at all. Only the
metadata is extracted.
*