Re: Limitations of iWork parsing

2012-04-25 Thread Nick Burch
On Tue, 24 Apr 2012, Gabriel Valencia wrote: We have been using Tika to parse iWork files, but have found many limitations and potential bugs. At the moment, there aren't any suitable iWork file format libraries in Java for us to use. So, what we do have has come from community contributio

Re: Limitations of iWork parsing

2012-04-24 Thread Mattmann, Chris A (388J)
Hi Gabriel, Thanks for bringing these issues to light. We would appreciate you filing issues in our JIRA issue tracker: http://issues.apache.org/jira/browse/TIKA And if you are able to attach sample files that we can use to reproduce what you are seeing, that would be great. The Tika version yo

Limitations of iWork parsing

2012-04-24 Thread Gabriel Valencia
Hi all, We have been using Tika to parse iWork files, but have found many limitations and potential bugs. Here is a sampling:
* Things like header and footer text and embedded text boxes are not parsed.
* Pages docs created in Layout mode are not parsed at all. Only the metadata is extracted.
*