Thanks Nick! The EventBasedExcelExtractor looks very good indeed. A quick question on it, if I may. If I use it to extract from an Excel doc with setFormulasNotResults(true) I get:
Exception in thread "main" java.lang.RuntimeException: Coding Error: Expected ExpPtg to be converted from Shared to Non-Shared Formula by ValueRecordsAggregate, but it wasn't

which I see is listed as a bug here: https://issues.apache.org/bugzilla/show_bug.cgi?id=52158

Now this spreadsheet was created by a customer with Excel, so we have no control over how it was created, nor can I provide it to you. However, if I setFormulasNotResults(false) it runs OK and I see numbers being displayed, not Strings like "D1+C3". Is this what you'd expect?

Thanks,

- Chris

On 1 Nov 2012, at 14:33, Nick Burch wrote:

On Thu, 1 Nov 2012, Chris Bamford wrote:

I would like to know how to fine tune the process

The simplest way is just to write some code yourself, which loops over the sheets / rows / columns, and performs the exact cell -> text transformation that your business rules need.

If you have some memory to burn, and can use usermodel, then the code is pretty easy to write. Take a look at ExcelExtractor for an example
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/ExcelExtractor.java

If you want to do it in a very low memory way, the coding can be a little bit more involved, take a look at things like
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java
and
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java

Finally, if you think that one of the text extractors that ships with Apache POI (or Apache Tika!)
is doing something wrong for some cells, raise a bug and upload a sample file that shows the problem

Nick
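
For anyone following the thread, here is a rough sketch of the formulas-versus-results choice discussed above, written as the kind of per-cell "cell -> text" rule you would put inside your own loop. It deliberately does not use POI: SheetCell, FormulaTextDemo, toText and rowToText are hypothetical names invented for illustration, and the tab separator and number formatting are assumptions, not what the shipped extractors do. It only shows why one setting yields text like "D1+C3" and the other yields the stored numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a spreadsheet cell; this is NOT a POI class.
final class SheetCell {
    final String formula;     // formula text such as "D1+C3", or null for a plain value
    final double cachedValue; // the number stored in the file for this cell

    SheetCell(String formula, double cachedValue) {
        this.formula = formula;
        this.cachedValue = cachedValue;
    }
}

final class FormulaTextDemo {
    // Render one cell: formula text when requested and available, otherwise the stored number.
    static String toText(SheetCell cell, boolean formulasNotResults) {
        if (cell.formula != null && formulasNotResults) {
            return cell.formula;
        }
        return formatNumber(cell.cachedValue);
    }

    // Print whole numbers without a trailing ".0" (an assumed formatting rule).
    static String formatNumber(double v) {
        if (v == Math.rint(v) && !Double.isInfinite(v)) {
            return String.valueOf((long) v);
        }
        return String.valueOf(v);
    }

    // Join the rendered cells of one row with tabs.
    static String rowToText(List<SheetCell> row, boolean formulasNotResults) {
        List<String> parts = new ArrayList<>();
        for (SheetCell cell : row) {
            parts.add(toText(cell, formulasNotResults));
        }
        return String.join("\t", parts);
    }

    public static void main(String[] args) {
        List<SheetCell> row = Arrays.asList(
                new SheetCell(null, 2),
                new SheetCell(null, 3.5),
                new SheetCell("D1+C3", 5.5));
        System.out.println(rowToText(row, true));  // formula text for the third cell
        System.out.println(rowToText(row, false)); // stored numbers only
    }
}
```

In a real extractor the same decision would sit inside the loop over sheets, rows and cells, with the cell data coming from POI rather than from a hand-built list.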
