Thanks Nick!

The EventBasedExcelExtractor looks very good indeed.
A quick question on it, if I may.  If I use it to extract from an Excel doc 
with setFormulasNotResults(true) I get:

Exception in thread "main" java.lang.RuntimeException: Coding Error: Expected 
ExpPtg to be converted from Shared to Non-Shared Formula by 
ValueRecordsAggregate, but it wasn't

which I see is listed as a bug here:

  https://issues.apache.org/bugzilla/show_bug.cgi?id=52158

Now this spreadsheet was created by a customer with Excel, so we have no 
control over how it was created, nor can I provide it to you.

However, if I call setFormulasNotResults(false) it runs OK and I see numbers
being displayed, not strings like "D1+C3".  Is this what you'd expect?
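
One way to cope with this until the bug is fixed might be to ask for formulas first and fall back to the cached results only when POI throws. This is a minimal sketch, not a tested fix. The class and method names (FormulaFallback, extract, extractPreferringFormulas) are made up for illustration, and it assumes the failure surfaces as a RuntimeException from getText(), as the trace above suggests. The file is held in memory as a byte array so it can be re-read for the second attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.apache.poi.hssf.extractor.EventBasedExcelExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class FormulaFallback {

    /** Extracts text from .xls bytes, as formula text or as cached results. */
    static String extract(byte[] xls, boolean formulasNotResults) throws IOException {
        // Re-parse from the byte array on every call so each attempt starts clean.
        // Nothing is closed here because the data is purely in memory.
        POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(xls));
        EventBasedExcelExtractor extractor = new EventBasedExcelExtractor(fs);
        extractor.setFormulasNotResults(formulasNotResults);
        return extractor.getText();
    }

    /** Hypothetical helper: prefer formulas, fall back to results on failure. */
    static String extractPreferringFormulas(byte[] xls) throws IOException {
        try {
            return extract(xls, true);
        } catch (RuntimeException e) {
            // Assumption: the shared-formula problem is reported as a RuntimeException.
            // Any other RuntimeException also ends up here, so log it in real code.
            return extract(xls, false);
        }
    }
}
```

The fallback is all-or-nothing: if a single formula cannot be rendered, the whole document is extracted as results rather than formulas.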

Thanks,

- Chris


On 1 Nov 2012, at 14:33, Nick Burch wrote:

On Thu, 1 Nov 2012, Chris Bamford wrote:
I would like to know how to fine-tune the process

The simplest way is just to write some code yourself, which loops over the 
sheets / rows / columns, and performs the exact cell -> text transformation 
that your business rules need.

If you have some memory to burn, and can use usermodel, then the code is pretty
easy to write. Take a look at ExcelExtractor for an example:
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/ExcelExtractor.java
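
As a rough illustration of the usermodel approach, a minimal sketch might look like the following. It is not the ExcelExtractor implementation. The class name SimpleSheetDump and the dump method are hypothetical, and the tab/newline layout is just one possible choice. It relies on DataFormatter to turn each cell into text, either as the formula string or as the evaluated result.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class SimpleSheetDump {

    /** Walks sheets, rows and cells, emitting one tab-separated line per row. */
    static String dump(Workbook wb, boolean formulasNotResults) {
        DataFormatter formatter = new DataFormatter();
        FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            Sheet sheet = wb.getSheetAt(i);
            text.append(sheet.getSheetName()).append('\n');
            for (Row row : sheet) {          // only rows that exist in the file
                boolean first = true;
                for (Cell cell : row) {      // only cells that exist; gaps are skipped
                    if (!first) {
                        text.append('\t');
                    }
                    first = false;
                    // Without an evaluator, a formula cell yields its formula string;
                    // with one, it yields the formatted evaluated result.
                    text.append(formulasNotResults
                            ? formatter.formatCellValue(cell)
                            : formatter.formatCellValue(cell, evaluator));
                }
                text.append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            Workbook wb = new HSSFWorkbook(in);
            System.out.print(dump(wb, false));
        }
    }
}
```

This is where business rules would be applied: replace the formatCellValue calls with whatever per-cell transformation is needed.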

If you want to do it in a very low-memory way, the coding can be a little bit
more involved. Take a look at things like
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java
and
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java

Finally, if you think that one of the text extractors that ships with Apache
POI (or Apache Tika!) is doing something wrong for some cells, raise a bug and
upload a sample file that shows the problem.

Nick



Chris Bamford
Senior Developer


