On Thu, 1 Nov 2012, Chris Bamford wrote:
I would like to know how to fine-tune the process.

The simplest way is just to write some code yourself that loops over the sheets / rows / columns and performs the exact cell -> text transformation that your business rules need.
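As a rough illustration of that kind of loop, here is a minimal sketch in plain Java. It deliberately does not call POI: it assumes the cell values have already been read into ordinary Java objects (Double, Boolean, String, or null for a blank cell). The class name CellTextSketch, its method names, and the formatting rules themselves are all made up for the example, so swap in whatever your own business rules require.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of Apache POI.
final class CellTextSketch {

    // Example business rules (assumptions, replace with your own):
    //  - blank cells become an empty string
    //  - booleans become TRUE / FALSE
    //  - numbers are printed without a trailing ".0" and without scientific notation
    //  - everything else is converted with toString() and trimmed
    static String cellToText(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? "TRUE" : "FALSE";
        }
        if (value instanceof Double) {
            double d = (Double) value;
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                return String.valueOf(d);
            }
            return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
        }
        return value.toString().trim();
    }

    // One row becomes one tab-separated line.
    static String rowToText(List<?> row) {
        StringBuilder sb = new StringBuilder();
        for (int col = 0; col < row.size(); col++) {
            if (col > 0) {
                sb.append('\t');
            }
            sb.append(cellToText(row.get(col)));
        }
        return sb.toString();
    }

    // One sheet becomes one line per row.
    static String sheetToText(List<? extends List<?>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<?> row : rows) {
            sb.append(rowToText(row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; in real code these values would come from the spreadsheet.
        List<List<Object>> rows = new ArrayList<>();
        rows.add(Arrays.<Object>asList("Item", "Qty", "In stock"));
        rows.add(Arrays.<Object>asList(" Widget ", 3.0, true));
        rows.add(Arrays.<Object>asList("Gadget", 2.5, null));
        System.out.print(sheetToText(rows));
    }
}
```

With usermodel, the values passed to cellToText would typically come from the Cell getters such as getNumericCellValue(), getStringCellValue() and getBooleanCellValue(), chosen according to the cell's type.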

If you have some memory to burn, and can use usermodel, then the code is pretty easy to write. Take a look at ExcelExtractor for an example:
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/ExcelExtractor.java

If you want to do it in a very low memory way, the coding can be a little more involved. Take a look at things like
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java
and
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java

Finally, if you think that one of the text extractors that ships with Apache POI (or Apache Tika!) is doing something wrong for some cells, raise a bug and upload a sample file that shows the problem.

Nick
