On Thu, 1 Nov 2012, Chris Bamford wrote:
> I would like to know how to fine tune the process
The simplest way is just to write some code yourself, which loops over the
sheets / rows / columns, and performs the exact cell -> text
transformation that your business rules need.
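As a rough sketch of the shape of that loop (this is not POI code: plain
Java collections stand in for the workbook so that it runs without the
library, and SheetTextSketch, cellToText and extract are made-up names).
With usermodel you would walk the real Workbook / Sheet / Row / Cell
objects instead, and the rules in cellToText are only examples of the kind
of thing your business rules might want:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SheetTextSketch {

    // Hypothetical business rule: turn one cell value into text.
    public static String cellToText(Object value) {
        if (value == null) {
            return "";                              // blank cell
        }
        if (value instanceof Double) {
            double d = (Double) value;
            // Print whole numbers without a trailing ".0"
            if (d == Math.rint(d) && Math.abs(d) < 1e15) {
                return String.valueOf((long) d);
            }
            return String.valueOf(d);
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? "TRUE" : "FALSE";
        }
        return value.toString().trim();             // strings and anything else
    }

    // Loop over sheets / rows / columns: sheet name first, then one
    // tab-separated line per row.
    public static String extract(Map<String, List<List<Object>>> workbook) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<List<Object>>> sheet : workbook.entrySet()) {
            out.append(sheet.getKey()).append('\n');
            for (List<Object> row : sheet.getValue()) {
                List<String> cells = new ArrayList<>();
                for (Object value : row) {
                    cells.add(cellToText(value));
                }
                out.append(String.join("\t", cells)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in data: sheet name -> rows -> cell values
        Map<String, List<List<Object>>> workbook = new LinkedHashMap<>();
        workbook.put("Sheet1", Arrays.asList(
                Arrays.<Object>asList("Name", "Qty", "Shipped"),
                Arrays.<Object>asList(" Widget ", 3.0, true),
                Arrays.<Object>asList("Gadget", 2.5, null)));
        System.out.print(extract(workbook));
    }
}
```

Keeping the per-cell rule in its own method means you can change how
numbers, booleans or blanks come out without touching the loop.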
If you have some memory to burn, and can use usermodel, then the code is
pretty easy to write. Take a look at ExcelExtractor for an example:
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/ExcelExtractor.java
If you want to do it in a very low memory way, the coding can be a little
bit more involved. Take a look at things like
http://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java
and
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java
Finally, if you think that one of the text extractors that ship with
Apache POI (or Apache Tika!) is doing something wrong for some cells,
raise a bug and upload a sample file that shows the problem.
Nick