On the side question: there are too many applications that read Excel
format files to know how they all behave. That is why, on your PR, I
suggested allowing SXSSF as an option rather than forcing all users to
use SXSSF.

I don't want to get too involved in this. Trial and error is the only
way forward with POI-related development.

POI is released infrequently, so any PR may take quite some time to
appear in a released jar.

If you submit PRs to POI or to excel-streaming-reader, I'll have a
look. Other POI contributors may also have a look. I'm busy on other
projects, so I'm not super enthusiastic about spending much time on
this. PRs will need test cases and must not break existing APIs (no
matter how annoying those APIs are; they can be deprecated but not
broken or removed).
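
As a rough illustration of the "deprecate, don't break" rule, here is
a made-up sketch. It is not POI code: RowCopier, its copyRows methods
and the skipBlank flag are hypothetical names, and rows are plain
strings so the example stays self-contained. The old signature stays
in place, is marked deprecated, and delegates to the new one with the
old behaviour as the default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT a POI class: shows how to evolve an API
// without breaking existing callers.
final class RowCopier {

    private RowCopier() {
    }

    /**
     * Original signature, kept so that existing callers still compile
     * and see the same results as before.
     *
     * @deprecated use {@link #copyRows(Iterable, int, boolean)} instead.
     */
    @Deprecated
    static List<String> copyRows(List<String> srcRows, int destStartRow) {
        // Delegate with the old behaviour as the default (blank rows are kept).
        return copyRows(srcRows, destStartRow, false);
    }

    /**
     * New signature: accepts any Iterable, so callers are not forced to
     * hold all source rows in a List, and adds an opt-in flag.
     */
    static List<String> copyRows(Iterable<String> srcRows, int destStartRow, boolean skipBlank) {
        List<String> out = new ArrayList<>();
        int destRow = destStartRow;
        for (String row : srcRows) {
            if (skipBlank && row.isEmpty()) {
                continue; // new, opt-in behaviour only
            }
            // Record the destination row index alongside the copied value.
            out.add(destRow + ":" + row);
            destRow++;
        }
        return out;
    }
}
```

Existing callers of the two-argument method only get a deprecation
warning; the new behaviour is reached solely through the new overload.
The same pattern would apply to any new SXSSF copy method that sits
alongside an existing one.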

On Wed, 2 Jul 2025 at 15:52, Piotr Zalas <pza...@onet.pl.invalid> wrote:
>
> Hello Devs,
>
> I'm implementing a change in Apache NiFi that optimises the memory usage of 
> copying an Excel sheet. We use com.github.pjfanning/excel-streaming-reader 
> for reading Excel files, and Apache POI for writing the output file. In the PR 
> (https://github.com/apache/nifi/pull/10058/files) I got a suggestion to include 
> some of the code in the POI project:
> 1. To add an SXSSFRow#copyRowFrom(Row srcRow, CellCopyPolicy policy, 
> CellCopyContext context) method, similar to the method available in XSSFRow. In 
> addition, classes similar to XSSFRowShifter and XSSFRowColShifter, which are 
> used by the above method, would need to be implemented for SXSSFSheet. A 
> non-trivial part would be to implement XSSFRowColShifter#updateRowFormulas, 
> because it uses CTCell, which isn't available in SXSSFCell. I would be 
> grateful for some implementation tips regarding this method, in particular how 
> to substitute one object for the other in the implementation.
> 2. To add a memory-efficient method similar to XSSFSheet#copyRows(List<? 
> extends Row> srcRows, int destStartRow, CellCopyPolicy policy) to the SXSSFSheet 
> class. Instead of using a list of input rows, I'm thinking of using a Sheet or 
> row iterator to avoid storing all rows in memory. The tricky part here is 
> that I need to use StreamingSheet from excel-streaming-reader for memory 
> efficiency, which doesn't implement many of the Sheet interface methods, and I 
> need to ensure compatibility with such a reader. Perhaps a method 
> cloneSheet(String newSheetName, Sheet sourceSheet) in SXSSFWorkbook would 
> make sense?
>
> Are you ok with implementing some of the above changes in POI? If yes, let me 
> know if there are some adjustments needed to the proposed API contract.
>
> As a side question, the SXSSFWorkbook javadoc mentions that by default use of 
> shared strings is disabled and that this might break some clients trying to 
> read the saved file. Do you have examples of affected clients (e.g. MS Excel, 
> Apple Numbers, Google Sheets import, some widely used library)? I'm trying to 
> understand if migrating away from XSSFWorkbook could break some NiFi users.
>
> Best,
> Piotr

