On Fri, Oct 16, 2020 at 4:24 PM Carl Marcum <cmar...@apache.org> wrote:
> Hi Damjan,
>
> On 10/16/20 9:23 AM, Damjan Jovanovic wrote:
> > On Fri, Oct 16, 2020 at 2:05 PM Dave Fisher <wave4d...@comcast.net> wrote:
> >
> >> Hi -
> >>
> >> Sent from my iPhone
> >>
> >>> On Oct 16, 2020, at 4:04 AM, Mechtilde <o...@mechtilde.de> wrote:
> >>>
> >>> Hello Joost,
> >>>
> >>> I'm very happy to read from you.
> >>>
> >>>> On 16.10.20 at 12:50, Joost Andrae wrote:
> >>>> Hi Simon,
> >>>>
> >>>> it's an honor to me to see a sign of life of you here. Welcome!
> >>>>
> >>>> Instead of user picking here to get users to leave AOO for LO, a
> >>>> developer could create a Java based OOo/LO extension that uses Apache
> >>>> POI to export OpenDocument type documents to MSXML formats by using
> >>>> the binary MSO export to export those documents to the MSXML format
> >>>> in between. Or maybe it's possible to XSL this document format by
> >>>> using OpenOffice together with Apache POI. Using XSL scripts (in AOO
> >>>> menu item XML filter settings) to make document conversions is
> >>>> possible within OOo.
> >>> I offer my help to test the implementation. Sorry, but I'm not a
> >>> programmer. So we as the project need help from Java programmers to
> >>> work on it and contribute it.
> >> I’ve been a PMC Member of Apache POI for over 12 years. My team donated
> >> the initial PowerPoint support and was involved in the initial support
> >> for OOXML.
> >>
> >> POI is embedded into Apache Solr and Tika along with commercial
> >> products. The project took over the dormant XMLBeans project and is
> >> releasing a 4.0 that supports modern Java.
> >>
> >> An OSGi bundle of POI will be available in the next release if you build
> >> from source.
> >>
> >> The Tika, POI, and PDFBox projects maintain a large regression corpus
> >> scraped from the internet using CommonCrawl. I’m sure that this could be
> >> shared in one way or another.
> >>
> >> Regards,
> >> Dave
> >>
> >>
> > Hi
> >
> > I did start writing a POI-based OOXML export filter for AOO some years
> > ago (search the dev mailing list), and got it to the point of being able
> > to save very basic spreadsheets (no formulas, no formatting, just text
> > and numbers).
> >
> > There were several major problems with using POI.
> >
> > Firstly the code in POI is at various stages of completeness. The legacy
> > XLS filter is very good, supports SAX parsing, etc. The DOC filter is
> > minimal and unmaintained. What we would need, the OOXML filter for at
> > least XLSX, is somewhere in between. AFAIK it only supports DOM parsing,
> > meaning everything needs to be in memory before it can be written to
> > disk, so a big spreadsheet could consume gigabytes of RAM during saving,
> > and if you don't have enough memory free, you can't save!
> >
> > Also I do use POI at work, and it's outstanding for parsing spreadsheets
> > (it can even parse some that AOO can't), but it's very memory hungry. A
> > spreadsheet with 100000 rows consumed 6 GB of RAM, compared to 200 MB in
> > LO (30 times less). That isn't really POI's fault, Java has too much
> > per-object overhead and there are a great many objects in a spreadsheet
> > that big. So DOM + Java really do not add up to efficient memory usage.
> > By comparison, our current OOXML reading is not only SAX-based, but
> > converts XML tags to integers for faster comparisons and lower memory
> > usage.
> >
> > Finally AOO itself had limitations that made developing a filter in Java
> > difficult. Each sheet in a spreadsheet has 1 billion cells. Obviously
> > only a minority of these contain data - most are empty. In C++ there are
> > special iterators that can be used to access only the non-empty cells,
> > but these are not exposed to UNO, or through it, to Java.
> > The only way to tell which cells are in use is to iterate over all
> > 1 billion cells (per sheet), which is hopelessly slow.
> >
> > Some of these problems can be solved. We can expose the cell iterators
> > over UNO. The memory usage might not matter that much in practice, and
> > we could patch POI to do SAX parsing/saving at a later stage. But users
> > expect fonts, styles, charts, images, custom formats, OLE, pivot tables,
> > VBA macros, form controls, mathematical formulas, change tracking, etc.
> > all saved losslessly and 100% compatible with Excel, which doesn't only
> > require work in the filter, but in the rest of AOO too, and POI probably
> > doesn't support all of those features either.
> I'm not sure if you've looked at the newer Streaming Usermodel API SXSSF.
> It may help for memory consumption in this case.
>

Can SXSSF work with formulas that reference earlier cells?

> >
> > I might get back into this next month, especially if others want to
> > collaborate, but don't expect something generally usable, let alone
> > Excel-quality XLSX saving, any time soon.
> >
> > Regards
> > Damjan
> >
> Yes I'm definitely interested in collaborating on this.
> Do you have a branch with your work in it?
>

It's been 5 years and the code is in bits and pieces, but I'll try to put
together a working branch over the weekend.

> Thanks,
> Carl
>

Thank you
Damjan
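
To illustrate the point quoted above about iterating only the non-empty cells instead of all 1 billion per sheet, here is a minimal Java sketch. It is not AOO's, UNO's, or POI's actual data model: the class `SparseSheetSketch`, the `CellVisitor` interface, and the method names are all hypothetical, and the grid size (1,048,576 rows by 1,024 columns) is an assumption chosen to match the "1 billion cells" figure.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse sheet; an illustration only, not AOO or POI code. */
final class SparseSheetSketch {
    // Assumed grid size per sheet: 1,048,576 x 1,024 = 1,073,741,824 cells.
    static final long ROWS = 1_048_576L;
    static final long COLS = 1_024L;

    /** Callback invoked once per non-empty cell (hypothetical interface). */
    interface CellVisitor {
        void visit(long row, long col, String value);
    }

    // Only non-empty cells are stored, keyed by row * COLS + col,
    // so the sorted map yields them in row-major order.
    private final TreeMap<Long, String> cells = new TreeMap<>();

    void set(long row, long col, String value) {
        cells.put(row * COLS + col, value);
    }

    /** Number of cells a naive "visit every cell" loop would have to touch. */
    static long totalCells() {
        return ROWS * COLS;
    }

    /** Visits only the stored cells and returns how many were visited. */
    long forEachUsedCell(CellVisitor visitor) {
        long visited = 0;
        for (Map.Entry<Long, String> entry : cells.entrySet()) {
            long key = entry.getKey();
            visitor.visit(key / COLS, key % COLS, entry.getValue());
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        SparseSheetSketch sheet = new SparseSheetSketch();
        sheet.set(0, 0, "header");
        sheet.set(5, 3, "42");
        sheet.set(ROWS - 1, COLS - 1, "last");

        long visited = sheet.forEachUsedCell((row, col, value) ->
                System.out.println(row + "," + col + " = " + value));

        System.out.println("visited " + visited + " of " + totalCells() + " cells");
    }
}
```

The work done by `forEachUsedCell` grows with the number of cells that hold data, not with the size of the grid, which is the property an export filter would need from a cell iterator exposed over UNO.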