I added a change to the poi-ooxml-lite Gradle file to force the inclusion of the
107 extra XSBs.

poi-ooxml-lite 5.0.0 - XSB count=1498

poi-ooxml-lite latest snapshot - XSB count=1220

No 5.0.0 XSBs are missing from the snapshot.
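
A comparison like the one above could be sketched roughly as follows. This is only an illustration built on `java.util.zip`, not the script actually used for the numbers above; the class name `XsbDiff` and its helper methods are made up for this example, and the two jar paths are passed in as arguments.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists .xsb entries in two jars and reports
// which entries of the baseline jar are absent from the candidate jar.
public class XsbDiff {

    /** Collects the names of all entries ending in ".xsb" in the given jar. */
    static Set<String> xsbEntries(Path jar) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.stream()
                    .map(ZipEntry::getName)
                    .filter(name -> name.endsWith(".xsb"))
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    /** Returns the names present in baseline but absent from candidate. */
    static Set<String> missingFrom(Set<String> baseline, Set<String> candidate) {
        Set<String> missing = new TreeSet<>(baseline);
        missing.removeAll(candidate);
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: XsbDiff <baseline.jar> <candidate.jar>");
            return;
        }
        Set<String> baseline = xsbEntries(Paths.get(args[0]));
        Set<String> candidate = xsbEntries(Paths.get(args[1]));
        System.out.println("baseline XSB count=" + baseline.size());
        System.out.println("candidate XSB count=" + candidate.size());
        // Print every baseline entry that the candidate jar does not contain.
        missingFrom(baseline, candidate)
                .forEach(name -> System.out.println("missing: " + name));
    }
}
```

Run with the 5.0.0 jar as the first argument and the snapshot jar as the second; no "missing:" lines means every XSB from the baseline is still present.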

Is there anything else blocking an RC2? If not, I can see about creating the
artifacts.

On Thursday 21 October 2021, 07:04:42 IST, Dominik Stadler
<dominik.stad...@gmx.at> wrote:

Hi,

The re-run with "full" is now available at
http://people.apache.org/~centic/poi_regression/reports/index500RC2to510RC2.html

1089 failed files compared to 5.0.0, mostly XDDF, only a few others.

Also, 300 files now fail with some "invalid ZIP" error; it seems Apache Commons
Compress is a bit more "picky" about some half-broken files than the Java Zip
classes, but there is likely not much we can do.

BTW, 1377 files that failed before now work fine, so overall we already
improved things!

Thanks... Dominik.




On Tue, Oct 19, 2021 at 6:59 PM Tim Allison <talli...@apache.org> wrote:

> Apologies for being absent... The xsb issue is why we haven't upgraded
> to 5.x on Tika yet.  I _think_ we'd like to avoid the ooxml-full jar,
> but if that's the most robust option, we'll have to go with that.
>
> I'm also happy to grab new files, or run against our corpus if that'd
> be of any use.
>
> Many thanks!
>
>    Best,
>
>          Tim
>
> On Tue, Oct 19, 2021 at 11:57 AM Dominik Stadler <dominik.stad...@gmx.at>
> wrote:
> >
> > Hi,
> >
> > You are right, we discussed the reasons before, I am sorry.
> >
> > LG... Dominik.
> >
> > On Mon, Oct 18, 2021 at 9:06 PM Andreas Beeker <kiwiwi...@apache.org>
> > wrote:
> >
> > > Hi Dominik,
> > >
> > > > > Would be nice to know what changes made them required now as the
> > > > > documents themselves did not change.
> > >
> > > > I think you know, but to be safe, I'll mention it again.
> > > > Originally only the classes were filtered, but a while ago I also
> > > > added the .xsb filtering to minimize the lite jar further,
> > > > because the .xsbs are a substantial part of the jar.
> > >
> > > > I could iterate through all document parts, try to see if those parts
> > > > are parsed, and check the structure recursively.
> > >
> > > > On the other hand, I've optimized the xmlbeans schema generation, so the
> > > > schema shrank from 19 MB to 13 MB.
> > > > There would be the option to split the schema jar into the sub-schemas,
> > > > including the necessary dependencies on the base/abstract schemas, but I
> > > > guess this would be even more confusing for the user base.
> > >
> > > > Regarding the 107 documents, it's fairly easy to add the documents locally
> > > > and see which deltas, i.e. not yet loaded .xsbs, are added from file to
> > > > file, to identify the files which need to be added to the integration
> > > > directory.
> > >
> > > Andi.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> > > For additional commands, e-mail: dev-h...@poi.apache.org
> > >
> > >
>
