Looks like the poi-integration code was changed a few months ago so that the
code that walks the document and loads XmlObjects tries harder to walk the graph.

The svn commit 1888985 labelled with 'TIKA-3164 - add more .xsb and classes 
needed for TIKAs extractor' is the main change.

I definitely think we should consider using the full schema jar by default.
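
To check quickly whether a given .xsb is present in whichever schema jar is on
the classpath, something like the following might help. It is only a minimal
sketch: the class name XsbCheck and the method isOnClasspath are made up for
illustration, and the resource path is the sample one from Dominik's report
quoted below. It only tests that the resource is visible to the class loader;
it does not exercise POI or XmlBeans schema loading.

```java
public class XsbCheck {

    // Hypothetical helper: true if the named resource is visible to the
    // class loader that loaded this class (falls back to the system loader).
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader cl = XsbCheck.class.getClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        // Sample missing xsb taken from the mass-testing report in this thread.
        String xsb = "org/apache/poi/schemas/ooxml/system/ooxml/backgroundelement.xsb";
        System.out.println(xsb + (isOnClasspath(xsb) ? " found" : " MISSING"));
    }
}
```

Running it once with the lite jar and once with the full jar on the classpath
should show whether the entry is present in each.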






On Monday 18 October 2021, 18:04:54 IST, Dominik Stadler 
<dominik.stad...@gmx.at> wrote: 





Hi,

See the report at
http://people.apache.org/~centic/poi_regression/reports/index500RC2to510RC1.html
, one of the files is
http://people.apache.org/~centic/poi_regression/reports/bib-chernigovka.netdo.ru_download_docs_17459.doc
(it's actually a docx, so is processed via XSSFFileHandler).

Thanks... Dominik.

On Mon, Oct 18, 2021 at 5:06 PM PJ Fanning <fannin...@yahoo.com.invalid>
wrote:

> Hi Dominik,
> Would you be able to provide one of the files that causes the missing xsb
> issues? I could try debugging to see if I can see why it fails.
>
>
>
>
>
> On Monday 18 October 2021, 12:12:02 IST, PJ Fanning
> <fannin...@yahoo.com.invalid> wrote:
>
>
>
>
>
> Maybe POI 5.1.0 poi-ooxml can default to poi-ooxml-full as the dependency
> and we could produce the lite jar with a health warning and documentation
> about how to use that instead if you choose to? After POI 5.1.0, we could
> see if the community needs the lite jar at all. Some users still try to use
> POI on Android and maybe they would prefer smaller jar sizes.
>
>
>
>
>
>
> On Monday 18 October 2021, 11:54:04 IST, Dominik Stadler <
> dominik.stad...@gmx.at> wrote:
>
>
>
>
>
> Hi,
>
> Hm, it would be fairly tedious to look for up to 107 documents and add all of
> them to the integration-tests. It would be nice to know what changes made them
> required now, as the documents themselves did not change. For now I will
> trigger another mass-testing run with "full" sometime this week.
>
> FYI, as "lite" is only 3 times smaller than "full" nowadays, and disk sizes
> and download speeds have greatly improved over the years, I will propose
> removal of the "lite" jar and all the related pieces after the 5.1.0
> release. The amount of work we put into this is very high and gains for our
> uses are small by now. Even the recent JDK 17 segfaults in CI may be
> related to byte-code rewriting that is done for this. Also, building locally
> takes ages because nearly every Gradle target has to run all tests
> including integration-tests now, which makes it very hard to do anything
> useful, at least for me.
>
> Dominik.
>
>
>
> On Mon, Oct 18, 2021 at 11:40 AM PJ Fanning <fannin...@yahoo.com.invalid>
> wrote:
>
> > Thanks Dominik for running the tests. I'm not sure what the best approach
> > is but adding some of the files that cause the missing xsb issues to the
> > poi-test-data dir might be a good starting point.
> >
> >
> >
> >
> >
> >
> > On Monday 18 October 2021, 08:05:21 IST, Dominik Stadler <
> > dominik.stad...@gmx.at> wrote:
> >
> >
> >
> >
> >
> > FYI, I did re-run the mass-testing and the missing schema is still
> > reported a lot (>90k times).
> >
> > One sample missing xsb is
> > org/apache/poi/schemas/ooxml/system/ooxml/backgroundelement.xsb
> >
> > This was also not included in the "lite" package in 5.0.0, so it seems
> > some code-change now requires this, but it is not included
> > automatically...
> >
> > I extracted the attached list of 107 missing XSBs.
> >
> > Dominik.
> >
> >
> > On Fri, Oct 15, 2021 at 1:17 AM Andreas Beeker <kiwiwi...@apache.org>
> > wrote:
> > > "Kept you waiting, huh?" (tm) -No problem for me ... XmlBeans RC2 looks
> > > good so far, therefore it can be only a matter of days.
> > >
> > >
> > > On 15.10.21 00:06, PJ Fanning wrote:
> > >> With POI trunk, I've added a use case that relies on the XMLBeans 5.0.2
> > >> release. Can we wait until XMLBeans 5.0.2 is released? Alternatively, I can
> > >> remove the new POI code that uses XmlOptions setDisallowDocTypeDeclaration.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> > > For additional commands, e-mail: dev-h...@poi.apache.org