Looks like the poi-integration code was changed a few months ago so that the code that walks the document and loads XmlObjects tries harder to walk the graph.
The svn commit 1888985 labelled with 'TIKA-3164 - add more .xsb and classes needed for TIKAs extractor' is the main change.

I definitely think we should consider the option of using the full schema jar, by default.

On Monday 18 October 2021, 18:04:54 IST, Dominik Stadler <dominik.stad...@gmx.at> wrote:

Hi,

See the report at
http://people.apache.org/~centic/poi_regression/reports/index500RC2to510RC1.html ,
one of the files is
http://people.apache.org/~centic/poi_regression/reports/bib-chernigovka.netdo.ru_download_docs_17459.doc
(it's actually a docx, so is processed via XSSFFileHandler).

Thanks... Dominik.

On Mon, Oct 18, 2021 at 5:06 PM PJ Fanning <fannin...@yahoo.com.invalid> wrote:

> Hi Dominik,
> Would you be able to provide one of the files that causes the missing xsb
> issues? I could try debugging to see if I can see why it fails.
>
> On Monday 18 October 2021, 12:12:02 IST, PJ Fanning
> <fannin...@yahoo.com.invalid> wrote:
>
> Maybe POI 5.1.0 poi-ooxml can default to poi-ooxml-full as the dependency
> and we could produce the lite jar with a health warning and documentation
> about how to use that instead if you choose to? After POI 5.1.0, we could
> see if the community needs the lite jar at all. Some users still try to use
> POI on android and maybe they would prefer smaller jar sizes.
>
> On Monday 18 October 2021, 11:54:04 IST, Dominik Stadler
> <dominik.stad...@gmx.at> wrote:
>
> Hi,
>
> hm, would be fairly tedious to look for up to 107 documents and add all of
> them to the integration-tests. Would be nice to know what changes made them
> required now as the documents themselves did not change. For now I will
> trigger another mass-testing run with "full" sometimes this week.
>
> FYI, as "lite" is only 3-times smaller than "full" nowadays and disk-sizes
> and download speeds have greatly improved over the years I will propose
> removal of the "lite" jar and all the related pieces after the 5.1.0
> release. The amount of work we put into this is very high and gains for our
> uses are small by now. Even the recent JDK 17 segfaults in CI may be
> related to byte-code rewriting that is done for this. Also building locally
> takes ages because nearly every Gradle target has to run all tests
> including integration-tests now, very hard to do anything useful, at least
> for me.
>
> Dominik.
>
> On Mon, Oct 18, 2021 at 11:40 AM PJ Fanning <fannin...@yahoo.com.invalid>
> wrote:
>
> > Thanks Dominik for running the tests. I'm not sure what the best approach
> > is but adding some of the files that cause the missing xsb issues to the
> > poi-test-data dir might be a good starting point.
> >
> > On Monday 18 October 2021, 08:05:21 IST, Dominik Stadler
> > <dominik.stad...@gmx.at> wrote:
> >
> > FYI, I did re-run the mass-testing and the missing schema is still
> > reported a lot (>90k times).
> >
> > One sample missing xsb is
> > org/apache/poi/schemas/ooxml/system/ooxml/backgroundelement.xsb
> >
> > This was also not included in the "lite" package in 5.0.0, so it seems
> > some code-change now requires this, but it is not included
> > automatically...
> >
> > I extracted the attached list of 107 missing XSBs.
> >
> > Dominik.
> >
> > On Fri, Oct 15, 2021 at 1:17 AM Andreas Beeker <kiwiwi...@apache.org>
> > wrote:
> >
> > > "Kept you waiting, huh?" (tm) -No problem for me ... XmlBeans RC2 looks
> > > good so far, therefore it can be only a matter of days.
> > >
> > > On 15.10.21 00:06, PJ Fanning wrote:
> > >> With POI trunk, I've added a use case that relies on XMLBeans 5.0.2
> > >> release. Can we wait until XMLBeans 5.0.2 is released? Alternatively,
> > >> I can remove the new POI code that use XmlOptions
> > >> setDisallowDocTypeDeclaration.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org